Question about using Golang to send files over TCP.

agolangf · · 491 views
<p>To learn and play with the Golang standard library, I&#39;m trying to write something that syncs my documents folder to my VPS. The idea is that I&#39;d have a server listening on my VPS, and then a client task running hourly on my home PC. It&#39;d look for every .txt, .docx, or .xlsx file in my folders, connect to the server, and say &#34;here comes file <em>dog_bio.txt</em> at 48702 bytes. That done? Okay, here comes <em>cat_bio.txt</em> at 32441 bytes.&#34; And then once that was working, I&#39;d set up two-way communication, so the client can say &#34;here are the files I have&#34; and the server can say &#34;oh I don&#39;t have that one, send that.&#34;</p> <p>I got the absolute basic form down -- <code>net.Listen</code> spawning a goroutine for every new request, which uses <code>io.Copy</code> to dump the contents to a file with a hardcoded name, and then <code>net.Dial</code> connecting to that and using <code>io.Copy</code> to read a file and dump it into a <code>net.Conn</code>.</p> <p>What I&#39;m confused about now is how I could proceed. How would I delineate between the files I&#39;m sending and the filenames/sizes to expect? How would I capture that information on the server side, and what should I look at to enable two-way communication (without going to websockets)? I&#39;ve read the stdlib documentation, but a lot of it flies over my head or doesn&#39;t seem relevant to this specific topic, so I&#39;m a little stumped.</p> <hr/>**Comments:**<br/><br/>Perhyte: <pre><p>Assuming you don&#39;t want to just use one of the existing and freely-available tools for stuff like this (rsync/scp/(s)ftp/...):</p> <p>You need to define a protocol. 
A simple one would be to send</p> <ol> <li>Length of the file name in bytes (N).</li> <li>N bytes of the file name.</li> <li>Size of the file (M).</li> <li>M bytes of file contents.</li> <li>If there are more files, start over at (1) with the next one.</li> <li>If you want the receiving side to be able to detect connection failures, send a 0 in the same format as in (1). This is never the length of a valid file name, so this can be distinguished from connection failures. When reading, if this is either absent or there&#39;s more data after this, produce an error.</li> <li>Close the connection.</li> </ol> <p>The lengths (N and M) could simply be written using <a href="https://golang.org/pkg/encoding/binary/">encoding/binary</a>, either using <code>binary.(Put)Uvarint</code> to encode/decode using a buffer and transferring the used portion over the network or using <code>binary.Read/Write</code> to transfer fixed-size integers. In the latter case, be sure to use the same data types when reading and writing, and use 64 bits for file sizes because files of 4GB+ exist.</p> <p>The file name can be sent by using <code>io.WriteString()</code> and the file contents can be sent using <code>io.Copy()</code>, but when reading you should be careful to only read as much as you need. I&#39;d recommend <code>io.ReadFull()</code> with a buffer of size N for the name and <code>io.CopyN()</code> for the file contents. Actually, to be safe you could use <code>io.CopyN()</code> for writing as well, in case the file size changes after you check it.</p> <p>For the two-way communication, I&#39;d like to recommend again <code>rsync</code>. It&#39;s a wonderful tool :). But if you really want to roll your own, you have some options:</p> <ul> <li>After (2) or (3) above, wait for the server to send a 1/0 byte indicating whether it needs that file. 
If it doesn&#39;t, skip to (5).</li> <li>First send all file names (and optionally file sizes), then wait for the server to indicate which files it wants (using their names, or a bunch of 1/0 bytes, or more compactly a bunch of bytes where the Nth bit of byte M being 1 means &#34;please send file <code>N+8*M</code>&#34;, or some other method) and send only those (in a predictable order).</li> <li>Many other possible variations on this.</li> </ul></pre>Morgahl: <pre><p>To ensure file integrity, sending a simple hash in addition to your suggested file size and name is better practice. There are many to choose from, but depending on your personal level of paranoia you can go anywhere from MD5 to SHA2 with built-ins.</p> <p>MD5 is a legitimate choice as long as you are not expecting it to detect deliberate tampering. It&#39;s good enough for a simple file transfer to verify the file made it as expected.</p></pre>Perhyte: <pre><p>Yep, certainly true. But my post was long enough as is.</p> <p>And to come back to it, <code>rsync</code> already implements that :þ.</p></pre>Redundancy_: <pre><p>I believe that it&#39;s somewhat more tamper resistant if you also specify length.</p></pre>Morgahl: <pre><p>Yep, which is why I said &#34;in addition&#34; :)</p></pre>epiris: <pre><p>If you want some google keywords: differential synchronization, throw in &#34;rsync&#34; etc. This is one of those topics that starts off simple but gets really nuanced when you begin to dive deep into the edge cases. The network protocol itself is where you sound stuck; you were on the right track with asking questions and replying. 
Just change the dialog a little to reflect what&#39;s actually happening.</p> <ul> <li>Client: hey here is a list of files, send me the ones you want</li> <li>Server: I want these files, 12 total</li> <li>Client: hey here is file number one, excluding this greeting there will be 1845 kb total.</li> <li>(Server reads 8 bytes for the size of his platform&#39;s word containing the size in bytes, then reads that number of bytes. Server then increments their counter for the number of files received.)</li> <li>Client: hey here is file number two, excluding this greeting there will be ...</li> <li>(Server and client repeat this)</li> <li>Server: I received all the files, thanks buddy</li> </ul> <p>So you just need to make structures to reflect your dialog and send them along via the gob pkg or whatever works for you. The binary package may be used for the file sending portion, since really all you need to do is just binary.Put a uint containing the len() of your file, then copy the data directly after.</p> <p>Now you can see though where it gets complicated, when the client describes only the changes in each document and you want to efficiently send a patch of just the changes.</p></pre>qu33ksilver: <pre><p>If you want to implement it using TCP, you have to set up your own protocol, which is a headache and not recommended.</p> <p>You can either wrap your file content along with metadata in a protobuf struct and then send it over to the server. That way, you just deserialize the struct at the server side. You probably have to handle large files in an elegant fashion.</p> <p>Or, move to HTTP. You can set up different endpoints for different purposes. </p> <p>Or, use grpc. It supports 2 way bidirectional communication out of the box.</p></pre>Morgahl: <pre><p>Protobuf is potentially a bad idea in the long run as that requires that the full file be loaded into memory for encoding. Transferring it via <code>io.CopyN</code> will only load what is appropriate. 
For that matter you might want to wrap the <code>os.File</code> in a <code>bufio.NewReaderSize</code> with a reasonable buffer allocation to speed the transfer and keep memory use reasonable.</p></pre>anacrolix: <pre><p>Framework protocols are overrated.</p></pre>icholy: <pre><p>Say people who enjoy reinventing crappy wheels.</p></pre>kaeshiwaza: <pre><p>Reinventing crappy wheels is a good way &#34;To learn and play with the Golang standard library&#34;</p></pre>djherbis: <pre><p>Here&#39;s an example I wrote a couple years ago: <a href="https://github.com/djherbis/fenc/blob/master/fenc.go" rel="nofollow">https://github.com/djherbis/fenc/blob/master/fenc.go</a></p></pre>jsabey: <pre><p>you could use http.ServeFile, it will support sending files in whole or in chunks, you could even use it for resuming broken downloads</p> <p>you would just need to add a directory listing with hashes to see what files have changed since last &#34;sync&#34;</p></pre>meanMrKetchup: <pre><p>gRPC has great support for bidirectional streaming and the go library is great. I’ve talked with people in the go gRPC slack channel who talk about using it to move around gigabyte sized files. </p> <p>It might be a fun project to work on but honestly I’d just use rsync or ftp</p></pre>nsd433: <pre><p>You&#39;re at the point where you need a protocol to talk over this TCP connection. You can make one up, or use an existing one. Either way the details are for you to invent, which is why the stdlib documentation doesn&#39;t help.</p></pre>pobody: <pre><p>If you want to do this as an exercise/challenge, that&#39;s fine.</p> <p>But do not, repeat, <strong>DO NOT</strong> use this on or for any data you remotely care about. If you are looking for a way to back up your data, use rsync or one of the other utilities that has had <em>decades</em> to find and work through the subtle corner cases and failure modes.</p></pre>
