Tools for async file IO

xuanbao · 687 views

This is a shared resource; the information in it may have since evolved or changed.

Disclaimer: I'm a Go beginner and have only used it very briefly, for a tiny static web server.

What's the current state of efficient async file IO in Go? Is it just spawning many goroutines and using io.Copy? I'm thinking of implementing a rough version of <code>cp</code> as a toy project and, looking around, have found things like <a href="https://github.com/traetox/goaio" rel="nofollow">goaio</a> that provide interfaces to OS-level async file IO, since the more canonical async primitives like epoll/kqueue don't work for regular files. What else is there?

<hr/>**Comments:**

jeffrallen: <pre>Idiomatic Go code uses blocking reads and writes, and implements concurrency by launching one goroutine per blocking operation. This is how it looks to the Go programmer (and to future readers of the code) because it is a simple model to reason about, which makes bugs rarer and maintenance cheaper.

It is up to the runtime to arrange for each blocking I/O request to be carried out efficiently. Network file descriptors are actually set to nonblocking, and the runtime uses async I/O to multiplex the I/O onto one or a few system threads. I am pretty sure that this does not yet extend to file descriptors opened via os.OpenFile. In that case, each blocking read/write will result in blocking a system thread. Depending on the setting of GOMAXPROCS, the runtime may arrange that other goroutines start running while the one that is blocked on I/O sleeps. If those in turn also block, you will eventually end up with GOMAXPROCS system threads that are blocked in I/O. The effect is the same as async I/O, but maybe with a lower cap on concurrency than you might have implemented via async I/O.

As always with performance tradeoffs, you need a real use case with a real measured throughput to decide if you want/need to do extra work beyond programming it the simplest way possible.
-jeff</pre>

ijustwantaredditacct: <pre><blockquote> I am pretty sure that this does not yet extend to file descriptors opened via os.OpenFile. In that case, each blocking read/write will result in blocking a system thread. </blockquote>

This is my understanding as well.

<blockquote> Depending on the setting of GOMAXPROCS, the runtime may arrange that other goroutines start running while the one that is blocked on I/O sleeps. If those in turn also block, you will eventually end up with GOMAXPROCS system threads that are blocked in I/O. </blockquote>

This does not match my understanding -- I'm under the impression that a thread blocked in I/O does not count toward the number of runnable threads. That is, the runtime will create a new thread, or reuse an existing one, for the I/O operation and mark the goroutine as sleeping, but still keep GOMAXPROCS threads/goroutines running. (This would apply to all syscalls.) I believe there's a limit of 10,000 threads.

<blockquote> The effect is the same as async I/O, but maybe with a lower cap on concurrency than you might have implemented via async I/O. </blockquote>

Very close, but not quite, I believe. If it were async I/O, we might have better options for timing out or setting a deadline on a file read/write. As it stands, it's sort of possible: you can do the file operation in another goroutine and use a channel to signal its completion. At that point you can use <code>select</code> against that channel and a context, and time out. The actual I/O would still be underway, but the user-facing end could move on. (The use case here would be, say, being able to properly return a 5xx response if a file operation took more than N seconds.)

The behavior can be particularly obnoxious if the underlying drive fails and all threads doing I/O operations against it get stuck in an uninterruptible sleep -- you have to wait for the kernel to time out the I/O operation, which usually takes longer than you want to wait.

I suppose you could argue the above behavior is an implementation detail -- you could specify a timeout while doing synchronous I/O if the kernel supported it.</pre>

jeffrallen: <pre>I went to play and I made this: <a href="https://play.golang.org/p/Gsp9qzGbdv" rel="nofollow">https://play.golang.org/p/Gsp9qzGbdv</a>

Using it, I was able to look at the execution traces for parallel io.Copy. I found that the call to os.Remove caused the goroutine to be suspended while the potentially blocking call was shunted onto a system thread. But the calls to Read/Write inside of io.Copy are implemented without any preemption, and are actually calls to internal/poll.(*FD).Read:121 and internal/poll.(*FD).Write:218. (See <a href="https://github.com/golang/go/blob/release-branch.go1.9/src/internal/poll/fd_unix.go#L121" rel="nofollow">https://github.com/golang/go/blob/release-branch.go1.9/src/internal/poll/fd_unix.go#L121</a>)

TIL: Something like the netpoller now exists for regular file IO. The execution traces show full utilization of the processors, driving reads and writes.

I suppose it could be interesting to compare wall-clock timings for 100 file copies with GOMAXPROCS==numcpu and GOMAXPROCS==numcpu*2, but I'm going to leave that to the OP. Tell us what you find!</pre>

kostix: <pre>From the <a href="https://golang.org/doc/go1.9#os" rel="nofollow">Go 1.9 release notes</a>:

<blockquote> The os package now uses the internal runtime poller for file I/O. This reduces the number of threads required for read/write operations on pipes, and it eliminates races when one goroutine closes a file while another is using the file for I/O. </blockquote></pre>

hudsonyard: <pre>May I ask what the use case is here?
As you would probably make this Linux-only?</pre>
