Within the handler of a typical web application, is it advisable to execute multiple long-running functions (e.g. database queries) in separate goroutines? And if so, is sync.WaitGroup the only way to wait for all those goroutines to finish their jobs?

agolangf · 479 views


My gut says "yes", but having looked at several web application examples and boilerplates, the approach they take tends to look like this. (I'm using a Gin handler as an example, with imaginary User and Billing "repository" structs that fetch data from either a database or an external API. I omitted error handling to keep the example short.)

<pre><code>func GetUserDetailsHandler(c *gin.Context) {
	// This result presumably comes from the app's database.
	userResult := UserRepository.FindById(c.GetInt("user_id"))

	// Assume this result comes from a different data source altogether
	// (e.g. a different database), which is why we're not just doing a
	// join query with "User".
	billingInfo := BillingRepository.FindById(c.GetInt("user_id"))

	c.JSON(http.StatusOK, gin.H{
		"user_data":    userResult,
		"billing_data": billingInfo,
	})
}
</code></pre>

In the above scenario, the call to "UserRepository.FindById" might use some kind of database driver, but as far as I'm aware, all available Go database/ORM libraries return data in a "synchronous" fashion (i.e. as return values, not via channels). As such, the call to "UserRepository.FindById" blocks until it completes, and only then can I move on to "BillingRepository.FindById". That is not ideal, since the two lookups are independent and could run in parallel. So I figured that the best idea was to use goroutines plus sync.WaitGroup to solve the problem.
Something like this:

<pre><code>func GetUserDetailsHandler(c *gin.Context) {
	var waitGroup sync.WaitGroup

	// Buffered with capacity 1 so each send completes before Wait returns;
	// with unbuffered channels the goroutines would block on send and
	// Wait would never return.
	userChannel := make(chan User, 1)
	billingChannel := make(chan Billing, 1)

	waitGroup.Add(1)
	go func() {
		defer waitGroup.Done()
		userChannel <- UserRepository.FindById(c.GetInt("user_id"))
	}()

	waitGroup.Add(1)
	go func() {
		defer waitGroup.Done()
		billingChannel <- BillingRepository.FindById(c.GetInt("user_id"))
	}()

	waitGroup.Wait()

	userInfo := <-userChannel
	billingInfo := <-billingChannel

	c.JSON(http.StatusOK, gin.H{
		"user_data":    userInfo,
		"billing_data": billingInfo,
	})
}
</code></pre>

Now, this presumably does the job. But it seems unnecessarily verbose to me, and potentially error-prone (if I forget to "Add" to the waitGroup before any goroutine, or if I forget to "Wait", then it all falls apart). Is there a better way to do this?

Edit: fixed a mistake in the mock code

<hr/>**Comments:**

tv64738: <pre><a href="https://talks.golang.org/2012/concurrency.slide" rel="nofollow">https://talks.golang.org/2012/concurrency.slide</a> <a href="https://godoc.org/golang.org/x/sync/errgroup" rel="nofollow">https://godoc.org/golang.org/x/sync/errgroup</a> <pre><code>import "golang.org/x/sync/errgroup"
...
var u User
var b Billing
g, ctx := errgroup.WithContext(request.Context())
g.Go(func() error {
	rows, err := db.QueryContext(ctx, "SELECT foo FROM users WHERE id=?", ...)
	...
	u = ...
	return nil
})
g.Go(func() error {
	rows, err := db.QueryContext(ctx, "SELECT bar FROM invoices WHERE ...", ...)
	...
	b = ...
	return nil
})
if err := g.Wait(); err != nil {
	return nil, err
}
// use u and b like normal
</code></pre></pre>

fmpwizard: <pre>As you are using channels to get the data from those two databases, you don't actually need sync.WaitGroup.
Pseudo code: <pre><code>func GetUserDetailsHandler(c *gin.Context) {
	userChannel := make(chan User)
	billingChannel := make(chan Billing)

	go func() {
		userChannel <- UserRepository.FindById(c.GetInt("user_id"))
	}()
	go func() {
		billingChannel <- BillingRepository.FindById(c.GetInt("user_id"))
	}()
	// By this point, both lookups have been started and are running.

	userInfo := <-userChannel // here we block until userChannel gets data

	// We don't read the billing info until userChannel has delivered, but
	// that doesn't mean the billing lookup hasn't already been started.
	billingInfo := <-billingChannel

	// We get here only when both user and billing info have data.
	// You may also want to add a timeout channel, in case either the user
	// or the billing lookup never finishes.
	c.JSON(http.StatusOK, gin.H{
		"user_data":    userInfo,
		"billing_data": billingInfo,
	})
}
</code></pre> As for the boilerplate, you need to see whether it is worth it in your actual app: if getting the user and billing info takes just 50ns each, users probably won't notice the difference between waiting 100ns sequentially and about 60ns if you run them concurrently.</pre>

Aetheus: <pre>That's a good point! I'm still pretty new to Go, and I had completely forgotten that receiving from a channel blocks! I was so busy looking for a "Promise.all()" equivalent in Go that I didn't even question if I needed one at all. Thanks!</pre>

dchapes: <pre><blockquote> ``` func </blockquote> Reddit doesn't use that kind of markup. You can select "<a href="https://www.reddit.com/wiki/commenting" rel="nofollow">formatting help</a>" below the comment entry/editing box for details, but preformatted text or code should be formatted with four leading spaces (or a leading tab). Ideally you should edit your post to have the correct formatting (add the leading spaces to all code lines).</pre>

hell_0n_wheel: <pre><blockquote> Within the handler of a typical web application, is it advisable to execute multiple long-running functions </blockquote> No. Goroutines / channels / etc.
aren't the issue; tying up your client for a long time is. This will not scale. At some point you're going to hit a wall on your server, concurrent requests will stack up, and response times will degrade very quickly. Optimizing your database (schema, indices, etc.) will help a lot here, but this will only get you so far. Parallelizing operations the way you're considering (goroutines / channels) still ties them together, in that they're both competing for resources on the same server. If you have to perform multiple operations against a DB, it's better to do so asynchronously. Give your client one endpoint to kick off the operation and another to check on its success. There are a few ways to do this, depending on how your UI is structured...</pre>

metamatic: <pre>I think the real answer is "it depends". If my database server has 16 CPUs, most of them are sitting idle at any given moment, and I've got a RAID array and heavy RAM caching so there's no major I/O bottleneck, then executing two queries in parallel might be faster than executing them one after the other. If my database server is heavily loaded, then executing two queries in parallel isn't likely to result in any speed increase.</pre>

hell_0n_wheel: <pre><blockquote> If my database server has 16 CPUs </blockquote> That's vertical scale, and you can only go so far before you hit a very hard brick wall. If you think about horizontal scale instead, you'll never have to worry about the architecture of your infrastructure. It can run on a handful of Raspberry Pis or a rack of servers just as well.</pre>

goomba_gibbon: <pre>Not all DBs scale horizontally, though. There is not enough information here to know for sure. We don't know the DB, any hardware specs, the time those queries take to execute, the number of incoming requests, etc. If you have tons of capacity on the DB server, then why not run a couple of queries in parallel? It sounds like premature optimization otherwise.</pre>


web

gin

context

goroutine
