Within the handler of a typical web application, is it advisable to execute multiple long-running functions (e.g. database queries) in separate goroutines? And if so, is sync.WaitGroup the only way to wait for all those goroutines to finish their jobs?

agolangf · 339 views
<p>My gut says &#34;yes&#34;, but the web application examples and boilerplates I&#39;ve looked at tend to take the approach below. (I&#39;m using a Gin handler as an example, with imaginary User and Billing &#34;repository&#34; structs that fetch data from either a database or an external API. I omitted error handling to keep the example short.)</p> <pre><code>func GetUserDetailsHandler(c *gin.Context) {
	// This result presumably comes from the app&#39;s database.
	userResult := UserRepository.FindById(c.GetInt(&#34;user_id&#34;))

	// Assume that this result comes from a different data source (e.g. a
	// different database) altogether, which is why we&#39;re not just doing a
	// join query with &#34;User&#34;.
	billingInfo := BillingRepository.FindById(c.GetInt(&#34;user_id&#34;))

	c.JSON(http.StatusOK, gin.H{
		&#34;user_data&#34;:    userResult,
		&#34;billing_data&#34;: billingInfo,
	})
}
</code></pre> <p>In the above scenario, the call to &#34;UserRepository.FindById&#34; might use some kind of database driver, but as far as I&#39;m aware, all available Go database/ORM libraries return data in a &#34;synchronous&#34; fashion (i.e. as return values, not via channels). As such, the call to &#34;UserRepository.FindById&#34; blocks until it completes, and only then can I move on to &#34;BillingRepository.FindById&#34;. That is not ideal, since the two lookups could run concurrently.</p> <p>So I figured that the best idea was to use goroutines plus sync.WaitGroup to solve the problem. 
Something like this:</p> <pre><code>func GetUserDetailsHandler(c *gin.Context) {
	var waitGroup sync.WaitGroup

	// Buffered channels: with unbuffered ones each goroutine would block on
	// its send, Done would never run, and Wait would deadlock.
	userChannel := make(chan User, 1)
	billingChannel := make(chan Billing, 1)

	waitGroup.Add(1)
	go func() {
		defer waitGroup.Done()
		userChannel &lt;- UserRepository.FindById(c.GetInt(&#34;user_id&#34;))
	}()

	waitGroup.Add(1)
	go func() {
		defer waitGroup.Done()
		billingChannel &lt;- BillingRepository.FindById(c.GetInt(&#34;user_id&#34;))
	}()

	waitGroup.Wait()

	userInfo := &lt;-userChannel
	billingInfo := &lt;-billingChannel

	c.JSON(http.StatusOK, gin.H{
		&#34;user_data&#34;:    userInfo,
		&#34;billing_data&#34;: billingInfo,
	})
}
</code></pre> <p>Now, this presumably does the job. But it seems unnecessarily verbose to me, and potentially error-prone (if I forget to &#34;Add&#34; to the waitGroup before any goroutine, or if I forget to &#34;Wait&#34;, then it all falls apart). Is there a better way to do this? </p> <p>Edit: fixed a mistake in the mock code</p> <hr/>**Comments:**<br/><br/>tv64738: <pre><p><a href="https://talks.golang.org/2012/concurrency.slide" rel="nofollow">https://talks.golang.org/2012/concurrency.slide</a></p> <p><a href="https://godoc.org/golang.org/x/sync/errgroup" rel="nofollow">https://godoc.org/golang.org/x/sync/errgroup</a></p> <pre><code>import &#34;golang.org/x/sync/errgroup&#34;

...

var u User
var b Billing
g, ctx := errgroup.WithContext(request.Context())
g.Go(func() error {
	rows, err := db.QueryContext(ctx, &#34;SELECT foo FROM users WHERE id=?&#34;, ...)
	...
	u = ...
	return nil
})
g.Go(func() error {
	rows, err := db.QueryContext(ctx, &#34;SELECT bar FROM invoices WHERE ...&#34;, ...)
	...
	b = ...
	return nil
})
if err := g.Wait(); err != nil {
	return nil, err
}
// use u and b like normal
</code></pre></pre>fmpwizard: <pre><p>As you are using channels to get the data from those two databases, you don&#39;t actually need sync.WaitGroup.</p> <p>pseudo code:</p> <pre><code>func GetUserDetailsHandler(c *gin.Context) {
	userChannel := make(chan User)
	billingChannel := make(chan Billing)

	go func() {
		userChannel &lt;- UserRepository.FindById(c.GetInt(&#34;user_id&#34;))
	}()
	go func() {
		billingChannel &lt;- BillingRepository.FindById(c.GetInt(&#34;user_id&#34;))
	}()

	// Up to here, both lookups have been started and are running concurrently.
	userInfo := &lt;-userChannel // here we block until userChannel gets data

	// We don&#39;t read the billing info until userChannel has data, but that
	// doesn&#39;t mean the billing lookup hasn&#39;t already started.
	billingInfo := &lt;-billingChannel

	// We get here only when both userInfo and billingInfo have data.
	// You may also want to add a timeout channel, in case either the user
	// or the billing lookup never finishes.
	c.JSON(http.StatusOK, gin.H{
		&#34;user_data&#34;:    userInfo,
		&#34;billing_data&#34;: billingInfo,
	})
}
</code></pre> <p>As for boilerplate, you need to see whether this is worth it in your actual app: if getting the user and billing info takes just 50ns each, users probably won&#39;t notice the difference between waiting 100ns sequentially and about 60ns when you run them concurrently.</p></pre>Aetheus: <pre><p>That&#39;s a good point! I&#39;m still pretty new to Go, and I had completely forgotten that receiving from a channel blocks! </p> <p>I was so busy looking for a &#34;Promise.all()&#34; equivalent in Go that I didn&#39;t even question if I needed one at all. </p> <p>Thanks!</p></pre>dchapes: <pre><blockquote> <p>``` func</p> </blockquote> <p>Reddit doesn&#39;t use that kind of markup. 
You can select &#34;<a href="https://www.reddit.com/wiki/commenting" rel="nofollow">formatting help</a>&#34; below the comment entry/editing box for details, but preformatted text or code should be formatted with four leading spaces (or a leading tab).</p> <p>Ideally you should edit your post to have the correct formatting (add the leading spaces to all code lines).</p></pre>hell_0n_wheel: <pre><blockquote> <p>Within the handler of a typical web application, is it advisable to execute multiple long-running functions </p> </blockquote> <p>No. Goroutines, channels, etc. aren&#39;t the issue; tying up your client for a long time is. This will not scale. At some point you&#39;re going to hit a wall on your server, concurrent requests will stack up, and response times will degrade very quickly.</p> <p>Optimizing your database (schema, indices, etc.) will help a lot here, but this will only get you so far. Parallelizing operations the way you&#39;re considering (goroutines and channels) still ties them together, in that they&#39;re both competing for resources on the same server.</p> <p>If you have to perform multiple operations against a DB, it&#39;s better to do so asynchronously. Give your client one endpoint to kick off the operation and another to check on its success. 
There are a few ways to do this, depending on how your UI is structured...</p></pre>metamatic: <pre><p>I think the real answer is &#34;it depends&#34;.</p> <p>If my database server has 16 CPUs, most of them are sitting idle at any given moment, and I&#39;ve got a RAID array and heavy RAM caching so there&#39;s no major I/O bottleneck, then executing two queries in parallel might be faster than executing them one after the other.</p> <p>If my database server is heavily loaded, then executing two queries in parallel isn&#39;t likely to result in any speed increase.</p></pre>hell_0n_wheel: <pre><blockquote> <p>If my database server has 16 CPUs</p> </blockquote> <p>That&#39;s vertical scale, and you can only go so far before you hit a very hard brick wall. If you think about horizontal scale instead, you&#39;ll never have to worry about the architecture of your infrastructure. It can run on a handful of Raspberry Pis or a rack of servers just as well.</p></pre>goomba_gibbon: <pre><p>Not all DBs scale horizontally, though. There is not enough information here to know for sure. We don&#39;t know the DB, any hardware specs, the time for those queries to execute, the number of incoming requests, etc.</p> <p>If you have tons of capacity on the DB server then why not run a couple of queries in parallel? It sounds like premature optimization otherwise.</p></pre>
