Golang "only" 2k req/s

    <p>I&#39;m building an app, and I&#39;ve written this very simple Go service to handle an endpoint that is high-traffic and critical.</p>
    <p>All it does is parse some params and increment a hash key on Redis.</p>
    <p>It was my first time using Go, so I&#39;m not entirely sure whether I made any dumb mistakes, but the service starts to fail at around 3000 requests per second, and I&#39;m not sure why. The server is a 20-core, 64GB Digital Ocean VPS running Go 1.7, with Nginx as a reverse proxy in front of Go.</p>
    <p>Here&#39;s a simplified version of my code (I only removed some params for brevity): <a href="https://gist.github.com/ruigomeseu/749955d9acc03000009e67f38df52f45" rel="nofollow">https://gist.github.com/ruigomeseu/749955d9acc03000009e67f38df52f45</a></p>
    <p>Nginx logs &#34;connection refused&#34; for 10% of the requests.</p>
    <p>Are 3000 req/s just too much for a single server to handle, or is there some change I can make to handle more requests? How can I debug what&#39;s going wrong?</p>
    <p>EDIT: I got rid of Nginx and Redis, and requests still cap at around 5k req/s, so there must be some kind of system misconfiguration. The process limits are set correctly: <a href="http://jmp.sh/4LvLTyf" rel="nofollow">http://jmp.sh/4LvLTyf</a></p>
    <p>Port range: <code>sysctl -a | grep port_range</code> =&gt; <code>net.ipv4.ip_local_port_range = 6000 61000</code></p>
    <p>EDIT 2: It turns out testing from my laptop was a dumb move. Testing from a Linode VPS got me 25k req/s without writing to Redis. However, once I add the Redis part, it still maxes out at 2.5k req/s.</p>
    <p>It shows errors now (a lot of these): <code>2017/01/09 17:57:23 http: panic serving 45.79.179.63:59998: dial tcp :6379: getsockopt: connection timed out</code></p> <hr/><strong>Comments:</strong><br/><br/>
bkeroack: <pre><p><code>increment a hash key on Redis</code></p> <p>Knowing nothing else, I would suspect this as your bottleneck if you are doing it synchronously in the handler. 
Redis is single-threaded, so performance will not scale with core count.</p></pre>
FZambia: <pre><p>I bet on this too. Try using a single connection to Redis in a separate goroutine and batch your requests using Redis pipelining.</p></pre>
ruigomeseu: <pre><p>I will look into goroutines and Redis pipelining. I&#39;m still new to both Redis and Go; all of this goes way over my head. Thanks!</p></pre>
FZambia: <pre><p>Hm, actually on my machine I get 27k req/s with your code without any modifications. So it seems your main bottleneck is something different, maybe your testing/setup environment. Pipelining and a single connection still make sense, but you&#39;d better find the real cause first. Try to exclude Redis from the equation: do not send requests to it and see what happens.</p></pre>
ruigomeseu: <pre><p>I&#39;m just now starting to run some tests.</p> <p>With Redis and Nginx removed from the equation, I still get 25% timeouts (10s+) using Loader.io at 10k req/s.</p> <p>So I&#39;m thinking there&#39;s something wrong with my setup, as you mentioned.</p> <p>cat /proc/ID/limits: <a href="http://jmp.sh/4LvLTyf" rel="nofollow">http://jmp.sh/4LvLTyf</a></p> <p>What other obvious things should I look at?</p></pre>
FZambia: <pre><p>I&#39;d try to exclude Loader.io. You can install <code>wrk</code>, for example, and run it locally like this (it sends requests from 10 threads over 10 open connections for 10 seconds and prints a latency report at the end): <code>wrk -t10 -c10 -d10s --latency http://localhost:5000/?param=1</code></p></pre>
ruigomeseu: <pre><p>Using wrk, throughput seems to cap out at 4k req/s.</p> <p><a href="http://pastebin.com/02QxKZVD" rel="nofollow">http://pastebin.com/02QxKZVD</a></p> <p>This is without Redis or nginx, just serving the request.</p></pre>
FZambia: <pre><p>Could you also try with a much smaller number of connections, for example 10 (<code>-c10</code>) as in my example?</p></pre>
ruigomeseu: <pre><p>Sure: <a href="http://pastebin.com/bd2nYuLV" rel="nofollow">http://pastebin.com/bd2nYuLV</a></p></pre>
FZambia: <pre><p>Lol :) At least there are no socket errors now :) Those are very strange numbers. Here is the same benchmark of your server from my notebook: <a href="http://pastebin.com/kLsdZCeU" rel="nofollow">http://pastebin.com/kLsdZCeU</a></p> <p>Maybe something is wrong with the network. Are you sure there is no proxy of any kind in between? Can you run the same test from another machine? I doubt that profiling will help much in this case, as the problem looks global...</p></pre>
titpetric: <pre><p>Are you checking limits for all pids (nginx, go, redis?) or just go?</p></pre>
ruigomeseu: <pre><p>Right now I&#39;m just testing Go itself and still only getting 4k req/s.</p></pre>
titpetric: <pre><p>Remember to check the limits on the other apps as well. They are per-process (obviously), so a low limit on either redis or nginx will hurt your system as a whole.</p></pre>
d0gsg0w00f: <pre><p>+1 for redis pipelining. It makes a huge speed difference when you batch even 100 requests at a time.</p></pre>
titpetric: <pre><p>Well, true, DO CPUs are usually not great in this regard. One could use nutcracker (twemproxy) to use the cores for redis more effectively and shard the data over a few more instances. However, I still think the likely culprit is some sysctl issue, as I&#39;m pushing way more than 10k ops to redis on legacy hardware (and much more expensive ones too; I had to use 10 worker nodes to exhaust a single CPU core on one).</p></pre>
xlab_is: <pre><p>Redis can do 150k/s increments on that single core.</p></pre>
dlsniper: <pre><p>Not if it needs to open a connection every time. Also, artificial benchmarks mean nothing.</p></pre>
xlab_is: <pre><p>Connections are persistent; there is also a connection pool in the OP&#39;s code. Also, I didn&#39;t mention artificial benchmarks. I&#39;m just saying that 3K/s is a joke throughput for a counter in Redis. 
Even if redis becomes a bottleneck, the CPU cores have nothing to do with it; it&#39;s more likely due to network and disk use.</p></pre>
dlsniper: <pre><p>Or because the CPU is hammered by someone else (noisy neighbor), or the CPUs in DO are crap. So many options...</p></pre>
tty5: <pre><p>First of all, make sure you&#39;re actually testing go:</p> <ol> <li><p>Remove the nginx proxy; you don&#39;t need it, even in production</p></li> <li><p>Replace the redis operation with something that doesn&#39;t rely on an external service; even <code>&lt;-time.After(time.Millisecond)</code> will do</p></li> <li><p>Spin up a 2nd droplet (a smaller one) and use siege/ab or something similar to test it</p></li> <li><p>Run the go app in the background, not in the foreground pushing logs through your ssh connection</p></li> <li><p>Run the tests over http</p></li> </ol></pre>
xlab_is: <pre><p>Try to debug the connections to redis. Check the ulimit, for example.</p></pre>
ruigomeseu: <pre><p>The open-files ulimits for both redis and Go are high enough. It seems to fail connecting to Go itself. I&#39;ll update once I get some time to debug this.</p></pre>
titpetric: <pre><p>What&#39;s the output of <code>dmesg</code>? Packet drops at the TCP level usually show up there. I&#39;d also suggest installing firehol/netdata to see whether there are any obvious issues (high IO, etc.) during your benchmarks.</p></pre>
ruigomeseu: <pre><p>dmesg shows:</p> <p><code>TCP: TCP: Possible SYN flooding on port 5000. Sending cookies. Check SNMP counters.</code></p> <p>I&#39;ll check those tools and report back.</p></pre>
xtrusion: <pre><p>Are you able to use unix sockets instead of TCP for connecting to redis from your Go app?</p></pre>
ar1819: <pre><p>While everyone suggests benchmarking or optimization, I would rather look at the pprof output.</p> <p>Tutorial here: <a href="https://blog.golang.org/profiling-go-programs" rel="nofollow">https://blog.golang.org/profiling-go-programs</a>. 
There are also others, like <a href="https://medium.com/@hackintoshrao/daily-code-optimization-using-benchmarks-and-profiling-in-golang-gophercon-india-2016-talk-874c8b4dc3c5" rel="nofollow">https://medium.com/@hackintoshrao/daily-code-optimization-using-benchmarks-and-profiling-in-golang-gophercon-india-2016-talk-874c8b4dc3c5</a>.</p></pre>
neopointer: <pre><p>You might be hitting the <em>net.ipv4.ip_local_port_range</em> limit. Check the numbers you are using by running <em>sysctl -a | grep port_range</em>, and use <em>netstat</em> during your test to make sure you are not hitting the socket/file descriptor limit. You might have to tune your <em>sysctl</em> configuration.</p></pre>
ruigomeseu: <pre><blockquote> <p><code>sysctl -a | grep port_range</code><br/><code>net.ipv4.ip_local_port_range = 32768 61000</code></p> </blockquote> <p>Not sure how to interpret this result correctly.</p> <p>Running netstat during the test didn&#39;t show any errors, although I&#39;m not sure whether it was supposed to.</p></pre>
titpetric: <pre><p>It&#39;s the range of local ports any network service can use for outgoing connections (sending, in your case). I would definitely suggest some sysctl settings; these work well for me on a high-load nginx edge service:</p> <p><code>net.ipv4.ip_local_port_range=1024 65000
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=15
net.core.netdev_max_backlog=4096
net.core.rmem_max=16777216
net.core.somaxconn=4096
net.core.wmem_max=16777216
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_max_tw_buckets=400000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_wmem=4096 65536 16777216</code></p> <p>The settings mostly deal with connection allocation, reuse of sockets in FIN_WAIT and TIME_WAIT states, and so on. I&#39;d suggest reading up on them so you&#39;re not just stabbing in the dark. 
Usually, if you hit these problems, the first step is setting <code>ulimit -n</code> to a very high number, and after that the issues usually start with connection tracking (conntrack) and how fast your server can open and close connections (keepalive reduces that, but is not always an option).</p></pre>
ruigomeseu: <pre><p>I&#39;ve tried those settings in a new sysctl.d file, with local_port_range starting from 6000 instead, and I&#39;m still maxing out at 4k req/s even after getting rid of nginx and redis.</p></pre>
neopointer: <pre><p>If you run <em>sysctl -a | grep port_range</em> again, do you see that the configuration has changed?</p></pre>
ruigomeseu: <pre><p>Yes:</p> <p><code>sysctl -a | grep port_range</code></p> <p><code>net.ipv4.ip_local_port_range = 6000 65000</code></p></pre>
neopointer: <pre><p>Use <em>netstat -nltpa | wc -l</em> to count how many sockets you are using during your tests. If I am right, you should start to see some slowdown when the count gets close to <strong>28232</strong> (61000 - 32768). You can use <em>grep</em> to make sure you are counting just redis sockets, nginx sockets, etc. After this test, I would suggest changing only this value to the one <a href="/u/titpetric" rel="nofollow">/u/titpetric</a> suggested:</p> <blockquote> <p>net.ipv4.ip_local_port_range=1024 65000</p> </blockquote> <p>Then run your tests again.</p></pre>
ruigomeseu: <pre><p>It goes from 3300 to around 6000 when running the load test with 3k connections.</p> <p>I&#39;m using <a href="/u/titpetric" rel="nofollow">/u/titpetric</a>&#39;s config, but with 6000 65000 instead. That&#39;s still theoretically well below the limit.</p></pre>
neopointer: <pre><p>What about the machine&#39;s memory, load average and CPU %? How are they during the tests?</p></pre>
ruigomeseu: <pre><p>40GB of free RAM.</p> <p>The Go app shows up to 175% CPU usage.</p> <p>load average: 1.51, 0.95, 0.89</p></pre>
neopointer: <pre><p>If you&#39;ve got 40GB of RAM, I wonder if you have something like 64 CPUs? If so, that load average is nothing... 
Load average should be less than or equal to your CPU count.</p></pre>
ruigomeseu: <pre><p>20 cores.</p></pre>
titpetric: <pre><p>Load avg is a misleading metric. It just counts processes that are actively doing something, so it can generally be higher than your CPU core count (though it will likely be lower anyway). The reason is that individual processes don&#39;t usually &#34;saturate&#34; a core (though they may); the more relevant metric is IO, as network/disk will usually be saturated way sooner than CPU. This is especially important for write-intensive operations like nginx logging. The access_log directive has a <code>buffer=</code> option that behaves better: it flushes larger chunks of logs to disk, making the workload less IOPS-intensive.</p></pre>
Irythros: <pre><p>I run something <em>very</em> similar (but with a lot more code) and can handle ~80k req/s. A user connects, makes a change in redis and then exits with some info.</p> <p>I need more information:</p> <p>1) Go version?<br/> 2) Redis version?<br/> 3) OS and version?<br/> 4) CPU, core speed and # of cores?<br/> 5) Memory?<br/> 6) How much memory is allocated for Redis?<br/> 7) Can you post/send your redis config?<br/> 8) What is the output of <code>ulimit -n</code>?<br/> 9) What is the output of <code>ethtool -S eth0</code>? Please change eth0 to your public interface.<br/> 10) What type of hardware? Is it dedicated or cloud?<br/> 11) Can you post your /etc/sysctl.conf? 
</p></pre>
ruigomeseu: <pre><p>As I mentioned in the edit note, I&#39;ve tried removing Redis and Nginx and I&#39;m still maxing out at 4k req/s, so I&#39;ll skip the questions about redis, as I&#39;ve concluded that&#39;s not the problem.</p> <p>Go version: <code>go version go1.7 linux/amd64</code></p> <p>OS: Ubuntu 14.04.5 LTS</p> <p>CPU: 20 cores, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz</p> <p>RAM: 64GB</p> <p><code>ulimit -n</code>: 999999</p> <p><code>ethtool -S eth0</code>: no stats available (eth0 is the right interface, though)</p> <p>Digital Ocean VPS</p> <p>sysctl as suggested in another comment: <a href="http://pastebin.com/39X4hyyd" rel="nofollow">http://pastebin.com/39X4hyyd</a></p></pre>
chrisdefourire: <pre><p>You&#39;re testing from your own laptop, aren&#39;t you?</p> <p>So it seems to me you&#39;re actually benchmarking your MBP and your internet connection.</p> <p>Please try again from your server (i.e. benchmark localhost)... then from another Digital Ocean instance...</p></pre>
ruigomeseu: <pre><p>I&#39;m setting up a Linode VPS with wrk and I&#39;ll test from there. Loader.io also reported timeouts, so I assumed it was the server&#39;s fault.</p> <p>I&#39;ll report back with results.</p></pre>
ruigomeseu: <pre><p>Testing from a Linode instance got me 25k req/s without the Redis part.</p> <p>With Redis, still just 2.5k req/s. Now it shows some errors, though:</p> <p><code>2017/01/09 17:57:23 http: panic serving 45.79.179.63:59998: dial tcp :6379: getsockopt: connection timed out</code></p> <p>Will update the OP.</p></pre>
8bitcow: <pre><p>So Redis is the problem, which is weird; it can handle 25k rps easily. What if you set up Redis on the same system and re-test, so you eliminate the network and the other Redis setup? The problem may reside there.</p></pre>
chrisdefourire: <pre><p>Good.</p> <p>Now I&#39;d use a lower MaxActive param for the pool. 10k connections sounds a lot like &#34;no max&#34;... 
Try 500 and see whether you still get panics.</p> <p>Alternatively, you could check how many are created...</p> <p>I&#39;m not sure it&#39;s a good pattern to open thousands of redis connections anyway.</p> <p>Next, I&#39;d try pipelining to limit the number of redis connections and round trips.</p> <p>Also check that your load generator hasn&#39;t become the new bottleneck (it&#39;s less beefy than your server). If it has, use more than one.</p></pre>
anacrolix: <pre><p>Reuse connections to redis. Use a connection pool: <a href="https://godoc.org/github.com/garyburd/redigo/redis#Pool.Get" rel="nofollow">https://godoc.org/github.com/garyburd/redigo/redis#Pool.Get</a></p></pre>
fortytw2: <pre><p>Ensure you have keepalive enabled between nginx and Go, too.</p></pre>
tgulacsi: <pre><p>If you&#39;ve optimized everything and need more performance, you can try unix domain sockets between your processes (nginx &lt;-&gt; Go &lt;-&gt; Redis), as they have about half the overhead of localhost TCP/IP.</p></pre>
dgryski: <pre><p>What are you using to benchmark?</p> <p>Can you connect to your Go service directly instead of through nginx?</p></pre>
DocMerlin: <pre><p>Using SSL directly in Go instead of through an nginx reverse proxy is about 30% faster.</p></pre>
earthboundkid: <pre><p>Counterpoint: <a href="https://blog.gopheracademy.com/advent-2016/exposing-go-on-the-internet/" rel="nofollow">You need to be very careful when you set timeouts for a server on the open Internet</a>.</p></pre>
ruigomeseu: <pre><p>I&#39;m using Loader.io.</p> <p>Technically I can, but I would prefer to keep the reverse proxy through Nginx.</p></pre>
dgryski: <pre><p>Being able to hit the Go service directly can tell us whether the bottleneck is in the service or in the nginx&lt;-&gt;service connections.</p> <p>Also, Go rarely gives &#34;connection refused&#34;; that sounds like the Go service is running out of file descriptors. 
Raise the ulimit and (as has been suggested) ensure nginx is able to use keep-alive connections.</p></pre>
schumacherfm: <pre><p>I think... IIRC... the package <code>github.com/garyburd/redigo/redis</code> has a race condition somewhere. I&#39;m not sure; I haven&#39;t used it for a long time. Try compiling your code with <code>-race</code>, run it again and check the race detector&#39;s output.</p></pre>
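FZambia's and d0gsg0w00f's suggestion (one Redis connection owned by a separate goroutine, with requests batched) can be sketched with the standard library alone. This is a minimal sketch under assumptions: the `flush` callback is a hypothetical stand-in for the code that would own the single Redis connection and, assuming redigo, call `conn.Send("HINCRBY", key, field, delta)` once per entry, then `conn.Flush()`, then `conn.Receive()` once per reply. The hash key "stats" and the field names are made up. It also goes a step beyond plain pipelining by summing increments to the same field before flushing, which is only valid because the endpoint just counts.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// incr identifies one counter: a field inside a Redis hash.
type incr struct{ key, field string }

// Batcher funnels increments from many handler goroutines into one goroutine,
// which sums them and hands each batch to flush. In a real service, flush
// would own the single Redis connection and pipeline the writes.
type Batcher struct {
	in    chan incr
	done  chan struct{}
	flush func(batch map[incr]int64) // hypothetical sink supplied by the caller
}

// NewBatcher starts the loop. A batch is flushed once it holds maxBatch
// increments or when the maxWait ticker fires, whichever comes first.
func NewBatcher(maxWait time.Duration, maxBatch int, flush func(map[incr]int64)) *Batcher {
	b := &Batcher{in: make(chan incr, 1024), done: make(chan struct{}), flush: flush}
	go b.run(maxWait, maxBatch)
	return b
}

// Incr is what a handler would call instead of talking to Redis. It blocks
// only while the 1024-slot buffer is full. Do not call it after Close.
func (b *Batcher) Incr(key, field string) { b.in <- incr{key, field} }

// Close stops intake, flushes what is pending and waits for the loop to exit.
func (b *Batcher) Close() {
	close(b.in)
	<-b.done
}

func (b *Batcher) run(maxWait time.Duration, maxBatch int) {
	defer close(b.done)
	ticker := time.NewTicker(maxWait)
	defer ticker.Stop()
	pending := make(map[incr]int64)
	n := 0 // increments held in pending, not distinct fields
	emit := func() {
		if n == 0 {
			return
		}
		b.flush(pending)
		pending = make(map[incr]int64)
		n = 0
	}
	for {
		select {
		case op, ok := <-b.in:
			if !ok { // closed by Close: final flush, then exit
				emit()
				return
			}
			pending[op]++
			n++
			if n >= maxBatch {
				emit()
			}
		case <-ticker.C:
			emit()
		}
	}
}

// countWithBatcher simulates n concurrent requests over three fields.
func countWithBatcher(n int) (map[string]int64, int) {
	totals := make(map[string]int64)
	flushes := 0
	b := NewBatcher(10*time.Millisecond, 100, func(batch map[incr]int64) {
		flushes++ // one pipelined round trip would happen here
		for op, delta := range batch {
			totals[op.field] += delta
		}
	})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			b.Incr("stats", fmt.Sprintf("f%d", i%3))
		}(i)
	}
	wg.Wait()
	b.Close() // the loop has exited, so reading totals here is race-free
	return totals, flushes
}

func main() {
	totals, flushes := countWithBatcher(999)
	fmt.Println(totals["f0"], totals["f1"], totals["f2"], flushes)
}
```

Handlers then call Incr and return immediately. The trade-off is that counts reach Redis up to maxWait later, and anything still buffered is lost if the process dies before Close runs.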
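neopointer's arithmetic (61000 - 32768 = 28232 ports) can be checked on the machine itself by reading the same value `sysctl` prints, straight from /proc. The helper name `parsePortRange` is hypothetical, and the /proc paths are Linux-only. The program also prints `net.core.somaxconn`, which caps a socket's accept backlog; an overflowing backlog is one plausible contributor to the "Possible SYN flooding on port 5000" line dmesg showed, and titpetric's config raises it to 4096.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePortRange parses /proc/sys/net/ipv4/ip_local_port_range, which the
// kernel prints as two whitespace-separated numbers, e.g. "32768\t61000\n".
func parsePortRange(s string) (lo, hi int, err error) {
	f := strings.Fields(s)
	if len(f) != 2 {
		return 0, 0, fmt.Errorf("unexpected ip_local_port_range format: %q", s)
	}
	if lo, err = strconv.Atoi(f[0]); err != nil {
		return 0, 0, err
	}
	if hi, err = strconv.Atoi(f[1]); err != nil {
		return 0, 0, err
	}
	if lo > hi {
		return 0, 0, fmt.Errorf("invalid range %d-%d", lo, hi)
	}
	return lo, hi, nil
}

func main() {
	// Linux-only paths; elsewhere ReadFile just returns an error.
	b, err := os.ReadFile("/proc/sys/net/ipv4/ip_local_port_range")
	if err != nil {
		fmt.Println("cannot read port range:", err)
		return
	}
	lo, hi, err := parsePortRange(string(b))
	if err != nil {
		fmt.Println(err)
		return
	}
	// hi-lo is the quick estimate used in the thread; the inclusive count is one more.
	fmt.Printf("ephemeral ports %d-%d: ~%d\n", lo, hi, hi-lo)
	if s, err := os.ReadFile("/proc/sys/net/core/somaxconn"); err == nil {
		fmt.Println("somaxconn:", strings.TrimSpace(string(s)))
	}
}
```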
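chrisdefourire's advice to lower MaxActive amounts to capping how many Redis calls can be in flight at once. With redigo that cap is configured on `redis.Pool` (its `MaxActive` field, plus `Wait: true` so callers queue instead of failing). The stdlib-only sketch below illustrates the same idea as a counting semaphore built from a buffered channel; it is not redigo's implementation, and `limiter`/`peakConcurrency` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limiter is a counting semaphore: at most cap(l) callers can be inside Do
// at once, and the rest block until a slot frees up.
type limiter chan struct{}

// Do runs fn while holding one slot.
func (l limiter) Do(fn func()) {
	l <- struct{}{}        // acquire; blocks while all slots are taken
	defer func() { <-l }() // release, even if fn panics
	fn()
}

// peakConcurrency pushes n simulated requests through a limiter with the
// given number of slots and returns the most that ever ran simultaneously.
func peakConcurrency(n, slots int) int64 {
	lim := make(limiter, slots)
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			lim.Do(func() {
				cur := atomic.AddInt64(&inFlight, 1)
				for { // record the maximum with a compare-and-swap loop
					old := atomic.LoadInt64(&peak)
					if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
						break
					}
				}
				time.Sleep(time.Millisecond) // pretend to talk to Redis
				atomic.AddInt64(&inFlight, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// 2000 concurrent "requests", but never more than 500 backend calls at once.
	fmt.Println(peakConcurrency(2000, 500))
}
```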
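dgryski's file-descriptor theory can also be checked from inside the Go process instead of via /proc/ID/limits. This sketch uses `syscall.Getrlimit` and `syscall.Setrlimit`, which exist on Linux and other Unix systems but not on Windows; the helper names are hypothetical, and raising the soft limit to the hard limit is assumed to behave as on Linux. Go 1.19 and later already raise the soft RLIMIT_NOFILE to the hard limit at startup, so the raise helper mostly matters for older toolchains like the OP's Go 1.7.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard RLIMIT_NOFILE of this process: the
// "Max open files" row of /proc/<pid>/limits. Every TCP connection, incoming
// or outgoing, consumes one file descriptor.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return rl.Cur, rl.Max, nil
}

// raiseOpenFileLimit lifts the soft limit up to the hard limit, which an
// unprivileged process is allowed to do on Linux.
func raiseOpenFileLimit() error {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return err
	}
	rl.Cur = rl.Max
	return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl)
}

func main() {
	if err := raiseOpenFileLimit(); err != nil {
		fmt.Println("setrlimit:", err)
	}
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
}
```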
