<p>I am about to start building a super basic chat app that will have to handle hundreds of thousands of concurrent connections from day one (this is being built for an existing community). It will use websockets and most likely RethinkDB.</p>
<p>I've only played around with small test websocket apps, and I am fairly comfortable with that code. However, I've never built anything for this kind of traffic before.</p>
<p>What are some things I should be aware of when I scale up my demo app? Any tips? Tools? Will I need multiple servers or one big one?</p>
<p>Thanks!</p>
<hr/>**Comments:**<br/><br/>bbrazil: <pre><p>Hundreds of thousands of simultaneous connections is not trivial; this will likely take some very careful design and architecture to build and scale. Even that number of open TCP connections will likely require system tuning.</p>
<p>What sort of request rate are you expecting? You should run through the numbers to get an idea of what sort of load will be involved. I've a <a href="http://www.robustperception.io/there-are-100000-seconds-in-a-day/">post</a> on estimation that should help with this.</p>
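<p>(As a rough worked example of that kind of estimation, with purely illustrative numbers: 200,000 users each sending 50 messages a day is 10 million messages a day, or about 115 messages per second on average over the ~86,400 seconds in a day. Peaks will typically be several times that, and each message fans out to every recipient in the room, so the outbound write rate can be another order of magnitude higher.)</p>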
<p>One thing that seems to be common across this type of system is that status updates (e.g. online/offline) will be around 90% of your traffic.</p></pre>dermesser: <pre><p>A way to avoid this fanout is an IMO acceptable tradeoff: store the presence status of each client in the database, and have every client poll for it (batched, for all "friends") every 60-120 seconds or on request. This is good enough for most cases and avoids the HUGE fanout caused by realtime updates.</p></pre>VoidByte: <pre><p>Make sure you "dark launch" the app before you switch to it.</p>
<p>Set it up and program your front end to send x% of traffic to both the existing and the new system. Then once you have that you can also start comparing responses to make sure both are doing the right thing. Slowly ramp up the traffic until you have 100% going to the new app. Is it responding quickly? What do the metrics for errors/messages/etc look like?</p>
<p>Go-specific things to worry about:</p>
<ul>
<li>Limit the number of goroutines per chatter. Each goroutine uses only a small amount of memory, but it adds up quickly. Many of our devs have talked about things like reducing the number of goroutines from 6 to 5 per end user and the massive effect it has had on our large-scale apps.</li>
<li>Watch your memory usage
<ul>
<li>Particularly with things like timers and tickers. Don't use time.Tick or time.After! time.Tick leaves a goroutine permanently ticking in the background, and time.After's timer won't get GC'ed until it fires. Instead use time.NewTicker and time.NewTimer, and make sure to add a defer to stop them (see the first sketch after this list).</li>
</ul></li>
<li>Graphite
<ul>
<li>This is more general-purpose application monitoring. Graphite is super useful for keeping an eye on error rates, message rates, etc. It lets you know what is actually going on.</li>
</ul></li>
<li>Hystrix as a bulkhead
<ul>
<li>Hystrix is a cool piece of technology based upon the ideas in "Release It!". Netflix wrote the original Hystrix libraries, and other people then <a href="https://github.com/afex/hystrix-go">ported them to Go</a>. It is a useful pattern where you can bulkhead your queries to other systems, like Postgres. That way if Postgres starts slowing down you can stop the rest of your system from slowing down (see the second sketch after this list).</li>
</ul></li>
<li>Go has built-in support for serving a pprof endpoint you can hit to dump a map of your memory usage. This can be useful for tracking down where some errant logging statement is actually creating a crap ton of garbage (see the last sketch after this list).</li>
</ul>
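<p>A minimal sketch of the ticker/timer advice above; the per-connection loop and channel names here are made up for illustration:</p>
<pre><code>package chat

import "time"

// connLoop is a per-connection loop. NewTicker/NewTimer plus deferred Stop
// release the underlying resources when the connection goes away, which
// time.Tick and time.After do not.
func connLoop(outbound <-chan []byte, done <-chan struct{}) {
	heartbeat := time.NewTicker(30 * time.Second)
	defer heartbeat.Stop()

	idle := time.NewTimer(10 * time.Minute)
	defer idle.Stop()

	for {
		select {
		case <-heartbeat.C:
			// push a presence ping to the client
		case msg := <-outbound:
			_ = msg // write msg to the websocket here
		case <-idle.C:
			return // idle timeout reached
		case <-done:
			return
		}
	}
}
</code></pre>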
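<p>And a sketch of the Hystrix bulkhead idea, assuming the github.com/afex/hystrix-go package; the command name, limits, and query are invented for illustration:</p>
<pre><code>package chat

import (
	"database/sql"

	"github.com/afex/hystrix-go/hystrix"
)

func init() {
	// Cap concurrency and fail fast if Postgres starts slowing down,
	// so slow queries can't drag the whole chat server down with them.
	hystrix.ConfigureCommand("pg_lookup", hystrix.CommandConfig{
		Timeout:               500, // milliseconds
		MaxConcurrentRequests: 50,
		ErrorPercentThreshold: 25,
	})
}

func lookupDisplayName(db *sql.DB, userID string) (name string, err error) {
	err = hystrix.Do("pg_lookup", func() error {
		return db.QueryRow("SELECT name FROM users WHERE id = $1", userID).Scan(&name)
	}, func(error) error {
		name = "unknown" // degraded fallback instead of stalling the caller
		return nil
	})
	return name, err
}
</code></pre>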
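<p>The pprof endpoint mentioned in the last item is just a blank import plus an HTTP listener (the port here is arbitrary):</p>
<pre><code>package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve profiling data on a port that is not publicly exposed.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start the chat server as usual ...
	select {}
}
</code></pre>
<p>Then something like <code>go tool pprof http://localhost:6060/debug/pprof/heap</code> shows where the memory is actually going.</p>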
<p>Hope this helps!</p></pre>Yojihito: <pre><blockquote>
<p>Limit the number of goroutines</p>
</blockquote>
<p>I've always wondered how that could be achieved. At the moment I have a simple</p>
<pre><code>if runtime.NumGoroutine() < 1000 { // runtime.NumGoroutine reports the current goroutine count
    // presumably: only spawn more work while under the cap
}
</code></pre></pre>VoidByte: <pre><p>Sorry I'm not referring to limiting the total number of goroutines. Instead think of the number of goroutines that are required to service each user.</p>
<p>Let's take a hypothetical application that spins up the following goroutines each time a user connects:</p>
<ol>
<li>Receive the data from the user</li>
<li>Send the data to the user</li>
<li>Send data to the backend</li>
<li>Monitor something for the user</li>
<li>Monitor something else for the user</li>
</ol>
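<p>As a sketch of the combination discussed just below (folding monitors 4 and 5 into one goroutine), with made-up channel names, a single select loop can watch both things for a user:</p>
<pre><code>package chat

// monitorUser replaces two per-user monitor goroutines with one:
// a single select loop watching both event sources for this user.
func monitorUser(presence <-chan string, mentions <-chan string, done <-chan struct{}) {
	for {
		select {
		case p := <-presence:
			_ = p // handle a presence change relevant to this user
		case m := <-mentions:
			_ = m // handle a mention/notification for this user
		case <-done:
			return // user disconnected
		}
	}
}
</code></pre>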
<p>So in this hypothetical setup we have five goroutines per user. That's 500k goroutines to serve 100k users. If we could combine goroutines 4 and 5 we could reduce our total goroutine count by 100k.</p></pre>Irooniam: <pre><p>As Jerf already said, you're going to need multiple servers.</p>
<p>Since you're talking about chat and 100k+ connections, you will probably want to implement some form of brokerless system, so that a central broker doesn't become the bottleneck at such high message throughput.</p>
<p>I implemented a distributed chat/message bus using zeromq with ROUTER/DEALER socket types. We had an in-memory presence/registry box that basically kept track of who was connected on what socket in what room. Anytime anyone connected, the registry server would broadcast the relevant info (which server, socket id, user id, room, etc.) to all the websocket servers. The websocket servers would then talk to each other directly over zeromq (peer-to-peer), so we didn't have to worry about a broker becoming the bottleneck.</p>
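<p>A rough sketch of that peer-to-peer layout, assuming the github.com/pebbe/zmq4 binding; the addresses, identities, and frame layout here are made up:</p>
<pre><code>package chat

import zmq "github.com/pebbe/zmq4"

// Each websocket server binds a ROUTER socket for inbound peer traffic and
// dials a DEALER to each peer it learns about from the registry, so messages
// flow server-to-server with no central broker in the path.
func peerSockets() error {
	router, err := zmq.NewSocket(zmq.ROUTER)
	if err != nil {
		return err
	}
	if err := router.Bind("tcp://*:7001"); err != nil {
		return err
	}

	peer, err := zmq.NewSocket(zmq.DEALER)
	if err != nil {
		return err
	}
	peer.SetIdentity("ws-server-2")
	if err := peer.Connect("tcp://ws-server-1:7001"); err != nil {
		return err
	}

	// Forward a chat message to the peer that owns the recipient's socket.
	if _, err := peer.SendMessage("room:42", "user:99", "hello"); err != nil {
		return err
	}

	// On the receiving side the ROUTER sees [peer identity, payload frames...].
	go func() {
		for {
			frames, err := router.RecvMessage(0)
			if err != nil {
				return
			}
			_ = frames // deliver frames[1:] to locally connected websockets
		}
	}()
	return nil
}
</code></pre>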
<p>I would highly recommend you read the zmq whitepaper on broker vs. brokerless systems:
<a href="http://zeromq.org/whitepapers:brokerless" rel="nofollow">http://zeromq.org/whitepapers:brokerless</a></p></pre>dermesser: <pre><p>+1 for using zeromq. Makes many things much easier. Plus it's an awesome piece of software.</p></pre>calebdoxsey: <pre><p>If you're going to use RethinkDB it will be doing most of the work for you. Each go server can be stateless.</p>
<p>You can round-robin DNS the servers, or have a front-end that redirects (like <a href="http://www.example.com" rel="nofollow">www.example.com</a> -> chat-001.example.com). If you build the front-end so that it can auto-reconnect, you won't need to do much for failure handling; just kick bad servers out of the pool.</p>
<p>So you will have:</p>
<ul>
<li>one or more frontend (traditional HTTP) servers serving your normal application</li>
<li>a bunch of chat HTTP servers which basically proxy requests to RethinkDB (see the sketch after this list)</li>
<li>a bunch of RethinkDB servers</li>
</ul>
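<p>A hedged sketch of what "proxy requests to RethinkDB" can look like using changefeeds, assuming the rethinkdb-go driver; the table and field names are invented:</p>
<pre><code>package chat

import (
	r "gopkg.in/rethinkdb/rethinkdb-go.v6"
)

// streamRoom tails new messages for one room and hands each one to send,
// which would write it out over the user's websocket. The Go layer stays
// stateless; RethinkDB's changefeed does the heavy lifting.
func streamRoom(session *r.Session, room string, send func(map[string]interface{})) error {
	cursor, err := r.Table("messages").
		Filter(r.Row.Field("room").Eq(room)).
		Changes().
		Run(session)
	if err != nil {
		return err
	}
	defer cursor.Close()

	var change map[string]interface{}
	for cursor.Next(&change) {
		if doc, ok := change["new_val"].(map[string]interface{}); ok {
			send(doc)
		}
	}
	return cursor.Err()
}
</code></pre>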
<p>You could save money by rolling your own, but obviously that's a lot harder. Redis might be an alternative to RethinkDB worth investigating. (BLPOP and BRPOP with a bunch of lists)</p></pre>jerf: <pre><ol>
<li>You will need multiple servers.</li>
<li>At your level of experience, you don't want to write that yourself, so you're going to want to grab something that exists.</li>
<li>The sort of software you're looking for is called a "message queue". I am not an expert in this field, but big names I see referenced a lot are RabbitMQ, ZeroMQ, and ActiveMQ. (You may notice a theme in the naming.) You may even be able to find one where you can hook up a websocket directly to the queuing server, removing a crap-load of code from your side; see the sketch after this list. Check permissions in that case (make sure getting other people's messages isn't as easy as just claiming to be the other person without authentication).</li>
</ol>
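<p>To make point 3 a bit more concrete, here is a hedged sketch of broker-backed room fan-out using RabbitMQ via the github.com/streadway/amqp client; the exchange naming scheme is invented, and the Go server would sit between the websocket and calls like these:</p>
<pre><code>package chat

import "github.com/streadway/amqp"

// publishToRoom pushes one chat message onto a fanout exchange for the room;
// every chat server consuming that exchange delivers it to its local sockets.
func publishToRoom(ch *amqp.Channel, room string, body []byte) error {
	if err := ch.ExchangeDeclare("room."+room, "fanout", true, false, false, false, nil); err != nil {
		return err
	}
	return ch.Publish("room."+room, "", false, false, amqp.Publishing{
		ContentType: "application/json",
		Body:        body,
	})
}

// consumeRoom binds a private, server-named queue to the room's exchange and
// returns the delivery channel this chat server should drain.
func consumeRoom(ch *amqp.Channel, room string) (<-chan amqp.Delivery, error) {
	q, err := ch.QueueDeclare("", false, true, true, false, nil)
	if err != nil {
		return nil, err
	}
	if err := ch.QueueBind(q.Name, "", "room."+room, false, nil); err != nil {
		return nil, err
	}
	return ch.Consume(q.Name, "", true, true, false, false, nil)
}
</code></pre>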
<p>It is likely the message queue is 75% of the work here.</p>
<p>You may still find you need a Go server for business logic, sitting between the users and the message queues. It is well suited for this task and should scale quite well, if all you have are many thousands of goroutines hooked up to various queues.</p>
<p>You may also want to look around and see if there are open source chat servers that will meet your needs. An off-the-shelf scalable XMPP server may offer websockets to you directly, and even perhaps end-user libraries to support what you want to do.</p>
<p>Basically, in your position, if you find yourself writing code to marshal a data structure from one server to another, you've gone wrong somewhere. :) (One <em>user</em> to another, sure, maybe, but you shouldn't be writing TCP socket transport code yourself.)</p>
<p>The other thing that you should do is work out some profile of how users will use this system, and write a Go scaling tester that logs in as many thousands of users. You can at least get some idea of how well you've scaled; if you can simulate twice your user load, all of them acting twice as fast as normal users do, you're off to a good start. (Do leave yourself a buffer like that!)</p></pre>bcoop713: <pre><p>Could you be more specific on point 2? Which part won't I want to write myself? Any examples of something I could grab?</p>
<p>Thanks!</p></pre>jerf: <pre><p>The multiserver message processor. Grab one of the pieces of software mentioned and build on that.</p></pre>dermesser: <pre><p>At his scale, a single message broker may not do the job. And ZeroMQ is not a message broker (though it may help in implementing the system).</p></pre>jerf: <pre><p>I believe I was clear about needing multiple servers? And it is difficult to recommend a specific piece of software without a lot, <em>lot</em> more details.</p>
<p>I have to admit, whoever saw fit to downvote the only reply in this reddit post that <em>actually answers</em>, even if only partially, the question of how to implement this service, probably written by the only person in this discussion to have actually <em>implemented</em> a similar service based on my reading of everybody else's post, should probably hope they are never in this situation themselves.</p></pre>dermesser: <pre><p>Your reaction to being corrected on two small facts of your whole reply is highly unprofessional. </p></pre>