Scalable web application with Google Go the right way

polaris · · 535 views
This resource was shared some time ago; the information in it may have since evolved or changed.
<p>Hi guys, a few colleagues and I are planning a startup and we want to create a decent online marketplace. I heard a lot of good stuff about Google Go recently and started learning the language. Now I know how to create my server, but what would you suggest for the database and the client side? I thought about MongoDB and the JavaScript framework AngularJS for the client side, but I would like to hear what experiences and suggestions you have. Unfortunately there are not many books out there about the topic.</p> <hr/>**Comments:**<br/><br/>Emperor_Earth: <pre><p>Keeping this short and simple </p> <p>Follow the lead of big companies dealing with big data </p> <p>You can watch YouTube talks from Google, FB, Twitter, etc. </p> <p>They all use MySQL as their primary datastore </p> <p>Google (YouTube and spreading) uses Vitess to autoshard/scale MySQL without a caching db<br/> FB uses Memcached in front of MySQL, and they&#39;re rolling out MyRocks, a custom MySQL storage engine based on their own RocksDB, which is in turn based on Google&#39;s LevelDB </p> <p>On the frontend, FB does a lot more webapps and is the creator of React, React Native, Relay, and GraphQL </p> <p>Two things to consider when dealing with scale are: system scale and engineering scale </p> <ul> <li>For the former, you try to scale with multi-core distributed computing. Golang is great here. MongoDB is bad (google: <code>mongodb performance cliff</code>) but getting better. AngularJS is neutral.<br/></li> <li>For the latter, you want to look at how much value each extra person will bring in. For a large product/service with thousands of engineers, it&#39;s very hard for each additional engineer to be able to contribute. A lot of people have trouble scaling from a 6-10 person team to even a 20-person team because they don&#39;t account for engineering scale. Readability of code, composability, encapsulation/loose coupling with tight cohesion, less magic... these are all hallmarks of code that scales well with larger teams. 
Golang is great for this: it&#39;s simple and composable. MongoDB is terrible as a general data store: a schemaless document store is akin to dynamic/loose typing, though it is tolerable as a blob store. AngularJS 1.x is bad: it&#39;s complicated and full of black-box magic. I haven&#39;t looked at 2.x, which isn&#39;t out yet. ReactJS is the Golang equivalent on the front-end. If you want something more mature that still omits the magic, look at EmberJS (Twitch uses this).<br/></li> </ul> <p>Either way, looking at the story of each technology&#39;s creation is instructive: </p> <ul> <li>Golang started because Google was dissatisfied with how much their teams were fighting C++, specifically on compilation times and coding complexity.<br/></li> <li>AngularJS started at Google as a faster form-prototyping DSL. It gained traction internally so they kept adding features. It&#39;s a project that outgrew its foundation due to catering to a more diverse audience than originally intended. Misko has the unenviable task of starting over with 2.x<br/></li> <li>ReactJS started because FB was having trouble with engineering scale. One example was a very complex ads product that was starting to gain unhealthy bus factors because no one wanted to deal with certain sections of code. It gained traction after being ported to FB Chat and fixing their recurring &#34;new message&#34; notification bug for good.<br/></li> <li>Relay/GraphQL started because the data management story for their newsfeed was terrible, especially with their fast 2-week iteration cycle coupled with their 2-year support pledge. 
With only 5 consumers (they have more) of their REST API, they&#39;re supporting potentially 270 variations of each endpoint.</li> </ul></pre>metamatic: <pre><p>These days <a href="https://nghenglim.github.io/PostgreSQL-9.5.0-vs-MariaDB-10.1.11-vs-MySQL-5.7.0-year-2016/?time=1" rel="nofollow">PostgreSQL outperforms MySQL</a> a lot of the time, and has the advantage of being <a href="https://en.wikipedia.org/wiki/ACID" rel="nofollow">an actual ACID database</a>. A lot of the time people just pick MySQL because it&#39;s what they know, but the community is <a href="http://www.techrepublic.com/article/postgres-pushes-past-mysql-in-developer-hearts/" rel="nofollow">starting to catch on</a>.</p> <p>Also, actual MySQL is a bit of a dead end at this point thanks to Oracle; if you go that route you should be looking at MariaDB as your MySQL-compatible option.</p></pre>Emperor_Earth: <pre><p>When I talk about MySQL, I&#39;m talking about both MySQL and MariaDB. Thanks for helping clarify that for the newer readers. </p> <p>PostgreSQL definitely follows the SQL standard more completely and has ACID compliance. That said, you lose half of that when you begin sharding. For example, MySQL and PostgreSQL don&#39;t have good/any support for intershard joins. (Vitess is changing that.) We&#39;ll also have to disagree w.r.t. the performance/scalability side. Your metric is using MySQL 5.7&#39;s InnoDB engine, single-core, non-sharded, non-distributed. </p> <p>These details are highly significant because: </p> <ul> <li>MySQL has a much better distributed machine scaling story, largely as a result of the big 4 using it. For example, PostgreSQL doesn&#39;t have true MVCC support in replicas, due to their WAL design choice. PostgreSQL is tolerable for single data center replication, but requires significantly more bandwidth than MySQL for multi-center replication. 
In fact, Uber just <a href="https://eng.uber.com/mysql-migration/" rel="nofollow">made the switch</a> from PostgreSQL to MySQL for better scaling: they wanted to fix a PostgreSQL write amplification problem. <del>That said, they made the classic move of fixing the symptom, not the problem.</del> Actually, I was conflating <a href="https://medium.com/@buckhx/unwinding-uber-s-most-efficient-service-406413c5871d#.p9kb1nqnw" rel="nofollow">articles</a>.</li> <li>InnoDB uses B-trees, whereas <a href="https://github.com/facebook/mysql-5.6/wiki/Getting-Started-with-MyRocks" rel="nofollow">MyRocks</a>, the separate engine I referred to, uses <a href="http://www-users.cs.umn.edu/%7Ehe/diff/p256-severance.pdf" rel="nofollow">LSM</a>. Talking about performance is always a tricky business, because the reality is you&#39;re talking about trade-offs between writes, reads, and space. There&#39;s a recent Harvard study on this, <a href="http://stratos.seas.harvard.edu/files/stratos/files/rum.pdf" rel="nofollow">The RUM Conjecture</a>, if you&#39;re interested. Benchmarks are always tricky, since it&#39;s easy to manipulate the environment to show one participant in a better light (keeping the db size small relative to RAM, for example, favors B-tree databases), but you can take a look at <a href="http://smalldatum.blogspot.com/2016/01/myrocks-vs-innodb-with-linkbench-over-7.html" rel="nofollow">Mark Callaghan&#39;s work</a>.<br/></li> </ul> <p>He also talks about RocksDB/MyRocks in his <a href="https://www.youtube.com/watch?v=s_MCe1noDz0" rel="nofollow">keynote</a> at Percona Live 2016. For reference, he&#39;s the head of database infrastructure at FB. This <a href="https://jira.mariadb.org/browse/MDEV-9658" rel="nofollow">issue</a> should be worth your while if you&#39;re looking at MariaDB/MyRocks. </p> <p>I actually most like the <a href="https://www.youtube.com/watch?v=5yDO-tmIoXY" rel="nofollow">Vitess</a> story that Google has employed at YouTube since 2011. 
I&#39;ve been talking with one of their developers about adding support for the MyRocks engine, but that will take some time. Internally, they use InnoDB. With their setup, you connect to a vtgate that acts as a reverse-proxy load-balancer, with Kubernetes behind it managing your shards. While this creates more ops/dbm work, as a developer you can query a multi-sharded database as if it&#39;s just one database. So much of the data access/management code that would otherwise pollute your business logic is abstracted out by Vitess. I&#39;d really recommend taking a look at it! </p> <p>Fortunately, changing database engines is trivial pre-beta. This is personally significant since I&#39;m also actually building a startup chasing hockey-stick growth. So, we&#39;re starting with a single MariaDB instance on InnoDB while prototyping. We&#39;re converting to MyRocks as soon as the engine is ported to MariaDB and then setting up a Vitess cluster when we need to scale out.</p></pre>metamatic: <pre><blockquote> <p>Your metric is using MySQL 5.7&#39;s InnoDB engine, single-core, non-sharded, non-distributed.</p> </blockquote> <p>InnoDB: That&#39;s so that we&#39;re comparing like with like.</p> <p>single-core: One of the things MySQL is often criticized for is making poor use of multiple cores, which is why people go for sharding. By looking at single core benchmarks you can cut that out of the equation.</p> <p>non-distributed: Well, you can scale anything if you throw enough instances at it.</p> <p>But I&#39;d be interested to see some benchmarks for comparable sharded distributed setups, if you&#39;ve got &#39;em.</p> <blockquote> <p>Uber just made the switch from PostgreSQL to MySQL for better scaling</p> </blockquote> <p>Yes, but Uber&#39;s move back to MySQL from PostgreSQL was news because most stories are about going in the other direction -- including their own story a couple of years previously. 
Since the Uber article, some <a href="http://blog.2ndquadrant.com/thoughts-on-ubers-list-of-postgres-limitations/" rel="nofollow">have argued that they were mistaken on many of their reasons</a> for the re-migration. Also, they <a href="https://news.ycombinator.com/item?id=12217179" rel="nofollow">still use postgres</a>, just not for that particular table.</p> <p>Anyway, I would say that PostgreSQL is a better <em>general purpose relational</em> database. Once you get into massively scaling particular narrow problems, you&#39;re generally going to need to evaluate needs very carefully, and there won&#39;t be one easy answer. You might even end up with one of the big commercial databases.</p> <blockquote> <p>Fortunately, changing database engines is trivial pre-beta.</p> </blockquote> <p>My biggest problem with MySQL is the laxness of error checking and lack of standards compliance, making migration away from it less trivial than it should be.</p></pre>darkmagician2: <pre><p>Check out Polymer for the frontend instead of Angular, Web Components are the future :) </p></pre>alok4: <pre><p>But it&#39;s still not in production at any major players. And Google is good at killing projects, so you really don&#39;t know when they will decide to kill that project for something else. </p></pre>m3wm3wm3wm: <pre><p>The new version of YouTube is built on Polymer.</p></pre>darkmagician2: <pre><p>Web components are used by a huge number of major companies: <a href="https://github.com/Polymer/polymer/wiki/Who&#39;s-using-Polymer%3F" rel="nofollow">https://github.com/Polymer/polymer/wiki/Who&#39;s-using-Polymer%3F</a></p> <p>Also Polymer won&#39;t get canceled unless Chrome does, Polymer is part of the Chrome team</p></pre>zettamaster: <pre><p>It&#39;s not better, it&#39;s simply different</p></pre>dahlma: <pre><p><a href="/u/Emperor_Earth" rel="nofollow">/u/Emperor_Earth</a> has great advice, to add to that - I would not use MongoDB. 
MySQL is plenty fast if you use it the correct way.</p> <p>I write applications that receive tons of traffic; the newest one averages around 43k req/minute, spiking daily to around 120k req/minute, with no significant difference in memory or CPU usage. I use MySQL for the backend plus some Redis, and have recently been testing Aerospike to see how it performs, but MySQL and Redis have been the backbone.</p> <p>A few tips:</p> <ul> <li>Your data should be accessed through the following steps if possible: <ul> <li>Is it in the application cache? If not then...</li> <li>Is it in Redis? If not then...</li> <li>Is it in MySQL?</li> </ul></li> <li>Then you want to write workers that can check the caches to see if any updates have been made in the database. You don&#39;t want it constantly recaching the same contents over and over again if you don&#39;t have to. A single call to Redis is much quicker than a single call to MySQL, and something like an online marketplace will have some pretty complex queries to handle.</li> <li>I don&#39;t know what the community&#39;s opinion is on gorm: <a href="https://github.com/jinzhu/gorm" rel="nofollow">https://github.com/jinzhu/gorm</a> but it&#39;s by far one of my favorite things about golang. Its auto-migrations are great, and the fact that I can mark up how indexes should be structured from within my code makes managing the database very easy, especially when pushing to production - no more forgetting to make SQL schema updates.</li> </ul> <p>For the client-side, AngularJS is a great option, it&#39;s easy to pick up and run with.</p></pre>MALE_SHOEGAZE: <pre><blockquote> <p>especially when pushing to production - no more forgetting to make SQL schema updates.</p> </blockquote> <p>what on earth are you doing that this is an issue??</p></pre>dahlma: <pre><p>Sorry not sure I understand? 
If I have to add new tables or fields, using auto migrations and having that checked when the application starts ensures that the schema is set up the same way as it was on dev, and it all happens automatically.</p> <p>Also when I add a value to a struct, the field is automatically added to the database while on dev, saving me some steps.</p></pre>thewhitetulip: <pre><p>Are you sure using a wrapper like gorm is a good thing? I mean I had done a bit of ORM in Django, and there was no way to get the &#34;current user&#34; for the request to fetch the particular items from the db, without writing some serious stuff.</p> <p>Just curious how to handle db in a scalable way. ORM or no ORM?</p></pre>RalphCorderoy: <pre><p>Small point, it&#39;s not &#34;Google Go&#34;, it&#39;s just &#34;Go&#34;, &#34;the Go programming language&#34;, or &#34;golang&#34; when you&#39;re searching, tagging, etc. Perhaps this should be in the Go FAQ.</p></pre>thewhitetulip: <pre><p>not golang :-) people get mad at you when you call it golang.</p></pre>weberc2: <pre><p>Go is excellent on the backend, it&#39;s dead simple to get up and running, the deployment story is best-in-class, great tooling, strong ecosystem, etc.</p> <p>Our organization started with Mongo and Python + CoffeeScript. We ditched Mongo because it turns out that the unstructured approach doesn&#39;t scale well as your application becomes more complex and your engineering organization grows beyond a small team. Python and CoffeeScript scale a little better, but the dynamic typing poses the same problem as NoSQL--the lack of structure makes it hard to communicate as your team and application complexity grow (never mind the classes of errors that static typing entirely precludes). Basically by forgoing static typing, you lose a lot of tooling functionality--goto definition, intelligent autocomplete, all sorts of static analysis, etc.</p> <p>I really recommend you use a SQL database or at least plan to migrate early. 
Similarly, just use static languages out of the gate.</p></pre>rap3: <pre><p>Thanks for the loads of information you provided me :) I found Memcached really exciting. I think I&#39;ll write a small prototype using it with MySQL and golang to get a feeling for it. Some of you talked about scaling the system using a cluster of servers rather than just a single server instance. Our team currently works with AppEngine to get an MVP done fast and easy. If I understood the concept of AppEngine or other hosting providers like AWS or Heroku correctly, they take care of load balancing and fail-safety rather than letting the developer deal with it. Does the same apply to Memcached solutions, or am I missing some basic concept behind this?</p></pre>
