A newbie here, just have some questions (Web services related)

agolangf · 528 views
<p>I&#39;m having trouble wrapping my head around Go, as I mainly come from a PHP background (think RESTful web services).</p> <p>I am scaling a database that already holds a lot of records, and I have a cron job that runs at the same time every day. It takes an hour to complete in PHP. The cron job updates and adds records automatically.</p> <p>Perhaps I&#39;m not understanding how Go works in comparison to PHP. My real concern is whether it is even possible to rewrite this cron job (executed on an Ubuntu server) in Go and see major performance improvements. I can tell that as the database grows, the PHP cron job will not suffice, and Go seems like a great alternative to... PHP or, god forbid, Java.</p> <p>Basically I want to use Go in a scripting type of way to execute one (relatively small) piece of logic (that takes a long, unscalable amount of time in PHP).</p> <p>I appreciate any help/information on this! Thanks :)</p> <hr/> <p>Other q&#39;s:</p> <ul> <li>Can I just call one piece of Go code like PHP?</li> <li>Is Go good at fetching webpages and parsing them (e.g. 
XML)?</li> <li>Is it possible to set up a series of Go scripts to work in a similar way to how PHP works to create APIs (CRUD a database and spit out JSON, for example)?</li> </ul> <p>I could be rewriting my APIs in Go if this is plausible.</p> <hr/>**Comments:**<br/><br/>mc_hammerd: <pre><p>yea, yea, yea, and yea</p> <p>the biggest speed gain would probably be making 8 goroutines and doing the SQL updates concurrently; I would test first and see if it actually gives you a speed increase.</p> <pre><code>arrs := split_updates_into_8_pieces()
var wg sync.WaitGroup // needs import "sync"
for i := 0; i &lt; 8; i++ {
    wg.Add(1)
    go func(i int) { defer wg.Done(); doUpdates(arrs[i]) }(i)
}
wg.Wait() // wait for all 8 goroutines before the program exits
</code></pre></pre>gergo254: <pre><p>Yeah, concurrent updates would be good if they&#39;re possible (we don&#39;t know what the use case is).</p> <p>In this case the SQL update is the slowest part, not the PHP code.</p></pre>H_o: <pre><p>In short, I have 3000 records that contain URLs.</p> <p>I pull each of these URLs and update my records (300,000+) based on the results (MySQL database).</p> <p>Perhaps the slow point is the loading of each of the web pages as CPU load seems to remain relatively low throughout the operation. I just want to investigate areas of potential improvement.</p> <p>In the future I want to do more text-based analysis, and Go seems to be a better option than PHP (e.g. processing text to categorize items). 
Thanks!</p> <p>edit: this 3000 number could grow to 10,000+, meaning more URLs to be retrieved.</p></pre>tmornini: <pre><blockquote> <p>Perhaps the slow point is the loading of each of the web pages as CPU load seems to remain relatively low throughout the operation</p> </blockquote> <p>This is <em>very likely</em> the issue.</p> <p>Go, used properly, can effectively fetch all of the URLs <em>in parallel</em>.</p> <p>This will use a fair, but likely manageable, amount of memory, spike the CPU usage, and very likely spike the CPU usage on the DB server as well.</p> <p>If the program and DB server run on the same server, you&#39;ll likely max it out, but the job will get done quickly.</p> <p>And if the load is too high, i.e. the front-end becomes sluggish and/or unreachable, you can make it fetch in smaller batches.</p></pre>H_o: <pre><p>Many thanks for this.</p> <p>Currently, yes, the DB and web server run on the same box, and it&#39;s quite a small one at that.</p> <p>I guess what I&#39;ve learned from this is that the current process is actually suitable for now.</p> <p>Perhaps Go is not necessary, considering the CPU load isn&#39;t the issue.</p> <p>I could probably implement a rolling process (in Go - I don&#39;t like the idea of a PHP script running indefinitely) that fetches a record every 30 seconds and runs continuously, or something along those lines.</p> <p>Or perhaps a batch run every 2 hours or so, mainly because at e.g. 30-second intervals, with 3000 records, it would take approx. 25 hours for each one to be covered. 500 records every 3 hours would make sense at the moment, which could be increased to 1000 when needed.</p> <p>I&#39;m completely bootstrapping this, so I will likely wait until necessary to fire up another box, which will be used for this processing. It seems I won&#39;t need to do that for some time, though.</p> <p>Sorry for rambling, just getting my thoughts out at the same time. 
Never had to deal with scalability issues before.</p> <p>Really appreciate all your help!</p></pre>tmornini: <pre><p>Consider this approach:</p> <p>1) a driver program runs regularly and writes a JSON object for each URL fetch into a notification topic (e.g. AWS SNS), urls_to_update.</p> <p>2) a queue (e.g. AWS SQS), urls_to_fetch, subscribes to urls_to_update</p> <p>3) one (or more!) workers pull from urls_to_fetch, perform a single HTTP request, then write a record to another SNS topic (urls_fetched) with the details</p> <p>4) a queue, urls_to_update_db, subscribes to urls_fetched</p> <p>5) one (or more!) workers pull from urls_to_update_db and update the DB</p> <p>This is an event-driven micro-services architecture.</p> <p>It will allow you to add new tasks required for urls_to_update and urls_fetched events without changing existing code.</p> <p>It will allow you to scale out URL fetching and DB updating independently and, should it ever be required, to other machines as well.</p> <p>It will be a breeze to run and maintain. While it is, in a strict sense, less efficient (in CPU and particularly in RAM) than what you&#39;ve built, it will run quickly and put your CPU cycles to use.</p></pre>H_o: <pre><p>Huh, interesting... I could essentially split the initial result set that requires updating into a number of parts and have them all updated concurrently. That&#39;s very interesting indeed! Thank you</p></pre>gngeorgiev: <pre><p>It&#39;s all great but:</p> <ul> <li>Go is not a scripting language; you will need to compile it into an executable and run that on your server, which in any case is better than PHP</li> <li>You will surely gain improvements as long as your database can handle it. I guess at some point it will be your bottleneck</li> <li>The answers to your other Qs are all <em>yes</em>, although I do not understand the first one, but I think I answered it with my first point.</li> </ul></pre>H_o: <pre><p>Yes, the first point did answer that, thanks! 
I was aware of the compilation step, so I just wanted to make sure this was how you could do it.</p> <p>Appreciate it!</p></pre>aerth0x1A4: <pre><p>You can also run something like <code>go run main.go</code>, which compiles and runs the program in one step, instead of building a binary first.</p></pre>
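<p>To make the suggestions in the thread concrete (fetch the URLs concurrently, but in a bounded way so a small box is not overwhelmed), here is a minimal sketch. It is not code from any commenter: the names <code>fetchAll</code>, <code>fetchOne</code>, <code>result</code> and <code>newDemoServer</code> are made up for illustration, the worker count of 2 is arbitrary, and a local test server stands in for the real URLs so the example runs without network access. The database update step is left out entirely.</p>

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// result holds the outcome of fetching one URL (hypothetical type).
type result struct {
	URL  string
	Body []byte
	Err  error
}

// fetchOne performs a single GET and reads the whole response body.
func fetchOne(client *http.Client, url string) result {
	resp, err := client.Get(url)
	if err != nil {
		return result{URL: url, Err: err}
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return result{URL: url, Err: fmt.Errorf("unexpected status %s", resp.Status)}
	}
	body, err := io.ReadAll(resp.Body)
	return result{URL: url, Body: body, Err: err}
}

// fetchAll fetches every URL using at most `workers` concurrent requests.
// Results are returned in the same order as the input URLs.
func fetchAll(client *http.Client, urls []string, workers int) []result {
	results := make([]result, len(urls))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker writes only to its own index, so no lock is needed.
			for i := range jobs {
				results[i] = fetchOne(client, urls[i])
			}
		}()
	}
	for i := range urls {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

// newDemoServer starts a local HTTP server that stands in for the real sites.
func newDemoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "page %s", r.URL.Path)
	}))
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	urls := []string{srv.URL + "/a", srv.URL + "/b", srv.URL + "/c"}
	client := &http.Client{Timeout: 10 * time.Second}

	for _, r := range fetchAll(client, urls, 2) {
		if r.Err != nil {
			fmt.Println("error:", r.URL, r.Err)
			continue
		}
		fmt.Printf("%s: %d bytes\n", r.URL, len(r.Body))
	}
}
```

<p>Raising or lowering <code>workers</code> is the knob for the "smaller batches" idea: a higher value finishes sooner but puts more load on the machine and on the remote sites. A program like this can be started from cron either as a compiled binary or with <code>go run main.go</code>.</p>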
