Go web page scraper

xuanbao · 827 views
Hey guys, first I'd like to say thanks to everybody who helped me out late last week. So thanks!

I've got a decent-sized project in mind that I'd like to build with Go, but I'd like to create a really solid base first. For the moment, I'd like to make a web page scraper that grabs specific info from a specific web page. To give this some context, because I'm terrible at describing things: I want to take the item names, rarity, capacity, value, and "how to get" text out of this web page: http://monsterhunter.wikia.com/wiki/MH4U:_Item_List and then put them into a comma-delimited text file.

This is just a personal project so I can gain some experience in the language, and I won't be redistributing this website's info or anything like that.

Do you guys know of any tutorials or articles I should read that would help me create this? All suggestions and tips are greatly appreciated!

I forgot to mention above that I've gone through a couple of tutorials like this one: http://schier.co/blog/2015/04/26/a-simple-web-scraper-in-go.html. The reason I'm posting this question instead of just using Google is that the stuff you guys suggested last time was MUCH better than what I found, and hearing multiple opinions and ideas is extremely helpful!

Thanks again so much!

**Comments:**

gschier2: Hey there. I'm the author of that blog post.

I wrote that post after finishing http://tour.golang.org/ and watching a few Go talks on concurrency. I personally find the best way of learning is to just try things, so I recommend building a base knowledge of Go concurrency patterns (very useful for a web scraper) and, optionally, finding some non-Go-specific posts on web scraper design.

It's a pretty open-ended project and there are a lot of different ways to go about it, so be creative and have fun! Also, screwing up the first five times you write it will teach you a lot more than getting it right the first time by reading someone else's tutorials! :D

I'd be happy to help out as well if you have any questions.

Elegantmetal: OK, great, thanks for the advice! Also, thanks for that tutorial, it seriously helped.

Bromlife: Here's something: http://rockyj.in/2014/12/12/scraping_with_go.html

You might also find this interesting: https://github.com/ernesto-jimenez/scraperboard

Elegantmetal: Those both look really interesting, thanks!

stone_henge: My tip is to scrape from the edit page (http://monsterhunter.wikia.com/wiki/MH4U:_Item_List?action=edit).

Then you can write a simple regular expression to match all the fields, like this one: https://regex101.com/r/jT3hR8/1

everdev: The fastest-running option would be a regex on the HTML response from the server.

The fastest to develop would probably be GoQuery: https://github.com/PuerkitoBio/goquery

There are lots of tutorials on outputting content to a file. With CSV, just be sure to scrub the data for commas before writing to your file.

Simpfally: GoQuery is good, even if it was a bit annoying to have to rely on jQuery's docs to use it.

KimIlYong: There is a blog post that describes in detail how to use GoQuery to crawl posts from Reddit and store them in a database.

http://intogooglego.blogspot.co.at/2015/05/day-7-goquery-html-parsing.html
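
On the CSV tip in the comments: here is a minimal sketch, assuming the standard library's `encoding/csv` package is acceptable for the output step. Its writer quotes any field that contains a comma, double quote, or newline, so the scraped values should not need to be scrubbed by hand. The `Item` struct, the `itemsToCSV` helper, the column names, and the sample rows are all hypothetical placeholders, not data taken from the wiki page.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// Item holds the five fields the original post wants to extract.
// The struct and its field names are illustrative assumptions.
type Item struct {
	Name, Rarity, Capacity, Value, HowToGet string
}

// itemsToCSV renders items as CSV text with a header row.
// encoding/csv quotes fields that contain commas, quotes, or newlines,
// so the values are written as-is without manual scrubbing.
func itemsToCSV(items []Item) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)

	header := []string{"name", "rarity", "capacity", "value", "how_to_get"}
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, it := range items {
		row := []string{it.Name, it.Rarity, it.Capacity, it.Value, it.HowToGet}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}

	// Flush buffered rows, then check for any error seen while writing.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Made-up sample rows standing in for scraped data.
	items := []Item{
		{"Sample Herb", "1", "10", "8z", "Gathering"},
		{"Sample Ore", "4", "99", "120z", "Mining, quest rewards"},
	}

	out, err := itemsToCSV(items)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
	// Output:
	// name,rarity,capacity,value,how_to_get
	// Sample Herb,1,10,8z,Gathering
	// Sample Ore,4,99,120z,"Mining, quest rewards"
}
```

To write to a file instead of a string, the same `csv.NewWriter` call can be given the `*os.File` returned by `os.Create`. Whatever extraction approach is used (regex or GoQuery) only needs to fill in the `Item` values.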
