<p>Hey guys, first I'd like to thank everybody who helped me out late last week!</p>
<p>I've got a decent-sized project in mind that I'd like to build with Go, but I want to create a really solid base first. For now, I'd like to make a web page scraper that grabs specific info out of one specific web page. To give this some context, because I'm terrible at describing things: I want to take the item names, rarity, capacity, value, and "how to get" text out of this web page: <a href="http://monsterhunter.wikia.com/wiki/MH4U:_Item_List">http://monsterhunter.wikia.com/wiki/MH4U:_Item_List</a> and then put them into a comma-delimited text file.</p>
<p>This is just a personal project so I can gain some experience with the language, and I won't be redistributing this website's info or anything like that.</p>
<p>Do you guys know of any tutorials or articles I should read that would help me build this? All suggestions and tips are seriously appreciated!</p>
<p>I forgot to mention above that I've gone through a couple of tutorials like this one: <a href="http://schier.co/blog/2015/04/26/a-simple-web-scraper-in-go.html">http://schier.co/blog/2015/04/26/a-simple-web-scraper-in-go.html</a>. The reason I'm posting this question instead of just searching Google is that the stuff you guys shared was MUCH better than what I found, and hearing multiple opinions/ideas is extremely helpful!</p>
<p>Thanks again so much!</p>
<hr/>**Comments:**<br/><br/>gschier2: <pre><p>Hey there. I'm the author of that blog post.</p>
<p>I wrote that post after finishing <a href="http://tour.golang.org/" rel="nofollow">http://tour.golang.org/</a> and watching a few Go talks on concurrency. I personally find the best way of learning is to just try things, so I recommend building a base knowledge of Go concurrency patterns (very useful for a web scraper) and finding some non-go-specific posts on web scraper design (although not required).</p>
<p>It's a pretty open-ended project and there are a lot of different ways to go about it, so be creative and have fun! Also, screwing up the first five times you write it will teach you a lot more than getting it right the first time by reading someone else's tutorials! :D</p>
<p>I'd be happy to help out as well if you have any questions.</p></pre>Elegantmetal: <pre><p>OK, great, thanks for the advice! Also, thanks for that tutorial; it seriously helped.</p></pre>Bromlife: <pre><p>Here's something: <a href="http://rockyj.in/2014/12/12/scraping_with_go.html" rel="nofollow">http://rockyj.in/2014/12/12/scraping_with_go.html</a></p>
<p>You might also find this interesting: <a href="https://github.com/ernesto-jimenez/scraperboard" rel="nofollow">https://github.com/ernesto-jimenez/scraperboard</a></p></pre>Elegantmetal: <pre><p>Those both look really interesting, thanks!</p></pre>stone_henge: <pre><p>My tip is to scrape from the <a href="http://monsterhunter.wikia.com/wiki/MH4U:_Item_List?action=edit" rel="nofollow">edit page</a>.</p>
<p>Then you can write a simple regular expression to match all the fields, like <a href="https://regex101.com/r/jT3hR8/1" rel="nofollow">this</a>.</p></pre>everdev: <pre><p>Fastest performance would be a regex on the HTML response from the server.</p>
<p>Fastest dev would probably be GoQuery: <a href="https://github.com/PuerkitoBio/goquery" rel="nofollow">https://github.com/PuerkitoBio/goquery</a></p>
<p>There are lots of tutorials on writing content to a file. With CSV, just be sure to scrub the data for commas (or quote those fields) before writing to your file.</p></pre>Simpfally: <pre><p>GoQuery is good, even if it was a bit annoying having to use jQuery's docs to figure out GoQuery.</p></pre>KimIlYong: <pre><p>There is a blog post that describes in detail how to use GoQuery to crawl posts from Reddit and store them in a database.</p>
<p><a href="http://intogooglego.blogspot.co.at/2015/05/day-7-goquery-html-parsing.html" rel="nofollow">http://intogooglego.blogspot.co.at/2015/05/day-7-goquery-html-parsing.html</a></p></pre>
This resource was shared on ; the information in it may have since evolved or changed.