Hey guys, first I'd like to say thanks to everybody who helped me out late last week. So thanks!
I've got a decent-sized project in mind that I'd like to build with Go, but I want to create a really solid base first. For now, I'd like to make a web page scraper that grabs specific info out of one specific web page. To give this some context, because I'm terrible at describing things: I want to take the item names, rarity, capacity, value, and "how to get" text from this web page, http://monsterhunter.wikia.com/wiki/MH4U:_Item_List, and put them into a comma-delimited text file.
This is just a personal project so I can gain some experience with the language, and I won't be redistributing this website's info or anything like that.
Do you guys know of any tutorials or articles I should read that would help me build this? All suggestions and tips are seriously appreciated!
I forgot to mention above that I've gone through a couple of tutorials like this one: http://schier.co/blog/2015/04/26/a-simple-web-scraper-in-go.html. The reason I'm posting this question instead of just searching Google is that the stuff you guys suggested was MUCH better than what I found, and hearing multiple opinions/ideas is extremely helpful!
Thanks again so much!
Comments:
gschier2:
Elegantmetal: Hey there. I'm the author of that blog post.
I wrote that post after finishing http://tour.golang.org/ and watching a few Go talks on concurrency. I personally find the best way to learn is to just try things, so I recommend building a base knowledge of Go concurrency patterns (very useful for a web scraper) and finding some non-Go-specific posts on web scraper design (though that's not required).
It's a pretty open-ended project and there are a lot of different ways to go about it, so be creative and have fun! Also, screwing up the first five times you write it will teach you a lot more than getting it right the first time by reading someone else's tutorials! :D
I'd be happy to help out as well if you have any questions.
Bromlife: OK, great, thanks for the advice! Also, thanks for that tutorial; it seriously helped.
Elegantmetal: Here's something: http://rockyj.in/2014/12/12/scraping_with_go.html
You might also find this interesting: https://github.com/ernesto-jimenez/scraperboard
stone_henge: Those both look really interesting, thanks!
everdev: My tip is to scrape from the edit page.
Then you can write a simple regular expression to match all the fields, like this.
Simpfally: The fastest performance would come from running a regex over the HTML response from the server.
Fastest dev would probably be GoQuery: https://github.com/PuerkitoBio/goquery
There are lots of tutorials on writing content to a file. With CSV, just be sure to scrub the data for commas before writing it to your file.
KimIlYong: GoQuery is good, even if it was a bit annoying to rely on jQuery's docs to learn GoQuery.
There is a blog post that describes in detail how to use GoQuery to crawl posts from Reddit and store them in a database.
http://intogooglego.blogspot.co.at/2015/05/day-7-goquery-html-parsing.html
