<p>Hi,
I have been using Scrapy (Python) to crawl websites. Now I want to try with Go.
Can we crawl websites that require authentication with Go?
What are the advantages of Go compared to Scrapy (scrapy.org)?</p>
<hr/>**Comments:**<br/><br/>ishanjain28: <pre><p>Hi, my friend and I run a bunch of websites that track various stock market websites and then alert users (I wrote the backend for all of them). We were using Scrapy + Flask earlier because I was not very good at Go. Then I learned Go, started using it everywhere, and I really like it now.</p>
<p>The Scrapy scraper took about 7-10 seconds from scraping the website to alerting the user. Then I rewrote the whole thing in Go, and now it takes about 2 seconds to do the same thing.</p>
<p>The code is much cleaner, and I really prefer goquery to Scrapy.</p></pre>Morgahl: <pre><p>The main advantage in this situation will be goroutines, which can drastically speed up the process.</p></pre>bjwschaap: <pre><p>Yes, you can. Logging into a site with plain Go code would be something along the lines of:</p>
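<pre><code>// Imports needed: log, net/http, net/http/cookiejar, net/url, strings.
// username and password are *string values (for example from flag.String).

// Store session cookies after login
jar, err := cookiejar.New(nil)
if err != nil {
	log.Fatal(err)
}
// Set up the HTTP client and give it access to the cookie jar
client := &http.Client{
	Jar: jar,
}
// Create POST data to send to the login form
log.Println("About to log in user", *username)
urlData := url.Values{}
urlData.Set("sForm", "login")
urlData.Add("username", *username)
urlData.Add("password", *password)
// Prepare the HTTP POST request
req, err := http.NewRequest("POST", "http://www.mysite.com/login", strings.NewReader(urlData.Encode()))
if err != nil {
	log.Fatal(err)
}
req.Header.Add("Content-Type", "application/x-www-form-urlencoded")
// Do the HTTP POST to log in
resp, err := client.Do(req)
if err != nil {
	log.Fatal(err)
}
// A nil error only means the request completed. Many sites answer 200 even
// for a failed login, so also check the response the way your site requires.
if resp.StatusCode != http.StatusOK {
	log.Fatalf("login failed: %s", resp.Status)
}
// ..and we're logged in; the session cookie is now in the jar
log.Println(*username, "logged in successfully")
resp.Body.Close()
</code></pre>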
<p>In conjunction with <a href="https://github.com/PuerkitoBio/goquery" rel="nofollow">https://github.com/PuerkitoBio/goquery</a> this is very powerful, especially if you use goroutines as Morgahl said.</p>
<p>I don't know Scrapy, but from a quick look at the Scrapy documentation, the overall experience and usage look quite similar. Using goquery might be somewhat more verbose, but you could easily write convenience functions for that.</p></pre>lasizoillo: <pre><p>Reading the comments, none of them names a real advantage. Many of them talk about the speed of Go without checking that Scrapy's default configuration is set up to be a good citizen. Those are unfair (and wrong) metrics.</p>
<p>Scrapy uses event-driven programming; Go uses an M:N threading model. So Go will probably consume more RAM (2 KB or 4 KB of stack per connection), but you don't have to worry about whether your code blocks. If you are doing complex things in your spider, Go will probably use less CPU. But if your bottleneck is the network, your language does not matter.</p>
<p>Scrapy is a complex and mature framework with a lot of side projects (for example, Frontera). Go is a language without a comparable framework. Maybe the answer is: use Scrapy for complex parsers and Go for toy or specific ones. Maybe something comparable to Scrapy will exist in Go in the future, but not for now.</p>
<p>And no, no and no: Scrapy is not comparable to http + goquery.</p>
<p>PS: I have worked with a modified Scrapy that had prioritized queues and a custom scheduler. Building that in Go would be easier than in Scrapy (programming with Twisted is hard), but I need anti-throttling features, XPath and CSS selectors, ... and many other things that come with Scrapy. Sometimes I dream of a "scrago" ;-)</p></pre>hell_0n_wheel: <pre><blockquote>
<p>Now I want to try with Go</p>
</blockquote>
<p>Then try, and get back to us when you have something to share.</p></pre>