Parallel grep over compressed files: pcgrep

agolangf · 528 views
https://github.com/natefinch/pcgrep

I wrote this in half an hour or so, in response to a friend saying he'd spent "several hours" writing something similar in Python. Then someone told him about GNU parallel, and he figured he'd just do it in bash. But I thought it was an interesting problem, so I coded up a little solution. It's far from complete, but it's functional and pretty quick (both to write and to run).

It's currently parallel per file. I added code to run the regex matching in goroutines, which made searching a single file faster, but the overhead of copying the line data to those goroutines actually slowed it down when num_files >= num_cores. One could add more sophisticated logic to choose how much parallelism to use based on the number of files, but I was about done spending time on it, since it wasn't really filling a need I had.

I figured I'd post this here in case anyone is interested. Feel free to suggest optimizations; I didn't spend much time optimizing beyond trying out the aforementioned matching in goroutines.

**Comments:**

natefinch: Oh, and a key thing I learned: when you call bufio.Scanner.Bytes(), you get a byte slice that is not freshly allocated (it points into the scanner's internal buffer), so if you try to use that slice asynchronously, its contents can be overwritten by the next call to Scan(). Took me a while to figure out why I was getting odd output. Always read the docs, folks!

volker48: That has bitten me as well.

barsonme: Same.

I was building a trie and decided to use the Bytes method instead of Text, and couldn't figure out why nothing was inside my trie.

RTFM is a good idea
