hunkee - a faster way to parse strings into structs

xuanbao · 80 views
This is a shared resource; the information in it may have since evolved or changed.
<p>I&#39;ve found that there seems to be no way to parse raw logs other than regular expressions. I also wondered: we can easily read data from a database into tagged exported fields, so why can&#39;t we do something similar for logs? So I wrote <a href="https://github.com/awskii/hunkee" rel="nofollow">a library</a> for parsing data without regular expressions. To initialize the parser, you provide a format string and a struct with tagged fields; to parse, you pass a log entry line and a struct to fill. It&#39;s 2.5x faster than the equivalent work with regexp.</p> <p>The project is small; issues and PRs are welcome.</p> <hr/>**Comments:**<br/><br/>joushou: <pre><p>This seems like a nice solution to a problem you hit, but I&#39;m going to go on a bit of a rant here:</p> <p>The reason that there &#34;is no other way&#34; to parse human-readable log files besides regular expressions is that human-readable formats are only meant to be read by humans. Parsing them with regexps is just a quick hack.</p> <p>If your goal is to have <em>machines</em> read the log, it should be in a machine-readable format. Preferably binary, but something like newline-separated JSON objects makes it readable for humans too. Your example code shows what seem to be temperature readouts (thermostat info?), which is a very good example of something that was probably meant to be <em>machine</em> readable, not human readable.</p> <ul> <li>If your priority is machines, and no human eyes need bother: Protobuf, msgpack, custom binary formats.</li> <li>If you want both machine and human readable: newline-separated JSON, CSV-style outputs (so you just split on a separator and do simple parsing).</li> <li>If your priority is humans, and no machine will ever see this: regular textual logs.</li> </ul> <p>Your library looks neat for when you&#39;re stuck with someone else giving you textual logs, but if you have the option, <em>don&#39;t parse textual logs</em>.</p></pre>awskii: <pre><p>Thank you for the comment. 
I&#39;m trying to solve exactly that case: the only input you have is a log file with human-readable messages, and there are no other options. </p></pre>Emacs24: <pre><blockquote> <p>And it&#39;s 2.5x faster than equivalent work with regexp.</p> </blockquote> <p>I posted a regexp alternative for log parsing here last summer, <a href="https://github.com/sirkon/ldetool" rel="nofollow">https://github.com/sirkon/ldetool</a> (a special DSL with a code generator that emits Go with all the boilerplate and production-ready checks), and it is not just 2.5 times faster, but tens or even hundreds of times faster for lengthy lines.</p></pre>dgryski: <pre><p>Also <a href="https://medium.com/@dgryski/speeding-up-regexp-matching-with-ragel-4727f1c16027" rel="nofollow">https://medium.com/@dgryski/speeding-up-regexp-matching-with-ragel-4727f1c16027</a></p></pre>
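
For readers wondering what "a format string plus a struct with tagged fields" can look like without regular expressions, here is a minimal Go sketch of the general idea described in the original post. It is not hunkee's actual API: the `log` tag key, the `:name` directive syntax, the `Reading` type, and the `ParseLine` function are all hypothetical names chosen for illustration, and the sketch only handles whitespace-separated tokens.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// Reading is a hypothetical log entry. The `log` tag key is an assumption
// made for this sketch, not the tag name used by hunkee.
type Reading struct {
	Sensor string  `log:"sensor"`
	Temp   float64 `log:"temp"`
	Count  int     `log:"count"`
}

// ParseLine splits line on whitespace and stores each token in the field of
// dst whose `log` tag matches the directive at the same position in format.
// format is a space-separated list of ":name" directives (hypothetical syntax).
func ParseLine(format, line string, dst interface{}) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a non-nil pointer to a struct")
	}
	v = v.Elem()

	// Map tag names to the indexes of settable (exported) fields.
	fields := make(map[string]int)
	for i := 0; i < v.NumField(); i++ {
		name, ok := v.Type().Field(i).Tag.Lookup("log")
		if ok && v.Field(i).CanSet() {
			fields[name] = i
		}
	}

	names := strings.Fields(format)
	tokens := strings.Fields(line)
	if len(names) != len(tokens) {
		return fmt.Errorf("format has %d directives, line has %d tokens", len(names), len(tokens))
	}

	for i, name := range names {
		idx, ok := fields[strings.TrimPrefix(name, ":")]
		if !ok {
			continue // no field carries this tag: skip the token
		}
		f := v.Field(idx)
		switch f.Kind() {
		case reflect.String:
			f.SetString(tokens[i])
		case reflect.Int, reflect.Int64:
			n, err := strconv.ParseInt(tokens[i], 10, 64)
			if err != nil {
				return fmt.Errorf("directive %q: %v", name, err)
			}
			f.SetInt(n)
		case reflect.Float64:
			x, err := strconv.ParseFloat(tokens[i], 64)
			if err != nil {
				return fmt.Errorf("directive %q: %v", name, err)
			}
			f.SetFloat(x)
		default:
			return fmt.Errorf("directive %q: unsupported kind %s", name, f.Kind())
		}
	}
	return nil
}

func main() {
	var r Reading
	if err := ParseLine(":sensor :temp :count", "kitchen 21.5 3", &r); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```

A real parser would also need to handle quoted fields, timestamps, and custom separators, and would probably cache the tag-to-field map per type instead of rebuilding it on every call. Reflection is used here only to keep the sketch short.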