re2dfa - regular expressions into finite state machines

xuanbao · 1026 views
<p><a href="https://github.com/opennota/re2dfa">https://github.com/opennota/re2dfa</a></p>
<p>This command-line tool generates Go source files containing matching functions that accept a <code>string</code> or <code>[]byte</code> and return the length of the match at the beginning of the data.</p>
<p>Usage:</p>
<pre><code>re2dfa ^a+$ main.matchAPlus string </code></pre>
<p>This will generate an FSM for the regular expression <code>^a+$</code> and output a Go source file with the package <code>main</code> and a matching function named <code>matchAPlus</code> that accepts a <code>string</code>.</p>
<ul> <li>All patterns are anchored at the beginning of the data, as if the pattern started with <code>^</code>.</li> <li>Non-greedy matches are not yet supported.</li> <li>Generation can be slow for regular expressions containing broad Unicode ranges, e.g. <code>.*</code> or <code>[^a-z]</code>. I have some ideas about how to optimize this, but right now I am too lazy to implement them.</li> </ul>
<p>Otherwise, if you find a regexp for which an incorrect FSM is generated, please submit an issue.</p>
<hr/>**Comments:**<br/><br/>
dgryski: <pre><p>Also <a href="http://www.colm.net/open-source/ragel/">http://www.colm.net/open-source/ragel/</a></p></pre>
opennota: <pre><p>It has its own syntax that you need to learn if you want to use it.</p></pre>
dgryski: <pre><p>Yes, I spent the time to create a scanner for a toy language so I would have the framework ready for when I needed it for real. <a href="https://github.com/dgryski/dpc/blob/master/lexer.rl" rel="nofollow">https://github.com/dgryski/dpc/blob/master/lexer.rl</a></p></pre>
opennota: <pre><p>I once played with it a bit and found that I&#39;m not talented enough to write parsers in Ragel.
Go is a whole other thing, though.</p></pre>
Qinsd: <pre><p>Perhaps a dumb question, but isn&#39;t this / shouldn&#39;t this be done automatically at compile time for the standard regexp library given a constant string?</p></pre>
opennota: <pre><p>Not exactly. The current implementation compiles (at runtime) the regular expression into a series of instructions and runs them in a sort of virtual machine.</p> <p><a href="https://swtch.com/%7Ersc/regexp/regexp2.html" rel="nofollow">https://swtch.com/~rsc/regexp/regexp2.html</a></p> <p>As a result, the generated FSM can be 10x faster.</p></pre>
fl1pflop: <pre><p>Why is compiling to an FSM not the way the regexp package handles regexes? Is compiling that much more complex to do?</p></pre>
opennota: <pre><p>No idea. I&#39;m not the guy you should ask.</p></pre>
dgryski: <pre><p>Because Go doesn&#39;t have the two-stage compilation that would be required to implement this.</p> <p>However, a regex engine coupled with &#39;go generate&#39; could be used. For example, a tool could extract &#39;go generate regex&#39; comments and expand a template to produce Ragel or re2dfa code that performs the appropriate matching.</p></pre>
sfxpt: <pre><p>Interesting. How am I supposed to use this in a scenario like this: for RegexpA, do ActionA; for RegexpB, do ActionB? Is it possible?</p></pre>
opennota: <pre><p><code>if match1(s) &gt;= 0 { action1() } else if match2(s) &gt;= 0 { action2() }</code> ?</p></pre>
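<p>The if/else chain in the last reply could be generalized into a small dispatch table. The sketch below is only an illustration under stated assumptions: <code>matchAPlus</code> and <code>matchDigits</code> are hand-written stand-ins for functions re2dfa might generate (they are not actual tool output), the <code>rule</code> type and the <code>dispatch</code> helper are hypothetical names, and the code assumes, as the <code>&gt;= 0</code> check in the reply suggests, that a matcher returns a negative value when nothing matches.</p>

```go
package main

import "fmt"

// matchAPlus is a hand-written stand-in for a matcher generated from ^a+$.
// Assumption: it returns the match length, or -1 when there is no match.
func matchAPlus(s string) int {
	if len(s) == 0 {
		return -1
	}
	for i := 0; i < len(s); i++ {
		if s[i] != 'a' {
			return -1
		}
	}
	return len(s)
}

// matchDigits is a hand-written stand-in for a matcher generated from [0-9]+.
// It is anchored at the start only, so it returns the length of the leading
// run of digits, or -1 when the input does not start with a digit.
func matchDigits(s string) int {
	n := 0
	for n < len(s) && s[n] >= '0' && s[n] <= '9' {
		n++
	}
	if n == 0 {
		return -1
	}
	return n
}

// rule pairs a matcher with the name of the action to run (hypothetical type).
type rule struct {
	action string
	match  func(string) int
}

// rules are tried in order; the first matcher that succeeds wins.
var rules = []rule{
	{"letters", matchAPlus},
	{"digits", matchDigits},
}

// dispatch returns the action name for the first matching rule, or "none".
func dispatch(s string) string {
	for _, r := range rules {
		if r.match(s) >= 0 {
			return r.action
		}
	}
	return "none"
}

func main() {
	for _, s := range []string{"aaa", "42x", "xyz"} {
		fmt.Printf("%q -> %s\n", s, dispatch(s))
	}
}
```

<p>Ordering the rules explicitly keeps the first-match-wins behaviour of the original if/else chain, and returning an action name rather than calling a function directly is just a convenience for the example.</p>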
