Is there a library for parsing sections of a file?

blov · · 386 views
<p>Here&#39;s my problem. I am trying to parse text files and extract portions of them. For the sake of argument, we can assume that the text files are small enough to fit in memory (say, 50 kilobytes at most). The files are structured like this:</p> <pre><code>-- section: foo
this is the content for section foo
-- end
-- section: bar
this is the content for section bar
-- end
</code></pre> <p>The sections are <em>not</em> dynamic. That is, the file should always contain the sections &#39;foo&#39; and &#39;bar&#39;, but may or may not contain a section &#39;baz&#39;. The order of the sections may be random.</p> <p>What I would like to get as output is:</p> <p><code>(some parser object).getSection(&#39;foo&#39;) =&gt; []byte</code></p> <p>or something similar.</p> <p>I have already written a function which parses this, but I am not satisfied with how it operates. Basically, I read the file byte by byte and check whether the next n bytes match one of the tokens. If so, I save the start and end positions of the section. I am pretty sure there is a library that does this; I just can&#39;t find it because I can&#39;t seem to name the class of problem I am trying to solve ;)</p> <p>I&#39;ve also thought of reading the file into memory and using a regexp, but that seems like the wrong approach. I have seen that Go has the <code>text/scanner</code> package, but I haven&#39;t been able to determine whether it&#39;s a good approach.</p> <p>Thanks in advance!</p> <hr/>**Comments:**<br/><br/>justinisrael: <pre><p>If this is the complete spec of the file, then it doesn&#39;t seem complex enough to warrant a grammar/parser library. It looks like you can just use a Scanner to scan lines. If a line starts a section, save the name. Then read lines into a body until you hit the end-of-section line. There are only two tokens to look for. 
</p></pre>justinisrael: <pre><p><a href="/u/icholy" rel="nofollow">/u/icholy</a> beat me to it, but here is another version which keeps the results structured, with the ability to preserve section order in addition to looking up a specific section: <del><a href="https://play.golang.org/p/D5GvjR8ksV8" rel="nofollow">https://play.golang.org/p/D5GvjR8ksV8</a></del></p> <p><em>Edit:</em> I liked the approach icholy took with a nested scan for the section body and end, so I cleaned up my version a bit more: <a href="https://play.golang.org/p/IStu-G5CuQd" rel="nofollow">https://play.golang.org/p/IStu-G5CuQd</a></p></pre>icholy: <pre><p><a href="https://play.golang.org/p/rj-4e1hXjyo" rel="nofollow">https://play.golang.org/p/rj-4e1hXjyo</a></p></pre>toudi: <pre><p>Thank you very much for the help!</p></pre>Killing_Spark: <pre><p>If you run into problems with memory, you could change your format slightly by requiring the first lines to be an &#39;index&#39; of the available sections and where they start in the file. Then you don&#39;t need to read the whole file to find the last section.</p></pre>justinisrael: <pre><p>But reading in the whole file already isn&#39;t necessarily required. One could scan lines until reaching the desired section. </p></pre>Killing_Spark: <pre><p>&#39;Last&#39; section. Always assume the worst case that could happen. Also, you could assume that the first section takes up 90% of the file; if you then search for any of the others, you need to scan all of those lines from the first section. I know this is probably not really necessary, as he said the files would probably be in the KiB range, but thinking about scalability is never wrong ;) </p></pre>justinisrael: <pre><p>Or you could parse the section names once and store the offsets in the Parser, on the assumption that they stay valid for the life of the parser. Then you don&#39;t have to change the format and you can still scan the file just once. </p></pre>
