prose: A library for text processing that supports tokenization, part-of-speech tagging, named-entity extraction, and more.

agolangf · 353 views
<p><a href="https://github.com/jdkato/prose">prose is a Golang library</a> designed to aid in a number of tasks related to (English) text processing. Some of its features include:</p>
<ul>
<li>Splitting text on words, sentences, or arbitrary regexps.</li>
<li>Part-of-speech tagging and named-entity extraction.</li>
<li>Intelligently converting strings to title case.</li>
<li>Counting the number of syllables in a word.</li>
<li>Calculating readability metrics such as Flesch–Kincaid, SMOG, and Coleman–Liau.</li>
</ul>
<p>It's still under active development, but its core functionality is in place and fairly well tested.</p>
<p>Looking forward to hearing any feedback or general thoughts.</p>
<hr/>**Comments:**<br/><br/>
tv64738: <pre><p>Nice. As someone who's not actively working on NLP tasks, two notes:</p> <ul> <li><p><code>summarize</code> sounds like a thing that returns the gist of a longer text, like <a href="https://www.reddit.com/user/autotldr/comments/" rel="nofollow">https://www.reddit.com/user/autotldr/comments/</a></p></li> <li><p>It would be nice to see the output of every code sample in the readme; reusing <code>go test</code> <code>Example</code>s would be worthwhile.</p></li> </ul></pre>
jdkato: <pre><p>Thanks for the feedback!</p> <ul> <li>This is actually something I'm planning on adding to the package (my ultimate goal is readability + usage statistics, sentiment analysis, and some form of a TL;DR generator).</li> <li>Good idea.</li> </ul></pre>
leadguit: <pre><p>Sounds interesting. Which languages does it support? Is it English only?</p></pre>
jdkato: <pre><p>Yes, essentially. The <code>PragmaticSegmenter</code> (a sentence splitter) currently supports English, Spanish, and French, but everything else is English-only for now.</p></pre>
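
The feature list above mentions syllable counting and readability metrics such as Flesch–Kincaid. As a rough illustration of what such a metric computes, here is a minimal Go sketch that uses only the standard library. It does not use or reproduce prose's API or implementation: the names `countSyllables` and `fleschKincaidGrade`, the regexp-based word and sentence splitting, and the vowel-group syllable heuristic are all assumptions made for this example. Only the grade-level formula itself (0.39 × words per sentence + 11.8 × syllables per word − 15.59) is the standard one.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Hypothetical, deliberately naive patterns; a real tokenizer
	// handles abbreviations, numbers, and contractions.
	wordRE     = regexp.MustCompile(`[A-Za-z]+`)
	sentenceRE = regexp.MustCompile(`[.!?]+`)
	vowelRunRE = regexp.MustCompile(`[aeiouy]+`)
)

// countSyllables estimates the syllables in a word by counting groups of
// consecutive vowels. This is a heuristic, not a dictionary lookup.
func countSyllables(word string) int {
	w := strings.ToLower(word)
	n := len(vowelRunRE.FindAllString(w, -1))
	// Treat a trailing "e" as silent ("make"), except in "-le" ("table").
	if n > 1 && strings.HasSuffix(w, "e") && !strings.HasSuffix(w, "le") {
		n--
	}
	if n < 1 {
		n = 1
	}
	return n
}

// fleschKincaidGrade returns the Flesch–Kincaid grade level of text,
// or 0 if the text contains no words.
func fleschKincaidGrade(text string) float64 {
	words := wordRE.FindAllString(text, -1)
	if len(words) == 0 {
		return 0
	}
	// Count runs of terminal punctuation as sentence boundaries;
	// assume at least one sentence.
	sentences := len(sentenceRE.FindAllString(text, -1))
	if sentences == 0 {
		sentences = 1
	}
	syllables := 0
	for _, w := range words {
		syllables += countSyllables(w)
	}
	wordsPerSentence := float64(len(words)) / float64(sentences)
	syllablesPerWord := float64(syllables) / float64(len(words))
	return 0.39*wordsPerSentence + 11.8*syllablesPerWord - 15.59
}

func main() {
	text := "Readability metrics estimate how hard a text is to read. Short words help."
	fmt.Printf("Flesch-Kincaid grade: %.2f\n", fleschKincaidGrade(text))
}
```

Because both the sentence splitter and the syllable counter are approximations, the result is only a rough estimate. An abbreviation such as "e.g." would be counted as two sentence boundaries, for instance.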
