What do you recommend for autocomplete?

agolangf · · 915 次点击    
这是一个分享于 的资源,其中的信息可能已经有所发展或是发生改变。
<p>I run jivedata.com and we are moving our ticker search from python to go for performance reasons. What are my options for an autocomplete library? I&#39;ve come across <a href="https://github.com/smartystreets/mafsa">https://github.com/smartystreets/mafsa</a> but not sure about it due to low number of contributors. Essentially a user would type in &#34;ap&#34; to search for &#34;Apple Inc.&#34; and it would return the company&#39;s name, ticker and sec&#39;s cik identifier. (We&#39;d also like for it to support a string &#34;contains&#34; also but that isn&#39;t essential).</p> <hr/>**评论:**<br/><br/>mwholt: <pre><p>Don&#39;t use mafsa - that was an experiment that only kind of worked. It&#39;s a bit clunky and needs more optimizing. (Anyone can feel free to contribute to it.)</p> <p>All you need for a simple, fast, in-memory autocomplete service is a radix tree. Here&#39;s a pretty solid one: <a href="https://github.com/armon/go-radix">https://github.com/armon/go-radix</a></p></pre>gogolang: <pre><p>I also use a radix tree for <a href="http://www.alphahat.com" rel="nofollow">http://www.alphahat.com</a> but I actually wouldn&#39;t recommend it for this use case. Radix tree works fine if you literally just want autocomplete in that you want to take a prefix and find all the strings that have that prefix. </p> <p>From my experience, that&#39;s actually not what you want. You want something that not only starts searching prefix but also searches substrings and misspellings then returns the results sorted by popularity. </p> <p>If you&#39;re ok paying for a solution, then you can use Algolia <a href="https://www.algolia.com/" rel="nofollow">https://www.algolia.com/</a> . It&#39;s basically an API that lets you do what I described above. </p> <p>I&#39;d personally prefer a native solution. Maybe we can collaborate on building one in Go! </p></pre>mwholt: <pre><p>He said all he wanted was to type &#34;ap&#34; and get &#34;Apple, Inc.&#34; - a radix tree does the job and does it well. It&#39;s not hard to implement simple fuzzy matching using edit distances with a tree, too.</p> <p>If you&#39;re doing a substring search, however, you want a different data structure, like a suffix array, or a fully-qualified search engine. Bleve is a good Go library for that.</p></pre>mschoch: <pre><p>Auto-complete is something we&#39;re planning on adding to Bleve. We liked the direction of mafsa, but lack of unicode support was kind of a deal breaker. We&#39;ll probably start with a radix tree and go from there. The hardest part has been finding the right level of integration. Using the search indexes term dictionary is useful for spelling corrections, but less useful for general auto-complete, which is better augmented with data from actual user queries.</p> <p>We&#39;re always seeking contributions to Bleve, so if there is interest in helping to get this done, lets talk.</p></pre>jamra06: <pre><p>This post kind of inspired me to look at this problem. I have this working with runes: <a href="https://github.com/jamra/LevenshteinTrie" rel="nofollow">jamra/levenshteintrie</a> It&#39;s just a POC. No unit tests were done and it&#39;s not working in production.</p> <p>It&#39;s not a mafsa, but a simple trie. Next, I&#39;ll create a mafsa. Where do you discuss features for Bleve? IRC, Google group?</p></pre>gogolang: <pre><p>Yeah, I know what he was asking for and I thought that&#39;s what I needed as well but when examining usage from actual users for the exact same use case on my site, I now realize I need a different solution. Just passing on the benefit of experience. </p></pre>jobenjo: <pre><p>I used elasticsearch. (Connected to the REST API with net/http). Took some setup, but works well.</p></pre>brentadamson: <pre><p>I&#39;ll have to take another look at ES.</p></pre>ecmdome: <pre><p>+1 for ES... Fuzzy search is perfect for what you&#39;re looking for. <a href="https://github.com/mattbaird/elastigo" rel="nofollow">https://github.com/mattbaird/elastigo</a> is a decent library, the documentation on github isn&#39;t up to date but the godoc is good</p></pre>calebdoxsey: <pre><p>How many companies? When I last worked with XBRL filings (which was admittedly a long time ago) there were surprisingly few companies doing it. If you&#39;re in the 1,000s then I&#39;d just do it in javascript (assuming this is a client-side thing).</p> <p>But even if that doesn&#39;t work I don&#39;t think you need to do anything really special for this. If you cache some of your most expensive searches (perhaps all the 1 and 2 letter combos) I think it would probably help.</p> <p>In fact you could do a hybrid solution, with your JS preload the most common really short searches, and then only hit the server for everything else. Typing &#34;A&#34;, then &#34;AA&#34; would then give the appearance of going really fast, and the data is relatively static. (maybe refresh it once a day)</p></pre>brentadamson: <pre><p>I&#39;m actually in the middle of giving up on XBRL...it&#39;s quite a pain if you want to do anything useful with ~18K different elements. There are 500 ways (really!) to calculate Revenue. It becomes even more painful if you want to calculate free cash flow or ebitda, etc. That and a recent bill by Congress to scrap the XBRL requirement for smaller companies made me just throw in the towel. We&#39;re going to be replacing our XBRL in the next couple of weeks with something much, much better....</p> <p>Thanks for the suggestions....I&#39;m gonna be rewriting our API in Go anyway (for performance reasons) so I might as well start w/ the tickers. I&#39;ll keep that JS stuff in mind in case we need to get it faster after that.</p></pre>gogolang: <pre><p>Yeah, XBRL is a major pain in the ass to use. I also started with XBRL and eventually just switched to Quandl data</p></pre>brentadamson: <pre><p>What kind of stuff are you after? Like I said our financials (and by extension screener) will be having a major upgrade soon so let me know your wishlist.</p></pre>dgryski: <pre><p>Also <a href="https://github.com/argusdusty/Ferret" rel="nofollow">https://github.com/argusdusty/Ferret</a></p></pre>gogolang: <pre><p>Awesome!</p></pre>jamra06: <pre><p>I made something a while back. github.com/jamra/gocleo</p> <p>It is used on an internal company search. It is a hybrid prefix and partial match alg that uses bloom filters for speed. </p></pre>brentadamson: <pre><p>How does this compare w/ <a href="https://github.com/duckduckgo/cpp-libface" rel="nofollow">https://github.com/duckduckgo/cpp-libface</a>? From what I can tell DDG would just be prefix search but probably faster? Yours might work better for us....</p></pre>jamra06: <pre><p>I suppose the difference is that a bloom filter can make your search space sub linear. It&#39;s kind of like cheating.</p> <p>This should be pretty darn fast. If you need something faster, there is a bit mashing version of the Levenshtein algorithm that I can implement, which would increase the speed. </p> <p>As far as how it compares with the library you linked... they are very different approaches. That library uses a tree. After reading the associated blog post, they make a good point about how many items to return. That is another point for optimization. </p> <p>If you want just a prefix search, you can remove the Levenshtein part and this would be much faster without much additional work. You can check out the link in the github page where the original creator of this algorithm gives a good writeup. I believe the java version is used by Linked In&#39;s autocomplete.</p></pre>brentadamson: <pre><p>That would be nice if you could speed it up. Although we can cache most of the queries (e.g. &#34;Apple, Microsoft, etc) we have some high volume users of our API that send high-volume queries that don&#39;t hit the cache very often.</p></pre>jamra06: <pre><p>Alright. I&#39;m going to write some benchmarks and start my changes. </p></pre>brentadamson: <pre><p>Also, I was looking at LinkedIn&#39;s autocomplete and they don&#39;t seem to be doing any Levenshtein stuff (unless I am misunderstanding how Levenshtein works)....I tried typing in &#34;brent adamsen&#34; with an &#34;e&#34; rather than an &#34;o&#34; and didn&#39;t get any results.</p></pre>jamra06: <pre><p>You are right. They don&#39;t do any levenshtein stuff. It&#39;s something I added in.</p></pre>mwholt: <pre><p>I haven&#39;t used it, but there&#39;s also <a href="https://autocompeter.com/" rel="nofollow">https://autocompeter.com/</a></p></pre>

入群交流(和以上内容无关):加入Go大咖交流群,或添加微信:liuxiaoyan-s 备注:入群;或加QQ群:692541889

915 次点击  
加入收藏 微博
暂无回复
添加一条新回复 (您需要 登录 后才能回复 没有账号 ?)
  • 请尽量让自己的回复能够对别人有帮助
  • 支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
  • 支持 @ 本站用户;支持表情(输入 : 提示),见 Emoji cheat sheet
  • 图片支持拖拽、截图粘贴等方式上传