bleve: a modern search indexing library for Go, with examples

Marty Schoch (@mschoch) is an engineer at Couchbase, maker of the high-performance NoSQL distributed database of the same name. He has worked with Go for almost two and a half years, using it to prototype new solutions at Couchbase. This talk introduces bleve, a text search indexing library for Go. The slides for this talk have been posted here.

Bleve (pronounced BLEH-vee) is a modern text indexing library written for Go. It supports a variety of features commonly found in search indexers, including filtering, ranking, and faceting.

When you say “search index,” the names that come to mind are typically Lucene, Elasticsearch, and Solr. Those systems are great, especially if you’re already using Java and the JVM, but sometimes you don’t want to pull in that dependency or you want to avoid standing up yet another external service that can complicate deployment.

The Couchbase team wondered, How easy would it be to build a Go library that supported the most commonly used text analysis components of Lucene and that could use an off-the-shelf key-value (KV) store as its underlying data store? Thus, bleve was born.

Here are the key points of their approach to building bleve:

  • They initially focused on the most commonly used text analysis components of Lucene.
  • Go interfaces allow users to fill in the gaps with components for their own specific languages and domains.
  • They avoided coming up with a new custom file format. There are many interesting KV stores on the market that can serve as underlying data stores. Bleve currently supports LevelDB, Bolt, and ForestDB.

Here’s the set of features bleve currently supports:

  • You can index any Go data structure (string, numeric, and date fields are supported).
  • Search: Term, Phrase, Match, MatchPhrase, Boolean, Fuzzy, Numeric Range, Date Range.
  • Search results with TF/IDF scoring, contextual snippets, and term highlighting.
  • Search result faceting (by term, numeric value, and date).

Getting Started

Installing bleve is easy. Just use the go get command:

$ go get github.com/blevesearch/bleve/...

Including the trailing /... will also install some helpful command-line utilities.

In just 26 lines of code, we can create our first index:

The mapping is a default Index Mapping. The Index Mapping is responsible for describing how your documents should be mapped into the index. The default mapping is designed to work well out of the box, but you’ll want to revisit this to improve the quality of your search results.

The call to the New() function takes two parameters. The first is the path to a directory where the index will be stored and the second is the mapping to be used for this index.

The call to the Index() method takes two parameters. The first is a unique identifier for the document, and the second is the document (a Go struct) to be indexed.

Now that we’ve created an index, we want to open it and search:

The call to the Open() function only takes a single parameter, the path to the index. The mapping is not needed, as it was serialized into the index at the time of creation.

The query describes what we’re looking for. In this case, it is a TermQuery, the simplest kind of query. Term queries look for an exact match of the specified term in the index.

The request describes how the results should be returned. It can control how many results are returned, and whether or not stored fields or facets should also be returned. In this case we use a default request, which will return the first 10 matching documents.

When we run this example we get:

$ ./search_index
    1 matches, showing 1 through 1, took 70.722µs
            1. m1 (0.216978)

This shows that the one document we put into the index matches the query.

Indexing Real World Data

To see more of the features in action, let’s index the GopherCon India schedule. We’ll map the data into the structure below:
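One possible shape for such a structure is sketched here. The field set is inferred from the fields that appear in the search output later in this article (summary, description, speaker, duration) plus a start time for the day facet. The Event name, the minute unit for Duration, and the docID helper are assumptions. The sketch also assumes that bleve takes field names from the json struct tags, and that document IDs were derived from the talk summary, which is a guess based on the ID printed in the results.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Event is a hypothetical representation of one talk in the schedule.
type Event struct {
	Summary     string    `json:"summary"`
	Description string    `json:"description"`
	Speaker     string    `json:"speaker"`
	Start       time.Time `json:"start"`
	Duration    float64   `json:"duration"` // assumed to be minutes
}

// docID derives a document identifier from a talk summary by lowercasing
// it and replacing spaces with underscores (an assumed convention).
func docID(summary string) string {
	return strings.ReplaceAll(strings.ToLower(summary), " ", "_")
}

func main() {
	e := Event{
		Summary:  "bleve - modern text indexing for Go",
		Speaker:  "Martin Schoch",
		Duration: 25,
	}
	fmt.Println(docID(e.Summary)) // prints "bleve_-_modern_text_indexing_for_go"
}
```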

Now let’s try a more interesting search. This time we’ll do a phrase search for “quality search results”.

When we run this example we get:

$ ./phrase_search_schedule
    1 matches, showing 1 through 1, took 1.73394ms
            1. bleve_-_modern_text_indexing_for_go (1.033644)
            description
                …earch component. But delivering high quality search results requires a long list of text analysis and indexing techniques. With the bleve library, we bring advanced text indexing and search to your Go…
            summary
                bleve - modern text indexing for Go
            speaker
                Martin Schoch

Now let’s try one more example. So far all the queries we’ve executed have been built programmatically, but sometimes it’s useful to allow end-users to build their own queries. To do this we use a QueryStringQuery:

This particular QueryString shows many options in use:

  • Prefixing with + or - changes that clause to a MUST or MUST NOT (the default is SHOULD).
  • Prefixing with fieldname: restricts matches to a particular field (the default is _all).
  • Placing the term in quotes results in a PhraseQuery.
  • Suffixing a term with ~N performs a FuzzyQuery with edit distance N (default 2).

When we run this example we get:

$ ./query_string_search_schedule
    1 matches, showing 1 through 1, took 10.540776ms
            1. bleve_-_modern_text_indexing_for_go (0.338882)
            description
                …ist of text analysis and indexing techniques. With the bleve library, we bring advanced text indexing and search to your Go applications. This talk will start with a brief introduction to text search …
            summary
                bleve - modern text indexing for Go
            speaker
                Martin Schoch
            duration
                25

Putting it All Together

Bleve also includes a set of optional HTTP handlers. These handlers map all the major bleve operations to HTTP endpoints and assume that your data and index mappings are encoded in JSON documents. By combining the GopherCon India Schedule index with these HTTP handlers it’s very simple to build a web-based search interface.

Here we searched for the term “go”:

[Screenshot: web search interface showing the results for “go”]

We can see the search results, including stored fields and snippets for the talk description with matching terms highlighted. Also, on the right-hand side we see two facets, one for the day of the talk, and another for the duration of the talk. By checking these boxes we can easily add/remove filters and drill deeper into the results.

A hosted version of the application is available for you to try out yourself.

Roadmap

Bleve is still under active development, but a useful set of functionality is already available. We hope to wrap up a few key features and then prepare for a 1.0 stable release:

  • Search result sorting (currently results are sorted only by score)
  • Improved spelling suggestions and fuzzy search
  • Performance (so far, focus has been on features and API design)

One More Thing…

In anticipation of GopherCon India we created an initial analyzer for Hindi. It’s still experimental, but the foundation is in place for you to help make it better.

Join the Community

The community around bleve is growing. We can’t accomplish all of our goals for this project ourselves and need help from a community of users interested in improving support for their own languages and search domains. Join us at blevesearch.com!


Source: Sourcegraph Blog. Author: Sourcegraph.
