Marty Schoch (@mschoch) is an engineer at Couchbase, maker of the high-performance NoSQL distributed database of the same name. He has worked with Go for almost two and a half years, using it to prototype new solutions at Couchbase. This talk introduces bleve, a text search indexing library for Go. The slides for this talk have been posted here.
Bleve (pronounced BLEH-vee) is a modern text indexing library written for Go. It supports a variety of features commonly found in search indexers, including filtering, ranking, and faceting.
When you say “search index,” the names that come to mind are typically Lucene, Elasticsearch, and Solr. Those systems are great, especially if you’re already using Java and the JVM, but sometimes you don’t want to pull in that dependency or you want to avoid standing up yet another external service that can complicate deployment.
The Couchbase team wondered, How easy would it be to build a Go library that supported the most commonly used text analysis components of Lucene and that could use an off-the-shelf key-value (KV) store as its underlying data store? Thus, bleve was born.
Here are the key points of how they approached building bleve:
- They initially focused on the most commonly used text analysis components of Lucene.
- Go interfaces allow users to fill in the gaps with components for their own specific languages and domains.
- They avoided coming up with a new custom file format. There are many interesting KV stores on the market that can serve as underlying data stores. They currently support LevelDB, Bolt, and ForestDB.
Here’s the set of features bleve currently supports:
- You can index any Go structure (strings, numeric values, and dates are supported).
- Search: Term, Phrase, Match, MatchPhrase, Boolean, Fuzzy, Numeric Range, Date Range.
- Search results with TF/IDF scoring, contextual snippets, and term highlighting.
- Search result faceting (by term, numeric value, and date).
Getting Started
Installing bleve is easy. Just use the go get command:
$ go get github.com/blevesearch/bleve/...
Including the trailing /... will also install some helpful command-line utilities.
In just 26 lines of code, we can create our first index:
The mapping is a default Index Mapping. The Index Mapping is responsible for describing how your documents should be mapped into the index. The default mapping is designed to work well out of the box, but you’ll want to revisit this to improve the quality of your search results.
The call to the New() function takes two parameters. The first is the path to a directory where the index will be stored and the second is the mapping to be used for this index.
The call to the Index() method takes two parameters. The first is a unique identifier for the document, and the second is the document (a Go struct) to be indexed.
Now that we’ve created an index, we want to open it and search:
The call to the Open() function takes only a single parameter, the path to the index. The mapping is not needed, as it was serialized into the index at the time of creation.
The query describes what we’re looking for. In this case, it is a TermQuery, the simplest kind of query. Term queries look for an exact match of the specified term in the index.
The request describes how the results should be returned. It can control how many results are returned, and whether or not stored fields or facets should also be returned. In this case we use a default request, which will return the first 10 matching documents.
When we run this example we get:
$ ./search_index
1 matches, showing 1 through 1, took 70.722µs
1. m1 (0.216978)
This shows that the one document we put into the index matches this query.
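To make the exact-match behavior of a term query concrete, here is a toy in-memory inverted index written with only the standard library. It is a conceptual sketch, not bleve’s API or storage format: the `toyIndex` type, its `add` and `termSearch` methods, and the sample documents are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toyIndex maps each lowercased term to the set of document IDs containing it.
// It is a hypothetical in-memory stand-in, not bleve's index format.
type toyIndex map[string]map[string]bool

// add tokenizes text on whitespace and records each term under the given ID.
func (idx toyIndex) add(id, text string) {
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if idx[term] == nil {
			idx[term] = make(map[string]bool)
		}
		idx[term][id] = true
	}
}

// termSearch returns the sorted IDs of documents containing exactly this term.
func (idx toyIndex) termSearch(term string) []string {
	ids := make([]string, 0, len(idx[term]))
	for id := range idx[term] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	idx := toyIndex{}
	idx.add("m1", "bleve indexing is easy")
	idx.add("m2", "searching with Go")
	fmt.Println(idx.termSearch("bleve")) // exact term present in m1
	fmt.Println(idx.termSearch("blev"))  // partial term: no documents match
}
```

Because the lookup is a plain map access on the whole term, a partial term such as “blev” finds nothing. That is the gap the richer query types (match, phrase, fuzzy) are meant to fill.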
Indexing Real World Data
To see more of the features in action, let’s index the GopherCon India schedule. We’ll map the data into the structure below:
Now let’s try a more interesting search. This time we’ll do a phrase search for “quality search results”.
When we run this example we get:
$ ./phrase_search_schedule
1 matches, showing 1 through 1, took 1.73394ms
1. bleve_-_modern_text_indexing_for_go (1.033644)
description
…earch component. But delivering high quality search results requires a long list of text analysis and indexing techniques. With the bleve library, we bring advanced text indexing and search to your Go…
summary
bleve - modern text indexing for Go
speaker
Martin Schoch
Now let’s try one more example. So far all the queries we’ve executed have been built programmatically, but sometimes it’s useful to allow end users to build their own queries. To do this we use a QueryStringQuery:
This particular QueryString shows many options in use:
- Prefixing with + or - changes that clause to a MUST or MUST NOT (the default is SHOULD).
- Prefixing with fieldname: restricts matches to a particular field (the default is _all).
- Placing the term in quotes results in a PhraseQuery.
- Suffixing a term with ~N performs a FuzzyQuery with edit distance N (default 2).
When we run this example we get:
$ ./query_string_search_schedule
1 matches, showing 1 through 1, took 10.540776ms
1. bleve_-_modern_text_indexing_for_go (0.338882)
description
…ist of text analysis and indexing techniques. With the bleve library, we bring advanced text indexing and search to your Go applications. This talk will start with a brief introduction to text search …
summary
bleve - modern text indexing for Go
speaker
Martin Schoch
duration
25
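The query-string options listed above can be sketched as a small token parser. This is a simplified illustration using only the standard library, not bleve’s actual parser: the `clause` type and `parseClause` function are hypothetical, quoted phrases are left out, the sample query is invented, and treating a bare `~` as edit distance 2 is an assumption based on the stated default.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// clause is a hypothetical, simplified view of one parsed query-string token.
type clause struct {
	Occur     string // "MUST", "MUST NOT", or "SHOULD"
	Field     string // "_all" when no field prefix is given
	Term      string
	Fuzziness int // 0 means exact match
}

// parseClause interprets the +/- prefix, field: prefix, and ~N suffix of a
// single unquoted token. Quoted phrases are deliberately not handled here.
func parseClause(tok string) clause {
	c := clause{Occur: "SHOULD", Field: "_all"}
	switch {
	case strings.HasPrefix(tok, "+"):
		c.Occur, tok = "MUST", tok[1:]
	case strings.HasPrefix(tok, "-"):
		c.Occur, tok = "MUST NOT", tok[1:]
	}
	if i := strings.Index(tok, ":"); i > 0 {
		c.Field, tok = tok[:i], tok[i+1:]
	}
	if i := strings.LastIndex(tok, "~"); i >= 0 {
		c.Fuzziness = 2 // assumed default when no number follows ~
		if n, err := strconv.Atoi(tok[i+1:]); err == nil {
			c.Fuzziness = n
		}
		tok = tok[:i]
	}
	c.Term = tok
	return c
}

func main() {
	// The query below is an invented example, not the one from the talk.
	for _, tok := range strings.Fields("+description:text summary:serch~1 -php") {
		fmt.Printf("%+v\n", parseClause(tok))
	}
}
```

Each token becomes one clause, and a real implementation would then combine the MUST, SHOULD, and MUST NOT clauses into a single boolean query.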
Putting it All Together
Bleve also includes a set of optional HTTP handlers. These handlers map all the major bleve operations to HTTP endpoints and assume that your data and index mappings are encoded in JSON documents. By combining the GopherCon India Schedule index with these HTTP handlers it’s very simple to build a web-based search interface.
Here we searched for the term “go”:
We can see the search results, including stored fields and snippets for the talk description with matching terms highlighted. Also, on the right-hand side we see two facets, one for the day of the talk, and another for the duration of the talk. By checking these boxes we can easily add/remove filters and drill deeper into the results.
A hosted version of the application is available for you to try out yourself.
Roadmap
Bleve is still very much under active development. However, a very useful set of functionality is already available. We hope to wrap up a few key features and then prepare for a 1.0 stable release:
- Search result sorting (currently results are sorted only by score)
- Improved spelling suggestions and fuzzy search
- Performance (so far, focus has been on features and API design)
One More Thing…
In anticipation of GopherCon India we created an initial analyzer for Hindi. It’s still experimental, but the foundation is in place for you to help make it better.
Join the Community
The community around bleve is growing. We can’t accomplish all of our goals for this project ourselves and need help from a community of users interested in improving support for their own languages and search domains. Join us at blevesearch.com!
Source: Sourcegraph Blog
Author: Sourcegraph
Original article: bleve: a modern search indexing library for Go, with examples