Download Android App


Alternate Blog View: Timeslide Sidebar Magazine

Tuesday, September 24, 2013

Hibernate Search based Autocomplete Suggester

In this article, I will show how to implement auto-completion using Hibernate Search.

The same can be achieved using Solr or ElasticSearch. But I decided to use Hibernate Search as its the simplest to get started with, easily integrates with an existing application and leverages the same core - Lucene. And we get all of this without the overhead of managing Solr/ElasticSearch cluster. In all, I found Hibernate Search to be the go-to search engine for simple use cases.

For our use case, we build a product title based auto-completion where often, the user queries are searches for product title. While typing, users should immediately see titles matching their requests, and Hibernate Search should do the hard work to filter the relevant documents in near real-time.

Lets have the following JPA annotated Product entity class.

public class Product {

  @Id
 @Column(name = "sku")
 private String sku;

  @Column(name = "upc")
 private String upc;

  @Column(name = "title")
 private String title;

....
}

We are interested in returning suggestions based on the 'title' field. Title will be indexed based on 2 strategies - N-Gram and Edge N-Gram.

Edge N-Gram - This will match only from the left edge of the suggestion text. For this we use KeywordTokenizerFactory (emits the entire input as a single token)  and EdgeNGramFilterFactory along with some regex cleansing.

N-Gram matches from the start of every word, so that you can get right-truncated suggestions for any word in the text, not only from the first word. The main difference from N-gram is the tokenizer which is StandardTokenizerFactory along with NGramFilterFactory.

Using these strategies, if the document field is "A brown fox" and the query is
a) "A bro"- Will match
b) "bro" - Will match

Implementation: In the entity defined above, we can map 'title' property twice with the above strategies. Below are the annotations to instruct Hibernate to index 'title' twice.


@Entity
@Table(name = "item_master")
@Indexed(index = "Products")
@AnalyzerDefs({

@AnalyzerDef(name = "autocompleteEdgeAnalyzer",

// Split input into tokens according to tokenizer
tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),

filters = {
 // Normalize token text to lowercase, as the user is unlikely to
 // care about casing when searching for matches
 @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
   @Parameter(name = "pattern",value = "([^a-zA-Z0-9\\.])"),
   @Parameter(name = "replacement", value = " "),
   @Parameter(name = "replace", value = "all") }),
 @TokenFilterDef(factory = LowerCaseFilterFactory.class),
 @TokenFilterDef(factory = StopFilterFactory.class),
 // Index partial words starting at the front, so we can provide
 // Autocomplete functionality
 @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
   @Parameter(name = "minGramSize", value = "3"),
   @Parameter(name = "maxGramSize", value = "50") }) }),

@AnalyzerDef(name = "autocompleteNGramAnalyzer",

// Split input into tokens according to tokenizer
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),

filters = {
 // Normalize token text to lowercase, as the user is unlikely to
 // care about casing when searching for matches
 @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
 @TokenFilterDef(factory = LowerCaseFilterFactory.class),
 @TokenFilterDef(factory = NGramFilterFactory.class, params = {
   @Parameter(name = "minGramSize", value = "3"),
   @Parameter(name = "maxGramSize", value = "5") }),
 @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
   @Parameter(name = "pattern",value = "([^a-zA-Z0-9\\.])"),
   @Parameter(name = "replacement", value = " "),
   @Parameter(name = "replace", value = "all") })
}),

@AnalyzerDef(name = "standardAnalyzer",

// Split input into tokens according to tokenizer
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),

filters = {
 // Normalize token text to lowercase, as the user is unlikely to
 // care about casing when searching for matches
 @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
 @TokenFilterDef(factory = LowerCaseFilterFactory.class),
 @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
   @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
   @Parameter(name = "replacement", value = " "),
   @Parameter(name = "replace", value = "all") })
}) // Def
})
public class Product {

....
}


Explanation: 2 custom analyzers - autocompleteEdgeAnalyzer and autocompleteNGramAnalyzer have been defined as per theory in the previous section. Next, we apply these analyzers on the 'title' field to create 2 different indexes. Here is how we do it:


@Column(name = "title")
@Fields({
  @Field(name = "title", index = Index.YES, store = Store.YES,
analyze = Analyze.YES, analyzer = @Analyzer(definition = "standardAnalyzer")),
  @Field(name = "edgeNGramTitle", index = Index.YES, store = Store.NO,
analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteEdgeAnalyzer")),
  @Field(name = "nGramTitle", index = Index.YES, store = Store.NO,
analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteNGramAnalyzer"))
})
private String title;

Start indexing:

 public void index() throws InterruptedException {
  getFullTextSession().createIndexer().startAndWait();
 }

Once indexed, inspect the index using Luke and you should be able to see title analyzed and stored as N-Grams and Edge N-Grams.

Search Query:

 private static final String TITLE_EDGE_NGRAM_INDEX = "edgeNGramTitle";
 private static final String TITLE_NGRAM_INDEX = "nGramTitle";

 @Transactional(readOnly = true)
 public synchronized List getSuggestions(final String searchTerm) {

 QueryBuilder titleQB = getFullTextSession().getSearchFactory()
   .buildQueryBuilder().forEntity(Product.class).get();

 Query query = titleQB.phrase().withSlop(2).onField(TITLE_NGRAM_INDEX)
   .andField(TITLE_EDGE_NGRAM_INDEX).boostedTo(5)
   .sentence(searchTerm.toLowerCase()).createQuery();

 FullTextQuery fullTextQuery = getFullTextSession().createFullTextQuery(
    query, Product.class);
 fullTextQuery.setMaxResults(20);

 @SuppressWarnings("unchecked")
 List<product> results = fullTextQuery.list();
 return results;
}

And we have a working suggester.

What next? Expose the functionality via a REST API and integrate it with jQuery, examples of which can be easily found.

You can also use the same strategies with Solr and ElasticSearch.

5 comments:

  1. Thanks for the example, I'm wondering how I would go about using something like this with fuzzy and or maybe wildcard. Example title "hibernate search based auto completer" could also be returned as "Auto copmleter hibernate search" or "Search hibernate"

    ReplyDelete
  2. Fuzzy and wildcard searches are supported. Look @ http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html/search-query.html#search-query-querydsl

    ReplyDelete
  3. Hi Do you have the sources? My queries keep returning empty results even if the index is ok.

    ReplyDelete
  4. Hi do you have the sources ? can't seem to get it to work.

    ReplyDelete
  5. Hi chandra,

    while using analyzers hibernate search does not performing INDEXING , Default indexing (without analyzers) is ok. Coud you help me?

    my configuration on Entity is
    @Field(index=Index.TOKENIZED,analyzer=@Analyzer(definition="standardanalyzer"),store=Store.YES)

    and my Indexing code is

    @Transactional(propagation=Propagation.REQUIRED,isolation=Isolation.REPEATABLE_READ)
    public void indexCategoryMapping()throws HibernateException,NullPointerException,ParseException,
    Exception{
    if(index){
    logger.info("Indexing CategoryMapping Database.......");
    Session session=null;
    FullTextSession fullTextSession=null;

    //Getting current session object.
    session=sessionFactory.getCurrentSession();

    if(session==null){
    logger.info("session is null in IndexCategoryMapping");
    throw new NullPointerException();
    }

    //Creating 'fullTextSession' Object.
    fullTextSession = Search.getFullTextSession(session);

    if(fullTextSession==null){
    logger.info("fullTextSession is null in IndexCategoryMapping");
    throw new NullPointerException();
    }

    @SuppressWarnings("unchecked")
    List categories = session.createQuery("from CategoryMapping as cm").list();

    for (CategoryMapping cm : categories) {
    fullTextSession.index(cm);
    }
    }


    and finally my Analyzer definition is

    @AnalyzerDef(name="standardanalyzer",
    tokenizer =
    @TokenizerDef(factory = StandardTokenizerFactory.class ),
    filters = {
    @TokenFilterDef(factory = StandardFilterFactory.class),
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = StopFilterFactory.class) } )



    Thanks
    raju

    ReplyDelete