Knowledge:Lucene configuration

From OpenKM Documentation
Revision as of 14:51, 17 October 2012 by Pavila (talk | contribs)

Lucene case sensitive & insensitive search

Lucene search is case-sensitive, but query input is usually lowercased as it passes through QueryParser, so searching feels case-insensitive (this is the case for the findBySimpleQuery() method).

In other words, if you want case-sensitive search, don't lowercase your input before indexing, and don't lowercase your queries. For this, pick an Analyzer that does not lowercase, such as KeywordAnalyzer.
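The effect of lowercasing on both sides can be illustrated with a small sketch. This is a toy model in plain Java, not Lucene code: the class CaseSensitivitySketch and its methods analyze and matches are hypothetical names, and the stand-in analyzer simply splits on whitespace. Note that a real KeywordAnalyzer keeps the whole value as a single term rather than splitting it, so the non-lowercasing mode here is only an approximation of the idea.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: this is not Lucene code and all names are hypothetical.
class CaseSensitivitySketch {

    // Stand-in for an Analyzer: splits on whitespace and optionally lowercases.
    static Set<String> analyze(String text, boolean lowercase) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            terms.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
        }
        return terms;
    }

    // Term comparison is always exact; only the analysis step can hide case.
    // The same setting is applied to the indexed content and to the query.
    static boolean matches(String content, String query, boolean lowercase) {
        Set<String> indexedTerms = analyze(content, lowercase);
        return indexedTerms.containsAll(analyze(query, lowercase));
    }

    public static void main(String[] args) {
        // Lowercasing on both sides: search feels case-insensitive.
        System.out.println(matches("OKM Paco Avila", "okm", true));  // true
        // No lowercasing on either side: search is case-sensitive.
        System.out.println(matches("OKM Paco Avila", "okm", false)); // false
        System.out.println(matches("OKM Paco Avila", "OKM", false)); // true
    }
}
```

The point of the sketch is that the comparison itself never ignores case. Whether a search appears case-insensitive depends only on what happens to the text before it is indexed and before it is queried.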

Are Wildcard, Prefix, and Fuzzy queries case sensitive?

Unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer, the component that performs operations such as stemming and lowercasing. The Analyzer is skipped because, if you searched for "dogs*", you would not want "dogs" stemmed to "dog" first: the query would then become "dog*", which is not the intended query. By default these queries are case-insensitive anyway, because QueryParser lowercases them. This behavior can be changed using the setLowercaseExpandedTerms(boolean) method.
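A minimal sketch of this behavior, again as a toy model in plain Java rather than actual Lucene code: the class ExpandedTermsSketch and the method prefixMatches are hypothetical, and the boolean parameter is merely named after setLowercaseExpandedTerms(boolean) to show what the switch affects. The index terms are assumed to have been lowercased (and, in the second case, stemmed) at indexing time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model only: this is not Lucene code and all names are hypothetical.
class ExpandedTermsSketch {

    // Models a prefix query such as "okm*" against already-indexed terms.
    // The query text skips analysis (no stemming); it is only lowercased when
    // the flag, named after setLowercaseExpandedTerms(boolean), is true.
    static boolean prefixMatches(List<String> indexedTerms, String query,
                                 boolean lowercaseExpandedTerms) {
        String prefix = query.endsWith("*")
                ? query.substring(0, query.length() - 1)
                : query;
        if (lowercaseExpandedTerms) {
            prefix = prefix.toLowerCase(Locale.ROOT);
        }
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Terms assumed to have been lowercased when they were indexed.
        List<String> lowercasedIndex = Arrays.asList("okm", "paco", "avila");
        System.out.println(prefixMatches(lowercasedIndex, "OKM*", true));  // true
        System.out.println(prefixMatches(lowercasedIndex, "OKM*", false)); // false

        // "dogs*" is not stemmed to "dog*", so it does not match "dog" or "dogma".
        List<String> index = Arrays.asList("dog", "dogma");
        System.out.println(prefixMatches(index, "dogs*", true)); // false
    }
}
```

With the flag off, an uppercase prefix is compared as typed, so it can only match terms that were indexed with the same case.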

Configuration test

Field: to
Index: Index.UN_TOKENIZED
Content: "OKM Paco Avila"
Search: "OKM" -> no results
Search: "OKM*" -> OK
Search: "okm" -> no results
Search: "okm*" -> no results

Field: to
Index: Index.TOKENIZED
Content: "OKM Paco Avila"
Search: "OKM" -> no results
Search: "OKM*" -> no results
Search: "okm" -> OK
Search: "okm*" -> OK
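One way to read these results is sketched below as a toy model in plain Java. It is not Lucene code, and it rests on two assumptions that the test itself does not state: that an untokenized field is stored as one unmodified term while a tokenized field is split into lowercased terms, and that the queries in this test reached the index without being lowercased. The class ConfigurationTestSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: this is not Lucene code and all names are hypothetical.
class ConfigurationTestSketch {

    // Assumption: an untokenized field keeps the whole value as a single term,
    // while a tokenized field is split on whitespace and lowercased.
    static List<String> indexTerms(String content, boolean tokenized) {
        List<String> terms = new ArrayList<>();
        if (!tokenized) {
            terms.add(content);
            return terms;
        }
        for (String token : content.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Assumption: the query is used as typed (no lowercasing). A trailing "*"
    // means prefix match; otherwise the term must be equal.
    static boolean search(List<String> terms, String query) {
        boolean isPrefix = query.endsWith("*");
        String text = isPrefix ? query.substring(0, query.length() - 1) : query;
        for (String term : terms) {
            if (isPrefix ? term.startsWith(text) : term.equals(text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] queries = {"OKM", "OKM*", "okm", "okm*"};
        for (boolean tokenized : new boolean[] {false, true}) {
            List<String> terms = indexTerms("OKM Paco Avila", tokenized);
            System.out.println(tokenized ? "TOKENIZED" : "UN_TOKENIZED");
            for (String query : queries) {
                String result = search(terms, query) ? "OK" : "no results";
                System.out.println("Search: \"" + query + "\" -> " + result);
            }
        }
    }
}
```

Under those assumptions the model gives the same eight outcomes as the test: only "OKM*" matches the untokenized field, and only "okm" and "okm*" match the tokenized one.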