Package org.apache.wiki.search
Class LuceneSearchProvider
- java.lang.Object
  - org.apache.wiki.search.LuceneSearchProvider
- All Implemented Interfaces:
WikiProvider, SearchProvider
- Direct Known Subclasses:
TikaSearchProvider
public class LuceneSearchProvider extends java.lang.Object implements SearchProvider
Search provider implementation that uses Lucene to handle searching the Wiki.
- Since:
- 2.2.21
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static int FLAG_CONTEXTS - Create contexts also.
- protected static org.apache.logging.log4j.Logger log
- protected static java.lang.String LUCENE_ATTACHMENTS
- protected static java.lang.String LUCENE_AUTHOR
- protected static java.lang.String LUCENE_ID
- protected static java.lang.String LUCENE_PAGE_CONTENTS
- protected static java.lang.String LUCENE_PAGE_KEYWORDS
- protected static java.lang.String LUCENE_PAGE_NAME
- protected java.util.List<java.lang.Object[]> m_updates
- static int MAX_SEARCH_HITS - The maximum number of hits to return from searches.
- static java.lang.String PROP_LUCENE_ANALYZER - Which analyzer to use.
- static java.lang.String[] SEARCHABLE_FILE_SUFFIXES - These attachment file suffixes will be indexed.
Fields inherited from interface org.apache.wiki.api.providers.WikiProvider
LATEST_VERSION
-
-
Constructor Summary
Constructors (Constructor, Description):
- LuceneSearchProvider()
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
- protected void doFullLuceneReindex() - Performs a full Lucene reindex, if necessary.
- java.util.Collection<SearchResult> findPages(java.lang.String query, int flags, Context wikiContext) - Searches pages using a particular combination of flags.
- java.util.Collection<SearchResult> findPages(java.lang.String query, Context wikiContext) - Search for pages matching a search query.
- protected java.lang.String getAttachmentContent(java.lang.String attachmentName, int version) - Fetches the attachment content from the repository.
- protected java.lang.String getAttachmentContent(Attachment att)
- protected Engine getEngine() - Returns the handling engine.
- java.lang.String getProviderInfo() - Return a valid HTML string for information.
- void initialize(Engine engine, java.util.Properties props) - Initializes the search provider.
- protected org.apache.lucene.document.Document luceneIndexPage(Page page, java.lang.String text, org.apache.lucene.index.IndexWriter writer) - Indexes page using the given IndexWriter.
- void pageRemoved(Page page) - Delete a page from the search index.
- void reindexPage(Page page) - Adds a page-text pair to the Lucene update queue.
- protected void updateLuceneIndex(Page page, java.lang.String text) - Updates the Lucene index for a single page.
-
-
-
Field Detail
-
log
protected static final org.apache.logging.log4j.Logger log
-
PROP_LUCENE_ANALYZER
public static final java.lang.String PROP_LUCENE_ANALYZER
Which analyzer to use. Default is StandardAnalyzer.
- See Also:
- Constant Field Values
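The fallback to StandardAnalyzer can be pictured as a plain property lookup with a default. The following is only a sketch of that idea, not the provider's actual code: the key string "jspwiki.lucene.analyzer" is an assumption (the real value is listed under Constant Field Values), and the class and the helper resolveAnalyzerClass are hypothetical names introduced for illustration.

```java
import java.util.Properties;

// Sketch only: shows a "configured value or documented default" lookup.
final class AnalyzerSettingSketch {
    // ASSUMPTION: the key string is a guess; check Constant Field Values.
    static final String PROP_LUCENE_ANALYZER = "jspwiki.lucene.analyzer";

    // Fully qualified name of Lucene's StandardAnalyzer, the documented default.
    static final String DEFAULT_ANALYZER =
            "org.apache.lucene.analysis.standard.StandardAnalyzer";

    // Hypothetical helper: returns the configured class name, or the default
    // when the property is missing or blank.
    static String resolveAnalyzerClass(Properties props) {
        String value = props.getProperty(PROP_LUCENE_ANALYZER);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_ANALYZER;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolveAnalyzerClass(props));

        // Example override; any analyzer class name could appear here.
        props.setProperty(PROP_LUCENE_ANALYZER, "com.example.MyAnalyzer");
        System.out.println(resolveAnalyzerClass(props));
    }
}
```

Resolving the name as a string first keeps configuration errors (a blank or missing value) separate from class-loading errors, which would only surface when the analyzer is instantiated.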
-
SEARCHABLE_FILE_SUFFIXES
public static final java.lang.String[] SEARCHABLE_FILE_SUFFIXES
These attachment file suffixes will be indexed.
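A suffix list like this is typically used as a filter on attachment file names before their content is read for indexing. The sketch below illustrates that check under stated assumptions: the three suffixes are a hypothetical subset rather than the real array contents, the case-insensitive comparison is an assumption, and isSearchable is a hypothetical helper rather than a method of this class.

```java
import java.util.Locale;

// Sketch only: decides whether an attachment name qualifies for indexing.
final class SuffixFilterSketch {
    // ASSUMPTION: illustrative subset, not the real SEARCHABLE_FILE_SUFFIXES.
    static final String[] SEARCHABLE_FILE_SUFFIXES = { ".txt", ".xml", ".html" };

    // Hypothetical helper: true if the file name ends with a listed suffix.
    // Lower-casing first is an assumption about the desired behavior.
    static boolean isSearchable(String fileName) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String suffix : SEARCHABLE_FILE_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSearchable("notes.TXT"));
        System.out.println(isSearchable("photo.png"));
    }
}
```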
-
LUCENE_ID
protected static final java.lang.String LUCENE_ID
- See Also:
- Constant Field Values
-
LUCENE_PAGE_CONTENTS
protected static final java.lang.String LUCENE_PAGE_CONTENTS
- See Also:
- Constant Field Values
-
LUCENE_AUTHOR
protected static final java.lang.String LUCENE_AUTHOR
- See Also:
- Constant Field Values
-
LUCENE_ATTACHMENTS
protected static final java.lang.String LUCENE_ATTACHMENTS
- See Also:
- Constant Field Values
-
LUCENE_PAGE_NAME
protected static final java.lang.String LUCENE_PAGE_NAME
- See Also:
- Constant Field Values
-
LUCENE_PAGE_KEYWORDS
protected static final java.lang.String LUCENE_PAGE_KEYWORDS
- See Also:
- Constant Field Values
-
m_updates
protected final java.util.List<java.lang.Object[]> m_updates
-
MAX_SEARCH_HITS
public static final int MAX_SEARCH_HITS
The maximum number of hits to return from searches.
- See Also:
- Constant Field Values
-
FLAG_CONTEXTS
public static final int FLAG_CONTEXTS
Create contexts also. Generating contexts can be expensive, so they're not on by default.
- See Also:
- Constant Field Values
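Because the flags argument of findPages is an int, a flag such as FLAG_CONTEXTS is presumably tested with a bit mask. The sketch below shows that pattern only; the numeric value 0x01 is an assumption (the real value is listed under Constant Field Values), and FLAG_HYPOTHETICAL and wantsContexts are invented names used to show how flags combine.

```java
// Sketch only: bit-mask handling for an int flags argument.
final class SearchFlagsSketch {
    // ASSUMPTION: value chosen for illustration; see Constant Field Values.
    static final int FLAG_CONTEXTS = 0x01;

    // HYPOTHETICAL second flag; it exists only to demonstrate combining flags.
    static final int FLAG_HYPOTHETICAL = 0x02;

    // True when the FLAG_CONTEXTS bit is set, regardless of other bits.
    static boolean wantsContexts(int flags) {
        return (flags & FLAG_CONTEXTS) != 0;
    }

    public static void main(String[] args) {
        System.out.println(wantsContexts(0));
        System.out.println(wantsContexts(FLAG_CONTEXTS | FLAG_HYPOTHETICAL));
    }
}
```

Passing 0 leaves contexts off, which matches the note that they are not generated by default.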
-
-
Constructor Detail
-
LuceneSearchProvider
public LuceneSearchProvider()
-
-
Method Detail
-
initialize
public void initialize(Engine engine, java.util.Properties props) throws NoRequiredPropertyException, java.io.IOException
Initializes the search provider.
- Specified by:
initialize in interface WikiProvider
- Parameters:
engine - Engine to own this provider
props - A set of properties used to initialize this provider
- Throws:
NoRequiredPropertyException - If the provider needs a property which is not found in the property set
java.io.IOException - If there is an IO problem
-
doFullLuceneReindex
protected void doFullLuceneReindex() throws java.io.IOException
Performs a full Lucene reindex, if necessary.
- Throws:
java.io.IOException - If there's a problem during indexing
-
getAttachmentContent
protected java.lang.String getAttachmentContent(java.lang.String attachmentName, int version)
Fetches the attachment content from the repository. Content is flat text that can be used for indexing/searching or display.
- Parameters:
attachmentName - Name of the attachment.
version - The version of the attachment.
- Returns:
- the content of the Attachment as a String.
-
getAttachmentContent
protected java.lang.String getAttachmentContent(Attachment att)
- Parameters:
att - Attachment to get content for. Filename extension is used to determine the type of the attachment.
- Returns:
- String representing the content of the file.
FIXME: This is a very simple implementation for some text-based attachments, mainly used for testing. It should be replaced by, or moved to, Attachment search providers or some other 'pluggable' way to search attachments.
-
updateLuceneIndex
protected void updateLuceneIndex(Page page, java.lang.String text)
Updates the Lucene index for a single page.
- Parameters:
page - The WikiPage to check
text - The page text to index.
-
luceneIndexPage
protected org.apache.lucene.document.Document luceneIndexPage(Page page, java.lang.String text, org.apache.lucene.index.IndexWriter writer) throws java.io.IOException
Indexes page using the given IndexWriter.
- Parameters:
page - WikiPage
text - Page text to index
writer - The Lucene IndexWriter to use for indexing
- Returns:
- the created index Document
- Throws:
java.io.IOException - If there's an indexing problem
-
pageRemoved
public void pageRemoved(Page page)
Delete a page from the search index.
- Specified by:
pageRemoved in interface SearchProvider
- Parameters:
page - Page to remove from search index.
-
reindexPage
public void reindexPage(Page page)
Adds a page-text pair to the Lucene update queue. Always safe to call.
- Specified by:
reindexPage in interface SearchProvider
- Parameters:
page - WikiPage to add to the update queue.
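The update queue idea (callers enqueue a page-text pair cheaply, and indexing happens later in batches) can be sketched with the same element type as m_updates, java.util.List<java.lang.Object[]>. Everything else here is an assumption made for illustration: the synchronized wrapper, the use of a String page name in place of a Page object, and the enqueue and drain helpers are hypothetical and do not describe the provider's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: a producer/consumer queue of page-text pairs.
final class UpdateQueueSketch {
    // Same declared type as m_updates; the synchronized wrapper is an ASSUMPTION.
    private final List<Object[]> updates =
            Collections.synchronizedList(new ArrayList<>());

    // Producer side (hypothetical): store the pair and return immediately.
    // A String page name stands in for the real Page object.
    void enqueue(String pageName, String text) {
        updates.add(new Object[] { pageName, text });
    }

    // Consumer side (hypothetical): copy and clear under the list's own lock,
    // so pairs added concurrently are neither lost nor processed twice.
    List<Object[]> drain() {
        synchronized (updates) {
            List<Object[]> batch = new ArrayList<>(updates);
            updates.clear();
            return batch;
        }
    }

    public static void main(String[] args) {
        UpdateQueueSketch queue = new UpdateQueueSketch();
        queue.enqueue("Main", "Welcome text");
        queue.enqueue("About", "About text");
        for (Object[] pair : queue.drain()) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Separating the enqueue from the index write is what would make a call like reindexPage cheap for the caller: the expensive Lucene work can run on a background thread that drains the queue.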
-
findPages
public java.util.Collection<SearchResult> findPages(java.lang.String query, Context wikiContext) throws ProviderException
Search for pages matching a search query.
- Specified by:
findPages in interface SearchProvider
- Parameters:
query - query to search for
wikiContext - the context within which to run the search
- Returns:
- collection of pages that match query
- Throws:
ProviderException - if the search provider failed.
-
findPages
public java.util.Collection<SearchResult> findPages(java.lang.String query, int flags, Context wikiContext) throws ProviderException
Searches pages using a particular combination of flags.
- Parameters:
query - The query to perform in Lucene query language
flags - A set of flags
wikiContext - the context within which to run the search
- Returns:
- A Collection of SearchResult instances
- Throws:
ProviderException - if there is a problem with the backend
-
getProviderInfo
public java.lang.String getProviderInfo()
Return a valid HTML string for information. May be anything.
- Specified by:
getProviderInfo in interface WikiProvider
- Returns:
- A string describing the provider.
-
-