@Service(value=OpenNLP.class) public class OpenNLP extends Object
Service that loads OpenNLP models via the DataFileProvider infrastructure. This allows users to copy models
to the 'datafiles' directory, or developers to provide models via OSGI
bundles. This service also provides methods that directly return the OpenNLP component wrapping the model.
| Modifier and Type | Field and Description |
|---|---|
| protected Map<String,Object> | models: Map holding the already built models. TODO: change to use a WeakReferenceMap |
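
The TODO on the models field mentions a weak-reference map. A minimal sketch of what such a cache could look like, using only the JDK, is given here. The class name WeakModelCache, the getOrLoad method and its loader parameter are hypothetical and are not part of this service; the documented field is a plain Map<String,Object>.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for the 'models' field; NOT the Stanbol implementation.
final class WeakModelCache {

    // Keys are model names. Values are held weakly, so the garbage collector
    // may reclaim a model once no caller references it any more.
    private final Map<String, WeakReference<Object>> models = new ConcurrentHashMap<>();

    /**
     * Returns the cached model if it is still reachable and has the requested
     * type; otherwise runs the loader and caches its result.
     * Returns null if the loader finds no model data.
     */
    <T> T getOrLoad(Class<T> modelType, String modelName, Supplier<? extends T> loader) {
        WeakReference<Object> ref = models.get(modelName);
        Object cached = (ref == null) ? null : ref.get();
        if (modelType.isInstance(cached)) { // isInstance(null) is false
            return modelType.cast(cached);
        }
        T loaded = loader.get();
        if (loaded != null) {
            models.put(modelName, new WeakReference<>(loaded));
        }
        return loaded;
    }
}
```

With weakly held values a model can be rebuilt after a garbage collection, so callers that need a model to stay loaded have to keep their own reference to it.
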
| Constructor and Description |
|---|
| OpenNLP(): Default constructor. |
| OpenNLP(org.apache.stanbol.commons.stanboltools.datafileprovider.DataFileProvider dataFileProvider): Constructor intended to be used when running outside an OSGI environment (e.g. in unit tests). |
| Modifier and Type | Method and Description |
|---|---|
| opennlp.tools.chunker.Chunker | getChunker(String language): Getter for the Chunker for a given language. |
| opennlp.tools.chunker.ChunkerModel | getChunkerModel(String language): Getter for the chunker model for the passed language. |
| <T> T | getModel(Class<T> modelType, String modelName, Map<String,String> properties): Getter for the model with the passed type, name and properties. |
| opennlp.tools.namefind.TokenNameFinder | getNameFinder(String type, String language): Getter for the TokenNameFinder for the passed entity type and language. |
| opennlp.tools.namefind.TokenNameFinderModel | getNameModel(String type, String language): Getter for the named entity finder model for the passed entity type and language. |
| opennlp.tools.postag.POSModel | getPartOfSpeachModel(String language): Getter for the "part-of-speech" model for the passed language. |
| opennlp.tools.postag.POSTagger | getPartOfSpeechTagger(String language): Getter for the "part-of-speech" tagger for the passed language. |
| opennlp.tools.sentdetect.SentenceDetector | getSentenceDetector(String language): Getter for the sentence detector of the passed language. |
| opennlp.tools.sentdetect.SentenceModel | getSentenceModel(String language): Getter for the sentence detection model of the passed language. |
| opennlp.tools.tokenize.Tokenizer | getTokenizer(String language): Getter for the Tokenizer of a given language. |
| opennlp.tools.tokenize.TokenizerModel | getTokenizerModel(String language): Getter for the tokenizer model for the passed language. |
| protected InputStream | lookupModelStream(String modelName, Map<String,String> properties): Look up an OpenNLP data file via the dataFileProvider. |
| protected static String | removeNonUtf8CompliantCharacters(String text): Remove non UTF-8 compliant characters (typically control characters) so as to avoid polluting the annotation graph with snippets that are not serializable as XML. |
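
The removeNonUtf8CompliantCharacters entry describes filtering out characters that cannot be serialized as XML. The following is a minimal sketch of that idea, not the method's actual implementation, which is not shown on this page. It assumes the XML 1.0 Char production as the validity rule; the names XmlSafeText, isXmlChar and replaceInvalidXmlChars are hypothetical.

```java
// Hypothetical helper illustrating XML-safe text filtering; NOT the Stanbol code.
final class XmlSafeText {

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Replaces every character that XML 1.0 cannot represent with a single
     * space. Returns null for null input.
     */
    static String replaceInvalidXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                // Invalid code points (control characters, unpaired
                // surrogates, U+FFFE, U+FFFF) always occupy one char.
                sb.append(' ');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

This sketch substitutes a space instead of deleting the character so that the length of the text, and therefore any character offsets computed on it, stays unchanged. That is a design choice of the sketch; the summary above only says the characters are removed.
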
public OpenNLP()
public OpenNLP(org.apache.stanbol.commons.stanboltools.datafileprovider.DataFileProvider dataFileProvider)
Parameters:
dataFileProvider - the DataFileProvider used to load model data

public opennlp.tools.sentdetect.SentenceModel getSentenceModel(String language) throws opennlp.tools.util.InvalidFormatException, IOException
Getter for the sentence detection model of the passed language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public opennlp.tools.sentdetect.SentenceDetector getSentenceDetector(String language) throws IOException
Getter for the sentence detector of the passed language.
Parameters:
language - the language
Returns:
the sentence detector, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public opennlp.tools.namefind.TokenNameFinderModel getNameModel(String type, String language) throws opennlp.tools.util.InvalidFormatException, IOException
Getter for the named entity finder model for the passed entity type and language. The model data are loaded via the DataFileProvider service.
Parameters:
type - the type of the named entities to find (person, organization)
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public opennlp.tools.namefind.TokenNameFinder getNameFinder(String type, String language) throws IOException
Getter for the TokenNameFinder for the passed entity type and language.
Parameters:
type - the type of the named entities to find (person, organization)
language - the language
Returns:
the TokenNameFinder, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public opennlp.tools.tokenize.TokenizerModel getTokenizerModel(String language) throws opennlp.tools.util.InvalidFormatException, IOException
Getter for the tokenizer model for the passed language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public opennlp.tools.tokenize.Tokenizer getTokenizer(String language)
Getter for the Tokenizer of a given language. Returns a TokenizerME instance if the required
TokenizerModel for the passed language is available. If such a
model is not available, it returns the SimpleTokenizer instance.
Parameters:
language - the language, or null to build a SimpleTokenizer
Returns:
the Tokenizer for the passed language

public opennlp.tools.postag.POSModel getPartOfSpeachModel(String language) throws IOException, opennlp.tools.util.InvalidFormatException
Getter for the "part-of-speech" model for the passed language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public opennlp.tools.postag.POSTagger getPartOfSpeechTagger(String language) throws IOException
Getter for the "part-of-speech" tagger for the passed language.
Parameters:
language - the language
Returns:
the tagger, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public <T> T getModel(Class<T> modelType, String modelName, Map<String,String> properties) throws opennlp.tools.util.InvalidFormatException, IOException
Getter for the model with the passed type, name and properties.
Parameters:
modelType - the type of the model (e.g. ChunkerModel)
modelName - the name of the model file. MUST BE available via the DataFileProvider.
properties - additional properties about the model (passed to the DataFileProvider). NOTE that "Description", "Model Type" and "Download Location" are set to default values if not defined in the passed value.
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public opennlp.tools.chunker.ChunkerModel getChunkerModel(String language) throws opennlp.tools.util.InvalidFormatException, IOException
Getter for the chunker model for the passed language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are present
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

public opennlp.tools.chunker.Chunker getChunker(String language) throws IOException
Getter for the Chunker for a given language.
Parameters:
language - the language
Returns:
the Chunker, or null if no model is present
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data

protected InputStream lookupModelStream(String modelName, Map<String,String> properties) throws IOException
Look up an OpenNLP data file via the dataFileProvider.
Parameters:
modelName - the name of the model
Returns:
the stream, or null if not found
Throws:
IOException - on any error while opening the model file

Copyright © 2010-2014 The Apache Software Foundation. All Rights Reserved.