java.lang.Object
  org.apache.stanbol.commons.opennlp.OpenNLP
@Service(value=OpenNLP.class)
public class OpenNLP
OSGi service that lets you load OpenNLP models via the Stanbol
DataFileProvider infrastructure. This allows users to copy models
to the 'datafiles' directory, or developers to provide models via OSGi
bundles.
This service also provides methods that directly return the OpenNLP component wrapping the model.
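To make the lookup idea concrete, here is a minimal, stdlib-only sketch of resolving a model file by name from a set of directories (for example a 'datafiles' directory) and returning null when it is absent. `ModelFileLookup` and `lookup` are hypothetical names; this is not the Stanbol DataFileProvider API, which additionally supports models contributed by OSGi bundles.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper (NOT the Stanbol DataFileProvider API): searches a fixed
// list of directories, e.g. a 'datafiles' directory, for a model file by name.
final class ModelFileLookup {
    private final List<Path> searchDirs;

    ModelFileLookup(List<Path> searchDirs) {
        this.searchDirs = List.copyOf(searchDirs);
    }

    /**
     * Opens the first file named modelName found in the search directories.
     * Returns null when no directory contains it; the caller must close the stream.
     */
    InputStream lookup(String modelName) throws IOException {
        for (Path dir : searchDirs) {
            Path candidate = dir.resolve(modelName);
            if (Files.isRegularFile(candidate)) {
                return Files.newInputStream(candidate);
            }
        }
        return null; // not found: the caller decides whether that is an error
    }
}
```

Returning null instead of throwing mirrors the "null if no model data are found" contract of the getters documented below.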
| Field Summary | |
|---|---|
| protected Map<String,Object> | models: map holding the already built models. TODO: change to use a WeakReferenceMap |
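The models field acts as a cache so that each model is built only once. A minimal sketch of such a name-keyed cache, assuming a loader function that returns null when no model data exist, could look like this; `ModelCache` is a hypothetical name and this is not the Stanbol implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a name-keyed model cache (NOT the Stanbol code).
final class ModelCache {
    // Built models keyed by model file name, in the spirit of the 'models' field.
    private final Map<String, Object> models = new ConcurrentHashMap<>();

    /**
     * Returns the cached model for name, calling loader on first access.
     * If loader returns null (no model data), nothing is cached and null is returned.
     */
    <T> T get(String name, Class<T> type, Function<String, T> loader) {
        Object model = models.computeIfAbsent(name, loader);
        return type.cast(model);
    }

    int size() {
        return models.size();
    }
}
```

`ConcurrentHashMap.computeIfAbsent` records no mapping when the function returns null, so a missing model is looked up again on the next call. The loader must not modify the map itself. The WeakReferenceMap mentioned in the TODO would additionally let unused models be garbage collected; this sketch does not attempt that.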
| Constructor Summary | |
|---|---|
| OpenNLP() | Default constructor |
| OpenNLP(org.apache.stanbol.commons.stanboltools.datafileprovider.DataFileProvider dataFileProvider) | Constructor intended to be used when running outside an OSGi environment. |
| Method Summary | | |
|---|---|---|
| opennlp.tools.chunker.Chunker | getChunker(String language) | Getter for the Chunker for a given language. |
| opennlp.tools.chunker.ChunkerModel | getChunkerModel(String language) | Getter for the chunker model for the given language. |
| <T> T | getModel(Class<T> modelType, String modelName, Map<String,String> properties) | Getter for the model with the given type, name and properties. |
| opennlp.tools.namefind.TokenNameFinder | getNameFinder(String type, String language) | Getter for the TokenNameFinder for the given entity type and language. |
| opennlp.tools.namefind.TokenNameFinderModel | getNameModel(String type, String language) | Getter for the named entity finder model for the given entity type and language. |
| opennlp.tools.postag.POSModel | getPartOfSpeachModel(String language) | Getter for the part-of-speech model for the given language. |
| opennlp.tools.postag.POSTagger | getPartOfSpeechTagger(String language) | Getter for the part-of-speech tagger for the given language. |
| opennlp.tools.sentdetect.SentenceDetector | getSentenceDetector(String language) | Getter for the sentence detector for the given language. |
| opennlp.tools.sentdetect.SentenceModel | getSentenceModel(String language) | Getter for the sentence detection model for the given language. |
| opennlp.tools.tokenize.Tokenizer | getTokenizer(String language) | Getter for the Tokenizer for a given language. |
| opennlp.tools.tokenize.TokenizerModel | getTokenizerModel(String language) | Getter for the tokenizer model for the given language. |
| protected InputStream | lookupModelStream(String modelName, Map<String,String> properties) | Looks up an OpenNLP data file via the dataFileProvider. |
| protected static String | removeNonUtf8CompliantCharacters(String text) | Removes non-UTF-8-compliant characters (typically control characters) so as to avoid polluting the annotation graph with snippets that are not serializable as XML. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected Map<String,Object> models
| Constructor Detail |
|---|
public OpenNLP()
public OpenNLP(org.apache.stanbol.commons.stanboltools.datafileprovider.DataFileProvider dataFileProvider)
Parameters:
dataFileProvider - the DataFileProvider used to load model data.
| Method Detail |
|---|
public opennlp.tools.sentdetect.SentenceModel getSentenceModel(String language)
throws opennlp.tools.util.InvalidFormatException,
IOException
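Getter for the sentence detection model for the given language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data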
public opennlp.tools.sentdetect.SentenceDetector getSentenceDetector(String language)
throws IOException
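Getter for the sentence detector for the given language.
Parameters:
language - the language
Returns:
the sentence detector, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data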
public opennlp.tools.namefind.TokenNameFinderModel getNameModel(String type,
String language)
throws opennlp.tools.util.InvalidFormatException,
IOException
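Getter for the named entity finder model for the given entity type and language. The model data are loaded via the DataFileProvider service.
Parameters:
type - the type of the named entities to find (e.g. person, organization)
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data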
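Because these getters take only a language (and, for name finders, an entity type), they have to map those arguments to a model file name. The sketch below assumes the naming scheme of the pre-trained OpenNLP 1.5 model downloads (en-sent.bin, en-token.bin, en-ner-person.bin, en-chunker.bin); whether this class uses exactly these names is an assumption, and `ModelNames` is a hypothetical helper.

```java
import java.util.Locale;

// Hypothetical helper: builds model file names following the naming used by the
// pre-trained OpenNLP 1.5 downloads. Assumed, not confirmed for Stanbol.
final class ModelNames {
    private ModelNames() {}

    static String sentence(String language) {
        return language + "-sent.bin";
    }

    static String tokenizer(String language) {
        return language + "-token.bin";
    }

    /** The entity type is lower-cased (an assumption), e.g. ("Person", "en") gives "en-ner-person.bin". */
    static String nameFinder(String type, String language) {
        return language + "-ner-" + type.toLowerCase(Locale.ROOT) + ".bin";
    }

    static String chunker(String language) {
        return language + "-chunker.bin";
    }
}
```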
public opennlp.tools.namefind.TokenNameFinder getNameFinder(String type,
String language)
throws IOException
Getter for the TokenNameFinder for the given entity type and language.
Parameters:
type - the type of the named entities to find (e.g. person, organization)
language - the language
Returns:
the TokenNameFinder, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data
public opennlp.tools.tokenize.TokenizerModel getTokenizerModel(String language)
throws opennlp.tools.util.InvalidFormatException,
IOException
Getter for the tokenizer model for the given language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data
public opennlp.tools.tokenize.Tokenizer getTokenizer(String language)
Getter for the Tokenizer for a given language. Returns a TokenizerME instance if the required
TokenizerModel for the given language is available; if such a
model is not available, it returns the SimpleTokenizer instance.
Parameters:
language - the language, or null to build a SimpleTokenizer
Returns:
the Tokenizer for the given language
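The fallback behaviour of getTokenizer can be sketched in plain Java. `TextTokenizer`, `CharClassTokenizer` and `TokenizerSelector` are hypothetical stand-ins, not OpenNLP types; `CharClassTokenizer` only approximates the idea behind OpenNLP's SimpleTokenizer (splitting where the character class changes) and is not a copy of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a tokenizer abstraction (NOT opennlp.tools.tokenize.Tokenizer).
interface TextTokenizer {
    String[] tokenize(String text);
}

// Rule-based fallback: a token ends where the character class changes; each
// punctuation/symbol char is its own token; whitespace is dropped.
// Works on UTF-16 chars; supplementary characters are not treated specially.
final class CharClassTokenizer implements TextTokenizer {
    private static final int WHITESPACE = 0, LETTER = 1, DIGIT = 2, OTHER = 3;

    private static int charClass(char c) {
        if (Character.isWhitespace(c)) return WHITESPACE;
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    @Override
    public String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            boolean boundary = i == text.length()
                    || charClass(text.charAt(i)) != charClass(text.charAt(i - 1))
                    || charClass(text.charAt(i)) == OTHER;
            if (boundary) {
                if (charClass(text.charAt(start)) != WHITESPACE) {
                    tokens.add(text.substring(start, i));
                }
                start = i;
            }
        }
        return tokens.toArray(new String[0]);
    }
}

final class TokenizerSelector {
    private TokenizerSelector() {}

    /** Uses the model-based tokenizer if one exists for the language, else the rule-based one. */
    static TextTokenizer forLanguage(String language,
                                     Function<String, TextTokenizer> modelTokenizerFor) {
        if (language != null) {
            TextTokenizer fromModel = modelTokenizerFor.apply(language);
            if (fromModel != null) {
                return fromModel;
            }
        }
        return new CharClassTokenizer(); // null language or no model: fall back
    }
}
```

With no model available, `"Hello, world 42!"` is split into `Hello`, `,`, `world`, `42`, `!`. Injecting the model lookup as a function keeps the selection logic testable without any model files.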
public opennlp.tools.postag.POSModel getPartOfSpeachModel(String language)
throws IOException,
opennlp.tools.util.InvalidFormatException
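Getter for the part-of-speech model for the given language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data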
public opennlp.tools.postag.POSTagger getPartOfSpeechTagger(String language)
throws IOException
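Getter for the part-of-speech tagger for the given language.
Parameters:
language - the language
Returns:
the POSTagger, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data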
public <T> T getModel(Class<T> modelType,
String modelName,
Map<String,String> properties)
throws opennlp.tools.util.InvalidFormatException,
IOException
Getter for the model with the given type, name and properties.
Parameters:
modelType - the type of the model (e.g. ChunkerModel)
modelName - the name of the model file; it MUST be available via the DataFileProvider
properties - additional properties about the model (passed to the
DataFileProvider). NOTE that "Description", "Model Type" and
"Download Location" are set to default values if not defined in the
given properties.
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data
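The note on the properties parameter describes a "fill in defaults when undefined" step. A minimal sketch of that step, assuming the caller's map must not be modified, is shown below; `ModelProperties.withDefaults` is a hypothetical helper and the default values are placeholders, not the ones Stanbol actually sets.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: fills in the three documented keys when the caller left
// them undefined. The default VALUES below are placeholders, not Stanbol's.
final class ModelProperties {
    private ModelProperties() {}

    static Map<String, String> withDefaults(Map<String, String> properties,
                                            Class<?> modelType, String modelName) {
        Map<String, String> result = new HashMap<>();
        if (properties != null) {
            result.putAll(properties); // copy so the caller's map is not modified
        }
        // putIfAbsent also fills keys that are explicitly mapped to null
        result.putIfAbsent("Description", "OpenNLP model " + modelName);
        result.putIfAbsent("Model Type", modelType.getSimpleName());
        result.putIfAbsent("Download Location", "unknown");
        return result;
    }
}
```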
public opennlp.tools.chunker.ChunkerModel getChunkerModel(String language)
throws opennlp.tools.util.InvalidFormatException,
IOException
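Getter for the chunker model for the given language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are present
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data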
public opennlp.tools.chunker.Chunker getChunker(String language)
throws IOException
Getter for the Chunker for a given language.
Parameters:
language - the language
Returns:
the Chunker, or null if no model is present
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
IOException - on any error while reading the model data
protected InputStream lookupModelStream(String modelName,
Map<String,String> properties)
throws IOException
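Looks up an OpenNLP data file via the dataFileProvider.
Parameters:
modelName - the name of the model
Returns:
an InputStream over the model data, or null if not found
Throws:
IOException - on any error while opening the model file
protected static String removeNonUtf8CompliantCharacters(String text)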
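removeNonUtf8CompliantCharacters is described as stripping characters (typically control characters) that would make snippets unserializable as XML. One way to implement that idea, assuming XML 1.0 is the target, is to keep only the code points allowed by the XML 1.0 Char production; the class and method names below are hypothetical, and the exact character set Stanbol removes may differ.

```java
// Hypothetical sketch: keeps only code points allowed by the XML 1.0 "Char"
// production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
final class XmlCharFilter {
    private XmlCharFilter() {}

    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String removeNonXmlCharacters(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        // codePoints() reports unpaired surrogates as their own values (0xD800-0xDFFF),
        // which isXmlChar rejects, so broken surrogate pairs are dropped as well.
        text.codePoints().filter(XmlCharFilter::isXmlChar).forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

Control characters such as U+0007 are valid UTF-8 but not valid XML 1.0 characters, so XML serializability, rather than UTF-8 encodability, is the criterion this sketch applies.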