java.lang.Object
  org.apache.stanbol.commons.opennlp.OpenNLP
@Service(value=OpenNLP.class)
public class OpenNLP
Core of our EnhancementEngine, separated from the OSGi service so that it is easier to test.
| Field Summary | |
|---|---|
protected java.util.Map<java.lang.String,java.lang.Object> |
models
Map holding the already built models. TODO: change to use a WeakReferenceMap |
| Constructor Summary | |
|---|---|
OpenNLP()
Default constructor |
|
OpenNLP(DataFileProvider dataFileProvider)
Constructor intended to be used when running outside an OSGi environment (e.g. |
|
| Method Summary | |
|---|---|
opennlp.tools.chunker.ChunkerModel |
getChunkerModel(java.lang.String language)
Getter for the chunker model for the given language. |
opennlp.tools.namefind.TokenNameFinderModel |
getNameModel(java.lang.String type,
java.lang.String language)
Getter for the named entity finder model for the given entity type and language. |
opennlp.tools.postag.POSModel |
getPartOfSpeachModel(java.lang.String language)
Getter for the "part-of-speech" model for the given language. |
opennlp.tools.sentdetect.SentenceModel |
getSentenceModel(java.lang.String language)
Getter for the sentence detection model of the given language. |
opennlp.tools.tokenize.Tokenizer |
getTokenizer(java.lang.String language)
Getter for the Tokenizer of a given language. |
opennlp.tools.tokenize.TokenizerModel |
getTokenizerModel(java.lang.String language)
Getter for the tokenizer model for the given language. |
protected java.io.InputStream |
lookupModelStream(java.lang.String modelName,
java.util.Map<java.lang.String,java.lang.String> properties)
Look up an OpenNLP data file via the dataFileProvider. |
protected static java.lang.String |
removeNonUtf8CompliantCharacters(java.lang.String text)
Remove non UTF-8 compliant characters (typically control characters) so as to avoid polluting the annotation graph with snippets that are not serializable as XML. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected java.util.Map<java.lang.String,java.lang.Object> models
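The `models` map acts as a cache for models that have already been built, and the model getters return null when no model data are found. The following is a minimal sketch of that lazy build-and-cache pattern, not the class's actual implementation. `ModelCacheSketch`, `StreamLookup`, `ModelBuilder`, and `getModel` are hypothetical names introduced for illustration; `StreamLookup` stands in for a lookup through the DataFileProvider, and `ModelBuilder` for a model constructor that reads an InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

final class ModelCacheSketch {
    /** Hypothetical stand-in for a data file lookup; returns null if the file is unknown. */
    interface StreamLookup {
        InputStream open(String modelName) throws IOException;
    }

    /** Hypothetical stand-in for a model constructor that reads its data from a stream. */
    interface ModelBuilder<T> {
        T build(InputStream in) throws IOException;
    }

    // Mirrors the documented field: model name -> already built model.
    private final Map<String, Object> models = new HashMap<>();
    private final StreamLookup lookup;
    int builds = 0; // counts how often a model was actually built

    ModelCacheSketch(StreamLookup lookup) {
        this.lookup = lookup;
    }

    /** Returns the cached model, builds it on first use, or returns null if no data exist. */
    <T> T getModel(String modelName, Class<T> type, ModelBuilder<T> builder) throws IOException {
        Object cached = models.get(modelName);
        if (cached != null) {
            return type.cast(cached);
        }
        InputStream in = lookup.open(modelName);
        if (in == null) {
            return null; // no model data found
        }
        try (InputStream stream = in) { // close the stream once the model is built
            T model = builder.build(stream);
            models.put(modelName, model);
            builds++;
            return model;
        }
    }
}
```

Callers of such a getter need to handle both the null result (no data for the language) and the IOException (data found but unreadable). The sketch is not thread-safe; a concurrent variant would need synchronization around the map.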
| Constructor Detail |
|---|
public OpenNLP()
public OpenNLP(DataFileProvider dataFileProvider)
Parameters:
dataFileProvider - the dataFileProvider used to load Model data.
| Method Detail |
|---|
public opennlp.tools.sentdetect.SentenceModel getSentenceModel(java.lang.String language)
throws opennlp.tools.util.InvalidFormatException,
java.io.IOException
Getter for the sentence detection model of the given language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
java.io.IOException - on any error while reading the model data
public opennlp.tools.namefind.TokenNameFinderModel getNameModel(java.lang.String type,
java.lang.String language)
throws opennlp.tools.util.InvalidFormatException,
java.io.IOException
Getter for the named entity finder model for the given entity type and language. The model data are loaded via the DataFileProvider service.
Parameters:
type - the type of the named entities to find (person, organization)
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
java.io.IOException - on any error while reading the model data
public opennlp.tools.tokenize.TokenizerModel getTokenizerModel(java.lang.String language)
throws opennlp.tools.util.InvalidFormatException,
java.io.IOException
Getter for the tokenizer model for the given language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
java.io.IOException - on any error while reading the model data
public opennlp.tools.tokenize.Tokenizer getTokenizer(java.lang.String language)
Getter for the Tokenizer of a given language. Returns a TokenizerME instance if the required TokenizerModel for the given language is available. If such a model is not available, it returns the SimpleTokenizer instance.
Parameters:
language - the language, or null to build a SimpleTokenizer
Returns:
the Tokenizer for the given language.
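The fallback behavior described for getTokenizer (a model-backed tokenizer when one is available, a simple tokenizer otherwise, and never null) can be sketched as follows. This is an illustration only and does not use the OpenNLP classes: `TokenizerFallbackSketch`, `SketchTokenizer`, `register`, and the whitespace-splitting `SIMPLE` instance are hypothetical stand-ins for TokenizerME, Tokenizer, the model lookup, and SimpleTokenizer.

```java
import java.util.HashMap;
import java.util.Map;

final class TokenizerFallbackSketch {
    /** Hypothetical, simplified stand-in for a tokenizer interface. */
    interface SketchTokenizer {
        String[] tokenize(String text);
    }

    /** Fallback used when no language-specific tokenizer is available (splits on whitespace). */
    static final SketchTokenizer SIMPLE = text -> {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    };

    // Language code -> tokenizer backed by a language-specific model.
    private final Map<String, SketchTokenizer> byLanguage = new HashMap<>();

    /** Makes a language-specific tokenizer available (stands in for a successful model lookup). */
    void register(String language, SketchTokenizer tokenizer) {
        byLanguage.put(language, tokenizer);
    }

    /** Never returns null: a null or unknown language yields the simple fallback. */
    SketchTokenizer getTokenizer(String language) {
        SketchTokenizer specific = (language == null) ? null : byLanguage.get(language);
        return specific != null ? specific : SIMPLE;
    }
}
```

Because the getter always returns a usable tokenizer, callers do not need a null check here, unlike with the model getters.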
public opennlp.tools.postag.POSModel getPartOfSpeachModel(java.lang.String language)
throws java.io.IOException,
opennlp.tools.util.InvalidFormatException
Getter for the "part-of-speech" model for the given language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are found
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
java.io.IOException - on any error while reading the model data
public opennlp.tools.chunker.ChunkerModel getChunkerModel(java.lang.String language)
throws opennlp.tools.util.InvalidFormatException,
java.io.IOException
Getter for the chunker model for the given language. The model data are loaded via the DataFileProvider service.
Parameters:
language - the language
Returns:
the model, or null if no model data are present
Throws:
opennlp.tools.util.InvalidFormatException - in case the found model data are in the wrong format
java.io.IOException - on any error while reading the model data
protected java.io.InputStream lookupModelStream(java.lang.String modelName,
java.util.Map<java.lang.String,java.lang.String> properties)
throws java.io.IOException
Look up an OpenNLP data file via the dataFileProvider.
Parameters:
modelName - the name of the model
Returns:
the stream, or null if not found
Throws:
java.io.IOException - on any error while opening the model file
protected static java.lang.String removeNonUtf8CompliantCharacters(java.lang.String text)
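The summary for removeNonUtf8CompliantCharacters says it strips characters (typically control characters) that would make a snippet unserializable as XML. The sketch below shows one way such a filter could work; it is not the method's actual implementation. It assumes that "not serializable as XML" means "outside the XML 1.0 Char production", and it replaces each offending code point with a space rather than deleting it, so that character offsets stay stable. `XmlSafeText`, `isXmlChar`, and `replaceInvalidXmlChars` are hypothetical names.

```java
final class XmlSafeText {
    private XmlSafeText() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every code point that XML 1.0 cannot represent with a single space. */
    static String replaceInvalidXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // Unpaired surrogates are read as single code points in 0xD800-0xDFFF and replaced.
            sb.appendCodePoint(isXmlChar(cp) ? cp : ' ');
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Iterating by code point rather than by char keeps valid supplementary characters (encoded as surrogate pairs) intact while still catching lone surrogates.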