java.lang.Object
  org.apache.clerezza.uima.metadatagenerator.mediatype.TikaTextExtractor
public class TikaTextExtractor
extends Object
implements MediaTypeTextExtractor
A MediaTypeTextExtractor implementation based on Apache Tika.
| Constructor Summary | |
|---|---|
| TikaTextExtractor() | Construct an instance using the default Tika configuration. |
| TikaTextExtractor(String tikaConfigPath) | Construct an instance using a custom tika-config.xml configuration file. |
| Method Summary | |
|---|---|
| String | extract(byte[] bytes): Extract the text from the provided input if its Media Type is supported. |
| boolean | supports(javax.ws.rs.core.MediaType mediaType): Check if the provided MediaType is supported by this extractor. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
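| Constructor Detail |
|---|
public TikaTextExtractor()
Construct an instance using the default Tika configuration.

public TikaTextExtractor(String tikaConfigPath)
Construct an instance using a custom tika-config.xml configuration file.
Parameters: tikaConfigPath - the path to the tika-config.xml configuration file.

| Method Detail |
|---|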
public boolean supports(javax.ws.rs.core.MediaType mediaType)
Check if the provided MediaType is supported by this extractor.
Specified by: supports in interface MediaTypeTextExtractor
Parameters: mediaType - the MediaType to be checked.
Returns: true if the provided MediaType is supported.
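
One way the supports check might be used is to pick an extractor from several candidates before passing it any input. The sketch below is hypothetical: `MediaType` and `MediaTypeTextExtractor` are minimal local stand-ins that mirror only the signatures documented on this page, so the example compiles without JAX-RS or Clerezza on the classpath. `firstSupporting` and `plainTextOnly` are illustrative names and not part of the library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SupportsSketch {

    // Stand-in for javax.ws.rs.core.MediaType (assumption: only type and subtype matter here).
    static final class MediaType {
        final String type;
        final String subtype;

        MediaType(String type, String subtype) {
            this.type = type;
            this.subtype = subtype;
        }
    }

    // Stand-in mirroring the two method signatures documented on this page.
    interface MediaTypeTextExtractor {
        boolean supports(MediaType mediaType);

        String extract(byte[] bytes) throws Exception;
    }

    // Hypothetical helper: returns the first extractor that reports support for the media type.
    static Optional<MediaTypeTextExtractor> firstSupporting(
            List<MediaTypeTextExtractor> extractors, MediaType mediaType) {
        for (MediaTypeTextExtractor extractor : extractors) {
            if (extractor.supports(mediaType)) {
                return Optional.of(extractor);
            }
        }
        return Optional.empty();
    }

    // Toy extractor used only to exercise the helper; real code would use TikaTextExtractor.
    static MediaTypeTextExtractor plainTextOnly() {
        return new MediaTypeTextExtractor() {
            @Override
            public boolean supports(MediaType mediaType) {
                return "text".equals(mediaType.type) && "plain".equals(mediaType.subtype);
            }

            @Override
            public String extract(byte[] bytes) {
                return new String(bytes, StandardCharsets.UTF_8);
            }
        };
    }

    public static void main(String[] args) {
        List<MediaTypeTextExtractor> extractors = Arrays.asList(plainTextOnly());
        System.out.println(firstSupporting(extractors, new MediaType("text", "plain")).isPresent());
        System.out.println(firstSupporting(extractors, new MediaType("image", "png")).isPresent());
    }
}
```

In real code the stand-ins would be replaced by the actual `javax.ws.rs.core.MediaType` and the library's `MediaTypeTextExtractor` interface; the selection loop itself would stay the same.
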
public String extract(byte[] bytes) throws UnsupportedMediaTypeException
Extract the text from the provided input if its Media Type is supported.
Specified by: extract in interface MediaTypeTextExtractor
Parameters: bytes - a byte array representing the input.
Returns: a String with the extracted text.
Throws: UnsupportedMediaTypeException - if the Media Type implied by the input is not supported.
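
Because extract declares UnsupportedMediaTypeException, callers may want to catch it rather than let it propagate. The sketch below shows one possible pattern that falls back to an empty string. It is not the library's code: `TextExtractor` and the nested `UnsupportedMediaTypeException` are local stand-ins mirroring only the extract signature above, the stand-in exception is assumed to be a checked exception, and `extractOrEmpty` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

public class ExtractSketch {

    // Stand-in for the library's UnsupportedMediaTypeException (assumed here to be checked).
    static class UnsupportedMediaTypeException extends Exception {
        UnsupportedMediaTypeException(String message) {
            super(message);
        }
    }

    // Stand-in mirroring only the documented extract signature.
    interface TextExtractor {
        String extract(byte[] bytes) throws UnsupportedMediaTypeException;
    }

    // Hypothetical helper: returns the extracted text, or "" when the input is not supported.
    static String extractOrEmpty(TextExtractor extractor, byte[] bytes) {
        try {
            return extractor.extract(bytes);
        } catch (UnsupportedMediaTypeException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        // Toy extractors used only to exercise the helper; real code would pass a TikaTextExtractor.
        TextExtractor utf8 = bytes -> new String(bytes, StandardCharsets.UTF_8);
        TextExtractor rejecting = bytes -> {
            throw new UnsupportedMediaTypeException("unsupported input");
        };

        byte[] input = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractOrEmpty(utf8, input));
        System.out.println(extractOrEmpty(rejecting, input).isEmpty());
    }
}
```

Whether an empty string is the right fallback depends on the application; logging the exception or rethrowing it as a domain-specific error are equally reasonable choices.
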