A C D E G H M O P R S T X

A

AbstractTextExtractor - Class in org.apache.jackrabbit.extractor
Base class for text extractor implementations.
AbstractTextExtractor(String[]) - Constructor for class org.apache.jackrabbit.extractor.AbstractTextExtractor
 
addTextExtractor(TextExtractor) - Method in class org.apache.jackrabbit.extractor.CompositeTextExtractor
Adds a component text extractor.

C

characters(XMLString, Augmentations) - Method in class org.apache.jackrabbit.extractor.HTMLParser
 
CompositeTextExtractor - Class in org.apache.jackrabbit.extractor
Composite text extractor.
CompositeTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.CompositeTextExtractor
 

D

DefaultTextExtractor - Class in org.apache.jackrabbit.extractor
Composite text extractor that by default contains the standard text extractors found in this package.
DefaultTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.DefaultTextExtractor
Creates the default text extractor by adding instances of the standard text extractors as components.
DelegatingTextExtractor - Interface in org.apache.jackrabbit.extractor
Interface for text extractors that need to delegate the extraction of parts of content documents to another text extractor.

E

EmptyTextExtractor - Class in org.apache.jackrabbit.extractor
Dummy text extractor that always returns an empty reader for all documents.
EmptyTextExtractor(String[]) - Constructor for class org.apache.jackrabbit.extractor.EmptyTextExtractor
Creates a dummy text extractor for the given content types.
EmptyTextExtractor(String) - Constructor for class org.apache.jackrabbit.extractor.EmptyTextExtractor
Creates a dummy text extractor for the given content type.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.CompositeTextExtractor
Extracts text content using one of the component extractors.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.EmptyTextExtractor
Closes the given stream and returns an empty reader.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.HTMLTextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.MsExcelTextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.MsOutlookTextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.MsPowerPointTextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.MsWordTextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.PdfTextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.PlainTextExtractor
Wraps the given input stream in an InputStreamReader using the given encoding, or the platform default encoding if the encoding is not given or is unsupported.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.PngTextExtractor
Returns a reader for the text content of the given PNG image.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.RTFTextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in interface org.apache.jackrabbit.extractor.TextExtractor
Returns a reader for the text content of the given binary document.
extractText(InputStream, String, String) - Method in class org.apache.jackrabbit.extractor.XMLTextExtractor
Returns a reader for the text content of the given XML document.
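
The PlainTextExtractor entry above describes an encoding fallback: use the given encoding, or the platform default when none is given or the name is unsupported. A minimal sketch of that rule using only the JDK might look like the following. The class EncodingFallbackSketch and its methods are hypothetical names for illustration and are not part of org.apache.jackrabbit.extractor; the actual implementation may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not a Jackrabbit class.
class EncodingFallbackSketch {

    // Wraps the stream in a reader, using the named encoding when it is
    // given and supported, and the platform default encoding otherwise.
    static Reader wrap(InputStream stream, String encoding) {
        if (encoding != null) {
            try {
                return new InputStreamReader(stream, encoding);
            } catch (UnsupportedEncodingException e) {
                // Unknown encoding name: fall through to the default below.
            }
        }
        return new InputStreamReader(stream);
    }

    // Drains a reader into a string; used only for the demonstration.
    static String readAll(Reader reader) {
        try (Reader r = reader) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = r.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(wrap(new ByteArrayInputStream(utf8), "UTF-8")));

        // An unsupported name falls back to the platform default encoding.
        byte[] ascii = "plain".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAll(wrap(new ByteArrayInputStream(ascii), "no-such-encoding")));
    }
}
```

Falling back instead of failing means a wrong or missing encoding hint still yields some text, at the cost of possibly mis-decoded non-ASCII characters.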

G

getContents() - Method in class org.apache.jackrabbit.extractor.HTMLParser
Returns the parsed content.
getContentTypes() - Method in class org.apache.jackrabbit.extractor.AbstractTextExtractor
 
getContentTypes() - Method in class org.apache.jackrabbit.extractor.CompositeTextExtractor
Returns all the content types supported by the component extractors.
getContentTypes() - Method in class org.apache.jackrabbit.extractor.EmptyTextExtractor
Returns the supported content types.
getContentTypes() - Method in interface org.apache.jackrabbit.extractor.TextExtractor
Returns the MIME types supported by this extractor.

H

HTMLParser - Class in org.apache.jackrabbit.extractor
Helper class for HTML parsing.
HTMLParser() - Constructor for class org.apache.jackrabbit.extractor.HTMLParser
 
HTMLTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for HyperText Markup Language (HTML).
HTMLTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.HTMLTextExtractor
Creates a new HTMLTextExtractor instance.

M

MsExcelTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for Microsoft Excel sheets.
MsExcelTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.MsExcelTextExtractor
Creates a new MsExcelTextExtractor instance.
MsOutlookTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for Microsoft Outlook messages.
MsOutlookTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.MsOutlookTextExtractor
Creates a new MsOutlookTextExtractor instance.
MsPowerPointTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for Microsoft PowerPoint presentations.
MsPowerPointTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.MsPowerPointTextExtractor
Creates a new MsPowerPointTextExtractor instance.
MsWordTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for Microsoft Word documents.
MsWordTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.MsWordTextExtractor
Creates a new MsWordTextExtractor instance.

O

OpenOfficeTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for OpenOffice documents.
OpenOfficeTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.OpenOfficeTextExtractor
Creates a new OpenOfficeTextExtractor instance.
org.apache.jackrabbit.extractor - package org.apache.jackrabbit.extractor
 

P

PdfTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for Portable Document Format (PDF).
PdfTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.PdfTextExtractor
Creates a new PdfTextExtractor instance.
PlainTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for plain text.
PlainTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.PlainTextExtractor
Creates a new PlainTextExtractor instance.
PngTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for PNG/APNG/MNG images.
PngTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.PngTextExtractor
Creates a new PngTextExtractor instance.

R

RTFTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for Rich Text Format (RTF).
RTFTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.RTFTextExtractor
Creates a new RTFTextExtractor instance.

S

setDelegateTextExtractor(TextExtractor) - Method in interface org.apache.jackrabbit.extractor.DelegatingTextExtractor
Sets the text extractor to which this extractor should delegate any partial text extraction tasks.
startDocument(XMLLocator, String, NamespaceContext, Augmentations) - Method in class org.apache.jackrabbit.extractor.HTMLParser
 

T

TextExtractor - Interface in org.apache.jackrabbit.extractor
Interface for extracting text content from binary streams.
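
The index lists two methods on TextExtractor, getContentTypes() and extractText(InputStream, String, String), and describes CompositeTextExtractor as extracting text "using one of the component extractors". The following is a rough sketch of how such dispatch by content type could work. It uses stand-in types (ExtractorSketch, FixedTextSketch, CompositeSketch) that are hypothetical and do not depend on Jackrabbit. The return types, the thrown IOException, the first-match rule, and the empty-reader fallback for unknown types are all assumptions, since the index does not state them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in mirroring the two indexed methods; return types are assumed.
interface ExtractorSketch {
    String[] getContentTypes();
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

// Hypothetical component that returns fixed text for a single content type.
class FixedTextSketch implements ExtractorSketch {
    private final String supportedType;
    private final String text;

    FixedTextSketch(String supportedType, String text) {
        this.supportedType = supportedType;
        this.text = text;
    }

    public String[] getContentTypes() {
        return new String[] { supportedType };
    }

    public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        stream.close(); // the sketch does not read the document
        return new StringReader(text);
    }
}

// Hypothetical composite: hands the stream to the first component listing the type.
class CompositeSketch implements ExtractorSketch {
    private final List<ExtractorSketch> components = new ArrayList<>();

    void addTextExtractor(ExtractorSketch extractor) {
        components.add(extractor);
    }

    // Union of the component content types, in insertion order, without duplicates.
    public String[] getContentTypes() {
        List<String> types = new ArrayList<>();
        for (ExtractorSketch component : components) {
            for (String t : component.getContentTypes()) {
                if (!types.contains(t)) {
                    types.add(t);
                }
            }
        }
        return types.toArray(new String[0]);
    }

    public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        for (ExtractorSketch component : components) {
            for (String t : component.getContentTypes()) {
                if (t.equals(type)) {
                    return component.extractText(stream, type, encoding);
                }
            }
        }
        // Assumed fallback: no component claims the type, so return no text.
        stream.close();
        return new StringReader("");
    }

    // Demonstration helper: extracts from an empty stream and drains the reader.
    static String extractToString(ExtractorSketch extractor, String type) {
        try (Reader reader = extractor.extractText(new ByteArrayInputStream(new byte[0]), type, null)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CompositeSketch composite = new CompositeSketch();
        composite.addTextExtractor(new FixedTextSketch("text/html", "from html"));
        composite.addTextExtractor(new FixedTextSketch("application/pdf", "from pdf"));
        System.out.println(extractToString(composite, "application/pdf"));
        System.out.println(composite.getContentTypes().length);
    }
}
```

Because the composite implements the same interface as its components, callers can treat a single extractor and a whole set of extractors interchangeably.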

X

XMLTextExtractor - Class in org.apache.jackrabbit.extractor
Text extractor for XML documents.
XMLTextExtractor() - Constructor for class org.apache.jackrabbit.extractor.XMLTextExtractor
Creates a new XMLTextExtractor instance.

Copyright © 2004-2011 The Apache Software Foundation. All Rights Reserved.