| Class | Description |
|---|---|
| LuceneSegmentInputFormat | InputFormat implementation which splits a Lucene index at the segment level. |
| LuceneSegmentInputSplit | InputSplit implementation that represents a Lucene segment. |
| LuceneSegmentRecordReader | RecordReader implementation for Lucene segments. |
| LuceneStorageConfiguration | Holds all the configuration for SequenceFilesFromLuceneStorage, which generates a sequence file with id as the key and a content field as value. |
| MailArchivesClusteringAnalyzer | Custom Lucene Analyzer designed for aggressive feature reduction when clustering the ASF Mail Archives. It uses an extended set of stop words, excludes non-alphanumeric tokens, and applies Porter stemming. |
| MultipleTextFileInputFormat | Used together with the WholeFileRecordReader class to combine a large number of text files into one text input reader. |
| PrefixAdditionFilter | Default parser for parsing text into sequence files. |
| ReadOnlyFileSystemDirectory | Implements a read-only Lucene Directory on top of a general FileSystem. |
| SequenceFilesFromDirectory | Converts a directory of text documents into SequenceFiles of the specified chunkSize. |
| SequenceFilesFromDirectoryFilter | Implement this interface if you wish to extend SequenceFilesFromDirectory with your own parsing logic. |
| SequenceFilesFromDirectoryMapper | Map class for the SequenceFilesFromDirectory MapReduce job. |
| SequenceFilesFromLuceneStorage | Generates a sequence file from a Lucene index with a specified id field as the key and a content field as the value. |
| SequenceFilesFromLuceneStorageDriver | Driver class for the lucene2seq program. |
| SequenceFilesFromLuceneStorageMapper | Maps document IDs to key-value pairs with the ID field as the key and the concatenated stored field(s) as the value. |
| SequenceFilesFromLuceneStorageMRJob | Generates a sequence file from a Lucene index via MapReduce. |
| SequenceFilesFromMailArchives | Converts a directory of gzipped mail archives into SequenceFiles of the specified chunkSize. |
| SequenceFilesFromMailArchivesMapper | Map class for the SequenceFilesFromMailArchives job. |
| TextParagraphSplittingJob | |
| TextParagraphSplittingJob.SplitMap | |
| WholeFileRecordReader | RecordReader used with the MultipleTextFileInputFormat class to read full files as key/value pairs and groups of files as single input splits. |
| WikipediaToSequenceFile | Creates and runs the Wikipedia Dataset Creator. |
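
The SequenceFilesFromLuceneStorageMapper entry describes a mapping from a document to a key-value pair: the ID field becomes the key and the stored content fields are concatenated into the value. The sketch below illustrates that idea in plain Java only. It does not use the Mahout or Lucene APIs; the class name `LuceneKeyValueSketch`, the helper `toKeyValue`, the sample field names, and the single-space separator are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LuceneKeyValueSketch {

    /**
     * Hypothetical helper (not part of Mahout): builds one key-value pair from a
     * document's stored fields. The value of idField becomes the key; the values of
     * contentFields are joined in order. The single-space separator is an assumption.
     */
    static Map.Entry<String, String> toKeyValue(Map<String, String> storedFields,
                                                String idField,
                                                List<String> contentFields) {
        String key = storedFields.get(idField);
        if (key == null) {
            throw new IllegalArgumentException("Document has no stored field named " + idField);
        }
        StringBuilder value = new StringBuilder();
        for (String field : contentFields) {
            String text = storedFields.get(field);
            if (text == null || text.isEmpty()) {
                continue; // skip content fields that are missing or empty
            }
            if (value.length() > 0) {
                value.append(' ');
            }
            value.append(text);
        }
        return new AbstractMap.SimpleImmutableEntry<>(key, value.toString());
    }

    public static void main(String[] args) {
        // A document represented as a simple map of stored field names to values.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "msg-001");
        doc.put("subject", "Build failure");
        doc.put("body", "The nightly build failed on trunk.");

        Map.Entry<String, String> kv = toKeyValue(doc, "id", Arrays.asList("subject", "body"));
        System.out.println(kv.getKey() + "\t" + kv.getValue());
    }
}
```

In the real job each such pair would be written to a SequenceFile; the sketch stops at producing the pair so that it stays independent of Hadoop.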

| Enum | Description |
|---|---|
| SequenceFilesFromLuceneStorageMapper.DataStatus | |

Copyright © 2008–2013 The Apache Software Foundation. All rights reserved.