org.apache.hadoop.examples.terasort
Class TeraInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
org.apache.hadoop.examples.terasort.TeraInputFormat
public class TeraInputFormat
- extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
An input format that reads the first 10 characters of each line as the key
and the rest of the line as the value. Both key and value are represented
as Text.
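The key/value layout described above can be illustrated without Hadoop. The following is a minimal plain-Java sketch, not the record reader that TeraInputFormat actually uses: the class name KeyValueSplitSketch and its helper methods are hypothetical, it works on String rather than Text, and the handling of lines shorter than 10 characters is an assumption made only so the example is total.

```java
// Hypothetical illustration of the key/value layout; not part of the Hadoop API.
final class KeyValueSplitSketch {
    // Key width taken from the class description: the first 10 characters.
    static final int KEY_LENGTH = 10;

    // Returns the first KEY_LENGTH characters of the line.
    // Assumption: a shorter line is returned whole as the key.
    static String key(String line) {
        return line.substring(0, Math.min(KEY_LENGTH, line.length()));
    }

    // Returns everything after the key.
    // Assumption: a line of KEY_LENGTH characters or fewer has an empty value.
    static String value(String line) {
        return line.substring(Math.min(KEY_LENGTH, line.length()));
    }

    public static void main(String[] args) {
        String line = "0123456789payload bytes";
        System.out.println("key=" + key(line));
        System.out.println("value=" + value(line));
    }
}
```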
Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
INPUT_DIR, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
Method Summary
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
List<org.apache.hadoop.mapreduce.InputSplit>
getSplits(org.apache.hadoop.mapreduce.JobContext job)
protected org.apache.hadoop.mapreduce.lib.input.FileSplit
makeSplit(org.apache.hadoop.fs.Path file,
long start,
long length,
String[] hosts)
static void
writePartitionFile(org.apache.hadoop.mapreduce.JobContext job,
org.apache.hadoop.fs.Path partFile)
Use the input splits to take samples of the input and generate sample
keys.
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TeraInputFormat
public TeraInputFormat()
writePartitionFile
public static void writePartitionFile(org.apache.hadoop.mapreduce.JobContext job,
org.apache.hadoop.fs.Path partFile)
throws Throwable
- Use the input splits to take samples of the input and generate sample
keys. By default, this reads 100,000 keys from 10 locations in the input,
sorts them, and picks N-1 keys to generate N equally sized partitions.
- Parameters:
job - the job to sample
partFile - where to write the output file
- Throws:
Throwable - if something goes wrong
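The "sort the samples, then pick N-1 keys" step described for writePartitionFile can be sketched in plain Java. This is an illustration of the general technique under stated assumptions, not the code in TeraInputFormat: the class SplitPointSketch and its splitPoints method are hypothetical, keys are plain String values rather than Text, the sampling from input splits is omitted, and the evenly spaced index formula is one reasonable choice rather than the one Hadoop is confirmed to use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of choosing split points from sampled keys.
final class SplitPointSketch {

    // Sorts the sampled keys and returns (partitions - 1) evenly spaced keys.
    // Keys below the first split point go to partition 0, and so on.
    static List<String> splitPoints(List<String> samples, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be at least 1");
        }
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);

        List<String> points = new ArrayList<>();
        for (int i = 1; i < partitions; i++) {
            // Assumed spacing: the i-th split point sits i/partitions of the
            // way through the sorted samples. The index is always in range
            // because i < partitions.
            int index = (int) ((long) i * sorted.size() / partitions);
            points.add(sorted.get(index));
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("d", "a", "c", "b", "h", "g", "f", "e");
        // Four partitions need three split points.
        System.out.println(splitPoints(samples, 4));
    }
}
```

With more samples, the chosen keys track the true key distribution more closely, which is why the partitions come out close to equal in size.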
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
throws IOException
- Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Throws:
IOException
makeSplit
protected org.apache.hadoop.mapreduce.lib.input.FileSplit makeSplit(org.apache.hadoop.fs.Path file,
long start,
long length,
String[] hosts)
- Overrides:
makeSplit in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
throws IOException
- Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
- Throws:
IOException
Copyright © 2013 Apache Software Foundation. All Rights Reserved.