org.apache.hadoop.examples.terasort
Class TeraInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
          extended by org.apache.hadoop.examples.terasort.TeraInputFormat

public class TeraInputFormat
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>

An input format that reads the first 10 characters of each line as the key and the rest of the line as the value. Both key and value are represented as Text.
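
The key/value layout described above can be illustrated without any Hadoop dependency. The following is a minimal sketch, assuming plain Java strings stand in for Text; the class KeyValueSplitSketch and its split method are hypothetical names introduced for illustration and are not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: mimics the documented key/value layout.
class KeyValueSplitSketch {
    // Key width taken from the class description ("first 10 characters").
    static final int KEY_LENGTH = 10;

    // Returns (key, value) where key is the first 10 characters and value is the rest.
    static Map.Entry<String, String> split(String line) {
        if (line.length() < KEY_LENGTH) {
            throw new IllegalArgumentException(
                "line is shorter than " + KEY_LENGTH + " characters");
        }
        return new SimpleEntry<>(line.substring(0, KEY_LENGTH), line.substring(KEY_LENGTH));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = split("AAAAAAAAAArest of the record");
        System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
    }
}
```

The real record reader works on Text and input splits rather than strings, so this only shows where the key ends and the value begins.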


Field Summary
 
Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
INPUT_DIR, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
 
Constructor Summary
TeraInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
           
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
           
protected  org.apache.hadoop.mapreduce.lib.input.FileSplit makeSplit(org.apache.hadoop.fs.Path file, long start, long length, String[] hosts)
           
static void writePartitionFile(org.apache.hadoop.mapreduce.JobContext job, org.apache.hadoop.fs.Path partFile)
          Use the input splits to take samples of the input and generate sample keys.
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TeraInputFormat

public TeraInputFormat()

Method Detail

writePartitionFile

public static void writePartitionFile(org.apache.hadoop.mapreduce.JobContext job,
                                      org.apache.hadoop.fs.Path partFile)
                               throws Throwable
Use the input splits to take samples of the input and generate sample keys. By default, this method reads 100,000 keys from 10 locations in the input, sorts them, and picks N-1 keys as split points that divide the key space into N partitions of roughly equal size.

Parameters:
job - the job to sample
partFile - the path to which the partition file is written
Throws:
Throwable - if sampling the input or writing the partition file fails
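
The sort-and-pick step described for writePartitionFile can be sketched in isolation. This is a minimal illustration, assuming the sampled keys are already in memory as strings; SplitPointSketch, pickSplitPoints, and partitionFor are hypothetical names, and the evenly spaced index formula is an assumption rather than the exact arithmetic used by Hadoop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the Hadoop implementation.
class SplitPointSketch {
    // Sorts the sampled keys and picks (partitions - 1) evenly spaced split points.
    static List<String> pickSplitPoints(List<String> samples, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be at least 1");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        List<String> splitPoints = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splitPoints; // nothing sampled, so no split points
        }
        for (int i = 1; i < partitions; i++) {
            // Assumed spacing: the i-th split point sits i/partitions of the way in.
            int index = (int) ((long) i * sorted.size() / partitions);
            splitPoints.add(sorted.get(index));
        }
        return splitPoints;
    }

    // Returns the partition index for a key: keys below the first split point
    // go to partition 0, keys at or above the last one go to the final partition.
    static int partitionFor(String key, List<String> splitPoints) {
        int p = 0;
        while (p < splitPoints.size() && key.compareTo(splitPoints.get(p)) >= 0) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        List<String> samples = List.of("d", "a", "c", "b", "f", "e", "h", "g");
        List<String> points = pickSplitPoints(samples, 4);
        System.out.println("split points: " + points);
        System.out.println("partition for \"f\": " + partitionFor("f", points));
    }
}
```

Because the split points come from a sample rather than the full input, the resulting partitions are only approximately equal in size; a larger sample generally gives a more even distribution.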

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                                        org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                                                 throws IOException
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Throws:
IOException

makeSplit

protected org.apache.hadoop.mapreduce.lib.input.FileSplit makeSplit(org.apache.hadoop.fs.Path file,
                                                                    long start,
                                                                    long length,
                                                                    String[] hosts)
Overrides:
makeSplit in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job)
                                                       throws IOException
Overrides:
getSplits in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>
Throws:
IOException


Copyright © 2013 Apache Software Foundation. All Rights Reserved.