com.twitter.elephantbird.mapreduce.output
Class RCFileOutputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable>
          extended by com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
Direct Known Subclasses:
RCFileProtobufOutputFormat, RCFileThriftOutputFormat

public class RCFileOutputFormat
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable>

Hive's RCFileOutputFormat is written against the deprecated mapred OutputFormat API, while Pig requires the newer mapreduce OutputFormat. In addition to RCFileOutputFormat's functionality, this class adds RCFile metadata support. TODO: contribute this to Pig.
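As a sketch of typical driver-side use (the output path and the column count of 4 are illustrative only; the elephant-bird, Hadoop, and Hive jars are assumed to be on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat;

public class RCFileJobSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The RCFile writer needs the column count before any rows are written.
    RCFileOutputFormat.setColumnNumber(conf, 4);

    Job job = Job.getInstance(conf, "rcfile-output-example");
    job.setOutputFormatClass(RCFileOutputFormat.class);
    job.setOutputKeyClass(NullWritable.class);
    // Hypothetical output path, for illustration only.
    RCFileOutputFormat.setOutputPath(job, new Path("/tmp/rcfile-out"));
    // ... configure mapper/reducer and input as usual, then:
    // System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

setOutputPath is inherited from FileOutputFormat; the remaining job wiring follows the standard mapreduce pattern and is elided here.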


Nested Class Summary
protected static class RCFileOutputFormat.Writer
          RecordWriter wrapper around an RCFile.Writer
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter
 
Field Summary
static String COMPRESSION_CODEC_CONF
           
static String DEFAULT_EXTENSION
           
static String EXTENSION_OVERRIDE_CONF
           
 
Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
BASE_OUTPUT_NAME, PART
 
Constructor Summary
RCFileOutputFormat()
           
 
Method Summary
protected  org.apache.hadoop.hive.ql.io.RCFile.Writer createRCFileWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job, org.apache.hadoop.io.Text columnMetadata)
           
static int getColumnNumber(org.apache.hadoop.conf.Configuration conf)
          Returns the number of columns set in the conf for writers.
 org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
           
static void setColumnNumber(org.apache.hadoop.conf.Configuration conf, int columnNum)
          Sets the number of columns in the given configuration.
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMPRESSION_CODEC_CONF

public static String COMPRESSION_CODEC_CONF

DEFAULT_EXTENSION

public static String DEFAULT_EXTENSION

EXTENSION_OVERRIDE_CONF

public static String EXTENSION_OVERRIDE_CONF
Constructor Detail

RCFileOutputFormat

public RCFileOutputFormat()
Method Detail

setColumnNumber

public static void setColumnNumber(org.apache.hadoop.conf.Configuration conf,
                                   int columnNum)
Sets the number of columns in the given configuration.

Parameters:
conf - configuration instance on which to set the column number
columnNum - number of columns for RCFile's Writer

getColumnNumber

public static int getColumnNumber(org.apache.hadoop.conf.Configuration conf)
Returns the number of columns set in the conf for writers.

Parameters:
conf - configuration to read the column count from
Returns:
number of columns for RCFile's writer
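A round-trip sketch of the setColumnNumber/getColumnNumber pair (assuming hadoop-common and the elephant-bird jar are on the classpath; the count of 7 is arbitrary):

```java
import org.apache.hadoop.conf.Configuration;
import com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat;

public class ColumnNumberRoundTrip {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    RCFileOutputFormat.setColumnNumber(conf, 7);
    // getColumnNumber reads back the value stored by setColumnNumber.
    System.out.println(RCFileOutputFormat.getColumnNumber(conf));
  }
}
```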

createRCFileWriter

protected org.apache.hadoop.hive.ql.io.RCFile.Writer createRCFileWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job,
                                                                        org.apache.hadoop.io.Text columnMetadata)
                                                                 throws IOException
Throws:
IOException

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
                                                                                                                          throws IOException,
                                                                                                                                 InterruptedException
Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable>
Throws:
IOException
InterruptedException
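getRecordWriter is invoked by the framework rather than by user code; rows reach the writer through Context.write. A hedged sketch of a reducer emitting RCFile rows (BytesRefArrayWritable is Hive's columnar row type; the two-column layout here is an assumption and must match the count passed to setColumnNumber):

```java
import java.io.IOException;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;

public class RowReducer extends Reducer<Text, Text, NullWritable, Writable> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context ctx)
      throws IOException, InterruptedException {
    // Two-column row: key bytes in column 0, first value's bytes in column 1.
    BytesRefArrayWritable row = new BytesRefArrayWritable(2);
    row.set(0, new BytesRefWritable(key.copyBytes()));
    row.set(1, new BytesRefWritable(values.iterator().next().copyBytes()));
    // Routed to the RCFile.Writer created by getRecordWriter.
    ctx.write(NullWritable.get(), row);
  }
}
```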


Copyright © 2015 Twitter. All Rights Reserved.