com.twitter.elephantbird.mapreduce.output
Class RCFileThriftOutputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable>
          extended by com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
              extended by com.twitter.elephantbird.mapreduce.output.RCFileThriftOutputFormat

public class RCFileThriftOutputFormat
extends RCFileOutputFormat

OutputFormat for storing Thrift objects in RCFile.

Each top-level field is stored in a separate column. Thrift field IDs are stored in the RCFile metadata.

The user can write either a ThriftWritable wrapping the Thrift object or a BytesWritable containing serialized Thrift bytes. The latter preserves all fields even if the current Thrift definition does not match the definition used to produce the serialized bytes. Any fields not recognized by the current Thrift class are stored in the last column.
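The column layout described above can be modelled in a few lines of plain Java. This is a minimal sketch, not elephant-bird code: the class `ColumnLayoutSketch`, its methods, and the field ids 1, 2, 5 and 9 are hypothetical, and it assumes the column for unrecognized fields is an extra column placed after one column per known field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only; the real writer lives in RCFileThriftOutputFormat.
final class ColumnLayoutSketch {
    // Maps each known top-level Thrift field id to its column index.
    private final Map<Integer, Integer> columnByFieldId = new LinkedHashMap<>();

    // knownFieldIds are assumed to be unique and listed in column order.
    ColumnLayoutSketch(int... knownFieldIds) {
        for (int id : knownFieldIds) {
            columnByFieldId.put(id, columnByFieldId.size());
        }
    }

    // One column per known field, plus one trailing column for unknown fields.
    int columnCount() {
        return columnByFieldId.size() + 1;
    }

    // Known fields go to their own column; everything else goes to the last one.
    int columnFor(int fieldId) {
        Integer column = columnByFieldId.get(fieldId);
        return column != null ? column : columnCount() - 1;
    }

    public static void main(String[] args) {
        ColumnLayoutSketch layout = new ColumnLayoutSketch(1, 2, 5);
        System.out.println(layout.columnFor(5)); // known field, third column (index 2)
        System.out.println(layout.columnFor(9)); // unknown field, last column (index 3)
    }
}
```

In this model, a record serialized with a newer definition that carries field 9 loses nothing: the unrecognized field is routed to the trailing column instead of being dropped.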


Nested Class Summary
 
Nested classes/interfaces inherited from class com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
RCFileOutputFormat.Writer
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter
 
Field Summary
 
Fields inherited from class com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
COMPRESSION_CODEC_CONF, DEFAULT_EXTENSION, EXTENSION_OVERRIDE_CONF
 
Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
BASE_OUTPUT_NAME, PART
 
Constructor Summary
RCFileThriftOutputFormat()
          Internal; for MapReduce (MR) use only.
RCFileThriftOutputFormat(TypeRef<? extends org.apache.thrift.TBase<?,?>> typeRef)
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
           
protected  ColumnarMetadata makeColumnarMetadata()
           
static void setClassConf(Class<? extends org.apache.thrift.TBase<?,?>> thriftClass, org.apache.hadoop.conf.Configuration conf)
          Stores the supplied class name in the configuration.
 
Methods inherited from class com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
createRCFileWriter, getColumnNumber, setColumnNumber
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RCFileThriftOutputFormat

public RCFileThriftOutputFormat()
Internal; for MapReduce (MR) use only.


RCFileThriftOutputFormat

public RCFileThriftOutputFormat(TypeRef<? extends org.apache.thrift.TBase<?,?>> typeRef)


Method Detail

makeColumnarMetadata

protected ColumnarMetadata makeColumnarMetadata()

setClassConf

public static void setClassConf(Class<? extends org.apache.thrift.TBase<?,?>> thriftClass,
                                org.apache.hadoop.conf.Configuration conf)
Stores the name of the supplied class in the configuration. Remote tasks read this configuration to initialize the output format correctly.
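The round trip behind this method (write a class name on the client, resolve it again on a task) can be sketched with the standard library alone. This is an assumption-laden illustration, not the library's code: `ClassConfSketch`, `getClassConf` and the key `sketch.thrift.class` are hypothetical, and `java.util.Properties` stands in for the Hadoop `Configuration`.

```java
import java.util.ArrayList;
import java.util.Properties;

// Illustrative model only; Properties stands in for org.apache.hadoop.conf.Configuration.
final class ClassConfSketch {
    // Hypothetical key; the key actually used by the library is not shown on this page.
    static final String KEY = "sketch.thrift.class";

    // Client side: record the fully qualified class name as a plain string.
    static void setClassConf(Class<?> thriftClass, Properties conf) {
        conf.setProperty(KEY, thriftClass.getName());
    }

    // Task side: read the name back and load the class by reflection.
    static Class<?> getClassConf(Properties conf) {
        String name = conf.getProperty(KEY);
        if (name == null) {
            throw new IllegalStateException("no class configured under " + KEY);
        }
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("configured class not on classpath: " + name, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setClassConf(ArrayList.class, conf); // any class works for the demonstration
        System.out.println(getClassConf(conf).getName());
    }
}
```

Storing a string rather than an object is what lets the setting travel with the job configuration to remote tasks, which then need the named class on their own classpath.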


getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
        throws IOException, InterruptedException
Overrides:
getRecordWriter in class RCFileOutputFormat
Throws:
IOException
InterruptedException


Copyright © 2015 Twitter. All Rights Reserved.