com.twitter.elephantbird.mapreduce.output
Class RCFileProtobufOutputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<K,V>
      extended by org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable>
          extended by com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
              extended by com.twitter.elephantbird.mapreduce.output.RCFileProtobufOutputFormat

public class RCFileProtobufOutputFormat
extends RCFileOutputFormat

OutputFormat for storing protobufs in RCFile.

Each of the top level fields is stored in a separate column. The protobuf field numbers are stored in RCFile metadata.

A protobuf message can contain "unknown fields". These fields are preserved and stored in the last column. For example, if protobuf A is serialized with four fields (a, b, c, d) but the version of A used for deserialization defines only three (a, c, d), then 'b' is carried over as an unknown field.
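
The layout described above can be pictured with a small toy model. This is only a sketch in plain Java and not the Elephant Bird implementation: the class ColumnLayoutSketch, the method toColumns, and the use of strings for field values are all hypothetical, and the "number=value" encoding of the last column is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of the column layout; hypothetical, not library code. */
class ColumnLayoutSketch {

    /**
     * Splits one record into columns.
     *
     * @param knownFieldNumbers field numbers defined by the schema in use;
     *                          stands in for the numbers kept in RCFile metadata
     * @param record            field number to value, as found in the serialized message
     * @return one entry per known field, followed by one entry for unknown fields
     */
    static List<String> toColumns(List<Integer> knownFieldNumbers,
                                  Map<Integer, String> record) {
        List<String> columns = new ArrayList<>();
        for (int number : knownFieldNumbers) {
            // A field that is absent from the record yields an empty column value.
            columns.add(record.getOrDefault(number, ""));
        }
        // Every field the schema does not define is collected into the last column.
        StringBuilder unknown = new StringBuilder();
        for (Map.Entry<Integer, String> entry : record.entrySet()) {
            if (!knownFieldNumbers.contains(entry.getKey())) {
                if (unknown.length() > 0) {
                    unknown.append(',');
                }
                unknown.append(entry.getKey()).append('=').append(entry.getValue());
            }
        }
        columns.add(unknown.toString());
        return columns;
    }
}
```

If the fields a, b, c, d are assumed to have numbers 1 to 4, calling toColumns with known field numbers (1, 3, 4) and a record holding all four fields gives three value columns for a, c, d and a final column that still carries field 2, mirroring the example in the description.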


Nested Class Summary
 
Nested classes/interfaces inherited from class com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
RCFileOutputFormat.Writer
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter
 
Field Summary
 
Fields inherited from class com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
COMPRESSION_CODEC_CONF, DEFAULT_EXTENSION, EXTENSION_OVERRIDE_CONF
 
Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
BASE_OUTPUT_NAME, PART
 
Constructor Summary
RCFileProtobufOutputFormat()
          Internal; for MapReduce use only.
RCFileProtobufOutputFormat(TypeRef<? extends com.google.protobuf.Message> typeRef)
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
           
protected  ColumnarMetadata makeColumnarMetadata()
           
static void setClassConf(Class<? extends com.google.protobuf.Message> protoClass, org.apache.hadoop.conf.Configuration conf)
          Stores supplied class name in configuration.
 
Methods inherited from class com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
createRCFileWriter, getColumnNumber, setColumnNumber
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RCFileProtobufOutputFormat

public RCFileProtobufOutputFormat()
Internal; for MapReduce use only.


RCFileProtobufOutputFormat

public RCFileProtobufOutputFormat(TypeRef<? extends com.google.protobuf.Message> typeRef)


Method Detail

makeColumnarMetadata

protected ColumnarMetadata makeColumnarMetadata()

setClassConf

public static void setClassConf(Class<? extends com.google.protobuf.Message> protoClass,
                                org.apache.hadoop.conf.Configuration conf)
Stores supplied class name in configuration. This configuration is read on the remote tasks to initialize the output format correctly.
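
The pattern described here, writing a class name into the job configuration on the driver and resolving it again on each task, can be sketched in plain Java. This is an illustration only: a Map stands in for org.apache.hadoop.conf.Configuration, the key "sketch.protobuf.class" is hypothetical (the key the library actually uses is not stated on this page), and getClassConf is an invented helper for the reading side.

```java
import java.util.Map;

/** Toy model of passing a class through job configuration by name. */
class ClassConfSketch {

    // Hypothetical key, chosen for this sketch only.
    static final String CLASS_CONF_KEY = "sketch.protobuf.class";

    /**
     * Driver side: only the class name is stored, since configuration
     * values are plain strings. In a real driver the corresponding call is
     * RCFileProtobufOutputFormat.setClassConf(protoClass, conf).
     */
    static void setClassConf(Class<?> protoClass, Map<String, String> conf) {
        conf.put(CLASS_CONF_KEY, protoClass.getName());
    }

    /** Task side: look the name up and load the class again. */
    static Class<?> getClassConf(Map<String, String> conf) {
        String name = conf.get(CLASS_CONF_KEY);
        if (name == null) {
            throw new IllegalStateException(CLASS_CONF_KEY + " is not set");
        }
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            // The class must be on the task's classpath for this to succeed.
            throw new IllegalStateException("Cannot load " + name, e);
        }
    }
}
```

Storing the name instead of an object keeps the configuration serializable as text, at the cost that the named class has to be available on the classpath of every task.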


getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
        throws IOException,
               InterruptedException
Overrides:
getRecordWriter in class RCFileOutputFormat
Throws:
IOException
InterruptedException


Copyright © 2015 Twitter. All Rights Reserved.