com.twitter.elephantbird.mapreduce.output
Class RCFileProtobufOutputFormat
java.lang.Object
org.apache.hadoop.mapreduce.OutputFormat<K,V>
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable>
com.twitter.elephantbird.mapreduce.output.RCFileOutputFormat
com.twitter.elephantbird.mapreduce.output.RCFileProtobufOutputFormat
public class RCFileProtobufOutputFormat
extends RCFileOutputFormat
OutputFormat for storing protobufs in RCFile.
Each top-level field is stored in a separate column, and the protobuf field numbers are stored in the RCFile metadata.
A protobuf message can also contain "unknown fields". These fields are preserved and stored in the last column.
For example, if protobuf A is serialized with 4 fields (a, b, c, d) and the version of A used to deserialize it has only 3 fields (a, c, d), then 'b' is carried over as an unknown field.
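The column layout described above can be illustrated with a toy model. The sketch below is not Elephant Bird's implementation and uses no Hadoop or protobuf classes: `ColumnLayoutSketch`, `toRow`, the field numbers 1 to 4 for a, b, c, d, and the string values are all assumptions made for illustration. It only shows the idea of one column per known top-level field plus a trailing column for everything else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the column layout; not Elephant Bird's actual implementation. */
class ColumnLayoutSketch {

    /**
     * Splits a message (field number -> encoded value) into one cell per known
     * field, followed by a final cell holding every field the schema does not know.
     */
    static List<Object> toRow(List<Integer> knownFieldNumbers, Map<Integer, String> message) {
        List<Object> row = new ArrayList<>();
        for (int fieldNumber : knownFieldNumbers) {
            row.add(message.get(fieldNumber)); // null when the field is unset
        }
        Map<Integer, String> unknown = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : message.entrySet()) {
            if (!knownFieldNumbers.contains(entry.getKey())) {
                unknown.put(entry.getKey(), entry.getValue());
            }
        }
        row.add(unknown); // last column: unknown fields, kept rather than dropped
        return row;
    }

    public static void main(String[] args) {
        // Message written with four fields; numbers 1..4 for a, b, c, d are assumed.
        Map<Integer, String> message = new LinkedHashMap<>();
        message.put(1, "a-value");
        message.put(2, "b-value");
        message.put(3, "c-value");
        message.put(4, "d-value");

        // The schema in use knows only a, c and d.
        List<Integer> known = Arrays.asList(1, 3, 4);

        // prints [a-value, c-value, d-value, {2=b-value}]
        System.out.println(toRow(known, message));
    }
}
```

In this model the value of 'b' survives in the last cell, keyed by its field number, which mirrors how the unknown field is carried over instead of being lost.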
| Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat |
| org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter |
| Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat |
| BASE_OUTPUT_NAME, PART |
| Method Summary |
| org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable> | getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job) |
| protected ColumnarMetadata | makeColumnarMetadata() |
| static void | setClassConf(Class<? extends com.google.protobuf.Message> protoClass, org.apache.hadoop.conf.Configuration conf) Stores supplied class name in configuration. |
| Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat |
| checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath |
| Methods inherited from class java.lang.Object |
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
RCFileProtobufOutputFormat
public RCFileProtobufOutputFormat()
Internal, for MapReduce use only.
RCFileProtobufOutputFormat
public RCFileProtobufOutputFormat(TypeRef<? extends com.google.protobuf.Message> typeRef)
makeColumnarMetadata
protected ColumnarMetadata makeColumnarMetadata()
setClassConf
public static void setClassConf(Class<? extends com.google.protobuf.Message> protoClass,
org.apache.hadoop.conf.Configuration conf)
Stores the supplied class name in the configuration. This configuration is
read on the remote tasks to initialize the output format correctly.
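The mechanism described here, where a class name is written into the job configuration by the driver and read back on each task, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's code: `java.util.Properties` stands in for `org.apache.hadoop.conf.Configuration`, the key `example.rcfile.protobuf.class` is hypothetical, and `getClassConf` is an invented helper name for the task-side read.

```java
import java.util.ArrayList;
import java.util.Properties;

/** Stand-in for the "class name in configuration" pattern; names are hypothetical. */
class ClassConfSketch {

    // Hypothetical key; the key Elephant Bird actually uses is not given in this page.
    static final String CLASS_CONF_KEY = "example.rcfile.protobuf.class";

    /** Driver side: record the class name so that it travels with the job configuration. */
    static void setClassConf(Class<?> messageClass, Properties conf) {
        conf.setProperty(CLASS_CONF_KEY, messageClass.getName());
    }

    /** Task side: read the name back and load the class it refers to. */
    static Class<?> getClassConf(Properties conf) {
        String name = conf.getProperty(CLASS_CONF_KEY);
        if (name == null) {
            throw new IllegalStateException("message class was not configured");
        }
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("configured class is not on the classpath: " + name, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // ArrayList is only a placeholder for a generated protobuf message class.
        setClassConf(ArrayList.class, conf);
        // prints java.util.ArrayList
        System.out.println(getClassConf(conf).getName());
    }
}
```

Only the class name is stored, as a string, so the class itself must be available on the classpath wherever the configuration is read.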
getRecordWriter
public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Writable> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext job)
throws IOException, InterruptedException
Overrides: getRecordWriter in class RCFileOutputFormat
Throws: IOException, InterruptedException
Copyright © 2015 Twitter. All Rights Reserved.