com.twitter.elephantbird.util
Class RCFileUtil

java.lang.Object
  extended by com.twitter.elephantbird.util.RCFileUtil

public class RCFileUtil
extends Object


Field Summary
static String COLUMN_METADATA_PROTOBUF_KEY
           
static String REQUIRED_FIELD_INDICES_CONF
          Comma-separated list of indices of the fields.
 
Constructor Summary
RCFileUtil()
           
 
Method Summary
static ArrayList<Integer> findColumnsToRead(org.apache.hadoop.conf.Configuration conf, List<Integer> currFieldIds, ColumnarMetadata storedInfo)
          Returns the list of columns that need to be read from the RCFile.
static ColumnarMetadata readMetadata(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path rcfile)
          Reads the ColumnarMetadata stored in an RCFile.
static void setRequiredFieldConf(org.apache.hadoop.conf.Configuration conf, org.apache.pig.LoadPushDown.RequiredFieldList requiredFieldList)
          Sets REQUIRED_FIELD_INDICES_CONF to the list of indices if requiredFieldList is not null.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

REQUIRED_FIELD_INDICES_CONF

public static String REQUIRED_FIELD_INDICES_CONF
Comma-separated list of indices (positions) of the fields. This is not a list of field numbers as declared in a Protobuf or Thrift class. If this configuration is not set or is empty, all the fields are read ("unknown fields" in Protobufs are not carried over).
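To make the index-versus-field-number distinction concrete, the following is a minimal sketch of how a value of this configuration could be interpreted. The class RequiredFieldIndices and its methods are hypothetical illustrations, not part of RCFileUtil, and the sketch assumes that a null or blank value means "read all fields", as described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper, not part of RCFileUtil: one way to interpret a
 * REQUIRED_FIELD_INDICES_CONF-style value such as "0,2".
 */
final class RequiredFieldIndices {

  private RequiredFieldIndices() {}

  /**
   * Parses a comma-separated list of field indices. Returns null for a
   * null or blank value, which this sketch treats as "read all fields".
   */
  static List<Integer> parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      return null;
    }
    List<Integer> indices = new ArrayList<>();
    for (String token : value.split(",")) {
      indices.add(Integer.parseInt(token.trim()));
    }
    return Collections.unmodifiableList(indices);
  }

  /**
   * Maps indices (positions) to the field numbers declared in a schema.
   * For fields numbered 1, 4 and 7, index 2 refers to field number 7.
   */
  static List<Integer> toFieldNumbers(List<Integer> indices, List<Integer> fieldNumbers) {
    List<Integer> result = new ArrayList<>();
    for (int index : indices) {
      result.add(fieldNumbers.get(index));
    }
    return result;
  }
}
```

For a message whose fields are numbered 1, 4 and 7, the value "0,2" selects the first and third fields, that is, field numbers 1 and 7, not field numbers 0 and 2.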


COLUMN_METADATA_PROTOBUF_KEY

public static String COLUMN_METADATA_PROTOBUF_KEY
Constructor Detail

RCFileUtil

public RCFileUtil()
Method Detail

readMetadata

public static ColumnarMetadata readMetadata(org.apache.hadoop.conf.Configuration conf,
                                            org.apache.hadoop.fs.Path rcfile)
                                     throws IOException
Reads the ColumnarMetadata stored in an RCFile.

Throws:
IOException - if the metadata is not stored in the file, or in case of any other error.

findColumnsToRead

public static ArrayList<Integer> findColumnsToRead(org.apache.hadoop.conf.Configuration conf,
                                                   List<Integer> currFieldIds,
                                                   ColumnarMetadata storedInfo)
                                            throws IOException
Returns the list of columns that need to be read from the RCFile. These columns are the intersection of the currently required columns and the columns stored in the file. If any required column does not exist in the file, the "unknown fields" column, which is usually the last one, must also be read.

Throws:
IOException
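The selection rule above can be sketched in plain Java. This is a simplified, hypothetical model rather than the method's implementation: plain lists stand in for the Configuration and ColumnarMetadata arguments, and the explicit unknownFieldsColumn parameter (with -1 meaning "no such column") is an assumption of the sketch, as is throwing IllegalStateException when that column is missing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical model of the column selection described for
 * findColumnsToRead. Plain lists replace Configuration and ColumnarMetadata.
 */
final class ColumnSelection {

  private ColumnSelection() {}

  /**
   * @param requiredIndices     indices into currFieldIds, or null for "all fields"
   * @param currFieldIds        field ids of the current schema
   * @param storedFieldIds      field id held by each known column, in column order
   * @param unknownFieldsColumn position of the "unknown fields" column, or -1 (assumption)
   * @return column positions to read
   */
  static List<Integer> columnsToRead(List<Integer> requiredIndices,
                                     List<Integer> currFieldIds,
                                     List<Integer> storedFieldIds,
                                     int unknownFieldsColumn) {
    // Resolve the required indices to field ids.
    List<Integer> requiredIds = new ArrayList<>();
    if (requiredIndices == null) {
      requiredIds.addAll(currFieldIds);
    } else {
      for (int index : requiredIndices) {
        requiredIds.add(currFieldIds.get(index));
      }
    }

    // Intersection: every known column whose field is required.
    List<Integer> columns = new ArrayList<>();
    for (int col = 0; col < storedFieldIds.size(); col++) {
      if (requiredIds.contains(storedFieldIds.get(col))) {
        columns.add(col);
      }
    }

    // A required field without a column of its own can only be found
    // in the "unknown fields" column, which is added once.
    for (int id : requiredIds) {
      if (!storedFieldIds.contains(id)) {
        if (unknownFieldsColumn < 0) {
          throw new IllegalStateException("field " + id + " is not stored in the file");
        }
        columns.add(unknownFieldsColumn);
        break;
      }
    }
    return columns;
  }
}
```

For example, with current field ids 1, 2, 3, 4, known columns holding ids 1, 2, 3 and the unknown-fields column at position 3, requiring indices 0 and 3 (field ids 1 and 4) yields columns 0 and 3: field 1 has its own column, while field 4 can only be recovered from the unknown-fields column.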

setRequiredFieldConf

public static void setRequiredFieldConf(org.apache.hadoop.conf.Configuration conf,
                                        org.apache.pig.LoadPushDown.RequiredFieldList requiredFieldList)
Sets REQUIRED_FIELD_INDICES_CONF to the list of indices if requiredFieldList is not null.
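A minimal sketch of the "set only when not null" behaviour follows. It is hypothetical: a Map stands in for Hadoop's Configuration, the key is passed in by the caller (real code would use REQUIRED_FIELD_INDICES_CONF with Configuration.set), and the indices are assumed to have already been taken from the RequiredFieldList (Pig's RequiredField exposes getIndex()).

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch; a Map stands in for org.apache.hadoop.conf.Configuration. */
final class RequiredFieldConfSketch {

  private RequiredFieldConfSketch() {}

  /** Joins field indices into a comma-separated value, e.g. [0, 2, 5] becomes "0,2,5". */
  static String join(List<Integer> indices) {
    StringJoiner joiner = new StringJoiner(",");
    for (int index : indices) {
      joiner.add(Integer.toString(index));
    }
    return joiner.toString();
  }

  /** Stores the joined indices under key only when indices is non-null; otherwise conf is untouched. */
  static void setIfPresent(Map<String, String> conf, String key, List<Integer> indices) {
    if (indices != null) {
      conf.put(key, join(indices));
    }
  }
}
```

Leaving the key unset for a null list matches the field description: when the configuration is absent, all fields are read.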



Copyright © 2015 Twitter. All Rights Reserved.