com.twitter.elephantbird.util
Class RCFileUtil
java.lang.Object
com.twitter.elephantbird.util.RCFileUtil
public class RCFileUtil
- extends Object
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
REQUIRED_FIELD_INDICES_CONF
public static String REQUIRED_FIELD_INDICES_CONF
- Comma-separated list of the indices of the fields to read. This is not a
list of field numbers (as in a Protobuf or a Thrift class).
If this configuration is not set or is empty, all the fields
are read ("unknown fields" in Protobufs are not carried over).
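The semantics above (a comma-separated list of indices, where an unset or empty value means "all fields") can be illustrated with a minimal sketch. It is not the library's implementation: a plain Map stands in for a Hadoop Configuration, and the class name, the key string, and the helper requiredIndices are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RequiredIndicesSketch {
  // Hypothetical key, used only in this sketch. The actual string value of
  // REQUIRED_FIELD_INDICES_CONF is not documented on this page.
  static final String KEY = "example.required.field.indices";

  /**
   * Interprets the configured value as a list of field indices.
   * An unset or empty value selects every index from 0 to numFields - 1.
   */
  static List<Integer> requiredIndices(Map<String, String> conf, int numFields) {
    String value = conf.get(KEY);
    List<Integer> indices = new ArrayList<>();
    if (value == null || value.trim().isEmpty()) {
      for (int i = 0; i < numFields; i++) {
        indices.add(i);
      }
      return indices;
    }
    for (String token : value.split(",")) {
      int idx = Integer.parseInt(token.trim());
      // Assumption of this sketch: out-of-range indices are rejected.
      if (idx < 0 || idx >= numFields) {
        throw new IllegalArgumentException("index out of range: " + idx);
      }
      indices.add(idx);
    }
    return indices;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(requiredIndices(conf, 3)); // unset: all three fields
    conf.put(KEY, "0,2");
    System.out.println(requiredIndices(conf, 3)); // only the first and third fields
  }
}
```

Note that the values are positions in the field list, so "0,2" selects the first and third fields regardless of their Protobuf or Thrift field numbers.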
COLUMN_METADATA_PROTOBUF_KEY
public static String COLUMN_METADATA_PROTOBUF_KEY
RCFileUtil
public RCFileUtil()
readMetadata
public static ColumnarMetadata readMetadata(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path rcfile)
throws IOException
- Reads the ColumnarMetadata stored in an RCFile.
- Throws:
IOException - if metadata is not stored or in case of any other error.
findColumnsToRead
public static ArrayList<Integer> findColumnsToRead(org.apache.hadoop.conf.Configuration conf,
List<Integer> currFieldIds,
ColumnarMetadata storedInfo)
throws IOException
- Returns the list of columns that need to be read from the RCFile.
These columns are the intersection of the currently required columns and
the columns stored in the file.
If any required column does not exist in the file, we also need to read
the "unknown fields" column, which is usually the last one.
- Throws:
IOException
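The selection rule described for findColumnsToRead can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the method's actual code: field IDs are passed as simple integer lists rather than a Configuration and ColumnarMetadata, the "unknown fields" column is assumed to be marked with the ID -1 and to be the last column, and the names ColumnsToReadSketch and columnsToRead are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnsToReadSketch {
  // Assumed marker for the "unknown fields" column (hypothetical for this sketch).
  static final int UNKNOWN_FIELDS_ID = -1;

  /**
   * Returns the positions of the stored columns to read.
   * storedFieldIds.get(i) is the field ID held in column i of the file.
   */
  static List<Integer> columnsToRead(List<Integer> requiredFieldIds,
                                     List<Integer> storedFieldIds) {
    List<Integer> columns = new ArrayList<>();

    // Intersection: keep every stored column whose field ID is required.
    for (int col = 0; col < storedFieldIds.size(); col++) {
      int id = storedFieldIds.get(col);
      if (id != UNKNOWN_FIELDS_ID && requiredFieldIds.contains(id)) {
        columns.add(col);
      }
    }

    // A required ID that has no column of its own can only live in the
    // "unknown fields" column, so that column must be read as well.
    boolean anyMissing = false;
    for (int id : requiredFieldIds) {
      if (!storedFieldIds.contains(id)) {
        anyMissing = true;
        break;
      }
    }
    if (anyMissing) {
      int last = storedFieldIds.size() - 1;
      if (last >= 0 && storedFieldIds.get(last) == UNKNOWN_FIELDS_ID) {
        columns.add(last);
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    List<Integer> stored = Arrays.asList(1, 2, 3, UNKNOWN_FIELDS_ID);
    // Field 4 is not stored in its own column, so the last column is added.
    System.out.println(columnsToRead(Arrays.asList(2, 4), stored)); // prints [1, 3]
    // Every required field is stored, so the last column is skipped.
    System.out.println(columnsToRead(Arrays.asList(1, 3), stored)); // prints [0, 2]
  }
}
```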
setRequiredFieldConf
public static void setRequiredFieldConf(org.apache.hadoop.conf.Configuration conf,
org.apache.pig.LoadPushDown.RequiredFieldList requiredFieldList)
- Sets REQUIRED_FIELD_INDICES_CONF to the list of indices
if requiredFieldList is not null.
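A rough sketch of the behaviour described for setRequiredFieldConf is shown below. It assumes the indices have already been extracted into a List of integers (the real method takes a Pig RequiredFieldList), uses a plain Map in place of a Hadoop Configuration, and uses a hypothetical key string and hypothetical names SetRequiredFieldsSketch and setRequiredIndices.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SetRequiredFieldsSketch {
  // Hypothetical key, used only in this sketch.
  static final String KEY = "example.required.field.indices";

  /** Stores the indices as a comma-separated string; a null list leaves conf untouched. */
  static void setRequiredIndices(Map<String, String> conf, List<Integer> indices) {
    if (indices == null) {
      return; // nothing was pushed down, so readers fall back to all fields
    }
    String joined = indices.stream()
        .map(String::valueOf)
        .collect(Collectors.joining(","));
    conf.put(KEY, joined);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    setRequiredIndices(conf, null);
    System.out.println(conf.containsKey(KEY)); // prints false
    setRequiredIndices(conf, Arrays.asList(0, 2, 5));
    System.out.println(conf.get(KEY)); // prints 0,2,5
  }
}
```

Leaving the key unset in the null case matches the documented default for REQUIRED_FIELD_INDICES_CONF, where an unset value means all fields are read.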
Copyright © 2015 Twitter. All Rights Reserved.