public class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> extends ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT>
Extends ParquetVectorizedInputFormat to provide a RowData iterator, using ColumnarRowData to provide a row view of the column batch.

Nested classes inherited from class ParquetVectorizedInputFormat: ParquetVectorizedInputFormat.ParquetReaderBatch<T>

| Constructor and Description |
|---|
| ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType projectedType, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive) Constructor to create parquet format without extra fields. |
| ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType projectedType, org.apache.flink.table.types.logical.RowType producedType, ColumnBatchFactory<SplitT> batchFactory, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive) Constructor to create parquet format with extra fields created by ColumnBatchFactory. |
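A hedged sketch of the second constructor: the batch factory receives the split and the vectors read from parquet, and may append extra vectors so that producedType carries more fields than projectedType. The `file_id` extra field, the HeapIntVector usage, and the `(split, vectors) -> batch` shape of ColumnBatchFactory are illustrative assumptions, not guarantees of this class.

```java
// Sketch only: assumes flink-parquet (pre-1.15 package layout) and
// hadoop-common on the classpath. Field names are illustrative.
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.table.data.vector.ColumnVector;
import org.apache.flink.table.data.vector.VectorizedColumnBatch;
import org.apache.flink.table.data.vector.heap.HeapIntVector;
import org.apache.flink.table.types.logical.BigIntType;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.hadoop.conf.Configuration;

public class ExtraFieldFormatSketch {
    public static ParquetColumnarRowInputFormat<FileSourceSplit> build(int batchSize) {
        // Fields physically read from the parquet files (excludes extra fields).
        RowType projectedType = RowType.of(
                new LogicalType[] {new BigIntType()}, new String[] {"id"});
        // Fields the format emits: the parquet fields plus one extra "file_id" field.
        RowType producedType = RowType.of(
                new LogicalType[] {new BigIntType(), new IntType()},
                new String[] {"id", "file_id"});

        // Assumed lambda shape: (split, parquetVectors) -> VectorizedColumnBatch.
        ColumnBatchFactory<FileSourceSplit> batchFactory = (split, parquetVectors) -> {
            ColumnVector[] vectors = new ColumnVector[parquetVectors.length + 1];
            System.arraycopy(parquetVectors, 0, vectors, 0, parquetVectors.length);
            // Cram in the extra field, derived here from the split's path.
            HeapIntVector fileId = new HeapIntVector(batchSize);
            fileId.fill(split.path().hashCode());
            vectors[parquetVectors.length] = fileId;
            return new VectorizedColumnBatch(vectors);
        };

        return new ParquetColumnarRowInputFormat<>(
                new Configuration(), projectedType, producedType,
                batchFactory, batchSize, true, true);
    }
}
```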
| Modifier and Type | Method and Description |
|---|---|
| static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> ParquetColumnarRowInputFormat<SplitT> | createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType producedRowType, List<String> partitionKeys, org.apache.flink.table.filesystem.PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive) Create a partitioned ParquetColumnarRowInputFormat; the partition columns can be generated by Path. |
| protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData> | createReaderBatch(org.apache.flink.table.data.vector.writable.WritableColumnVector[] writableVectors, org.apache.flink.table.data.vector.VectorizedColumnBatch columnarBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler) |
| org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> | getProducedType() |
| protected int | numBatchesToCirculate(org.apache.flink.configuration.Configuration config) |
Methods inherited from class ParquetVectorizedInputFormat: createReader, isSplittable, restoreReader

public ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig,
                                     org.apache.flink.table.types.logical.RowType projectedType,
                                     int batchSize,
                                     boolean isUtcTimestamp,
                                     boolean isCaseSensitive)

Constructor to create parquet format without extra fields.
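A minimal sketch of the simple constructor, assuming flink-parquet and hadoop-common are on the classpath; the field names and batch size are illustrative choices, not defaults of this class.

```java
// Sketch: build the format for two projected columns read from parquet files.
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;

public class SimpleFormatSketch {
    public static ParquetColumnarRowInputFormat<FileSourceSplit> build() {
        // Only these fields are read from the files; others are skipped.
        RowType projectedType = RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "name"});

        return new ParquetColumnarRowInputFormat<>(
                new Configuration(), // Hadoop config, e.g. filesystem settings
                projectedType,
                2048,   // batchSize: rows per vectorized column batch
                true,   // isUtcTimestamp: interpret timestamps as UTC
                true);  // isCaseSensitive: match parquet field names case-sensitively
    }
}
```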
public ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig,
org.apache.flink.table.types.logical.RowType projectedType,
org.apache.flink.table.types.logical.RowType producedType,
ColumnBatchFactory<SplitT> batchFactory,
int batchSize,
boolean isUtcTimestamp,
boolean isCaseSensitive)
Constructor to create parquet format with extra fields created by ColumnBatchFactory.

Parameters:
projectedType - the projected row type for the parquet format, excluding extra fields.
producedType - the produced row type for this input format, including extra fields.
batchFactory - factory for creating the column batch, which can cram in extra fields.

protected int numBatchesToCirculate(org.apache.flink.configuration.Configuration config)
Overrides:
numBatchesToCirculate in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData> createReaderBatch(org.apache.flink.table.data.vector.writable.WritableColumnVector[] writableVectors,
        org.apache.flink.table.data.vector.VectorizedColumnBatch columnarBatch,
        org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler)
Specified by:
createReaderBatch in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

Parameters:
writableVectors - vectors to be written
columnarBatch - vectors to be read
recycler - batch recycler

public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> ParquetColumnarRowInputFormat<SplitT> createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType producedRowType, List<String> partitionKeys, org.apache.flink.table.filesystem.PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Create a partitioned ParquetColumnarRowInputFormat; the partition columns can be generated by the Path.

Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.
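A hedged sketch of createPartitionedFormat for Hive-style partitioned directories (e.g. `.../dt=2022-01-01/file.parquet`): the partition field is recovered from the split path rather than read from the files. The `dt` key, the default-partition name, and the `PartitionFieldExtractor.forFileSystem` helper are illustrative assumptions.

```java
// Sketch only: assumes flink-parquet and the flink-table filesystem
// connector (pre-1.15 package layout) on the classpath.
import java.util.Collections;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.table.filesystem.PartitionFieldExtractor;
import org.apache.flink.table.types.logical.BigIntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;

public class PartitionedFormatSketch {
    public static ParquetColumnarRowInputFormat<FileSourceSplit> build() {
        // Produced type includes the partition field "dt", which is not stored
        // in the parquet files but generated from the split's Path.
        RowType producedRowType = RowType.of(
                new LogicalType[] {new BigIntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "dt"});

        return ParquetColumnarRowInputFormat.createPartitionedFormat(
                new Configuration(),
                producedRowType,
                Collections.singletonList("dt"),                      // partition keys
                PartitionFieldExtractor.forFileSystem("__DEFAULT__"), // assumed helper
                2048,   // batchSize
                true,   // isUtcTimestamp
                true);  // isCaseSensitive
    }
}
```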