Type Parameters:
E - The type of record to read.

public abstract class ParquetInputFormat<E>
extends org.apache.flink.api.common.io.FileInputFormat<E>
implements org.apache.flink.api.common.io.CheckpointableInputFormat<org.apache.flink.core.fs.FileInputSplit,org.apache.flink.api.java.tuple.Tuple2<Long,Long>>
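Base class for input formats that read Parquet files. To produce records of a specific type E, subclasses must implement the convert(Row) method.
Files are read with a ParquetRecordReader instead of an FSDataInputStream. For that reason, this class overrides open(FileInputSplit) and close() to change their default behavior.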
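The contract above is a template method. The base format decodes each Parquet record into a Row, and the subclass turns that Row into an E in convert(Row). The plain-Java sketch below models that division of labour so it runs without Flink or Parquet on the classpath. `SimpleRow`, `RowBackedFormat`, `Person`, and `PersonFormat` are hypothetical stand-ins, not Flink classes. The real base class reads rows through a ParquetRecordReader rather than from an in-memory list.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for org.apache.flink.types.Row: positional fields only.
final class SimpleRow {
    private final Object[] fields;

    SimpleRow(Object... fields) { this.fields = fields; }

    Object getField(int pos) { return fields[pos]; }
}

// Hypothetical base class with the same division of labour as ParquetInputFormat<E>:
// the base class supplies rows, subclasses only implement convert().
abstract class RowBackedFormat<E> {
    private final Iterator<SimpleRow> rows;

    RowBackedFormat(List<SimpleRow> rows) { this.rows = rows.iterator(); }

    boolean reachedEnd() { return !rows.hasNext(); }

    // Reads the next row and delegates the type conversion to the subclass.
    // The reuse argument is ignored in this sketch.
    E nextRecord(E reuse) { return convert(rows.next()); }

    protected abstract E convert(SimpleRow row);
}

// Example target type for the records.
final class Person {
    final String name;
    final int age;

    Person(String name, int age) { this.name = name; this.age = age; }
}

// A concrete format only has to describe how one row maps to one Person.
class PersonFormat extends RowBackedFormat<Person> {
    PersonFormat(List<SimpleRow> rows) { super(rows); }

    @Override
    protected Person convert(SimpleRow row) {
        return new Person((String) row.getField(0), (Integer) row.getField(1));
    }
}
```

With this split, the decoding loop is written once in the base class, and each new record type costs only one small convert method.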
| Modifier and Type | Field and Description |
|---|---|
| static String | PARQUET_SKIP_CORRUPTED_RECORD: The config parameter that defines whether to skip corrupted records. |
| static String | PARQUET_SKIP_WRONG_SCHEMA_SPLITS: The config parameter that defines whether to skip file splits with the wrong schema. |
| Modifier | Constructor and Description |
|---|---|
| protected | ParquetInputFormat(org.apache.flink.core.fs.Path path, org.apache.parquet.schema.MessageType messageType): Reads Parquet files with the given Parquet file schema. |
| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| void | configure(org.apache.flink.configuration.Configuration parameters) |
| protected abstract E | convert(org.apache.flink.types.Row row): ParquetInputFormat reads Parquet records as Row by default. |
| org.apache.flink.api.java.tuple.Tuple2<Long,Long> | getCurrentState() |
| protected String[] | getFieldNames(): Gets the field names of the read result. |
| protected org.apache.flink.api.common.typeinfo.TypeInformation[] | getFieldTypes(): Gets the field types of the read result. |
| protected org.apache.parquet.filter2.predicate.FilterPredicate | getPredicate() |
| E | nextRecord(E e) |
| void | open(org.apache.flink.core.fs.FileInputSplit split) |
| boolean | reachedEnd() |
| void | reopen(org.apache.flink.core.fs.FileInputSplit split, org.apache.flink.api.java.tuple.Tuple2<Long,Long> state) |
| void | selectFields(String[] fieldNames): Configures the fields to be read and returned by the ParquetInputFormat. |
| void | setFilterPredicate(org.apache.parquet.filter2.predicate.FilterPredicate filterPredicate) |
Methods inherited from class org.apache.flink.api.common.io.FileInputFormat:
acceptFile, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, supportsMultiPaths, testForUnsplittable, toString

public static final String PARQUET_SKIP_WRONG_SCHEMA_SPLITS
public static final String PARQUET_SKIP_CORRUPTED_RECORD
protected ParquetInputFormat(org.apache.flink.core.fs.Path path,
org.apache.parquet.schema.MessageType messageType)
Parameters:
path - The path of the file to read.
messageType - The schema of the Parquet file.

public void configure(org.apache.flink.configuration.Configuration parameters)
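configure(Configuration) is the hook where a format reads its job parameters. The two String constants in the field summary are keys for boolean switches. The sketch below assumes that PARQUET_SKIP_CORRUPTED_RECORD, when set to true, makes the reader drop records that fail to decode instead of failing the task. Some details are stand-ins:

- A `java.util.Map` replaces Flink's Configuration.
- The key string `"hypothetical.skip.corrupted.record"` is a placeholder, not the constant's real value.
- Decoding is mimicked by parsing integers.
- `SkipCorruptedDemo` and `readAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SkipCorruptedDemo {
    // Hypothetical key; the real value of PARQUET_SKIP_CORRUPTED_RECORD is not given here.
    static final String SKIP_CORRUPTED_KEY = "hypothetical.skip.corrupted.record";

    private boolean skipCorrupted;

    // Mirrors the role of configure(Configuration): read the flag, defaulting to false.
    void configure(Map<String, String> parameters) {
        skipCorrupted = Boolean.parseBoolean(parameters.getOrDefault(SKIP_CORRUPTED_KEY, "false"));
    }

    // Stand-in decoder: a raw record counts as "corrupted" if it is not a valid integer.
    private static int decode(String raw) {
        return Integer.parseInt(raw);
    }

    List<Integer> readAll(List<String> rawRecords) {
        List<Integer> out = new ArrayList<>();
        for (String raw : rawRecords) {
            try {
                out.add(decode(raw));
            } catch (NumberFormatException e) {
                if (!skipCorrupted) {
                    // Flag not set: surface the failure to the caller.
                    throw new IllegalStateException("Corrupted record: " + raw, e);
                }
                // Flag set: drop the bad record and keep reading.
            }
        }
        return out;
    }
}
```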
public void selectFields(String[] fieldNames)
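selectFields narrows the read schema to the named columns. getFieldNames() and getFieldTypes() then describe the projected result. The plain-Java sketch below shows name-based projection under several assumptions. The schema is modeled as a String[] of field names, the selection order is preserved, and an unknown name is rejected with IllegalArgumentException. The last point is an assumption about the real method's error handling, not something this page states. `ProjectionDemo`, `projectionIndices`, and `project` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class ProjectionDemo {
    // For each selected name, returns its position in the full schema, in selection order.
    static int[] projectionIndices(String[] schemaFields, String[] selected) {
        List<String> schema = Arrays.asList(schemaFields);
        int[] indices = new int[selected.length];
        for (int i = 0; i < selected.length; i++) {
            int pos = schema.indexOf(selected[i]);
            if (pos < 0) {
                // Assumed behavior: unknown field names are an error.
                throw new IllegalArgumentException("Field not in schema: " + selected[i]);
            }
            indices[i] = pos;
        }
        return indices;
    }

    // Applies a precomputed projection to one row of values.
    static Object[] project(Object[] row, int[] indices) {
        Object[] out = new Object[indices.length];
        for (int i = 0; i < indices.length; i++) {
            out[i] = row[indices[i]];
        }
        return out;
    }
}
```

Computing the indices once and reusing them for every row keeps the per-record cost to a few array reads.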
Parameters:
fieldNames - Names of all selected fields.

public void setFilterPredicate(org.apache.parquet.filter2.predicate.FilterPredicate filterPredicate)
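setFilterPredicate accepts a Parquet FilterPredicate. In real code such a predicate is typically built with org.apache.parquet.filter2.predicate.FilterApi, for example `FilterApi.gt(FilterApi.intColumn("age"), 30)`. Parquet can use a predicate to skip whole row groups whose column statistics prove that no row can match. How this class forwards the predicate to the reader is not described on this page. The sketch below models only statistics-based row-group skipping for a single `column > threshold` predicate. `RowGroupPruningDemo`, `RowGroupStats`, `mightMatch`, and `groupsToRead` are hypothetical names, not Parquet API.

```java
import java.util.ArrayList;
import java.util.List;

class RowGroupPruningDemo {
    // Min/max statistics of one column within one row group.
    static final class RowGroupStats {
        final int min; // would be consulted for "<" predicates; unused by ">" below
        final int max;

        RowGroupStats(int min, int max) { this.min = min; this.max = max; }
    }

    // For "column > threshold", a row group can contain matches only if its max exceeds the threshold.
    static boolean mightMatch(RowGroupStats stats, int threshold) {
        return stats.max > threshold;
    }

    // Returns the indices of row groups that still have to be read and filtered row by row.
    static List<Integer> groupsToRead(List<RowGroupStats> groups, int threshold) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (mightMatch(groups.get(i), threshold)) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

Statistics can only rule row groups out; groups that survive still need per-record filtering, because a max above the threshold does not mean every row matches.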
public void open(org.apache.flink.core.fs.FileInputSplit split)
throws IOException
Specified by: open in interface org.apache.flink.api.common.io.InputFormat<E,org.apache.flink.core.fs.FileInputSplit>
Overrides: open in class org.apache.flink.api.common.io.FileInputFormat<E>
Throws: IOException

public void reopen(org.apache.flink.core.fs.FileInputSplit split,
org.apache.flink.api.java.tuple.Tuple2<Long,Long> state)
throws IOException
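getCurrentState() and reopen(split, state) implement CheckpointableInputFormat. The state is a Tuple2<Long,Long> captured at a checkpoint and handed back after a failure, so reading resumes in the middle of a split instead of from its start. This page does not say what the two longs mean. The sketch below assumes they are (row-group index, records already read in that row group). It uses a long[] pair instead of Flink's Tuple2 and lists of strings instead of Parquet row groups, and `ResumableReader` is a hypothetical class.

```java
import java.util.List;

class ResumableReader {
    private final List<List<String>> rowGroups; // each inner list models one row group
    private int group;                          // current row-group index
    private int offsetInGroup;                  // records already read in the current group

    ResumableReader(List<List<String>> rowGroups) { this.rowGroups = rowGroups; }

    // Plays the role of open(split): start at the beginning.
    void open() {
        group = 0;
        offsetInGroup = 0;
        skipExhaustedGroups();
    }

    // Plays the role of reopen(split, state): jump straight to the checkpointed position.
    void reopen(long[] state) {
        group = (int) state[0];
        offsetInGroup = (int) state[1];
        skipExhaustedGroups();
    }

    // Plays the role of getCurrentState(): the position of the next record to read.
    long[] getCurrentState() {
        return new long[] {group, offsetInGroup};
    }

    boolean reachedEnd() {
        return group >= rowGroups.size();
    }

    String nextRecord() {
        String record = rowGroups.get(group).get(offsetInGroup++);
        skipExhaustedGroups();
        return record;
    }

    // Advances past row groups that have no records left.
    private void skipExhaustedGroups() {
        while (group < rowGroups.size() && offsetInGroup >= rowGroups.get(group).size()) {
            group++;
            offsetInGroup = 0;
        }
    }
}
```

Storing a position relative to a row group lets a restored reader seek directly to that group rather than replaying every earlier record in the split.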
Specified by: reopen in interface org.apache.flink.api.common.io.CheckpointableInputFormat<org.apache.flink.core.fs.FileInputSplit,org.apache.flink.api.java.tuple.Tuple2<Long,Long>>
Throws: IOException

protected String[] getFieldNames()
protected org.apache.flink.api.common.typeinfo.TypeInformation[] getFieldTypes()
@VisibleForTesting protected org.apache.parquet.filter2.predicate.FilterPredicate getPredicate()
public void close()
throws IOException
Specified by: close in interface org.apache.flink.api.common.io.InputFormat<E,org.apache.flink.core.fs.FileInputSplit>
Overrides: close in class org.apache.flink.api.common.io.FileInputFormat<E>
Throws: IOException

public boolean reachedEnd()
throws IOException
Specified by: reachedEnd in interface org.apache.flink.api.common.io.InputFormat<E,org.apache.flink.core.fs.FileInputSplit>
Throws: IOException

public E nextRecord(E e) throws IOException
Specified by: nextRecord in interface org.apache.flink.api.common.io.InputFormat<E,org.apache.flink.core.fs.FileInputSplit>
Throws: IOException

protected abstract E convert(org.apache.flink.types.Row row)
Parameters:
row - The row read from the Parquet file.

Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.