public class ParquetAvroInputFormat extends ParquetInputFormat<org.apache.avro.generic.GenericRecord> implements org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.avro.generic.GenericRecord>
A ParquetInputFormat that reads records from Parquet files and converts
them to Avro GenericRecord. To use it, add the optional flink-avro
dependency to the classpath. Usage:
final ParquetAvroInputFormat inputFormat = new ParquetAvroInputFormat(new Path(filePath), parquetSchema);
DataSource<GenericRecord> source = env.createInput(inputFormat, new GenericRecordAvroTypeInfo(inputFormat.getAvroSchema()));
Fields inherited from class ParquetInputFormat: PARQUET_SKIP_CORRUPTED_RECORD, PARQUET_SKIP_WRONG_SCHEMA_SPLITS

| Constructor and Description |
|---|
| ParquetAvroInputFormat(org.apache.flink.core.fs.Path filePath, org.apache.parquet.schema.MessageType messageType) |

| Modifier and Type | Method and Description |
|---|---|
| protected org.apache.avro.generic.GenericRecord | convert(org.apache.flink.types.Row row) This ParquetInputFormat reads Parquet records as Row by default. |
| org.apache.avro.Schema | getAvroSchema() |
| org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo | getProducedType() |
| void | selectFields(String[] fieldNames) Configures the fields to be read and returned by the ParquetInputFormat. |
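
selectFields configures which fields the format reads and returns. As a stdlib-only illustration of that projection, the sketch below picks named values out of one record, in the requested order. The class `FieldProjectionSketch` and its `project` method are hypothetical, not Flink API; rows are modeled as plain `Object[]` arrays with a parallel array of schema field names instead of `org.apache.flink.types.Row`, and rejecting unknown names is an assumption of the sketch rather than documented behavior of the real method.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of field projection; not part of Flink. */
final class FieldProjectionSketch {

    private FieldProjectionSketch() {}

    /**
     * Returns the values of the selected fields, in the selected order.
     * Throws IllegalArgumentException for a name that is not in the schema
     * (an assumption of this sketch).
     */
    static Object[] project(String[] schemaFields, Object[] row, String[] selected) {
        if (schemaFields.length != row.length) {
            throw new IllegalArgumentException("row arity does not match schema");
        }
        List<String> names = Arrays.asList(schemaFields);
        Object[] projected = new Object[selected.length];
        for (int i = 0; i < selected.length; i++) {
            // Locate the selected field's position in the full schema.
            int pos = names.indexOf(selected[i]);
            if (pos < 0) {
                throw new IllegalArgumentException("unknown field: " + selected[i]);
            }
            projected[i] = row[pos];
        }
        return projected;
    }
}
```

For example, projecting the row `{1, "alice", 30}` with schema fields `{"id", "name", "age"}` onto `{"age", "id"}` yields `{30, 1}`: both the subset and the order come from the selection, not from the file schema.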
Methods inherited from class ParquetInputFormat: close, configure, getCurrentState, getFieldNames, getFieldTypes, getPredicate, nextRecord, open, reachedEnd, reopen, setFilterPredicate

Methods inherited from class FileInputFormat: acceptFile, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, supportsMultiPaths, testForUnsplittable, toString

public ParquetAvroInputFormat(org.apache.flink.core.fs.Path filePath, org.apache.parquet.schema.MessageType messageType)

public void selectFields(String[] fieldNames)
Description copied from class: ParquetInputFormat
Configures the fields to be read and returned by the ParquetInputFormat.
Overrides: selectFields in class ParquetInputFormat<org.apache.avro.generic.GenericRecord>
Parameters: fieldNames - Names of all selected fields.

protected org.apache.avro.generic.GenericRecord convert(org.apache.flink.types.Row row)
Description copied from class: ParquetInputFormat
This ParquetInputFormat reads Parquet records as Row by default.
Specified by: convert in class ParquetInputFormat<org.apache.avro.generic.GenericRecord>
Parameters: row - row read from the Parquet file

public org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo getProducedType()
Specified by: getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.avro.generic.GenericRecord>

public org.apache.avro.Schema getAvroSchema()
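
convert(Row) is the step that turns the Row read from the Parquet file into the GenericRecord this format emits. The stdlib-only sketch below models that step: a positional row (`Object[]`) plus an ordered list of field names, standing in for the Avro schema, becomes a name-keyed record (`LinkedHashMap`), standing in for `GenericRecord`. `RowToRecordSketch` is hypothetical; the real method works with `org.apache.flink.types.Row` and `org.apache.avro.Schema`, and its exact field mapping is not documented on this page.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a Row-to-record conversion; not part of Flink or Avro. */
final class RowToRecordSketch {

    private RowToRecordSketch() {}

    /**
     * Pairs each positional value with the field name at the same index.
     * LinkedHashMap preserves the schema's field order, loosely mirroring
     * how an Avro record keeps fields by position.
     */
    static Map<String, Object> convert(List<String> fieldNames, Object[] row) {
        if (fieldNames.size() != row.length) {
            throw new IllegalArgumentException(
                "expected " + fieldNames.size() + " fields but row has " + row.length);
        }
        Map<String, Object> record = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            record.put(fieldNames.get(i), row[i]);
        }
        return record;
    }
}
```

Keeping the field names separate from the values mirrors the split visible in this API: getAvroSchema() describes the record shape, while convert only moves values into it.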
Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.