org.apache.spark.sql.execution.datasources.parquet

ParquetToSparkSchemaConverter

class ParquetToSparkSchemaConverter extends AnyRef

This class converts a Parquet MessageType to a Spark SQL StructType (via the convert method) or to a ParquetColumn (via the convertParquetColumn method). A ParquetColumn carries richer information about the Parquet type, including its repetition and definition levels, column path, and column descriptor.

Parquet format backwards-compatibility rules are respected when converting Parquet MessageType schemas.

See also

https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
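As an illustration of the backwards-compatibility rules, the following minimal sketch converts the same logical list of integers written in two layouts: the standard 3-level LIST structure and a legacy 2-level structure in which the repeated field is itself the element. The object name, the schema strings, and the field name `scores` are hypothetical; the sketch assumes Spark SQL and parquet-column are on the classpath and that the no-argument constructor call resolves to the overload with default parameters.

```scala
import org.apache.parquet.schema.MessageTypeParser
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
import org.apache.spark.sql.types.StructType

// Hypothetical sketch object; not part of Spark.
object ListCompatSketch {
  // Standard 3-level LIST layout from the Parquet format specification.
  val standardList: String =
    """message spark_schema {
      |  optional group scores (LIST) {
      |    repeated group list {
      |      optional int32 element;
      |    }
      |  }
      |}""".stripMargin

  // Legacy 2-level layout: the repeated primitive field is the element itself.
  val legacyList: String =
    """message spark_schema {
      |  optional group scores (LIST) {
      |    repeated int32 element;
      |  }
      |}""".stripMargin

  // Parse the textual schema and convert it with default converter settings.
  def toSpark(schema: String): StructType =
    new ParquetToSparkSchemaConverter()
      .convert(MessageTypeParser.parseMessageType(schema))

  def main(args: Array[String]): Unit = {
    // Both layouts should surface as an array column named `scores`.
    println(toSpark(standardList).treeString)
    println(toSpark(legacyList).treeString)
  }
}
```

Comparing the two printed trees is a quick way to see how a legacy layout is interpreted, for example whether the array elements are reported as nullable.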

Instance Constructors

  1. new ParquetToSparkSchemaConverter(conf: Configuration)
  2. new ParquetToSparkSchemaConverter(conf: SQLConf)
  3. new ParquetToSparkSchemaConverter(assumeBinaryIsString: Boolean = ..., assumeInt96IsTimestamp: Boolean = ..., caseSensitive: Boolean = ..., nanosAsLong: Boolean = ...)

    assumeBinaryIsString

    Whether unannotated BINARY fields should be assumed to be Spark SQL StringType fields.

    assumeInt96IsTimestamp

    Whether unannotated INT96 fields should be assumed to be Spark SQL TimestampType fields.

    caseSensitive

    Whether to use case-sensitive analysis when comparing the Spark Catalyst read schema with the Parquet schema.

    nanosAsLong

    Whether timestamps with nanosecond precision are converted to long values.
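The effect of the assumeBinaryIsString flag can be sketched as follows. This is a minimal example rather than a recommended setup: the object name, helper name, and the field names `payload` and `created_at` are hypothetical, and it assumes Spark SQL and parquet-column are on the classpath. assumeInt96IsTimestamp is set to true throughout so that the unannotated INT96 field is read as a timestamp.

```scala
import org.apache.parquet.schema.{MessageType, MessageTypeParser}
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
import org.apache.spark.sql.types.StructType

// Hypothetical sketch object; not part of Spark.
object ConverterFlagsSketch {
  // A BINARY field and an INT96 field, both without logical type annotations.
  val schema: MessageType = MessageTypeParser.parseMessageType(
    """message spark_schema {
      |  optional binary payload;
      |  optional int96 created_at;
      |}""".stripMargin)

  // Build a converter with explicit flags; the remaining parameters keep their defaults.
  def convertWith(binaryAsString: Boolean): StructType =
    new ParquetToSparkSchemaConverter(
      assumeBinaryIsString = binaryAsString,
      assumeInt96IsTimestamp = true
    ).convert(schema)

  def main(args: Array[String]): Unit = {
    // With the flag on, the unannotated BINARY field is treated as a string column.
    println(convertWith(binaryAsString = true).treeString)
    // With the flag off, it stays a binary column.
    println(convertWith(binaryAsString = false).treeString)
  }
}
```

Passing the flags by name keeps the call unambiguous next to the Configuration and SQLConf constructors.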

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. def convert(parquetSchema: MessageType): StructType

    Converts Parquet MessageType parquetSchema to a Spark SQL StructType.

  7. def convertField(field: ColumnIO, sparkReadType: Option[DataType] = None): ParquetColumn

    Converts a Parquet Type to a ParquetColumn, which wraps a Spark SQL DataType together with additional information such as the Parquet column's repetition and definition levels, column path, and column descriptor.

  8. def convertParquetColumn(parquetSchema: MessageType, sparkReadSchema: Option[StructType] = None): ParquetColumn

    Converts parquetSchema into a ParquetColumn, which contains the corresponding Spark SQL StructType along with other information such as the maximum repetition and definition levels of each node and the column descriptors of the leaf nodes.

    If sparkReadSchema is not empty, then when deriving the Spark SQL type from a Parquet field this method checks whether the same field also exists in sparkReadSchema. If so, it uses the Spark SQL type from the read schema. This is necessary because conversion from Parquet to Spark could cause precision loss. For instance, the Spark read schema may declare smallint or tinyint, while Parquet only supports int.

  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  14. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  15. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  16. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  17. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  18. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  19. def toString(): String
    Definition Classes
    AnyRef → Any
  20. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
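The difference between convert and convertParquetColumn with a read schema, including the smallint case described for convertParquetColumn, can be sketched as below. The object name, the field names, and the read schema are hypothetical. The sketch also assumes that ParquetColumn exposes the wrapped Spark SQL type through a field named sparkType, which this page does not document, and that Spark SQL and parquet-column are on the classpath.

```scala
import org.apache.parquet.schema.{MessageType, MessageTypeParser}
import org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter
import org.apache.spark.sql.types.{ShortType, StringType, StructField, StructType}

// Hypothetical sketch object; not part of Spark.
object ReadSchemaSketch {
  // Parquet has no 16-bit physical type, so `id` is stored as int32.
  val parquetSchema: MessageType = MessageTypeParser.parseMessageType(
    """message spark_schema {
      |  required int32 id;
      |  optional binary name (UTF8);
      |}""".stripMargin)

  // Hypothetical read schema: the reader wants `id` as a smallint.
  val readSchema: StructType = StructType(Seq(
    StructField("id", ShortType, nullable = false),
    StructField("name", StringType, nullable = true)))

  val converter = new ParquetToSparkSchemaConverter()

  def main(args: Array[String]): Unit = {
    // Without a read schema, the types are derived from the Parquet schema alone.
    println(converter.convert(parquetSchema).treeString)

    // With a read schema, matching fields take their type from the read schema.
    // `sparkType` is an assumed accessor on ParquetColumn.
    val column = converter.convertParquetColumn(parquetSchema, Some(readSchema))
    println(column.sparkType)
  }
}
```

Printing both results side by side shows which type was chosen for `id` in each case.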
