public class DataSourceUtils extends Object
| Constructor and Description |
|---|
| DataSourceUtils() |
| Modifier and Type | Method and Description |
|---|---|
| static void | checkRequiredProperties(TypedProperties props, List<String> checkPropNames) |
| static SparkRDDWriteClient | createHoodieClient(org.apache.spark.api.java.JavaSparkContext jssc, String schemaStr, String basePath, String tblName, Map<String,String> parameters) |
| static HoodieWriteConfig | createHoodieConfig(String schemaStr, String basePath, String tblName, Map<String,String> parameters) |
| static HoodieRecord | createHoodieRecord(org.apache.avro.generic.GenericRecord gr, Comparable orderingVal, HoodieKey hKey, String payloadClass) |
| static HoodieRecord | createHoodieRecord(org.apache.avro.generic.GenericRecord gr, HoodieKey hKey, String payloadClass) |
| static HoodieRecordPayload | createPayload(String payloadClass, org.apache.avro.generic.GenericRecord record) Creates a payload class via reflection, without an ordering/precombine value. |
| static HoodieRecordPayload | createPayload(String payloadClass, org.apache.avro.generic.GenericRecord record, Comparable orderingVal) Creates a payload class via reflection, passing in an ordering/precombine value. |
| static Option<BulkInsertPartitioner<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>> | createUserDefinedBulkInsertPartitionerWithRows(HoodieWriteConfig config) Creates a UserDefinedBulkInsertPartitionerRows instance via reflection, if the class name of a UserDefinedBulkInsertPartitioner is configured through the HoodieWriteConfig. |
| static HoodieWriteResult | doDeleteOperation(SparkRDDWriteClient client, org.apache.spark.api.java.JavaRDD<HoodieKey> hoodieKeys, String instantTime) |
| static HoodieWriteResult | doDeletePartitionsOperation(SparkRDDWriteClient client, List<String> partitionsToDelete, String instantTime) |
| static HoodieWriteResult | doWriteOperation(SparkRDDWriteClient client, org.apache.spark.api.java.JavaRDD<HoodieRecord> hoodieRecords, String instantTime, WriteOperationType operation) |
| static org.apache.spark.api.java.JavaRDD<HoodieRecord> | dropDuplicates(org.apache.spark.api.java.JavaSparkContext jssc, org.apache.spark.api.java.JavaRDD<HoodieRecord> incomingHoodieRecords, HoodieWriteConfig writeConfig) Drops records already present in the dataset. |
| static org.apache.spark.api.java.JavaRDD<HoodieRecord> | dropDuplicates(org.apache.spark.api.java.JavaSparkContext jssc, org.apache.spark.api.java.JavaRDD<HoodieRecord> incomingHoodieRecords, Map<String,String> parameters) |
| static Map<String,String> | getExtraMetadata(Map<String,String> properties) |
| static String | getTablePath(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path[] userProvidedPaths) |
| static void | tryOverrideParquetWriteLegacyFormatProperty(Map<String,String> properties, org.apache.spark.sql.types.StructType schema) Checks whether the default value (false) of "hoodie.parquet.writelegacyformat.enabled" should be overridden, in case the property has not been explicitly set by the writer and the data schema contains a DecimalType that would be affected by it. If both conditions are true, overrides the default value (by explicitly setting it) to make sure the produced Parquet data files can be read by AvroParquetReader. |
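The conditional override performed by tryOverrideParquetWriteLegacyFormatProperty can be sketched with plain collections. This is an illustrative sketch, not Hudi's actual implementation: the boolean `schemaHasDecimal` stands in for inspecting the Spark StructType for DecimalType fields.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the override rule: set "hoodie.parquet.writelegacyformat.enabled"
// to "true" only when the writer has NOT set it explicitly AND the schema
// contains a decimal column that legacy format affects.
public class LegacyFormatOverrideSketch {

    static final String KEY = "hoodie.parquet.writelegacyformat.enabled";

    // 'schemaHasDecimal' is a stand-in for a real StructType inspection.
    static void tryOverride(Map<String, String> properties, boolean schemaHasDecimal) {
        if (!properties.containsKey(KEY) && schemaHasDecimal) {
            properties.put(KEY, "true");
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        tryOverride(props, true);
        // Overridden, because a decimal is present and the writer did not set the key.
        System.out.println(props.get(KEY));
    }
}
```

An explicitly set value, even "false", is left untouched; only the unset default is overridden.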
public static String getTablePath(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path[] userProvidedPaths) throws IOException
Throws: IOException

public static Option<BulkInsertPartitioner<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>> createUserDefinedBulkInsertPartitionerWithRows(HoodieWriteConfig config) throws HoodieException
Throws: HoodieException
See Also: HoodieWriteConfig.getUserDefinedBulkInsertPartitionerClass()

public static HoodieRecordPayload createPayload(String payloadClass, org.apache.avro.generic.GenericRecord record, Comparable orderingVal) throws IOException
Throws: IOException

public static HoodieRecordPayload createPayload(String payloadClass, org.apache.avro.generic.GenericRecord record) throws IOException
Throws: IOException

public static void checkRequiredProperties(TypedProperties props, List<String> checkPropNames)
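The createPayload overloads above instantiate the payload class by name via reflection. The general pattern can be sketched with stdlib reflection only; the PayloadLike interface and DefaultPayload class below are hypothetical stand-ins, not Hudi types, and Hudi's real payload constructors take different arguments.

```java
// Sketch of reflection-based payload creation: resolve the class by name and
// invoke a matching constructor. All names here are illustrative.
public class ReflectPayloadSketch {

    public interface PayloadLike {
        String describe();
    }

    public static class DefaultPayload implements PayloadLike {
        private final Comparable<?> orderingVal;

        public DefaultPayload(Comparable<?> orderingVal) {
            this.orderingVal = orderingVal;
        }

        public String describe() {
            return "ordering=" + orderingVal;
        }
    }

    // Mirrors the shape of createPayload(payloadClass, record, orderingVal):
    // look up the class, pick the constructor, instantiate.
    static PayloadLike createPayload(String payloadClassName, Comparable<?> orderingVal) throws Exception {
        Class<?> clazz = Class.forName(payloadClassName);
        return (PayloadLike) clazz.getConstructor(Comparable.class).newInstance(orderingVal);
    }

    public static void main(String[] args) throws Exception {
        PayloadLike p = createPayload(DefaultPayload.class.getName(), 42);
        System.out.println(p.describe());
    }
}
```

The IOException in the real signatures covers reflection failures wrapped by Hudi; the sketch simply propagates the checked exceptions.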
public static HoodieWriteConfig createHoodieConfig(String schemaStr, String basePath, String tblName, Map<String,String> parameters)

public static SparkRDDWriteClient createHoodieClient(org.apache.spark.api.java.JavaSparkContext jssc, String schemaStr, String basePath, String tblName, Map<String,String> parameters)
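checkRequiredProperties, listed above, is a fail-fast guard over writer properties. A minimal sketch of that contract, using java.util.Properties in place of Hudi's TypedProperties and an assumed exception choice:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Sketch of the checkRequiredProperties contract: throw as soon as any
// required property name is missing from the supplied properties.
public class CheckPropsSketch {

    static void checkRequiredProperties(Properties props, List<String> checkPropNames) {
        for (String name : checkPropNames) {
            if (!props.containsKey(name)) {
                // Exception type is an assumption for this sketch.
                throw new IllegalStateException("Property " + name + " not found");
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hoodie.datasource.write.recordkey.field", "id");
        checkRequiredProperties(props, Arrays.asList("hoodie.datasource.write.recordkey.field"));
        System.out.println("all required properties present");
    }
}
```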
public static HoodieWriteResult doWriteOperation(SparkRDDWriteClient client, org.apache.spark.api.java.JavaRDD<HoodieRecord> hoodieRecords, String instantTime, WriteOperationType operation) throws HoodieException
Throws: HoodieException

public static HoodieWriteResult doDeleteOperation(SparkRDDWriteClient client, org.apache.spark.api.java.JavaRDD<HoodieKey> hoodieKeys, String instantTime)

public static HoodieWriteResult doDeletePartitionsOperation(SparkRDDWriteClient client, List<String> partitionsToDelete, String instantTime)
public static HoodieRecord createHoodieRecord(org.apache.avro.generic.GenericRecord gr, Comparable orderingVal, HoodieKey hKey, String payloadClass) throws IOException
Throws: IOException

public static HoodieRecord createHoodieRecord(org.apache.avro.generic.GenericRecord gr, HoodieKey hKey, String payloadClass) throws IOException
Throws: IOException

public static org.apache.spark.api.java.JavaRDD<HoodieRecord> dropDuplicates(org.apache.spark.api.java.JavaSparkContext jssc, org.apache.spark.api.java.JavaRDD<HoodieRecord> incomingHoodieRecords, HoodieWriteConfig writeConfig)
Parameters: jssc - JavaSparkContext; incomingHoodieRecords - HoodieRecords to deduplicate; writeConfig - HoodieWriteConfig

public static org.apache.spark.api.java.JavaRDD<HoodieRecord> dropDuplicates(org.apache.spark.api.java.JavaSparkContext jssc, org.apache.spark.api.java.JavaRDD<HoodieRecord> incomingHoodieRecords, Map<String,String> parameters)
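The dropDuplicates semantics ("drop records already present in the dataset") can be sketched with plain collections instead of a JavaRDD. This is an illustrative sketch: in Hudi the set of existing keys comes from an index lookup against the table, which this stands in for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: keep only incoming record keys that are NOT already present
// in the existing dataset's key set.
public class DropDuplicatesSketch {

    static List<String> dropDuplicates(List<String> incomingKeys, Set<String> existingKeys) {
        return incomingKeys.stream()
                .filter(k -> !existingKeys.contains(k))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("k1", "k2"));
        // Only keys not already present in the dataset survive.
        System.out.println(dropDuplicates(Arrays.asList("k1", "k3"), existing));
    }
}
```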
public static void tryOverrideParquetWriteLegacyFormatProperty(Map<String,String> properties, org.apache.spark.sql.types.StructType schema)
Checks whether the default value (false) of "hoodie.parquet.writelegacyformat.enabled" should be overridden, in case the property has not been explicitly set by the writer and the data schema contains a DecimalType that would be affected by it. If both conditions are true, overrides the default value (by explicitly setting it) to make sure the produced Parquet data files can be read by AvroParquetReader.
Parameters: properties - properties specified by the writer; schema - schema of the dataset being written

Copyright © 2022 The Apache Software Foundation. All rights reserved.