# Feathr Job Configuration
Since Feathr uses Spark as the underlying execution engine, Spark configurations can be overridden per job by passing the `execution_configurations` parameter to `FeathrClient.get_offline_features()` (see the sketch after the table below). The complete list of available Spark configurations is in [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html) (though not all of them are honored by cloud-hosted Spark platforms such as Databricks). A few Feathr-specific configurations are documented here:
| Property Name | Default | Meaning | Since Version |
|---|---|---|---|
| spark.feathr.inputFormat | None | Specifies the input format when it cannot be detected automatically. By default, Feathr infers the format from the file extension; if the file or folder name has no extension, set this configuration to tell Feathr which format to use. Currently limited to the Spark built-in short names `json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, and `text` (see "Manually Specifying Options" in the Spark documentation). Additionally, `delta` is supported for reading Delta Lake. | 0.2.1 |
| spark.feathr.outputFormat | None | Specifies the output format; `avro` is used if this value is not set. Currently limited to the Spark built-in short names `json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, and `text` (see "Manually Specifying Options" in the Spark documentation). Additionally, `delta` is supported for writing Delta Lake. | 0.2.1 |
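
As a concrete illustration, here is a minimal sketch of passing `execution_configurations` to `get_offline_features()`. The key column, feature name, storage paths, and config file location are placeholders, not values from these docs; substitute your own feature definitions.

```python
from feathr import (
    FeathrClient,
    FeatureQuery,
    ObservationSettings,
    SparkExecutionConfiguration,
    TypedKey,
    ValueType,
)

# Hypothetical join key and feature name; replace with your own definitions.
user_key = TypedKey(key_column="user_id", key_column_type=ValueType.INT32)
feature_query = FeatureQuery(feature_list=["f_total_purchases"], key=user_key)

# Observation data that the features will be joined onto.
settings = ObservationSettings(
    observation_path="wasbs://container@account.blob.core.windows.net/observations",
    event_timestamp_column="timestamp",
    timestamp_format="yyyy-MM-dd HH:mm:ss",
)

client = FeathrClient(config_path="./feathr_config.yaml")
client.get_offline_features(
    observation_settings=settings,
    feature_query=feature_query,
    output_path="wasbs://container@account.blob.core.windows.net/output",
    # Per-job overrides, including the Feathr-specific properties above.
    execution_configurations=SparkExecutionConfiguration({
        "spark.feathr.inputFormat": "parquet",
        "spark.feathr.outputFormat": "parquet",
    }),
)
```

Because the overrides are passed per call, they affect only the submitted job and do not change the cluster-wide Spark configuration.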