Feathr Job Configuration

Since Feathr uses Spark as the underlying execution engine, you can override Spark configurations by passing the `execution_configurations` parameter to `FeathrClient.get_offline_features()`. The complete list of available Spark configurations is in Spark Configuration, though not all of them are honored on cloud-hosted Spark platforms such as Databricks. A few Feathr-specific configurations are documented here:

| Property Name | Default | Meaning | Since Version |
| --- | --- | --- | --- |
| spark.feathr.inputFormat | None | Specifies the input format when it cannot be determined automatically. By default, Feathr infers the format from the file extension. If the file or folder name has no extension, set this configuration to tell Feathr which format to use when reading the data. Currently it can only be set to Spark built-in short names, including `json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, and `text`. For more details, see “Manually Specifying Options”. `delta` is also supported for reading Delta Lake tables. | 0.2.1 |
| spark.feathr.outputFormat | None | Specifies the output format. If this value is not set, Feathr writes `avro` by default. Currently it can only be set to Spark built-in short names, including `json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, and `text`. For more details, see “Manually Specifying Options”. `delta` is also supported for writing Delta Lake tables. | 0.2.1 |
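As a minimal sketch, the Feathr-specific settings above can be collected into a plain dictionary of Spark configuration keys and values before being passed as `execution_configurations`. The helper `build_feathr_io_conf` and the `SUPPORTED_FORMATS` set are hypothetical and not part of the Feathr API. The set restricts values to the short names listed in the table, which may be stricter than Feathr itself. The commented-out call assumes an existing `FeathrClient` named `client` and `settings`, `feature_query`, and `output_path` objects from your own job; depending on your Feathr version, `execution_configurations` may expect a wrapper object rather than a plain dict.

```python
from typing import Dict, Optional

# Short names listed in the table: Spark built-in sources plus `delta`.
# Hypothetical allow-list; Feathr may accept other values.
SUPPORTED_FORMATS = frozenset(
    {"json", "parquet", "jdbc", "orc", "libsvm", "csv", "text", "delta"}
)


def build_feathr_io_conf(
    input_format: Optional[str] = None,
    output_format: Optional[str] = None,
) -> Dict[str, str]:
    """Build a dict of Feathr-specific Spark settings (hypothetical helper).

    A key is only added when its format is given, so Feathr falls back to
    its defaults (extension-based detection for input, avro for output).
    """
    conf: Dict[str, str] = {}
    for key, fmt in (
        ("spark.feathr.inputFormat", input_format),
        ("spark.feathr.outputFormat", output_format),
    ):
        if fmt is None:
            continue  # leave unset so Feathr uses its default behavior
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format for {key}: {fmt!r}")
        conf[key] = fmt
    return conf


if __name__ == "__main__":
    conf = build_feathr_io_conf(input_format="parquet", output_format="delta")
    print(conf)
    # Assumed usage with an existing FeathrClient `client` and objects
    # `settings`, `feature_query`, `output_path` from your own job:
    # client.get_offline_features(
    #     observation_settings=settings,
    #     feature_query=feature_query,
    #     output_path=output_path,
    #     execution_configurations=conf,
    # )
```

Leaving a format as `None` omits its key entirely, so Feathr keeps its default behavior rather than receiving an explicit value.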