July 28, 2023

Adding extra params on DatabricksRunNowOperator

With the new Databricks Jobs API 2.1, you have different parameters depending on the kind of tasks you have in your workflow, such as: jar_params, sql_params, python_params, notebook_params…

But the Airflow operator is not always ready to handle all of them. If we check the current release of the DatabricksRunNowOperator, we can see that it only supports: notebook_params, python_params, python_named_params, jar_params, and spark_submit_params. Not the sql_params mentioned earlier. There is a way of combining both, though: a parameter called json lets you write the payload of a run-now request directly, and the operator will also merge your named params into the content of that JSON!

So if we have a job run that has both a SQL query and a JAR task, we can:

    from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

    json = {
        "sql_params": {
            "date": "2023-07-28",
            "days_back": "30"
        }
    }

    notebook_run = DatabricksRunNowOperator(
        task_id="notebook_run",
        job_id=42,
        json=json,
        jar_params=["douglas adams", "42"],
    )
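The merging behavior itself is simple. Here is a minimal sketch (not the actual operator code, and `merge_run_now_payload` is a hypothetical name) of how the named parameters end up alongside the keys you already put in `json`, assuming each non-None named param is simply added as a top-level key:

```python
def merge_run_now_payload(json_payload, **named_params):
    """Sketch of the merge: start from the user-supplied `json` payload,
    then overlay any named parameters (jar_params, notebook_params, ...)
    that were actually set. Hypothetical helper, for illustration only."""
    payload = dict(json_payload or {})
    for key, value in named_params.items():
        if value is not None:
            payload[key] = value
    return payload


payload = merge_run_now_payload(
    {"sql_params": {"date": "2023-07-28", "days_back": "30"}},
    job_id=42,
    jar_params=["douglas adams", "42"],
)
# `payload` now holds sql_params, job_id and jar_params side by side,
# which is the shape the run-now endpoint receives.
```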

This behavior is well documented in the source code of the operator. Still, I think this post can help people find it faster :)

2017-2022 Adrián Abreu powered by Hugo and Kiss Theme