Airflow Operators - A Comparison
Airflow provides a variety of operators that turn your business logic into executable tasks in a workflow. Deciding which one to use is often confusing. In this article, we discuss the pros and cons of three of them: PythonOperator, DockerOperator, and KubernetesPodOperator.
PythonOperator
When using Airflow's PythonOperator, all the business logic and its associated code reside in the Airflow DAG directory. The PythonOperator imports the code and runs it during execution.
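As a rough illustration of the description above, here is a minimal sketch of a DAG that runs a plain Python function through PythonOperator. The function `clean_names`, the DAG id, and the task id are hypothetical names made up for this example. The sketch assumes Airflow 2.4 or later (for the `schedule` argument and the `airflow.operators.python` import path); other versions may need different imports.

```python
def clean_names(names):
    """Hypothetical business logic: trim, lowercase, and drop empty names."""
    return [n.strip().lower() for n in names if n.strip()]


def build_python_dag():
    """Build a DAG with one PythonOperator task (assumes Airflow 2.4+)."""
    # Airflow is imported lazily so that clean_names can be tested
    # on a machine that does not have Airflow installed.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="python_operator_example",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,  # run only when triggered manually
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="clean_names",
            python_callable=clean_names,
            op_kwargs={"names": [" Ada ", "", "bob"]},
        )
    return dag


# In a real DAG file you would call `dag = build_python_dag()` at module
# level so that the scheduler can discover the DAG.
```

Note that `clean_names` lives in the same file as the DAG definition here, which is exactly the coupling between Airflow code and business logic discussed in this section.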
Pros
The best option when the code lives in the same repo as Airflow
Simple and easy to use
Works well for small teams
Cons
Couples Airflow code with business logic
Any change to the business logic means redeploying the Airflow code
Sharing a single Airflow instance across multiple projects becomes a nightmare
Can only run Python code. Well, duh.
DockerOperator
When using Airflow's DockerOperator, all the business logic and its associated code reside in a Docker image.
During execution, Airflow:
Pulls the specified image
Spins up a container
Executes the respective command
We have to ensure that a Docker daemon is running.
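The steps above can be sketched as a single DockerOperator task. This is a minimal sketch, not a definitive setup: the image name `example/report-job:1.0`, the command, and the DAG and task ids are hypothetical. It assumes Airflow 2.4 or later with the `apache-airflow-providers-docker` package installed, and a Docker daemon listening on the default local socket.

```python
# Task arguments kept as plain data so they can be inspected without Airflow.
DOCKER_TASK = {
    "task_id": "run_report",                     # hypothetical task id
    "image": "example/report-job:1.0",           # hypothetical image with the business logic
    "command": "python generate_report.py",      # hypothetical command run in the container
    "docker_url": "unix://var/run/docker.sock",  # assumes a local Docker daemon
}


def build_docker_dag():
    """Build a DAG with one DockerOperator task (assumes the Docker provider)."""
    # Lazy imports: the module can be loaded without Airflow installed.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.docker.operators.docker import DockerOperator

    with DAG(
        dag_id="docker_operator_example",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        # Airflow pulls the image if needed, starts a container from it,
        # and runs the command inside that container.
        DockerOperator(**DOCKER_TASK)
    return dag


# In a real DAG file you would call `dag = build_docker_dag()` at module level.
```

The DAG file only names an image and a command; the business logic itself ships inside the image and can be written in any language.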
Pros
Works well across cross-functional teams
Can run projects that are not built in Python
Works well when your infrastructure already runs on Docker, e.g., Docker Compose
Cons
Needs Docker installed on the worker machine
Depending on the resources available, the load on the worker machine can become heavy when multiple containers run at the same time
KubernetesPodOperator
When using KubernetesPodOperator, all the business logic and its associated code reside in a Docker image. During execution, Airflow spins up a worker pod, which pulls the specified Docker image and executes the respective command.
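A minimal sketch of the same idea with KubernetesPodOperator follows. The pod name, namespace, image, and command are hypothetical. The sketch assumes Airflow 2.4 or later with the `apache-airflow-providers-cncf-kubernetes` package installed and access to a Kubernetes cluster. The operator's import path has changed between provider releases, so the code tries the newer module first and falls back to the older one.

```python
# Task arguments kept as plain data so they can be inspected without Airflow.
POD_TASK = {
    "task_id": "run_report_in_pod",     # hypothetical task id
    "name": "report-job",               # hypothetical pod name
    "namespace": "default",             # assumes this namespace exists
    "image": "example/report-job:1.0",  # hypothetical image with the business logic
    "cmds": ["python"],                 # container entrypoint
    "arguments": ["generate_report.py"],  # arguments passed to the entrypoint
    "get_logs": True,                   # stream pod logs into the Airflow task log
}


def build_k8s_dag():
    """Build a DAG with one KubernetesPodOperator task."""
    # Lazy imports: the module can be loaded without Airflow installed.
    from datetime import datetime

    from airflow import DAG

    try:
        # Module path used by newer provider releases.
        from airflow.providers.cncf.kubernetes.operators.pod import (
            KubernetesPodOperator,
        )
    except ImportError:
        # Module path used by older provider releases.
        from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
            KubernetesPodOperator,
        )

    with DAG(
        dag_id="kubernetes_pod_operator_example",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        KubernetesPodOperator(**POD_TASK)
    return dag


# In a real DAG file you would call `dag = build_k8s_dag()` at module level.
```

As with the Docker variant, the DAG only references an image, so each team can ship its own image without touching the shared Airflow deployment.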
Pros
Works well across cross-functional teams
A single Airflow instance can be shared across teams without hassle
Decouples the DAG from the business logic
Cons
Complex infrastructure, since it relies on both Docker and Kubernetes