There are quite a few executors supported by Airflow. For example, the Kubernetes (k8s) operator and executor were added in Airflow 1.10, providing native Kubernetes execution support for Airflow. At Lyft, we leverage the CeleryExecutor to scale out Airflow task execution with different celery workers in production. Here is how we deploy Airflow in production at Lyft:

Configuration: Apache Airflow 1.8.2 with cherry-picks, and numerous in-house Lyft customized patches.

Scale: Three Amazon auto scaling groups (ASGs) for celery workers, each of which is associated with one celery queue (see the routing sketch below):

ASG #1: 15 worker nodes, each of the r5.4xlarge type. This fleet of workers processes low-priority, memory-intensive tasks.

ASG #2: 3 worker nodes, each of the m4.4xlarge type. This fleet of workers is dedicated to DAGs with a strict SLA.

ASG #3: 1 worker node of the m4.10xlarge type. This single node processes the compute-intensive workloads from a critical team's DAGs.

Numbers of DAGs / Tasks: 500+ DAGs, 800+ DagRuns, and 25,000+ TaskInstances running on the Airflow platform at Lyft daily.
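Each fleet's workers subscribe only to their own celery queue, and DAG authors pin a task to a queue through the operator-level "queue" argument. Below is a minimal sketch of that routing; the DAG, commands, and queue names are hypothetical, not the actual ones used at Lyft:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='queue_routing_example',  # hypothetical example DAG
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
)

# With the CeleryExecutor, every operator accepts a `queue` argument.
# A worker started with `airflow worker -q low_priority` consumes only
# that queue, which is how a dedicated fleet like ASG #1 is isolated.
memory_heavy = BashOperator(
    task_id='memory_intensive_job',
    bash_command='python run_batch_job.py',  # hypothetical command
    queue='low_priority',  # e.g. served by the ASG #1 fleet
    dag=dag,
)

sla_critical = BashOperator(
    task_id='sla_critical_job',
    bash_command='python run_sla_job.py',  # hypothetical command
    queue='sla_critical',  # e.g. served by the ASG #2 fleet
    dag=dag,
)
```

Tasks that don't set a queue explicitly land on the default_queue configured under [celery] in airflow.cfg.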
Airflow Monitoring And Alerting

There are nearly five hundred DAGs running daily on Airflow, so it is crucial to maintain the SLA and uptime for the Airflow service. At Lyft, we leverage various technologies including Datadog, Statsd, Grafana, and PagerDuty to monitor the Airflow system.

[Figure: The overall system health dashboard for Airflow]
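Airflow can publish its internal scheduler and task metrics over StatsD (via the statsd_* options in airflow.cfg), and a Datadog agent can ingest that stream to drive dashboards and PagerDuty alerts. As a minimal sketch, assuming a StatsD daemon listening on localhost:8125 (the host, port, and metric names are illustrative), a custom metric can be pushed into the same pipeline with the statsd client library:

```python
import statsd

# Assumes a StatsD daemon (e.g. the Datadog agent's StatsD listener) on
# localhost:8125; the host, port, prefix, and metric names are hypothetical.
client = statsd.StatsClient(host='localhost', port=8125, prefix='airflow.custom')

# Count an event and record a latency; Datadog monitors (and PagerDuty
# escalations) can then be defined on top of these time series.
client.incr('dag.success')
client.timing('dag.landing_time_ms', 4200)
```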
Previously, we had a production issue at Lyft which caused Airflow not to schedule any task for an hour. At that time, we didn't have a good monitoring system to tell us whether Airflow was scheduling tasks at all. Hence we built the Airflow "canary" monitoring system, which treats Airflow as a black box and verifies that it schedules and executes tasks in a reasonable amount of time.
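The canary's code isn't shown in this post, but the idea can be sketched as a trivial DAG on a tight schedule: if the scheduler and workers are healthy, each run lands shortly after its interval closes, and an external monitor alerts when runs stop landing or land late. The DAG id, interval, and task below are assumptions:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def heartbeat():
    # Intentionally trivial: the signal is that the task was scheduled,
    # queued, and executed promptly -- not what it computes.
    print('canary heartbeat')


# Hypothetical canary DAG: one no-op task every five minutes.
dag = DAG(
    dag_id='canary',
    start_date=datetime(2018, 1, 1),
    schedule_interval=timedelta(minutes=5),
    catchup=False,
)

PythonOperator(task_id='heartbeat', python_callable=heartbeat, dag=dag)
```

Alerting can then key off the canary's task landing time (the lag between the end of a schedule interval and task completion): if it crosses a threshold, or the heartbeat disappears entirely, a page fires.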