Add Airflow demo #19

Open · wants to merge 1 commit into gh-pages
7 changes: 4 additions & 3 deletions examples/airflow-operator/README.md
@@ -1,8 +1,8 @@
# FirecREST Operators for Airflow

[Apache Airflow](https://airflow.apache.org) is an open-source workflow management platform. Airflow uses directed acyclic graphs (DAGs) to manage the workflows. Tasks and dependencies are defined in Python and then Airflow takes care of the scheduling and execution. DAGs can be run either on a schedule or based on external event triggers.
[Apache Airflow](https://airflow.apache.org) is an open-source workflow management platform. It uses directed acyclic graphs (DAGs) of tasks to organize the workflows. Tasks and dependencies are defined in Python and then passed to Airflow, which takes care of scheduling and executing them. DAGs can be run either on a schedule or based on external event triggers.
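
As a minimal illustration of how tasks and dependencies are declared in Python (the DAG id, task ids, and commands below are illustrative, not this tutorial's DAG):

```python
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_dag",
    start_date=pendulum.datetime(2023, 9, 1, tz="UTC"),
    schedule=None,  # run only when triggered manually
) as dag:
    prepare = BashOperator(task_id="prepare", bash_command="echo prepare")
    compute = BashOperator(task_id="compute", bash_command="echo compute")
    prepare >> compute  # `compute` starts only after `prepare` succeeds
```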

For this tutorial we have defined an Airflow DAG combining small tasks which can run localy and compute-intensive tasks that must run on a supercomputer. Our goal is to add to Airflow the support for executing the DAG's compute-intensive tasks in Piz Daint via [FirecREST](https://firecrest.readthedocs.io). For that we are going to write [custom Airflow operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html) that will use FirecREST to access Piz Daint.
For this tutorial we have defined an Airflow DAG that combines small tasks that can run locally on a laptop and compute-intensive tasks that must run on a supercomputer. Our goal is to add support to Airflow for executing the DAG's compute-intensive tasks on Piz Daint via [FirecREST](https://firecrest.readthedocs.io). For that we are going to write [custom Airflow operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html) that will use FirecREST to interact with Piz Daint.

The idea behind this is very simple.
Operators are defined as units of work for Airflow to complete. Custom operators can be written by extending Airflow's [`BaseOperator`](https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/baseoperator/index.html#airflow.models.baseoperator.BaseOperatorMeta) class.
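
A minimal sketch of such an operator, assuming pyfirecrest's `ClientCredentialsAuth` and `Firecrest` client classes; the operator name, its arguments, and the exact `submit()` keyword are illustrative and may differ from the tutorial's code and between pyfirecrest versions:

```python
import os

import firecrest as f7t
from airflow.models.baseoperator import BaseOperator


class FirecRESTSubmitOperator(BaseOperator):
    """Submit a Slurm job script to a machine through FirecREST (sketch)."""

    def __init__(self, machine, script, **kwargs):
        super().__init__(**kwargs)
        self.machine = machine
        self.script = script

    def execute(self, context):
        # Build the client from the credentials exported below.
        auth = f7t.ClientCredentialsAuth(
            os.environ["FIRECREST_CLIENT_ID"],
            os.environ["FIRECREST_CLIENT_SECRET"],
            os.environ["AUTH_TOKEN_URL"],
        )
        client = f7t.Firecrest(
            firecrest_url=os.environ["FIRECREST_URL"], authorization=auth
        )
        # submit() returns job metadata; the keyword for passing the script
        # inline differs between pyfirecrest versions (assumption here).
        job = client.submit(self.machine, script_str=self.script)
        return job["jobid"]  # made available to downstream tasks via XCom
```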
@@ -28,7 +28,7 @@ We can export as environment variables the credentials that FirecREST will use a
```bash
export FIRECREST_CLIENT_ID=<client-id>
export FIRECREST_CLIENT_SECRET=<client-secret>
export AUTH_TOKEN_URL=https://auth.cscs.ch/auth/realms/firecrest-clients/protocol/openid-connect/token
export AUTH_TOKEN_URL=https://<token-url>
export FIRECREST_URL=https://<firecrest-url>
```
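
With those variables set, a short pyfirecrest session is a quick way to check that the credentials work (again assuming the v1-style client classes; a FirecREST v2 deployment may expose a different client class):

```python
import os

import firecrest as f7t

auth = f7t.ClientCredentialsAuth(
    os.environ["FIRECREST_CLIENT_ID"],
    os.environ["FIRECREST_CLIENT_SECRET"],
    os.environ["AUTH_TOKEN_URL"],
)
client = f7t.Firecrest(
    firecrest_url=os.environ["FIRECREST_URL"], authorization=auth
)
# Listing the systems behind this FirecREST instance is a cheap smoke test.
print(client.all_systems())
```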

@@ -65,6 +65,7 @@ standalone | Airflow is ready
standalone | Login with username: admin password: <password>
standalone | Airflow Standalone is for development purposes only. Do not use this in production!
```
The password can also be found in `$AIRFLOW_HOME/standalone_admin_password.txt`.

## Hands On

18 changes: 7 additions & 11 deletions examples/airflow-operator/airflow-dag.py
@@ -21,20 +21,16 @@
job_script = """#!/bin/bash -l

#SBATCH --job-name=airflow-example
#SBATCH --output=/home/fireuser/airflow/slurm-%j.out
#SBATCH --output=slurm-%j.out
#SBATCH --time=00:05:00
##SBATCH --nodes=1
##SBATCH --ntasks-per-core=1
##SBATCH --ntasks-per-node=1
##SBATCH --cpus-per-task=12
##SBATCH --constraint=gpu
##SBATCH --account=csstaff

module load QuantumESPRESSO
#SBATCH --nodes=1

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun pw.x -in si.scf.in
uenv repo create
uenv image pull quantumespresso/v7.3.1:v2

srun --uenv quantumespresso/v7.3.1:v2 --view default pw.x -in si.scf.in
"""

with DAG(
@@ -43,7 +43,7 @@
start_date=pendulum.datetime(2023, 9, 1, tz="UTC"),
catchup=False,
dagrun_timeout=datetime.timedelta(minutes=60),
tags=["firecrest-training-2023"],
tags=["firecrest-v2-demo"],
) as dag:

wait_for_file = FileSensor(