Latest in branch 1.7: 1.7.1.3
Released: 08 Mar 2017 (9 years ago)

Software: Apache Airflow
Version: 1.7
Environment requirements:
  • Python 2.7/3.4-3.5
  • MySQL ≥5.6
  • PostgreSQL ≥9.4
  • SQLite (development only)
  • Linux/macOS only
Initial release: 1.7.0, 28 Mar 2016 (10 years ago)
Latest release: 1.7.1.3, 08 Mar 2017 (9 years ago)
Limited maintenance: 19 Mar 2017 (ended 9 years, 2 months ago)
EOL/Terminated: 19 Mar 2017 (ended 9 years, 2 months ago)
Release notes: https://github.com/apache/airflow/releases/tag/1.7.1.3
Source code: https://github.com/apache/airflow/tree/1.7.1.3
Download: https://github.com/apache/airflow/releases/tag/1.7.1.3

What Is New in Apache Airflow 1.7

Apache Airflow 1.7 is a broad platform release that spans security hardening, Google Cloud ecosystem expansion, new operator primitives, CLI power-ups, and a round of UI polish. For teams running Airflow in production, the headline themes are encryption of sensitive data at rest, a growing suite of BigQuery and GCS operators, and a more testable, scriptable CLI -- all of which translate directly into less toil on day-to-day pipeline operations.

Summary

  • New features: Fernet-based Variable and connection-extra encryption; DockerOperator; SSHExecuteOperator; FTPSHook; gcloud-based GCSHook; BigQuery UDF support; BigQuery copy operator; BigQuery PEP 249 interface; MySQL-to-GCS and GCS-to-BQ operators; QDS (Qubole Data Service) operator; PrestoToMySqlTransfer; GitHub Enterprise authentication; StatsD abstraction layer; custom email backends; SLA miss callbacks; dags_are_paused_at_creation config flag; upstart startup scripts; SSL support for SMTP; three-legged OAuth for Google connections; domain-wide delegation for Google Cloud apps; LDAP search_scope configuration
  • Improvements: CLI trigger_dag now accepts --conf as JSON; CLI test command accepts JSON parameter dictionaries; rendered template preview from CLI; dag_state command in CLI; SQLAlchemy pool_recycle and pool_size now configurable; DagBag import timeout parameterized; Oracle SID connection support; HDFS effective user from connection config; LDAP superuser and data-profiler role support; Slack attachment templating; Qubole operator template support; active DAG run counts in UI tooltip; DAG pausing now also pauses queued tasks; more verbose dependency logging; BashOperator output encoding option; GCS download operator filename in template_fields; logout button added to web UI; graph view border refresh without full page reload; Task Duration and Landing Times base date form
  • Bug fixes: Pool not used with CeleryExecutor; infinite retries hotfix; try_number not incremented correctly; subdag not refreshing or showing up; password printed to stdout on initdb/resetdb; yesterday_ds_nodash and tomorrow_ds_nodash not exposed to templates; conflicting params in default_args; BashOperator xcom_push member collision; LDAP error messages on failed login; MySQL multi-byte character parsing; DagRuns not respecting start and end dates; adhoc tasks incorrectly closing DAG runs; GCS-to-BQ scoping fix; Presto hook 503 graceful handling; DAG IDs prefixed with numbers showing wrong status
  • Breaking changes: BashOperator xcom_push member renamed to avoid parent-class collision; flask.ext.* imports replaced with flask_* (Flask upgrade compatibility)

How does Airflow 1.7 protect sensitive data stored in connections and variables?

Airflow 1.7 ships Fernet-based encryption for Variables and the extra field of Connections, so passwords and API tokens stored there are no longer kept in plain text in your metadata database. For teams that keep credentials in Airflow's metadata database rather than in an external secrets system, this is the most operationally significant change in the release.

To enable encryption, generate a Fernet key and add it to airflow.cfg:

python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Paste the output into airflow.cfg:
[core]
fernet_key = YOUR_GENERATED_KEY_HERE

In practice, existing unencrypted Variables and connection extras keep working after the upgrade; Airflow reads them as plain text. Values written after the key is set are encrypted automatically. Take care if you ever rotate the key: every value encrypted under the old key must be re-encrypted with the new one before the old key is discarded, or it can no longer be decrypted.
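A Fernet key is 32 random bytes encoded as URL-safe base64, which is what Fernet.generate_key() returns. The standard-library sketch below produces a key in the same format and checks that the fernet_key in an airflow.cfg decodes to 32 bytes, a cheap guard against a truncated or mis-pasted key. The function names are illustrative, and the check covers format only: it cannot tell whether a key matches the one your existing values were encrypted with.

```python
import base64
import binascii
import configparser
import os


def generate_fernet_key():
    """Return a new key in Fernet's format: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key):
    """Return True if `key` decodes from URL-safe base64 to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except (binascii.Error, ValueError):
        # Bad padding or non-ASCII input (UnicodeEncodeError is a ValueError).
        return False
    return len(raw) == 32


def fernet_key_from_cfg(path):
    """Read [core] fernet_key from an airflow.cfg file; '' if it is absent."""
    # interpolation=None leaves any '%' characters in the file untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    return parser.get("core", "fernet_key", fallback="").strip()


if __name__ == "__main__":
    key = generate_fernet_key()
    print(key, is_valid_fernet_key(key))
```

For example, is_valid_fernet_key(fernet_key_from_cfg(os.path.expanduser("~/airflow/airflow.cfg"))) should return True on a host where encryption is configured; ~/airflow is the default AIRFLOW_HOME, so adjust the path if yours differs.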

Alongside this, 1.7 fixes a long-standing leak where the database password was printed to stdout during airflow initdb, resetdb, and upgradedb. The password no longer appears in command output.

What new Google Cloud operators and hooks ship with Airflow 1.7?

Airflow 1.7 dramatically expands the Google Cloud operator surface, making it a much more complete solution for GCP-centric data pipelines. The most impactful additions are a set of BigQuery and Cloud Storage operators that cover the canonical ETL path from MySQL all the way to a BigQuery table.

New and improved GCP capabilities include:

  • MySQL-to-GCS operator -- dumps MySQL query results to Cloud Storage; now handles TINYINT, INT24 (MEDIUMINT), and DATE types correctly.
  • GCS-to-BigQuery operator -- loads files from GCS into BQ; defaults to 0 rows loaded (instead of error) when the file is empty.
  • BigQuery copy operator -- copies a BQ table within or across projects without leaving Python.
  • BigQuery User Defined Functions (UDFs) -- pass JS UDF definitions directly from the operator.
  • BigQuery PEP 249 -- BigQuery hook now exposes a DB-API 2.0 interface, enabling use with standard SQL tooling.
  • gcloud-based GCSHook -- an alternative GCS hook built on the gcloud Python client library (the predecessor of google-cloud-python) rather than on google-api-python-client.
  • Google Datastore Hook -- first-class hook for Cloud Datastore with persistent auth across the hook lifetime.
  • Three-legged OAuth and domain-wide delegation -- broader auth support for Google Workspace (formerly G Suite) scenarios.
  • Project specification in BigQuery hook methods -- lets you target a non-default GCP project at the method level rather than the connection level.

This matters if your team runs multi-project GCP environments -- you can now fan out to different projects within a single DAG without maintaining a separate Airflow connection per project.

One migration note: GCP API client dependencies are no longer force-installed. If your deployment relies on implicit transitive installation of google-api-python-client, add it to your pip requirements explicitly after upgrading.
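One way to catch this before a DAG fails at import time is to probe the upgraded environment, or a fresh virtualenv built from your requirements file, for the modules your GCP hooks need. The sketch assumes the only module to check is googleapiclient, the import name of the google-api-python-client package; extend the list with anything else your DAGs import.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    # find_spec locates a top-level module without importing (executing) it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # googleapiclient is the import name of the google-api-python-client package.
    missing = missing_modules(["googleapiclient"])
    if missing:
        print("Add to requirements explicitly: " + ", ".join(missing))
    else:
        print("All listed modules are importable.")
```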

What new operators and hooks does Airflow 1.7 introduce beyond Google Cloud?

Airflow 1.7 adds several operator primitives that have been on many teams' wish lists. The DockerOperator, SSHExecuteOperator, and the Qubole Data Services hook are the most production-relevant additions outside the GCP ecosystem.

  • DockerOperator -- run arbitrary Docker containers as Airflow tasks. This enables language-agnostic task execution: each task's runtime and dependencies live in its image rather than on the Airflow worker.
  • SSHExecuteOperator -- execute a command on a remote host over SSH directly from a task. Pairs with the SSH connection type now available in the web connection form.
  • FTPSHook -- full FTP/FTPS connectivity, a gap that previously forced teams to fall back to BashOperator + curl.
  • Qubole Data Services (QDS) operator and hook -- submit Hive, Hadoop, Pig, Spark, and Shell commands to Qubole clusters; Jinja template support included.
  • PrestoToMySqlTransfer -- direct result-set transfer from Presto to MySQL, no intermediate file required.
  • PigOperator stub -- lays the groundwork for Apache Pig integration.

The Presto hook also gets a meaningful defensive fix in 1.7: 503 errors from the Presto coordinator are now handled gracefully and no longer cause an uncontrolled crash. The old eval() call in the hook is removed -- a subtle but real security improvement.

How has the Airflow CLI improved in version 1.7?

The CLI in Airflow 1.7 becomes a significantly more capable debugging and scripting tool. Three changes stand out for daily use in production environments.

First, airflow trigger_dag now accepts --conf as a JSON string, allowing you to pass a runtime configuration dictionary to a DAG run without writing Python:

airflow trigger_dag my_dag_id --conf '{"date": "2016-03-01", "env": "prod"}'
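When the trigger comes from a script rather than an interactive shell, building the --conf payload with json.dumps and passing the arguments as a list avoids shell quoting entirely. This is a minimal sketch that assumes the airflow executable is on PATH; trigger_dag_argv and trigger_dag are hypothetical helper names.

```python
import json
import subprocess


def trigger_dag_argv(dag_id, conf):
    """Build argv for: airflow trigger_dag <dag_id> --conf '<json>'."""
    # json.dumps always yields valid JSON, and passing a list means no shell
    # is involved, so there is nothing to quote or escape.
    return ["airflow", "trigger_dag", dag_id, "--conf", json.dumps(conf)]


def trigger_dag(dag_id, conf):
    """Trigger the DAG; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(trigger_dag_argv(dag_id, conf))


if __name__ == "__main__":
    print(trigger_dag_argv("my_dag_id", {"date": "2016-03-01", "env": "prod"}))
```

trigger_dag_argv("my_dag_id", {"date": "2016-03-01", "env": "prod"}) produces the same arguments as the shell command above; trigger_dag runs them.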

Second, the airflow test command now accepts a JSON-formatted dictionary of task parameters, making it straightforward to test templated tasks against specific params values from the command line without editing the DAG file.

Third, airflow render shows rendered templates for a task instance directly from the CLI -- previously you had to run the task or inspect the web UI to see how Jinja macros resolved. This saves significant debug time when chasing template rendering issues.

A new airflow dag_state command rounds out the additions, letting you query the state of a specific DAG run from a script without parsing web UI output or hitting the database directly. Most teams running on-call rotations will find this one immediately useful in runbooks.
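For a runbook script, a thin wrapper around the command is often enough. The sketch below assumes airflow is on PATH, that the command takes the DAG ID and execution date as positional arguments, and that the state is the last line printed (possibly after log output); verify all three against your installation before relying on it. The helper names are hypothetical.

```python
import subprocess

# DagRun states after which polling can stop.
TERMINAL_STATES = {"success", "failed"}


def parse_dag_state(output):
    """Return the last non-empty line of dag_state output, or None if empty."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else None


def is_terminal(state):
    """True once the run has finished, successfully or not."""
    return state in TERMINAL_STATES


def dag_state(dag_id, execution_date):
    """Run `airflow dag_state <dag_id> <execution_date>` and parse the result."""
    out = subprocess.check_output(["airflow", "dag_state", dag_id, execution_date])
    return parse_dag_state(out.decode("utf-8"))
```

A polling loop can call dag_state("my_dag_id", "2016-03-01") until is_terminal() returns True, then exit non-zero if the state is "failed".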

The yesterday_ds_nodash and tomorrow_ds_nodash macros are also now correctly exposed in the task context, so templated fields no longer need to derive them manually.
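The relationship between these macros and the execution date can be reproduced with the standard library, which is handy for asserting expected values in tests. This sketch assumes, as the names suggest, that the macros are the execution date shifted back or forward by one day and formatted as YYYYMMDD; inside a template you would simply write {{ yesterday_ds_nodash }}.

```python
from datetime import datetime, timedelta


def nodash_macros(execution_date):
    """Compute YYYYMMDD strings for the execution date and its neighbours."""
    fmt = "%Y%m%d"
    return {
        "ds_nodash": execution_date.strftime(fmt),
        "yesterday_ds_nodash": (execution_date - timedelta(days=1)).strftime(fmt),
        "tomorrow_ds_nodash": (execution_date + timedelta(days=1)).strftime(fmt),
    }


if __name__ == "__main__":
    print(nodash_macros(datetime(2016, 3, 1)))
```

For an execution date of 1 March 2016, yesterday_ds_nodash is 20160229, since 2016 is a leap year.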

What authentication and monitoring improvements ship in Airflow 1.7?

Airflow 1.7 broadens authentication options and gives operators better runtime visibility through a StatsD abstraction and SLA miss callbacks.

On the authentication side:

  • GitHub Enterprise OAuth -- teams on self-hosted GitHub can now authenticate Airflow users against their GitHub Enterprise instance without a custom plugin.
  • LDAP search_scope -- a new search_scope configuration variable gives LDAP admins control over whether user lookups are scoped to a single OU or the entire directory tree. This is critical in large enterprise directories where a broad search produces ambiguous results.
  • LDAP superuser and data-profiler roles -- LDAP group membership can now be mapped to Airflow's built-in superuser and data profiler permission levels.

On the observability side, a StatsD abstraction layer is introduced, giving Airflow a clean internal metrics interface. In practice this means you can route Airflow scheduler and task metrics to your existing StatsD-compatible pipeline (Datadog, Graphite, InfluxDB via Telegraf, etc.) without patching the source. Configure it in airflow.cfg under the [scheduler] section.
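Before pointing Airflow at a production backend, it can help to confirm that metrics are actually leaving the scheduler. The standard-library sketch below is a throwaway UDP listener that prints each StatsD line it receives. It assumes the plain-text StatsD line format (name:value|type, optionally followed by |@rate) and the conventional port 8125; it is a debugging aid only, not a metrics store.

```python
import socket


def parse_statsd_line(line):
    """Split 'name:value|type[|@rate]' into (name, value, type) strings."""
    name, _, rest = line.partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        raise ValueError("not a StatsD line: {!r}".format(line))
    return name, fields[0], fields[1]


def listen(host="127.0.0.1", port=8125, count=20, timeout=60.0):
    """Print metrics from up to `count` datagrams; stop early if none arrives
    within `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((host, port))
    try:
        for _ in range(count):
            try:
                data, _addr = sock.recvfrom(65535)
            except socket.timeout:
                break
            # A single datagram may carry several newline-separated metrics.
            for line in data.decode("utf-8", "replace").splitlines():
                if line.strip():
                    print(parse_statsd_line(line.strip()))
    finally:
        sock.close()
```

Run listen() on the scheduler host with statsd_on = True, statsd_host = localhost and statsd_port = 8125 in the [scheduler] section, then restart the scheduler; each metric it emits is printed as a (name, value, type) tuple.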

The SLA miss callback is another production-ops win: DAGs can now specify a Python callable that fires whenever a task misses its SLA window. This matters if your SLA alerting previously relied on external polling of the metadata database or scraping the web UI. The callback receives the DAG, the lists of late and blocking tasks, the SLA miss records, and the blocking task instances, which is enough context to build a meaningful alert body without additional database queries.
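A callback only needs to accept the arguments Airflow passes. The sketch below assumes the positional signature (dag, task_list, blocking_task_list, slas, blocking_tis), with task_list and blocking_task_list as newline-separated strings and each SLA miss record exposing task_id and execution_date; check these against your version. sla_miss_alert is a hypothetical name, and print stands in for a real notification channel.

```python
def sla_miss_alert(dag, task_list, blocking_task_list, slas, blocking_tis):
    """Hypothetical sla_miss_callback that builds a plain-text alert body."""
    lines = ["SLA missed in DAG {}".format(dag.dag_id), ""]
    for sla in slas:
        # One line per SLA miss record: which task, for which run.
        lines.append("  {} on {}".format(sla.task_id, sla.execution_date))
    if blocking_task_list:
        lines += ["", "Blocking tasks:", blocking_task_list]
    body = "\n".join(lines)
    # Replace print with your pager, chat webhook, or ticketing client.
    print(body)
    return body
```

Pass it as DAG('my_dag', default_args=default_args, sla_miss_callback=sla_miss_alert); misses are only recorded for tasks that set an sla, for example sla=timedelta(hours=1). Airflow ignores the return value; returning the body just makes the function easy to test.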

Custom email backends round out this area, letting teams route alert emails through non-SMTP transports. SSL support for SMTP is also added for environments that require encrypted mail relay.
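A custom backend is a plain function that Airflow imports by dotted path. This Python 3 sketch assumes the setting is email_backend under [email] in airflow.cfg and that Airflow calls the backend with to, subject, html_content and the optional files and dryrun arguments accepted by the default SMTP backend; **kwargs absorbs anything else your version passes. spool_email, its JSON-lines spool file and the ALERT_SPOOL_PATH environment variable are all hypothetical stand-ins for a real non-SMTP transport.

```python
import json
import os
import time


def _as_list(addresses):
    """Accept 'a@x.com, b@y.com', 'a@x.com;b@y.com', or a list of addresses."""
    if isinstance(addresses, str):
        addresses = addresses.replace(";", ",").split(",")
    return [a.strip() for a in addresses if a.strip()]


def spool_email(to, subject, html_content, files=None, dryrun=False, **kwargs):
    """Hypothetical email backend: append each message to a JSON-lines file."""
    record = {
        "ts": time.time(),
        "to": _as_list(to),
        "subject": subject,
        "html": html_content,
        "files": list(files or []),
    }
    if dryrun:
        # Build the record but do not deliver it.
        return record
    path = os.environ.get("ALERT_SPOOL_PATH", "/tmp/airflow_alerts.jsonl")
    with open(path, "a") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Point the setting at it with email_backend = my_alerts.spool_email, where my_alerts is a hypothetical module on the scheduler's and workers' PYTHONPATH; a separate process can then forward the spooled messages over whatever transport you use.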

Frequently Asked Questions about Apache Airflow 1.7

Do I need to re-encrypt existing Variables and Connections after enabling the Fernet key in 1.7?
Existing plain-text values will continue to work after enabling the Fernet key -- Airflow detects whether a value is encrypted and reads accordingly. Only new writes will be encrypted automatically, so you will need to manually update any existing sensitive values through the UI or CLI if you want them encrypted at rest.

Is the BashOperator xcom_push rename in 1.7 a breaking change I need to handle?
Yes, if any of your DAG code or plugins read the xcom_push attribute of a BashOperator instance directly, you will need to update those references after upgrading. The constructor argument is still named xcom_push, but the value is now stored under a different attribute (xcom_push_flag). The rename resolves a collision with the xcom_push() method inherited from BaseOperator: the boolean attribute shadowed the method, so calling xcom_push() on a BashOperator instance failed.
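The collision is ordinary Python attribute lookup: an instance attribute set in __init__ hides a method of the same name defined on a parent class. The stand-alone classes below are hypothetical, not Airflow's; they reproduce the failure and the fix of storing the flag under a different attribute name (here xcom_push_flag) while keeping the constructor argument unchanged.

```python
class Base(object):
    def xcom_push(self, key, value):
        """Stand-in for a method inherited from a parent operator class."""
        return (key, value)


class Broken(Base):
    def __init__(self, xcom_push=False):
        # The instance attribute shadows Base.xcom_push for this object.
        self.xcom_push = xcom_push


class Fixed(Base):
    def __init__(self, xcom_push=False):
        # Same constructor argument, stored under a non-colliding name.
        self.xcom_push_flag = xcom_push


if __name__ == "__main__":
    print(Fixed(True).xcom_push("k", 1))  # the inherited method still works
    try:
        Broken(True).xcom_push("k", 1)
    except TypeError as exc:
        print("Broken:", exc)  # the bool attribute is not callable
```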

What is the correct way to pass configuration to a triggered DAG run using the 1.7 CLI?
Use airflow trigger_dag your_dag_id --conf followed by a valid JSON string in single quotes, for example --conf '{"date": "2016-03-01", "env": "prod"}'. The conf dictionary becomes available inside the DAG as dag_run.conf, where dag_run is part of the task context passed to your Python callables (set provide_context=True on PythonOperator to receive the context).
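A callable that reads these values might look like the sketch below. It assumes the DagRun is available under the dag_run key of the context and that dag_run.conf is None when the run was triggered without --conf; print_conf and the env and date keys are illustrative.

```python
def print_conf(**context):
    """Hypothetical python_callable that reads values passed via --conf."""
    dag_run = context.get("dag_run")
    # Scheduled runs, or triggers without --conf, carry no conf dictionary.
    conf = (dag_run.conf if dag_run is not None else None) or {}
    env = conf.get("env", "dev")
    run_date = conf.get("date", context.get("ds"))
    print("env={} date={}".format(env, run_date))
    return env, run_date
```

Attach it with PythonOperator(task_id='print_conf', python_callable=print_conf, provide_context=True, dag=dag).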

Does upgrading to Airflow 1.7 require any database migrations?
Yes. Run airflow upgradedb after deploying the new code. The release includes schema changes, including an increased password field length in the user model. Always back up your metadata database before running migrations.

How do I enable the new StatsD metrics integration in Airflow 1.7?
Set statsd_on = True under the scheduler section of airflow.cfg, and provide statsd_host, statsd_port, and statsd_prefix to point at your StatsD endpoint. No code changes are required in your DAGs -- the instrumentation is internal to the scheduler and task runner.

Can I use the new SLA miss callback alongside existing SLA miss email alerts?
Yes. The sla_miss_callback parameter on the DAG object is additive -- the built-in SLA miss email behavior is not removed. Define the callback as a Python function that accepts dag, task_list, blocking_task_list, slas, and blocking_tis, then pass it as sla_miss_callback when constructing your DAG object.

Releases In Branch 1.7

Version     Release date
1.7.1.3     08 Mar 2017 (9 years ago)
1.7.1.2     20 May 2016 (10 years ago)
1.7.1.1     20 May 2016 (10 years ago)
1.7.1       19 May 2016 (10 years ago)
1.7.1rc1    05 Apr 2016 (10 years ago)
1.7.0       28 Mar 2016 (10 years ago)