Latest development in branch 1.8: 1.8.0-rc.1 (released 20 Dec 2017)

Software: Apache Airflow
Version: 1.8
Environment requirements:
  • Python 2.7/3.4-3.6
  • MySQL ≥5.6
  • PostgreSQL ≥9.4
  • SQLite (dev)
  • Linux/macOS
  • Broker (RabbitMQ/Redis) for CeleryExecutor
Initial release: 1.8.0, 19 Mar 2017
Latest release: 1.8.0-rc.1, 20 Dec 2017
Limited maintenance: ended 03 Jan 2018
EOL/Terminated: 03 Jan 2018
Release notes: https://github.com/apache/airflow/releases/tag/1.8.0-rc.1
Source code: https://github.com/apache/airflow/tree/1.8.0-rc.1
Download: https://github.com/apache/airflow/releases/tag/1.8.0-rc.1

What Is New in Apache Airflow 1.8

Apache Airflow 1.8 brings meaningful changes to the scheduler, pool management, DAG lifecycle defaults, and Google Cloud integrations. Several configuration keys have been renamed and new behavioral defaults introduced, making this a release that demands careful review before upgrading in production.

Highlights by Category

  • New Features: Per-DAG scheduler subprocess with dedicated log files; catchup_by_default config option; dag_dir_list_interval and min_file_process_interval tuning knobs
  • Improvements: Stricter and more robust pool slot enforcement; unified Google Cloud connection type; time-based scheduler run duration replacing loop-count logic
  • Bug Fixes: Pool over-subscription fixed; dynamic start_date scheduling made more predictable
  • Breaking Changes: Database schema migration required; new DAGs paused by default; Google Cloud P12 key files no longer supported; systemd unit files updated
  • Deprecations: Top-level operator imports (e.g. airflow.operators.PigOperator) deprecated in favor of submodule imports; passing arbitrary *args or **kwargs to operators deprecated, with removal planned for Airflow 2.0

What Database and System Changes Are Required Before Upgrading to Airflow 1.8?

Airflow 1.8 requires a database schema migration and updated systemd unit files before the new version can run correctly.

Before starting any Airflow 1.8 instance, shut down all Airflow processes, take a full database backup, and run:

airflow upgradedb

If you manage Airflow with systemd, the unit files have also changed in this release. Deploy the updated unit files from the Airflow distribution before restarting services. Note that the webserver process does not detach cleanly in 1.8 -- this is a known issue flagged for a future fix.

In practice, teams that skip the schema upgrade will encounter startup failures. The database migration is non-negotiable and should be the very first step in your upgrade runbook.
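
One way to make the "backup first, then migrate" ordering hard to skip is to wrap the migration command in a small guard. The sketch below is only an illustration: run_schema_upgrade, backup_path, and runner are hypothetical names, and the check (a non-empty file exists) is an assumption about what counts as a usable backup in your environment. The only Airflow-specific part is the airflow upgradedb command itself.

```python
import os
import subprocess


def run_schema_upgrade(backup_path, runner=subprocess.check_call):
    """Run `airflow upgradedb` only if a non-empty backup file exists.

    `backup_path` and `runner` are illustrative parameters, not Airflow options.
    """
    if not os.path.isfile(backup_path) or os.path.getsize(backup_path) == 0:
        raise RuntimeError("no usable database backup at %s" % backup_path)
    # The migration command named in the upgrade steps above.
    runner(["airflow", "upgradedb"])
```

Passing a different runner (for example a function that only logs the command) gives a dry run without touching the database.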

How Did the Airflow 1.8 Scheduler Architecture Change and What New Options Should You Configure?

The 1.8 scheduler now processes each DAG in its own subprocess, improving fault isolation and making scheduler log management a new operational concern.

Each DAG file gets its own scheduler log written to child_process_log_directory, which defaults to <AIRFLOW_HOME>/scheduler/latest. You will need a log rotation policy for this directory -- these files are not cleaned up automatically and will grow over time.
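
A rotation policy can be as simple as a scheduled script that deletes old files. The following is a minimal sketch, assuming age-based deletion is acceptable for your retention needs; prune_old_logs and its parameters are hypothetical names, and the directory you pass in is whatever you have configured for the scheduler's child process logs.

```python
import os
import time


def prune_old_logs(log_dir, max_age_days, now=None):
    """Delete regular files under log_dir older than max_age_days.

    Returns the list of removed paths. `now` (epoch seconds) is injectable
    so the function can be exercised without waiting for files to age.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            # Compare the file's last modification time against the cutoff.
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

A tool such as logrotate or a cron job running find would do the same job; the Python version is shown only because it is easy to test and adapt.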

Several scheduler configuration keys changed behavior or were introduced fresh:

  • run_duration -- replaces the old loop-count model. Defaults to -1 (run continuously). The scheduler now terminates after this many seconds rather than after a fixed number of loops.
  • num_runs -- now means the number of scheduling attempts per DAG file within run_duration, not total scheduler loops. Defaults to -1 (try indefinitely).
  • min_file_process_interval -- controls how frequently updated DAG files are re-read from disk.
  • dag_dir_list_interval -- controls how often the scheduler scans the DAG folder for new files. Decrease this during development if newly added DAGs are not appearing promptly.
  • catchup_by_default -- new global flag. When set to False, the scheduler only runs the most recent interval for a DAG rather than backfilling all missed runs. Can also be set per-DAG with catchup = False.
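
To make the catchup distinction concrete, here is a deliberately simplified model of which intervals get a run. It is not Airflow's scheduler code: intervals_to_run is a hypothetical function, and it assumes a fixed start_date, a constant interval, and that an interval becomes runnable only once it has fully elapsed.

```python
from datetime import datetime, timedelta


def intervals_to_run(start_date, interval, now, catchup):
    """Return the interval start dates that would get a run in this toy model."""
    completed = []
    current = start_date
    # An interval [current, current + interval) counts once it has fully elapsed.
    while current + interval <= now:
        completed.append(current)
        current += interval
    if catchup:
        return completed       # backfill every missed interval
    return completed[-1:]      # only the most recent completed interval


start = datetime(2017, 1, 1)
now = datetime(2017, 1, 4, 12, 0)
with_catchup = intervals_to_run(start, timedelta(days=1), now, catchup=True)
without_catchup = intervals_to_run(start, timedelta(days=1), now, catchup=False)
```

With these inputs the model yields three runs (1, 2, and 3 January) when catchup is on and a single run (3 January) when it is off.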

Watch out for the known issue with num_runs = -1: some users have reported spurious task-parsing errors under this default. If you hit this, change the default to None in cli.py as a temporary workaround while the fix is tracked upstream.

How Does Airflow 1.8 Change Default DAG Behavior at Creation and Scheduling?

Two behavioral defaults shift significantly in 1.8: new DAGs are now paused on creation, and the scheduler is stricter about dynamic start_date values.

Previously, a freshly imported DAG would begin scheduling immediately. In 1.8, all new DAGs are paused at creation. This is the safer default for production environments where you want explicit control over when a DAG starts running. To restore the old behavior, add the following to airflow.cfg:

[core]
dags_are_paused_at_creation = False

The scheduler is also less tolerant of dynamic start_date values such as start_date = datetime.now(). This pattern was never recommended, but 1.7 was forgiving about it. In 1.8, DAGs using dynamic start dates may silently fail to schedule. The fix is to switch to a fixed start_date and rename the DAG to clear its historical schedule state -- renaming is required because the old schedule metadata will otherwise interfere.

This matters if you have any DAGs -- often quick prototypes or utility DAGs -- where someone took a shortcut with datetime.now(). Audit your DAG folder before upgrading.
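
The audit can be partly automated with a text scan. The sketch below is a heuristic, not a parser: the regular expression, find_dynamic_start_dates, and scan_dag_folder are all hypothetical, and the pattern only catches the common spellings (datetime.now(), datetime.utcnow(), datetime.today(), with or without the datetime. module prefix) assigned directly to start_date on one line.

```python
import os
import re

# Matches e.g. `start_date = datetime.now()` or `'start_date': datetime.datetime.utcnow()`.
DYNAMIC_START = re.compile(
    r"start_date['\"]?\s*[:=]\s*(?:datetime\.)?datetime\.(?:now|utcnow|today)\s*\("
)


def find_dynamic_start_dates(source):
    """Return 1-based line numbers that set start_date from the current time."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if DYNAMIC_START.search(line) and not line.lstrip().startswith("#")
    ]


def scan_dag_folder(dag_folder):
    """Map each .py file under dag_folder to its suspicious line numbers."""
    hits = {}
    for root, _dirs, files in os.walk(dag_folder):
        for name in files:
            if name.endswith(".py"):
                path = os.path.join(root, name)
                with open(path) as handle:
                    lines = find_dynamic_start_dates(handle.read())
                if lines:
                    hits[path] = lines
    return hits
```

Indirect cases, such as a dynamic value stored in a variable first, will slip past this scan, so treat an empty result as a hint rather than proof.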

What Happened to Pool Enforcement and Why Were Tasks Blocked After the Upgrade?

Airflow 1.7.1 had a bug that allowed more pool slots to be consumed than the pool actually contained; 1.8.0 enforces pool limits correctly, which can cause tasks to stall after upgrading.

If you were unknowingly over-subscribing a pool -- which 1.7.1 silently permitted -- those queued tasks will now correctly wait for a free slot. This can look like tasks are stuck even though all their upstream dependencies are satisfied. There are two workarounds:

  • Temporarily increase the pool's slot count above the number of queued tasks until the backlog clears.
  • Move the affected tasks to a new, larger pool.

In practice, most teams will not hit this unless they deliberately undersized pools and relied on the lenient behavior. Check your pool utilization metrics before the upgrade to avoid a surprise freeze on day one.
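
The utilization check boils down to comparing demand against capacity per pool. The sketch below assumes you have already collected two mappings by whatever means you prefer (the UI, a metadata database query, or metrics): pool name to slot count, and pool name to the number of running plus queued tasks. The function name and both inputs are hypothetical.

```python
def oversubscribed_pools(pool_slots, active_tasks):
    """Return {pool: excess} for pools whose demand exceeds their slot count.

    pool_slots:   {pool_name: configured slots}
    active_tasks: {pool_name: running + queued task count}
    """
    report = {}
    for pool, demand in active_tasks.items():
        slots = pool_slots.get(pool, 0)  # an unknown pool is treated as having no slots
        if demand > slots:
            report[pool] = demand - slots
    return report


# Hypothetical numbers for illustration only.
example = oversubscribed_pools({"etl": 4, "reports": 2}, {"etl": 7, "reports": 2})
```

Here the "etl" pool is over capacity by three tasks, which is the amount you would need to add to its slot count to clear the backlog without waiting.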

What Breaking Changes Affect Google Cloud Operators and Hook Authentication in Airflow 1.8?

All Google Cloud Operators and Hooks are unified in 1.8 to use a single client library and a single connection type.

If you have existing Google Cloud connections in the Airflow metadata database, verify that each connection has its type set to Google Cloud Platform. Connections configured with the old type may fail to authenticate after the upgrade.

More critically, P12 key file authentication is no longer supported. Only JSON service account key files are accepted. If your connections use P12 keys, you must generate new JSON keys in your Google Cloud project and update every affected connection before upgrading:

# Old: P12 key file (no longer supported in 1.8)
# New: JSON key file required
# In Airflow UI: Admin -> Connections -> Edit each GCP connection
# Set "Keyfile JSON" field with the contents of your .json service account key

This matters if you have automated pipelines loading BigQuery, writing to GCS, or triggering Dataflow jobs -- any of those will break at runtime if the key format is not migrated beforehand.
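
Before editing connections, it can help to confirm that each replacement key really is a JSON service account key rather than a leftover P12 file. The following is a rough sketch: looks_like_json_service_account_key is a hypothetical helper, and the fields it checks (type, client_email, private_key) are the ones commonly present in Google service account JSON keys, not an exhaustive validation.

```python
import json

REQUIRED_FIELDS = ("type", "client_email", "private_key")


def looks_like_json_service_account_key(raw_bytes):
    """Return True if raw_bytes parses as a JSON service account key."""
    try:
        data = json.loads(raw_bytes.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # P12 files are binary, so they typically fail to decode or parse.
        return False
    if not isinstance(data, dict):
        return False
    return data.get("type") == "service_account" and all(
        data.get(field) for field in REQUIRED_FIELDS
    )
```

Read each candidate file in binary mode and pass its contents to the function; a False result means the file should not be pasted into a connection.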

Which Operator Import Patterns Were Deprecated and Will Break in Airflow 2.0?

Airflow 1.8 deprecates importing operators from the top-level airflow.operators namespace and deprecates passing arbitrary constructor arguments to operators. Both patterns are slated for removal in Airflow 2.0.

The top-level shortcut imports that many DAGs relied on may still work in 1.8, but they are no longer supported. Update all DAG files to use explicit submodule imports:

# Deprecated in 1.8, slated for removal in 2.0 -- do not use
from airflow.operators import PigOperator

# Correct -- import from the specific submodule
from airflow.operators.pig_operator import PigOperator

Additionally, Operator.__init__() previously accepted any extra positional or keyword arguments and silently ignored them. In 1.8 this behavior is deprecated, and passing unrecognized arguments is slated to raise an exception in Airflow 2.0. Cleaning these up now catches typos and misconfigured operators that were silently ignored before.

Most teams will find these import issues quickly with a DAG parse run. Running python your_dag.py for each file is a fast way to surface import errors and deprecation warnings before you push to production.
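
A static scan can complement the parse run, since it does not need Airflow installed. The sketch below walks the syntax tree of a DAG file and reports from-imports that pull a class directly from a top-level package. It is a heuristic with stated assumptions: find_top_level_imports is a hypothetical helper, airflow.hooks is included on the assumption that hooks follow the same rule as operators, and a name is treated as a class only if it starts with an uppercase letter (so importing a submodule such as bash_operator is not flagged).

```python
import ast

# Packages whose classes should be imported from a submodule instead.
TOP_LEVEL_MODULES = {"airflow.operators", "airflow.hooks"}


def find_top_level_imports(source):
    """Return sorted (lineno, module, name) tuples for top-level class imports."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.ImportFrom)
            and node.level == 0
            and node.module in TOP_LEVEL_MODULES
        ):
            for alias in node.names:
                # Class names are capitalized; lowercase names are submodules.
                if alias.name[:1].isupper():
                    hits.append((node.lineno, node.module, alias.name))
    return sorted(hits)
```

Feeding each DAG file's text through this function gives a list of lines to rewrite as submodule imports.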

Frequently Asked Questions -- Apache Airflow 1.8

Do I need to run airflow upgradedb before starting Airflow 1.8?
Yes. The database schema changed in 1.8, so shut down all Airflow processes, take a full database backup, and then run airflow upgradedb. Otherwise the scheduler and webserver will fail to start.

Why are my new DAGs not running automatically after upgrading to Airflow 1.8?
Airflow 1.8 changed the default so all new DAGs are created in a paused state. To restore the previous behavior where DAGs start scheduling immediately, set dags_are_paused_at_creation = False under the core section of airflow.cfg.

Can I still use P12 key files for Google Cloud connections in Airflow 1.8?
No, P12 key file authentication has been removed. You must generate a JSON service account key file in Google Cloud and update every affected connection in the Airflow metadata database before upgrading.

Why are tasks stuck in the queue after the upgrade even though their dependencies are all met?
Airflow 1.7.1 had a bug allowing pools to be over-subscribed; 1.8 enforces slot limits correctly, so tasks that previously ran over the pool capacity are now correctly queued. The fastest workaround is to temporarily raise the slot count on the affected pool until the backlog clears.

What is the simplest way to find DAGs with deprecated or broken operator imports after upgrading to Airflow 1.8?
Run each DAG file directly with the Python interpreter, for example python my_dag.py, to surface import errors and deprecation warnings without needing to start the scheduler or webserver.

What changed about the num_runs scheduler option in Airflow 1.8?
The meaning of num_runs changed: it now controls how many times the scheduler attempts to process each DAG file within a given run_duration window rather than how many total scheduler loops to execute. A known issue exists where the default value of -1 can produce spurious parsing errors, and setting the default to None in cli.py is the reported workaround.

Releases In Branch 1.8

Version       Release date
1.8.0-rc.1    20 Dec 2017
1.8.2         07 Aug 2017
1.8.2rc4      07 Aug 2017
1.8.2rc3      01 Aug 2017
1.8.2rc2      22 Jun 2017
1.8.2rc1      13 Jun 2017
1.8.1         09 May 2017
1.8.0         19 Mar 2017