Apache Airflow 1.9
Latest in branch 1.9: 1.9.0-1 (released 10 Oct 2018)

Software: Apache Airflow
Version: 1.9
Environment requirements: Python 2.7/3.4-3.6; MySQL ≥5.6; PostgreSQL ≥9.4; Linux/macOS; ≥4GB RAM recommended
Initial release: 1.9.0 (15 Dec 2017)
Latest release: 1.9.0-1 (10 Oct 2018)
Limited Maintenance: ended 27 Aug 2018
EOL/Terminated: ended 27 Aug 2018
Release notes: https://github.com/apache/airflow/releases/tag/1.9.0-1
Source code: https://github.com/apache/airflow/tree/1.9.0-1
Download: https://github.com/apache/airflow/releases/tag/1.9.0-1

What Is New in Apache Airflow 1.9

Apache Airflow 1.9 delivers a focused set of breaking changes and improvements that modernize core integrations, overhaul the logging subsystem, and introduce Dask-based distributed execution. Teams upgrading from 1.8 will need to review SSH, S3, and logging configurations before going to production.

Category | Highlights
New Features | DaskExecutor for distributed task execution; SFTPOperator for secure file transfers between hosts
Improvements | SSHHook rewritten with Paramiko for proper SSH client connections; S3Hook migrated to boto3; centralized Python-native logging configuration; flexible log filename templating via Jinja
Breaking Changes | SSHExecuteOperator removed -- replaced by SSHOperator; S3Hook constructor parameter renamed from s3_conn_id to aws_conn_id; default S3 connection changed to aws_default; REMOTE_BASE_LOG_FOLDER config key removed; S3/GCS logging requires custom LOGGING_CONFIG
Deprecations | XCom pickling deprecated in favor of JSON serialization; post_execute() hook signature extended to include result argument; Dataproc operator argument names google_cloud_conn_id and dataproc_cluster renamed; DataflowPipelineRunner removed from google-cloud-dataflow

What broke in SSHHook and how do you migrate your SSH workflows?

SSHHook in 1.9 is a full rewrite -- the old sub-process shell execution is gone and replaced with a proper Paramiko-based SSH client, which is incompatible with the previous constructor signature and operator interface.

The biggest operational impact is that SSHExecuteOperator has been removed entirely. You must replace it with the new SSHOperator. In practice, this is a class swap plus an argument rename (bash_command becomes command, and the connection can be referenced by ssh_conn_id), not a logic rewrite, but any DAG that still imports the old class will fail to load.

# Before (1.8 and earlier -- no longer works)
from airflow.contrib.operators.ssh_execute_operator import SSHExecuteOperator

run_cmd = SSHExecuteOperator(
    task_id='run_remote',
    ssh_hook=my_hook,
    bash_command='echo hello',
    dag=dag
)

# After (1.9+)
from airflow.contrib.operators.ssh_operator import SSHOperator

run_cmd = SSHOperator(
    task_id='run_remote',
    ssh_conn_id='my_ssh_connection',
    command='echo hello',
    dag=dag
)

The new release also ships SFTPOperator for moving files securely between servers -- something that previously required custom scripting or workarounds with BashOperator. Watch out for teams that had wrapped SSHExecuteOperator in utility functions; those wrappers need to be updated too.
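To locate DAG files and wrapper utilities that still reference the removed class, a small standard-library scan can help. The following is a minimal sketch, not part of Airflow: the function name find_legacy_ssh_usages and the sample source are hypothetical, and it assumes the scanned files parse under the Python version running the scan.

```python
import ast

LEGACY_NAME = "SSHExecuteOperator"  # class removed in Airflow 1.9


def find_legacy_ssh_usages(source, filename="<dag>"):
    """Return sorted line numbers that import or reference the legacy class."""
    tree = ast.parse(source, filename=filename)
    hits = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            # e.g. "from ... import SSHExecuteOperator"
            if any(alias.name == LEGACY_NAME for alias in node.names):
                hits.add(node.lineno)
        elif isinstance(node, ast.Name) and node.id == LEGACY_NAME:
            # bare usage such as SSHExecuteOperator(...)
            hits.add(node.lineno)
        elif isinstance(node, ast.Attribute) and node.attr == LEGACY_NAME:
            # qualified usage such as module.SSHExecuteOperator(...)
            hits.add(node.lineno)
    return sorted(hits)


# Hypothetical wrapper of the kind described above.
sample = (
    "from airflow.contrib.operators.ssh_execute_operator import SSHExecuteOperator\n"
    "\n"
    "def remote(task_id, cmd, hook, dag):\n"
    "    return SSHExecuteOperator(task_id=task_id, ssh_hook=hook, bash_command=cmd, dag=dag)\n"
)
legacy_lines = find_legacy_ssh_usages(sample)
```

Running this over every file in your DAGs folder and shared utility packages gives a checklist of lines to migrate. A plain text search works too; the AST approach simply avoids matches inside comments and strings.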

How do you update S3 connections after the boto3 migration in Airflow 1.9?

S3Hook has been migrated from the legacy boto (boto2) library to boto3, and this changes both the connection parameter names and the return types of several methods.

The constructor parameter s3_conn_id is now aws_conn_id, and the default connection ID changed from s3_default to aws_default. Any DAG that passes the connection ID under the old keyword name must be updated, and any DAG that relied on the implicit s3_default connection will now look up aws_default instead. The following operators are all affected and require the parameter rename:

  • S3ToHiveTransfer
  • S3PrefixSensor
  • S3KeySensor
  • RedshiftToS3Transfer

# Before (1.8 and earlier)
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(s3_conn_id='my_s3_conn')

# After
hook = S3Hook(aws_conn_id='my_s3_conn')

This matters if you rely on get_bucket(), get_key(), or get_wildcard_key(): get_bucket() now returns a boto3 bucket object (boto3.s3.Bucket), and get_key() and get_wildcard_key() now return a boto3 object (boto3.S3.Object), instead of the boto2 equivalents. Any downstream code that unpacks or inspects these return values needs to be audited for boto3 API compatibility.
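If shared helper code builds hook or operator arguments as dictionaries, a transitional shim can apply the rename in one place while individual DAGs are migrated. This is a minimal sketch under that assumption; rename_s3_conn_kwarg is a hypothetical helper, not an Airflow function.

```python
import warnings


def rename_s3_conn_kwarg(kwargs):
    """Return a copy of kwargs with s3_conn_id renamed to aws_conn_id."""
    updated = dict(kwargs)
    if "s3_conn_id" in updated:
        if "aws_conn_id" in updated:
            raise ValueError("pass either s3_conn_id or aws_conn_id, not both")
        warnings.warn(
            "s3_conn_id was renamed to aws_conn_id in Airflow 1.9",
            DeprecationWarning,
            stacklevel=2,
        )
        updated["aws_conn_id"] = updated.pop("s3_conn_id")
    return updated


# The resulting dict could then be unpacked into S3Hook(**hook_kwargs).
hook_kwargs = rename_s3_conn_kwarg({"s3_conn_id": "my_s3_conn"})
```

The warning keeps the old call sites visible in logs so the shim can be removed once every caller passes aws_conn_id directly.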

How does the new centralized logging configuration work in Airflow 1.9?

Airflow 1.9 replaces scattered inline logging with a single Python-based configuration file, giving you full control over handlers, formatters, and log routing through the standard Python logging module.

The new system is activated by pointing logging_config_class in airflow.cfg to a Python dict named LOGGING_CONFIG. The configuration file must be on the PYTHONPATH -- by default $AIRFLOW_HOME/config is loaded automatically, making it easy to drop in a custom file without any path gymnastics.

# airflow.cfg
[core]
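logging_config_class = airflow_local_settings.LOGGING_CONFIG

# $AIRFLOW_HOME/config/airflow_local_settings.py
from airflow import configuration as conf

# Values referenced by the dict below, read from airflow.cfg.
LOG_FORMAT = conf.get('core', 'log_format')
BASE_LOG_FOLDER = conf.get('core', 'BASE_LOG_FOLDER')
# Jinja template for the per-task log file path.
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': LOG_FORMAT,
        },
    },
    'handlers': {
        # Writes each task attempt to its own file under BASE_LOG_FOLDER.
        'file.task': {
            'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': BASE_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        'airflow.task': {
            'handlers': ['file.task'],
            'level': 'INFO',
            'propagate': False,
        },
    }
}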

If you were using S3 or GCS remote logging, this update is breaking. The REMOTE_BASE_LOG_FOLDER config key has been removed. You must provide your own LOGGING_CONFIG that explicitly configures S3TaskHandler or GCSTaskHandler with the bucket path set directly in the handler definition. Teams running S3 logging in production should treat this as a mandatory migration step before upgrading.
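As a rough illustration of what such a configuration might look like for S3, the sketch below defines a handler entry with the bucket path inline. The bucket path, log folder, and format string are placeholder assumptions, and the handler key names (base_log_folder, s3_log_folder, filename_template) follow my recollection of the template bundled with Airflow 1.9; check them against the airflow_local_settings.py shipped with your installed version before relying on them.

```python
import os

# Placeholder values for illustration only; adjust to your deployment.
BASE_LOG_FOLDER = os.path.expanduser("~/airflow/logs")
S3_LOG_FOLDER = "s3://my-airflow-logs/prod"  # hypothetical bucket path
LOG_FORMAT = "[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s"
FILENAME_TEMPLATE = "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "airflow.task": {"format": LOG_FORMAT},
    },
    "handlers": {
        # Handler class path as named in Airflow; key names are assumed (see above).
        "s3.task": {
            "class": "airflow.utils.log.s3_task_handler.S3TaskHandler",
            "formatter": "airflow.task",
            "base_log_folder": BASE_LOG_FOLDER,
            "s3_log_folder": S3_LOG_FOLDER,
            "filename_template": FILENAME_TEMPLATE,
        },
    },
    "loggers": {
        "airflow.task": {
            "handlers": ["s3.task"],
            "level": "INFO",
            "propagate": False,
        },
    },
}
```

A production setup will likely also need the related airflow.cfg settings (for example, which connection ID the handler uses and which handler the web UI reads task logs from); those are outside this sketch.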

The FILENAME_TEMPLATE variable now supports Jinja templating, so a log path such as {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log can be customized to fit your storage or retention strategy. Most teams will find the default template sufficient, but the flexibility is useful when organizing logs by date partition for cost-optimized S3 lifecycle rules.

What is the new DaskExecutor and when should you use it?

The DaskExecutor is a new executor backend in Airflow 1.9 that routes task execution to a Dask Distributed cluster instead of local workers or Celery.

In practice, the DaskExecutor is best suited for teams that already operate Dask clusters for data processing workloads and want to avoid the operational overhead of running a separate Celery infrastructure with a message broker like Redis or RabbitMQ. It integrates directly with Dask's scheduler and worker model.

This matters if your pipeline tasks are compute-heavy Python workloads, since Dask workers can leverage distributed memory and parallel execution natively. Note that Dask is not a drop-in replacement for Celery in all environments -- Celery remains the more battle-tested choice for large-scale multi-tenant Airflow deployments with complex task queuing requirements.

What XCom and operator deprecations should you address before upgrading to Airflow 2.0?

Airflow 1.9 introduces several deprecations that are still functional but will be removed entirely in Airflow 2.0 -- addressing them now reduces the cost of the eventual 2.0 migration.

XCom serialization: XCom messages previously used Python pickle, which introduced remote code execution risks. JSON is now the target serialization format. Pickling still works by default, but you can disable it explicitly:

# airflow.cfg
[core]
enable_xcom_pickling = False

JSON is stricter than pickle -- raw bytes, datetime objects, and non-serializable Python types passed through XCom will fail. Any task that passes binary data via XCom needs to encode it first, for example using base64.
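The encoding step can be sketched with the standard library alone. The helper names to_xcom_safe and from_xcom_safe are hypothetical, and the sketch assumes second-level timestamp precision is enough; the returned dict is what a task would push through XCom in place of the raw values.

```python
import base64
import json
from datetime import datetime


def to_xcom_safe(payload_bytes, generated_at):
    """Build a JSON-serializable dict from raw bytes and a datetime."""
    return {
        "payload_b64": base64.b64encode(payload_bytes).decode("ascii"),
        # Drop microseconds so the value parses with a fixed format string.
        "generated_at": generated_at.replace(microsecond=0).isoformat(),
    }


def from_xcom_safe(value):
    """Reverse to_xcom_safe: recover the bytes and the datetime."""
    payload = base64.b64decode(value["payload_b64"])
    generated_at = datetime.strptime(value["generated_at"], "%Y-%m-%dT%H:%M:%S")
    return payload, generated_at


original = (b"\x00\x01binary", datetime(2017, 12, 15, 8, 30, 0))
encoded = to_xcom_safe(*original)
wire = json.dumps(encoded)  # succeeds because only str values remain
restored = from_xcom_safe(json.loads(wire))
```

The downstream task calls the decoding helper on whatever it pulls from XCom, so both sides of the exchange agree on the representation.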

post_execute() signature change: The post_execute() hook now receives two arguments -- context and result. Custom operators that override post_execute() with only one argument will still function but will emit a DeprecationWarning. Update the method signature now to avoid noise in your logs and to be 2.0-ready.
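The updated override might look like the sketch below. To keep it runnable without Airflow installed, BaseOperatorStub stands in for airflow.models.BaseOperator, and RowCountOperator is a hypothetical custom operator; giving result a default of None is a choice made here so the method also tolerates being called with context alone.

```python
class BaseOperatorStub(object):
    """Stand-in for airflow.models.BaseOperator (assumption for this sketch)."""

    def post_execute(self, context, result=None):
        pass


class RowCountOperator(BaseOperatorStub):
    """Hypothetical custom operator that inspects its own result."""

    def execute(self, context):
        return {"rows": 42}

    # Two-argument form: `result` carries the value returned by execute().
    def post_execute(self, context, result=None):
        self.last_result = result


op = RowCountOperator()
ctx = {}
op.post_execute(context=ctx, result=op.execute(ctx))
```

In a real operator, only the post_execute() signature needs to change; the body can ignore result if it has no use for it.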

Dataproc operators: The google_cloud_conn_id and dataproc_cluster argument names in all Dataproc operators have been renamed to gcp_conn_id and cluster_name respectively. Passing the old names will still work with a deprecation warning in 1.9, but they will not work in 2.0.

Frequently Asked Questions about Apache Airflow 1.9

Do I need to update all my DAGs when upgrading to Airflow 1.9?
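You only need to update DAGs that use SSHExecuteOperator (replace with SSHOperator), pass s3_conn_id to S3Hook or related operators (rename to aws_conn_id), or contain custom operators that override post_execute() with a single-argument signature. DAGs that pass the old Dataproc argument names keep working in 1.9 but emit a deprecation warning.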

What happens to my existing S3 remote logging setup when I upgrade to Airflow 1.9?
Your S3 remote logging will break because the REMOTE_BASE_LOG_FOLDER config key is removed. You must create a custom LOGGING_CONFIG in a file on your PYTHONPATH that configures S3TaskHandler directly with the bucket path, then set logging_config_class in airflow.cfg to point to that config dict.

Is it safe to disable XCom pickling in Airflow 1.9?
You can set enable_xcom_pickling = False in airflow.cfg to use JSON-only serialization, but you should first audit all your tasks for XCom usage -- any task passing raw bytes, datetime objects not serializable by JSON, or complex Python objects through XCom will fail at runtime after this change.

Does the DaskExecutor replace CeleryExecutor in Airflow 1.9?
No, DaskExecutor is an additional executor option alongside LocalExecutor, SequentialExecutor, and CeleryExecutor -- it does not replace Celery and is best suited for teams already running Dask Distributed clusters who want to avoid maintaining a separate Celery and broker stack.

How do I configure the new SSHOperator to connect to a remote host in Airflow 1.9?
Create an SSH connection in the Airflow UI or metadata database with conn_type set to SSH, then reference it in SSHOperator via the ssh_conn_id parameter along with the command to execute -- for example SSHOperator(task_id='run', ssh_conn_id='my_ssh', command='ls /tmp', dag=dag).

Will my custom logging handler still work after the Airflow 1.9 logging overhaul?
Yes, any Python logging handler from the standard library (such as RotatingFileHandler or TimedRotatingFileHandler) can be added to your custom LOGGING_CONFIG dict under the handlers key, as long as the config file is on the PYTHONPATH and logging_config_class is set correctly in airflow.cfg.
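As an illustration of wiring a standard-library handler through a dictConfig-style dict, the sketch below attaches a RotatingFileHandler and applies the config directly with logging.config.dictConfig so it can run outside Airflow. The handler name rotating.airflow, the temporary directory, and the choice of the airflow logger are assumptions for the example; in a real deployment Airflow loads the dict via logging_config_class rather than this explicit call.

```python
import logging
import logging.config
import os
import tempfile

log_dir = tempfile.mkdtemp()  # stand-in for a real log directory
log_path = os.path.join(log_dir, "airflow.log")

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "airflow": {"format": "[%(asctime)s] %(levelname)s - %(message)s"},
    },
    "handlers": {
        "rotating.airflow": {
            "class": "logging.handlers.RotatingFileHandler",
            "formatter": "airflow",
            "filename": log_path,
            "maxBytes": 1024 * 1024,  # rotate after roughly 1 MiB
            "backupCount": 3,         # keep three rotated files
        },
    },
    "loggers": {
        "airflow": {
            "handlers": ["rotating.airflow"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

# Applied explicitly here only so the sketch runs standalone.
logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("airflow").info("rotation handler attached")
```

Which logger a custom handler belongs on is a deployment decision; the per-task file handler is usually left in place so task logs stay available where they are expected.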

Releases In Branch 1.9

Version | Release date
1.9.0-1 | 10 Oct 2018
1.9.0 | 15 Dec 2017
1.9.0rc8 | 15 Dec 2017
1.9.0rc7 | 15 Dec 2017
1.9.0rc6 | 11 Dec 2017
1.9.0rc5 | 07 Dec 2017
1.9.0rc4 | 27 Nov 2017
1.9.0rc3 | 18 Nov 2017
1.9.0rc2 | 13 Nov 2017
1.9.0rc1 | 06 Nov 2017
1.9.0alpha1 | 11 Oct 2017
1.9.0alpha0 | 02 Oct 2017