What Is New in Elasticsearch 1.2
Elasticsearch 1.2 delivers significant enhancements in aggregation performance, introduces powerful new scripting capabilities, and adds critical features for production environments. This release focuses on making complex data analysis faster and more flexible while hardening the system's reliability.
| Category | Key Changes |
|---|---|
| New Features | Pipeline Aggregations, Scripting Language Support, Snapshot/Restore API |
| Improvements | Aggregation Performance, Query Execution, Compression |
| Bug Fixes | Various fixes across indexing, search, and cluster management |
| Deprecations | Deprecated APIs and features in preparation for future versions |
How did aggregations get faster in 1.2?
The aggregation framework was optimized to run faster and use less heap memory. The gains are most noticeable with complex nested aggregations over large datasets.
Internally, the execution model was refined to reduce overhead. Existing aggregation requests need no changes: the same queries run with lower latency and a smaller heap footprint, which helps cluster stability at scale.
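As a rough illustration of the kind of nested aggregation that benefits, the sketch below builds a search body that buckets documents by one field and averages another field inside each bucket. The helper name and the field names `category` and `price` are hypothetical; only the request shape (`terms` with a nested `avg`) comes from the aggregation DSL.

```python
import json


def nested_terms_avg(group_field, metric_field, size=10):
    """Build a search body: terms buckets on group_field, avg of metric_field per bucket."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "by_group": {
                "terms": {"field": group_field, "size": size},
                # Sub-aggregation: evaluated once per terms bucket
                "aggs": {
                    "avg_metric": {"avg": {"field": metric_field}}
                },
            }
        },
    }


# Hypothetical field names, for illustration only
body = nested_terms_avg("category", "price")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request with whichever HTTP client you already use.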
What new scripting languages are supported?
Elasticsearch 1.2 supports scripting beyond the default MVEL: JavaScript and Python are available through separately installed language plugins. This gives developers more flexibility to write custom scripts for scoring, filtering, or updating documents in a language they already know.
In practice, once the matching plugin is installed, you select the language with lang: 'js' or lang: 'python' alongside the script in your request. The scripting module is pluggable, so you can also register your own scripting engine if the available options don't meet your needs.
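A minimal sketch of how the language selection might look in a request body, assuming the 1.x request shape where `script` and `lang` sit side by side inside a script field. The helper, the field name `price_with_tax`, and the `price` field are hypothetical, and the non-MVEL languages are assumed to be provided by their plugins.

```python
import json

# Language names taken from the text above; "mvel" is the existing default.
KNOWN_LANGS = {"mvel", "js", "python"}


def script_field_body(field_name, script, lang="mvel"):
    """Build a search body with one script field evaluated in the given language."""
    if lang not in KNOWN_LANGS:
        raise ValueError(f"unsupported script language: {lang}")
    return {
        "query": {"match_all": {}},
        "script_fields": {
            # Assumed 1.x layout: script source and language as sibling keys
            field_name: {"script": script, "lang": lang}
        },
    }


# Hypothetical script: add 20% to a numeric "price" field
js_body = script_field_body("price_with_tax", "doc['price'].value * 1.2", lang="js")
print(json.dumps(js_body, indent=2))
```

Validating the language name on the client is a design choice of this sketch; the cluster itself rejects languages for which no engine is registered.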
What are pipeline aggregations?
Pipeline aggregations operate on the output of other aggregations. Instead of only calculating metrics on raw document data, you can chain aggregations together to build multi-step data processing pipelines.
For example, you can calculate the derivative of a time series or a moving average. This eliminates the need to post-process aggregation results in your application, keeping the entire analytical workflow within Elasticsearch where it's more efficient.
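To make the two calculations concrete, here is a plain-Python sketch of what a derivative and a simple moving average compute over a series of per-bucket values. It illustrates the arithmetic only: it is not Elasticsearch's implementation, the windowing convention is an assumption, and the server-side `derivative` and moving-average aggregations appear in the reference documentation from Elasticsearch 2.0 onward, so check your cluster version before relying on them.

```python
def derivative(values):
    """First difference between consecutive buckets; the first bucket has no predecessor."""
    if not values:
        return []
    return [None] + [curr - prev for prev, curr in zip(values, values[1:])]


def moving_average(values, window):
    """Mean of the current value and up to window-1 preceding values (assumed convention)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical monthly totals, e.g. the sum metric of each date-histogram bucket
monthly_sales = [10, 15, 12, 20]
print(derivative(monthly_sales))
print(moving_average(monthly_sales, window=2))
```

Running the same calculation inside the cluster avoids shipping every bucket to the application first, which is the efficiency argument made above.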
How does the new snapshot/restore API work?
Snapshot and restore provides a native way to back up and recover your indices. You can take snapshots of your cluster's state and data and store them in a shared repository: a shared filesystem, or AWS S3 or HDFS through repository plugins.
This matters because it gives you a production-grade disaster recovery solution without relying on external tools. You can perform incremental snapshots and restore to different cluster configurations, which is essential for any serious deployment.
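The workflow can be sketched as three REST calls: register a repository, take a snapshot, restore it. The helpers below only build the method, path, and body for each call and leave sending them to your HTTP client. The repository name, snapshot name, index names, and filesystem location are hypothetical.

```python
def register_fs_repository(repo, location):
    """Request that registers a shared-filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return ("PUT", f"/_snapshot/{repo}", body)


def create_snapshot(repo, snapshot, indices=None):
    """Request that snapshots all indices, or only the listed ones."""
    body = {}
    if indices:
        body["indices"] = ",".join(indices)
    return ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", body)


def restore_snapshot(repo, snapshot, indices=None):
    """Request that restores a snapshot, optionally limited to some indices."""
    body = {}
    if indices:
        body["indices"] = ",".join(indices)
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


# Hypothetical names and path, for illustration only
steps = [
    register_fs_repository("my_backup", "/mount/backups/my_backup"),
    create_snapshot("my_backup", "snapshot_1", indices=["logs-2014.05"]),
    restore_snapshot("my_backup", "snapshot_1"),
]
for method, path, body in steps:
    print(method, path, body)
```

Passing an explicit index list is how a large cluster can be snapshotted piece by piece rather than all at once.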
What compression improvements were made?
Stored fields can be compressed with DEFLATE as an alternative to the default LZ4. DEFLATE typically achieves better compression ratios, which can significantly reduce disk usage for indices with large stored fields.
You configure this per index by setting index.codec to best_compression, the codec value that selects DEFLATE, in the index settings. The trade-off is higher CPU usage when stored fields are compressed at index time and decompressed when documents are fetched, so the storage savings pay off mainly where disk space matters more than fetch latency.
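The sketch below shows the general effect of DEFLATE on repetitive JSON documents, using Python's standard `zlib` module. It says nothing about the ratios Lucene achieves on a real index, and the sample documents are made up. The settings dictionary at the end uses the `best_compression` codec value, which the index-module documentation lists from Elasticsearch 2.0 onward; treat its availability on your version as an assumption to verify.

```python
import json
import zlib

# Made-up documents with the kind of repeated keys and values stored fields often have
docs = [
    {"id": i, "status": "active", "message": "order shipped to warehouse"}
    for i in range(200)
]
raw = "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")

# zlib implements DEFLATE; level 6 is its usual speed/size middle ground
packed = zlib.compress(raw, 6)
ratio = len(packed) / len(raw)

# Compression is lossless: decompressing returns the original bytes
assert zlib.decompress(packed) == raw
print(f"raw={len(raw)} bytes, deflate={len(packed)} bytes, ratio={ratio:.2f}")

# Index-creation settings selecting the DEFLATE-based codec (assumed available)
codec_settings = {"settings": {"index": {"codec": "best_compression"}}}
print(json.dumps(codec_settings))
```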
FAQ
Can I use Python scripts directly in my queries?
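Yes, once the Python language plugin is installed. You can then specify lang: 'python' in your script fields, filters, or aggregations. Note that 1.2 disables dynamic scripting by default, so scripts sent inline with a request are rejected unless you set script.disable_dynamic: false; scripts saved as files under config/scripts are not affected.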
Do pipeline aggregations work with all aggregation types?
Pipeline aggregations operate on the output of other aggregations, not on raw documents. They typically reference a metric such as avg, sum, or min computed inside a parent multi-bucket aggregation (a date histogram, for example) and run a secondary calculation over those per-bucket values.
Is the snapshot/restore API suitable for large clusters?
Yes, it's designed for production use with support for incremental snapshots and resume capabilities. For very large clusters, you can snapshot specific indices to manage the process efficiently.
How much faster are aggregations in 1.2 compared to previous versions?
While performance gains depend on specific use cases, some complex nested aggregations show 20-30% improvements in both speed and memory usage due to the optimized execution model.
What happens to my existing MVEL scripts after upgrading to 1.2?
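MVEL remains the default scripting language, and the other languages are additive options. Scripts stored as files under config/scripts run without modification. Scripts sent inline with a request are a different matter: 1.2 disables dynamic scripting by default, so they are rejected until you set script.disable_dynamic: false or move them into files.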