What Is New in Elasticsearch 1.4
Elasticsearch 1.4 delivers significant enhancements in aggregation performance, script handling, and cluster stability. This release focuses on making complex data operations faster and more reliable for large-scale deployments.
| Category | Key Changes |
|---|---|
| New Features | Cardinality aggregation, IP Range aggregation, Doc values by default |
| Performance | Faster aggregations, Reduced memory usage for field data |
| Scripting | Additional scripting languages via plugins (JavaScript, Python), Sandboxing |
| Mapping & Search | Indexed scripts, Improved GeoJSON support |
| Resilience | Improved split-brain handling, Better recovery mechanisms |
How did aggregations get faster in 1.4?
The cardinality aggregation is the headline feature, providing a fast, approximate count of distinct values. It uses the HyperLogLog++ algorithm, which is far more memory-efficient than counting the buckets of a terms aggregation, because a terms aggregation has to hold every distinct value in memory.
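As a minimal sketch of what such a request might look like, the snippet below builds a search body containing a single cardinality aggregation. The field name user_id and the aggregation name distinct_values are hypothetical, and the code only constructs the JSON body; sending it to a cluster is left out.

```python
import json


def cardinality_request(field, precision_threshold=None):
    """Build a search body that asks only for an approximate distinct count."""
    agg = {"field": field}
    if precision_threshold is not None:
        # Optional knob: higher values trade memory for accuracy.
        agg["precision_threshold"] = precision_threshold
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {"distinct_values": {"cardinality": agg}},
    }


# "user_id" is a hypothetical field used purely for illustration.
body = cardinality_request("user_id", precision_threshold=1000)
print(json.dumps(body, indent=2))
```

The body would be sent to the _search endpoint of the index you want to count over.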
Doc values are now enabled by default for most numeric, date, and boolean fields. This shifts aggregation and sorting from using in-memory fielddata to operating directly on disk-based structures, drastically reducing heap memory pressure.
In practice, this means your cluster can handle larger datasets and more concurrent aggregations without running into out-of-memory issues. Sorting and scripts that read these fields benefit in the same way, since they no longer need fielddata loaded onto the heap.
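To make the behavior independent of whatever default your version applies, the setting can be written out per field. The sketch below builds a type mapping with an explicit doc_values flag on each field; the type name event and the field names are assumptions chosen for illustration, not names from any real index.

```python
import json


def field_mapping(field_type, doc_values=True):
    """Mapping for one field with the doc_values flag spelled out."""
    return {"type": field_type, "doc_values": doc_values}


def type_mapping(type_name, fields):
    """Build a mapping body for one type from {name: (field_type, doc_values)}."""
    properties = {
        name: field_mapping(field_type, doc_values)
        for name, (field_type, doc_values) in fields.items()
    }
    return {type_name: {"properties": properties}}


# Hypothetical type and fields; adjust to your own schema.
mapping = type_mapping(
    "event",
    {
        "timestamp": ("date", True),
        "bytes": ("long", True),
        "status_code": ("integer", False),  # explicitly opted out
    },
)
print(json.dumps(mapping, indent=2))
```

Spelling the flag out costs a few extra lines in the mapping but removes any doubt about which storage a field uses after an upgrade.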
What scripting improvements were introduced?
Elasticsearch 1.4 expanded its scripting support beyond the MVEL and Groovy options. With the corresponding language plugin installed, you can write scripts in JavaScript or Python for your queries and aggregations, which is great for developers already familiar with those languages.
A major step forward is the introduction of script sandboxing. This allows administrators to restrict what classes and methods a script can access, making it much safer to run user-defined scripts in a multi-tenant environment.
The new indexed script type lets you store and manage scripts directly within the Elasticsearch cluster. Instead of sending the entire script string with every request, you just reference its ID, which reduces network overhead and makes query templates cleaner.
How is cluster stability better in this version?
This release tackles the infamous split-brain scenario with improved discovery and master election logic. The setting that controls the minimum number of master-eligible nodes, discovery.zen.minimum_master_nodes, is now handled more robustly, helping to prevent two masters from being elected at once and the data divergence that follows.
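The usual guidance is to set the minimum to a quorum of master-eligible nodes, that is, half of them (rounded down) plus one. A small sketch of that arithmetic follows; the helper names are made up for this example.

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def config_line(master_eligible):
    """Render the setting as it would appear in elasticsearch.yml."""
    quorum = minimum_master_nodes(master_eligible)
    return "discovery.zen.minimum_master_nodes: {}".format(quorum)


for nodes in (1, 2, 3, 5):
    print(nodes, "master-eligible ->", config_line(nodes))
```

Note that with two master-eligible nodes the quorum is also two, so losing either node leaves the cluster without a master; three is the smallest count that tolerates one failure.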
Recovery processes are more resilient to network hiccups. If a node gets disconnected during recovery, the process can now resume more intelligently instead of starting over from scratch, which saves a ton of time and network bandwidth on large indices.
These changes matter because they reduce operational headaches. You'll spend less time manually recovering clusters and have more confidence in the system's ability to handle itself during network partitions.
FAQ
Should I enable doc values for all my fields?
For most numeric, date, and boolean fields, yes. Doc values are now the default because they move memory pressure from the JVM heap to the OS filesystem cache. Analyzed string fields do not support doc values, so aggregations on them still rely on in-memory fielddata.
Is the cardinality aggregation accurate?
It's an approximate count, not an exact one. The trade-off for its speed and low memory footprint is a small margin of error (typically under 5%), and the precision_threshold option lets you spend more memory for better accuracy. For use cases like unique visitor counts, this is perfectly acceptable and far more efficient.
How do I use the new Python scripting?
First, ensure the lang-python plugin is installed. Then specify the language in your script object with "lang": "python". Scripts run on the plugin's embedded Jython runtime, so you write ordinary Python syntax, but only the constructs and libraries that the runtime and the sandbox permit are available.
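A rough sketch of a request body that uses a Python script field is shown here. The field name price, the parameter factor, and the script expression itself are assumptions for illustration; check the plugin's own documentation for the exact variables exposed to scripts.

```python
import json


def python_script_field(name, script, params=None):
    """Build a search body with one script field evaluated by lang-python."""
    field = {"script": script, "lang": "python"}
    if params:
        field["params"] = params
    return {"script_fields": {name: field}}


# Hypothetical script: multiply a document's "price" field by a parameter.
body = python_script_field(
    "scaled_price",
    'doc["price"].value * factor',
    params={"factor": 2},
)
print(json.dumps(body, indent=2))
```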
What's the main benefit of indexed scripts?
They centralize script management and improve performance. You store the script once on the cluster with a PUT request to _scripts/{lang}/{id}, and then reference it by ID in your searches. This eliminates sending large script strings repeatedly.
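The two steps can be sketched as a pair of request builders: one for storing the script and one for referencing it from a search. The script ID scale-price, the Groovy source, and the helper names are hypothetical, and the code only builds the method, path, and body rather than contacting a cluster.

```python
import json


def store_script_request(lang, script_id, source):
    """Return (method, path, body) for storing a script on the cluster."""
    path = "/_scripts/{}/{}".format(lang, script_id)
    return "PUT", path, {"script": source}


def script_field_by_id(name, lang, script_id, params=None):
    """Build a search body that references a stored script by its ID."""
    field = {"script_id": script_id, "lang": lang}
    if params:
        field["params"] = params
    return {"script_fields": {name: field}}


# Hypothetical script and ID, for illustration only.
method, path, store_body = store_script_request(
    "groovy", "scale-price", "doc['price'].value * factor"
)
search_body = script_field_by_id(
    "scaled_price", "groovy", "scale-price", params={"factor": 2}
)

print(method, path)
print(json.dumps(store_body))
print(json.dumps(search_body, indent=2))
```

Only the short ID and the parameters travel with each search, which is where the saving over inline script strings comes from.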
Are there any breaking changes in 1.4?
The switch to enabling doc values by default is the most noticeable change. If you were relying on the old default behavior, you may need to explicitly set doc_values: false in your mapping. Always read the release notes and test your mappings against the new version before upgrading.