What Is New in Elasticsearch 1.3

| Category | Key Updates |
|---|---|
| New Features | Pipeline Aggregations, Doc Values by default, Groovy as default scripting language |
| Improvements | Query and indexing performance, Recovery prioritization, Snapshot/Restore process |
| Bug Fixes | Numerous fixes across aggregation, mapping, and node discovery |
| Deprecations | Deprecated features in preparation for future versions |

How does pipeline aggregation change data analysis?
Pipeline aggregations are the headline feature in 1.3. They compute new metrics from the output of other aggregations rather than from documents, so analyses such as moving averages or derivatives can run directly inside an aggregation request. In practice, work that previously had to happen in application code after the response came back can move into Elasticsearch.
Several pipeline types are available. For example, `derivative` computes the change in a metric from one bucket to the next, and `cumulative_sum` keeps a running total across buckets. This makes time series analysis possible without exporting your data first.
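To make the two pipeline types concrete, here is a minimal sketch in Python. It builds a request body that nests a `derivative` and a `cumulative_sum` under a date histogram, and it mirrors both calculations locally on a list of bucket values. The index fields (`date`, `price`), the aggregation names, and the `interval` parameter are assumptions for illustration; check the exact request syntax against the documentation for the version you run.

```python
import json
from itertools import accumulate


def build_sales_request(date_field="date", value_field="price"):
    """Build a request body with monthly buckets, a sum metric, and two pipeline aggregations.

    Field names and aggregation names are hypothetical.
    """
    return {
        "size": 0,  # only aggregation results are needed, not search hits
        "aggs": {
            "sales_per_month": {
                "date_histogram": {"field": date_field, "interval": "month"},
                "aggs": {
                    "sales": {"sum": {"field": value_field}},
                    # Pipeline aggregations point at a sibling metric via buckets_path.
                    "sales_change": {"derivative": {"buckets_path": "sales"}},
                    "running_total": {"cumulative_sum": {"buckets_path": "sales"}},
                },
            }
        },
    }


def derivative(values):
    """Local equivalent: difference between each bucket and the previous one."""
    if not values:
        return []
    # The first bucket has no predecessor, so it has no derivative.
    return [None] + [b - a for a, b in zip(values, values[1:])]


def cumulative_sum(values):
    """Local equivalent: running total across buckets."""
    return list(accumulate(values))


if __name__ == "__main__":
    print(json.dumps(build_sales_request(), indent=2))
    monthly_sales = [550, 60, 375]
    print(derivative(monthly_sales))
    print(cumulative_sum(monthly_sales))
```

The local helpers are the kind of post-processing an application would otherwise do itself; with pipeline aggregations the same values come back inside each bucket of the response.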
Why are Doc Values now enabled by default?
Doc Values are now enabled by default to improve performance and memory efficiency. For most field types, Elasticsearch now uses a columnar, on-disk data structure for sorting, aggregations, and scripting instead of building fielddata in memory.
The switch helps prevent the heap memory problems that fielddata can cause, because fielddata is loaded into the JVM heap. Doc Values live on disk and are read through the operating system's file cache, so they keep heap usage lower and make aggregation-heavy workloads more stable.
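If you want to set Doc Values explicitly rather than rely on the default, the setting lives in the field mapping. The sketch below builds such a mapping as a Python dictionary. The type name `logs` and the three field names are hypothetical, and the typed, `string`/`not_analyzed` mapping layout is an assumption based on 1.x-era conventions, so verify it against the mapping documentation for your version.

```python
import json


def build_mapping(type_name="logs"):
    """Build an index mapping that sets doc_values explicitly on two fields.

    The type and field names are hypothetical examples.
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    # Exact-value string used for sorting and terms aggregations.
                    "status": {
                        "type": "string",
                        "index": "not_analyzed",
                        "doc_values": True,
                    },
                    # Numeric field used in metric aggregations.
                    "bytes": {"type": "long", "doc_values": True},
                    # Analyzed full-text field: no doc_values setting here.
                    "message": {"type": "string"},
                }
            }
        }
    }


def doc_values_fields(mapping, type_name="logs"):
    """Return the sorted names of fields that have doc_values set to True."""
    props = mapping["mappings"][type_name]["properties"]
    return sorted(name for name, cfg in props.items() if cfg.get("doc_values") is True)


if __name__ == "__main__":
    mapping = build_mapping()
    print(json.dumps(mapping, indent=2))
    print(doc_values_fields(mapping))
```

The resulting dictionary would be sent as the body of an index-creation request. Fields used only for full-text search, like `message` here, are left without the setting.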
What scripting changes should I know about?
Groovy has become the default scripting language, replacing MVEL. This change brings better performance and security characteristics to your scripts. If you were using MVEL scripts, you'll need to update them to Groovy syntax.
The update also includes sandboxing improvements that make scripting more secure out of the box. You can still use other languages like JavaScript, but Groovy is now the recommended choice for new development.
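As an illustration of what a Groovy script looks like in a request, the sketch below builds a search body with one script field. The field `price`, the script field name `price_with_tax`, and the `factor` parameter are hypothetical, and the `script`/`lang`/`params` layout is an assumption based on 1.x-style requests. Depending on cluster settings, inline scripts may be disabled, in which case the request would be rejected.

```python
import json


def build_script_request(factor=1.2):
    """Build a search body with a Groovy script field.

    The field names and the 'factor' parameter are hypothetical.
    """
    return {
        "query": {"match_all": {}},
        "script_fields": {
            "price_with_tax": {
                "lang": "groovy",  # name the language explicitly instead of relying on the default
                "script": "doc['price'].value * factor",
                # Values go in params instead of being concatenated into the script text.
                "params": {"factor": factor},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_script_request(), indent=2))
```

Naming the language explicitly keeps a script's behavior stable if the cluster default changes, and passing values through `params` keeps the script text constant from one request to the next.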
What performance improvements stand out?
Recovery operations now prioritize primary shards, so the cluster can serve requests again sooner after a restart or node failure, which shortens downtime.
The snapshot and restore process received significant optimizations, making backups faster and more reliable. Indexing and query performance also saw various optimizations throughout the release.
FAQ
Do I need to change my queries because of the Doc Values change?
No, your existing queries will work exactly the same. The change is transparent to application code: it is an internal implementation change that improves memory usage and performance for sorting and aggregations.
How do I migrate my MVEL scripts to Groovy?
The two languages are syntactically similar, but each script still has to be reviewed and updated to valid Groovy. Test migrated scripts thoroughly, because certain operations can behave subtly differently between the two languages.
Are pipeline aggregations expensive performance-wise?
They add some overhead because they run as an extra step over the results of other aggregations. Since they operate on buckets that have already been computed rather than on documents, they are efficient enough for production in most use cases. For large datasets, measure the performance impact first.
Will this version break my existing cluster?
It shouldn't break anything, but always test upgrades in a staging environment first. Pay attention to the deprecated features list and plan to update those areas of your codebase.
What happens if I don't want to use Groovy as my scripting language?
You can still configure other languages like JavaScript, but Groovy is now the default. The change was made for performance and security reasons, so we recommend giving Groovy a try for new scripts.