
Apache Hudi 2024: A Year In Review

Shiyan Xu

As we wrap up another remarkable year for Apache Hudi, I am thrilled to reflect on the achievements that defined 2024. This year was particularly special: we reached several significant milestones, including the landmark release of Hudi 1.0, the publication of two comprehensive books, and the launch of new tools that expand Hudi's ecosystem.

Community Growth and Engagement

The Apache Hudi community continued its impressive growth trajectory in 2024. The number of new PRs remained stable throughout the year, reflecting a consistent level of development activity:


Our community presence expanded significantly across various platforms:

  • The community grew to over 10,500 followers on LinkedIn
  • Added 8,755 new followers in the last 365 days
  • Generated 441,402 content impressions
  • Received 6,555 reactions and 493 comments across platforms
  • Our Slack community remained vibrant with rich technical discussions and knowledge sharing

Major Milestones

Apache Hudi 1.0 Release

2024 marked a historic moment with the release of Apache Hudi 1.0, representing a major evolution in data lakehouse technology. This release brought several groundbreaking features:

  • Secondary Indexing: A first for lakehouses, enabling database-like query acceleration, with a demonstrated 95% latency reduction on 10TB TPC-DS for queries with low-to-moderate selectivity
  • Logical Partitioning via Expression Indexes: Introducing PostgreSQL-style expression indexes for more efficient partition management
  • Partial Updates: Achieving 2.6x performance improvement and 85% reduction in bytes written for update-heavy workloads
  • Non-blocking Concurrency Control (NBCC): An industry-first feature that allows multiple writers to write to the same table simultaneously without blocking one another
  • Merge Modes: First-class support for both commit_time_ordering and event_time_ordering
  • LSM Timeline: Revamped timeline storage as a scalable LSM tree for extended table history retention
  • TrueTime: Strengthened time semantics ensuring forward-moving clocks in distributed processes

For the full details, please check out the Hudi 1.0 announcement blog.
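The two merge modes listed above differ in which version of a record survives when the same key is written more than once. The snippet below is a toy Python model to illustrate that difference, not Hudi's implementation or API: the `Record` class, the `event_time` field, and the `merge` function are hypothetical names, and breaking ties on equal event times in favor of the later commit is an assumption of this sketch.

```python
from dataclasses import dataclass

# Toy model only: Record, event_time and merge() are hypothetical names,
# not part of Hudi's API.
@dataclass(frozen=True)
class Record:
    key: str         # record key, e.g. a trip ID
    event_time: int  # ordering value carried in the data itself
    value: str

MODES = ("commit_time_ordering", "event_time_ordering")

def merge(commits, mode):
    """Return {key: surviving Record} after applying commits in commit order."""
    if mode not in MODES:
        raise ValueError(f"unknown merge mode: {mode}")
    table = {}
    for batch in commits:  # each batch is one commit, applied in arrival order
        for rec in batch:
            current = table.get(rec.key)
            if current is None or mode == "commit_time_ordering":
                # Commit-time ordering: the most recently committed write wins.
                table[rec.key] = rec
            elif rec.event_time >= current.event_time:
                # Event-time ordering: keep the version with the larger event
                # time; in this sketch, ties go to the later commit.
                table[rec.key] = rec
    return table

# The older event ("picked_up") arrives late, in the second commit.
first_commit = [Record("trip-1", 200, "dropped_off")]
second_commit = [Record("trip-1", 100, "picked_up")]
commits = [first_commit, second_commit]

print(merge(commits, "commit_time_ordering")["trip-1"].value)  # picked_up
print(merge(commits, "event_time_ordering")["trip-1"].value)   # dropped_off
```

In this model, commit-time ordering simply trusts arrival order, while event-time ordering keeps the newest event even when an older one shows up late, which is the situation out-of-order streams tend to produce.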

Launch of Hudi-rs

A significant expansion of the Hudi ecosystem occurred with the release of Hudi-rs, the native Rust implementation for Apache Hudi with Python API bindings. This new project enables:

  • Reading Hudi Tables without Spark or JVM dependencies
  • Integration with Apache Arrow for enhanced compatibility
  • Support for Copy-on-Write (CoW) table snapshots and time-travel reads
  • Cloud storage support across AWS, Azure, and GCP
  • Native integration with Apache DataFusion, Ray, Daft, and more

Published Books and Educational Content

2024 saw the release of two comprehensive guides to Apache Hudi:

  • "Apache Hudi: The Definitive Guide" (O'Reilly) - Released in early access, with a free copy available, and covering:
    • Distributed query engines
    • Snapshot and time travel queries
    • Incremental queries
    • Change-data-capture modes
    • End-to-end ingestion with Hudi Streamer
  • "Apache Hudi: From Zero to One" - A 10-part blog series turned into an ebook, offering deep technical insights into Hudi's architecture and capabilities, covering:
    • Storage format and operations
    • Read and write flows
    • Table services and indexing
    • Incremental processing
    • Hudi 1.0 features

Community Events and Sharing

The Apache Hudi community maintained a strong presence at major industry events throughout 2024:

  • Databricks' Data+AI Summit - Presenting Apache Hudi's role in the lakehouse ecosystem and its interoperability with other table formats through XTable, an open-source project enabling seamless conversion between Hudi, Delta Lake, and Iceberg
  • Confluent's Current 2024 - Demonstrating Hudi's powerful CDC capabilities with Apache Flink, showcasing real-time data pipelines and the innovative Non-Blocking Concurrency Control (NBCC) for high-volume streaming workloads
  • Trino Fest 2024 - Showcasing the evolution of the Hudi connector in Trino and its innovations, including multi-modal indexing capabilities and the roadmap for enhanced query performance through Alluxio-powered caching and expanded DDL/DML support
  • Bangalore Lakehouse Days - Deep dive into Apache Hudi 1.0's groundbreaking features including LSM-based timeline, functional indexes, and non-blocking concurrency control, demonstrating Hudi's continued innovation in the lakehouse space

Additionally, the community launched several new initiatives to foster learning and knowledge sharing:

Lakehouse Chronicles with Apache Hudi

A new community series with 4 episodes released.


Hudi Newsletter

9 editions published, keeping the community informed about the latest developments.


Community Syncs

Featured 8 user stories from major organizations including Amazon, Peloton, Shopee and Uber.


Notable User Stories and Technical Content

Throughout 2024, several organizations shared their Hudi implementation experiences:

Looking Ahead to 2025

As we look forward to 2025, Apache Hudi's roadmap includes several exciting developments:

  • Enhanced core engine with modernized write paths and advanced indexing (bitmap, vector search)
  • Multi-modal data support with improved storage engine APIs and cross-format interoperability
  • Enterprise-grade features including multi-table transactions and advanced caching
  • Robust platform services with Data Lakehouse Management System (DLMS) components
  • Broader adoption of Hudi-rs across the ecosystem
  • Continued focus on stability and seamless migration path for the community

These initiatives reflect our commitment to advancing data lakehouse technology while ensuring reliability and a great user experience.

Get Involved

Join our thriving community:

The success of Apache Hudi in 2024 wouldn't have been possible without our dedicated community of contributors, users, and supporters. As we celebrate these achievements, we look forward to another year of innovation and growth in 2025.