Apache Hudi | An Open Source Data Lake Platform

What is Hudi

Apache Hudi is an open data lakehouse platform, built on a high-performance open table format to bring database functionality to your data lakes.
Hudi reimagines slow, old-school batch data processing with a powerful new incremental processing framework for low-latency, minute-level analytics.

Hudi Features

Mutability support for all workload shapes & sizes

Quickly update & delete data with fast, pluggable indexing. This covers database CDC and high-scale streaming workloads, with best-in-class support for out-of-order records, bursty traffic & data deduplication.
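
To make the deduplication and out-of-order handling above concrete, here is a minimal pure-Python sketch of upsert semantics. It is not Hudi's API: the `upsert` function, the `id` record key, and the `ts` ordering field are hypothetical names chosen for illustration, and a plain dict stands in for a real index.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def upsert(table: Dict[str, Record], batch: Iterable[Record],
           key_field: str = "id", ordering_field: str = "ts") -> Dict[str, Record]:
    """Merge a batch into `table`, keeping the latest version of each key."""
    for record in batch:
        key = record[key_field]
        current = table.get(key)  # dict lookup stands in for an index lookup
        # A late-arriving record with an older ordering value never overwrites newer data.
        if current is None or record[ordering_field] >= current[ordering_field]:
            table[key] = record
    return table


table: Dict[str, Record] = {}
upsert(table, [
    {"id": "a", "ts": 2, "value": "newest"},
    {"id": "a", "ts": 1, "value": "arrived late"},  # out-of-order duplicate
    {"id": "b", "ts": 1, "value": "only version"},
])
print(table["a"]["value"])
```

The ordering field decides which version wins, so a batch can arrive in any order and still converge to the same table state.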

Unlock 10x efficiency by incrementally processing new data

Replace old-school batch pipelines with incremental streaming on your data lake. Experience faster ingestion and lower processing times for your data pipelines.
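
The idea of incremental processing can be sketched in a few lines of plain Python. This is an illustration under assumptions, not Hudi's implementation: the `read_incremental` helper, the `timeline` list, and the timestamp-style instant strings are hypothetical. The point is that a downstream job reads only what was committed after its last checkpoint instead of rescanning the whole table.

```python
from typing import Any, Dict, List, Tuple

Commit = Tuple[str, List[Dict[str, Any]]]  # (instant time, rows written)


def read_incremental(timeline: List[Commit], after_instant: str) -> List[Dict[str, Any]]:
    """Return only rows written by commits strictly after `after_instant`."""
    return [row
            for instant, rows in timeline
            if instant > after_instant  # fixed-width timestamps compare correctly as strings
            for row in rows]


timeline: List[Commit] = [
    ("20250701000000", [{"id": "a", "value": 1}]),
    ("20250702000000", [{"id": "b", "value": 2}]),
    ("20250703000000", [{"id": "a", "value": 3}]),
]

checkpoint = "20250701000000"  # last instant the downstream job already processed
new_rows = read_incremental(timeline, checkpoint)
print(len(new_rows))
```

After each run the job would store the newest instant it saw as its next checkpoint, so work scales with the amount of new data rather than the table size.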

ACID transactional guarantees for your data lake

Atomic writes, with relational/streaming data consistency models, snapshot isolation and non-blocking concurrency controls tailored for longer-running lake transactions.

Analyze historical data with time travel

Query historical data with the ability to roll back to a table version; debug data versions to understand what changed over time; audit data changes by viewing the commit history.
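
As a rough mental model of time travel, a table version can be thought of as the result of replaying commits up to a chosen point. The sketch below is plain Python with hypothetical names (`snapshot_as_of`, `timeline`); it assumes each commit is a simple set of key/value upserts and does not reflect how Hudi stores or reads data.

```python
from typing import Any, Dict, List, Tuple

Commit = Tuple[str, Dict[str, Any]]  # (instant time, {record key: value} upserts)


def snapshot_as_of(timeline: List[Commit], instant: str) -> Dict[str, Any]:
    """Rebuild the table state by replaying commits up to and including `instant`."""
    state: Dict[str, Any] = {}
    for commit_instant, upserts in sorted(timeline, key=lambda c: c[0]):
        if commit_instant > instant:
            break
        state.update(upserts)
    return state


timeline: List[Commit] = [
    ("20250701000000", {"a": 1}),
    ("20250702000000", {"b": 2}),
    ("20250703000000", {"a": 3}),
]

print(snapshot_as_of(timeline, "20250702000000"))
print([instant for instant, _ in timeline])  # commit history for auditing
```

Comparing two snapshots taken at different instants shows what changed between them, which is the debugging use case described above.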

Interoperable multi-cloud ecosystem support

Built on open data formats, with extensive support across cloud vendors and plug-and-play options for popular data sources & query engines.

Automatic table services for a high-performance lakehouse

Fully automated table services that continuously schedule & orchestrate clustering, compaction, cleaning, file sizing & indexing to ensure tables are always optimized.

Open data lakehouse platform to get you going faster

Effortlessly build your lakehouse with built-in tools for automatic ingestion from services like Debezium and Kafka, automatic catalog sync to major cloud engines & more.

Query acceleration through multi-modal indexes

Experience faster write transactions on huge/wide tables & faster query performance with a first-of-its-kind multi-modal indexing subsystem.

Resilient pipelines with schema evolution & enforcement

Easily evolve the schema of a Hudi table as your data changes over time, and keep pipelines resilient by failing fast to avoid data corruption.
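
A fail-fast schema check can be illustrated with a small Python sketch. The compatibility rule here is an assumption made for the example (new columns may be added, existing columns must keep their type), and `evolve_schema` is a hypothetical helper, not a Hudi function. Real systems typically also allow safe type promotions, which this sketch does not model.

```python
from typing import Dict

Schema = Dict[str, str]  # column name -> type name


def evolve_schema(current: Schema, incoming: Schema) -> Schema:
    """Return the merged schema, or raise before any data is written."""
    for column, col_type in incoming.items():
        if column in current and current[column] != col_type:
            raise ValueError(
                f"incompatible type for {column!r}: {current[column]} -> {col_type}")
    # New columns are appended; existing columns keep their type.
    return {**current, **incoming}


table_schema = {"id": "string", "amount": "double"}
table_schema = evolve_schema(table_schema, {"id": "string", "currency": "string"})
print(sorted(table_schema))
```

Rejecting the write at validation time is what keeps a bad batch from leaving the table in a partially corrupted state.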

Why Hudi

The most innovative and completely open data lakehouse platform in the industry!

Trusted Platform

Battle-tested and proven in production in some of the largest data lakes on the planet.

Open Source

Hudi has a thriving, growing community and is built with contributions from people around the globe.

High Performance

Hudi's storage format is purpose-built to continuously deliver performance as data scales.

Data streams

Take advantage of built-in CDC sources and tools for streaming ingestion.

Hudi Blogs

Building a RAG-based AI Recommender (Part 1/2)

Shiyan Xu

July 10, 2025

How Stifel built a modern data platform using AWS Glue and an event-driven domain architecture

Amit Maindola, Srinivas Kandi, Hossein Johari, Ahmad Rawashdeh, and Lei Meng

July 7, 2025

Why Uber Built Hudi: The Strategic Decision Behind a Custom Table Format

ThamizhElango Natarajan

July 3, 2025

Join our Community

Get technical help, influence the product roadmap & see what’s new with Hudi!

GitHub

Slack

Twitter

YouTube

Mailing list

More Hudi Blogs

Lakehouse Architecture - Apache Hudi and Apache Iceberg

Scaling Complex Data Workflows at Uber Using Apache Hudi

Apache Hudi does XYZ (1/10): File pruning with multi-modal index

Exploring Apache Hudi’s New Log-Structured Merge (LSM) Timeline

How Doris + Hudi Turned the Impossible Into the Everyday

Why Walmart Chose Apache Hudi for Their Lakehouse

From Swamp to Stream: How Apache Hudi Transforms the Modern Data Lake
