
Apache Hudi™ brings

to data lakes

What is Hudi 

Apache Hudi is an open data lakehouse platform, built on a high-performance open table format that brings database functionality to your data lakes. Hudi replaces slow, old-school batch data processing with an incremental processing framework for low-latency, minute-level analytics.

Integrations 

Data Stream
Apache Kafka
RocketMQ
Pulsar
Databases
PostgreSQL
Cassandra
MongoDB
MySQL
Cloud Storage
Parquet
Hudi
Avro
JSON files
CSV files
Oracle Cloud
Lakehouse Platform
Hadoop
S3
Google Cloud Storage
Azure Data Lake
Alibaba Cloud
MinIO
Alluxio
Native Uploads
Metastore
Glue
Hive
DataHub
BI Analytics
Redshift
BigQuery
Apache Doris
Interactive Analytics
Trino
Athena
Presto
Batch Analytics
Hive
Apache Spark
Apache Impala
Stream Analytics
Apache Spark
StarRocks
Flink
Orchestration
dbt
Apache Airflow

Hudi Features 

Why Hudi 

The most innovative and completely open data lakehouse platform in the industry!

Trusted Platform

Battle tested and proven in production in some of the largest data lakes on the planet.

Open Source

Hudi is a thriving, growing community built with contributions from people around the globe.

High Performance

Hudi's storage format is purpose-built to continuously deliver performance as data scales.

Data streams

Take advantage of built-in CDC sources and tools for streaming ingestion.

Join our Community 

Get technical help, influence the product roadmap & see what’s new with Hudi!

YouTube

LinkedIn

GitHub

Slack

Mailing

X