Apache Hudi is a fast-growing, diverse community of people and organizations from all around the globe. The following is a small sample of companies that have adopted or contributed to the Apache Hudi community! Join us on Slack, or come to one of our virtual community events.
37 Interactive Entertainment
37 Interactive Entertainment is a global top-20 listed game company and a leading company on China's A-share market. Apache Hudi is integrated into our Data Middle Platform, offering a real-time data warehouse and solving the problem of frequently changing data. We have also built a set of data access standards based on Hudi, which provide a guarantee for massive data queries in game operation scenarios.
Alibaba Cloud provides cloud computing services to online businesses and Alibaba's own e-commerce ecosystem. Apache Hudi is integrated into Alibaba Cloud Data Lake Analytics, offering real-time analysis on Hudi datasets.
Amazon Transportation service uses Apache Hudi as the backbone of their package delivery system, powering petabyte-scale near-real-time analytics.
Amazon Web Services
Amazon Web Services is the world's leading cloud services provider. Apache Hudi is pre-installed with the AWS Elastic MapReduce (EMR) offering, providing a means for AWS users to perform record-level updates/deletes and manage storage efficiently.
ByteDance uses Apache Hudi to power their exabyte-scale TikTok #ForYouPage real-time recommendation engine.
Clinbrain is the leader in big data platforms and their use in the medical industry. We have built 200 medical big data centers by integrating the Hudi data lake solution in numerous hospitals. Hudi provides the ability to upsert and delete on HDFS; at the same time, it efficiently keeps fresh data streams up to date in the Hadoop system through the Hudi incremental view.
DiDi is the world's leading transportation platform. Building on the Hadoop ecosystem, we built a new generation of big data platform based on Apache Hudi, which provides record-level updates/deletes as well as integrated streaming and batch data processing.
Disney shared how they migrated CDC data to Apache Hudi to power a real-time ads platform for their streaming service.
EMIS Health is the largest provider of primary care IT software in the UK, with datasets including more than 500 billion healthcare records. Hudi is used to manage their analytics datasets in production and keep them up to date with their upstream source. Presto is used to query the data written in Hudi format.
H3C Digital Platform
H3C Digital Platform provides end-to-end capabilities for data collection, storage, computation, and governance, and enables the construction of data centers and data governance capabilities for the medical, smart park, smart city, and other industries. Apache Hudi is integrated into the digital platform to meet the real-time update needs of massive data.
Kyligence is the leading Big Data analytics platform company. We’ve built end-to-end solutions for various Global Fortune 500 companies in the US and China. We adopted Apache Hudi in our Cloud solution on AWS in 2019. With the help of Hudi, we are able to process upserts and deletes easily, and we use incremental views to build efficient data pipelines in AWS. The Hudi datasets can also be integrated into Kyligence Cloud directly for high-concurrency OLAP access.
Lingyue-digital Corporation belongs to BMW Group. Apache Hudi is used to ingest MySQL and PostgreSQL change data capture. We build upsert scenarios on Hadoop and Spark.
The Hopsworks 1.x series supports Apache Hudi feature groups to enable upserts and time travel.
SF-Express is the leading logistics service provider in China. Hudi is used to build a real-time data warehouse, providing real-time computing solutions with higher efficiency and lower cost for our business.
Tathastu.ai offers the largest AI/ML playground of consumer data for data scientists, AI experts and technologists to build upon. They have built a CDC pipeline using Apache Hudi and Debezium. Data from Hudi datasets is queried using Hive, Presto and Spark.
EMR from Tencent Cloud has integrated Hudi as one of its big data components since V2.2.0. Using Hudi, end users can handle either read-heavy or write-heavy use cases, and Hudi will manage the underlying data stored on HDFS/COS/CHDFS using Apache Parquet and Apache Avro.
Apache Hudi was originally developed at Uber to achieve low-latency database ingestion with high efficiency. It has been in production since August 2016, powering the massive 100PB data lake, including highly business-critical tables like core trips, riders, and partners. It also powers several incremental Hive ETL pipelines and is currently being integrated into Uber's data dispersal system.
At Udemy, Apache Hudi on AWS EMR is used to ingest MySQL change data capture.
Walmart chose Apache Hudi to manage their data lake of store transactions.
Yields.io is the first FinTech platform that uses AI for automated model validation and real-time monitoring on an enterprise-wide scale. Their data lake is managed by Hudi. They are also actively building their infrastructure for incremental, cross-language/platform machine learning using Hudi.
Yotpo uses Hudi in several ways. First, they integrated Hudi as a writer in their open source ETL framework, Metorikku. They also use it as an output writer for a CDC pipeline, in which events generated from database binlogs are streamed to Kafka and then written to S3.
At Zendesk, Apache Hudi is adopted for building a data lake on AWS.