Talks & Presentations
"Hoodie: Incremental processing on Hadoop at Uber" - By Vinoth Chandar & Prasanna Rajaperumal, Mar 2017, Strata + Hadoop World, San Jose, CA
"Hoodie: An Open Source Incremental Processing Framework From Uber" - By Vinoth Chandar. Apr 2017, DataEngConf, San Francisco, CA Slides Video
"Incremental Processing on Large Analytical Datasets" - By Prasanna Rajaperumal, June 2017, Spark Summit 2017, San Francisco, CA. Slides Video
"Hudi: Unifying storage and serving for batch and near-real-time analytics" - By Nishith Agarwal & Balaji Varadarajan, September 2018, Strata Data Conference, New York, NY
"Hudi: Large-Scale, Near Real-Time Pipelines at Uber" - By Vinoth Chandar & Nishith Agarwal, October 2018, Spark+AI Summit Europe, London, UK
"Powering Uber's global network analytics pipelines in real-time with Apache Hudi" - By Ethan Guo & Nishith Agarwal, April 2019, Data Council SF19, San Francisco, CA.
"Building highly efficient data lakes using Apache Hudi (Incubating)" - By Vinoth Chandar, June 2019, SF Big Analytics Meetup, San Mateo, CA
"Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data Lake Architectures" - By Vinoth Chandar & Balaji Varadarajan, September 2019, ApacheCon NA 19, Las Vegas, NV, USA
"Insert, upsert, and delete data in Amazon S3 using Amazon EMR" - By Paul Codding & Vinoth Chandar, December 2019, AWS re:Invent 2019, Las Vegas, NV, USA
"Building Robust CDC Pipeline With Apache Hudi And Debezium" - By Pratyaksh, Purushotham, Syed and Shaik, December 2019, Hadoop Summit Bangalore, India
"Using Apache Hudi to build the next-generation data lake and its application in medical big data" - By JingHuang & Leesf, March 2020, Apache Hudi & Apache Kylin Online Meetup, China
"Building a near real-time, high-performance data warehouse based on Apache Hudi and Apache Kylin" - By ShaoFeng Shi, March 2020, Apache Hudi & Apache Kylin Online Meetup, China
"Building large scale, transactional data lakes using Apache Hudi" - By Nishith Agarwal, June 2020, Berlin Buzzwords 2020.
"Apache Hudi - Design/Code Walkthrough Session for Contributors" - By Vinoth Chandar, July 2020, Hudi community.
"PrestoDB and Apache Hudi" - By Bhavani Sudha Saktheeswaran and Brandon Scheller, Aug 2020, PrestoDB Community Meetup.
"DC_THURS : Apache Hudi w/ Nishith Agarwal & Vinoth Chandar", Aug 2020, Online discussion/Q&A with DataCouncil Founder
"Building Large-Scale, Transactional Data Lakes using Apache Hudi" - By Nishith Agarwal, Data Summit 2020
"Landing practice of Apache Hudi in T3go" - By VinoYang and XianghuWang, November 2020, Qcon.
"Meetup talk by Nishith Agarwal" - Uber Data Platforms Meetup, Dec 2020
"Apache Hudi learning series: Understanding Hudi internals" - By Abhishek Modi, Balajee Nagasubramaniam, Prashant Wason, Satish Kotha, Nishith Agarwal, Feb 2021, Uber Meetup
"Apache Hudi Meetup at Uber with talks from AWS, CityStorageSystems & Uber" - By Udit Mehrotra, Wenning Ding (AWS), Alexander Filipchik (CityStorageSystems), Prashant Wason, Satish Kotha (Uber), Feb 2021
"Apache Hudi: The Streaming Data Lake Platform" - By Nishith Agarwal, Sivabalan Narayanan, Data Summit Connect, May, 2021
"Change Data Capture to Data lakes using Apache Pulsar/Hudi" - By Vinoth Chandar, Pulsar Summit North America, June 2021. "Video link"
"Apache Hudi: Large Scale Data Systems with Vinoth Chandar" - By Vinoth Chandar. SE Daily Podcast. May, 2021
"Meet the creator of Apache Hudi: Vinoth Chandar" - By Vinoth Chandar. Presto Con Day, 2021
"Presto Ecosystem Panel Discussion" - By Vinoth Chandar, Dipti Borkar, Nezih Yigitbasi, Maxime Beauchemin, Kishore. Presto Con, 2021
"Speeding up Presto Queries Using Apache Hudi Clustering" - By Satish Kotha and Nishith Agarwal. Presto Con, March 2021
"Building a Large-scale Transactional Data Lake Using Apache Hudi" - By Satish Kotha, AICamp
"Apache Hudi table format, Purpose-built for low latency data lake use-cases" - By Nishith Agarwal and Sivabalan Narayanan. July, 2021
"Community round table: Open data lakes with Presto, Hudi and AWS - the next generation of analytics" - By Vinoth Chandar, Roy Hasson, Dipti Borkar, Coordinated by Eric Kavanagh. July, 2021
"DataEngineering Podcast: Charting A Path For Streaming Data To Fill Your Data Lake With Hudi" - By Vinoth Chandar. Aug, 2021
"Streaming Data Lakes using Kafka Connect + Apache Hudi" - By Balaji Varadarajan and Vinoth Chandar. Sep 27, 2021.
"Code/Design walk through" - By Vinoth Chandar. Oct 8, 2021
"Apache Hudi - The Data lake platform" - By Vinoth Chandar. Oct 11, 2021
"Building Open Data Lakes on AWS with Debezium and Apache Hudi" - By Gary A. Stafford. Oct 31, 2021
"Apache Hudi Meetup at Uber with talks from Disney, Walmart & Uber" - By Vinay Patil (Disney+Hotstar), Samuel Guleff (Walmart), Surya Prasanna Yalla, Meenal Binwade (Uber), Jan 2022
You can check out our blog pages for content written by our committers/contributors.
- "The Case for incremental processing on Hadoop" - O'Reilly Ideas article by Vinoth Chandar
- "Hoodie: Uber Engineering's Incremental Processing Framework on Hadoop" - Engineering Blog By Prasanna Rajaperumal
- "New – Insert, Update, Delete Data on S3 with Amazon EMR and Apache Hudi" - AWS Blog by Danilo Poccia
- "The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project" - ASF Graduation announcement
- "Apache Hudi grows cloud data lake maturity"
- "Building a Large-scale Transactional Data Lake at Uber Using Apache Hudi" - Uber eng blog by Nishith Agarwal
- "Hudi On Hops" - By NETSANET GEBRETSADKAN KIDANE
- "PrestoDB and Apache Hudi" - PrestoDB - Hudi integration blog by Bhavani Sudha Saktheeswaran and Brandon Scheller
- "Origins of Data Lake at Grofers" - by Akshay Agarwal
- "Data Lake Change Capture using Apache Hudi & Amazon AMS/EMR" - Towards DataScience article, Oct 20
- "How nClouds Helps Accelerate Data Delivery with Apache Hudi on Amazon EMR" - published by nClouds in partnership with AWS
- "Apply record level changes from relational databases to Amazon S3 data lake using Apache Hudi on Amazon EMR and AWS Database Migration Service" - AWS blog
- "Architecting Data Lakes for the Modern Enterprise at Data Summit Connect Fall 2020"
- "Can Big Data Solutions Be Affordable?"
- "Building High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go"
- "Data Lake Change Capture using Apache Hudi & Amazon AMS/EMR Part 2"
- "Data Lakehouse: Building the Next Generation of Data Lakes using Apache Hudi" - Slalom Build blog By Ryan D'Souza and Brandon Stanley
- "Time travel operations in Hopsworks Feature Store"
- "Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and Apache Hudi on Amazon EMR"
- "Apache Hudi: How Uber gets data a ride to its destination" - By Jae McKendrick.
- "Experts primer on Apache Hudi" - By Stephanie Simone, Data Summit Connect
- "New features from Apache Hudi in Amazon EMR"
- "Build a data lake using Amazon Kinesis Data Streams for Amazon DynamoDB and Apache Hudi" - Amazon AWS
- "Amazon Athena expands Apache Hudi support" - Amazon AWS
- "Part 1: Query an Apache Hudi dataset in an Amazon S3 data lake with Amazon Athena: Read-optimized queries" - Amazon AWS
- "Baixin Bank’s real-time data lake evolution scheme based on Apache Hudi" - July, 2021
- "MLOps Wars: Versioned Feature Data with a Lakehouse" - By David Bzhalava and Jim Dowling. Aug, 2021
- "Cost-Efficient Open Source Big Data Platform at Uber" - By Zheng Shao and Mohammad Islam. Aug, 2021
- "Data Platform 2.0 - Part I" - By Jitendra Shah. Oct 5, 2021
- "How Amazon Transportation Service enabled near-real-time event analytics at petabyte scale using AWS Glue with Apache Hudi" - By Madhavan Sriram, Diego Menin, Gabriele Cacciola, and Kunal Gautam. Oct 14, 2021
- "Practice of Apache Hudi in building real-time data lake at station B" by Yu Zhaojing. Oct 21, 2021
- "How GE Aviation built cloud-native data pipelines at enterprise scale using the AWS platform" by Alcuin Weidus and Suresh Patnam. Nov 16, 2021
- "https://www.xenonstack.com/insights/what-is-hudi" by Chandan Gaur. Nov 22, 2021
- "https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-0-7-0-and-0-8-0-available-on-amazon-emr/" by Udit Mehrotra and Gagan Brahmi. Dec 20, 2021
- "Designing the Analytics patterns using a Lake House approach on AWS" by Adit Modi. Dec 30, 2021
- "The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and Debezium" by Gary Stafford. Dec 31, 2021