Talks & Articles

Talks & Presentations

  1. "Hoodie: Incremental processing on Hadoop at Uber" - By Vinoth Chandar & Prasanna Rajaperumal Mar 2017, Strata + Hadoop World, San Jose, CA

  2. "Hoodie: An Open Source Incremental Processing Framework From Uber" - By Vinoth Chandar. Apr 2017, DataEngConf, San Francisco, CA Slides Video

  3. "Incremental Processing on Large Analytical Datasets" - By Prasanna Rajaperumal June 2017, Spark Summit 2017, San Francisco, CA. Slides Video

  4. "Hudi: Unifying storage and serving for batch and near-real-time analytics" - By Nishith Agarwal & Balaji Varadarajan September 2018, Strata Data Conference, New York, NY

  5. "Hudi: Large-Scale, Near Real-Time Pipelines at Uber" - By Vinoth Chandar & Nishith Agarwal October 2018, Spark+AI Summit Europe, London, UK

  6. "Powering Uber's global network analytics pipelines in real-time with Apache Hudi" - By Ethan Guo & Nishith Agarwal, April 2019, Data Council SF19, San Francisco, CA.

  7. "Building highly efficient data lakes using Apache Hudi (Incubating)" - By Vinoth Chandar June 2019, SF Big Analytics Meetup, San Mateo, CA

  8. "Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data Lake Architectures" - By Vinoth Chandar & Balaji Varadarajan September 2019, ApacheCon NA 19, Las Vegas, NV, USA

  9. "Insert, upsert, and delete data in Amazon S3 using Amazon EMR" - By Paul Codding & Vinoth Chandar December 2019, AWS re:Invent 2019, Las Vegas, NV, USA

  10. "Building Robust CDC Pipeline With Apache Hudi And Debezium" - By Pratyaksh, Purushotham, Syed and Shaik December 2019, Hadoop Summit Bangalore, India

  11. "Using Apache Hudi to build the next-generation data lake and its application in medical big data" - By JingHuang & Leesf March 2020, Apache Hudi & Apache Kylin Online Meetup, China

  12. "Building a near real-time, high-performance data warehouse based on Apache Hudi and Apache Kylin" - By ShaoFeng Shi March 2020, Apache Hudi & Apache Kylin Online Meetup, China

  13. "Building large scale, transactional data lakes using Apache Hudi" - By Nishith Agarwal, June 2020, Berlin Buzzwords 2020.

  14. "Apache Hudi - Design/Code Walkthrough Session for Contributors" - By Vinoth Chandar, July 2020, Hudi community.

  15. "PrestoDB and Apache Hudi" - By Bhavani Sudha Saktheeswaran and Brandon Scheller, Aug 2020, PrestoDB Community Meetup.

  16. "DC_THURS : Apache Hudi w/ Nishith Agarwal & Vinoth Chandar", Aug 2020, Online discussion/Q&A with DataCouncil Founder

  17. "Panel Discussion on Presto Ecosystem" - By Vinoth Chandar, Sep 2020, PrestoCon panel.

  18. "Next Generation Data lakes using Apache Hudi" - By Balaji Varadarajan and Sivabalan Narayanan, Sep 2020, ApacheCon

  19. "Building Large-Scale, Transactional Data Lakes using Apache Hudi" - By Nishith Agarwal, Data Summit 2020

  20. "Landing practice of Apache Hudi in T3go" - By VinoYang and XianghuWang, November 2020, QCon.

  21. "Meetup talk by Nishith Agarwal" - Uber Data Platforms Meetup, Dec 2020

  22. "Apache Hudi learning series: Understanding Hudi internals" - By Abhishek Modi, Balajee Nagasubramaniam, Prashant Wason, Satish Kotha, Nishith Agarwal, Feb 2021, Uber Meetup

  23. "Apache Hudi Meetup at Uber with talks from AWS, CityStorageSystems & Uber" - By Udit Mehrotra, Wenning Ding (AWS), Alexander Filipchik (CityStorageSystems), Prashant Wason, Satish Kotha (Uber), Feb 2021

  24. "Apache Hudi: The Streaming Data Lake Platform" - By Nishith Agarwal, Sivabalan Narayanan, Data Summit Connect, May, 2021

  25. "Change Data Capture to Data lakes using Apache Pulsar/Hudi" - By Vinoth Chandar, Pulsar Summit North America, June 2021. Video

  26. "Apache Hudi: Large Scale Data Systems with Vinoth Chandar" - By Vinoth Chandar. SE Daily Podcast. May, 2021

  27. "Meet the creator of Apache Hudi: Vinoth Chandar" - By Vinoth Chandar. PrestoCon Day, 2021

  28. "Presto Ecosystem Panel Discussion" - By Vinoth Chandar, Dipti Borkar, Nezih Yigitbasi, Maxime Beauchemin, Kishore. PrestoCon, 2021

  29. "Speeding up Presto Queries Using Apache Hudi Clustering" - By Satish Kotha and Nishith Agarwal. PrestoCon, March 2021

  30. "Building a Large-scale Transactional Data Lake Using Apache Hudi" - By Satish Kotha, AICamp

  31. "Apache Hudi table format, Purpose-built for low latency data lake use-cases" - By Nishith Agarwal and Sivabalan Narayanan. July, 2021

  32. "Community round table: Open data lakes with Presto, Hudi and AWS - the next generation of analytics" - By Vinoth Chandar, Roy Hasson, Dipti Borkar, Coordinated by Eric Kavanagh. July, 2021

  33. "DataEngineering Podcast: Charting A Path For Streaming Data To Fill Your Data Lake With Hudi" - By Vinoth Chandar. Aug, 2021

  34. "Streaming Data Lakes using Kafka Connect + Apache Hudi" - Balaji Varadarajan and Vinoth Chandar. Sep 27, 2021.

  35. "Code/Design walk through" - By Vinoth Chandar. Oct 8, 2021

  36. "Apache Hudi - The Data lake platform" - By Vinoth Chandar. Oct 11, 2021

  37. "Building Open Data Lakes on AWS with Debezium and Apache Hudi" - By Gary A. Stafford. Oct 31, 2021

  38. "Apache Hudi Meetup at Uber with talks from Disney, Walmart & Uber" - By Vinay Patil (Disney+Hotstar), Samuel Guleff (Walmart), Surya Prasanna Yalla, Meenal Binwade (Uber), Jan 2022


Articles

You can check out our blog pages for content written by our committers/contributors.

  1. "The Case for incremental processing on Hadoop" - O'Reilly Ideas article by Vinoth Chandar
  2. "Hoodie: Uber Engineering's Incremental Processing Framework on Hadoop" - Engineering Blog By Prasanna Rajaperumal
  3. "New – Insert, Update, Delete Data on S3 with Amazon EMR and Apache Hudi" - AWS Blog by Danilo Poccia
  4. "The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project" - ASF Graduation announcement
  5. "Apache Hudi grows cloud data lake maturity"
  6. "Building a Large-scale Transactional Data Lake at Uber Using Apache Hudi" - Uber eng blog by Nishith Agarwal
  8. "PrestoDB and Apache Hudi" - PrestoDB-Hudi integration blog by Bhavani Sudha Saktheeswaran and Brandon Scheller
  9. "Origins of Data Lake at Grofers" - by Akshay Agarwal
  10. "Data Lake Change Capture using Apache Hudi & Amazon AMS/EMR" - Towards Data Science article, Oct 20
  11. "How nClouds Helps Accelerate Data Delivery with Apache Hudi on Amazon EMR" - published by nClouds in partnership with AWS
  12. "Apply record level changes from relational databases to Amazon S3 data lake using Apache Hudi on Amazon EMR and AWS Database Migration Service" - AWS blog
  13. "Architecting Data Lakes for the Modern Enterprise at Data Summit Connect Fall 2020"
  14. "Can Big Data Solutions Be Affordable?"
  15. "Building High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go"
  16. "Data Lake Change Capture using Apache Hudi & Amazon AMS/EMR Part 2"
  17. "Building a large scale transactional data lake at Uber using Apache Hudi" - Engineering Blog By Nishith Agarwal
  18. "Data Lakehouse: Building the Next Generation of Data Lakes using Apache Hudi" - Slalom Build blog By Ryan D'Souza and Brandon Stanley
  19. "Time travel operations in Hopsworks Feature Store"
  20. "Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and Apache Hudi on Amazon EMR"
  21. "Apache Hudi: How Uber gets data a ride to its destination" - By Jae McKendrick.
  22. "Experts primer on Apache Hudi" - By Stephanie Simone, Data Summit Connect
  23. "New features from Apache Hudi in Amazon EMR"
  24. "Build a data lake using Amazon Kinesis Data Streams for Amazon DynamoDB and Apache Hudi" - Amazon AWS
  25. "Amazon Athena expands Apache Hudi support" - Amazon AWS
  26. "Part 1: Query Apache Hudi dataset in an Amazon S3 data lake with Amazon Athena: Read optimized queries" - Amazon AWS
  27. "Baixin Bank’s real-time data lake evolution scheme based on Apache Hudi" - July, 2021
  28. "MLOps Wars: Versioned Feature Data with a Lakehouse" - By David Bzhalava and Jim Dowling. Aug, 2021
  29. "Cost-Efficient Open Source Big Data Platform at Uber" - By Zheng Shao and Mohammad Islam. Aug, 2021
  30. "Data Platform 2.0 - Part I" - By Jitendra Shah. Oct 5, 2021
  31. "How Amazon Transportation Service enabled near-real-time event analytics at petabyte scale using AWS Glue with Apache Hudi" - Madhavan Sriram, Diego Menin, Gabriele Cacciola, and Kunal Gautam. Oct 14, 2021
  32. "Practice of Apache Hudi in building real-time data lake at station B" by Yu Zhaojing. Oct 21, 2021
  33. "How GE Aviation built cloud-native data pipelines at enterprise scale using the AWS platform" by Alcuin Weidus and Suresh Patnam. Nov 16, 2021
  34. "" by Chandan Gaur. Nov 22, 2021
  35. "" by Udit Mehrotra and Gagan Brahmi. Dec 20, 2021
  36. "Designing the Analytics patterns using a Lake House approach on AWS" by Adit Modi. Dec 30, 2021
  37. "The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and Debezium" by Gary Stafford. Dec 31, 2021