Concepts
Apache Hudi (pronounced “Hudi”) provides the following streaming primitives over hadoop compatible storages
- Update/Delete Records (how do I change records in a table?)
- Change Streams (how do I fetch records that changed?)
In this section, we will discuss key concepts & terminologies that are important to understand, to be able to effectively use these primitives.
Timeline
At its core, Hudi maintains a timeline
of all actions performed on the table at different instants
of time that helps provide instantaneous views of the table,
while also efficiently supporting retrieval of data in the order of arrival. A Hudi instant consists of the following components
Instant action
: Type of action performed on the tableInstant time
: Instant time is typically a timestamp (e.g: 20190117010349), which monotonically increases in the order of action's begin time.state
: current state of the instant
Hudi guarantees that the actions performed on the timeline are atomic & timeline consistent based on the instant time.
Key actions performed include
COMMITS
- A commit denotes an atomic write of a batch of records into a table.CLEANS
- Background activity that gets rid of older versions of files in the table, that are no longer needed.DELTA_COMMIT
- A delta commit refers to an atomic write of a batch of records into a MergeOnRead type table, where some/all of the data could be just written to delta logs.COMPACTION
- Background activity to reconcile differential data structures within Hudi e.g: moving updates from row based log files to columnar formats. Internally, compaction manifests as a special commit on the timelineROLLBACK
- Indicates that a commit/delta commit was unsuccessful & rolled back, removing any partial files produced during such a writeSAVEPOINT
- Marks certain file groups as "saved", such that cleaner will not delete them. It helps restore the table to a point on the timeline, in case of disaster/data recovery scenarios.
Any given instant can be in one of the following states
REQUESTED
- Denotes an action has been scheduled, but has not initiatedINFLIGHT
- Denotes that the action is currently being performedCOMPLETED
- Denotes completion of an action on the timeline

Example above shows upserts happenings between 10:00 and 10:20 on a Hudi table, roughly every 5 mins, leaving commit metadata on the Hudi timeline, along
with other background cleaning/compactions. One key observation to make is that the commit time indicates the arrival time
of the data (10:20AM), while the actual data
organization reflects the actual time or event time
, the data was intended for (hourly buckets from 07:00). These are two key concepts when reasoning about tradeoffs between latency and completeness of data.
When there is late arriving data (data intended for 9:00 arriving >1 hr late at 10:20), we can see the upsert producing new data into even older time buckets/folders. With the help of the timeline, an incremental query attempting to get all new data that was committed successfully since 10:00 hours, is able to very efficiently consume only the changed files without say scanning all the time buckets > 07:00.
File management
Hudi organizes a table into a directory structure under a basepath
on DFS. Table is broken up into partitions, which are folders containing data files for that partition,
very similar to Hive tables. Each partition is uniquely identified by its partitionpath
, which is relative to the basepath.
Within each partition, files are organized into file groups
, uniquely identified by a file id
. Each file group contains several
file slices
, where each slice contains a base file (*.parquet
) produced at a certain commit/compaction instant time,
along with set of log files (*.log.*
) that contain inserts/updates to the base file since the base file was produced.
Hudi adopts a MVCC design, where compaction action merges logs and base files to produce new file slices and cleaning action gets rid of
unused/older file slices to reclaim space on DFS.
Index
Hudi provides efficient upserts, by mapping a given hoodie key (record key + partition path) consistently to a file id, via an indexing mechanism. This mapping between record key and file group/file id, never changes once the first version of a record has been written to a file. In short, the mapped file group contains all versions of a group of records.
Table Types & Queries
Hudi table types define how data is indexed & laid out on the DFS and how the above primitives and timeline activities are implemented on top of such organization (i.e how data is written).
In turn, query types
define how the underlying data is exposed to the queries (i.e how data is read).
Table Type | Supported Query types |
---|---|
Copy On Write | Snapshot Queries + Incremental Queries |
Merge On Read | Snapshot Queries + Incremental Queries + Read Optimized Queries |
Table Types
Hudi supports the following table types.
- Copy On Write : Stores data using exclusively columnar file formats (e.g parquet). Updates simply version & rewrite the files by performing a synchronous merge during write.
- Merge On Read : Stores data using a combination of columnar (e.g parquet) + row based (e.g avro) file formats. Updates are logged to delta files & later compacted to produce new versions of columnar files synchronously or asynchronously.
Following table summarizes the trade-offs between these two table types
Trade-off | CopyOnWrite | MergeOnRead |
---|---|---|
Data Latency | Higher | Lower |
Update cost (I/O) | Higher (rewrite entire parquet) | Lower (append to delta log) |
Parquet File Size | Smaller (high update(I/0) cost) | Larger (low update cost) |
Write Amplification | Higher | Lower (depending on compaction strategy) |