SQL Procedures
Stored procedures are available when use Hudi SparkSQL extensions in all spark's version.
Usage
CALL
supports passing arguments by name (recommended) or by position. Mixing position and named arguments is also supported.
Named arguments
All procedure arguments are named. When passing arguments by name, arguments can be in any order and any optional argument can be omitted.
CALL system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1, ... arg_name_n => arg_n);
Positional arguments
When passing arguments by position, the arguments may be omitted if they are optional.
CALL system.procedure_name(arg_1, arg_2, ... arg_n);
note: The system here has no practical meaning, the complete procedure name is system.procedure_name.
help
Show parameters and outputTypes of a procedure.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
cmd | String | N | None | name of a procedure |
Output
Output Name | Type |
---|---|
result | String |
Example
call help(cmd => 'show_commits');
result |
---|
parameters: param type_name default_value required table string None true limit integer 10 false outputType: name type_name nullable metadata commit_time string true {} action string true {} total_bytes_written long true {} total_files_added long true {} total_files_updated long true {} total_partitions_written long true {} total_records_written long true {} total_update_records_written long true {} total_errors long true {} |
Commit management
show_commits
Show commits' info.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
Output
Output Name | Type |
---|---|
commit_time | String |
total_bytes_written | Long |
total_files_added | Long |
total_files_updated | Long |
total_partitions_written | Long |
total_records_written | Long |
total_update_records_written | Long |
total_errors | Long |
Example
call show_commits(table => 'test_hudi_table', limit => 10);
commit_time | total_bytes_written | total_files_added | total_files_updated | total_partitions_written | total_records_written | total_update_records_written | total_errors |
---|---|---|---|---|---|---|---|
20220216171049652 | 432653 | 0 | 1 | 1 | 0 | 0 | 0 |
20220216171027021 | 435346 | 1 | 0 | 1 | 1 | 0 | 0 |
20220216171019361 | 435349 | 1 | 0 | 1 | 1 | 0 | 0 |
show_commits_metadata
Show commits' metadata.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
Output
Output Name | Type |
---|---|
commit_time | String |
action | String |
partition | String |
file_id | String |
previous_commit | String |
num_writes | Long |
num_inserts | Long |
num_deletes | Long |
num_update_writes | String |
total_errors | Long |
total_log_blocks | Long |
total_corrupt_logblocks | Long |
total_rollback_blocks | Long |
total_log_records | Long |
total_updated_records_compacted | Long |
total_bytes_written | Long |
Example
call show_commits_metadata(table => 'test_hudi_table');
commit_time | action | partition | file_id | previous_commit | num_writes | num_inserts | num_deletes | num_update_writes | total_errors | total_log_blocks | total_corrupt_logblocks | total_rollback_blocks | total_log_records | total_updated_records_compacted | total_bytes_written |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20220109225319449 | commit | dt=2021-05-03 | d0073a12-085d-4f49-83e9-402947e7e90a-0 | null | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435349 |
20220109225311742 | commit | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | 20220109214830592 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435340 |
20220109225301429 | commit | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | 20220109214830592 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435340 |
20220109214830592 | commit | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | 20220109191631015 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 432653 |
20220109214830592 | commit | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | 20220109191648181 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 432653 |
20220109191648181 | commit | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | null | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435341 |
20220109191631015 | commit | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | null | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435341 |
show_commit_extra_metadata
Show commits' extra metadata.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
instant_time | String | N | None | Instant time |
metadata_key | String | N | None | Key of metadata |
Output
Output Name | Type |
---|---|
instant_time | String |
action | String |
metadata_key | String |
metadata_value | String |
Example
call show_commit_extra_metadata(table => 'test_hudi_table');
instant_time | action | metadata_key | metadata_value |
---|---|---|---|
20230206174349556 | deltacommit | schema | {"type":"record","name":"hudi_mor_tbl","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"id","type":"int"},{"name":"ts","type":"long"}]} |
20230206174349556 | deltacommit | latest_schema | {"max_column_id":8,"version_id":20230206174349556,"type":"record","fields":[{"id":0,"name":"_hoodie_commit_time","optional":true,"type":"string","doc":""},{"id":1,"name":"_hoodie_commit_seqno","optional":true,"type":"string","doc":""},{"id":2,"name":"_hoodie_record_key","optional":true,"type":"string","doc":""},{"id":3,"name":"_hoodie_partition_path","optional":true,"type":"string","doc":""},{"id":4,"name":"_hoodie_file_name","optional":true,"type":"string","doc":""},{"id":5,"name":"id","optional":false,"type":"int"},{"id":8,"name":"ts","optional":false,"type":"long"}]} |
show_archived_commits
Show archived commits.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
start_ts | String | N | "" | Start time for commits, default: now - 10 days |
end_ts | String | N | "" | End time for commits, default: now - 1 day |
Output
Output Name | Type |
---|---|
commit_time | String |
total_bytes_written | Long |
total_files_added | Long |
total_files_updated | Long |
total_partitions_written | Long |
total_records_written | Long |
total_update_records_written | Long |
total_errors | Long |
Example
call show_archived_commits(table => 'test_hudi_table');
commit_time | total_bytes_written | total_files_added | total_files_updated | total_partitions_written | total_records_written | total_update_records_written | total_errors |
---|---|---|---|---|---|---|---|
20220216171049652 | 432653 | 0 | 1 | 1 | 0 | 0 | 0 |
20220216171027021 | 435346 | 1 | 0 | 1 | 1 | 0 | 0 |
20220216171019361 | 435349 | 1 | 0 | 1 | 1 | 0 | 0 |
show_archived_commits_metadata
Show archived commits' metadata.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
start_ts | String | N | "" | Start time for commits, default: now - 10 days |
end_ts | String | N | "" | End time for commits, default: now - 1 day |
Output
Output Name | Type |
---|---|
commit_time | String |
action | String |
partition | String |
file_id | String |
previous_commit | String |
num_writes | Long |
num_inserts | Long |
num_deletes | Long |
num_update_writes | String |
total_errors | Long |
total_log_blocks | Long |
total_corrupt_logblocks | Long |
total_rollback_blocks | Long |
total_log_records | Long |
total_updated_records_compacted | Long |
total_bytes_written | Long |
Example
call show_archived_commits_metadata(table => 'test_hudi_table');
commit_time | action | partition | file_id | previous_commit | num_writes | num_inserts | num_deletes | num_update_writes | total_errors | total_log_blocks | total_corrupt_logblocks | total_rollback_blocks | total_log_records | total_updated_records_compacted | total_bytes_written |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20220109225319449 | commit | dt=2021-05-03 | d0073a12-085d-4f49-83e9-402947e7e90a-0 | null | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435349 |
20220109225311742 | commit | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | 20220109214830592 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435340 |
20220109225301429 | commit | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | 20220109214830592 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435340 |
20220109214830592 | commit | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | 20220109191631015 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 432653 |
20220109214830592 | commit | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | 20220109191648181 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 432653 |
20220109191648181 | commit | dt=2021-05-02 | b3b32bac-8a44-4c4d-b433-0cb1bf620f23-0 | null | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435341 |
20220109191631015 | commit | dt=2021-05-01 | 0d7298b3-6b55-4cff-8d7d-b0772358b78a-0 | null | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435341 |
call show_archived_commits(table => 'test_hudi_table');
commit_time | total_bytes_written | total_files_added | total_files_updated | total_partitions_written | total_records_written | total_update_records_written | total_errors |
---|---|---|---|---|---|---|---|
20220216171049652 | 432653 | 0 | 1 | 1 | 0 | 0 | 0 |
20220216171027021 | 435346 | 1 | 0 | 1 | 1 | 0 | 0 |
20220216171019361 | 435349 | 1 | 0 | 1 | 1 | 0 | 0 |
show_commit_files
Show files of a commit.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
instant_time | String | Y | None | Instant time |
Output
Output Name | Type |
---|---|
action | String |
partition_path | String |
file_id | String |
previous_commit | String |
total_records_updated | Long |
total_records_written | Long |
total_bytes_written | Long |
total_errors | Long |
file_size | Long |
Example
call show_commit_files(table => 'test_hudi_table', instant_time => '20230206174349556');
action | partition_path | file_id | previous_commit | total_records_updated | total_records_written | total_bytes_written | total_errors | file_size |
---|---|---|---|---|---|---|---|---|
deltacommit | dt=2021-05-03 | 7fb52523-c7f6-41aa-84a6-629041477aeb-0 | null | 0 | 1 | 434768 | 0 | 434768 |
show_commit_partitions
Show partitions of a commit.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
instant_time | String | Y | None | Instant time |
Output
Output Name | Type |
---|---|
action | String |
partition_path | String |
total_files_added | Long |
total_files_updated | Long |
total_records_inserted | Long |
total_records_updated | Long |
total_bytes_written | Long |
total_errors | Long |
Example
call show_commit_partitions(table => 'test_hudi_table', instant_time => '20230206174349556');
action | partition_path | total_files_added | total_files_updated | total_records_inserted | total_records_updated | total_bytes_written | total_errors |
---|---|---|---|---|---|---|---|
deltacommit | dt=2021-05-03 | 7fb52523-c7f6-41aa-84a6-629041477aeb-0 | 0 | 1 | 434768 | 0 | 0 |
show_commit_write_stats
Show write statistics of a commit.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
instant_time | String | Y | None | Instant time |
Output
Output Name | Type |
---|---|
action | String |
total_bytes_written | Long |
total_records_written | Long |
avg_record_size | Long |
Example
call show_commit_write_stats(table => 'test_hudi_table', instant_time => '20230206174349556');
action | total_bytes_written | total_records_written | avg_record_size |
---|---|---|---|
deltacommit | 434768 | 1 | 434768 |
show_rollbacks
Show rollback commits.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
Output
Output Name | Type |
---|---|
instant | String |
rollback_instant | String |
total_files_deleted | Int |
time_taken_in_millis | Long |
total_partitions | Int |
Example
call show_rollbacks(table => 'test_hudi_table');
instant | rollback_instant | total_files_deleted | time_taken_in_millis | time_taken_in_millis |
---|---|---|---|---|
deltacommit | 434768 | 1 | 434768 | 2 |
show_rollback_detail
Show details of a rollback commit.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
instant_time | String | Y | None | Instant time |
Output
Output Name | Type |
---|---|
instant | String |
rollback_instant | String |
partition | String |
deleted_file | String |
succeeded | Int |
Example
call show_rollback_detail(table => 'test_hudi_table', instant_time => '20230206174349556');
instant | rollback_instant | partition | deleted_file | succeeded |
---|---|---|---|---|
deltacommit | 434768 | 1 | 434768 | 2 |
commits_compare
Compare commit with another path.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
path | String | Y | None | Path of table |
Output
Output Name | Type |
---|---|
compare_detail | String |
Example
call commits_compare(table => 'test_hudi_table', path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table');
compare_detail |
---|
Source test_hudi_table is ahead by 0 commits. Commits to catch up - [] |
archive_commits
archive commits.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | N | None | Hudi table name |
path | String | N | None | Path of table |
min_commits | Int | N | 20 | Similar to hoodie.keep.max.commits, but controls the minimum number of instants to retain in the active timeline. |
max_commits | Int | N | 30 | Archiving service moves older entries from timeline into an archived log after each write, to keep the metadata overhead constant, even as the table size grows. This config controls the maximum number of instants to retain in the active timeline. |
retain_commits | Int | N | 10 | Archiving of instants is batched in best-effort manner, to pack more instants into a single archive log. This config controls such archival batch size. |
enable_metadata | Boolean | N | false | Enable the internal metadata table |
Output
Output Name | Type |
---|---|
result | Int |
Example
call archive_commits(table => 'test_hudi_table');
result |
---|
0 |
export_instants
extract instants to local folder.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
local_folder | String | Y | None | Local folder |
limit | Int | N | -1 | Number of instants to be exported |
actions | String | N | clean,commit,deltacommit,rollback,savepoint,restore | Commit action |
desc | Boolean | N | false | Descending order |
Output
Output Name | Type |
---|---|
export_detail | String |
Example
call export_instants(table => 'test_hudi_table', local_folder => '/tmp/folder');
export_detail |
---|
Exported 6 Instants to /tmp/folder |
rollback_to_instant
Rollback a table to the commit that was current at some time.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
Output
Output Name | Type |
---|---|
rollback_result | Boolean |
Example
Roll back test_hudi_table to one instant
call rollback_to_instant(table => 'test_hudi_table', instant_time => '20220109225319449');
rollback_result |
---|
true |
create_savepoint
Create a savepoint to hudi's table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
commit_time | String | Y | None | Commit time |
user | String | N | "" | User name |
comments | String | N | "" | Comments |
Output
Output Name | Type |
---|---|
create_savepoint_result | Boolean |
Example
call create_savepoint(table => 'test_hudi_table', commit_time => '20220109225319449');
create_savepoint_result |
---|
true |
show_savepoints
Show savepoints.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
Output
Output Name | Type |
---|---|
savepoint_time | String |
Example
call show_savepoints(table => 'test_hudi_table');
savepoint_time |
---|
20220109225319449 |
20220109225311742 |
20220109225301429 |
delete_savepoint
Delete a savepoint to hudi's table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
instant_time | String | Y | None | Instant time |
Output
Output Name | Type |
---|---|
delete_savepoint_result | Boolean |
Example
Delete a savepoint to test_hudi_table
call delete_savepoint(table => 'test_hudi_table', instant_time => '20220109225319449');
delete_savepoint_result |
---|
true |
rollback_to_savepoint
Rollback a table to the commit that was current at some time.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
instant_time | String | Y | None | Instant time |
Output
Output Name | Type |
---|---|
rollback_savepoint_result | Boolean |
Example
Rollback test_hudi_table to one savepoint
call rollback_to_savepoint(table => 'test_hudi_table', instant_time => '20220109225319449');
rollback_savepoint_result |
---|
true |
copy_to_temp_view
copy table to a temporary view.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
query_type | String | N | "snapshot" | Whether data needs to be read, in incremental mode (new data since an instantTime) (or) read_optimized mode (obtain latest view, based on base files) (or) snapshot mode (obtain latest view, by merging base and (if any) log files) |
view_name | String | Y | None | Name of view |
begin_instance_time | String | N | "" | Begin instance time |
end_instance_time | String | N | "" | End instance time |
as_of_instant | String | N | "" | As of instant time |
replace | Boolean | N | false | Replace an existed view |
global | Boolean | N | false | Global view |
Output
Output Name | Type |
---|---|
status | Boolean |
Example
call copy_to_temp_view(table => 'test_hudi_table', view_name => 'copy_view_test_hudi_table');
status |
---|
0 |
copy_to_table
copy table to a new table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
query_type | String | N | "snapshot" | Whether data needs to be read, in incremental mode (new data since an instantTime) (or) read_optimized mode (obtain latest view, based on base files) (or) snapshot mode (obtain latest view, by merging base and (if any) log files) |
new_table | String | Y | None | Name of new table |
begin_instance_time | String | N | "" | Begin instance time |
end_instance_time | String | N | "" | End instance time |
as_of_instant | String | N | "" | As of instant time |
save_mode | String | N | "overwrite" | Save mode |
columns | String | N | "" | Columns of source table which should copy to new table |
Output
Output Name | Type |
---|---|
status | Boolean |
Example
call copy_to_table(table => 'test_hudi_table', new_table => 'copy_table_test_hudi_table');
status |
---|
0 |
Metadata Table management
create_metadata_table
Create metadata table of a hudi table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
Output
Output Name | Type |
---|---|
result | String |
Example
call create_metadata_table(table => 'test_hudi_table');
result |
---|
Created Metadata Table in hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table/.hoodie/metadata (duration=2.777secs) |
init_metadata_table
Init metadata table of a hudi table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
read_only | Boolean | N | false | Read only |
Output
Output Name | Type |
---|---|
result | String |
Example
call init_metadata_table(table => 'test_hudi_table');
result |
---|
Initialized Metadata Table in hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table/.hoodie/metadata (duration=0.023sec) |
delete_metadata_table
Delete metadata table of a hudi table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
Output
Output Name | Type |
---|---|
result | String |
Example
call delete_metadata_table(table => 'test_hudi_table');
result |
---|
Removed Metadata Table from hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table/.hoodie/metadata |
show_metadata_table_partitions
Show partition of a hudi table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
Output
Output Name | Type |
---|---|
partition | String |
Example
call show_metadata_table_partitions(table => 'test_hudi_table');
partition |
---|
dt=2021-05-01 |
dt=2021-05-02 |
dt=2021-05-03 |
show_metadata_table_files
Show files of a hudi table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
partition | String | N | "" | Partition name |
limit | Int | N | 100 | Limit number |
Output
Output Name | Type |
---|---|
file_path | String |
Example
Show files of a hudi table under one partition.
call show_metadata_table_files(table => 'test_hudi_table', partition => 'dt=20230220');
file_path |
---|
.d3cdf6ff-250a-4cee-9af4-ab179fdb9bfb-0_20230220190948086.log.1_0-111-123 |
d3cdf6ff-250a-4cee-9af4-ab179fdb9bfb-0_0-78-81_20230220190948086.parquet |
show_metadata_table_stats
Show metadata table stats of a hudi table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
Output
Output Name | Type |
---|---|
stat_key | String |
stat_value | String |
Example
call show_metadata_table_stats(table => 'test_hudi_table');
stat_key | stat_value |
---|---|
dt=2021-05-03.totalBaseFileSizeInBytes | 23142 |
validate_metadata_table_files
Validate metadata table files of a hudi table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
verbose | Boolean | N | False | If verbose print all files |
Output
Output Name | Type |
---|---|
partition | String |
file_name | String |
is_present_in_fs | Boolean |
is_present_in_metadata | Boolean |
fs_size | Long |
metadata_size | Long |
Example
call validate_metadata_table_files(table => 'test_hudi_table');
partition | file_name | is_present_in_fs | is_present_in_metadata | fs_size | metadata_size |
---|---|---|---|---|---|
dt=2021-05-03 | ad1e5a3f-532f-4a13-9f60-223676798bf3-0_0-4-4_00000000000002.parquet | true | true | 43523 | 43523 |
Table information
show_table_properties
Show hudi properties of a table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
path | String | N | None | Path of table |
limit | Int | N | 10 | Max number of records to be returned |
Output
Output Name | Type |
---|---|
key | String |
value | String |
Example
call show_table_properties(table => 'test_hudi_table', limit => 10);
key | value |
---|---|
hoodie.table.precombine.field | ts |
hoodie.table.partition.fields | dt |
show_fs_path_detail
Show detail of a path.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
path | String | Y | None | Hudi table name |
is_sub | Boolean | N | false | Whether to list files |
sort | Boolean | N | true | Sorted by storage_size |
limit | Int | N | 100 | Limit number |
Output
Output Name | Type |
---|---|
path_num | Long |
file_num | Long |
storage_size | Long |
storage_size(unit) | String |
storage_path | String |
space_consumed | Long |
quota | Long |
space_quota | Long |
Example
call show_fs_path_detail(path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table');
path_num | file_num | storage_size | storage_size(unit) | storage_path | space_consumed | quota | space_quota |
---|---|---|---|---|---|---|---|
22 | 58 | 2065612 | 1.97MB | hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table | -1 | 6196836 | -1 |
stats_file_sizes
Show file sizes of a table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
partition_path | String | N | "" | Partition path |
limit | Int | N | 10 | Max number of records to be returned |
Output
Output Name | Type |
---|---|
commit_time | String |
min | Long |
10th | Double |
50th | Double |
avg | Double |
95th | Double |
max | Long |
num_files | Int |
std_dev | Double |
Example
call stats_file_sizes(table => 'test_hudi_table');
commit_time | min | 10th | 50th | avg | 95th | max | num_files | std_dev |
---|---|---|---|---|---|---|---|---|
20230205134149455 | 435000 | 435000.0 | 435000.0 | 435000.0 | 435000.0 | 435000 | 1 | 0.0 |
stats_wa
Show write stats and amplification of a table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
limit | Int | N | 10 | Max number of records to be returned |
Output
Output Name | Type |
---|---|
commit_time | String |
total_upserted | Long |
total_written | Long |
write_amplification_factor | String |
Example
call stats_wa(table => 'test_hudi_table');
commit_time | total_upserted | total_written | write_amplification_factor |
---|---|---|---|
Total | 0 | 0 | 0 |
show_logfile_records
Show records in logfile of a table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
log_file_path_pattern | String | Y | 10 | Pattern of logfile |
merge | Boolean | N | false | Merge results |
limit | Int | N | 10 | Max number of records to be returned |
Output
Output Name | Type |
---|---|
records | String |
Example
call show_logfile_records(table => 'test_hudi_table', log_file_path_pattern => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table/*.log*');
records |
---|
{"_hoodie_commit_time": "20230205133427059", "_hoodie_commit_seqno": "20230205133427059_0_10", "_hoodie_record_key": "1", "_hoodie_partition_path": "", "_hoodie_file_name": "3438e233-7b50-4eff-adbb-70b1cd76f518-0", "id": 1, "name": "a1", "price": 40.0, "ts": 1111} |
show_logfile_metadata
Show metadatas in logfile of a table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
log_file_path_pattern | String | Y | 10 | Pattern of logfile |
merge | Boolean | N | false | Merge results |
limit | Int | N | 10 | Max number of records to be returned |
Output
Output Name | Type |
---|---|
instant_time | String |
record_count | Int |
block_type | String |
header_metadata | String |
footer_metadata | String |
Example
call show_logfile_metadata(table => 'hudi_mor_tbl', log_file_path_pattern => 'hdfs://ns1/hive/warehouse/hudi.db/hudi_mor_tbl/*.log*');
instant_time | record_count | block_type | header_metadata | footer_metadata |
---|---|---|---|---|
20230205133427059 | 1 | AVRO_DATA_BLOCK | {"INSTANT_TIME":"20230205133427059","SCHEMA":"{"type":"record","name":"hudi_mor_tbl_record","namespace":"hoodie.hudi_mor_tbl","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"id","type":"int"},{"name":"name","type":"string"},{"name":"price","type":"double"},{"name":"ts","type":"long"}]}"} | {} |
show_invalid_parquet
Show invalid parquet files of a table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
Path | String | Y | None | Hudi table name |
limit | Int | N | 100 | Limit number |
Output
Output Name | Type |
---|---|
Path | String |
Example
call show_invalid_parquet(path => 'hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table');
Path |
---|
hdfs://ns1/hive/warehouse/hudi.db/test_hudi_table/7fb52523-c7f6-41aa-84a6-629041477aeb-0_0-92-99_20230205133532199.parquet |
show_fsview_all
Show file system views of a table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
max_instant | String | N | "" | Max instant time |
include_max | Boolean | N | false | Include max instant |
include_in_flight | Boolean | N | false | Include in flight |
exclude_compaction | Boolean | N | false | Exclude compaction |
limit | Int | N | 10 | Max number of records to be returned |
path_regex | String | N | "*/*/*" | Pattern of path |
Output
Output Name | Type |
---|---|
partition | String |
file_id | String |
base_instant | String |
data_file | String |
data_file_size | Long |
num_delta_files | Long |
total_delta_file_size | Long |
delta_files | String |
Example
call show_fsview_all(table => 'test_hudi_table');
partition | file_id | base_instant | data_file | data_file_size | num_delta_files | total_delta_file_size | delta_files |
---|---|---|---|---|---|---|---|
dt=2021-05-03 | d0073a12-085d-4f49-83e9-402947e7e90a-0 | 20220109225319449 | 7fb52523-c7f6-41aa-84a6-629041477aeb-0_0-92-99_20220109225319449.parquet | 5319449 | 1 | 213193 | .7fb52523-c7f6-41aa-84a6-629041477aeb-0_20230205133217210.log.1_0-60-63 |
show_fsview_latest
Show latest file system view of a table.
Input
Parameter Name | Type | Required | Default Value | Description |
---|---|---|---|---|
table | String | Y | None | Hudi table name |
max_instant | String | N | "" | Max instant time |
include_max | Boolean | N | false | Include max instant |
include_in_flight | Boolean | N | false | Include in flight |
exclude_compaction | Boolean | N | false | Exclude compaction |
partition_path | String | Y | None | Partition path |
merge | Boolean | N | false | Merge results |
Output
Output Name | Type |
---|---|
partition | String |
file_id | String |
base_instant | String |
data_file | String |
data_file_size | Long |
num_delta_files | Long |
total_delta_file_size | Long |
delta_size_compaction_scheduled | Long |
delta_size_compaction_unscheduled | Long |
delta_to_base_radio_compaction_scheduled | Double |
delta_to_base_radio_compaction_unscheduled | Double |
delta_files_compaction_scheduled | String |
delta_files_compaction_unscheduled | String |
Example
call show_fsview_latest(table => 'test_hudi_table', partition => 'dt=2021-05-03');
partition | file_id | base_instant | data_file | data_file_size | num_delta_files | total_delta_file_size | delta_files |
---|---|---|---|---|---|---|---|
dt=2021-05-03 | d0073a12-085d-4f49-83e9-402947e7e90a-0 | 20220109225319449 | 7fb52523-c7f6-41aa-84a6-629041477aeb-0_0-92-99_20220109225319449.parquet | 5319449 | 1 | 213193 | .7fb52523-c7f6-41aa-84a6-629041477aeb-0_20230205133217210.log.1_0-60-63 |