Manage Data with TTL (Time-to-live)
Overview of TTL
TTL (time-to-live) refers to the capability of having rows or columns moved, deleted, or rolled up after a certain interval of time has passed. While the expression "time-to-live" sounds like it only applies to deleting old data, TTL has several use cases:
- Removing old data: no surprise, you can delete rows or columns after a specified time interval
- Moving data between disks: after a certain amount of time, you can move data between storage volumes - useful for deploying a hot/warm/cold architecture
- Data rollup: roll up your older data into various useful aggregations and computations before deleting it
TTL can be applied to entire tables or specific columns.
TTL Syntax
The TTL clause can appear after a column definition and/or at the end of the table definition. Use the INTERVAL clause to define a length of time and add it to a column or expression of type Date or DateTime; the resulting TTL expression must itself evaluate to a Date or DateTime. For example, consider a table with two columns that carry TTL clauses:
- The x column has a time to live of 1 month from the timestamp column
- The y column has a time to live of 1 day from the timestamp column
When the interval lapses, the column expires. ClickHouse replaces the column value with the default value of its data type. If all the column values in the data part expire, ClickHouse deletes this column from the data part in the filesystem.
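As a minimal sketch of such a table, the two column-level TTL clauses might look like the following. The table name example1, the column types, the extra column z without a TTL, and the ORDER BY are illustrative assumptions, not details given above:

```sql
-- Hypothetical table: the name, column types, and ORDER BY are assumptions.
CREATE TABLE example1
(
    timestamp DateTime,
    x UInt32 TTL timestamp + INTERVAL 1 MONTH,  -- x expires 1 month after timestamp
    y String TTL timestamp + INTERVAL 1 DAY,    -- y expires 1 day after timestamp
    z String                                    -- no TTL clause: z is never expired
)
ENGINE = MergeTree
ORDER BY tuple();
```

Once a value of x or y expires and the part is merged, reads return the type default (0 for UInt32, an empty string for String), while timestamp and z are unaffected.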
TTL rules can be altered or deleted. See the Manipulations with Table TTL page for more details.
Triggering TTL Events
The deleting or aggregating of expired rows is not immediate - it only occurs during table merges. If you have a table that's not actively merging (for whatever reason), there are two settings that trigger TTL events:
- merge_with_ttl_timeout: the minimum delay in seconds before repeating a merge with delete TTL. The default is 14400 seconds (4 hours).
- merge_with_recompression_ttl_timeout: the minimum delay in seconds before repeating a merge with recompression TTL (rules that recompress older data with a different codec). The default is 14400 seconds (4 hours).
So by default, your TTL rules will be applied to your table at least once every 4 hours. Just modify the settings above if you need your TTL rules applied more frequently.
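As a sketch of changing one of these settings on an existing MergeTree table, the statement below lowers the delay to one hour. The table name example1 and the value 3600 are assumptions chosen for illustration:

```sql
-- Hypothetical table name; 3600 seconds (1 hour) is an arbitrary example value.
ALTER TABLE example1
    MODIFY SETTING merge_with_ttl_timeout = 3600;
```

Lower values make TTL merges eligible to run more often, which costs extra merge work, so it is worth picking a value that matches how promptly expired data actually needs to disappear.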
You can also force a merge using OPTIMIZE, although this is not a great solution and not one we recommend using frequently.
OPTIMIZE initiates an unscheduled merge of the parts of your table, and FINAL forces a reoptimization even if your table is already a single part.
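For instance, assuming a hypothetical table named example1, a forced merge could be requested like this:

```sql
-- Hypothetical table name. FINAL forces the merge even if the table is a single part.
OPTIMIZE TABLE example1 FINAL;
```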
Removing Rows
To remove entire rows from a table after a certain amount of time, define the TTL rule at the table level:
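A minimal sketch of a table-level TTL follows. The customers table layout and the 12-hour interval are assumptions chosen for illustration:

```sql
-- Hypothetical schema: column names, types, and the 12-hour interval are assumptions.
CREATE TABLE customers
(
    timestamp DateTime,
    name String,
    balance Int32,
    address String
)
ENGINE = MergeTree
ORDER BY timestamp
TTL timestamp + INTERVAL 12 HOUR;  -- whole rows become eligible for deletion after 12 hours
```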
Additionally, it is possible to define a TTL rule based on the record's value. This is easily implemented by specifying a WHERE condition. Multiple conditions are allowed:
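As a sketch of conditional TTL rules, the hypothetical events table below keeps rows whose event is 'error' for longer than all other rows. The table, its columns, and the intervals are illustrative assumptions:

```sql
-- Hypothetical table: names, types, and intervals are assumptions.
CREATE TABLE events
(
    event String,
    time DateTime,
    value UInt64
)
ENGINE = MergeTree
ORDER BY (event, time)
TTL time + INTERVAL 1 MONTH DELETE WHERE event != 'error',  -- ordinary events: 1 month
    time + INTERVAL 6 MONTH DELETE WHERE event = 'error';   -- errors are kept for 6 months
```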
Removing Columns
Instead of deleting the entire row, suppose you want just the balance and address columns to expire. Let's modify the customers table and add a TTL for both columns to be 2 hours:
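A sketch of that change is shown below. It assumes the customers table has a DateTime column named timestamp, and that balance and address are Int32 and String respectively; the text above does not state these types:

```sql
-- Column types and the timestamp column name are assumptions about the customers table.
ALTER TABLE customers
    MODIFY COLUMN balance Int32 TTL timestamp + INTERVAL 2 HOUR,
    MODIFY COLUMN address String TTL timestamp + INTERVAL 2 HOUR;
```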
Implementing a Rollup
Suppose we want to delete rows after a certain amount of time but hang on to some of the data for reporting purposes. We don't want all the details - just a few aggregated results of historical data. This can be implemented by adding a GROUP BY clause to your TTL expression, along with some columns in your table to store the aggregated results.
Suppose we have a hits table and want to delete old rows, but hang on to the sum and maximum of the hits column before removing the rows. We will need fields to store those values in, and we will need to add a GROUP BY clause to the TTL clause that rolls up the sum and maximum.
Some notes on the hits table:
- The GROUP BY columns in the TTL clause must be a prefix of the PRIMARY KEY, and we want to group our results by the start of the day. Therefore, toStartOfDay(timestamp) was added to the primary key
- We added two fields to store the aggregated results: max_hits and sum_hits
- Setting the default value of max_hits and sum_hits to hits is necessary for our logic to work, based on how the SET clause is defined
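The notes above can be sketched as the following table definition. The id column, the column types, and the one-day interval are assumptions; max_hits, sum_hits, and toStartOfDay(timestamp) come from the description:

```sql
-- The id column, types, and 1-day interval are illustrative assumptions.
CREATE TABLE hits
(
    timestamp DateTime,
    id String,
    hits Int32,
    max_hits Int32 DEFAULT hits,  -- starts equal to hits so max() works on unaggregated rows
    sum_hits Int64 DEFAULT hits   -- starts equal to hits so sum() works on unaggregated rows
)
ENGINE = MergeTree
PRIMARY KEY (id, toStartOfDay(timestamp), timestamp)
TTL timestamp + INTERVAL 1 DAY
    GROUP BY id, toStartOfDay(timestamp)  -- must be a prefix of the primary key
    SET
        max_hits = max(max_hits),
        sum_hits = sum(sum_hits);
```

Because the SET clause aggregates max_hits and sum_hits rather than hits, rows that were already rolled up can be merged again without double-counting the original values.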
Implementing a hot/warm/cold architecture
If you are using ClickHouse Cloud, the steps in this section are not applicable. You do not need to worry about moving old data around in ClickHouse Cloud.
A common practice when working with large amounts of data is to move that data around as it gets older. Here are the steps for implementing a hot/warm/cold architecture in ClickHouse using the TO DISK and TO VOLUME clauses of a TTL rule. (By the way, it doesn't have to be a hot and cold thing - you can use TTL to move data around for whatever use case you have.)
- The TO DISK and TO VOLUME options refer to the names of disks or volumes defined in your ClickHouse configuration files. Create a new file named my_system.xml (or any file name) that defines your disks, then define volumes that use your disks. Place the XML file in /etc/clickhouse-server/config.d/ to have the configuration applied to your system.
- The configuration in this example refers to three disks that point to folders that ClickHouse can read from and write to. Volumes can contain one or more disks - we defined a volume for each of the three disks. View the disks by querying the system.disks table.
- Verify the volumes by querying the system.storage_policies table.
- Now add a TTL rule that moves the data between the hot, warm and cold volumes.
- The new TTL rule should materialize, but you can force it to make sure.
- Verify your data has moved to its expected disks using the system.parts table.
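The SQL side of these steps can be sketched as follows. The table name my_table, its trade_date column, the volume names hot_volume, warm_volume and cold_volume, and the 2-year and 4-year intervals are all assumptions; the volume names must match whatever your own storage configuration defines:

```sql
-- View the disks known to the server.
SELECT name, path, free_space, total_space
FROM system.disks;

-- Verify the volumes and the disks each one contains.
SELECT policy_name, volume_name, disks
FROM system.storage_policies;

-- Hypothetical table, column, volume names, and intervals.
ALTER TABLE my_table
    MODIFY TTL
        trade_date TO VOLUME 'hot_volume',
        trade_date + INTERVAL 2 YEAR TO VOLUME 'warm_volume',
        trade_date + INTERVAL 4 YEAR TO VOLUME 'cold_volume';

-- Force the new TTL rule to be applied to existing parts.
ALTER TABLE my_table
    MATERIALIZE TTL;

-- Check which disk each active part currently lives on.
SELECT name, disk_name
FROM system.parts
WHERE (table = 'my_table') AND (active = 1);
```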
The response shows, for each part of the table, the disk that the part currently resides on.
Related Content
- Blog & Webinar: Using TTL to Manage Data Lifecycles in ClickHouse
