History Table

Software Engineering
January 29, 2024

History Table (Slowly Changing Dimension Table Type 4)

Artie is a real-time database replication solution that leverages change data capture (CDC) and streams processing to enable sub-minute latency. Today, Artie maintains a 1:1 copy of your source tables to your data warehouse.

With History Table, Artie will create a separate table that stores all historical changes of your source. Every data mutation (insert, update, delete) will be recorded along with a timestamp and the operation type.

History Mode reuses from the same topic for replication and uses a different Kafka consumer group.

Example

Imagine you have a customers table with id and name. If History mode is enabled, we will create a separate table called customers__history with additional columns: 1. __db_updated_at 2. __artie_operation

Let's assume the following happened:
INSERT INTO customers (id, name) VALUES (1, 'Alice');
INSERT INTO customers (id, name) VALUES (2, 'Bob');
-- Updated Alice to include her last name
UPDATE customers SET name = 'Alice Smith' WHERE id = 1;
id name __db_updated_at __artie_operation
1 Alice 2021-01-01 00:00:00 CREATE
2 Bob 2021-01-01 00:00:00 CREATE
1 Alice Smith 2021-01-01 00:01:23 UPDATE

Snapshots vs CDC for History Tables

A common way for companies to create variations of history tables is to perform regular snapshots of their current table. An example could be to take a daily snapshot. However, an obvious downside is that only daily changes are recorded and intraday changes are lost .

One major benefit of leveraging CDC is that every single mutation is captured , which provides a comprehensive view of all your data mutations.

What are Slowly Changing Dimension Tables?

As a brief overview, there are four types of Slowly Changing Dimension (SCD) Tables. SCDs are a common data warehousing design pattern used to manage and track changes in dimension data over time. They are particularly useful for historical reporting and analysis.

Type Summary Use Case
1 When a change occurs, the existing record is updated with new information, which overwrites old data. When historical data is not necessary.
2 A new record is created with updated values while keeping old records. Each record is tagged with version or effective dates to distinguish between the rows. Used when it's important to preserve and maintain a full history of changes.
3 This table includes an additional column that stores the previous values. Useful when it's necessary to track the current and immediate preceding value of an attribute.
4 Uses a separate table to store changes. The main table holds the current data while history table keeps a record of all changes. Used when it's important to keep operational dim table lean and performant while still maintaining a detailed change history in a separate table.

Using the prior example, here are how each of the types of history tables would look like. Note that Type 4 is the only type which creates a separate table, in addition to the current table.

Type 1

id name __db_updated_at __artie_operation
1 Alice Smith 2021-01-01 00:01:23 CREATE
2 Bob 2021-01-01 00:00:00 CREATE

Type 2

id name __db_updated_at __artie_operation
1 Alice 2021-01-01 00:00:00 CREATE
2 Bob 2021-01-01 00:00:00 CREATE
1 Alice Smith 2021-01-01 00:01:23 UPDATE

Type 3

id name __db_updated_at __artie_operation previous_name
1 Alice Smith 2021-01-01 00:01:23 CREATE Alice
2 Bob 2021-01-01 00:00:00 CREATE

Type 4

customers

id name
1 Alice Smith
2 Bob

customers__history

id name __db_updated_at __artie_operation
1 Alice 2021-01-01 00:00:00 CREATE
2 Bob 2021-01-01 00:00:00 CREATE
1 Alice Smith 2021-01-01 00:01:23 UPDATE

Artie leverages SCD Type 4 for simplicity and performance

Our initial launch of History Table will support SCD Type 4, which is extremely useful when organizations need to balance fast access to current data with the need to maintain a detailed history of changes.

By maintaining a separate history table, SCD Type 4 ensures all historical data is preserved for detailed historical analysis without impacting the performance of the main current table. Queries against the current table will remain fast and efficient. In addition, this will keep it straightforward for customers to access current data, without having to filter through historical records and materialize the most current view. Since changes are tracked explicitly in a separate table, SCD Type 4 also reduces the risk of accidental data loss due to overwrites, which can happen in SCD Type 1.

Of course, SCD Type 4 has some downsides. Maintaining a separate history table requires additional storage space and increases data management overhead to make sure current and history tables are properly synced. As such, it’s important to think about what historical data would benefit the business and day to day operations, and enable SCD Type 4 history tables for those select table(s).

Industry use cases for History Table

An ecommerce company uses SCD Type 4 History Table to manage its product catalog and inventory levels across various warehouses. A current table is maintained with up-to-date information about each product, including price, description, and current inventory levels in each warehouse. This enables quick access to the latest product and inventory details, which is crucial for managing online sales, order fulfillment, and warehouse stock. A separate history table captures all changes made to the product and inventory levels over time. This historical data is invaluable not only for analyzing trends in product pricing and popularity but also for tracking inventory movement, identifying patterns in stock discrepancies, and improving warehouse reconciliation practices. By employing SCD Type 4, the company ensures that its website and inventory management system reflect accurate current information, while also maintaining a detailed record for analytical and operational improvement purposes.

Fintech - track consumer financial health

A consumer fintech company utilizes SCD Type 4 History Table to track customer financial health and delinquency rates, maintaining a current table for real-time indicators like credit scores, loan statuses, and account balances, and a separate history table for recording changes over time. This dual-table approach ensures immediate access to the latest financial health data for risk management, while the historical data provides insights into trends in customer behavior, enabling predictive modeling and strategic decision-making. By leveraging this method, the company enhances its capability to manage credit risk, tailor customer engagement, and optimize its financial product offerings based on comprehensive customer data.

Consumer subscription business - track changes in status and preferences

A consumer subscription company employs SCD Type 4 History Table to manage subscriber data and preferences, maintaining a current table with up-to-date subscriber details such as active plans, preferences, and account statuses. Simultaneously, a history table captures every change in subscription plans and preferences. This enables the company to swiftly adapt to subscriber demands in real-time while using historical insights to analyze subscription trends, predict churn rates, and tailor marketing strategies. By leveraging SCD Type 4, the company not only ensures accurate, real-time service delivery but also gains a deep understanding of subscriber behavior over time, improving customer satisfaction and retention.

Sign up to try Artie’s History Table

Artie Cloud has a one-click option to enable History Table. Customers can enable this under table advanced settings.

If you’re interested in Artie’s History Table, please contact us.

Table of contents
    Author
    Jacqueline Cheong
    Co-founder & CEO