Why ELTs are here to stay

Software Engineering
Updated on
May 16, 2024

Every company nowadays requires data to inform their business decisions. In order to have data be readily accessible and stored in one place, companies have to set up various data integration pipelines.

These are the main components within the data integration stack:

  • Data sources: this could be a service (internal or external), vendor, data management platforms, etc.
  • Data pipelines: to fetch data from various data sources and transfer them to the destination.
  • Data warehouse (DWH): to house and conduct queries  against the various data sources.

The traditional data integration architecture looks like this:

  • Data pipelines extract data from the various sources
  • Data pipelines then transforms it into the specified format
  • Data pipelines then loads the transformed data into a data warehouse

More recently, there has been a shift towards ELTs. With ELTs, the data integration architecture looks something like this:

  • (unchanged) Data pipelines extract data from the various sources
  • Data pipelines load the raw dataset into the data warehouse
  • Transformations are then done within the data warehouse

Below is a visualization of the two architectures

Difference between ETL and ELT processes

What are some examples of transformations?

Simplistically, transformations are any type of manipulation on the raw datasets to present a particular finding or view. Here are some examples of typical data transformations:

  • Concatenation (such as combining columns and joining strings)
  • Upper / lower case values
  • Joining data sets 
  • Numerical operations (multiply, divide, add, subtract, etc.)
  • Grouping data (by customer, by month, etc.)
  • Deduplicating data
  • Data obfuscation (scrubbing PII or credit card data)

There are endless amounts of possible transforms. To get a better understanding of what additional transformations may look like, check out all the Snowflake functions!

So, why are ELTs better?

> Transformations are done by the team that works closer with the actual data

Transforms can be implemented with SQL and frameworks like dbt from the data warehouse. This means that business analysis, data scientists, and others will be able to write their own transforms and maintain the definition.

Tools like dbt also provide features such as data lineage which are particularly helpful when performing migrations and troubleshooting.

> ELTs build upon the self-serve data culture and helps free up data engineering time

With the ELT approach, business users will be able to directly create their own transformations within a data warehouse without having to be dependent on the data engineering team. As a business analyst, imagine the pain of having to open a ticket for the data engineering teams to perform new transforms every time you need to create a new view or analyze data differently (see the example in the section below).

ELTs also increase data literacy. Having transformations done in SQL allows for a higher level of comprehension of how the fields are constructed and how the transformations are applied. On the other hand, if the transformations are written in code (as in the traditional ETL process), it may require users to have further understanding from other frameworks and libraries such as Pandas, SciPy, and others, reducing the number of people that are able to understand how the transformations work.

> More flexibility by having additional ways to analyze datasets

With the ETL approach, the process only outputs and uploads the transformed data set and is extremely rigid. As a result of this, it prevents others from performing additional queries and masks the underlying tables that are referenced in the transformation. 

Example: A company has a transformed table that shows the historical monthly sign ups and page views on their website. If the business analyst wants to write an ad-hoc query to get a weekly snapshot, they would need to ask the data engineering team for another ETL job that outputs this in a weekly format.

ELTs greatly simplify and streamline the data integration process. Moreover, it helps free up your data engineering team to work on data pipeline improvements and unlock new data sources. ELTs ultimately help increase data literacy within a company and give stakeholders maximum flexibility in analyzing various datasets.

Artie Transfer helps companies adopt and supercharge ELT workflows by providing real-time data replication from transactional databases to your data warehouse. 

To learn more, check out Artie Transfer’s Github repo, schedule a demo here or drop us a note at hi@artie.so!

Table of contents
    Author
    Robin Tang
    Co-founder & CTO