For modern businesses, you must quickly transform from raw information into valuable insights—but the process can be slow. Modern tools like Matillion ETL use the cloud’s power to speed it up.
Since about 2013, businesses have turned to cloud data warehouses to make data accessible for analysts. With the help of solutions known as ETL (Extract-Transform-Load), they extracted raw data, transformed it into a usable form, and then loaded it into the cloud for data analytics.
Once the transformed data was in the cloud data warehouse, the business was golden. However, they housed ETL tools in specialized, high-end hardware in the business’s data center, making the transformation process complex, inflexible, costly—and slow.
Recently, cloud migration software company Matillion flipped the tables on traditional cloud migration. Instead of transforming the data before moving it to the cloud, Matillion’s solution moves raw data directly into the cloud data warehouse. There, the software uses the cloud’s nearly infinitely scalable compute power to transform the data in the cloud.
Essentially, Matillion has turned “ETL” into “ELT,” though they market the solution as Matillion ETL. But let’s not get hung up on abbreviations. Instead, let’s take a closer look at the benefits of Matillion ETL.
The Many Benefits of Matillion ETL
Specifically designed for cloud data warehouses, Matillion’s software is the only ETL solution that integrates with Amazon Redshift, Microsoft Azure Synapse, Snowflake and Google BigQuery. Its broader number of integrations makes the most of its many benefits, allowing you to perform powerful transformations on data before feeding it into your preferred business intelligence (BI) tools, such as Tableau, Looker and Power BI.
What’s more, you can combine simple transformations such as Filter, Join, Rank and more to solve for complex business logic, all while ensuring you’re using your resources to their full potential by scheduling jobs to run only when resources are available.
If this sounds complicated, rest assured. Matillion’s browser-based, graphical user interface (GUI) makes it easy to extract and load data without coding and from a wide range of data sources. For example, you could extract and load from on-premises, cloud databases, SaaS applications, NoSQL or spreadsheets. Matillion’s REST API even allows for custom data source integrations.
All of this adds up to Matillion’s most important benefit—speed. Users can learn the Matillion solution in a matter of hours, rather than the weeks traditional ETL tools can take. In fact, you can learn some of Matillion’s more than 20 data read, write, join and transform components in minutes. Then, automated data transformations and real-time feedback allow the tool to process millions of rows of data in seconds.
Seeing it in Action
My journey with Matillion started when my client wanted to migrate their existing data stored on Oracle data to Snowflake. They needed an ETL tool to rewrite the required business logics. The client had no prior experience with Matillion, but they had identified it as the most compatible tool for Snowflake.
I was new to Matillion, too, and I was very keen to work on the emerging tool. I had earlier worked on various ETL tools, but this would be my first time working with an AWS Tool. Using use cases from my prior ETL experience with various data analytics projects, I gave the client a deeper understanding of Matillion quite easily.
My client benefitted from my research. They quickly used every Matillion component in their migration project. As a newer tool, one challenge was the limited amount of online documentation available. I would advise any future Matillion user to take advantage of the 14-day trial, Matillion Partnership, which includes connection credentials. The credentials allow you to learn the tool and explore features not yet documented online.
Let’s take a closer look at the features that make Matillion so easy to use.
What Drives Matillion’s Benefits
Matillion is a browser-based user interface that uses the cloud platform’s native SQL capabilities to transform data as needed. It also provides an abstract layer for data pipeline design and loads the data using cloud platform-specific capabilities.
We can divide Matillion’s components into two broad categories:
- Orchestration: These components include data ingestion components (i.e., “EL” of the ELT) and supporting components such as “AND, “OR”, “IF,” “ITERATOR,” and so on. It helps control the execution flow, such as control flow and error handling.
- Transformation: Once the data is loaded, the Transformation components primarily aid in calculations, aggregation, filtering and more. The tool then drives these through Pushdown optimization, using native capabilities of the cloud data warehouse.
Within these categories, features include:
- Component support for cloud platform-specific alerts
- Log centralization using AWS Cloudwatch (other platform-specific log supports, such as Azure Monitor and Google Stackdriver, are in the works)
- Shared jobs created for the entire data pipeline or workflow. Shared jobs are like custom components you can use in various other jobs, such as Orchestration or Transformation. When triggered, they invoke the entire shared job workflow. Shared jobs are reusable components you can use for common pipeline patterns, such as logging, alerting, auditing and any other workflow pattern.
- Incremental data load supported by timestamps or incremental number sequences in source database columns. (Matillion does not support native database capabilities for identifying changed records like the AWS Data Migration Service does.)
- REST API support for programmatically interacting with Matillion ETL
- Component lists you can customize with the “Python Script” component to set variables and run commands for a specific data warehouse. Variables are name-value pairs, each of which represents a model state that may change during simulation. Variables can be used in all sorts of parameters and expressions, and they allow the user to pass and centralize environment-specific configurations. They are also easily accessible through Python and Bash scripts, where they are substituted in at run time.
Below is a summary of the variables Matillion ETL supports:
Conclusion
Modern businesses need modern tools. Matillion ETL makes the most of the most modern tool of all—the cloud—to make migrating data to the cloud faster and easier. No doubt many imitators will begin to emerge that use Matillion’s innovative, table-turning approach to cloud computing’s latest transformative tool.