Introducing Last Mile ETL: One Tool for Better Data Transformation
Written by Patrik Braborec |
Imagine you connect a database to analytics and the data is not in the shape you would like to have it. For example, coordinates should be separated by latitude and longitude, some values are in a different format or type, and some tables can even have a bad design. Normally, you would need to spend a lot of time context-switching between the ELT/ETL pipeline and analytics, where, in the transformation (T) phase you would need to edit code that transforms data from the application shape to the analytics shape. This experience is sort of normal in our industry but does it have to be? For this reason, we are introducing Last Mile ETL! With it, you can do everything in one tool which significantly improves the ability to iterate on and the speed of development, customization, and security. You may now ask how? Well, enough words, it’s time for an example. Let’s deep dive into it!
Exploratory Analytics With Last Mile ETL
What do I mean by "exploratory analytics"? Let’s say, we have just three tables with data, and we would like to explore its value or learn some facts based on this data. It means that instead of one concrete goal to achieve, we will primarily try to explore some value in this data! With defined exploratory analytics, here are three tables in a database (Airports
, Country list
, and GDP
— Gross Domestic Product):
You can see that the coordinates are in one column called coordinates
instead of latitude
and longitude
, or value
in GDP
the table is text rather than numerics. We can address these issues with the help of Last Mile ETL inside the analytics.
Let’s connect the database to the analytics (if you are not familiar with GoodData, I encourage you to check the documentation). The result is the following:
What you can see in the image above are datasets. We can convert a dataset to a so-called SQL dataset:
The SQL dataset gives us the possibility to write SQL queries that are executed directly in the database. Let’s just check what types of airports we have in the database:
It seems that the Airports
table contains several types of airports such as heliports, or even closed airports. Let’s say that I am interested just in the medium and large airports — it’s not a problem at all. I don’t need to go to the ELT/ETL pipeline, I simply create a new SQL dataset called Airports transformed
directly in the analytics, and I can do that with the following SQL code:
You can see that I can do it with quite a simple SQL query and the result is the following: