
Introducing Last Mile ETL: One Tool for Better Data Transformation

Written by Patrik Braborec

Imagine you connect a database to analytics and the data is not in the shape you would like it to be. For example, coordinates should be split into latitude and longitude, some values are in a different format or type, and some tables may even have a bad design. Normally, you would need to spend a lot of time context-switching between the ELT/ETL pipeline and analytics, where, in the transformation (T) phase, you would edit the code that transforms data from the application shape into the analytics shape. This experience is sort of normal in our industry, but does it have to be? For this reason, we are introducing Last Mile ETL! With it, you can do everything in one tool, which significantly improves iteration speed, development, customization, and security. You may now ask how? Well, enough words, it’s time for an example. Let’s dive into it!

Exploratory Analytics With Last Mile ETL

What do I mean by "exploratory analytics"? Let’s say we have just three tables of data, and we would like to explore their value or learn some facts from them. Instead of working toward one concrete goal, we will primarily try to discover what value this data holds! With exploratory analytics defined, here are the three tables in the database (Airports, Country list, and GDP — Gross Domestic Product):

Tables in a database

You can see that the coordinates are in a single column called coordinates instead of separate latitude and longitude columns, and that the value column in the GDP table is text rather than numeric. We can address these issues with the help of Last Mile ETL, directly inside the analytics.

Let’s connect the database to the analytics (if you are not familiar with GoodData, I encourage you to check the documentation). The result is the following:

The result of connecting the database to analytics (GoodData)

What you can see in the image above are datasets. We can convert a dataset to a so-called SQL dataset:

Converting dataset to SQL dataset

The SQL dataset gives us the possibility to write SQL queries that are executed directly in the database. Let’s just check what types of airports we have in the database:

SQL query from analytics
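The exploratory query in the screenshot can be sketched as follows. This is a minimal sketch run against a toy in-memory SQLite database; the table and column names (airports, type) and the type values are assumptions based on the screenshots:

```python
import sqlite3

# Toy stand-in for the connected database; the schema and type values
# are assumptions, not the exact ones from the article's database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE airports (ident TEXT, type TEXT, name TEXT)")
con.executemany("INSERT INTO airports VALUES (?, ?, ?)", [
    ("LKPR", "large_airport", "Prague Ruzyne Airport"),
    ("LKKB", "medium_airport", "Kbely Air Base"),
    ("XX01", "heliport", "City Heliport"),
    ("XX02", "closed", "Old Airfield"),
])

# Exploratory query: which airport types does the table contain?
types = [row[0] for row in con.execute(
    "SELECT DISTINCT type FROM airports ORDER BY type")]
print(types)  # ['closed', 'heliport', 'large_airport', 'medium_airport']
```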

It seems that the Airports table contains several types of airports, such as heliports or even closed airports. Let’s say that I am interested just in the medium and large airports — it’s not a problem at all. I don’t need to go to the ELT/ETL pipeline; I simply create a new SQL dataset called Airports transformed directly in the analytics, which I can do with the following SQL code:

New dataset Airports transformed

You can see that I can do it with quite a simple SQL query and the result is the following:

Airports transformed
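A minimal sketch of such a filter, again on a toy SQLite table; the type values 'medium_airport' and 'large_airport' are my assumption and may differ from the ones in the screenshot:

```python
import sqlite3

# Hypothetical schema and type values standing in for the real database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE airports (ident TEXT, type TEXT)")
con.executemany("INSERT INTO airports VALUES (?, ?)", [
    ("LKPR", "large_airport"),
    ("LKKB", "medium_airport"),
    ("XX01", "heliport"),
    ("XX02", "closed"),
])

# The 'Airports transformed' idea: keep only medium and large airports.
kept = con.execute(
    "SELECT ident FROM airports "
    "WHERE type IN ('medium_airport', 'large_airport') "
    "ORDER BY ident").fetchall()
print(kept)  # [('LKKB',), ('LKPR',)]
```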

Unfortunately, there is one more problem we need to solve. The coordinates are not in the format we need for analytics: we have to split this column into latitude and longitude to render a geo chart. Let’s edit the Airports transformed dataset:

Longitude and latitude columns
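As a sketch of what that edit does, here is the split expressed in SQL and run against a toy SQLite table. That the coordinates column stores text like "50.1008, 14.26" with latitude first is my assumption based on the screenshots:

```python
import sqlite3

# Minimal sketch: split a "latitude, longitude" text column into two
# numeric columns. Table/column names are assumptions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE airports (ident TEXT, coordinates TEXT)")
con.execute("INSERT INTO airports VALUES ('LKPR', '50.1008, 14.26')")

row = con.execute("""
    SELECT ident,
           CAST(substr(coordinates, 1, instr(coordinates, ',') - 1)
                AS REAL) AS latitude,
           CAST(trim(substr(coordinates, instr(coordinates, ',') + 1))
                AS REAL) AS longitude
    FROM airports
""").fetchone()
print(row)  # ('LKPR', 50.1008, 14.26)
```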

Now, with this little transformation, we can simply render a geo chart to see all the airports in the world:

Geo chart with transformed longitude and latitude columns

As this was rather simple, let’s explore something more valuable — what is the correlation between GDP and the number of airports in a country?

First, we can just list the values of the Airports and GDP tables:

Values of the GDP table

Values of the Airports table

The bad news is that the Airports table does not have the full country names (for example, Albania) but only the ISO codes of countries (for example, AL). The GDP table, on the other hand, does not have ISO codes but only the full names of countries. Sadly, there is one more issue, with the value column: the values contain commas, which is not a valid format for numbers. Let’s solve these problems. We can start with the Country list table, where both the full names and the ISO codes are found:

Values of the Country list table

Now, we just need to create a new SQL dataset for the GDP where we join the Country list to add the missing ISO code (or, as I named it, country code), remove the commas from the value column, and convert it to a numeric type:

SQL dataset GDP
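The cleanup can be sketched like this on a toy SQLite database; the table and column names (gdp, value, country_list, iso_code) are assumptions based on the screenshots:

```python
import sqlite3

# Hypothetical stand-in tables: GDP values stored as text with commas,
# and a country list mapping full names to ISO codes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE gdp (country TEXT, value TEXT)")
con.execute("CREATE TABLE country_list (name TEXT, iso_code TEXT)")
con.execute("INSERT INTO gdp VALUES ('Albania', '15,278')")
con.execute("INSERT INTO country_list VALUES ('Albania', 'AL')")

# Join to add the country code, strip commas, and cast to a number.
row = con.execute("""
    SELECT c.iso_code AS country_code,
           g.country,
           CAST(REPLACE(g.value, ',', '') AS NUMERIC) AS gdp_value
    FROM gdp g
    JOIN country_list c ON c.name = g.country
""").fetchone()
print(row)  # ('AL', 'Albania', 15278)
```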

Perfect, we have the GDP SQL dataset that we need! Also, to count the airports correctly, I will add an identifier to the Airports transformed SQL dataset:

Add “id” to SQL dataset Airports transformed

Both SQL datasets now contain country codes, so we can simply join them on these codes:

Join the ‘Airports transformed’ and ‘Gdp’ SQL datasets

The result is as follows. I also removed the Country list dataset as it does not have any value for us right now:

The ‘Airports transformed’ and ‘Gdp’ SQL datasets

Let’s move to the Analyze tab to find out what the correlation between GDP and the number of airports in a country is. First of all, we need to calculate the number of airports in each country (for more information, check how to create metrics in GoodData):

Create metric for the number of airports

The last step is to calculate the correlation between GDP and the number of airports in the country:

Create a metric for the correlation between GDP and the number of airports
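GoodData metrics handle this computation for us (see the metrics documentation); purely to illustrate the math behind this last step, here is the Pearson correlation coefficient computed by hand on made-up per-country figures:

```python
from math import sqrt

# Hypothetical per-country figures: airport counts and GDP values.
# These are illustrative numbers, not data from the article.
airports = [10, 50, 200, 5]
gdp = [30_000, 250_000, 1_500_000, 15_000]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt(
        (n * sxx - sx * sx) * (n * syy - sy * sy))

r = pearson(airports, gdp)
print(round(r, 3))  # close to 1: more airports go with higher GDP here
```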

Everything is transformed and computed; the result is as follows:

Result of the correlation between GDP and the number of airports

Conclusion

Together we explored what Last Mile ETL is and how you can benefit from it. In short, you can do everything within the analytics and, as a result, avoid the need for context-switching. On top of that, you benefit from the security implications: you do not need to grant access to the database in order to make simple changes within the analytics. A further plus is the ability to iterate, meaning that you can start with something simple and then move on to more complex results. Finally, thanks to GoodData's analytics-as-code approach, you can easily version everything in Git and thus apply software engineering best practices to the analytics.

