Data Analytics and Machine Learning Integration


Summary
Data is one of the most valuable assets a business possesses, and combining analytics with machine learning unlocks even greater value. This article explains how organizations can move beyond describing the past to predicting the future while keeping workflows simple and accessible.
Using the example of stock price prediction, this article demonstrates how data preparation, metric definition, and automated modeling can work together. Readers will see how this approach enables faster experimentation, more accurate forecasts, and a practical path toward advanced analytics.
The Growing Role of Machine Learning
The rise of machine learning applications across industries is indisputable, as nearly every company now looks for ways to leverage technology to accelerate growth. The same is true for data analytics, which has become a foundation for understanding business performance. Together, these practices address the universal need to know what works, what does not, and what might work in the future.
Despite their potential, analytics and machine learning are often complex to implement. Building systems for each typically requires separate teams with specialized skills, making the process costly and time intensive. With the right technologies, businesses can bridge the gap, combining analytics and machine learning in a practical and cost-effective way. The following stock price prediction example illustrates how even advanced use cases can be achieved without relying on large engineering or data science teams.
Stock Price Prediction with Data Analytics and Machine Learning
The best way to show how to build a data analytics and machine learning system is to walk through a real use case. As the title suggests, the use case is stock price prediction. If you have read anything about stocks, you know that predicting stock prices is notoriously difficult, perhaps even impossible, because countless variables can influence the price. So why bother with something like this at all? The example I will show you is deliberately simple (please note that it is for demo purposes only), but at the end of the article I want to share my idea of how the whole stock price prediction/analysis might be improved. Now, let's move to the next section with an overview of the example's architecture.
Overview of Architecture
You can imagine the whole architecture as a set of four key parts. Every part is responsible just for one thing, and data flows from the beginning (extract and load) to the end (machine learning).
The solution I built for this article runs only locally on my computer, but it can be easily put, for example, into a CI/CD pipeline — if you are interested in this approach, you can check my article How to Automate Data Analytics Using CI/CD.
Part 1: Extract and Load
The extract part is done with the help of RapidAPI, which gives you access to thousands of APIs with easy management. The best part of RapidAPI is that you can test individual APIs directly in the browser, which makes it easy to find the API that best fits your needs. The load part (loading data into a PostgreSQL database) is done by a Python script. The result of this part is the schema input_stage with a data column of type JSON (the API response is JSON).
Part 2: Transform
The data is loaded into the PostgreSQL database as a JSON column, and that is not something you want to connect to analytics directly, because the individual items stay hidden inside the JSON. Therefore the data needs to be transformed, and with dbt that is quite easy. Simply put, dbt executes SQL script(s) against your database schemas and transforms them into the desired output. Another advantage is that you can write tests and documentation, which can be very helpful if you want to build a bigger system. The result of this part is the schema output_stage with transformed data ready for analytics.
Part 3: Analytics
Once the data is extracted, loaded, and transformed, it can be consumed by analytics. GoodData makes it easy to create metrics using MAQL (a proprietary language for metric creation) and to prepare reports that are used to train an ML model. Another advantage is that GoodData is an API-first platform, so you can fetch data from it programmatically, either through the API directly or through the GoodData Python SDK, which simplifies the process. The result of this part is a set of reports with metrics used to train an ML model.
Part 4: Machine Learning
PyCaret is an open-source machine learning library in Python that automates machine learning workflows. The library significantly simplifies the application of machine learning: instead of writing a thousand lines of code that require deep domain knowledge, you write just a few lines, and being a professional data scientist is not a prerequisite. In some ways it is comparable to AutoML. According to the PyCaret documentation, it focuses on the emerging role of citizen data scientists: power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise.
Example of Implementation
The following section describes key parts of the implementation. You can find the whole example in the repository gooddata-and-ml — feel free to try it on your own! I added notes to README.md on how to start.
Please note that to run the whole example successfully, you will need a database (such as PostgreSQL) and a GoodData account; GoodData Cloud comes with a 30-day trial.
Step 1: Extract and Load
To train an ML model, you need historical data. I used the Alpha Vantage API to get historical data on the MSFT stock. The following script needs the RapidAPI key and host (as mentioned above, RapidAPI helps with managing the API). If the API call succeeds, the get_data function returns the data, which is then loaded into the PostgreSQL database (into the schema input_stage).
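The full script lives in the example repository; the sketch below only illustrates the shape of such an extract-and-load step. The endpoint path, table name, and environment variables are assumptions for the sake of the example, not the exact code from the repo:

```python
import json
import os

import psycopg2
import requests

# Illustrative endpoint and credentials; the real script in the repo may differ.
RAPIDAPI_URL = "https://alpha-vantage.p.rapidapi.com/query"
HEADERS = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "alpha-vantage.p.rapidapi.com",
}


def get_data(symbol: str = "MSFT") -> dict:
    """Fetch daily historical prices for the given stock symbol."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "outputsize": "full"}
    response = requests.get(RAPIDAPI_URL, headers=HEADERS, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def load_data(payload: dict) -> None:
    """Store the raw JSON response in the input_stage schema."""
    with psycopg2.connect(os.environ["POSTGRES_DSN"]) as conn:
        with conn.cursor() as cur:
            cur.execute("CREATE SCHEMA IF NOT EXISTS input_stage")
            cur.execute("CREATE TABLE IF NOT EXISTS input_stage.stock_data (data json)")
            cur.execute(
                "INSERT INTO input_stage.stock_data (data) VALUES (%s)",
                (json.dumps(payload),),
            )


if __name__ == "__main__":
    load_data(get_data("MSFT"))
```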
Step 2: Transform
From the previous step, the data is loaded into input_stage and can be transformed. As discussed in the architecture overview, dbt transforms the data using an SQL script. The transformation of the loaded stock data is sketched below; the important part is extracting the values from the JSON column and converting them into individual database columns.
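In the repository this logic lives in a dbt model. As a rough sketch, and assuming the Alpha Vantage response shape (a "Time Series (Daily)" object keyed by date), the model might look something like this; the table, source, and column names are illustrative, not the exact model from the repo:

```sql
-- Illustrative dbt model, e.g. models/output_stage/stock_data.sql
-- (assumes a matching source definition for input_stage.stock_data).
-- Unnests the raw JSON response into one row per trading day.
with source as (
    select data from {{ source('input_stage', 'stock_data') }}
)

select
    kv.key::date                        as "date",
    (kv.value ->> '1. open')::numeric   as "open",
    (kv.value ->> '2. high')::numeric   as "high",
    (kv.value ->> '3. low')::numeric    as "low",
    (kv.value ->> '4. close')::numeric  as "close",
    (kv.value ->> '5. volume')::numeric as "volume"
from source,
     json_each(source.data -> 'Time Series (Daily)') as kv
```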
Step 3: Analytics
The most important step is the metric definition using MAQL. For the demonstration, I computed a simple moving average (SMA) on the fact close (the price of the stock when the stock market closed). The formula for SMA is as follows:

SMA = (A1 + A2 + ... + An) / n

where:
An = the price of the stock at period n
n = the number of total periods
Investors use the SMA and similar metrics as technical indicators, which can help determine whether a stock price is likely to keep growing or start declining. It is computed by summing a range of prices and dividing by the number of periods in that range. I defined the SMA metric in MAQL with a 20-day window.
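MAQL syntax aside, the computation itself is just a rolling average. As a quick illustration of the same 20-day SMA in plain Python, here is a small pandas sketch (the data and column names are made up for the example):

```python
import pandas as pd

# Illustrative frame with one row per trading day; values and names are assumptions.
prices = pd.DataFrame(
    {"date": pd.date_range("2022-01-03", periods=30, freq="B"), "close": range(300, 330)}
)

# 20-day simple moving average of the close price, matching the metric's window.
prices["sma_20"] = prices["close"].rolling(window=20).mean()

print(prices.tail())
```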
The ML model will not be trained on just this one metric but on a whole report. I created the report using GoodData Analytics Designer with a simple drag-and-drop experience.
Step 4: Machine Learning
The last step is to get the data from GoodData and train an ML model. Thanks to the GoodData Python SDK, it takes just a few lines of code, and the same applies to the ML model thanks to PyCaret. The ML part is done by two function calls: setup and compare_models. The setup function initializes the training environment, and compare_models trains and evaluates all the estimators available in the model library using cross-validation. The output of compare_models is a scoring grid with average cross-validated scores. Once training is done, you can call predict_model, which predicts the target value (in this case, the close price of the stock); see the next section for a demonstration.
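As a minimal sketch of how these pieces fit together, assuming the gooddata-pandas package and PyCaret's regression module (the host, token, workspace, and insight identifiers are placeholders, and the exact SDK method name may differ between versions):

```python
from gooddata_pandas import GoodPandas
from pycaret.regression import compare_models, setup

# Placeholders: replace with your GoodData host, API token, and identifiers.
gp = GoodPandas(host="https://<your-gooddata-host>", token="<api-token>")
frames = gp.data_frames("<workspace-id>")

# Fetch the report (insight) built in Analytics Designer as a pandas DataFrame.
# Method name follows gooddata-pandas as of writing; check the SDK docs for your version.
df = frames.for_insight("<insight-id>")
print(df)

# Initialize the PyCaret training environment with the close price as the target,
# then train and cross-validate the available regressors and keep the best one.
setup(data=df, target="close")
best_model = compare_models()
```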
Demo Time
The demonstration covers just the last step (machine learning). If you run the machine learning script mentioned above, the first thing you will see is the data fetched from GoodData printed to the console. Immediately after that, PyCaret infers the data types and asks whether you want to continue. If everything looks right, you confirm, and PyCaret trains the models and picks the best one.
To generate a prediction, the following code needs to be executed:
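A minimal sketch, continuing from the training snippet above (the close column name is an assumption about the report's structure):

```python
from pycaret.regression import predict_model

# Predict the close price with the best model found by compare_models.
# In PyCaret 2.x the prediction is returned in a column named "Label".
predictions = predict_model(best_model, data=df)
print(predictions[["close", "Label"]].tail())
```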
In the resulting table, the Label column contains the predicted value.
That’s it! With PyCaret it is very easy to start with machine learning!
Conclusion
At the beginning of the article, I teased an idea for an improvement that I think could be pretty cool. This article demonstrated a simple use case; now imagine adding data from multiple other APIs and data sources, for example news (Yahoo Finance, Bloomberg, etc.), Twitter, and LinkedIn. It is well known that news and sentiment can influence stock prices, which is great because these AutoML tools also offer sentiment analysis. If you combine all this data, train multiple models on top of it, and display the results in analytics, you end up with a handy helper for investing in stocks. What do you think about it?
Thank you for reading! I want to hear your opinion; let us know in the comments, or join the GoodData community Slack to discuss this exciting topic. And do not forget to follow GoodData on Medium so you do not miss any new content. Thanks!
If you are interested in GoodData.CN, please contact us. Alternatively, sign up for a trial version of GoodData Cloud: https://www.gooddata.com/trial/
FAQs About Data Analytics and Machine Learning Integration
How do data analytics and machine learning complement each other?
Analytics helps companies understand what is happening today, while machine learning predicts what may happen in the future. Together, they create a more complete decision-making framework.

Why define metrics in a governed analytics environment before training models?
Metrics created in a governed analytics environment ensure consistent and reliable calculations. When these are used as features in machine learning models, predictions are based on trustworthy data.

How does automated machine learning help analytics teams?
Automation handles repetitive steps such as model training, evaluation, and comparison. This makes it easier for analytics teams to experiment with different models and reach useful results more quickly.

Can external data sources improve predictions?
Yes. Incorporating data such as news, sentiment, or market signals can enrich models and increase their accuracy. Adding external sources allows businesses to capture a wider range of factors that influence outcomes.

Do you need a large data science team to get started?
Modern tools such as automated modeling frameworks and integrated analytics platforms reduce the need for advanced coding or multiple specialized roles. This allows smaller teams to take advantage of predictive capabilities.