Time-series is a kind of problem that nearly every Data Scientist/ML Engineer will encounter in the span of their career, often sooner than they think. So, it's an important concept to understand inside-out.
You see, time-series is a type of data that is sampled along a time-based dimension like days, months, or years. We call this data "dynamic" because it is indexed by a DateTime attribute, which gives the data an implicit order. Don't get me wrong, static data can still have an attribute that's a DateTime value, but the data will not be sampled or indexed based on that attribute.
When we apply machine learning algorithms to time-series data and make predictions for future DateTime values, e.g. predicting total sales for February given data for the previous 5 years, or predicting the weather for a certain day given several years of weather data, we call these predictions forecasting. This contrasts with what we deal with when working on static data.
In this blog we’re going to talk about:
1. How is time-series prediction, i.e. forecasting, different from static machine learning predictions?
2. Best practices while working on time-series forecasting
Time-series data vs static ML
So far we’ve established a baseline on how we should perceive time-series data as compared to static data. In this section, we are going to talk about the difference in approaching both of these types of data.
Note: For the sake of simplicity we assume data to be continuous in all cases.
Imputation of missing data
Imputation of missing data is a key preprocessing step in any tabular machine learning project. In static data, you can use techniques like Simple Imputation, where you fill missing data with the mean, median, or mode depending on the nature of the attribute, or more sophisticated methods like Nearest Neighbour imputation, where you employ a KNN algorithm to estimate the missing values.
However, In time-series, missing data looks something like this:
You have these visible gaps in the data that can’t be logically filled with any of the imputation strategies that can be used on static data. Let’s discuss some techniques that can be useful:
- Why not fill it with the mean? A static mean doesn't do us any good here, since it makes no sense to fill your missing values by taking cues from the future. In the plot above, it's quite intuitive that the gaps between 2001-2003 can logically be filled only with historical, i.e. pre-2001, data.
In time-series data, we use something called a rolling mean, also known as a moving average or window mean: the mean of the values within a predefined window, e.g. a 7-day window or a 1-month window. We can use this moving average to fill in any missing gaps in our time-series data.
Note: Stationarity plays an important role when working with averages in time-series data.
- Interpolations are quite popular: Utilizing the implicit order of time-series data, interpolation is often the go-to method for reconstructing the missing parts of a series. In brief, interpolation uses the values present before and after a missing point to estimate it. For e.g., linear interpolation fits a straight line between the two neighbouring points and reads the missing value off that line.
There are many types of interpolation available, like Linear, Spline, and Stineman. Their implementations are available in almost all major tools, like Python's pandas interpolate() function and R's imputeTS package.
Interpolation can be used on static data as well, but it isn't widely used there since static data offers more sophisticated imputation techniques (some of which are explained above).
- Understanding the business use-case: This isn't a technical method for dealing with missing data, but I feel it's the most underrated technique, and it can give results quickly. It involves understanding the problem at hand and then deciding which method would work best. After all, SOTA might not be SOTA on your use case. For e.g., sales data should be treated differently than, say, stocks data, since each has a different set of market metrics.
By the way, this technique is common between static as well as time-series data.
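To make the two imputation ideas above concrete, here is a minimal sketch in pandas on a hypothetical monthly sales series (the data and column choices are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series with gaps; the index is the time dimension.
idx = pd.date_range("2000-01-01", periods=12, freq="MS")
sales = pd.Series([10.0, 12.0, 11.0, 13.0, np.nan, np.nan,
                   14.0, 15.0, np.nan, 16.0, 18.0, 17.0], index=idx)

# Option 1: fill gaps with a 3-period rolling (moving-average) mean;
# min_periods=1 lets the window produce a value even with few observations.
rolling = sales.rolling(window=3, min_periods=1).mean()
filled_rolling = sales.fillna(rolling)

# Option 2: linear interpolation draws a straight line between the
# neighbouring observed points and reads each missing value off that line.
filled_interp = sales.interpolate(method="linear")
```

Which option is appropriate depends on the series: a rolling mean leans on recent history, while interpolation uses the points on both sides of the gap.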
Feature engineering in time-series model
Working with features is another major step that differentiates time-series data from static. Feature engineering is a broad term that encapsulates a variety of standard techniques and ad-hoc methods. Features are handled differently in time-series data as compared to static data.
Note: One might argue that imputation comes under Feature engineering, which is not wrong but I wanted to explain this under a separate section to give you a better idea.
In static data, it’s highly subjective on the kind of problem at hand but a few standard techniques include Feature Transformations, Scaling, Compression, Normalization, Encoding, etc.
Time-series data can have attributes other than time-based features. If those attributes are also time-based, the resulting time-series is multivariate; if they are static, the result is a univariate time-series with static features. Non-time-based features can use methods from the static techniques, as long as applying them doesn't compromise the integrity of the data.
All time-based components have a definitive pattern that can be devised using some standard techniques. Let’s look at some of the techniques that prove useful while working with time-based features.
Time-series components: the main characteristics of time-series data
For starters, every time-series data has time-series components. We do an STL decomposition (Seasonal and Trend decomposition using Loess) to extract some of these components. Let’s take a look at what each of these means.
- Trend: Time-series data shows a trend when its value changes consistently with time: an increasing value shows a positive trend and a decreasing value, a negative trend. In the plot above, you can see a positive, increasing trend.
- Seasonality: Seasonality refers to a property of time-series that displays periodical patterns repeating at a constant frequency. In the example above, we can observe a seasonal component with the frequency being 12 months, which broadly means that the periodical pattern repeats every twelve months.
- Remainder: After extracting Trend and Seasonality from the data, the remaining is what we call remainder (error) or Residual. This actually helps in anomaly detection in time-series.
- Cycle: Time-series data is termed cyclical when there are trends with no set repetitions or seasonality.
- Stationarity: Time-series data is stationary when its statistical properties do not change over time, i.e. it has a constant mean and standard deviation, and its autocovariance is independent of time.
These components when extracted usually form the basis of the next steps in Feature engineering in time-series data. To put this in perspective of static data, STL decomposition is the descriptive part of the time-series world. There are a few more time-series specific metrics subjective to the type of time-series data like dummy variables when working on stock data.
Time-series components are highly important for analyzing the time-series variable of interest in order to understand its behavior, what patterns it has, and to be able to choose and fit an appropriate time-series model.
Analysis and visualization in time-series models
Time-series data analysis comes with a different blueprint than a static data analysis. As discussed in the previous section, time-series analysis starts with answering questions like:
- Does this data have a trend?
- Does this data contain any sort of pattern or seasonality?
- Is the data stationary or non-stationary?
Ideally, one should proceed with further analysis only after working through the answers to the above questions. Similarly, static data analysis has procedures like Descriptive, Predictive, and Prescriptive analytics. While Descriptive analysis is standard in all problem statements, Predictive and Prescriptive are subjective. These procedures are common to both time-series and static ML. However, many metrics used inside Descriptive, Predictive, and Prescriptive analytics are used differently, one of which is correlation.
Contrastingly, in time-series data we use something called Autocorrelation and Partial-Autocorrelation. Autocorrelation and Partial-Autocorrelation are both measures of association between current and past series values and indicate which past series values are most useful in predicting future values.
While the approach to analysis is somewhat different between the two data kinds, the core idea is the same: it depends largely on the problem statement. E.g. stocks and weather data are both time-series, but you can use stock data to predict future values and weather data to study seasonal patterns. Similarly, you can use loan data to analyze the patterns of borrowers, or to check whether a new borrower will default on their loan repayment.
Visualization is an integral part of any analysis. The differentiating question isn't what you should visualize but how you should visualize it.
You see, a time-series' time-based features should be visualized with time on one axis of the plot, while non-time-based features depend on the strategy employed for the problem.
Time-series forecasting vs static ML predictions
In the previous section, we saw the difference between the two data kinds pertaining to the initial steps and also the difference in approaches while comparing the two. In this section, we’re going to explore the next steps i.e. prediction or in terms of time-series, forecasting.
The choice of algorithms for time-series data is completely different from that for static data. An algorithm that can extrapolate patterns and encapsulate the time-series components outside the domain of the training data can be considered a time-series algorithm.
Now, most static machine learning algorithms, like Linear regression or SVMs, do not have this capability, as they generalize over the training space for any new prediction. They simply can't exhibit the behaviour we discussed above.
Some common algorithms used for time-series forecasting:
- ARIMA: It stands for Autoregressive-Integrated-Moving Average. It utilizes the combination of Autoregressive and moving averages to predict future values. Read more about it here.
- EWMA/Exponential Smoothing: The Exponentially weighted moving average, or Exponential Smoothing, serves as an upgrade to moving averages. It reduces the lag effect shown by moving averages by putting more weight on values that occurred more recently. Read more about it here.
- Dynamic Regression Models: This algorithm also takes other miscellaneous information into account such as public holidays, changes in law, etc. Read more about it here.
- Prophet: Prophet is an open-source library released by Facebook's Core Data Science team, designed for automatic forecasting of univariate time-series data.
- LSTM: Long Short-Term Memory (LSTM) is a type of recurrent neural network that can learn the order dependence between items in a sequence. It is often used to solve time series forecasting problems.
This list is certainly not exhaustive. Many complex models or approaches, such as Generalized Autoregressive Conditional Heteroskedasticity (GARCH) and Bayesian structural time-series (BSTS), may be very useful in some cases. There are also neural network models like Neural Network Autoregression (NNAR) that can be applied to time-series, which use lagged predictors and can handle features.
Evaluation metrics in time-series models
Forecasting evaluation involves metrics like scale-dependent errors such as Mean squared error (MSE) and Root mean squared error (RMSE), percentage errors such as Mean absolute percentage error (MAPE), and scaled errors such as Mean absolute scaled error (MASE), to mention a few. These metrics are actually similar to static ML metrics.
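These error metrics can be computed directly with numpy; a toy sketch on invented actual/forecast values (note that a proper MASE scales by the naive one-step error on the *training* series, while this sketch uses the evaluation window for brevity):

```python
import numpy as np

# Invented actuals and forecasts for illustration.
actual = np.array([100.0, 110.0, 120.0, 130.0])
pred = np.array([102.0, 108.0, 123.0, 128.0])

mse = np.mean((actual - pred) ** 2)                     # scale-dependent
rmse = np.sqrt(mse)                                     # scale-dependent
mape = np.mean(np.abs((actual - pred) / actual)) * 100  # percentage error

# MASE: mean absolute error scaled by the naive one-step forecast error.
naive_error = np.mean(np.abs(np.diff(actual)))
mase = np.mean(np.abs(actual - pred)) / naive_error
```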
However, while evaluation metrics help determine how close the fitted values are to the actual ones, they do not evaluate whether the model properly fits the time series. For this, we do something called Residual Diagnostics. Read about it in detail here.
Dealing with outliers/anomalies
Outliers plague almost every real-world dataset. Time-series and static data take two completely different routes, from the identification to the handling of outliers/anomalies.
- For identification in static data, we use techniques ranging from Z-scores and boxplot analysis to advanced statistical techniques like hypothesis testing.
- In time-series, we use a range of techniques and algorithms, from STL analysis to algorithms like Isolation Forests. You can read about it in more detail here.
- We use methods like Trimming, Quantile based flooring and capping, and Mean/Median Imputation in static data depending on the capacity and problem statement at hand.
- In time-series data, there are a number of options that can be highly subjective to your use case. A few of them are:
- Using replacement: We can compute values that can replace the outlier and will make a better fit for the data. tsclean() function in R will fit a robust trend using loess (for non-seasonal series), or robust trend and seasonal components using STL (for seasonal series) to compute the replacement value.
- Studying the business: This is not a technical approach but an ad-hoc one. You see, identifying and studying the business behind the problem can really help deal with the outlier. Whether or not it is a wise choice to drop it or replace it will come from first studying it in-out.
Best practices while working on time-series data and forecasting
Although there are no fixed steps to be followed while working on time-series and forecasting, there are still some good practices one can employ to get optimal results.
- No One-size-fits-all: No forecasting method performs best for all time-series. You need to understand the problem statement, type of features, and goals before starting to work on forecasting. Some domains you can select algorithms from depending on your need (compute + goals):
- statistical models,
- machine learning,
- and hybrid methods.
- Feature selection: The selection of features has an impact on the resulting forecast error; in other words, the selection has to be done carefully. There are different methods, like correlation analysis (also known as filter methods), wrappers (i.e., adding or removing features iteratively), and embedded methods (i.e., the selection is already part of the forecasting method).
- Countering Overfitting: During the training of the model, there is a risk of overfitting, as the best-fitting model does not always lead to the best forecast. To counteract overfitting, the historical data can be split into train and test sets in a way that respects the temporal order, and internal validations can be conducted.
- Data preprocessing: Data should be first analyzed and preprocessed to make it clean for forecasting. Data can contain missing values and as most forecasting methods can’t handle missing values, values have to be imputed.
- Keep the Curse of Dimensionality in mind: When models in training are presented with a lot of dimensions and a lot of potential factors, they can encounter the Curse of Dimensionality, which says that as we have a finite amount of training data and we add more dimensions to that data, we start having diminishing returns in terms of accuracy.
- Working with Seasonal Data Patterns: If there is seasonality in time series data, multiple cycles that include that seasonal pattern are required to make a proper forecast. Otherwise, there is no way for the model to learn the pattern.
- Deal with Anomalies before moving to Forecast: Anomalies can create a huge bias in model learning, and more often than not the results will be subpar.
- Studying the problem statement carefully: This is probably the most underrated practice especially when you’re just starting to work on a time-series problem. Identify your time-based and non-time-based features, study the data first before moving to any standard techniques.
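The overfitting point above hinges on splitting time-series data without leaking the future into training; a minimal sketch with scikit-learn's `TimeSeriesSplit`:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(24).reshape(-1, 1)  # 24 time-ordered observations

# Each split trains on the past and validates on the block that follows,
# so no future information leaks into training.
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    assert train_idx.max() < test_idx.min()  # test always comes after train
```

Contrast this with a random shuffle split, which would mix future points into the training fold and make validation scores look deceptively good.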
You’ve reached the end!
We've now seen the differences in structure and approach between time-series and static data. The sections in this blog are by no means exhaustive; there can be more differences as we move to finer granularities of specific data problems. Here are some of my favorite resources you can refer to while studying time-series: