Basic time-related machine learning models

Introduction

When data carries time-related information, time features can be created to give the models additional information.

Since how to treat time series in machine learning is a broad topic, this article only aims to introduce basic ways to create time features for those models.

Type of data that is expected for this application

Transaction data, or any data similar to it, is expected to be the most common type for this application. Other kinds of data that carry a timestamp for each data point should also suit this approach to some extent.

Before attempting: analyzing the problem and scope

Data with a time element can be presented as a time series, that is, a set of data points describing an entity and ordered by time index. One characteristic of time series is that each observation is expected to depend on, and be correlated with, the ones before it in the sequence. In those cases, using time series models for forecasting is a straightforward way to use the data. Another approach is to use feature engineering to transform the data into features that supervised machine learning models can use, which is the focus of this article.

Whether to use a time series model or to adapt a machine learning model depends on the situation. In some cases, domain knowledge or business requirements will influence this decision. It is better to analyze the problem first to see whether one or both types of models are needed.

Regardless of domain knowledge or business requirements, the decision should always consider how efficient the approach will be in terms of accuracy and computation cost.

Basic methods

A first preprocessing step for a first set of time features: extracting time information from the timestamp

The most straightforward thing to do is to extract basic time units, such as hour, day, month, and year, into separate features. Another kind of information that can be extracted is a characteristic of the time, such as the part of the day (morning, afternoon), whether it falls on a weekend, or whether it is a holiday.
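The extraction described above can be sketched with pandas. This is only a minimal illustration: the column names, the part-of-day boundaries, and the holiday list are assumptions made for the example, not something prescribed by this article.

```python
import pandas as pd

# Hypothetical transaction log; column names are illustrative only.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-03-05 08:15:00",  # a Friday morning
        "2021-03-06 14:40:00",  # a Saturday afternoon
        "2021-03-08 21:05:00",  # a Monday evening
    ]),
    "amount": [12.5, 40.0, 7.25],
})

# Basic time units as separate features.
ts = df["timestamp"].dt
df["hour"] = ts.hour
df["day"] = ts.day
df["month"] = ts.month
df["year"] = ts.year

# Characteristics of the time.
df["day_of_week"] = ts.dayofweek        # Monday=0 ... Sunday=6
df["is_weekend"] = ts.dayofweek >= 5

# Part of the day; the boundaries below are an assumption for this sketch.
df["part_of_day"] = pd.cut(
    df["hour"],
    bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"],
    right=False,                        # intervals are [0, 6), [6, 12), ...
)

# Holidays need an external calendar; this one-entry list is hypothetical.
holiday_dates = pd.to_datetime(["2021-03-08"])
df["is_holiday"] = df["timestamp"].dt.normalize().isin(holiday_dates)
```

In practice the holiday calendar and the part-of-day boundaries would come from the business context rather than being hard-coded.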

For some business requirements or domains, these initial features are already needed to see whether the observed values follow those factors. For example, suppose the data records the timestamps of customers visiting a shop and their purchases. The business needs to know at which hours, days, or months a customer tends to come and purchase so that follow-up actions can be taken to increase sales.

Aggregate techniques

In feature engineering for time data, a well-known and commonly used technique is to aggregate features by taking statistics (variance, max, min, etc.) of the values grouped by the desired time unit: hours, days, months, and so on.
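As a rough sketch of grouping by a time unit, the snippet below aggregates a hypothetical `amount` column per day with pandas. The data and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical transactions spread over two days.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-03-05 08:15", "2021-03-05 17:30",
        "2021-03-06 09:00", "2021-03-06 12:45", "2021-03-06 20:10",
    ]),
    "amount": [10.0, 30.0, 5.0, 15.0, 25.0],
})

# Group the values by day and take several statistics at once.
daily = (
    df.groupby(pd.Grouper(key="timestamp", freq="D"))["amount"]
      .agg(["min", "max", "mean", "var"])
)
print(daily)
```

Changing `freq` (for example to an hourly or monthly frequency) switches the time unit without touching the rest of the code.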

Apart from that, a time window can be defined and aggregates computed by rolling or expanding that window:

  • Rolling: the time window has a fixed size. To predict a value for a data point at a given time, features are computed by aggregating over the preceding time steps that fall within the window.
  • Expanding: the window covers the whole record of time steps before the data point.
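The two window types can be compared side by side on a small daily series. The sketch below assumes a hypothetical `sales` series and a window of three days. The `shift(1)` step is an added assumption so that each feature only uses strictly earlier days, which fits the prediction setting described above.

```python
import pandas as pd

# Hypothetical daily sales series.
daily = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    index=pd.date_range("2021-03-01", periods=5, freq="D"),
    name="sales",
)

# Shift by one step so the value being predicted is not part of its own feature.
past = daily.shift(1)

features = pd.DataFrame({
    "sales": daily,
    # Rolling: mean over at most the 3 previous days.
    "rolling_mean_3d": past.rolling(window=3, min_periods=1).mean(),
    # Expanding: mean over all previous days.
    "expanding_mean": past.expanding(min_periods=1).mean(),
})
print(features)
```

The first row has no history, so both features are missing there; how to fill or drop such rows is a separate modeling choice.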

There are also two ways to use the aggregates:

  • Aggregating to create new features for the current data points. In this case, the model is considered to include the time series characteristic, meaning that a moment is likely to be related to other moments from the recent past.
  • Aggregating to create a new set of data points, with a corresponding new set of features, from the current ones. Here the number of data points given to the model changes, and each new data point summarizes information from a subset of the initial data points. As a result, the objects the model focuses on may shift, as mentioned in the section on analyzing the problem and scope. If the data only records one entity, in other words it contains a single time series, the new data points can summarize the other features' values over the chosen time unit. If, on the other hand, several entities are observed in the data set, each new data point summarizes the information of one observed entity.
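The difference between the two can be illustrated as follows. This is a sketch under assumptions: the `customer`, `timestamp`, and `amount` columns and the chosen statistics are hypothetical.

```python
import pandas as pd

# Hypothetical transactions for two customers.
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "timestamp": pd.to_datetime([
        "2021-03-05 08:15", "2021-03-05 17:30",
        "2021-03-05 09:00", "2021-03-06 12:45", "2021-03-06 20:10",
    ]),
    "amount": [10.0, 30.0, 5.0, 15.0, 25.0],
})
tx = tx.sort_values(["customer", "timestamp"]).reset_index(drop=True)

# First way: keep every row and add a feature built from the customer's past.
# Here: the mean of all earlier amounts of the same customer.
tx["prev_mean_amount"] = tx.groupby("customer")["amount"].transform(
    lambda s: s.shift(1).expanding().mean()
)

# Second way: build new data points, one row per customer per day.
tx["date"] = tx["timestamp"].dt.floor("D")
per_customer_day = tx.groupby(["customer", "date"], as_index=False).agg(
    n_tx=("amount", "count"),
    total=("amount", "sum"),
    mean_amount=("amount", "mean"),
)
print(tx)
print(per_customer_day)
```

The first table still has five rows, one per transaction, while the second has three rows, each summarizing one customer on one day.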

How to decide on the focus objects for the problem and on the approach is situational. For a fresh problem and fresh data with no specific requirements or prior domain knowledge, it is better to consider all of them for the model and run feature selection to see whether the created time features are of any value.

Dealing with hours of a day – Circular data

For some needs, the focus has to be on a specific time of the day. Detecting fraudulent transactions is a good example. To find something like the most frequent time at which a kind of behavior occurs, the arithmetic mean may be misleading and is not a good representation. For example, the arithmetic mean of 23:00 and 01:00 is 12:00, although both times are close to midnight. The important point is that the hour of the day is circular data, so it should be represented on a circular axis with values between 0 and 2π. For a better representation of the mean, using the von Mises distribution to obtain a periodic mean is a suitable approach (Mishtert, 2019).
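A minimal sketch of a periodic mean is shown below. It maps hours to angles on the circle and takes the direction of the average unit vector, which is also the maximum-likelihood estimate of the location parameter of a von Mises distribution. The function name `circular_mean_hour` is made up for this example, and this is not necessarily the exact procedure used in the cited reference.

```python
import numpy as np

def circular_mean_hour(hours):
    """Mean hour of day on a 24-hour circle (hypothetical helper)."""
    # Map hours in [0, 24) to angles in [0, 2*pi).
    angles = np.asarray(hours, dtype=float) * 2.0 * np.pi / 24.0
    # Direction of the average unit vector.
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    # Map the angle back to an hour in [0, 24).
    return (mean_angle % (2.0 * np.pi)) * 24.0 / (2.0 * np.pi)

hours = [23, 0, 1, 2]                  # events clustered around midnight
arithmetic = float(np.mean(hours))     # 6.5, far from every observation
circular = circular_mean_hour(hours)   # about 0.5, i.e. roughly 00:30
print(arithmetic, circular)
```

The arithmetic mean lands in the early morning, where no event occurred, while the circular mean stays inside the cluster around midnight.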

Validation for the model

Before building the model, a validation set needs to be selected from the data. In the usual case, the data is randomly shuffled and then divided into a training set and a validation set to avoid overfitting. For time-related data, however, this should not be done, because it risks putting past data in the validation set and future data in the training set, in other words using future data to predict the past.
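One simple way to respect the time order is to sort by timestamp and cut at a chosen point, as in the sketch below. The 80/20 ratio and the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data set, deliberately shuffled to mimic unordered input.
df = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=10, freq="D"),
    "target": list(range(10)),
}).sample(frac=1.0, random_state=0)

# Sort by time, then split: the earliest 80% for training, the rest for validation.
df = df.sort_values("timestamp").reset_index(drop=True)
split_at = int(len(df) * 0.8)
train = df.iloc[:split_at]
valid = df.iloc[split_at:]

print(train["timestamp"].max(), valid["timestamp"].min())
```

Every training row is then earlier than every validation row, so the model is never evaluated on the past using information from the future.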