Basic time-related machine learning models

Introduction

When data carry time-related information, time features can be created to give the models additional information.

Because handling time series in machine learning is a broad topic, this article only aims to introduce basic ways to create time features for those models.

Type of data expected for this application

Transaction data, or any data similar to it, is expected to be the most common input for this application. Other kinds of data that carry a timestamp for each data point should also be suitable for this approach to some extent.

Considerations before starting: analyzing the problem and scope

Data with a time element can be presented as a time series, that is, a set of data points describing an entity and ordered by time index. A key characteristic of a time series is that each observation is expected to depend on, or be correlated with, the observations that precede it. In those cases, using time series models for forecasting is a straightforward way to use the data. Another way is to use feature engineering to transform the data into features that can be fed to a supervised machine learning model, which is the focus of this article.

Whether to use a time series model or to adapt a machine learning model depends on the situation. In some cases, domain knowledge or business requirements will influence this decision. It is better to analyze the problem first to see whether one or both types of model are needed.

Regardless of domain knowledge or business requirements, the decision should always take into account what the approach delivers in terms of accuracy and computation cost.

Basic methods

A first preprocessing step to obtain an initial set of time features: extracting time information from timestamps

The most straightforward thing to do is to extract basic time units, for instance hour, day, month, and year, into separate features. Another kind of information that can be extracted is the characteristic of the time period, such as the part of the day (morning, afternoon), whether the day falls on a weekend, or whether it is a holiday.

For some business requirements or domains, these initial features are already enough to check whether the observed values follow those factors. For example, suppose the data record the timestamps of customers visiting a shop and their purchases. Knowing at which hour, day, or month a customer tends to come and purchase allows follow-up actions to be taken to increase sales.
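As an illustration of this extraction step, here is a minimal sketch using pandas. The data frame, its column names (`timestamp`, `amount`), the helper `add_time_features`, and the cut points for the parts of the day are all assumptions made for the example, not something prescribed by this article. A holiday flag is left out because it would need a calendar for the relevant country or business.

```python
import pandas as pd

# Hypothetical shop-visit records; column names are illustrative only.
visits = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-06 09:15:00",  # a Friday morning
        "2023-01-07 14:30:00",  # a Saturday afternoon
        "2023-01-08 20:45:00",  # a Sunday evening
    ]),
    "amount": [12.5, 40.0, 7.25],
})


def add_time_features(df, ts_col="timestamp"):
    """Return a copy of df with basic time-unit and time-period features."""
    out = df.copy()
    ts = out[ts_col].dt
    # Basic time units.
    out["hour"] = ts.hour
    out["day"] = ts.day
    out["month"] = ts.month
    out["year"] = ts.year
    # Characteristics of the time period.
    out["day_of_week"] = ts.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = ts.dayofweek >= 5
    # Assumed cut points for parts of the day; adjust them to the domain.
    out["part_of_day"] = pd.cut(
        ts.hour,
        bins=[0, 6, 12, 18, 24],
        labels=["night", "morning", "afternoon", "evening"],
        right=False,  # intervals are [0, 6), [6, 12), ...
    )
    return out


features = add_time_features(visits)
```

Each extracted column can then be used directly as a model feature, or as a grouping key for the aggregation techniques.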

Aggregation techniques

For feature engineering on time data, a well-known and commonly used technique is to build aggregate features by taking statistics (variance, max, min, etc.) of the values grouped by a desired time unit: hour, day, month, and so on.
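A small sketch of grouping by a time unit, assuming pandas and a hypothetical transaction table with `timestamp` and `amount` columns. The day is used as the time unit here; any other unit would work the same way.

```python
import pandas as pd

# Hypothetical transactions; names and values are illustrative only.
tx = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-03-01 09:00", "2023-03-01 15:00",
        "2023-03-02 10:00", "2023-03-02 11:00", "2023-03-02 18:00",
    ]),
    "amount": [10.0, 30.0, 5.0, 15.0, 40.0],
})

# Truncate each timestamp to its calendar day, group on it,
# and take several statistics of the values in each group.
daily = (
    tx.groupby(tx["timestamp"].dt.floor("D"))["amount"]
      .agg(["count", "mean", "min", "max", "var"])
)
```

Note that `var` is the sample variance (pandas uses `ddof=1` by default), so it is undefined for groups with a single row.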

Apart from that, a time window can be defined and aggregates computed over a rolling or expanding window.

  • Rolling: the window has a fixed size. To predict the value of a data point at a given time, features are computed by aggregating over the preceding time steps that fall inside the window.
  • Expanding: the window covers the whole record of time steps before the data point.
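The two window types can be sketched with pandas as follows. The daily `sales` series is hypothetical, and shifting by one step before aggregating is an assumption added here so that the value being predicted is never part of its own feature.

```python
import pandas as pd

# Hypothetical daily sales indexed by date.
sales = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    index=pd.date_range("2023-03-01", periods=5, freq="D"),
    name="sales",
)

# shift(1) moves every value one step later, so each row only
# sees strictly earlier time steps when the window is aggregated.
past = sales.shift(1)

features = pd.DataFrame({
    "sales": sales,
    # Rolling: mean of the 3 previous days (NaN until 3 are available).
    "rolling_mean_3": past.rolling(window=3).mean(),
    # Expanding: mean of all previous days.
    "expanding_mean": past.expanding().mean(),
})
```

The first rows contain missing values because there is not enough history yet; they are usually dropped or imputed before training.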

There are also two aspects of aggregating:

  • Aggregating to create new features for the current data points. In this case, the model is considered to include the time series characteristic, meaning that a moment is likely to be related to other moments from the recent past.
  • Aggregating to create a new set of data points, with a corresponding new set of features, from the current ones. In this case, the number of data points fed to the model changes, and each new data point summarizes the information from a subset of the initial data points. As a result, the objects the model focuses on may shift, which ties back to the earlier point about analyzing the problem and scope first. If the data record only one entity, in other words contain a single time series, the new data points computed with this technique can summarize the other features' values over the chosen time unit. On the other hand, if several entities are observed in the data set, each new data point summarizes the information of one observed entity.
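The difference between the two aspects can be sketched with pandas. The table, the `customer` entity column, and the choice of statistics are hypothetical. The first part keeps every row and adds a feature built from that customer's earlier rows; the second part collapses the rows into one new data point per customer and day.

```python
import pandas as pd

# Hypothetical transactions for two customers.
tx = pd.DataFrame({
    "customer": ["a", "a", "a", "b", "b"],
    "timestamp": pd.to_datetime([
        "2023-03-01 09:00", "2023-03-01 15:00", "2023-03-02 10:00",
        "2023-03-01 12:00", "2023-03-02 12:00",
    ]),
    "amount": [10.0, 30.0, 20.0, 5.0, 15.0],
})
tx = tx.sort_values(["customer", "timestamp"]).reset_index(drop=True)

# Aspect 1: same rows, one extra feature per row.
# Mean of the customer's previous amounts (NaN for their first row).
tx["prev_mean"] = tx.groupby("customer")["amount"].transform(
    lambda s: s.shift(1).expanding().mean()
)

# Aspect 2: new data points, one per customer and day.
tx["date"] = tx["timestamp"].dt.floor("D")
per_day = tx.groupby(["customer", "date"], as_index=False).agg(
    n_tx=("amount", "size"),
    total=("amount", "sum"),
    mean=("amount", "mean"),
)
```

Here `tx` still has five rows, while `per_day` has four, so a model trained on `per_day` predicts something about a customer-day rather than about a single transaction.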

How to decide on the focus objects for the problem and on the approach is situational. For a fresh problem and fresh data with no specific requirement or prior domain knowledge, it is better to consider all of them for the model and run feature selection to see whether the created time features are of any value.

Dealing with hours of a day – Circular data

Some use cases need to focus on a specific time of day. Detecting fraudulent transactions is a good example. To find, for instance, the time at which a kind of behaviour is most often performed, the arithmetic mean may be misleading and is not a good representation. An important point to consider is that the hour of the day is circular data and should be represented on a circular axis with values between 0 and 2π. For example, the arithmetic mean of 23:00 and 01:00 is 12:00, whereas on the circle the two times are centred on midnight. To get a better representation of the mean, using the von Mises distribution to obtain a periodic mean is a suitable approach (Mishtert, 2019).

Validation for the model

Before building the model, a validation set needs to be set aside from the data. Usually, to avoid overfitting, the data are randomly shuffled and then divided into a training set and a validation set. For time-related data, however, this should not be done, because it can put past data in the validation set and future data in the training set, in other words using future data to predict the past.
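One simple way to respect time order is to sort by timestamp and hold out the most recent part of the data. The sketch below assumes pandas; the helper `time_based_split`, the column names, and the 20% validation share are illustrative choices rather than recommendations from this article.

```python
import pandas as pd


def time_based_split(df, ts_col, valid_fraction=0.2):
    """Split df so that no validation row is earlier than any training row."""
    ordered = df.sort_values(ts_col).reset_index(drop=True)
    # Hold out the most recent rows; always keep at least one.
    n_valid = max(1, round(len(ordered) * valid_fraction))
    cut = len(ordered) - n_valid
    return ordered.iloc[:cut], ordered.iloc[cut:]


# Hypothetical data, shuffled on purpose to show that the split
# does not depend on the original row order.
data = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=10, freq="D"),
    "target": range(10),
}).sample(frac=1.0, random_state=0)

train, valid = time_based_split(data, "timestamp")
```

With ten daily rows, the first eight days end up in `train` and the last two in `valid`, so the model is always evaluated on data that come after what it was trained on.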