Whether on-premises or in the cloud, your data provides a link to the past and a glimpse into the future. Why did you lose past customers? Which current customers should you pay more attention to? Where are your new customers going to come from? In this post, I’ll talk about the time series capabilities of Teradata Vantage and its Machine Learning Engine, and how they can help you turn 100% of your data into answers that your business can use to pave a path to the future.
What is time series data?
A time series consists of repeated measurements of a value over time and is commonly seen in areas like revenue forecasting, stock prices, industrial process monitoring, public health, urban growth planning and even audio and video signal processing. The trick with time series data is not only to recognize that you have it, but to analyze it accurately and then integrate the results into your future business decisions.
Data prep functions
Teradata Vantage includes several time series data preparation functions. These functions support very simple applications such as interpolation to fill in gaps in a time series, as well as more sophisticated requirements such as distributing values across a time interval based on different probability distributions.
- Burst function – redistributes the values reported for each original time interval across new, user-specified intervals, such as daily, weekly, monthly or quarterly. The result is a single time series at the specified interval, with all reported values combined and divided appropriately across the new intervals.
- Interpolator function – fills in missing values in a time series, using either interpolation or aggregation. Interpolation estimates missing values using a mathematical equation based on the nearest known values. Aggregation combines known values within a sliding window to produce an aggregate value – the minimum, maximum, mean, mode or sum of the values in the sliding window.
- SeriesSplitter function – separates a time series into a set of shorter time series, keeping the original ordering and making no other changes. Each new, shorter time series is in a separate partition of the output table.
- SAX (Symbolic Aggregate approXimation) function – creates a simpler representation of a time series and can be used directly for analysis or as input to another time series function. SAX divides a time series into shorter intervals, and replaces each interval with a symbol (or character) that represents the average value across that interval.
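To make two of these ideas concrete, here is a minimal Python sketch of linear interpolation and of SAX. It is an illustration only, not the Vantage implementation or its syntax: the function names `interpolate_linear` and `sax` are hypothetical, and the SAX step assumes the commonly described formulation (z-normalize the series, average each segment, then map each average to a letter using breakpoints from the standard normal distribution).

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def interpolate_linear(values):
    """Fill None gaps with a straight line between the nearest known values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled  # gaps before the first or after the last known value stay None


def sax(values, n_segments, alphabet="abcd"):
    """Replace each of n_segments equal-length intervals with one symbol."""
    if len(values) % n_segments != 0:
        raise ValueError("this sketch assumes the length divides evenly")
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series cannot be z-normalized")
    z = [(v - mu) / sigma for v in values]
    width = len(z) // n_segments
    # Average value across each interval
    segment_means = [mean(z[i:i + width]) for i in range(0, len(z), width)]
    # Breakpoints that split the standard normal curve into equal-probability bands
    cuts = [NormalDist().inv_cdf(k / len(alphabet)) for k in range(1, len(alphabet))]
    return "".join(alphabet[bisect_right(cuts, m)] for m in segment_means)


print(interpolate_linear([1.0, None, 3.0, None, None, 6.0]))
print(sax([1, 1, 2, 2, 9, 9, 10, 10], n_segments=4))
```

The SAX output is a short string, which is why it works well as a compact input to other time series functions: comparing strings is much cheaper than comparing the raw measurements.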
Time series clustering and classification
Clustering and classification of time series are important for applications such as monitoring process and sensor data. Is a process falling out of spec? Does a particular machine require maintenance? It’s also useful for understanding customer behavior. What groups of shoppers or subscribers do I have? What are the characteristics of each?
- Dynamic Time Warping (DTW) is commonly used for audio and video analysis and was originally developed for speech recognition applications. It also has many other uses, including identifying process and environmental anomalies and clustering signals. The idea behind DTW is to compare two time series that have a similar shape but might move at different speeds, by stretching or compressing the time axis to find the best alignment between them.
- Shapelet functions (or shapelets, for short) are another interesting approach, built around short subsequences that are especially characteristic of a group of time series. They can be used for both classification and clustering. Shapelet clustering can be used to group customers based on purchasing records or usage history. Shapelet classification can be used in areas such as manufacturing, weather forecasting or finance, for prediction or anomaly detection.
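The DTW idea described above can be sketched in a few lines of Python. This is the textbook dynamic-programming formulation with an absolute-difference cost, offered as an illustration rather than as the Vantage function; the name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Cost of the best alignment between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


fast = [0, 1, 2, 3]
slow = [0, 0, 1, 1, 2, 2, 3, 3]  # same shape, half the speed
print(dtw_distance(fast, slow))  # 0.0
```

A point-by-point comparison would not even be defined for these two series because their lengths differ, while DTW reports them as identical in shape. That property is what makes it useful as a distance measure for clustering signals.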
Forecasting
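Several functions are commonly used in forecasting and related financial analysis, including Autoregressive Integrated Moving Average (ARIMA), Vector Autoregressive Moving Average with Exogenous Variables (VARMAX) and Volume Weighted Average Price (VWAP). The first two are forecasting models; VWAP is a summary measure that often accompanies them when working with traded prices.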
- ARIMA – builds a model of a time series based on a linear combination of the previous values and previous forecast errors of that time series.
- VARMAX – forecasts multiple, related time series simultaneously and also models exogenous variables, which are independent time series that represent external impacts on the time series of interest.
- VWAP – computes the volume-weighted average price of a traded item (usually an equity share) for each time interval in a series of equal-length time intervals.
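Of the three, VWAP is the simplest to show by hand: for each interval, it is the sum of price times volume divided by the total volume. The sketch below illustrates that arithmetic in Python. The function name `vwap_by_interval` and the `(timestamp, price, volume)` tuple layout are assumptions made for the example, not the Vantage interface.

```python
from collections import defaultdict


def vwap_by_interval(trades, interval_seconds):
    """Return {interval_start: VWAP} for (timestamp_seconds, price, volume) trades."""
    notional = defaultdict(float)  # sum of price * volume per interval
    volume = defaultdict(float)    # sum of volume per interval
    for ts, price, vol in trades:
        bucket = int(ts // interval_seconds) * interval_seconds
        notional[bucket] += price * vol
        volume[bucket] += vol
    # Skip intervals with no traded volume to avoid dividing by zero
    return {b: notional[b] / volume[b] for b in sorted(volume) if volume[b] > 0}


trades = [
    (0, 10.0, 100),   # small trade at 10.00
    (30, 11.0, 300),  # larger trade at 11.00 pulls the average up
    (60, 12.0, 50),
    (90, 12.0, 50),
]
print(vwap_by_interval(trades, interval_seconds=60))  # {0: 10.75, 60: 12.0}
```

In the first interval the plain average of the two prices would be 10.50, but the larger trade at 11.00 carries three times the weight, so the VWAP is 10.75.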
Turning analytics into answers
We all want to predict the future, but none of us have crystal balls that are 100% accurate 100% of the time. That said, valuable tools exist within Vantage’s Machine Learning Engine that can analyze past data, thus making that crystal ball of yours slightly less foggy.
With Vantage, you can use state-of-the-art algorithms to generate real-time insights and gain a competitive advantage by using 100% of your data, 100% of the time. That’s Pervasive Data Intelligence!
Andrea Kress is a Staff Data Scientist at Teradata. She has been gleaning insights from data since studying physics in college and applied math in graduate school, and has worked in the semiconductor, energy, and healthcare industries.