Various aspects of working with time series data — Part 2: Time series analysis — Seasonality, Trends, and Frequencies

Roy Yanovski
9 min read · Feb 1, 2024


This article is part of a series that discusses issues every person working with time series data (especially data scientists) should know.

The topics discussed in this series vary in depth and complexity. If you are just starting your journey with TS data, you might take interest in the whole series, and if you are already familiar with TS data, you might want to skip to the later parts, which cover more advanced topics.
All of the code examples and discussions are in Python.

Articles in this series
Part 1: Time formats
Part 2: Time series analysis — Seasonality, Trends, and Frequencies
Part 3: Anomalies, Motifs, and Signatures
Part 4: Time series tasks and relevant algorithms

Table of contents

· Introduction to time series analysis
· Trends
· Seasonality
· Frequencies
· Fourier Transform
· Wavelets
· Summary

Introduction to time series analysis

When analyzing Time Series (TS) data, we usually want to look at features that are unique to serially ordered data. The aim is usually to identify and extract rules and trends that are related to the cyclicality of the data and/or to its changes over time.
In some kinds of TS data, such as daily temperatures on an annual scale, we may be interested in seasonality, while in others, such as stock prices, we may want to look at the trend. If I am looking for a short-term profit from my stock, I might be more interested in local trends, and if I am going for a long-term investment, I might prefer looking at the global trends. All of these analyses are based on the premise that the data points in TS data are not independent and follow a certain logic that connects one data point to the next one in the sequence.
Note: This introduction contained a few unexplained terms, and there are more to come. By the time you finish reading this article, you should understand all of them. So let’s break it down.

Trends

In TS data, a trend is a significant change (increase or decrease) in value over time. Why significant? Because changes happen all the time, but most of them can be defined as “noise”. Noise, in general, includes random, insignificant variations in the data. For example, when working with sensors, we may receive different readings due to the limited accuracy of the sensor or due to changes in environmental conditions. Sometimes the reason is not technical and there is a real change, but it is just not meaningful enough for us, so we classify it as noise.
Trends can be local or global. The difference depends greatly on the time scale that we are examining (the resolution). For example, if we look at the average daily temperature over a year, we will see the increasing trend from winter to summer and the decreasing trend from summer to winter. These are local oscillations that repeat annually. But if we look at the last thousand years, we will see an increasing global trend in temperature. The local trends still exist and we can easily identify them when we zoom in, but they might be harder to identify when we zoom out again.
It’s important to mention again that global trends are relative and may depend on the resolution of our data and the time frame that we find relevant for our case.
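
To make the local/global distinction concrete, here is a minimal sketch on synthetic data. The series, the window lengths, and the helper name moving_average are all made up for illustration. The idea is that a short averaging window keeps the annual swings (local trends), while a window that spans a whole year averages them out and leaves only the slow drift (global trend).

```python
import numpy as np

def moving_average(values, window):
    """Average over every full window of the given length."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Synthetic daily "temperature" for 10 years (illustrative numbers only):
days = np.arange(365 * 10)
rng = np.random.default_rng(0)
drift = 0.001 * days                           # slow global increase
annual = 10 * np.sin(2 * np.pi * days / 365)   # yearly oscillation
noise = rng.normal(0, 1, size=days.size)       # random variation
series = 15 + drift + annual + noise

local_trend = moving_average(series, window=30)    # still follows the seasons
global_trend = moving_average(series, window=365)  # seasons are averaged out

print("Spread of the 30-day average: ", local_trend.max() - local_trend.min())
print("Spread of the 365-day average:", global_trend.max() - global_trend.min())
```

The 30-day average still swings by roughly the size of the annual cycle, while the 365-day average only moves by the few degrees of the drift, which is the same "zoom in, zoom out" effect described above.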

Seasonality

Seasonality is a repeating, time-defined oscillating trend that usually follows a cyclic pattern. A good example of seasonality is (as the name implies) the four seasons. The annual temperature oscillations between winter and summer represent a seasonality pattern. Another example of seasonality is the hourly temperature over the course of a day (day vs. night).

When working on new data, one of the first things we would like to look at is the composition of our data. That is, the trend, the seasonality, and the noise. It’s not always easy to identify these features just by looking at the data or its graph, but there are a few libraries that can help us, like the following statsmodels method.
Note: this doesn’t always work perfectly.

For this example I chose to use the hourly temperature over the course of 8 months. I then take one measurement every 12 hours, so I have two measurements per day.

from datetime import datetime
import matplotlib.pyplot as plt
from meteostat import Point, Hourly

# Set the time period:
start = datetime(2022, 12, 1)
end = datetime(2023, 8, 1)

# Create a Point for Amsterdam:
location = Point(52.3676, 4.9041, 2)

# Get data:
data = Hourly(location, start, end)
data = data.fetch()
temp_data = data.iloc[::12]['temp']

# Plot line chart:
temp_data.plot(figsize=(15, 6))
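plt.show()

from statsmodels.tsa.seasonal import seasonal_decompose

# With two measurements per day, one daily cycle spans 2 samples.
# Setting period explicitly is an assumption about the intended cycle; it also
# avoids an error when the index carries no frequency information.
result = seasonal_decompose(temp_data, model='additive', period=2)
result.plot().set_figwidth(15)
plt.show()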

Above we can see (top to bottom): 1) the raw data, 2) the trend of increasing temperature from winter to summer, 3) the daily seasonality, and 4) the “noise”.
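
To get a feeling for what an additive decomposition does, here is a simplified hand-rolled sketch. It is not the statsmodels implementation, and the function name additive_decompose and the synthetic series are my own assumptions. The trend is a moving average over one full cycle, the seasonal part is the average deviation from the trend at each position in the cycle, and the noise is whatever remains.

```python
import numpy as np

def additive_decompose(values, period):
    """Split a series into trend + seasonal + residual (simplified sketch)."""
    values = np.asarray(values, dtype=float)
    n = values.size

    # Trend: moving average over one full cycle. For an even period the two
    # end points get half weight so that the window stays centered.
    if period % 2 == 0:
        kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        kernel = np.ones(period) / period
    half = len(kernel) // 2
    trend = np.full(n, np.nan)  # the edges cannot be averaged and stay NaN
    trend[half:n - half] = np.convolve(values, kernel, mode="valid")

    # Seasonal: mean deviation from the trend at each position in the cycle.
    detrended = values - trend
    phase_means = np.array(
        [np.nanmean(detrended[i::period]) for i in range(period)]
    )
    phase_means -= phase_means.mean()
    seasonal = np.tile(phase_means, n // period + 1)[:n]

    residual = values - trend - seasonal
    return trend, seasonal, residual

# Synthetic series with two samples per day: warm "day", cold "night".
rng = np.random.default_rng(1)
t = np.arange(200)
day_night = 3 * np.where(t % 2 == 0, 1.0, -1.0)
series = 10 + 0.05 * t + day_night + rng.normal(0, 0.2, size=t.size)

trend, seasonal, residual = additive_decompose(series, period=2)
print("Estimated day/night effect:", seasonal[:2])
```

On this toy series the estimated seasonal component comes out close to the +3 / -3 pattern that was built in, and the trend follows the underlying straight line.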

Frequencies

There are two types of frequencies in TS data. The first one is the frequency of our measurements (i.e. the time interval between one measurement and the next), and it was briefly discussed in the previous article (Part 1: Time formats). The second type, which will be discussed here, is the frequency of a periodically recurring pattern in the data. This is a commonly discussed subject in many engineering and scientific fields, often referred to as the frequency domain. We will not dive into the theory or math behind it, but we will discuss it on a high level and present two related concepts: the Fourier Transform and the Wavelet Transform.

When analyzing TS data, identifying the relevant frequencies can provide many insights and help us better understand our data. The underlying frequencies tell us the story of the “routine” in our data (unlike the anomalies, which tell a different story and will be discussed in the next article).
Let’s try to intuitively understand frequencies through the example of sound (an audio signal). Essentially, sounds are changes in air pressure that our ear is able to pick up. These changes can be modeled as sinusoid-like waves that repeat periodically. The time it takes for a wave to complete one cycle is its period, and the number of cycles it completes per unit of time is its frequency. Frequencies are usually measured in Hertz (Hz), meaning the number of cycles per second.
In the case of sound, what does the frequency tell us?
Different things emit sounds at different frequencies. For example, calm sea waves have frequencies of 0.05–16 Hz while trains produce frequencies that range between 400–4000 Hz. So if we want to better understand what we recorded, or perhaps focus on one specific tone, we need to know the frequencies of the sounds we picked up. As implied by the last sentence, natural sounds are usually composed of more than one frequency.
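
Since period and frequency are easy to mix up, here is a small sketch of how they relate. The function names and the numbers are my own illustrative choices. The frequency is one divided by the period, and for a clean sampled wave we can roughly estimate it by counting how many times per second the signal crosses zero on its way up.

```python
import numpy as np

def frequency_from_period(period_seconds):
    """Cycles per second (Hz) for a cycle that lasts period_seconds."""
    return 1.0 / period_seconds

def estimate_frequency(signal, sampling_rate):
    """Rough estimate: count upward zero crossings per second."""
    upward = (signal[:-1] < 0) & (signal[1:] >= 0)
    duration_seconds = len(signal) / sampling_rate
    return upward.sum() / duration_seconds

# A cycle that repeats every 0.02 s corresponds to about 50 Hz:
fast_cycle_hz = frequency_from_period(0.02)

# A daily cycle repeats every 86,400 s, a tiny fraction of one Hz:
daily_cycle_hz = frequency_from_period(24 * 60 * 60)

# Two seconds of a 5 Hz sine wave, sampled 1000 times per second:
sampling_rate = 1000
t = np.arange(2000) / sampling_rate
wave = np.sin(2 * np.pi * 5 * t + 1.0)
estimated_hz = estimate_frequency(wave, sampling_rate)

print(fast_cycle_hz, daily_cycle_hz, estimated_hz)
```

Counting zero crossings only works for a single clean wave. Once several frequencies are mixed together we need a proper tool, which is where the Fourier Transform comes in.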

Let’s see the visual representation of what we discussed:

import numpy as np

# We will create two waves with two different frequencies and plot them:
frequency_a = 20  # Hz
frequency_b = 50  # Hz
time = 100  # number of samples
sampling_rate = 1000  # samples per second, so the 100 samples span 0.1 s

# Time stamp (in seconds) of every sample:
t = np.arange(time) / sampling_rate
wave_20hz = np.sin(2 * np.pi * frequency_a * t)
wave_50hz = np.sin(2 * np.pi * frequency_b * t)

plt.plot(t, wave_20hz)
plt.title('20Hz')
plt.show()
plt.plot(t, wave_50hz)
plt.title('50Hz')
plt.show()

# Now let's combine them together:
plt.plot(t, wave_50hz + wave_20hz)
plt.title('Composite signal')
plt.show()

In the cases of composite signals like the one above, the main challenge is to separate these frequencies and identify them both. This can be achieved by applying a method called Fourier Transform.

Fourier Transform

Fourier Transform (FT) decomposes a signal into its frequencies. Formally, it is an integral that generalizes the complex Fourier series by letting the period go to infinity. I know that sentence can be too much for someone with no background, and I have no intention of going into the math. If you wish to better understand it (and I think you should), explanations can be found online, and if you want an intuitive, visual explanation I recommend the one by 3Blue1Brown.
My aim here is simply to show how to compute the FT in Python. For sampled data like ours, this means the discrete version, computed with the Fast Fourier Transform (FFT) algorithm:

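import numpy as np
from scipy.fft import rfft, rfftfreq

composite_signal = wave_50hz + wave_20hz
fourier_transform = rfft(composite_signal)

# The 100 samples are treated as spanning 0.1 s, i.e. 1000 samples per second:
sampling_rate = 1000
xf = rfftfreq(len(composite_signal), d=1 / sampling_rate)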

plt.figure(figsize=(15, 5))
plt.plot(xf, abs(fourier_transform))
plt.xticks(np.arange(xf.min(), xf.max() + 1, 20))
plt.show()

The graph above shows two peaks, at 20 Hz and 50 Hz: the two frequencies that make up our composite signal. Great!
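
If we prefer to read the peaks programmatically rather than from a plot, a minimal sketch could look like this. It uses NumPy's own FFT functions to stay self-contained, and the signal (one second long, with a weaker 50 Hz component) is an illustrative assumption. We simply pick the frequency bins with the largest magnitudes.

```python
import numpy as np

sampling_rate = 1000   # samples per second
n_samples = 1000       # one second of data
t = np.arange(n_samples) / sampling_rate

# Composite signal: a 20 Hz wave plus a weaker 50 Hz wave.
signal = np.sin(2 * np.pi * 20 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

magnitudes = np.abs(np.fft.rfft(signal))
frequencies = np.fft.rfftfreq(n_samples, d=1 / sampling_rate)

# Take the two bins with the largest magnitude and report their frequencies:
top_bins = np.argsort(magnitudes)[-2:]
dominant = sorted(round(float(f), 1) for f in frequencies[top_bins])
print(dominant)  # [20.0, 50.0]
```

Picking the largest bins works here because the signal is clean and both waves complete a whole number of cycles. On real data the peaks are usually broader, so a dedicated peak-finding step tends to be more robust.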

Wavelets

FT is great because it is very precise in identifying the frequency, but it disregards the temporal spread (we know the frequencies but don’t know where they are located in our time sequence). This happens because there is a tradeoff between the two, which is closely related to the Heisenberg Uncertainty Principle (but we won’t dive into it here). The Wavelet Transform (WT) is able to balance the two and provide information on both the frequency and the temporal spread of the signal, but it is important to know that this information won’t be extremely accurate. In other words, WT can give us an approximation of both the frequency of the signal and the time at which it occurs. It can be useful in cases where we want to find the frequencies, but they are not persistent throughout our entire signal and we want to know their temporal position as well.
I read a great example that illustrates this in one of the Q&A forums: if we are examining the light wave signal of a traffic light, the FT will tell us the exact frequencies of red, yellow, and green, but it won’t be able to tell us anything about when they occurred. WT will tell us roughly the color frequencies and also (roughly) the time each of them happened.
The process of wavelet transform is to take different wavelets at different scales (the scale is a measure of how stretched the wavelet is) and try to fit them to our signal. There are many different patterns of wavelets (see some common ones below).
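
The traffic light idea can be sketched in a few lines. The two toy signals below are my own illustrative assumption. They contain the same two tones in opposite order, and their FT magnitudes turn out to be identical, so the FT alone cannot tell which tone came first. Splitting each signal into two halves and transforming each half separately (a crude windowed Fourier analysis, not a wavelet transform) recovers the order, at the price of a coarser frequency resolution.

```python
import numpy as np

sampling_rate = 1000
half_t = np.arange(500) / sampling_rate          # half a second
tone_20 = np.sin(2 * np.pi * 20 * half_t)
tone_50 = np.sin(2 * np.pi * 50 * half_t)

signal_a = np.concatenate([tone_20, tone_50])    # 20 Hz first, then 50 Hz
signal_b = np.concatenate([tone_50, tone_20])    # 50 Hz first, then 20 Hz

# The magnitude spectra of the two full signals are the same:
spectrum_a = np.abs(np.fft.rfft(signal_a))
spectrum_b = np.abs(np.fft.rfft(signal_b))
same_spectrum = np.allclose(spectrum_a, spectrum_b)
print("Same magnitude spectrum:", same_spectrum)

def dominant_frequency(segment, sampling_rate):
    """Frequency (Hz) of the strongest component in the segment."""
    magnitudes = np.abs(np.fft.rfft(segment))
    frequencies = np.fft.rfftfreq(len(segment), d=1 / sampling_rate)
    return float(frequencies[np.argmax(magnitudes)])

# Looking at each half separately reveals the order of the tones:
order_a = [dominant_frequency(signal_a[:500], sampling_rate),
           dominant_frequency(signal_a[500:], sampling_rate)]
order_b = [dominant_frequency(signal_b[:500], sampling_rate),
           dominant_frequency(signal_b[500:], sampling_rate)]
print("Signal A:", order_a)
print("Signal B:", order_b)
```

Each half is only 0.5 seconds long, so its frequency bins are 2 Hz apart instead of 1 Hz. Shorter windows give better timing but blurrier frequencies, which is exactly the tradeoff described above.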

There are two main ways to perform WT: the Discrete Wavelet Transform (DWT) and the Continuous Wavelet Transform (CWT). Both can be calculated using the PyWavelets library. Older versions of SciPy also offered a CWT (scipy.signal.cwt), but that function was deprecated in SciPy 1.12 and removed in later releases, so PyWavelets is the safer choice. The results can be presented on a plot of frequency against time, where the color usually represents the magnitude of the transform at that frequency and time.
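
To show the idea of "fitting wavelets to the signal" without relying on any library, here is a hand-rolled NumPy sketch. It is not the PyWavelets API. The function names, the Morlet-style wavelet, and the choice of 5 cycles per wavelet are my own assumptions, and in practice a library routine such as pywt.cwt would be the better option. Each wavelet is a short sinusoid under a Gaussian envelope, and sliding it along the signal tells us how strongly that frequency is present at each moment.

```python
import numpy as np

def morlet_wavelet(frequency_hz, sampling_rate, n_cycles=5):
    """Complex Morlet-style wavelet: a sinusoid under a Gaussian envelope."""
    sigma = n_cycles / (2 * np.pi * frequency_hz)   # envelope width in seconds
    t = np.arange(-3 * sigma, 3 * sigma, 1 / sampling_rate)
    envelope = np.exp(-t**2 / (2 * sigma**2))
    wavelet = envelope * np.exp(2j * np.pi * frequency_hz * t)
    return wavelet / envelope.sum()                 # normalize the gain

def wavelet_power(signal, frequencies_hz, sampling_rate):
    """Power per frequency (rows) and time step (columns)."""
    rows = []
    for frequency_hz in frequencies_hz:
        wavelet = morlet_wavelet(frequency_hz, sampling_rate)
        response = np.convolve(signal, wavelet, mode="same")
        rows.append(np.abs(response) ** 2)
    return np.array(rows)

# Toy signal: 20 Hz during the first half second, 50 Hz during the second.
sampling_rate = 1000
half_t = np.arange(500) / sampling_rate
signal = np.concatenate([np.sin(2 * np.pi * 20 * half_t),
                         np.sin(2 * np.pi * 50 * half_t)])

frequencies_hz = [20, 50]
power = wavelet_power(signal, frequencies_hz, sampling_rate)

for frequency_hz, row in zip(frequencies_hz, power):
    print(f"{frequency_hz} Hz: mean power in first half = {row[:500].mean():.3f}, "
          f"in second half = {row[500:].mean():.3f}")
```

The 20 Hz row carries most of its power in the first half and the 50 Hz row in the second half, so we learn both which frequencies are present and roughly when. Plotting the power array as an image (time on the x-axis, frequency on the y-axis) gives the kind of plot described above.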

For more in-depth coverage of wavelets, including code examples, see: Introduction to Wavelet Transform using Python

Summary

  • There are many common data analysis practices, some of which also apply to TS data and were not covered in this article. The aim here was to present a few concepts and methodologies that are specific to TS data.
  • When we want to better understand our TS data, we should try to find any cyclic or linear changes in our sequence. These are the frequencies, seasonalities, and trends.
  • A signal can be complex and may be composed of several different frequencies. Fourier Transform can help us identify them.
  • In most cases, the frequency content changes over time. Wavelet Transform can help us assess our signal on both the frequency and the temporal plane.
  • Performing these analyses can provide insights about our data, and help us understand how our TS data “behaves”.

If you liked this article, please give it a clap. If you want, you can also follow me to see more of my content.

To read the next article in the series: Part 3: Anomalies, Motifs, and Signatures

Written by Roy Yanovski

PhD, Marine biologist, Data scientist, Sports lover, and nature enthusiast. Interested in using data science to make the world a better place.
