Time series modelling#
We have a time series with a clear trend, seasonality and jump. The researcher knows that there is jump present at \(t=270\) due to a change in the data collection method. First we will load the data and plot it. After that, we will use the PSD to identify the frequency of the seasonality and the trend.
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
# Import the data
data = np.loadtxt("y_values.csv", delimiter=",")
t = np.arange(0, len(data), 1)
# Plot the data
plt.figure(figsize=(12, 3))
plt.plot(t, data, label="y_values")
plt.xlabel("Time")
plt.ylabel("y_values")
plt.title("y_values over time")
plt.grid()

We can now inspect the time series and identify the jump at \(t=270\). We can also see that there is a clear trend and seasonality. We will now use the PSD to identify the frequency of the seasonality and the trend.
remember that to compute a PSD we need to make our time series stationary. In the cell below I have shown how not to do it. without stationarity, the PSD is not useful.
plt.psd(data, Fs=1);

Since we already now there is a linear trend and a jump we can remove them from the time series. We can do this by fitting a linear model to the time series and subtracting it. The resulting residuals will be stationary and we can compute the PSD.
Matrix A is defined as follows:
offset = np.zeros_like(data)
offset[270:] = 1
A = np.column_stack((np.ones_like(t), t, offset))
xhat = np.linalg.solve(A.T @ A, A.T @ data)
yhat = A @ xhat
ehat = data - yhat
fig, axs = plt.subplots(2, 1, figsize=(12, 5))
# Plot data and fit
axs[0].plot(t, data, label='Data')
axs[0].plot(t, yhat, label='Fit', linestyle='--')
axs[0].set_xlabel('Time')
axs[0].set_ylabel('y_values')
axs[0].set_title('Data and Fit')
axs[0].legend()
axs[0].grid()
# Plot residuals
axs[1].plot(t, ehat, label='Residuals', color='red')
axs[1].set_xlabel('Time')
axs[1].set_ylabel('Residuals')
axs[1].set_title('Residuals')
axs[1].legend()
axs[1].grid()
plt.tight_layout()
plt.show()
# Create a sample signal
N = len(data) # Number of points
T = 500/N # Sampling interval (1000 Hz sampling rate)
plt.figure(figsize=(12, 4))
pxx, freqs, line = plt.psd(ehat, return_line=True, Fs=1, NFFT=1024, pad_to=2048)
plt.xlabel("Frequency")
plt.ylabel("Power Spectral Density")
plt.title("Power Spectral Density of the residuals")
plt.grid(True)
# Find the frequency corresponding to the highest peak
highest_peak_freq = freqs[np.argmax(pxx)]
print("Frequency corresponding to the highest peak:", round(highest_peak_freq,3))

Frequency corresponding to the highest peak: 0.084

Using peak detection we can identify the frequency of the seasonality and the trend.
Finally, we can use the identified frequencies to model the time series.
For this, we need to expand our matrix A to include the seasonality and the trend. The new matrix A is defined as follows:
f = highest_peak_freq
A2 = np.column_stack((np.ones_like(t), t, offset, np.cos(2*np.pi*f*t), np.sin(2*np.pi*f*t)))
xhat2 = np.linalg.solve(A2.T @ A2, A2.T @ data)
yhat2 = A2 @ xhat2
ehat2 = data - yhat2 # removing the functional components from the data
fig, axs = plt.subplots(2, 1, figsize=(12, 5))
# Plot data and fit
axs[0].plot(t, data, label='Data')
axs[0].plot(t, yhat2, label='Fit', linestyle='--')
axs[0].set_xlabel('Time')
axs[0].set_ylabel('y_values')
axs[0].set_title('Data and Fit')
axs[0].legend()
axs[0].grid()
# Plot residuals
axs[1].plot(t, ehat2, label='Residuals', color='red')
axs[1].set_xlabel('Time')
axs[1].set_ylabel('Residuals')
axs[1].set_title('Residuals')
axs[1].legend()
axs[1].grid()
plt.tight_layout()
plt.show()
