Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Time series forecasting is an important task in many applications.
Real-world time series data is often limited and noisy.
A bidirectional variational auto-encoder (BVAE) is proposed to address the time series forecasting problem.
The BVAE is equipped with diffusion, denoising, and disentanglement.
Experiments show that the BVAE outperforms competitive algorithms.

Paper Content

Introduction

Time series forecasting is important for decision-making
Traditional RNN-based methods capture temporal dependencies
LSTMs and GRUs use gate functions to handle long-term dependencies
CNNs capture complex inner patterns of the time series
Transformer-based models have shown great performance
Neural networks have uncertainty issues
VAR models try to model the distribution of time series
Interpretable representation learning is another merit
VAEs have superiority in modeling latent distributions
Disentangled representation can improve performance and robustness
Real-world time series are often noisy and short
D3VAE proposed to address the time series forecasting problem

Coupled diffusion probabilistic model

Diffusion probabilistic model is a family of latent variable models to generate high-quality samples
Coupled forward process is developed to augment input and target series synchronously
Bidirectional variational auto-encoder (BVAE) proposed to replace the reverse process in the diffusion model
Markov chain adds Gaussian noise to data
Coupled diffusion process diffuses input and output series
Variance schedule and scale parameter used to reduce aleatoric uncertainty
BVAE opens interface to integrate disentanglement for model interpretability
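
The coupled forward process can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the usual closed-form diffusion step, a linear variance schedule, and that the target series uses the same schedule scaled by ω. The function names and the numeric values are hypothetical.

```python
import math
import random

def linear_betas(beta_start, beta_end, steps):
    """Linearly spaced variance schedule beta_1..beta_T."""
    if steps == 1:
        return [beta_end]
    step = (beta_end - beta_start) / (steps - 1)
    return [beta_start + i * step for i in range(steps)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 1..t."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod

def diffuse(series, betas, t, rng):
    """Closed-form forward step: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    a_bar = alpha_bar(betas, t)
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
            for x in series]

def coupled_diffuse(inputs, targets, betas, t, omega, rng):
    """Diffuse input and target at the same step t.

    Assumption: the target schedule is the input schedule scaled by omega.
    """
    target_betas = [omega * b for b in betas]
    return diffuse(inputs, betas, t, rng), diffuse(targets, target_betas, t, rng)

rng = random.Random(0)
betas = linear_betas(1e-4, 0.1, 100)  # illustrative values only
x_t, y_t = coupled_diffuse([0.1, 0.4, 0.9], [1.2, 1.5], betas, t=50, omega=0.5, rng=rng)
```

With ω below 1 the target keeps more of its original signal than the input at the same step, which is one way to read the role of the scale parameter.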

Scaled denoising score matching for diffused time series cleaning

Augmenting time series data with coupled diffusion probabilistic model
Generative distribution moves toward diffused target series
Employ Denoising Score Matching (DSM) to accelerate de-uncertainty process
Use monotonically decreasing series of fixed σ values to scale noise of different levels
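
A toy example may help make the denoising step concrete. The sketch below assumes Gaussian data so that the score of the noisy distribution is known in closed form; the paper instead learns the score with a network, which is not reproduced here. It applies one gradient jump of size σ² along the score for several decreasing σ values and compares the error before and after. All names are hypothetical.

```python
import random

def gaussian_score(y, data_var, sigma):
    """Score of the noisy marginal N(0, data_var + sigma^2): d/dy log p(y)."""
    return -y / (data_var + sigma ** 2)

def single_step_denoise(y, data_var, sigma):
    """One gradient jump along the score: y + sigma^2 * score(y)."""
    return y + sigma ** 2 * gaussian_score(y, data_var, sigma)

def mean_squared_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

rng = random.Random(0)
data_var = 1.0
sigmas = [1.0, 0.5, 0.1]  # monotonically decreasing noise levels
clean = [rng.gauss(0.0, data_var ** 0.5) for _ in range(5000)]
for sigma in sigmas:
    noisy = [x + sigma * rng.gauss(0.0, 1.0) for x in clean]
    denoised = [single_step_denoise(y, data_var, sigma) for y in noisy]
    print(sigma, mean_squared_error(noisy, clean), mean_squared_error(denoised, clean))
```

For this Gaussian case the single jump equals the posterior mean of the clean value given the noisy one, so the gain is largest at high noise levels and shrinks as σ decreases.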

Disentangling latent variables for interpretation

Interpretability of time series forecasting model is important
Disentangling latent variables can enhance reliability of prediction
Total Correlation (TC) is used to measure dependencies among multiple random variables
Bidirectional structure of BVAE aggregates rich semantics into latent variables
Algorithms 1 and 2 describe training and forecasting, respectively
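
To make Total Correlation concrete, here is a small sketch for the one case where it has a simple closed form: a bivariate Gaussian. TC is the KL divergence between the joint distribution and the product of its marginals, which for a Gaussian reduces to a ratio of variances and the covariance determinant. This is only an illustration of the quantity; the paper estimates TC for learned latent variables, and the function name is hypothetical.

```python
import math

def total_correlation_2d(var_a, var_b, cov_ab):
    """TC of a bivariate Gaussian: 0.5 * (log var_a + log var_b - log det(Sigma))."""
    det = var_a * var_b - cov_ab ** 2
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (math.log(var_a) + math.log(var_b) - math.log(det))

independent = total_correlation_2d(1.0, 1.0, 0.0)  # no dependence between factors
entangled = total_correlation_2d(1.0, 1.0, 0.9)    # strongly correlated factors
print(independent, entangled)
```

TC is zero exactly when the two variables are independent and grows as the correlation increases, which is why minimizing it pushes latent factors apart.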

Training and forecasting

Proposed coupled diffusion with denoising network to reduce effect of uncertainty
Minimized TC of latent variables to disentangle them
Reconstruction loss combined with trade-off parameters
Minimized objective to learn generative model

Experiment settings

Generated two synthetic datasets and six real-world datasets
Sliced datasets to contain at most 1000 time points
Compared D3VAE to one GP-based method, two auto-regressive methods, and four VAE-based methods
Used Adam optimizer with initial learning rate of 5e-4
Batch size of 16; training run for 20 epochs
Number of disentanglement factors chosen from {4, 8}
Evaluation metrics: CRPS and MSE
Experiments conducted on Linux machine with single NVIDIA P40 GPU
Experiments repeated five times
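
The two evaluation metrics can be sketched as follows. The CRPS function uses the standard sample-based estimator, E|X - y| - 0.5 E|X - X'|, over forecast samples; the paper may compute CRPS differently (for example via quantiles), so treat this as an assumption. Function names are hypothetical.

```python
def crps_from_samples(samples, observation):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'| over forecast samples."""
    n = len(samples)
    term_obs = sum(abs(s - observation) for s in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread

def mse(predictions, targets):
    """Mean squared error between point forecasts and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

forecast_samples = [0.0, 2.0]
print(crps_from_samples(forecast_samples, 1.0))
print(mse([1.0, 2.0], [1.0, 4.0]))
```

With a single forecast sample the CRPS estimate reduces to the absolute error, so it can be read as a generalization of MAE to probabilistic forecasts. Lower is better for both metrics.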

Main results

Two prediction lengths (8 and 16) are evaluated
Results of longer prediction lengths are in Appendix D
Noise of outcome series can be estimated to assess uncertainty
Scale parameter ω can be adjusted to generate distribution space
Uncertainty estimation can quantify uncertainty effectively
Disentanglement quality can be assessed by evaluating classification performance
MIG metric used to evaluate disentanglement
Diffusion process can effectively augment input or target

Model analysis

The variance schedule β and the number of diffusion steps T should be configured properly to reduce the effect of uncertainty.
Too small a variance schedule or inadequate diffusion steps will lead to a meaningless diffusion process.
Analysis of the effect of the variance schedule β and the number of diffusion steps T showed that prediction performance can be improved with proper β and T.
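
One way to see why β and T matter together is to compute how much of the original signal survives after T steps, i.e. the cumulative product of (1 - β_t). The sketch below assumes a linear schedule that starts at 0 and ends at a chosen maximum; both the schedule shape and the grid of values are illustrative assumptions, not the paper's settings.

```python
def signal_retention(beta_end, steps, beta_start=0.0):
    """Return prod_t (1 - beta_t) for a linear schedule from beta_start to beta_end."""
    retention = 1.0
    for i in range(steps):
        frac = i / (steps - 1) if steps > 1 else 1.0
        beta = beta_start + frac * (beta_end - beta_start)
        retention *= 1.0 - beta
    return retention

for beta_end in (0.001, 0.01, 0.1):
    for steps in (10, 100, 1000):
        print(beta_end, steps, round(signal_retention(beta_end, steps), 4))
```

A retention value near 1 means the series is barely perturbed, so the diffusion adds little, while a value near 0 means the series has been almost entirely replaced by noise. Useful settings sit between these extremes.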

Discussion

Langevin dynamics has been applied to EBMs, computer vision, and natural language processing
Experiments demonstrate effectiveness of single-step sampling
Extra empirical study to investigate whether more sampling steps improve performance
Omitting additive noise in Langevin dynamics and using multi-step denoising for D3VAE
Different configurations of Langevin dynamics bring no essential benefit for time series forecasting
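
The sampling variants discussed here can be illustrated on a toy target. The sketch below runs the textbook Langevin update, x <- x + (step/2) * score(x) + sqrt(step) * z, on a unit-variance Gaussian whose score is known analytically, and lets the additive noise z be switched off to get multi-step denoising. It is not the paper's sampler; the target, step size, and function names are hypothetical.

```python
import math
import random

def langevin(x, score, step_size, num_steps, rng=None):
    """Run Langevin updates; the noise term is dropped when rng is None."""
    for _ in range(num_steps):
        x = x + 0.5 * step_size * score(x)
        if rng is not None:
            x += math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return x

target_mean = 2.0

def score(x):
    """Score of N(target_mean, 1): gradient of the log density."""
    return -(x - target_mean)

one_step = langevin(0.0, score, step_size=1.0, num_steps=1)
many_steps = langevin(0.0, score, step_size=1.0, num_steps=50)
noisy_run = langevin(0.0, score, step_size=0.1, num_steps=50, rng=random.Random(0))
print(one_step, many_steps, noisy_run)
```

Without noise the iterate converges to the mode of the target, whereas with noise it keeps fluctuating around it. In this toy setting most of the movement toward the mode happens in the first step.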

Conclusion

Proposed a generative model with bidirectional VAE as the backbone
Devised a coupled diffusion probabilistic model for time series forecasting
Developed a scaled denoising network to guarantee prediction accuracy
Latent variables further disentangled for better model interpretability
Experiments on synthetic and real-world data validate SOTA performance
Reviewed related literature of time series forecasting methods
Complex temporal patterns can be manifested over short- and long-term
Existing statistical models such as ARIMA and Gaussian process regression
Temporal attention and causal convolution explored to model temporal dependencies
Transformer-based models strengthen capability of exploring hidden temporal patterns
The multivariate nature of TSF is another topic that many works focus on
Probabilistic models, matrix/tensor factorization, CNNs, and GNNs
Generative methods for TSF focus on energy-based models
VAE-based models to infer underlying distribution of time series data
Coupled probabilistic diffusion model proposed to augment input and output series
Multi-scaled score-matching denoising network plugged in for accurate prediction
Estimate uncertainty for time series forecasting by epistemic uncertainty
Detect noise in time series data or devise suitable models for noise alleviation
Neural networks introduced to denoise time series
Explain deep neural networks to make prediction more interpretable
Disentangle latent variables to identify independent factors of data
Bidirectional VAE used, with the dimensions of each latent variable taken as the factors to be disentangled
Experiments on synthetic and real-world datasets
Input representation with embedding method and RNN
Baselines include GP-copula, DeepAR, TimeGrad, Vanilla VAE, NVAE, f-VAE, and β-TCVAE
Longer-term time series forecasting and full datasets experiments
