Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Time series forecasting is an important task in many applications.
  • Real-world time series data is often limited and noisy.
  • A bidirectional variational auto-encoder (BVAE) is proposed to address the time series forecasting problem.
  • The BVAE is equipped with diffusion, denoise, and disentanglement.
  • Experiments show that the BVAE outperforms competitive algorithms.

Paper Content

Introduction

  • Time series forecasting is important for decision-making
  • Traditional RNN-based methods capture temporal dependencies
  • LSTMs and GRUs use gate functions to handle long-term dependencies
  • CNNs capture complex inner patterns of the time series
  • Transformer-based models have shown great performance
  • Neural networks have uncertainty issues
  • VAR models try to model the distribution of time series
  • Interpretable representation learning is another merit
  • VAEs have superiority in modeling latent distributions
  • Disentangled representation can improve performance and robustness
  • Real-world time series are often noisy and short
  • D 3 VAE proposed to address time series forecasting problem

Coupled diffusion probabilistic model

  • Diffusion probabilistic model is a family of latent variable models to generate high-quality samples
  • Coupled forward process is developed to augment input and target series synchronously
  • Bidirectional variational auto-encoder (BVAE) proposed to take place of reverse process in diffusion model
  • Markov chain adds Gaussian noise to data
  • Coupled diffusion process diffuses input and output series
  • Variance schedule and scale parameter used to reduce aleatoric uncertainty
  • BVAE opens interface to integrate disentanglement for model interpretability

Scaled denoising score matching for diffused time series cleaning

  • Augmenting time series data with coupled diffusion probabilistic model
  • Generative distribution moves toward diffused target series
  • Employ Denoising Score Matching (DSM) to accelerate de-uncertainty process
  • Use monotonically decreasing series of fixed σ values to scale noise of different levels

Disentangling latent variables for interpretation

  • Interpretability of time series forecasting model is important
  • Disentangling latent variables can enhance reliability of prediction
  • Total Correlation (TC) is used to measure dependencies among multiple random variables
  • Bidirectional structure of BVAE aggregates rich semantics into latent variables
  • Algorithm 1 and 2 used to train and forecast

Training and forecasting

  • Proposed coupled diffusion with denoising network to reduce effect of uncertainty
  • Minimized TC of latent variables to disentangle them
  • Reconstructed loss with trade-off parameters
  • Minimized objective to learn generative model

Experiment settings

  • Generated two synthetic datasets and six real-world datasets
  • Sliced datasets to contain at most 1000 time points
  • Compared D3VAE to one GP based method, two auto-regressive methods, and four VAE-based methods
  • Used Adam optimizer with initial learning rate of 5e-4
  • Batch size of 16 and training set to 20 epochs
  • Number of disentanglement factors chosen from {4, 8}
  • Evaluation metrics: CRPS and MSE
  • Experiments conducted on Linux machine with single NVIDIA P40 GPU
  • Experiments repeated five times

Main results

  • Two prediction lengths (8 and 16) are evaluated
  • Results of longer prediction lengths are in Appendix D
  • Noise of outcome series can be estimated to assess uncertainty
  • Scale parameter ω can be adjusted to generate distribution space
  • Uncertainty estimation can quantify uncertainty effectively
  • Disentanglement quality can be assessed by evaluating classification performance
  • MIG metric used to evaluate disentanglement
  • Diffusion process can effectively augment input or target

Model analysis

  • Variance Schedule β and The Number of Diffusion Steps T should be configured properly to reduce the effect of uncertainty.
  • Too small a variance schedule or inadequate diffusion steps will lead to a meaningless diffusion process.
  • Analysis of the effect of the variance schedule β and the number of diffusion steps T showed that prediction performance can be improved with proper β and T.

Discussion

  • Langevin dynamics has been applied to EBMs, computer vision, and natural language processing
  • Experiments demonstrate effectiveness of single-step sampling
  • Extra empirical study to investigate whether more sampling steps improve performance
  • Omitting additive noise in Langevin dynamics and using multi-step denoising for D3VAE
  • Different configurations of Langevin dynamics do not bring indispensable benefits for time series forecasting

Conclusion

  • Proposed a generative model with bidirectional VAE as the backbone
  • Devised a coupled diffusion probabilistic model for time series forecasting
  • Developed a scaled denoising network to guarantee prediction accuracy
  • Latent variables further disentangled for better model interpretability
  • Experiments on synthetic and real-world data validate SOTA performance
  • Reviewed related literature of time series forecasting methods
  • Complex temporal patterns can be manifested over short- and long-term
  • Existing statistical models such as ARIMA and Gaussian process regression
  • Temporal attention and causal convolution explored to model temporal dependencies
  • Transformer-based models strengthen capability of exploring hidden temporal patterns
  • Multivariate nature of TSF another topic many works have been focusing on
  • Probabilistic models, matrix/tensor factorization, CNNs, and GNNs
  • Generative methods for TSF focus on energy-based models
  • VAE-based models to infer underlying distribution of time series data
  • Coupled probabilistic diffusion model proposed to augment input and output series
  • Multi-scaled score-matching denoising network plugged in for accurate prediction
  • Estimate uncertainty for time series forecasting by epistemic uncertainty
  • Detect noise in time series data or devise suitable models for noise alleviation
  • Neural networks introduced to denoise time series
  • Explain deep neural networks to make prediction more interpretable
  • Disentangle latent variables to identify independent factors of data
  • Bidirectional VAE and take dimensions of each latent variable to be disentangled
  • Experiments on synthetic and real-world datasets
  • Input representation with embedding method and RNN
  • Baselines include GP-copula, DeepAR, TimeGrad, Vanilla VAE, NVAE, f-VAE, and β-TCVAE
  • Longer-term time series forecasting and full datasets experiments