Abstract
- Time series forecasting is an important task in many applications.
- Real-world time series data is often limited and noisy.
- A bidirectional variational auto-encoder (BVAE) is proposed to address the time series forecasting problem.
- The BVAE is equipped with diffusion, denoising, and disentanglement; the resulting model is named D3VAE.
- Experiments on synthetic and real-world data show that D3VAE outperforms competitive algorithms.
Paper Content
Introduction
- Time series forecasting is important for decision-making
- Traditional RNN-based methods capture temporal dependencies
- LSTMs and GRUs use gate functions to handle long-term dependencies
- CNNs capture complex inner patterns of the time series
- Transformer-based models have shown strong performance
- Neural network forecasts are affected by uncertainty, such as noise in the data
- VAR models try to model the distribution of time series
- Interpretable representation learning is another merit
- VAEs are well suited to modeling latent distributions
- Disentangled representation can improve performance and robustness
- Real-world time series are often noisy and short
- D3VAE is proposed to address the time series forecasting problem for such noisy, short series
Coupled diffusion probabilistic model
- Diffusion probabilistic models are a family of latent variable models that generate high-quality samples
- A coupled forward process is developed to augment the input and target series synchronously
- A bidirectional variational auto-encoder (BVAE) takes the place of the reverse process in the diffusion model
- The forward process is a Markov chain that gradually adds Gaussian noise to the data according to a variance schedule β
- The coupled diffusion process diffuses the input and output series simultaneously
- The variance schedule and a scale parameter are used to reduce aleatoric uncertainty
- The BVAE also provides an interface for integrating disentanglement to improve model interpretability
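The coupled forward process can be sketched in closed form using the standard diffusion identity x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε, with ᾱ_t = ∏_{s≤t}(1 - β_s). This is a minimal sketch under stated assumptions: a linear variance schedule, and a target diffused with the scaled schedule β'_t = ω·β_t for some ω < 1, so that the target is perturbed less than the input. The function names (`linear_beta_schedule`, `alpha_bar`, `coupled_diffusion`) and all numeric values are hypothetical illustrations, not the paper's code or settings.

```python
import math
import random


def linear_beta_schedule(beta_start: float, beta_end: float, T: int) -> list[float]:
    """Linearly spaced variance schedule beta_1..beta_T (an assumed choice)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bar(betas: list[float]) -> list[float]:
    """Cumulative products: alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def diffuse(series: list[float], a_bar: float, rng: random.Random) -> list[float]:
    """Closed-form forward step: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps."""
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
            for x in series]


def coupled_diffusion(x: list[float], y: list[float], t: int, betas: list[float],
                      omega: float, rng: random.Random) -> tuple[list[float], list[float]]:
    """Diffuse input x with `betas` and target y with the scaled schedule omega * betas.

    t is 1-based (1 <= t <= len(betas)); both series are diffused to the same step.
    """
    if not 1 <= t <= len(betas):
        raise ValueError("t must lie in 1..T")
    a_bar_x = alpha_bar(betas)[t - 1]
    a_bar_y = alpha_bar([omega * b for b in betas])[t - 1]
    return diffuse(x, a_bar_x, rng), diffuse(y, a_bar_y, rng)


# Usage: a 3-point input window and a 2-point target, both diffused to step t = 50 of 100.
rng = random.Random(0)
betas = linear_beta_schedule(1e-4, 0.1, 100)
x_t, y_t = coupled_diffusion([1.0, 2.0, 3.0], [4.0, 5.0], t=50, betas=betas, omega=0.1, rng=rng)
```

Because every β'_t is smaller than β_t when ω < 1, the target keeps more of its clean signal than the input at the same step.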
Scaled denoising score matching for diffused time series cleaning
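- Time series data are augmented with the coupled diffusion probabilistic model
- The generative distribution moves toward the diffused (noisy) target series rather than the original target, so the generated series needs to be cleaned
- Denoising Score Matching (DSM) is employed to accelerate the removal of uncertainty ("de-uncertainty") from the generated series
- A monotonically decreasing series of fixed σ values is used to scale noise of different levels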
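Multi-scale denoising score matching can be illustrated on a toy problem. For each noise level σ, a sample y is perturbed to ỹ = y + σ·ε, and the score model is regressed toward -(ỹ - y)/σ² = -ε/σ, the score of the Gaussian perturbation kernel; the per-level losses are then combined over a decreasing sequence of σ values. The sketch below assumes 1-D data drawn from N(0, 1), score models passed as plain callables `score(value, sigma)`, and a σ² weight per level; the function names and these choices are illustrative assumptions, not D3VAE's network or its exact loss.

```python
import random
from typing import Callable

# (noisy value, sigma) -> estimated score d/dy log p_sigma(y)
ScoreFn = Callable[[float, float], float]


def dsm_loss(score: ScoreFn, data: list[float], sigmas: list[float],
             rng: random.Random) -> float:
    """Multi-scale denoising score matching loss, averaged over samples and noise levels.

    For each sigma: y_tilde = y + sigma * eps, and the regression target for the score
    is -(y_tilde - y) / sigma**2 = -eps / sigma. Each level is weighted by sigma**2 so
    that coarse and fine noise levels contribute on a comparable scale.
    """
    if any(a <= b for a, b in zip(sigmas, sigmas[1:])):
        raise ValueError("sigmas must be strictly decreasing")
    total = 0.0
    for sigma in sigmas:
        level = 0.0
        for y in data:
            eps = rng.gauss(0.0, 1.0)
            y_tilde = y + sigma * eps
            residual = score(y_tilde, sigma) + eps / sigma
            level += 0.5 * sigma**2 * residual**2
        total += level / len(data)
    return total / len(sigmas)


def true_score(v: float, sigma: float) -> float:
    # Exact score of N(0, 1 + sigma^2), the noise-perturbed version of N(0, 1) data.
    return -v / (1.0 + sigma**2)


def zero_score(v: float, sigma: float) -> float:
    # A deliberately uninformative score model, used for comparison.
    return 0.0


data_rng = random.Random(0)
data = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
sigmas = [1.0, 0.5, 0.25]
loss_true = dsm_loss(true_score, data, sigmas, random.Random(1))
loss_zero = dsm_loss(zero_score, data, sigmas, random.Random(1))
```

Since the DSM objective is minimized by the score of the noise-perturbed data distribution, `loss_true` should come out below `loss_zero`; in expectation they are about 0.37 and 0.5 for these σ values.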
Disentangling latent variables for interpretation
- Interpretability of time series forecasting models is important
- Disentangling latent variables can enhance the reliability of predictions
- Total Correlation (TC) is used to measure the dependencies among multiple random variables
- The bidirectional structure of the BVAE aggregates rich semantics into the latent variables
- Algorithms 1 and 2 are used for training and forecasting, respectively
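To make the TC definition concrete: TC(z) = KL(q(z) ‖ ∏_j q(z_j)), which is zero exactly when the components of z are independent. For a Gaussian latent with covariance Σ it has the closed form 0.5·(Σ_j log Σ_jj - log det Σ). The sketch below computes that closed form in pure Python, assuming a known covariance matrix; it only illustrates the quantity being minimized and is not the estimator D3VAE uses during training.

```python
import math


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def gaussian_total_correlation(cov: list[list[float]]) -> float:
    """TC(z) = KL(q(z) || prod_j q(z_j)) for z ~ N(mu, cov), in nats.

    For a Gaussian this reduces to 0.5 * (sum_j log cov_jj - log det cov);
    log det is taken from the Cholesky factor for numerical stability.
    """
    L = cholesky(cov)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(cov)))
    return 0.5 * (sum(math.log(cov[j][j]) for j in range(len(cov))) - log_det)


# Independent dimensions give TC = 0; correlation rho = 0.5 gives -0.5 * log(1 - rho^2).
tc_independent = gaussian_total_correlation([[2.0, 0.0], [0.0, 3.0]])
tc_correlated = gaussian_total_correlation([[1.0, 0.5], [0.5, 1.0]])
```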
Training and forecasting
- The coupled diffusion is combined with a denoising network to reduce the effect of uncertainty
- The TC of the latent variables is minimized to disentangle them
- A reconstruction loss is combined with these terms using trade-off parameters
- The overall objective is minimized to learn the generative model
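A schematic form of the training objective, assuming each component listed above enters as a weighted additive term: a term pulling the generative distribution toward the diffused target, the denoising (DSM) loss, the TC penalty, and the reconstruction loss. The weight names ψ, λ, γ and the term labels are assumptions for illustration; the paper's equations give the exact terms.

```latex
\mathcal{L} = \psi \, D_{\mathrm{KL}}\big(q(Y^{(t)}) \,\|\, p_\theta(\hat{Y}^{(t)})\big)
  + \lambda \, \mathcal{L}_{\mathrm{DSM}}
  + \gamma \, \mathcal{L}_{\mathrm{TC}}
  + \mathcal{L}_{\mathrm{rec}},
\qquad \psi, \lambda, \gamma \ge 0
```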
Experiment settings
- Two synthetic datasets were generated and six real-world datasets were used
- Datasets were sliced to contain at most 1000 time points
- D3VAE was compared with one GP-based method, two auto-regressive methods, and four VAE-based methods
- The Adam optimizer was used with an initial learning rate of 5e-4
- The batch size was 16 and training ran for 20 epochs
- Number of disentanglement factors chosen from {4, 8}
- Evaluation metrics: CRPS and MSE
- Experiments were conducted on a Linux machine with a single NVIDIA P40 GPU
- Each experiment was repeated five times
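A sample-based way to compute the two metrics. This sketch uses the standard energy-form estimator CRPS(F, y) = E|X - y| - 0.5·E|X - X'| over forecast samples X, X', and plain MSE for point forecasts. The paper may compute CRPS differently (for example, from quantiles or on aggregated series), and the helper names are hypothetical.

```python
def crps_from_samples(samples: list[float], y: float) -> float:
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| with X, X' drawn from the forecast.

    Lower is better; with a single sample it reduces to the absolute error.
    The pairwise term costs O(m^2) for m samples.
    """
    if not samples:
        raise ValueError("need at least one forecast sample")
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term1 - 0.5 * term2


def mean_crps(sample_paths: list[list[float]], target: list[float]) -> float:
    """Average per-step CRPS; sample_paths[i][h] is sample path i at horizon step h."""
    return sum(crps_from_samples([path[h] for path in sample_paths], y)
               for h, y in enumerate(target)) / len(target)


def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between a point forecast and the ground truth."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Usage: two sample paths over a 2-step horizon.
paths = [[0.0, 3.0], [2.0, 3.0]]
score = mean_crps(paths, [1.0, 1.0])
```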
Main results
- Two prediction lengths (8 and 16) are evaluated
- Results of longer prediction lengths are in Appendix D
- The noise of the output series can be estimated to assess uncertainty
- Adjusting the scale parameter ω varies the generated distribution space, which can be used to estimate uncertainty
- Experiments indicate that this uncertainty estimation captures forecast uncertainty effectively
- Disentanglement quality can be assessed by evaluating classification performance
- The Mutual Information Gap (MIG) metric is used to evaluate disentanglement
- The diffusion process can effectively augment the input or target series
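MIG, as commonly defined, averages over ground-truth factors v_k the normalized gap (I(z_a; v_k) - I(z_b; v_k)) / H(v_k), where z_a and z_b are the most and second-most informative latent dimensions for that factor. The sketch assumes the mutual-information matrix and the factor entropies have already been estimated (for example, by discretizing the latents), which is the hard part in practice; the function name is hypothetical.

```python
def mutual_information_gap(mi: list[list[float]], factor_entropy: list[float]) -> float:
    """MIG from a precomputed mutual-information matrix.

    mi[j][k] is I(z_j; v_k) between latent dimension j and ground-truth factor k (nats);
    factor_entropy[k] is H(v_k). For each factor, take the gap between the two most
    informative latent dimensions, normalize by H(v_k), then average over factors.
    """
    n_latents = len(mi)
    n_factors = len(factor_entropy)
    if n_latents < 2:
        raise ValueError("MIG needs at least two latent dimensions")
    gaps = []
    for k in range(n_factors):
        column = sorted((mi[j][k] for j in range(n_latents)), reverse=True)
        gaps.append((column[0] - column[1]) / factor_entropy[k])
    return sum(gaps) / n_factors


# Usage: 3 latent dimensions, 2 factors; each factor is mostly captured by one dimension.
mi_matrix = [[1.0, 0.0],
             [0.0, 0.5],
             [0.2, 0.1]]
mig = mutual_information_gap(mi_matrix, [1.0, 1.0])
```

A score near 1 means each factor is captured by a single latent dimension; a score near 0 means the information is spread across several dimensions.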
Model analysis
- The variance schedule β and the number of diffusion steps T should be configured properly to reduce the effect of uncertainty.
- Too small a variance schedule or too few diffusion steps leads to a meaningless diffusion process, because the data are barely perturbed.
- Analysis of the effect of the variance schedule β and the number of diffusion steps T showed that prediction performance can be improved with proper β and T.
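Why a small β or a small T makes the diffusion meaningless can be quantified by sqrt(ᾱ_T), the weight left on the clean series after T forward steps. The sketch assumes a linear schedule from β_1 = 1e-4 to a chosen β_T; the settings in the loop are illustrative only and are not the paper's grid.

```python
import math


def signal_retained(beta_start: float, beta_end: float, T: int) -> float:
    """sqrt(alpha_bar_T) for a linear schedule: the weight left on the clean series
    after T forward steps (1.0 means untouched, 0.0 means pure noise)."""
    if T < 1:
        raise ValueError("T must be at least 1")
    a_bar = 1.0
    for i in range(T):
        beta = beta_start if T == 1 else beta_start + i * (beta_end - beta_start) / (T - 1)
        a_bar *= 1.0 - beta
    return math.sqrt(a_bar)


# Illustrative settings: a tiny schedule with few steps versus larger schedules or more steps.
for beta_end, T in [(1e-3, 10), (0.1, 10), (0.02, 1000)]:
    kept = signal_retained(1e-4, beta_end, T)
    print(f"beta_end={beta_end:g}, T={T}: signal kept = {kept:.4f}")
```

With the tiny schedule almost all of the clean signal survives, so the diffused series is nearly identical to the original; with a large schedule and many steps almost none survives. A useful configuration sits between these extremes.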
Discussion
- Langevin dynamics has been applied to energy-based models (EBMs), computer vision, and natural language processing
- Experiments demonstrate the effectiveness of single-step sampling
- An extra empirical study investigates whether more sampling steps improve performance
- In this study, the additive noise in Langevin dynamics is omitted and multi-step denoising is applied to D3VAE
- Different Langevin dynamics configurations bring no substantial benefit for time series forecasting
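The contrast between Langevin-style sampling and single-step denoising can be sketched on a 1-D toy with a known score. Part (a) is unadjusted Langevin dynamics, x ← x + (η/2)·∇log p(x) + sqrt(η)·z; passing `rng=None` drops the additive noise and leaves deterministic multi-step gradient denoising. Part (b) is a single-step denoising jump ỹ + σ²·∇log p_σ(ỹ), which by Tweedie's formula returns the posterior mean E[y | ỹ] when the score is exact. Function names, step sizes, and the toy target are assumptions for illustration; D3VAE uses a learned denoising network rather than a closed-form score.

```python
import math
import random
from typing import Callable, Optional

Score = Callable[[float], float]  # gradient of a log density at a point


def langevin(x0: float, score: Score, step: float, n_steps: int,
             rng: Optional[random.Random]) -> float:
    """Unadjusted Langevin dynamics: x <- x + (step / 2) * score(x) + sqrt(step) * z.

    With rng=None the additive noise term is omitted, turning the update into
    plain multi-step gradient ascent on log p (deterministic denoising).
    """
    x = x0
    for _ in range(n_steps):
        x = x + 0.5 * step * score(x)
        if rng is not None:
            x += math.sqrt(step) * rng.gauss(0.0, 1.0)
    return x


def denoising_jump(y_noisy: float, score_at_sigma: Score, sigma: float) -> float:
    """Single-step denoising (Tweedie-style): y_clean = y_noisy + sigma^2 * score(y_noisy)."""
    return y_noisy + sigma**2 * score_at_sigma(y_noisy)


def std_normal_score(v: float) -> float:
    # Score of the toy target p = N(0, 1).
    return -v


x_det = langevin(5.0, std_normal_score, step=0.1, n_steps=100, rng=None)  # noise omitted
x_noisy = langevin(5.0, std_normal_score, step=0.1, n_steps=100, rng=random.Random(0))

# Clean data ~ N(0, 1) observed with noise sigma: the noisy density is N(0, 1 + sigma^2).
sigma = 0.5
y_hat = denoising_jump(2.0, lambda v: -v / (1.0 + sigma**2), sigma)
```

Without noise, the iterates contract deterministically toward the mode, whereas the single jump reaches the posterior mean (here 2.0 / 1.25) in one step.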
Conclusion
- Proposed a generative model with bidirectional VAE as the backbone
- Devised a coupled diffusion probabilistic model for time series forecasting
- Developed a scaled denoising network to improve prediction accuracy
- Further disentangled the latent variables for better model interpretability
- Experiments on synthetic and real-world data demonstrate state-of-the-art (SOTA) performance
- Reviewed related literature of time series forecasting methods
- Complex temporal patterns can manifest over both the short and the long term
- Existing statistical models such as ARIMA and Gaussian process regression
- Temporal attention and causal convolution explored to model temporal dependencies
- Transformer-based models strengthen the capability of capturing hidden temporal patterns
- The multivariate nature of TSF is another topic that many works focus on
- Probabilistic models, matrix/tensor factorization, CNNs, and GNNs
- Generative methods for TSF focus on energy-based models
- VAE-based models to infer underlying distribution of time series data
- A coupled diffusion probabilistic model is proposed to augment the input and output series
- A multi-scale score-matching denoising network is plugged in for accurate prediction
- Prior work estimates uncertainty in time series forecasting in terms of epistemic uncertainty
- Other work detects noise in time series data or devises suitable models to alleviate it
- Neural networks have been introduced to denoise time series
- Explaining deep neural networks makes their predictions more interpretable
- Disentangling latent variables identifies independent factors of the data
- D3VAE uses a bidirectional VAE and treats the dimensions of each latent variable as the factors to be disentangled
- Experiments on synthetic and real-world datasets
- Input representation with embedding method and RNN
- Baselines include GP-copula, DeepAR, TimeGrad, Vanilla VAE, NVAE, f-VAE, and β-TCVAE
- Longer-term time series forecasting and full-dataset experiments