Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Machine learning methods are used to build a mapping from source data to target data.
Target data is high-dimensional and complex, making it difficult to learn the mapping.
Regeneration learning is a learning paradigm that generates an abstraction of the target data, then uses it to generate the target data.
Regeneration learning is a counterpart of traditional representation learning.
Regeneration learning can be used for various data generation tasks.

Template-based methods extract template from target data and regenerate target data from template
Vocoding methods convert speech waveform to spectrograms and regenerate waveform from spectrograms
Grapheme/phoneme conversions convert text/character sequence to phoneme sequence and regenerate text/character sequence from phoneme sequence
Auto-encoding methods convert target data to representations and regenerate target data from representations
Easy Mapping learning is a sequence-to-sequence task where X and Y contain comparable information
Denoising diffusion probabilistic models add noise to original data Y
Iterative-based non-autoregressive sequence generation is an extended version of regeneration learning
Post-refine methods improve quality of generated Y but not considered regeneration learning

Regeneration learning is a type of representation learning for data generation
Regeneration learning and traditional representation learning are counterparts of each other
Regeneration learning handles abstraction of target data for data generation, traditional representation learning handles abstraction of source data for understanding
Both processes can be learned in a self-supervised way
Mapping from X to Y is simpler in regeneration and traditional representation learning than direct mapping from X to Y

Text-to-Speech Synthesis
Automatic Speech Recognition
Text Generation
Melody Generation
Talking-Head Video Synthesis
Image/Video/Sound Generation
Regeneration Learning used when target data is too high-dimensional or complex, source and target data have uncorrelated information, or lack of paired data

RQ1: How to learn Y from Y?
RQ2: How to design better learning paradigms to learn Y?
RQ6: How to determine the format of Y?
RQ7: How to design better generative models to learn X → Y and Y → Y mapping?
RQ8: How to leverage the assumption of semantic conversion and detail rendering?
RQ9: How to leverage self-supervised learning for Y → Y mapping?
RQ10: How to reduce the training-inference mismatch in regeneration learning?

Presentation of a learning paradigm called regeneration learning for data generation tasks
Regeneration learning handles the abstraction of the target data for data generation while representation learning handles the abstraction of source data for data understanding
Connections of regeneration learning to other methods
Variety of data generation tasks that can benefit from regeneration learning
Comparison between regeneration learning and traditional representation learning on X → Y
How to learn X → Y and Y → Y
How to reduce training-inference mismatch in X → Y → Y
Generate target Y from Y
Infer Y from X via Y
Typical data generation tasks that leverage regeneration learning