Paper summary: VC, SSL, and Feature Disentanglement
=== Voice Conversion (VC)
One-Shot Voice Conversion with Speaker-Agnostic StarGAN | Microsoft | Interspeech 2021 | repo
StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts | Stellenbosch University | SACAIR 2021 | repo
One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization | National Taiwan University | Interspeech 2019 | repo
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | MIT-IBM | ICML 2019 | repo
VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net architecture | National Taiwan University | Interspeech 2020 | repo
AGAIN-VC: A One-shot Voice Conversion Using Activation Guidance and Adaptive Instance Normalization | National Taiwan University | ICASSP 2021 | repo
One-Shot Voice Conversion by Vector Quantization | National Taiwan University | ICASSP 2020
One-shot Voice Conversion with Global Speaker Embeddings | Tsinghua-CUHK | Interspeech 2019
GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus | Yitu Technology | Interspeech 2020
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion | Ubisoft La Forge | ICASSP 2022 | repo
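Two of the one-shot VC papers above, the instance-normalization paper from National Taiwan University and AGAIN-VC, are built on instance normalization (IN) and adaptive instance normalization (AdaIN). Below is a minimal NumPy sketch of both operations on a (channels, frames) feature map. IN removes the per-channel mean and standard deviation over time, which these models treat as largely speaker information. AdaIN then applies a target's statistics to the normalized content. This is a simplification and not either paper's implementation. The function names are hypothetical. In the actual models the operations act on learned hidden layers, and the target statistics come from an encoder or a speaker encoder rather than directly from raw features.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each channel of a (channels, frames) array over time.

    Returns the normalized array plus the per-channel mean and std that
    were removed (the utterance-level global statistics).
    """
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps), mean, std


def adain(content, style, eps=1e-5):
    """Adaptive instance normalization (hypothetical helper).

    Strips the per-channel statistics from `content`, then rescales and
    shifts it with the per-channel mean/std computed from `style`.
    """
    normed, _, _ = instance_norm(content, eps)
    _, style_mean, style_std = instance_norm(style, eps)
    return normed * style_std + style_mean


# Usage: random stand-ins for source and target feature maps (80 channels).
rng = np.random.default_rng(0)
source = rng.normal(2.0, 3.0, size=(80, 120))
target = rng.normal(-1.0, 0.5, size=(80, 200))
converted = adain(source, target)
```

The output keeps the source's frame count and temporal pattern, but its per-channel mean and std now match the target's. That is the sense in which these models swap global statistics to change speaker identity.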
=== Self-supervised Learning (SSL)
wav2vec: Unsupervised Pre-training for Speech Recognition | Facebook AI Research | Interspeech 2019 | repo
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | Facebook AI | NeurIPS 2020 | repo
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | Wei-Ning Hsu, Meta AI | TASLP 2021 | repo
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers | MIT-IBM Watson AI Lab | ICML 2022 | repo
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition | MIT CSAIL | NeurIPS 2021 | repo
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | Microsoft | JSTSP 2022 | repo
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | Wei-Ning Hsu, Meta AI | ICML 2022 | repo
=== Feature Disentanglement
SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks | MIT-IBM Watson AI Lab | ICASSP 2022 | repo
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE | CUHK | SLT 2022 | repo
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion | Tsinghua | Interspeech 2022 | repo
A Brief Overview of Unsupervised Neural Speech Representation Learning | University of Copenhagen | AAAI SAS 2022
Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning | Duke University | ICLR 2021