A Summary of Papers on VC, SSL, and Feature Disentanglement

Posted on 2022-12-11 | Updated on 2022-12-27 | Category: Work

This post summarizes papers on voice conversion (VC), self-supervised learning (SSL), and feature disentanglement.

=== Voice Conversion (VC)

One-Shot Voice Conversion with Speaker-Agnostic StarGAN | Microsoft | Interspeech 2021 | repo

StarGAN-ZSVC: Towards Zero-Shot Voice Conversion in Low-Resource Contexts | Stellenbosch University | SACAIR 2021 | repo

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization | National Taiwan University | Interspeech 2019 | repo

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | MIT-IBM | ICML 2019 | repo

VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net Architecture | National Taiwan University | Interspeech 2020 | repo

AGAIN-VC: A One-Shot Voice Conversion Using Activation Guidance and Adaptive Instance Normalization | National Taiwan University | ICASSP 2021 | repo

One-Shot Voice Conversion by Vector Quantization | National Taiwan University | ICASSP 2020

One-shot Voice Conversion with Global Speaker Embeddings | Tsinghua-CUHK | Interspeech 2019

GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus | Yitu Technology | Interspeech 2020

A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion | Ubisoft La Forge | ICASSP 2022 | repo

=== Self-supervised Learning (SSL)

wav2vec: Unsupervised Pre-training for Speech Recognition | Facebook AI Research | Interspeech 2019 | repo

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | Facebook AI | NeurIPS 2020 | repo

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | Wei-Ning Hsu, Meta AI | TASLP 2021 | repo

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers | MIT-IBM Watson AI Lab | ICML 2022 | repo

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition | MIT CSAIL | NeurIPS 2021 | repo

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | Microsoft | JSTSP 2022 | repo

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | Wei-Ning Hsu, Meta AI | ICML 2022 | repo

=== Feature Disentanglement

SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks | MIT-IBM Watson AI Lab | ICASSP 2022 | repo

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE | CUHK | SLT 2022 | repo

Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion | Tsinghua | Interspeech 2022 | repo

A Brief Overview of Unsupervised Neural Speech Representation Learning | University of Copenhagen | AAAI SAS 2022

Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning | Duke University | ICLR 2021