The development of speech-synthesis vocoders is summarized below; the list is updated continuously.
Order | Model | Year | Institution | Conference | Inherited Model (Base model) | Corresponding Author (Team leader) | URL |
---|---|---|---|---|---|---|---|
1 | WaveNet | 2016.9 | Google DeepMind | SSW 2016 | CNN | Nal Kalchbrenner | https://arxiv.org/pdf/1609.03499.pdf |
2 | WaveRNN | 2018.6 | DeepMind & Google Brain | ICML 2018 | RNN | Nal Kalchbrenner | https://arxiv.org/pdf/1802.08435.pdf |
3 | WaveGlow | 2018.10 | Nvidia | ICASSP 2019 | WaveNet | Rafael Valle | https://arxiv.org/pdf/1811.00002.pdf |
4 | LPCNet | 2019.2 | Mozilla, Google | ICASSP 2019 | WaveRNN | Jean-Marc Valin | https://arxiv.org/pdf/1810.11846.pdf |
5 | WaveGAN | 2019.2 | UC San Diego | ICLR 2019 | GAN | Miller Puckette | https://arxiv.org/pdf/1802.04208.pdf |
6 | Multi-band WaveRNN | 2019.4 | Tencent AI Lab | Interspeech 2020 | DurIAN, WaveRNN | Dong Yu | https://arxiv.org/pdf/1909.01700.pdf |
7 | MelGAN | 2019.12 | University of Montreal, Mila, Lyrebird AI | NeurIPS 2019 | GAN | Yoshua Bengio | https://arxiv.org/pdf/1910.06711.pdf |
8 | SqueezeWave | 2020.1 | UC Berkeley | | WaveGlow | Bichen Wu | https://arxiv.org/pdf/2001.05685.pdf |
9 | Parallel WaveGAN (PWG) | 2020.2 | LINE Corp., NAVER Corp. | | GAN | Ryuichi Yamamoto | https://arxiv.org/pdf/1910.11480.pdf |
10 | Multi-band MelGAN | 2020.5 | Northwestern Polytechnical University, Sogou | | MelGAN, multi-band | Lei Xie | https://arxiv.org/pdf/2005.05106.pdf |
11 | FeatherWave | 2020.10 | Tencent | Interspeech 2020 | MB LP, WaveRNN | Shan Liu | https://isca-speech.org/archive/Interspeech_2020/pdfs/1156.pdf |
12 | WaveGrad | 2020.10 | Johns Hopkins University, Google Brain | | CNN | Heiga Zen | https://arxiv.org/pdf/2009.00713.pdf |
GAN Vocoder: Multi-Resolution Discriminator Is All You Need
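This paper tries to explain why the recent wave of GAN-based vocoders outperforms earlier flow-based and autoregressive vocoders. Based on ablation experiments, it argues that the main reason is the multi-resolution discriminator design, which lifts GAN-based vocoders to a new level of quality.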
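
As a rough illustration of what "multi-resolution" can mean for a discriminator, the sketch below builds several views of the same waveform at different time resolutions by average pooling. Each view would then be scored by its own discriminator. This is not the paper's architecture: the function names, the toy sine input, and the pooling factors (1, 2, 4) are all assumptions chosen for illustration, and the discriminators themselves are left out.

```python
import math


def average_pool(signal, factor):
    """Downsample by averaging non-overlapping windows of `factor` samples."""
    usable = len(signal) - len(signal) % factor  # drop the incomplete tail window
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


def multi_resolution_views(signal, factors=(1, 2, 4)):
    """Return one view of the signal per pooling factor (hypothetical helper).

    In a multi-resolution setup, each view would feed a separate discriminator.
    """
    return {f: (list(signal) if f == 1 else average_pool(signal, f)) for f in factors}


# Toy waveform: 16 samples of a sine wave (assumed input, for illustration only).
wave = [math.sin(2 * math.pi * n / 8) for n in range(16)]
views = multi_resolution_views(wave)

for factor, view in views.items():
    print(factor, len(view))
```

Coarser views hide fine detail and expose slower structure in the signal, so discriminators working at different resolutions can penalize different kinds of artifacts.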