Interspeech 2021
Tsinghua University Shenzhen Graduate School - Prof. Zhiyong Wu's group
Speech Synthesis
Towards Multi-Scale Style Control for Expressive Speech Synthesis
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis
Voice Conversion
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion
Other Topics
Voting for the Right Answer: Adversarial Defense for Speaker Verification
Northwestern Polytechnical University - Prof. Lei Xie's group
Speech Synthesis
Controllable Context-Aware Conversational Speech Synthesis
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS
Voice Conversion
Enriching Source Style Transfer in Recognition-Synthesis Based Non-Parallel Voice Conversion
Improving Robustness of One-Shot Voice Conversion with Deep Discriminative Speaker Encoder
Other Topics
DCCRN+: Channel-Wise Subband DCCRN with SNR Estimation for Speech Enhancement
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain
WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines
Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition
F-T-LSTM Based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification
Microsoft - Xu Tan's group
Speech Synthesis
Adaptive Text to Speech for Spontaneous Style
Voice Conversion
Other Topics
Cross-Domain Speech Recognition with Unsupervised Character-Level Distribution Matching
Google - Heiga Zen's group
Speech Synthesis
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Voice Conversion
Other Topics
Semi-Supervision in ASR: Sequential MixMatch and Factorized TTS-Based Augmentation
National University of Singapore - Haizhou Li's group
Speech Synthesis
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability
Voice Conversion
Cross-Lingual Voice Conversion with a Cycle Consistency Loss on Linguistic Representation
Other Topics
Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding
Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement
Phonetically Motivated Self-Supervised Speech Representation Learning
Diagnosis of COVID-19 Using Auditory Acoustic Cues
Neural Speaker Extraction with Speaker-Speech Cross-Attention Network
Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification
Collaborative Papers
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification
Peking University Shenzhen Graduate School - Yuexian Zou's group
Spoken Dialogue Systems
Self-Supervised Dialogue Learning for Spoken Conversational Question Answering
Semantic Transportation Prototypical Network for Few-Shot Intent Detection
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification
Contextualized Attention-Based Knowledge Transfer for Spoken Conversational Question Answering
Text Anchor Based Metric Learning for Small-Footprint Keyword Spotting
University of Science and Technology of China (USTC) - Zhen-Hua Ling's group
Speech Synthesis
UnitNet-Based Hybrid Speech Synthesis
A Neural-Network-Based Approach to Identifying Speakers in Novels
Voice Conversion
Adversarial Voice Conversion Against Neural Spoofing Detectors
National Taiwan University - Hung-yi Lee's group
Voice Conversion
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations
Other Topics
SUPERB: Speech Processing Universal PERformance Benchmark
Towards Lifelong Learning of End-to-End ASR
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines
Stabilizing Label Assignment for Speech Separation by Self-Supervised Pre-Training
Voting for the Right Answer: Adversarial Defense for Speaker Verification
Metric
Utilizing Self-Supervised Representations for MOS Prediction
University of Edinburgh (UoE) - Simon King's group
Speech Synthesis
Detection and Analysis of Attention Errors in Sequence-to-Sequence Text-to-Speech
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis