Dual-channel generative pretraining for learning natural turn-taking in spoken dialogue without labeled data. A 0.5B model that outperforms models 6x its size on turn prediction.
Publications
Peer-reviewed research published at Interspeech, ICASSP, and NeurIPS, and in PMLR.
Two techniques for incorporating prosody into end-to-end SLU: prosody-attention and prosody-distillation. Up to an 8% improvement in intent classification accuracy on SLURP.
Map-Mix: a data augmentation approach using model training dynamics to guide latent mixup sampling, giving ~2% weighted F1 improvement on low-resource dialect classification.
The first public Indian-accented SLU dataset in the banking domain. SSL speech representations beat ASR-based approaches for intent classification.
A semi-supervised framework for speaker profiling that leverages external unlabelled corpora via supervised, unsupervised, and consistency training, achieving an RMSE of 6.8 years on age estimation.
Design and results of the 2019 AutoDL challenge series (AutoCV, AutoCV2, AutoNLP, AutoSpeech, AutoDL), showing that winning solutions generalize to unseen datasets.
A novel, generic mathematical formulation of AutoML that unifies hyperparameter optimization (HPO) and meta-learning, showing that meta-learning addresses AutoML more fundamentally than HPO.
A data-driven deep learning approach combining CNN feature extraction with Neural Arithmetic Logic Units (NALU) for stock price prediction using historical price data.