Shangeth
Large Language Models
DualTurn: Learning Turn-Taking from Dual-Channel Generative Speech Pretraining
DualTurn learns natural conversational turn-taking via generative pretraining on dual-channel audio, outperforming larger models on turn prediction tasks.
SpeechLLM: Multi-Modal LLM for Speech Understanding
SpeechLLM is a multimodal LLM that combines speech encoders with TinyLlama to jointly predict the transcript (ASR), gender, age, accent, and emotion from audio.