Large Language Models

DualTurn: Learning Turn-Taking from Dual-Channel Generative Speech Pretraining

DualTurn learns natural conversational turn-taking via generative pretraining on dual-channel audio, outperforming larger models on turn prediction tasks.

SpeechLLM: Multi-Modal LLM for Speech Understanding

SpeechLLM is a multimodal LLM that combines speech encoders with TinyLlama to jointly perform ASR and predict gender, age, accent, and emotion from audio.