Speaker Profiling

SpeechLLM: Multi-Modal LLM for Speech Understanding

SpeechLLM: A multimodal LLM combining speech encoders with TinyLlama for joint ASR, gender, age, accent, and emotion prediction from audio.

A semi-supervised framework for speaker profiling (age and height estimation) that leverages external unlabelled speech data via consistency training.
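
As a rough illustration of what consistency training on unlabelled data can look like, the sketch below combines a supervised regression loss on labelled examples with a term that penalises disagreement between predictions on a clean and a perturbed copy of each unlabelled example. It is not taken from the framework described above: the linear model, the Gaussian-noise perturbation (a stand-in for audio augmentation), and the names predict, perturb, semi_supervised_loss, lam, and noise_std are all assumptions made for illustration.

```python
import random


def predict(weights, features):
    """Toy linear regressor: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def perturb(features, noise_std, rng):
    """Add Gaussian noise to each feature (hypothetical stand-in for audio augmentation)."""
    return [x + rng.gauss(0.0, noise_std) for x in features]


def semi_supervised_loss(weights, labelled, unlabelled, lam=1.0, noise_std=0.1, seed=0):
    """Return (total, supervised, consistency) losses.

    labelled:   list of (features, target) pairs, e.g. target = age or height
    unlabelled: list of feature vectors with no target
    lam:        weight of the consistency term (assumed hyperparameter)
    """
    rng = random.Random(seed)

    # Supervised term: mean squared error on the labelled pairs.
    sup = sum((predict(weights, x) - y) ** 2 for x, y in labelled) / len(labelled)

    # Consistency term: predictions for a clean and a perturbed copy of the
    # same unlabelled example should agree, so penalise their squared gap.
    cons = sum(
        (predict(weights, x) - predict(weights, perturb(x, noise_std, rng))) ** 2
        for x in unlabelled
    ) / len(unlabelled)

    return sup + lam * cons, sup, cons


if __name__ == "__main__":
    labelled = [([1.0, 2.0], 5.0), ([0.0, 1.0], 2.5)]
    unlabelled = [[0.5, 0.5], [2.0, 1.0], [1.5, 0.0]]
    total, sup, cons = semi_supervised_loss([1.0, 2.0], labelled, unlabelled)
    print(f"total={total:.4f} supervised={sup:.4f} consistency={cons:.4f}")
```

In this sketch the unlabelled examples contribute only through the consistency term, so they shape the model without requiring age or height labels; in practice the perturbation and the model would be replaced by whatever augmentation and network the actual system uses.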