Blog
Writing on Voice AI, speech research, and machine learning.
Speech LLMs for Conversations
A multimodal speech LLM that processes audio directly to enhance conversational AI while reducing overhead compared to traditional ASR-LLM-TTS pipelines.
Feature Disentanglement - I
How deep learning models can isolate independent factors of variation in data through VAEs and Beta-TCVAE, enabling controlled synthesis and better downstream representations.
Code Mixing in NLP and Speech
Notes from a seminar covering six papers on code-mixing across NLP, speech synthesis, and speech recognition — including multilingual synthesis and code-mixed ASR.
KL Divergence: Entropy, Cross Entropy, and Mutual Information in PyTorch
A walkthrough of information entropy, KL divergence, mutual information, and cross entropy — with PyTorch implementations.
Off-Policy Monte Carlo Prediction with Importance Sampling
How importance sampling lets us estimate value functions under a target policy using episodes collected by a different behavior policy.