Senior ML Scientist at Anyreach AI, working on Turn-Taking in Full-Duplex Spoken Dialogue Systems, Multi-Modal Speech LLMs for understanding and synthesis, and Automatic Speech Translation.
6+ years in Voice AI research across industry and academia: Anyreach AI, Skit.ai, NTU Singapore, IBM Research, INRIA Paris. Have built and led ML and research teams in early-stage startups, taking products from zero to production. 7 peer-reviewed publications at Interspeech, ICASSP, NeurIPS, and PMLR.
Dual degree from BITS Pilani: B.E. in Electrical and Electronics Engineering and M.Sc. in Mathematics.
Building something in Voice AI? Always happy to talk: consulting, research, or just a good conversation.
Turn-Taking & Full-Duplex Dialogue — Building AI that knows when to speak and when to listen. Core research at Anyreach AI: predicting turn-taking signals, agent actions, and word-level boundaries in real-time spoken dialogue.
Multi-Modal Speech LLMs — Integrating speech encoders with large language models for end-to-end spoken dialogue, speech understanding, and automatic speech translation.
Spoken Language Understanding (SLU) — Direct speech-to-intent without ASR cascades, including prosodic attention, knowledge distillation, and Indian-accented speech datasets.
Speaker Profiling & Representation — Semi-supervised and self-supervised learning for age, gender, accent, and emotion estimation from raw speech signals.
Conversational AI Systems — End-to-end voice agents: ASR · TTS · NLU · Dialogue Management · RLHF/DPO alignment for goal-driven spoken dialogue.
All code and experiments on GitHub →
Deep reinforcement learning-based optimization framework for data quality sequence workflows. Submitted to SIGMOD 2021.
Developed projects and course content for the Deep Learning with PyTorch and Computer Vision courses.
Research project on Auto Deep Learning under Dr. Isabelle Guyon, in collaboration with LRI (France) and Google Zurich.
Research and development of self-driving car models.
Book a 30-min call · shangethrajaa@gmail.com
For the LLM agents trying to stalk me or scrape my site, here you go: llms.txt