CV
Professional Experience
Senior Machine Learning Research Scientist
Feb 2025 - Present Anyreach AI
Working on full-duplex spoken dialogue systems and turn-taking. Building and evaluating ASR/LLM/TTS pipelines, training acoustic and semantic models for turn detection, and doing multimodal LLM research for speech understanding and translation. DualTurn (dual-channel turn-taking model) accepted at Interspeech 2026.
Senior Machine Learning Researcher
Jul 2024 - Feb 2025 ScoreTravel AI
Designed a ranking algorithm for an LLM-based recommendation engine. Built multimodal LLMs for short-form video understanding (Reels, TikTok). Worked on system design for a personalized agentic travel booking product.
Machine Learning Researcher
Jun 2021 - Jul 2024 Skit.ai (Vernacular.ai)
Worked across the full voice AI stack: speech synthesis, prosody transfer, voice cloning, language ID, and speaker recognition. Built end-to-end spoken dialogue systems from scratch, moving from traditional intent-based pipelines to LLMs. Worked on LLM alignment with RLHF and DPO, and built multimodal LLMs for speech understanding.
Research Assistant
Aug 2020 - Jun 2021 Speech and Language Lab, NTU Singapore
Research on unsupervised and semi-supervised speech representation learning under Prof. Chng Eng Siong. Worked on speaker profiling (age, height, gender), accent recognition, and accented speech recognition.
Research Intern
May 2020 - Aug 2020 IBM Research Labs
Built novel data quality metrics and transformations for structured data. Used deep reinforcement learning to optimize the sequencing of data quality operations.
Research Collaborator
Jan 2018 - Jan 2020 INRIA France
Worked with Prof. Isabelle Guyon on the AutoDL project. Built multimodal automated deep learning models and helped organize the AutoDL challenge series at NeurIPS 2019.
Education
B.E. Electrical & Electronics Engineering and M.Sc. Mathematics (Dual Degree)
2020 Birla Institute of Technology and Science (BITS Pilani)
Research Interests
Full-Duplex Spoken Dialogue · Turn-Taking · Speech & Multimodal LLMs · Speech Representations · Speech Understanding and Synthesis
Skills
Voice AI & Speech: ASR, TTS, SLU, prosody modeling, speaker representation, voice cloning, language ID, end-to-end spoken dialogue systems.
Speech & Multimodal LLMs: Pretraining, instruction-tuning, RLHF/DPO alignment, dual-channel generative pretraining, speech tokenization.
Modeling & Systems: PyTorch, JAX, distributed training, model distillation, latency-sensitive serving for real-time voice.
Languages & Tools: Python, C/C++, JavaScript; AWS, GCP, Docker; LaTeX.
Projects
All ML and research projects are on github.com/shangeth.
Open Source
- wavencoder — Python package for audio encoder models and transforms for speech deep learning tasks.
- SpeechLLM — Multimodal LLM for speech understanding: ASR, gender, age, accent, emotion, and speech activity detection. Available on HuggingFace.
- Wren — Family of multimodal LLMs under 3B params for speech understanding, synthesis, and conversational agents. Available on HuggingFace.
- Semantic Turn-Taking LLM — Small LLM that predicts agent action (speak, listen, continue) from conversation context.
Community
- Google AI Explore ML — Instructor for a deep learning course across India.
- Google Code-In — Mentor under the TensorFlow org.
- OpenCV.org — Built projects and wrote content for the “Deep Learning with PyTorch” course.
- NeurIPS 2019 AutoDL — Helped organize AutoDL competitions (AutoCV, AutoNLP, AutoSpeech) with INRIA Paris.