Professional Experience

Senior Machine Learning Research Scientist

Feb 2025 - Present
Anyreach AI
Working on full-duplex spoken dialogue systems and turn-taking. Building and evaluating ASR/LLM/TTS pipelines, training acoustic and semantic models for turn detection, and doing multimodal LLM research for speech understanding and translation. DualTurn (dual-channel turn-taking model) accepted at Interspeech 2026.

Senior Machine Learning Researcher

Jul 2024 - Feb 2025
ScoreTravel AI
Designed a ranking algorithm for an LLM-based recommendation engine. Built multimodal LLMs for short-form video understanding (Reels, TikTok). Worked on system design for a personalized agentic travel booking product.

Machine Learning Researcher

Jun 2021 - Jul 2024
Skit.ai (Vernacular.ai)
Worked across the full voice AI stack: speech synthesis, prosody transfer, voice cloning, language ID, and speaker recognition. Built end-to-end spoken dialogue systems from scratch, moving from traditional intent-based pipelines to LLMs. Aligned LLMs with RLHF and DPO, and built multimodal LLMs for speech understanding.

Research Assistant

Aug 2020 - Jun 2021
Speech and Language Lab, NTU Singapore
Researched unsupervised and semi-supervised speech representation learning under Prof. Chng Eng Siong. Worked on speaker profiling (age, height, gender), accent recognition, and accented speech recognition.

Research Intern

May 2020 - Aug 2020
IBM Research Labs
Built novel data quality metrics and transformations for structured data. Used deep reinforcement learning to optimize the sequencing of data quality operations.

Research Collaborator

Jan 2018 - Jan 2020
INRIA France
Worked with Prof. Isabelle Guyon on the AutoDL project. Built multimodal auto deep learning models and helped organize the AutoDL challenge series at NeurIPS 2019.

Education

B.E. Electrical & Electronics Engineering and M.Sc. Mathematics (Dual Degree)

2020
Birla Institute of Technology and Science (BITS Pilani)

Research Interests

Full-Duplex Spoken Dialogue · Turn-Taking · Speech & Multimodal LLMs · Speech Representations · Speech Understanding and Synthesis

Skills

Voice AI & Speech: ASR, TTS, SLU, prosody modeling, speaker representation, voice cloning, language ID, end-to-end spoken dialogue systems.

Speech & Multimodal LLMs: Pretraining, instruction-tuning, RLHF/DPO alignment, dual-channel generative pretraining, speech tokenization.

Modeling & Systems: PyTorch, JAX, distributed training, model distillation, latency-sensitive serving for real-time voice.

Languages & Tools: Python, C/C++, JavaScript; AWS, GCP, Docker; LaTeX.

Projects

All ML and research projects are on github.com/shangeth.

Open Source

  • wavencoder — Python package for audio encoder models and transforms for speech deep learning tasks.
  • SpeechLLM — Multimodal LLM for speech understanding: ASR, gender, age, accent, emotion, and speech activity detection. (HuggingFace)
  • Wren — Family of multimodal LLMs under 3B params for speech understanding, synthesis, and conversational agents. (HuggingFace)
  • Semantic Turn-Taking LLM — Small LLM that predicts agent action (speak, listen, continue) from conversation context.

Community

  • Google AI Explore ML — Instructor for a deep learning course across India.
  • Google Code-In — Mentor under the TensorFlow org.
  • OpenCV.org — Built projects and wrote content for the “Deep Learning with PyTorch” course.
  • NeurIPS 2019 AutoDL — Helped organize AutoDL competitions (AutoCV, AutoNLP, AutoSpeech) with INRIA Paris.