My research lies at the intersection of computer vision and machine learning. I focus on the controllability of generative models, with an emphasis on generating realistic 3D human face and body motions. My work also includes low-rank adaptation (LoRA) of vision-language models for visual grounding. Before specializing in computer vision, I explored multi-fidelity parallel Bayesian optimization for multi-agent systems, robotics (vision and control), and computer-aided engineering (CAE).
Audiopedia: Audio Question Answering with Knowledge
Abhirama S. Penamakuri*, Kiran Chhatre*, Akshat Jain (* denotes equal contribution)
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
arxiv / code / website
Audiopedia introduces a knowledge-intensive audio question answering task and proposes a framework that enhances audio language models by integrating external knowledge.
AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
AMUSE generates realistic emotional 3D body gestures directly from a speech sequence. It gives the user control over the generated emotion by combining the driving speech with a second audio sequence that carries a different emotion.
EMOTE: Emotional Speech-Driven Animation with Content-Emotion Disentanglement
Given audio input and an emotion label, EMOTE generates an animated 3D head with state-of-the-art lip synchronization that expresses the specified emotion. The method is trained from 2D video sequences using a novel video emotion loss and a mechanism that disentangles emotion from speech content.
This workshop paper investigates spatio-temporal priors for 3D human motion synthesis, comparing graph convolutional networks and transformer architectures for capturing dynamic joint dependencies.
Other Projects
BEAMBayesOpt: Parallel Bayesian Optimization of Agent-Based Transportation Simulation
BEAMBayesOpt introduces a parallel Bayesian optimization approach with early stopping that autonomously calibrates hyperparameters in BEAM’s large-scale multi-agent transportation simulations and enables efficient surrogate modeling of complex scenarios.
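The core loop of such an approach can be sketched in a few lines. The snippet below is a minimal, illustrative sketch only: the quadratic `objective`, the RBF kernel, the lower-confidence-bound acquisition, and the improvement-based stopping rule are all placeholder assumptions, not BEAM's actual simulation or BEAMBayesOpt's implementation.

```python
# Minimal sketch of parallel Bayesian optimization with early stopping.
# Everything here (objective, kernel, acquisition, stopping rule) is a
# stand-in assumption, not the actual BEAMBayesOpt code.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # Placeholder for one expensive simulation run (e.g. a BEAM scenario).
    return float((x - 0.3) ** 2)

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel between two 1D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP regression posterior mean/std at candidate points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(Xs, Xs)) - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def bayes_opt(batch=4, rounds=10, tol=1e-4):
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, batch)
    with ThreadPoolExecutor(batch) as pool:
        # Evaluate a whole batch of configurations in parallel.
        y = np.array(list(pool.map(objective, X)))
        best = y.min()
        for _ in range(rounds):
            cand = rng.uniform(0, 1, 256)
            mu, sd = gp_posterior(X, y, cand)
            # Lower-confidence-bound acquisition: pick the next batch.
            nxt = cand[np.argsort(mu - 1.5 * sd)[:batch]]
            ynew = np.array(list(pool.map(objective, nxt)))
            X, y = np.concatenate([X, nxt]), np.concatenate([y, ynew])
            # Early stopping: halt once a batch yields no real improvement.
            if best - y.min() < tol:
                break
            best = y.min()
    return X[np.argmin(y)], y.min()
```

In this sketch, each round evaluates a batch of candidates concurrently via a thread pool, refits the Gaussian-process surrogate, and stops as soon as a batch fails to improve the incumbent by more than `tol`, which is the kind of evaluation budget saving that matters when each objective call is a large-scale simulation.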
Rethinking Computer-Aided Architectural Design (CAAD) – From Generative Algorithms and Architectural Intelligence to Environmental Design and Ambient Intelligence
This paper reviews the evolution of CAAD—from generative algorithms and BIM to current AI developments—and argues that integrating AI-driven ambient intelligence into digital design tools can transform architectural and urban design for smarter, more sustainable cities.