Research

Publications

Re-Align@ICLR

On the Non-Identifiability of Steering Vectors in Large Language Models

Sohan Venkatesh, Ashish Mahendran Kurapath

Accepted at the ICLR 2026 Workshop on Representational Alignment

Preprint

Large Language Models are Algorithmically Blind

Sohan Venkatesh, Ashish Mahendran Kurapath, Tejas Melkote

Preprint under review


Research Projects

Prisoner's Dilemma in LLMs
Tested cooperation and defection behaviors across three LLMs (Claude-Opus-4.6, GPT-5.2, Gemini-3-Flash) in 20-round iterated Prisoner's Dilemma simulations. More details in my blog.
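A minimal sketch of the kind of harness such a simulation needs (standard PD payoffs and textbook strategies shown for illustration; the actual experiment prompted LLMs for moves rather than using fixed strategies):

```python
# Iterated Prisoner's Dilemma harness. Payoffs are the standard
# (T=5, R=3, P=1, S=0) matrix; C = cooperate, D = defect.
PAYOFFS = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate first, then mirror the opponent's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(strategy_a, strategy_b, rounds=20):
    """Run an iterated game; each history entry is (own_move, opponent_move)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_a), strategy_b(hist_b)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b
```

In an LLM version, each `strategy` callback would prompt a model with the game history and parse its move from the response.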
Is Alignment Faking Generalizable?
Studied whether alignment faking transfers across model architectures, comparing Transformer and MoE models with behavioral and activation-based detectors.
Do LLMs really understand causality?
Compared LLM-generated causal outputs against algorithmic results on real-world, causal, and synthetic datasets, using precision, recall, F1-score, and Structural Hamming Distance (SHD).
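The structural metric can be sketched as follows (a hypothetical helper, not the project's evaluation code): SHD counts the edge additions, deletions, and reversals needed to turn a predicted causal graph into the true one.

```python
def shd(true_edges, pred_edges):
    """Structural Hamming Distance between two directed graphs given as
    sets of (parent, child) edges. A missing or extra edge costs 1; a
    reversed edge counts as a single error rather than two."""
    dist = 0
    counted = set()
    for edge in true_edges | pred_edges:
        if edge in counted:
            continue  # already paired with its reversal
        if edge in true_edges and edge in pred_edges:
            continue  # edge agrees in both graphs
        u, v = edge
        rev = (v, u)
        # If the opposite orientation appears in the other graph,
        # treat the pair as one reversal error.
        if rev in true_edges or rev in pred_edges:
            counted.add(rev)
        dist += 1
    return dist
```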
Word Embeddings from Scratch
Trained word embeddings from scratch and evaluated them against pre-trained GloVe and Word2Vec embeddings, and performed cross-lingual alignment of English and Hindi embeddings using Procrustes analysis. Also analyzed race and gender bias in pre-trained embeddings via the WEAT test.
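The bias measurement above follows the WEAT recipe (Caliskan et al.); a pure-Python sketch with toy vectors standing in for the GloVe/Word2Vec embeddings the project used:

```python
from statistics import mean, stdev

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: difference in mean association of target word
    sets X, Y with attribute sets A, B, normalized by the pooled std.
    dev. Positive values mean X is closer to A (and Y to B)."""
    def assoc(w):
        return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)
    all_scores = [assoc(w) for w in X + Y]
    return (mean(assoc(x) for x in X) - mean(assoc(y) for y in Y)) / stdev(all_scores)
```

With real embeddings, X/Y would be e.g. male/female names and A/B career/family terms.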
The Holistic Interpretability
Analyzed CNNs trained on MNIST and FashionMNIST using neuron ablation, causal tracing, and activation interventions to understand the role of individual neurons and their interactions in model predictions.
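The core idea of the ablation technique, illustrated on a toy two-layer network with hand-set weights (not the trained CNNs from the project): zero out one hidden unit and compare outputs, so the output change measures that unit's causal contribution.

```python
def forward(x, w1, w2, ablate=None):
    """Two-layer network with ReLU hidden units. If `ablate` is set,
    that hidden unit's activation is zeroed before the output layer,
    so comparing outputs with and without ablation isolates its
    contribution to the prediction."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    if ablate is not None:
        hidden[ablate] = 0.0
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w2]
```

In a real CNN, the same intervention is applied with a forward hook that zeroes one channel's activation map.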