Publications
Where You Steer Depends on What You Steer: Component Selection Across Steering Axes in LLMs
Nitin Sharma, Thomas Wolfers, Çağatay Yıldız
Submitted to Mechanistic Interpretability Workshop, ICML 2026
This work provides the first systematic study of how the choice of internal component (residual stream, MLP output, or attention output) affects activation steering quality in large language models. Across Llama-3.1-8B, Qwen2.5-7B, and Gemma-2-9B, on two factual axes (medical, law) and one behavioral axis (toxicity), MLP-output steering dominates for factual axes, with a 45% mean rank improvement over the residual stream. The gain is concentrated in early-to-middle layers, consistent with MLP layers being the primary site of factual knowledge storage. For behavioral toxicity steering, this ordering inverts, with attention output outperforming MLP output across all three models. Global injection across all token positions is critical for realizing the MLP advantage on factual axes, while behavioral steering is insensitive to injection strategy. Together, these results establish component choice as an explicit, axis-dependent design variable rather than a default.
Multiple-Choice Completion: A Format-Sensitive Failure Mode of Cloze-Style LLM Evaluation
Nitin Sharma, Thomas Wolfers, Çağatay Yıldız
Submitted to ARR May 2026 (EMNLP track)
This paper identifies and characterizes MCQ completion, a failure mode of cloze-style LLM evaluation in which models respond to factual prompts by fabricating multiple-choice option lists rather than providing direct answers. Evaluating eight base models from the Llama and Qwen families, the study finds that prompt phrasing critically determines whether MCQ completion occurs: definitional endings such as "is known as" trigger the behavior 40–80% of the time, while passive-voice or stop-word endings rarely do. The effect is strongest on medical questions and in newer models, yet disappears almost entirely on prompts extracted from real corpora or curated by experts. Phrasing alone shifts downstream multiple-choice accuracy by up to 7.5 points in the zero-shot setting, and this accuracy sensitivity persists even after three-shot prompting removes the completion artifact, showing that format effects on capability cannot be prompted away as easily as the surface behavior.
From Raw Corpora to Domain Benchmarks: Automated Evaluation of LLM Domain Expertise
Nitin Sharma, Thomas Wolfers, Çağatay Yıldız
arXiv:2506.07658 (v3), 2026 | Under review at TMLR
This paper introduces a deterministic pipeline that transforms raw domain corpora into completion-style benchmarks without relying on other LLMs or human annotation. The approach extracts domain-specific keywords and constructs prompt-target pairs, providing a direct and contamination-free assessment of domain knowledge at low computational cost. Through mechanistic analysis, the research reveals that initial-to-mid layers handle attribute extraction while later layers focus on next-token prediction, and that forgetting during domain adaptation begins in middle layers and amplifies in later layers.
Investigating Continual Pretraining in Large Language Models: Insights and Implications
Çağatay Yıldız, Nishaanth Kanna Ravichandran, Nitin Sharma, Matthias Bethge, Beyza Ermis
Transactions on Machine Learning Research (TMLR), 2025
This study introduces a new benchmark for continual domain-adaptive pretraining in large language models, examining how models adapt to changing data landscapes while retaining previous knowledge. The research reveals that continual pretraining consistently improves smaller models (<1.5B parameters), with larger models achieving better perplexity but smaller models showing higher sensitivity to both learning and forgetting. The findings demonstrate that semantic similarity in domain sequences enables better specialization, while randomized training domains lead to superior transfer and final performance.
A normative reference for large-scale human brain dynamics across the lifespan
Yanwu Yang, Nitin Sharma, Sicheng Dai, Francesco Mallus, Guinan Su, Atharva Kand, Mariam Zabihi, Dag Alnæs, et al.
Submitted to Nature Neuroscience, 2026 | bioRxiv preprint
This work establishes the first population-level normative reference for large-scale human brain dynamics using resting-state fMRI data from more than 10,000 individuals spanning the lifespan across 91 scanning sites. The study derives a compact set of recurring brain-state configurations that are reproducible across scanners and generalise to unseen cohorts. Applying normative lifespan models reveals that intrinsic brain dynamics undergo systematic reorganisation across development and ageing, and that mental health conditions show disorder-specific, highly heterogeneous deviations not captured by static neuroimaging measures.
Predicting Mental and Neurological Illnesses Based on Cerebellar Normative Features
Milin Kim, Nitin Sharma, Esten H Leonardsen, Saige Rutherford, Geir Selbæk, Karin Persson, et al.
Biological Psychiatry Global Open Science, 2025
This study used machine learning to test whether individual differences in cerebellar brain structure could help predict various mental and neurological conditions. The research analyzed brain imaging data from over 27,000 participants and found that cerebellar features could moderately predict autism spectrum disorder (ASD) and schizophrenia (SZ), with accuracy between 56% and 64%. The strongest predictive signals came from the posterior regions of the cerebellum, areas more strongly linked to higher cognitive functions than to motor control.
Predicting Postoperative Delirium in Older Patients Before Elective Surgery: Multicenter Retrospective Cohort Study
Shun-Chin Jim Wu*, Nitin Sharma*, Anne Bauch, Hao-Chun Yang, Jasmine L Hect, et al.
JMIR Aging, 2025
This study analyzed data from 1,624 elderly patients (≥70 years) across five medical centers to develop machine learning models for predicting postoperative delirium (POD). Using demographic, clinical, surgical, and neuropsychological features, the models achieved strong predictive performance (AUC 0.79) before surgery, with specific cognitive tests like the Montreal Cognitive Assessment memory subdomain and Trail Making Test Part B emerging as crucial predictors. The findings demonstrate that effective POD risk prediction is possible before surgery, potentially enabling better surgical planning and postoperative care for high-risk patients.
Lamellar Normative Charting of the Hippocampus Across the Lifespan
Yanwu Yang, Na Gao, Nitin Sharma, Sicheng Dai, Guinan Su, Richard Dinga, Pierluigi Salvo Rossi, Zhiyuan Liu, Tengfei Guo, Thomas Wolfers, the Alzheimer's Disease Neuroimaging Initiative
Submitted to Nature Communications, 2026
This work establishes the first large-scale normative charting framework for hippocampal geometry that resolves lamellar morphology across the lifespan, mapping trajectories for over 26,000 individuals from 165 scanning sites. Hippocampal geometry shows spatially non-uniform developmental and ageing patterns, with lamellar thickness, width, and length following dissociable trajectories. Applied to multiple brain disorders, the framework uncovers a dichotomy in disease-associated patterns: neurodegenerative conditions and schizophrenia show predominant atrophy, while other disorders exhibit focal or selective hypertrophy—revealing localized alterations beyond conventional subfield-level summaries. Transfer to a longitudinal Alzheimer's Disease Neuroimaging Initiative cohort further supports individual-level tracking and disease conversion risk stratification.
Ketamine-induced pleasant but not unpleasant dissociation is linked to the functional connectivity profile of the posteromedial cortex
Zumrut Duygu Sen, Nitin Sharma, Lena Vera Danyeli, Lejla Colic, et al.
PsyArXiv, 2024
This study reveals that pleasant dissociative experiences during ketamine treatment can be predicted by examining the functional connections of the posteromedial cortex (PMC) in the brain, both before and during ketamine infusion. The research found that pleasant dissociation correlates with specific brain connectivity patterns, particularly between the PMC and control network regions, while unpleasant dissociation shows no such correlation — suggesting different neural mechanisms underlie these two types of ketamine-induced experiences.
Ketamine-induced ego dissolution is related to the functional connectivity reconfiguration of the posteromedial cortex
Meng Li, Nitin Sharma, Lena Danyeli, Lejla Colic, et al.
Biological Psychiatry, 2023
A single dose of intravenous ketamine reduces the severity of symptoms in disorders such as depression while also inducing acute and transient ego dissolution during administration. Understanding the neural correlates of ego dissolution may help elucidate the mechanism of its therapeutic effect. Previous studies suggested that activity in the posteromedial cortex (PMC) is related to the altered sense of self and the sense of dissociation, phenomena underlying ego dissolution.
Epileptic seizure detection using STFT based peak mean feature and support vector machine
Nitin Sharma, G Gaurav, RS Anand
2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)
This research presents a novel approach for detecting epileptic seizures from EEG data by combining the short-time Fourier transform (STFT) with a new feature called "peak mean". The study decomposed EEG signals into sub-bands and extracted three key features (mean, sample entropy, and peak mean), achieving 100% classification accuracy in distinguishing epileptic ictal EEG signals from those of healthy subjects using a support vector machine with a radial basis function kernel (SVM-RBF). The proposed peak mean feature outperformed other commonly used features in epilepsy detection, demonstrating its potential for practical seizure detection applications.
