Generalization and mechanistic interpretability

We pursue a simple goal: to understand not just whether models work, but why they work, when they fail, and what they actually rely on. Our research focuses on robust generalization under distribution shift, the emergence of spurious correlations and shortcut strategies, and the internal mechanisms that drive these behaviors. Rather than cataloging failures after the fact, we develop methods that reveal hidden biases in learned representations, trace shortcut learning through embeddings and weight space, and test whether models can transfer abstract knowledge beyond the settings in which it was first acquired.

News

Academic Generalization & Interpretability

Fine-Tuning Regimes Define Distinct Continual Learning Problems

Paul-Tiberiu Iordache, Elena Burceanu

Under review at CoLLAs 2026 · Apr 2026

Links: arXiv

Abstract: Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite …

Academic Generalization & Interpretability

Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability

Nicolae Filat, Ahmed Hussain, Konstantinos Kalogiannis, Elena Burceanu

Under review at CoLLAs 2026 · Apr 2026

Links: arXiv

Abstract: Streaming Continual Learning (CL) typically converts a continuous stream into a sequence of discrete tasks through temporal …

Academic Generalization & Interpretability

Bridging Explainability and Embeddings: BEE Aware of Spuriousness

Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu

Accepted at ICLR 2026 (Poster) · Apr 2026

Links: OpenReview · GitHub

Abstract: Current methods for detecting spurious correlations rely on data splits or error patterns, leaving many harmful …

Academic Generalization & Interpretability

Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories

Liviu Nicolae Fircă, Antonio Bărbălau, Dan Oneata, Elena Burceanu

Accepted at NeurIPS 2025 Workshop CauScien (Poster) · Dec 2025

Links: arXiv · GitHub

Abstract: Can models generalize attribute knowledge across semantically and perceptually dissimilar categories? While prior work …

Academic Generalization & Interpretability

Learning (Approximately) Equivariant Networks via Constrained Optimization

Andrei Manolache, Luiz F.O. Chamon, Mathias Niepert

Accepted at NeurIPS 2025 (Oral, top 0.4%) · Dec 2025

Links: arXiv · GitHub

Abstract: Equivariant neural networks are designed to respect symmetries through their architecture, boosting generalization and …

Academic Generalization & Interpretability

Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild

Damien Teney, Lianze Jiang, Florin Gogianu, Ehsan Abbasnejad

Accepted at CVPR 2025 (Oral, top 0.8%) · Jun 2025

Links: arXiv · CVF Open Access

Abstract: Common choices of architecture give neural networks a preference for fitting data with simple functions. This …

Academic Generalization & Interpretability

Robust Novelty Detection through Style-Conscious Feature Ranking

Stefan Smeu, Elena Burceanu, Emanuela Haller, Andrei Liviu Nicolicioiu

Accepted at WACV 2025 (Poster) · Feb 2025

Links: arXiv · Proceedings · GitHub

Abstract: Novelty detection seeks to identify samples deviating from a known distribution, yet data shifts in a …

Academic Generalization & Interpretability

ConceptDrift: Uncovering Biases through the Lens of Foundational Models

Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu

Accepted at NeurIPS 2024 Workshop Interpretable AI: Past, Present and Future · Dec 2024

Links: arXiv

Abstract: Datasets and pre-trained models come with intrinsic biases. Most methods rely on spotting them by analysing misclassified …

Academic Generalization & Interpretability

WASP: A Weight-Space Approach to Detecting Learned Spuriousness

Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu

Accepted at NeurIPS 2024 Workshop Interpretable AI: Past, Present and Future · Dec 2024

Links: arXiv · GitHub

Abstract: It is of crucial importance to train machine learning models such that they clearly understand what defines each class in …

Academic Generalization & Interpretability

Probabilistic Graph Rewiring via Virtual Nodes

C. Qian, A. Manolache, C. Morris, M. Niepert

Accepted at NeurIPS 2024 (Poster) · Dec 2024

Links: arXiv · GitHub

Abstract: Message-passing graph neural networks (MPNNs) have emerged as a powerful paradigm for graph-based machine learning. …