Mechanistic Interpretability on Bitdefender AI Research

https://bit-ml.github.io/tags/mechanistic-interpretability/

BEE Aware of Spuriousness: Mechanistic Interpretability for Fine Tuning Foundation Models
https://bit-ml.github.io/blog/bee-aware-of-spuriousness/
Wed, 25 Feb 2026

In our ICLR 2026 paper "Bridging Explainability and Embeddings: BEE Aware of Spuriousness", we introduce BEE, a diagnostic tool that surfaces spurious correlations by analyzing weight-space drift and embedding geometry rather than relying only on held-out validation data.