Information Routing in Atomistic Foundation Models: How Equivariance Creates Linearly Disentangled Representations

Researchers from Cambridge and Oxford developed Composition Projection Decomposition (CPD), a novel analytical method revealing how atomistic foundation models encode molecular information. Their study shows equivariant architectures like MACE create linearly disentangled representations where geometric and compositional information are nearly separable (R² = 0.782), while other models entangle these features nonlinearly. This discovery has significant implications for designing interpretable, sample-efficient AI for materials science and drug discovery.

Decoding AI's Chemical Intuition: New Method Reveals How Foundation Models Understand Molecules

A new analytical technique is revealing the inner workings of atomistic foundation models, showing that some AI architectures develop a surprisingly clean, linear understanding of molecular geometry, while others entangle this information in complex, nonlinear ways. Researchers from the University of Cambridge and the University of Oxford introduced Composition Projection Decomposition (CPD), a method that isolates how models encode geometric structure versus simple chemical composition. Their findings, published in a preprint (arXiv:2603.03155v1), indicate that equivariant architectures like MACE produce more interpretable and sample-efficient representations, a discovery with significant implications for designing trustworthy AI in materials science and drug discovery.

Peeling Back the Layers of AI Reasoning

The core challenge in understanding foundation models lies in their intermediate representations—the complex, high-dimensional data structures where the AI stores its "understanding" of a molecule or crystal. The team's CPD method acts as a diagnostic tool: it uses a QR-based orthogonal projection to strip the signal attributable to basic atomic composition out of these representations. What remains—the geometric residual—is then analyzed to see how much of a target property, like the energy gap between molecular orbitals (HOMO-LUMO gap), can be predicted from pure geometry.
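The projection step can be sketched in a few lines of NumPy. This is a toy reconstruction of the idea, not the authors' code; the representation matrix `H` and composition matrix `C` below are synthetic stand-ins.

```python
# Toy sketch of the CPD projection step (not the authors' code): remove
# the linear composition signal from learned representations via a
# QR-based orthogonal projection, leaving a "geometric residual".
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: per-structure representations H (n x d) and a
# composition matrix C (n x k), e.g. counts of each element type.
n, d, k = 500, 32, 4
C = rng.integers(0, 5, size=(n, k)).astype(float)
H = C @ rng.normal(size=(k, d)) + 0.3 * rng.normal(size=(n, d))

# Orthonormal basis for the column space of C via QR decomposition.
Q, _ = np.linalg.qr(C)

# Project H onto the orthogonal complement of span(C).
H_residual = H - Q @ (Q.T @ H)

# Sanity check: the residual carries no linear composition signal.
leakage = np.abs(C.T @ H_residual).max()
print(f"max |C^T residual| = {leakage:.2e}")
```

The residual is exactly orthogonal (up to floating point) to every composition column, so any property a linear probe can still read off must come from something other than composition—in the paper's framing, from geometry.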

This approach revealed a stark disentanglement gradient across different model families. The study evaluated eight models from five architectural families on datasets including QM9 molecules and Materials Project crystals. The results show that models built with specific mathematical constraints organize information fundamentally differently than more generic neural networks.

Architectural Choice Dictates Information Organization

The most striking finding concerns tensor product equivariant architectures, specifically the MACE model. After CPD removed compositional data, the geometric information in MACE's representations remained almost fully accessible using simple linear models. For predicting the HOMO-LUMO gap, a linear probe on the residual achieved an impressive R² = 0.782. This suggests MACE constructs a representation where geometry and composition are nearly linearly separable.

In contrast, models using handcrafted descriptors, like ANI-2x, deeply entangle this information. A linear probe on ANI-2x's residual failed catastrophically (R² = -0.792), but a nonlinear Multi-Layer Perceptron (MLP) could recover the signal (R² = +0.784). This indicates the geometric data is present but woven into the representation in a complex, nonlinear manner, making it harder to interpret and extract.
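The qualitative contrast between the two probe types can be reproduced on synthetic data. The sketch below is illustrative only—the features and target are invented, not the paper's representations—but it shows how the same signal can be linearly readable in one encoding and only nonlinearly readable in another.

```python
# Synthetic sketch of the probing contrast: a target that is linearly
# readable from one representation but only nonlinearly readable from
# another, loosely mirroring the MACE vs. ANI-2x result.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
y = z1 * z2                                   # the "property" to probe

noise = rng.normal(size=(n, 3))
X_disentangled = np.column_stack([y, noise])          # target linearly readable
X_entangled = np.column_stack([z1, z2, noise[:, 0]])  # target present, but nonlinear

results = {}
for name, X in [("disentangled", X_disentangled), ("entangled", X_entangled)]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    r2_linear = LinearRegression().fit(Xtr, ytr).score(Xte, yte)
    mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
    r2_mlp = mlp.fit(Xtr, ytr).score(Xte, yte)
    results[name] = (r2_linear, r2_mlp)
    print(f"{name}: linear R2 = {r2_linear:.2f}, MLP R2 = {r2_mlp:.2f}")
```

On the entangled features the linear probe scores near zero while the MLP recovers the signal—the same signature the authors report for ANI-2x's residuals.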

Equivariance Creates Structured, Interpretable Pathways

Further analysis uncovered that the equivariant architecture of MACE does not just separate information—it organizes it with clear, physical intuition. The model appears to route target-specific signals through dedicated mathematical channels. For instance, information related to the molecular dipole moment (a vector property) was primarily found in the L = 1 irreducible representation channels, while data for the HOMO-LUMO gap (a scalar property) was channeled through L = 0 representations.
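A toy version of this channel-resolved probing looks like the following. The "L = 0" and "L = 1" feature arrays here are synthetic stand-ins for the model's irreducible-representation channels, constructed so that the scalar target lives entirely in the scalar channels.

```python
# Toy sketch of channel-resolved probing: fit a linear probe for a
# scalar target separately on L = 0 (scalar) and L = 1 (vector)
# channels. All features and targets are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
feat_L0 = rng.normal(size=(n, 8))        # 8 scalar (L = 0) channels
feat_L1 = rng.normal(size=(n, 8, 3))     # 8 vector (L = 1) channels, (x, y, z)

# Construct a scalar target that is routed through the L = 0 channels only.
scalar_target = feat_L0 @ rng.normal(size=8)

def probe_r2(X, y):
    """Test-set R^2 of a linear probe from features X to target y."""
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    return LinearRegression().fit(Xtr, ytr).score(Xte, yte)

r2_from_L0 = probe_r2(feat_L0, scalar_target)
r2_from_L1 = probe_r2(feat_L1.reshape(n, -1), scalar_target)
print(f"scalar target: R2 from L=0 = {r2_from_L0:.2f}, from L=1 = {r2_from_L1:.2f}")
```

In the real analysis the routing is discovered rather than constructed: the probe is run per channel type, and a large gap in R² between channels is the evidence of specialization.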

This structured, physics-aware organization was not observed in other high-performing models like ViSNet, which uses a vector-scalar architecture. Under the same CPD probe, ViSNet did not show the same clear channel specialization, suggesting its internal reasoning, while effective, may be less aligned with human-understandable physical principles.

Key Takeaways for AI-Driven Science

The study concludes with critical methodological insights and practical implications for the field.

  • Probe Choice Matters: The researchers demonstrated that using powerful nonlinear probes like gradient boosted trees on projected residuals can be misleading, artificially inflating performance metrics (recovering R² = 0.68–0.95 on a purely compositional target). They strongly recommend linear probes as the primary metric for evaluating representation disentanglement.
  • Sample Efficiency Advantage: Linearly disentangled representations, like those from MACE, offer a tangible practical benefit. They prove to be significantly more sample-efficient in downstream tasks when used with simple linear models, reducing the amount of training data needed for accurate predictions.
  • Beyond Raw Accuracy: The work argues that model evaluation should look beyond just prediction accuracy on benchmarks. Interpretability and the sample efficiency of a model's learned representations are crucial advantages, especially for scientific discovery where understanding the "why" is as important as the "what."
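The first point—the failure mode of nonlinear probes on projected residuals—can be reproduced in miniature. Everything below is synthetic and only illustrates the mechanism the authors warn about: the QR projection removes the *linear* composition span, but if the representation depends on composition nonlinearly, the residual still encodes composition, and a flexible probe will happily recover a purely compositional target from it.

```python
# Sketch of the probe-choice pitfall: a boosted-tree probe recovers a
# purely compositional target from a composition-projected residual,
# because the representation depends on composition nonlinearly.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
C = rng.integers(0, 5, size=(n, 2)).astype(float)   # element counts

# A representation that depends on composition *nonlinearly*.
H = np.column_stack([np.sin(C[:, 0]), C[:, 1] ** 2, C[:, 0] * C[:, 1]])
H = H + 0.05 * rng.normal(size=H.shape)

# CPD-style projection removes only the linear composition span
# (intercept included alongside the composition columns).
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), C]))
H_res = H - Q @ (Q.T @ H)

target = C @ np.array([1.0, 2.0])                   # purely compositional

Xtr, Xte, ytr, yte = train_test_split(H_res, target, random_state=0)
r2_linear = LinearRegression().fit(Xtr, ytr).score(Xte, yte)
r2_gbt = GradientBoostingRegressor(random_state=0).fit(Xtr, ytr).score(Xte, yte)
print(f"linear probe R2 = {r2_linear:.2f}, boosted-tree probe R2 = {r2_gbt:.2f}")
```

The linear probe correctly reports that no linear composition signal survives the projection, while the boosted trees reconstruct the compositional target from the residual's nonlinear leakage—exactly why the authors recommend linear probes as the primary disentanglement metric.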

This research provides a powerful new lens for evaluating AI models in the physical sciences. It suggests that incorporating physical constraints like equivariance not only boosts performance but also guides models to develop internal representations that are more interpretable and data-efficient, paving the way for more reliable and insightful AI partners in scientific research.
