The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence
arXiv:2510.25883v2 Announce Type: replace
Abstract: Why do brains and deep networks converge on similar representations? Task-optimized artificial neural networks quantitatively predict primate ventral stream responses despite radically different substrates and optimization dynamics. This convergence demands explanation beyond shared natural image statistics or task structure alone. The Compression Efficiency Principle (CEP) specifies the selection mechanism: representations exploiting unstable correlations pay a growing "exception tax" (approximately linear excess codelength under shortcut-flipping shifts), while representations encoding shift-stable invariants amortize this cost. When environments provide intervention-rich shifts and exhibit approximately modular causal structure, these invariants align with causal mechanisms. The framework offers a unified lens on three biological signatures -- steep metabolic constraints on neural signaling, high coding efficiency in early sensory pathways, and hierarchical tolerance in the ventral stream -- and connects them to parallel phenomena in deep learning: scaling frontiers, shortcut failures under distribution shift, and the role of augmentation in enforcing invariances. Distinctive predictions follow: a crossover threshold beyond which invariant representations dominate, and systematic coupling between compression efficiency and out-of-distribution robustness -- testable across substrates. Predicted divergences (sparse biological signaling versus dense overparameterization) arise from different resource constraints on a shared trade-off topology. The convergence is not a coincidence. It is evidence for a substrate-independent basin shaped by predictive compression under shift.
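The "exception tax" and the crossover threshold can be illustrated with a toy calculation. The sketch below is not the paper's model; it assumes a binary label, a shortcut feature that agrees with the label with probability `acc` before any shift, and a hypothetical `shift` parameter giving the fraction of samples whose shortcut-label correlation is flipped. A code tuned to the unshifted shortcut then pays an expected codelength that grows linearly in `shift`, while an invariant feature with lower but shift-stable accuracy pays a constant cost. All function names and the example accuracies are illustrative assumptions.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def shortcut_codelength(acc: float, shift: float) -> float:
    """Expected bits per label for a code tuned to the unshifted shortcut.

    acc:   probability the shortcut agrees with the label before any shift
           (assumed to lie strictly between 0.5 and 1)
    shift: fraction of samples whose shortcut-label correlation is flipped
    """
    # After the shift, the shortcut agrees with the label less often,
    # but the code still assigns probability `acc` to agreement.
    p_agree = (1 - shift) * acc + shift * (1 - acc)
    return -p_agree * math.log2(acc) - (1 - p_agree) * math.log2(1 - acc)


def invariant_codelength(acc: float) -> float:
    """Expected bits per label for a shift-stable feature (constant in shift)."""
    return binary_entropy(acc)


def crossover_shift(acc_shortcut: float, acc_invariant: float) -> float:
    """Shift fraction at which the shortcut code stops being cheaper.

    In this toy setup the excess codelength is exactly linear:
    L(shift) = H(acc) + shift * (2*acc - 1) * log2(acc / (1 - acc)).
    """
    slope = (2 * acc_shortcut - 1) * math.log2(acc_shortcut / (1 - acc_shortcut))
    gap = binary_entropy(acc_invariant) - binary_entropy(acc_shortcut)
    return gap / slope


if __name__ == "__main__":
    # Illustrative accuracies only; they are not taken from the paper.
    a_sc, a_inv = 0.95, 0.85
    s_star = crossover_shift(a_sc, a_inv)
    for s in (0.0, 0.05, 0.10, 0.20):
        print(f"shift={s:.2f}  shortcut={shortcut_codelength(a_sc, s):.3f} bits  "
              f"invariant={invariant_codelength(a_inv):.3f} bits")
    print(f"crossover shift fraction: {s_star:.3f}")
```

Below the crossover the shortcut code is shorter; above it the invariant code wins. The exact linearity here is a property of this two-feature toy setup, whereas the abstract claims only approximately linear excess codelength.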