Summer-22B: A Systematic Approach to Dataset Engineering and Training at Scale for Video Foundation Model

arXiv:2603.00173v1 Announce Type: cross Abstract: We describe our experience training Summer-22B, a video foundation model developed from scratch. This report documents the engineering challenges, design decisions, and lessons learned while scaling from raw footage collection to a functional mode...

Summer-22B: A Systematic Approach to Dataset Engineering and Training at Scale for Video Foundation Model
arXiv:2603.00173v1 Announce Type: cross Abstract: We describe our experience training Summer-22B, a video foundation model developed from scratch. This report documents the engineering challenges, design decisions, and lessons learned while scaling from raw footage collection to a functional model trained on approximately 50 million clips. We outline our approach combining metadata-driven dataset curation, multi-stage filtering, $\mu$P parameterization, and hypersphere-constrained optimization. We developed the Lavender Data system for dataset management and adopted inference-aware architectural choices. We share observations on what worked in our setting: dataset engineering consumed the majority of effort, architectural variants showed smaller differences than we expected, and $\mu$P hyperparameter transfer appeared effective even under geometric constraints. We hope this account proves useful to others undertaking similar projects.