Revisiting Data Scaling in Medical Image Segmentation via Topology-Aware Augmentation
arXiv:2511.13883v2 Announce Type: replace
Abstract: Understanding how segmentation performance scales with training data is fundamental for developing data-efficient medical AI systems. In this study, we systematically revisit data scaling behavior across 15 anatomical segmentation tasks spanning four imaging modalities. We observe that medical segmentation follows a structurally stable power-law-like relationship between predictive error and dataset size, characterized by rapid improvement in low-data regimes. However, unlike classical large-scale vision or language tasks, segmentation exhibits earlier and task-dependent performance saturation, with a persistent error floor emerging even as data increases. This behavior suggests that segmentation scaling is not purely data-constrained but is influenced by intrinsic geometric and anatomical structure. To further probe this geometry-constrained regime, we investigate whether topology-aware deformation-based augmentation can modify effective scaling dynamics. We compare random elastic deformation with registration-guided and generative deformation-field modeling strategies. While the overall functional form of the scaling law remains preserved, topology-aware augmentation systematically lowers the effective error scale and reshapes convergence behavior in a task-dependent manner, leading to improved sample efficiency without overturning the underlying scaling principle. These findings indicate that medical segmentation obeys a geometry-limited scaling law, and that anatomically grounded augmentation enhances data efficiency by expanding effective topological coverage rather than altering the fundamental scaling structure. Our results provide a principled empirical perspective on data-efficient learning in medical image segmentation. The code will be released after acceptance.
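The saturating scaling behavior the abstract describes is often modeled as a shifted power law, error(n) = a·n^(−b) + c, where c captures the persistent error floor. The paper does not state its exact functional form, so the following is an illustrative sketch under that assumption, fitting the law to synthetic data with SciPy:

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed functional form (not taken from the paper): a shifted power law
# whose offset c plays the role of the persistent error floor.
def scaling_law(n, a, b, c):
    return a * np.power(n, -b) + c

# Synthetic dataset sizes and segmentation errors (e.g., 1 - Dice),
# generated from known parameters a=0.8, b=0.45, c=0.05 plus small noise.
n = np.array([10, 25, 50, 100, 250, 500, 1000, 2500], dtype=float)
rng = np.random.default_rng(0)
err = scaling_law(n, 0.8, 0.45, 0.05) + rng.normal(0, 0.002, size=n.shape)

# Fit the three parameters; c_hat estimates the irreducible error floor.
(a_hat, b_hat, c_hat), _ = curve_fit(scaling_law, n, err, p0=(1.0, 0.5, 0.0))
print(f"a={a_hat:.3f}, b={b_hat:.3f}, floor c={c_hat:.3f}")
```

In this framing, an augmentation that "lowers the effective error scale" would reduce the fitted a (and possibly c) while leaving the power-law exponent b largely intact.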
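The random elastic deformation baseline mentioned above is commonly implemented by smoothing a random displacement field with a Gaussian kernel and resampling the image along the warped grid. A minimal 2D sketch of that standard technique (not the paper's own code; `alpha` and `sigma` are illustrative parameters):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_elastic_deform(image, alpha=8.0, sigma=4.0, seed=0):
    """Warp a 2D image with a smooth random displacement field.

    alpha scales displacement magnitude; sigma smooths the field so that,
    for modest alpha/sigma ratios, the warp stays roughly
    topology-preserving (no folding of the sampling grid).
    """
    rng = np.random.default_rng(seed)
    # Smooth uniform noise -> spatially coherent displacement components.
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(image.shape[0]),
                       np.arange(image.shape[1]), indexing="ij")
    # Bilinear resampling at the displaced coordinates.
    return map_coordinates(image, [y + dy, x + dx], order=1, mode="reflect")

# Usage: warp a binary square mask.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
warped = random_elastic_deform(img)
```

Registration-guided and generative variants replace the random field here with displacement fields derived from real inter-subject deformations, which is what expands effective anatomical coverage.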