Universal NP-Hardness of Clustering under General Utilities

arXiv:2603.00210v1 Announce Type: cross Abstract: Clustering is a central primitive in unsupervised learning, yet practice is dominated by heuristics whose outputs can be unstable and highly sensitive to representations, hyperparameters, and initialisation. Existing theoretical results are largel...

Universal NP-Hardness of Clustering under General Utilities
arXiv:2603.00210v1 Announce Type: cross Abstract: Clustering is a central primitive in unsupervised learning, yet practice is dominated by heuristics whose outputs can be unstable and highly sensitive to representations, hyperparameters, and initialisation. Existing theoretical results are largely objective-specific and do not explain these behaviours at a unifying level. We formalise the common optimisation core underlying diverse clustering paradigms by defining the Universal Clustering Problem (UCP): the maximisation of a polynomial-time computable partition utility over a finite metric space. We prove the NP-hardness of UCP via two independent polynomial-time reductions from graph colouring and from exact cover by 3-sets (X3C). By mapping ten major paradigms -- including k-means, GMMs, DBSCAN, spectral clustering, and affinity propagation -- to the UCP framework, we demonstrate that each inherits this fundamental intractability. Our results provide a unified explanation for characteristic failure modes, such as local optima in alternating methods and greedy merge-order traps in hierarchical clustering. Finally, we show that clustering limitations reflect interacting computational and epistemic constraints, motivating a shift toward stability-aware objectives and interaction-driven formulations with explicit guarantees.