In current models of neurodegeneration, individual diseases are defined by the presence of one or two pathogenic protein species. Yet it is the rule rather than the exception that a patient meets criteria for more than one disease. This fact often remains hidden until autopsy, when neuropathological evaluation can assign disease labels based on gold-standard criteria. Ultimately, the prevalence of concomitant diagnoses and the inability to infer an underlying neuropathological syndrome from clinical variables hinder the identification of patients who might be good candidates for a particular intervention. Here, by applying graph-based clustering to post-mortem histopathological data from 1389 patients with degeneration in the central nervous system, we generate 4 non-overlapping, data-driven disease categories that simultaneously account for amyloid-β plaques, tau neurofibrillary tangles, α-synuclein inclusions, neuritic plaques, TDP-43 inclusions, angiopathy, neuron loss, and gliosis. The resulting disease clusters are transdiagnostic in the sense that each cluster contains patients with several different existing disease diagnoses, who colocalize in clusters according to the pathogenic protein aggregates known to drive each disease. We show that our disease clusters, defined solely by histopathology, separate patients in terms of cognitive phenotypes, cerebrospinal fluid (CSF) protein levels, and genotype in a manner that is not trivially explained by the representation of individual diseases within each cluster. Finally, we use cross-validated multiple logistic regression to generate high-accuracy predictions (AUC > 0.9) of membership in both existing disease categories and transdiagnostic clusters based on CSF protein levels and genotype, both of which are accessible \emph{in vivo}.
Broadly, our approach parses phenotypic and genotypic heterogeneity in neurodegenerative disease and represents a general framework for identifying otherwise-fuzzy disease subtypes in other areas of medicine, such as epilepsy, vascular disease, and cancer. In clinical neurology, the statistical models we generate may be useful for repurposing drugs by relating efficacy to probabilistic estimates of disease cluster membership, as well as for future trials that could be targeted towards an algorithmically defined family of diseases.
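
As a rough illustration of what a cross-validated logistic regression scored by AUC involves, the following minimal sketch fits a binary logistic model on synthetic data and pools out-of-fold predictions before computing a single AUC. Everything here is an assumption made for illustration: the two features are hypothetical stand-ins for CSF protein levels and genotype, the labels stand in for membership in one cluster versus all others, and the plain gradient-descent fit, fold scheme, and helper names (`fit_logistic`, `cross_validated_auc`) are not taken from the study.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, X):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, k=5, seed=0):
    """Pool out-of-fold predictions from k folds and score them once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # The model never sees the held-out fold during fitting.
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held_out, predict_proba(w, [X[i] for i in held_out])):
            scores[i] = p
    return auc(y, scores)

# Synthetic stand-in data (assumption): two hypothetical features,
# label 1 = "member of the cluster of interest", label 0 = "all others".
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(1.5 * t, 1.0), rng.gauss(-1.0 * t, 1.0)] for t in y]
cv_auc = cross_validated_auc(X, y)
print(f"cross-validated AUC: {cv_auc:.3f}")
```

Scoring only held-out predictions is what keeps the reported AUC from being inflated by overfitting; extending the sketch to several clusters would amount to repeating the binary fit once per cluster.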