Abstract

Background
Diabetes is currently classified into two main forms, type 1 (T1D) and type 2 (T2D), but T2D in particular is highly heterogeneous. A refined classification could provide a powerful tool to individualize treatment regimens and to identify, already at diagnosis, individuals at increased risk of complications.

Methods
We applied data-driven cluster analysis (k-means and hierarchical clustering) to newly diagnosed diabetic patients (N=8,980) from the Swedish ANDIS (All New Diabetics in Scania) cohort, using five variables (GAD antibodies, BMI, HbA1c, HOMA2-B and HOMA2-IR), and related the clusters to prospective data from patient records on development of complications and prescription of medication. Replication was performed in three independent cohorts: the Scania Diabetes Registry (SDR, N=1,466), ANDIU (All New Diabetics in Uppsala, N=844) and DIREVA (Diabetes Registry Vaasa, N=3,485). Cox regression and logistic regression were used to compare time to medication, time to reaching the treatment goal, risk of diabetic complications, and genetic associations.

Findings
We identified five replicable clusters of diabetes patients, with significantly different patient characteristics and risk of diabetic complications. In particular, individuals in cluster 3, the most insulin-resistant cluster, had a significantly higher risk of diabetic kidney disease but had been prescribed diabetes treatment similar to that of the less susceptible individuals in clusters 4 and 5. The insulin-deficient cluster 2 had the highest risk of retinopathy. In support of the clustering, genetic associations with the clusters differed from those seen in traditional T2D.

Interpretation
We could stratify patients into five subgroups that predict disease progression and development of diabetic complications more precisely than the current classification.
This new sub-stratification may help to tailor and target early treatment to patients who would benefit most, thereby representing a first step towards precision medicine in diabetes.

Funding
The funders of the study had no role in study design, data collection, analysis, interpretation or writing of the report.

Research in context

Evidence before this study
The current diabetes classification into T1D and T2D relies primarily on the presence (T1D) or absence (T2D) of autoantibodies against pancreatic islet beta-cell autoantigens and on age at diagnosis (earlier for T1D). With this approach, 75-85% of patients are classified as T2D. A third subgroup, Latent Autoimmune Diabetes in Adults (LADA, <10%), is defined by the presence of autoantibodies against glutamate decarboxylase (GADA) with onset in adulthood. In addition, several rare monogenic forms of diabetes have been described, including Maturity Onset Diabetes of the Young (MODY) and neonatal diabetes. This information is provided by national guidelines (ADA, WHO, IDF, Diabetes UK, etc.) but has not been much updated during the past 20 years, and very few attempts have been made to explore the heterogeneity of T2D. A topological analysis of potential T2D subgroups using electronic health records was published in 2015, but this information has not been implemented in the clinic.

Added value of this study
Here we applied a data-driven cluster analysis of five simple variables measured at diagnosis in four independent cohorts of newly diagnosed diabetic patients (N=14,775) and identified five replicable clusters of diabetes patients, with significantly different patient characteristics and risk of diabetic complications. In particular, individuals in cluster 3, the most insulin-resistant cluster, had a significantly higher risk of diabetic kidney disease.
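
To make the idea of data-driven clustering on a handful of diagnostic variables concrete, the following is a minimal Python sketch of k-means on standardized measurements. It is an illustration under stated assumptions, not the study's pipeline: the six rows of numbers are invented toy values (not patient data), the z-score standardization and the farthest-first choice of starting centroids are assumptions of this sketch, GAD antibodies are left out, and two groups are requested instead of five so that the result is easy to inspect. The function names `standardize`, `farthest_first` and `kmeans` are hypothetical helpers written for this example.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale every column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from all centroids chosen so far (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Invented toy values, NOT patient data.
# Columns: BMI, HbA1c (mmol/mol), HOMA2-B, HOMA2-IR
toy = [
    [24.0, 95.0, 40.0, 1.5],
    [25.0, 100.0, 35.0, 1.8],
    [23.5, 90.0, 45.0, 1.6],
    [36.0, 50.0, 110.0, 5.0],
    [35.0, 52.0, 120.0, 5.5],
    [37.0, 48.0, 105.0, 4.8],
]

scaled = standardize(toy)
labels, centroids = kmeans(scaled, k=2)
print(labels)
```

With these toy values the first three rows (lower BMI, higher HbA1c, lower HOMA2 estimates) and the last three rows fall into separate groups. Standardizing first matters because the variables are on very different scales; without it, the variable with the largest numeric range would dominate the distance.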

Implications of the available evidence
This new sub-stratification may help to tailor and target early treatment to patients who would benefit most, thereby representing a first step towards precision medicine in diabetes.