e15089 Background: In India, colorectal cancer (CRC) is diagnosed predominantly at later stages (3.8% Stage I, 16.7% Stage II, 50.7% Stage III, 28.8% Stage IV), and five-year survival, at < 40%, is among the lowest in the world. We developed a non-invasive, blood-based CRC screening test using cell-free DNA (cfDNA) methylation sequencing. Blood samples from 212 controls and 67 treatment-naive CRC patients were collected at 21 sites across India and processed in Strand’s reference lab in Bangalore. Methods: The workflow comprised cfDNA extraction, NEB Enzymatic Methyl-Seq library preparation, Twist Human Methylome hybridisation capture, and 2x150 bp sequencing on NovaSeq 6000/X. Methylation and fragmentomic features were calculated for each target region. Samples were split randomly into a leave-in set of 170 controls and 53 cancers (I: 8, II: 16, III: 23, IV: 6) and a leave-out set of 42 controls and 14 cancers (I: 5, II: 4, III: 2, IV: 3). Twenty rounds of 4-fold cross-validation with random splits were run on the leave-in set. In each fold, features were selected using the Kolmogorov-Smirnov (KS) test on the remaining ¾ of the leave-in set only, without access to the held-out ¼ or to any of the leave-out set. Gradient boosted trees with monotonic constraints reflecting the expected association of the scores with cancer were used to build “explainable” models. Test robustness was assessed using differentially methylated regions (DMRs) from other studies, and by checking how well cancer status could be predicted from sample metadata alone. Results: At a 91% specificity level, the ensemble model had a median sensitivity of 62.5% for Stage I (95% CI 38%-86%), 87% for Stage II (95% CI 75%-95%), 87% for Stage III (95% CI 75%-95%) and 83.4% for Stage IV (95% CI 84%-100%) in cross-validation, and 60% for Stage I (95% CI 60%-80%), 100% for Stage II (95% CI 80%-100%), 100% for Stage III (95% CI 100%-100%) and 100% for Stage IV (95% CI 100%-100%) on the leave-out set.
At a ~98% specificity level, the model had a median sensitivity of 37.5% for Stage I (95% CI 37%-50%), 69% for Stage II (95% CI 56%-75%), 69% for Stage III (95% CI 60%-84%) and 67% for Stage IV (95% CI 50%-84%) in cross-validation. On the leave-out set, where specificity decreased slightly to 95.2%, sensitivity was 40% for Stage I (95% CI 20%-60%), 75% for Stage II (95% CI 75%-100%), 100% for Stage III (95% CI 100%-100%) and 100% for Stage IV (95% CI 66%-100%). Models built on DMRs derived from TCGA data and other publications yielded a cross-validation area under the curve (AUC) of 0.93-0.95, comparable to that of our models. Cross-validation performance using only the sample metadata in Table 1, without access to the sequencing data, was significantly poorer (AUC 0.75-0.81). Conclusions: cfDNA-based methylation profiles are consistent across studies and ethnicities, leading to robust and “explainable” CRC screening predictions. [Table: see text]
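
The leakage-free feature selection and the fixed-specificity readout described in the Methods and Results can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses only the standard library, and every function name, the `top_k` cut-off, and the threshold convention (set on control scores, with cancers called when they score strictly above it) are assumptions made for illustration. The abstract does not state how many features were kept or how thresholds were chosen. The gradient boosted tree model itself is omitted.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def select_features(rows, labels, train_idx, top_k):
    """Rank features by KS statistic using training-fold samples only.

    rows: list of per-sample feature lists; labels: 1 = cancer, 0 = control.
    top_k is a hypothetical cut-off, not a value taken from the abstract.
    """
    cases = [rows[i] for i in train_idx if labels[i] == 1]
    controls = [rows[i] for i in train_idx if labels[i] == 0]
    n_features = len(rows[0])
    scores = [
        ks_statistic([r[j] for r in cases], [r[j] for r in controls])
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k]


def four_fold_splits(n_samples, seed):
    """Yield (train, held_out) index lists for one round of random 4-fold CV."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::4] for k in range(4)]
    for k in range(4):
        train = [i for m, fold in enumerate(folds) if m != k for i in fold]
        yield train, folds[k]


def sensitivity_at_specificity(control_scores, cancer_scores, specificity):
    """Pick a threshold from control scores, then measure cancer detection."""
    ordered = sorted(control_scores)
    # Number of controls allowed above the threshold (false positives).
    n_allowed_fp = int((1 - specificity) * len(ordered) + 1e-9)
    n_allowed_fp = min(n_allowed_fp, len(ordered) - 1)
    threshold = ordered[len(ordered) - 1 - n_allowed_fp]
    detected = sum(score > threshold for score in cancer_scores)
    return threshold, detected / len(cancer_scores)
```

In this sketch, calling `select_features` once per `(train, held_out)` pair from `four_fold_splits`, and repeating with 20 different seeds, mirrors the "20 rounds of 4-fold cross-validation" structure: the held-out quarter never influences which features are chosen. Applying `sensitivity_at_specificity` separately to the cancers of each stage would give per-stage sensitivities at a common threshold.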