ABSTRACT

Acute Myeloid Leukemia (AML) is a severe, mostly fatal hematopoietic malignancy. Despite nearly two decades of promising results using gene expression profiling, international recommendations for the diagnosis and differential diagnosis of AML remain based on classical approaches, including assessment of morphology, immunophenotyping, cytochemistry, and cytogenetics. Concerns about translating whole-transcriptome profiling into practice include the robustness of derived predictors when accounting for factors such as study- and site-specific effects, and whether achievable levels of accuracy are sufficient for practical use. In the present study, we sought to shed light on these issues through a large-scale machine learning analysis of 12,029 samples from 105 different studies. Taking advantage of the breadth of the data and the much improved current understanding of high-dimensional modeling, we show that AML can be predicted with high accuracy. High-dimensional approaches, in which multivariate signatures are learned directly from genome-wide data without prior biological knowledge, are highly effective and robust. We also explore the relationship between predictive signatures, differential expression, and known AML-related genes. Taken together, our results support the notion that transcriptome assessment could be used as part of an integrated genomic approach to cancer diagnosis and treatment, implemented early for the diagnosis and differential diagnosis of AML.

One Sentence Summary

Blood gene expression data and machine learning were used to develop robust and accurate classifiers for the diagnosis and differential diagnosis of acute myeloid leukemia, based on the analysis of more than 12,000 samples derived from more than 100 individual studies.