Abstract

Complex microbial interactions can lead to different colonization outcomes for exogenous species, whether pathogenic or beneficial. Predicting the colonization of exogenous species in complex communities remains a fundamental challenge in microbial ecology, mainly because of our limited knowledge of the diverse physical, biochemical, and ecological processes governing microbial dynamics. Here, we propose a data-driven approach, independent of any dynamics model, to predict the colonization outcomes of exogenous species from the baseline compositions of microbial communities. We systematically validated this approach using synthetic data, finding that machine learning models (including Random Forest and neural ODE) can predict not only the binary colonization outcome but also the post-invasion steady-state abundance of the invading species. We then conducted colonization experiments for two commensal gut bacterial species, Enterococcus faecium and Akkermansia muciniphila, in hundreds of human stool-derived in vitro microbial communities, confirming that the data-driven approach can successfully predict the colonization outcomes. Furthermore, we found that while most resident species were predicted to have a weak negative impact on the colonization of exogenous species, strongly interacting species could significantly alter the colonization outcomes; for example, the presence of Enterococcus faecalis inhibits the invasion of E. faecium. These results suggest that the data-driven approach is a powerful tool to inform the ecology and management of complex microbial communities.
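
To make the idea of predicting colonization from baseline composition alone more concrete, here is a minimal sketch in Python. It is not the pipeline used in this work. The toy data generator, the inhibition rule, the constants `N_SPECIES` and `INHIBITOR`, and all function names are assumptions made for illustration, and a plain logistic regression stands in for the Random Forest and neural ODE models named above so that the example needs only the standard library.

```python
import math
import random

N_SPECIES = 5   # hypothetical number of resident species
INHIBITOR = 0   # hypothetical index of a strongly inhibiting resident


def toy_community(rng):
    """Return (baseline relative abundances, colonization label) for one toy community."""
    # Each resident is present with probability 0.5; abundances are normalized to sum to 1.
    raw = [rng.random() if rng.random() < 0.5 else 0.0 for _ in range(N_SPECIES)]
    total = sum(raw) or 1.0  # avoid division by zero for an empty community
    x = [v / total for v in raw]
    # Assumed toy rule: the invader colonizes unless the inhibitor is abundant at baseline.
    return x, (1 if x[INHIBITOR] < 0.1 else 0)


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Predict the binary colonization outcome (1 = colonized) from a baseline composition."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = [toy_community(rng) for _ in range(300)]
held_out = [toy_community(rng) for _ in range(200)]

w, b = fit_logistic([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(w, b, x) == y for x, y in held_out) / len(held_out)

print(f"held-out accuracy: {accuracy:.2f}")
print("weight per resident species:", [round(v, 2) for v in w])
```

In this toy setting the learned weight for the assumed inhibitor should come out negative, which loosely mirrors the per-species impact analysis described above. Real data would call for the models and validation procedures actually used in the study.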