Single-cell techniques have enabled the acquisition of multi-modal data, particularly for neurons, to characterize cellular functions. Patch-seq, for example, combines patch-clamp recording, cell imaging, and single-cell RNA-seq to obtain electrophysiological, morphological, and gene expression data from a single neuron. While these multi-modal data offer potential insights into neuronal function, they can be heterogeneous and noisy. To address this, machine-learning methods have been used to align cells from different modalities onto a low-dimensional latent space, revealing multi-modal cell clusters. However, these methods can be challenging to use for biologists and neuroscientists without computational expertise, and the more computationally expensive methods also require suitable computing infrastructure. To address these issues, we developed MANGEM (Multimodal Analysis of Neuronal Gene expression, Electrophysiology, and Morphology), a cloud-based web application available at https://ctc.waisman.wisc.edu/mangem. MANGEM provides a step-by-step, accessible, and user-friendly interface to machine-learning methods for aligning neuronal multi-modal data, with real-time visualization of the characteristics of raw and aligned cells. It can be run asynchronously for large-scale data alignment, and it provides users with various downstream analyses of aligned cells and visualizes the analytic results, such as identifying multi-modal cell clusters and detecting genes correlated with electrophysiological and morphological features. We demonstrate the use of MANGEM by aligning Patch-seq multi-modal data of neuronal cells from the mouse visual cortex.
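
To illustrate the kind of alignment the abstract refers to, the sketch below projects two modalities measured on the same cells into a shared low-dimensional latent space and clusters the result. It uses canonical correlation analysis (CCA) as a representative alignment method and synthetic placeholder data; this is a minimal conceptual example, not MANGEM's actual implementation, and all array shapes and parameters here are assumptions for illustration only.

```python
# Minimal sketch (assumed example, not MANGEM's implementation):
# align two modalities of the same cells into a shared latent space
# with CCA, then cluster cells in that joint space.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_cells = 200

# Hypothetical per-cell features measured on the same 200 cells:
# gene expression (500 genes) and electrophysiology (40 features).
gene_expr = rng.normal(size=(n_cells, 500))
ephys = rng.normal(size=(n_cells, 40))

# Project both modalities into a shared 5-dimensional latent space.
cca = CCA(n_components=5)
gene_latent, ephys_latent = cca.fit_transform(gene_expr, ephys)

# Cluster cells in the joint latent space to obtain multi-modal
# cell clusters (cluster count chosen arbitrarily here).
joint = np.hstack([gene_latent, ephys_latent])
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(joint)
print(clusters[:10])
```

On real Patch-seq data, the latent coordinates could then be correlated with individual gene or electrophysiological features to surface candidate associations, analogous to the downstream analyses described above.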