Vocalizations are a widespread means of communication in the animal kingdom. Mice use a large repertoire of ultrasonic vocalizations (USVs) in different social contexts, for instance courtship, territorial dispute, dominance and mother-pup interaction. Previous studies have pointed to differences in USVs between contexts, sexes, strains and individuals; in many cases, however, the outcomes of the analyses remained inconclusive.

Here we provide a more general approach to automatically classify USVs using deep neural networks (DNNs). We classified the sex of the emitting mouse (C57Bl/6) based on the spectrogram of each vocalization, reaching unprecedented performance (~84% correct) in comparison with other techniques (support vector machines: 64%, ridge regression: 52%). Vocalization characteristics of individual mice contribute only mildly, and sex-only classification reaches ~78%. The performance can only partially be explained by a set of classical shape features, with duration, volume and bandwidth being the most useful predictors. Splitting the estimation into two DNNs, one from spectrograms to features (57-82%) and one from features to sex (67%), does not reach the single-step performance.

In summary, the emitter's sex can be successfully predicted from vocalization spectrograms using DNNs, which outperform other classification techniques. In contrast to previous research, this suggests that male and female vocalizations differ in their spectrotemporal structure, and that this difference is recognizable even in single vocalizations.