Measurement of blood oxygen saturation (sO2) by optical imaging oximetry provides invaluable insight into local tissue function and metabolism. Despite their different embodiments and modalities, all label-free optical imaging oximetry techniques rely on the same principle: sO2-dependent spectral contrast from hemoglobin. Traditional approaches for quantifying sO2 often rely on analytical models that are fitted to the spectral measurements. In practice, these approaches suffer from uncertainties due to biological variability, tissue geometry, light scattering, systemic spectral bias, and variations in experimental conditions. Here, we propose a new data-driven approach, termed deep spectral learning (DSL), to achieve oximetry that is highly robust to experimental variations and, more importantly, provides uncertainty quantification for each sO2 prediction. To demonstrate the robustness and generalizability of DSL, we analyze data from two visible light optical coherence tomography (vis-OCT) setups across two separate in vivo experiments in rat retina. Predictions made by DSL are highly adaptive to experimental variabilities as well as to the depth-dependent backscattering spectra. Two neural-network-based models are tested and compared with the traditional least-squares fitting (LSF) method. The DSL-predicted sO2 shows significantly lower mean-square error than the LSF estimate. For the first time, we demonstrate en face maps of retinal oximetry along with pixel-wise confidence assessment. DSL overcomes several limitations of the traditional approaches and provides a more flexible, robust, and reliable deep learning approach for in vivo non-invasive label-free optical oximetry.
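To make the LSF baseline concrete, the following is a minimal sketch of least-squares spectral unmixing. It assumes a simplified linear model in which the measured spectrum is a weighted sum of an oxyhemoglobin curve, a deoxyhemoglobin curve, and a constant offset. This is not necessarily the analytical model used in this work. The function name `fit_so2`, the wavelength grid, and both reference curves are hypothetical; in particular, the curves are synthetic placeholders and not real hemoglobin extinction data.

```python
import numpy as np


def fit_so2(spectrum, eps_hbo2, eps_hb):
    """Estimate sO2 by linear least-squares unmixing (illustrative model only)."""
    # Design matrix columns: oxy curve, deoxy curve, constant offset.
    design = np.column_stack([eps_hbo2, eps_hb, np.ones_like(eps_hb)])
    coef, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
    c_oxy, c_deoxy = coef[0], coef[1]
    # sO2 is the oxygenated fraction of the total fitted hemoglobin weight.
    return c_oxy / (c_oxy + c_deoxy)


# Hypothetical wavelength grid in nm (an assumption, not taken from the paper).
wavelengths = np.linspace(520.0, 600.0, 41)

# Synthetic placeholder curves standing in for the two hemoglobin spectra.
eps_hbo2 = 1.0 + 0.5 * np.sin(2.0 * np.pi * (wavelengths - 520.0) / 80.0)
eps_hb = 1.0 + 0.5 * np.exp(-(((wavelengths - 555.0) / 15.0) ** 2))

# Simulate a noise-free measurement with a known ground-truth sO2.
true_so2 = 0.8
clean = 2.0 * (true_so2 * eps_hbo2 + (1.0 - true_so2) * eps_hb) + 0.1

# Add measurement noise to see how the point estimate shifts.
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 0.02, size=clean.shape)

so2_clean = fit_so2(clean, eps_hbo2, eps_hb)
so2_noisy = fit_so2(noisy, eps_hbo2, eps_hb)
print(f"noise-free estimate: {so2_clean:.4f}")
print(f"noisy estimate:      {so2_noisy:.4f}")
```

On noise-free data the fit recovers the simulated sO2. Once noise is added, it returns a single shifted value with no indication of how reliable that value is, which is the gap the per-prediction uncertainty quantification in DSL is meant to address.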