In residential buildings, window opening is typically one of the most significant factors affecting indoor air temperature and thus the energy required for heating and cooling. Although several field measurement campaigns have been undertaken and the resulting datasets used to calibrate data-driven stochastic models based on different techniques, a critical comparison of these window operation models that analyses their suitability for building energy simulation is still needed. This study compares seven modelling approaches: Gaussian distribution function, logistic regression, Markov chain, Markov-logit hybrid, classification tree, artificial neural network and random forest. The models were trained and tested using measurements from 21 living rooms and 10 bedrooms in residential buildings located in Australian subtropical and temperate climates. Two modelling strategies were tested: individual, where the operation of each window was modelled separately, and cohort, where a common model was developed for all the windows in the dataset. The stochastic nature of window opening was introduced into the models using Monte Carlo methods. True Positive Rate (TPR) and True Negative Rate (TNR), quantified using the Area Under Curve (AUC) method, were used to benchmark the performance of the different models. Classification tree and random forest were identified as more accurate methods (median AUC > 0.74) for cohort modelling, while artificial neural network and Markov-logit hybrid methods were identified as more accurate methods (median AUC > 0.7) for individual window operation modelling in building simulation applications.
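As a minimal illustration of how a Monte Carlo step can turn a model's predicted opening probability into a stochastic window state, and how TPR and TNR are then computed, consider the sketch below. It is not the study's implementation: the logistic coefficients (`intercept`, `slope`), the use of indoor temperature as the only predictor, and the synthetic temperature series are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def open_probability(temp_c, intercept=-6.0, slope=0.25):
    """Logistic model of P(window open) given indoor temperature.

    The coefficients are illustrative assumptions, not fitted values.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * temp_c)))


def simulate_states(temps, rng):
    """Monte Carlo step: draw a uniform number per time step and
    mark the window open (1) when the draw falls below P(open)."""
    return [1 if rng.random() < open_probability(t) else 0 for t in temps]


def tpr_tnr(observed, predicted):
    """True Positive Rate and True Negative Rate for binary states."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    pos = sum(observed)
    neg = len(observed) - pos
    tpr = tp / pos if pos else float("nan")
    tnr = tn / neg if neg else float("nan")
    return tpr, tnr


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic indoor temperatures between 18 and 30 degrees C (assumption).
temps = [18.0 + 12.0 * rng.random() for _ in range(1000)]
observed = simulate_states(temps, rng)   # stands in for measured states
predicted = simulate_states(temps, rng)  # one Monte Carlo realisation
tpr, tnr = tpr_tnr(observed, predicted)
print(f"TPR={tpr:.2f}, TNR={tnr:.2f}")
```

Because each run is a single random realisation, a simulation study would typically repeat the sampling many times and summarise the resulting distribution of TPR and TNR rather than rely on one draw.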