Abstract

A core problem in visual object learning is using a finite number of images of a new object to accurately identify that object in future, novel images. One longstanding, conceptual hypothesis asserts that adult brains solve this core problem through two connected mechanisms: 1) the re-representation of incoming retinal images as points in a fixed, multidimensional neural space, and 2) the optimization of linear decision boundaries in that space, via simple plasticity rules applied to a single downstream layer. Though this scheme is biologically plausible, the extent to which it explains learning behavior in humans has been unclear, in part because of a historical lack of image-computable models of the putative neural space, and in part because of a lack of measurements of human learning behaviors in difficult, naturalistic settings. Here, we addressed these gaps by 1) drawing from contemporary, image-computable models of the primate ventral visual stream to create a large set of testable learning models (n=2,408 models), and 2) using online psychophysics to measure human learning trajectories over a varied set of tasks involving novel 3D objects (n=371,000 trials), which we then used to develop (and publicly release) empirical benchmarks for comparing learning models to humans. We evaluated each learning model on these benchmarks and found that learning models using specific, high-level contemporary representations are surprisingly aligned with human behavior. While no tested model explained the entirety of replicable human behavior, these results establish that rudimentary plasticity rules, when combined with appropriate visual representations, have high explanatory power in predicting human behavior on this core object learning problem.

Author Summary

A basic conceptual hypothesis for how an adult brain learns to visually identify a new object is: 1) it re-represents images as points in a fixed, multidimensional space, then 2) it learns linear decision boundaries that separate images of the new object from images of other objects, using a single layer of plasticity. This hypothesis is considered biologically plausible, but gauging its power to explain human learning behavior has not been straightforward, in part because it is difficult to model how brains re-represent images during object learning. However, ongoing efforts in neuroscience have led to the identification of specific, image-computable models that are at least partially accurate descriptions of the neural representations involved in primate vision. Here, we asked whether any of those representations, when combined with simple plasticity rules, could make accurate predictions over a large body of human object learning behavioral measurements. We found that specific models could indeed explain a majority of our behavioral measurements, suggesting that the rudimentary, biologically plausible mechanisms considered here may be sufficient to explain a core aspect of human object learning.
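To make the two-mechanism hypothesis concrete, the sketch below illustrates its general shape: images are mapped through a fixed encoder into a feature space, and a single plastic layer learns a linear decision boundary with a local, online update rule. This is a minimal illustration only; the stand-in random projection, the delta-rule update, and all names and parameters here are assumptions for exposition, not the specific representations, plasticity rules, or code evaluated in the study.

```python
import numpy as np

# Illustrative sketch of the two-mechanism hypothesis:
# (1) images are re-represented as points in a fixed feature space
#     (here a stand-in random projection; the study instead draws features
#     from image-computable models of the primate ventral visual stream), and
# (2) a single downstream layer learns a linear decision boundary with a
#     simple plasticity rule (here an online, perceptron-style delta update).

rng = np.random.default_rng(0)

def fixed_representation(images, proj):
    """Stand-in for a fixed, pretrained encoder: flatten + random projection."""
    return images.reshape(len(images), -1) @ proj

def learn_readout(features, labels, lr=0.01, n_passes=10):
    """One plastic layer: online delta-rule updates of a linear boundary."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(n_passes):
        for x, y in zip(features, labels):          # y in {0, 1}: new object or not
            pred = 1.0 / (1.0 + np.exp(-(x @ w + b)))
            err = y - pred                          # local error signal
            w += lr * err * x                       # weight update
            b += lr * err
    return w, b

# Toy data: 64 "images" of 16x16 pixels, half labelled as the new object.
images = rng.normal(size=(64, 16, 16))
labels = np.array([1] * 32 + [0] * 32)
proj = rng.normal(size=(16 * 16, 128)) / np.sqrt(16 * 16)

feats = fixed_representation(images, proj)
w, b = learn_readout(feats, labels)
accuracy = np.mean(((feats @ w + b) > 0) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

In this framing, only the readout weights change during learning; the representation itself stays fixed, which is the key property that makes the hypothesized mechanism a single layer of plasticity.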