ResearchHub | Open Science Community

Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires

Tim Sainburg et al., Oct 15, 2020

Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species' vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.

Ecology

Developmental Biology

A practical guide for generating unsupervised, spectrogram-based latent space representations of animal vocalizations

Mara Thomas et al., Dec 17, 2021

The manual detection, analysis, and classification of animal vocalizations in acoustic recordings is laborious and requires expert knowledge. Hence, there is a need for objective, generalizable methods that detect underlying patterns in these data, categorize sounds into distinct groups, and quantify similarities between them. Among all computational methods that have been proposed to accomplish this, neighborhood-based dimensionality reduction of spectrograms to produce a latent-space representation of calls stands out for its conceptual simplicity and effectiveness. Using a dataset of manually annotated meerkat (Suricata suricatta) vocalizations, we demonstrate how this method can be used to obtain meaningful latent space representations that reflect the established taxonomy of call types. We analyze strengths and weaknesses of the proposed approach, give recommendations for its usage, and show application examples, such as the classification of ambiguous calls and the detection of mislabeled calls. All analyses are accompanied by example code to help researchers realize the potential of this method for the study of animal vocalizations.

Artificial Intelligence

Developmental Biology

Long-range sequential dependencies precede complex syntactic production in language acquisition

Tim Sainburg et al., Aug 20, 2020

To convey meaning, human language relies on hierarchically organized, long-range relationships spanning words, phrases, sentences, and discourse. The strength of the relationships between sequentially ordered elements of language (e.g., phonemes, characters, words) decays following a power law as a function of sequential distance. To understand the origins of these relationships, we examined long-range statistical structure in the speech of human children at multiple developmental time points, along with non-linguistic behaviors in humans and phylogenetically distant species. Here we show that adult-like power-law statistical dependencies precede the production of hierarchically organized linguistic structures, and thus cannot be driven solely by these structures. Moreover, we show that similar long-range relationships occur in diverse non-linguistic behaviors across species. We propose that the hierarchical organization of human language evolved to exploit pre-existing long-range structure present in much larger classes of non-linguistic behavior, and that the cognitive capacity to model long-range hierarchical relationships preceded language evolution. We call this the Statistical Scaffolding Hypothesis for language evolution.

Significance Statement: Human language is uniquely characterized by semantically meaningful hierarchical organization, conveying information over long timescales. At the same time, many non-linguistic human and animal behaviors are also characterized by richly hierarchical organization. Here, we compare the long-timescale statistical dependencies present in language to those present in non-linguistic human and animal behaviors as well as language production throughout childhood. We find adult-like, long-timescale relationships early in language development, before syntax or complex semantics emerge, and we find similar relationships in non-linguistic behaviors like cooking and even housefly movement. These parallels demonstrate that long-range statistical dependencies are not unique to language and suggest a possible evolutionary substrate for the long-range hierarchical structure present in human language.
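The abstract states that dependency strength decays as a power law of sequential distance. As a rough illustration only, and not the authors' fitting procedure, the sketch below estimates the parameters of a decay of the form value = a * distance^(-b) by ordinary least squares on log-transformed data. The function name `fit_power_law`, the synthetic inputs, and the choice of a plain log-log regression are all assumptions made for this example.

```python
import math


def fit_power_law(distances, values):
    """Fit values ~= a * distances**(-b) via least squares in log-log space.

    Hypothetical helper for illustration; assumes all inputs are positive.
    Returns the tuple (a, b).
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line log(v) = intercept + slope * log(d).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A power law a * d**(-b) is a line in log-log space with slope -b.
    return math.exp(intercept), -slope


# Synthetic, noise-free data generated with a = 2.0 and b = 1.5 (made-up values).
synthetic_distances = [1, 2, 4, 8, 16, 32]
synthetic_values = [2.0 * d ** -1.5 for d in synthetic_distances]
a_hat, b_hat = fit_power_law(synthetic_distances, synthetic_values)
```

On noise-free synthetic input the fit should recover the generating parameters closely. Real dependency estimates are noisy, and a log-log regression can be biased in that setting, so a proper analysis would likely compare candidate decay models rather than rely on this shortcut.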

Philosophy

Artificial Intelligence

Context-dependent sensory modulation underlies Bayesian vocal sequence perception

Tim Sainburg et al., Apr 15, 2022

Vocal communication in both songbirds and humans relies on categorical perception of smoothly varying acoustic spaces. Vocal perception can be biased by expectation and context, but the mechanisms of this bias are not well understood. We developed a behavioral task in which songbirds (European starlings) are trained to classify smoothly varying song syllables in the context of predictive syllable sequences. We find that syllable-sequence predictability biases perceptual categorization following a Bayesian model of probabilistic information integration. We then recorded from populations of neurons in the auditory forebrain while birds actively categorized song syllables, observing large proportions of neurons that track the smoothly varying natural feature space of syllable categories. We observe that predictive information in the syllable sequences dynamically modulates sensory neural representations. These results support a Bayesian model of perception where predictive information acts to dynamically reallocate sensory neural resources, sharpening acuity (i.e., the likelihood) in high-probability regions of stimulus space.

One-Sentence Summary: Predictive information in vocal sequences biases Bayesian categorical perception through rapid sensory reorganization.
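The Bayesian integration described in this abstract can be illustrated with a toy two-category example. This is a minimal sketch under stated assumptions, not the model fitted in the paper: a stimulus lies on a one-dimensional morph axis, each category has a Gaussian likelihood, and the preceding syllable sequence is reduced to a prior probability for category A. The function names, category means, and standard deviation are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_category_a(x, prior_a, mean_a=0.0, mean_b=1.0, sd=0.5):
    """Posterior probability of category A for a stimulus at position x.

    prior_a stands in for the expectation set by the preceding sequence
    (an assumption of this sketch). The likelihood parameters are made up.
    """
    like_a = gaussian_pdf(x, mean_a, sd)
    like_b = gaussian_pdf(x, mean_b, sd)
    # Bayes' rule: posterior is proportional to prior times likelihood.
    evidence = prior_a * like_a + (1.0 - prior_a) * like_b
    return prior_a * like_a / evidence


# An ambiguous stimulus halfway between the two category means.
ambiguous = 0.5
neutral = posterior_category_a(ambiguous, prior_a=0.5)
cued_a = posterior_category_a(ambiguous, prior_a=0.8)
cued_b = posterior_category_a(ambiguous, prior_a=0.2)
```

At the midpoint the two likelihoods are equal, so the posterior simply follows the prior: a context that predicts category A shifts the judgment of the same ambiguous stimulus toward A. This shows only the prior-driven bias. The abstract's further claim, that context also sharpens the likelihood, is not modeled here.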

Ecology

Artificial Intelligence

Latent space visualization, characterization, and generation of diverse vocal communication signals

Tim Sainburg et al., Dec 11, 2019

Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species’ vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present here a set of computational methods that center around projecting animal vocalizations into low dimensional latent representational spaces that are directly learned from data. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates, enabling high-powered comparative analyses of unbiased acoustic features in the communicative repertoires across species. Latent projections uncover complex features of data in visually intuitive and quantifiable ways. We introduce methods for analyzing vocalizations as both discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication. Finally, we show how systematic sampling from latent representational spaces of vocalizations enables comprehensive investigations of perceptual and neural representations of complex and ecologically relevant acoustic feature spaces.

Ecology

Developmental Biology
