ResearchHub | Open Science Community

Inference of Population Structure Using Multilocus Genotype Data

Jonathan Pritchard et al., Jun 1, 2000

Abstract: We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci (e.g., seven microsatellite loci in an example using genotype data from an endangered bird species). The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.
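The probabilistic assignment described in this abstract can be illustrated with a much simpler calculation than the published method. The sketch below is not the authors' algorithm: it assumes the allele frequencies of each population are already known and strictly positive, that loci are unlinked, that genotypes are in Hardy-Weinberg proportions within populations, and that the individual is not admixed. The function name `assignment_posterior` and the toy frequencies are hypothetical, chosen only for illustration.

```python
import math


def assignment_posterior(genotype, freqs, prior=None):
    """Posterior probability that one diploid individual comes from each population.

    genotype: list of (allele_a, allele_b) tuples, one per locus.
    freqs:    one entry per population; each entry is a list (one per locus)
              of dicts mapping allele -> frequency. Assumed known and > 0.
    prior:    optional prior probability per population (uniform if omitted).
    """
    k = len(freqs)
    if prior is None:
        prior = [1.0 / k] * k

    log_post = []
    for pop_freqs, pop_prior in zip(freqs, prior):
        total = math.log(pop_prior)
        for (a, b), locus_freqs in zip(genotype, pop_freqs):
            # Unlinked loci: per-locus probabilities multiply (log terms add).
            # The heterozygote factor of 2 is identical for every population,
            # so it cancels in the normalisation and is omitted here.
            total += math.log(locus_freqs[a]) + math.log(locus_freqs[b])
        log_post.append(total)

    # Normalise in log space for numerical stability.
    peak = max(log_post)
    weights = [math.exp(x - peak) for x in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy example (made-up frequencies): two populations, two loci.
toy_freqs = [
    [{"A": 0.9, "B": 0.1}, {"A": 0.8, "B": 0.2}],  # population 0
    [{"A": 0.2, "B": 0.8}, {"A": 0.3, "B": 0.7}],  # population 1
]
toy_genotype = [("A", "A"), ("A", "B")]
toy_posterior = assignment_posterior(toy_genotype, toy_freqs)
```

In the method the abstract describes, the allele frequencies and K are themselves unknown and individuals may be admixed, so this sketch covers only the final assignment step under strong simplifying assumptions.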

Genetics

Ecology

Inference of Population Structure Using Multilocus Genotype Data: Linked Loci and Correlated Allele Frequencies

Daniel Falush et al., Aug 1, 2003

Abstract: We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations (“admixture linkage disequilibrium”). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.

Genetics

Philosophy

A New Statistical Method for Haplotype Reconstruction from Population Data

Matthew Stephens et al., Apr 1, 2001

Current routine genotyping methods typically do not provide haplotype information, which is essential for many analyses of fine-scale molecular-genetics data. Haplotypes can be obtained, at considerable cost, experimentally or (partially) through genotyping of additional family members. Alternatively, a statistical method can be used to infer phase and to reconstruct haplotypes. We present a new statistical method, applicable to genotype data at linked loci from a population sample, that improves substantially on current algorithms; often, error rates are reduced by >50%, relative to its nearest competitor. Furthermore, our algorithm performs well in absolute terms, suggesting that reconstructing haplotypes experimentally or by genotyping additional family members may be an inefficient use of resources.

Genetics

Demography

The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans

Stephen Buia et al., May 7, 2015

Expression, genetic variation, and tissues. Human genomes show extensive genetic variation across individuals, but we have only just started documenting the effects of this variation on the regulation of gene expression. Furthermore, only a few tissues have been examined per genetic variant. In order to examine how genetic expression varies among tissues within individuals, the Genotype-Tissue Expression (GTEx) Consortium collected 1641 postmortem samples covering 54 body sites from 175 individuals. They identified quantitative genetic traits that affect gene expression and determined which of these exhibit tissue-specific expression patterns. Melé et al. measured how transcription varies among tissues, and Rivas et al. looked at how truncated protein variants affect expression across tissues. Science, this issue p. 648, p. 660, p. 666; see also p. 640

Genetics

Molecular Biology

A Comparison of Bayesian Methods for Haplotype Reconstruction from Population Genotype Data

Matthew Stephens et al., Oct 24, 2003

Genetics

Artificial Intelligence

Inference of population structure using multilocus genotype data: dominant markers and null alleles

Daniel Falush et al., Mar 26, 2007

Abstract: Dominant markers such as amplified fragment length polymorphisms (AFLPs) provide an economical way of surveying variation at many loci. However, the uncertainty about the underlying genotypes presents a problem for statistical analysis. Similarly, the presence of null alleles and the limitations of genotype calling in polyploids mean that many conventional analysis methods are invalid for many organisms. Here we present a simple approach for accounting for genotypic ambiguity in studies of population structure and apply it to AFLP data from whitefish. The approach is implemented in the program structure version 2.2, which is available from http://pritch.bsd.uchicago.edu/structure.html.

Genetics

Artificial Intelligence

Inferring weak population structure with the assistance of sample group information

Melissa Hubisz et al., Mar 20, 2009

Genetic clustering algorithms require a certain amount of data to produce informative results. In the common situation that individuals are sampled at several locations, we show how sample group information can be used to achieve better results when the amount of data is limited. New models are developed for the structure program, both for the cases of admixture and no admixture. These models work by modifying the prior distribution for each individual's population assignment. The new prior distributions allow the proportion of individuals assigned to a particular cluster to vary by location. The models are tested on simulated data, and illustrated using microsatellite data from the CEPH Human Genome Diversity Panel. We demonstrate that the new models allow structure to be detected at lower levels of divergence, or with less data, than the original structure models or principal components methods, and that they are not biased towards detecting structure when it is not present. These models are implemented in a new version of structure which is freely available online at http://pritch.bsd.uchicago.edu/structure.html.

Genetics

Philosophy

Genome-wide efficient mixed-model analysis for association studies

Xiang Zhou et al., Jun 17, 2012

Genetics

Philosophy

RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays

John Marioni et al., Jun 11, 2008

Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.

Genetics

Molecular Biology

Association Mapping in Structured Populations

Jonathan Pritchard et al., Jul 1, 2000

Genetics

Ophthalmology
