ResearchHub | Open Science Community

Rafael Irizarry

Author with expertise in Comprehensive Integration of Single-Cell Transcriptomic Data

Achievements

Cited Author

Open Access Advocate

Key Stats

Upvotes received: 0

Publications: 74 (70% Open Access)

Cited by: 78,777

h-index: 96

i10-index: 202

Reputation

Biology: < 1%

Chemistry: < 1%

Economics: < 1%

Publications

Bioconductor: open software development for computational biology and bioinformatics

Robert Gentleman et al., Sep 15, 2004

The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

Exploration, normalization, and summaries of high density oligonucleotide array probe level data

Rafael Irizarry et al., Apr 1, 2003

In this paper we report exploratory analyses of high‐density oligonucleotide array data from the Affymetrix GeneChip® system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip® arrays; part of the data from an extensive spike‐in study conducted by Gene Logic and Wyeth's Genetics Institute involving 95 HG‐U95A human GeneChip® arrays; and part of a dilution study conducted by Gene Logic involving 75 HG‐U95A GeneChip® arrays. We display some familiar features of the perfect match and mismatch probe (PM and MM) values of these data, and examine the variance–mean relationship with probe‐level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike‐in data and assess three commonly used summary measures: Affymetrix's (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model‐based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi‐array average (RMA) of background‐adjusted, normalized, and log‐transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike‐in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe‐specific affinities.
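As a rough illustration of the summarization idea in this abstract, the sketch below fits an additive probe-plus-array model to log2 PM values for one probe set. It assumes median polish as the robust fitting procedure (the abstract does not name one) and assumes background adjustment and normalization have already been applied. The function name `median_polish` and its parameters are hypothetical, not taken from any package.

```python
from statistics import median

def median_polish(matrix, max_iter=10, tol=1e-8):
    """Fit value[i][j] ~ overall + row_eff[i] + col_eff[j] by median polish.

    matrix: rows are probes, columns are arrays, entries are log2 PM values
    (assumed already background-adjusted and normalized).
    Returns (overall, row_eff, col_eff, residuals).
    """
    rows, cols = len(matrix), len(matrix[0])
    resid = [list(r) for r in matrix]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(max_iter):
        # Row sweep: move each row's median out of the residuals.
        rmed = [median(r) for r in resid]
        for i in range(rows):
            row_eff[i] += rmed[i]
            resid[i] = [v - rmed[i] for v in resid[i]]
        # Re-center the column effects into the overall term.
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Column sweep: move each column's median out of the residuals.
        cmed = [median(c) for c in zip(*resid)]
        for j in range(cols):
            col_eff[j] += cmed[j]
        resid = [[v - cmed[j] for j, v in enumerate(r)] for r in resid]
        # Re-center the row effects into the overall term.
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        # Stop once a full sweep no longer changes anything.
        if max(abs(x) for x in rmed + cmed) < tol:
            break
    return overall, row_eff, col_eff, resid

def array_summaries(matrix):
    """Per-array expression summary: overall level plus the array effect."""
    overall, _, col_eff, _ = median_polish(matrix)
    return [overall + c for c in col_eff]
```

Here the row effects play the role of probe-specific affinities, and the per-array summary is what remains once they are removed. Because medians are used instead of means, a single outlying probe has limited influence on the result.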

Molecular Biology

Salmon provides fast and bias-aware quantification of transcript expression

Rob Patro et al., Mar 6, 2017

A comparison of normalization methods for high density oligonucleotide array data based on variance and bias

Benjamin Bolstad et al., Jan 21, 2003

When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
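To make the idea of a complete data method concrete, here is a minimal sketch of quantile normalization, which uses every array to build a shared reference distribution. The abstract does not name its three methods, so treating quantile normalization as an example of them is an assumption. The function name `quantile_normalize` is hypothetical, and ties are broken by position rather than averaged, which a production implementation might handle differently.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    arrays: list of equal-length lists of probe intensities, one per array.
    Returns new lists; the inputs are not modified.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean across arrays of the k-th smallest value.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity (stable sort, so ties keep order).
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value with the reference value at its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

Unlike a baseline-array approach, no single array is singled out as the target: each array is mapped onto the rank-wise average of all of them, so the relation between arrays may be non-linear.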

Artificial Intelligence

Summaries of Affymetrix GeneChip probe level data

Rafael Irizarry et al., Feb 11, 2003

High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11–20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike‐in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.

affy—analysis of Affymetrix GeneChip data at the probe level

Laurent Gautier et al., Feb 11, 2004

Motivation: The processing of the Affymetrix GeneChip data has been a recent focus for data analysts. Alternatives to the original procedure have been proposed and some of these new methods are widely used. Results: The affy package is an R package of functions and classes for the analysis of oligonucleotide arrays manufactured by Affymetrix. The package is currently in its second release; affy provides the user with extreme flexibility when carrying out an analysis and makes it possible to access and manipulate probe intensity data. In this paper, we present the main classes and functions in the package and demonstrate how they can be used to process probe-level data. We also demonstrate the importance of probe-level analysis when using the Affymetrix GeneChip platform.

Molecular Biology

Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays

Martin Aryee et al., Jan 28, 2014

Motivation: The recently released Infinium HumanMethylation450 array (the ‘450k’ array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. Availability and implementation: http://bioconductor.org/packages/release/bioc/html/minfi.html. Contact: khansen@jhsph.edu; rafa@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Epigenetic memory in induced pluripotent stem cells

K. Kim et al., Jul 19, 2010

Molecular Biology

The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores

Rafael Irizarry et al., Jan 18, 2009

Molecular Biology

MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens

Wei Li et al., Dec 4, 2014

We propose the Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) method for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. MAGeCK demonstrates better performance compared with existing methods, identifies both positively and negatively selected genes simultaneously, and reports robust results across different experimental conditions. Using public datasets, MAGeCK identified novel essential genes and pathways, including EGFR in vemurafenib-treated A375 cells harboring a BRAF mutation. MAGeCK also detected cell type-specific essential genes, including BCR and ABL1, in KBM7 cells bearing a BCR-ABL fusion, and IGF1R in HL-60 cells, which depends on the insulin signaling pathway for proliferation.

Molecular Biology
