ResearchHub | Open Science Community

Forecasting Time Series With Complex Seasonal Patterns Using Exponential Smoothing

Alysha Livera et al., Dec 1, 2011

An innovations state space modeling framework is introduced for forecasting complex seasonal time series such as those with multiple seasonal periods, high-frequency seasonality, non-integer seasonality, and dual-calendar effects. The new framework incorporates Box–Cox transformations, Fourier representations with time varying coefficients, and ARMA error correction. Likelihood evaluation and analytical expressions for point forecasts and interval predictions under the assumption of Gaussian errors are derived, leading to a simple, comprehensive approach to forecasting complex seasonal time series. A key feature of the framework is that it relies on a new method that greatly reduces the computational burden in the maximum likelihood estimation. The modeling framework is useful for a broad range of applications, its versatility being illustrated in three empirical studies. In addition, the proposed trigonometric formulation is presented as a means of decomposing complex seasonal time series, and it is shown that this decomposition leads to the identification and extraction of seasonal components which are otherwise not apparent in the time series plot itself.

Finance

Paleontology


RUV-III-NB: Normalization of single cell RNA-seq Data

Agus Salim et al., Nov 8, 2021

Despite numerous methodological advances, the normalization of single cell RNA-seq (scRNA-seq) data remains a challenging task and the performance of different methods can vary greatly across datasets. Part of the reason for this is the different kinds of unwanted variation, including library size, batch and cell cycle effects, and the association of these with the biology embodied in the cells. A normalization method that does not explicitly take into account cell biology risks removing some of the signal of interest. Furthermore, most normalization methods remove the effects of unwanted variation from the cell embedding used for clustering-based analysis but not from the gene-level data typically used for differential expression (DE) analysis to identify marker genes. Here we propose RUV-III-NB, a statistical method that can be used to remove unwanted variation from both the cell embedding and gene-level counts. When removing unwanted variation, RUV-III-NB explicitly takes into account its potential association with biology via the use of pseudo-replicates. The method can be used for either UMI or sequence read counts and returns adjusted counts that can be used for downstream analyses such as clustering, DE and pseudotime analyses. Using five publicly available datasets that encompass different technological platforms, kinds of biology and levels of association between biology and unwanted variation, we show that RUV-III-NB manages to remove library size and batch effects, strengthen biological signals, improve differential expression analyses, and lead to results exhibiting greater concordance with independent datasets of the same kind. The performance of RUV-III-NB is consistent across the five datasets and is not sensitive to the number of factors assumed to contribute to the unwanted variation. It also shows promise for removing other kinds of unwanted variation such as platform effects. The method is implemented as a publicly available R package at https://github.com/limfuxing/ruvIIInb.

Genetics

Artificial Intelligence


MetaHD: a multivariate meta-analysis model for metabolomics data

J. Liyanage et al., Jul 1, 2024

Meta-analysis methods widely used for combining metabolomics data do not account for correlation between metabolites or for missing values. Within- and between-study variability is also often overlooked. These omissions can give results with inferior statistical properties, leading to misidentification of biomarkers.

Molecular Biology

Internal Medicine


Genomic prediction of coronary heart disease

Gad Abraham et al., Feb 26, 2016

Background: Genetics plays an important role in coronary heart disease (CHD) but the clinical utility of a genomic risk score (GRS) relative to clinical risk scores, such as the Framingham Risk Score (FRS), is unclear. Methods: We generated a GRS of 49,310 SNPs based on a CARDIoGRAMplusC4D Consortium meta-analysis of CHD, then independently tested this using five prospective population cohorts (three FINRISK cohorts, combined n=12,676, 757 incident CHD events; two Framingham Heart Study (FHS) cohorts, combined n=3,406, 587 incident CHD events). Results: The GRS was strongly associated with time to CHD event (FINRISK HR=1.74, 95% CI 1.61-1.86 per S.D. of GRS; Framingham HR=1.28, 95% CI 1.18-1.38), and this association was largely unchanged by adjustment for clinical risk scores or individual risk factors, including family history. Integration of the GRS with clinical risk scores (FRS and ACC/AHA13 score) improved prediction of CHD events within 10 years (meta-analysis C-index: +1.5-1.6%, P<0.001), particularly for individuals ≥60 years old (meta-analysis C-index: +4.6-5.1%, P<0.001). Men in the top 20% of the GRS had 3-fold higher risk of CHD by age 75 in FINRISK and 2-fold in FHS, and attained 10% cumulative CHD risk 18 years earlier in FINRISK and 12 years earlier in FHS than those in the bottom 20%. Furthermore, high genomic risk was partially compensated for by low systolic blood pressure, low cholesterol level, and non-smoking. Conclusions: A GRS based on a large number of SNPs substantially improves CHD risk prediction and encodes decades of variation in CHD risk not captured by traditional clinical risk scores.

Genetics

Internal Medicine
