2018 recipients of the Award for Annual Meeting Outstanding Student Presentations
Mohammed Alqawba
Title: Copula Based Models for Zero-Inflated Count Time Series
Abstract: Count time series data arise in many applied disciplines, including environmental science, biostatistics, economics, public health, and finance. In some cases, a specific count, such as zero, occurs more often than a standard count distribution would predict. Ignoring this excess of zeros can lead to misleading inference. In this presentation, we develop a copula-based time series model for zero-inflated counts in the presence of covariates. We consider zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and zero-inflated Conway-Maxwell-Poisson (ZICMP) marginal distributions, while the joint distribution is modeled with a Gaussian copula with autoregressive moving average (ARMA) errors. Likelihood inference is carried out via importance sampling. Prediction and forecasting are carried out analogously to an ordinary ARMA model. To evaluate the proposed method, we study simulated and real data examples.
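To make the Gaussian-copula construction concrete, here is a minimal Python sketch that simulates a zero-inflated Poisson series. It assumes the simplest case the abstract covers: an AR(1) latent Gaussian process (a special case of ARMA), constant ZIP parameters, and no covariates. It does not implement the importance-sampling likelihood, and all function names are hypothetical. Each count is produced by passing a stationary standard normal latent value through the normal CDF and then through the generalized inverse of the ZIP CDF, so every marginal is exactly ZIP while the serial dependence comes from the latent process.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def zip_cdf(y, pi, lam):
    """ZIP CDF: P(Y <= y) = pi + (1 - pi) * PoissonCDF(y; lam) for y >= 0."""
    if y < 0:
        return 0.0
    term = math.exp(-lam)  # Poisson pmf at 0
    total = term
    for k in range(1, y + 1):
        term *= lam / k  # pmf(k) = pmf(k - 1) * lam / k
        total += term
    return pi + (1.0 - pi) * total


def zip_quantile(u, pi, lam):
    """Generalized inverse: the smallest y >= 0 with zip_cdf(y) >= u.

    Intended for moderate lam; exp(-lam) underflows to 0 for lam above ~700.
    """
    u = min(u, 1.0 - 1e-12)  # guard against rounding when u is numerically 1
    term = math.exp(-lam)
    cdf = pi + (1.0 - pi) * term
    y = 0
    while cdf < u:
        y += 1
        term *= lam / y
        cdf += (1.0 - pi) * term
    return y


def simulate_zip_copula(n, pi, lam, phi, seed=0):
    """Hypothetical helper: n counts with ZIP(pi, lam) marginals and AR(1) copula dependence.

    The latent Z_t = phi * Z_{t-1} + sqrt(1 - phi^2) * eps_t is stationary with
    unit variance, so Phi(Z_t) is Uniform(0, 1) at every t.
    """
    if not 0.0 <= pi < 1.0 or lam <= 0.0 or abs(phi) >= 1.0:
        raise ValueError("need 0 <= pi < 1, lam > 0, |phi| < 1")
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - phi * phi)
    z = rng.gauss(0.0, 1.0)  # draw Z_1 from the stationary N(0, 1)
    counts = []
    for _ in range(n):
        counts.append(zip_quantile(normal_cdf(z), pi, lam))
        z = phi * z + scale * rng.gauss(0.0, 1.0)
    return counts
```

With pi=0.3 and lam=3.0, the long-run fraction of zeros should be near pi + (1 - pi)·exp(-lam) ≈ 0.335, well above the Poisson-only value exp(-3) ≈ 0.050, and consecutive counts should be positively correlated when phi > 0.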
Kellen Cresswell
Title: Detection of the 3D genomic structures using spectral clustering
Abstract: Background. Recent advances in genome sequencing have made it possible to analyze the 3D organization of the genome. Topologically associating domains (TADs) are distinct units of the genome confined in 3D space and characterized by strong interactions among the regions within them. These structures have been shown to be strongly associated with the regulation of gene expression, which in turn defines cell identity. Despite their apparent importance, there is no consensus method for detecting TADs. Methods. We propose a novel approach based on spectral clustering. Using a sliding window whose size is set by the maximum biologically plausible TAD size, we move along the diagonal of the chromatin interaction matrix and perform spectral clustering within each window. The number of clusters in each window is chosen as the one that maximizes the average silhouette score, a measure that rewards high connectivity within a TAD while penalizing high connectivity between TADs. Results. We show that this method is robust to issues specific to 3D sequencing data, such as sparsity, sequencing depth, and data resolution. We also show that the windowed approach reduces spectral clustering from a problem with cubic computational complexity to one with linear complexity. The method is extended to allow hierarchical clustering, separating TADs into smaller sub-TADs enclosed in larger meta-TADs. Finally, we test biological relevance based on the proximity of TAD boundaries to known epigenomic marks and transcription factor binding sites. Conclusions. Current methods for defining the fundamental 3D units of the genome, such as TADs, leave room for improvement. We demonstrate that treating chromatin interaction data as a graph clustering problem makes TAD boundary detection more robust to noise, sparsity, and sequencing depth.
We present SpectralTAD, a fast and accurate R package that outperforms other TAD callers in detecting biologically relevant TAD boundaries.
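The per-window model-selection step can be illustrated with a minimal Python sketch of the average silhouette score. This is not SpectralTAD's API (SpectralTAD is an R package), and it omits the spectral embedding itself: the candidate labelings are assumed to come from spectral clustering run with different numbers of clusters on one window. The conversion of contact counts to distances (1 minus the count divided by the largest count) is also an assumption made for illustration, and all function names are hypothetical. For each bin i, a(i) is its mean distance to the other bins in its own cluster, b(i) is its smallest mean distance to any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)); the window's k is the one with the largest mean s(i).

```python
def average_silhouette(dist, labels):
    """Mean silhouette width over all points for one partition.

    dist: symmetric n x n list of pairwise distances; labels: cluster label per point.
    """
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0 by convention
        # a(i): mean distance to the other members of i's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance from i to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != c
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


def choose_k(dist, candidates):
    """Return the k whose labeling in the hypothetical {k: labels} map scores highest."""
    return max(candidates, key=lambda k: average_silhouette(dist, candidates[k]))


def contacts_to_distance(contacts):
    """Assumed conversion: d(i, j) = 1 - C(i, j) / max(C), with zero self-distance."""
    peak = max(max(row) for row in contacts)
    n = len(contacts)
    return [[0.0 if i == j else 1.0 - contacts[i][j] / peak for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy window of 6 bins: two domains (bins 0-2 and 3-5) with weak contacts between them.
    contacts = [[10 if i == j else (8 if (i < 3) == (j < 3) else 1) for j in range(6)]
                for i in range(6)]
    dist = contacts_to_distance(contacts)
    candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
    print(choose_k(dist, candidates))  # prints 2
```

In the toy window, the two-cluster labeling that matches the block structure scores 7/9 ≈ 0.78, while the three-cluster labeling that straddles the domain boundary scores lower, so k = 2 is selected. Because each window has a fixed maximum size, repeating this step window by window keeps the total cost proportional to the number of windows, which is consistent with the linear-complexity claim.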