Overview
Publication
Biostatistics. 2015 Jul; 16(3):480-92.
PubMed ID: 25532524
Title
Kernel-based logistic regression model for protein sequence without vectorialization
Authors
Fong Y, Datta S, Georgiev IS, Kwong PD, Tomaras GD
Abstract
Protein sequence data arise more and more often in vaccine and infectious disease research. These types of data are discrete, high-dimensional, and complex. We propose to study the impact of protein sequences on binary outcomes using a kernel-based logistic regression model, which models the effect of protein through a random effect whose variance-covariance matrix is mostly determined by a kernel function. We propose a novel, biologically motivated, profile hidden Markov model (HMM)-based mutual information (MI) kernel. Hypothesis testing can be carried out using the maximum of the score statistics and a parametric bootstrap procedure. To improve the power of testing, we propose intuitive modifications to the test statistic. We show through simulation studies that the profile HMM-based MI kernel can be substantially more powerful than competing kernels, and that the modified test statistics bring incremental gains in power. We use these proposed methods to investigate two problems from HIV-1 vaccine research: (1) identifying segments of HIV-1 envelope (Env) protein that confer resistance to neutralizing antibody and (2) identifying segments of Env that are associated with attenuation of protective vaccine effect by antibodies of isotype A in the RV144 vaccine trial.
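Illustrative sketch
The testing procedure outlined in the abstract (a maximum of score statistics over a family of kernels, calibrated by a parametric bootstrap) can be sketched roughly as follows. This is not the authors' implementation. It assumes aligned, equal-length sequences and an intercept-only null model, and it substitutes a simple exponential Hamming-distance kernel for the profile HMM-based MI kernel. The function names, the grid of rho values, and the standardization by bootstrap mean and standard deviation are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def hamming_kernel(seqs, rho):
    """K[i][j] = exp(-rho * number of mismatched positions).

    A simple stand-in kernel (assumption); NOT the profile HMM-based MI kernel.
    Sequences are assumed to be aligned and of equal length.
    """
    return [[math.exp(-rho * sum(a != b for a, b in zip(s, t))) for t in seqs]
            for s in seqs]


def score_stat(y, K):
    """Score-type statistic r' K r, with residuals r from an intercept-only null fit."""
    mean = sum(y) / len(y)
    r = [yi - mean for yi in y]
    n = len(r)
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


def max_score_test(y, seqs, rhos=(0.1, 0.5, 1.0), n_boot=500, seed=0):
    """Return (maximum standardized score statistic, parametric bootstrap p-value)."""
    rng = random.Random(seed)
    kernels = [hamming_kernel(seqs, rho) for rho in rhos]
    obs = [score_stat(y, K) for K in kernels]

    # Parametric bootstrap: simulate outcomes from the fitted null model,
    # then recompute every statistic (the null is refit inside score_stat).
    p0 = sum(y) / len(y)
    boot = []
    for _ in range(n_boot):
        y_star = [1 if rng.random() < p0 else 0 for _ in y]
        boot.append([score_stat(y_star, K) for K in kernels])

    # Standardize each kernel's statistic by its bootstrap mean and SD so the
    # maximum is not dominated by one kernel's scale (assumes every SD is > 0).
    cols = list(zip(*boot))
    mus = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]

    def max_std(stats):
        return max((s - m) / d for s, m, d in zip(stats, mus, sds))

    t_obs = max_std(obs)
    exceed = sum(max_std(b) >= t_obs for b in boot)
    return t_obs, (1 + exceed) / (n_boot + 1)
```

Calling max_score_test(y, seqs) with a list of 0/1 outcomes and a matching list of aligned sequences returns the observed statistic and its bootstrap p-value. Covariate adjustment and the modified test statistics mentioned in the abstract are omitted here.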