--- title: R/lmmlite user guide output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{R/lmmlite user guide} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8](inputenc) --- [R/lmmlite](https://kbroman.org/lmmlite/) is an R package for fitting linear mixed models in the context of genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping. At present, **it is not intended for "production" use.** I wrote this to try to get a better understanding of how to fit these models, and I've not yet put any effort into checking arguments and omitting cases with missing values. The code closely follows [Nick Furlotte](http://whatmind.com)'s [pylmm](https://github.com/nickFurlotte/pylmm). The sort of model I want to fit is $y = X\beta + \epsilon$ where $\epsilon$ is multivariate normal with mean 0 and variance $\sigma^2_g K + \sigma^2_e I$, where $K$ is a known kinship matrix and $I$ is the identity. In pylmm, the focus is on $\sigma^2 = \sigma^2_g + \sigma^2_e$ and the heritability $h^2 = \sigma^2_g / \sigma^2$. ```{r load_data, echo=FALSE} library(lmmlite) data(recla) ``` We'll focus here on the `recla` dataset that I include with the package. This is data on a set of diversity outcross mice, from [Recla et al. (2014)](https://pubmed.ncbi.nlm.nih.gov/24700285/) and [Logan et al. (2013)](https://pubmed.ncbi.nlm.nih.gov/23433259/). The data are originally from the [QTL Archive/Mouse Phenotype Database](https://phenome.jax.org/projects/Recla1) and I've also [placed them online](https://github.com/kbroman/qtl2data/tree/master/DO_Recla) in the format used by [R/qtl2](https://kbroman.org/qtl2/). I used [qtl2geno](https://github.com/rqtl/qtl2geno) to calculate genotype probabilities from the MUGA SNP array data. The data include `r nrow(recla$kinship)` mice, and form a list with three components: the estimated kinship matrix, a matrix with `r ncol(recla$pheno)` phenotypes, and a covariate matrix that contains an intercept (all 1's) and sex (0 for female and 1 for male). ```{r show_load_data, eval=FALSE} library(lmmlite) data(recla) ``` The package includes two implementations of the fit of linear mixed models (LMMs): plain R, and C++ (using the [Rcpp](https://github.com/RcppCore/Rcpp) and [RcppEigen](https://github.com/RcppCore/RcppEigen) packages, the latter being an interface to the [Eigen](https://libeigen.gitlab.io/) C++ linear algebra library). R/lmmlite includes just four functions: `eigen_rotation()`, `getMLsoln()`, `calcLL()`, and `fitLMM()`. In each case, use `use_cpp=FALSE` to use the plain R implementation and `use_cpp=TRUE` (the default) to use the C++ implementation. The first function, `eigen_rotation()`, calculates the eigen decomposition of the kinship matrix and "rotates" the phenotypes and covariates by pre-multiplying them with the tranpose of the matrix of eigenvectors. This rotation greatly simplifies later calculations by turning a general least squares problem into a weighted least squares problem. Care must be taken with missing values in the phenotypes and covariates. In practice, one must work with batches of phenotypes that have a common missing data pattern, and first remove individuals with missing values. Here, we'll focus on the first phenotype. ```{r eigen_rotation} e <- eigen_rotation(recla$kinship, recla$pheno[,1], recla$covar) ``` The output is a list containing four components: `Kva` is the vector of eigenvalues of the kinship matrix, `Kva_t` is the transposed matrix of eigenvectors, `y` is the rotated phenotype matrix, and `X` is the rotated covariate matrix. The next function `getMLsoln()` will estimates the coefficients ($\beta$) and residual variance ($\sigma^2 = \sigma^2_g + \sigma^2_e$) for a given value of the heritability ($h^2 = \sigma^2_g / \sigma^2$). It takes, as input, a single value for the heritability, and then `Kva`, `y`, and `X`, as returned from `eigen_rotation()`. A further argument `reml`, indicates whether REML will later be used, in which case the log determinant of the matrix $X' W X$, where $W$ is a diagonal matrix of weights, is calculated. ```{r getMLsoln} ml_soln <- getMLsoln(0.5, e$Kva, e$y, e$X) ``` The result is a list containing `beta` and `sigmasq`, with the residual sum of squares (RSS) and log det($X' W X$) included as attributes. The function `calcLL()` calculates the log likelihood for a given value of the heritability. (It can also take a vector of heritabilities.) It takes the same arguments as `getMLsoln()`, and in practice one would probably not call `getMLsoln()` directly. ```{r calcLL} hsq <- seq(0, 1, by=0.01) ll <- calcLL(hsq, e$Kva, e$y, e$X) ``` We can plot the results as follows. ```{r plot_ll, fig.width=7, fig.height=4} plot(hsq, ll, type="l", lwd=2, las=1, xlab="heritability", ylab="log likelihood") ``` If you call `calcLL()` with a single heritability value, the results include the estimates of `beta` and `sigmasq` as attributes. ```{r calcLL_one} one_ll <- calcLL(0.5, e$Kva, e$y, e$X) attr(one_ll, "beta") attr(one_ll, "sigmasq") ``` The final function, `fitLMM()`, optimizes the log likelihood to estimate the heritability. The result is a list containing `beta`, `sigmasq`, `hsq`, `sigmasq_g`, `sigmasq_e`, and `loglik`. ```{r fitLMM} (out <- fitLMM(e$Kva, e$y, e$X)) ``` The `fitLMM()` function includes an argument `check_boundary`; if `TRUE` (the default), the 0 and 1 boundaries are checked explicitly, in seeking to estimate the heritability.