
Latent Variable Models (lava)


A general implementation of structural equation models with latent variables (MLE, 2SLS, and composite likelihood estimators) supporting continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) doi:10.1007/s00180-012-0344-y), as well as mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) doi:10.1093/biostatistics/kxy082). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.

Installation

install.packages("lava", dependencies=TRUE)
library("lava")
demo("lava")

For graphical capabilities, the Rgraphviz package is needed (first install the BiocManager package)

# install.packages("BiocManager")
BiocManager::install("Rgraphviz")

or the igraph or visNetwork packages

install.packages("igraph")
install.packages("visNetwork")

The development version of lava can also be installed directly from GitHub:

# install.packages("remotes")
remotes::install_github("kkholst/lava")

Citation

To cite the lava package, please use one of the following references

Klaus K. Holst and Esben Budtz-Jørgensen (2013). Linear Latent Variable Models: The lava-package. Computational Statistics 28 (4), pp 1385-1452. https://doi.org/10.1007/s00180-012-0344-y

@article{lava,
  title = {Linear Latent Variable Models: The lava-package},
  author = {Klaus Kähler Holst and Esben Budtz-Jørgensen},
  year = {2013},
  volume = {28},
  number = {4},
  pages = {1385-1452},
  journal = {Computational Statistics},
  doi = {10.1007/s00180-012-0344-y}
}

Klaus K. Holst and Esben Budtz-Jørgensen (2020). A two-stage estimation procedure for non-linear structural equation models. Biostatistics 21 (4), pp 676-691. https://doi.org/10.1093/biostatistics/kxy082

@article{lava_nlin,
  title = {A two-stage estimation procedure for non-linear structural equation models},
  author = {Klaus Kähler Holst and Esben Budtz-Jørgensen},
  journal = {Biostatistics},
  year = {2020},
  volume = {21},
  number = {4},
  pages = {676-691},
  doi = {10.1093/biostatistics/kxy082},
}

Examples

Influence functions

Construct estimate objects from parameter coefficients and estimated influence functions

a <- estimate(coef=c("a"=0.5), IC=rnorm(10), id=1:10)
b <- estimate(coef=c("b"=0.8), IC=rnorm(10), id=6:15)

Alternatively, we can construct estimate objects directly from an existing model object (glm(), mets::phreg(), targeted::cate(), …)

estimate(modelobj, id, ...)
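For instance, with a fitted glm, supplying an id vector yields cluster-robust (sandwich-type) standard errors based on the estimated influence function. A minimal sketch, with made-up data and a hypothetical clustering variable `id`:

```r
library("lava")
set.seed(1)
# hypothetical data with a clustering variable 'id' (illustration only)
d <- data.frame(x = rnorm(20), id = rep(1:10, each = 2))
d$y <- 0.5 * d$x + rnorm(20)
g <- glm(y ~ x, data = d)
# robust standard errors derived from the influence function
estimate(g, id = d$id)
```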

We can now merge the estimate objects to obtain their joint distribution via their estimated influence functions

e <- c(a, b)
vcov(e) # joint distribution
#>             a           b
#> a  0.07514259 -0.01509644
#> b -0.01509644  0.03372599
summary(e, null=c(0, 0))
#> Call: estimate.default(contrast = as.list(seq_along(p)), null = ..1, 
#>     vcov = vcov(object, messages = 0), coef = p)
#> ────────────────────────────────────────────────────────────
#>   Estimate Std.Err     2.5% 97.5%   P-value
#> a      0.5  0.2741 -0.03727 1.037 6.815e-02
#> b      0.8  0.1836  0.44006 1.160 1.323e-05
#> ────────────────────────────────────────────────────────────
#> Null Hypothesis: 
#>   [a] = 0
#>   [b] = 0 
#>  
#> chisq = 29.7439, df = 2, p-value = 3.477e-07

Parameter transformations can be calculated directly as in the following examples.

Products

a * b
#>   Estimate Std.Err    2.5%  97.5% P-value
#> a      0.4  0.2108 -0.0132 0.8132 0.05778

General transformations

(3 * cos(a) / sqrt(b) + 1) / a^2
#>   Estimate Std.Err   2.5% 97.5% P-value
#> a    15.77    18.7 -20.87 52.42  0.3989

Inner product, sums, and products

c(iprod=e %*% c(a, b^2), sum=sum(e), prod=prod(e))
#>       Estimate Std.Err     2.5%  97.5%   P-value
#> iprod    0.762  0.3762  0.02473 1.4993 4.279e-02
#> ─────                                           
#> sum      1.300  0.2805  0.75025 1.8498 3.574e-06
#> ─────                                           
#> prod     0.400  0.2108 -0.01320 0.8132 5.778e-02

Exponentiation and renaming of parameters

c(pow = a^b)
#>     Estimate Std.Err    2.5% 97.5% P-value
#> pow   0.5743  0.2826 0.02051 1.128  0.0421

Transformation and subsetting

c(e["a"] * e["b"] / a, e["b"])
#>   Estimate Std.Err   2.5% 97.5%   P-value
#> a      0.8  0.1836 0.4401  1.16 1.323e-05
#> ─                                        
#> b      0.8  0.1836 0.4401  1.16 1.323e-05

For the %*% operator we can also use a general contrast matrix

B <- rbind(c(1,-1), c(1,0), c(0,1))
B %*% e
#>           Estimate Std.Err     2.5%  97.5%   P-value
#> [a] - [b]     -0.3  0.3729 -1.03089 0.4309 4.211e-01
#> a              0.5  0.2741 -0.03727 1.0373 6.815e-02
#> b              0.8  0.1836  0.44006 1.1599 1.323e-05
#> ────────────────────────────────────────────────────────────
#> Null Hypothesis: 
#>   [a] - [b] = 0
#>   [a] = 0
#>   [b] = 0 
#>  
#> chisq = 29.7439, df = 2, p-value = 3.477e-07
plot(B %*% e)

Structural Equation Model

Specify a structural equation model with two latent factors

m <- lvm()
regression(m) <- y1 + y2 + y3 ~ u1
regression(m) <- z1 + z2 + z3 ~ u2
latent(m) <- ~ u1 + u2
regression(m) <- u2 ~ u1 + x
regression(m) <- u1 ~ x
plot(m)
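The graph-exploration tools mentioned in the introduction can be applied to a model specification like this one. A sketch, assuming lava's dsep and backdoor functions (the particular queries are illustrative only):

```r
# are y1 and z1 d-separated given the latent variables u1 and u2?
dsep(m, y1 ~ z1 | u1 + u2)
# examine back-door adjustment for the effect of u1 on u2
backdoor(m, u2 ~ u1)
```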

Simulation

d <- sim(m, 100, seed=1)

Estimation

e <- estimate(m, d)
e
#>                     Estimate Std. Error  Z-value   P-value
#> Measurements:                                             
#>    y2~u1             0.95462    0.08083 11.80993    <1e-12
#>    y3~u1             0.98476    0.08922 11.03722    <1e-12
#>     z2~u2            0.97038    0.05368 18.07714    <1e-12
#>     z3~u2            0.95608    0.05643 16.94182    <1e-12
#> Regressions:                                              
#>    u1~x              1.24587    0.11486 10.84694    <1e-12
#>     u2~u1            0.95608    0.18008  5.30910 1.102e-07
#>     u2~x             1.11495    0.25228  4.41951 9.893e-06
#> Intercepts:                                               
#>    y2               -0.13896    0.12458 -1.11537    0.2647
#>    y3               -0.07661    0.13869 -0.55241    0.5807
#>    u1                0.15801    0.12780  1.23644    0.2163
#>    z2               -0.00441    0.14858 -0.02969    0.9763
#>    z3               -0.15900    0.15731 -1.01076    0.3121
#>    u2               -0.14143    0.18380 -0.76949    0.4416
#> Residual Variances:                                       
#>    y1                0.69684    0.14858  4.69004          
#>    y2                0.89804    0.16630  5.40026          
#>    y3                1.22456    0.21182  5.78109          
#>    u1                0.93620    0.19623  4.77084          
#>    z1                1.41422    0.26259  5.38570          
#>    z2                0.87569    0.19463  4.49934          
#>    z3                1.18155    0.22640  5.21883          
#>    u2                1.24430    0.28992  4.29195

Model assessment

Assessing goodness-of-fit, here the assumption of linearity between u2 and u1 (requires the gof package)

# install.packages("gof", repos="https://kkholst.github.io/r_repo/")
library("gof")
set.seed(1)
g <- cumres(e, u2 ~ u1)
plot(g)

Non-linear measurement error model

Simulate non-linear model

m <- lvm(y1 + y2 + y3 ~ u, u ~ x)
transform(m, u2 ~ u) <- function(x) x^2
regression(m) <- z ~ u2 + u

d <- sim(m, 200, p=c("z"=-1, "z~u2"=-0.5), seed=1)

Stage 1

m1 <- lvm(c(y1[0:s], y2[0:s], y3[0:s]) ~ 1*u, u ~ x)
latent(m1) <- ~ u
(e1 <- estimate(m1, d))
#>                     Estimate Std. Error  Z-value  P-value
#> Regressions:                                             
#>    u~x               1.06998    0.08208 13.03542   <1e-12
#> Intercepts:                                              
#>    u                -0.08871    0.08753 -1.01344   0.3108
#> Residual Variances:                                      
#>    y1                1.00054    0.07075 14.14214         
#>    u                 1.19873    0.15503  7.73233

Stage 2

pp <- function(mu, var, data, ...) cbind(u=mu[,"u"], u2=mu[,"u"]^2 + var["u","u"])
(e <- measurement.error(e1, z~1+x, data=d, predictfun=pp))
#>             Estimate Std.Err    2.5%   97.5%   P-value
#> (Intercept)  -1.1181 0.13555 -1.3838 -0.8524 1.602e-16
#> x            -0.0537 0.14361 -0.3352  0.2278 7.085e-01
#> u             1.0039 0.12651  0.7560  1.2519 2.093e-15
#> u2           -0.4718 0.05858 -0.5867 -0.3570 7.974e-16
f <- function(p) p[1]+p["u"]*u+p["u2"]*u^2
u <- seq(-1, 1, length.out=100)
plot(e, f, data=data.frame(u))

Simulation

Studying the small-sample properties of a mediation analysis

m <- lvm(y ~ x, c ~ 1)
regression(m) <- y + x ~ z
eventTime(m) <- t ~ min(y=1, c=0)
transform(m, S ~ t + status) <- function(x) survival::Surv(x[,1], x[,2])

plot(m)

Simulate from the model and estimate indirect effects

future::plan("multicore") # parallelization via future
## progressr::handlers(global=TRUE) # add progress-bar
onerun <- function(...) {
  d <- sim(m, 100)
  m0 <- lvm(S~x+z, x~z)
  e <- estimate(m0, d, estimator="glm")
  vec(summary(effects(e, S~z))$coef[,1:2])
}
val <- sim(onerun, 100)
summary(val, estimate=1:4, se=5:8, short=TRUE)
#> 100 replications                 Time: 3.275s
#> 
#>         Total.Estimate Direct.Estimate Indirect.Estimate S~x~z.Estimate
#> Mean           1.99533         1.00468           0.99066        0.99066
#> SD             0.18662         0.16553           0.17792        0.17792
#> SE             0.18258         0.17507           0.16503        0.16503
#> SE/SD          0.97835         1.05767           0.92755        0.92755
#>                                                                        
#> Min            1.49759         0.56922           0.50331        0.50331
#> 2.5%           1.66826         0.68806           0.67154        0.67154
#> 50%            1.96614         1.00175           0.97848        0.97848
#> 97.5%          2.39620         1.29212           1.35790        1.35790
#> Max            2.63461         1.42005           1.47636        1.47636
#>                                                                        
#> Missing        0.00000         0.00000           0.00000        0.00000

Add additional simulations and visualize results

val <- sim(val, 500) ## add 500 simulations
plot(val, estimate=c("Total.Estimate", "Indirect.Estimate"),
     true=c(2, 1), se=c("Total.Std.Err", "Indirect.Std.Err"),
     scatter.plot=TRUE)
