| Title: | Estimating Bivariate Dependency from Marginal Data |
|---|---|
| Description: | Provides statistical methods for estimating bivariate dependency (correlation) from marginal summary statistics across multiple studies. The package supports three settings for estimating a bivariate joint distribution from marginal summary data: (1) two binary variables, (2) two continuous variables, and (3) one binary and one continuous variable. These methods enable privacy-preserving joint estimation when individual-level data are unavailable. The approaches are detailed in Shang, Tsao, and Zhang (2025a) <doi:10.48550/arXiv.2505.03995> and Shang, Tsao, and Zhang (2025b) <doi:10.48550/arXiv.2508.02057>. |
| Authors: | Longwen Shang [aut], Min Tsao [aut], Xuekui Zhang [aut, cre, fnd] |
| Maintainer: | Xuekui Zhang <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 3.0.1 |
| Built: | 2026-05-24 08:07:12 UTC |
| Source: | https://github.com/cran/ebdm |
Simulated dataset for testing the cor_bin() function.
data(bin_example)
A data frame with 3 columns:
ni : Sample size per study.
xi : Count of the first binary variable.
yi : Count of the second binary variable.
Simulated dataset for testing the cor_cont() function.
data(cont_example)
A data frame with 5 columns:
Sample_Size : Sample size for each study.
Mean_X : Sample mean of variable X.
Mean_Y : Sample mean of variable Y.
Variance_X : Sample variance of variable X.
Variance_Y : Sample variance of variable Y.
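A summary table of this shape can be produced from individual-level records before anything is shared. The following is a minimal base R sketch, assuming a hypothetical raw data frame `raw` with columns `study`, `x`, and `y` (these names are illustrative and not part of the package); the output columns mirror those of cont_example.

```r
set.seed(1)

# Hypothetical individual-level data: three studies of different sizes
raw <- data.frame(
  study = rep(c("A", "B", "C"), times = c(30, 50, 40)),
  x = rnorm(120),
  y = rnorm(120)
)

# Collapse each study to the five marginal summaries
by_study <- split(raw, raw$study)
summaries <- data.frame(
  Sample_Size = vapply(by_study, nrow, integer(1)),
  Mean_X      = vapply(by_study, function(d) mean(d$x), numeric(1)),
  Mean_Y      = vapply(by_study, function(d) mean(d$y), numeric(1)),
  Variance_X  = vapply(by_study, function(d) var(d$x), numeric(1)),
  Variance_Y  = vapply(by_study, function(d) var(d$y), numeric(1))
)
summaries
```

Note that no cross-product of `x` and `y` is retained, so the within-study correlation cannot be read off the table directly.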
Performs maximum likelihood estimation (MLE) of the joint distribution of two binary variables using only marginal summary data from multiple studies.
cor_bin(ni, xi, yi, ci_method = c("none", "normal", "lr"))
ni : Numeric vector. Sample sizes for each dataset.
xi : Numeric vector. Count of observations where variable 1 equals 1.
yi : Numeric vector. Count of observations where variable 2 equals 1.
ci_method : Character string. Method for confidence interval computation. Options are "none", "normal", and "lr".
A named list with point estimates, variance, standard error, and confidence interval (if requested).
Estimated marginal probability for variable 1.
Estimated marginal probability for variable 2.
Estimated joint probability.
Estimated variance of p11_hat.
Standard error of p11_hat.
Confidence interval for p11_hat, if requested.
data(bin_example)
cor_bin(bin_example$ni, bin_example$xi, bin_example$yi, ci_method = "lr")
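To illustrate how marginal counts alone can carry information about the joint probability, here is a rough base R sketch of a profile likelihood. It is an illustration under stated assumptions, not the package's implementation: the marginal probabilities are fixed at their pooled estimates, the unobserved joint count in each study is summed out of a multinomial likelihood, and the helper names (`loglik_one`, `profile_ll`) are hypothetical.

```r
set.seed(42)

# Simulate marginal counts for K studies from a known joint distribution
K <- 40
n <- rep(100L, K)
p1 <- 0.4; p2 <- 0.3; p11 <- 0.2
cell_probs <- c(p11, p1 - p11, p2 - p11, 1 - p1 - p2 + p11)
cells <- vapply(n, function(m) rmultinom(1, m, cell_probs)[, 1], integer(4))
xi <- cells[1, ] + cells[2, ]  # count with variable 1 equal to 1
yi <- cells[1, ] + cells[3, ]  # count with variable 2 equal to 1

# Log-likelihood of one study's marginal counts (x, y), summing over
# the unobserved joint count k (both variables equal to 1)
loglik_one <- function(n, x, y, p1, p2, p11) {
  probs <- c(p11, p1 - p11, p2 - p11, 1 - p1 - p2 + p11)
  k <- max(0, x + y - n):min(x, y)
  terms <- vapply(k, function(kk) {
    dmultinom(c(kk, x - kk, y - kk, n - x - y + kk), prob = probs, log = TRUE)
  }, numeric(1))
  m <- max(terms)
  m + log(sum(exp(terms - m)))  # log-sum-exp for numerical stability
}

# Fix the marginals at pooled estimates and profile over p11
p1_hat <- sum(xi) / sum(n)
p2_hat <- sum(yi) / sum(n)
lo <- max(0, p1_hat + p2_hat - 1)
hi <- min(p1_hat, p2_hat)
profile_ll <- function(p) {
  sum(mapply(loglik_one, n, xi, yi, MoreArgs = list(p1 = p1_hat, p2 = p2_hat, p11 = p)))
}
p11_hat <- optimize(profile_ll, interval = c(lo + 1e-6, hi - 1e-6), maximum = TRUE)$maximum

# Implied correlation (phi coefficient) between the two binary variables
phi_hat <- (p11_hat - p1_hat * p2_hat) /
  sqrt(p1_hat * (1 - p1_hat) * p2_hat * (1 - p2_hat))
c(p11_hat = p11_hat, phi_hat = phi_hat)
```

The information about the joint probability comes only from how the two marginal counts co-vary across studies, so estimates from a small number of studies can be quite noisy.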
Estimates the correlation coefficient (along with the marginal means and standard deviations)
of two normally distributed variables using summary-level data from
multiple independent studies.
cor_cont(
  n, xbar, ybar,
  s2x = NULL, s2y = NULL,
  method = c("proposed", "weighted"),
  ci_method = c("none", "normal", "lr")
)
n : Numeric vector. Sample size of each study.
xbar, ybar : Numeric vectors. Sample means of the two variables.
s2x, s2y : Numeric vectors. Sample variances; required for method = "proposed".
method : Character. Estimation method: "proposed" or "weighted".
ci_method : Confidence interval type: "none", "normal", or "lr".
A list with elements
mu_x, mu_y : estimated marginal means
sigma_x, sigma_y : estimated SDs
rho : estimated correlation
se : standard error of rho (proposed only)
ci : confidence interval for rho (if requested)
data(cont_example)
# Example with full summaries
cor_cont(cont_example$Sample_Size, cont_example$Mean_X, cont_example$Mean_Y,
         cont_example$Variance_X, cont_example$Variance_Y,
         method = "proposed", ci_method = "lr")
# Only means + n, weighted mean method
cor_cont(cont_example$Sample_Size, cont_example$Mean_X, cont_example$Mean_Y,
         method = "weighted")
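The reason study-level means carry information about the correlation is that, for study i, the covariance of the two sample means equals the individual-level covariance divided by the sample size. The base R sketch below uses that fact in a simple moment estimator that needs only means and sample sizes. It is an illustration of the idea and is not claimed to reproduce either method in cor_cont(); the simulated values and variable names are assumptions.

```r
set.seed(7)

# Simulate study-level means directly: (xbar_i, ybar_i) is bivariate
# normal with covariance Sigma / n_i
K <- 200
n <- sample(50:150, K, replace = TRUE)
rho <- 0.6
mu <- c(1, -2)
sigma <- c(2, 0.5)
z1 <- rnorm(K)
z2 <- rnorm(K)
xbar <- mu[1] + sigma[1] * z1 / sqrt(n)
ybar <- mu[2] + sigma[2] * (rho * z1 + sqrt(1 - rho^2) * z2) / sqrt(n)

# Sample-size-weighted estimates of the marginal means
mu_x <- sum(n * xbar) / sum(n)
mu_y <- sum(n * ybar) / sum(n)

# Rescale the centered means by sqrt(n_i) so every study has the same
# covariance, then take the ordinary correlation of the rescaled values
zx <- sqrt(n) * (xbar - mu_x)
zy <- sqrt(n) * (ybar - mu_y)
rho_hat <- sum(zx * zy) / sqrt(sum(zx^2) * sum(zy^2))
rho_hat
```

Each study contributes a single pair of means, so precision is driven by the number of studies rather than by the total number of individuals.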
Estimates group-specific means and standard deviations in a two-component
normal mixture model based on aggregate data across multiple studies. The continuous variable
is assumed to follow a Gaussian mixture conditional on a binary group indicator,
with each study reporting only summary-level statistics.
est_mixture(ni, xbar, mi, s2 = NULL, method = c("gmm", "naive"))
ni : Integer vector of sample sizes per study.
xbar : Numeric vector of sample means per study.
mi : Integer vector of group 1 counts per study.
s2 : Numeric vector of sample variances per study. Required if method = "gmm".
method : Estimation method to use. One of "gmm" or "naive".
Two estimation methods are available:
"naive" : Likelihood-based estimator using only sample means.
"gmm" : Generalized method of moments (GMM) estimator using sample means and variances.
A named list containing:
mu1_hat, mu0_hat : Estimated means of the two groups.
sigma1_hat, sigma0_hat : Estimated standard deviations.
se : Standard errors of the parameter estimates (NA if method = "naive").
ci : List of 95% confidence intervals for each parameter (NULL if method = "naive").
method : A character string indicating the method used.
# Load example dataset included in the package
data(mixture_example)
# Estimate using GMM (recommended) with full summary statistics
est_mixture(
  ni = mixture_example$ni,
  xbar = mixture_example$xbar,
  s2 = mixture_example$s2,
  mi = mixture_example$mi,
  method = "gmm"
)
# Estimate using naive likelihood method (only means used)
est_mixture(
  ni = mixture_example$ni,
  xbar = mixture_example$xbar,
  mi = mixture_example$mi,
  method = "naive"
)
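The group means are identifiable from study means because, given the group 1 count, the expected study mean is linear in the group 1 proportion: mu0 + (mu1 - mu0) * mi / ni. The base R sketch below fits that line by weighted least squares. It is an illustration that assumes equal standard deviations in the two groups and simulated inputs; it is not claimed to match the "naive" or "gmm" estimators of est_mixture().

```r
set.seed(3)

# Simulate study-level summaries from a two-group normal model
K <- 100
ni <- sample(40:120, K, replace = TRUE)
mi <- rbinom(K, ni, prob = runif(K, 0.2, 0.8))  # group 1 counts; proportions vary by study
mu1 <- 5
mu0 <- 2
sigma <- 1  # assumed common SD for both groups
xbar <- (mi * mu1 + (ni - mi) * mu0) / ni + rnorm(K, sd = sigma / sqrt(ni))

# Given mi, xbar_i has mean mu0 + (mu1 - mu0) * mi / ni and variance
# sigma^2 / ni, so weight each study by its sample size
prop <- mi / ni
fit <- lm(xbar ~ prop, weights = ni)
mu0_hat <- unname(coef(fit)[1])  # intercept: mean when the group 1 proportion is 0
mu1_hat <- unname(sum(coef(fit)))  # intercept + slope: mean when the proportion is 1
c(mu0_hat = mu0_hat, mu1_hat = mu1_hat)
```

The fit relies on the group 1 proportion varying across studies; if every study had the same proportion, the slope could not be estimated from means alone.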
Simulated dataset for testing the est_mixture() function.
Each row corresponds to one study providing summary-level data
from a two-component normal mixture.
data(mixture_example)
A data frame with 4 columns:
ni : Sample size for each study.
mi : Count of group 1 individuals in each study.
xbar : Sample mean of the outcome variable.
s2 : Sample variance of the outcome variable.