Package 'bin2norm'

Title: Hierarchical Probit Estimation for Dichotomized Data
Description: Provides likelihood-based and hierarchical estimation methods for thresholded (binomial-probit) data. Supports fixed-mean and random-mean models with maximum likelihood estimation (MLE), generalized linear mixed model (GLMM), and Bayesian Markov chain Monte Carlo (MCMC) implementations. For methodological background, see Albert and Chib (1993) <doi:10.1080/01621459.1993.10476321> and McCulloch (1994) <doi:10.2307/2297959>.
Authors: Zhaoze Liu [aut], Longwen Shang [aut], Mary Lesperance [aut], Shuqing Zhou [aut], Xuekui Zhang [aut, cre, fnd]
Maintainer: Xuekui Zhang <[email protected]>
License: MIT + file LICENSE
Version: 0.1.1
Built: 2026-05-15 09:14:27 UTC
Source: https://github.com/cran/bin2norm


bin2norm: A user-friendly interface to estimate normal distribution parameters from dichotomized data

Description

This function handles two data-collection settings for estimating normal parameters from threshold-based (dichotomized) data:

  • Single-threshold per study: Each of $I$ studies reports one threshold $c_i$, a sample size $n_i$, and the observed proportion $p_i^{obs}$ of samples above that threshold. We assume one normal distribution $\mathcal{N}(\mu,\sigma^2)$ across all studies. Methods include "MLE" and "probit".

  • Multiple-thresholds per study: Each study $i$ reports $K_i$ thresholds $\{c_{ij}\}$, each with an observed proportion $p_{ij}^{obs}$. We assume the study-specific mean $\mu_i \sim \mathcal{N}(\mu_0,\tau^2)$ and within-study variance $\sigma^2$. Because each study has multiple cutpoints, one can estimate $\mu_0, \sigma, \tau$. Methods include "MLE_integration", "GLMM", and "Bayesian" (MCMC).
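
To make the multiple-thresholds setting concrete, the following is a minimal sketch of how such summaries could arise. The helper name simulate_multi_threshold and all parameter values are hypothetical (they are not part of bin2norm); the sketch only assumes the random-mean model stated above.

```r
# Hypothetical helper (not part of bin2norm): simulate dichotomized summaries
# under mu_i ~ N(mu0, tau^2) and X ~ N(mu_i, sigma^2).
simulate_multi_threshold <- function(n_i, c_ij, mu0, sigma, tau) {
  stopifnot(length(n_i) == length(c_ij))
  p_ij_obs <- vector("list", length(n_i))
  for (i in seq_along(n_i)) {
    mu_i <- rnorm(1, mean = mu0, sd = tau)        # study-specific mean
    x <- rnorm(n_i[i], mean = mu_i, sd = sigma)   # latent measurements
    # observed proportion of the sample above each threshold
    p_ij_obs[[i]] <- vapply(c_ij[[i]], function(cut) mean(x > cut), numeric(1))
  }
  list(n_i = n_i, c_ij = c_ij, p_ij_obs = p_ij_obs)
}

set.seed(1)
sim <- simulate_multi_threshold(
  n_i = c(200, 150),
  c_ij = list(c(0.5, 1.0), c(0.0, 1.0, 2.0)),
  mu0 = 1, sigma = 1, tau = 0.5
)
str(sim)
```

The returned list has the n_i, c_ij, p_ij_obs layout described under data_list, so it can serve as test input for the multiple-threshold methods.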

Usage

bin2norm(
  scenario = c("single_threshold", "multiple_thresholds"),
  method = NULL,
  n_i = NULL,
  c_i = NULL,
  p_i_obs = NULL,
  data_list = NULL,
  ...
)

Arguments

scenario

character string, either "single_threshold" or "multiple_thresholds".

method

character string indicating which estimation method to use.

  • For scenario = "single_threshold", valid methods are "MLE" or "probit".

  • For scenario = "multiple_thresholds", valid methods are "MLE_integration", "GLMM", or "Bayesian".

n_i, c_i, p_i_obs

used only if scenario="single_threshold". Numeric vectors of the same length: $n_i$ is the study sample size, $c_i$ is the threshold, and $p_i^{obs}$ is the observed proportion above the threshold.

data_list

used only if scenario="multiple_thresholds", a list with:

  • n_i: numeric vector (length I) of sample sizes

  • c_ij: list of length I, where c_ij[[i]] is a numeric vector of thresholds in study i

  • p_ij_obs: list of length I, where p_ij_obs[[i]] is a numeric vector of observed proportions above each threshold

...

additional arguments passed to lower-level functions (e.g. use_wols_init, gh_points, iter, chains).
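
The data_list layout described above can be checked before fitting. The sketch below is a hypothetical validator (check_data_list is not a bin2norm function) that only encodes the structure stated in this section, plus the assumption that each study supplies one proportion per threshold.

```r
# Hypothetical validator; not part of bin2norm.
check_data_list <- function(data_list) {
  stopifnot(
    is.list(data_list),
    all(c("n_i", "c_ij", "p_ij_obs") %in% names(data_list))
  )
  I <- length(data_list$n_i)
  stopifnot(
    is.numeric(data_list$n_i),
    is.list(data_list$c_ij), length(data_list$c_ij) == I,
    is.list(data_list$p_ij_obs), length(data_list$p_ij_obs) == I,
    # assumption: one observed proportion per threshold in every study
    all(lengths(data_list$c_ij) == lengths(data_list$p_ij_obs)),
    # proportions must lie in [0, 1]
    all(unlist(data_list$p_ij_obs) >= 0 & unlist(data_list$p_ij_obs) <= 1)
  )
  invisible(TRUE)
}

ok <- check_data_list(list(
  n_i = c(100, 120),
  c_ij = list(c(1.0, 1.2), c(0.8, 1.5, 2.0)),
  p_ij_obs = list(c(0.20, 0.30), c(0.15, 0.40, 0.55))
))
```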

Value

A list of estimated parameters, depending on the data-collection setting (scenario) and the chosen method. Typically includes:

  • mu or mu0

  • sigma

  • tau (only for multiple-threshold methods)

Examples

# Single-threshold example
n_i <- c(100, 120, 80)
c_i <- c(1.2, 1.0, 1.5)
p_i_obs <- c(0.30, 0.25, 0.40)
bin2norm(scenario="single_threshold", method="MLE", n_i=n_i, c_i=c_i, p_i_obs=p_i_obs)

# Multiple-thresholds example
data_list <- list(
  n_i = c(100, 120),
  c_ij = list(c(1.0,1.2), c(0.8,1.5,2.0)),
  p_ij_obs = list(c(0.20,0.30), c(0.15,0.40,0.55))
)

# MLE with numeric integration
bin2norm(scenario="multiple_thresholds", method="MLE_integration",
         data_list=data_list, gh_points=5)

# GLMM approximation
# library(lme4)
bin2norm(scenario="multiple_thresholds", method="GLMM",
         data_list=data_list, use_lme4=TRUE)

# Bayesian MCMC approach
# library(rstan)
bin2norm(scenario="multiple_thresholds", method="Bayesian",
         data_list=data_list, iter=1000, chains=2)

Get initial values from data

Description

Get initial values from data

Usage

estimate_initial_values_from_data(data_list)

Arguments

data_list

the input data list (see the data_list argument of bin2norm)

Value

a named list of initial values


GLMM (Multiple Thresholds per Study, Probit Link, Random Intercepts)

Description

Creates a single data frame stacking all thresholds from all studies, then calls lme4::glmer(..., family=binomial(link='probit')) to fit a random-intercept model:

$$k_{ij} \sim \mathrm{Binomial}\bigl(n_i, \Phi(\alpha_i + \beta\, c_{ij})\bigr),$$

with $\alpha_i \sim \mathcal{N}(0, \sigma_\alpha^2)$.

Interpreting results: $\sigma = 1/|\beta|$, $\tau^2 = \sigma^2 \times \sigma_\alpha^2$, $\mu_0 = (\mathrm{Intercept}) \times \sigma$ (if the intercept is not forced to 0).
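
The back-transformation in "Interpreting results" can be written out as a small sketch. The function glmm_to_normal and the numeric inputs are hypothetical; in practice the intercept, slope, and random-intercept standard deviation would presumably be read from the fitted lme4 model (for example via fixef() and VarCorr()).

```r
# Hypothetical helper: map probit GLMM quantities to (mu0, sigma, tau)
# using the relations stated above.
glmm_to_normal <- function(intercept, beta, sd_alpha) {
  sigma <- 1 / abs(beta)            # sigma = 1 / |beta|
  list(
    mu0   = intercept * sigma,      # mu0 = (Intercept) * sigma
    sigma = sigma,
    tau   = sigma * sd_alpha        # tau^2 = sigma^2 * sigma_alpha^2
  )
}

# Made-up coefficient values, for illustration only
est <- glmm_to_normal(intercept = 0.5, beta = -2, sd_alpha = 0.4)
est
```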

Usage

estimate_multiThresh_GLMM(data_list, use_lme4 = TRUE)

Arguments

data_list

a list with the same structure as in bin2norm: n_i, c_ij, p_ij_obs

use_lme4

logical; if TRUE, calls lme4::glmer with a probit link.

Value

A list with mu0, sigma, tau, method="GLMM_probit".


Bayesian MCMC (Multiple Thresholds per Study) using rstan

Description

Builds an inline Stan model for multiple thresholds per study. The user must have the rstan package installed. We place random effects $\mu_i = \mu_0 + \tau \cdot \mathrm{mu\_raw}[i]$ and use a binomial likelihood for each threshold. By default, the model uses simple weakly informative priors.
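
For orientation, the hierarchy described above can be written as follows. This is a sketch, not the package's Stan code: the standard normal distribution on mu_raw is the usual non-centered convention and is assumed here, $k_{ij}$ denotes the count above threshold $c_{ij}$ (roughly $n_i \, p_{ij}^{obs}$), and the priors on $\mu_0$, $\sigma$, $\tau$ are not specified in this manual.

```latex
\begin{aligned}
\mathrm{mu\_raw}_i &\sim \mathcal{N}(0, 1) \quad \text{(assumed)}, \\
\mu_i &= \mu_0 + \tau \cdot \mathrm{mu\_raw}_i, \\
k_{ij} &\sim \mathrm{Binomial}\!\left(n_i,\; 1 - \Phi\!\left(\frac{c_{ij} - \mu_i}{\sigma}\right)\right).
\end{aligned}
```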

Usage

estimate_multiThresh_MCMC(data_list, iter = 2000, chains = 2)

Arguments

data_list

same structure as above: n_i, c_ij, p_ij_obs

iter

number of total iterations for each chain (default 2000)

chains

number of MCMC chains (default 2)

Value

a list containing stan_fit (the full Stan fit object), plus mu0_est, sigma_est, tau_est as posterior means, and method="Bayesian_MCMC".


MLE with Numeric Integration (Multiple Thresholds per Study)

Description

Each study $i$ has thresholds $\{c_{ij}\}$, each with an observed proportion $p_{ij}^{obs}$. We assume $\mu_i \sim \mathcal{N}(\mu_0,\tau^2)$ and $X_{ij} \sim \mathcal{N}(\mu_i,\sigma^2)$. The log-likelihood integrates out $\mu_i$ via Gauss-Hermite quadrature.
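
The quadrature step can be sketched as a worked equation. Here $g_i(\mu_i)$ stands for the product of binomial terms of study $i$ given $\mu_i$, and $(x_m, w_m)$ are Gauss-Hermite nodes and weights for the weight function $e^{-x^2}$; these symbols are introduced for illustration and the package's exact normalization may differ.

```latex
\begin{aligned}
L_i(\mu_0, \sigma, \tau)
  &= \int g_i(\mu_i)\, \frac{1}{\tau\sqrt{2\pi}}
     \exp\!\left(-\frac{(\mu_i - \mu_0)^2}{2\tau^2}\right) d\mu_i \\
  &= \frac{1}{\sqrt{\pi}} \int g_i\!\left(\mu_0 + \sqrt{2}\,\tau x\right) e^{-x^2}\, dx
     \qquad (\mu_i = \mu_0 + \sqrt{2}\,\tau x) \\
  &\approx \frac{1}{\sqrt{\pi}} \sum_{m=1}^{M} w_m\,
     g_i\!\left(\mu_0 + \sqrt{2}\,\tau x_m\right).
\end{aligned}
```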

Usage

estimate_multiThresh_MLE(data_list, gh_points = 20)

Arguments

data_list

A list with:

  • n_i: numeric vector (length I)

  • c_ij: list of length I

  • p_ij_obs: list of length I

gh_points

integer; number of Gauss-Hermite points (default 20).

Value

A list with mu0, sigma, tau, method="MLE_integration".


MLE (Single Threshold per Study)

Description

Treats the count of "above threshold" in study $i$ as binomial with probability $1 - \Phi((c_i - \mu)/\sigma)$. This uses numerical optimization (optim) to maximize the binomial likelihood. Optionally uses weighted OLS estimates as starting values to improve convergence.
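
The likelihood described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: neg_loglik is a hypothetical name, sigma is optimized on the log scale to keep it positive, the starting values are a simple guess rather than the WOLS estimates, and the data are noise-free proportions generated from known parameters.

```r
# Negative binomial log-likelihood (constant binomial coefficient omitted).
# par = c(mu, log(sigma)); the log scale keeps sigma positive.
neg_loglik <- function(par, n_i, c_i, k_i) {
  mu <- par[1]
  sigma <- exp(par[2])
  z <- (c_i - mu) / sigma
  log_p_above <- pnorm(z, lower.tail = FALSE, log.p = TRUE)  # log(1 - Phi(z))
  log_p_below <- pnorm(z, log.p = TRUE)                      # log(Phi(z))
  -sum(k_i * log_p_above + (n_i - k_i) * log_p_below)
}

# Illustrative data generated from mu = 0.5, sigma = 1.5 (no sampling noise)
n_i <- c(1000, 1000, 1000, 1000)
c_i <- c(-1, 0, 1, 2)
p_i_obs <- pnorm(c_i, mean = 0.5, sd = 1.5, lower.tail = FALSE)
k_i <- n_i * p_i_obs                                         # counts above threshold

fit <- optim(c(mean(c_i), 0), neg_loglik, n_i = n_i, c_i = c_i, k_i = k_i)
c(mu = fit$par[1], sigma = exp(fit$par[2]))
```

Because the proportions here are exact, the optimizer should land close to the generating values; with real data the estimates carry sampling error.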

Usage

estimate_singleThresh_MLE(n_i, c_i, p_i_obs, use_wols_init = TRUE)

Arguments

n_i

numeric vector of sample sizes

c_i

numeric vector of thresholds

p_i_obs

numeric vector of observed proportions above threshold

use_wols_init

logical; if TRUE, uses Weighted OLS estimates (estimate_singleThresh_WOLS) as initial values in optim.

Value

A list with mu, sigma, method="MLE".


GLM probit (Single Threshold per Study)

Description

For each group $i$, we assume the data follow:

$$\Pr(Y_i = 1) = \Phi\left( \frac{\mu - c_i}{\sigma} \right)$$

where $c_i$ is a known threshold and $\Phi$ is the standard normal CDF (the inverse of the probit link). The function reconstructs individual binary outcomes from the observed proportions and estimates the parameters by fitting a generalized linear model with a probit link.
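
A minimal sketch of this idea with stats::glm follows. It is not the package's code: it fits aggregated counts rather than reconstructed individual outcomes (the two give the same binomial likelihood), and the data are illustrative values generated from known parameters. Writing the model as $\Phi(\mu/\sigma - c_i/\sigma)$ gives an intercept of $\mu/\sigma$ and a slope of $-1/\sigma$.

```r
# Illustrative data generated from mu = 0.5, sigma = 1.5
n_i <- c(1000, 1000, 1000, 1000)
c_i <- c(-1, 0, 1, 2)
p_i_obs <- pnorm((0.5 - c_i) / 1.5)
k_i <- round(n_i * p_i_obs)          # counts above threshold

# Probit regression of (above, below) counts on the threshold
fit <- glm(cbind(k_i, n_i - k_i) ~ c_i, family = binomial(link = "probit"))
b <- unname(coef(fit))

sigma_hat <- -1 / b[2]               # slope = -1 / sigma
mu_hat <- b[1] * sigma_hat           # intercept = mu / sigma
c(mu = mu_hat, sigma = sigma_hat)
```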

Usage

estimate_singleThresh_probit(n_i, c_i, p_i_obs)

Arguments

n_i

numeric vector of sample sizes

c_i

numeric vector of thresholds

p_i_obs

numeric vector of observed proportions above threshold

Value

A list with mu, sigma, method="probit".


Weighted OLS (Initial value in Single Threshold per Study MLE)

Description

Implements the formula $c_i = \mu + \sigma \cdot \Phi^{-1}(1 - p_i^{obs})$ in a weighted least-squares sense, with weights equal to $n_i$.
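
The formula above is a straight-line regression of the thresholds on normal quantiles, which can be sketched with stats::lm. This is an illustration rather than the package's code, and the data are noise-free values generated from known parameters.

```r
# Illustrative data generated from mu = 0.5, sigma = 1.5
n_i <- c(100, 120, 80, 150)
c_i <- c(-1, 0, 1, 2)
p_i_obs <- pnorm(c_i, mean = 0.5, sd = 1.5, lower.tail = FALSE)

# Phi^{-1}(1 - p); proportions of exactly 0 or 1 would give infinite values
z_i <- qnorm(1 - p_i_obs)

# Weighted least squares: intercept estimates mu, slope estimates sigma
fit <- lm(c_i ~ z_i, weights = n_i)
est <- unname(coef(fit))
c(mu = est[1], sigma = est[2])
```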

Usage

estimate_singleThresh_WOLS(n_i, c_i, p_i_obs)

Arguments

n_i

numeric vector of sample sizes

c_i

numeric vector of thresholds

p_i_obs

numeric vector of observed proportions above threshold

Value

A list with mu, sigma.


Minimal Gauss-Hermite Quadrature

Description

Returns (nodes, weights) for approximating $\int f(x)\, e^{-x^2}\, dx$, ignoring any normalizing constant. This is a simple demonstration; for serious applications, more robust libraries or expansions might be used.
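
One common way to obtain such nodes and weights is the Golub-Welsch eigenvalue method, sketched below in base R. The function gh_rule is a hypothetical stand-in and may differ from what gaussHermite does internally, including in how the weights are scaled.

```r
# Hypothetical sketch (Golub-Welsch): nodes and weights for the weight e^{-x^2}.
gh_rule <- function(n) {
  stopifnot(n >= 2)
  idx <- seq_len(n - 1)
  # Symmetric tridiagonal Jacobi matrix of the Hermite recurrence
  J <- matrix(0, n, n)
  J[cbind(idx, idx + 1)] <- sqrt(idx / 2)
  J[cbind(idx + 1, idx)] <- sqrt(idx / 2)
  e <- eigen(J, symmetric = TRUE)
  ord <- order(e$values)
  list(
    nodes = e$values[ord],                       # eigenvalues are the nodes
    weights = sqrt(pi) * e$vectors[1, ord]^2     # scaled squared first components
  )
}

gh <- gh_rule(5)
sum(gh$weights)                # approximates the integral of e^{-x^2}, i.e. sqrt(pi)
sum(gh$weights * gh$nodes^2)   # approximates the integral of x^2 e^{-x^2}, i.e. sqrt(pi) / 2
```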

Usage

gaussHermite(n)

Arguments

n

integer number of quadrature points

Value

list with nodes and weights