fuzzydid (Rfuzzydid)

Formula-first interface for fuzzy difference-in-differences estimators. Estimation is fully native in R. The object returned by fuzzydid() is an estimand summary rather than a predictive regression model: it stores local average and local quantile treatment-effect estimates, bootstrap uncertainty summaries, design cell counts, and metadata needed by extractor methods.

Usage

fuzzydid(
  data,
  formula,
  group,
  time,
  group_forward = NULL,
  did = FALSE,
  tc = FALSE,
  cic = FALSE,
  lqte = FALSE,
  newcateg = NULL,
  numerator = FALSE,
  partial = FALSE,
  nose = FALSE,
  cluster = NULL,
  breps = 50,
  eqtest = FALSE,
  modelx = NULL,
  sieves = FALSE,
  sieveorder = NULL,
  tagobs = FALSE,
  seed = NULL,
  treatment = NULL
)

Arguments

data: A data.frame.
formula: Formula of the form y ~ d + covariates.
group: Name of the group variable (backward group for multi-period).
time: Name of the time variable.
group_forward: Optional name of the forward group variable for multi-period designs.
did: Logical; compute the Wald-DID estimator.
tc: Logical; compute the Wald-TC estimator.
cic: Logical; compute the Wald-CIC estimator.
lqte: Logical; compute local quantile treatment effects.
newcateg: Optional numeric vector of upper bounds used to recategorize treatment values for TC/CIC.
numerator: Logical; return the reduced-form estimator numerators for DID/TC/CIC. As in Stata's fuzzydid, this is only defined for a single two-period, two-group design and errors otherwise.
partial: Logical; request TC partial-identification bounds.
nose: Logical; skip bootstrap standard errors and confidence intervals.
cluster: Optional name of cluster variable for one-way clustered bootstrap resampling.
breps: Integer number of bootstrap replications.
eqtest: Logical; compute equality tests across requested LATE estimands.
modelx: Optional native covariate-adjusted methods (ols, logit, probit). Two entries are required for binary treatments and three for ordered multi-valued treatments.
sieves: Logical; use sieve expansion for continuous covariates.
sieveorder: Optional sieve order control for sieves = TRUE. NULL (default) selects order by deterministic 5-fold CV. A scalar value applies to both outcome and treatment sieve bases. A length-2 vector is accepted for backward compatibility and interpreted as (outcome_order, treatment_order).
tagobs: Logical; return logical mask of observations used.
seed: Optional integer seed used for bootstrap resampling when nose = FALSE. If NULL (default), bootstrap draws use the current RNG state. Supply a value to make bootstrap standard errors, confidence intervals, and diagnostics reproducible.
treatment: Optional treatment variable name for multi-term formulas. If NULL, treatment is inferred from formula RHS when unambiguous.
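The newcateg argument is described above as a vector of upper bounds for recategorizing treatment values. A minimal base-R sketch of that idea follows. It assumes each bound is inclusive and that categories are numbered from 1, which this page does not state; the helper name recode_by_upper_bounds is hypothetical and is not part of the package.

```r
# Hypothetical helper: map each treatment value to the index of the first
# upper bound that is greater than or equal to it (bounds assumed inclusive).
recode_by_upper_bounds <- function(d, bounds) {
  stopifnot(is.numeric(d), is.numeric(bounds), !is.unsorted(bounds))
  cut(d, breaks = c(-Inf, bounds), labels = FALSE, right = TRUE)
}

d_raw <- c(0, 3, 5, 6, 10, 12, 15)
recoded <- recode_by_upper_bounds(d_raw, bounds = c(5, 10, 15))
recoded
```

Values above the largest bound would come back as NA in this sketch; how fuzzydid() treats such values is not covered here.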

Value

An object of class "fuzzydid". This is a list with the following components. late is a data frame of the requested LATE-type estimators with columns estimator, estimate, std.error, conf.low, and conf.high. eqtest is either NULL or an analogous data frame of pairwise equality contrasts. lqte is either NULL or a data frame with columns quantile, estimate, std.error, conf.low, and conf.high for local quantile treatment effects. Additional components include: matrices, a named list of Stata-style result matrices; tagobs, an optional logical mask of retained observations; the sample-size diagnostics n, n11, n10, n01, and n00; the bootstrap diagnostics n_reps, n_misreps, and share_failures; and metadata such as backend, call, and options. The estimate tables report point estimates and, unless nose = TRUE, bootstrap standard errors and percentile confidence limits.

Details

fuzzydid() uses complete cases across the outcome, treatment, group, time, optional forward-group, covariate, and cluster variables. Rows with NA or NaN in any of these variables are dropped; infinite values (Inf and -Inf) are rejected. The outcome and treatment must be numeric vectors. Group and time identifiers must also be numeric vectors; with one group variable, group values must be in {0, 1, NA}. Covariates may be numeric, factor, character, or logical vectors. Numeric covariates enter as continuous predictors; factor, character, and logical covariates enter as qualitative predictors expanded to indicator columns. When sieves = TRUE, continuous covariates are expanded to polynomial sieve terms.
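The filtering rules above can be approximated in base R. The sketch below is illustrative only: the toy data frame, its column names, and the order of the checks are assumptions, and it is not the package's internal code.

```r
# Hypothetical toy data with one NA and one NaN.
toy <- data.frame(
  y = c(1.2, NA, 0.7, 2.1, 1.5),
  d = c(0, 1, NaN, 1, 0),
  g = c(0, 0, 1, 1, 1),
  t = c(0, 1, 0, 1, 1)
)
vars <- c("y", "d", "g", "t")

# Reject Inf / -Inf up front; is.infinite() is FALSE for NA and NaN.
has_inf <- vapply(toy[vars], function(x) any(is.infinite(x)), logical(1))
if (any(has_inf)) {
  stop("Infinite values in: ", paste(names(has_inf)[has_inf], collapse = ", "))
}

# complete.cases() treats both NA and NaN as missing.
keep <- complete.cases(toy[vars])

# With a single group variable, retained group values must be 0 or 1.
stopifnot(all(toy$g[keep] %in% c(0, 1)))

used <- toy[keep, , drop = FALSE]
keep
```

Here keep plays the role of the logical mask described for tagobs: one entry per input row, TRUE for rows that survive filtering.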

Standard errors are bootstrap estimates, and confidence intervals are percentile bootstrap intervals. Use seed to make bootstrap draws reproducible. If tagobs = TRUE, the returned object includes a logical vector identifying the input rows retained after complete-case filtering.
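To make the bootstrap summaries concrete, here is a generic percentile-bootstrap sketch in base R. It is not the package's implementation: the helper percentile_boot, the 95% level, and the use of the sample mean as the statistic are all assumptions chosen for illustration. Only the default of 50 replications and the seed behaviour mirror the arguments documented above.

```r
# Hypothetical helper: nonparametric bootstrap of a scalar statistic.
percentile_boot <- function(x, stat = mean, breps = 50L, seed = NULL,
                            level = 0.95) {
  # A non-NULL seed fixes the RNG state; NULL uses the current state.
  if (!is.null(seed)) set.seed(seed)
  draws <- replicate(
    breps,
    stat(sample(x, size = length(x), replace = TRUE))
  )
  alpha <- (1 - level) / 2
  limits <- quantile(draws, probs = c(alpha, 1 - alpha), names = FALSE)
  list(
    estimate  = stat(x),
    std.error = sd(draws),   # spread of the bootstrap draws
    conf.low  = limits[1],   # lower percentile limit
    conf.high = limits[2]    # upper percentile limit
  )
}

x <- sin(seq_len(80) / 7)
a <- percentile_boot(x, seed = 42)
b <- percentile_boot(x, seed = 42)
identical(a, b)  # same seed, same draws
```

With only 50 replications the percentile limits are coarse, so raising breps is a reasonable choice when the intervals matter.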

Examples

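# Build one group-by-time cell of n_cell rows in which the first `ones`
# rows are treated (d = 1) and the remaining rows are untreated (d = 0).
make_example_cell <- function(g, t, ones, n_cell = 20L) {
  data.frame(
    g = rep.int(g, n_cell),
    t = rep.int(t, n_cell),
    d = c(rep.int(1L, ones), rep.int(0L, n_cell - ones))
  )
}

# Treatment rates rise from 20% to 40% in group 0 and from 30% to 80% in group 1.
df <- rbind(
  make_example_cell(0L, 0L, 4L),
  make_example_cell(0L, 1L, 8L),
  make_example_cell(1L, 0L, 6L),
  make_example_cell(1L, 1L, 16L)
)
df$id <- seq_len(nrow(df))
# Outcome: additive group, time, and treatment terms plus a deterministic
# sin() component, so the example needs no random draws.
df$y <- 1 + 0.5 * df$g + 0.4 * df$t + 2 * df$d + sin(df$id / 7)
example_data <- df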

fit <- fuzzydid(
  data = example_data,
  formula = y ~ d,
  treatment = NULL,
  group = "g",
  time = "t",
  group_forward = NULL,
  did = TRUE,
  tc = TRUE,
  cic = TRUE,
  lqte = TRUE,
  newcateg = c(0, 1),
  cluster = NULL,
  modelx = NULL,
  sieveorder = NULL,
  seed = NULL,
  nose = TRUE
)

fit$late
#>   estimator estimate std.error conf.low conf.high
#> 1     W_DID 3.225827        NA       NA        NA
#> 2      W_TC 2.754817        NA       NA        NA
#> 3     W_CIC 3.117704        NA       NA        NA
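
As a sanity check on the W_DID row, the standard Wald-DID ratio (the difference-in-differences of mean outcomes divided by the difference-in-differences of mean treatment rates) can be computed by hand from the four cell means. This sketch rebuilds the example data so it runs on its own; cell_mean and did_of are hypothetical helpers, and whether fuzzydid() computes W_DID by exactly this route is an assumption.

```r
# Rebuild the example data used above.
make_example_cell <- function(g, t, ones, n_cell = 20L) {
  data.frame(
    g = rep.int(g, n_cell),
    t = rep.int(t, n_cell),
    d = c(rep.int(1L, ones), rep.int(0L, n_cell - ones))
  )
}
example_data <- rbind(
  make_example_cell(0L, 0L, 4L),
  make_example_cell(0L, 1L, 8L),
  make_example_cell(1L, 0L, 6L),
  make_example_cell(1L, 1L, 16L)
)
example_data$id <- seq_len(nrow(example_data))
example_data$y <- with(
  example_data,
  1 + 0.5 * g + 0.4 * t + 2 * d + sin(id / 7)
)

# Mean of one column within a single group-by-time cell.
cell_mean <- function(col, g, t) {
  mean(example_data[[col]][example_data$g == g & example_data$t == t])
}

# Difference-in-differences of cell means for one column.
did_of <- function(col) {
  (cell_mean(col, 1, 1) - cell_mean(col, 1, 0)) -
    (cell_mean(col, 0, 1) - cell_mean(col, 0, 0))
}

did_y <- did_of("y")
did_d <- did_of("d")  # (0.8 - 0.3) - (0.4 - 0.2) = 0.3
wald_did_by_hand <- did_y / did_d
wald_did_by_hand
```

The hand-computed ratio should land close to the W_DID estimate printed above. The TC and CIC estimators involve additional steps and are not reproduced here.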