Formula-first interface for fuzzy difference-in-differences estimators.
Estimation is fully native in R. The object returned by fuzzydid()
is an estimand summary rather than a predictive regression model: it stores
local average and local quantile treatment-effect estimates, bootstrap
uncertainty summaries, design cell counts, and metadata needed by extractor
methods.
Usage
fuzzydid(
  data,
  formula,
  group,
  time,
  group_forward = NULL,
  did = FALSE,
  tc = FALSE,
  cic = FALSE,
  lqte = FALSE,
  newcateg = NULL,
  numerator = FALSE,
  partial = FALSE,
  nose = FALSE,
  cluster = NULL,
  breps = 50,
  eqtest = FALSE,
  modelx = NULL,
  sieves = FALSE,
  sieveorder = NULL,
  tagobs = FALSE,
  seed = NULL,
  treatment = NULL
)
Arguments
- data
A data.frame.
- formula
Formula of the form y ~ d + covariates, where y is the outcome and d is the treatment.
- group
Name of the group variable (the backward group in multi-period designs).
- time
Name of the time variable.
- group_forward
Optional name of the forward group variable for multi-period designs.
- did
Logical; compute the Wald-DID estimator.
- tc
Logical; compute the Wald-TC estimator.
- cic
Logical; compute the Wald-CIC estimator.
- lqte
Logical; compute local quantile treatment effects.
- newcateg
Optional numeric vector of upper bounds used to recategorize treatment values for TC/CIC.
- numerator
Logical; return estimator numerators for DID/TC/CIC.
- partial
Logical; request TC partial-identification bounds.
- nose
Logical; skip bootstrap standard errors and confidence intervals.
- cluster
Optional name of a cluster variable used for one-way clustered bootstrap resampling.
- breps
Integer number of bootstrap replications.
- eqtest
Logical; compute equality tests across requested LATE estimands.
- modelx
Optional native covariate-adjustment methods (ols, logit, probit). Two entries are required for binary treatments and three for ordered multi-valued treatments.
- sieves
Logical; use sieve expansion for continuous covariates.
- sieveorder
Optional sieve order for sieves = TRUE. NULL (default) selects the order by deterministic 5-fold cross-validation. A scalar value applies to both the outcome and treatment sieve bases. A length-2 vector is accepted for backward compatibility and interpreted as (outcome_order, treatment_order).
- tagobs
Logical; return a logical mask of the observations used.
- seed
Optional integer seed used for bootstrap resampling when nose = FALSE. If NULL (default), bootstrap draws use the current RNG state. Supply a value to make bootstrap standard errors, confidence intervals, and diagnostics reproducible.
- treatment
Optional treatment variable name for multi-term formulas. If NULL, the treatment is inferred from the right-hand side of the formula when this is unambiguous.
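The newcateg description leaves the interval convention implicit. A minimal sketch of one reading follows, assuming right-closed intervals (-Inf, b1], (b1, b2], ... and 1-based integer category codes. recategorize_treatment() is a hypothetical helper, not part of the package, and fuzzydid() may label categories or handle values above the last bound differently.

```r
# Hypothetical helper (not part of the package): recode a numeric treatment
# into categories defined by strictly increasing upper bounds, as one
# possible reading of what `newcateg` describes.
recategorize_treatment <- function(d, upper_bounds) {
  stopifnot(is.numeric(d), is.numeric(upper_bounds),
            !is.unsorted(upper_bounds, strictly = TRUE))
  # Right-closed intervals: (-Inf, b1], (b1, b2], ..., (b[k-1], bk].
  # labels = FALSE returns integer interval codes; values above bk become NA.
  cut(d, breaks = c(-Inf, upper_bounds), right = TRUE, labels = FALSE)
}

dose <- c(0, 1, 2, 3, 5, 8, 9)
recategorize_treatment(dose, upper_bounds = c(0, 2, 8))
```

Under this reading, newcateg = c(0, 1), as used in the example below, leaves a 0/1 treatment with two categories.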
Value
An object of class "fuzzydid". This is a list whose
late component is a data frame of requested LATE-type estimators
with columns estimator, estimate, std.error,
conf.low, and conf.high. eqtest is either
NULL or an analogous data frame of pairwise equality contrasts,
and lqte is either NULL or a data frame with columns
quantile, estimate, std.error, conf.low, and
conf.high for local quantile treatment effects. Additional
components include matrices (a named list of Stata-style result
matrices); tagobs (an optional logical mask of retained
observations); the sample-size diagnostics n, n11, n10, n01, and
n00; the bootstrap diagnostics n_reps, n_misreps, and
share_failures; and metadata such as backend, call, and options.
The estimate tables report point estimates and, unless nose = TRUE,
bootstrap standard errors and percentile confidence limits; with
nose = TRUE these columns are NA.
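The sample-size diagnostics refer to the cells of the group-by-period design. Assuming the usual two-group, two-period naming, in which n11 counts group-1 observations in period 1 and n00 counts group-0 observations in period 0 (the mapping is not spelled out above), the counts can be reproduced with the hypothetical helper design_cell_counts():

```r
# Hypothetical helper: cell counts for a two-group, two-period design.
# Assumed naming: n<g><t> = number of observations with group g in period t.
design_cell_counts <- function(grp, period) {
  stopifnot(length(grp) == length(period))
  c(
    n   = length(grp),
    n11 = sum(grp == 1 & period == 1),
    n10 = sum(grp == 1 & period == 0),
    n01 = sum(grp == 0 & period == 1),
    n00 = sum(grp == 0 & period == 0)
  )
}

# Same 2 x 2 layout as in the Examples section: 20 observations per cell.
grp    <- rep(c(0, 0, 1, 1), each = 20)
period <- rep(c(0, 1, 0, 1), each = 20)
design_cell_counts(grp, period)
```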
Details
fuzzydid() uses complete cases across the outcome, treatment, group,
time, optional forward-group, covariate, and cluster variables. Rows with
missing values (NA or NaN) are dropped; infinite values such
as Inf and -Inf are rejected. The outcome and treatment must
be numeric vectors. Group and time identifiers must be numeric vectors; with
one group variable, group values must be in {0, 1, NA}. Covariates may
be numeric, factor, character, or logical vectors. Numeric covariates enter
as continuous predictors; factor, character, and logical covariates enter as
qualitative predictors expanded to indicator columns. When sieves =
TRUE, continuous covariates are expanded to polynomial sieve terms.
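The preprocessing rules above can be sketched in base R. prepare_covariates() is hypothetical, and the reference coding (first factor level dropped), the column names, and the raw-polynomial sieve basis x, x^2, ..., x^k are assumptions; the package may use a different sieve basis or coding. The returned keep vector is analogous to the tagobs mask, restricted to the columns passed in.

```r
# Hypothetical sketch (not the package's code): complete-case filtering,
# rejection of infinite values, indicator coding of qualitative covariates,
# and an optional raw-polynomial sieve (an assumed basis) for numeric ones.
prepare_covariates <- function(data, vars, sieve_order = NULL) {
  used <- data[vars]
  # Infinite values are rejected outright; NA and NaN are filtered below.
  has_inf <- vapply(used, function(v) is.numeric(v) && any(is.infinite(v)),
                    logical(1))
  if (any(has_inf)) {
    stop("infinite values in: ", paste(vars[has_inf], collapse = ", "))
  }
  # complete.cases() treats both NA and NaN as missing.
  keep <- stats::complete.cases(used)
  used <- used[keep, , drop = FALSE]
  blocks <- lapply(vars, function(nm) {
    x <- used[[nm]]
    if (is.numeric(x)) {
      k <- if (is.null(sieve_order)) 1L else as.integer(sieve_order)
      m <- outer(x, seq_len(k), "^")           # columns x, x^2, ..., x^k
      colnames(m) <- if (k == 1L) nm else paste0(nm, "^", seq_len(k))
    } else {
      f <- factor(x)                           # character/logical -> factor
      lv <- levels(f)[-1]                      # first level is the reference
      m <- outer(as.character(f), lv, "==") * 1
      colnames(m) <- paste0(nm, lv)
    }
    m
  })
  list(keep = keep, X = do.call(cbind, blocks))
}

covs <- data.frame(
  age    = c(30, 41, NaN, 52, 38),
  region = c("north", "south", "south", NA, "north"),
  member = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)
out <- prepare_covariates(covs, c("age", "region", "member"), sieve_order = 2)
out$keep  # rows 3 (NaN) and 4 (NA) are dropped
out$X
```

Dropping the first level keeps the indicator columns from being collinear with an intercept; this sketch does not add a constant term itself.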
Standard errors and confidence intervals are computed from the bootstrap replications; the confidence intervals use the percentile method.
Use seed to make bootstrap draws reproducible. If tagobs =
TRUE, the returned object includes a logical vector identifying the input
rows retained after complete-case filtering.
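A generic percentile bootstrap with an optional seed looks roughly like the sketch below. percentile_bootstrap() is hypothetical: the 95% level, the use of the replicate standard deviation as the standard error, and counting a replicate as failed when the statistic errors or is non-finite are assumptions chosen to mirror the n_reps, n_misreps, and share_failures diagnostics, not documented package behavior. Clustered resampling (the cluster argument) would draw whole clusters instead of rows and is not shown.

```r
# Hypothetical sketch of a row-resampling percentile bootstrap.
# Assumed: 95% level, SE = SD of successful replicates, and a replicate
# "fails" when the statistic errors or returns a non-finite value.
percentile_bootstrap <- function(data, statistic, breps = 50, seed = NULL,
                                 level = 0.95) {
  if (!is.null(seed)) set.seed(seed)  # NULL: keep the current RNG state
  n <- nrow(data)
  reps <- vapply(seq_len(breps), function(b) {
    idx <- sample.int(n, n, replace = TRUE)  # resample rows with replacement
    out <- tryCatch(statistic(data[idx, , drop = FALSE]),
                    error = function(e) NA_real_)
    if (is.finite(out)) out else NA_real_
  }, numeric(1))
  ok <- reps[!is.na(reps)]
  alpha <- (1 - level) / 2
  ci <- stats::quantile(ok, c(alpha, 1 - alpha), names = FALSE)
  list(
    estimate = statistic(data),
    std.error = stats::sd(ok),
    conf.low = ci[1],
    conf.high = ci[2],
    n_reps = breps,
    n_misreps = sum(is.na(reps)),
    share_failures = mean(is.na(reps))
  )
}

toy <- data.frame(y = c(2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5))
a <- percentile_bootstrap(toy, function(d) mean(d$y), breps = 200, seed = 1)
b <- percentile_bootstrap(toy, function(d) mean(d$y), breps = 200, seed = 1)
identical(a, b)  # same seed, same draws, same summaries
```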
Examples
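# Build one group-by-period cell of n_cell observations, the first `ones`
# of which are treated (d = 1).
make_example_cell <- function(g, t, ones, n_cell = 20L) {
  data.frame(
    g = rep.int(g, n_cell),
    t = rep.int(t, n_cell),
    d = c(rep.int(1L, ones), rep.int(0L, n_cell - ones))
  )
}
# Two groups x two periods; the treatment rate rises from 0.2 to 0.4 in
# group 0 and from 0.3 to 0.8 in group 1.
df <- rbind(
  make_example_cell(0L, 0L, 4L),
  make_example_cell(0L, 1L, 8L),
  make_example_cell(1L, 0L, 6L),
  make_example_cell(1L, 1L, 16L)
)
df$id <- seq_len(nrow(df))
# Outcome: group and period shifts, a treatment coefficient of 2, and a
# deterministic sin() term in place of random noise.
df$y <- 1 + 0.5 * df$g + 0.4 * df$t + 2 * df$d + sin(df$id / 7)
example_data <- df
# Request the Wald-DID, Wald-TC, and Wald-CIC estimators plus LQTEs.
# nose = TRUE skips the bootstrap, so std.error and the confidence
# limits are NA.
fit <- fuzzydid(
  data = example_data,
  formula = y ~ d,
  treatment = NULL,
  group = "g",
  time = "t",
  group_forward = NULL,
  did = TRUE,
  tc = TRUE,
  cic = TRUE,
  lqte = TRUE,
  newcateg = c(0, 1),
  cluster = NULL,
  modelx = NULL,
  sieveorder = NULL,
  seed = NULL,
  nose = TRUE
)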
fit$late
#> estimator estimate std.error conf.low conf.high
#> 1 W_DID 3.225827 NA NA NA
#> 2 W_TC 2.754817 NA NA NA
#> 3 W_CIC 3.117704 NA NA NA
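Without covariates, the Wald-DID estimator in a two-group, two-period design is the difference-in-differences of outcome means divided by the difference-in-differences of treatment rates. The hypothetical wald_did() below computes that ratio from cell means after rebuilding the example data compactly. For this design it should agree with the W_DID row above up to printing precision. That makes it a useful sanity check, but it does not cover covariates, multi-period designs, or the TC and CIC estimators.

```r
# Rebuild the 80-row example design: four 20-row cells in the order
# (g, t) = (0, 0), (0, 1), (1, 0), (1, 1).
grp    <- rep(c(0, 0, 1, 1), each = 20)
period <- rep(c(0, 1, 0, 1), each = 20)
ones   <- rep(c(4, 8, 6, 16), each = 20)            # treated units per cell
d      <- as.numeric(rep(1:20, times = 4) <= ones)  # first `ones` rows treated
id     <- seq_along(d)
y      <- 1 + 0.5 * grp + 0.4 * period + 2 * d + sin(id / 7)

# Hypothetical helper: covariate-free Wald-DID from 2 x 2 cell means.
wald_did <- function(y, d, grp, period) {
  cell_mean <- function(v, a, b) mean(v[grp == a & period == b])
  # (group 1: post - pre) - (group 0: post - pre)
  did <- function(v) {
    (cell_mean(v, 1, 1) - cell_mean(v, 1, 0)) -
      (cell_mean(v, 0, 1) - cell_mean(v, 0, 0))
  }
  did(y) / did(d)  # outcome DID scaled by the treatment-rate DID
}

wald_did(y, d, grp, period)
```

Here the treatment-rate DID is (0.8 - 0.3) - (0.4 - 0.2) = 0.3, so the group and period shifts cancel and only the treatment term and the sin() perturbation reach the numerator.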