Installation
Install the development version from GitHub with:
install.packages("remotes")
remotes::install_github("kmfrick/Rfuzzydid")
Full documentation and worked examples are available at https://kmfrick.github.io/Rfuzzydid/.
Syntax
fuzzydid(
  data,
  formula,
  group,
  time,
  group_forward = NULL,
  did = FALSE,
  tc = FALSE,
  cic = FALSE,
  lqte = FALSE,
  newcateg = NULL,
  numerator = FALSE,
  partial = FALSE,
  nose = FALSE,
  cluster = NULL,
  breps = 50,
  eqtest = FALSE,
  modelx = NULL,
  sieves = FALSE,
  sieveorder = NULL,
  tagobs = FALSE,
  seed = NULL,
  treatment = NULL
)
Description
fuzzydid() computes estimators of local average and quantile treatment effects in fuzzy DID designs, following de Chaisemartin and D’Haultfoeuille (2018a). It also computes their standard errors and confidence intervals.
Rfuzzydid is an R port of the Stata fuzzydid package. Its development aim is feature parity with the Stata package while exposing the estimators through a formula-first R interface.
Lifecycle and Prior Art
Rfuzzydid is a maturing R implementation of the estimators introduced by de Chaisemartin and D’Haultfoeuille (2018a) and implemented for Stata by de Chaisemartin, D’Haultfoeuille, and Guyonvarch (2018b). New development is focused on native R parity, input validation, and review-ready documentation rather than adding estimators beyond those references.
Arguments:
- data: Data frame containing all variables.
- formula: A formula of the form y ~ d or y ~ d + x1 + x2, where y is the outcome variable and the RHS terms include the treatment plus optional covariates.
- treatment: Optional treatment variable name for multi-term formulas. If omitted, the treatment is inferred when unambiguous (a single RHS term, or a unique d term).
- group: Name of the group variable (backward group for multi-period designs). See Section 4.2 of de Chaisemartin et al. (2018b; doi:10.1177/1536867X19854019) for details on constructing this variable.
- time: Name of the time period variable.
- group_forward: Optional name of the forward group variable for multi-period designs.
A detailed introduction to the methodology is given in de Chaisemartin et al. (2018b; doi:10.1177/1536867X19854019).
y, d, group, time, and group_forward must be numeric vectors. Numeric covariates are treated as continuous; factor, character, and logical covariates are treated as qualitative predictors. NA and NaN values are removed by complete-case filtering over all analysis variables. Inf and -Inf are rejected. Use tagobs = TRUE to recover the retained-row mask.
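The input rules above can be checked before calling fuzzydid(). The following is a minimal base R sketch, not the package's own validation code: the data frame raw and the choice to recode infinite values to NA are illustrative assumptions.

```r
# Hypothetical raw data with one missing value and one infinite value.
raw <- data.frame(
  y = c(1.2, NA, 0.7, 2.1, Inf),
  d = c(0, 1, 1, 0, 1),
  g = c(0, 0, 1, 1, 1),
  t = c(0, 1, 0, 1, 1)
)
vars <- c("y", "d", "g", "t")

# Flag columns containing Inf or -Inf, which fuzzydid() is documented to reject.
has_inf <- vapply(raw[vars], function(col) any(is.infinite(col)), logical(1))

# One possible remedy (an assumption, not package behavior): recode infinite
# values to NA so that complete-case filtering drops those rows.
clean <- raw
clean[vars] <- lapply(clean[vars], function(col) replace(col, is.infinite(col), NA))

# Complete-case mask over all analysis variables, then the filtered data.
keep <- complete.cases(clean[vars])
clean <- clean[keep, , drop = FALSE]
```

Here keep plays the same role as the retained-row mask described above, computed by hand on the pre-cleaned data.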
Options
Estimators:
- did: Logical; computes the Wald-DID estimator.
- tc: Logical; computes the Wald-TC estimator.
- cic: Logical; computes the Wald-CIC estimator. Only available when no covariates are included.
- lqte: Logical; computes estimators of the LQTE for quantiles of order 5%, 10%, …, 95%. Only available when D, G, and T are binary, and no covariates are included.
At least one of did, tc, cic, or lqte must be specified. If several are specified, all requested estimators are computed.
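To make the simplest of these estimators concrete, the sketch below computes a Wald-DID point estimate by hand for a binary group and period without covariates: the DID of the outcome divided by the DID of the treatment. It is an illustration on a hypothetical toy data set, not the package's implementation, and the helper names cell_mean and did_of are made up for this example.

```r
# Toy data: four (g, t) cells of four observations each, with y = base + 2 * d
# and a base level that follows parallel trends across groups.
df <- data.frame(
  g = rep(c(0, 0, 1, 1), each = 4),
  t = rep(c(0, 1, 0, 1), each = 4),
  d = c(0, 0, 0, 1,  0, 0, 1, 1,  0, 0, 0, 1,  0, 1, 1, 1),
  y = c(1, 1, 1, 3,  2, 2, 4, 4,  2, 2, 2, 4,  3, 5, 5, 5)
)

# Mean of x within one (g, t) cell.
cell_mean <- function(x, g, t) mean(x[df$g == g & df$t == t])

# Difference-in-differences of the cell means of x.
did_of <- function(x) {
  (cell_mean(x, 1, 1) - cell_mean(x, 1, 0)) -
    (cell_mean(x, 0, 1) - cell_mean(x, 0, 0))
}

num <- did_of(df$y)    # DID of the outcome (the Wald-DID numerator)
den <- did_of(df$d)    # DID of the treatment rate
wald_did <- num / den  # recovers the effect of 2 built into the toy data
```

fuzzydid() adds bootstrap inference, covariates, and the other estimators on top of this ratio.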
Treatment categorization:
- newcateg: Numeric vector of upper bounds to group treatment values together for Wald-TC and Wald-CIC. Useful when treatment takes many values. See Section 3.3 of de Chaisemartin et al. (2018b; doi:10.1177/1536867X19854019).
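As an illustration of what grouping by upper bounds means, the base R sketch below bins a many-valued treatment with cut(). It assumes each bound is inclusive, so a value equal to a bound falls in that bound's group; the vectors d and bounds are hypothetical, and the package's internal recoding may differ.

```r
# Hypothetical treatment values and upper bounds for four groups:
# {0, 1, 2}, {3, 4.5, 5}, {6, 7}, {8, 9, 10}.
d <- c(0, 1, 2, 3, 4.5, 5, 6, 7, 8, 9, 10)
bounds <- c(2, 5, 7, 10)

# Right-closed intervals (-Inf, 2], (2, 5], (5, 7], (7, 10];
# labels = FALSE returns integer group codes.
d_grouped <- cut(d, breaks = c(-Inf, bounds), right = TRUE, labels = FALSE)
```

Under that assumption, the equivalent call would pass newcateg = c(2, 5, 7, 10).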
Numerators and bounds:
- numerator: Logical; return only the numerators of Wald-DID, Wald-TC, and Wald-CIC estimators. Useful for placebo tests (see Section 3.3.3 of the supplement of de Chaisemartin and D’Haultfoeuille 2018a).
- partial: Logical; compute bounds on local average treatment effects in the absence of a “stable” control group. Only available without covariates.
Inference:
- nose: Logical; compute only point estimates, not standard errors.
- cluster: Name of cluster variable for block bootstrap. Only one clustering variable is allowed.
- breps: Integer number of bootstrap replications. Default is 50.
- eqtest: Logical; perform equality tests between estimands when at least two of did, tc, and cic are specified.
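The idea behind a clustered block bootstrap is to resample whole clusters with replacement rather than individual rows. The sketch below shows that general technique in base R with a simple mean as the statistic. It is not the package's internal code, and the data, the helper boot_stat, and the statistic are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical data: 10 clusters of 5 observations each.
df <- data.frame(cluster = rep(1:10, each = 5), y = rnorm(50))

# Row indices belonging to each cluster.
rows_by_cluster <- split(seq_len(nrow(df)), df$cluster)

# One bootstrap replication: draw cluster labels with replacement, stack the
# rows of the drawn clusters, and evaluate the statistic on the resample.
boot_stat <- function(data, stat) {
  drawn <- sample(names(rows_by_cluster), replace = TRUE)
  idx <- unlist(rows_by_cluster[drawn], use.names = FALSE)
  stat(data[idx, , drop = FALSE])
}

breps <- 50
draws <- replicate(breps, boot_stat(df, function(d) mean(d$y)))

se <- sd(draws)                         # bootstrap standard error
ci <- quantile(draws, c(0.025, 0.975))  # 95% percentile interval
```

With only 50 replications the percentile interval is coarse, so a larger breps is usually worth the extra run time for reported results.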
Covariates:
- modelx: Character vector specifying parametric methods for estimating conditional expectations in Wald-DID and Wald-TC with covariates. Two entries required for binary treatments; three for ordered multi-valued treatments. Values must be "ols", "logit", or "probit".
- sieves: Logical; use nonparametric sieve estimation for conditional expectations.
- sieveorder: Optional sieve order control when sieves = TRUE. Default NULL selects order by deterministic 5-fold CV. A scalar applies to both outcome and treatment sieve bases. A length-2 vector (outcome, treatment) is supported for backward compatibility. Values must be ≥ 2 and satisfy the basis cap min(4800, floor(n/5)).
When covariates are included and neither modelx nor sieves is specified, all conditional expectations are estimated by OLS by default.
Other:
- tagobs: Logical; return a logical mask of observations used by fuzzydid().
- seed: Optional integer seed for bootstrap resampling when nose = FALSE. The default NULL uses the current R RNG state; supply a seed for reproducible standard errors, confidence intervals, and bootstrap diagnostics.
Extractors
fuzzydid objects support print(), summary(), coef(), confint(), nobs(), formula(), vcov(), plot(), generics::tidy(), and generics::glance(). They do not implement predict(), fitted(), or residuals() because the object summarizes causal estimands rather than observation-level fitted outcomes.
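A minimal usage sketch of the extractors follows. It simulates a small data set and only calls the package when it is installed. Two things here are assumptions rather than documented facts: that group and time are passed as character column names, and that fuzzydid is exported from the Rfuzzydid namespace.

```r
set.seed(123)

# Hypothetical simulated data: binary group, period, and treatment.
n <- 400
df <- data.frame(g = rbinom(n, 1, 0.5), t = rbinom(n, 1, 0.5))
df$d <- rbinom(n, 1, 0.2 + 0.1 * df$t + 0.1 * df$g + 0.4 * df$g * df$t)
df$y <- 1 + 0.5 * df$t + 0.7 * df$g + 1.8 * df$d + rnorm(n)

if (requireNamespace("Rfuzzydid", quietly = TRUE)) {
  # Assumed calling convention: variable names given as strings.
  fit <- Rfuzzydid::fuzzydid(
    data = df, formula = y ~ d, group = "g", time = "t",
    did = TRUE, tc = TRUE, breps = 50, seed = 2024
  )
  print(coef(fit))     # point estimates
  print(confint(fit))  # bootstrap confidence intervals
  print(nobs(fit))     # number of observations used
}
```

Fixing seed makes the bootstrap-based output of confint() repeatable across runs.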
Returned Values
An object of class "fuzzydid" containing:
Data frames:
- late: LATE estimates with columns: estimator, estimate, std.error, conf.low, conf.high
- eqtest: Equality test results (if eqtest = TRUE)
- lqte: LQTE estimates at quantiles 0.05, 0.10, …, 0.95 (if lqte = TRUE)
Matrices (Stata-parity):
- matrices$b_LATE: k × 1 matrix of requested estimators
- matrices$se_LATE: k × 1 matrix of bootstrap standard errors
- matrices$ci_LATE: k × 2 matrix of 95% percentile bootstrap confidence intervals
- matrices$b_LQTE: 19 × 1 matrix of LQTE estimates at quantiles 0.05–0.95
- matrices$se_LQTE: 19 × 1 matrix of LQTE bootstrap standard errors
- matrices$ci_LQTE: 19 × 2 matrix of LQTE 95% confidence intervals
Counts:
- n: Number of observations used
- n11, n10, n01, n00: Cell sizes for (G,T) combinations
- n_reps: Number of bootstrap replications requested
- n_misreps: Number of failed/degenerate bootstrap replications
- share_failures: Proportion of failed replications
Examples
Generate the dataset
# Generate simulated data (saved to CSV for R/Stata parity verification)
set.seed(50321)
n_cell <- 80
df <- rbind(
  data.frame(y = rnorm(n_cell, 1 + 1.8 * rbinom(n_cell, 1, 0.20)), g = 0, t = 0, d = rbinom(n_cell, 1, 0.20)),
  data.frame(y = rnorm(n_cell, 1 + 0.5 + 1.8 * rbinom(n_cell, 1, 0.35)), g = 0, t = 1, d = rbinom(n_cell, 1, 0.35)),
  data.frame(y = rnorm(n_cell, 1 + 0.7 + 1.8 * rbinom(n_cell, 1, 0.30)), g = 1, t = 0, d = rbinom(n_cell, 1, 0.30)),
  data.frame(y = rnorm(n_cell, 1 + 0.7 + 0.5 + 1.8 * rbinom(n_cell, 1, 0.70)), g = 1, t = 1, d = rbinom(n_cell, 1, 0.70))
)
# Save for Stata comparison
write.csv(df, "fuzzydid_example.csv", row.names = FALSE)
Stata
Note: The Stata command is shown for parity/reference only. Rfuzzydid does not bundle the Stata fuzzydid sources, so Stata users need that command installed separately in their Stata environment.
Point estimates from R and Stata are identical for the covered parity fixtures. Bootstrap standard errors and confidence intervals can differ because the two platforms use different random number generators, so compare them for approximate rather than exact agreement.
References
de Chaisemartin, C. and D’Haultfoeuille, X. 2018a. Fuzzy Differences-in-Differences. Review of Economic Studies, 85(2): 999-1028. doi:10.1093/restud/rdx049.
de Chaisemartin, C., D’Haultfoeuille, X., and Guyonvarch, Y. 2018b. Fuzzy Differences-in-Differences with Stata. Stata Journal. doi:10.1177/1536867X19854019.