Screens for influential producers by asking: "if this producer is removed from the sample, how much does the group classification image change?" Producers whose removal moves the group pattern disproportionately are flagged for inspection. This is an influence / outlier screening tool, not a reliability statistic.
Usage
rel_loo(
signal_matrix,
flag_threshold = 2.5,
flag_method = c("mad", "sd"),
flag_threshold_sd = NULL,
mask = NULL
)
Arguments
- signal_matrix
Numeric matrix, pixels x participants (producers), base-subtracted.
- flag_threshold
Numeric multiplier on sd (or mad): producers whose r_loo falls more than flag_threshold spreads below the centre are flagged. Default 2.5.
- flag_method
One of "mad" (default) or "sd". The "sd" rule (mean / SD) is retained for backwards compatibility and emits a one-time per-session deprecation message.
- flag_threshold_sd
Deprecated alias for flag_threshold.
- mask
Optional logical vector of length nrow(signal_matrix) (column-major) restricting the LOO correlation to a region. Build with make_face_mask() (parametric oval and sub-regions) or read_face_mask() (PNG/JPEG mask).
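The mask must line up with the rows of signal_matrix, which index pixels in column-major order. To illustrate that layout, the base-R sketch below builds an oval mask with a hypothetical helper, oval_mask(). It is not make_face_mask(), whose parameters are not documented here, and the radii are arbitrary assumptions.

```r
# Hypothetical helper (NOT make_face_mask()): logical oval mask for an
# img_size x img_size image, flattened column-major to match signal_matrix rows.
oval_mask <- function(img_size, rx = 0.35, ry = 0.45) {
  # pixel centres mapped to [-0.5, 0.5]
  coords <- (seq_len(img_size) - 0.5) / img_size - 0.5
  # outer(): rows index the vertical coordinate, columns the horizontal one
  inside <- outer(coords, coords, function(y, x) (x / rx)^2 + (y / ry)^2 <= 1)
  as.vector(inside)  # as.vector() on a matrix flattens column by column
}

mask <- oval_mask(32L)
length(mask)  # 1024, i.e. nrow(signal_matrix) for a 32 x 32 image

# Toy signal matrix: 32 x 32 pixels, 5 producers
signal_matrix <- matrix(rnorm(32L * 32L * 5L), ncol = 5L)
in_region <- signal_matrix[mask, , drop = FALSE]  # only pixels inside the oval
dim(in_region)
```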
Details
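For each producer i, compute the Pearson correlation between
the full-sample group CI and the group CI recomputed without
that producer (over the pixels in mask, if one is given):
r_loo[i] = cor(rowMeans(signal_matrix), rowMeans(signal_matrix[, -i]))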
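A minimal base-R sketch of this per-producer computation follows, assuming the group CI is the row mean of signal_matrix (as the Details text implies). loo_correlations() is a hypothetical helper for illustration, not the package's implementation. It uses the identity that the leave-one-out mean equals (N * full mean - x_i) / (N - 1), which avoids recomputing rowMeans() N times.

```r
# Hypothetical helper, for illustration only (not the package's code).
loo_correlations <- function(signal_matrix, mask = NULL) {
  if (!is.null(mask)) {
    # keep only the pixels inside the region of interest
    signal_matrix <- signal_matrix[mask, , drop = FALSE]
  }
  n <- ncol(signal_matrix)
  full_ci <- rowMeans(signal_matrix)  # full-sample group CI
  r <- vapply(seq_len(n), function(i) {
    # group CI without producer i: (N * full mean - x_i) / (N - 1),
    # equal to rowMeans(signal_matrix[, -i]) but cheaper
    loo_ci <- (n * full_ci - signal_matrix[, i]) / (n - 1)
    cor(full_ci, loo_ci)  # Pearson correlation (cor()'s default method)
  }, numeric(1))
  names(r) <- colnames(signal_matrix)
  r
}

set.seed(1)
toy <- matrix(rnorm(64 * 10), nrow = 64, ncol = 10)
round(loo_correlations(toy), 3)
```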
Because the full-sample mean and the leave-one-out mean share
(N - 1) / N of their data, r_loo values are near 1 by
construction even on noisy data, typically [0.95, 0.999] at
N = 30. The absolute level of r_loo is not informative.
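The near-1 level follows from the overlap alone. For independent, identically distributed noise with many pixels, r_loo is approximately sqrt((N - 1) / N), about 0.983 at N = 30. The simulation below checks this in base R; the 32 x 32 image size and the seed are arbitrary choices.

```r
# Pure noise: no producer carries any shared signal.
set.seed(42)
n_pix <- 32L * 32L
n_prod <- 30L
noise <- matrix(rnorm(n_pix * n_prod), n_pix, n_prod)

full_ci <- rowMeans(noise)
r_loo <- vapply(seq_len(n_prod), function(i) {
  cor(full_ci, rowMeans(noise[, -i]))  # full-sample vs leave-one-out CI
}, numeric(1))

# Approximate value for i.i.d. noise
expected <- sqrt((n_prod - 1) / n_prod)
summary(r_loo)
expected  # about 0.983
```

Every producer lands close to the same value even though the data are pure noise, which is why only the relative ordering of r_loo carries information.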
What is informative is the relative ordering: producers whose
r_loo sits clearly below the pack are candidates for
inspection. The function therefore returns a z-scored version of
r_loo in $z_scores using the same centre / spread estimators
as the flagging rule. $z_scores is the recommended quantity to
plot or report.
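Because centre and spread come from the same estimators as the flagging rule, the z-score is (r_loo - centre) / spread. The sketch below uses a hypothetical helper, loo_z(), and made-up r_loo values; it mirrors the description above but is not the package's code.

```r
# Hypothetical helper: standardise r_loo with the centre/spread of the chosen rule.
loo_z <- function(r, method = c("mad", "sd")) {
  method <- match.arg(method)
  centre <- if (method == "mad") median(r) else mean(r)
  # stats::mad() applies the 1.4826 consistency constant by default
  spread <- if (method == "mad") mad(r) else sd(r)
  (r - centre) / spread
}

# Made-up r_loo values: four typical producers and one atypical one (e)
r <- c(a = 0.991, b = 0.990, c = 0.992, d = 0.989, e = 0.960)
round(loo_z(r), 2)  # e sits far below the rest
```

Since both quantities use the same centre and spread, a producer is flagged exactly when its z-score falls below -flag_threshold.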
Two flagging rules:
- "mad" (default): flag producers with r_loo < median(r) - flag_threshold * mad(r). Robust to the few atypical producers RC datasets often contain.
- "sd": flag producers with r_loo < mean(r) - flag_threshold * sd(r). Sensitive to the very outliers it is trying to detect; emits a one-time per-session deprecation message and is scheduled for removal in v0.2.0.
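The practical difference between the two rules shows up when one producer is extreme. In the made-up r_loo values below, p10 inflates sd(r) so much that the moderately atypical p09 escapes the "sd" rule, while the "mad" rule flags both. flag_loo() is a hypothetical helper written for this illustration.

```r
# Hypothetical helper mirroring the two documented rules.
flag_loo <- function(r, flag_threshold = 2.5, method = c("mad", "sd")) {
  method <- match.arg(method)
  centre <- if (method == "mad") median(r) else mean(r)
  spread <- if (method == "mad") mad(r) else sd(r)
  cutoff <- centre - flag_threshold * spread  # raw cutoff on r_loo
  names(r)[r < cutoff]
}

# Made-up r_loo values: eight typical producers, one moderate, one extreme
r <- c(p01 = 0.990, p02 = 0.991, p03 = 0.989, p04 = 0.992, p05 = 0.988,
       p06 = 0.990, p07 = 0.991, p08 = 0.989, p09 = 0.980, p10 = 0.900)

flag_loo(r, method = "mad")  # "p09" "p10"
flag_loo(r, method = "sd")   # "p10" only: the extreme value inflates sd(r)
```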
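Default flag_threshold = 2.5 is chosen so that, if r_loo were
roughly normal across producers, a 30-producer dataset would flag
about 0.2 producers by chance under "sd" (30 * pnorm(-2.5) ≈ 0.19,
one-sided because only the lower tail is flagged), rather than the
~0.7 a 2-SD rule would produce (30 * pnorm(-2) ≈ 0.68). Under "mad"
the chance rate is roughly comparable, because R's mad() applies
the 1.4826 consistency factor that makes it estimate the SD under
normality.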
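Expected chance-flag counts under a normal approximation can be checked with pnorm(). Because the rule flags only the lower tail, the one-sided count is the relevant one; the two-sided count is shown for comparison. This is a rough approximation, since the centre and spread are estimated from the same sample.

```r
n_prod <- 30
k <- c(2, 2.5)                       # flag_threshold values to compare
one_sided <- n_prod * pnorm(-k)      # lower tail only, as the rule flags
two_sided <- n_prod * 2 * pnorm(-k)  # both tails, for comparison
round(rbind(one_sided, two_sided), 2)
```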
What this function is, and is not
rel_loo() is an influence-screening diagnostic. It answers
"which producers disproportionately shape the group CI?", not
"how reliable is the group CI?". For reliability use
rel_split_half() or rel_icc(). A flag does not mean the
producer is "bad"; it means the producer's individual CI sits
far enough from the group pattern that the data deserve a second
look.
Reading the result
- $z_scores: named numeric vector, per-producer standardised influence. The recommended quantity to plot or threshold.
- $correlations: named numeric vector, raw per-producer r_loo values.
- $mean_r, $sd_r, $median_r, $mad_r: centre / spread under each rule.
- $threshold: raw cutoff value on r_loo.
- $flagged: character vector of producer ids below threshold.
- $summary_df: one row per producer with correlation, z_score, and flag, sorted by z_score.
- $flag_method, $flag_threshold: the flagging settings used.
Common mistakes
- Reading r_loo as a reliability. An r_loo of .98 does not mean the CI is 98% reliable; it means the leave-one-out group CI correlates .98 with the full-sample group CI, a level that is high by construction (see Details).
- Treating $flagged as "drop these producers". Investigate first; cross-check with the rcisignal input-side run_diagnostics() to rule out response-coding errors.
- Lowering flag_threshold below 2 to flag more producers; this mostly adds chance flags rather than genuinely influential producers. Use flag_method = "mad" instead if the SD rule is dominated by outliers.
Reliability metrics expect raw masks
rel_loo() operates on the raw mask; results may be distorted if
signal_matrix was extracted from rendered (scaled) PNGs.
See also
rel_loo_z() for a tidy z-score accessor;
rel_split_half(), rel_icc() for reliability metrics
proper; run_reliability().
Examples
if (FALSE) { # \dontrun{
# In a real pipeline, signal_matrix comes from earlier steps:
# signal_matrix <- ci_from_responses_briefrc(...)$signal_matrix
# For a self-contained demo we fabricate a small synthetic input:
n_pix <- 32L * 32L
n_prod <- 20L
set.seed(1)
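signal_matrix <- matrix(rnorm(n_pix * n_prod), n_pix, n_prod)
colnames(signal_matrix) <- sprintf("p%02d", seq_len(n_prod))  # producer ids
r <- rel_loo(signal_matrix)
print(r)
head(r$summary_df)  # per-producer correlation, z_score and flag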
} # }