| Title: | Proxy Indicator Diagnostic Tool for Analytical and Policy Use |
|---|---|
| Description: | Provides statistical diagnostics to evaluate whether proxy indicators reliably represent an unobservable target construct. The main function 'senser()' assesses proxies across multiple dimensions including monotonicity, information content, stability, distributional alignment, and potential bias risk. It prints a concise, interpretable summary suitable for analytical and policy-oriented assessment, without claiming causal inference. |
| Authors: | Joko Ade Nursiyono [aut, cre] (ORCID: <https://orcid.org/0009-0008-1179-6776>) |
| Maintainer: | Joko Ade Nursiyono <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-27 09:25:12 UTC |
| Source: | https://github.com/jokoadenur/senser |
senser() is a statistical diagnostic function designed to evaluate
whether one or more proxy indicators are suitable representations of an
underlying construct that cannot be directly observed or measured.
The function assesses each proxy based on multiple statistical dimensions: monotonicity, information content, stability, distributional alignment, bias risk, and dynamic range (sensitivity). The output is a concise, interpretable summary printed directly to the console.
This tool is intended for analytical diagnostics and policy-oriented indicator assessment. It does not claim causal inference.

    senser(
      data,
      proxy,
      target,
      lang = c("english", "indonesia"),
      stagnation_cut = 0.01,
      cv_cut = 0.02,
      ceiling_cut = 0.95
    )


| Argument | Description |
|---|---|
| data | A data frame containing the proxy and target variables. |
| proxy | A character vector specifying one or more proxy indicator variable names contained in `data`. |
| target | A character string specifying the target construct variable name contained in `data`. |
| lang | A character string specifying the language of interpretation. Options are `"english"` and `"indonesia"`. |
| stagnation_cut | Numeric scalar. Threshold used to detect stagnation (very small average absolute change). If the average absolute change of the proxy variable is below this value, a structural penalty is applied. Default is 0.01. |
| cv_cut | Numeric scalar. Threshold for the coefficient of variation (CV). If the coefficient of variation of the proxy variable is below this value, a variability penalty is applied. Default is 0.02. |
| ceiling_cut | Numeric scalar. Threshold used to detect ceiling effects. If the ceiling ratio exceeds this value, a ceiling penalty is applied. Default is 0.95. |

The stagnation threshold is used to penalize proxy variables that exhibit minimal temporal or cross-sectional variation. Extremely small changes may indicate lack of sensitivity or measurement rigidity.
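The stagnation check can be illustrated with a minimal sketch. The helper `mean_abs_change()` is hypothetical and not part of the package; it assumes the rows are already ordered (for example by time) and that the change is measured on the raw scale of the proxy, which the package may handle differently.

```r
# Hypothetical helper (not from senser): average absolute change between
# consecutive observations, assuming the values are already ordered.
mean_abs_change <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 2) return(NA_real_)
  mean(abs(diff(x)))
}

stagnation_cut <- 0.01  # default value from the argument list

flat_proxy   <- c(5.000, 5.001, 5.002, 5.001, 5.003)
moving_proxy <- c(5.0, 5.4, 4.9, 5.8, 6.1)

mean_abs_change(flat_proxy)   < stagnation_cut  # TRUE: would be treated as stagnant
mean_abs_change(moving_proxy) < stagnation_cut  # FALSE: enough movement
```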
The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean. Very low CV values indicate insufficient dispersion, which may reduce the informational usefulness of the proxy variable.
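As a small illustration of the definition above, the CV can be computed directly in base R. The helper name `coef_variation()` is hypothetical, and taking the absolute value of the mean is an assumption made here so that the ratio stays non-negative; the ratio is not meaningful when the mean is close to zero.

```r
# Hypothetical helper (not from senser): standard deviation divided by the mean.
# abs() is an assumption of this sketch, used to keep the ratio non-negative.
coef_variation <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / abs(mean(x))
}

cv_cut <- 0.02  # default value from the argument list

tight_proxy <- c(100, 101, 99, 100, 100)
wide_proxy  <- c(80, 120, 95, 110, 100)

coef_variation(tight_proxy) < cv_cut  # TRUE: too little dispersion
coef_variation(wide_proxy)  < cv_cut  # FALSE: dispersion is sufficient
```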
A ceiling effect occurs when observations cluster near the upper bound of the distribution, limiting discriminatory power. If the ceiling ratio exceeds this threshold, the proxy's score is penalized to reflect reduced measurement sensitivity.
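The documentation does not state how the ceiling ratio is computed. The sketch below uses one plausible definition purely for illustration: the share of observations lying within a small tolerance of the observed maximum. Both the helper `ceiling_share()` and its `tol` parameter are assumptions and may not match the package's internal calculation.

```r
# Hypothetical definition (not from senser): share of observations that lie
# within `tol` (relative) of the observed maximum.
ceiling_share <- function(x, tol = 0.01) {
  x <- x[!is.na(x)]
  upper <- max(x)
  mean(x >= upper - tol * abs(upper))
}

ceiling_cut <- 0.95  # default value from the argument list

saturated_proxy <- c(rep(100, 97), 60, 75, 90)  # 97 of 100 values at the cap
spread_proxy    <- seq(1, 100)                  # values spread over the range

ceiling_share(saturated_proxy) > ceiling_cut  # TRUE: clustered at the upper bound
ceiling_share(spread_proxy)    > ceiling_cut  # FALSE: no clustering
```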
The diagnostic score for each proxy is computed using six normalized components:
- Monotonicity: Spearman rank correlation between proxy and target.
- Information content: Proportion of variance explained (R-squared).
- Stability: Sensitivity of regression coefficients across subsamples.
- Distributional alignment: Similarity of standardized distributions using the Kolmogorov–Smirnov statistic.
- Bias risk: Penalization for strong nonlinearity indicating potential proxy distortion.
- Dynamic range / Sensitivity: Penalization if the proxy changes very little relative to the target (detects ceiling effects or nearly flat proxies).
The final score is calculated as the median of all six components, providing a robust measure less sensitive to extreme values.
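To make the aggregation concrete, here is a minimal sketch that computes three of the components with base R and takes their median. The normalizations (absolute Spearman correlation, and one minus the Kolmogorov–Smirnov statistic) are assumptions of this sketch, and `component_sketch()` is a hypothetical helper. The stability, bias risk and dynamic range components are left out because their formulas are not given here.

```r
# Hypothetical helper (not from senser): three components mapped to [0, 1].
component_sketch <- function(proxy, target) {
  ok <- complete.cases(proxy, target)
  proxy  <- proxy[ok]
  target <- target[ok]

  # Monotonicity: absolute Spearman rank correlation (abs() is an assumption).
  monotonicity <- abs(cor(proxy, target, method = "spearman"))

  # Information content: R-squared of a simple linear regression.
  information <- summary(lm(target ~ proxy))$r.squared

  # Distributional alignment: 1 - KS statistic on standardized values
  # (the "1 -" normalization is an assumption).
  ks <- suppressWarnings(
    ks.test(as.numeric(scale(proxy)), as.numeric(scale(target)))
  )
  alignment <- 1 - unname(ks$statistic)

  c(monotonicity = monotonicity, information = information, alignment = alignment)
}

set.seed(123)
target <- rnorm(100, 10, 2)
proxy  <- 5 * target + rnorm(100, 0, 4)  # simulated proxy tied to the target

parts <- component_sketch(proxy, target)
parts
median(parts)  # median of the available components
```

Because the median ignores how far the extreme components lie from the centre, a single very low or very high component shifts the score less than it would under a mean.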
The score ranges from 0 to 1 and is classified into three categories:
- Suitable proxy: score >= 0.70
- Conditionally suitable: 0.40 <= score < 0.70
- Not suitable proxy: score < 0.40
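The thresholds above can be expressed as a small lookup. `classify_score()` is a hypothetical helper written for illustration, not the package's internal code; it only encodes the three cut-offs listed.

```r
# Hypothetical helper (not from senser): map a score in [0, 1] to its category.
classify_score <- function(score) {
  stopifnot(is.numeric(score), all(score >= 0 & score <= 1))
  ifelse(score >= 0.70, "Suitable proxy",
         ifelse(score >= 0.40, "Conditionally suitable", "Not suitable proxy"))
}

classify_score(c(0.85, 0.70, 0.55, 0.39))
# "Suitable proxy" "Suitable proxy" "Conditionally suitable" "Not suitable proxy"
```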
Interpretation is automatically generated in the selected language.
The function prints a structured diagnostic summary to the console. No object is returned invisibly. This design prioritizes interpretability and ease of use for applied users and policymakers.
Joko Nursiyono (concept)
senseR development team

    ## Example with multiple proxies
    set.seed(123)
    df <- data.frame(
      gdp = rnorm(100, 10, 2),
      ntl = rnorm(100, 50, 10),
      road_density = rnorm(100, 3, 0.5),
      mobile_signal = rnorm(100, 70, 15)
    )

    senser(
      data = df,
      proxy = c("ntl", "road_density", "mobile_signal"),
      target = "gdp",
      lang = "english"
    )