Abstract:

Machine learning pipelines commonly flatten relational data into single-table representations, disregarding the structural information in the database schema and query. This disconnect has significant consequences: Shapley value-based feature attributions, among the most widely used post-hoc explanations in machine learning, can differ substantially depending on whether relational structure is respected, and may be computed over infeasible feature combinations when such constraints are disregarded. We propose RelShap, a framework that incorporates relational constraints into Shapley value computation for both background data and coalition evaluation. Beyond altering explanation outcomes, relational constraints induce equivalence classes over feature coalitions, eliminating redundant evaluations. RelShap is estimator-agnostic and can substantially reduce computation time, depending on the coalition estimator and the functional dependency structure. We further provide a theoretical characterization of the expected runtime reduction as a function of both factors. Extensive experiments across multiple datasets, models, and estimators demonstrate that RelShap produces statistically significant changes in explanations and achieves substantial runtime reductions, with strong empirical validation of the theoretical speedup factors.


Citation

Lee, Seungeun, Fonseca, Joao, and Stoyanovich, Julia. 2026. “RelShap: Relationally Consistent Shapley Explanations.” UNDER SUBMISSION.

@article{lee2026relshap,
  author = {Lee, Seungeun and Fonseca, Joao and Stoyanovich, Julia},
  title = {{RelShap}: Relationally Consistent Shapley Explanations},
  journal = {UNDER SUBMISSION},
  year = {2026},
  url = {},
  abstract = {Machine learning pipelines commonly flatten relational data into single-table representations, disregarding the structural information in the database schema and query. This disconnect has significant consequences: Shapley value-based feature attributions, among the most widely used post-hoc explanations in machine learning, can differ substantially depending on whether relational structure is respected, and may be computed over infeasible feature combinations when such constraints are disregarded. We propose RelShap, a framework that incorporates relational constraints into Shapley value computation for both background data and coalition evaluation. Beyond altering explanation outcomes, relational constraints induce equivalence classes over feature coalitions, eliminating redundant evaluations. RelShap is estimator-agnostic and can substantially reduce computation time, depending on the coalition estimator and the functional dependency structure. We further provide a theoretical characterization of the expected runtime reduction as a function of both factors. Extensive experiments across multiple datasets, models, and estimators demonstrate that RelShap produces statistically significant changes in explanations and achieves substantial runtime reductions, with strong empirical validation of the theoretical speedup factors.}
}