Single-Stage Causal Incentive Design via Optimal Interventions
Abstract
1. Introduction
1.1. Approach and Contributions
- CID for single-stage PAPs. We establish a generic CGM for the single-stage canonical PAP in which incentives are formalized as functional interventions, yielding a principled causal estimand for policy value identified from observational data via the g-formula.
- Semi-parametric estimation strategy. We develop a practical, modular estimator of the policy value under additive Gaussian noise: (a) learn the principal’s action mechanism and the outcome regression; (b) compute the inner Gaussian expectation by Gauss–Hermite quadrature; and (c) approximate the induced mediator law with policy-local kernel weights. A linear credit-market pricing example provides a closed form that clarifies the roles of the policy parameters and the induced borrowing response.
- Functional Bayesian optimization for incentives. We propose the Stackelberg FCBO algorithm, which pairs a functional GP surrogate over the space of incentive functions with a functional GP-UCB acquisition functional to optimize sequentially (offline) over incentive functions when policy evaluations are expensive and noisy.
- Theoretical guarantees. We prove high-probability cumulative-regret bounds that scale with the information gain and the exploration schedule. We extend the analysis from finite admissible sets to infinite RKHS domains via covering arguments and uniform approximation, thereby quantifying the data efficiency and reliability of the offline design that precedes the one-shot deployment.
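Step (b) of the estimation strategy can be illustrated with a minimal Python sketch of Gauss–Hermite quadrature for a Gaussian expectation. The function name `gaussian_expectation`, its parameters, and the test integrands are assumptions made for illustration; they are not the paper's implementation, which applies the rule to a learned outcome regression.

```python
import numpy as np


def gaussian_expectation(g, mean, std, n_nodes=20):
    """Approximate E[g(X)] for X ~ N(mean, std**2) by Gauss-Hermite quadrature."""
    # hermgauss returns nodes and weights for the weight function exp(-t**2).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # The change of variables x = mean + sqrt(2) * std * t turns that weight
    # into the Gaussian density, up to the constant 1 / sqrt(pi).
    x = mean + np.sqrt(2.0) * std * nodes
    return float(np.sum(weights * g(x)) / np.sqrt(np.pi))


# Hypothetical integrand: a quadratic, for which E[X**2] = mean**2 + std**2.
second_moment = gaussian_expectation(lambda x: x**2, mean=1.0, std=2.0)
print(second_moment)
```

An n-node rule integrates polynomials up to degree 2n - 1 exactly against the Gaussian weight, so a modest number of nodes is often enough when the integrand is smooth.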


1.2. Related Work
1.3. Outline
2. Preliminary Background
2.1. Incentive Design as Inverse Stackelberg Games
2.2. The Single-Stage Principal–Agent Problem
2.3. Causal Graphical Models and Causal Inference
- A set of exogenous (unobserved) variables, which are determined by factors external to the model;
- A set of endogenous (observed) variables;
- A set of functions known as structural equations, one for each endogenous variable, each of which assigns the variable its value as a function of its parents and its exogenous disturbance;
- A set of probability distributions, one for each exogenous variable, where each random disturbance is distributed according to its own distribution, independently of all other disturbances.
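The four components above can be made concrete with a toy example. The following Python sketch samples from a three-variable model and then replaces one structural equation with a user-supplied function, which is one way to picture a functional intervention. The variables `x`, `a`, `y`, the linear equations, the Gaussian disturbances, and the name `sample_scm` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_scm(n, mechanism_a=None, seed=0):
    """Draw n samples from a toy SCM; optionally override the equation for a."""
    rng = np.random.default_rng(seed)
    # Exogenous disturbances: independent standard normals (an assumption).
    u_x, u_a, u_y = rng.normal(size=(3, n))
    # Structural equations for the endogenous variables.
    x = u_x
    if mechanism_a is None:
        a = 0.5 * x + u_a          # observational mechanism for a
    else:
        a = mechanism_a(x)         # functional intervention: a is set by a new rule
    y = x + 2.0 * a + u_y
    return x, a, y


# Observational regime versus the hypothetical policy a = 1 + x.
_, _, y_obs = sample_scm(200_000)
_, _, y_do = sample_scm(200_000, mechanism_a=lambda x: 1.0 + x)
print(y_obs.mean(), y_do.mean())
```

In this toy model the interventional mean of `y` is 2 in expectation, because `y = 3x + 2 + u_y` once the new rule for `a` is substituted; the observational mean is 0.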
3. Causal Incentive Design for Single-Stage Canonical PAPs
3.1. A Causal Graphical Model for the Single-Stage Canonical Principal–Agent Problem
Assumptions Underlying the CGMs for Single-Stage Canonical PAPs
3.2. A Non-Parametric Identification for the Causal Inference Target
Identification via the g-Formula
3.3. Estimations on the Semi-Parametric Identification Formula
3.3.1. The Estimation of a Linear Parametrized Instance
3.3.2. The Estimation in the General Gaussian Additive-Noise Case
3.3.3. Computing the Inner Gaussian Expectation
3.3.4. Policy-Local Empirical Measure Construction in the Outer Integral
4. A Functional Bayesian Optimization Algorithm for Single-Stage Canonical PAPs
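| Algorithm 1 | Stackelberg FCBO for single-stage canonical PAPs |
| Input | The specification; the horizon T; the functional GP kernel K on the space of incentive functions; the exploration schedule; and the CGM for a single-stage canonical PAP from the principal’s perspective. |
| Output | The selected incentive function. |
| 1 | Initialize the evaluation dataset; |
| 2 | Initialize the functional GP; |
| 3 | for t = 1, …, T do |
| 4 | Select the next incentive function by maximizing the UCB acquisition functional (see Section 4.2); |
| 5 | Estimate its policy value (see Section 3.3); |
| 6 | Add the new evaluation to the dataset; |
| 7 | Update the functional GP posterior (see Equation (55)); |
| 8 | end |
| 9 | return the selected incentive function |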
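The select, estimate, update loop of Algorithm 1 can be sketched in Python under strong simplifications. This is not the paper's functional GP: it swaps in a standard GP-UCB over a one-parameter family of incentive functions indexed by a scalar, with an RBF kernel on that scalar. The names `rbf`, `gp_posterior`, `gp_ucb`, `noisy_value`, the kernel length scale, the constant exploration weight `beta`, and the quadratic stand-in for the policy-value estimator are all assumptions.

```python
import numpy as np


def rbf(a, b, length=0.3):
    """Squared-exponential kernel with unit prior variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=0.1):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_oo = rbf(x_obs, x_obs) + noise**2 * np.eye(len(x_obs))
    k_qo = rbf(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    var = 1.0 - np.sum(k_qo * np.linalg.solve(k_oo, k_qo.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def gp_ucb(evaluate, candidates, horizon=25, beta=4.0):
    """Run GP-UCB over a finite candidate set and return the recommended point."""
    xs, ys = [], []
    for _ in range(horizon):
        if xs:
            mean, std = gp_posterior(np.array(xs), np.array(ys), candidates)
        else:  # no data yet: fall back to the prior
            mean, std = np.zeros(len(candidates)), np.ones(len(candidates))
        idx = int(np.argmax(mean + np.sqrt(beta) * std))  # UCB selection
        xs.append(float(candidates[idx]))
        ys.append(float(evaluate(candidates[idx])))       # noisy policy value
    x_arr = np.array(xs)
    mean, _ = gp_posterior(x_arr, np.array(ys), x_arr)
    # Recommend the evaluated candidate with the highest posterior mean.
    return float(x_arr[int(np.argmax(mean))])


rng = np.random.default_rng(0)


def noisy_value(theta):
    """Hypothetical noisy policy-value estimate with its maximum at theta = 0.6."""
    return 1.0 - 4.0 * (theta - 0.6) ** 2 + 0.05 * rng.normal()


best_theta = gp_ucb(noisy_value, np.linspace(0.0, 1.0, 51))
print(best_theta)
```

Recommending the evaluated point with the highest posterior mean, rather than the highest single observation, is one common choice when evaluations are noisy; the output rule used in the paper may differ.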
4.1. Functional Gaussian Process Surrogate Model
4.1.1. Functional Gaussian Process Kernel
4.1.2. Functional Gaussian Process Posterior Distribution
4.2. Upper Confidence Bound Acquisition Functional
4.3. The Stackelberg FCBO Algorithm for Single-Stage Canonical PAPs
4.4. Information-Theoretic Regret Bounds on the Stackelberg FCBO Algorithm
4.4.1. Differential Information Gain
4.4.2. Regret Bound for a Finite Incentive Function Space
4.4.3. Regret Bound on an Infinite Incentive Function Space
4.4.4. Practical Implications of the Regret Bounds
4.5. On Extending CID to Multi-Follower Single-Stage Principal–Agent Problems
4.5.1. CGMs with Individualized Incentives and Independent Follower Utilities
4.5.2. CGMs with Universal Incentive and Joint Follower Utilities
5. Discussion
5.1. Offline Functional BO and Causal Decision Guarantees
5.2. Assumptions
5.3. Scope and Limitations
5.4. Time–Space Complexity over a Full FBO Run
5.5. Toward Multi-Follower Settings
6. Conclusions
Future Work
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| BO | Bayesian Optimization |
| CBO | Causal Bayesian Optimization |
| CID | Causal Incentive Design |
| CGM | Causal Graphical Model |
| FCBO | Functional Causal Bayesian Optimization |
| FBO | Functional Bayesian Optimization |
| GP | Gaussian Process |
| MAS | Multi-Agent System |
| PAP | Principal–Agent Problem |
| RKHS | Reproducing Kernel Hilbert Space |
| SCM | Structural Causal Model |
Appendix A. Regret Bound Proofs in a Finite Function Space
Appendix B. Reproducing Kernel Hilbert Spaces
A Finite Reproducing Kernel Hilbert Space
Appendix C. Bayesian Optimization
Appendix C.1. Gaussian Process Surrogate Model
Appendix C.2. Acquisition Functions
Appendix D. Functional Approximation via the Stone–Weierstrass Theorem
Appendix D.1. Approximation of the Objective Functional
Appendix D.2. Selection Methods of a Finite Set of Basis Incentive Functions
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Bejos, S.; Morales, E.F.; Sucar, L.E.; Munoz de Cote, E. Single-Stage Causal Incentive Design via Optimal Interventions. Entropy 2026, 28, 4. https://doi.org/10.3390/e28010004

