The emerging standard of "decision reasonableness" in AI oversight

Abstract

Regulators and courts are shifting from outcome-based evaluation of AI deployment success to process-based evaluation of deployment decision quality. This shift introduces a new standard: demonstrating not only that an AI system performed adequately, but that the process by which the decision to deploy was made met standards of institutional reasonableness. This paper examines what decision reasonableness requires of deploying organisations and how it differs from traditional outcome evaluation.

Outcome-Based vs Process-Based Evaluation

Historically, deployment success has been evaluated based on outcomes. Did the system perform as intended? Did it improve efficiency? Did it reduce costs? Did it achieve the business objectives that motivated deployment? If the answers were affirmative, the deployment was considered successful regardless of the decision process.

Process-based evaluation asks a different question: was the decision to deploy this system made through a reasonable process, regardless of whether the deployment subsequently succeeded or failed? A system could perform well but have been deployed through a deficient process (e.g., insufficient risk assessment, inadequate governance review, unexplored material uncertainties). A system could perform poorly but have been deployed through a sound process that identified risks appropriately and took reasonable steps to mitigate them.

Under outcome-based evaluation, the latter scenario (poor outcome, sound process) would be considered deployment failure. Under process-based evaluation, it could be considered acceptable deployment decision-making that unfortunately encountered unforeseen circumstances. The shift toward process-based evaluation changes what regulators and courts examine when they investigate deployment decisions.

Elements of Institutional Reasonableness

Decision reasonableness requires several elements, applied with sufficient rigour that the decision-making process would withstand external scrutiny:

First, adequate information gathering. Did decision-makers have access to information necessary to understand the system's capabilities and limitations? Was this information obtained from sources likely to have relevant expertise? Were material uncertainties acknowledged rather than resolved through assumption?

Second, deliberative process. Did decision-makers engage in actual deliberation about whether deployment was defensible? Or was the process pro forma, with predetermined conclusions? Did decision-makers raise and address questions about risks? Did they consider alternatives to deployment?

Third, documented reasoning. Can the deploying organisation explain the reasoning that led to the deployment decision? What did decision-makers understand about risks? What risks did they identify as material? What mitigations did they consider? Why did they conclude deployment should proceed despite identified risks?

Fourth, appropriate authority. Did decision-making authority rest with people or bodies having actual power to halt deployment based on assessment findings? Or was authority diffused, with technical teams building and evaluating systems, business teams defining deployment requirements, and governance teams providing post-hoc approval?

Fifth, proportional governance. Did the level of governance engagement match the stakes of the deployment decision? Routine, low-stakes decisions may warrant lighter review; high-stakes decisions affecting many stakeholders should involve appropriately senior governance bodies.

The Role of Documentation

Demonstration of decision reasonableness depends heavily on documentation. If a deploying organisation can produce documentation showing that decision-makers deliberated about risks, identified material concerns, and determined that risks were manageable or acceptable, that documentation supports a reasonableness claim.

If documentation is absent — if the organisation cannot point to evidence of pre-deployment deliberation about risks — the reasonableness claim is weakened. This creates strong incentives for organisations to document pre-deployment assessment and decision-making even when not explicitly required by regulation.

However, documentation alone is insufficient. If decision-makers documented pre-deployment concerns but failed to act on those concerns — if they identified risks but proceeded without mitigation — documentation actually strengthens a regulator's case that decision-making was unreasonable. The deploying organisation knowingly proceeded despite identified risks without adequate justification.

Distinction from Outcome Success

A deployment can be reasonable in process but unsuccessful in outcome. An organisation might conduct appropriate pre-deployment assessment, identify certain risks, implement reasonable mitigations, and proceed with deployment. Subsequently, an unforeseen event might occur that mitigations do not address, resulting in a poor outcome. This would not constitute unreasonable decision-making.

Conversely, a deployment can succeed in outcome but be unreasonable in process. An organisation might bypass pre-deployment assessment, deploy a system based on intuition rather than evidence, and the system might nonetheless perform adequately. Process-based evaluation would still find the deployment decision unreasonable.

This distinction is important for organisational risk management. It means that even deployments that perform well may be subject to regulatory challenge if the decision process is found to have been deficient. An organisation cannot rely on good outcomes to justify inadequate process.

Application in Regulatory and Legal Settings

Regulators investigating AI deployments increasingly focus on decision process rather than only on outcomes. If a lending model produced adverse outcomes, regulators previously might ask whether the model was biased. Regulators now additionally ask whether the deploying organisation conducted adequate pre-deployment assessment of bias risk, documented that assessment, and made a deliberate decision to deploy despite identified risks.

Courts evaluating liability from AI deployments are adopting similar process-focused analysis. In negligence cases, courts ask whether the deploying organisation conducted appropriate pre-deployment assessment of foreseeable risks. In discrimination cases, courts ask whether the organisation took steps to assess and mitigate discriminatory impact prior to deployment. In some cases, courts have held organisations liable for deploying systems without adequate pre-deployment assessment even when the systems performed adequately in outcome.

This shift affects litigation posture and settlement discussions. Previously, organisations might defend against adverse outcome claims by presenting evidence of system performance. Now, organisations must additionally defend the quality of pre-deployment decision-making. An organisation defending against a discrimination claim based on adverse outcomes might find that evidence of strong model performance is insufficient if the organisation cannot demonstrate adequate pre-deployment discrimination risk assessment.

Institutional Implications

The reasonableness standard creates institutional requirements that many organisations lack. Demonstrating reasonable decision-making requires governance structures capable of deliberating about deployment decisions, people with authority to halt deployment based on assessment findings, and documentation practices that capture pre-deployment reasoning.

Organisations with technical decision-making but limited governance engagement will struggle to demonstrate reasonableness. If an AI deployment proceeds based primarily on data science team judgment with pro forma governance approval, the decision process is likely to appear insufficiently deliberative to meet reasonableness standards.

Organisations implementing the reasonableness standard invest in governance structures that can engage substantively in pre-deployment assessment, technical capabilities to conduct assessment and document findings, and disciplined decision-making processes that treat assessment findings as material to deployment decisions rather than items to be evaluated after deployment.

The standard also creates pressure toward transparency. If decision-making must be defensible through documentation, organisations have incentives to conduct genuine deliberation and document honest assessment of risks. Hidden risk identification or post-hoc documented justification for predetermined deployment decisions become riskier strategies, as they may be discovered during regulatory or litigation discovery.

Interaction with Technical Performance Standards

The reasonableness standard does not replace technical performance standards. It adds a process-based overlay to outcome-based requirements. An AI system must still meet technical performance standards and anti-discrimination requirements applicable to the sector. The reasonableness standard asks additionally that the decision to deploy was made through appropriate process.

This means deploying organisations must attend to both dimensions simultaneously. They must ensure the system performs adequately for its intended purpose (outcome dimension). They must also ensure the decision to deploy was made through reasonable process with appropriate pre-deployment assessment and governance engagement (process dimension).

In practice, these dimensions reinforce each other. A robust pre-deployment assessment process is likely to identify performance issues that might otherwise surface post-deployment. A robust decision process involving substantive governance engagement is likely to improve deployment outcomes by surfacing risks and mitigations early. The process-based standard, while creating additional requirements, potentially improves deployment outcomes.