Attentiveness and responsibility have to go hand in hand with competent systems. “Care‑giving” means working code that does what it promised—audited, explainable, and safe‑to‑fail. Competent systems build people's trust in the technology.
Quick version
- Ship small, prove often. Shadow modes, canary releases, reversible defaults.
- Reward bridges, not clicks. Use bridging‑based ranking and Reinforcement Learning from Community Feedback (RLCF).
- Audit like we mean it. Reproducible evals, public traces, and incident drills.
Results we want
- Systems that deliver the promised care safely under load and under scrutiny.
- Prosocial incentives: agents gain reward by increasing cross‑group endorsement.
- Failures are contained, reversible, and teach us.
Why competence?
Good intentions without working code erode trust. Competence turns contracts into systems that behave—and can be proven to behave.
A simple picture: A bridge isn’t competent because the blueprint is elegant; it’s competent because it holds—and continues to hold when trucks cross, winds rise, and inspectors check the bolts.
Simple ideas behind this chapter
- Safety is a property of practice. Competence is demonstrated in operation, not assumed from design.
- Proof before promotion. Features graduate only after shadowing → canary → general with guardrails.
- Community‑shaped reward. Train agents with RLCF: optimize for cross‑group endorsement and trust‑under‑loss, not raw engagement.
- Observability over opaqueness. “Show your work” with traces, datasets, and explainable summaries tied to decisions.
- Least power. Use the simplest mechanism that meets the need; complexity grows attack surface.
What good ‘competence’ looks like
- Bridging‑based ranking. Recommenders score content and agent actions by how well they bridge coherent clusters, not by how much outrage they provoke (a scoring sketch follows this list).
- RLCF training loops. Reward models when multiple groups endorse outcomes as fair/useful; penalize when trust‑under‑loss drops.
- Graduated release. New policies run in shadow mode, then canary for a random, representative slice, then general rollout with rollback primed.
- Eval harness. Open test suites for quality, safety, bias, privacy; localized evals contributed by communities (see Ch4).
- Reproducible builds. Configs are versioned; one‑click replays re‑create results.
- Guardrails as code. Rights and red lines expressed as machine‑checkable rules, deny‑by‑default when ambiguous (a second sketch after this list).
- Data minimalism. Collect only what the remedy needs; delete on handoff; consent honored at every stage.
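To make bridging‑based ranking concrete, here is a minimal sketch; the `bridge_score` function, the min‑over‑groups aggregation, and the `min_votes_per_group` threshold are illustrative assumptions, not a prescribed formula. The idea: an item only ranks highly when its least‑supportive group still endorses it.

```python
from collections import defaultdict

def bridge_score(votes, min_votes_per_group=5):
    """Score an item by its approval rate in its least-supportive group.

    votes: iterable of (group_id, approved) pairs, e.g. ("renters", True).
    Returns a score in [0, 1]; only items endorsed across coherent clusters
    score high, one-sided hits score low.
    """
    approvals = defaultdict(list)
    for group, approved in votes:
        approvals[group].append(1.0 if approved else 0.0)

    rates = [
        sum(v) / len(v)
        for v in approvals.values()
        if len(v) >= min_votes_per_group      # ignore groups with too little signal
    ]
    if len(rates) < 2:                        # bridging needs at least two groups
        return 0.0
    return min(rates)                         # the weakest group's approval caps the score

def rank_by_bridging(candidates):
    """Order (item, votes) pairs by bridge score, highest first."""
    return sorted(candidates, key=lambda pair: bridge_score(pair[1]), reverse=True)

# Example: a message renters and homeowners both up-vote outranks a one-sided hit.
item_a = [("renters", True)] * 8 + [("homeowners", True)] * 6 + [("homeowners", False)] * 2
item_b = [("renters", True)] * 12 + [("homeowners", False)] * 6
print(bridge_score(item_a))  # 0.75
print(bridge_score(item_b))  # 0.0
```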
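And for guardrails as code, a minimal deny‑by‑default sketch, assuming rules are declared as data; the `Rule` fields and the `check` helper are hypothetical, not a fixed schema. Anything no rule explicitly permits, and anything ambiguous, is denied.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    requires: tuple        # facts that must be present and truthy
    forbids: tuple = ()    # facts that must be absent or falsy

RULES = [
    Rule("share_location_for_rescue",
         requires=("consent_location", "active_incident"),
         forbids=("consent_revoked",)),
]

def check(action: str, facts: dict) -> tuple:
    """Return (allowed, reason). Deny by default, and deny when ambiguous."""
    rule = next((r for r in RULES if r.name == action), None)
    if rule is None:
        return False, "no rule covers this action"
    missing = [f for f in rule.requires if f not in facts]
    if missing:
        return False, f"ambiguous: missing facts {missing}"   # unknown is not permitted
    if not all(facts[f] for f in rule.requires):
        return False, "required condition not satisfied"
    if any(facts.get(f) for f in rule.forbids):
        return False, "explicitly forbidden condition present"
    return True, f"permitted by rule {rule.name}"

print(check("share_location_for_rescue", {"consent_location": True}))  # denied: ambiguous
```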
From ideas to everyday practice (step by step)
- Derive specs from contracts. Convert Ch2 contracts into acceptance tests.
- Instrument for observability. Emit decision traces with links to sources and receipts (from Ch1).
- Train with RLCF. Collect feedback from diverse cohorts; compute bridge scores; use them as reward (a reward‑shaping sketch follows these steps).
- Run shadow mode. New policy sees inputs and proposes actions but doesn’t act. Compare to human/previous system.
- Canary safely. Release to a small, representative group with automatic rollback if drift exceeds bounds (a rollout sketch follows these steps).
- Audit before general. Independent audit of evals, logs, and guardrails; publish attested report.
- Generalize & monitor. Enable for all; watch drift monitors; keep pause wired.
- Post‑incident learning. Blameless reviews; fixes become tests.
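A minimal sketch of the RLCF reward step above, assuming cohort feedback arrives as per‑group endorsement rates plus a trust‑under‑loss measurement; the names `endorsements` and `trust_under_loss_delta` and the penalty weight are assumptions to tune.

```python
def rlcf_reward(endorsements: dict, trust_under_loss_delta: float,
                penalty_weight: float = 2.0) -> float:
    """Shape a scalar reward from community feedback.

    endorsements: per-cohort endorsement rates in [0, 1],
                  e.g. {"renters": 0.8, "homeowners": 0.7}.
    trust_under_loss_delta: change in reported trust among people whose
                  request was denied (negative = trust dropped).
    """
    if not endorsements:
        return 0.0
    bridge = min(endorsements.values())            # reward the least-served cohort
    penalty = penalty_weight * max(0.0, -trust_under_loss_delta)
    return bridge - penalty

# An outcome every cohort endorses, with stable trust among those who lost out:
print(rlcf_reward({"renters": 0.8, "homeowners": 0.7}, trust_under_loss_delta=0.02))  # 0.7
# A one-sided outcome that also erodes trust among the denied:
print(rlcf_reward({"renters": 0.9, "homeowners": 0.3}, trust_under_loss_delta=-0.1))  # 0.1
```

The scalar reward then feeds whatever policy‑optimization loop the team already runs; the point is that the signal comes from cross‑group endorsement and trust‑under‑loss, not engagement.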
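And the shadow → canary progression can start as plainly as this sketch; `shadow_compare`, `canary_step`, and the drift bounds are assumptions, and a real orchestrator would add stratified assignment and attested logs.

```python
import random

def shadow_compare(new_policy, baseline, cases):
    """Run the new policy alongside the baseline without acting on its output."""
    disagreements = [c for c in cases if new_policy(c) != baseline(c)]
    return len(disagreements) / max(1, len(cases))

def canary_step(new_policy, baseline, case, canary_fraction=0.1, rng=random.random):
    """Route a small random slice to the new policy; everyone else stays on baseline."""
    if rng() < canary_fraction:
        return "canary", new_policy(case)
    return "baseline", baseline(case)

def should_roll_back(canary_metrics, bounds):
    """Roll back as soon as any monitored metric drifts past its bound."""
    return any(canary_metrics[name] > limit for name, limit in bounds.items())

# Example: error rate drifted past its bound, so the rollout reverses automatically.
print(should_roll_back({"error_rate": 0.04, "latency_p95_s": 2.1},
                       bounds={"error_rate": 0.02, "latency_p95_s": 5.0}))  # True
```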
Tools you can adopt now
- Bridge score functions. PCA/embedding‑based overlap metrics.
- RLCF pipeline. Human and community feedback to reward shaping.
- Eval registry. Versioned tests; provenance; localized packs.
- Shadow/canary orchestrator. Staged rollouts with rollback switches.
- Decision trace schema. Inputs, rules fired, sources, uncertainties (sketched after this list).
- Guardrail engine. Policy‑as‑code for rights/consents.
- Drift monitors. Data, performance, fairness (a drift‑check sketch follows this list).
- Repro notebooks. Seeded, containerized builds.
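One plausible shape for the decision trace schema is a small typed record like the sketch below; the field names are assumptions, but every automated decision should carry its inputs, the rules that fired, its sources, its uncertainty, and a receipt link.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class DecisionTrace:
    decision_id: str
    inputs: dict                       # the facts the decision saw
    rules_fired: list                  # guardrail / policy rules that applied
    sources: list                      # links or IDs for the evidence used
    uncertainty: float                 # 0.0 = certain, 1.0 = guessing
    outcome: str                       # e.g. "approved", "denied"
    receipt_url: str = ""              # where the affected person can see this trace
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

trace = DecisionTrace(
    decision_id="claim-20931",
    inputs={"claim_type": "S1", "receipts_attached": False},
    rules_fired=["medical_receipts_required"],
    sources=["policy/v14#receipts"],
    uncertainty=0.2,
    outcome="denied",
    receipt_url="https://example.org/receipts/claim-20931",
)
print(trace.to_json())
```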
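And a drift monitor can begin as a Population Stability Index check like this sketch; the bucket count and the 0.2 alert threshold are conventional defaults, not requirements.

```python
import math

def population_stability_index(expected, observed, buckets=10):
    """Compare two samples of one numeric feature; larger PSI = more drift.

    A common rule of thumb treats PSI < 0.1 as stable, 0.1-0.2 as worth
    watching, and > 0.2 as an alert - tune the threshold per deployment.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) or 1e-12                    # guard against a constant feature

    def shares(sample):
        counts = [0] * buckets
        for x in sample:
            idx = int((x - lo) / width * buckets)
            counts[min(max(idx, 0), buckets - 1)] += 1
        # small smoothing so empty buckets don't blow up the log
        return [(c + 1e-6) / (len(sample) + 1e-6 * buckets) for c in counts]

    e, o = shares(expected), shares(observed)
    return sum((oi - ei) * math.log(oi / ei) for ei, oi in zip(e, o))

baseline = [0.1 * i for i in range(100)]      # what the model was evaluated on
live = [0.1 * i + 3.0 for i in range(100)]    # shifted distribution seen in production
print(population_stability_index(baseline, live) > 0.2)  # True -> raise a drift alarm
```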
Flood‑bot story - Part III: delivering care
- Bridge ranking. When multiple aid channels exist, the bot’s recommender prioritizes actions that increase cross‑neighborhood endorsement (e.g., messages that both renters and homeowners up‑vote as fair).
- RLCF. The payout policy is trained to reward on‑time delivery without spiking appeals in any cluster.
- Shadow → canary. A new “medical receipts waiver” runs in shadow for a week; then canaries to 10% of S1 claims; rollback bound: appeals >15% (checked in the sketch after this story).
- Observability. Every denial has a trace: which rule, which sources, uncertainty score, and a receipt link for the claimant.
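Reading the rollback bound per cluster, consistent with the goal of not spiking appeals in any cluster, a minimal check might look like this sketch; the cluster names and counts are invented for illustration.

```python
def waiver_rollback_needed(appeals_by_cluster, decisions_by_cluster, bound=0.15):
    """Trip the rollback if ANY cluster's appeal rate exceeds the bound."""
    for cluster, decisions in decisions_by_cluster.items():
        if decisions == 0:
            continue
        rate = appeals_by_cluster.get(cluster, 0) / decisions
        if rate > bound:
            return True, cluster, rate
    return False, None, None

print(waiver_rollback_needed(
    appeals_by_cluster={"riverside": 9, "hilltop": 2},
    decisions_by_cluster={"riverside": 50, "hilltop": 60},
))  # (True, 'riverside', 0.18) -> roll back even though the overall rate is ~10%
```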
What could go wrong (and quick fixes)
- Gaming the bridge. Actors craft messages to look “bridging.” Fix: Mix human audits; require durable cross‑group endorsement over time.
- Train/test leakage. Evals look good; reality fails. Fix: Hold‑out datasets, randomized spot checks, live A/Bs with rollback.
- Opaque “black box.” “Trust us” explanations. Fix: Traceable summaries + public examples; auditors can reconstruct decisions.
- Canary bias. Canary slice is unrepresentative. Fix: Stratified sampling; publish canary demographics.
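For the canary‑bias fix, a minimal stratified‑sampling sketch; the `tenure` stratum key, the 10% fraction, and the fixed seed are assumptions. Sampling the same fraction from each stratum keeps the canary slice shaped like the population it will eventually serve.

```python
import random
from collections import defaultdict

def stratified_canary(users, strata_key, fraction=0.1, seed=42):
    """Pick a canary slice that preserves the population's make-up.

    users: list of dicts; strata_key: the field to stratify on (e.g. "tenure").
    """
    rng = random.Random(seed)            # seeded so the slice is reproducible
    by_stratum = defaultdict(list)
    for u in users:
        by_stratum[u[strata_key]].append(u)

    canary = []
    for members in by_stratum.values():
        k = max(1, round(len(members) * fraction))   # never skip a small stratum
        canary.extend(rng.sample(members, k))
    return canary

population = ([{"id": i, "tenure": "renter"} for i in range(700)]
              + [{"id": i, "tenure": "homeowner"} for i in range(700, 1000)])
slice_ = stratified_canary(population, "tenure")
print(len(slice_), sum(u["tenure"] == "renter" for u in slice_))  # 100 70
```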
How we keep ourselves honest (what we measure)
- MTTD/MTTR for harm. Mean time to detect/repair regressions.
- Bridge index. Cross‑group endorsement compared to baseline.
- Rollback discipline. % of rollouts with tested rollback; time to rollback.
- Drift alarms. Frequency and time‑to‑triage.
- Trust‑under‑loss delta. Before/after for those who disagreed.
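These measures are cheap to compute once incidents and post‑decision surveys are logged as structured events; a minimal sketch for MTTD/MTTR and the trust‑under‑loss delta follows, with the event fields as assumptions.

```python
from statistics import mean

def mttd_mttr(incidents):
    """Mean time to detect / repair, from incident records with epoch timestamps."""
    detect = [i["detected_at"] - i["started_at"] for i in incidents]
    repair = [i["resolved_at"] - i["detected_at"] for i in incidents]
    return mean(detect), mean(repair)

def trust_under_loss_delta(surveys):
    """Average change in trust among people whose request was denied."""
    denied = [s for s in surveys if s["outcome"] == "denied"]
    return mean(s["trust_after"] - s["trust_before"] for s in denied) if denied else 0.0

incidents = [
    {"started_at": 0, "detected_at": 600, "resolved_at": 4200},
    {"started_at": 0, "detected_at": 300, "resolved_at": 1500},
]
print(mttd_mttr(incidents))  # (450.0, 2400.0) seconds

surveys = [
    {"outcome": "denied", "trust_before": 3.2, "trust_after": 3.0},
    {"outcome": "denied", "trust_before": 2.8, "trust_after": 2.9},
    {"outcome": "approved", "trust_before": 4.0, "trust_after": 4.5},
]
print(round(trust_under_loss_delta(surveys), 2))  # -0.05
```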
Interfaces with other packs
- From Responsibility: specs, SLAs, brakes.
- To Responsiveness: incident loops and eval registries (Ch4).
- To Solidarity: bridge scores feed civic‑stack incentives (Ch5).
- To Symbiosis: competence proves a kami is ready to stay local.
A closing image: the bridge
Imagine a well‑kept bridge with inspection tags—date, load test, next check—visible to anyone crossing. People can be confident that the bridge will hold because it is consistently tested for safety.