# AI Ethics and Governance

## Introduction
The widespread deployment of AI systems raises ethical challenges including fairness, bias, accountability, and regulation. Within the AI Safety & Trustworthiness section, this page answers a narrower question: once models affect real people and organizations, how do we define fairness, assign responsibility, and turn governance requirements into concrete processes?
This page focuses on four things:
- how bias enters the model lifecycle
- what fairness metrics actually measure, and why they conflict
- how regulation, model documentation, and impact assessments turn ethics into organizational controls
- which minimal engineering practices make fairness auditing operational
## 1. Bias in Machine Learning

### 1.1 Sources of Bias
| Bias Type | Source | Example |
|---|---|---|
| Historical bias | Real-world inequalities | Hiring data reflecting historical gender discrimination |
| Representation bias | Imbalanced group proportions in training data | Face datasets dominated by lighter skin tones |
| Measurement bias | Proxy variables introducing bias | Using zip codes as proxies for race |
| Aggregation bias | Using one model for different groups | Medical models ignoring racial differences |
| Evaluation bias | Evaluation data not representing actual users | Test sets lacking specific groups |
| Deployment bias | System used differently than designed | Applied to populations outside the training scope |
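Representation bias in particular is cheap to check before training ever starts. The sketch below is a minimal, illustrative audit (the `representation_report` helper, the group labels, and the reference shares are all hypothetical, not from any real dataset): count how often each group appears and compare against a reference population.

```python
from collections import Counter

def representation_report(groups, reference=None):
    """Share of each group in a dataset, vs. an optional reference population."""
    total = len(groups)
    counts = Counter(groups)
    report = {}
    for g, n in sorted(counts.items()):
        share = n / total
        entry = {"count": n, "share": round(share, 3)}
        if reference is not None:
            # Negative gap: the group is under-represented relative to the reference.
            entry["gap_vs_reference"] = round(share - reference.get(g, 0.0), 3)
        report[g] = entry
    return report

# Hypothetical skin-tone labels for a face dataset, vs. an assumed target population
sample = ["light"] * 800 + ["dark"] * 200
print(representation_report(sample, reference={"light": 0.6, "dark": 0.4}))
```

A report like this does not prove fairness, but a large negative gap for any group is an early warning that downstream accuracy may be uneven.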
### 1.2 Notable Cases
| Case | Problem | Cause |
|---|---|---|
| Amazon hiring AI | Discriminated against female applicants | Training data reflected historical preferences |
| COMPAS recidivism | Unfair to Black defendants | Proxy variables and historical bias |
| Facial recognition | Lower accuracy for darker skin tones | Imbalanced training data |
| GPT language models | Gender/racial stereotypes | Bias in internet text |
## 2. Fairness Metrics

### 2.1 Group Fairness
Let \(A\) be the protected attribute (e.g., gender, race), \(\hat{Y}\) the model prediction, and \(Y\) the true label.

Demographic Parity: positive prediction rates are equal across groups.
\[ P(\hat{Y}=1 \mid A=0) = P(\hat{Y}=1 \mid A=1) \]

Equalized Odds: prediction rates are equal across groups for each true label.
\[ P(\hat{Y}=1 \mid A=0, Y=y) = P(\hat{Y}=1 \mid A=1, Y=y), \quad y \in \{0, 1\} \]

Equal Opportunity: only the true positive rate (TPR) must be equal across groups, i.e., equalized odds restricted to \(Y=1\).
\[ P(\hat{Y}=1 \mid A=0, Y=1) = P(\hat{Y}=1 \mid A=1, Y=1) \]

Predictive Parity: positive predictive value (precision) is equal across groups.
\[ P(Y=1 \mid \hat{Y}=1, A=0) = P(Y=1 \mid \hat{Y}=1, A=1) \]
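All four group-fairness criteria reduce to comparing simple per-group rates. The sketch below (the `group_rates` helper is illustrative, written in plain Python with no library dependency) computes, for each group, the quantities the definitions above compare:

```python
def group_rates(y_true, y_pred, a):
    """Per-group selection rate, TPR, FPR, and precision from parallel binary lists."""
    out = {}
    for g in sorted(set(a)):
        idx = [i for i, gi in enumerate(a) if gi == g]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        pos_pred = sum(yp)                                   # predicted positives
        tp = sum(1 for t, p in zip(yt, yp) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(yt, yp) if t == 0 and p == 1)
        pos = sum(yt)                                        # actual positives
        neg = len(yt) - pos                                  # actual negatives
        out[g] = {
            "selection_rate": pos_pred / len(yt),            # demographic parity compares these
            "tpr": tp / pos if pos else None,                # equal opportunity compares these
            "fpr": fp / neg if neg else None,                # equalized odds adds these
            "precision": tp / pos_pred if pos_pred else None,  # predictive parity compares these
        }
    return out

# Tiny worked example with two groups (0 and 1)
rates = group_rates(
    y_true=[1, 1, 0, 0, 1, 0, 0, 0],
    y_pred=[1, 0, 1, 0, 1, 1, 0, 0],
    a=[0, 0, 0, 0, 1, 1, 1, 1],
)
print(rates)
```

Here both groups have a selection rate of 0.5 (demographic parity holds) and equal precision, yet their TPR and FPR differ, so equalized odds fails: the metrics genuinely measure different things.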
### 2.2 Impossibility Theorems
Chouldechova (2017) and Kleinberg et al. (2016) proved impossibility results: when base rates \(P(Y=1 \mid A=a)\) differ across groups, no classifier short of a perfect predictor can simultaneously satisfy predictive parity (calibration) and equalized odds.
This means the choice of fairness definition is itself a value judgment.
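The tension can be seen with a few lines of arithmetic. Chouldechova's derivation rests on the identity \( \mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \mathrm{TPR} \), where \(p\) is the group's base rate. The numbers below are purely illustrative: hold precision (PPV) and TPR fixed across two groups, vary only the base rate, and the implied FPRs must diverge.

```python
def implied_fpr(base_rate, ppv, tpr):
    """Chouldechova's identity: FPR = p/(1-p) * (1-PPV)/PPV * TPR."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

# Hold precision and TPR equal across two groups (predictive parity + equal opportunity)...
ppv, tpr = 0.8, 0.7

# ...but let base rates differ: the false positive rates are forced apart.
fpr_a = implied_fpr(0.5, ppv, tpr)  # group A: 50% base rate -> FPR = 0.175
fpr_b = implied_fpr(0.2, ppv, tpr)  # group B: 20% base rate -> FPR = 0.04375
print(fpr_a, fpr_b)
```

Unequal FPRs mean equalized odds is violated, so with different base rates a team must decide which criterion to give up.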
### 2.3 Why an engineering snippet belongs here
Governance is not just a list of principles. The moment a team claims that it has "evaluated fairness" or "trained under a fairness constraint," those claims have to map to an auditable workflow. The fairlearn snippet below is not here as a library tutorial. It is a minimal example of how governance requirements become engineering actions:
- `MetricFrame`: computes metrics by sensitive group instead of relying on one overall average
- `selection_rate`: checks whether positive outcomes are distributed evenly across groups
- `false_positive_rate`: checks whether one group is disproportionately harmed by false alarms
- `accuracy`: reminds us that fairness trade-offs are evaluated alongside task performance
- `ExponentiatedGradient` + `DemographicParity`: shows what it means to encode a fairness constraint into training or post-processing rather than adding an after-the-fact explanation
Without this bridge, governance stays at the slogan level instead of becoming part of model evaluation and release discipline.
### 2.4 Minimal fairness workflow example
```python
from fairlearn.metrics import MetricFrame, selection_rate, false_positive_rate
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.metrics import accuracy_score

# Assumes a trained `estimator`, test data (X_test, y_test), predictions
# y_pred = estimator.predict(X_test), and a sensitive-attribute column
# `sensitive_features` aligned with y_test (likewise A_train with y_train).

# Compute group-level metrics
metric_frame = MetricFrame(
    metrics={
        "selection_rate": selection_rate,
        "false_positive_rate": false_positive_rate,
        "accuracy": accuracy_score,
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features,
)
print(metric_frame.by_group)  # one row of metrics per sensitive group

# Fairness-constrained training via the reductions approach
constraint = DemographicParity()
mitigator = ExponentiatedGradient(estimator, constraints=constraint)
mitigator.fit(X_train, y_train, sensitive_features=A_train)
y_pred_mitigated = mitigator.predict(X_test)
```
## 3. Regulatory Frameworks

### 3.1 EU AI Act
The EU AI Act classifies AI systems into four risk tiers, with obligations scaled to the level of risk:
| Risk Level | Requirements | Examples |
|---|---|---|
| Unacceptable risk | Prohibited | Social scoring systems, real-time public facial recognition |
| High risk | Strict regulation | Medical AI, hiring AI, credit scoring, judiciary |
| Limited risk | Transparency requirements | Chatbots must declare identity |
| Minimal risk | No special requirements | Spam filters, game AI |
High-risk AI system requirements:
- Risk management system
- Data governance and data quality
- Technical documentation
- Record keeping and traceability
- Transparency and user information
- Human oversight
- Accuracy, robustness, and cybersecurity
- Conformity assessment
### 3.2 Chinese AI Regulations
| Regulation | Year | Focus |
|---|---|---|
| Deep Synthesis Management Provisions | 2023 | Deepfake labeling, content review |
| Interim Measures for Generative AI | 2023 | Training data quality, content safety |
| Algorithm Recommendation Management Provisions | 2022 | Recommendation transparency, user rights |
| Personal Information Protection Law (PIPL) | 2021 | Data protection, similar to GDPR |
### 3.3 Other Regions
| Region | Approach | Characteristics |
|---|---|---|
| United States | Industry self-regulation + executive orders | 2023 AI Executive Order; state-level legislation |
| United Kingdom | Principles-based | Building on existing regulatory frameworks |
| Japan | Innovation-promoting | Lighter regulation |
## 4. Responsible AI Principles

### 4.1 Major Frameworks
Microsoft Responsible AI Principles:
- Fairness
- Reliability & Safety
- Privacy & Security
- Inclusiveness
- Transparency
- Accountability
Google AI Principles:
- Be socially beneficial
- Avoid creating or reinforcing unfair bias
- Be built and tested for safety
- Be accountable to people
- Incorporate privacy design principles
- Uphold high standards of scientific excellence
- Be made available for uses that accord with these principles
Anthropic Core Safety Commitments:
- Do not pursue advanced AI capabilities at the expense of safety
- Invest substantial resources in safety research
- Collaborate with policymakers
- Transparently share safety research
### 4.2 Practical Recommendations
| Phase | Practice |
|---|---|
| Design | Define use cases and limitations; involve stakeholders |
| Data | Audit training data for bias; data documentation (Datasheets) |
| Development | Fairness-constrained training; multi-dimensional evaluation |
| Testing | Red teaming; group-level evaluation; adversarial testing |
| Deployment | Human oversight; monitor fairness drift; feedback mechanisms |
| Documentation | Model Cards; data statements; impact assessments |
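"Monitor fairness drift" in the deployment row can be made concrete with very little machinery. The sketch below is a hypothetical monitor (the `check_drift` helper, window names, and threshold are all illustrative): per time window of production logs, compute each group's positive-prediction rate and raise an alert when the gap between groups exceeds a tolerance.

```python
def selection_rate_gap(y_pred, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gi in zip(y_pred, groups) if gi == g]
        rates[g] = sum(preds) / len(preds)
    vals = list(rates.values())
    return max(vals) - min(vals), rates

def check_drift(windows, threshold=0.1):
    """Flag deployment windows where the selection-rate gap exceeds the threshold."""
    alerts = []
    for name, (y_pred, groups) in windows.items():
        gap, rates = selection_rate_gap(y_pred, groups)
        if gap > threshold:
            alerts.append((name, round(gap, 3), rates))
    return alerts

# Hypothetical weekly prediction logs: (binary predictions, group labels)
windows = {
    "week_1": ([1, 0, 1, 0], ["a", "a", "b", "b"]),                  # gap 0.0
    "week_2": ([1, 1, 1, 0, 0, 0], ["a", "a", "a", "b", "b", "b"]),  # gap 1.0
}
print(check_drift(windows, threshold=0.1))
```

A real monitor would add minimum sample sizes per window and feed alerts into the human-oversight and feedback mechanisms listed above, but the core check is this small.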
## 5. AI Governance Tools
| Tool | Purpose |
|---|---|
| Model Cards | Model documentation standard (Google) |
| Datasheets for Datasets | Dataset documentation standard |
| AI Impact Assessment | Impact assessment framework |
| Fairlearn | Fairness evaluation and mitigation (Microsoft) |
| AI Verify | AI governance testing framework (Singapore) |
| NIST AI RMF | AI Risk Management Framework (US) |
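A Model Card is ultimately just structured documentation, which means it can live in code next to the model it describes. The sketch below is a minimal, hypothetical skeleton (the field names loosely echo the Model Cards proposal but are an assumption, not the official schema; all values are invented):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ModelCard:
    """Minimal model-card skeleton; fields are illustrative, not a standard schema."""
    model_name: str
    intended_use: str
    out_of_scope_uses: list
    training_data: str
    evaluation_data: str
    group_metrics: dict      # e.g., per-group selection rate / FPR from a fairness audit
    limitations: str

card = ModelCard(
    model_name="credit-scoring-v2",
    intended_use="Ranking loan applications for human review",
    out_of_scope_uses=["Fully automated rejection", "Employment screening"],
    training_data="2018-2022 loan book, documented in a datasheet",
    evaluation_data="Held-out 2023 applications",
    group_metrics={"selection_rate": {"group_a": 0.31, "group_b": 0.29}},
    limitations="Not validated for applicants outside the training population",
)
print(json.dumps(asdict(card), indent=2))
```

Keeping the card as a serializable object lets a release pipeline refuse to ship a model whose card is missing group-level metrics, turning documentation into an enforceable gate.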
## 6. Open Challenges
| Challenge | Description |
|---|---|
| Conflicting fairness definitions | Different fairness metrics cannot be simultaneously satisfied |
| Cross-cultural differences | Different cultures have different understandings of fairness and privacy |
| Regulation-innovation balance | Too strict hampers development; too lax causes harm |
| Generative AI | Deepfakes, misinformation, copyright issues |
| Global coordination | Inconsistent regulations across countries/regions |
| Accountability chain | Who is responsible for AI errors? Developers? Deployers? |
## Relations to other topics
- For the overall security frame, see AI Safety Overview
- For privacy and high-risk deployment controls, see Privacy Attacks and AI Engineering Safety & Governance
- For value constraints and preference shaping, see AI Alignment
- For interpretability evidence in governance, see Explainability & Robustness
## References
- EU AI Act Full Text
- "Fairness and Machine Learning" - Barocas, Hardt, Narayanan
- "Weapons of Math Destruction" - Cathy O'Neil
- Google Responsible AI Practices
- Microsoft Responsible AI Standard