博弈论与机制设计

引言

当多个 Agent 的目标不完全一致时，博弈论提供了分析其交互策略的数学框架。机制设计则研究如何设计规则和激励，使得自利的 Agent 在追求自身利益时也能实现整体最优。

博弈论基础

博弈的基本要素

一个博弈 \(G\) 由以下要素构成：

参与者（Players）：\(N = \{1, 2, ..., n\}\)
策略空间（Strategy Space）：每个参与者 \(i\) 的策略集 \(S_i\)
效用函数（Utility Function）：\(u_i: S_1 \times S_2 \times ... \times S_n \rightarrow \mathbb{R}\)

纳什均衡（Nash Equilibrium）

纳什均衡是一种策略组合 \((s_1^*, s_2^*, ..., s_n^*)\)，其中任何参与者都无法通过单方面改变策略来提高自己的收益：

\[u_i(s_i^*, s_{-i}^*) \geq u_i(s_i, s_{-i}^*) \quad \forall i \in N, \forall s_i \in S_i\]

其中 \(s_{-i}^*\) 表示除参与者 \(i\) 以外所有参与者的策略组合。

经典博弈示例

囚徒困境

	B 合作	B 背叛
A 合作	(-1, -1)	(-3, 0)
A 背叛	(0, -3)	(-2, -2)

纳什均衡：(背叛, 背叛)，但（合作, 合作）是帕累托最优。

Agent 场景：两个 Agent 是否共享信息？自私地保留信息（背叛）是短期最优，但共享（合作）对整体更好。

协调博弈

	B 选方案 A	B 选方案 B
A 选方案 A	(2, 2)	(0, 0)
A 选方案 B	(0, 0)	(1, 1)

两个纳什均衡：(A, A) 和 (B, B)。关键是如何协调。

Agent 场景：多个 Agent 需要选择一致的工具/标准。

混合策略纳什均衡

当不存在纯策略纳什均衡时，Agent 以概率分布选择策略：

\[\sigma_i: S_i \rightarrow [0, 1] \quad \text{where} \sum_{s_i \in S_i} \sigma_i(s_i) = 1\]

期望效用：

\[U_i(\sigma_i, \sigma_{-i}) = \sum_{s \in S} \left(\prod_{j} \sigma_j(s_j)\right) u_i(s)\]

合作博弈

Shapley 值

在合作博弈中，Shapley 值公平地衡量每个参与者的贡献：

\[\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!(n-|S|-1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]\]

其中：

\(v(S)\) 是联盟 \(S\) 的价值函数
\(n\) 是参与者总数
求和遍历所有不包含 \(i\) 的联盟

Agent 场景：评估每个 Agent 对最终结果的贡献度。

from itertools import combinations
import math

def shapley_value(n_agents, value_function):
    """计算每个 Agent 的 Shapley 值"""
    shapley = [0.0] * n_agents
    agents = list(range(n_agents))

    for i in agents:
        others = [j for j in agents if j != i]

        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                coalition_set = set(coalition)

                # 边际贡献
                marginal = (
                    value_function(coalition_set | {i}) -
                    value_function(coalition_set)
                )

                # Shapley 权重
                weight = (
                    math.factorial(size) * 
                    math.factorial(n_agents - size - 1) /
                    math.factorial(n_agents)
                )

                shapley[i] += weight * marginal

    return shapley

# 示例：3 个 Agent 合作完成任务
def value_function(coalition):
    """联盟的价值"""
    if len(coalition) == 0: return 0
    if len(coalition) == 1: return 30    # 单独工作
    if len(coalition) == 2: return 80    # 两个合作
    if len(coalition) == 3: return 120   # 三个合作
    return 0

values = shapley_value(3, value_function)
print(f"各 Agent 的 Shapley 值: {values}")
# 每个 Agent 分配到公平的贡献份额

机制设计

目标

设计博弈的规则（机制），使得：

激励相容（Incentive Compatible）：诚实报告是最优策略
个体理性（Individually Rational）：参与优于不参与
效率（Efficient）：实现社会最优结果

拍卖机制

英式拍卖（English Auction）

公开竞价，价高者得：

async def english_auction(item, agents, starting_price=0, increment=1):
    current_price = starting_price
    current_winner = None
    active_bidders = set(agents)

    while len(active_bidders) > 1:
        new_bids = []
        for agent in active_bidders:
            bid = await agent.decide_bid(item, current_price + increment)
            if bid:
                new_bids.append((agent, bid))
            else:
                active_bidders.discard(agent)

        if new_bids:
            best = max(new_bids, key=lambda x: x[1])
            current_winner = best[0]
            current_price = best[1]

    return current_winner, current_price

维克里拍卖（Vickrey / Second-Price Auction）

密封竞价，最高出价者胜出但只需支付第二高价格。这鼓励真实报价。

async def vickrey_auction(item, agents):
    """维克里拍卖：最高价者胜出，支付第二高价"""
    bids = []
    for agent in agents:
        bid = await agent.private_valuation(item)
        bids.append((agent, bid))

    bids.sort(key=lambda x: x[1], reverse=True)

    winner = bids[0][0]
    price = bids[1][1]  # 支付第二高价

    return winner, price

为什么维克里拍卖是激励相容的：出价等于真实估值是占优策略——出价过高可能赢了但亏本，出价过低可能错失有利机会。

荷式拍卖（Dutch Auction）

从高价开始递减，第一个接受的人获胜。

VCG 机制（Vickrey-Clarke-Groves）

VCG 机制是维克里拍卖的一般化。每个参与者支付的费用等于其参与对其他人造成的外部性：

\[p_i = \sum_{j \neq i} v_j(o_{-i}^*) - \sum_{j \neq i} v_j(o^*)\]

其中 \(o^*\) 是社会最优结果，\(o_{-i}^*\) 是去掉参与者 \(i\) 后的最优结果。

在 Agent 系统中的应用

任务分配的机制设计

class TaskAllocationMechanism:
    """基于机制设计的任务分配"""

    async def allocate(self, tasks, agents):
        """使用 VCG 机制分配任务"""
        allocations = {}

        for task in tasks:
            # 1. 每个 Agent 报告自己完成任务的成本
            costs = {}
            for agent in agents:
                costs[agent.id] = await agent.report_cost(task)

            # 2. 选择成本最低的 Agent
            winner_id = min(costs, key=costs.get)

            # 3. 计算 VCG 支付（激励诚实报告）
            # 支付 = 第二低成本（类似维克里拍卖）
            sorted_costs = sorted(costs.values())
            payment = sorted_costs[1] if len(sorted_costs) > 1 else sorted_costs[0]

            allocations[task.id] = {
                "agent": winner_id,
                "payment": payment,
                "reported_cost": costs[winner_id],
            }

        return allocations

多 Agent 信用分配

利用 Shapley 值评估 Agent 贡献并分配奖励：

class CreditAssignment:
    def __init__(self, agents):
        self.agents = agents
        self.performance_history = []

    async def evaluate_and_assign(self, task_result, participation):
        """评估各 Agent 的贡献并分配信用"""
        # 记录不同 Agent 组合的表现
        overall_score = evaluate_quality(task_result)

        # 计算 Shapley 值
        def value_fn(coalition):
            if not coalition:
                return 0
            # 使用历史数据估计联盟价值
            return self.estimate_coalition_value(coalition)

        credits = shapley_value(len(self.agents), value_fn)

        return {
            agent.id: credit 
            for agent, credit in zip(self.agents, credits)
        }

辩论机制

利用博弈论设计辩论规则确保质量：

class DebateMechanism:
    """基于博弈论的辩论机制"""

    def __init__(self):
        self.reputation = {}  # Agent 信誉分

    async def structured_debate(self, topic, pro_agent, con_agent, judge):
        # 规则设计：
        # 1. 每轮论点有字数限制（防止信息洪水）
        # 2. 必须引用证据（提高质量）
        # 3. 信誉系统（长期激励诚实论证）

        history = []
        for round in range(3):
            pro_arg = await pro_agent.argue(topic, history, max_tokens=500)

            # 验证论点质量
            quality = await judge.evaluate_argument(pro_arg)
            self.update_reputation(pro_agent, quality)

            con_arg = await con_agent.argue(topic, history, max_tokens=500)
            quality = await judge.evaluate_argument(con_arg)
            self.update_reputation(con_agent, quality)

            history.extend([pro_arg, con_arg])

        return await judge.verdict(topic, history)

社会选择理论

Arrow 不可能定理

不存在同时满足以下所有条件的投票规则：

帕累托效率：如果所有人偏好 A > B，则集体偏好 A > B
无独裁：没有一个人能单独决定结果
独立于无关选项：A vs B 的集体排序只取决于个人对 A vs B 的排序

对 Agent 系统的启示：多 Agent 投票系统不可能同时满足所有"公平"条件，需要在不同属性间权衡。

延伸阅读

树搜索与蒙特卡洛 - 博弈搜索方法
Nash, J. (1950). "Equilibrium points in n-person games"
Shapley, L. S. (1953). "A value for n-person games"
Nisan, N., et al. (2007). "Algorithmic Game Theory"