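Fine-Tuning and RAG for the Financial Domain

Overview

General-purpose large language models (LLMs) often lack domain knowledge and up-to-date information when applied to specialized financial tasks. The two mainstream approaches to domain adaptation are fine-tuning and Retrieval-Augmented Generation (RAG). This article covers parameter-efficient fine-tuning methods such as LoRA and QLoRA, together with RAG architecture design and retrieval strategies for financial knowledge bases.

Domain Adaptation

Why Is Domain Adaptation Needed?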

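General-purpose LLMs face three main bottlenecks in finance:

Imprecise terminology: financial terms carry specialized meanings that differ from their everyday senses
Knowledge staleness: the training-data cutoff means the model lacks recent market information
Task fit: general models are not optimized for finance-specific tasks (e.g., factor analysis, risk assessment)

Comparison of adaptation approaches:

Method	Strengths	Weaknesses	Best suited for
Full fine-tuning	Best performance	Very high compute cost	Large institutions
LoRA/QLoRA	Low cost, strong results	Requires labeled data	Task-specific adaptation
RAG	No training; knowledge can be updated in real time	Depends on retrieval quality	Knowledge-intensive QA
Prompt Engineering	No training cost	Limited effectiveness	Rapid prototyping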

Parameter-Efficient Fine-Tuning

LoRA (Low-Rank Adaptation)

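The core idea of LoRA is to freeze the pretrained weights \(W_0\) and train only a low-rank decomposition of the weight update:

\[W = W_0 + \Delta W = W_0 + BA\]

where \(B \in \mathbb{R}^{d \times r}\), \(A \in \mathbb{R}^{r \times k}\), and the rank \(r \ll \min(d, k)\).

Parameter count comparison:

\[\text{Full: } d \times k, \quad \text{LoRA: } r \times (d + k), \quad \text{Ratio: } \frac{r(d+k)}{dk} \approx \frac{2r}{d} \quad (d \approx k)\]

For a square projection with \(d = k = 4096\) and \(r = 16\), LoRA trains only \(2 \times 16 / 4096 \approx 0.78\%\) as many parameters as full fine-tuning.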
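A minimal NumPy sketch of the LoRA update and its parameter ratio, assuming a dense \(d \times k\) weight. The function names are illustrative, not part of PEFT. Two details go beyond the formula above: \(B\) is initialized to zero so the adapted layer initially reproduces \(W_0\) exactly, and the update is scaled by \(\alpha / r\), which is how PEFT applies lora_alpha.

```python
import numpy as np

def lora_param_ratio(d: int, k: int, r: int) -> float:
    """Trainable LoRA parameters r*(d+k) relative to full fine-tuning d*k."""
    return r * (d + k) / (d * k)

def lora_forward(x, W0, A, B, alpha: float, r: int):
    """y = x W^T with W = W0 + (alpha / r) * B @ A; W0 itself is never updated."""
    delta_W = (alpha / r) * (B @ A)        # low-rank update, shape (d, k)
    return x @ (W0 + delta_W).T

rng = np.random.default_rng(0)
d, k, r, alpha = 64, 64, 4, 8
W0 = rng.normal(size=(d, k))               # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, k))    # trainable, small random init
B = np.zeros((d, r))                       # trainable, zero init -> delta_W = 0 at step 0
x = rng.normal(size=(2, k))                # a batch of two inputs

# At initialization the adapted layer matches the frozen layer exactly
assert np.allclose(lora_forward(x, W0, A, B, alpha, r), x @ W0.T)
print(f"{lora_param_ratio(4096, 4096, 16):.2%}")  # -> 0.78%
```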

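import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model (official Hugging Face repo id; access to Meta's Llama repos is gated)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.float16,
    device_map="auto"
)

# LoRA configuration
lora_config = LoraConfig(
    r=16,                     # rank r of the update BA
    lora_alpha=32,            # scaling factor: the update is multiplied by lora_alpha / r
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(base_model, lora_config)
# print_trainable_parameters() prints the counts itself and returns None
model.print_trainable_parameters()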

QLoRA (Quantized LoRA)

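QLoRA builds on LoRA by storing the frozen base weights in 4-bit precision (4-bit quantization), further reducing memory requirements:

\[W_{\text{4bit}} = \text{NF4}(W_0), \quad W = \text{Dequant}(W_{\text{4bit}}) + BA\]

where NF4 (NormalFloat 4-bit) is a 4-bit data type designed for normally distributed weights.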
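To make the idea concrete, here is a simplified NumPy sketch of blockwise 4-bit quantization with a normal-quantile codebook. It is an approximation for illustration only: the 16 levels below are evenly spaced quantiles of \(N(0, 1)\) rescaled to \([-1, 1]\), whereas the actual NF4 codebook in bitsandbytes is built asymmetrically so that it contains an exact zero. The function names and the block size of 64 are assumptions.

```python
import numpy as np
from statistics import NormalDist

def normal_codebook(n_levels: int = 16) -> np.ndarray:
    """Evenly spaced N(0,1) quantiles, rescaled so the extreme levels are -1 and +1."""
    probs = (np.arange(n_levels) + 0.5) / n_levels
    q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return q / np.abs(q).max()

def quantize_blockwise(w: np.ndarray, codebook: np.ndarray, block: int = 64):
    """Return 4-bit codes (0..15) and one absmax scale per block (assumes no all-zero block)."""
    blocks = w.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    normed = blocks / scales                          # each block now lies in [-1, 1]
    codes = np.abs(normed[..., None] - codebook).argmin(axis=-1)  # nearest level
    return codes.astype(np.uint8), scales

def dequantize_blockwise(codes, scales, codebook, shape):
    """Map codes back to levels and undo the per-block scaling."""
    return (codebook[codes] * scales).reshape(shape)

rng = np.random.default_rng(0)
W0 = rng.normal(scale=0.02, size=(128, 256))         # weights roughly normally distributed
cb = normal_codebook()
codes, scales = quantize_blockwise(W0, cb)
W_hat = dequantize_blockwise(codes, scales, cb, W0.shape)
print("relative error:", np.linalg.norm(W0 - W_hat) / np.linalg.norm(W0))
```

Placing more levels near zero, where most normally distributed weights fall, is what distinguishes this kind of codebook from uniform 4-bit integer levels.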

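import torch
from peft import get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,   # dequantized weights are used in bf16 for compute
    bnb_4bit_use_double_quant=True,          # double quantization: also quantizes the per-block scales
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto"
)

# Attach trainable LoRA adapters to the frozen 4-bit model (reuses lora_config from the LoRA example)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)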

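Memory Requirements of QLoRA

QLoRA makes it feasible to fine-tune 7B to 13B parameter models on a single consumer GPU (e.g., a 24 GB RTX 4090). This substantially lowers the hardware barrier to domain adaptation for financial institutions.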
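A back-of-the-envelope sketch of why this fits, counting only the frozen base weights. It deliberately ignores activations, LoRA parameters, optimizer state, and framework overhead, all of which add to the real footprint. The 0.127 bits/parameter overhead for double-quantized block constants is the figure reported in the QLoRA paper and is taken here as an assumption.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Memory for model weights alone, in GiB (2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30

FP16_BITS = 16
NF4_BITS = 4 + 0.127   # 4-bit codes + double-quantized block constants (QLoRA paper figure)

for name, n in [("8B", 8e9), ("13B", 13e9)]:
    fp16 = weight_memory_gib(n, FP16_BITS)
    nf4 = weight_memory_gib(n, NF4_BITS)
    print(f"{name}: fp16 ≈ {fp16:.1f} GiB, NF4 ≈ {nf4:.1f} GiB")
```

For a 13B model this gives roughly 24 GiB of fp16 weights, already at the card's limit before any activations, versus about 6 GiB in NF4, which leaves headroom for training.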

Building a Financial Fine-Tuning Dataset

def prepare_financial_sft_data(raw_data):
    """Build SFT (Supervised Fine-Tuning) examples for the financial domain."""
    sft_examples = []

    # 1. Financial report analysis
    for filing in raw_data['filings']:
        sft_examples.append({
            "instruction": "Analyze the key points of the following company's quarterly financial report",
            "input": filing['text'][:4000],   # truncate long filings (by characters) to fit the context window
            "output": filing['analyst_summary']
        })

    # 2. Financial question answering
    for qa in raw_data['financial_qa']:
        sft_examples.append({
            "instruction": qa['question'],
            "input": qa.get('context', ''),   # optional supporting context
            "output": qa['answer']
        })

    # 3. Sentiment classification
    for sample in raw_data['sentiment']:
        sft_examples.append({
            "instruction": "Classify the sentiment of the following financial text",
            "input": sample['text'],
            "output": sample['label']
        })

    return sft_examples
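Before training, each instruction/input/output record is typically rendered into a prompt string plus a target completion and saved, for instance, as JSON Lines. The sketch below assumes an Alpaca-style template; the template text, format_example, and write_jsonl are hypothetical choices, and the right template depends on the base model's chat format.

```python
import json

# Hypothetical Alpaca-style templates; adapt to the base model's expected chat format
PROMPT_WITH_INPUT = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
PROMPT_NO_INPUT = "### Instruction:\n{instruction}\n\n### Response:\n"

def format_example(ex: dict) -> dict:
    """Render one {instruction, input, output} record as a prompt/completion pair."""
    if ex.get("input"):
        prompt = PROMPT_WITH_INPUT.format(instruction=ex["instruction"], input=ex["input"])
    else:  # e.g. QA items without context: omit the empty Input section
        prompt = PROMPT_NO_INPUT.format(instruction=ex["instruction"])
    # Many SFT setups mask the prompt tokens so the loss covers only the completion
    return {"prompt": prompt, "completion": str(ex["output"])}

def write_jsonl(examples, path):
    """Write formatted examples as JSON Lines (one JSON object per line, UTF-8)."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(format_example(ex), ensure_ascii=False) + "\n")

sample = {"instruction": "Classify the sentiment of the following financial text",
          "input": "Quarterly revenue beat consensus estimates.",
          "output": "positive"}
print(format_example(sample)["prompt"] + format_example(sample)["completion"])
```

ensure_ascii=False keeps Chinese text readable in the output file instead of escaping it.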

RAG Architecture (Retrieval-Augmented Generation)

Basic Architecture

RAG combines external knowledge retrieval with LLM generation:

\[P(y|q) = \sum_{d \in \mathcal{D}} P(d|q) \cdot P(y|q, d)\]

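where \(q\) is the user query, \(d\) is a document from the knowledge base \(\mathcal{D}\), and \(y\) is the generated answer. In practice the sum runs only over the top-k documents returned by the retriever, with \(P(d|q)\) derived from their retrieval scores.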
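A toy numerical sketch of this marginalization, assuming the sum is truncated to the top-k retrieved documents and that \(P(d|q)\) is a softmax over retrieval scores (as in the original RAG formulation). The scores and per-document answer probabilities are made-up illustrative numbers, not outputs of a real retriever or model.

```python
import numpy as np

def rag_marginal(retrieval_scores, p_y_given_qd):
    """P(y|q) = sum_d P(d|q) * P(y|q,d), with P(d|q) = softmax(scores) over the top-k docs."""
    s = np.asarray(retrieval_scores, dtype=float)
    p_d = np.exp(s - s.max())          # subtract the max for numerical stability
    p_d /= p_d.sum()
    return float(p_d @ np.asarray(p_y_given_qd, dtype=float))

# Illustrative values for k = 3 retrieved chunks
scores = [2.0, 1.0, 0.0]       # retriever scores for d1, d2, d3
p_answer = [0.9, 0.5, 0.1]     # generator probability of the answer given each chunk
print(rag_marginal(scores, p_answer))
```

The result is a weighted average of the per-document probabilities, so a highly ranked chunk that supports the answer dominates.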

[User query] → [Embed] → [Vector search] → [Assemble context] → [LLM generation] → [Answer]
                                ↑
                   [Financial knowledge base]
        (research reports / filings / regulations / data)
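The context-assembly step can be sketched as a function that numbers each retrieved chunk, tags it with its source and date, and asks the model to cite those numbers. This assumes chunks are plain dicts with text, source, and date keys (with LangChain Documents you would read page_content and metadata instead); the prompt wording, build_rag_prompt, and the character budget are hypothetical.

```python
def build_rag_prompt(query: str, chunks: list[dict], max_chars: int = 6000) -> str:
    """Concatenate retrieved chunks (dicts with 'text', 'source', 'date') into one prompt."""
    parts, used = [], 0
    for i, c in enumerate(chunks, start=1):
        block = f"[{i}] ({c['source']}, {c['date']})\n{c['text']}\n"
        if used + len(block) > max_chars:   # stop before exceeding the context budget
            break
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources as [n]; if they are insufficient, say so.\n\n"
        f"Sources:\n{context}\nQuestion: {query}\nAnswer:"
    )
```

Chunks are added in retrieval order, so when the budget runs out it is the lowest-ranked chunks that are dropped.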

Building the Financial Knowledge Base

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

class FinancialKnowledgeBase:
    def __init__(self, embedding_model="BAAI/bge-large-zh-v1.5"):
        self.embeddings = HuggingFaceEmbeddings(
            model_name=embedding_model,
            encode_kwargs={"normalize_embeddings": True}  # normalized vectors, as recommended for BGE models
        )
        self.splitter = RecursiveCharacterTextSplitter(
            chunk_size=512,        # measured in characters by default
            chunk_overlap=50,
            # Paragraph, line, Chinese full stop, Chinese semicolon, space; "" is the last-resort split
            separators=["\n\n", "\n", "。", "；", " ", ""]
        )
        self.vectorstore = None

    def ingest_documents(self, documents):
        """Ingest documents: split into chunks, embed, and index."""
        texts, metadatas = [], []
        for doc in documents:
            splits = self.splitter.split_text(doc['text'])
            for i, chunk in enumerate(splits):
                texts.append(chunk)
                metadatas.append({
                    'source': doc['source'],
                    'date': doc['date'],
                    'doc_type': doc['type'],  # report / filing / regulation
                    'chunk_id': i
                })
        if self.vectorstore is None:
            self.vectorstore = FAISS.from_texts(texts, self.embeddings, metadatas=metadatas)
        else:
            # Append to the existing index instead of overwriting it
            self.vectorstore.add_texts(texts, metadatas=metadatas)

    def retrieve(self, query, k=5, filters=None):
        """Retrieve relevant chunks; filters is an exact-match metadata dict, e.g. {'doc_type': 'filing'}."""
        if self.vectorstore is None:
            raise ValueError("Knowledge base is empty; call ingest_documents() first")
        return self.vectorstore.similarity_search(query, k=k, filter=filters)

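Why Chunking Strategy Matters

Chunking is one of the most consequential design decisions in a RAG system. When chunking financial text, consider that: (1) tables should not be split; (2) a financial metric should stay in the same chunk as its surrounding context; (3) section headings should be preserved as metadata rather than cut off.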
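A minimal pure-Python sketch of these three rules, assuming the source text has already been converted to Markdown-like lines (headings start with #, table rows start with |). The function name, the character budget, and the line-based parsing are illustrative assumptions; real pipelines usually depend on a parser that understands PDF or HTML layout.

```python
def structure_aware_chunks(text: str, max_chars: int = 512) -> list[dict]:
    """Chunk Markdown-like text so tables stay whole and each chunk records its section heading."""
    # Pass 1: turn lines into indivisible units: a whole table or a single paragraph line
    units, heading, table = [], "", []
    for line in text.splitlines():
        if line.startswith("|"):              # table row: keep accumulating the table
            table.append(line)
            continue
        if table:                             # first non-table line closes the table
            units.append((heading, "\n".join(table)))
            table = []
        if line.startswith("#"):              # heading: stored as metadata, not as text
            heading = line.lstrip("#").strip()
        elif line.strip():
            units.append((heading, line.strip()))
    if table:
        units.append((heading, "\n".join(table)))

    # Pass 2: pack units into chunks; never split a unit, never mix two sections
    chunks, buf, buf_heading = [], [], ""
    for unit_heading, unit in units:
        size = sum(len(u) + 1 for u in buf)
        if buf and (unit_heading != buf_heading or size + len(unit) > max_chars):
            chunks.append({"text": "\n".join(buf), "metadata": {"section": buf_heading}})
            buf = []
        if not buf:
            buf_heading = unit_heading
        buf.append(unit)
    if buf:
        chunks.append({"text": "\n".join(buf), "metadata": {"section": buf_heading}})
    return chunks
```

A table longer than max_chars deliberately becomes one oversized chunk, trading uniform chunk size for table integrity.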

Retrieval Strategies

Hybrid Retrieval

Combine dense vector retrieval with sparse keyword retrieval, where \(\alpha \in [0, 1]\) weights the two scores after each has been normalized to a common range:

\[\text{Score}_{\text{hybrid}} = \alpha \cdot \text{Score}_{\text{dense}} + (1-\alpha) \cdot \text{Score}_{\text{sparse}}\]

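def normalize_scores(pairs, higher_is_better=True):
    """Min-max normalize (key, score) pairs into {key: score in [0, 1]}, best = 1."""
    if not pairs:
        return {}
    lo, hi = min(s for _, s in pairs), max(s for _, s in pairs)
    span = (hi - lo) or 1.0   # avoid division by zero when all scores are equal
    return {key: (s - lo) / span if higher_is_better else (hi - s) / span
            for key, s in pairs}

def hybrid_retrieve(query, vectorstore, bm25_index, alpha=0.7, k=10):
    """Hybrid retrieval: vectors + BM25, fused by a weighted sum; returns the top-k chunk texts."""
    # Dense retrieval; LangChain's FAISS returns L2 distances by default (lower = more similar)
    dense = vectorstore.similarity_search_with_score(query, k=k * 2)
    dense_scores = normalize_scores([(doc.page_content, s) for doc, s in dense],
                                    higher_is_better=False)
    # Sparse retrieval; bm25_index.search is an assumed interface returning (text, score), higher = better
    sparse_scores = normalize_scores(bm25_index.search(query, k=k * 2))
    # Weighted fusion; a chunk missing from one list gets 0 for that component
    combined = {key: alpha * dense_scores.get(key, 0.0) + (1 - alpha) * sparse_scores.get(key, 0.0)
                for key in set(dense_scores) | set(sparse_scores)}
    return sorted(combined, key=combined.get, reverse=True)[:k]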
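The bm25_index object above is not tied to a specific library. For illustration, here is a minimal pure-Python Okapi BM25 index exposing the search(query, k) method that hybrid_retrieve calls, returning (text, score) pairs. It assumes whitespace tokenization and the common defaults \(k_1 = 1.5\), \(b = 0.75\); Chinese text would first need a word segmenter (for example jieba).

```python
import math
from collections import Counter

class SimpleBM25:
    """Minimal Okapi BM25 over an in-memory list of texts (whitespace tokenization)."""

    def __init__(self, texts, k1: float = 1.5, b: float = 0.75):
        self.texts = list(texts)
        self.k1, self.b = k1, b
        self.docs = [Counter(t.lower().split()) for t in self.texts]   # term frequencies
        self.doc_len = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs) if self.docs else 0.0
        df = Counter(term for d in self.docs for term in d)             # document frequency
        n = len(self.docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        d, dl = self.docs[i], self.doc_len[i]
        s = 0.0
        for term in query.lower().split():
            tf = d.get(term, 0)
            if tf:
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)  # length normalization
                s += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
        return s

    def search(self, query: str, k: int = 10):
        """Return the top-k (text, score) pairs, highest score first."""
        scored = [(self.texts[i], self.score(query, i)) for i in range(len(self.texts))]
        return sorted(scored, key=lambda p: p[1], reverse=True)[:k]
```

Keyword scoring like this catches exact tokens such as tickers, report codes, or regulation numbers that dense embeddings can blur, which is the main motivation for mixing it in.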

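Re-ranking

Use a cross-encoder to re-rank the first-stage results more precisely. A cross-encoder reads the query and a passage together, which is more accurate than comparing independently computed embeddings but also more expensive, so it is applied only to a small candidate set: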

import numpy as np
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-large")

def retrieve_and_rerank(query, vectorstore, k_initial=20, k_final=5):
    """Two-stage retrieval: recall with the vector index, then re-rank with a cross-encoder."""
    # Stage 1: cheaply recall a larger candidate set
    candidates = vectorstore.similarity_search(query, k=k_initial)
    # Stage 2: score each (query, passage) pair jointly with the cross-encoder
    pairs = [(query, doc.page_content) for doc in candidates]
    rerank_scores = reranker.predict(pairs)
    top_indices = np.argsort(rerank_scores)[-k_final:][::-1]   # highest scores first
    return [candidates[i] for i in top_indices]

Time-Aware Retrieval

Financial information is highly time-sensitive, so more recent documents should receive higher weight:

\[\text{Score}_{\text{final}} = \text{Score}_{\text{semantic}} \times \exp\left(-\lambda \cdot \Delta t\right)\]

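where \(\Delta t\) is the number of days between the document's publication date and today, and \(\lambda\) controls the decay rate: a document's weight halves every \(\ln 2 / \lambda\) days.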
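A small sketch of the decay weighting, parameterized by a half-life \(h\) rather than by \(\lambda\) directly (\(\lambda = \ln 2 / h\)). The function name and the 90-day default half-life are assumptions; a suitable value depends on the document type, e.g. market news may warrant a shorter half-life than regulatory text.

```python
import math
from datetime import date

def time_decayed_score(semantic_score: float, published: date, today: date,
                       half_life_days: float = 90.0) -> float:
    """Score_final = Score_semantic * exp(-lambda * delta_t), with lambda = ln 2 / half-life."""
    delta_t = max((today - published).days, 0)   # future-dated docs get no boost
    lam = math.log(2) / half_life_days
    return semantic_score * math.exp(-lam * delta_t)

today = date(2024, 6, 30)
print(time_decayed_score(0.8, date(2024, 6, 30), today))   # same day: unchanged
print(time_decayed_score(0.8, date(2024, 4, 1), today))    # 90 days old: ~0.4 (halved)
```

Because the decay is multiplicative, it only reorders documents whose semantic scores are close; a much more relevant older filing can still outrank a marginally relevant recent one.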

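Best Practices for Financial RAG

Use finance-domain embedding models (e.g., FinBERT-based) to improve retrieval relevance
Build structured indexes for numerical data (financial metrics, prices) rather than relying on vector retrieval alone
Implement source attribution so users can verify where the information cited by the LLM came from
Update the knowledge base regularly to keep information current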
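As an illustration of the structured-index practice, numeric facts can be stored in a relational table and looked up exactly, leaving the vector index for narrative text. This sqlite3 sketch uses a hypothetical schema, a hypothetical ticker, and made-up figures; the source column keeps the provenance needed for source attribution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metrics (
        ticker TEXT, period TEXT, metric TEXT, value REAL, source TEXT,
        PRIMARY KEY (ticker, period, metric)
    )
""")
# Illustrative rows only; not real company data
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
    [("ACME", "2024Q1", "revenue", 120.5, "ACME 2024Q1 filing"),
     ("ACME", "2024Q2", "revenue", 131.0, "ACME 2024Q2 filing")],
)

def lookup_metric(ticker: str, period: str, metric: str):
    """Exact lookup returning (value, source), or None if absent, instead of a fuzzy vector match."""
    return conn.execute(
        "SELECT value, source FROM metrics WHERE ticker = ? AND period = ? AND metric = ?",
        (ticker, period, metric),
    ).fetchone()

print(lookup_metric("ACME", "2024Q2", "revenue"))
```

An exact lookup either returns the stored figure with its source or returns nothing, which is safer for numbers than letting a vector search surface a chunk about a neighboring quarter.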

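Fine-Tuning vs. RAG: How to Choose?

\[\text{Decision} = \begin{cases} \text{Fine-tuning} & \text{fixed task; specific output format or style required} \\ \text{RAG} & \text{knowledge-intensive; real-time updates required} \\ \text{Fine-tuning + RAG} & \text{both needed} \end{cases}\]

In financial practice, the recommended combination is to use LoRA fine-tuning to familiarize the model with financial task formats and terminology, while using RAG to supply real-time market data and research-report content.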

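Summary

Adapting LLMs to finance is a systems-engineering effort. LoRA/QLoRA achieve effective domain fine-tuning at very low compute cost, while RAG addresses knowledge freshness and traceability. Combining the two, with a fine-tuned model that understands financial language and RAG that supplies up-to-date knowledge, is the current best-practice path for financial AI applications.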