
Advanced Prompt Techniques

1. Self-Consistency

1.1 Core Idea

Self-Consistency improves reasoning accuracy by sampling several chain-of-thought (CoT) reasoning paths for the same question at a nonzero temperature, then taking a majority vote over their final answers.

1.2 Workflow

Question → [CoT Reasoning Path 1 → Answer A]
         → [CoT Reasoning Path 2 → Answer B]
         → [CoT Reasoning Path 3 → Answer A]
         → [CoT Reasoning Path 4 → Answer A]
         → [CoT Reasoning Path 5 → Answer C]

Majority Vote → Answer A (3/5) ✓

1.3 Implementation

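import re
from collections import Counter

import openai


def extract_final_answer(text):
    """Hypothetical helper: assumes the prompt asks the model to end with 'Answer: <value>'"""
    matches = re.findall(r"Answer:\s*(.+)", text)
    return matches[-1].strip() if matches else None


def self_consistency(prompt, n_samples=5, temperature=0.7):
    """Use Self-Consistency with multiple sampling and voting"""
    answers = []
    for _ in range(n_samples):
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,  # > 0 so that the reasoning paths differ
        )
        # Extract the final answer from the reasoning text
        answer = extract_final_answer(response.choices[0].message.content)
        if answer is not None:  # skip samples with no parsable answer
            answers.append(answer)

    # Majority vote over the extracted answers
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]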

1.4 Applicability and Limitations

Suitable for: tasks with clear-cut answers, such as math problems, logical reasoning, and multiple-choice questions

Limitations:

  • Higher cost and latency (one API call per sample)
  • Limited effectiveness for open-ended generation tasks
  • Requires a reliable answer extraction mechanism
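Voting only works if equivalent answers are counted as equal: "42", "$42.00" and "42." should land in the same bucket. Below is a minimal sketch; `normalize_answer` and `majority_vote` are hypothetical helpers, and the normalization rules (lowercasing, stripping `$`, commas and a trailing period, canonicalizing numbers) are assumptions to adapt per task:

```python
from collections import Counter


def normalize_answer(raw: str) -> str:
    """Hypothetical normalizer: lowercase, drop '$' and ',', strip a trailing period,
    and canonicalize numbers so that 42, 42.0 and $42.00 compare equal."""
    text = raw.strip().lower().rstrip(".")
    text = text.replace(",", "").replace("$", "")
    try:
        number = float(text)
    except ValueError:
        return text  # not a number: keep the cleaned string
    return str(int(number)) if number.is_integer() else str(number)


def majority_vote(raw_answers):
    """Vote over normalized answers; return (winner, votes for winner, total votes)."""
    normalized = [normalize_answer(a) for a in raw_answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes, len(normalized)


samples = ["42", "$42.00", "41", "42.", "forty-two"]
print(majority_vote(samples))  # ('42', 3, 5)
```

Returning the vote count alongside the winner makes the level of agreement visible (3/5 here); a low agreement ratio can be treated as a sign that the answer is unreliable.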

2. Tree of Thoughts (ToT)

2.1 Core Idea

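Tree of Thoughts organizes reasoning as a tree: at each node the model generates several candidate "thoughts" (intermediate steps), an evaluator scores them, and a search procedure (such as BFS or DFS) expands the most promising branches, backtracking when a branch looks like a dead end.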

2.2 Comparison with CoT

Aspect | CoT | ToT
Reasoning structure | Linear chain | Tree of branches
Exploration | Single path | Multiple paths in parallel
Backtracking | Not supported | Supported
Evaluation | Final result only | Intermediate steps can be evaluated
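To make the backtracking row concrete, here is a toy depth-first ToT search in plain Python. The model is replaced by hypothetical stand-in functions: `propose` generates candidate next thoughts and `value` scores partial states, the two roles an LLM would play in practice. The toy task (pick distinct numbers from [9, 7, 5, 3] that sum to 15 in at most three steps) is purely illustrative:

```python
from typing import Callable, List, Optional

State = List[int]


def tot_dfs(state: State,
            propose: Callable[[State], List[int]],
            value: Callable[[State], float],
            is_goal: Callable[[State], bool],
            max_depth: int,
            threshold: float) -> Optional[State]:
    """Depth-first Tree of Thoughts: expand, prune low-value states, backtrack on dead ends."""
    if is_goal(state):
        return state
    if len(state) >= max_depth:
        return None  # depth limit reached: backtrack
    for thought in propose(state):
        child = state + [thought]
        if value(child) < threshold:
            continue  # prune an unpromising branch without expanding it
        result = tot_dfs(child, propose, value, is_goal, max_depth, threshold)
        if result is not None:
            return result
    return None  # every branch below this state failed: backtrack to the parent


# Toy stand-ins for the LLM's "propose" and "evaluate" roles
NUMBERS = [9, 7, 5, 3]
TARGET = 15


def propose(state: State) -> List[int]:
    return [n for n in NUMBERS if n not in state]


def value(state: State) -> float:
    return 1.0 if sum(state) <= TARGET else 0.0  # overshooting the target is hopeless


def is_goal(state: State) -> bool:
    return sum(state) == TARGET


print(tot_dfs([], propose, value, is_goal, max_depth=3, threshold=0.5))  # [7, 5, 3]
```

The search first commits to 9, finds that every branch below it eventually overshoots 15, backtracks, and then succeeds via 7: a recovery that a single linear chain of thought cannot make.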

2.3 Implementation Framework

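from dataclasses import dataclass, field


@dataclass
class ThoughtNode:
    """A node in the thought tree: the partial solution so far plus its score."""
    content: str
    depth: int
    score: float = 0.0
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)


class TreeOfThoughts:
    def __init__(self, model, evaluator):
        self.model = model          # assumed interface: model.generate(prompt) -> str
        self.evaluator = evaluator  # assumed interface: evaluator.evaluate(state) -> float

    def generate_thoughts(self, node, k):
        """Ask the model for k candidate next steps extending the node's partial solution"""
        prompt = f"{node.content}\n\nPropose the next step toward a solution."
        return [self.model.generate(prompt) for _ in range(k)]

    def solve(self, problem, max_depth=3, branch_factor=3, beam_width=3):
        """BFS-style ToT solver: expand every kept node, keep the top beam_width per level"""
        root = ThoughtNode(problem, depth=0)
        current_level = [root]

        for depth in range(max_depth):
            next_level = []
            for node in current_level:
                # Generate multiple thought branches
                for thought in self.generate_thoughts(node, branch_factor):
                    state = f"{node.content}\n{thought}"  # path from the root so far
                    # Evaluate the promise of each partial solution
                    score = self.evaluator.evaluate(state)
                    child = ThoughtNode(state, depth=depth + 1, score=score)
                    node.add_child(child)
                    next_level.append(child)

            # Select the most promising nodes for further expansion (pruning)
            next_level.sort(key=lambda x: x.score, reverse=True)
            current_level = next_level[:beam_width]

        # Return the best leaf node (the root if max_depth == 0)
        return max(current_level, key=lambda x: x.score)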

2.4 Prompt Example

Problem: {problem}

Please generate 3 different solution approaches (first step only):

Approach 1:
Approach 2:
Approach 3:

Evaluate the feasibility of each approach (1-10):
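In a ToT loop, the scores from this propose-then-evaluate prompt decide which branches get expanded. A minimal parser sketch, assuming (hypothetically) that the model repeats the layout and writes one `Approach <n>: <score>` line per approach after the "Evaluate the feasibility" line; real outputs vary, so production code should validate the result or request a stricter format:

```python
import re


def parse_approach_scores(response: str) -> dict:
    """Extract {approach_number: score} from the evaluation part of the reply.

    Only text after 'Evaluate the feasibility' is searched, so the
    'Approach <n>: <description>' lines of the proposal part are ignored.
    """
    _, _, evaluation = response.partition("Evaluate the feasibility")
    scores = {}
    for num, score in re.findall(r"Approach\s+(\d+)\s*:\s*(\d+(?:\.\d+)?)", evaluation):
        scores[int(num)] = float(score)
    return scores


reply = """Approach 1: Factor the polynomial
Approach 2: Substitute x = y + 1
Approach 3: Try numeric root finding

Evaluate the feasibility of each approach (1-10):
Approach 1: 7
Approach 2: 9
Approach 3: 4"""

scores = parse_approach_scores(reply)
best = max(scores, key=scores.get)
print(scores, best)  # {1: 7.0, 2: 9.0, 3: 4.0} 2
```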

2.5 Applicable Scenarios

  • Creative writing (exploring different narrative directions)
  • Mathematical proofs (trying different proof strategies)
  • Planning problems (exploring different action plans)
  • Code design (comparing different architectural approaches)

3. RAG-Augmented Prompts

3.1 Basic Pattern

Inject retrieved context into the prompt to augment LLM knowledge:

Answer the question based on the following references. If the references do not contain relevant information, state this honestly.

References:
---
{retrieved_context_1}
---
{retrieved_context_2}
---
{retrieved_context_3}

Question: {user_question}

Please cite specific content from the references to support your answer.
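Filling this template is plain string assembly. A minimal sketch; `build_rag_prompt` is a hypothetical helper, and the retrieved passages are assumed to arrive as a list of strings from whatever retriever you use:

```python
RAG_TEMPLATE = """Answer the question based on the following references. If the references do not contain relevant information, state this honestly.

References:
{references}

Question: {question}

Please cite specific content from the references to support your answer."""


def build_rag_prompt(question: str, contexts: list) -> str:
    """Insert retrieved passages, each preceded by a '---' separator line as in the template."""
    references = "\n".join(f"---\n{passage.strip()}" for passage in contexts)
    return RAG_TEMPLATE.format(references=references, question=question)


prompt = build_rag_prompt(
    "What does the retry policy cover?",
    ["Passage retrieved from document A.", "Passage retrieved from document B."],
)
print(prompt)
```

Because `str.format` does not re-parse substituted values, passages that contain braces are inserted safely.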

3.2 Advanced RAG Prompt Patterns

Multi-step reasoning RAG:

Based on the following materials, please:
1. First summarize the key points of each material
2. Analyze the connections between materials
3. Synthesize an answer to the user's question
4. Identify any information gaps that may need supplementation

Materials: {contexts}
Question: {question}

RAG with confidence levels:

Answer the question based on the provided materials. Label each claim with a confidence level:
- [High]: Directly supported by the materials
- [Medium]: Can be inferred from the materials
- [Low]: Limited material support, partially based on general knowledge

Materials: {contexts}
Question: {question}
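The confidence labels are most useful when downstream code can act on them, for example by routing [Low] claims to a human reviewer or to another retrieval round. A minimal sketch, assuming each claim sits on its own line and starts with one of the three labels; `group_claims_by_confidence` is a hypothetical helper:

```python
import re
from collections import defaultdict

LABEL_RE = re.compile(r"^\s*(?:[-*]\s*)?\[(High|Medium|Low)\]:?\s*(.+)$")


def group_claims_by_confidence(answer: str) -> dict:
    """Group labeled claims by confidence level; unlabeled lines are ignored."""
    groups = defaultdict(list)
    for line in answer.splitlines():
        match = LABEL_RE.match(line)
        if match:
            groups[match.group(1)].append(match.group(2).strip())
    return dict(groups)


answer = """- [High]: Claim quoted directly from Material 1.
- [Medium]: Claim inferred by combining Materials 1 and 2.
- [Low]: Claim drawn mostly from general knowledge.
Unlabeled closing remark."""

groups = group_claims_by_confidence(answer)
needs_review = groups.get("Low", [])
print(needs_review)
```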

3.3 Integration with Memory Systems

RAG can serve as an external memory system for AI Agents. See RAG-Augmented Memory for details.

4. DSPy: Programmatic Prompt Optimization

4.1 DSPy Overview

DSPy (Declarative Self-improving Python) is a framework that transforms prompt engineering into a programming problem, defining LLM programs declaratively and optimizing prompts automatically.

4.2 Core Concepts

  • Signature: Type signatures defining inputs and outputs
  • Module: Composable LLM operation modules
  • Optimizer (formerly called Teleprompter): Algorithms that automatically optimize prompts and few-shot demonstrations
  • Metric: A function that scores program outputs; optimizers use it as the optimization target

4.3 Basic Usage

import dspy

# Configure the LLM (DSPy 2.5+; older releases used dspy.OpenAI(model=...) and dspy.settings.configure)
lm = dspy.LM("openai/gpt-4", max_tokens=300)
dspy.configure(lm=lm)

# Define a Signature: the docstring becomes the task instruction
class SentimentClassification(dspy.Signature):
    """Classify sentiment of a text."""
    text = dspy.InputField(desc="Text to classify")
    sentiment = dspy.OutputField(desc="positive, negative, or neutral")

# Use a Module: dspy.Predict builds the prompt from the signature and parses the output
classify = dspy.Predict(SentimentClassification)
result = classify(text="This product exceeded my expectations!")
print(result.sentiment)  # e.g. "positive"

4.4 Automatic Optimization

from dspy.teleprompt import BootstrapFewShot

# Prepare training data; with_inputs() marks which fields are inputs (the rest are labels)
trainset = [
    dspy.Example(text="Great product!", sentiment="positive").with_inputs("text"),
    dspy.Example(text="Terrible experience.", sentiment="negative").with_inputs("text"),
    # ...more samples
]

# Define evaluation metric: exact match on the sentiment label, ignoring case
def accuracy_metric(example, pred, trace=None):
    return example.sentiment.lower() == pred.sentiment.strip().lower()

# Automatic optimization: bootstrap up to 4 few-shot demos that pass the metric
teleprompter = BootstrapFewShot(metric=accuracy_metric, max_bootstrapped_demos=4)
optimized_classify = teleprompter.compile(classify, trainset=trainset)
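Roughly, the idea behind `BootstrapFewShot` is to run the program on training examples and keep the runs that pass the metric as few-shot demonstrations. The plain-Python illustration below shows only that core idea and is not DSPy's implementation (the real optimizer also works with a teacher program, records traces for every predictor, and supports multi-step pipelines). The keyword-based `stub_predict` is a hypothetical stand-in for an LLM call:

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def bootstrap_demos(predict: Callable[[str], str],
                    trainset: List[Example],
                    metric: Callable[[Example, str], bool],
                    max_demos: int = 4) -> List[Example]:
    """Keep examples where the current program's own output passes the metric."""
    demos: List[Example] = []
    for example in trainset:
        prediction = predict(example["text"])
        if metric(example, prediction):
            demos.append({"text": example["text"], "sentiment": prediction})
            if len(demos) == max_demos:
                break
    return demos


def few_shot_prompt(demos: List[Example], text: str) -> str:
    """Render the selected demos in front of the new input."""
    lines = ["Classify the sentiment as positive, negative, or neutral.", ""]
    for demo in demos:
        lines += [f"Text: {demo['text']}", f"Sentiment: {demo['sentiment']}", ""]
    lines += [f"Text: {text}", "Sentiment:"]
    return "\n".join(lines)


def stub_predict(text: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return "positive" if "great" in text.lower() else "negative"


trainset = [
    {"text": "Great product!", "sentiment": "positive"},
    {"text": "Terrible experience.", "sentiment": "negative"},
    {"text": "It arrived on Tuesday.", "sentiment": "neutral"},
]
demos = bootstrap_demos(stub_predict, trainset, lambda ex, pred: ex["sentiment"] == pred)
print(few_shot_prompt(demos, "Works as advertised."))
```

The third example is dropped because the stub mislabels it, so only demonstrations the program already handles correctly end up in the prompt.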

4.5 Advantages of DSPy

  • Programmable: Prompt logic expressed in code, testable and versionable
  • Auto-optimized: Optimizers tune instructions and demonstrations against a metric, reducing manual prompt tweaking
  • Composable: Modular design supports complex pipelines
  • LLM-agnostic: Switching models means re-compiling the program rather than rewriting prompts by hand

5. Automatic Prompt Optimization

5.1 APE (Automatic Prompt Engineer)

APE uses an LLM to search automatically for high-performing prompts:

1. Given a task description and evaluation dataset
2. Use an LLM to generate candidate prompts
3. Test each prompt on the evaluation set
4. Select the best-performing prompt
5. Iterate and improve
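The APE loop above can be sketched as a small search procedure. In this sketch, `propose` stands in for the LLM that writes candidate prompts and `score` for accuracy on the evaluation set; both are assumptions for illustration, and the keyword-based toy scorer only keeps the example runnable offline. It says nothing about which prompts work in practice:

```python
from typing import Callable, List, Tuple


def ape_search(propose: Callable[[str], List[str]],
               score: Callable[[str], float],
               seeds: List[str], rounds: int = 2, keep: int = 2) -> Tuple[str, float]:
    """APE-style loop: score candidates, keep the best, ask for variations, repeat."""
    scored = {p: score(p) for p in seeds}
    for _ in range(rounds):
        top = sorted(scored, key=scored.get, reverse=True)[:keep]
        for parent in top:
            for candidate in propose(parent):
                if candidate not in scored:  # evaluate each distinct prompt once
                    scored[candidate] = score(candidate)
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy stand-ins (hypothetical): an LLM would write the variations, and the
# score would be measured on a held-out evaluation set.
def toy_propose(prompt: str) -> List[str]:
    return [prompt + " Answer in one word.",
            prompt.replace("Classify the sentiment", "Determine if positive or negative")]


def toy_score(prompt: str) -> float:
    return 0.7 + 0.05 * ("positive or negative" in prompt) + 0.05 * ("one word" in prompt)


print(ape_search(toy_propose, toy_score, ["Classify the sentiment."]))
```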

5.2 OPRO (Optimization by PROmpting)

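OPRO uses the LLM itself as the optimizer: a meta-prompt lists previously tried prompts with their scores and asks the model for a better one. For example: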

Here are some prompts and their performance scores:

Prompt: "Classify the sentiment" → Score: 0.72
Prompt: "Determine if positive or negative" → Score: 0.78
Prompt: "Analyze the emotional tone" → Score: 0.75

Based on the above information, generate a new prompt that might perform better.
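The meta-prompt can be rebuilt from a growing history of (prompt, score) pairs. A minimal sketch; `build_opro_meta_prompt` and `top_k` are hypothetical, and sorting in ascending order so that the strongest prompts appear last, closest to the instruction, is a design choice of this sketch:

```python
def build_opro_meta_prompt(history: list, top_k: int = 20) -> str:
    """Render the top_k (prompt, score) pairs, worst to best, followed by the instruction."""
    ranked = sorted(history, key=lambda pair: pair[1])[-top_k:]
    lines = ["Here are some prompts and their performance scores:", ""]
    for prompt, score in ranked:
        lines.append(f'Prompt: "{prompt}" → Score: {score:.2f}')
    lines += ["", "Based on the above information, generate a new prompt that might perform better."]
    return "\n".join(lines)


history = [
    ("Classify the sentiment", 0.72),
    ("Determine if positive or negative", 0.78),
    ("Analyze the emotional tone", 0.75),
]
print(build_opro_meta_prompt(history))
```

In a full OPRO loop, each newly generated prompt is scored on the evaluation set, appended to `history`, and the meta-prompt is rebuilt for the next step.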

5.3 Practical Prompt Optimization Workflow

1. Define evaluation metrics and test set
2. Write an initial prompt
3. Evaluate on the test set
4. Analyze failure cases
5. Modify the prompt (manual + automatic)
6. Repeat 3-5 until requirements are met
7. A/B test before production rollout
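Steps 3 and 4 of this workflow (evaluate, then analyze failures) boil down to running the prompt over a labeled test set and collecting the misses. A minimal sketch; `evaluate_prompt` is a hypothetical helper, exact-match scoring is an assumption that suits classification-style tasks, and `fake_run` stands in for a real model call:

```python
from typing import Callable, List, Tuple


def evaluate_prompt(run: Callable[[str, str], str], prompt: str,
                    test_set: List[Tuple[str, str]]):
    """Return (accuracy, failure_cases) for one prompt on a labeled test set."""
    failures = []
    correct = 0
    for text, expected in test_set:
        output = run(prompt, text).strip().lower()
        if output == expected.lower():
            correct += 1
        else:
            failures.append({"input": text, "expected": expected, "got": output})
    return correct / len(test_set), failures


def fake_run(prompt: str, text: str) -> str:
    # Hypothetical stand-in for an LLM call: ignores the prompt, keyword rules only
    return "Positive" if "love" in text.lower() else "negative"


test_set = [("I love it", "positive"), ("Broke in a day", "negative"), ("Not bad at all", "positive")]
accuracy, failures = evaluate_prompt(fake_run, "Classify the sentiment", test_set)
print(accuracy, failures)
```

The failure list is exactly what step 4 inspects and what step 5 feeds back into the next prompt revision.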

6. Meta-Prompting

6.1 Concept

Meta-prompting uses LLMs to generate, evaluate, and improve prompts; in essence, it is "prompts for writing prompts."

6.2 Prompt Generator

You are a prompt engineering expert. The user will describe a task, and you need to:

1. Analyze the key elements of the task
2. Generate 3 prompts in different styles
3. Evaluate the strengths and weaknesses of each prompt
4. Recommend the best prompt

Task description: {task_description}

Please generate prompts suitable for use with GPT-4.

6.3 Prompt Evaluator

Please evaluate the quality of the following prompt:

Prompt: {prompt_to_evaluate}
Target task: {task_description}

Evaluation dimensions:
1. Clarity (1-10): Are instructions unambiguous?
2. Completeness (1-10): Does it cover all necessary information?
3. Formatting (1-10): Is the output format clear?
4. Robustness (1-10): Can it handle abnormal inputs?
5. Efficiency (1-10): Token usage efficiency

Overall score and improvement suggestions:
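To feed the evaluator's output into the next iteration, the scores need to be machine-readable. A minimal parser sketch, assuming the model answers with one `<Dimension> ...: <score>` line per dimension, as in the template; `parse_evaluation` is hypothetical, and the unweighted mean stored as `overall` is an illustrative choice, not part of the template:

```python
import re

DIMENSIONS = ["Clarity", "Completeness", "Formatting", "Robustness", "Efficiency"]
SCORE_RE = re.compile(r"\b(" + "|".join(DIMENSIONS) + r")\b[^:\n]*:\s*(\d+(?:\.\d+)?)")


def parse_evaluation(reply: str) -> dict:
    """Extract per-dimension scores and add their unweighted mean as 'overall'."""
    scores = {dim: float(val) for dim, val in SCORE_RE.findall(reply)}
    if scores:
        scores["overall"] = sum(scores.values()) / len(scores)
    return scores


reply = """1. Clarity (1-10): 8
2. Completeness (1-10): 6
3. Formatting (1-10): 9
4. Robustness (1-10): 5
5. Efficiency (1-10): 7
Overall: add an example of the expected output."""

print(parse_evaluation(reply))
```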

6.4 Prompt Iterator

Current prompt: {current_prompt}
Evaluation results: {evaluation_results}
Failure cases: {failure_cases}

Please improve the prompt based on the above information:
1. Fix issues causing the failure cases
2. Preserve the strengths of the original prompt
3. Improve overall robustness

Improved prompt:
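The evaluator (6.3) and iterator (6.4) templates can be wired into a loop that stops once no failure cases remain. A minimal sketch: the evaluator template is abbreviated, `llm` is any function that sends a prompt and returns text, and `find_failures` is a hypothetical callback that would, in practice, run the prompt on a test set and return the failing cases:

```python
from typing import Callable, List

EVALUATOR_TEMPLATE = (
    "Please evaluate the quality of the following prompt:\n\n"
    "Prompt: {prompt_to_evaluate}\n"
    "Target task: {task_description}\n\n"
    "Overall score and improvement suggestions:"  # scoring dimensions omitted for brevity
)
ITERATOR_TEMPLATE = (
    "Current prompt: {current_prompt}\n"
    "Evaluation results: {evaluation_results}\n"
    "Failure cases: {failure_cases}\n\n"
    "Please improve the prompt based on the above information:\n"
    "1. Fix issues causing the failure cases\n"
    "2. Preserve the strengths of the original prompt\n"
    "3. Improve overall robustness\n\n"
    "Improved prompt:"
)


def refine_prompt(llm: Callable[[str], str], prompt: str, task: str,
                  find_failures: Callable[[str], List[str]], max_rounds: int = 3) -> str:
    """Evaluate and iterate until no failure cases remain or the round budget runs out."""
    for _ in range(max_rounds):
        failures = find_failures(prompt)
        if not failures:
            break
        evaluation = llm(EVALUATOR_TEMPLATE.format(prompt_to_evaluate=prompt, task_description=task))
        prompt = llm(ITERATOR_TEMPLATE.format(current_prompt=prompt,
                                              evaluation_results=evaluation,
                                              failure_cases="\n".join(failures))).strip()
    return prompt


calls = []


def fake_llm(text: str) -> str:
    """Hypothetical stand-in for a model call."""
    calls.append(text)
    if text.startswith("Current prompt:"):
        return "Summarize the article. Respond with exactly 3 bullet points."
    return "Clarity: 6. The output format is unspecified."


def find_failures(p: str) -> List[str]:
    return [] if "bullet points" in p else ["Output was a long paragraph"]


final = refine_prompt(fake_llm, "Summarize the article.", "3-point summary", find_failures)
print(final)
```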

7. Advanced Techniques Summary

7.1 Prompt Chaining

Decompose complex tasks into chained prompt calls:

Step 1: Analyze → Extract key information
Step 2: Plan → Develop action plan
Step 3: Execute → Generate final output
Step 4: Review → Check and correct
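A prompt chain is a loop in which each step's output becomes the next step's input. In this sketch, the step wording and the `run_chain` helper are illustrative assumptions, and `fake_llm` (which just echoes the first line of its prompt in brackets) stands in for a real model call:

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (step name, prompt template with an {input} slot)

STEPS: List[Step] = [
    ("Analyze", "Extract the key information from this task:\n{input}"),
    ("Plan", "Develop a step-by-step action plan from this information:\n{input}"),
    ("Execute", "Carry out this plan and produce the final output:\n{input}"),
    ("Review", "Check this output for errors and return a corrected version:\n{input}"),
]


def run_chain(llm: Callable[[str], str], steps: List[Step], task: str) -> List[Tuple[str, str]]:
    """Feed each step's output into the next step's prompt; return all intermediate outputs."""
    outputs = []
    current = task
    for name, template in steps:
        current = llm(template.format(input=current))
        outputs.append((name, current))
    return outputs


calls = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in: records the prompt and echoes its first line."""
    calls.append(prompt)
    return "[" + prompt.split("\n", 1)[0] + "]"


for name, output in run_chain(fake_llm, STEPS, "Write a launch announcement."):
    print(name, "->", output)
```

Keeping the intermediate outputs makes each step inspectable on its own. In a real pipeline the Review step would usually also receive the original task so it can check the output against it; the sketch omits that for brevity.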

7.2 Role-Play Enhancement

Have 3 experts analyze this problem separately:
- Expert A (Data Scientist): Analyze from a data perspective
- Expert B (Product Manager): Analyze from a user needs perspective
- Expert C (Security Engineer): Analyze from a security perspective

Then synthesize the opinions of all 3 experts to provide a final recommendation.

7.3 Constraint Escalation

When LLM output does not meet requirements, progressively add constraints:

# First attempt
Summarize this article.

# Second attempt (add constraints)
Summarize this article in 3 bullet points, each no more than 20 words.

# Third attempt (further constraints)
Summarize this article in 3 bullet points, each no more than 20 words.
Format:
- Point 1: [content]
- Point 2: [content]
- Point 3: [content]
Output only the bullet points, do not add any other content.
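The escalation can be automated by pairing the prompt levels with a programmatic check and moving to a stricter prompt only when the check fails. A minimal sketch; `meets_requirements` encodes the constraints of the third prompt, `escalate` is a hypothetical helper, and `fake_llm` is a stand-in for a model call that ignores the article:

```python
from typing import Callable, List, Optional, Tuple

PROMPTS = [
    "Summarize this article.",
    "Summarize this article in 3 bullet points, each no more than 20 words.",
    ("Summarize this article in 3 bullet points, each no more than 20 words.\n"
     "Format:\n- Point 1: [content]\n- Point 2: [content]\n- Point 3: [content]\n"
     "Output only the bullet points, do not add any other content."),
]


def meets_requirements(output: str) -> bool:
    """Exactly 3 non-empty lines, each a '- ' bullet of at most 20 words."""
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    if len(lines) != 3:
        return False
    return all(line.startswith("- ") and len(line[2:].split()) <= 20 for line in lines)


def escalate(llm: Callable[[str], str], article: str,
             prompts: List[str] = PROMPTS) -> Optional[Tuple[int, str]]:
    """Try prompts from least to most constrained; return (level, output) of the first pass."""
    for level, prompt in enumerate(prompts):
        output = llm(f"{prompt}\n\n{article}")
        if meets_requirements(output):
            return level, output
    return None  # every level failed: time for a different approach


def fake_llm(prompt: str) -> str:
    if "Output only the bullet points" in prompt:
        return "- Point 1: Costs fell.\n- Point 2: Demand rose.\n- Point 3: Margins held."
    if "3 bullet points" in prompt:
        return "Here is the summary:\n- Costs fell.\n- Demand rose.\n- Margins held."
    return "The article discusses costs, demand and margins at length."


print(escalate(fake_llm, "ARTICLE TEXT"))
```

Starting with the loosest prompt and tightening only on failure keeps prompts short when the model already complies.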

8. Summary

Technique | Core Idea | Use Cases | Complexity
Self-Consistency | Multiple sampling + voting | Math/logical reasoning | Medium
Tree of Thoughts | Tree exploration + evaluation | Planning/creative/complex reasoning | High
RAG-Augmented Prompt | Retrieval + context injection | Knowledge-intensive tasks | Medium
DSPy | Programmatic prompt definition | Pipelines needing auto-optimization | Medium-High
APE/OPRO | Auto-search for optimal prompts | Large-scale prompt optimization | High
Meta-Prompting | LLM generates/evaluates prompts | Prompt development process | Low-Medium

References

  • Wang et al., "Self-Consistency Improves Chain of Thought Reasoning in Language Models", 2023
  • Yao et al., "Tree of Thoughts: Deliberate Problem Solving with Large Language Models", 2023
  • Khattab et al., "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines", 2023
  • Chain-of-Thought and Reasoning Patterns — Reasoning techniques in Agents
  • Prompt Design Fundamentals — Foundational prompt techniques
