# Vector Database in Practice

## 1. Embedding Models

### 1.1 Embedding Models Overview
Embedding models convert text into high-dimensional vectors so that semantically similar texts are closer in vector space. Choosing the right embedding model is key to RAG system performance.
### 1.2 Major Embedding Model Comparison
| Model | Dimensions | Max Length | Chinese Support | Features |
|---|---|---|---|---|
| OpenAI text-embedding-ada-002 | 1536 | 8191 tokens | Good | Versatile, API-based |
| OpenAI text-embedding-3-small | 1536 | 8191 tokens | Good | Cost-effective |
| OpenAI text-embedding-3-large | 3072 | 8191 tokens | Good | Highest accuracy |
| BGE-large-zh | 1024 | 512 tokens | Excellent | Top open-source Chinese model |
| BGE-M3 | 1024 | 8192 tokens | Excellent | Multi-lingual, multi-granularity, multi-functional |
| E5-large-v2 | 1024 | 512 tokens | Good | From Microsoft, strong performance |
| E5-mistral-7b | 4096 | 32768 tokens | Good | LLM-based embedding model |
| Cohere embed-v3 | 1024 | 512 tokens | Good | Supports retrieval/classification/clustering |
| Jina-embeddings-v2 | 768 | 8192 tokens | Good | Good long-text support |
### 1.3 Selection Recommendations
- Chinese scenarios: BGE-M3 or BGE-large-zh
- Multilingual: BGE-M3 or Cohere embed-v3
- Quick prototyping: OpenAI text-embedding-3-small
- Highest accuracy: OpenAI text-embedding-3-large
- Long text: E5-mistral-7b or Jina-embeddings-v2
- Private deployment: BGE series or E5 series
### 1.4 Usage Examples

```python
# OpenAI Embeddings
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["What is a vector database?", "Vector database introduction"],
)
embeddings = [item.embedding for item in response.data]
```

```python
# BGE Embeddings (local deployment)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
texts = ["What is a vector database?", "Principles of vector retrieval"]
embeddings = model.encode(texts, normalize_embeddings=True)
```
## 2. Vector Database Comparison

### 2.1 Major Vector Databases

#### ChromaDB — Lightweight Starter

```python
import chromadb

# Create client
client = chromadb.PersistentClient(path="./chroma_db")

# Create collection
collection = client.create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"},
)

# Add documents
collection.add(
    documents=["Document 1 content", "Document 2 content"],
    metadatas=[{"source": "file1"}, {"source": "file2"}],
    ids=["id1", "id2"],
)

# Query
results = collection.query(
    query_texts=["search keyword"],
    n_results=5,
)
```
Features: Embedded, Python-native, zero configuration, ideal for prototyping
#### Pinecone — Fully Managed Service

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("documents")

# Upsert (embedding1/embedding2 are precomputed 1536-dim vectors)
index.upsert(
    vectors=[
        {"id": "id1", "values": embedding1, "metadata": {"text": "..."}},
        {"id": "id2", "values": embedding2, "metadata": {"text": "..."}},
    ]
)

# Query
results = index.query(vector=query_embedding, top_k=5, include_metadata=True)
```
Features: Fully managed, zero-ops, auto-scaling, enterprise SLA
#### Milvus — Scalable Open-Source Solution

```python
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

# Connect
connections.connect("default", host="localhost", port="19530")

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
]
schema = CollectionSchema(fields)

# Create collection
collection = Collection("documents", schema)

# Create index
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 256},
    },
)

# Search (query_embedding is a precomputed 1536-dim query vector)
collection.load()
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    output_fields=["text"],
)
```
Features: Distributed architecture, billion-scale vectors, GPU acceleration, production-grade reliability
#### pgvector — PostgreSQL Extension

```sql
-- Install extension
CREATE EXTENSION vector;

-- Create table
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)
);

-- Create HNSW index using cosine distance
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Insert data (the vector literal is elided here)
INSERT INTO documents (content, embedding)
VALUES ('Document content', '[0.1, 0.2, ...]');

-- Query: <=> is cosine distance, so 1 - distance is cosine similarity;
-- query_embedding stands for a bound vector parameter
SELECT content, 1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5;
```
Features: Leverages existing PostgreSQL infrastructure, SQL queries, transaction support, joint queries with relational data
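From application code, the companion `pgvector` Python package adapts these queries; a minimal sketch, assuming psycopg 3 and a database that already contains the `documents` table above (the connection string is illustrative):

```python
# Sketch: querying pgvector from Python via psycopg 3 + the pgvector package.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=mydb")  # illustrative connection string
register_vector(conn)  # register the vector type with psycopg

# Stand-in for a real query embedding of dimension 1536
query_embedding = np.random.rand(1536).astype(np.float32)

rows = conn.execute(
    "SELECT content, 1 - (embedding <=> %s) AS similarity "
    "FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (query_embedding, query_embedding),
).fetchall()
```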
#### Weaviate — Semantic Search Engine

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# Create schema
client.schema.create_class({
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "content", "dataType": ["text"]},
        {"name": "source", "dataType": ["string"]},
    ],
})

# Add data (auto-vectorization)
client.data_object.create(
    {"content": "Document content", "source": "file1"},
    class_name="Document",
)

# Semantic search
result = (
    client.query.get("Document", ["content", "source"])
    .with_near_text({"concepts": ["search keyword"]})
    .with_limit(5)
    .do()
)
```
Features: Built-in vectorization, GraphQL API, hybrid search, modular architecture
### 2.2 Comparison Summary
| Database | Deployment | Scalability | Ease of Use | Use Cases |
|---|---|---|---|---|
| ChromaDB | Embedded/local | Low | Very high | Prototyping, small-scale apps |
| Pinecone | Fully managed | High | High | Enterprise, zero-ops |
| Milvus | Self-hosted/cloud | Very high | Medium | Large-scale, high-performance |
| pgvector | PostgreSQL extension | Medium | High | Existing PG infrastructure |
| Weaviate | Self-hosted/cloud | High | High | Semantic search, multimodal |
## 3. Similarity Search

### 3.1 Distance Metrics
#### Cosine Similarity

\[\text{cosine}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{||\mathbf{a}|| \cdot ||\mathbf{b}||}\]

- Range: [-1, 1], where 1 means identical direction
- Use case: Text semantic similarity (most common)
- Insensitive to vector magnitude
#### Dot Product
\[\text{dot}(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b} = \sum_{i} a_i \cdot b_i\]
- Range: Unbounded
- Use case: Normalized vectors (equivalent to cosine similarity)
- Fastest computation
#### Euclidean Distance

\[L_2(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i}(a_i - b_i)^2}\]

- Range: [0, +∞), where 0 means identical
- Use case: Scenarios where vector magnitude matters
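All three metrics are one-liners in NumPy; the sketch below also shows why the dot product of L2-normalized vectors is equivalent to cosine similarity:

```python
# Sketch: the three distance metrics in NumPy.
import numpy as np

a = np.array([0.1, 0.8, 0.3])
b = np.array([0.2, 0.7, 0.5])

cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = a @ b
euclidean = np.linalg.norm(a - b)

# After L2 normalization, the dot product equals cosine similarity.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(a_n @ b_n, cosine)
```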
### 3.2 Selection Recommendations
- Text retrieval: Cosine similarity (recommended)
- Normalized embeddings: Dot product (faster)
- Image/multimodal: Euclidean distance
## 4. Index Types

### 4.1 HNSW (Hierarchical Navigable Small World)

Hierarchical structure:

```text
Level 2: *-----------------*
Level 1: *---*---*---*---*
Level 0: ****************
```
- Principle: Builds a multi-layer graph structure; search starts from the top layer and descends
- Pros: Fast query speed, high accuracy
- Cons: High memory usage (graph structure must be stored)
- Parameters (see the FAISS sketch below):
  - `M`: number of connections per node (16-64)
  - `efConstruction`: search width during construction (higher = more accurate but slower)
  - `ef`: search width during query
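As a concrete illustration of these parameters, a minimal HNSW sketch using FAISS (assuming `faiss-cpu` is installed; the dimension and parameter values are illustrative):

```python
# Sketch: HNSW index in FAISS (pip install faiss-cpu).
import numpy as np
import faiss

d = 768                             # vector dimension (illustrative)
index = faiss.IndexHNSWFlat(d, 32)  # M = 32 connections per node
index.hnsw.efConstruction = 200     # search width while building the graph
index.hnsw.efSearch = 64            # search width at query time (ef)

index.add(np.random.rand(10000, d).astype("float32"))
distances, ids = index.search(np.random.rand(1, d).astype("float32"), 5)
```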
### 4.2 IVF (Inverted File Index)

```text
Cluster centers: C1, C2, C3, ...
C1 → [v1, v5, v12, ...]
C2 → [v2, v7, v15, ...]
C3 → [v3, v8, v20, ...]
```
- Principle: Clusters vectors first; during query, only searches the nearest clusters
- Pros: Memory efficient
- Cons: Lower accuracy than HNSW
- Parameters (see the sketch below):
  - `nlist`: number of clusters
  - `nprobe`: number of clusters to search during query
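And the IVF counterpart, again sketched with FAISS (values illustrative):

```python
# Sketch: IVF index in FAISS.
import numpy as np
import faiss

d, nlist = 768, 100                        # dimension and cluster count (illustrative)
quantizer = faiss.IndexFlatL2(d)           # assigns vectors to cluster centers
index = faiss.IndexIVFFlat(quantizer, d, nlist)

vectors = np.random.rand(50000, d).astype("float32")
index.train(vectors)                       # k-means to learn the cluster centers
index.add(vectors)

index.nprobe = 10                          # clusters searched per query
distances, ids = index.search(np.random.rand(1, d).astype("float32"), 5)
```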
### 4.3 Index Selection Recommendations
| Data Scale | Recommended Index | Notes |
|---|---|---|
| < 100K | Flat (brute force) | Small data, exact search |
| 100K-1M | HNSW | Best balance of accuracy and speed |
| 1M-10M | IVF + HNSW | Layered indexing |
| > 10M | IVF + PQ | Compressed vectors, saves memory |
## 5. Practical Recommendations

### 5.1 Performance Optimization
- Batch insertion: Use batch APIs instead of inserting one by one (see the sketch after this list)
- Index warm-up: Load the index into memory before querying
- Async queries: Use async APIs for higher concurrency
- Cache hot spots: Cache results of frequently queried terms
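As an example of the first point, a batched-insert sketch for ChromaDB (the helper name and batch size are illustrative):

```python
# Sketch: insert documents in batches rather than one add() call each.
def add_in_batches(collection, documents, metadatas, ids, batch_size=500):
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(
            documents=documents[start:end],
            metadatas=metadatas[start:end],
            ids=ids[start:end],
        )
```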
### 5.2 Data Management
- Metadata filtering: Use metadata to narrow the search scope (see the sketch after this list)
- Incremental updates: Support adding, updating, and deleting documents
- Data backup: Regularly back up vector data
- Version management: Track embedding model versions; re-embed when the model changes
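Metadata filtering in ChromaDB, for instance, is a `where` argument on the query (the field name is illustrative):

```python
# Sketch: a metadata filter narrows the candidates before vector search.
results = collection.query(
    query_texts=["search keyword"],
    n_results=5,
    where={"source": "file1"},  # only documents whose metadata matches
)
```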
### 5.3 Monitoring Metrics
- Query latency (P50, P95, P99)
- Recall@K (fraction of true nearest neighbors retrieved; see the sketch after this list)
- Index size and memory usage
- QPS (queries per second)
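Recall@K is measured against exact (brute-force) search on the same data; a minimal sketch of the computation (the function name is illustrative):

```python
# Sketch: Recall@K = fraction of exact top-K neighbors the ANN index returned.
def recall_at_k(ann_ids, exact_ids, k):
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k
```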
## References
- MTEB Leaderboard (Massive Text Embedding Benchmark)
- ANN Benchmarks (ann-benchmarks.com)
- RAG Architecture Design — Overall RAG pipeline architecture
- RAG Evaluation and Optimization — Retrieval quality evaluation methods