Vector Databases and Embeddings: Powering Search and AI Features on Your Hosting Stack
Traditional Search Matches Keywords — Semantic Search Understands Meaning
A user types "how to make my site load faster" into your search bar. Traditional keyword search looks for documents containing those exact words. It might return a page titled "How to Make Your Site Load Faster" but miss an excellent article called "Reducing Time to First Byte and Optimising Core Web Vitals" because the words do not overlap enough. Semantic search, powered by vector embeddings, understands that these topics are closely related — even though they share almost no keywords — because it operates on meaning rather than literal word matching.
For hosting platforms adding search, recommendations, or AI features, vector databases and embeddings are the foundational infrastructure. This guide covers what embeddings are, how vector databases store and query them, how to implement semantic search with pgvector in your existing PostgreSQL stack, and the practical patterns for AI-powered features built on similarity search.
What Are Vector Embeddings?
An embedding is a numerical representation of content — text, images, or any data — as a list of numbers (a vector). Embedding models are trained so that similar content produces similar vectors. The vector for "web performance optimisation" is close to the vector for "site speed improvement" in the embedding space because they represent similar concepts. The distance between vectors corresponds to the semantic similarity between the content they represent.
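The distance idea can be made concrete with a toy example. The snippet below uses hand-made 4-dimensional vectors as stand-ins for real embeddings (real models produce hundreds of dimensions), purely to show how cosine similarity separates related from unrelated topics:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 for similar directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative toy vectors, not real model output.
web_performance = [0.9, 0.8, 0.1, 0.0]
site_speed      = [0.8, 0.9, 0.2, 0.1]
dns_records     = [0.1, 0.0, 0.9, 0.8]

print(cosine_similarity(web_performance, site_speed))   # high: related topics
print(cosine_similarity(web_performance, dns_records))  # low: unrelated topics
```

A real embedding model does the hard part: placing semantically related text near each other in the space so that this simple arithmetic becomes meaningful.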
How Embeddings Are Generated
You pass your content (a sentence, a paragraph, an article) through an embedding model, and it returns a vector — typically 384 to 1536 dimensions depending on the model. Models like OpenAI's text-embedding-3-small, Cohere's embed-v3, and open-source options like BGE and E5 produce high-quality embeddings for text. The model runs once during indexing (to convert your content into vectors) and once per query (to convert the search query into a vector).
Vector Databases: Storing and Querying Embeddings
Regular databases are designed for exact matching: find the row where id = 123. Vector databases are designed for approximate nearest neighbour (ANN) search: find the vectors closest to this query vector. This is a fundamentally different operation that requires specialised indexing structures.
Dedicated Vector Databases
Pinecone, Weaviate, Qdrant, Milvus, and ChromaDB are purpose-built for vector storage and similarity search. They offer high query performance, built-in metadata filtering, and scaling features. The trade-off is operational complexity — another database to manage, another service to monitor and maintain.
pgvector: Vectors in PostgreSQL
For hosting customers already running PostgreSQL, pgvector adds vector storage and similarity search as a PostgreSQL extension. You store embeddings in a vector column alongside your regular data, create an index for fast similarity search, and query using PostgreSQL's familiar SQL syntax. No additional database, no new operational burden.
pgvector supports multiple distance metrics (cosine similarity, Euclidean distance, inner product) and index types (IVFFlat for moderate datasets, HNSW for high-performance queries on larger datasets). For most hosting applications — search across thousands to low millions of documents — pgvector provides excellent performance without the complexity of a dedicated vector database.
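pgvector exposes these metrics as SQL operators: <-> for Euclidean (L2) distance, <#> for negative inner product, and <=> for cosine distance. The plain-Python sketch below (no database needed) shows what each operator computes:

```python
import math

def l2_distance(a, b):                      # pgvector: a <-> b
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_inner_product(a, b):                # pgvector: a <#> b
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):                  # pgvector: a <=> b
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]    # same direction, different magnitude
print(cosine_distance(a, b))               # ~0: identical direction
print(l2_distance(a, b))                   # nonzero: magnitudes differ
```

Cosine distance ignores vector magnitude and only compares direction, which is why it is the usual default for text embeddings.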
Building Semantic Search with pgvector
Step 1: Generate Embeddings for Your Content
Process your existing content (help articles, product descriptions, blog posts, support tickets) through an embedding model. Store the resulting vectors in a vector column in your PostgreSQL table alongside the content they represent.
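As a sketch of this step, the snippet below uses stand-ins for the real pieces: embed() is a placeholder for a call to an embedding model (OpenAI, Cohere, a local BGE/E5 model), and a Python list stands in for the PostgreSQL table with its vector column.

```python
def embed(text):
    # Placeholder: a real implementation would call an embedding model
    # and get back a 384- to 1536-dimensional vector. Here: a crude
    # character-frequency vector, just so the example runs end to end.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

# Stands in for: CREATE TABLE docs (id ..., body text, embedding vector(...))
documents = []

def index_document(doc_id, body):
    documents.append({"id": doc_id, "body": body, "embedding": embed(body)})

index_document(1, "Reducing Time to First Byte")
index_document(2, "Configuring DNS records")
print(len(documents), len(documents[0]["embedding"]))  # 2 documents, 26-dim vectors
```

In production, index_document would be an INSERT (or UPSERT) with the embedding cast to pgvector's vector type.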
Step 2: Index the Vectors
Create an HNSW or IVFFlat index on the vector column. HNSW (Hierarchical Navigable Small World) provides the best query performance and recall for most workloads. The index build takes time proportional to the dataset size, but once the index is built, query latency is typically sub-millisecond for hosting-scale datasets.
Step 3: Query by Similarity
When a user searches, generate an embedding for their query and find the nearest vectors. The SQL query uses pgvector's distance operators: ORDER BY embedding <=> query_vector LIMIT 10 returns the ten most semantically similar documents. Combine with standard SQL filtering for metadata constraints (category, date range, status) to narrow results.
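The logic of that query can be sketched in plain Python. The brute-force scan below is exactly what the HNSW index exists to avoid at scale, but it shows what ORDER BY embedding <=> query_vector computes; the rows and the search() helper are illustrative.

```python
import math

def cosine_distance(a, b):                  # pgvector: a <=> b
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

# Stand-in for table rows with a vector column plus metadata.
rows = [
    {"id": 1, "category": "performance", "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "category": "dns",         "embedding": [0.0, 0.2, 0.9]},
    {"id": 3, "category": "performance", "embedding": [0.8, 0.3, 0.1]},
]

def search(query_vec, category=None, limit=10):
    # Equivalent of: SELECT id FROM docs WHERE category = %s
    #                ORDER BY embedding <=> %s LIMIT %s
    candidates = [r for r in rows if category is None or r["category"] == category]
    candidates.sort(key=lambda r: cosine_distance(r["embedding"], query_vec))
    return [r["id"] for r in candidates[:limit]]

print(search([1.0, 0.0, 0.0], category="performance"))  # [1, 3]
```

The metadata filter runs as an ordinary SQL WHERE clause alongside the distance ordering, which is one of pgvector's main conveniences over a separate vector store.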
Step 4: Combine with Full-Text Search
Semantic search and keyword search are complementary. Semantic search understands meaning but may miss exact term matches. Keyword search matches exact terms but misses semantic relationships. A hybrid approach runs both searches, normalises the scores, and combines them — giving you the best of both worlds. PostgreSQL's built-in full-text search and pgvector's similarity search can run in a single query.
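One widely used way to combine the two ranked result lists is reciprocal rank fusion (RRF), sketched below. Each list contributes 1 / (k + rank) to a document's fused score, so documents ranked highly by either method surface, and documents ranked highly by both rise to the top. The k=60 constant is the conventional default; the document ids are illustrative.

```python
def rrf_fuse(result_lists, k=60):
    # Reciprocal rank fusion over any number of ranked id lists.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results  = [7, 3, 9]   # ranked ids from full-text search
semantic_results = [3, 5, 7]   # ranked ids from vector search

print(rrf_fuse([keyword_results, semantic_results]))  # [3, 7, 5, 9]
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that full-text relevance scores and vector distances live on incomparable scales.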
AI Features Built on Vector Search
Retrieval-Augmented Generation (RAG)
The most common application: when a user asks a question, retrieve the most relevant documents via vector similarity search and include them as context for an LLM to generate an answer. The LLM's response is grounded in your actual content rather than its general training data. pgvector serves as the retrieval layer in this architecture.
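A minimal sketch of the retrieval-and-prompt step, with retrieve() standing in for the pgvector similarity query and a deliberately simple prompt format:

```python
def retrieve(question):
    # Placeholder for a pgvector similarity query over your documents;
    # returns the most relevant passages for the question.
    return [
        "Enable gzip or brotli compression at the web server.",
        "Serve static assets from a CDN close to your users.",
    ]

def build_prompt(question):
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

prompt = build_prompt("How do I make my site load faster?")
print(prompt)
```

The assembled prompt is then sent to the LLM of your choice; because the context comes from your own content, the answer is grounded in it.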
Recommendation Systems
Generate embeddings for your products, articles, or services. When a user views an item, find similar items by vector proximity. "Customers who read this article may also be interested in..." becomes a simple nearest-neighbour query against the embedding index.
Duplicate and Similar Content Detection
Find duplicate support tickets, identify similar documentation pages that should be consolidated, or detect near-duplicate content that may cause SEO issues. Vector similarity search identifies these relationships even when the text is paraphrased or restructured.
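A sketch of threshold-based duplicate detection: flag pairs whose embedding similarity exceeds a cutoff. The vectors and the 0.9 threshold are illustrative; in practice, tune the threshold against a sample of known duplicates from your own data.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def find_near_duplicates(items, threshold=0.9):
    # O(n^2) pairwise scan; for large datasets, run a nearest-neighbour
    # query per item against the vector index instead.
    pairs = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if cosine_similarity(items[i]["embedding"], items[j]["embedding"]) >= threshold:
                pairs.append((items[i]["id"], items[j]["id"]))
    return pairs

tickets = [
    {"id": "T-1", "embedding": [0.9, 0.1, 0.0]},
    {"id": "T-2", "embedding": [0.8, 0.2, 0.0]},   # paraphrase of T-1
    {"id": "T-3", "embedding": [0.0, 0.1, 0.9]},   # unrelated
]
print(find_near_duplicates(tickets))  # [('T-1', 'T-2')]
```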
Clustering and Categorisation
Cluster your content by embedding similarity to discover natural groupings. Support tickets cluster into topic areas. Blog posts cluster into content themes. These clusters inform content strategy, support triage, and product development priorities.
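As an illustration of the idea, the greedy single-pass clustering below groups vectors by similarity to a cluster seed. Production systems more often use k-means or density-based methods over the embeddings, but the principle is the same: nearby vectors form a topic.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def greedy_cluster(embeddings, threshold=0.8):
    # Assign each vector to the first cluster whose seed it is close to;
    # otherwise start a new cluster. Returns one label per vector.
    seeds, labels = [], []
    for vec in embeddings:
        for label, seed in enumerate(seeds):
            if cosine_similarity(vec, seed) >= threshold:
                labels.append(label)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels

vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
print(greedy_cluster(vectors))  # [0, 0, 1, 1]
```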
Operational Considerations
Embedding Model Selection
Model choice affects search quality, latency, and cost. Larger models (e.g., 1536 dimensions) produce higher-quality embeddings but require more storage and make queries slower. Smaller models (e.g., 384 dimensions) are faster and more storage-efficient, with modestly lower quality. For most hosting search applications, a mid-range model provides the best trade-off.
Keeping Embeddings Current
When content changes, regenerate its embedding. Implement a pipeline that triggers embedding regeneration when content is created, updated, or deleted. For frequently changing content, consider batch regeneration on a schedule rather than real-time updates to reduce embedding API costs.
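One simple way to skip unnecessary regeneration is to store a hash of the text alongside its embedding and only re-embed when the hash changes. The sketch below uses an in-memory dict and a placeholder embed() in place of the real table and model:

```python
import hashlib

stored_hashes = {}   # doc_id -> sha256 of the text last embedded
calls = 0            # counts embedding-model calls

def embed(text):
    global calls
    calls += 1
    return [float(len(text))]   # placeholder vector

def upsert_embedding(doc_id, text):
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if stored_hashes.get(doc_id) == digest:
        return False            # content unchanged: skip the API call
    stored_hashes[doc_id] = digest
    embed(text)                 # plus: UPDATE ... SET embedding = ...
    return True

upsert_embedding(1, "Original article body")
upsert_embedding(1, "Original article body")   # no-op, content unchanged
upsert_embedding(1, "Edited article body")     # regenerates
print(calls)  # 2
```

The same check works in a scheduled batch job: hash every document, compare against the stored hash, and re-embed only the changed rows.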
Performance Tuning
- Index parameters: HNSW's m and ef_construction parameters control the trade-off between index build time, query speed, and recall accuracy. Higher values improve recall but increase build time and memory usage.
- Dimensionality: Lower-dimensional embeddings query faster and use less storage. If your embedding model supports dimensionality reduction (Matryoshka embeddings), experiment with reduced dimensions.
- Partitioning: For very large datasets, partition the vector table by a natural key (content type, date range) to reduce the search space for each query.
The Bottom Line
Vector databases and embeddings are the infrastructure layer that powers semantic search, RAG, recommendations, and content intelligence. For hosting customers running PostgreSQL, pgvector provides this capability as an extension — no new database, no new operational complexity. Generate embeddings for your content, index them, and query by meaning instead of keywords. The improvement in search quality and the AI features it enables are substantial, and the implementation path is straightforward for any team comfortable with PostgreSQL.