API Performance and Pagination at Scale: Queries, Indexes, and Caching
APIs That Work at 100 Records Break at 100,000
Your API works perfectly during development. Endpoints respond in milliseconds, pagination feels smooth, and everything looks good. Then the production database grows to millions of rows, traffic increases, and suddenly those fast endpoints take seconds to respond. List endpoints time out. Pagination becomes unpredictable. The database's CPU spikes with every request.
API performance at scale is not about exotic optimization — it is about getting the fundamentals right: efficient queries, proper indexing, smart pagination, and strategic caching. This guide covers the practical patterns that keep APIs fast as data volumes and traffic grow.
The Problem with OFFSET Pagination
OFFSET/LIMIT pagination is the most common pattern and the most problematic at scale. The query SELECT * FROM orders ORDER BY created_at LIMIT 20 OFFSET 10000 looks simple, but the database must scan and discard 10,000 rows before returning the 20 you want. At offset 100,000, it scans and discards 100,000 rows. Performance degrades linearly with the page number.
For small datasets (thousands of records), OFFSET pagination is fine. For datasets that grow into hundreds of thousands or millions, it becomes a serious performance bottleneck — and users who navigate to deeper pages experience increasingly slow responses.
Cursor-Based Pagination: The Scalable Alternative
Cursor-based pagination (also called keyset pagination) eliminates the OFFSET problem entirely. Instead of saying "skip the first 10,000 rows," it says "give me the next 20 rows after this specific row." The cursor is a unique, sortable value from the last item of the previous page — typically a primary key or a timestamp.
The query becomes SELECT * FROM orders WHERE created_at < '2023-06-01T12:00:00' ORDER BY created_at DESC LIMIT 20. The database uses the index on created_at to jump directly to the right position — no scanning, no discarding. Performance is constant regardless of how deep into the dataset you paginate. One caveat: timestamps are not guaranteed unique, so production cursors typically pair the timestamp with the primary key (for example, WHERE (created_at, id) < (:ts, :id) ordered by created_at DESC, id DESC) so that ties cannot cause rows to be skipped or duplicated across pages.
Implementation
Your API response includes a cursor field — typically the ID or timestamp of the last item returned. The client passes this cursor in the next request to fetch the subsequent page. The API decodes the cursor and uses it in the WHERE clause.
For opaque cursors, Base64-encode the cursor value so clients treat it as an opaque token rather than constructing cursors themselves. This gives you flexibility to change the cursor format without breaking clients.
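To make this concrete, here is a minimal sketch using an in-memory SQLite database (which supports the row-value comparison in SQLite 3.15+). The table, column, and function names are illustrative, and the JSON-in-Base64 cursor format is just one reasonable choice:

```python
import base64
import json
import sqlite3

def encode_cursor(created_at, row_id):
    # Base64 makes the cursor opaque: clients echo it back, never parse it.
    raw = json.dumps({"c": created_at, "i": row_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token):
    data = json.loads(base64.urlsafe_b64decode(token.encode()))
    return data["c"], data["i"]

def fetch_page(conn, cursor_token=None, limit=20):
    # Compound cursor (created_at, id) so repeated timestamps cannot
    # skip or duplicate rows across pages.
    if cursor_token is None:
        rows = conn.execute(
            "SELECT id, created_at FROM orders "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        created_at, last_id = decode_cursor(cursor_token)
        rows = conn.execute(
            "SELECT id, created_at FROM orders "
            "WHERE (created_at, id) < (?, ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (created_at, last_id, limit),
        ).fetchall()
    # Only hand out a cursor when a full page came back; a short page
    # means we reached the end.
    next_cursor = encode_cursor(rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, f"2023-06-{i % 28 + 1:02d}") for i in range(1, 51)],  # deliberate ties
)
page1, cursor1 = fetch_page(conn)
page2, _ = fetch_page(conn, cursor1)
```

The deliberate timestamp ties in the sample data are exactly the case where a timestamp-only cursor would misbehave; the (created_at, id) pair keeps the pages disjoint.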
Trade-offs
Cursor-based pagination does not support jumping to arbitrary page numbers ("go to page 47"). It is forward-and-backward only. For most API use cases — infinite scroll, "load more" buttons, pagination through large result sets — this is perfectly acceptable. If your UI genuinely needs arbitrary page access, consider a hybrid approach: cursor-based pagination for the API with a separate endpoint that returns total count for UI purposes.
Indexing for API Queries
The right indexes can make a hundred-fold difference in query performance. For API endpoints, indexes should cover the columns used in filtering, sorting, and pagination.
Composite Indexes
If an endpoint filters by status and sorts by created_at, a composite index on (status, created_at) serves both the filter and the sort in a single index scan. Without this composite index, the database may filter by status using one index and then sort the results in memory — much slower for large result sets.
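You can verify this with the database's query plan. A small sketch against in-memory SQLite (table and index names are illustrative; the key signal is that no temporary B-tree is needed for the sort):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)"
)
conn.execute("CREATE INDEX idx_status_created ON orders (status, created_at)")

# EXPLAIN QUERY PLAN shows how SQLite will execute the statement.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM orders "
    "WHERE status = 'active' ORDER BY created_at DESC LIMIT 20"
).fetchall()
detail = " ".join(row[3] for row in plan)
```

With the composite index in place, the plan searches idx_status_created and contains no "USE TEMP B-TREE FOR ORDER BY" step: the filter and the sort are both served by one index scan.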
Covering Indexes
A covering index includes all the columns the query needs, eliminating the need to access the main table at all. If an endpoint returns only id, title, and created_at, an index on (status, created_at) INCLUDE (id, title) serves the entire query from the index. This is particularly effective for list endpoints that return a subset of columns.
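The INCLUDE syntax above is PostgreSQL's. SQLite has no INCLUDE clause, but listing the extra columns in the index key gives the same covering effect, and its query plan labels it explicitly. A sketch with illustrative names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, status TEXT, "
    "created_at TEXT, title TEXT, body TEXT)"
)
# No INCLUDE in SQLite: put title in the index key instead. The rowid
# (id) is stored in every index entry automatically.
conn.execute("CREATE INDEX idx_covering ON posts (status, created_at, title)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, title, created_at FROM posts "
    "WHERE status = 'published' ORDER BY created_at DESC"
).fetchall()
detail = plan[0][3]
```

Because every selected column lives in the index, SQLite reports a "COVERING INDEX" scan: the main table is never touched.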
Partial Indexes
If an API endpoint primarily queries a subset of data (e.g., active orders, published posts), a partial index that covers only those rows is smaller and faster than a full index. CREATE INDEX ON orders (created_at) WHERE status = 'active' indexes only active orders — perfect for an endpoint that lists active orders by date.
Query Optimization Patterns
Select Only What You Need
Avoid SELECT * in API queries. Fetch only the columns the endpoint returns. This reduces I/O, memory usage, and network transfer. For endpoints that return related data (joins), select only the columns needed from each table.
Avoid N+1 Queries
The N+1 problem occurs when your API fetches a list of items and then makes a separate query for each item's related data. For 100 items, this means 101 queries. Use JOINs, batch queries, or ORM eager loading to fetch related data in a single query or a small, fixed number of queries regardless of the item count.
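Without an ORM, the batch-query fix is a single IN (...) lookup followed by an in-memory join. A minimal sketch against in-memory SQLite (schema and names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 1)])

orders = conn.execute("SELECT id, customer_id FROM orders").fetchall()

# The N+1 version would run one customer query per order. Instead,
# collect the distinct IDs and fetch them all in a second, single query.
ids = sorted({cid for _, cid in orders})
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT id, name FROM customers WHERE id IN ({placeholders})",
    ids,
).fetchall()
customers = {cid: name for cid, name in rows}

# Join in memory: two queries total, regardless of how many orders.
result = [(oid, customers[cid]) for oid, cid in orders]
```

Two queries instead of N+1; the query count stays fixed as the page grows.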
Use COUNT Sparingly
Counting the total number of matching records (SELECT COUNT(*) FROM orders WHERE status = 'active') requires scanning all matching rows. For large tables, this is expensive. If your API returns a total count for pagination, consider whether it is truly needed. Many modern UIs use infinite scroll or "load more" patterns that do not require a total count. If you must provide a count, cache it with a short TTL rather than computing it on every request.
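A short-TTL count cache can be only a few lines. A sketch with an in-process dict (a real API would use Redis so all instances share the cached value; names and the 30-second TTL are illustrative):

```python
import time

class TtlCountCache:
    """Per-key value cache with a short TTL; a stand-in for Redis SETEX."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get_or_compute(self, key, compute):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]          # fresh hit: skip the expensive COUNT
        value = compute()            # e.g. SELECT COUNT(*) ... in a real API
        self._store[key] = (value, now + self.ttl)
        return value

calls = []
def expensive_count():
    calls.append(1)                  # tracks how often the "database" is hit
    return 42

cache = TtlCountCache(ttl_seconds=30)
first = cache.get_or_compute("orders:active:count", expensive_count)
second = cache.get_or_compute("orders:active:count", expensive_count)
```

The second request within the TTL never reaches the database; the count is at most 30 seconds stale, which is usually fine for a pagination footer.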
Caching API Responses
Not every API request needs to hit the database. Strategic caching reduces database load and improves response times dramatically.
Response-Level Caching
For endpoints that return the same data for all callers (public listings, category trees, configuration data), cache the entire serialized response in Redis. Serve subsequent requests from cache without touching the database. Set TTLs based on the data's freshness requirements — 60 seconds for frequently changing data, minutes or hours for relatively static data.
Query-Level Caching
Cache the results of individual database queries. When the API layer receives a request, check the cache first. If the result exists and is fresh, return it. If not, execute the query, cache the result, and return it. Use cache keys that incorporate the query parameters so different filters produce different cache entries.
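The subtle part is the cache key: it must be deterministic across parameter orderings. A sketch using a dict in place of Redis (class and endpoint names are illustrative):

```python
import hashlib
import json

class QueryCache:
    """Dict-backed stand-in for Redis: same get/set shape, no TTL."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

def cache_key(endpoint, params):
    # Sort keys so {"status": ..., "page": ...} and the reverse order
    # produce the same key; hash to keep keys short and uniform.
    canonical = json.dumps(params, sort_keys=True)
    return endpoint + ":" + hashlib.sha256(canonical.encode()).hexdigest()[:16]

query_runs = []
def run_query(params):
    query_runs.append(params)        # stands in for the real database call
    return [{"id": 1, "status": params["status"]}]

cache = QueryCache()

def list_orders(params):
    key = cache_key("orders:list", params)
    hit = cache.get(key)
    if hit is not None:
        return hit                   # cache hit: no database work
    result = run_query(params)
    cache.set(key, result)
    return result

first = list_orders({"status": "active", "page": 1})
second = list_orders({"page": 1, "status": "active"})  # same filters, new order
third = list_orders({"status": "shipped", "page": 1})  # different filter
```

The reordered-but-identical parameters hit the same cache entry, while the changed filter correctly produces a fresh query.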
Cache Invalidation
When data changes, invalidate the affected cache entries. For simple cases, delete the cache key when the underlying data is modified. For complex cases (where a change affects multiple cached responses), use cache tags — tag each cache entry with the entities it depends on, and invalidate all entries with a given tag when that entity changes.
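A minimal sketch of the tagging scheme, again with an in-process dict standing in for the cache store (tag and key formats are illustrative):

```python
from collections import defaultdict

class TaggedCache:
    """Each entry carries tags; invalidating a tag drops every entry
    tagged with it, leaving unrelated entries intact."""

    def __init__(self):
        self._store = {}
        self._tag_index = defaultdict(set)  # tag -> keys that depend on it

    def set(self, key, value, tags=()):
        self._store[key] = value
        for tag in tags:
            self._tag_index[tag].add(key)

    def get(self, key):
        return self._store.get(key)

    def invalidate_tag(self, tag):
        for key in self._tag_index.pop(tag, set()):
            self._store.pop(key, None)

cache = TaggedCache()
# A list response depends on every order it contains; a detail
# response depends on just one.
cache.set("orders:list:p1", ["o1", "o2"], tags=["order:1", "order:2"])
cache.set("orders:detail:1", {"id": 1}, tags=["order:1"])
cache.set("orders:detail:2", {"id": 2}, tags=["order:2"])

cache.invalidate_tag("order:1")  # order 1 was modified
```

After invalidating order:1, both responses that depended on it are gone, while the untouched order:2 detail entry survives.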
Rate Limiting and Resource Protection
At scale, API performance is not just about making individual requests fast — it is about protecting the database from request volumes that overwhelm it.
- Rate limits per client: Prevent any single client from consuming a disproportionate share of resources.
- Maximum page size: Set a hard limit on the number of items per page (e.g., 100). A client requesting 10,000 items in a single request can exhaust database connections and memory.
- Query complexity limits: For APIs that support filtering and sorting, limit the number of simultaneous filters and the complexity of sort expressions. Complex queries on unindexed combinations can trigger expensive full table scans.
- Timeout protection: Set query timeouts at the database level so a runaway query cannot hold a connection indefinitely. Return a clear error to the client rather than making them wait.
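The page-size guard from the list above is a few lines of defensive parsing; the limits shown are illustrative defaults, not prescriptions:

```python
MAX_PAGE_SIZE = 100      # hard cap, never exceeded regardless of input
DEFAULT_PAGE_SIZE = 20   # used when the client sends nothing or garbage

def clamp_page_size(requested):
    """Coerce a raw query-string value into a safe page size."""
    if requested is None:
        return DEFAULT_PAGE_SIZE
    try:
        size = int(requested)
    except (TypeError, ValueError):
        return DEFAULT_PAGE_SIZE   # "abc", [], etc.: fall back, don't error
    return max(1, min(size, MAX_PAGE_SIZE))
```

A request for 10,000 items is silently capped at 100; zero, negative, and non-numeric values never reach the query.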
Monitoring API Performance
Track these metrics for every endpoint:
- Response time percentiles: p50, p95, p99. The p99 reveals the worst experiences that averages hide.
- Database query time per request: How much of the response time is spent in the database?
- Cache hit ratio: A declining hit ratio suggests cache invalidation is too aggressive or cache TTLs are too short.
- Error rates: Spikes in 500 errors or timeout errors indicate performance problems that need immediate attention.
- Slow query log: Monitor for new slow queries introduced by code changes or data growth.
The Bottom Line
API performance at scale comes down to discipline: use cursor-based pagination instead of OFFSET, index for your actual query patterns, select only the columns you need, cache aggressively, and protect the database with rate limits and timeouts. These are not advanced techniques — they are fundamentals that every API should implement from the start. The payoff is an API that stays fast and reliable as your data and traffic grow, instead of one that degrades into a bottleneck that limits your entire platform.