Search Implementation
Overview
Search functionality transforms how users discover and access content. Modern search systems must handle full-text queries, provide relevant results quickly, support filtering and faceting, and scale with data growth. This guide covers search engine selection, implementation patterns, relevance tuning, and performance optimization.
Search implementation involves three core concerns: indexing (structuring data for efficient retrieval), querying (understanding user intent and executing searches), and ranking (determining result relevance). The architecture typically separates the search engine from the primary database to optimize each for its specific purpose.
The indexing pipeline extracts data from the source database, transforms it into search documents, and writes to the search engine. The query pipeline parses user input, executes searches, ranks results, and returns responses. Analytics feed back into relevance tuning to continuously improve search quality.
Search Engine Selection
Choose search engines based on use case, scale, features, and operational complexity. Different engines optimize for different trade-offs.
Elasticsearch / OpenSearch
Elasticsearch is the most widely adopted search engine for complex use cases. OpenSearch is an open-source fork (created by AWS from Elasticsearch 7.10) whose API remains largely compatible, though the two projects have diverged since the fork.
Strengths:
- Powerful query DSL with boolean logic, fuzzy matching, boosting
- Advanced aggregations for faceted search and analytics
- Distributed architecture scales to billions of documents
- Rich ecosystem of plugins and integrations
- Near real-time indexing and search
- Comprehensive REST API
Considerations:
- Resource intensive (memory, CPU)
- Complex cluster management and tuning
- Requires dedicated infrastructure
- Learning curve for query DSL
Best for: Large-scale applications, complex queries, analytics, faceted search, and applications requiring advanced features like geo-search or nested document queries.
Typesense
Typesense is a modern, fast, typo-tolerant search engine optimized for instant search experiences.
Strengths:
- Simple API and minimal configuration
- Automatic typo tolerance with configurable fuzziness
- Sub-50ms search latency for instant search
- Built-in faceting and filtering
- Easy deployment (single binary)
- Semantic search capabilities
Considerations:
- Smaller ecosystem than Elasticsearch
- Limited advanced features
- Data must fit in RAM for best performance
- Newer project with evolving features
Best for: Applications requiring instant search, e-commerce catalogs, documentation search, and use cases prioritizing simplicity and speed over advanced features.
Algolia
Algolia is a hosted search-as-a-service platform optimized for end-user search experiences.
Strengths:
- Extremely fast (sub-10ms queries globally)
- Comprehensive UI libraries (InstantSearch)
- Built-in typo tolerance and relevance
- Automatic infrastructure scaling
- Rich dashboard for configuration
- A/B testing and analytics
Considerations:
- Expensive at scale (pricing by operations and records)
- Less flexible than self-hosted solutions
- Vendor lock-in
- Limited query complexity compared to Elasticsearch
Best for: Customer-facing search, e-commerce, content discovery, and applications requiring world-class search UX without operational overhead.
Meilisearch
Meilisearch provides powerful search with minimal configuration, designed for modern web applications.
Strengths:
- Simple setup with sensible defaults
- Fast search responses (sub-50ms)
- Automatic typo tolerance
- Multi-language support
- Easy deployment (Docker, single binary)
- Good documentation and community
Considerations:
- Limited advanced features
- Smaller community than Elasticsearch
- Basic analytics capabilities
Best for: Small to medium applications, documentation sites, content management systems, and projects requiring quick search implementation.
Apache Solr
Solr is a mature, enterprise-grade search platform built on Apache Lucene, the same library that powers Elasticsearch.
Strengths:
- Proven stability and reliability
- Rich feature set comparable to Elasticsearch
- Strong consistency guarantees
- Excellent documentation
- Enterprise support options
Considerations:
- XML-based configuration (more verbose)
- Less modern API compared to Elasticsearch
- Smaller community momentum
Best for: Enterprise applications, government systems, and organizations requiring proven stability with strong support.
Full-Text Search Implementation
Full-text search analyzes text, tokenizes it into terms, and enables finding documents matching query terms. Understanding text analysis is critical for effective search.
Text Analysis Pipeline
Text analysis transforms raw text into searchable terms through a series of steps:
Character Filters: Clean text before tokenization (remove HTML tags, normalize characters).
Tokenizer: Split text into tokens (words). Common tokenizers include:
- Standard: Splits on whitespace and punctuation
- Whitespace: Splits only on whitespace
- N-gram: Creates overlapping character sequences for partial matching
Token Filters: Transform tokens (lowercase, stemming, stopword removal, synonyms).
Example analysis configuration in Elasticsearch:
{
"settings": {
"analysis": {
"analyzer": {
"custom_english_analyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": ["html_strip"],
"filter": [
"lowercase",
"english_stop",
"english_stemmer",
"asciifolding"
]
}
},
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
},
"english_stemmer": {
"type": "stemmer",
"language": "english"
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "custom_english_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"content": {
"type": "text",
"analyzer": "custom_english_analyzer"
},
"category": {
"type": "keyword"
}
}
}
}
The html_strip character filter removes HTML tags. The lowercase filter normalizes case so "Search" matches "search". The english_stop filter removes common words like "the", "is", "at" that don't add meaning. The english_stemmer reduces words to their root form so "searching", "searched", and "searches" all become "search".
The asciifolding filter converts accented characters to ASCII equivalents so "café" matches "cafe". The multi-field setup (title.keyword) enables both full-text search and exact matching or aggregations.
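The effect of this chain can be approximated in plain TypeScript. This is a simplified sketch with a toy stopword list and no stemmer — real analyzers use full stopword sets and the Porter stemmer — but it illustrates the order of operations:

```typescript
// Toy stopword list (the english_stop filter uses a much larger one).
const STOPWORDS = new Set(['the', 'is', 'at', 'a', 'an', 'and', 'of']);

function analyze(text: string): string[] {
  const stripped = text.replace(/<[^>]+>/g, ' ');                   // char filter: html_strip
  const tokens = stripped.split(/[^\p{L}\p{N}]+/u).filter(Boolean); // tokenizer: standard (approx.)
  return tokens
    .map(t => t.toLowerCase())                                      // filter: lowercase
    .filter(t => !STOPWORDS.has(t))                                 // filter: stop
    .map(t => t.normalize('NFD').replace(/\p{M}/gu, ''));           // filter: asciifolding
}
```

Running `analyze('<p>The Café is Open</p>')` yields `['cafe', 'open']`: tags stripped, case folded, stopwords dropped, accents removed — exactly the terms that end up in the inverted index.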
Search Query Implementation
Build search queries that handle user intent, typos, and relevance requirements:
// Elasticsearch search service
import { Client } from '@elastic/elasticsearch';
interface SearchQuery {
query: string;
filters?: Record<string, any>;
page?: number;
pageSize?: number;
sortBy?: string;
sortOrder?: 'asc' | 'desc';
}
interface SearchResult<T> {
results: T[];
total: number;
page: number;
pageSize: number;
facets?: Record<string, FacetResult[]>;
}
interface FacetResult {
value: string;
count: number;
label?: string;
}
class ElasticsearchService {
private client: Client;
constructor() {
this.client = new Client({
node: process.env.ELASTICSEARCH_URL,
auth: {
apiKey: process.env.ELASTICSEARCH_API_KEY,
},
});
}
async search<T>(
index: string,
searchQuery: SearchQuery
): Promise<SearchResult<T>> {
const { query, filters, page = 1, pageSize = 20, sortBy, sortOrder } = searchQuery;
const esQuery = this.buildQuery(query, filters);
const from = (page - 1) * pageSize;
const response = await this.client.search({
index,
body: {
query: esQuery,
from,
size: pageSize,
sort: sortBy ? [{ [sortBy]: sortOrder || 'desc' }] : undefined,
highlight: {
fields: {
title: { pre_tags: ['<mark>'], post_tags: ['</mark>'] },
content: {
pre_tags: ['<mark>'],
post_tags: ['</mark>'],
fragment_size: 150,
number_of_fragments: 3,
},
},
},
aggs: this.buildAggregations(filters),
},
});
return {
results: response.hits.hits.map(hit => ({
...hit._source,
id: hit._id,
score: hit._score,
highlights: hit.highlight,
})) as T[],
total: response.hits.total.value,
page,
pageSize,
facets: this.parseFacets(response.aggregations),
};
}
private buildQuery(query: string, filters?: Record<string, any>): any {
const must: any[] = [];
const filter: any[] = [];
// Multi-match query for text search
if (query) {
must.push({
multi_match: {
query,
fields: [
'title^3', // Boost title 3x
'description^2', // Boost description 2x
'content',
],
type: 'best_fields',
fuzziness: 'AUTO',
operator: 'or',
minimum_should_match: '75%',
},
});
}
// Apply filters
if (filters) {
Object.entries(filters).forEach(([field, value]) => {
if (Array.isArray(value)) {
filter.push({ terms: { [field]: value } });
} else if (typeof value === 'object' && value.min !== undefined) {
// Range filter
filter.push({
range: {
[field]: {
gte: value.min,
lte: value.max,
},
},
});
} else {
filter.push({ term: { [field]: value } });
}
});
}
return {
bool: {
must: must.length > 0 ? must : [{ match_all: {} }],
filter,
},
};
}
private buildAggregations(filters?: Record<string, any>): any {
return {
categories: {
terms: { field: 'category', size: 20 },
},
price_ranges: {
range: {
field: 'price',
ranges: [
{ to: 50 },
{ from: 50, to: 100 },
{ from: 100, to: 200 },
{ from: 200 },
],
},
},
};
}
private parseFacets(aggregations: any): Record<string, FacetResult[]> {
if (!aggregations) return {};
const facets: Record<string, FacetResult[]> = {};
if (aggregations.categories) {
facets.categories = aggregations.categories.buckets.map(bucket => ({
value: bucket.key,
count: bucket.doc_count,
}));
}
if (aggregations.price_ranges) {
facets.priceRanges = aggregations.price_ranges.buckets.map(bucket => ({
label: this.formatPriceRange(bucket),
value: `${bucket.from || 0}-${bucket.to || ''}`,
count: bucket.doc_count,
}));
}
return facets;
}
private formatPriceRange(bucket: any): string {
if (bucket.from === undefined) return `Under ${bucket.to}`;
if (bucket.to === undefined) return `${bucket.from}+`;
return `${bucket.from}-${bucket.to}`;
}
}
The multi_match query searches across multiple fields with different boost factors. The title^3 syntax weights title matches three times as heavily as content matches, reflecting that title matches are typically more relevant. The fuzziness: 'AUTO' setting allows typos (no edits for terms of 1-2 characters, 1 edit for terms of 3-5 characters, 2 edits for longer terms).
The minimum_should_match: '75%' parameter requires that 75% of query terms match, preventing irrelevant results when users enter many terms. The type: 'best_fields' setting scores documents by the best matching field rather than summing scores across fields.
Filters in the filter clause don't affect relevance scores but narrow results. This is more efficient than including filter criteria in the must clause. Terms filters handle multi-select facets (e.g., multiple categories). Range filters handle numerical or date ranges (e.g., price ranges).
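The 75% threshold is applied to the count of query terms and rounded down; a quick sketch of Elasticsearch's rule for positive percentages makes the arithmetic concrete:

```typescript
// minimum_should_match with a positive percentage: the required number of
// matching clauses is the percentage of optional clauses, rounded down.
function requiredMatches(termCount: number, percent: number): number {
  return Math.floor((termCount * percent) / 100);
}
```

So a four-term query needs three matching terms, and a five-term query still needs only three — worth remembering when users paste long phrases.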
Handling the N+1 Problem
The N+1 problem occurs when search results require fetching related data from the primary database, resulting in one query for search results plus N queries for related data:
// Anti-pattern: N+1 queries
async function searchProducts(query: string): Promise<Product[]> {
const searchResults = await searchService.search('products', { query });
// N additional database queries
const products = await Promise.all(
searchResults.results.map(result =>
db.products.findUnique({
where: { id: result.id },
include: { category: true, reviews: true },
})
)
);
return products;
}
Solution 1: Denormalize into search documents
Include all necessary data in search documents to avoid database queries:
// Index document with denormalized data
interface ProductSearchDocument {
id: string;
name: string;
description: string;
price: number;
categoryId: string;
categoryName: string;
averageRating: number;
reviewCount: number;
imageUrl: string;
inStock: boolean;
}
// No additional queries needed
async function searchProducts(query: string): Promise<ProductSearchDocument[]> {
const searchResults = await searchService.search<ProductSearchDocument>('products', { query });
return searchResults.results;
}
Denormalization trades data redundancy for query performance. When category names change, reindex all products in that category. Use message queues or database triggers to keep search documents synchronized with source data.
Solution 2: Batch database queries
If denormalization isn't feasible, batch database queries:
async function searchProducts(query: string): Promise<Product[]> {
const searchResults = await searchService.search('products', { query });
const productIds = searchResults.results.map(r => r.id);
// Single query with WHERE IN clause
const products = await db.products.findMany({
where: { id: { in: productIds } },
include: { category: true, reviews: true },
});
// Preserve search order
const productMap = new Map(products.map(p => [p.id, p]));
return productIds.map(id => productMap.get(id)).filter(Boolean);
}
Search Relevance Tuning
Relevance determines which results appear first. Good relevance balances textual similarity with business logic like popularity, recency, and user preferences.
Field Boosting
Boost important fields to prioritize their matches:
const query = {
multi_match: {
query: userQuery,
fields: [
'title^5', // Title matches most important
'sku^4', // Product codes highly relevant
'description^2', // Description somewhat important
'content', // Body content baseline
'tags^1.5', // Tags slightly boosted
],
type: 'cross_fields',
},
};
Field boost values are relative. Doubling a boost value doesn't necessarily double the score due to Elasticsearch's scoring algorithm (BM25 by default). Tune boosts empirically by testing queries and adjusting based on result quality.
Function Score Queries
Combine text relevance with custom scoring functions:
const functionScoreQuery = {
function_score: {
query: {
multi_match: {
query: userQuery,
fields: ['title^3', 'description^2', 'content'],
},
},
functions: [
{
// Boost popular items
field_value_factor: {
field: 'popularity_score',
factor: 1.2,
modifier: 'log1p',
missing: 0,
},
},
{
// Boost recent items
gauss: {
created_at: {
origin: 'now',
scale: '30d',
decay: 0.5,
},
},
},
{
// Boost in-stock items
filter: { term: { in_stock: true } },
weight: 1.5,
},
],
score_mode: 'multiply',
boost_mode: 'multiply',
},
};
The field_value_factor function incorporates a numeric field into scoring. The log1p modifier applies logarithmic scaling to prevent extreme values from dominating scores (log1p(x) = log(1 + x)). This means an item with popularity 1000 doesn't score 1000x higher than one with popularity 1.
The gauss function creates a decay curve for date-based boosting. Items created "now" receive full boost. Items created 30 days ago receive 50% boost (decay: 0.5). Older items decay further. This prioritizes recent content without completely eliminating older results.
The filter function applies a constant boost to items matching a filter (in-stock products). The score_mode: 'multiply' combines function scores by multiplication. The boost_mode: 'multiply' multiplies function scores with text relevance scores.
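The decay and modifier math can be reproduced directly. This sketch mirrors the formulas Elasticsearch documents for the gauss function (with zero offset) and the log1p modifier, which helps when tuning scale and decay values:

```typescript
// Gauss decay with offset 0: exp(-age^2 / (2*sigma^2)),
// where sigma^2 = -scale^2 / (2 * ln(decay)).
function gaussDecay(ageDays: number, scaleDays: number, decay: number): number {
  const sigmaSq = -(scaleDays * scaleDays) / (2 * Math.log(decay));
  return Math.exp(-(ageDays * ageDays) / (2 * sigmaSq));
}

// field_value_factor with the log1p modifier: factor * log(1 + value).
function popularityBoost(popularity: number, factor = 1.2): number {
  return factor * Math.log1p(popularity);
}
```

With scale '30d' and decay 0.5, an item created now scores 1.0, a 30-day-old item scores 0.5, and a 60-day-old item scores 0.0625 — the curve falls off quickly but never reaches zero.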
Personalized Ranking
Incorporate user behavior and preferences into ranking:
async function personalizedSearch(
userId: string,
query: string
): Promise<SearchResult> {
const userProfile = await getUserSearchProfile(userId);
const personalizedQuery = {
function_score: {
query: {
multi_match: {
query,
fields: ['title^3', 'description^2', 'content'],
},
},
functions: [
// Boost categories user frequently views
...userProfile.preferredCategories.map(category => ({
filter: { term: { category: category.name } },
weight: 1 + category.affinity * 0.5,
})),
// Boost items similar to past interactions
{
script_score: {
script: {
source: `
cosineSimilarity(params.user_vector, 'embedding_vector') + 1.0
`,
params: {
user_vector: userProfile.embeddingVector,
},
},
},
},
],
score_mode: 'sum',
boost_mode: 'multiply',
},
};
return await searchService.search('products', {
query: personalizedQuery,
});
}
interface UserSearchProfile {
userId: string;
preferredCategories: Array<{ name: string; affinity: number }>;
embeddingVector: number[];
}
async function getUserSearchProfile(userId: string): Promise<UserSearchProfile> {
// Build profile from user behavior
const interactions = await db.userInteractions.findMany({
where: { userId },
orderBy: { timestamp: 'desc' },
take: 100,
});
// Calculate category affinities
const categoryFrequency = new Map<string, number>();
interactions.forEach(interaction => {
const count = categoryFrequency.get(interaction.category) || 0;
categoryFrequency.set(interaction.category, count + 1);
});
const preferredCategories = Array.from(categoryFrequency.entries())
.map(([name, count]) => ({
name,
affinity: count / interactions.length,
}))
.sort((a, b) => b.affinity - a.affinity)
.slice(0, 5);
// Generate embedding vector from interaction history
const embeddingVector = await generateUserEmbedding(interactions);
return {
userId,
preferredCategories,
embeddingVector,
};
}
User profiles track behavior like viewed categories, clicked results, and purchase history. Category affinity represents the proportion of interactions with each category. The embedding vector represents user interests in high-dimensional space for similarity comparisons.
The script score function computes cosine similarity between the user's embedding vector and product embedding vectors. This enables semantic matching based on interest similarity rather than just keywords. For vector search and embeddings, see Machine Learning Integration.
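The script_score above relies on cosine similarity, which Elasticsearch computes natively for dense_vector fields; a plain-TypeScript version shows what the +1.0 shift does (it keeps scores non-negative, since cosine similarity ranges from -1 to 1):

```typescript
// Cosine similarity: dot product of the vectors divided by the
// product of their magnitudes. Assumes equal-length, non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Mirrors the painless script: similarity shifted into [0, 2].
function scriptScore(userVec: number[], docVec: number[]): number {
  return cosineSimilarity(userVec, docVec) + 1.0;
}
```

Identical vectors score 2.0, orthogonal vectors score 1.0, and opposite vectors score 0.0, so the function score never goes negative.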
Pagination Strategies
Pagination affects performance, deep result access, and API design.
Offset-Based Pagination
Traditional pagination using offset and limit:
interface PaginationParams {
page: number;
pageSize: number;
}
async function searchWithOffsetPagination(
query: string,
pagination: PaginationParams
): Promise<SearchResult> {
const from = (pagination.page - 1) * pagination.pageSize;
return await esClient.search({
index: 'products',
from,
size: pagination.pageSize,
body: {
query: { match: { title: query } },
},
});
}
Advantages: Simple implementation, users can jump to any page, easy to calculate total pages.
Disadvantages: Deep pagination is slow (Elasticsearch must score all documents up to the offset), performance degrades linearly with page number, results can shift between pages if documents are added/deleted.
Elasticsearch limits offset + size to 10,000 by default (index.max_result_window setting). Deep pagination requires alternative approaches.
Cursor-Based Pagination
Use search_after for efficient deep pagination:
interface CursorPaginationParams {
pageSize: number;
searchAfter?: any[];
}
async function searchWithCursorPagination(
query: string,
pagination: CursorPaginationParams
): Promise<SearchResult & { cursor: string }> {
const response = await esClient.search({
index: 'products',
size: pagination.pageSize,
body: {
query: { match: { title: query } },
sort: [
{ _score: 'desc' },
{ id: 'asc' }, // Tiebreaker for consistent ordering
],
search_after: pagination.searchAfter,
},
});
const results = response.hits.hits.map(hit => hit._source);
const lastHit = response.hits.hits[response.hits.hits.length - 1];
// Encode cursor for next page
const cursor = lastHit
? Buffer.from(JSON.stringify(lastHit.sort)).toString('base64')
: null;
return {
results,
total: response.hits.total.value,
pageSize: pagination.pageSize,
cursor,
};
}
// Decode cursor from client
function decodeCursor(cursor: string): any[] {
return JSON.parse(Buffer.from(cursor, 'base64').toString('utf-8'));
}
// Encode sort values as an opaque cursor for the client
function encodeCursor(sort: any[]): string {
return Buffer.from(JSON.stringify(sort)).toString('base64');
}
The search_after parameter uses the last document's sort values to fetch the next page. This performs consistently regardless of depth because Elasticsearch doesn't score skipped documents. The tiebreaker sort field (id) ensures consistent ordering when scores are identical.
Advantages: Constant performance for any page depth, efficient for infinite scroll UIs, resilient to data changes during pagination.
Disadvantages: Cannot jump to arbitrary pages, cannot calculate total pages, requires stateful cursor management.
Relay Connection Specification
For GraphQL APIs, the Relay connection spec standardizes cursor pagination:
interface Connection<T> {
edges: Array<{
node: T;
cursor: string;
}>;
pageInfo: {
hasNextPage: boolean;
hasPreviousPage: boolean;
startCursor: string;
endCursor: string;
};
totalCount: number;
}
async function searchProducts(
query: string,
first: number,
after?: string
): Promise<Connection<Product>> {
const searchAfter = after ? decodeCursor(after) : undefined;
const response = await esClient.search({
index: 'products',
size: first + 1, // Fetch one extra to determine hasNextPage
body: {
query: { match: { title: query } },
sort: [{ _score: 'desc' }, { id: 'asc' }],
search_after: searchAfter,
},
});
const hasMore = response.hits.hits.length > first;
const hits = response.hits.hits.slice(0, first);
const edges = hits.map(hit => ({
node: hit._source as Product,
cursor: encodeCursor(hit.sort),
}));
return {
edges,
pageInfo: {
hasNextPage: hasMore,
hasPreviousPage: !!after,
startCursor: edges[0]?.cursor,
endCursor: edges[edges.length - 1]?.cursor,
},
totalCount: response.hits.total.value,
};
}
The Relay spec uses opaque cursors (no assumption about their contents) and provides pagination metadata in pageInfo. Fetching one extra result determines if more pages exist without requiring a separate count query.
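The fetch-one-extra trick is independent of Elasticsearch and worth isolating; a minimal helper under the same assumption (request `first + 1` rows, keep `first`):

```typescript
// Given up to first + 1 fetched rows, return the visible page and
// whether another page exists.
function paginate<T>(rows: T[], first: number): { page: T[]; hasNextPage: boolean } {
  return {
    page: rows.slice(0, first),
    hasNextPage: rows.length > first,
  };
}
```

This avoids a second count query: the presence of the extra row is itself the hasNextPage signal.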
Faceted Search and Filtering
Faceted search enables users to narrow results using filters based on document attributes. Facets show available filter options with result counts.
Aggregations for Facets
async function searchWithFacets(
query: string,
filters: Record<string, string[]>
): Promise<SearchResult> {
const response = await esClient.search({
index: 'products',
body: {
query: buildFilteredQuery(query, filters),
aggs: {
categories: {
terms: {
field: 'category.keyword',
size: 50,
},
},
brands: {
terms: {
field: 'brand.keyword',
size: 50,
},
},
price_ranges: {
range: {
field: 'price',
ranges: [
{ key: 'Under $50', to: 50 },
{ key: '$50-$100', from: 50, to: 100 },
{ key: '$100-$200', from: 100, to: 200 },
{ key: '$200+', from: 200 },
],
},
},
attributes: {
nested: {
path: 'attributes',
},
aggs: {
attribute_names: {
terms: {
field: 'attributes.name.keyword',
size: 20,
},
aggs: {
attribute_values: {
terms: {
field: 'attributes.value.keyword',
size: 10,
},
},
},
},
},
},
},
},
});
return {
results: response.hits.hits.map(hit => hit._source),
total: response.hits.total.value,
facets: parseFacets(response.aggregations),
};
}
function buildFilteredQuery(
query: string,
filters: Record<string, string[]>
): any {
const must: any[] = [
{
multi_match: {
query,
fields: ['title^3', 'description'],
},
},
];
const filter: any[] = [];
// Apply selected filters
Object.entries(filters).forEach(([field, values]) => {
if (values.length > 0) {
filter.push({
terms: { [`${field}.keyword`]: values },
});
}
});
return {
bool: {
must,
filter,
},
};
}
Aggregations execute in the same request as the query and calculate facet values from the filtered result set. The terms aggregation groups results by field values and counts documents in each group. The size parameter limits the number of buckets returned (here, the top 50 categories).
The range aggregation creates predefined buckets for numerical ranges. The nested aggregation handles nested document structures (e.g., products with multiple attributes). The sub-aggregation attribute_values groups values within each attribute name.
Multi-Select Facets
Multi-select facets allow selecting multiple values within a facet (e.g., several categories at once). For this to work, each facet's counts must be calculated as if that facet's own selections weren't applied; otherwise, selecting one category would collapse the category facet to that single value and hide the alternatives.
function buildMultiSelectFacetAggregations(
query: string,
filters: Record<string, string[]>
): any {
const aggs: Record<string, any> = {};
// For each facet, calculate counts excluding its own filter
['categories', 'brands', 'colors'].forEach(facet => {
const filtersExcludingThisFacet = { ...filters };
delete filtersExcludingThisFacet[facet];
aggs[facet] = {
filter: buildFilteredQuery(query, filtersExcludingThisFacet),
aggs: {
values: {
terms: {
field: `${facet}.keyword`,
size: 50,
},
},
},
};
});
return aggs;
}
Each facet aggregation wraps a filter that excludes that facet's selections. This ensures facet counts reflect result counts if that facet value were selected, enabling progressive refinement.
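The "exclude own filter" rule is easiest to see in miniature. This in-memory sketch (hypothetical documents, no search engine) counts facet values over documents matching every other facet's selections:

```typescript
type Doc = Record<string, string>;

// Count values of `facet` across docs that match all selections
// EXCEPT the selections on `facet` itself.
function facetCounts(
  docs: Doc[],
  selections: Record<string, string[]>,
  facet: string
): Record<string, number> {
  const matches = (d: Doc) =>
    Object.entries(selections).every(
      ([field, values]) =>
        field === facet || values.length === 0 || values.includes(d[field])
    );
  const counts: Record<string, number> = {};
  for (const d of docs.filter(matches)) {
    counts[d[facet]] = (counts[d[facet]] ?? 0) + 1;
  }
  return counts;
}
```

With category 'shoes' selected, the category facet still counts every category (so 'hats' remains visible as an option), while the brand facet counts only brands within shoes.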
Autocomplete and Search Suggestions
Autocomplete helps users formulate queries by suggesting completions as they type.
Prefix Matching
Simple prefix matching for autocomplete:
async function autocomplete(prefix: string, limit: number = 10): Promise<string[]> {
const response = await esClient.search({
index: 'products',
body: {
suggest: {
title_suggest: {
prefix,
completion: {
field: 'title_suggest',
size: limit,
skip_duplicates: true,
},
},
},
},
});
return response.suggest.title_suggest[0].options.map(
option => option.text
);
}
The completion suggester requires a completion field type in the mapping:
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"title_suggest": {
"type": "completion"
}
}
}
}
Index documents with suggestion inputs:
{
"title": "Wireless Bluetooth Headphones",
"title_suggest": {
"input": [
"Wireless Bluetooth Headphones",
"Bluetooth Headphones",
"Headphones",
"Wireless Headphones"
],
"weight": 10
}
}
The input array contains phrases that should trigger this suggestion. The weight parameter prioritizes suggestions (higher weights appear first). The completion suggester is backed by FSTs (finite state transducers), giving millisecond-level prefix matching.
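The input phrases can be generated at index time. One simple approach is to emit every word-suffix of the title — a sketch only; the example above also includes a non-contiguous variant ("Wireless Headphones"), which would need a skip-gram or manual step:

```typescript
// Emit every word-suffix of a title as a completion input phrase,
// so users can start typing from any word onward.
function suggestionInputs(title: string): string[] {
  const words = title.trim().split(/\s+/);
  return words.map((_, i) => words.slice(i).join(' '));
}
```

For "Wireless Bluetooth Headphones" this produces the three contiguous suffixes, covering users who type "bluetooth…" or "head…" directly.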
Typo-Tolerant Autocomplete
The Elasticsearch completion suggester offers basic fuzzy prefix matching via its fuzzy option, but for richer typo tolerance across full queries, use Typesense or a custom implementation:
// Typesense autocomplete with typo tolerance
async function typoTolerantAutocomplete(
prefix: string,
limit: number = 10
): Promise<string[]> {
const response = await typesenseClient
.collections('products')
.documents()
.search({
q: prefix,
query_by: 'title',
prefix: true,
num_typos: 2, // Allow up to 2 typos
per_page: limit,
});
return response.hits.map(hit => hit.document.title);
}
Typesense's typo tolerance uses weighted edit distance, allowing configurable typo counts. This handles common typing errors like transpositions, insertions, deletions, and substitutions.
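The "typos" being counted are edit-distance operations. A classic (unweighted) Levenshtein distance makes the num_typos budget concrete — Typesense applies a weighted variant, but the idea is the same:

```typescript
// Unweighted Levenshtein distance: minimum number of insertions,
// deletions, and substitutions needed to turn `a` into `b`.
function editDistance(a: string, b: string): number {
  // dp[i][j] = distance between a[0..i) and b[0..j)
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,     // delete from a
        dp[i][j - 1] + 1,     // insert into a
        dp[i - 1][j - 1] + cost // substitute (or match)
      );
    }
  }
  return dp[a.length][b.length];
}
```

"hedphones" is one edit from "headphones", so it falls within a num_typos: 2 budget; note that a transposition like "serach" counts as two unweighted edits.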
Popular Query Suggestions
Suggest popular queries based on analytics:
interface PopularQuery {
query: string;
count: number;
lastSeen: Date;
}
class QuerySuggestionService {
async trackQuery(query: string, resultCount: number): Promise<void> {
await db.searchQueries.upsert({
where: { query: query.toLowerCase() },
create: {
query: query.toLowerCase(),
count: 1,
lastSeen: new Date(),
resultCount,
},
update: {
count: { increment: 1 },
lastSeen: new Date(),
resultCount,
},
});
}
async getPopularQueries(
prefix: string,
limit: number = 10
): Promise<PopularQuery[]> {
return db.searchQueries.findMany({
where: {
query: {
startsWith: prefix.toLowerCase(),
},
resultCount: {
gt: 0, // Only suggest queries that return results
},
},
orderBy: [
{ count: 'desc' },
{ lastSeen: 'desc' },
],
take: limit,
});
}
// Clean up old queries periodically
async cleanupStaleQueries(): Promise<void> {
const cutoffDate = new Date();
cutoffDate.setDate(cutoffDate.getDate() - 90);
await db.searchQueries.deleteMany({
where: {
lastSeen: { lt: cutoffDate },
count: { lt: 10 },
},
});
}
}
Track all search queries and their result counts. Suggest queries that match the prefix, prioritizing by frequency and recency. Filter out queries with zero results to avoid suggesting dead ends.
Indexing Strategies
Indexing performance and freshness affect search functionality and infrastructure costs.
Real-Time Indexing
Index documents immediately when they're created or updated:
// Event-driven indexing
class ProductService {
async createProduct(product: Product): Promise<Product> {
// Save to primary database
const created = await db.products.create({ data: product });
// Index in search engine
await searchService.indexDocument('products', {
id: created.id,
...this.transformForSearch(created),
});
return created;
}
async updateProduct(id: string, updates: Partial<Product>): Promise<Product> {
const updated = await db.products.update({
where: { id },
data: updates,
});
// Update search index
await searchService.updateDocument('products', id, {
...this.transformForSearch(updated),
});
return updated;
}
private transformForSearch(product: Product): ProductSearchDocument {
return {
id: product.id,
title: product.title,
description: product.description,
category: product.category.name,
price: product.price,
inStock: product.quantity > 0,
popularity: product.viewCount + product.purchaseCount * 10,
};
}
}
Real-time indexing provides immediate search visibility but increases latency for write operations. Use async processing for non-critical updates.
Bulk Indexing
Batch index operations for better performance:
class BulkIndexingService {
private indexQueue: Array<{ id: string; document: any }> = [];
private flushInterval: NodeJS.Timeout;
constructor(private batchSize: number = 100, private flushIntervalMs: number = 5000) {
// Flush periodically
this.flushInterval = setInterval(() => this.flush(), flushIntervalMs);
}
async queueForIndexing(id: string, document: any): Promise<void> {
this.indexQueue.push({ id, document });
if (this.indexQueue.length >= this.batchSize) {
await this.flush();
}
}
private async flush(): Promise<void> {
if (this.indexQueue.length === 0) return;
const batch = this.indexQueue.splice(0, this.batchSize);
const operations = batch.flatMap(({ id, document }) => [
{ index: { _index: 'products', _id: id } },
document,
]);
try {
const response = await esClient.bulk({ operations });
if (response.errors) {
const failed = response.items
.filter(item => item.index?.error)
.map(item => ({
id: item.index._id,
error: item.index.error,
}));
logger.error('Bulk indexing errors', { failed });
}
} catch (error) {
logger.error('Bulk indexing failed', { error, count: batch.length });
// Re-queue failed items
this.indexQueue.unshift(...batch);
}
}
}
Bulk operations reduce network overhead and improve throughput by sending multiple documents in a single request. Elasticsearch's bulk API processes operations in batches internally. The flush interval ensures documents are indexed within a predictable time window even if the batch size isn't reached.
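The interleaved action/document shape of the bulk payload is easy to get wrong, so it helps to isolate it. A standalone helper building the same structure the flush method produces:

```typescript
// Build the action/document pairs the bulk API expects: one action entry
// ({ index: { _index, _id } }) immediately followed by the document body.
function toBulkOperations(
  index: string,
  docs: Array<{ id: string; body: Record<string, unknown> }>
): object[] {
  return docs.flatMap(({ id, body }) => [
    { index: { _index: index, _id: id } },
    body,
  ]);
}
```

Every document contributes exactly two entries, so a batch of N documents yields an operations array of length 2N.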
Change Data Capture (CDC)
Use database change streams to keep search indexes synchronized:
// PostgreSQL logical replication for CDC
import { Client } from 'pg';
class CDCIndexingService {
private pgClient: Client;
async startListening(): Promise<void> {
this.pgClient = new Client({
connectionString: process.env.DATABASE_URL,
});
await this.pgClient.connect();
// Create replication slot
await this.pgClient.query(`
SELECT pg_create_logical_replication_slot('search_indexing', 'wal2json');
`);
// Start consuming changes
this.consumeChanges();
}
private async consumeChanges(): Promise<void> {
while (true) {
const result = await this.pgClient.query(`
SELECT * FROM pg_logical_slot_get_changes(
'search_indexing',
NULL,
NULL,
'format-version', '2'
);
`);
for (const row of result.rows) {
const change = JSON.parse(row.data);
await this.handleChange(change);
}
await this.sleep(1000);
}
}
private async handleChange(change: any): Promise<void> {
if (change.table === 'products') {
if (change.action === 'INSERT' || change.action === 'UPDATE') {
const product = await this.fetchProductWithRelations(change.data.id);
await searchService.indexDocument('products', this.transformForSearch(product));
} else if (change.action === 'DELETE') {
await searchService.deleteDocument('products', change.data.id);
}
}
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
CDC provides reliable, eventual consistency between the database and search engine without coupling them in application code. Database changes are the source of truth. This approach handles scenarios like database migrations, bulk operations, or external database modifications that bypass the application.
Alternative CDC implementations include Debezium (Kafka-based), AWS DMS, or database triggers. For message queue integration, see Event-Driven Architecture.
Search Analytics
Track search behavior to improve relevance and identify content gaps.
Query Analytics
interface SearchAnalytics {
query: string;
resultCount: number;
clickedResults: string[];
firstClickPosition: number | null;
userId?: string;
timestamp: Date;
}
class SearchAnalyticsService {
async trackSearch(analytics: SearchAnalytics): Promise<void> {
await db.searchAnalytics.create({
data: analytics,
});
}
// Identify queries with no results
async getNoResultQueries(
startDate: Date,
endDate: Date
): Promise<Array<{ query: string; count: number }>> {
return db.searchAnalytics.groupBy({
by: ['query'],
where: {
resultCount: 0,
timestamp: { gte: startDate, lte: endDate },
},
_count: { query: true },
orderBy: { _count: { query: 'desc' } },
take: 100,
});
}
// Calculate click-through rate by query
async getClickThroughRate(
startDate: Date,
endDate: Date
): Promise<Array<{ query: string; ctr: number }>> {
const searches = await db.searchAnalytics.groupBy({
by: ['query'],
where: {
timestamp: { gte: startDate, lte: endDate },
},
// _count on a nullable field counts only rows where it is non-null,
// so _count.firstClickPosition is the number of searches with a click
_count: { query: true, firstClickPosition: true },
});
return searches.map(row => ({
query: row.query,
ctr: row._count.query > 0
? row._count.firstClickPosition / row._count.query
: 0,
}));
}
// Identify poorly ranked results
async getLowClickPositions(
startDate: Date,
endDate: Date
): Promise<Array<{ query: string; avgPosition: number }>> {
const results = await db.searchAnalytics.groupBy({
by: ['query'],
where: {
firstClickPosition: { not: null },
timestamp: { gte: startDate, lte: endDate },
},
_avg: { firstClickPosition: true },
_count: { query: true },
having: {
firstClickPosition: { _avg: { gt: 5 } },
},
});
return results
.filter(row => row._count.query > 10) // Minimum query volume
.map(row => ({
query: row.query,
avgPosition: row._avg.firstClickPosition,
}));
}
}
No-result queries indicate missing content or inadequate synonyms. Low click-through rates suggest poor relevance or unappealing result presentation. High average click positions indicate relevant results are ranked too low.
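These metrics can also be computed directly from raw analytics rows; an in-memory sketch (hypothetical row shape matching the SearchAnalytics interface above) of click-through rate and average first-click position:

```typescript
interface AnalyticsRow {
  query: string;
  firstClickPosition: number | null; // null when the user clicked nothing
}

// CTR = searches that produced a click / total searches.
function clickThroughRate(rows: AnalyticsRow[]): number {
  if (rows.length === 0) return 0;
  return rows.filter(r => r.firstClickPosition !== null).length / rows.length;
}

// Average position of the first click, over searches that had one.
function avgFirstClickPosition(rows: AnalyticsRow[]): number | null {
  const clicked = rows.filter(r => r.firstClickPosition !== null);
  if (clicked.length === 0) return null;
  return clicked.reduce((s, r) => s + (r.firstClickPosition as number), 0) / clicked.length;
}
```

Tracking both together distinguishes "nobody clicks" (low CTR, a relevance or presentation problem) from "people click, but far down the page" (high average position, a ranking problem).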
Use analytics to:
- Add synonyms for common query variations
- Adjust field boosts based on click patterns
- Identify content gaps to fill
- A/B test relevance changes
For observability and monitoring of search systems, see Observability Guidelines.
Related Topics
- API Design - Designing search API endpoints
- Performance Testing - Load testing search infrastructure
- Caching - Caching search results
- Database Design - Designing searchable data models
- Observability - Logging - Logging search queries and performance
- GraphQL - Implementing GraphQL search APIs
- Spring Boot API Design - Search endpoints in Spring Boot
- React State Management - Managing search state in React
- Angular State Management - Search state with Signals