
Elasticsearch

Elasticsearch (ES) is an open-source search and analytics engine that powers Jetpack Search and WordPress VIP’s Enterprise Search.

When Elasticsearch powers a site’s search, the site’s content is continually indexed. During publishing actions, action hooks capture the change events and identify the changed data to be indexed. Elasticsearch runs in its own environment with its own data store, and WordPress communicates with it through REST API requests. When a search is made on the site, an API call tells ES what to search for and how to weight the results.
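As a rough illustration of "what to search for and how to weight the results", the sketch below builds a search body using Elasticsearch's `multi_match` query, where a `^N` suffix on a field name boosts matches in that field. The field names (`post_title`, `post_excerpt`, `post_content`, `post_id`) and the boost values are assumptions for illustration, not the mapping that Jetpack Search or Enterprise Search actually uses.

```python
import json

def build_search_body(terms, size=10):
    """Build a hypothetical ES query body for a WordPress search.

    Field names and boost values are illustrative assumptions,
    not the actual mapping of any VIP or Jetpack index.
    """
    return {
        "size": size,
        # Return only the post ID; WordPress loads the rest from MySQL.
        "_source": ["post_id"],
        "query": {
            "multi_match": {
                "query": terms,
                # "^3" boosts title matches by a factor of 3, "^2" excerpts by 2.
                "fields": ["post_title^3", "post_excerpt^2", "post_content"],
            }
        },
    }

body = build_search_body("headless wordpress")
print(json.dumps(body, indent=2))
```

Raising or lowering the boosts is one common way to tune relevance without changing the indexed data.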

Communication between WordPress and Elasticsearch happens asynchronously, so there may be a slight delay between a change being made in WordPress and that change appearing in Elasticsearch. For this reason, the WordPress database, not the search index, should be treated as the source of truth for the content behind search results.
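The delay comes from indexing being deferred rather than performed inline with the save. The minimal sketch below models this with a hypothetical `IndexQueue` class, using plain dictionaries as stand-ins for the database and the search index; it is not how VIP implements its queue, only an illustration of why the index can briefly be stale.

```python
class IndexQueue:
    """Minimal sketch of deferred indexing (hypothetical, not VIP's code).

    Publishing hooks call enqueue(); a separate job later calls flush().
    Until flush() runs, the search index lags behind the database.
    """

    def __init__(self):
        self.pending = set()  # post IDs changed since the last flush

    def enqueue(self, post_id):
        # A set collapses repeated edits to the same post into one sync.
        self.pending.add(post_id)

    def flush(self, database, index):
        # Copy the current database state into the index, then clear.
        for post_id in sorted(self.pending):
            if post_id in database:
                index[post_id] = database[post_id]
            else:
                index.pop(post_id, None)  # post was deleted
        self.pending.clear()


database = {1: "Hello world"}
index = {1: "Hello world"}
queue = IndexQueue()

database[1] = "Hello again"   # an edit is saved in WordPress...
queue.enqueue(1)              # ...and a hook records the change
print(index[1])               # Hello world  (index is stale)
queue.flush(database, index)
print(index[1])               # Hello again  (index caught up)
```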

To integrate a WordPress site with Elasticsearch, code is needed to monitor for content changes and send those changes to the Elasticsearch cluster for indexing. A “cluster” is a group of one or more Elasticsearch nodes working together.
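Changed posts are often sent to the cluster in batches through Elasticsearch's `_bulk` API, which takes newline-delimited JSON (sent with `Content-Type: application/x-ndjson`): an action line followed by the document source, with a trailing newline at the end of the body. The sketch below only builds that payload. The document fields are assumptions for illustration, and the index name is taken from the example response further down this page.

```python
import json

def build_bulk_payload(index_name, posts):
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each document is an action line followed by a source line, and
    the body must end with a newline. The document fields are
    hypothetical; a real integration defines its own mapping.
    """
    lines = []
    for post in posts:
        # Action line: which index and document ID to write.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(post["ID"])}}))
        # Source line: the document body to store.
        lines.append(json.dumps({
            "post_id": post["ID"],
            "post_title": post["post_title"],
            "post_content": post["post_content"],
        }))
    return "\n".join(lines) + "\n"

posts = [
    {"ID": 426, "post_title": "First post", "post_content": "Body text"},
    {"ID": 506, "post_title": "Second post", "post_content": "More text"},
]
payload = build_bulk_payload("vip-2737-post-1", posts)
print(payload)
```

Batching many changes into one request reduces the number of HTTP round trips compared with indexing each post individually.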

Code is also needed to intercept search queries and, instead of running LIKE queries against the MySQL database, send an API request to the Elasticsearch endpoint. The ES endpoint returns a set of search results containing post IDs.
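The request itself is an ordinary HTTP POST to the `_search` endpoint of an index. The sketch below prepares such a request with Python's standard `urllib.request` module without sending it. The host `es.example.com` and the `post_content` field are hypothetical placeholders; a real deployment supplies its own endpoint, credentials, and query.

```python
import json
import urllib.request

# Hypothetical endpoint; a real deployment supplies its own host and credentials.
ES_ENDPOINT = "https://es.example.com:9200"

def build_search_request(index_name, terms):
    """Prepare (but do not send) a POST /<index>/_search request."""
    body = {
        "_source": ["post_id"],
        "query": {"match": {"post_content": terms}},
    }
    return urllib.request.Request(
        url=f"{ES_ENDPOINT}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("vip-2737-post-1", "example search")
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which returns a JSON search response.
```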

Example output from an ES endpoint:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 569,
      "relation": "eq"
    },
    "max_score": 540.97675,
    "hits": [
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "4536344",
        "_score": 540.97675,
        "_source": {
          "post_id": 4536344
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "105829",
        "_score": 516.1369,
        "_source": {
          "post_id": 105829
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "306074",
        "_score": 516.1369,
        "_source": {
          "post_id": 306074
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "3688167",
        "_score": 476.97778,
        "_source": {
          "post_id": 3688167
        }
      },
      {
        "_index": "vip-2737-post-1",
        "_type": "_doc",
        "_id": "4616046",
        "_score": 476.97778,
        "_source": {
          "post_id": 4616046
        }
      }
    ]
  }
}
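In a response like the one above, `hits.total.value` counts all matches (569 here), while `hits.hits` holds only the current page of results, already sorted by `_score` from highest to lowest unless the request asked for a different sort. The sketch below pulls the post IDs out in that ranked order. It assumes, as in the example, that each document's `_source` contains a `post_id` field.

```python
import json

def extract_post_ids(response_text):
    """Return post IDs from an ES search response, best match first.

    Assumes each hit's _source carries a post_id field, as in the
    example response. Hits arrive sorted by _score (descending) by
    default, so list order is the ranking.
    """
    response = json.loads(response_text)
    return [hit["_source"]["post_id"] for hit in response["hits"]["hits"]]

sample = """
{"hits": {"total": {"value": 569, "relation": "eq"},
          "hits": [
            {"_id": "4536344", "_score": 540.97675, "_source": {"post_id": 4536344}},
            {"_id": "105829",  "_score": 516.1369,  "_source": {"post_id": 105829}},
            {"_id": "306074",  "_score": 516.1369,  "_source": {"post_id": 306074}}
          ]}}
"""
print(extract_post_ids(sample))  # [4536344, 105829, 306074]
```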

The returned post IDs can be used to fetch the actual post data from the database and display post summaries. For example:

SELECT wp_posts.ID
FROM wp_posts
WHERE 1=1
AND wp_posts.ID IN (426,506,192)
AND wp_posts.post_type IN ('post', 'page')
AND wp_posts.post_status = 'publish'
ORDER BY wp_posts.post_date DESC
LIMIT 0, 3
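Note that the query above orders results by `post_date`, which discards the relevance ranking Elasticsearch returned, and that `IN (...)` by itself does not preserve list order. If relevance order is wanted, the rows can be re-sorted by each ID's position in the ES result (in WordPress itself, `WP_Query`'s `'orderby' => 'post__in'` argument serves this purpose). The sketch below shows the idea using an in-memory SQLite table as a stand-in for MySQL's `wp_posts`; the sample rows are invented for illustration, and SQLite's `?` placeholders differ from the `%s` style many MySQL drivers use.

```python
import sqlite3

def fetch_posts_in_rank_order(conn, ranked_ids):
    """Fetch published posts by ID, then restore the Elasticsearch ranking.

    SQL's IN (...) does not preserve list order, so rows are re-sorted
    in Python using each ID's position in ranked_ids.
    """
    if not ranked_ids:
        return []
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = conn.execute(
        "SELECT ID, post_title FROM wp_posts "
        f"WHERE ID IN ({placeholders}) "
        "AND post_type IN ('post', 'page') AND post_status = 'publish'",
        ranked_ids,
    ).fetchall()
    rank = {post_id: i for i, post_id in enumerate(ranked_ids)}
    return sorted(rows, key=lambda row: rank[row[0]])

# In-memory stand-in for the MySQL wp_posts table (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, "
             "post_type TEXT, post_status TEXT)")
conn.executemany("INSERT INTO wp_posts VALUES (?, ?, ?, ?)", [
    (192, "Third", "post", "publish"),
    (426, "First", "post", "publish"),
    (506, "Second", "page", "publish"),
    (999, "Draft", "post", "draft"),
])
result = fetch_posts_in_rank_order(conn, [426, 506, 192, 999])
print(result)  # [(426, 'First'), (506, 'Second'), (192, 'Third')]
```

The draft post (ID 999) is dropped by the `post_status` filter even though the index returned it, which is one reason to keep the database as the source of truth.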

In a typical search request:

  • The normal WPDB query is intercepted.
  • A request to the ES endpoint is made with the details from the query (e.g., the search terms).
  • A response containing a list of matching post IDs (and often, other data such as rankings) is received.
  • A new DB query is made to get the list of posts, or a series of get_post() calls are made for individual posts.
  • Results are returned for the matching posts and are rendered on the page.
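The steps above can be sketched end to end as follows. The `search` function and its two callables are hypothetical: `es_search` stands in for the ES API call, and `load_post` stands in for `get_post()` or a follow-up database query. Both are replaced with in-memory stubs so the flow can be run without a cluster or database.

```python
import html
from typing import Callable, Dict, List, Optional

def search(terms: str,
           es_search: Callable[[str], List[int]],
           load_post: Callable[[int], Optional[Dict]]) -> List[str]:
    """Sketch of the search flow (hypothetical helpers, not a real plugin API)."""
    # Steps 1-2: the WordPress query is intercepted and its terms go to ES.
    post_ids = es_search(terms)
    # Step 3: ES answers with matching post IDs, best match first.
    # Step 4: load each post from the database, the source of truth.
    posts = [load_post(pid) for pid in post_ids]
    # Skip IDs that no longer exist (the index can briefly lag behind).
    posts = [p for p in posts if p is not None]
    # Step 5: render the results, escaping titles for safe HTML output.
    return [f"<h2>{html.escape(p['post_title'])}</h2>" for p in posts]

# Stubs standing in for the ES endpoint and the database (invented data).
FAKE_INDEX = {"elasticsearch": [506, 426, 777]}
FAKE_DB = {426: {"post_title": "Intro to ES"}, 506: {"post_title": "Search & WordPress"}}

html_out = search("elasticsearch",
                  es_search=lambda t: FAKE_INDEX.get(t, []),
                  load_post=FAKE_DB.get)
print(html_out)
# ['<h2>Search &amp; WordPress</h2>', '<h2>Intro to ES</h2>']
```

Passing the ES call and the post loader in as parameters keeps the flow testable and makes it easy to swap the stubs for real HTTP and database calls.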

Last updated: March 13, 2024
