Elasticsearch is a leading search engine that works well with different data types, including numerical, textual, structured, and unstructured data. It’s a core component of the Elastic Stack (aka the ELK Stack), an all-in-one solution for data ingestion, transformation, storage, processing, and visualization.
Elasticsearch lies at the heart of many IT infrastructures, so it is worth tracking its performance and health metrics in real time to ensure that it runs smoothly and without bottlenecks. In this article, we will discuss Elasticsearch’s architecture, its importance, and some of the key metrics to monitor.
Elasticsearch is an open-source, distributed, highly scalable, fault-tolerant search and analytics engine. It is built on top of Apache Lucene, an open-source Java library with powerful search and indexing features.
Elasticsearch has a RESTful interface with API clients for many programming languages, including Java, Python, .NET, PHP, Ruby, C++, Rust, and more. REST APIs are available for data search, aggregation, ingestion, and management.
The Elasticsearch Query DSL (domain-specific language) is a query language based on JSON. It offers numerous search features, including phrase matching, wildcards, regex, and geo queries. Elasticsearch SQL additionally lets you run SQL-like queries against the data stored in Elasticsearch indices.
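As a rough illustration of what JSON-based queries look like, the sketch below builds three request bodies (phrase match, wildcard, and regexp) as Python dictionaries and serializes one of them. The field names `title` and `name` are hypothetical and chosen only for the example. Each body would be sent to an index's `_search` endpoint.

```python
import json


def phrase_query(field, phrase):
    """Body for a match_phrase query: the words must appear together, in order."""
    return {"query": {"match_phrase": {field: phrase}}}


def wildcard_query(field, pattern):
    """Body for a wildcard query, e.g. 'ali*' matches terms starting with 'ali'."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


def regexp_query(field, pattern):
    """Body for a regexp query against the terms of a field."""
    return {"query": {"regexp": {field: {"value": pattern}}}}


# Hypothetical field names, used only for illustration.
print(json.dumps(phrase_query("title", "quick brown fox"), indent=2))
```

The helpers only construct JSON. How the body is sent (curl, an official client, or plain HTTP) is left open here.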
Logstash, another component of the ELK Stack, is a data processing pipeline used to automate the ingestion and transformation of multi-source data at scale. It enables Elasticsearch to aggregate and ingest data from multiple sources.
Elasticsearch stores data as JSON documents. This data is indexed to allow fast search and retrieval. An index in the Elasticsearch world plays the same role as that of a table in the relational database world.
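Just as a table contains rows and columns, an index contains documents with fields. Older versions of Elasticsearch also let you subdivide an index into mapping types. For example, an index named users might have contained three types: partners, vendors, and employees. Since Elasticsearch 6.0, an index can hold only a single type, and mapping types were removed entirely in 8.0, so in current versions each such category is usually stored in its own index. Note that index names must be lowercase.
Each index contains the documents belonging to it. For example, a partners index may contain JSON documents for all your partners. This categorization keeps searches focused and makes the data easy to query. To retrieve the documents that match a partner named Alice, a user can send an HTTP request using the following structure:
http://server-ip-and-port/partners/_search?q=Alice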
A shard is the basic building block of an Elasticsearch cluster. Each index is divided into one or more shards. You can imagine a shard as a tiny, self-contained search engine responsible for indexing and processing queries for a subset of data stored in the cluster.
Elasticsearch uses an inverted index data structure that allows you to search for words inside JSON documents. An inverted index maintains a list of all unique words and the documents containing each word.
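To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is a toy model only: Lucene's actual implementation also performs text analysis, stores term positions and frequencies, and compresses its data, none of which is shown here. The sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the sorted IDs of documents containing the word."""
    return sorted(index.get(word.lower(), set()))


# Made-up documents for illustration.
docs = {1: "Alice is a partner", 2: "Bob is a vendor", 3: "Alice met Bob"}
index = build_inverted_index(docs)
print(search(index, "alice"))  # [1, 3]
```

Because the lookup goes from word to documents, finding every document that contains a term is a single dictionary access rather than a scan of all documents.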
Elasticsearch provides efficient indexing and searching capabilities that can meet various business needs.
Elasticsearch’s powerful search features enable users to build intuitive search engines for multiple use cases. Whether you want to implement a search bar for your online store, or an internal document store for your employees, Elasticsearch is the way to go.
Elasticsearch’s inverted indices enable fast full-text searches across millions of documents. You can build aggregations based on terms, date ranges, and more to achieve faceted navigation. The type-ahead suggester displays similar results to the user as they type. Fuzzy searching allows for misspellings while searching.
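The sketch below shows, under stated assumptions, how a fuzzy match and a terms aggregation might be combined in one request body to support faceted navigation. The field names (`name`, `category`) and the sample response are hypothetical. Only the `match`/`fuzziness` and `aggs`/`terms` structures are standard Query DSL.

```python
def faceted_search_body(field, text, facet_field, fuzziness="AUTO"):
    """Fuzzy full-text match on one field plus a terms aggregation on another."""
    return {
        "query": {"match": {field: {"query": text, "fuzziness": fuzziness}}},
        "aggs": {"facets": {"terms": {"field": facet_field}}},
    }


def facet_counts(response):
    """Turn the aggregation buckets of a search response into {value: count}."""
    buckets = response.get("aggregations", {}).get("facets", {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Misspelled input on purpose; fuzziness lets it still match "laptop".
body = faceted_search_body("name", "lapotp", "category")

# Hand-written, trimmed response used only to illustrate the bucket layout.
sample_response = {
    "aggregations": {
        "facets": {
            "buckets": [
                {"key": "electronics", "doc_count": 12},
                {"key": "accessories", "doc_count": 3},
            ]
        }
    }
}
print(facet_counts(sample_response))
```

The counts per bucket are what a storefront would typically render as facet filters next to the search results.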
The ELK Stack is often used to aggregate, transform, search, and analyze logs. Logstash loads and transforms logs from multiple sources. Elasticsearch indexes logs and allows you to analyze them. With the ELK Stack, you can look for anomalies, filter for errors, match patterns, and perform system-wide debugging from a central place.
Kibana, the visualization component of the stack, creates graphs so users can analyze trends visually. Kibana also provides the ability to create triggers that execute automated workflows and to set up contextualized alerts to help you resolve issues quickly.
Elasticsearch is a top choice for performing real-time analysis of application and infrastructure performance. You can aggregate and index various types of metric data in a central location and track them in real time using Elasticsearch’s fast querying capabilities.
For example, you might set up a Logstash pipeline to fetch CPU usage data from different application servers and store it in Elasticsearch. You can also create a customized dashboard on Kibana that fetches these statistics and displays them as graphs and charts.
The Elastic Web Crawler is an indexing tool that automates the indexing of your website content. It periodically crawls your website, identifies new content, and indexes it in Elasticsearch. Any changes you make to your website are automatically propagated to Elasticsearch.
This eliminates the need to ingest content manually. It also enables a better search experience by making new content searchable soon after it is published.
Whether you want to monitor website activity, perform sentiment analysis, track your business KPIs, or analyze financial data, the ELK Stack will be a great choice. The stack enables you to aggregate, ingest, and process data from different sources, including social media feeds, enterprise applications, and marketing tools.
Use Elasticsearch and Kibana to generate actionable insights and create fact-based reports for your team. On-demand forecasting allows you to apply machine learning to historical data and predict future trends.
Monitoring the health and performance of Elasticsearch is important for the following reasons:
Nobody likes a slow search bar. Users expect to see search results appear instantly, sometimes even before they finish typing. Elasticsearch can help deliver this experience – but only if it’s functioning properly.
Monitoring an Elasticsearch instance enables you to track its health and performance. For example, you can track request-response metrics to ensure that Elasticsearch responds to requests at an acceptable rate and with minimum latency.
Programming errors, misconfigurations, or scalability issues can cause malfunctions or bottlenecks. For instance, poorly structured queries may lead to slow operations that decrease the overall throughput of the instance. Or inadequate resources may cause spikes in CPU usage during peak hours. Periodic monitoring can help detect and debug such issues.
As Elasticsearch often acts as the backbone of IT infrastructures, monitoring performance metrics equips you with insights to optimize the performance of Elasticsearch as well as the larger system. For example, if a decline in response rate coincides with an increase in slow operation logs, you can conclude that some operations are taking too long to execute.
Monitoring Elasticsearch’s audit logs helps detect security events, such as authentication failures, refused connections, and insufficient permissions. You can also specify your criteria for logging events in the audit log. Periodic monitoring ensures you don’t overlook security-critical events and protects Elasticsearch from unauthorized access.
Metrics like the number of indices and nodes, document counts, and search rate allow you to monitor the cluster state in real-time. This way, you can detect anomalies or fluctuations in key performance metrics and take remedial action.
Elasticsearch exposes several metrics that can be used to track the performance of its key areas and elements.
The cluster health API provides a basic overview of the current health of the Elasticsearch cluster. It can be accessed via:
curl -XGET 'http://[server-ip-and-port]/_cluster/health?pretty';
A sample response is as follows:
{
  "cluster_name" : "test",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 10,
  "number_of_data_nodes" : 10,
  "active_primary_shards" : 10,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks" : 2,
  "number_of_in_flight_fetch": 2,
  "task_max_waiting_in_queue_millis": 10
}
The possible values for the status field are green, yellow, and red. Green indicates that all shards are assigned. Yellow means that all primary shards are assigned, but some replica shards remain unassigned. Red indicates that one or more primary shards are unassigned, making some data unavailable.
The number_of_nodes field indicates the total number of nodes in the cluster, whereas number_of_data_nodes counts only the nodes that hold data. The various _shards fields indicate the number of active, relocating, initializing, and unassigned shards.
If the value of relocating_shards is greater than zero, the cluster is moving data shards to restore balance. This typically occurs when a node is added or removed or when a failed node is restarted.
The number_of_pending_tasks field is the number of cluster-level changes that have not yet been executed. The number_of_in_flight_fetch field is the number of unfinished fetches.
Correlating different health metrics allows administrators to gauge how a cluster is performing. For example, if relocating_shards is regularly greater than zero even though nodes are not being added or removed, one or more nodes may be failing repeatedly. Similarly, if number_of_pending_tasks remains greater than zero several hours after cluster initialization, something is likely wrong within the cluster and warrants investigation.
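The checks described above can be sketched as a small Python helper. This is an illustrative sketch, not an official tool: the function names are made up, the fetch helper assumes an unauthenticated HTTP endpoint, and the sample dictionary reuses a few values from the sample response shown earlier.

```python
import json
import urllib.request


def fetch_cluster_health(base_url):
    """GET <base_url>/_cluster/health and return the parsed JSON body.

    Assumes no authentication or TLS setup; real clusters usually need both.
    """
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def assess_cluster_health(health):
    """Return a list of findings for a cluster health response."""
    findings = []
    status = health.get("status")
    if status == "red":
        findings.append("status red: one or more primary shards are unassigned")
    elif status == "yellow":
        findings.append("status yellow: some replica shards are unassigned")
    relocating = health.get("relocating_shards", 0)
    if relocating > 0:
        findings.append(f"{relocating} shard(s) relocating")
    pending = health.get("number_of_pending_tasks", 0)
    if pending > 0:
        findings.append(f"{pending} pending cluster task(s)")
    return findings


# Subset of the sample response; no network call is made here.
sample = {"status": "yellow", "relocating_shards": 0, "number_of_pending_tasks": 2}
for finding in assess_cluster_health(sample):
    print(finding)
```

A single snapshot says little on its own. Running such a check on a schedule and comparing results over time is what reveals the recurring relocations or long-lived pending tasks discussed above.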
The Elasticsearch Service console presents several metrics related to CPU and memory. For example:
The hot threads API can help identify busy or blocked threads that contribute to high CPU usage. To retrieve hot threads for all the cluster’s nodes, use the following:
curl -XGET 'http://[server-ip-and-port]/_nodes/hot_threads';
To retrieve hot threads for a specific node, use the following:
curl -XGET 'http://[server-ip-and-port]/_nodes/[node_id]/hot_threads';
Unlike most other Elasticsearch APIs, the hot threads API doesn’t return JSON. Instead, it returns formatted text that includes information about the node and the percentage of CPU usage by the hot threads.
Tracking node metrics is crucial to ensure overall optimal performance of an Elasticsearch cluster. The nodes stats API returns several node statistics related to the operating system, file stores, JVM, and more. It can be invoked as:
curl -XGET 'http://[server-ip-and-port]/_nodes/stats';
Some of the metrics included in the response are:
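As a hedged illustration, the sketch below pulls a few commonly watched fields out of a nodes stats response: JVM heap usage, OS CPU usage, and an average query latency derived from the search counters. The function name is made up, and the sample input is a hand-written, heavily trimmed response with invented values. A real response contains many more sections.

```python
def summarize_node_stats(stats):
    """Extract heap, CPU, and average query latency for each node."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        search = node.get("indices", {}).get("search", {})
        query_total = search.get("query_total", 0)
        query_ms = search.get("query_time_in_millis", 0)
        rows.append({
            "name": node.get("name", node_id),
            "heap_used_percent": node.get("jvm", {}).get("mem", {}).get("heap_used_percent"),
            "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
            # Average time per query since node start; None if no queries ran.
            "avg_query_ms": query_ms / query_total if query_total else None,
        })
    return rows


# Hand-written, trimmed example with invented values.
sample_stats = {
    "nodes": {
        "node-id-1": {
            "name": "es-data-1",
            "jvm": {"mem": {"heap_used_percent": 63}},
            "os": {"cpu": {"percent": 21}},
            "indices": {"search": {"query_total": 1500, "query_time_in_millis": 3000}},
        }
    }
}
for row in summarize_node_stats(sample_stats):
    print(row)
```

Because the search counters are cumulative, a monitoring job would normally compute the difference between two successive samples rather than use the lifetime average shown here.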
In the following sections, we’ll discuss how to track key Elasticsearch metrics using monitoring tools.
Metricbeat is a lightweight data shipper that is a part of the Elastic Stack. With Metricbeat, you can collect metric data from production Elasticsearch clusters and load it to an Elasticsearch cluster dedicated to monitoring. The loaded data can then be visualized using Kibana. Here are the steps:
Site24x7’s Elasticsearch plugin can also be used to monitor Elasticsearch in real-time. It offers visibility into key metrics related to sharding, JVM, cluster status, and memory and CPU usage. You can install the plugin using these steps:
Elasticsearch is a leading search and analytics engine that can aggregate data from diverse sources. Powerful search features, flexible deployment options, extensive security controls, and out-of-the-box scalability make it an ideal fit for many business use cases. In this article, we aimed to share a comprehensive guide to monitoring an Elasticsearch instance.