Three main components define the architecture of Splunk, and they work together to deliver its functionality: the forwarder, the indexer, and the search head. A forwarder sends events to an indexer, the indexer ingests the events and stores them in an index, and the search head searches the index to retrieve those events from the indexer when a user requests them.
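To make this data flow concrete, here is a minimal Python sketch of the three roles. The classes `Indexer`, `Forwarder`, and `SearchHead` are hypothetical toy models, not Splunk APIs. They only show which component sends, stores, and retrieves events.

```python
from dataclasses import dataclass, field


# Hypothetical toy model of the three Splunk roles; not a Splunk API.
@dataclass
class Indexer:
    name: str
    indexes: dict[str, list[str]] = field(default_factory=dict)

    def ingest(self, index: str, event: str) -> None:
        # Store the event in the named index, creating the index on first use.
        self.indexes.setdefault(index, []).append(event)

    def search(self, index: str, term: str) -> list[str]:
        # Return every stored event in the index that contains the search term.
        return [e for e in self.indexes.get(index, []) if term in e]


@dataclass
class Forwarder:
    indexer: Indexer  # the single indexer this forwarder sends to

    def send(self, index: str, events: list[str]) -> None:
        for event in events:
            self.indexer.ingest(index, event)


@dataclass
class SearchHead:
    indexer: Indexer  # the indexer this search head queries

    def search(self, index: str, term: str) -> list[str]:
        # Fetch matching events from the indexer when a user asks for them.
        return self.indexer.search(index, term)


idx = Indexer("idx1")
Forwarder(idx).send("web", ["GET /home 200", "GET /login 500"])
print(SearchHead(idx).search("web", "500"))  # ['GET /login 500']
```

Because every event lives on `idx1` only, the search head has nothing to query if that one indexer is unavailable.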
If the indexer goes down, the search head can no longer retrieve those events. If the events had been replicated to multiple indexers, however, the loss of a single indexer would not make them unavailable. This is why Splunk introduced indexer clustering, the subject of this post.
What is Splunk Index Clustering?
Index clustering is Splunk’s implementation of data high availability. As the indexer-failure example shows, if events are sent to only one indexer, they cannot be searched while that indexer is down. For customers who need high availability of their Splunk data, index clustering is the solution.
An index cluster consists of multiple indexers managed by a single server called the cluster manager. The cluster manager orchestrates the replication of data (buckets) to one or more other indexers, depending on the desired replication factor.
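The sketch below illustrates what replicating buckets according to a replication factor means: each bucket ends up with that many copies, each on a different indexer. The `place_bucket_copies` helper and its round-robin policy are assumptions made for illustration. The real cluster manager uses its own placement logic.

```python
def place_bucket_copies(
    buckets: list[str], peers: list[str], replication_factor: int
) -> dict[str, list[str]]:
    """Hypothetical placement: assign each bucket to `replication_factor` distinct peers."""
    if replication_factor > len(peers):
        raise ValueError("replication factor cannot exceed the number of peers")
    placement = {}
    for i, bucket in enumerate(buckets):
        # Start each bucket at a different peer and take the next RF peers in order,
        # so the copies of one bucket always land on distinct peers.
        start = i % len(peers)
        placement[bucket] = [
            peers[(start + k) % len(peers)] for k in range(replication_factor)
        ]
    return placement


placement = place_bucket_copies(["b1", "b2", "b3"], ["p1", "p2", "p3", "p4"], 3)
print(placement)
# {'b1': ['p1', 'p2', 'p3'], 'b2': ['p2', 'p3', 'p4'], 'b3': ['p3', 'p4', 'p1']}
```

Spreading the starting peer also spreads the storage load, so no single indexer holds a copy of every bucket once there are more peers than the replication factor.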
A cluster consists of:
- A single server acting as the manager node.
- Multiple indexers, referred to as peer nodes, that index data, replicate it to other peer nodes, and search their data and return the results to the search heads when requested.
- One or more search heads that coordinate searches across all peer nodes.
The following details help explain the diagram above and how index clustering works.
The manager node manages the cluster by orchestrating replication among the peer nodes (indexers) and telling the search head where to find the required data. It does not index any data.
Peer nodes (indexers) perform indexing for the whole cluster: they receive and index incoming data. They also send replicated data to other peers and receive replicated data from them. The number of peer nodes depends on two factors: the cluster's replication factor and the indexing load. For example, with a replication factor of 3 (three copies of the data), you need at least three peers. If your indexing load exceeds what three indexers can handle, add more peers to increase capacity.
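The two sizing rules in the paragraph above reduce to a simple maximum. Here is a rough Python sketch. The `min_peer_count` helper and the per-peer daily capacity figure are hypothetical inputs, not Splunk sizing guidance. The sketch also ignores the extra work each peer does to receive replicated copies, which real capacity planning must account for.

```python
import math


def min_peer_count(
    replication_factor: int, daily_ingest_gb: float, gb_per_peer_per_day: float
) -> int:
    """Hypothetical estimate of the minimum number of peer nodes."""
    if gb_per_peer_per_day <= 0:
        raise ValueError("per-peer capacity must be positive")
    # Rule 1: each copy of a bucket needs its own peer, so at least RF peers.
    replication_peers = replication_factor
    # Rule 2: enough peers to absorb the indexing load (rounded up).
    load_peers = math.ceil(daily_ingest_gb / gb_per_peer_per_day)
    return max(replication_peers, load_peers)


print(min_peer_count(3, 200, 100))  # 3: the replication factor dominates
print(min_peer_count(3, 450, 100))  # 5: the indexing load dominates
```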
The search head manages searches across all peer nodes: it distributes search queries to the peers and consolidates the results. A cluster must have at least one search head.
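The distribute-then-consolidate pattern can be sketched as a parallel fan-out followed by a merge. The in-memory `PEERS` data and the `search_peer` / `distributed_search` helpers are hypothetical. The sketch assumes each event is returned by exactly one peer. In a real cluster, only the primary copy of each searchable bucket answers a search, which avoids duplicate results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each peer holds (timestamp, event) pairs it is primary for.
PEERS = {
    "peer1": [(3, "login failed"), (1, "login ok")],
    "peer2": [(2, "login failed")],
    "peer3": [(4, "logout")],
}


def search_peer(peer: str, term: str) -> list[tuple[int, str]]:
    # Each peer searches only the events it holds.
    return [(ts, ev) for ts, ev in PEERS[peer] if term in ev]


def distributed_search(peers: list[str], term: str) -> list[tuple[int, str]]:
    with ThreadPoolExecutor() as pool:
        # Distribute: send the query to every peer in parallel.
        partials = pool.map(lambda p: search_peer(p, term), peers)
        # Consolidate: merge the partial results from all peers.
        merged = [hit for part in partials for hit in part]
    # Present the newest events first.
    return sorted(merged, reverse=True)


print(distributed_search(list(PEERS), "login"))
# [(3, 'login failed'), (2, 'login failed'), (1, 'login ok')]
```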
Forwarders behave the same as in any Splunk Enterprise deployment: they consume data from external sources and forward it to the indexers (peers). A forwarder can also query the manager node (a feature called indexer discovery) to get an updated list of available indexers.
The replication factor is the number of copies of the data that the cluster maintains; it determines the cluster's level of failure tolerance.
The search factor is the number of searchable copies of the data that the cluster maintains; it determines how quickly the cluster can recover its search capability after a peer node goes down.
Buckets are the basic units of index storage, and the cluster replicates data bucket by bucket.
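The relationship between the two factors and failure tolerance can be captured in a few checks. The `check_cluster_factors` helper is hypothetical. It encodes the constraints described above: the search factor cannot exceed the replication factor, a cluster can lose up to RF minus 1 peers without losing data, and it can lose up to SF minus 1 peers while still having a searchable copy of every bucket.

```python
def check_cluster_factors(
    replication_factor: int, search_factor: int, peer_count: int
) -> dict[str, int]:
    """Hypothetical validator for a single-site cluster's factors."""
    if search_factor < 1:
        raise ValueError("search factor must be at least 1")
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed the replication factor")
    if replication_factor > peer_count:
        raise ValueError("need at least as many peers as the replication factor")
    return {
        # Any RF - 1 peers can fail and at least one copy of every bucket survives.
        "peer_failures_without_data_loss": replication_factor - 1,
        # Any SF - 1 peers can fail and a searchable copy of every bucket survives,
        # so searches do not wait for index files to be rebuilt.
        "peer_failures_still_searchable": search_factor - 1,
    }


print(check_cluster_factors(3, 2, 5))
# {'peer_failures_without_data_loss': 2, 'peer_failures_still_searchable': 1}
```

A higher search factor therefore shortens recovery, at the cost of storing more searchable (larger) bucket copies.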
The Benefits of Splunk Index Clustering
The most important benefit of Splunk index clustering is high availability of data, which keeps data searchable through indexer outages, up to the limit set by the replication factor. It also simplifies forwarder output configuration.
Benefit #1 High Availability of Data
Index clustering keeps data available for searching by making the cluster resilient to indexer outages. Without index clustering, a Splunk search might return incomplete results if one of the indexers is down, because events stored only on that indexer cannot be searched.
Benefit #2 Simplified Forwarder Outputs
Index clustering simplifies the outputs.conf file that forwarders use to send data to indexers: instead of listing every indexer, a forwarder can contact the cluster manager to obtain the list of available indexers. Without index clustering, the outputs.conf file on each forwarder must contain the complete list of indexers, and that list must be updated whenever indexers are added or removed.
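The difference between a static indexer list and indexer discovery can be modeled with a forwarder that takes a function returning its targets. `ToyForwarder` and its `refresh` method are hypothetical, not Splunk settings. Real forwarders poll the cluster manager on their own, whereas here the refresh is triggered by hand to keep the sketch short.

```python
from itertools import cycle
from typing import Callable


class ToyForwarder:
    """Hypothetical forwarder that load-balances across a list of indexers.

    `get_indexers` stands in for either a static list (no clustering)
    or a query to the cluster manager (indexer discovery).
    """

    def __init__(self, get_indexers: Callable[[], list[str]]):
        self.get_indexers = get_indexers
        self.refresh()

    def refresh(self) -> None:
        # Re-read the list of receiving indexers and restart the rotation.
        self._targets = cycle(self.get_indexers())

    def next_target(self) -> str:
        # Round-robin to the next indexer in the current list.
        return next(self._targets)


manager_peers = ["idx1", "idx2"]  # what the cluster manager reports
discovering = ToyForwarder(lambda: list(manager_peers))
static = ToyForwarder(lambda: ["idx1", "idx2"])  # hard-coded list

manager_peers.append("idx3")  # a new peer joins the cluster
discovering.refresh()
static.refresh()
print([discovering.next_target() for _ in range(3)])  # ['idx1', 'idx2', 'idx3']
print([static.next_target() for _ in range(3)])       # ['idx1', 'idx2', 'idx1']
```

The static forwarder never sends to `idx3` until someone edits its list, which is the maintenance burden that indexer discovery removes.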
Types of Splunk Index Clusters
There are two major types of indexer clusters:
- Single-site index cluster: data is replicated to multiple indexers within a single geographical location. Data remains available as long as enough indexers stay up to meet the search factor requirement.
- Multi-site index cluster: data is replicated to indexers in more than one geographical location to meet disaster recovery requirements. Data remains available even if an entire site goes down.
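The reason a multi-site cluster survives the loss of a whole site comes down to where the copies of each bucket live. In this sketch, `buckets_lost_with_site` and the site names are hypothetical. Each bucket is mapped to the sites that hold its copies, and the function reports which buckets would have no surviving copy if one site failed.

```python
def buckets_lost_with_site(
    placement: dict[str, list[str]], failed_site: str
) -> list[str]:
    """Return the buckets whose every copy lives in the failed site."""
    return [
        bucket
        for bucket, sites in placement.items()
        if all(site == failed_site for site in sites)
    ]


# Three copies per bucket, all in one location (single-site cluster).
single_site = {"b1": ["siteA"] * 3, "b2": ["siteA"] * 3}
# Three copies per bucket, spread across two locations (multi-site cluster).
multi_site = {"b1": ["siteA", "siteA", "siteB"], "b2": ["siteB", "siteB", "siteA"]}

print(buckets_lost_with_site(single_site, "siteA"))  # ['b1', 'b2']
print(buckets_lost_with_site(multi_site, "siteA"))   # []
```

The replication factor is the same in both layouts. Only the spread of copies across sites determines whether a site outage is survivable.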
In summary, an index cluster consists of multiple indexers managed by a single node called the cluster manager, which orchestrates the replication of buckets (data) to one or more indexers according to the desired replication factor. The replication factor is the number of copies of the data that the cluster maintains, and the search factor is the number of those copies that are searchable.