Intermediate Splunk Indexes and Index Management

Written by: Michael Tolbert | Last Updated: April 19, 2024 | Originally Published: March 17, 2023

What Are Splunk Indexes and Index Management?

Splunk indexes and index management are key to understanding how ingested data is stored on a Splunk indexer. Mismanaging indexes can cause performance issues and generally leads to instability in a Splunk deployment. Index management also includes configuring each index so that it stores ingested data under the proper conditions. This article covers three aspects of index management: managing indexes for data ingestion, managing index settings, and updating index settings.

Managing Indexes for Data Ingestion

Indexes are the houses where events ingested by Splunk are stored. If indexes are not properly managed and controlled, you can end up in a situation similar to what Windows administrators call DLL hell. In Splunk, this shows up as many indexes being created, of which many are unused, some are used incorrectly, and some are overpopulated with unrelated events.

Unused indexes:

This can occur when data onboarding is performed over time, possibly by different Splunk admins, or when a data feed was stopped because it was no longer required. In either case, there was no clean-up process to remove the unused index.

Incorrectly used indexes:

Data sources that do not define an index in the input configuration will have their data stored in the default index, which is the main index. Some add-ons also fall back to the default index if they are not properly configured.

Overly populated indexes:

This usually results from a poorly planned data onboarding exercise. Sometimes Splunk admins elect to create a generically named index, such as network, and then onboard every data source into it: firewalls, switches, routers, and so on. This leaves the index unevenly populated, and an index with such a name often becomes a dumping ground for unrelated events. Additionally, an overly populated index can degrade search performance when it is used in dashboards, reports, or alerts.

Cleaning up your indexes:

There are several ways to track, manage, and clean up your unimportant indexes. One way is to use SPL searches to track the state of your indexes. Unused indexes typically have either a low event count or only very old events. To find indexes that have fewer than 100 events, try this SPL query:

| eventcount summarize=false index=* index=_*

| stats sum(count) as count by index

| where count < 100

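To find data whose most recent events are old, try this SPL query (shown here for the _internal index; the sourcetypes that have gone longest without new events are listed first):

| metadata type=sourcetypes index=_internal

| sort lastTime

| convert ctime(lastTime)

| head 100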

Overpopulated indexes are simple to identify. A basic query such as index=YourIndexName should be enough to examine the types of events in the index so that you can then disable the inputs that do not belong there.

Another option for managing your indexes is to use a Splunk app. There is a great app on Splunkbase called Atlas, which is ideal for managing indexes. Atlas provides a Data Utilization module that helps Splunk admins track index utilization by index and sourcetype. There is also a Data Management module that tracks your data through its entire lifecycle. These are just some of the powerful features provided by Atlas. The Atlas Assessment application is also available free of charge and can provide recommendations for how the Atlas platform can help you understand and manage your Splunk environment.

Managing Index Settings

There is a plethora of settings associated with indexes. Every index is created with default settings applied. You can examine these settings either by reviewing the $SPLUNK_HOME/etc/system/default/indexes.conf file on your indexer or by running an SPL query from your indexer. For example, to see the settings of the Splunk _internal index from the Search and Reporting app, you could try this query:

| rest /services/data/indexes

| search title=_internal

| transpose

Note that the results of this query are not a one-to-one match for the settings in the indexes.conf file, but they include enough of the settings to determine the default state of the index.

A definition of every index setting is available in the indexes.conf spec (indexes.conf.spec).

How to Update Index Settings

Changes are made to the default settings based on how an index should be configured for data storage. For example, a common configuration is applying a retention period to an index. The retention period determines how long data remains in the index before the oldest events are removed. Before setting a retention period, you should understand the lifecycle of Splunk data. When data is written to an indexer, it is stored in a bucket (basically a directory on the file system). Over time, buckets roll from hot to warm to cold to frozen. Frozen buckets are either deleted or archived, depending on other configuration settings.

The retention period for the entire lifecycle of data is configured with the frozenTimePeriodInSecs setting. The default value is 188697600 seconds, or roughly 6 years (2,184 days). To customize the retention period, calculate its duration in seconds, apply the parameter and value to your index configuration (e.g., frozenTimePeriodInSecs=value), and restart your indexer. To learn more about configuration files, see here.
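The seconds arithmetic is easy to get wrong by hand, so here is a minimal Python sketch of the conversion. The helper name retention_days_to_secs and the 90-day policy are illustrative assumptions, not part of Splunk; only the setting name and its default value come from the text above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day

# Default value of frozenTimePeriodInSecs quoted in this article.
DEFAULT_FROZEN_SECS = 188_697_600


def retention_days_to_secs(days: int) -> int:
    """Convert a retention period in days to a frozenTimePeriodInSecs value.

    Hypothetical helper for illustration; it is not a Splunk API.
    """
    if days <= 0:
        raise ValueError("retention period must be a positive number of days")
    return days * SECONDS_PER_DAY


if __name__ == "__main__":
    # The default works out to 2,184 days, a little under 6 years.
    print(DEFAULT_FROZEN_SECS // SECONDS_PER_DAY)

    # Assumed example policy: keep 90 days of data in an index.
    print(f"frozenTimePeriodInSecs = {retention_days_to_secs(90)}")
```

The printed line for the 90-day example can be pasted under the index's stanza in indexes.conf.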

Besides this, another setting can affect retention: maxTotalDataSizeMB. This setting should reflect the amount of storage you wish to allocate to an index, so there should be enough available storage on your indexer file system to cover the configured value. Splunk freezes the oldest buckets when either limit is reached, so whichever of the two settings is hit first determines the final retention period.
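To see how the time limit and the size limit combine, the following Python sketch estimates the effective retention as the smaller of the two. It assumes a steady daily on-disk volume for the index on a single indexer; the function name and the sample numbers are hypothetical, and real behavior depends on bucket boundaries and other settings.

```python
SECONDS_PER_DAY = 86_400


def effective_retention_days(frozen_secs: int,
                             max_total_mb: int,
                             daily_disk_mb: float) -> float:
    """Estimate how many days of data an index keeps.

    Rough model only: data is frozen when it is older than frozen_secs
    OR when the index grows past max_total_mb, whichever comes first.
    daily_disk_mb is an assumed constant on-disk growth per day.
    """
    if daily_disk_mb <= 0:
        raise ValueError("daily_disk_mb must be positive")
    time_limit_days = frozen_secs / SECONDS_PER_DAY
    size_limit_days = max_total_mb / daily_disk_mb
    return min(time_limit_days, size_limit_days)


if __name__ == "__main__":
    ninety_days = 90 * SECONDS_PER_DAY

    # Size limit wins: 500,000 MB fills up in 50 days at 10,000 MB/day.
    print(effective_retention_days(ninety_days, 500_000, 10_000))

    # Time limit wins: at 2,000 MB/day the index never reaches the size cap
    # before data turns 90 days old.
    print(effective_retention_days(ninety_days, 500_000, 2_000))
```

If the estimate is shorter than your intended retention period, either the storage allocation or the ingest volume needs to change.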

Another setting that is commonly configured on an index is enableDataIntegrityControl. It defaults to false, which disables data integrity checking; setting it to true enables it. With data integrity control enabled, Splunk computes hashes on raw-data slices and stores them for future integrity checks. Be advised that after you enable this setting, any data that was already in the index might be flagged as a failure in the integrity check, and this may continue until that data ages out of the index.

There are many ways to configure your Splunk indexes to tailor them to your unique storage needs. For more information on other settings, refer to the indexes.conf spec for detailed explanations.
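The general idea of hashing slices and re-checking them later can be illustrated with a short Python sketch. This is not Splunk's implementation: the slice size, function names, and sample data are assumptions chosen only to show the technique.

```python
import hashlib
from typing import List


def hash_slices(data: bytes, slice_size: int) -> List[str]:
    """Return one SHA-256 hex digest per fixed-size slice of data."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [
        hashlib.sha256(data[start:start + slice_size]).hexdigest()
        for start in range(0, len(data), slice_size)
    ]


def find_mismatches(data: bytes, slice_size: int,
                    stored_hashes: List[str]) -> List[int]:
    """Return the positions of slices whose hash differs from the stored one.

    A slice that exists on only one side (data grew or shrank) also counts
    as a mismatch.
    """
    current = hash_slices(data, slice_size)
    longest = max(len(current), len(stored_hashes))
    return [
        i for i in range(longest)
        if i >= len(current)
        or i >= len(stored_hashes)
        or current[i] != stored_hashes[i]
    ]


if __name__ == "__main__":
    original = b"a" * 10 + b"b" * 10      # two 10-byte slices
    stored = hash_slices(original, 10)    # hashes saved at write time

    tampered = b"a" * 10 + b"c" * 10      # second slice altered later
    print(find_mismatches(original, 10, stored))   # nothing changed
    print(find_mismatches(tampered, 10, stored))   # second slice flagged
```

The sketch also shows why data written before hashing was enabled can fail a check: with no stored hash to compare against, a slice cannot be verified.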
