Splunk SmartStore: Everything You Need To Know

Written by: Michael Simko | Last Updated: February 25, 2024 | Originally Published: February 24, 2023

What is Splunk SmartStore?

SmartStore is Splunk’s indexer storage architecture built on S3-style object storage. It primarily operates with Amazon S3 but also works with other object stores and with dedicated S3-compatible storage devices from other providers. This post explores SmartStore and how it may benefit your Splunk Enterprise deployments. We’ll see how SmartStore works, discuss the architectural designs, and explore their impact on your Splunk environment.

Each indexer has a cache (SSD on the server), ideally sized to hold at least thirty days of ingest. This cache contains a copy of the active warm buckets, which, along with the hot buckets, hold the vast majority of the data that users search. Copies of all warm buckets (there are no cold buckets in SmartStore) exist in the S3 object store. When requested data is not in the cache, SmartStore fetches it from the S3 object store.
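To make the thirty-day sizing guideline concrete, here is a rough back-of-the-envelope sketch. The function name, the 0.5 on-disk ratio, and the example ingest figures are all assumptions for illustration, not Splunk's official sizing formula; it also ignores hot-bucket headroom and replication.

```python
def cache_size_gb_per_indexer(daily_ingest_gb, days_in_cache=30,
                              disk_ratio=0.5, indexers=1):
    """Rough per-indexer cache estimate in GB.

    disk_ratio is an ASSUMED ratio of on-disk bucket size to raw ingest;
    measure it in your own environment before relying on the result.
    """
    if indexers < 1:
        raise ValueError("indexers must be at least 1")
    total_gb = daily_ingest_gb * days_in_cache * disk_ratio
    return total_gb / indexers


# Hypothetical deployment: 200 GB/day of ingest spread over 4 indexers.
estimate = cache_size_gb_per_indexer(daily_ingest_gb=200, indexers=4)
print(f"Approximate cache per indexer: {estimate:.0f} GB")
```

Treat the result as a starting point for a conversation about SSD budgets, then adjust it once you know how far back your users actually search.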

Splunk SmartStore Created for Splunk Cloud

Splunk Cloud Platform runs primarily on Amazon Web Services (AWS). Switching from traditional file-based storage to object storage saves considerable expense in AWS. However, to be performant, Splunk requires adequate IOPS (input/output operations per second). S3 alone wasn’t enough to meet the needs of Splunk Cloud customers, so Splunk developed a hybrid architecture that provides the required IOPS while keeping storage costs lower. This hybrid storage is Splunk SmartStore.

Splunk chose to make SmartStore available to Splunk Enterprise users as well, since they may also benefit from it. Most of this blog will focus on the Splunk Enterprise implementation, since all Splunk Cloud customers already use SmartStore as part of the SaaS offering.

How SmartStore Works in Splunk Enterprise

In SmartStore implementations, there are only two types of buckets: Hot and Warm. This contrasts with traditional Splunk Enterprise indexing, which adds the concept of Cold Buckets to reduce storage costs. Because the S3 object store already provides inexpensive storage, there is no need for Cold Buckets.

Hot Buckets: Newly arrived data lands in Hot Buckets. Events with significantly different timestamps are split into separate Hot Buckets. For example, if a cargo ship only sends events when it reaches port, those logs will be older than the events from always-connected network devices. Hot Buckets reside on the indexers in both SmartStore and traditional indexing. Hot Buckets roll to Warm Buckets, which stay in the same directory as the Hot Buckets. In Indexer Clusters, Hot Buckets are replicated.
Warm Buckets: The process differences between SmartStore and traditional indexing first appear in Warm Buckets. In SmartStore, when a Hot Bucket rolls to Warm, it is sent to the S3 object store. A copy remains in the indexer's cache, while the copy in the object store is kept safe by the S3 bucket’s replication settings. In an Indexer Cluster, only the primary bucket copies are fetched back from the object store.
Freezing: Buckets move to frozen directly from the S3 object store.
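The bucket flow above can be sketched as a toy model: a warm bucket is uploaded to the object store, a copy stays in a bounded local cache, and a read that misses the cache triggers a fetch from the object store. The `BucketCache` class, its method names, and the least-recently-used eviction are assumptions chosen for simplicity; this is not Splunk's cache manager.

```python
from collections import OrderedDict


class BucketCache:
    """Toy model of an indexer cache in front of a remote object store."""

    def __init__(self, capacity, remote_store):
        self.capacity = capacity          # max number of cached buckets
        self.remote_store = remote_store  # dict standing in for the object store
        self.local = OrderedDict()        # cached buckets, oldest use first
        self.remote_fetches = 0

    def roll_to_warm(self, bucket_id, data):
        """Upload the bucket to the object store and keep a cached copy."""
        self.remote_store[bucket_id] = data
        self._put(bucket_id, data)

    def read(self, bucket_id):
        """Return bucket data, fetching from the object store on a cache miss."""
        if bucket_id in self.local:
            self.local.move_to_end(bucket_id)  # mark as recently used
            return self.local[bucket_id]
        data = self.remote_store[bucket_id]
        self.remote_fetches += 1
        self._put(bucket_id, data)
        return data

    def _put(self, bucket_id, data):
        self.local[bucket_id] = data
        self.local.move_to_end(bucket_id)
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used


store = {}
cache = BucketCache(capacity=2, remote_store=store)
for i in range(3):
    cache.roll_to_warm(f"bucket_{i}", f"events_{i}")

cache.read("bucket_2")  # still cached, no remote fetch
cache.read("bucket_0")  # evicted earlier, fetched from the object store
print("remote fetches:", cache.remote_fetches)
```

Every remote fetch in this model corresponds to data crossing the network, which is why cache sizing and bandwidth dominate SmartStore design.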

Selective SmartStore Use

You may configure SmartStore at the individual index level. All buckets from an individual index must use either traditional index storage or SmartStore. Using this method, high-value data can be replicated for disaster recovery while mundane sources are not, which reduces storage costs.
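As an illustration of per-index selection, the sketch below renders an indexes.conf fragment in which only some indexes point at a remote volume. The helper function, the volume name, the S3 URI, and the index names are hypothetical; the setting names (`storageType`, `path`, `remotePath`) follow the indexes.conf conventions as I understand them, so verify them against the spec file for your Splunk version before using anything like this.

```python
def render_indexes_conf(volume_name, remote_uri,
                        smartstore_indexes, traditional_indexes):
    """Return indexes.conf text with SmartStore enabled per index."""
    lines = [
        f"[volume:{volume_name}]",
        "storageType = remote",
        f"path = {remote_uri}",
        "",
    ]
    for name in smartstore_indexes + traditional_indexes:
        lines += [
            f"[{name}]",
            f"homePath = $SPLUNK_DB/{name}/db",
            f"coldPath = $SPLUNK_DB/{name}/colddb",
            f"thawedPath = $SPLUNK_DB/{name}/thaweddb",
        ]
        if name in smartstore_indexes:
            # Only SmartStore indexes reference the remote volume.
            lines.append(f"remotePath = volume:{volume_name}/$_index_name")
        lines.append("")
    return "\n".join(lines)


# Hypothetical names: one SmartStore index and one traditional index.
conf_text = render_indexes_conf(
    volume_name="remote_store",
    remote_uri="s3://example-bucket/splunk",
    smartstore_indexes=["firewall"],
    traditional_indexes=["audit_critical"],
)
print(conf_text)
```

The point of the sketch is only that the choice is made stanza by stanza: an index either has a remote path or it does not.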

Types of S3 object stores

The most popular SmartStore object store is AWS S3. It is inexpensive and provides adequate performance. Other cloud vendors offer equivalent services: Google Cloud Storage (GCS) is Google's S3 equivalent, while Microsoft offers Azure Blob Storage.

The leading storage vendors detail how to use their storage devices locally or via their storage clouds to serve SmartStore. Pure Storage, Dell EMC, Oracle, IBM, and Hitachi have articles up. Apologies if I didn’t mention your favorite storage vendor.

The big issue is Bandwidth.

Data moves between the cache and the object store. When designing SmartStore, you must consider the costs of moving the data and the time it takes.
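A quick estimate of transfer time helps frame the bandwidth question. The sketch below is simple arithmetic; the function name, the 0.8 link-efficiency factor, and the example numbers are assumptions, and real throughput to an object store depends on many other factors.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Estimate hours needed to move data_gb over a link_mbps connection.

    efficiency is an ASSUMED fraction of the nominal link speed that is
    actually usable; decimal units are used (1 GB = 8 * 10**9 bits).
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1000**2 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical search that must pull 500 GB of buckets back into the cache.
for mbps in (100, 1000, 10000):
    print(f"{mbps:>6} Mbps: {transfer_hours(500, mbps):.2f} hours")
```

Running the same numbers against your cloud provider's data-transfer pricing gives the cost side of the same question.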

When SmartStore makes perfect sense: When you are hosting your Splunk Enterprise in a cloud that offers a compatible object store. If you have your Splunk in AWS, it makes sense to use AWS S3.

Alternately, when you already own an available storage device that is SmartStore compatible, it makes sense not to spend on other storage.

When SmartStore may make sense, depending on the circumstances: When you have ample bandwidth and a remote storage option paired with solid local SSD.

When SmartStore makes little sense: When you have a small internet pipe.

When SmartStore also isn’t the right choice: When your users perform many searches over long time ranges that touch more buckets than the cache can hold, forcing repeated fetches from the object store.

Conclusion

SmartStore is a promising alternative indexing storage method. SmartStore uses object storage with SSD to provide enough storage speed for search activities and acceptable costs. For the right customers, SmartStore may save them significant storage costs. The biggest pain point is to ensure the data the users search for most exists in the cache and does not need retrieving from the object store.