Defining Data Sprawl in Splunk: Why it Matters, and What it’s Costing You

“Data Sprawl” isn’t really a technical term you’ll find in the Splexicon (Splunk’s glossary). Here at Kinney Group, however, we’ve been around Splunk long enough to identify and define this concept as a real problem in many Splunk environments.

What exactly is Data Sprawl? It’s not one single thing you can point to, but rather a combination of symptoms that generally contribute to poorly performing, difficult-to-manage Splunk implementations. Let’s take a look at each of the three symptoms we use to define Data Sprawl and break down their impacts on your organization:

  1. Ingesting unused or unneeded data in Splunk
  2. No understanding of why certain data is being collected by Splunk
  3. No visibility into how data is being utilized by Splunk

Ingesting unused or unneeded data in Splunk

When you ingest data you don’t need into Splunk, the obvious impact is on your license usage (if your Splunk license is ingest-based). This may not be terribly concerning if you aren’t pushing your ingest limits, but there are other impacts lurking behind the scenes.

For starters, your Splunk admins could be wasting time managing this data. They may or may not know why the data is being brought into Splunk, but it’s their responsibility to ensure this happens reliably. This is valuable time your Splunk admins could be using to achieve high-value outcomes for your organization rather than fighting fires with data you may not be using.

Additionally, you may be paying for data ingest you don’t need. If you’re still on Splunk’s ingest-based pricing model and you’re ingesting data you don’t use, there’s a good chance you could lower your Splunk license costs by reducing your ingest cap. In many cases, we find that customers carry license sizes larger than they actually need in order to plan for future growth.
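As a rough starting point for sizing up your ingest, a search like the following against Splunk’s internal license usage log can show which sourcetypes consume the most license. This is a minimal sketch, not an Atlas feature; `b` and `st` are Splunk’s own field names in `license_usage.log` for bytes and sourcetype, and the search should be run where the license manager’s internal logs are available:

```spl
index=_internal source=*license_usage.log* type="Usage"
| stats sum(b) AS bytes BY st
| eval GB = round(bytes / 1024 / 1024 / 1024, 2)
| sort - GB
```

Sourcetypes near the top of this list that no one can explain are prime candidates for the “unused or unneeded data” conversation.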

We commonly run into scenarios where data was brought in for a specific purpose at some point in the past but is no longer needed. The problem is that no one knows why it’s there, and they’re unsure of the consequences of no longer bringing this data into Splunk. Knowing and understanding these facts gives you control of your Splunk environment and empowers educated decisions.

No understanding of why certain data is being collected by Splunk

Another common symptom of Data Sprawl is a lack of understanding around why certain data is being collected by Splunk in your environment. Having the ability to store and manage custom metadata about your index and sourcetype pairs — in a sane and logical way — is not a feature that Splunk gives you natively. Without this knowledge, your Splunk administrators may struggle to prioritize how they triage data issues when they arise. Additionally, they may not understand the impact to the organization if that data stops coming into Splunk.

The key is to empower your Splunk admins and users with the information they need to appropriately make decisions about their Splunk environment. This is much more difficult when we don’t understand why the data is there, who is using it, how frequently it is being used, and how it is being used. (We’ll cover that in more detail later.)

This becomes an even bigger issue with Splunk environments that have scaled fast. As time passes, it becomes easier to lose the context, purpose, and value the data is bringing to your Splunk mission.

Let’s consider a common example we encounter at Kinney Group.

Many organizations must adhere to compliance requirements related to data retention. These requirements may dictate collecting specific logs and retaining them for a set period of time. As a result, many organizations have audit data coming into Splunk regularly, but that data rarely gets used in searches or dashboards. It’s simply there to meet a compliance requirement.

Understanding the “why” is key for Splunk admins: this data is critical for compliance, even though its day-to-day importance to end users is likely minimal.

(If this sounds like your situation, it might be time to consider putting that compliance data to work for you. See how we’re helping customers do this with their compliance data today with Atlas.)

The Atlas Data Management application allows you to add “Data Definitions,” providing clear understanding of what data is doing in your environment.

No visibility into how data is being utilized by Splunk

You’ve spent a lot of time and energy getting your data into Splunk, but you may not know much about how it’s actually being used. This is another common symptom of Data Sprawl. Decisions about where to spend Splunk administration time are often based on who screams the loudest when a report doesn’t work. But do your Splunk admins really have the information they need to put their focus in the right place? When they know how often a sourcetype appears in a dashboard or a scheduled search, they have a much clearer picture of how data is being consumed.
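One manual way to approximate this check is to look for a sourcetype’s name inside saved-search definitions using Splunk’s `rest` search command. This is a rough sketch rather than a complete utilization audit (it won’t catch dashboards or ad hoc searches, and `access_combined` is just a placeholder sourcetype):

```spl
| rest /servicesNS/-/-/saved/searches
| search search="*sourcetype=access_combined*"
| table title, eai:acl.app, search
```

An empty result here is a hint, not proof, that a sourcetype is going unused — which is exactly why purpose-built utilization tracking matters.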

Actively monitoring how data is utilized within Splunk is extremely important: it shows you how to effectively support your existing users, and it brings light to what Splunk calls “dark data” in your environment. Dark data is all of the unused, unknown, and untapped data generated by an organization, data that could be a tremendous asset if the organization knew it existed.

Kinney Group’s Atlas platform includes Data Utilization — an application designed to show you exactly what data you’re bringing in, how much of your license that data is using, and if it’s being utilized by your users and admins.

Conclusion

Most organizations don’t realize that Data Sprawl is impacting their Splunk environment, because it usually doesn’t become apparent until something bad has happened. While not all symptoms of Data Sprawl are urgent, they can be indicators that a Splunk environment is growing out of control. If these symptoms go unchecked over time, they can lead to bigger, more costly problems down the line.

Knowledge is power when it comes to managing your Splunk environment effectively. Kinney Group has years of experience helping customers keep Data Sprawl in check. In fact, we developed the Atlas platform for just this purpose. Atlas applications are purpose-built to keep Data Sprawl at bay (and a host of other admin headaches) by empowering Splunk admins with the tools they need.

Click here to learn more about the Atlas platform, to get a video preview, schedule a demo, or for a free 30-day trial of the platform.

Meet Atlas Data Management

Splunk is the “Data-to-Everything” platform, capturing massive volumes of data every day. Users know, though, that without visibility it can be difficult to extract maximum value from Splunk. Too often, insufficient monitoring leads to serious issues in a Splunk instance: platform underutilization, license overages, and even missing data. Each of these problems translates into a real financial cost, not to mention the hours of human intervention spent troubleshooting a Splunk environment.

Atlas makes data management easy.

Figure 1 — the Data Management icon on the Atlas Core homepage

Atlas, Kinney Group’s revolutionary new platform for Splunk, includes the Data Management application, a tool that de-mystifies your license costs and improves Splunk stability with automated reporting. Gone are the days of license overages and data streams quietly dying in the dark—Atlas ensures unparalleled visibility to guarantee efficient use of data resources.

Attack Your Sprawl!

Data Management contains Atlas’s pivotal tool for reining in your license and keeping it in check. The Data Inventory dashboard organizes your existing data by sourcetype and index, with visibility into license utilization per sourcetype. With the Atlas feature Data Definitions, admins can record detailed notes on each sourcetype and keep track of the who, what, when, where, and why of each data source. Data Definitions also automatically pulls in important index and sourcetype information, coalescing many of the data points helpful for triage into one place.

With these two enhancements together, admins can use Data Management to finally own their data ingest into Splunk, with the ability to easily track, update, and stay informed on how they are spending their license. This gives admins the tools they need to course-correct or triage issues like never before!

Automate Alerts and Awareness!

Data Management has another trick up its sleeve. On the Data Inventory dashboard, there’s a feature that can greatly increase visibility into data pipelines and ensure critical dashboards and alerts stay accurate. Meet Data Watch!

Each sourcetype on Data Inventory has a bell icon that can be selected to turn on Data Watch, an automated alert interface that turns your needs for stability and visibility into easy-to-track, easy-to-edit alerting.

Using Data Watch, admins can quickly create alerts for data streams to ensure they are the first to know if a data pipeline falters. Using pattern recognition, Data Watch can alert if a data stream dips below a volume threshold or starts arriving from fewer sources, creating a technical tripwire that fires on bad behavior in your pipelines.
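For comparison, an admin building this kind of tripwire by hand might schedule a search along these lines (the index, sourcetype, look-back window, and thresholds are all placeholder assumptions, and a hand-rolled version like this is what Data Watch’s automation saves you from maintaining):

```spl
| tstats count, dc(source) AS sources
    WHERE index=web sourcetype=access_combined earliest=-1h
| where count < 1000 OR sources < 5
```

Scheduled hourly and configured to alert when it returns results, this fires when event volume drops below the expected floor or events stop arriving from the usual number of sources.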

With Data Watch, admins can quickly expand visibility across their Splunk deployment, with minimal time needed for setup and ongoing edits thanks to Atlas’s automation!

Conclusion

The “Data-to-Everything” platform promises incredible results, but you need a high degree of visibility within a Splunk environment to make that happen. Atlas’s Data Management application provides the transparency you need to ensure your license and ingest are properly tracked to prevent sprawl, and properly managed to keep errors from seeping into your dashboards. Teams can now collaborate seamlessly, knowing that their data streams won’t be hidden or lost, bringing your organization one step closer to getting every insight you can out of your data.

There’s more to come from Atlas! Fill out the form below to stay in touch with Kinney Group.