
The Recipe for Data Onboarding into Splunk


Written by: Kinney Group | Last Updated: July 8, 2022 | Originally Published: May 11, 2020

Your data is in. So, what’s up with your dashboards?

Splunk can be easy to set up and easy to get started with. But as your Splunk deployment grows, use cases expand, and data starts pouring in, some important things tend to fall through the cracks. Data in Splunk may be hard to find because it’s missing important identifying components. Or Splunk apps from Splunkbase may not be populating their designated dashboards.

You know the data is there. You know how to find it. So, why can’t your apps find it?

Begin with Data Onboarding

Getting data into Splunk is flexible and easy. Splunk has a number of ways to get data in: forwarders, monitored files and directories, CSV uploads, and even the HTTP Event Collector (HEC). When bringing data into Splunk, how you describe your data makes it easier to search, easier for apps to find, and easier to populate premade dashboards. Step one in making your data useful to Splunk is onboarding it properly.
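As a minimal sketch of what this looks like in practice, an inputs.conf on a forwarder or Splunk instance can declare both a monitored file and an HEC token. The paths, index, and sourcetype names here are hypothetical placeholders, not values from this article:

```ini
# inputs.conf -- illustrative example stanzas (names and values are hypothetical)

# Monitor a log file; assign it an index and sourcetype up front
[monitor:///var/log/myapp/app.log]
index = myapp_logs
sourcetype = myapp:app:log

# Enable an HTTP Event Collector token for data sent over HTTP
[http://myapp_hec]
disabled = 0
index = myapp_logs
sourcetype = myapp:json
```

Declaring index and sourcetype at input time, as shown, is what makes the six ingredients below fall into place automatically.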

Figure 1 – Data onboarding into Splunk

Six Ingredients of Onboarding 

When data comes into Splunk, it looks for certain aspects and features of the data. Of these features, six key elements make searching, correlating, and putting your data to work much easier. If you nail these six ingredients, you’ve got the recipe for data onboarding success.

Ingredient #1: Host

  • A default field that contains the hostname or IP address of the network device that generated the event
  • Use the host field in searches to narrow the search results to events that originate from a specific device
  • Allows you to locate the originating device

To find the hosts in your Splunk instance try this simple search:

| metadata type=hosts index=*
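If the automatically assigned host value is wrong (for example, when logs are relayed through a central syslog server), inputs.conf can override it. A sketch with hypothetical paths and hostnames:

```ini
# inputs.conf -- illustrative host overrides (paths/hostnames are hypothetical)

# Set host explicitly for everything under this monitor stanza
[monitor:///var/log/remote/web01/secure.log]
host = web01

# Or derive host from a path segment:
# /var(1)/log(2)/remote(3)/<host>(4)
[monitor:///var/log/remote/*/secure.log]
host_segment = 4
```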

Ingredient #2: Source

  • A default field that identifies the source of an event, that is, where the event originated
  • For data monitored from files and directories, the source consists of the full pathname of the file or directory
      • /var/log/messages, /var/log/messages.1, /var/log/secure
  • For network-based sources, the source field consists of the protocol and port
      • UDP:514, TCP:1514

To take a peek at sources already in your instance use this simple search:

| metadata type=sources index=*




Ingredient #3: Sourcetype

  • A default field that identifies the data structure of an event
  • Reflects the format of the data input from which it originates
      • access_combined, cisco:asa
  • Determines how Splunk extracts and calculates fields at search time
  • Use the sourcetype field in searches to find all data of a certain type (as opposed to all data from a certain source)

To have a quick look at the sourcetypes in your instance, use this command:

| metadata type=sourcetypes index=*
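Because so much search-time behavior keys off the sourcetype, it pays to assign one explicitly at input time and define its parsing in props.conf. A sketch — the sourcetype name and the extraction regex are illustrative assumptions, not from this article:

```ini
# inputs.conf -- assign a sourcetype at input time (values are hypothetical)
[monitor:///var/log/myapp/app.log]
sourcetype = myapp:app:log

# props.conf -- search-time field extraction keyed by that same sourcetype
[myapp:app:log]
# Pull a 3-digit HTTP-style status code into a "status" field
EXTRACT-status = status=(?<status>\d{3})
```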

Ingredient #4: Index

  • The repository for data in Splunk Enterprise
  • Index data resides in flat files on disk
  • Similar to a folder
  • Used for data access, retention, and organization

This command will give you a snapshot of your indexes and number of records in the indexes:

| eventcount summarize=false index=*
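Custom indexes are defined in indexes.conf. A minimal sketch, assuming a hypothetical index name and a 90-day retention policy:

```ini
# indexes.conf -- hypothetical custom index definition
[myapp_logs]
homePath   = $SPLUNK_DB/myapp_logs/db
coldPath   = $SPLUNK_DB/myapp_logs/colddb
thawedPath = $SPLUNK_DB/myapp_logs/thaweddb
# Events older than this many seconds roll to frozen (deleted by default)
frozenTimePeriodInSecs = 7776000
```

Separate indexes are how you implement different retention periods and access controls per data source, which is why index choice is part of onboarding, not an afterthought.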

Ingredient #5: Timestamp

  • Splunk uses timestamps to:
    • correlate events by time
    • create the timeline histogram in Splunk Web
    • set time ranges for searches
  • Usually automatic
    • To spot mis-parsed timestamps, look for events dated in the future: index=* earliest=+5m latest=+10y
Figure 2 – Graph of a timestamp in Splunk
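When automatic timestamp recognition guesses wrong, props.conf lets you spell it out. A sketch — the sourcetype name and timestamp layout are illustrative assumptions; match the format string to your actual events:

```ini
# props.conf -- explicit timestamp parsing (sourcetype and format are hypothetical)
[myapp:app:log]
# Assumes events begin like: [2020-05-11 14:03:22.123] ...
TIME_PREFIX = ^\[
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N
# Only scan this many characters past TIME_PREFIX for the timestamp
MAX_TIMESTAMP_LOOKAHEAD = 25
# Declare the timezone if the events don't carry one
TZ = UTC
```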

Ingredient #6: Linebreak

  • Allows Splunk to break the incoming stream of bytes into separate events
  • Supports single-line and multi-line events
  • Splunk can usually do this automatically

Figure 3 – 6 Key Elements of Data Onboarding
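For multi-line data where automatic breaking struggles (stack traces are a classic case), props.conf can define the event boundary explicitly. A sketch — the sourcetype and regex are illustrative, assuming each event starts with an ISO-style date:

```ini
# props.conf -- explicit event breaking (sourcetype and regex are hypothetical)
[myapp:app:log]
# Break on newlines only where the next line starts a new event
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2})
# Guard against runaway single events
TRUNCATE = 10000
```

Setting SHOULD_LINEMERGE to false and carrying the boundary in LINE_BREAKER, as shown, is generally cheaper at index time than letting Splunk merge lines back together.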

The “Add-On” Ingredients

Apps, add-ons, and TAs (technology add-ons) can help with your data onboarding by doing a lot of the work for you. These can be helpful paired with your six main ingredients!

  • Include relevant config files (props/transforms) and ancillary scripts and binaries
  • Receive ongoing enhancements and updates
  • Help make data CIM (Common Information Model) compliant

Figure 4 – Apps and Add-Ons as a solution

Final Tips

We’ve reviewed the steps for onboarding your data the right way. Don’t forget these tips:

  • Always set a sourcetype!
  • Don’t use syslog as a sourcetype!
  • Don’t use csv, json, or xml as a sourcetype!
  • Use Splunkbase as a starting point!
  • Make your own sourcetype if you must, but follow the naming format
      • <Technology>:<Technology Tier>:<Sourcetype>
  • Make the data usable – CIM!

Need help onboarding your data into Splunk? We have a fleet of Splunk consultants with the expertise to help. If you’re interested in speaking with our Kinney team, check out the form below.
