Keep Things Flowing in Your Splunk Data Pipeline

In a typical Splunk environment, hundreds (or orders of magnitude more) of machines generate metrics and feed syslog data into Splunk. Even in a small environment, a single syslog server may receive data from hundreds or thousands of machines, your work laptop among them. On that syslog server, a Splunk Universal Forwarder collects all of it and sends it to the indexers, where the data is stored and made searchable by users.

Most of the time, the data comes from sources that belong to distinctly separate companies or business units, such as HR and SecOps/ITOps. We do not want HR seeing SecOps/ITOps data, and vice versa. This data is also likely to arrive in different formats, and we have to tell Splunk what to expect in order to index it efficiently and accurately.

We can have Splunk examine the data before it reaches the indexes so that HR data lands in one index and SecOps/ITOps data in another (keeping data in different indexes is the only way to control access to and retention of the data). Additionally, if the data arrives in different formats, we can teach Splunk to recognize the differences and to extract fields and timestamps accordingly.

The Problem

You have syslog data coming into Splunk from a mix of technologies: Cisco and Juniper devices, Blue Coat appliances, and so on. Instead of all of this data landing in a single index as a single sourcetype, you need Splunk to split it in a way that makes sense (and abides by security and compliance policies). In this example, you are aggregating data from multiple syslog servers onto a single syslog “aggregator.”

The syslog aggregator writes all of the syslog data into common files rather than separating it by technology. When you configure the Universal Forwarder to monitor those files, you have to set a sourcetype and an index on the input, and you can only have one value for each attribute. So you set sourcetype=syslog and index=common, which is not acceptable for the reasons described above.
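For reference, here is a minimal sketch of what that UF input might look like. The monitor path is hypothetical; adjust it to match your own syslog file layout.

<pre># inputs.conf on the Universal Forwarder
# One monitor stanza means one sourcetype and one index for everything it reads
[monitor:///var/log/syslog-aggregator/*.log]
sourcetype = syslog
index = common
disabled = false</pre>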

The Fix

We need Splunk to perform two operations on these events:

  1. Change the sourcetype as appropriate for the data
  2. Send data whose sourcetypes were renamed to a specific index

We will have two types of events: DHCP and DNS.

Here are the sample events…

<pre>Feb  5 19:51:45 10.11.32.139 dhcpd[25291]: DHCPACK on 10.5.128.190 to 00:e0:c5:2f:55:d1 (blah-T-blah) via eth1 relay 10.5.128.131 lease-duration 691200 (RENEW)

Feb  5 19:51:45 10.11.32.139 dhcpd[25291]: DHCPREQUEST for 10.5.128.190 from 00:e0:c5:2f:55:d1 (blah-T-blah) via 10.5.128.131 TransID 400968e0 (RENEW)

Feb  5 19:51:45 10.11.32.139 dhcpd[25291]: DHCPACK on 10.5.128.190 to 00:e0:c5:2f:55:d1 (blah-T-blah) via eth1 relay 10.5.128.130 lease-duration 691200 (RENEW)

Feb  5 19:51:23 10.121.8.77 named[13705]: client 10.111.9.62#9832: received notify for zone '10.in-addr.arpa'

Feb  5 19:51:23 10.121.9.62 named[9607]: client 10.111.9.62#9832: received notify for zone '10.in-addr.arpa'

Feb  5 19:51:23 10.111.8.77 named[16928]: client 10.111.9.62#9832: received notify for zone '10.in-addr.arpa'

Feb  5 19:51:23 10.101.9.62 named[12039]: client 10.40.96.40#56180: updating zone 'blah.fdbl-blah.com/IN': adding an RR at '_ldap._tcp.blah.fdbl-int.com' SRV 0 100 389 usaokay1.fdbl-int.com.

Feb  5 19:51:35 10.41.1.139 dhcpd[7958]: DHCPOFFER on 10.235.140.193 to b0:90:7e:69:32:01 (blah-uc-blah) via eth1 relay 10.235.140.3 lease-duration 119 offered-duration 43200 uid 00:63:69:73:63:6f:2d:62:30:39:30:2e:37:65:36:39:2e:33:32:30

Feb  5 19:51:35 10.41.1.139 dhcpd[7958]: DHCPOFFER on 10.235.140.146 to b0:90:7e:69:32:01 (blah-uc-blah) via eth1 relay 10.235.140.2 lease-duration 119 offered-duration 43200 uid 00:63:69:73:63:6f:2d:62:30:39:30:2e:37:65:36:39:2e:33:32:30

Feb  5 19:51:35 10.41.1.139 dhcpd[7958]: DHCPOFFER on 10.235.141.19 to b0:90:7e:69:32:01 (blah-uc-blah) via eth1 relay 10.235.140.3 lease-duration 119 offered-duration 43200 uid 00:63:69:73:63:6f:2d:62:30:39:30:2e:37:65:36:39:2e:33:32:30</pre>

For the DNS events, we want to apply sourcetype=infoblox:dns; for the DHCP events, we want to apply sourcetype=infoblox:dhcp. We then want Splunk to send all of the re-sourcetyped Infoblox data to index=infoblox.

Here are the props and transforms to make this happen…

props.conf

<pre>[syslog]
# Each TRANSFORMS class calls a stanza in transforms.conf
TRANSFORMS-sourcetype_rename = sourcetype_rename1
TRANSFORMS-sourcetype_rename1 = sourcetype_rename2
TRANSFORMS-index_reroute = index_reroute1</pre>

transforms.conf

<pre># DHCP events: rewrite the sourcetype
[sourcetype_rename1]
REGEX = dhcpd
FORMAT = sourcetype::infoblox:dhcp
DEST_KEY = MetaData:Sourcetype

# DNS events: rewrite the sourcetype
[sourcetype_rename2]
REGEX = named
FORMAT = sourcetype::infoblox:dns
DEST_KEY = MetaData:Sourcetype

# Both event types: re-route to the infoblox index
[index_reroute1]
REGEX = dhcpd|named
FORMAT = infoblox
DEST_KEY = _MetaData:Index</pre>
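One deployment note: index-time transforms like these run in the parsing pipeline, so props.conf and transforms.conf belong on the first Splunk instance that parses the data (an indexer or a heavy forwarder), not on the Universal Forwarder. After a restart, btool can confirm that Splunk has picked up your stanzas:

<pre>$SPLUNK_HOME/bin/splunk btool props list syslog --debug
$SPLUNK_HOME/bin/splunk btool transforms list index_reroute1 --debug</pre>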

For testing, set your sourcetype to syslog, configure the data to be sent to an empty index (so you can confirm the success of the test), and create an index called infoblox to receive the re-routed events. When done correctly, all ten of the sample events will land in index=infoblox: six events with sourcetype=infoblox:dhcp and four events with sourcetype=infoblox:dns.
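Once the test data has been ingested, a quick search confirms the split (this assumes the infoblox index described above exists and has received the events):

<pre>index=infoblox | stats count by sourcetype</pre>

If everything worked, infoblox:dhcp shows a count of six and infoblox:dns a count of four.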

In props.conf, if you do not call the transforms in the correct order, some of the desired operations will not be performed. Once an event has been sourcetyped and handed to the indexQueue, its sourcetype can no longer be changed, so the renames must happen while the event is still in the parsing pipeline. There you have it: a lesson on the Splunk data pipeline.
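If you want to watch those queues for yourself, Splunk’s internal metrics expose them. Here is a sketch of a search against _internal using the standard metrics.log queue fields (no custom setup assumed):

<pre>index=_internal source=*metrics.log group=queue (name=parsingqueue OR name=indexqueue)
| timechart avg(current_size) by name</pre>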

Figure 1: Splunk Data Pipeline

Ask the Experts

Looking for Splunk help? Our Expertise on Demand subscription services will help you along your Splunk Data Pipeline journey, and with just about any other issue you need resolved in Splunk. Interested in learning more about Expertise on Demand or our Kinney Group professional services? Check out the form below.
