As a Splunker, you’re constantly faced with questions about what can help or hurt you in Splunk. And if you attended some of this year’s .conf20 sessions, you’re probably asking yourself this question:
“Should I use INGEST_EVAL?”
The answer to this is a solid maybe.
At Splunk’s .conf20 this year, Richard Morgan and Vladimir Skoryk presented a fantastic session on different capabilities for INGEST_EVAL. When you get a chance, take a look at their presentation recording!
In this review, we’ll go through Richard and Vladimir’s presentation and discuss inspiration derived from it. These guys know what they’re talking about; now I’m giving my two cents.
This is part one of two: in the second part, we’ll look at code samples to test some of these use cases.
Splunk added the ability to perform index-time eval-style extractions in the 7.2 release. It was in the release notes, but otherwise wasn’t much discussed. It generated more buzz in the 8.1 release, as these index-time eval-style extractions (say that three times quickly) now support the long-awaited index-time lookups.
The purpose of INGEST_EVAL is to allow EVAL logic on indexed fields. Traditionally in Splunk, we’d hold-off on transformations until search time — old-timers may remember Splunk branding using the term “Schema on the Fly.” Waiting for search-time is in our DNA. Yet, perhaps the ingest-time adjustments are worth investing in.
Let’s look through the key takeaways on what ingest-time eval provides. Then you can decide whether it’s worth the hassle of doing the prep-work to take advantage of it.
1. Selective Routing
Before you try to yank my Splunk certs away: yes, we already have a version of this capability. This is slightly different from the typical method of sending data to separate indexers, say Splunk internal logs going to a management indexer instead of the common-use one, or security logs to a parent organization’s Splunk instance.
The INGEST_EVAL version allows for selective routing based on whatever you can come up with in an eval statement. The example from the presentation uses the match function against a regex to send data from select hosts to different indexers. Ideally, this would happen on a heavy forwarder, or any other Splunk Enterprise box, before the data reaches the indexers. Perhaps those security logs stay on-prem, and the rest of the logs go to Splunk Cloud.
What else could we come up with for this? If data contains a particular string, we can route it to different indexes or indexers. We already have that with transforms, but transforms rely on regex, whereas this can use eval functions. Move large-profit transactions off to a separate set of indexers? If any of a list of special codewords appears, send the event to a different indexer?
Let your imagination run on this, and you’ll find lots of possibilities.
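To make this concrete, here’s a minimal sketch of eval-based routing. The sourcetype, stanza name, host pattern, and output group names are all hypothetical; the two groups would need matching [tcpout:...] stanzas in outputs.conf.

```ini
# props.conf -- attach the routing transform to a (hypothetical) sourcetype
[my_sourcetype]
TRANSFORMS-route = route_security_hosts

# transforms.conf -- rewrite the _TCP_ROUTING metadata field so events
# from hosts matching "sec-*" go to a different output group
[route_security_hosts]
INGEST_EVAL = _TCP_ROUTING=if(match(host, "^sec-"), "onprem_security", "cloud_indexers")
```

Because the condition is a full eval expression, you could just as easily key off a field value, a string in _raw, or a combination of several tests.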
2. Ingest log files with multiple timestamp formats
In the past, we had to dive into the depths of datetime_config.xml and roll a custom solution. INGEST_EVAL, along with if/case statements, can handle multiple timestamp formats in the same log. Brilliant. If you have ever had to deal with logs that have multiple timestamp formats (and the owners of those logs who won’t fix their rotten logs), then you’ll be thrilled to see an easy solution.
INGEST_EVAL can look at the data and search for different formats until it finds a match.
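As a sketch, assuming a log that mixes ISO-style and US-style timestamps at the start of each event (the stanza name and the two formats are hypothetical), a case() can try each format in turn:

```ini
# transforms.conf -- try each known timestamp format until one matches;
# fall back to the current time if none do
[multi_ts_formats]
INGEST_EVAL = _time=case( \
    match(_raw, "^\d{4}-\d{2}-\d{2}T"), strptime(substr(_raw, 1, 19), "%Y-%m-%dT%H:%M:%S"), \
    match(_raw, "^\d{2}/\d{2}/\d{4}"), strptime(substr(_raw, 1, 19), "%m/%d/%Y %H:%M:%S"), \
    true(), time())
```

You’d typically pair this with DATETIME_CONFIG = CURRENT in props.conf so Splunk’s default timestamp extraction doesn’t fight with the eval.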
3. Synthesizing dates from raw data mixed with directory names
Sometimes we find data, often IoT or custom syslog data, where the log file only has a timestamp. In these cases, we normally see the syslog server write the file into a directory with a date name. Example: /data/poutinehuntingcyborg/2020-10-31.log
Using INGEST_EVAL, it’s possible to create an _time that uses part of the source and part of the raw data to create a timestamp that matches what Splunk expects. A lovely solution that wasn’t so easy otherwise.
This simple trick could replace having to use ETL.
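Here’s a rough sketch for the example path above, stitching the date from source onto a time-of-day assumed to start each raw event. The stanza name is made up, and the regex assumes the YYYY-MM-DD.log naming shown:

```ini
# transforms.conf -- build _time from the date embedded in the file path
# (e.g. /data/poutinehuntingcyborg/2020-10-31.log) plus an HH:MM:SS
# timestamp assumed to begin each raw event
[time_from_source_path]
INGEST_EVAL = _time=strptime(replace(source, ".*(\d{4}-\d{2}-\d{2})\.log$", "\1")." ".substr(_raw, 1, 8), "%Y-%m-%d %H:%M:%S")
```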
4. Event Sampling
Using eval’s random function and an if/case statement, it is possible to send along only a percentage of events. Combine it with other eval logic, such as passing only one in ten login errors or one in one-thousand successful purchases.
By combining multiple eval statements, you could create a sample data set that includes data from multiple countries, different products, and different results.
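A minimal sketch, assuming we keep roughly one event in ten by routing the rest to the null queue (the stanza name is hypothetical):

```ini
# transforms.conf -- keep ~1 in 10 events; the rest are dropped by
# rewriting the queue metadata field to nullQueue
[sample_ten_percent]
INGEST_EVAL = queue=if(random() % 10 == 0, "indexQueue", "nullQueue")
```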
5. Event Sampling combined with Selective Routing
Sample the data, and then send the sample to test, dev, or over to your machine learning environment. This is big.
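Chaining the two ideas together, a hedged sketch: sample one in a hundred events and send only that sample to a dev output group (both group names are hypothetical and would be defined in outputs.conf):

```ini
# transforms.conf -- ~1% of events get their _TCP_ROUTING rewritten to
# the dev indexer group; everything else continues to production
[sample_to_dev]
INGEST_EVAL = _TCP_ROUTING=if(random() % 100 == 0, "dev_indexers", "prod_indexers")
```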
6. Dropping unwanted data from structured data
Using INGEST_EVAL, we can drop fields that we otherwise don’t need. With indexed extractions for csv and json, each column or element becomes a field. Sometimes we don’t want those extra fields.
Let’s look at an example: an Excel spreadsheet exported as csv, where a user has been adding notes that are unneeded in Splunk.
In standard Splunk ingest, those notes become fields in Splunk and we have to use SPL to remove them from our searches. How often does a csv dump contain hundreds of fields, but we only care about four? (Answer: often).
Using INGEST_EVAL, we can onboard only the columns or elements that we want, and the rest poof away. Not only does this save disk space, it also makes for cleaner searching.
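As a sketch of one way to do this (the column positions and field names are hypothetical): instead of letting indexed extractions keep every column, split the raw csv line and keep only the four columns we care about as indexed fields.

```ini
# transforms.conf -- extract only the wanted columns from each csv line;
# the note columns are simply never extracted and never indexed
[csv_keep_four]
INGEST_EVAL = user=mvindex(split(_raw, ","), 0), action=mvindex(split(_raw, ","), 1), status=mvindex(split(_raw, ","), 4), amount=mvindex(split(_raw, ","), 7)
```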
My Final Thoughts
Back to our question… “Should I use INGEST_EVAL?” Again, it’s a solid maybe.
If you need to keep licensing down by only ingesting what you need, then sure. If you need to modify data beyond what sed or a regex can perform, then give it a try. INGEST_EVAL isn’t for every Splunk admin, but not every admin hunts down blogs like this.
Stay tuned for more takeaways on INGEST_EVAL in part two.