Best Practices for Splunk SOAR

Written by: Vittorio Paquette | Last Updated: February 26, 2024 | Originally Published: March 31, 2023

What is Splunk SOAR?

Splunk SOAR is a Security Orchestration, Automation, and Response (SOAR) solution. Security automation uses machine-based execution of security actions to detect, investigate, and remediate threats programmatically. SOAR ingests security events from various sources, lets you track, analyze, and triage them, and uses ‘playbooks’ to automate responses, all from one interface.

Playbook Error Correction

When developing Splunk SOAR playbooks, it’s important to build in error correction. Playbooks built without it may work, but when an issue occurs it can lengthen your recovery time or go unnoticed for an extended period.

For example, say you have a playbook that checks a file hash against VirusTotal (VT). It queries the VT database and returns results indicating whether the hash is malicious. Now suppose VT has an issue that takes its REST API down, or the API is updated so that the existing REST calls no longer work. If you haven’t built in some level of error correction, this could go unnoticed. When someone finally notices that no results are being returned, it will take investigation time to determine the cause.

If a particular action fails during a playbook run, it will show in the activity window that the playbook had an error. However, if you don’t have logging turned on for that playbook you won’t see the cause of the error in the activity history. With logging turned on you can investigate the issue in the debug log.

Searching through debug logs takes precious time from an analyst. With error correction built in, the playbook can leave a note in the activity history that identifies exactly which action failed and why. That reduces the time it takes to investigate the issue and ensures the analyst is notified immediately.

The other benefit to having error correction is you can have the playbook stop immediately at the error point. This is especially important if the data output from that action is necessary for other blocks downstream in the playbook to run correctly. This would allow for a clean exit in the playbook with documentation provided such as a note or a comment.

Implementing error correction is quite simple. After every action, add a check that the output you expect is actually there. For example, say you have VT results and two IF statements. The first IF looks for results greater than 20, a positive identification of a malicious file hash. The second IF looks for results of 20 or less. You could use an ELSE for the second branch, but then a missing result would be treated as a clean one and you couldn’t do proper error correction, so we will go with an IF. The last entry handles everything else. It can be an IF that checks whether the output is null, or simply an ELSE, since anything that doesn’t match the first two statements means there is an issue.
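The three branches can be sketched in plain Python. This is a minimal illustration, not code generated by SOAR: the function name `classify_vt_result`, the `positives` field, and the dictionary shape are assumptions made for this example, and the threshold of 20 comes from the text. A result of exactly 20 is treated as not malicious here, which is also an assumption.

```python
from typing import Optional

MALICIOUS_THRESHOLD = 20  # threshold taken from the example in the text


def classify_vt_result(result: Optional[dict]) -> str:
    """Return the decision branch a (hypothetical) VT result falls into."""
    # Error branch: the action returned nothing, or the expected field is missing.
    if not result or result.get("positives") is None:
        return "error"
    positives = result["positives"]
    # First IF: enough detections to call the hash malicious.
    if positives > MALICIOUS_THRESHOLD:
        return "malicious"
    # Second IF: a valid count at or below the threshold.
    return "not_malicious"


for sample in ({"positives": 35}, {"positives": 3}, {}, None):
    print(sample, "->", classify_vt_result(sample))
```

Checking for the missing value first means an empty or failed result can never fall through into the "not malicious" branch.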

The last IF or ELSE branch should use the API ‘comment’ command to output a note that the VT output was null. We can take this a step further and output the JSON with the error code so the analyst knows what the issue is. In the earlier example of a REST API update breaking the endpoint, we might see an error code of 400 or similar.

After the API comment block, we end the playbook, because the blocks downstream require the data output from the VT action block. This avoids unnecessary calls to system resources and simplifies the notification and identification of issues in your playbook runs.
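A sketch of the error branch itself: build the note for the analyst, include the raw JSON, and signal that the run should stop. The result shape (`status`, `status_code`, `message`, `data`) and the function name `check_action_output` are assumptions for illustration. In a real playbook the note text would be posted with the comment API and the playbook would end, rather than returning a flag.

```python
import json
from typing import Optional, Tuple


def check_action_output(action_name: str, result: Optional[dict]) -> Tuple[bool, str]:
    """Return (continue_playbook, note_for_analyst) for one action result."""
    if result and result.get("status") == "success" and result.get("data"):
        return True, f"{action_name}: output received"
    # Include the raw result so the analyst sees the error code
    # without opening the debug log.
    raw = json.dumps(result, indent=2, sort_keys=True)
    note = f"{action_name} failed or returned no data. Raw result:\n{raw}"
    return False, note


# Hypothetical failed result, shaped like the broken-endpoint example.
failed = {"status": "failed", "status_code": 400, "message": "Bad Request", "data": []}
keep_going, note = check_action_output("file reputation", failed)
print(note)
if not keep_going:
    print("Stopping: downstream blocks need this output.")
```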

Playbook Documentation

Designing and developing a playbook takes time and an understanding of the process you are trying to automate. The visual editor makes this easier than writing all the Python code by hand, as it lets you drag and drop code blocks with relative ease. However, let’s not forget that Splunk SOAR runs on Python at its core. Documenting your code is standard practice in any programming language. In well-maintained Python code you’ll find documentation, or at the very least comments, that help in understanding what each function was designed to do.

However, with the Visual Playbook Editor (VPE) it’s easy to forget to take the time to document your playbook and visual code blocks. It’s important to give the playbook a clear description of its purpose, to name each code block, and to describe within each block what it is for and what it does.
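As a sketch of what this can look like inside a custom code block, the hypothetical helper below has a descriptive name and a docstring stating its purpose, input, and output. The function and its behavior are invented for illustration.

```python
def normalize_file_hash(file_hash: str) -> str:
    """Normalize a file hash before it is sent to a reputation service.

    Purpose: avoid failed lookups caused by stray whitespace or mixed case.
    Input:   file_hash, the hash string taken from an artifact.
    Output:  the trimmed, lower-case hash.
    """
    return file_hash.strip().lower()


print(normalize_file_hash("  44D88612FEA8A8F36DE82E1278ABB02F  "))
```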

Another important step is to properly document what changes were made when you save the playbook. This allows you to determine what changes were made in previous iterations. If you do not properly document the changes when saving, then you’ll need to open each iteration of the playbook and look through the code blocks, or code, to determine what’s different. This is especially important when making frequent changes. If you need to roll back to a previous version it will be much easier to determine which iteration to choose.

Splunk SOAR I2A2

I2A2 is an acronym for Inputs, Interactions, Actions, and Artifacts. Splunk SOAR uses the I2A2 methodology to help in developing playbooks. You start by selecting a use case, such as determining whether a file hash is malicious. An example of an I2A2 configuration is listed below:

INPUTS: FileHash, SourceAddress, HostName

INTERACTIONS: VirusTotal, Symantec, ServiceNow, Analyst

ACTIONS: File Reputation, Get Report, Promote to Case, Quarantine Host, Submit ServiceNow Ticket

ARTIFACTS: FileHash

Your I2A2 automation will typically reflect the actions you would take during a manual process. The first step is determining which Inputs we are going to work with. In this example, we selected FileHash, SourceAddress, and HostName. Next come the Interactions and Actions. In this example, the file hash is submitted to VirusTotal for a File Reputation check, and we Get the Report. If the results are bad, we Promote to Case so an analyst can review the report before determining whether the host needs to be quarantined. If the FileHash is malicious, the host is Quarantined using Symantec. The next step is to open a ServiceNow ticket with your service desk so the desktop team can clean the host before it is removed from quarantine.
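The configuration above can be written down as data, which makes the order of steps explicit. This is a hedged sketch: the `I2A2` class, the `planned_actions` helper, and the two boolean flags are names invented for this example, and the branching is one reading of the workflow described in the paragraph.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class I2A2:
    inputs: List[str]
    interactions: List[str]
    actions: List[str]
    artifacts: List[str]


file_hash_use_case = I2A2(
    inputs=["FileHash", "SourceAddress", "HostName"],
    interactions=["VirusTotal", "Symantec", "ServiceNow", "Analyst"],
    actions=["File Reputation", "Get Report", "Promote to Case",
             "Quarantine Host", "Submit ServiceNow Ticket"],
    artifacts=["FileHash"],
)


def planned_actions(plan: I2A2, results_bad: bool, analyst_confirms: bool) -> List[str]:
    """Return the actions that run, in order, for the given outcomes."""
    steps = plan.actions[:2]                # reputation check and report always run
    if results_bad:
        steps.append(plan.actions[2])       # Promote to Case for analyst review
        if analyst_confirms:
            steps.extend(plan.actions[3:])  # quarantine the host, then open the ticket
    return steps


print(planned_actions(file_hash_use_case, results_bad=True, analyst_confirms=True))
```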

Splunk SOAR Utility Playbooks

When developing playbooks, it’s possible to build an end-to-end playbook dedicated to just one use case. However, this method creates large, complex playbooks that are difficult to update. It also means you end up duplicating the same process in several playbooks. That’s an inefficient way to design and manage your playbooks.

A simpler method is to develop utility playbooks that are modular and reusable. Playbooks designed this way are easier to update and manage. For example, say I’m currently running an IP reputation check in 5 separate playbooks. I have now duplicated that work 5 times. If I were to build a single utility playbook for IP reputation, I would only need to go to one location to update the actions used by all 5 playbooks.

Here is an example that helps justify this method. Say I’m currently using VirusTotal for IP reputation, but management has decided they no longer want to pay for VT. We switch to a free license, only to find we are running more IP reputation lookups than that license allows. The decision is then made to start using GreyNoise, another paid IP reputation service, and the management team wants the change made immediately because they have paid for a yearly subscription.

If I have a utility playbook for IP reputation, I only have to modify that single playbook to switch from VirusTotal to GreyNoise. The other benefit is that when building a new playbook, I don’t have to go through the trouble of building out visual blocks for IP reputation. I only need to add one block that calls the IP Reputation playbook.
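The same idea in plain Python, as a sketch only: several callers go through one shared entry point, so the provider changes in a single place. All function names here are hypothetical, and the two lookup functions are stand-ins that do not call any real service.

```python
from typing import Callable, Dict


# Hypothetical provider lookups; real ones would run each service's action.
def virustotal_lookup(ip: str) -> Dict[str, str]:
    return {"ip": ip, "source": "VirusTotal"}


def greynoise_lookup(ip: str) -> Dict[str, str]:
    return {"ip": ip, "source": "GreyNoise"}


# The one place to edit when the provider changes.
ACTIVE_PROVIDER: Callable[[str], Dict[str, str]] = greynoise_lookup


def ip_reputation(ip: str) -> Dict[str, str]:
    """Utility 'playbook': every caller uses this single entry point."""
    return ACTIVE_PROVIDER(ip)


# Two of the calling playbooks; neither knows which provider is in use.
def phishing_playbook(ip: str) -> Dict[str, str]:
    return ip_reputation(ip)


def malware_playbook(ip: str) -> Dict[str, str]:
    return ip_reputation(ip)


print(phishing_playbook("198.51.100.7"))
print(malware_playbook("203.0.113.9"))
```

Switching back to VirusTotal would mean changing only the `ACTIVE_PROVIDER` line; the calling playbooks stay untouched.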

This re-use policy works with more complicated playbooks, such as formatting everything gathered during an investigation to open a ticket in ServiceNow. If you have a ServiceNow utility playbook you can simply drag and drop a block into your new playbook to open a SNOW ticket. This allows you to focus on optimal investigation processes instead of getting bogged down in supporting details.

Conclusion

In this article, I have outlined some of the best practices that should be employed when working with Splunk SOAR including adding error detection and correction to your playbooks, documenting your code blocks for ease of understanding and reuse, leveraging the I2A2 methodology in your investigation processes, and leveraging utility playbooks to achieve code re-use and speed playbook development.
