Using the diff and set Commands

Written by: Ellis DeVaney, Eric Holsinger | Last Updated: March 15, 2024 | Originally Published: March 7, 2024

Organizations generate immense volumes of machine data today, which makes extracting meaningful insights a challenge. Splunk addresses this with a versatile platform designed specifically to collect and analyze machine data. At its core lies the Search Processing Language (SPL), a robust set of commands that helps users uncover actionable insights in their data. In this blog post, we will examine the diff and set commands, exploring their syntax, usage, and benefits. Both commands are useful for identifying differences between datasets, which supports effective data comparison and analysis.

Understanding the diff and set Commands

To begin with, let’s understand the diff command. It compares two results from a search and reports the differences between them. By default these are the first and second results, but any two positions in the result set can be selected. Unlike commands that aggregate or merge data, diff is designed to highlight what differs between a pair of results.

Alternatively, the set command applies similar comparative logic to two separate result sets, each produced by its own subsearch. It supports union, intersect, and diff operations. In this article, we will focus on the diff operation, which returns the results that are not common to both subsearches and expands the use cases for SPL when comparing changes in data values in Splunk.

Syntax of the diff and set Commands

The fundamental syntax of the diff command is structured as follows:

<base_search>
| diff attribute=<field> position1=<row_number1> position2=<row_number2>
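
As a hedged illustration of the optional arguments: when position1, position2, and attribute are omitted, diff falls back to comparing the _raw text of the first and second results, and diffheader=true adds a traditional diff header to the output. The sketch below assumes you are allowed to search the _internal index, which is used here only because it exists on most Splunk instances; it is not one of the article's own examples.

```spl
index=_internal sourcetype=splunkd earliest=-15m
| head 2
| diff diffheader=true
```

Adding attribute=<field> restricts the comparison to a single field instead of the raw event text.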

The fundamental syntax of the set diff command is structured as follows:

| set diff
[ <base_search1> ]
[ <base_search2> ]
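
To make the semantics concrete, here is a small self-contained sketch that needs no indexed data. It builds two throwaway result sets with makeresults (the user names alice, bob, and carol are invented for illustration) and passes them to set diff. Because set diff returns the results that are not common to both subsearches, this should return alice and carol and drop bob, who appears on both sides.

```spl
| set diff
    [| makeresults
     | eval user=split("alice,bob", ",")
     | mvexpand user
     | stats count by user
     | fields user]
    [| makeresults
     | eval user=split("bob,carol", ",")
     | mvexpand user
     | stats count by user
     | fields user]
```

The stats and fields steps reduce each subsearch to the same single field. set treats two results as identical only when all of their fields match, so trimming both sides to the same field list matters.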

Benefits of Using the diff and set Commands

The diff and set commands support several use cases that make them a valuable addition to your arsenal of Splunk commands.

Change Detection: First and foremost, the diff command allows for efficient change detection between two data points within a dataset, making it easier to pinpoint alterations in data. Similarly, the set diff command surfaces changes between two separate result sets. This is especially valuable in scenarios where tracking modifications in real-time or historical data is crucial.
Data Validation: Additionally, using the diff command enables you to validate the consistency and integrity of datasets. It helps ensure data remains accurate and records expected changes correctly. The set diff command can be applied in the inverse, to confirm that expected changes between two points in time actually occurred.
Anomaly Detection: Furthermore, the diff command helps identify anomalies in data. It can quickly surface sudden variations or unexpected changes, thereby aiding proactive troubleshooting and analysis.

Usage of the diff and set Commands

To illustrate usage, let’s consider a practical example. Suppose you monitor user access to a system and want to identify changes in login patterns over time. The following examples demonstrate comparing access patterns between servers, as well as identifying changes in access patterns over time. To start, a base search is needed to provide authentication data associating users with the servers they have accessed:

index=authentication earliest=-1d@d latest=now
| stats values(user) as user by dest

To compare access between select servers, the search is filtered to those servers using the dest field. The diff command can then be applied to show which users are unique to each server’s authentication logs.

index=authentication earliest=-1d@d latest=now
| stats values(user) as user by dest
| search dest IN (<server1>, <server2>)
| diff attribute=user position1=1 position2=2

To expand this comparison across multiple servers and points in time, the set command can be given the base search with separate time ranges. The output identifies authentication patterns that changed between the two days:

| set diff
[| search index=authentication earliest=-2d@d latest=-1d@d
 | stats values(user) as user by dest
 | fields dest, user]
[| search index=authentication earliest=-1d@d latest=@d
 | stats values(user) as user by dest
 | fields dest, user]

This search displays the login differences between the two time periods, helping you identify any new logins or changes in login patterns.
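
One point worth knowing: set diff does not indicate which subsearch a returned row came from. If that matters, a single search that tags each event with its period is one possible alternative. This is a sketch of a different technique (eval plus stats), not the set command, and it compares individual dest and user pairs rather than the whole user list per server. The period labels earlier_day and later_day are arbitrary names chosen for this example.

```spl
index=authentication earliest=-2d@d latest=@d
| eval period=if(_time < relative_time(now(), "-1d@d"), "earlier_day", "later_day")
| stats values(period) as period by dest, user
| where mvcount(period)=1
```

Each remaining row is a dest and user pair seen in only one of the two days, with the period field showing which one.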

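For a more advanced use case, the search below uses tstats to return the same output. It assumes the Authentication data model from the Splunk Common Information Model is populated and accelerated, since summariesonly=t restricts tstats to accelerated summary data:

| set diff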
    [| tstats summariesonly=t values(Authentication.user) as user from datamodel=Authentication.Authentication where earliest=-2d@d latest=-1d@d by Authentication.dest
     | rename Authentication.* as *
     | fields dest, user]
    [| tstats summariesonly=t values(Authentication.user) as user from datamodel=Authentication.Authentication where earliest=-1d@d latest=@d by Authentication.dest
     | rename Authentication.* as *
     | fields dest, user]

Considerations and Limitations

While powerful, diff and set should be used judiciously. Like other SPL commands, they can strain system resources when overused, so consider the overall system specifications and limitations of the environment in which you work. Because set requires subsearches, configured subsearch limits and the size of the datasets being processed also come into play. For the best performance and accurate reporting, use these commands with searches that return concise result sets.
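
As a rough pre-check, each subsearch can be run on its own to see how many rows it would hand to set. The sketch below reuses the first subsearch from the set diff example and simply counts its output; rows_returned is a name chosen for this illustration. Subsearches are commonly capped at 10,000 results and 60 seconds of runtime by default, but those values are configurable, so treat them as assumptions to verify in your environment.

```spl
index=authentication earliest=-2d@d latest=-1d@d
| stats values(user) as user by dest
| fields dest, user
| stats count as rows_returned
```

If the count approaches the configured limit, narrowing the time range or filtering on dest before the comparison helps keep the results complete.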

Conclusion

In conclusion, the diff and set commands are powerful tools in Splunk’s extensive Search Processing Language, empowering users to compare and analyze data with precision and efficiency. Their capabilities are far-reaching, enabling you to:

Track and monitor changes in your data over time
Validate data integrity and identify inconsistencies
Detect anomalies and deviations from expected patterns

Through practical examples like monitoring user login activity, this article has demonstrated how the diff and set commands can provide deeper insights into your data. However, as with any SPL command, it’s essential to understand its nuances and system considerations to ensure optimal performance and accurate results.

Ultimately, mastering these commands can unlock a new level of data analysis prowess, allowing you to make informed decisions, enhance operational efficiency, and gain a competitive edge. Embrace the power of Splunk SPL and elevate your data analysis capabilities to new heights.

To get access to more Splunk searches like the ones in this article, check out Atlas Search Library, which is one of the elements in the Atlas Platform. Atlas Search Library is a curated list of optimized searches that you can use to empower your Splunk users to search without having to learn SPL. You can also create, customize, and maintain your own library of searches to ensure that your users are getting the most from Splunk.