When searching across your data, you may find it necessary to pull fields and values from two different data sources. But is it possible to do that?
The answer is yes! In these cases, we can use the join command to achieve the results we’re looking for.
What is the Join Command in Splunk?
The join command combines the results of a main search with the results of a subsearch by matching them on one or more shared fields. To use the join command, the field name must be the same in both searches (rename it in the subsearch if it is not), and its values must correlate the two data sets. To minimize resource consumption within Splunk, the join command is best used when the results of the subsearch are relatively small: 50,000 rows or fewer.
Let’s say you’re trying to match IP address information from one index against another index that contains CIDRs. Or you’re trying to compare values against a lookup because you need to find the values that match or don’t match.
WARNING: The join command should not be used lightly. While on the surface it seems like a solution that could be applied to everything, it can consume too much time and Splunk resources if it’s used irresponsibly. Read on to learn how to use the join command responsibly.
Join Command Syntax
Now that we know when join is a good fit, let’s take a look at the syntax:
| join type=left|inner <matching field> [subsearch]
Types of Join Commands
There are two types of joins: left and inner. (Splunk also accepts type=outer, which is a synonym for left. If you omit the type, join defaults to inner.)
- A left join produces ALL of the results from the main search, joined with matching results from the subsearch.
- An inner join produces only the results where the main search and subsearch match.
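As a rough sketch of the difference, the search below runs a left join against a lookup. The index name test, the lookup file blacklist.csv, its ip_address column, and the on_blacklist field are all assumptions made for illustration, so substitute your own names.

```spl
index=test
| dedup ip
| join type=left ip
    [| inputlookup blacklist.csv
     | rename ip_address as ip
     | eval on_blacklist="yes"
     | fields ip on_blacklist]
| table ip on_blacklist
```

With type=left, every IP address from the main search should appear in the table, and on_blacklist is filled in only for the rows that found a match in the subsearch. Changing type=left to type=inner should return only the IP addresses that appear in both the main search and the lookup.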
How to Use the Join Command in Splunk (+Example)
Let’s look at a sample search that draws a simple picture of what you can do with join.
index=test | dedup ip | eval temp_value=0 | table ip temp_value | join type=left ip [|inputlookup blacklist.csv | rename ip_address as ip | eval temp_value=1 | table ip temp_value] | table ip temp_value | where temp_value=0
In this search, we are looking for IP addresses that are not found on our IP blacklist.
Step 1: Start by creating a temporary value that applies a zero to every ip address in the data.
Step 2: Use the join command to bring in the IP addresses from the blacklist. For every IP address that matches between the two, temp_value changes from a 0 to a 1, because the value from the subsearch overwrites the value from the main search.
Step 3: Filter the search using “where temp_value=0” to drop every result that matched the blacklist, leaving only the IP addresses that are not on it.
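The same three steps can be turned around to answer the opposite question. The sketch below reuses the names from the sample search (index=test, blacklist.csv, ip_address, temp_value) and changes only the final filter, so treat it as an illustration rather than a tuned search.

```spl
index=test
| dedup ip
| eval temp_value=0
| table ip temp_value
| join type=left ip
    [| inputlookup blacklist.csv
     | rename ip_address as ip
     | eval temp_value=1
     | table ip temp_value]
| where temp_value=1
| table ip
```

Because matched rows carry a temp_value of 1, filtering on 1 instead of 0 should list only the IP addresses that do appear on the blacklist.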
The Pros and Cons of the Splunk Join Command
- The join command requires a subsearch. This means that a second search inside the main search will retrieve results first and then apply those results to the results of the main search.
- The subsearch is limited to returning the first 50,000 results.
- Search times are not reduced. If you build a complicated subsearch that takes a long time to complete, it will still take a long time to complete when you use the join command, and you will still have to wait for the main search to finish as well.
Join can be a very powerful tool for building coherent tables of data from multiple sources. However, we want to use it responsibly, so we don’t accidentally clog up our environment. Whenever possible, try to find alternative solutions before using the join command.
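One commonly suggested alternative, when the second data set is a lookup file, is the lookup command, which avoids the subsearch altogether. The sketch below assumes the same hypothetical blacklist.csv with an ip_address column used earlier, and introduces blacklisted_ip as an illustrative output field name. Whether it fits depends on how your lookup is defined.

```spl
index=test
| dedup ip
| lookup blacklist.csv ip_address AS ip OUTPUT ip_address AS blacklisted_ip
| where isnull(blacklisted_ip)
| table ip
```

Here blacklisted_ip should be populated only when the event’s ip value is found in the lookup, so keeping the rows where it is null returns the IP addresses that are not on the blacklist, the same result the join example aims for.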