When searching across your data, you may find yourself trying to pull fields and values from two different data sources. Le’ts say you’re trying to match ip information from one index with another index with CIDR’s… Or you’re trying to compare values from a lookup because you really need to find values that match or don’t match…
In these cases, there is a command we can use to achieve the results we want.
If you’re looking to pull your data from two different sources, join command is the one for you.
WARNING: the join command should not be used lightly. While on the surface it seems like a solution that could be applied to everything, there are a few things we need to know.
- Join requires a subsearch, this means that a second search inside out main search will run and retrieve results first and then apply those results to the results of the main search
- The subsearch is limited to returning ONLY the first 50,000 results
- Search times are not reduced. If you build a complicated subsearch that takes a long time to complete, it will always a long time to complete. You will still have to wait for the main search to finish.
How to Use join
Now that we know what to prepare with join, let’s take a look at the syntax:
|join type= left|inner <matching field> [subsearch]
Type = there are two types of joins, left and inner
- A left join produces ALL of the results from the main search joined with matching results from the subsearch
- An inner join produces only results where the main search and subsearch match
Matching field = a field whose name is the same in both searches and correlates between the two data sets
join In Action
Let’s look at a sample search that draws a simple picture of what you can do to join.
Index=test | dedup ip | eval temp_value=0 | table ip temp_value | join type=left ip [|inputlookup blacklist.csv | rename ip_address as ip | eval temp_value=1 | table ip temp_value] | table ip temp_value | where temp_value=0
In this search, we are looking for ip addresses that are not found on our ip blacklist. First, we start by creating a temporary value that applies a zero to every ip address in our data. Then, we’ll use join to add in the ips from our blacklist, including every ip that matches between the two changes from a 0 to a 1. Now, we can filter our search using “where temp_value =0” and filter out all the results of the match between the two.
Join can be a very powerful tool for building coherent tables of data from multiple sources. However, we want to use it responsibly, so we don’t accidentally clog up our environment. Whenever possible, try to find alternative solutions before using the join command.
Ask the Experts
Our Splunk Search Command of the Week series is created by our Expertise on Demand (EOD) experts. Every day, our team of Splunk certified professionals works with customers through Splunk troubleshooting support, including Splunk search command best practice. If you’re interested in learning more about our EOD service or chat with our team of experts, fill out the form below!