Skip to content
SPL // Splunk

Using the join Command

 

Written by: Brett Woodruff | Last Updated:

 
January 30, 2024
 
Splunk Search Command Of The Week: join
 
 

Originally Published:

 
December 13, 2023

The Splunk join Command: Unraveling the Magic

If there were a magic spell in the world of Splunk that could seamlessly merge data, would you want to know about it? Imagine facing a challenge where you have multiple event datasets and need to correlate them for analysis. Think about the world of enterprise security and performance monitoring, where Splunk plays a pivotal role. Did you know that nearly 91% of organizations using Splunk heavily rely on specific commands to improve their data analytics capabilities? Not just any command, but the Splunk join command is the unsung hero in this realm. If you’re keen on enhancing your Splunk prowess and understanding the power of join, then this blog is your treasure map.

 

What is the Splunk join Command?

The Splunk join command is akin to the SQL JOIN function, tailored for Splunk’s unique ecosystem. Predominantly used by data analysts and IT professionals, it merges the results of a primary search with results from a subsearch. The sheer power of this command lies in its ability to combine datasets based on a common field, enriching the Splunk user’s insights. Understanding the intricacies and applications of the join command can be a game-changer in your data analysis journey.

 

The Benefits of the Splunk join Command

Before diving into this any deeper, let’s familiarize ourselves with the benefits it brings to its users:

  • Easy Data Enrichment: By using join, one can effortlessly enrich primary datasets with secondary data, providing a comprehensive view. On the flip side, neglecting this powerful command can leave analysts grappling with fragmented data, missing out on vital data correlations.
  • Time Savings: Rather than using multiple tools or manual methods to correlate data, the join command expedites the process. However, misuse or over-reliance can sometimes slow down searches, emphasizing the need for judicious application.
  • Enhanced Security Analytics: For security analysts, correlating events across multiple logs is crucial. Using the join command, threats can be detected more efficiently, whereas not leveraging it might mean overlooking potential security breaches.


Command Variations

Splunk join has several variations. Here are the main types you should be familiar with: 

 

  • Type #1Inner Join: Returns results that have matches in both primary and secondary datasets.  
  • Type #2Left Join: Returns all results from the primary dataset and matched results from the secondary dataset.  
  • Type #3 Outer Join: Synonymous with left joins

How To Use the join Command

Let’s walk through the process of using the join command in Splunk:

 

Step 1: Specify the Primary Search 
Search for your primary dataset in Splunk.   

Splunk Tip: Always ensure that the primary dataset is larger than the subsearch for optimized performance. 

Step 2: Initiate the join Command 
After the primary search, use the join command followed by the field you want to join on that exists in both datasets.

Step 3: Specify the Subsearch 
After the join command, initiate the subsearch that is enclosed in square brackets. 

Splunk Tip: Use a relevant time range for subsearch to ensure data accuracy. 

 

Examples to Help Get You Started

Example 1: Correlating server logs with security alerts to pinpoint potential breaches.

 
Step 1: Search for Failed Login Attempts: This search retrieves events of failed login attempts from a web application log within the last 24 hours. 
				
					index=web_application_logs status=failed_login earliest=-24h@h latest=@h 
| stats count by src_ip, user, _time 
| where count > 5 
				
			

This search tracks the source IP (src_ip), the user who attempted to login, and the timestamp (_time) of failed login attempts. The stats command is used to count occurrences and where filters out any source IP with fewer than five failed attempts, as it may indicate a brute force attack. 

Step 2: Search for Security Alters from IDS: This search pulls alerts from an Intrusion Detection System (IDS) that have been triggered within the same timeframe. 

				
					index=web_application_logs status=failed_login earliest=-24h@h latest=@h 
| stats count by src_ip, user, _time
| where count > 5 
				
			

This search filters for IDS alerts specifically for brute force attacks, and uses the table command to display the relevant fields. 

Step 3: Correlate the two searches together using the ‘join’ command: To correlate these two searches, you would combine them using a join on the src_ip field, which is common to both datasets: 

				
					index=web_application_logs status=failed_login earliest=-24h@h latest=@h 
| stats count by src_ip, user, _time 
| where count > 5 
| join type=inner src_ip 
    [search index=ids_alerts alert_type=brute_force_attack earliest=-24h@h latest=@h 
    | table src_ip, alert_severity, _time] 
				
			

The join command is used to merge the two searches on the src_ip field, which is the IP address suspected of malicious activity. The type=inner ensures that only matching entries from both searches are returned. The result would be a list of failed login attempts that correlate with IDS alerts for brute force attacks from the same source IP, within the last 24 hours. 

 

This correlation could suggest that the IP in question is engaged in a brute force attack and requires further investigation. The result of this correlation search can help in quick identification and response to potential threats. 

Example 2: Merging sales data with customer feedback to assess the impact of product reviews on sales.

 

Step 1: Customer Sales Data: This search would pull from a sales data index and could include fields like transaction ID, customer ID, the amount of sale, and the date of the transaction. 

Step 2: Customer Feedback Data: This search would pull from a feedback data index and could include fields like customer ID, review score, and the date of the feedback. 

				
					index="sales_data" | stats sum(sale_amount) as total_sales by customer_id  
				
			
				
					index="feedback_data" | stats avg(review_score) as average_score by customer_id  

 
				
			

Step 3: Join the two searches together using the ‘join’ command: To join these two searches, you would use a common field, which in this case is customer_id. 

				
					index="sales_data" | stats sum(sale_amount) as total_sales by customer_id
| join customer_id
    [ search index="feedback_data"
    | stats avg(review_score) as average_score by customer_id ] 
				
			

This Splunk search would provide you with a table where each customer_id is associated with their total_sales and average_score. With this data, you can begin to assess the impact of customer reviews on sales. For example, you could look for correlations between high average review scores and total sales amounts. 

 

Conclusion

The Splunk join command stands as a formidable tool for data analysts, providing unparalleled data enrichment and correlation capabilities. With variations like inner, left, and outer joins, it caters to diverse analytical needs. By mastering the steps and understanding its nuances, you can leverage the join command to its fullest potential. Remember, the use cases provided are just the tip of the iceberg; dive in, practice, and explore the vast depths of insights that await you. 

Get Atlas Free Trial Today
Helpful? Don't forget to share this post!
Share on linkedin
LinkedIn
Share on reddit
Reddit
Share on email
Email
Share on twitter
Twitter
Share on facebook
Facebook