
Using the join Command

Splunk Search Command Of The Week: join

Written by: Brett Woodruff | Originally Published: April 17, 2024 | Last Updated: April 18, 2024

If there were a magic spell in the world of Splunk that could seamlessly merge data, would you want to know about it? Imagine facing a challenge where you have multiple event datasets and need to correlate them for analysis. Think about the world of enterprise security and performance monitoring, where Splunk plays a pivotal role: nearly 91% of organizations using Splunk rely heavily on specific commands to improve their data analytics capabilities. The Splunk join command is the unsung hero in this realm. If you’re eager to enhance your Splunk prowess and grasp the power of join, this blog will help.

What is the Splunk join Command?

The Splunk join command is akin to the SQL JOIN function, tailored for Splunk’s unique ecosystem. Primarily, join is used to merge the results of a primary search with the results of a subsearch. The power of this command lies in its ability to combine datasets based on one or more common fields. Understanding the intricacies and applications of the join command can be a game-changer in your data analysis journey.
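As a minimal sketch of the idea (the index and field names below are hypothetical, not from any particular environment), a primary search over web events can be enriched with ownership details from a separate inventory index, matched on a shared host field:

index=web_logs sourcetype=access_combined
| join type=inner host
    [ search index=asset_inventory
    | fields host, owner, business_unit ]
| table _time, host, status, owner, business_unit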

Before diving any deeper, let’s familiarize ourselves with the benefits it brings to its users:

  • Easy Data Enrichment: By using join, one can effortlessly enrich primary datasets with secondary data, providing a comprehensive view. However, neglecting this powerful command can leave analysts grappling with fragmented data, missing out on vital insights.
  • Time Savings: Rather than using multiple tools or manual methods to correlate data, the join command expedites the process. However, misuse or over-reliance can sometimes slow down searches, emphasizing the need for judicious application.
  • Enhanced Security Analytics: For security analysts, correlating events across multiple logs is crucial. Using the join command, threats can be detected more efficiently, whereas not leveraging it might mean overlooking potential security breaches.

Command Variations

Splunk join has several variations. Here are the main types you should be familiar with: 

 

  • Type #1: Inner Join: Returns only the results that have matches in both the primary and secondary datasets (illustrated in the sketch below).
  • Type #2: Left Join: Returns all results from the primary dataset plus the matched results from the secondary dataset.
  • Type #3: Outer Join: Synonymous with a left join.
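To make the distinction concrete, here is a hedged sketch using hypothetical orders and shipments indexes joined on order_id. The inner join returns only orders that have a matching shipment; the left (or outer) join returns every order and leaves the shipment fields blank where no match exists.

Inner join: only orders with a matching shipment

index=orders
| join type=inner order_id
    [ search index=shipments | fields order_id, carrier, ship_date ]

Left join: every order, plus shipment fields where available

index=orders
| join type=left order_id
    [ search index=shipments | fields order_id, carrier, ship_date ]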

How to Use the join Command

Let’s walk through the process of using the join command in Splunk:

 

#1: Specify the Primary Search
Search for your primary dataset in Splunk.
#2: Initiate the join Command
After the primary search, use the join command followed by the field you want to join on, which must exist in both datasets.
#3: Specify the Subsearch
After the join command, add the subsearch, enclosed in square brackets.

Proper Command Syntax

The basic syntax for the join command in Splunk is as follows:

... | join [type=<type>] <field_list> [<subsearch>]

Tips for Success:
#1: Make the subsearch the smaller of the two datasets. Subsearch results are subject to Splunk's subsearch limits, so running the larger dataset through the primary search keeps the join both complete and performant.
#2: Use a relevant time range for the subsearch so its results stay aligned with the primary search and remain accurate.
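For instance, in a hedged sketch with placeholder index and field names, constraining the subsearch to the same window as the outer search keeps the joined data aligned and the subsearch small:

index=web_logs earliest=-4h@h latest=now
| join type=inner session_id
    [ search index=payment_events earliest=-4h@h latest=now
    | fields session_id, payment_status ]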

Sample Use Cases

Example 1: Correlating server logs with security alerts to pinpoint potential breaches

Step #1: Search for Failed Logins: This search retrieves events of failed login attempts from a web application log within the last 24 hours.

index=web_application_logs status=failed_login earliest=-24h@h latest=@h 
| stats count by src_ip, user, _time 
| where count > 5

This search tracks the source IP (src_ip), the user who attempted to log in, and the timestamp (_time) of each failed login attempt. The stats command counts occurrences, and where filters out any source IP with fewer than five failed attempts, a threshold that suggests a possible brute force attack.

Step #2: Search for Security Alerts from IDS: This search pulls alerts from an Intrusion Detection System (IDS) that have been triggered within the same timeframe.

index=ids_alerts alert_type=brute_force_attack earliest=-24h@h latest=@h 
| table src_ip, alert_severity, _time

This search filters for IDS alerts specifically for brute force attacks, and uses the table command to display the relevant fields. 

Step #3: Correlate the two searches together using the join command: To correlate these two searches, you would combine them using a join on the src_ip field, which is common to both datasets:

index=web_application_logs status=failed_login earliest=-24h@h latest=@h 
| stats count by src_ip, user, _time 
| where count > 5 
| join type=inner src_ip 
    [search index=ids_alerts alert_type=brute_force_attack earliest=-24h@h latest=@h 
    | table src_ip, alert_severity, _time]

The join command is used to merge the two searches on the src_ip field, which is the IP address suspected of malicious activity. The type=inner ensures that only matching entries from both searches are returned. The result would be a list of failed login attempts that correlate with IDS alerts for brute force attacks from the same source IP, within the last 24 hours. 

 

This correlation suggests that the IP in question is engaged in a brute force attack and requires further investigation. Therefore, the result of this correlation search can help in quick identification and response to potential threats. 

Example 2: Merging sales data with customer feedback to assess the impact of product reviews on sales
 

Step #1: Sales Data: This search would pull from a sales data index and could include fields like transaction ID, customer ID, sale amount, and transaction date.

index="sales_data" | stats sum(sale_amount) as total_sales by customer_id

Step #2: Customer Feedback Data: This search would pull from a feedback data index and could include fields like customer ID, review score, and the date of the feedback.

index="feedback_data" | stats avg(review_score) as average_score by customer_id

Step #3: Join the two searches together using the ‘join’ command: To join these two searches, you would use a common field, which in this case is customer_id.

index="sales_data" | stats sum(sale_amount) as total_sales by customer_id
| join customer_id
    [ search index="feedback_data"
    | stats avg(review_score) as average_score by customer_id ]

This Splunk search would provide you with a table where each customer_id is associated with their total_sales and average_score. With this data, you can begin to assess the impact of customer reviews on sales. For example, you could look for correlations between high average review scores and total sales amounts. 
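As one way to take that analysis a step further (reusing the field names from the search above; the score thresholds here are arbitrary and purely illustrative), you could band customers by average review score and compare average sales per band:

index="sales_data" | stats sum(sale_amount) as total_sales by customer_id
| join customer_id
    [ search index="feedback_data"
    | stats avg(review_score) as average_score by customer_id ]
| eval score_band=case(average_score>=4, "high", average_score>=2.5, "medium", true(), "low")
| stats avg(total_sales) as avg_sales_per_customer by score_band

A widening gap in avg_sales_per_customer between the high and low bands would suggest that review scores and spending move together, though correlation alone does not establish cause.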

Conclusion

The Splunk join command stands as a formidable tool for data analysts, providing unparalleled data enrichment and correlation capabilities. With variations like inner, left, and outer joins, it caters to diverse analytical needs. Some of the advantages of the join command that we covered are:
  1. Enhanced Data Correlation: The Splunk join command is a crucial tool for data analysts, enabling seamless data enrichment and correlation. Its ability to merge diverse datasets empowers users to uncover meaningful correlations, enhancing the depth and accuracy of their analyses.

  2. Versatile Analytical Solutions: The Splunk join command offers versatile solutions for many analytical challenges. Moreover, its flexibility in handling various join types caters to the specific needs of different use cases, making it an indispensable asset.

  3. Continuous Learning and Optimization: While the examples provided showcase the immediate benefits of the Splunk join command, mastering its nuances can be a challenge. By delving deeper and experimenting with different scenarios, users can extract maximum value from their data.

By mastering the steps and understanding its nuances, you can leverage the join command to its fullest potential. Lastly, the use cases provided here are just the tip of the iceberg; dive in, practice, and explore the vast depths of the insights that await you.

To access more Splunk searches, check out Atlas Search Library, which is part of the Atlas Platform. Specifically, Atlas Search Library offers a curated list of optimized searches. These searches empower Splunk users without requiring SPL knowledge. Furthermore, you can create, customize, and maintain your own search library. By doing so, you ensure your users get the most from using Splunk.
