
Using the MLTK Commands

Written by: Robert Caldwell | Last Updated: September 4, 2025 | Originally Published: September 4, 2025

Search Command Of The Week: MLTK Commands

Splunk’s Machine Learning Toolkit (MLTK) fundamentally transforms how organizations approach predictive analytics and data science within their existing Splunk infrastructure. By extending Search Processing Language (SPL) with several powerful machine learning commands, MLTK enables analysts to build, deploy, and manage sophisticated models directly within the Splunk platform. This comprehensive guide explores the essential MLTK commands that form the foundation of any machine learning workflow in Splunk. 

Understanding MLTK's Command Structure

The Machine Learning Toolkit introduces specialized search commands that provide advanced statistical and machine learning capabilities. These commands integrate with existing Splunk deployments, enabling organizations to scale machine learning operations across large datasets while maintaining security and governance standards. 

The commands work together across the model lifecycle: fit trains models, apply generates predictions, score validates performance, summary provides model insights, listmodels manages the model inventory, deletemodel handles cleanup, and sample supports data preparation. This command set covers the complete machine learning lifecycle from development through production deployment. 
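The model-management commands named above (summary, listmodels, deletemodel) are typically run as short standalone searches. A minimal sketch, assuming an illustrative saved model named email_conversion_model:

```spl
| listmodels

| summary email_conversion_model

| deletemodel email_conversion_model
```

listmodels shows every model you have permission to see, summary reports the learned parameters of one model, and deletemodel removes a model you no longer need.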

Want to learn more about MLTK as an application? Check out our blog post here.  

 

Essential MLTK Commands & Their Applications

The fit Command: Building Models

The fit command serves as MLTK’s model training engine, transforming historical data into predictive models using machine learning algorithms. This command creates persistent knowledge objects that integrate with Splunk’s existing security and permissions framework. 

| fit <algorithm> <field_to_predict> [from <explanatory_fields>] [into <model_name>] [<parameter>=<value> ...]

Example – Email Marketing Optimization: 

This example demonstrates how marketing teams can optimize campaign performance by training models on historical engagement data. The Random Forest algorithm learns complex relationships between timing, content characteristics, and conversion outcomes, enabling data-driven campaign planning. 

index=marketing sourcetype=email_campaigns 
| stats avg(click_rate) as avg_clicks, 
  sum(conversions) as total_conversions, count(subject_line) as emails_sent 
  by campaign_id 
| eval conversion_rate = total_conversions / emails_sent 
| fit RandomForestRegressor conversion_rate from avg_clicks into email_conversion_model 
The apply Command: Generating Predictions

The apply command leverages trained models to generate real-time predictions and classifications on new data. This command enables operational teams to embed machine learning insights directly into their monitoring and response workflows. 

| apply <model_name> [as <output_field>]

Example – Inventory Management: 

This search transforms inventory data into actionable recommendations, automatically flagging products requiring attention based on predicted demand patterns. Operations teams can integrate these insights into existing dashboards and alerting systems. 

index=inventory sourcetype=warehouse_data 
| stats avg(daily_demand) as avg_demand, sum(stockouts) as stockout_events, 
  avg(lead_time) as avg_lead_time, last(current_stock) as current_level 
  by product_sku, warehouse_location 
| apply inventory_reorder_predictor as recommended_stock_level 
| eval reorder_urgency = case( 
  current_level < recommended_stock_level * 0.5, "immediate", 
  current_level < recommended_stock_level * 0.7, "within_week", 
  current_level < recommended_stock_level, "monitor", 
  1=1, "sufficient") 
| where reorder_urgency!="sufficient" 
The score Command: Validating Model Performance

Model validation represents a critical component of any production machine learning system. The score command provides comprehensive performance metrics appropriate for different model types, ensuring teams can confidently deploy models that meet business requirements. 

| score <scoring-method-name> <actual_field_1> ... <actual_field_n> against <predicted_field_1> ... <predicted_field_n> [options]

Example – Sales Forecast Validation: 

This validation approach connects model performance metrics directly to business outcomes, enabling stakeholders to understand the practical impact of machine learning investments. 

index=sales sourcetype=historical_data 
| fit LinearRegression sales_amount from month quarter seasonal_factor 
  into sales_forecast_model 
| apply sales_forecast_model as predicted_sales 
| score mean_squared_error sales_amount against predicted_sales 
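The score command also supports classification metrics. A minimal sketch, assuming a hypothetical saved classifier churn_classifier and illustrative fields actual_churn and predicted_churn:

```spl
index=crm sourcetype=customer_data 
| apply churn_classifier as predicted_churn 
| score confusion_matrix actual_churn against predicted_churn 
```

Swapping confusion_matrix for accuracy_score or f1_score yields a single summary metric instead of the full matrix, which is often easier to track in dashboards.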
The sample Command: Strategic Data Sampling

The sample command addresses the common challenge of working with large datasets while maintaining statistical validity. Strategic sampling enables faster model development cycles and balanced training datasets. 

| sample [ratio=<float between 0 and 1>] [count=<positive integer>] [proportional=<name of numeric field> [inverse]] [partitions=<natural number greater than 1> [fieldname=<string>]] [seed=<number>] [by <split_by_field>]

Example – Sampling for A/B Testing: 

This example takes a 10% sample of web logs distributed across 5 partitions for quality analysis. The sampling enables faster statistical analysis of HTTP status codes and methods across large log volumes, providing representative insights without processing the entire dataset. 

index=web_logs sourcetype=access_combined 
| sample ratio=0.1 partitions=5 
| stats count by status_code, method 
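Sampling can also be made reproducible and balanced across categories, which matters when building training datasets. A minimal sketch, assuming a hypothetical transactions index: the seed option fixes the random draw so repeated runs return the same sample, and the by clause samples within each category separately.

```spl
index=transactions 
| sample ratio=0.2 seed=42 by payment_method 
| stats count by payment_method 
```

The final stats confirms that each payment_method value is represented at roughly 20% of its original volume.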

Operational Benefits & Strategic Advantages

#1: Enhanced Decision-Making Capabilities

MLTK commands transform historical data into actionable intelligence, enabling organizations to shift from reactive to predictive operational models. Teams can identify trends, anomalies, and opportunities before they become critical issues, improving overall business responsiveness. 

#2: Reduced Time-to-Insight

By integrating machine learning directly within Splunk’s familiar interface, organizations eliminate the complexity and overhead associated with separate analytics platforms. Data scientists and analysts can collaborate more effectively using shared SPL knowledge and existing security frameworks. 

#3: Scalable Intelligence Operations

The distributed nature of Splunk’s architecture enables MLTK commands to scale across large datasets and multiple concurrent users. Organizations can deploy machine learning capabilities across different teams and use cases without requiring additional infrastructure investments. 

Conclusion

Splunk’s Machine Learning Toolkit commands enable organizations to transform their data into predictive insights. By mastering these commands, teams can build sophisticated machine learning solutions that deliver measurable business value while leveraging existing Splunk investments and expertise.  

To access more Splunk searches, check out the Atlas Search Library, part of the Atlas Platform. It offers a curated list of optimized searches that empower Splunk users without requiring SPL knowledge, and you can create, customize, and maintain your own search library to ensure your users get the most from Splunk.
