Fishbucket in Splunk: All Your Questions Answered

Written by: Ravi Anumula | Last Updated: February 28, 2024 | Originally Published: February 24, 2023

How does Splunk keep track of log files that it is monitoring? With Fishbucket. 

What is Fishbucket? 

Fishbucket in Splunk is a sub-directory that maintains CRC (Cyclic Redundancy Check) state for monitored files. It contains seek pointers and CRCs, which together track how much of each file has been indexed into Splunk. The default location of the Fishbucket sub-directory is $SPLUNK_HOME/var/lib/splunk/fishbucket.

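A CRC helps Splunk determine whether it has read a file before. To avoid re-indexing a previously read file, Splunk runs a cyclic redundancy check against the beginning of the file (the first 256 bytes by default) and stores the result in the Fishbucket together with a seek pointer that records how far into the file it has read. When Splunk is restarted, the file monitor processor checks whether each file's CRC is present in this database. If the CRC is found and the file is no longer than the stored seek pointer, Splunk knows the file has already been ingested; if the file has grown, Splunk resumes reading from the seek pointer. If the CRC is not present, or the file no longer matches the stored record, Splunk treats the file as new and ingests the whole file from the beginning.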
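The check described above can be sketched in a few lines of Python. This is a simplified model, not Splunk's implementation: zlib.crc32 stands in for Splunk's internal checksum, the Fishbucket is modeled as a plain dictionary, and the names initial_crc and decide are hypothetical. In this model a file that has grown is resumed from the stored offset, which is how tailing a log normally behaves.

```python
import zlib


def initial_crc(content: bytes, init_crc_length: int = 256) -> int:
    """Checksum of the first init_crc_length bytes of a file.

    zlib.crc32 is a stand-in for Splunk's internal checksum (an assumption).
    """
    return zlib.crc32(content[:init_crc_length])


def decide(fishbucket: dict, content: bytes) -> str:
    """Return what this simplified tailing reader would do with a file."""
    key = initial_crc(content)
    seek = fishbucket.get(key)          # stored seek pointer, if any
    if seek is None:
        action = "index from start"     # CRC never seen: new file
    elif len(content) == seek:
        action = "skip"                 # nothing new since the last read
    elif len(content) > seek:
        action = "resume from seek pointer"
    else:
        action = "re-index from start"  # file is shorter than the record
    fishbucket[key] = len(content)      # remember how far we have read
    return action


fishbucket = {}
log = b"2024-02-28 INFO start\n" * 20  # hypothetical log content
print(decide(fishbucket, log))
print(decide(fishbucket, log))
print(decide(fishbucket, log + b"2024-02-28 INFO more\n"))
```

Keeping the record keyed by content rather than by file name is what lets this kind of check recognize a file that was renamed or rotated.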

You can use the initCrcLength setting in inputs.conf to tell Splunk to read beyond the first 256 bytes when computing the CRC, so that files sharing an identical header still get unique CRCs.
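To see why a longer CRC length matters, consider two files that start with the same banner. The sketch below is illustrative only: zlib.crc32 is a stand-in for Splunk's checksum, the file contents are made up, and the init_crc_length parameter simply mirrors the idea of setting initCrcLength to a larger value such as 1024 on a monitor stanza.

```python
import zlib


def initial_crc(content: bytes, init_crc_length: int = 256) -> int:
    """Checksum of the first init_crc_length bytes (zlib.crc32 as a stand-in)."""
    return zlib.crc32(content[:init_crc_length])


# Two hypothetical log files sharing a 360-byte identical header.
banner = b"# Application log, format version 1\n" * 10
file_a = banner + b"2024-02-27 INFO service started\n"
file_b = banner + b"2024-02-28 ERROR disk full\n"

# With the default length, only the shared banner is checksummed,
# so both files look like the same file.
same_at_default = initial_crc(file_a) == initial_crc(file_b)

# With a longer length, the differing lines are included in the checksum.
differ_when_longer = initial_crc(file_a, 1024) != initial_crc(file_b, 1024)

print(same_at_default, differ_when_longer)
```

In this model the second file would be skipped as already read until the checksum window is widened past the shared header.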

To check the current file monitoring status:

  • REST: /services/admin/inputstatus/TailingProcessor:FileStatus 
  • CLI: splunk list inputstatus [-input | -type]  

How to Re-Ingest Log File Data Into Splunk

There will be situations where you need to re-ingest log file data into Splunk. This is where the btprobe command comes in handy. You can use btprobe with the --reset option to clear the CRC and seek pointer record of a single file or multiple files, and then restart Splunk. This makes Splunk treat the file(s) as new and ingest the same data again.

Splunk Pro Tip: Use btprobe --reset with caution in production environments, as it can have adverse effects if not used correctly.

Check the Fishbucket and change/reset the tailing records if needed: 

CLI Commands: 

  • splunk cmd btprobe -d <fishbucket_path> -k <crc_key> --validate
  • splunk cmd btprobe -d <fishbucket_path> --file <file_path> --reset
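When several files need resetting, it can help to build the reset command in a script rather than type it by hand. The sketch below only assembles and prints the command. The helper name btprobe_reset_command and all paths are hypothetical examples, so substitute the values for your own installation before running anything.

```python
import shlex
from typing import List


def btprobe_reset_command(splunk_bin: str, fishbucket_path: str,
                          file_path: str) -> List[str]:
    """Build the argument list for resetting one file's Fishbucket record."""
    return [splunk_bin, "cmd", "btprobe",
            "-d", fishbucket_path,
            "--file", file_path,
            "--reset"]


# Hypothetical paths for a Linux forwarder; adjust for your environment.
splunk_bin = "/opt/splunkforwarder/bin/splunk"
fishbucket_path = "/opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db"
files_to_reset = ["/var/log/app/app.log", "/var/log/app/audit.log"]

commands = [btprobe_reset_command(splunk_bin, fishbucket_path, f)
            for f in files_to_reset]

for cmd in commands:
    # Print for review first; pass cmd to subprocess.run only once verified.
    print(shlex.join(cmd))
```

Printing the commands before executing them gives you a chance to confirm the file list, which matters given the caution above about production use.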

You can check to see which files are being monitored by a Universal Forwarder: 

CLI Command: 

  • splunk list inputstatus

TailingProcessor:FileStatus :
        C:\Program Files\Splunk/var/log/python_upgrade_readiness_app\pura_utils.log
                file position = 182118
                file size = 182118
                parent = $SPLUNK_HOME/var/log/python_upgrade_readiness_app
                percent = 100.00
                type = open file
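If you capture this output to text, a small parser makes it easy to spot files that have not been fully read. The following is a minimal sketch that assumes the layout shown above (a file path line followed by indented "key = value" lines); the function name parse_file_status is hypothetical, and real output may contain entry types this sketch does not handle.

```python
def parse_file_status(text: str) -> dict:
    """Group 'key = value' lines under the file path that precedes them."""
    files = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("TailingProcessor:FileStatus"):
            continue                      # skip blanks and the section header
        if " = " in line and current is not None:
            key, value = line.split(" = ", 1)
            files[current][key] = value   # attribute of the current file
        else:
            current = line                # a new file path starts a new entry
            files[current] = {}
    return files


sample = r"""
TailingProcessor:FileStatus :
        C:\Program Files\Splunk/var/log/python_upgrade_readiness_app\pura_utils.log
                file position = 182118
                file size = 182118
                parent = $SPLUNK_HOME/var/log/python_upgrade_readiness_app
                percent = 100.00
                type = open file
"""

files = parse_file_status(sample)

# Files whose reported percent is below 100 still have unread data.
unfinished = [path for path, info in files.items()
              if float(info.get("percent", "0")) < 100.0]
print(unfinished)
```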


Common use cases for resetting the Fishbucket:

  1. Delete sensitive data from Splunk and re-ingest it with the sensitive fields masked
  2. Transition from a test index to production
  3. Re-ingest data with the correct sourcetype

Conclusion 

Splunk’s Fishbucket data shows which files are being monitored, the size of each file, and what percentage of each file the forwarder has read. It can also help you catch duplicate ingestion of log data. Use this data for insight into what a forwarder is doing with its log files.

