How does Splunk keep track of log files that it is monitoring? With Fishbucket.
What is Fishbucket?
Fishbucket in Splunk is a sub-directory that maintains CRC (cyclic redundancy check) state for monitored files. It contains seek pointers and CRCs, and it keeps track of how much of each file has been indexed into Splunk. The default location of the fishbucket sub-directory is $SPLUNK_HOME/var/lib/splunk/fishbucket.
The CRC helps Splunk determine whether it has read a file before. To avoid re-indexing a previously read file, Splunk computes a CRC over the beginning of the file (the first 256 bytes by default) and stores it together with a seek pointer and a CRC of the data at that position. When Splunk restarts, the file monitor processor looks up each file's CRC in this database. If the CRC is found and the seek pointer equals the file size, Splunk knows the file has already been ingested. If the CRC is found and the file has grown, Splunk resumes reading at the seek pointer. If the CRC is not present, or the data at the seek pointer no longer matches, Splunk ingests the whole file from the start.
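The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Splunk's implementation: zlib.crc32 stands in for whatever hash Splunk uses internally, a plain dict stands in for the fishbucket database, and the names FishbucketEntry, head_crc, and decide are hypothetical. The sketch also assumes that a known file which has simply grown is read onward from its stored seek pointer.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FishbucketEntry:
    """Hypothetical stand-in for one fishbucket record."""
    seek_pointer: int  # number of bytes already read from the file


def head_crc(data: bytes, length: int = 256) -> int:
    """CRC of the first `length` bytes (zlib.crc32 is a stand-in hash)."""
    return zlib.crc32(data[:length])


def decide(data: bytes, fishbucket: dict) -> str:
    """Return what a file monitor would do with this file content."""
    entry = fishbucket.get(head_crc(data))
    if entry is None:
        return "new file: index from the start"
    if entry.seek_pointer == len(data):
        return "already ingested: skip"
    if entry.seek_pointer < len(data):
        return "file grew: resume at the seek pointer"
    # The file is shorter than what was read before, so it was replaced.
    return "file changed: index from the start"


log = b"2024-01-01 INFO service started\n" * 20  # 640 bytes of sample data
fishbucket = {}
print(decide(log, fishbucket))

# Record that the whole file has been read, then check it again.
fishbucket[head_crc(log)] = FishbucketEntry(seek_pointer=len(log))
print(decide(log, fishbucket))
print(decide(log + b"2024-01-02 WARN disk almost full\n", fishbucket))
```

Clearing the record for a file (removing its key from the dict in this sketch) is what makes the next check fall into the "new file" branch.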
You can use the initCrcLength setting in inputs.conf to tell Splunk to read beyond the first 256 bytes when computing the CRC, which helps when many files share an identical header.
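To see why a longer CRC length helps, here is a small Python sketch. It is an illustration under assumptions rather than Splunk's actual hashing: zlib.crc32 is a stand-in, and the init_crc_length parameter is a hypothetical name that mirrors the initCrcLength setting. In inputs.conf the equivalent would be a line such as initCrcLength = 512 under the relevant monitor stanza.

```python
import zlib


def head_crc(data: bytes, init_crc_length: int = 256) -> int:
    """CRC of the first init_crc_length bytes (zlib.crc32 is a stand-in)."""
    return zlib.crc32(data[:init_crc_length])


# Two different files that begin with the same 300-byte banner.
banner = b"#" * 300
file_a = banner + b"\nhost=web01\n"
file_b = banner + b"\nhost=web02\n"

# With the default length, both files look identical to the check.
same_with_default = head_crc(file_a) == head_crc(file_b)

# A longer length reaches past the shared banner and tells them apart.
same_with_longer = head_crc(file_a, 512) == head_crc(file_b, 512)

print(same_with_default, same_with_longer)
```

With the default length the second file would be mistaken for one that was already read. Extending the length past the shared banner gives each file its own fingerprint.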
To check current file monitoring status:
- REST: /services/admin/inputstatus/TailingProcessor:FileStatus
- CLI: splunk list inputstatus [-input | -type]
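The REST endpoint above can also be queried from a script. The following Python sketch only builds the request. It assumes the management port is the default 8089, that basic authentication is enabled, and that JSON output is requested with output_mode=json. The function name and the credentials are hypothetical placeholders.

```python
import base64
import urllib.request

ENDPOINT = "/services/admin/inputstatus/TailingProcessor:FileStatus"


def build_inputstatus_request(user: str, password: str,
                              host: str = "localhost",
                              port: int = 8089) -> urllib.request.Request:
    """Build (but do not send) a GET request for the file-status endpoint."""
    url = f"https://{host}:{port}{ENDPOINT}?output_mode=json"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"})


# Placeholder credentials, for illustration only.
req = build_inputstatus_request("admin", "changeme")
print(req.full_url)

# Sending it would be urllib.request.urlopen(req, context=ssl_context).
# Many installs use a self-signed certificate on the management port, so
# the TLS context has to be set up to match your environment.
```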
How to Re-Ingest Log File Data Into Splunk
There are situations where you need to re-ingest log file data into Splunk. The btprobe command comes in handy here. You can use btprobe with the --reset option to clear the CRC and seek pointer record of one or more files and then restart Splunk. This makes Splunk treat the files as new and ingest the same data again.
Splunk Pro Tip: Use btprobe --reset with caution in production environments, as it can cause duplicate data and other adverse effects if used incorrectly.
Check the Fishbucket and change/reset the tailing records if needed:
- splunk cmd btprobe -d <fishbucket_path> -k <hex_key | ALL> --validate
- splunk cmd btprobe -d <fishbucket_path> --file <file_path> --reset
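If you reset records often, it can help to generate the command line rather than type it. The Python sketch below builds the arguments for a single file. The helper name btprobe_command and the /opt/splunk install path are hypothetical, and the fishbucket database is assumed to sit in its usual splunk_private_db sub-directory. Nothing is executed unless you pass the result to subprocess yourself.

```python
import shlex


def btprobe_command(file_path: str, reset: bool = False,
                    splunk_home: str = "/opt/splunk") -> list:
    """Build the argv for a btprobe lookup (or reset) of one monitored file."""
    splunk_bin = f"{splunk_home}/bin/splunk"
    # Assumed default location of the fishbucket database.
    fishbucket_db = f"{splunk_home}/var/lib/splunk/fishbucket/splunk_private_db"
    argv = [splunk_bin, "cmd", "btprobe",
            "-d", fishbucket_db, "--file", file_path]
    if reset:
        argv.append("--reset")
    return argv


# Hypothetical log path, for illustration only.
cmd = btprobe_command("/var/log/app/app.log", reset=True)
print(shlex.join(cmd))

# To actually run it: subprocess.run(cmd, check=True)
```

Printing the command first gives you a chance to review the path before any tailing record is cleared.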
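You can check which files a Universal Forwarder is monitoring, and how far it has read each one, with the input status command (splunk cmd btprobe -h only prints the btprobe usage help). Each monitored file is reported like this:
- splunk list inputstatus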
file position = 182118
file size = 182118
parent = $SPLUNK_HOME/var/log/python_upgrade_readiness_app
percent = 100.00
type = open file
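Status output in this key = value form is easy to post-process. Here is a minimal Python sketch that parses the sample above and checks whether the file has been read completely. The function name parse_status is hypothetical, and the sketch assumes one key = value pair per line.

```python
def parse_status(text: str) -> dict:
    """Parse 'key = value' lines into a dict, converting numeric values."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not key = value pairs
        key, value = key.strip(), value.strip()
        try:
            status[key] = float(value) if "." in value else int(value)
        except ValueError:
            status[key] = value  # keep non-numeric values as text
    return status


sample = """\
file position = 182118
file size = 182118
parent = $SPLUNK_HOME/var/log/python_upgrade_readiness_app
percent = 100.00
type = open file
"""

status = parse_status(sample)
# The file is fully read when the read position has reached the file size.
fully_read = status["file position"] == status["file size"]
print(status["percent"], fully_read)
```

A file whose position stays below its size across several checks is a good candidate for a closer look at the forwarder.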
Common use cases for resetting the Fishbucket:
- Delete sensitive data from Splunk and re-ingest it with the sensitive fields masked
- Transition from test index to production
- Re-ingest data with correct sourcetype
Splunk’s Fishbucket provides a lot of information: which files are being monitored, their size, and what percentage of each file the forwarder has read. Fishbucket data can also help you catch duplicate ingestion of log data. Use it for insight into what a forwarder is doing with your log files.