From Splunk Wiki
How to index different sized archives
This topic is useful if you have an archive of data that you want to index in Splunk. There are different ways to handle an archive, depending on its size. Before you try to index a large amount of archived data, you may also want to understand how buckets work.
Here are some general guidelines for different amounts of data.
5-100 GB of data
You can index this amount of data directly into Splunk. Use the standard data inputs instructions for files and directories found here.
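As a sketch, a files-and-directories input is defined in inputs.conf. The path, index name, and sourcetype below are placeholders; substitute your own:

```ini
[monitor:///data/archive]
# Send events to an index you have created for this data (hypothetical name)
index = archive
# Set the sourcetype explicitly if Splunk cannot detect it
sourcetype = syslog
disabled = false
```

After editing inputs.conf, restart Splunk (or use the CLI/Web UI, which applies the change without a manual edit).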
100 GB to 1 TB of data
There are two general options if you have hundreds of gigabytes of data:
Index only the most recent data
If you only want to index the most recent data in your archive, you can simply enable a monitor input. Follow these instructions on configuring monitor.
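One way to restrict a monitor input to recent data is the ignoreOlderThan setting in inputs.conf, which skips files whose modification time is older than the given span. A sketch, with a placeholder path and a 30-day cutoff chosen for illustration:

```ini
[monitor:///data/archive]
# Skip any file not modified within the last 30 days
ignoreOlderThan = 30d
index = archive
```

Note that ignoreOlderThan filters on file modification time, not on event timestamps inside the files, so choose the span with that in mind.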
Index all of the data
If you have a large archive and you want to index the whole thing, index the oldest data first and the newest data last. Whichever data you index last lands in the most recent buckets, which are the quickest to search, so indexing oldest-first keeps your freshest data in the fastest buckets.
- Isolate the old data into a directory.
- Index the directory of old data using batch processing or monitor, following the instructions here.
- After Splunk has consumed the older data, have it index the newer data. You can use the same method -- monitor or batch.
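The batch option in the steps above can be sketched as an inputs.conf stanza like the following. The path is a placeholder, and note that batch with sinkhole deletes files once they are indexed, so point it at a copy of the archive, not the original:

```ini
[batch:///data/archive/old]
# sinkhole is required for batch inputs: files are deleted after indexing
move_policy = sinkhole
index = archive
```

Once the old directory has been consumed, repeat with a stanza (batch or monitor) for the newer data.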