From Splunk Wiki

How to index different sized archives

This topic is useful if you have an archive of data that you want to index in Splunk. The best way to handle an archive depends on its size. Before you try to index large amounts of archived data, you may also want to understand how buckets work.

Here are some general guidelines for different amounts of data.

5-100 GB of data

You can index this amount of data directly into Splunk. Follow the standard instructions for configuring data inputs for files and directories.

100 GB to 1 TB

There are two general options if you have hundreds of gigabytes of data:

Index only the most recent data

If you want to index only the most recent data in your archive, you can simply enable a monitor input on it. Follow the instructions on configuring monitor.

Index all of the data

If you have a large archive and you want to index the whole thing, the best practice is to index the oldest data first and the newest data last. Whichever data you index last ends up in the most recent buckets and is the quickest to search, so this order keeps your newest data the fastest to reach.

  1. Isolate the old data into its own directory.
  2. Index the directory of old data using either batch processing or monitor, following the instructions for the method you choose.
  3. After Splunk has consumed the older data, have it index the newer data. You can use the same method, either monitor or batch.
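
Step 1 above could be scripted along the following lines. This is only a minimal sketch, not part of Splunk: the function name `isolate_old_files`, its parameters, and the use of each file's modification time as a stand-in for the age of the data it contains are all assumptions made for illustration. If your archive files were copied or touched recently, modification time will not reflect the age of the events, and you would need a different rule (for example, dates embedded in file names).

```python
import shutil
import time
from pathlib import Path


def isolate_old_files(archive_dir, old_dir, cutoff_days, now=None):
    """Move files last modified more than cutoff_days ago into old_dir.

    Hypothetical helper: modification time is assumed to approximate
    the age of the data in each file. Returns the names of the moved
    files, oldest first.
    """
    archive = Path(archive_dir)
    old = Path(old_dir)
    old.mkdir(parents=True, exist_ok=True)

    now = time.time() if now is None else now
    cutoff = now - cutoff_days * 86400  # seconds per day

    # Only consider regular files directly inside the archive directory;
    # subdirectories (including old_dir, if nested) are skipped.
    candidates = [p for p in archive.iterdir() if p.is_file()]
    stale = sorted(
        (p for p in candidates if p.stat().st_mtime < cutoff),
        key=lambda p: p.stat().st_mtime,
    )

    moved = []
    for path in stale:
        shutil.move(str(path), str(old / path.name))
        moved.append(path.name)
    return moved
```

For example, `isolate_old_files("/data/archive", "/data/archive_old", 365)` (both paths are placeholders) would leave only files modified within the last year in the original directory. You could then index the old directory first with batch or monitor, and index the remaining, newer files once Splunk has consumed it.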