Community:AddingArchivedData

From Splunk Wiki

Adding Archived or Historic Data to Splunk

This document is intended for experienced Splunk administrators. You should have working knowledge of installing Splunk and adding new data inputs, and be familiar with buckets and indexes.

Background

This topic covers how to handle historic data. We define archived or historic data as events whose timestamps lie in the past (for example, an error log containing events that occurred a few weeks ago). In a large-scale deployment, you may want to load both current and historic data into your Splunk installation.

To get the best performance from Splunk, create a separate index to contain the historic events. For more information about how indexes and buckets work, see http://www.splunk.com/base/Deploy:UnderstandingBuckets. If you intend to add a significant amount of historic data to a working instance of Splunk, read on.

Methodology

The basic steps to add historic data are as follows: create a standalone instance of Splunk; load the historic data into this instance; move the indexed data to your primary installation of Splunk.

1. Create a standalone default instance of Splunk. This can be created on the same server if there are sufficient resources. Do not index any data with this instance at this point. Verify that the instance works by starting the server and running a test search against the internal index. Query: "index=_internal *"

2. Via your preferred input method (directory, file, network, etc.), load the historic data.

  • Load the data from oldest to newest. Loading in chronological order, rather than in a single batch upload, will significantly improve search performance.
  • (Splunk 3.x) Once all of the data has finished loading, roll the hot database (db-hot) using the following command:
./splunk search '| oldsearch !++cmd++::roll' -auth splunk
  • Stop this instance of Splunk
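The "oldest to newest" ordering above can be approximated by sorting the archive files by modification time before feeding them to your input. This is a minimal sketch, not a Splunk feature: it assumes that file modification times reflect the age of the events and that file names contain no newlines. The function name list_oldest_first is hypothetical.

```shell
#!/bin/sh
# list_oldest_first DIR
# Print the names of the non-directory entries in DIR, one per line,
# ordered from oldest to newest modification time.
list_oldest_first() {
    dir=$1
    # -t sorts by modification time (newest first), -r reverses that order,
    # -p appends "/" to directories so that sed can drop them.
    ls -1trp "$dir" | sed '/\/$/d'
}

# Example use (both paths are placeholders, not Splunk defaults):
#   list_oldest_first /data/archive | while IFS= read -r f; do
#       cp -p "/data/archive/$f" /data/splunk_input/
#   done
```

If the file names encode a date, sorting on the name may be more reliable than relying on modification times, which are easily changed by copying or unpacking.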

3. Rename the bucket/database identifier numbers. Each bucket has a unique ID at the end of its directory name (see the link above regarding buckets). Your new standalone instance will assign low-numbered identifiers, which may clash with identifiers already in use on your working/primary instance of Splunk. The identifier is the last integer in the directory name. For example, you may see the following in your $SPLUNK_DB/defaultdb/db:

/opt/splunk/var/lib/splunk/defaultdb/db/db_1236911598_1236883849_1
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236914140_1236906359_2
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236929058_1236870277_3
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236958669_1236928918_4
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236974637_1236957784_5

The identifiers for these directories are 1, 2, 3, 4, and 5. In your running/primary instance of Splunk, you will likely have much higher numbers. You must manually rename the directories you plan to move so that each one carries a new identifier. A starting value such as 3000 should work in most cases. The idea is to choose a number high enough that none of the new identifiers clashes with an existing one.
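To pick a safe starting value, it helps to know the highest identifier already in use on the primary instance. The following is a minimal sketch under stated assumptions: it only inspects directories whose names end in an underscore followed by a number, in the db_..._ID form shown above, and ignores everything else (including any hot database directory). The function name max_bucket_id is hypothetical.

```shell
#!/bin/sh
# max_bucket_id DIR...
# Print the highest trailing identifier found among the db_*_*_<id>
# directories in the given directories, or 0 if none are found.
max_bucket_id() {
    max=0
    for dir in "$@"; do
        for bucket in "$dir"/db_*_*_*; do
            [ -d "$bucket" ] || continue
            # The identifier is everything after the last underscore.
            id=${bucket##*_}
            # Skip names whose final component is not a plain integer.
            case $id in
                ''|*[!0-9]*) continue ;;
            esac
            if [ "$id" -gt "$max" ]; then
                max=$id
            fi
        done
    done
    echo "$max"
}

# Example use against the default index of the primary instance:
#   max_bucket_id "$SPLUNK_DB/defaultdb/db" "$SPLUNK_DB/defaultdb/colddb"
```

Any starting value comfortably above the printed number, with room for the primary instance to keep creating buckets, avoids a clash.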

If most of the identifiers in your production/working instance of Splunk are already in the range of 3000, consult Splunk Support (http://www.splunk.com/support). Assuming that your new standalone instance has created identifiers numbered as above (1, 2, 3, 4, and 5), and your working instance has identifiers only in the single digits or hundreds, rename the directories from the standalone instance as follows:

change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236911598_1236883849_1 
to:     /opt/splunk/var/lib/splunk/defaultdb/db/db_1236911598_1236883849_3001

change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236914140_1236906359_2 
to:     /opt/splunk/var/lib/splunk/defaultdb/db/db_1236914140_1236906359_3002

change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236929058_1236870277_3 
to:     /opt/splunk/var/lib/splunk/defaultdb/db/db_1236929058_1236870277_3003

change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236958669_1236928918_4 
to:     /opt/splunk/var/lib/splunk/defaultdb/db/db_1236958669_1236928918_3004

change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236974637_1236957784_5 
to:     /opt/splunk/var/lib/splunk/defaultdb/db/db_1236974637_1236957784_3005
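With more than a handful of buckets, the renames above are easier to script. This is a minimal sketch, assuming the standalone instance is stopped and that every bucket directory ends in a plain decimal identifier without leading zeros. The function name renumber_buckets is hypothetical. Run it only once per directory, since a second run would add the offset again.

```shell
#!/bin/sh
# renumber_buckets DIR OFFSET
# Rename each db_<a>_<b>_<id> directory in DIR to db_<a>_<b>_<id + OFFSET>.
renumber_buckets() {
    dir=$1
    offset=$2
    # The glob is expanded once, so renamed directories are not revisited.
    for bucket in "$dir"/db_*_*_*; do
        [ -d "$bucket" ] || continue
        id=${bucket##*_}          # text after the last underscore
        case $id in
            ''|*[!0-9]*) continue ;;   # not a numeric identifier
        esac
        prefix=${bucket%_*}       # path without the trailing _<id>
        target="${prefix}_$((id + offset))"
        if [ -e "$target" ]; then
            echo "skipping $bucket: $target already exists" >&2
            continue
        fi
        mv "$bucket" "$target"
    done
}

# Example use, matching the renames listed above:
#   renumber_buckets /opt/splunk/var/lib/splunk/defaultdb/db 3000
```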

4. Stop the primary/working instance of Splunk.

Important: You must stop Splunk to avoid the risk of data corruption.

5. Move the newly renamed directories and their contents to the primary/working installation of Splunk.

Place these directories and their contents in the $SPLUNK_DB/defaultdb/colddb location. Creating a tarball is the easiest way to move the data. Make sure that the colddb directory contains only the existing and the newly moved bucket directories; Splunk may not start if any other files or directories are left in this location. Your $SPLUNK_DB/defaultdb/colddb might now look as follows:

/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062401_1232062340_1
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062461_1232062401_2
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062525_1232062461_3
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062587_1232062525_4
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062648_1232062587_5
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236911598_1236883849_3001
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236914140_1236906359_3002
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236929058_1236870277_3003
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236958669_1236928918_3004
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236974637_1236957784_3005
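The transfer and the "nothing else in colddb" check can be sketched as two small shell functions. This is an illustration under stated assumptions, not an official procedure: both instances are stopped, the source directory contains at least one db_* bucket, and valid bucket directories are named db_ followed by three underscore-separated numbers. The function names copy_buckets and list_stray_entries are hypothetical. The sketch copies rather than moves, so the originals remain until you have verified the result.

```shell
#!/bin/sh
# copy_buckets SRC DEST
# Copy every db_* directory from SRC into DEST through a tar pipe,
# preserving permissions and timestamps.
copy_buckets() {
    src=$1
    dest=$2
    (cd "$src" && tar -cf - db_*) | (cd "$dest" && tar -xpf -)
}

# list_stray_entries DIR
# Print the name of anything in DIR that is not a directory named
# db_<number>_<number>_<number>. No output means DIR is clean.
list_stray_entries() {
    dir=$1
    for entry in "$dir"/* "$dir"/.[!.]*; do
        [ -e "$entry" ] || continue
        name=${entry##*/}
        if [ -d "$entry" ] &&
           printf '%s\n' "$name" | grep -Eq '^db_[0-9]+_[0-9]+_[0-9]+$'; then
            continue
        fi
        printf '%s\n' "$name"
    done
}

# Example use (the standalone path is a placeholder):
#   copy_buckets /opt/splunk-standalone/var/lib/splunk/defaultdb/db \
#                "$SPLUNK_DB/defaultdb/colddb"
#   list_stray_entries "$SPLUNK_DB/defaultdb/colddb"
```

If list_stray_entries prints anything, such as a leftover tarball, remove or relocate those entries before starting the primary instance.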

6. Start the primary/working instance of Splunk.
