Community:AddingArchivedData
From Splunk Wiki
Adding Archived or Historic Data to Splunk
This document is intended to guide an experienced Splunk administrator. You should have working knowledge of installing Splunk and adding new data inputs, and be familiar with buckets and indexes.
Background
This topic covers how to handle historic data. We define archived or historic data as events that are timestamped in the past (e.g., an error log containing events that occurred a few weeks ago). In a large-scale deployment, you may want to load both current and historic data into your Splunk installation.
To achieve the best performance from Splunk, create a separate index to contain the historic events. For more information on how indexes and buckets work, see http://www.splunk.com/base/Deploy:UnderstandingBuckets. If you intend to add a significant amount of historic data to a working instance of Splunk, read on.
Methodology
The basic steps to add historic data are as follows: create a standalone instance of Splunk; load the historic data into this instance; move the indexed data to your primary installation of Splunk.
1. Create a standalone default instance of Splunk. This can be created on the same server if there are sufficient resources. Do not index any data with this instance at this point. Ensure functionality by starting the server and running a test search against the internal index. Query: "index=_internal *"
2. Load the historic data via your preferred input method (directory, file, network, etc.).
- Load the data from oldest to newest. Avoiding a single batch upload and loading in chronological order will significantly improve search performance.
- (Splunk 3.x) Once all of the data has finished loading, roll the hot database (db-hot) using the following command:
./splunk search '| oldsearch !++cmd++::roll' -auth splunk
- Stop this instance of Splunk.
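The oldest-to-newest loading order in step 2 can be prepared ahead of time. The following is a minimal sketch, not part of Splunk: `files_oldest_first` is a hypothetical helper, and it assumes that each file's modification time is a reasonable proxy for the age of the events it contains. If your archives were copied or restored recently, that assumption may not hold and you would need to order them by another means (for example, by a date in the file name).

```python
from pathlib import Path


def files_oldest_first(directory):
    """Return the regular files directly under `directory`, oldest first.

    Assumption: file modification time approximates the age of the
    events inside the file. Subdirectories are ignored.
    """
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    # Sort by modification time, ascending, so the oldest file comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

You could then feed the resulting list to your chosen input method one file at a time instead of uploading everything in a single batch.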
3. Rename the bucket/database identifier numbers. Each bucket has a unique ID at the end of the directory name (see above link regarding buckets). Your new standalone instance will have low numbered identifiers. Your working/primary instance of Splunk may have clashing identifiers. The identifier is the last integer in the directory name. For example, you may see the following in your $SPLUNK_DB/defaultdb/db:
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236911598_1236883849_1
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236914140_1236906359_2
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236929058_1236870277_3
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236958669_1236928918_4
/opt/splunk/var/lib/splunk/defaultdb/db/db_1236974637_1236957784_5
The identifiers for these directories are 1, 2, 3, 4, and 5. In your running/primary instance of Splunk, you will likely have much higher numbers. You must manually set (rename) the identifiers in the directories you plan to move. Using a starting value such as 3000 should work for most cases. The idea is to have a number high enough to avoid a clashing identifier (number).
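Before settling on a starting value, it may help to see which identifiers the primary instance already uses. The sketch below is an illustration only: `highest_bucket_id` is a hypothetical helper, and it assumes bucket directories follow the `db_<number>_<number>_<id>` naming shown in the example above. It does not use any Splunk API; it only reads directory names.

```python
import re
from pathlib import Path

# Assumed naming, taken from the example listing: db_<number>_<number>_<id>
BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")


def highest_bucket_id(*bucket_dirs):
    """Return the largest bucket identifier found in the given directories.

    Entries that do not match the assumed naming (such as db-hot) are
    skipped. Returns 0 when no bucket directories are found.
    """
    ids = []
    for bucket_dir in bucket_dirs:
        for entry in Path(bucket_dir).iterdir():
            match = BUCKET_RE.match(entry.name)
            if entry.is_dir() and match:
                ids.append(int(match.group(3)))  # last integer is the identifier
    return max(ids, default=0)
```

For example, passing the primary instance's db and colddb directories returns the highest identifier in use, so you can pick a starting value comfortably above it.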
If most of the identifiers in your production/working instance of Splunk are already in the 3000 range, consult Splunk Support (http://www.splunk.com/support). Assuming that your new standalone instance has created identifiers numbered as above (1, 2, 3, 4, and 5), and your working instance has identifiers only in the single-digit or hundreds range, rename the directories from the standalone instance as follows:
change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236911598_1236883849_1
to: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236911598_1236883849_3001
change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236914140_1236906359_2
to: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236914140_1236906359_3002
change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236929058_1236870277_3
to: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236929058_1236870277_3003
change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236958669_1236928918_4
to: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236958669_1236928918_3004
change: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236974637_1236957784_5
to: /opt/splunk/var/lib/splunk/defaultdb/db/db_1236974637_1236957784_3005
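With more than a handful of buckets, renaming by hand becomes error-prone. The following is a minimal sketch of the renaming in step 3, assuming the standalone instance is stopped and that bucket directories follow the `db_<number>_<number>_<id>` naming shown above. `renumber_buckets` and its `offset` and `dry_run` parameters are hypothetical names introduced for this illustration; they are not part of Splunk.

```python
import re
from pathlib import Path

# Assumed naming, taken from the example listing: db_<number>_<number>_<id>
BUCKET_RE = re.compile(r"^(db_\d+_\d+)_(\d+)$")


def renumber_buckets(db_dir, offset=3000, dry_run=True):
    """Add `offset` to the identifier of every bucket directory in `db_dir`.

    Returns a list of (old_path, new_path) pairs. Nothing is renamed on
    disk unless dry_run is False. Entries that do not match the assumed
    naming (such as db-hot) are left alone.
    """
    plan = []
    for entry in sorted(Path(db_dir).iterdir()):
        match = BUCKET_RE.match(entry.name)
        if not (entry.is_dir() and match):
            continue
        new_id = int(match.group(2)) + offset
        plan.append((entry, entry.with_name(f"{match.group(1)}_{new_id}")))

    if not dry_run:
        for old, new in plan:
            # Refuse to overwrite an existing directory.
            if new.exists():
                raise FileExistsError(f"target already exists: {new}")
            old.rename(new)
    return plan
```

Running it first with the default `dry_run=True` lets you review the planned renames, for example `db_1236911598_1236883849_1` to `db_1236911598_1236883849_3001`, before calling it again with `dry_run=False`.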
4. Stop the primary/working instance of Splunk.
Important: You must stop Splunk to avoid the risk of data corruption.
5. Move the newly renamed directories and their contents to the primary/working installation of Splunk.
Place these directories and their contents in the $SPLUNK_DB/defaultdb/colddb location. Creating a tarball is the easiest way to move the data. Make sure that the colddb directory contains only the current and newly moved dbs/directories. Splunk may not start if any other files or directories are left in this location. Your $SPLUNK_DB/defaultdb/colddb might now look as follows:
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062401_1232062340_3001
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062461_1232062401_3002
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062525_1232062461_3003
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062587_1232062525_3004
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1232062648_1232062587_3005
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236911598_1236883849_1
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236914140_1236906359_2
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236929058_1236870277_3
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236958669_1236928918_4
/opt/splunk/var/lib/splunk/defaultdb/colddb/db_1236974637_1236957784_5
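Because stray files or clashing identifiers in colddb can cause problems, a quick check before restarting may be worthwhile. The sketch below is an illustration only: `check_colddb` is a hypothetical helper that treats anything not named `db_<number>_<number>_<id>` as unexpected. That rule is an assumption based on the listing above, and your Splunk version may use other legitimate names, so review the output rather than deleting anything automatically.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed naming, taken from the example listing: db_<number>_<number>_<id>
BUCKET_RE = re.compile(r"^db_\d+_\d+_(\d+)$")


def check_colddb(colddb_dir):
    """Return (unexpected_names, duplicate_ids) for a colddb directory.

    unexpected_names: entries that are not directories with the assumed
    bucket naming (for example, a leftover tarball).
    duplicate_ids: identifiers that appear on more than one bucket.
    """
    unexpected, ids = [], []
    for entry in sorted(Path(colddb_dir).iterdir()):
        match = BUCKET_RE.match(entry.name)
        if entry.is_dir() and match:
            ids.append(int(match.group(1)))
        else:
            unexpected.append(entry.name)
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    return unexpected, duplicates
```

If both returned lists are empty, the directory contains only uniquely numbered bucket directories under the assumed naming.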
6. Start the primary/working instance of Splunk.