Considerations for Access Control

Applies to: Splunk 4.0 and higher

Status: Work In Progress

Design Questions

When designing an access control strategy for your Splunk deployment, there are (like most things involving Splunk) many factors to consider. The primary questions that we recommend you consider are:

  • Requirements for Data Compartmentalization
    • Do you need to control data access by host, source, or sourcetype?
    • Do you need to control data access based on search keywords or fields?
    • Do you need to control data access based on discrete business units?
  • Requirements for Retention
    • Does your organization have one specific retention policy for IT data?
    • Does your organization have variable (per application, per message type) retention policies for IT data?
  • Requirements for Roles
    • How many different roles do you need to control data access?
      • A more direct question might be "How many different teams or groups will require access to Splunk?"
    • Will each role be restricted to its own app(s) or will some app(s) need to support multiple roles?
  • Requirements for Manageability
    • How often do you anticipate having to add/remove/change access controls within Splunk?
    • How many index servers do you have in your Splunk deployment?

Recommended Approaches

Depending on the answers to the above questions, there are four basic access control approaches to consider:

  • Single Index
    • No data compartmentalization by index
    • Use search filters to control access to data
  • Multiple Indexes for Retention
    • Compartmentalize data into discrete indexes based on retention policy
    • Use search filters to control access to data
  • Multiple Indexes for Access Control
    • Compartmentalize data into discrete indexes delineated by finite and static criteria (business unit, data category/class)
    • Use indexes as the primary access control restriction, with search filters reserved for more complex use cases
  • Multiple Index Clusters for Retention and Access Control
    • Compartmentalize data into multiple clusters of indexes delineated by finite and static criteria (business unit, data category/class)
    • Each index in the cluster used to satisfy variable retention policies
    • Use indexes as the primary access control restriction, with search filters reserved for more complex use cases

Single Index

In a single index configuration, which is the default behavior of Splunk, all data will flow to the "default" main index. All roles will be allowed to search the main index, and search filters will need to be devised and tested to ensure that the principle of least privilege is successfully enforced on each role.
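
A minimal authorize.conf sketch of this approach is shown below; the role names, sourcetypes, and filter criteria are hypothetical. Every role is confined to the main index and relies entirely on its srchFilter for compartmentalization.

# authorize.conf -- illustrative only; role names and filter criteria are hypothetical
[role_unix_team]
# Every role shares the default main index, so the search filter is the only
# thing separating this role's data from everyone else's
srchIndexesAllowed = main
srchIndexesDefault = main
srchFilter = sourcetype=syslog OR sourcetype=sshd

[role_web_team]
srchIndexesAllowed = main
srchIndexesDefault = main
srchFilter = sourcetype=access_combined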

Multiple Indexes for Retention

In a multiple indexes for retention configuration, multiple indexes will be created to reflect an organization's tiered retention policy. For example, if an organization mandates that confidential data be kept for 30 days, secret data for 45 days, and compliance data for 90 days, then three indexes would be created, each with a different frozenTimePeriodInSecs setting to reflect the associated retention policy. Search filters will need to be devised and tested to ensure that the principle of least privilege is successfully enforced on each role.
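
A minimal indexes.conf sketch for this example follows; the index names are hypothetical, and the retention values are simply 30, 45, and 90 days expressed in seconds.

# indexes.conf -- illustrative stanzas; index names are hypothetical
[confidential]
homePath   = $SPLUNK_DB/confidential/db
coldPath   = $SPLUNK_DB/confidential/colddb
thawedPath = $SPLUNK_DB/confidential/thaweddb
# 30 days
frozenTimePeriodInSecs = 2592000

# (homePath/coldPath/thawedPath omitted below for brevity)
[secret]
# 45 days
frozenTimePeriodInSecs = 3888000

[compliance]
# 90 days
frozenTimePeriodInSecs = 7776000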

Multiple Indexes for Access Control

In a multiple indexes for access control configuration, multiple indexes will be created to reflect the different levels of confidentiality and classification required by an organization's security policy. For example, if an organization mandates that a team's IT data should only be accessed by active members of the team, then each team would be provisioned an index and all the team's data would flow to it. In most cases, access control will be enforced at the role level on a per-index basis. However, there may be some cases where roles will need to be able to search parts of more than one index. In these cases, search filters must supplement the per-index restrictions.
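
One way to sketch the per-index restriction in authorize.conf, assuming two hypothetical team indexes named team_a and team_b, is shown below; the auditor role at the end illustrates the case where a search filter must supplement the per-index restriction.

# authorize.conf -- role and index names are hypothetical
[role_team_a]
srchIndexesAllowed = team_a
srchIndexesDefault = team_a

[role_team_b]
srchIndexesAllowed = team_b
srchIndexesDefault = team_b

# A role that should see only audit events, but across both teams' indexes,
# combines the index list with a search filter
[role_team_a_auditor]
srchIndexesAllowed = team_a;team_b
srchIndexesDefault = team_a
srchFilter = sourcetype=audit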

Multiple Index Clusters for Retention and Access Control

In a multiple index clusters for retention and access control configuration, an organization may mandate that access to data be controlled on a per-team basis and that each team must leverage a multi-tiered retention policy. In such cases, each team will need N indexes created, with N being equal to the number of retention tiers that the team requires. Each team will thus have a discrete "cluster" of indexes. In most cases, access control will be enforced at the role level on a per-index-cluster basis. However, there may be some cases where roles will need to be able to search parts of more than one index cluster. In these cases, search filters must supplement the per-index-cluster restrictions.
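
Continuing the sketch, a hypothetical team with a two-tier retention policy would be provisioned one index per tier, and its role would be granted the entire cluster; all index and role names below are illustrative.

# indexes.conf -- one index per retention tier for a hypothetical team
# (homePath/coldPath/thawedPath omitted for brevity)
[team_a_30d]
frozenTimePeriodInSecs = 2592000

[team_a_90d]
frozenTimePeriodInSecs = 7776000

# authorize.conf -- the role is granted the team's entire index cluster
[role_team_a]
srchIndexesAllowed = team_a_30d;team_a_90d
srchIndexesDefault = team_a_30d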

Important Considerations for Implementation

Capacity Planning

Perhaps the biggest challenge in designing access control that leverages a multi-index approach is designing and implementing indexes for compartmentalization and capacity. Although not strictly an exercise in access control, it is nonetheless very important to estimate the required size of each index and weigh that against the total drive space available and the probability that new indexes will need to be created in the future.

The primary considerations for index capacity planning are:

  • How many indexes will you need initially?
  • How many indexes might you need in a year?
  • How many tiers of storage will you be leveraging?
  • Do you have different retention policies for some indexes?

In considering the question "how many indexes now?", think about how much space you have in aggregate across each tier of storage you are using for your Splunk data store. For example, if you have 1224 GB of storage and 10 indexes, it might be as simple as specifying that each index can grow to a maxTotalDataSizeMB of 119070, which is 1/10th of 95% of the aggregate storage:

Warm storage - 1224 GB
Maximum capacity - 95%
Indexes required - 10

(1224 GB)*.95/10*1024 = 119070 MB <-- maxTotalDataSizeMB
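
Applied to any one of the ten indexes, the result maps directly onto the maxTotalDataSizeMB setting in indexes.conf, as in this illustrative stanza (the index name is hypothetical):

# indexes.conf -- illustrative; the same cap would be repeated for each of the 10 indexes
[index_01]
maxTotalDataSizeMB = 119070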

However, life is hardly ever this elegant or simple, especially in IT. Consider that the default Splunk indexes (e.g. _internal, _audit) will consume a non-zero amount of space on disk. If you utilize summary indexing heavily, you must factor in size allowances for the summary index. Finally, if $SPLUNK_HOME or other non-index data resides on one of the storage tiers that you utilize, you will need to account for the storage requirements of those other files. So, to go back to the above example:

Warm storage - 1224 GB
Maximum capacity - 95%
Splunk summary index - 30 GB
Splunk internal indexes - 4 * 5 GB
SPLUNK_HOME - 10 GB
Indexes required today - 10


(1224 GB - 30 GB - 20 GB - 10 GB)*.95/10*1024 = 113233 MB <-- maxTotalDataSizeMB

If you feel it is likely that you will need to add three more indexes within the next 12 months, it might be wise to account for this requirement now, instead of having to recalculate all index sizes or make capacity concessions at the time that the future indexes need to be created.

Warm storage - 1224 GB
Maximum capacity - 95%
Splunk summary index - 30 GB
Splunk internal indexes - 4 * 5 GB
SPLUNK_HOME - 10 GB
Indexes required today - 10
Indexes required in the next 12 months - 3

(1224 GB - 30 GB - 20 GB - 10 GB)*.95/13*1024 = 87103 MB <-- maxTotalDataSizeMB

Now to fold in a common wrinkle. Given high daily indexing volumes and retention requirements, most customers utilize two-tiered storage for Splunk's index - faster and more expensive "warm" storage, where the most recent data is kept, and "cold" storage where data is moved as it ages. The calculation becomes more difficult in the example above when there is 200 GB of "warm" storage and 1024 GB of "cold" storage:

Warm storage - 200 GB
Cold storage - 1024 GB
Maximum capacity - 95%
Splunk summary index - 30 GB
Splunk internal indexes - 4 * 5 GB
SPLUNK_HOME - 10 GB
Indexes required today - 10
Indexes required in the next 12 months - 3
Assumption of maxDataSize = 10000

(1224 GB - 30 GB - 20 GB - 10 GB)*.95/13*1024 = 87103 MB <-- maxTotalDataSizeMB

round((200 GB - 30 GB - 20 GB - 10 GB)*.95/13*1024/10000) = 1 <-- maxWarmDBCount
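
Folded into an illustrative indexes.conf stanza (the index name is hypothetical), the values calculated above would look roughly like this:

# indexes.conf -- illustrative stanza for one of the 13 planned indexes
[index_01]
# Hot buckets roll to warm at roughly 10000 MB
maxDataSize = 10000
# Keep at most 1 warm bucket on the 200 GB tier before rolling to cold
maxWarmDBCount = 1
# Total (warm + cold) budget for this index
maxTotalDataSizeMB = 87103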

So far, we have assumed that data in each of the required indexes falls under the same retention requirements (in this case, the very common "as long as you can keep it given the storage we could afford" requirement). Let's say that there are 3 indexes where the data must be retained for 45 days, and the average daily throughput for these indexes will be 10 GB, 10 GB, and 5 GB respectively. The rest of the data is subject to best-effort retention as described above. Assuming that Splunk achieves roughly 2-to-1 compression, meaning the compressed raw data plus the index files Splunk creates occupy about half the size of the original raw data:


Warm storage - 200 GB
Cold storage - 1024 GB
Maximum capacity - 95%
Splunk summary index - 30 GB
Splunk internal indexes - 4 * 5 GB
SPLUNK_HOME - 10 GB
Indexes required today - 10
Indexes required in the next 12 months - 3
Assumption of maxDataSize = 10000
Indexes with 10 GB/day throughput and 45 day retention requirement - 2
Indexes with 5 GB/day throughput and 45 day retention requirement - 1
Indexes with best-effort requirement and unknown throughput - 10

10 GB/day * .5 compression * 45 days = 225 GB
5 GB/day * .5 compression * 45 days = 113 GB

(225 GB + 225 GB + 113 GB) = 563 GB for 3 indexes with finite retention requirements

225 GB * 1024 = 230400 MB <-- maxTotalDataSizeMB for indexes with 10 GB/day throughput and 45 day retention requirement

113 GB * 1024 = 115712 MB <-- maxTotalDataSizeMB for indexes with 5 GB/day throughput and 45 day retention requirement

((1224 GB - 225 GB - 225 GB - 113 GB - 30 GB - 20 GB - 10 GB)*.95) = 570 GB for the other 10 indexes

570 GB/10*1024 = 58368 MB <-- maxTotalDataSizeMB for the other 10 indexes

round((200 GB - 30 GB - 20 GB - 10 GB)*.95/13*1024/10000) = 1 <-- maxWarmDBCount for all indexes
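
Expressed as illustrative indexes.conf stanzas (index names are hypothetical), the final example might look roughly like this:

# indexes.conf -- sketch of the mixed-retention sizing above
# 45 day retention = 3888000 seconds
[finite_10gb_a]
frozenTimePeriodInSecs = 3888000
maxTotalDataSizeMB     = 230400
maxWarmDBCount         = 1

[finite_10gb_b]
frozenTimePeriodInSecs = 3888000
maxTotalDataSizeMB     = 230400
maxWarmDBCount         = 1

[finite_5gb]
frozenTimePeriodInSecs = 3888000
maxTotalDataSizeMB     = 115712
maxWarmDBCount         = 1

# Each of the 10 best-effort indexes gets the same size cap
[best_effort_01]
maxTotalDataSizeMB = 58368
maxWarmDBCount     = 1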

As you can see from above, there are many factors to consider in planning for capacity. Please read the Manageability section below for a discussion of related issues.

Search Filters

When implementing search filters as a means of access control, it is key to account for the following behaviors:

  • Search filters leak data via the following vectors:
    • metadata - search filters do not restrict queries against metadata. As such, users in roles that are restricted only by search filters will still be able to view meta-information (e.g. event count) for all the hosts, sources, and sourcetypes that exist in the role's allowed indexes.
    • typeahead - search filters do not restrict queries against typeahead (otherwise known as predictive text).
  • Search filters are additive, meaning that if a user is in two roles that have search filters defined for them, the two search filters will be joined together via a union. For example, in authorize.conf:
[role_foo]
srchFilter = sourcetype=foo

[role_bar]
srchFilter = sourcetype=bar

A user in both the "foo" and the "bar" role will have the effective search filter of:

sourcetype=foo OR sourcetype=bar
  • By default, if a user is a member of one or more roles that have a search filter defined, they will see a blue information bar at the top of the browser informing them that their search "has been restricted by", followed by the (sometimes long) search filter that is being applied to all of their searches.

Search filters also carry an inherent performance cost, which is discussed in the section below entitled "Performance".

Manageability

If an organization is dynamic, meaning that it requires frequent organizational and environmental changes in order to function, there are specific manageability concerns that should be accounted for:


Performance

Examples

Multiple business units

Legacy single index deployment

Roles with overlapping responsibilities

Conclusion
