Community:Multi-tenant scenario
From Splunk Wiki
Deployment scenario: multi-tenant Splunk deployment with minimal hardware
Many Splunk customers wish to compartmentalize their IT data. Among the reasons for this requirement is the desire to limit access to one business unit's data to employees of that unit.
Along with this requirement for data segregation, there is often a companion requirement for the central business unit to have the ability to search the IT data in each sub-unit.
Most customers also wish to deploy the minimum amount of hardware that is feasible given the volume of IT data they are indexing and searching per day.
Requirements
Create a multi-tenant IT search environment
It is possible to create a unique index for each business unit on a single Splunk instance and limit each business unit to search only their specific business unit's index.
However, Splunk is not currently capable of role-based typeahead or _internal and _audit index configuration, causing potential information leaks.
Additionally, it is not currently possible for Splunk to simultaneously search across multiple indexes; as such, the requirement for corporate to search across all IT data is not easily met.
The preferred solution is to create a unique instance of Splunk per business unit.
Users in each business unit can only search their own IT data
Creating a unique instance of Splunk per business unit facilitates thorough IT data compartmentalization, including the siloing of typeahead and internal indexes.
Since each instance is completely independent of the others, there is no risk of information leaks via typeahead, internal indexes, or other user information gathering techniques.
Allow corporate the ability to search across all IT data
This deployment approach also meets the requirement for the central business unit to perform a search across all corporate IT data.
This capability is made possible by leveraging Splunk's distributed search feature. A corporate instance of Splunk is set up that distributes search requests to each business unit's Splunk instance.
Since distributed search is configured on a per-instance basis, the relationship is one-way: the sub-units cannot in turn distribute search requests to the corporate instance or to other sub-unit instances.
Maximize the utilization of hardware
Finally, in many cases each business unit's instance will only be indexing a few GB per day.
Assuming that network bandwidth is not severely limited, there is no reason not to install several Splunk instances on the same hardware to keep costs down while maximizing hardware utilization.
Deployment details
Hardware
The following hardware was used in this deployment:
- (2x) 4-core 2.4 GHz Intel Xeon CPU
- 12 GB RAM
- 146 GB 10k RPM SAS HDD
- QLogic qla2400 FC-HBA
- 1 TB SAN storage
As a soft limit, it is not recommended that more than ten instances of Splunk be installed on a given physical machine. However, given the disparity between low and high-end commodity hardware, it is recommended that each deployment be staged and tested to determine limits to scalability.
Data sources, input methods, and indexing volumes
The data sources and their respective input methods for this deployment were as follows:
- Windows event logs
  - Tailed via a Splunk lightweight forwarder and output to TCP port 999x on the business unit's Splunk instance (x corresponds to the instance's specific listening port)
  - Estimated volume per day, per site: 2 GB
- Network syslog
  - Each business unit's network devices output syslog to their designated Splunk instance on UDP port 51x (x corresponds to the instance's specific listening port)
  - Estimated volume per day, per site: 500 MB
- Mainframe reports
  - Each business unit's mainframe outputs nightly reports to unique files on an exported IFS volume which the business unit's Splunk instance tails
  - Estimated volume per day, per site: 100 MB
To get these estimates, we analyzed the available archive data and calculated daily averages. In some cases, there was approximately one week of archive data available, while in other cases more thorough archive data was available.
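The averaging step can be sketched as follows. This is only an illustration, not the procedure actually used: the function name daily_average_mb and the sample figures are hypothetical, and the sketch assumes one archive size per day, in MB, on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read one daily archive size (in MB) per line on stdin
# and print the mean, rounded to the nearest whole MB.
daily_average_mb() {
    awk 'NF { total += $1; days += 1 }
         END { if (days > 0) printf "%d\n", total / days + 0.5 }'
}

# Example with made-up sizes for one week of syslog archives; prints 500.
printf '%s\n' 480 510 495 530 470 505 510 | daily_average_mb
```

With only a week of archive data the average is sensitive to a single unusual day, which is one reason the estimates should be revalidated in production.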
Licensing
Using the estimates above, each instance is projected to index approximately 2.6 GB per day.
Since smaller sites are likely to index slightly less than this while larger sites might index slightly more, a fair estimate of total indexing volume across the seven instances is 18.2 GB per day (7 × 2.6 GB). This would dictate a 20 GB license divided into six 3 GB licenses and one 2 GB license.
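The arithmetic behind these figures can be checked with a short script. This is a sketch under two assumptions: 1 GB is treated as 1000 MB (which the 2.6 GB figure implies), and the instance count of seven is taken from the port table and install script later on this page. The variable and function names are illustrative.

```shell
#!/bin/sh
# Per-instance daily volume estimates from the data source list, in MB
# (assuming 1 GB = 1000 MB).
WINEVENT_MB=2000
SYSLOG_MB=500
MAINFRAME_MB=100
INSTANCES=7   # number of business-unit instances in this deployment

# Estimated daily volume for one instance.
per_instance_mb() {
    echo $(( WINEVENT_MB + SYSLOG_MB + MAINFRAME_MB ))
}

# Estimated daily volume for all instances together.
total_mb() {
    echo $(( $(per_instance_mb) * INSTANCES ))
}

# Capacity of the proposed split: six 3 GB licenses plus one 2 GB license.
licensed_mb() {
    echo $(( 6 * 3000 + 1 * 2000 ))
}

echo "per instance: $(per_instance_mb) MB"
echo "total:        $(total_mb) MB"
echo "licensed:     $(licensed_mb) MB"
```

The script reports 2600 MB per instance, 18200 MB in total, and 20000 MB licensed, leaving roughly 1.8 GB per day of headroom overall.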
The licensing and input assumptions should be validated during the first 30 days in production. This is a low-risk endeavor, as Splunk's licensing model will never terminate forwarding or indexing, and will only disable search if the license is violated 7 times within a 30-day rolling window.
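To make the rolling-window rule concrete, the sketch below applies one reading of it: search is disabled once any span of 30 consecutive days contains seven or more violations. The function name and the input format (one day number per violation, one per line) are hypothetical, and this is not how Splunk itself tracks violations internally.

```shell
#!/bin/sh
# Hypothetical check: read the day number of each license violation on stdin
# and print "yes" if any 30-day window holds 7 or more of them, else "no".
search_would_be_disabled() {
    sort -n | awk -v limit=7 -v window=30 '
        { day[NR] = $1 }
        END {
            # Compare each violation with the one (limit - 1) places earlier.
            for (i = limit; i <= NR; i++)
                if (day[i] - day[i - limit + 1] < window) { print "yes"; exit }
            print "no"
        }'
}

# Seven violations between day 3 and day 30 fall inside one 30-day window.
printf '%s\n' 3 9 14 20 21 27 30 | search_would_be_disabled
```

Under this reading, six violations in a month, or seven spread over a longer period, leave search enabled.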
Server Installation
Server installation was performed using the standard Linux tarball downloaded from http://www.splunk.com/download. The Splunk binaries and logs go under /opt, while the data store goes on the SAN storage, which is mounted at /mnt/i01. The exported IFS volume from each mainframe is mounted under /mnt/ifs/site_name. Many of the steps below can be automated using simple shell, Python, or Perl scripts.
Note: In many cases, it is not advisable to index directly to a SAN; however, in this case performance was benchmarked and tested under load before implementing in production.
- Change directory to /mnt/i01
- Create a directory that corresponds to the name of each instance of Splunk that will be installed on the server.
- Change directory to /opt
- Unzip the compressed tarball
- Extract the uncompressed tarball
- Recursively copy the extracted /opt/splunk directory to a directory named for each instance (for example, /opt/splunk_1)
- Manually edit each ./splunk_*/etc/splunk-launch.conf file and make the following changes:
  - Uncomment the SPLUNK_HOME line and change it to SPLUNK_HOME=/opt/splunk_n (where n corresponds to the particular instance)
  - Uncomment the SPLUNK_DB line and change it to SPLUNK_DB=/mnt/i01/splunk_n (where n corresponds to the particular instance)
- Install the license for each instance
  - Open /opt/splunk_n/etc/splunk.license and add the license key specific to the instance
- Remove the /opt/splunk directory
The bash script below achieves the same end:
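#!/bin/bash
# Clone the extracted /opt/splunk tree once per instance and point each clone
# at its own data store on the SAN.
# LIC[0] is a placeholder so that LIC[i] lines up with instance i.
LIC=( "empty0" "<license1>" "<license2>" "<license3>" "<license4>" "<license5>" "<license6>" "<license7>" )
cd /opt || exit 1
for (( i=1; i<=7; i+=1 ))
do
    mkdir -p /mnt/i01/splunk_$i
    cp -R ./splunk ./splunk_$i
    # splunk-launch.conf takes plain NAME=value lines, without a leading "$".
    echo -e "SPLUNK_HOME=/opt/splunk_$i\nSPLUNK_DB=/mnt/i01/splunk_$i\n" > ./splunk_$i/etc/splunk-launch.conf
    echo -e "${LIC[i]}" > ./splunk_$i/etc/splunk.license
done
# Remove the template only after every copy has been made.
rm -rf ./splunk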
Authentication & Authorization Configuration
To ensure that each business unit's Splunk users can only log in to their intended Splunk instance, we configured Splunk to authenticate to Active Directory (AD) and mapped security groups from AD to roles within Splunk.
- For each instance, we created an authentication configuration that bound to the directory server corpdc01 with the bind account "svc_splunk". The default LDAP port 389 was used since secure LDAP was not required by the corporate security policy.
- "ou=splunk,ou=security_groups,dc=corp,dc=local" off of the directory root was configured as the group base, from which Splunk will enumerate security groups to be mapped to internal Splunk roles.
- "ou=users,dc=corp,dc=local" was configured as the user base, from which Splunk will enumerate users that belong to the aforementioned security groups mapped to Splunk roles.
- A security group was created for corporate administrators and mapped to the "admin" role within Splunk. Site-specific security groups were created for power users and standard users and mapped to the "power" and "user" roles within Splunk.
The configuration below was taken from the CHI Splunk instance's /opt/splunk/system/local/authentication.conf:
[CHI]
Admin = CORP_splunk_admin;
Power = CHI_splunk_power, CORP_splunk_power;
User = CHI_splunk_user, CORP_splunk_user;
SSLEnabled = 0
bindDN = cn=svc_splunk,ou=service_accounts,ou=users,dc=corp,dc=local
bindDNpassword =
failsafeLogin = admin
failsafePassword =
groupBaseDN = ou=splunk,ou=security_groups,dc=corp,dc=local;
groupBaseFilter = (objectclass=*)
groupMappingAttribute = dn
groupMemberAttribute = member
groupNameAttribute = cn
host = corpdc01
pageSize = 800
port = 389
realNameAttribute = name
userBaseDN = ou=users,dc=corp,dc=local
userBaseFilter = (objectclass=*)
userNameAttribute = sAMAccountName
_actions = new,edit,delete

[auth]
authSettings = CHI
authType = LDAP
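Only the stanza name and the role mapping lines differ from one site to the next, so they can be generated from the site code. The sketch below assumes every site follows the same group naming pattern as CHI; the function name role_mapping is hypothetical, and any site code other than CHI is an illustrative placeholder.

```shell
#!/bin/sh
# Print the stanza header and role-to-group mapping lines for one site.
# $1 is the site code (for example CHI); other codes are hypothetical.
role_mapping() {
    site=$1
    printf '[%s]\n' "$site"
    printf 'Admin = CORP_splunk_admin;\n'
    printf 'Power = %s_splunk_power, CORP_splunk_power;\n' "$site"
    printf 'User = %s_splunk_user, CORP_splunk_user;\n' "$site"
}

role_mapping CHI
```

The remaining settings (bind account, base DNs, host, and port) are identical across instances and can be appended unchanged.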
Receiving Configuration
Each instance must have at least three unique TCP ports and one unique UDP port to bind to. In this deployment, the following ports were decided upon:
| Instance name | SplunkWeb port | Splunkd port | Splunk data port | Syslog Port |
| splunk_1 | 8001 | 8091 | 9991 | 511 |
| splunk_2 | 8002 | 8092 | 9992 | 512 |
| splunk_3 | 8003 | 8093 | 9993 | 513 |
| splunk_4 | 8004 | 8094 | 9994 | 514 |
| splunk_5 | 8005 | 8095 | 9995 | 515 |
| splunk_6 | 8006 | 8096 | 9996 | 516 |
| splunk_7 | 8007 | 8097 | 9997 | 517 |
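The table follows a simple pattern: each port is a fixed base plus the instance number. A minimal sketch of that scheme is shown below; the function name instance_ports is hypothetical, and the pattern only holds for instance numbers 1 through 9.

```shell
#!/bin/sh
# Derive the four ports for instance n (1-9) from the bases used in the table.
instance_ports() {
    n=$1
    echo "web=$(( 8000 + n )) splunkd=$(( 8090 + n )) data=$(( 9990 + n )) syslog=$(( 510 + n ))"
}

# Reprint the assignments for all seven instances.
i=1
while [ "$i" -le 7 ]; do
    echo "splunk_$i $(instance_ports "$i")"
    i=$(( i + 1 ))
done
```

Keeping the instance number as the last digit of every port makes it easy to tell from a port alone which business unit's instance is involved.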
To force each instance of Splunk to listen on the selected ports on startup, /opt/splunk_n/system/local/web.conf and /opt/splunk_n/system/local/inputs.conf were created before the initial startup. While editing inputs.conf, we also added the tailed IFS directory.
For example, /opt/splunk_1/system/local/web.conf contained the following:
[settings]
mgmtHostPort = localhost:8091
httpport = 8001
/opt/splunk_1/system/local/inputs.conf contained the following:
[splunktcp://9991]
disabled = false
queue = parsingQueue
sourcetype = tcp-9991

[udp://511]
disabled = false
sourcetype = syslog

[tail:///mnt/ifs/chi/]
disabled=false
followTail=1
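Since the two files differ between instances only in their port numbers and IFS directory, they lend themselves to templating. The sketch below prints both files for a given instance; the function names are hypothetical, the settings are copied from the splunk_1 example, and any site directory other than chi is an assumed placeholder.

```shell
#!/bin/sh
# Print web.conf for instance n, using the ports from the table above.
web_conf() {
    n=$1
    printf '[settings]\nmgmtHostPort = localhost:%d\nhttpport = %d\n' \
        "$(( 8090 + n ))" "$(( 8000 + n ))"
}

# Print inputs.conf for instance n; $2 is the site's directory under /mnt/ifs.
inputs_conf() {
    n=$1
    site=$2
    data_port=$(( 9990 + n ))
    printf '[splunktcp://%d]\ndisabled = false\nqueue = parsingQueue\nsourcetype = tcp-%d\n\n' \
        "$data_port" "$data_port"
    printf '[udp://%d]\ndisabled = false\nsourcetype = syslog\n\n' "$(( 510 + n ))"
    printf '[tail:///mnt/ifs/%s/]\ndisabled=false\nfollowTail=1\n' "$site"
}

web_conf 1
inputs_conf 1 chi
```

Redirecting each function's output into the corresponding file under the instance's configuration directory, before first startup, reproduces the manual step for all seven instances.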
After configuring each instance of Splunk as such, start the instance and accept the license agreement:
/opt/splunk_1/bin/splunk start --accept-license
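The same command applies to every instance. As a dry-run sketch, the hypothetical function below only prints the start command for each instance so the list can be reviewed before anything is executed.

```shell
#!/bin/sh
# Print (without running) the start command for instances 1 through $1.
start_commands() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "/opt/splunk_$i/bin/splunk start --accept-license"
        i=$(( i + 1 ))
    done
}

start_commands 7
```

Piping the output to sh would start the instances one after another.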
Forwarder Installation & Configuration
After changing the network devices of each business unit to direct its syslog output to the correct UDP port, we installed Splunk in a lightweight forwarder configuration on the Windows servers as follows (the examples below assume the target instance is splunk_1 on server Splunk1):
- Download the Windows binary from http://www.splunk.com/download
- Install interactively using the MSI, accepting all defaults (i.e. run as local system and index all event logs)
- Start Splunk if it does not start automatically, and ensure that the service is set to start automatically at boot
- Open c:\program files\splunk\etc\splunk.license with a text editor and paste in your forwarder license
- Issue the following commands from a command prompt
c:\program files\splunk\bin> splunk.exe disable webserver
c:\program files\splunk\bin> splunk.exe set server-type forwarder
- Modify c:\program files\splunk\system\local\inputs.conf by placing the following at the top of the file:
queue=indexQueue
- Create c:\program files\splunk\system\local\outputs.conf as follows:
[tcpout:splunk_1]
server=Splunk1:9991
- Restart the forwarder
c:\program files\splunk\bin> splunk.exe restart
- Change the default admin password
c:\program files\splunk\bin> splunk.exe edit user admin -password fflanda
After configuring all Windows forwarders as detailed above, we verified that each Splunk index server received new events from the domain controller event logs, network devices, and (the next day) the mainframe reports from the IFS share.
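Because each business unit's forwarders must point at that unit's own data port, the outputs.conf content from the steps above can be generated rather than typed by hand. The sketch below is illustrative: the function name outputs_conf is hypothetical, and any server name other than Splunk1 is an assumption.

```shell
#!/bin/sh
# Print outputs.conf for forwarders that send to instance n on host $2.
# The data port follows the 999x scheme from the receiving configuration.
outputs_conf() {
    n=$1
    host=$2
    printf '[tcpout:splunk_%d]\nserver=%s:%d\n' "$n" "$host" "$(( 9990 + n ))"
}

outputs_conf 1 Splunk1
```

Generating the file per business unit reduces the chance of a forwarder sending one unit's event logs to another unit's instance.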
Conclusion
Aside from meeting all the project requirements, there are substantial maintenance benefits to this deployment model:
- If one business unit experiences performance problems with search or indexing, the impact of the problems is limited to their particular instance.
- Routine maintenance such as upgrades, restarts, or configuration changes can be made on a per-instance basis, thereby not affecting other business units.
In a multi-tenant environment such as a managed service provider (MSP), there are the following additional benefits:
- Higher tiers of hardware with low tenant-to-resource ratios and highly redundant configurations are cost-effective and can be made available as a premium service.
- Each instance's data store can be kept on different partitions, allowing higher volume instances to reside on faster (more expensive) storage while lower tier storage can be leveraged for lower volume instances.