Deploy:DeploymentServer
From Splunk Wiki
Recommendation For Sizing
Briefly,
- A small deployment server (30 or fewer clients) can co-reside with a Splunk instance that has other duties, such as a search head, an indexer, or another Splunk role.
- At moderate to large sizes (30-300 clients), the deployment server should run on its own dedicated Splunk instance.
- Deployment server traffic can interfere with other activity on the management port, such as search, management, UI functionality, and distributed search.
- At moderate sizes, increase phoneHomeIntervalInSecs from its default of 30 seconds to a value that meets your business goals. For example, if deployment clients can wait 10 minutes to receive updates, a value of 600 may be more appropriate.
- In very large deployments, use multiple deployment servers. A ratio of 300 clients per server is known to perform well, and at least one customer runs 1,000 clients against a single deployment server. There are likely scalability issues at larger sizes that have not yet been identified.
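Most of the sizing advice above comes down to request rate. With N clients each phoning home every T seconds, the server handles on average N/T phone-home requests per second. The sketch below only illustrates that arithmetic, assuming check-ins are spread evenly. Real traffic is burstier, and bundle downloads add load on top. The function names are hypothetical, and the 300-clients-per-server default is the ratio quoted above, not a hard limit.

```python
# Rough phone-home load estimate for a deployment server.
# Assumption: each client phones home exactly once per interval and the
# check-ins are spread evenly over time; bundle downloads are not counted.


def phone_home_rate(clients: int, interval_secs: float) -> float:
    """Average phone-home requests per second reaching one server."""
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    return clients / interval_secs


def servers_needed(clients: int, clients_per_server: int = 300) -> int:
    """Deployment servers needed at a given clients-per-server ratio."""
    # Ceiling division using integer arithmetic only.
    return -(-clients // clients_per_server)


if __name__ == "__main__":
    for clients, interval in [(30, 30), (300, 30), (300, 600), (1000, 600)]:
        rate = phone_home_rate(clients, interval)
        print(f"{clients:>5} clients @ {interval:>3}s -> {rate:.2f} req/s")
    print("servers for 1000 clients:", servers_needed(1000))
```

For example, 300 clients at the default 30-second interval average 10 requests per second. Raising the interval to 600 seconds drops that to 0.5 requests per second.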
Issues to be aware of
- Older clients (before roughly version 4.1.4) fall back to HTTP if the HTTPS connection times out. As a result, an overloaded deployment server receives non-SSL connections on its SSL port. These connections produce errors in the server's log and may trigger further overload problems on the management port.
- A deployment server can get into a state where it does not service every phone-home request, but succeeds often enough that clients eventually receive their data. To find out whether this is happening, search splunkd.log on the deployment clients for reset messages. These show how long it takes the clients to reset and pick up updated bundles from the deployment server.
How apps are deployed: checksum comparison
The checksum is compared on the client, not on the server. The sequence is:
- The client sends its details (IP address, machine type, etc.) to the server.
- The server matches the client's attributes against the filters (whitelist/blacklist) in the deployment server configuration. It then builds a response that lists the matching apps and their checksums.
- The client compares this response with what it already has and acts accordingly. For example, suppose the response lists an app the client does not have, or an app whose checksum differs from the local copy. In that case the client sends a download request for that app.
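The client-side decision in the last step can be sketched as a dictionary comparison. The sketch assumes the server's response has already been reduced to a mapping of app name to checksum; the real phone-home reply is XML. The client's actual logic may also do things this sketch omits, such as handling apps that were removed from the server class. The function name and the checksum values are hypothetical.

```python
from typing import Dict, List

# Assumption: both sides are modeled as {app name: checksum}; the real
# reply format and checksum algorithm are not shown here.


def apps_to_download(server_apps: Dict[str, str],
                     local_apps: Dict[str, str]) -> List[str]:
    """Apps the client should request: missing locally, or checksum differs."""
    needed = []
    for app, checksum in sorted(server_apps.items()):
        # A missing app yields None from .get(), which never equals a checksum.
        if local_apps.get(app) != checksum:
            needed.append(app)
    return needed


if __name__ == "__main__":
    server = {"LWFoutputs": "aaa111", "app_b": "bbb222", "app_c": "ccc333"}
    local = {"LWFoutputs": "aaa111", "app_b": "old000"}
    print(apps_to_download(server, local))
```

Comparing on the client keeps the server's work down to filtering and listing. Each client only downloads what actually changed.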
Example: How To Set Up Deployment Server and Client
################################################
#
# Deployment Server deployment
#
################################################
1. @Server: Configure 'serverclass.conf'
   Note: Whenever you edit serverclass.conf, you must restart Splunk!
   Note: Once the server has applied an app, you cannot modify its configuration on the client.
   Note: You MUST edit the app on the server and deploy it again.
   Note: serverclass.conf is associated with tenants.conf. Do not delete tenants.conf.

############################################
# It is IMPORTANT to have a general filter here, and a more specific filter at the app level.
# An app is matched _only_ if the server class that contains it was successfully matched!
#
# - Deploy general apps: default, local directories, inputs.conf
#
# [global]
# whitelist.0=*
# restartSplunkd=true
# stateOnClient = enabled
# [serverClass:<Class Name>]
# whitelist.0=<host or ip>
# [serverClass:<Class Name>:app:<App Name>]
# whitelist.0=<host or ip>
# blacklist.0=*
############################################
# Example
#- serverclass.conf
#-------------------------------------------
[global]
whitelist.0=*
restartSplunkd=true
stateOnClient = enabled

[serverClass:FWD2Local]
whitelist.0=*

[serverClass:FWD2Local:app:LWFoutputs]
#-------------------------------------------

2. Place the apps to deploy under the $SPLUNK_HOME/etc/deployment-apps directory.
   Example of a custom app, LWFoutputs, in etc/deployment-apps:
     etc/deployment-apps/LWFoutputs
     etc/deployment-apps/LWFoutputs/local
     etc/deployment-apps/LWFoutputs/local/app.conf
     etc/deployment-apps/LWFoutputs/local/outputs.conf

   - etc/deployment-apps/LWFoutputs/local/app.conf
     [install]
     state = enabled
     build = 10001

   - etc/deployment-apps/LWFoutputs/local/outputs.conf
     # Forward to local port 55153
     [tcpout]
     defaultGroup = local_55153

     [tcpout:local_55153]
     server = 127.0.0.1:55153

3. @Server
   # ./splunk enable deploy-server -auth admin:changeme1

4. @Server
   # ./splunk restart
   (Whenever serverclass.conf is modified, Splunk needs to be restarted.)

5. @Client
   # ./splunk set deploy-poll <deployment-server>:<mgmtPort> -auth admin:changeme1
   => This generates 'deploymentclient.conf'.
   => Within 1-5 minutes, the app should appear in etc/apps on the client.
   => This might fail on a Universal Forwarder or a license slave. In that case, edit deploymentclient.conf directly:

   - deploymentclient.conf
     [target-broker:deploymentServer]
     targetUri = <deployment-server host or ip>:<mgmtPort>

   => Option:
   - @Client # ./splunk enable deploy-client -auth admin:changeme1
   - @Client # ./splunk disable deploy-client -auth admin:changeme1

6. @Client
   # ./splunk restart

------>>>> More things you can do <<<-------

7. @Server
   # ./splunk list deploy-clients -auth admin:changeme1
   - This is a snapshot of the current status, which changes dynamically.
   - To see the full list of clients, go to WebGUI -> Manager -> Deployment Server -> <your class> -> Status.

Option: -----------------------------------------------------------------
$ $SPLUNK_HOME/bin/splunk list deploy-clients -auth admin:changeme1
Deployment client: ip=10.1.8.28, dns=myana-mbp15.splunk.com, hostname=myana-mbp15.splunk.com, mgmt=8089, build=80534, name=depClient_myana_mbp15_LWF, id=connection_10.1.8.28_8089_myana-mbp15.splunk.com_myana-mbp15.splunk.com_depClient_myana_mbp15_LWF, utsname=darwin-i386
    utsname: darwin-i386
    name: depClient_myana_mbp15_LWF
    ip: 10.1.8.28
    hostname: myana-mbp15.splunk.com
    build: 80534
    dns: myana-mbp15.splunk.com
    mgmt: 8089
    phoneHomeTime: Fri Jan 7 09:28:30 2011
    id: connection_10.1.8.28_8089_myana-mbp15.splunk.com_myana-mbp15.splunk.com_depClient_myana_mbp15_LWF
Deployment client: ip=10.1.8.40, dns=sup-vmbox.splunk.com, hostname=sup-vmbox, mgmt=8089, build=89596, name=deploymentClient, id=connection_10.1.8.40_8089_sup-vmbox.splunk.com_sup-vmbox_deploymentClient, utsname=windows-intel
    utsname: windows-intel
    name: deploymentClient
    ip: 10.1.8.40
    hostname: sup-vmbox
    build: 89596
    dns: sup-vmbox.splunk.com
    mgmt: 8089
    phoneHomeTime: Fri Jan 7 09:28:57 2011
    id: connection_10.1.8.40_8089_sup-vmbox.splunk.com_sup-vmbox_deploymentClient

Option: -----------------------------------------------------------------
@Server: check the bundle for each app
-------
$ ls -l $SPLUNK_HOME/var/run/tmp
total 0
drwx------ 2 myana staff 68 Jan 6 10:22 FWD2Local
$ ls -l $SPLUNK_HOME/var/run/tmp/FWD2Local
total 24
-rw------- 1 myana staff 10240 Jan 6 14:30 LWFoutputs-1294353032.bundle

@Client: check whether the app is being deployed (the bundle may disappear soon after the app is deployed)
-------
bash-3.2$ ls -l /Applications/splunk_413/var/run
total 8
drwx------ 5 myana staff 170 Mar 21 18:48 FWD2Local <======= Here you can find the downloaded class!
drwx------ 2 myana staff 68 Jul 15 2010 searchpeers
-rw------- 1 myana staff 824 Apr 4 14:29 serverclass.xml
drwx--x--x 11 myana staff 374 Apr 4 14:30 splunk

8. After you change any app configuration:
   @Server
   # ./splunk reload deploy-server -class <className> -auth admin:changeme1
   (DEBUG: ./splunk reload deploy-server -auth admin:changeme1 -debug)

9. You must restart Splunk whenever serverclass.conf is edited.
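The `splunk list deploy-clients` snapshot in step 7 can be post-processed, for example to compare client IPs or builds across runs. This sketch assumes you have captured the command's output as text, for instance by redirecting it to a file. It relies only on the one-line "Deployment client:" summaries in the sample above. The function name is hypothetical, and the parser would break on field values that contain commas.

```python
from typing import Dict, List

PREFIX = "Deployment client:"


def parse_deploy_clients(output: str) -> List[Dict[str, str]]:
    """Parse the one-line client summaries into {field: value} dicts."""
    clients: List[Dict[str, str]] = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line.startswith(PREFIX):
            continue  # skip the indented "key: value" detail lines
        fields: Dict[str, str] = {}
        # Assumption: fields are "key=value" separated by commas and values
        # never contain commas (true for the sample output).
        for pair in line[len(PREFIX):].split(","):
            key, sep, value = pair.strip().partition("=")
            if sep:
                fields[key] = value
        clients.append(fields)
    return clients


SAMPLE = """\
Deployment client: ip=10.1.8.28, dns=myana-mbp15.splunk.com, hostname=myana-mbp15.splunk.com, mgmt=8089, build=80534, name=depClient_myana_mbp15_LWF, id=connection_10.1.8.28_8089_myana-mbp15.splunk.com_myana-mbp15.splunk.com_depClient_myana_mbp15_LWF, utsname=darwin-i386
    utsname: darwin-i386
Deployment client: ip=10.1.8.40, dns=sup-vmbox.splunk.com, hostname=sup-vmbox, mgmt=8089, build=89596, name=deploymentClient, id=connection_10.1.8.40_8089_sup-vmbox.splunk.com_sup-vmbox_deploymentClient, utsname=windows-intel
    utsname: windows-intel
"""

if __name__ == "__main__":
    for client in parse_deploy_clients(SAMPLE):
        print(client["name"], client["ip"], client["build"])
```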
Troubleshooting Splunk Deployment Server and Client
######################################################
#
# Troubleshooting Deployment Server and Client
#
######################################################
#
# Check whether Deployment Server/Client is enabled or disabled
#
- From the command line
  @Server # ./splunk display deploy-server
  @Client # ./splunk display deploy-client

For the deployment server, tenants.conf controls whether it is enabled or disabled. Here is an example of tenants.conf that disables the default serverclass.conf:
- tenants.conf
  [tenant:default]
  whitelist.0 = *
  disabled = true

#
# Log: $SPLUNK_HOME/etc/log-local.cfg
# Enable DEBUG settings in log.cfg on both instances
# (this can also be done in the UI: Manager -> System Settings -> System logging)
#
- From the command line
  @Server # ./splunk set log-level DeploymentServer -level DEBUG
  @Client # ./splunk set log-level DeploymentClient -level DEBUG
- From log.cfg (you must restart Splunk after changing log.cfg)
  [splunkd]
  category.DeploymentServer = DEBUG
  Or,
  category.DeploymentClient = DEBUG
  And, possibly,
  category.HTTPClient = DEBUG
  category.TcpInputProc = DEBUG
  category.TcpOutputProc = DEBUG

#
# Issues to be aware of
#
- Whitelist and blacklist entries are matched against the clientName, the IP address, the host name in the DNS record, and the Splunk host name.
  => Check the output of "splunk list deploy-clients -auth admin:changeme"
- Copied from $SPLUNK_HOME/etc/system/README/serverclass.conf.spec
-------------------------------------------------------------------
whitelist.<n> = <clientName> | <ip address> | <hostname>
blacklist.<n> = <clientName> | <ip address> | <hostname>
* 'n' is a number starting at 0, and increasing by 1. Stop looking at the filter when 'n' breaks.
* The value of this attribute is matched against several things in order:
    * Any clientName specified by the client in its deploymentclient.conf file
    * The ip address of the connected client
    * The hostname of the connected client as provided by reverse DNS lookup
    * The hostname of the client as provided by the client
* All of these can be used with wildcards.
  '*' will match any sequence of characters. For example:
    * Match a network range: 10.1.1.*
    * Match a domain: *.splunk.com
* These patterns are PCRE regular expressions with the additional mappings:
    * '.' is mapped to '\.'
    * '*' is mapped to '.*'
* Can be overridden at the serverClass level, and the serverClass:app level.
* There are no whitelist or blacklist entries by default.
-------------------------------------------------------------------

#
# Deployment Server/Client does not start
#
- Make sure the DeploymentNG pipeline is enabled.
  splunkd.log shows whether the module is enabled when Splunk starts.
  => It is enabled by default, and is disabled only if someone has changed it manually.
- default-mode.conf
  [pipeline:distributedDeploymentNG]
  disabled = false

#
# Deployment server problems
#
My apps are not appearing on my client instances.
Things to check:
- Is the client trying to contact the correct server?
- Is it getting a connection?
- Is the server matching it correctly in serverclass.conf?
- Are whitelists/blacklists in use? Are their patterns correct?

#
# Deployment Client error
# => Cannot connect to the server
# ==> Probably a deployment server side issue: no port is available, or sockets are used up.
#
For comparison, this client splunkd.log (DEBUG mode) shows a healthy phone-home cycle:
------------------------------------
08-16-2011 17:55:09.258 -0700 DEBUG DeploymentClient - PhoneHomeThread woke up
08-16-2011 17:55:09.258 -0700 DEBUG DeploymentClient - Send phone home
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Current state: 3, new state: 4
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Phone home recvd reply: <?xml version="1.0" encoding="UTF-8"?>
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - DeploymentClient is about to reload a new manifest....
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - stateOnClient=enabled
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - DeploymentClient is done reloading...
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Current state: 4, new state: 3
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Sent phonehome to deployment server on topic: deploymentServer/phoneHome/default
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Phonehome thread waiting for :15000 mecs
08-16-2011 17:55:24.268 -0700 DEBUG DeploymentClient - PhoneHomeThread woke up
08-16-2011 17:55:24.268 -0700 DEBUG DeploymentClient - Send phone home
08-16-2011 17:55:24.277 -0700 DEBUG DeploymentClient - Current state: 3, new state: 4
08-16-2011 17:55:24.277 -0700 DEBUG DeploymentClient - Phone home recvd reply: <?xml version="1.0" encoding="UTF-8"?>
08-16-2011 17:55:24.277 -0700 DEBUG DeploymentClient - DeploymentClient is about to reload a new manifest....
08-16-2011 17:55:24.277 -0700 DEBUG DeploymentClient - stateOnClient=enabled
08-16-2011 17:55:24.278 -0700 DEBUG DeploymentClient - DeploymentClient is done reloading...
08-16-2011 17:55:24.278 -0700 DEBUG DeploymentClient - Current state: 4, new state: 3
08-16-2011 17:55:24.278 -0700 DEBUG DeploymentClient - Sent phonehome to deployment server on topic: deploymentServer/phoneHome/default
08-16-2011 17:55:24.278 -0700 DEBUG DeploymentClient - Phonehome thread waiting for :15000 mecs

#
# Deployment Client
# => Problem expanding the archived app: a permission issue in the target directory, or a problem with the files in the archive
#    (in the log below, the file name contains '?', which Windows does not allow in file names)
#
05-27-2010 15:18:07.106 WARN DeployedApplication - Installing app: windows to location: C:\Program Files\Splunk\etc\apps\windows
05-27-2010 15:18:07.106 ERROR DeployedApplication - There was a problem unarchiving file to: C:\Program Files\Splunk\etc\apps\windows\local\service?WSDL due to The filename, directory name, or volume label syntax is incorrect

#
# Deployment Client
# => Applied app's configuration is not available in WebGUI
#
This is not a Deployment Server/Client issue. Suppose you deploy a configuration file for system-wide settings, such as email or authentication. You must then edit apps/<app name>/metadata/local.meta and add export = system, either to the stanza for that configuration or to the default stanza ([]).
- local.meta
  []
  export = system
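The checklist above asks whether whitelist/blacklist patterns are correct. The matching rules quoted from serverclass.conf.spec can be tried out offline before editing serverclass.conf. The sketch below applies the two documented mappings ('.' to '\.' and '*' to '.*') and checks the client identifiers in the documented order. Two points are assumptions, not taken from the spec: that the whole identifier must match, and that matching is case-sensitive. The identifier key names and function names are also hypothetical. The sketch does not model how whitelist and blacklist entries are combined.

```python
import re
from typing import Dict, Optional

# Client identifiers in the order serverclass.conf.spec says they are checked.
# The key names are hypothetical labels used only in this sketch.
IDENTIFIER_ORDER = ("clientName", "ip", "dns", "hostname")


def filter_to_regex(pattern: str) -> "re.Pattern[str]":
    """Apply the two documented mappings: escape dots, turn '*' into '.*'."""
    # Escape dots first, so the '.' introduced by '.*' is not escaped again.
    # Other PCRE syntax in the pattern is passed through unchanged.
    translated = pattern.replace(".", r"\.").replace("*", ".*")
    return re.compile(translated)


def first_matching_identifier(pattern: str, client: Dict[str, str]) -> Optional[str]:
    """Name of the first client identifier the filter matches, or None."""
    regex = filter_to_regex(pattern)
    for key in IDENTIFIER_ORDER:
        value = client.get(key)
        # Assumption: the whole value must match, case-sensitively.
        if value and regex.fullmatch(value):
            return key
    return None


if __name__ == "__main__":
    client = {"clientName": "deploymentClient", "ip": "10.1.8.40",
              "dns": "sup-vmbox.splunk.com", "hostname": "sup-vmbox"}
    for f in ("10.1.8.*", "*.splunk.com", "sup-vmbox", "10.1.9.*"):
        print(f, "->", first_matching_identifier(f, client))
```

The identifier values here come from the `splunk list deploy-clients` sample output. Comparing against that output is the check the troubleshooting notes recommend.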
Example: How to propagate apps from Primary to Secondary Deployment Server
###################################################################
#
# How to propagate apps from Primary to Secondary Deployment Server
# (Assuming you already have apps in the default repository, the "deployment-apps" directory.)
###################################################################
#
# Primary Deployment Server
# - splunkd-port 55041
# - address: 10.1.1.10
#
- serverclass.conf
  [global]
  whitelist.0=*
  #blacklist.0 = *
  restartSplunkd = true
  stateOnClient = enabled

  [serverClass:UF]

  [serverClass:UF:app:LWFoutputs]

#
# Secondary Deployment Server
# - splunkd-port 55051
# - address: 10.1.1.10
#
- deploymentclient.conf
  [deployment-client]
  disabled = false
  repositoryLocation = $SPLUNK_HOME/etc/deployment-apps
  serverRepositoryLocationPolicy = rejectAlways
  reloadDSOnAppInstall = true

  [target-broker:deploymentServer]
  targetUri = 10.1.1.10:55041

- serverclass.conf
  [global]
  whitelist.0=*
  #blacklist.0 = *
  restartSplunkd = true
  stateOnClient = enabled

  # This works
  [serverClass:UF]

  [serverClass:UF:app:LWFoutputs]

#
# End Deployment Client
# - splunkd-port 55001
# - address: 10.1.1.10
#
- deploymentclient.conf
  [target-broker:deploymentServer]
  targetUri = 10.1.1.10:55051

  [deployment-client]
  disabled = false
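In this chain, the secondary acts as a deployment client of the primary. It installs what it receives into its own deployment-apps directory rather than etc/apps. rejectAlways makes it ignore the repository location suggested by the primary, and reloadDSOnAppInstall reloads its own deployment server after each install. If you run several secondaries, generating this file can be less error-prone than hand-editing it. The sketch below renders the secondary's deploymentclient.conf shown above with Python's configparser. The function name is hypothetical. The sketch assumes Splunk accepts the plain INI layout configparser writes ("[stanza]" headers and "key = value" lines).

```python
import configparser
import io


def secondary_deploymentclient_conf(primary_host: str, primary_port: int) -> str:
    """Render deploymentclient.conf for a secondary (relay) deployment server."""
    conf = configparser.ConfigParser(interpolation=None)
    conf.optionxform = str  # keep Splunk's camelCase setting names as written
    conf["deployment-client"] = {
        "disabled": "false",
        # Install apps received from the primary into this instance's
        # deployment-apps so its own deployment server can serve them.
        "repositoryLocation": "$SPLUNK_HOME/etc/deployment-apps",
        # Ignore the repository location suggested by the primary.
        "serverRepositoryLocationPolicy": "rejectAlways",
        # Reload this instance's deployment server after each app install.
        "reloadDSOnAppInstall": "true",
    }
    conf["target-broker:deploymentServer"] = {
        "targetUri": f"{primary_host}:{primary_port}",
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Primary from the example above: 10.1.1.10, splunkd port 55041.
    print(secondary_deploymentclient_conf("10.1.1.10", 55041))
```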
There are plenty of other possibilities; DEBUG logging should reveal errors on either instance.