Deploy:DeploymentServer

From Splunk Wiki

Recommendation For Sizing

Briefly,

  • A small deployment server (30 or fewer clients) can co-reside with a Splunk instance that has other duties, such as a search head or indexer.
  • At moderate to large sizes (30-300 clients), the deployment server should run on its own Splunk instance with no other duties.
    • Deployment server traffic can interfere with other management port activity, such as search, management, UI functionality, and distributed search.
    • At moderate sizes, increase phoneHomeIntervalInSecs from its default of 30 seconds to a larger value that still meets your business goals. If deployment clients can wait 10 minutes for updates, 600 may be more appropriate.
    • In very large deployments, use multiple deployment servers. A ratio of 300 clients per server is known to perform well, and at least one customer runs 1,000 clients on a single deployment server. Larger sizes likely have scalability issues that have not yet been identified.
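
As a rough illustration of why the phone-home interval matters, the Python sketch below estimates the average request rate as clients divided by interval. It is not part of Splunk: `phone_home_rate` is a hypothetical helper, and it assumes requests are spread evenly over the interval, which a real deployment may not do.

```python
def phone_home_rate(clients: int, interval_secs: int) -> float:
    """Average phone-home requests per second arriving at one deployment server.

    Assumption: clients are evenly staggered across the interval.
    """
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    return clients / interval_secs


if __name__ == "__main__":
    # Client counts and intervals taken from the sizing notes above.
    for clients, interval in [(30, 30), (300, 30), (300, 600), (1000, 600)]:
        rate = phone_home_rate(clients, interval)
        print(f"{clients} clients @ {interval}s -> {rate:.2f} req/s")
```

Under that assumption, raising the interval from 30 to 600 seconds cuts the average load from 300 clients by a factor of 20.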

Issues to be aware of

  • Older clients (before about 4.1.4) retry over http if the https connection times out. An overloaded deployment server therefore receives non-SSL connections on its SSL port, which produces errors in the server's log and may trigger further overload problems on the management port.
  • A deployment server can be in a state where it does not service every phonehome request but succeeds often enough that clients eventually receive their data. To determine whether this is happening, search splunkd.log on the deployment clients for reset messages and check how soon each client resets after receiving updated bundles from the deployment server.

How apps are deployed: checksum comparison

The checksum is compared at the client and not at the server. The sequence is:

  • The client sends its details (IP address, machine type, etc.) to the server.
  • The server matches the client attributes against the filters (whitelist/blacklist) in the deployment server configuration and creates a response that lists the matching apps and their checksums.
  • When the client receives this response, it compares it with what it already has and acts accordingly. For example, if the response lists an app the client does not have, or an app whose checksum differs, the client sends a download request.
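
The client-side comparison in the last step can be sketched in Python as follows. This is an illustration only, not Splunk's implementation: the dictionaries, the checksum strings, and the `apps_to_download` helper are all hypothetical.

```python
from __future__ import annotations


def apps_to_download(server_manifest: dict[str, str],
                     local_apps: dict[str, str]) -> list[str]:
    """Return the app names the client should request from the server.

    Both arguments map app name -> checksum. An app is requested when the
    client does not have it, or when its local checksum differs.
    """
    return sorted(
        name
        for name, checksum in server_manifest.items()
        if local_apps.get(name) != checksum
    )


if __name__ == "__main__":
    # Hypothetical checksums, for illustration only.
    server = {"LWFoutputs": "aaa111", "windows": "bbb222", "newapp": "ccc333"}
    local = {"LWFoutputs": "aaa111", "windows": "stale000"}
    print(apps_to_download(server, local))
```

Because the decision is made on the client, the server only has to build the list of apps and checksums for each phone home.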



Example: How To Set Up Deployment Server and Client

################################################
#
# Deployment Server deployment
#
################################################


1. @Server: Configure 'serverclass.conf'
Note: Whenever you edit serverclass.conf, you must restart Splunk!
Note: Once an app has been deployed by the server, you cannot modify its configuration at the Client.
Note: You MUST edit the app at the Server and redeploy it.
Note: serverclass.conf is associated with tenants.conf. Do not delete tenants.conf.
############################################
# It is IMPORTANT to have a general filter here, and a more specific filter at the app level. 
# An app is matched _only_ if the server class it is contained in was successfully matched!
#
# - Deploy general apps: default, local directories, inputs.conf
#
# [global]
# whitelist.0=*
# restartSplunkd=true     
# stateOnClient = enabled 
# [serverClass:<Class Name>]
# whitelist.0=<host or ip>
# [serverClass:<Class Name>:app:<App Name>]
# whitelist.0=<host or ip>
# blacklist.0=*
############################################
# Example
#- serverclass.conf
#-------------------------------------------
[global]
whitelist.0=*
restartSplunkd=true     
stateOnClient = enabled 

[serverClass:FWD2Local]
whitelist.0=*
[serverClass:FWD2Local:app:LWFoutputs]
#-------------------------------------------


2. Place the apps you want to deploy under the $SPLUNK_HOME/etc/deployment-apps directory

Example of a custom app, LWFoutputs, in that directory:
etc/deployment-apps
etc/deployment-apps/LWFoutputs
etc/deployment-apps/LWFoutputs/local
etc/deployment-apps/LWFoutputs/local/app.conf
etc/deployment-apps/LWFoutputs/local/outputs.conf

- etc/deployment-apps/LWFoutputs/local/app.conf 
[install]
state = enabled
build = 10001

- etc/deployment-apps/LWFoutputs/local/outputs.conf 
# Forward to local 55153
[tcpout]
defaultGroup = local_55153

[tcpout:local_55153]
server = 127.0.0.1:55153


3. @Server #./splunk enable deploy-server  -auth admin:changeme1



4. @Server #./splunk restart
(Whenever serverclass.conf is modified, Splunk needs to be restarted.)



5. @Client #./splunk set deploy-poll <deployment-server>:<mgmtPort> -auth admin:changeme1
   => This will generate 'deploymentclient.conf'
   => In 1-5 minutes, the app should appear in etc/apps at the Client
   => This might fail on a Universal Forwarder or a license slave. In that case, edit deploymentclient.conf directly:
- deploymentclient.conf
[target-broker:deploymentServer]
targetUri = <deployment-server host or ip>:<mgmtPort>

   => Option:
   - @Client #./splunk enable deploy-client  -auth admin:changeme1
   - @Client #./splunk disable deploy-client  -auth admin:changeme1


6. @Client  #./splunk restart




------>>>> More things you can do  <<<-------

7. @Server #./splunk list deploy-clients -auth admin:changeme1
- This is a snapshot of the status, which changes dynamically
- If you want to see the full list of Clients
  => Go to WebGUI -> Manager -> Deployment Server -> Your Class -> Status

Option:
-----------------------------------------------------------------
$ $SPLUNK_HOME/bin/splunk list deploy-clients -auth admin:changeme1

Deployment client: ip=10.1.8.28, dns=myana-mbp15.splunk.com, hostname=myana-mbp15.splunk.com, mgmt=8089, build=80534, name=depClient_myana_mbp15_LWF, id=connection_10.1.8.28_8089_myana-mbp15.splunk.com_myana-mbp15.splunk.com_depClient_myana_mbp15_LWF, utsname=darwin-i386
                 utsname:       darwin-i386
                 name:       depClient_myana_mbp15_LWF
                 ip:       10.1.8.28
                 hostname:       myana-mbp15.splunk.com
                 build:       80534
                 dns:       myana-mbp15.splunk.com
                 mgmt:       8089
                 phoneHomeTime:       Fri Jan  7 09:28:30 2011
                 id:       connection_10.1.8.28_8089_myana-mbp15.splunk.com_myana-mbp15.splunk.com_depClient_myana_mbp15_LWF

Deployment client: ip=10.1.8.40, dns=sup-vmbox.splunk.com, hostname=sup-vmbox, mgmt=8089, build=89596, name=deploymentClient, id=connection_10.1.8.40_8089_sup-vmbox.splunk.com_sup-vmbox_deploymentClient, utsname=windows-intel
                 utsname:       windows-intel
                 name:       deploymentClient
                 ip:       10.1.8.40
                 hostname:       sup-vmbox
                 build:       89596
                 dns:       sup-vmbox.splunk.com
                 mgmt:       8089
                 phoneHomeTime:       Fri Jan  7 09:28:57 2011
                 id:       connection_10.1.8.40_8089_sup-vmbox.splunk.com_sup-vmbox_deploymentClient
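
The one-line summary that starts each client entry can be split into fields with a few lines of Python. This is only a sketch: it assumes the `key=value, key=value` layout shown in the sample output above, and `parse_client_line` is a hypothetical helper, not a Splunk API.

```python
PREFIX = "Deployment client: "


def parse_client_line(line: str) -> dict:
    """Split a 'Deployment client: k=v, k=v, ...' summary line into a dict."""
    if not line.startswith(PREFIX):
        raise ValueError("not a 'Deployment client:' summary line")
    fields = {}
    for pair in line[len(PREFIX):].split(", "):
        # partition() splits on the first '=' only, so values may contain '='.
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    sample = (
        "Deployment client: ip=10.1.8.40, dns=sup-vmbox.splunk.com, "
        "hostname=sup-vmbox, mgmt=8089, build=89596, name=deploymentClient, "
        "id=connection_10.1.8.40_8089_sup-vmbox.splunk.com_sup-vmbox_deploymentClient, "
        "utsname=windows-intel"
    )
    client = parse_client_line(sample)
    print(client["ip"], client["dns"], client["hostname"], client["name"])
```

The ip, dns, hostname, and name fields are useful to have at hand when checking which whitelist or blacklist entry a client should match.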


Option:
-----------------------------------------------------------------
@Server To check the bundle for each app
-------
$ ls -l $SPLUNK_HOME/var/run/tmp
total 0
drwx------  2 myana  staff   68 Jan  6 10:22 FWD2Local
$ ls -l $SPLUNK_HOME/var/run/tmp/FWD2Local
total 24
-rw-------  1 myana  staff  10240 Jan  6 14:30 LWFoutputs-1294353032.bundle


@Client to check if the app was "being" deployed
(This bundle might be gone quickly after the app is deployed)
-------
bash-3.2$ ls -l /Applications/splunk_413/var/run
total 8
drwx------   5 myana  staff  170 Mar 21 18:48 FWD2Local         <======= Here you can find the downloaded class!
drwx------   2 myana  staff   68 Jul 15  2010 searchpeers
-rw-------   1 myana  staff  824 Apr  4 14:29 serverclass.xml
drwx--x--x  11 myana  staff  374 Apr  4 14:30 splunk



8. After you change any app configurations:
   @Server #./splunk reload deploy-server -class <className> -auth admin:changeme1
           (DEBUG: ./splunk reload deploy-server -auth admin:changeme1 -debug )

9. You must restart Splunk whenever serverclass.conf is edited

Troubleshooting Splunk Deployment Server and Client

######################################################
#
# Troubleshooting Deployment Server and Client
#
######################################################

#
# Check if Deployment Server/Client was enabled or disabled
#

- From command line
  @Server # ./splunk display deploy-server  
  @Client # ./splunk display deploy-client 

# For the Deployment Server, tenants.conf controls whether it is disabled or enabled.

Here is an example of tenants.conf that disables the default serverclass.conf.

- tenants.conf
[tenant:default]
whitelist.0 = *
disabled = true


#
# Log: $SPLUNK_HOME/etc/log-local.cfg
# Enable DEBUG settings on both instances in log.cfg
# (can also be done via the UI: Manager -> System Settings -> System logging)
#

- From command line
  @Server # ./splunk set log-level DeploymentServer -level DEBUG 
  @Client # ./splunk set log-level DeploymentClient -level DEBUG 


- From log.cfg (Must restart Splunk after changing this log.cfg)
[splunkd]
category.DeploymentServer = DEBUG

Or, 
category.DeploymentClient=DEBUG

And, possibly
category.HTTPClient = DEBUG
category.TcpInputProc = DEBUG
category.TcpOutputProc = DEBUG




# Issues to be aware of

- Whitelist and blacklist are applied to the clientName, IP address, host name in the DNS record, and Splunk host name
  => Check the output of "splunk list deploy-clients -auth admin:changeme"

- Copy from the $SPLUNK_HOME/etc/system/README/serverclass.conf.spec
-------------------------------------------------------------------
whitelist.<n> = <clientName> | <ip address> | <hostname>
blacklist.<n> = <clientName> | <ip address> | <hostname>
    * 'n' is a number starting at 0, and increasing by 1. Stop looking at the filter when 'n' breaks.
    * The value of this attribute is matched against several things in order:
         * Any clientName specified by the client in its deploymentclient.conf file
         * The ip address of the connected client
         * The hostname of the connected client as provided by reverse DNS lookup
         * The hostname of the client as provided by the client
    * All of these can be used with wildcards.  * will match any sequence of characters.  For example:
        * Match an network range: 10.1.1.*
        * Match a domain: *.splunk.com
    * These patterns are PCRE regular expressions with the additional mappings:
        * '.' is mapped to '\.'
        * '*' is mapped to '.*'
    * Can be overridden at the serverClass level, and the serverClass:app level.
    * There are no whitelist or blacklist entries by default.
-------------------------------------------------------------------
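
The two pattern mappings in the spec excerpt above can be sketched in Python. This is an approximation, not Splunk's code: it uses Python's `re` module rather than PCRE, it assumes the pattern must match the whole value, and `filter_to_regex` and `client_matches` are hypothetical helper names.

```python
import re


def filter_to_regex(pattern: str) -> re.Pattern:
    """Translate a whitelist/blacklist value using the two documented mappings."""
    # Escape literal dots first, then expand '*'. Doing it in the other order
    # would escape the dot that '.*' introduces.
    translated = pattern.replace(".", r"\.").replace("*", ".*")
    return re.compile(translated)


def client_matches(pattern, client_name, ip, dns_name, hostname) -> bool:
    """Check the pattern against the four client attributes, in the spec's order."""
    regex = filter_to_regex(pattern)
    candidates = [client_name, ip, dns_name, hostname]
    # Assumption: a filter entry must match the entire attribute value.
    return any(c is not None and regex.fullmatch(c) is not None for c in candidates)


if __name__ == "__main__":
    # Attribute values taken from the 'list deploy-clients' sample output.
    client = ("depClient_myana_mbp15_LWF", "10.1.8.28",
              "myana-mbp15.splunk.com", "myana-mbp15.splunk.com")
    for pattern in ("10.1.8.*", "*.splunk.com", "10.1.9.*"):
        print(pattern, client_matches(pattern, *client))
```

If a filter does not behave as expected, running the intended pattern through a translation like this can show whether an unescaped character is the cause.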

#
# Deployment Server/Client do not start
#
- Make sure the DeploymentNG pipeline is enabled
  splunkd.log shows whether the module is enabled when Splunk starts
   => It is enabled by default, and is disabled only if someone changed it manually.

- default-mode.conf 
[pipeline:distributedDeploymentNG]
disabled = false




#
# Deployment server problems
#
My apps are not appearing on my client instances. Things to check:

   - Is the client trying to contact the correct server?
   - Is it getting a connection?
   - Is the server matching it correctly in serverclass.conf?
   - Are whitelist/blacklist filters in use? Are their patterns correct?

#
# Deployment Client error
# => Cannot connect to the Server 
# ==> Probably Deployment server side issue. No port is available, or sockets are used up
# 

This is what a healthy splunkd.log looks like on the Client (DEBUG mode)
------------------------------------
08-16-2011 17:55:09.258 -0700 DEBUG DeploymentClient - PhoneHomeThread woke up
08-16-2011 17:55:09.258 -0700 DEBUG DeploymentClient - Send phone home
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Current state: 3, new state: 4
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Phone home recvd reply: <?xml version="1.0" encoding="UTF-8"?>
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - DeploymentClient is about to reload a new manifest....
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - stateOnClient=enabled
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - DeploymentClient is done reloading...
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Current state: 4, new state: 3
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Sent phonehome to deployment server on topic: deploymentServer/phoneHome/default
08-16-2011 17:55:09.267 -0700 DEBUG DeploymentClient - Phonehome thread waiting for :15000 mecs
08-16-2011 17:55:24.268 -0700 DEBUG DeploymentClient - PhoneHomeThread woke up
08-16-2011 17:55:24.268 -0700 DEBUG DeploymentClient - Send phone home
08-16-2011 17:55:24.277 -0700 DEBUG DeploymentClient - Current state: 3, new state: 4
08-16-2011 17:55:24.277 -0700 DEBUG DeploymentClient - Phone home recvd reply: <?xml version="1.0" encoding="UTF-8"?>
08-16-2011 17:55:24.277 -0700 DEBUG DeploymentClient - DeploymentClient is about to reload a new manifest....
08-16-2011 17:55:24.277 -0700 DEBUG DeploymentClient - stateOnClient=enabled
08-16-2011 17:55:24.278 -0700 DEBUG DeploymentClient - DeploymentClient is done reloading...
08-16-2011 17:55:24.278 -0700 DEBUG DeploymentClient - Current state: 4, new state: 3
08-16-2011 17:55:24.278 -0700 DEBUG DeploymentClient - Sent phonehome to deployment server on topic: deploymentServer/phoneHome/default
08-16-2011 17:55:24.278 -0700 DEBUG DeploymentClient - Phonehome thread waiting for :15000 mecs
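
To check how often a client actually phones home, the gaps between "PhoneHomeThread woke up" entries can be measured with a short Python sketch. It assumes the timestamp layout of the DEBUG lines above (date, time with milliseconds, UTC offset); `wake_up_intervals` is a hypothetical helper, not a Splunk tool.

```python
from datetime import datetime

WAKE_MARKER = "DeploymentClient - PhoneHomeThread woke up"
TS_FORMAT = "%m-%d-%Y %H:%M:%S.%f %z"  # e.g. 08-16-2011 17:55:09.258 -0700


def wake_up_intervals(lines):
    """Return the seconds between consecutive phone-home wake-ups."""
    stamps = []
    for line in lines:
        if WAKE_MARKER in line:
            # The timestamp is the first three whitespace-separated fields.
            stamp = " ".join(line.split()[:3])
            stamps.append(datetime.strptime(stamp, TS_FORMAT))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        "08-16-2011 17:55:09.258 -0700 DEBUG DeploymentClient - PhoneHomeThread woke up",
        "08-16-2011 17:55:09.258 -0700 DEBUG DeploymentClient - Send phone home",
        "08-16-2011 17:55:24.268 -0700 DEBUG DeploymentClient - PhoneHomeThread woke up",
    ]
    print(wake_up_intervals(sample))
```

Gaps much longer than the configured interval may indicate that phone-home requests are not being serviced promptly.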



#
# Deployment Client 
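# => Permission issue in the directory where the app is expanded, or a problem with the files in the archived app.
#    (In the example below, the file name contains '?', which is not valid on Windows.)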
#
05-27-2010 15:18:07.106 WARN  DeployedApplication - Installing app: windows to location: C:\Program Files\Splunk\etc\apps\windows
05-27-2010 15:18:07.106 ERROR DeployedApplication - There was a problem unarchiving file to: C:\Program Files\Splunk\etc\apps\windows\local\service?WSDL due to The filename, directory name, or volume label syntax is incorrect



#
# Deployment Client 
# => Applied app's configuration is not available in WebGUI
#

This is not a Deployment Server/Client issue. If you deploy a configuration file that holds system-wide settings, such as email settings or authentication, you must
edit apps/<app name>/metadata/local.meta and add export = system, either for that specific configuration or in the default stanza.
- local.meta
[]
export = system


Example: How to propagate apps from Primary to Secondary Deployment Server

###################################################################
#
# How to propagate apps from Primary to Secondary Deployment Server
# (Assuming you already have the apps in the default repository, the "deployment-apps" dir.)
###################################################################

#
# Primary Deployment Server
# - splunkd-port 55041
# - address: 10.1.1.10

- serverclass.conf
[global]
whitelist.0=*
#blacklist.0 = *
restartSplunkd = true
stateOnClient = enabled

[serverClass:UF]
[serverClass:UF:app:LWFoutputs]


#
# Secondary Deployment Server
# - splunkd-port 55051
# - address: 10.1.1.10
#

- deploymentclient.conf
[deployment-client]
disabled = false
repositoryLocation = $SPLUNK_HOME/etc/deployment-apps
serverRepositoryLocationPolicy = rejectAlways
reloadDSOnAppInstall = true

[target-broker:deploymentServer]
targetUri = 10.1.1.10:55041


- serverclass.conf
[global]
whitelist.0=*
#blacklist.0 = *
restartSplunkd = true
stateOnClient = enabled

# This works
[serverClass:UF]
[serverClass:UF:app:LWFoutputs]

#
# End Deployment Client
# - splunkd-port 55001
# - address: 10.1.1.10
#

- deploymentclient.conf
[target-broker:deploymentServer]
targetUri = 10.1.1.10:55051

[deployment-client]
disabled = false



There are plenty of other possibilities. DEBUG logging will highlight any errors on either instance.
