From Splunk Wiki
Best practices for configuring Splunk forwarders
When designing a large deployment of Splunk forwarders, you have several options for dividing the cost of data collection, indexing, and management overhead. These include whether to disable the web UI, indexing, and/or processing on each forwarder. Turning off these services can significantly reduce the operational burden that Splunk forwarders place on their hosts, but doing so places additional requirements on the rest of your Splunk deployment.
The most important consideration is whether to enable or disable pre-processing on the forwarders. Pre-processing is a precursor to indexing, in which certain fields are extracted and data transformations can be applied via regular expressions. The most important of these pre-index extractions is the date.
Pre-processing takes approximately 20 MB of memory, and a single 2 GHz core can generally handle 2-3 MB per second. Exact performance varies with how easily the date can be extracted, and can drop significantly if custom regexes are applied. However, as a rough estimate, you can assume that whatever fraction of 3 MB per second you are continuously processing is the fraction of a core you’re using; 150 KB per second ought to be about 5% of a core, for example.
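The rule of thumb above can be written as a small calculation. This is only a sketch: the function name is made up for illustration, decimal units are assumed (1 MB = 1000 KB), and the 3 MB per second default is the upper end of the range quoted above, not a measured value for your hardware.

```python
def core_fraction(rate_kb_per_s: float, core_capacity_mb_per_s: float = 3.0) -> float:
    """Estimate the fraction of one core used by pre-processing.

    rate_kb_per_s: sustained input rate on the forwarder, in KB per second.
    core_capacity_mb_per_s: assumed throughput of one core, in MB per second
        (the text suggests 2-3 MB/s for a 2 GHz core).
    """
    if rate_kb_per_s < 0 or core_capacity_mb_per_s <= 0:
        raise ValueError("rate must be >= 0 and core capacity must be > 0")
    # Assumption: decimal units, so 1 MB = 1000 KB.
    return rate_kb_per_s / (core_capacity_mb_per_s * 1000.0)


if __name__ == "__main__":
    # The example from the text: 150 KB/s against a 3 MB/s core.
    print(f"{core_fraction(150):.1%}")       # 5.0%
    # A more pessimistic estimate using the low end of the range (2 MB/s).
    print(f"{core_fraction(150, 2.0):.1%}")  # 7.5%
```

Using the low end of the range gives a more conservative figure, which may be the safer number to plan around if custom regexes are in play.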
If you disable processing on the forwarders, you save this overhead. However, the work still needs to get done, and the receiving Splunk indexer will be tasked with pre-processing all of the forwarders’ data, as well as indexing that data and potentially servicing search requests. A large, 64-bit multi-core system can generally handle between 100 GB and 200 GB of data per day, depending on the cost and complexity of your pre-index processing. If pre-processing is distributed to the forwarders, the same machine could handle 200 GB to 300 GB per day of pure indexing.
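To get a feel for what these daily volumes mean in terms of forwarder counts, the sketch below converts a sustained per-forwarder rate into GB per day and divides it into the indexer capacities quoted above. The helper names and the 150 KB per second per-forwarder rate are illustrative assumptions, decimal units are assumed (1 GB = 1,000,000 KB), and the result ignores search load and traffic bursts.

```python
import math

SECONDS_PER_DAY = 86_400


def daily_volume_gb(rate_kb_per_s: float) -> float:
    """Convert a sustained rate in KB/s to GB/day (decimal units assumed)."""
    return rate_kb_per_s * SECONDS_PER_DAY / 1_000_000


def forwarders_per_indexer(indexer_gb_per_day: float, rate_kb_per_s: float) -> int:
    """Whole number of forwarders one indexer could absorb at the given rate."""
    if rate_kb_per_s <= 0:
        raise ValueError("rate must be > 0")
    return math.floor(indexer_gb_per_day / daily_volume_gb(rate_kb_per_s))


if __name__ == "__main__":
    rate = 150  # hypothetical sustained KB/s per forwarder
    print(daily_volume_gb(rate))  # 12.96 GB/day per forwarder

    # Indexer also does the pre-processing: 100-200 GB/day.
    print(forwarders_per_indexer(100, rate), forwarders_per_indexer(200, rate))  # 7 15

    # Forwarders do the pre-processing: 200-300 GB/day of pure indexing.
    print(forwarders_per_indexer(200, rate), forwarders_per_indexer(300, rate))  # 15 23
```

Under these assumptions, moving pre-processing to the forwarders lets the same indexer serve roughly half again to twice as many forwarders, which is the trade-off the paragraph above describes.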
Depending on your environment and the load on the forwarder hosts, distributing the burden of pre-processing to the endpoints may make sense; in terms of cost, compute cycles are generally cheaper on commodity hardware than on big server iron. However, that may not be feasible, either because of the nature of the forwarder hosts (which could themselves be big servers) or because of organizational considerations.
Generally speaking, indexing should be turned off on forwarders. The primary reason to keep indexing locally on the forwarder, in addition to the main search index, is to create a duplicate record of logs that roll or of other transient data. If that isn’t a specific requirement, indexing can be safely turned off.
Note that while it is theoretically possible to distribute searches across very large deployments of forwarders, it is not recommended.
Splunk doesn’t require the web interface to pre-process or forward data. You can safely stop the splunkweb service or daemon. Without the web UI, you will need to configure your forwarders by editing the underlying conf files and/or by using Deployment Server. The primary benefits are lower memory use and a smaller attack surface. Note that you can restart splunkweb at a later time if need be.