What is the Processing Manager and what does it do?
The Processing Manager (PM) can be thought of as the "brains" of the engine. It controls and orchestrates how work is carried out when jobs that require an engine are submitted. The PM is part of both the local Evidence Processing Engine (EP) and the Distributed Processing Manager (DPM) components of AccessData software.
The Processing Manager controls the distribution of work to a single local engine or to several distributed engines. It does this by dividing the data to be processed into small, manageable chunks and then distributing those chunks to the engine(s).
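To make the chunking idea concrete, here is a minimal Python sketch. It is illustrative only: the chunk size, the item names, and the round-robin assignment are assumptions chosen for simplicity, and make_chunks and distribute are hypothetical helpers, not part of any AccessData API. The real Processing Manager's scheduling logic is not described in this article, and in practice engines pick up chunks as they become free rather than receiving them in a fixed rotation.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def make_chunks(items: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Split a stream of evidence items into chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))  # take the next chunk_size items
        if not chunk:
            return  # input exhausted
        yield chunk


def distribute(chunks: Iterable[List[str]], engines: List[str]) -> Dict[str, List[List[str]]]:
    """Assign chunks to engines in round-robin order.

    ASSUMPTION: round-robin is used only to illustrate "distributing chunks";
    it is not the documented behavior of the Processing Manager.
    """
    if not engines:
        raise ValueError("at least one engine is required")
    assignments: Dict[str, List[List[str]]] = {name: [] for name in engines}
    for i, chunk in enumerate(chunks):
        assignments[engines[i % len(engines)]].append(chunk)
    return assignments
```

With seven items, a chunk size of 3, and two engines, the first engine receives chunks 1 and 3 and the second receives chunk 2. A single local engine is simply the case where the engines list has one entry.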
When installing the EP or DPM, the installer prompts for the Processing State Folder. This folder is where the Processing Manager stores information about what is being worked on and which engine is working on it. While data is being processed, you should see a new job folder under "[PM_folder]\10\Jobs\" named with a unique ID (e.g., "7f482a5d4cc847859b11a16bb74ea0de.job"). This job folder contains several subfolders, including Queue and Wrk. The individual components of the engine (Processor, Loader, and Indexer, or "PLI") retrieve their assigned chunks to work on from the Queue and Wrk folders.
Queue & Wrk - Queue is where chunks of data waiting to be loaded, processed, or indexed are stored until they are assigned and picked up by an engine. Wrk is where they go while they are being worked on. Each chunk is represented by a ".w" (dot double-u) file with its own unique identifier (e.g., "e0a67a60cafdc45e3b202db0aaa4476fc00.w"). These .w files contain reference information pointing to a chunk of data within the evidence being processed (e.g., E01, AD1, or native files). Chunks to be loaded or indexed are in the Ldr and Idx subfolders, respectively, while chunks to be processed are in the root of the Queue or Wrk folder.
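The folder layout described above can be inspected with a short script. The Python sketch below uses hypothetical helpers (iter_jobs and count_chunks), not an AccessData tool. It assumes the [PM_folder]\10\Jobs\<id>.job layout and the Queue and Wrk folders with Ldr and Idx subfolders exactly as described, and it only counts the .w files in each location; it never moves or modifies anything.

```python
from pathlib import Path
from typing import Dict, List

# Stage -> subfolder, per the layout described above: chunks to be processed
# sit in the root of Queue/Wrk, chunks to be loaded in Ldr, to be indexed in Idx.
STAGES = {"process": "", "load": "Ldr", "index": "Idx"}


def iter_jobs(pm_folder: Path) -> List[Path]:
    """Return the *.job folders under [PM_folder]\\10\\Jobs, sorted by name."""
    jobs_dir = pm_folder / "10" / "Jobs"
    return sorted(p for p in jobs_dir.glob("*.job") if p.is_dir())


def count_chunks(job_folder: Path) -> Dict[str, Dict[str, int]]:
    """Count .w files per stage in Queue and Wrk (read-only)."""
    counts: Dict[str, Dict[str, int]] = {}
    for area in ("Queue", "Wrk"):
        counts[area] = {}
        for stage, sub in STAGES.items():
            folder = job_folder / area / sub if sub else job_folder / area
            if not folder.is_dir():
                counts[area][stage] = 0  # a missing folder just means no chunks there
                continue
            # glob("*.w") is not recursive, so the root count excludes Ldr/Idx files
            counts[area][stage] = sum(1 for p in folder.glob("*.w") if p.is_file())
    return counts
```

For example, looping over iter_jobs(Path(r"D:\PM_State")) and printing count_chunks(job) for each job gives a quick per-stage picture, where D:\PM_State is a placeholder for your own Processing State Folder.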
During processing, .w files should move from Queue to Wrk regularly. Some objects take longer to process or index than others, so their corresponding .w files move through the system more slowly.
Watching the total number of items in the Queue and Wrk folders can help determine whether the engine is still actively working on a job when the status shown in the software's user interface (UI) doesn't update frequently.
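One way to apply this check is to take two snapshots of the .w file names a few minutes apart and compare them. The Python sketch below is a hypothetical helper, not part of any AccessData product. It assumes a .w file keeps its name when it moves from Queue to Wrk. What happens to a file after it leaves Wrk is not described here, so the sketch only counts it as having left.

```python
import time
from pathlib import Path
from typing import Dict, NamedTuple, Set


class Snapshot(NamedTuple):
    queued: Set[str]   # .w names anywhere under Queue (root, Ldr, Idx)
    working: Set[str]  # .w names anywhere under Wrk (root, Ldr, Idx)


def take_snapshot(job_folder: Path) -> Snapshot:
    def names(area: str) -> Set[str]:
        root = job_folder / area
        if not root.is_dir():
            return set()
        # rglob walks the root plus the Ldr and Idx subfolders
        return {p.name for p in root.rglob("*.w") if p.is_file()}
    return Snapshot(names("Queue"), names("Wrk"))


def activity_between(before: Snapshot, after: Snapshot) -> Dict[str, int]:
    # ASSUMPTION: a chunk keeps the same .w file name as it moves Queue -> Wrk.
    return {
        "picked_up": len(before.queued & after.working),
        "left_wrk": len(before.working - after.working - after.queued),
        "new_in_queue": len(after.queued - before.queued),
    }


def looks_active(job_folder: Path, interval_s: float = 60.0) -> bool:
    """True if any .w file moved, left Wrk, or appeared in Queue during the interval."""
    before = take_snapshot(job_folder)
    time.sleep(interval_s)
    after = take_snapshot(job_folder)
    return any(activity_between(before, after).values())
```

A False result over a single interval does not prove the engine has stalled. As noted above, some objects take much longer than others, so it is safer to repeat the check over several intervals before concluding that a job is stuck.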
Throttling the Engine
In Summation and eDiscovery, the PM can limit the amount of system resources the engines use during processing. This option is under Settings >> System Configuration >> Processing Priority Options.
By default, the Evidence Processing Engine(s) use all available system resources to work on jobs. This can degrade the user experience, because the engine monopolizes memory and CPU instead of leaving enough for the other application components, such as those that serve web pages. The "Reduce Processing Priority" option should be checked for ALL single-server installs of Summation and eDiscovery so that the system has enough resources for user access while processing data.
Distributed Processing Benefits
Using distributed processing can significantly improve not only the performance of jobs that use the engine (e.g., processing, indexing, and exporting) but also the user experience. A properly specified, dedicated processing server can significantly reduce processing times. Distributed processing can also make it easier to recover from engine crashes when the Processing Manager is installed with the other system services rather than as part of a local engine.
Using multiple Processing Managers in a single environment, however, can cause problems and is not a supported architecture. A PM is not aware of other managers that may be installed on different servers. Therefore, if two PMs work on the same case, they can direct different engines to write to the same dtSearch indexes or database tables at the same time, which can corrupt those indexes, the database, or both. For this reason, only a single EP or DPM should be used in an environment.
Understanding what the Processing Manager is and what it does gives a deeper understanding of how AccessData products function, and can also help when troubleshooting issues that arise.