The Server

Parallel computing solves a computational problem by breaking it into discrete subtasks and distributing them among multiple servers, where they are executed concurrently. Each subtask returns its own result to the master task to be combined. By using multiple compute resources simultaneously, parallel computing can substantially increase computational speed and data processing capacity.

There are two forms of parallel computing: multithreading executed on a single computer, and a parallel system built on a computer cluster, i.e. a network of multiple stand-alone computers. Nowadays, when we talk about parallel computing, we usually mean the latter, which we call cluster computing.

This section discusses the configuration of parallel servers in esProc and their application, giving you a basic idea of cluster computing.

esProc parallel servers

A cluster computing system consists of multiple sub nodes, with a main node at the top level of all processes that controls the computing jobs on every sub node. A running node receives computing tasks, computes against local cellset files and returns results to the main node. In a cluster network, the sub nodes may run on multiple machines, and each machine can accommodate one or more processes. A process is identified by the IP address and port number of its host node. The processes running on a sub node are called sub processes, among which a main process controls all the others. All running servers together make up the node system of a parallel computing system.

esProc provides the server class – com.raqsoft.ide.dfx.UnitServerConsole – which reads addresses and ports from the configuration files and launches the parallel servers.
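
Since the launchers described later in this section ultimately run this class, a server could in principle also be started straight from the command line, roughly as follows (a minimal sketch only; the classpath placeholder stands for the jars under the installation directory, and in everyday use you would start the server with esprocs.exe or ServerConsole.sh instead):

java -cp "<esProc jars>" com.raqsoft.ide.dfx.UnitServerConsole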

The esProc parallel system does not have a fixed, single "manager" node centrally directing a group of "worker" nodes. Instead, an available computer is designated as the provisional main node for the execution of each parallel task.

Yet each parallel task has its logical center – the main node – from which instructions are issued and to which results are returned to be combined. The task usually fails if a node malfunctions; in certain cases, the main node redistributes the subtask to another capable node. To learn more about the execution of parallel tasks in esProc, see Cluster computations.

The data file a computing node needs to perform its task should be stored on it beforehand. The same data can be stored on multiple computing nodes, which is called redundant storage, so that all of these nodes are able to perform the same subtask; the main node then selects an idle computing node and distributes the subtask to it. Redundant storage guarantees that, when a computing node breaks down because of system failure, memory overflow, network anomaly, etc., there are other capable computing nodes to which the main node can redistribute the subtask. The use of redundant storage therefore ensures both fault tolerance and steady performance. 10.4 Data Redundancy will cover the details of designing redundancy.

Data can also be stored on a network file system (NFS), such as HDFS, through which the computing nodes access it. Redundancy management on an NFS is simpler than the strategy of storing data on specific computing nodes for fault tolerance, but, compared with accessing locally stored files, it may sacrifice some performance because of network transmission.

Parallel server configuration

Run esprocs.exe under the esProc\bin path of the esProc installation directory to launch or configure parallel servers. The jars the program needs are loaded automatically from the installation directory. Note that the configuration files – raqsoftConfig.xml and unitServer.xml – must be placed under the esProc\config path of the installation directory. The following window pops up after the server is started:

While esprocs.exe is running, the window displays the initial information loaded from the configuration file raqsoftConfig.xml. Click Options on the right-side menu to configure the parallel server. The pop-up window is as follows:

On this page, you can configure the license file, main path, search path, date and time formats, default charset, log level, the number of bytes in the file buffer area and other information. The available Log Levels are OFF, SEVERE, WARNING, INFO and DEBUG, whose priorities decrease from left to right. The OFF level turns off all log output. The INFO level outputs information of its own level and of all higher-priority levels, i.e. SEVERE, WARNING and INFO; the other levels output information in the same way.

The configuration information is the same as that in the esProc IDE, where it can be viewed or modified under Tool > Options > Environment:

Click Config on the right-side menu to configure node information on the Unit page:

Temp file timeout sets the life span (in hours) of a temporary file. Check interval is the number of seconds between two expiration checks; it must be a positive value or 0. Proxy timeout is the agent life span, i.e. the life span (in hours) of remote cursors and task spaces. No expiration check is performed if Temp file timeout or Proxy timeout is set to 0.

Under Host list, you can configure the IP addresses of all nodes on the local machine that can potentially run servers. Under Process list, you can configure the ports of multiple processes for one IP address on the local machine, among which the first one is the main process. At launch, the server automatically searches the node list for a node with idle processes and hands the task to an idle process for execution. The IP addresses should be real ones, and multiple IP addresses are allowed when the machine has multiple network adapters.

Under Host list, Max task num is the maximum number of tasks a node is allowed to perform, and Preferred task num is the appropriate number of tasks for a node to perform; when multiple processes run on a node, Preferred task num is the number of processes. Under Data zone root path, you can configure the data partition root path for each node.

The Enable clients tab offers the client-side whitelist settings:

Select Check clients to configure, under Clients hosts, a whitelist of IP addresses that are allowed to invoke the parallel server. IP addresses not in the whitelist cannot invoke the parallel server for computations.

When the parallel server configuration is done, click OK and the corresponding configuration file, unitServer.xml, is set automatically, as shown below:

<?xml version="1.0" encoding="UTF-8"?>
<SERVER Version="3">
	<TempTimeOut>12</TempTimeOut>
	<Interval>1800</Interval>
	<ProxyTimeOut>12</ProxyTimeOut>
	<Hosts>
		<Host ip="192.168.0.197" maxTaskNum="4" preferredTaskNum="2"
			dataPath="D:\file\parallel\node1">
			<Units>
				<Unit port="8281"></Unit>
				<Unit port="8282"></Unit>
			</Units>
		</Host>
	</Hosts>
	<EnabledClients check="true">
		<Host start="192.168.0.197"></Host>
	</EnabledClients>
</SERVER>

Launching parallel servers

Now click the Start button in the following window to run the parallel server. Click Stop to suspend the server service; after that, you can click Quit to exit the service. Click Reset to reinitialize and restart the server, which also removes all global variables and releases memory.

Once a node is started, the processes on it start, too. You can check the main process on the Main tab, or click a port number to view the execution status of a sub process.

Run ServerConsole.sh to launch the parallel server class under Linux:

The node running information window under Linux is the same as that under Windows:

We can also add the -p parameter to the execution command to launch a parallel server in a non-GUI way and execute operations directly:
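
For example, under Linux this might look like the following (a minimal sketch, assuming ServerConsole.sh is run from the esProc/bin directory of the installation):

./ServerConsole.sh -p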

 

Data storage

esProc provides a data storage service to manage the parallel servers in a cluster and the subtasks running on them.

Under the esProc\bin path of the esProc installation directory, run datastore.exe (datastore.sh under Linux) to open the following window:

On the Data Store interface, click the node search icon to search the current cluster for servers and list them. Select a running server to view its information on the right part of the interface, such as its partitions or the list of subtasks running on it. You can also force the termination of a certain subtask; when a subtask is aborted, the corresponding operation on the main node stops too.

Application

The callx instruction is used in a cellset to distribute subtasks among running servers. Here's the cellset parallel01.dfx:

 

	A
1	=file("D:/files/txt/PersonnelInfo.txt")
2	=A1.import@t(;pPart:pAll)
3	=A2.select(State==pState)
4	return A3

The program imports one segment of the personnel information file PersonnelInfo.txt (the @t option reads the first row as field names, and pPart:pAll imports the pPart-th of pAll segments) and selects the employees coming from the specified state. The cellset parameters used in it are pState (the state to select), pPart and pAll (which segment of how many to import).

The main program invokes parallel01.dfx to find all employees from Ohio concurrently using cluster computing:

 

	A
1	[192.168.10.229:8281]
2	=callx("D:/files/dfx/parallel01.dfx","OH",to(20),20;A1)
3	=A2.conj()

A1 specifies the list of parallel servers to be used for the computation. A2 invokes these servers to execute the parallel computation. When executed, A2's result is as follows:

When performed on a server cluster, a computational task is split into multiple subtasks, according to the number of distributed parameter values, and the subtasks are distributed among the servers in the cluster. Each server then allocates its subtasks to the processes running on it, which return their results separately. A2's value is a sequence containing these results, each of which is itself a sequence of records. A3 concatenates the records in these sequences to get the final result:
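
To make the splitting concrete: in A2, to(20) produces the sequence [1,2,…,20], so callx generates 20 subtasks. The scalar arguments "OH" and 20 are passed to every subtask, while the i-th subtask receives i for the segment parameter. Each subtask is therefore roughly equivalent to a single local invocation of the script, like the sketch below (shown for the third segment; call() is used here purely for illustration):

=call("D:/files/dfx/parallel01.dfx","OH",3,20)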

Through this form of parallel computing, the main program divides a complicated computational task or an operation involving big data into multiple subtasks, distributes them to multiple servers to compute separately and then joins the results. The topic of parallel computing will continue in Cluster computations.

Remote cluster server & cluster manager

esProc supports calculations on a remote cluster server, which runs on the server side and is used and managed from the client side. To launch a cluster server, start ManagerServer.exe or ManagerServer.sh. The configuration for starting a remote cluster server is similar to that for starting a common parallel server. There is no separate platform for managing an active remote cluster server; it is centrally managed from the cluster manager.

From the cluster manager, we can manage the remote nodes: uploading files, configuring a node's license, starting or closing a node, updating the node version, and so on. To launch the cluster manager, start manageclient.exe or manageclient.sh on the client side. The following shows an active cluster manager:

Detailed explanations of the remote cluster server and the cluster manager can be found in the related documentation.