The Server


Parallel computing solves a computational problem by breaking it into discrete subtasks and distributing them among multiple servers, which execute them concurrently. Each subtask returns its result to the master task, which combines the results. By using multiple compute resources simultaneously, parallel computing can increase both computation speed and data processing capacity.

There are two forms of parallel computing: multithreading on a single computer, and a parallel system built on a computer cluster, that is, a network of multiple stand-alone computers. Nowadays, "parallel computing" usually refers to the latter.

This section discusses the configuration of parallel servers in esProc and their application, and introduces the basic ideas of parallel computing.

esProc parallel servers

A cluster computing system consists of multiple sub nodes. There is a main node at the highest level of all processes that controls the computing jobs on all sub nodes. A running node receives computing tasks, computes local cellset files and returns results to the main node. In a cluster network, the sub nodes may run on multiple machines and each can accommodate one or more processes. A process is identified by the IP address and port number of its host node. The multiple processes running on a sub node are called sub processes, among which there is a main process that controls all the others. A sub node system for a parallel computing system is made up of all running servers.

esProc provides the server class – com.raqsoft.ide.dfx.UnitServerConsole – to get addresses and ports according to the configuration files and to launch the parallel servers.

An esProc parallel system has no single, fixed "manager" node that centrally directs a group of "worker" nodes. Instead, an available computer is designated as the provisional main node for each parallel task.

Each parallel task therefore still has a logical center, the main node, which issues instructions and receives the results to be combined. The task fails if the main node malfunctions; if one of the computing nodes malfunctions, the main node redistributes its subtask to another capable node. To learn more about the execution of parallel tasks in esProc, see Cluster computations.

The data file that a computing node needs for its task should be stored on that node beforehand. The same data can be stored on multiple computing nodes, which is called redundant storage, so that each of these nodes can perform the same subtask. The main node then selects an idle computing node and distributes the subtask to it. Redundant storage guarantees that, when a computing node breaks down because of a system failure, memory overflow, network anomaly or similar problem, other capable computing nodes remain, and the main node can redistribute the subtask to one of them. Redundant storage therefore provides both fault tolerance and steady performance. 10.4 Data Redundancy covers the design of redundancy in detail.
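The redistribution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not esProc's scheduler: the function names, the NodeDown exception and the node addresses are hypothetical, and a real main node would also take idleness and task limits into account.

```python
class NodeDown(Exception):
    """Raised by the (hypothetical) runner when a computing node is unavailable."""


def run_with_failover(subtask_id, replicas, run_on_node):
    """Try each node that stores the subtask's data until one succeeds.

    replicas    -- addresses of the nodes holding a copy of the data
    run_on_node -- callable(node, subtask_id) that returns the result
                   or raises NodeDown
    """
    errors = {}
    for node in replicas:
        try:
            return node, run_on_node(node, subtask_id)
        except NodeDown as exc:
            errors[node] = str(exc)  # remember the failure, try the next replica
    raise RuntimeError(f"subtask {subtask_id} failed on all replicas: {errors}")


def fake_run(node, subtask_id):
    """Stand-in for a remote call; pretends that one node has broken down."""
    if node == "192.168.0.2:8281":
        raise NodeDown("unreachable")
    return f"result-{subtask_id}"


node, result = run_with_failover(
    7, ["192.168.0.2:8281", "192.168.0.3:8281"], fake_run
)
print(node, result)
```

Without a second replica in the list, the same failure would end the subtask, which is why the data has to be stored redundantly for redistribution to work.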

Data can also be stored on a Network File System (NFS), such as HDFS, through which the computing nodes access it. Managing redundancy on an NFS is simpler than storing data on specific computing nodes for fault tolerance, but compared with accessing locally stored files it may sacrifice some performance because of network transmission.

Parallel server configuration

Run the esprocs.exe file under esProc installation directory’s esProc\bin path to launch or configure parallel servers. The jars needed by the file will be automatically loaded under the installation directory. Note that the configuration files – raqsoftConfig.xml and unitServer.xml – must be placed under the esProc\config path in esProc installation directory. The following window pops up after the server is started:

During the execution of esprocs.exe, the window displays the loaded initial information, which is set in the configuration file raqsoftConfig.xml. Click Options on the right-side menu to configure information for the parallel server. The pop-up window is as follows:

On the page, you can configure the license file, main path, search path, date and time format, default charset, log level, number of bytes in the file buffer area and other information. For the Log Level, the options are OFF, SEVERE, WARNING, INFO and DEBUG, whose priorities decrease from left to right. The OFF level turns off all log output. The INFO level outputs messages at its own level and at every higher-priority level, that is, SEVERE, WARNING and INFO. The other levels work in the same way.
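The log level rule can be expressed as a small check. The following Python sketch only models the ordering described above; the function name is hypothetical and this is not how esProc implements logging.

```python
# Levels in decreasing priority, in the order the Options page lists them.
LEVELS = ["OFF", "SEVERE", "WARNING", "INFO", "DEBUG"]


def is_logged(configured_level, message_level):
    """Return True if a message of message_level is output under configured_level."""
    configured = LEVELS.index(configured_level)
    message = LEVELS.index(message_level)
    # OFF (index 0) outputs nothing; any other level outputs itself
    # and every level to its left except OFF.
    return 0 < message <= configured


print([lvl for lvl in LEVELS if is_logged("INFO", lvl)])
```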

The configuration information is the same as the configuration in esProc IDE. It can be viewed or modified in Tool>Options>Environment:

Click Config on the right-side menu to configure node information on Unit page:

Temp file timeout sets the life span (in hours) of the temporary files; Check interval is the number of seconds between two expiration checks and must be a positive value or 0; Proxy timeout is the agent life span, that is, the life span (in hours) of the remote cursor and task space. If Temp file timeout or Proxy timeout is set to 0, the expiration check is not performed.

The Host list includes all of the local machine’s IP addresses and ports that can potentially be used to run the servers. In the Process list, configure the port number (Port) for each of the multiple processes that share the same IP address. The first process is the main process. At launch, a server automatically searches the node list for a node with idle processes, and the node hands a task to an idle process for execution. The IP addresses must be the local machine’s real IP addresses. Multiple IP addresses can be set when multiple network adapters are used.

In Host list, Max task number is the maximum number of tasks a node is allowed to perform, and Preferred task number is the appropriate number of tasks for the node to perform. When multiple processes are running on a node, the Preferred task number is the number of processes. Select the desired data partitions for each node under Partitions.
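As a rough illustration of how the timeout settings interact, the Python sketch below decides whether an item (a temporary file, or a remote cursor or task space) has outlived its life span. The function name is hypothetical, and whether the real comparison is strict is an assumption; only the units and the meaning of 0 come from the description above.

```python
SECONDS_PER_HOUR = 3600


def is_expired(age_seconds, timeout_hours):
    """Return True if an item of the given age has exceeded its life span.

    A timeout of 0 disables the expiration check, so nothing expires.
    """
    if timeout_hours == 0:
        return False
    # Assumption: an item expires once it is strictly older than the timeout.
    return age_seconds > timeout_hours * SECONDS_PER_HOUR


# With a 12-hour timeout, a 13-hour-old temporary file is due for removal.
print(is_expired(13 * SECONDS_PER_HOUR, 12))
```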

The Enable clients page offers the setting of client-side whitelist:

Select Check clients, then use the Clients hosts list to set the whitelist of IP addresses that are allowed to invoke the parallel server. IP addresses that are not in the whitelist cannot invoke the parallel server to perform computations.

When the parallel server configuration is done, click OK to write the corresponding configuration file unitServer.xml automatically, as shown in the following:
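The whitelist is saved as start and end addresses (see the EnabledClients element in unitServer.xml), so the check amounts to a range test. Here is a minimal Python sketch of such a test. The function name is hypothetical, and treating a disabled check as "allow every client" is an assumption, not documented esProc behavior.

```python
import ipaddress


def is_client_allowed(client_ip, ranges, check=True):
    """Return True if client_ip falls inside one of the (start, end) ranges.

    ranges -- list of (start_ip, end_ip) string pairs, both ends inclusive
    check  -- mirrors the Check clients option; assumed to allow all when False
    """
    if not check:
        return True
    ip = ipaddress.ip_address(client_ip)
    return any(
        ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end)
        for start, end in ranges
    )


whitelist = [("192.168.107.1", "192.168.107.1")]
print(is_client_allowed("192.168.107.1", whitelist))
print(is_client_allowed("192.168.107.2", whitelist))
```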

<?xml version="1.0" encoding="UTF-8"?>
<SERVER Version="3">
    <TempTimeOut>12</TempTimeOut>
    <Interval>1800</Interval>
    <ProxyTimeOut>12</ProxyTimeOut>
    <Hosts>
        <Host ip="192.168.107.1" maxTaskNum="8" preferredTaskNum="3">
            <Partitions>
                <Partition name="0" path="d:/file/parallel/node1/0"></Partition>
                <Partition name="1" path="d:/file/parallel/node1/1"></Partition>
            </Partitions>
            <Units>
                <Unit port="8281"></Unit>
                <Unit port="8282"></Unit>
            </Units>
        </Host>
    </Hosts>
    <EnabledClients check="true">
        <Host start="192.168.107.1" end="192.168.107.1"></Host>
    </EnabledClients>
</SERVER>
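To check what a given unitServer.xml defines, the file can be read with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of the listing above. The summarize function and the keys of its result are hypothetical names chosen for illustration; the element and attribute names are taken from the listing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration shown above (partitions and clients omitted).
UNIT_SERVER_XML = """<SERVER Version="3">
  <TempTimeOut>12</TempTimeOut>
  <Interval>1800</Interval>
  <ProxyTimeOut>12</ProxyTimeOut>
  <Hosts>
    <Host ip="192.168.107.1" maxTaskNum="8" preferredTaskNum="3">
      <Units>
        <Unit port="8281"></Unit>
        <Unit port="8282"></Unit>
      </Units>
    </Host>
  </Hosts>
</SERVER>"""


def summarize(xml_text):
    """Collect the timeouts and the ip:port address of every configured process."""
    root = ET.fromstring(xml_text)
    summary = {
        "temp_timeout_hours": int(root.findtext("TempTimeOut")),
        "check_interval_seconds": int(root.findtext("Interval")),
        "proxy_timeout_hours": int(root.findtext("ProxyTimeOut")),
        "processes": [],
    }
    # Only hosts under <Hosts> are nodes; <EnabledClients> also contains <Host>.
    for host in root.findall("Hosts/Host"):
        for unit in host.findall("Units/Unit"):
            summary["processes"].append(f"{host.get('ip')}:{unit.get('port')}")
    return summary


print(summarize(UNIT_SERVER_XML))
```

The first address in the list, 192.168.107.1:8281, corresponds to the main process of the node, since the first process in the Process list is the main process.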

Launching parallel servers

Once the parallel server is configured, click Start in the node launching window to run it, click Stop to suspend it, and click Quit to exit. Click Reset to initialize and restart the server, which removes all global variables and releases memory.

Once a node is started, the processes on it start, too. Check the main process of each node on the Main page, or click a port number to check the sub processes on the node.

Under Linux, run ServerConsole.sh to launch the parallel service class:

The node’s system information window under Linux is the same as that under Windows:

We can also add the –p parameter in the execution command to launch parallel servers in a non-GUI way to directly execute operations:

 

Data storage

esProc provides storage service to manage parallel servers in a cluster and the subtasks run on them.

Under esProc installation directory’s esProc\bin path, click on datastore.exe file (datastore.sh under Linux) to open a window as follows:

On the Data Store interface, click the node search icon to search the current cluster for servers and list them. Select a running server to view information about it on the right side of the interface, such as its partitions or the list of subtasks running on it. You can also force a certain subtask to terminate. When the subtask is aborted, the operation performed on the main node stops, too.

Application

The callx instruction is used in a cellset to distribute subtasks among running servers. Here’s the cellset parallel01.dfx:

 

     A
1    =file("D:/files/txt/PersonnelInfo.txt")
2    =A1.import@t(;pPart:pAll)
3    =A2.select(State==pState)
4    return A3

The program imports one data segment from the personnel information file PersonnelInfo.txt and selects the employees who come from the specified state. It uses the cellset parameters pState, pPart and pAll:

The main program invokes parallel01.dfx to find all employees from Ohio concurrently through cluster computing:

 

     A
1    [192.168.10.229:8281]
2    =callx("D:/files/dfx/parallel01.dfx","OH",to(20),20;A1)
3    =A2.conj()

A1 specifies a list of parallel servers for computation. A2 invokes these servers to execute the parallel computing. When executed, A2’s result is as follows:

When a computational task is performed on a server cluster, it is split into multiple subtasks according to the number of parameters, and the subtasks are distributed among the servers in the cluster. Each server then allocates its subtasks to the processes running on it, which return their results separately. A2’s value is a sequence of these results, each of which is itself a sequence of records. A3 concatenates the records in these sequences to get the final result:

Through this form of parallel computing, the main program divides a complicated computational task or an operation involving big data into multiple subtasks, distributes them to multiple servers to compute separately and then joins the results. The topic of parallel computing will continue in Cluster computations.
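For readers who think more easily in a general-purpose language, the split, filter and concatenate pattern of the two cellsets can be imitated in Python. This is a loose analogy under stated assumptions, not a translation of esProc's behavior: threads stand in for the parallel servers, segments are cut by row count rather than by position in the file, and all function names and sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def segment(rows, part, total):
    """Return the part-th of `total` contiguous segments (part is 1-based)."""
    start = (part - 1) * len(rows) // total
    end = part * len(rows) // total
    return rows[start:end]


def subtask(rows, state, part, total):
    """Role of parallel01.dfx: load one segment, keep the rows of one state."""
    return [r for r in segment(rows, part, total) if r["State"] == state]


def main_program(rows, state, total=20, workers=4):
    """Role of the main cellset: run `total` subtasks, then concatenate."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One subtask per segment number, like passing to(20) to callx.
        results = list(
            pool.map(lambda part: subtask(rows, state, part, total),
                     range(1, total + 1))
        )
    # Like A2.conj(): flatten the per-subtask sequences into one sequence.
    return [record for result in results for record in result]


# Made-up sample data standing in for PersonnelInfo.txt.
people = [{"Name": f"emp{i}", "State": "OH" if i % 3 == 0 else "CA"}
          for i in range(50)]
print(len(main_program(people, "OH")))
```

Because the results are collected in segment order, the concatenated records keep the order they had in the source data.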