The Built-in Parallelism


The article Multithreading discusses how to increase efficiency through multithreaded computation. Besides using the fork statement in cellset code to achieve multithreaded parallel processing, esProc also packages parallel computing into some functions. They are covered below.

Parallel data retrieval

We can retrieve data from data tables with multiple threads if the order of records is irrelevant to the result. This type of processing makes fuller use of system resources and so improves efficiency. To enable multithreaded parallel processing when data is imported from a single data file, add the @m option to the f.import() function:

|   | A | B |
|---|---|---|
| 1 | =file("PersonnelInfo.txt") | =now() |
| 2 | =A1.import@t() | =now() |
| 3 | =A1.import@mt() | =now() |
| 4 | =string(interval@ms(B1,B2))+"/"+string(interval@ms(B2,B3)) | |

A2 imports the data directly, while A3 uses f.import@m() to import it with multiple threads. A4 compares the time taken by the two methods, reporting both intervals in milliseconds separated by a slash.

Importing data in parallel can significantly improve performance.

When data is retrieved from a single data file with multiple threads, the file is divided into multiple segments and each segment is retrieved through its own file cursor. Each file cursor uses a separate thread for the data retrieval, which resembles retrieving multiple files with multithreaded processing. For more information about retrieving data by segment in esProc, read Bin Files.
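The segment-per-thread idea described above can be sketched in Python. This is only an illustration of the general technique, not esProc's implementation: the function names `segment_bounds`, `read_segment` and `parallel_import`, and the `n_threads` parameter, are hypothetical, and the sketch assumes a UTF-8 text file with one record per line and no special handling of a title row.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def segment_bounds(path, n_segments):
    """Split a file into byte ranges that begin and end on line boundaries."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_segments):
            f.seek(size * i // n_segments)
            f.readline()  # advance to the start of the next full line
            cuts.append(f.tell())
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))


def read_segment(path, start, end):
    """Read one segment through its own file handle (one handle per segment)."""
    with open(path, "rb") as f:
        f.seek(start)
        return [raw.decode("utf-8") for raw in f.read(end - start).splitlines()]


def parallel_import(path, n_threads=4):
    """Read all lines with one task per segment; result order is not fixed."""
    records = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(read_segment, path, start, end)
                   for start, end in segment_bounds(path, n_threads)]
        for done in as_completed(futures):  # segments arrive as they finish
            records.extend(done.result())
    return records
```

Because segments are appended in the order they complete, the returned list contains every line of the file exactly once, but not necessarily in file order.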

Because multiple threads are used when a single data file is retrieved in parallel, the order of records in the result is not guaranteed. This can be seen by comparing the results of A2 and A3.
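One way to check that a sequential import and a parallel import hold the same records, even though their order differs, is to compare them as multisets. The Python sketch below uses made-up sample records and a hypothetical helper `same_records`; it is not part of esProc.

```python
from collections import Counter


def same_records(a, b):
    """Return True if both lists hold the same records, ignoring order."""
    return Counter(a) == Counter(b)


# Hypothetical sample data: the same four records in two different orders.
sequential = [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")]
parallel = [(3, "Cid"), (4, "Dee"), (1, "Ann"), (2, "Bob")]

order_differs = sequential != parallel
content_matches = same_records(sequential, parallel)
```

Using a counter rather than a set keeps duplicate records from being collapsed, so a missing or repeated row would still be detected.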

With f.import@m(), the number of threads used is determined by the pre-specified Number of parallel tasks. Set this parameter on the General page, which opens by clicking Tool>Option on esProc's menu bar.

Merging cursors in a certain order

There is another operation that specifically requires parallel processing: merging cursors in a certain order with CS.mergex(). For example:

|   | A | B |
|---|---|---|
| 1 | =file("Order_Wines.txt").cursor@t() | =file("Order_Foods.txt").cursor@t() |
| 2 | =file("Order_Electronics.txt").cursor@t() | =file("Order_Books.txt").cursor@t() |
| 3 | =[A1:B2].mergex(Date) | =A3.fetch(10000) |
| 4 | >A3.close() | |

A3 merges the order data from the four text files by date and generates a new cursor.

Throughout an ordered merge, the sorting expression determines which cursor the next record should be fetched from. This means all the cursors must be available simultaneously during the computation, which is also a problem to be handled by multithreading. For this kind of situation esProc uses parallel processing automatically, without the @m option being added. B3 fetches the first 10,000 rows.
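The selection logic of an ordered merge can be sketched in Python with a heap that always holds the current head record of every open source. This is a single-threaded illustration of the general k-way merge technique and says nothing about how esProc implements CS.mergex() or its threading. The function name `merge_sorted` and the sample records are hypothetical, and each input is assumed to be already sorted by the merge key.

```python
import heapq
from itertools import islice

_END = object()  # sentinel marking an exhausted source


def merge_sorted(cursors, key):
    """Lazily merge iterables that are each already sorted by `key`."""
    iterators = [iter(c) for c in cursors]
    heap = []
    for idx, it in enumerate(iterators):
        record = next(it, _END)
        if record is not _END:
            heap.append((key(record), idx, record))
    heapq.heapify(heap)
    while heap:
        # The smallest head record decides which source is read next.
        _, idx, record = heapq.heappop(heap)
        yield record
        following = next(iterators[idx], _END)
        if following is not _END:
            heapq.heappush(heap, (key(following), idx, following))


# Hypothetical sample data, each list sorted by Date.
wines = [{"Date": "2013-01-02", "Type": "Wines"},
         {"Date": "2013-01-05", "Type": "Wines"}]
foods = [{"Date": "2013-01-01", "Type": "Foods"},
         {"Date": "2013-01-04", "Type": "Foods"}]
books = [{"Date": "2013-01-03", "Type": "Books"}]

merged = merge_sorted([wines, foods, books], key=lambda r: r["Date"])
first_three = list(islice(merged, 3))  # fetch only the first three records
```

Since the merge is a generator, taking the first few records reads only as much of each source as is needed, which loosely mirrors fetching a limited number of rows from a merged cursor.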

To learn more about the topic, see Merge and Join Operations on Cursors.