Cluster composite tables

The section The Server discussed cluster computing across a network of nodes. A composite table can be accessed through nodes, in which case it becomes a cluster composite table. To generate a cluster composite table, we first need to upload a composite table onto the server. Run datastore.exe under esProc\bin in esProc’s installation directory to open the following Data Store manager and upload the composite table onto the specified partitions in the nodes:

Upload the composite table file onto partition 0 of every node. The disk path of each partition can be configured in the server configuration file. The default path of partition 0 is D:/0. Now we can access the nodes to read the composite table file:

 

    | A
  1 | [192.168.10.229:8281]
  2 | =file@0("employees.ctx",A1)
  3 | =A2.create()
  4 | =A3.cursor().fetch()
  5 | =A3.cursor()
  6 | =A5.groups(Dept;count(~):Count)

In A2, the file(fn,h) function opens the composite table file fn on the nodes listed in h as a cluster file. Here only one node is involved. Because the composite table file is stored in partition 0, the @0 option is used in the file function. Here’s A2’s result:

In A3, the T.create() function opens the composite table’s base table, which in this case is a cluster composite table. An entity table of a cluster composite table is a cluster entity table, and an attached table T' can be retrieved from the cluster composite table with the T.attach(T') function. A cluster table is read-only but is otherwise computed in the same way as a local composite table. There are two types of cluster table: distribution tables and duplication tables. In A4, the T.cursor() function generates a cursor from the cluster composite table’s base table and fetches data from the cursor:

We can view the system information on the node window:

With a cluster composite table, only the main process on the involved node is responsible for supplying the file data and for outputting information about access to the cluster file.

Cursor-related functions such as cs.join(), cs.groupx() and cs.groups() also apply to a cursor generated from a cluster composite table. A6 performs grouping and aggregation with the cs.groups() function and gets the following result:

A cluster composite table can also be used to generate a cluster memory table, which utilizes the memory capacity:

 

    | A
  1 | [192.168.10.229:8281]
  2 | =file@n("D:/file/dw/employees.ctx",A1)
  3 | =A2.create()
  4 | =A3.memory(;right(Name,6)=="Garcia")
  5 | =A4.dup()
  6 | =A5(4)

A2 uses the @n option to open a cluster file. In this case, the function does not look for the file in a node partition; instead it locates the file by its file name. Similar to generating a memory table from a local composite table, the memory() function in A4 generates a cluster memory table. We can view the cluster memory table in A4:

A cluster memory table cannot be handled the way a table sequence is. But the T.dup() function can convert the cluster memory table T into a local memory table, which is what A5 gets:

This is an ordinary memory table. In the other direction, the T.dup(h) function can convert an ordinary memory table T into a cluster memory table.

In fact, employees.ctx is also stored in partition 0 in another node 192.168.10.245:8281. We can perform query using the two nodes. For example:

 

    | A
  1 | [192.168.10.245:8281, 192.168.10.229:8281]
  2 | =file@0("employees.ctx", A1)
  3 | =A2.create()
  4 | =A3.cursor@z()
  5 | =A4.fetch()

Here A2 opens the cluster file through two nodes. A4 uses the @z option to generate a cursor. With this option, reading the composite table file is split between the two nodes, while the cursor is used in the same way as one that reads from a single node. A5 fetches data from the cursor:

As the above computation shows, a cluster file can be stored redundantly on multiple nodes, in which case it is called a duplicate file. If instead the cluster file is split into multiple parts that are stored on different nodes, it is a distributed file. In the second case, every part must have the same file name on its node. Let’s take a look at how to use a distributed file in a cluster computation. First we need to create a distributed file, which stores different parts of a composite table file. For example:

 

    | A                                     | B
  1 | =file("D:/file/dw/employees.ctx")     | =file("D:/file/dw/orders.ctx")
  2 | =A1.create()                          | =B1.create()
  3 | =A2.attach(stable)                    | =B2.attach(otable)
  4 | =A3.cursor(EID,Name, OCount).fetch()  | =B3.cursor(Date,EID,Amount).sortx(Date,EID).fetch()
  5 | =file("D:/file/dw/1/salespart.ctx")   | =file("D:/file/dw/1/orderpart.ctx")
  6 | =A5.create(#EID, Name, OCount)        | =B5.create(#Date,#EID,Amount)
  7 | >A6.append(A4.cursor(1:2))            | >B6.append(B4.cursor(1:2))
  8 | =file("D:/file/dw/2/salespart.ctx")   | =file("D:/file/dw/2/orderpart.ctx")
  9 | =A8.create(#EID, Name, OCount)        | =B8.create(#Date,#EID,Amount)
 10 | >A9.append(A4.cursor(2:2))            | >B9.append(B4.cursor(2:2))

A4 retrieves seller records from composite table employees.ctx’s entity table stable:

B4 retrieves orders records from composite table orders.ctx’s entity table otable. The records are sorted first by Date and then by EID. Below are the records:

Cells A5~A10 split the seller data into two parts and store them in two composite table files with the same name, salespart.ctx, placed in different directories. The two composite table files will constitute a distributed file.

Cells B5~B10 split the order data in the same way and store it in two composite table files with the same name, orderpart.ctx, in different paths.
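The split performed in rows 7 and 10 can be pictured with a short Python sketch. It is only a model: the `segment` helper and the sample rows are hypothetical, and the boundary rule used here (contiguous, near-equal parts) is an assumption, not esProc's documented behaviour for `cursor(k:n)`.

```python
def segment(records, k, n):
    """Return the k-th of n contiguous, near-equal parts (k is 1-based)."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    size = len(records)
    start = (k - 1) * size // n
    end = k * size // n
    return records[start:end]


# Hypothetical order rows (Date, EID, Amount), already sorted by Date then EID.
orders = [
    ("2020-01-01", 3, 120.0),
    ("2020-01-01", 7, 80.5),
    ("2020-01-02", 1, 45.0),
    ("2020-01-03", 3, 300.0),
    ("2020-01-04", 9, 15.0),
]

part1 = segment(orders, 1, 2)  # stands in for D:/file/dw/1/orderpart.ctx
part2 = segment(orders, 2, 2)  # stands in for D:/file/dw/2/orderpart.ctx
```

Because the parts are contiguous slices of sorted data, concatenating them in order reproduces the original records, which is why the two halves later read from the nodes are continuous.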

Start two nodes, Node I (192.168.10.229:8281) and Node II (192.168.10.229:8291), on the same computer and upload the above files onto them with the Data Store manager. Node I receives the first parts of the two files, stored in D:/file/dw/1: salespart.ctx goes to partition 1 and orderpart.ctx to partition 0. Node II receives the second parts of the two files, stored in D:/file/dw/2: salespart.ctx goes to partition 2 and orderpart.ctx to partition 0:

Here is how we use the distribution cluster composite table:

 

    | A                              | B                                    | C
  1 | 192.168.10.229:8281            | 192.168.10.229:8291                  | [192.168.10.229:8281, 192.168.10.229:8291]
  2 | =file@0("orderpart.ctx", A1)   | =file@0("orderpart.ctx", B1)         | =file@0z("orderpart.ctx", C1)
  3 | =A2.create()                   | =B2.create()                         | =C2.create()
  4 | =A3.cursor().fetch()           | =B3.cursor().fetch()                 | =C3.cursor().fetch()
  5 |                                |                                      |
  6 | =file@z("salespart.ctx", A1)   | =file("D:/file/dw/2/salespart.ctx")  | =file@z("salespart.ctx", C1)
  7 | =A6.create()                   | =B6.create()                         | =C6.create()
  8 | =A7.cursor().fetch()           | =B7.cursor().fetch()                 | =C7.cursor().fetch()

On both nodes, partition 0 stores orderpart.ctx. A2 and B2 each name one node and use the file@0(fn, h) function, which means they open the same-named file in partition 0 on Node I and on Node II respectively. A3 and B3 open the two halves of the cluster composite table. Below are results of A4 and B4:

They are two different parts of the order data. Node I has the first half while Node II has the second half. The two parts of the data are continuous.

In C2, the file@0z(fn, h) function opens a distribution cluster file. The @0z options indicate that the distributed file consists of the files in partition 0 on both nodes. C3 opens the cluster composite table and C4 retrieves its data:

Accessing a cluster distribution table means getting all its parts stored on both nodes.
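The difference between reading a duplicate file and reading a distributed file can be modeled in a few lines of Python. This is a conceptual sketch only: `read_cluster_file` is a hypothetical helper, each node is represented by a plain list of records, and no esProc call is involved.

```python
def read_cluster_file(node_parts, distributed):
    """Model a read of a cluster file.

    node_parts: one list of records per node, in node-list order.
    distributed: True if each node holds a different part of the file,
                 False if every node holds the same full copy.
    """
    if not node_parts:
        return []
    if distributed:
        # Distributed file: all parts are needed, taken in node order.
        result = []
        for part in node_parts:
            result.extend(part)
        return result
    # Duplicate file: any single copy already holds all the data.
    return list(node_parts[0])


# Hypothetical data: Node I holds the first half, Node II the second half.
distributed_rows = read_cluster_file([[1, 2], [3, 4, 5]], distributed=True)

# Hypothetical data: both nodes hold the same full copy.
duplicate_rows = read_cluster_file([[1, 2, 3], [1, 2, 3]], distributed=False)
```

In the model, reading the distributed file returns the parts of both nodes one after the other, while reading the duplicate file returns each record once even though two nodes store it.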

To access a cluster file stored in other partitions, use the file@z(fn, h) function to open it. The @z option places a requirement on the partitions where the parts of a distribution cluster file are stored: following the order of node list h, the file must be stored in partition 1 on the first node, partition 2 on the second node, and so on. So A6, which specifies only one node, opens the file in partition 1 on that node. A8 returns the seller data:

We can’t directly read the file salespart.ctx in partition 2 on Node II. B6 opens the corresponding local composite table file and B8 retrieves the entity table containing seller data:

Node I stores 24852 records and Node II stores the rest of the data. The two parts are continuous.

C6 generates a cluster file from the distributed files stored in different partitions on the two nodes. C8 retrieves data from the physical table containing the seller data:

Now all data has been retrieved from the cluster composite table.
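The partition convention described for the @z and @0z options can be written down as a small Python model. The function names are hypothetical and the code only restates the rule given in the text (position i in the node list maps to partition i for @z, and every node maps to partition 0 for @0z); it does not call esProc.

```python
def partitions_for_z(nodes):
    """@z rule from the text: the i-th node in list h uses partition i (1-based)."""
    return {node: i for i, node in enumerate(nodes, start=1)}


def partitions_for_0z(nodes):
    """@0z rule from the text: every node in list h uses partition 0."""
    return {node: 0 for node in nodes}


NODE_I = "192.168.10.229:8281"
NODE_II = "192.168.10.229:8291"

both = partitions_for_z([NODE_I, NODE_II])   # as in C6
only_first = partitions_for_z([NODE_I])      # as in A6
only_second = partitions_for_z([NODE_II])    # Node II listed alone
orders_map = partitions_for_0z([NODE_I, NODE_II])
```

Under this model, a list that contains only Node II maps it to partition 1, not to partition 2 where its part of salespart.ctx is stored. That is consistent with the text reading that part as a local file in B6.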

The two distributed files salespart.ctx and orderpart.ctx are not split across the two nodes in matching ways. On Node I, salespart.ctx stores the first half of the seller records and orderpart.ctx stores the first half of the order records, but the order records are sorted by Date and then by EID, so the orders on a node do not necessarily belong to the sellers on that node. To join data in the two distributed files, we use the cs.sortx(…;x) function or the cs.groupx(…;x) function, which makes the distribution of the result match that of cluster cursor x. For example:

 

    | A                                                            | B
  1 | [192.168.10.229:8281, 192.168.10.229:8291]                   |
  2 | =file@z("salespart.ctx", A1)                                 | =file@0z("orderpart.ctx", A1)
  3 | =A2.create()                                                 | =B2.create()
  4 | =A3.cursor()                                                 | =B3.cursor()
  5 | =B4.sortx(EID; A4)                                           | =joinx(A4:s,EID;A5:o,EID)
  6 | =B5.new(s.EID:EID,s.Name:Name,o.Date:Date,o.Amount:Amount)   | =A6.fetch()

A4 and B4 generate cluster cursors for the two distribution tables, salespart.ctx and orderpart.ctx. A5 sorts the order data in B4 by EID, redistributes the sorted result according to the seller data, and generates a cluster cursor whose distribution matches that of the seller data cursor A4. B5 then joins the two matching cursors and A6 generates a cursor where data is in the desired structure. B6 fetches data from A6:

Now the originally unevenly stored cluster composite table becomes evenly distributed. We get the final desired result.
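The idea behind synchronizing the two distributions before the join can be illustrated with a Python model. Everything here is an assumption made for illustration: the helper names, the sample rows, and the choice to drop orders that have no seller. It shows the principle of moving each order to the node that holds its seller, not how esProc implements cs.sortx(…;x) or joinx().

```python
def redistribute(order_parts, seller_parts):
    """Sort all orders by EID and place each on the node holding its seller."""
    owner = {}
    for node, sellers in enumerate(seller_parts):
        for seller in sellers:
            owner[seller["EID"]] = node
    result = [[] for _ in seller_parts]
    all_orders = [order for part in order_parts for order in part]
    # sorted() is stable, so orders of one seller keep their Date order.
    for order in sorted(all_orders, key=lambda o: o["EID"]):
        node = owner.get(order["EID"])
        if node is not None:  # sketch-only choice: skip orders without a seller
            result[node].append(order)
    return result


def join_local(sellers, orders):
    """Join one node's sellers and orders on EID, entirely within that node."""
    by_eid = {s["EID"]: s for s in sellers}
    return [
        {"EID": o["EID"], "Name": by_eid[o["EID"]]["Name"],
         "Date": o["Date"], "Amount": o["Amount"]}
        for o in orders
    ]


# Hypothetical seller parts, split by EID across two nodes.
seller_parts = [
    [{"EID": 1, "Name": "Alice"}, {"EID": 2, "Name": "Bob"}],
    [{"EID": 3, "Name": "Carol"}],
]
# Hypothetical order parts, split by Date across the same two nodes.
order_parts = [
    [{"Date": "2020-01-01", "EID": 3, "Amount": 10.0},
     {"Date": "2020-01-02", "EID": 1, "Amount": 20.0}],
    [{"Date": "2020-01-03", "EID": 2, "Amount": 30.0},
     {"Date": "2020-01-04", "EID": 3, "Amount": 40.0}],
]

matched = redistribute(order_parts, seller_parts)
joined = [row
          for sellers, orders in zip(seller_parts, matched)
          for row in join_local(sellers, orders)]
```

After redistribution, every order sits on the same node as its seller, so each node can join its own data without consulting the other.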