11.7 Pseudo Table

Read(113) Label: pseudo table,

We can use data in a composite table by not only accessing it directly but by defining a pseudo table. A pseudo table is not a physical table but one that retrieves data from an existing composite table according to the current definition.

11.7.1 Basic uses

For the conveniene of illustration, we first generate three composite tables. The employee table consisting of fields of employee ID, department ID, gender, marital status and name, among which gender and marital status are stored together in one field named Bools. Both of the other two tables store orders in fields of salespeople ID, order code, datetime and amount. Below is the code for generating the composite tables:

 

A

B

1

=demo.query("select NAME, SURNAME, GENDER, STATE from employee")

 

2

=A1.select(GENDER=="M")

=A1\A2

3

=A2.(NAME).id()

=A3.len()

4

=B2.(NAME).id()

=A4.len()

5

=A1.(SURNAME).id()

=A5.len()

6

=A1.(STATE).id()

=A6.len()

7

[Sales,Technology,R&D,Financial,Admin]

[0,0.5,0.75,0.9,0.97,1]

8

=to(1000).new(#:EID, B7.pseg(rand()):DeptID, if(rand()<0.5,0,1):Gender, if(rand()>0.8,1,0):Married, if(Gender==0,A3(rand(B3)+1), A4(rand(B4)+1))/" "/A5(rand(B5)+1):Name, bits(Gender, Married):Bools )

 

9

=A8.select(DeptID==1)

=A9.len()

10

2020-01-01

2021-01-01

11

=periods@x(A10,B10)

=periods@x(B10,elapse@y(B10,1))

12

=A11.((a=string(~,"yyMMdd"), to(rand(100)+ 10).new(A9 if(rand()>0.9, (rand(B9)+1, rand(B9-20)+ 21)).EID:SID, a/string(#, "0000"):OID, datetime(A11.~, time(rand(8)+8, rand(60), 0)):OTime, rand(100)*10+200:Amount))).conj().sort(SID, OID)

 

13

=B11.((a=string(~,"yyMMdd"), to(rand(100)+ 10).new(A9 if(rand()>0.9, (rand(B9)+1, rand(B9-20)+ 21)).EID:SID, a/string(#, "0000"):OID, datetime(B11.~, time(rand(8)+8, rand(60), 0)):OTime, rand(100)*10+200:Amount))).conj().sort(SID, OID)

 

14

=file("D:/file/pseudo/emps.ctx")

=A14.create(#EID,DeptID,Gender, Married, Name, Bools)

15

 

>B14.append@i(A8.cursor())

16

=file("D:/file/pseudo/1.orders.ctx")

=file("D:/file/pseudo/2.orders.ctx")

17

=A16.create(SID, #OID, OTime, Amount)

>A17.append(A12.cursor())

18

=B16.create(SID, #OID, OTime, Amount)

>A18.append(A13.cursor())

19

>B14.close()

 

20

>A17.close()

>A18.close()

The given data is of small size. In Gender field, 0 represents male and 1 represents female; and in Married field, 0 means unmarried and 1 means married. A8 generates employee data as follows:

Below are orders data tables of years 2020 and 2021 A12 and A13 generate respectively:

The three tables are stored in three composite table files – emps.ctx, 1.orders.ctx and 2.orders.ctx – respectively in the pseudo path. The last two composite table files form a multi-zone composite table and we’ll take them as examples to look at how to define and use pseudo tables. Now we backup them in case data is modified later during the testing.

 

A

B

C

1

=create(file).record(["pseudo/emps.ctx"])

=pseudo(A1)

=B1.cursor().fetch@x(100)

2

=create(file, zone).record(["pseudo/orders.ctx", [1,2]])

=pseudo(A2)

=B2.import()

A pseudo table definition record is a table sequence’s record of specified structure. Its simplest form should include a file field to specify data source of the pseudo table. In the above code, A1 generates a pseudo table definition record as follows:

The source of data in a pseudo table can be a composite table or a bin file.

Based on the above record, we use pseudo(pd) function to officially define a pseudo table. Parameter pd is the pseudo table definition record. B1 generates a pseudo table definition:

Like a composite table file, a pseudo table definition feeds data using T.cursor() function that generates a cursor or T.import() function that retrieves all data. C1 retreives the first 100 records using the cursor:

A pseudo table can also be generated from a multi-zone composite table file. A2 in the above code generates a pseudo table definition as follows:

C2 retrieves data from this pseudo table:

Data is retrieved from the multi-zone composite table file in the original order .

We can use delete/update/append functions to delete, update or add one or more record from/in/to a pseudo table. The action will directly entail a corresponding change in the source composite table file. For example:

 

A

B

1

=create(file).record(["pseudo/emps.ctx"])

=pseudo(A1)

2

=B1.select(DeptID==2 && EID<10)

=A2.import()

3

>B2.run(DeptID=0)

=B1.update(B2)

4

=B1.import()

=A2.import()

5

>B1.delete(B2)

=B1.import()

6

>B2.run(DeptID=2, EID=EID+1000)

=B1.append(B2.cursor())

7

=B1.import()

 

Like using a composite table, we can select certain records from a pseudo table through the T.select() function. In the above code, A2 selects employee records where DeptID is 2 and EID is less than 10, and B2 retrieves them using the T.import() function, as shown below:

A3 resets values of DeptID field of the selected employee records as 0, and B3 updates these modified records stored in record sequence P to the pseudo table using T.update(P) function. Keep in mind that, when you use delete/update/append funcitons to maintain a pseudo table, you must first define a dimension for the corresponding composite table, because all these operations are performed based on the composite table’s dimension. With the modification, A4 and B4 get the following results:

We can see that corresponding records are modified. Now as no records can meet the filtering conditions specified in A2, the record sequence returned by B4 is empty.

A5 deletes records of record sequence P from pseudo table T using T.delete(P) function according to the composite table’s dimension. Then B5 gets records from the pseudo table:

In the above result, B2’s 2nd and 8th records are deleted.

A6 changes DeptID values of B2’s two records to their original values and adds 1000 to each EID value. B6 uses T.append(cs) function to create a cursor for B2’s pseudo table and add data of the cursor to the original pseudo table. Below is A7’s result:

We see that the newly-added records are placed at the end of the composite table. When there are multiple source composite tables for a pseudo table, the appending operation will add records to the last composite table. And, if EID values of the newly-added records are not modified, dimension of the composite table containing these records cannot remain ordered and the pseudo table cannot be updated.

We can also generate a pseudo table from one or more bin files using the same way of generating one from composite table. For example:

 

A

B

1

=create(file).record(["D:/files/txt/PersonnelInfo.btx "])

=pseudo(A1)

2

=B1.cursor().fetch@x(100)

 

3

=B1.select(City=="Columbus" && ID<500)

=A3.import()

A2 uses the cursor to access B1’s pseudo table and returns the following result:

A3 selects employees coming from Columbus city and with IDs less than 500. B3 imports records from A3 and returns result as follows:

11.7.2 Pseudo table merge

Usually, data is stored and used by user in a multi-zone composite table file, such as the orders records stored by salespeople in the previous section., and will be grouped by the first field. For example:

 

A

1

=create(file,zone).record(["pseudo/orders.ctx", [1,2]])

2

=pseudo(A1)

3

=A2.import()

4

=A2.group(year(OTime)).import()

5

=A4.new(SID,year(OTime):Year,sum(Amount):Total)

6

=A2.groups(SID, year(OTime):Year;sum(Amount):Total)

A1 generates a pseudo table definition record:

And data is retrieved from the multi-zone composite table file by zone tables in order. A3 imports data from the pseudo table as follows, which has been displayed in the previous section:

 

A4 groups the pseudo table by year:

Data is grouped by the specified field or expression. Summing amounts in A5 based on the grouped pseudo table can make this easy to understand, and below is the result:

We can also use T.groups() function to perform the above grouping and sum operation. Note that this method groups data by two layers of fields – SID and order year. Here’s A6’s result:

In a multi-zone composite table, data is stored in zone tables generally by a time type field. Each zone table contains data of a certain time range. Such a time field is called zone field or zone column, which can be set using a date field. Zone tables on the whole – rather than data in each separate zone table – need to be ordered by the date field when the multi-zone composite table file is in use. The interval of date field values will be recorded for each zone table at creation of the pseudo table and used to automatically locate the corresponding zone table file for a query or filter operation. For example:

 

A

1

=create(file,zone,user,date).record(["pseudo/orders.ctx", [1,2],"SID","OTime"])

2

=pseudo(A1)

3

=A2.import()

4

=A2.group(year(OTime)).import()

5

=A4.new(SID,year(OTime):Year,sum(Amount):Total)

6

=A2.select(OTime>date(2020,12,20) && OTime<date(2021,1,10))

7

=A6.import()

A1 generates a pseudo table definition record:

Once date field OTime is defined, the pseudo table will definitely identify that data is stored in zone tables by this zone field. Data retrieved from different zone tables of the pseudo table will thus be merged together automatically by the first field and returned. Below is A3’s result of importing data from the pseudo table:

In this result table, records of same salesperson in different years that originally stored in different zone tables are now arranged together.

The first field of data stored in a pseudo table, especially one generated from a multi-zone composite table, is usually seller, customer, account code and so on. The field is called user field. Pseudo tables store data of various user fields. When a pseudo table is generated based on data of a multi-zone composite table and the zone field is already defined, the first field will be recognized as the default user field and a merge by this first field will be performed for a query.

A4 groups data by order year and gets the following result:

As zone field is defined, the grouping operation on the pseudo table groups data first by the first field and then by the specified field/expression. A5 sums sales amounts according to the grouped pseudo table and gets the following result:

A6 queries records of orders whose dates belong to a time period between 2020-12-20 and 2021-01-10. A7 gets the desired records:

Data retrieved from zone tables will be merged by the first field if it is already filtered by zone field. A filtering by zone field can automatically find out the target zone tables from which data will be retrieved and merge data by the first field.

As can be seen from the previous example, data is retrieved from zone tables of the multi-zone composite table by the first field segment by segment and then merged to get result. But at times segmenting data by the first field leaves unevenly distributed data or generates groups containing relatively small data. In the previous example, the number of order records of the first 20 salespeople is small and that of the other salespeople is not large. In such cases, the merge operation is likely to frequently switch between zone tables for retrieving data because each segment contains relatively small data or there are too many segments, resulting in low efficiency of retrieving data from each zone table. In order to improve the situation, we can restructure data by setting “larger” groups on the first field in an effort to get higher efficiency. For example:

 

A

B

1

=demo.query("select distinct(NAME) from CITIES")

=A1.(NAME)

2

=create(file,zone).record(["pseudo/orders.ctx", [1]])

=pseudo(A2).import()

3

=create(file,zone).record(["pseudo/orders.ctx", [2]])

=pseudo(A3).import()

4

=B1.len()

=1000.(B1(rand(A4)+1))

5

=B2.new(B4(SID):City, SID, OID, OTime, Amount)

=B3.new(B4(SID):City, SID, OID, OTime, Amount)

6

=A5.sort(City)

=B5.sort(City)

7

=file("pseudo/1.orders2.ctx")

=file("pseudo/2.orders2.ctx")

8

=A7.create(City, SID, #OID, OTime, Amount)

>A8.append(A6.cursor())

9

=B7.create(City, SID, #OID, OTime, Amount)

>A9.append(B6.cursor())

The original order data does not contain city data (City), but the above testing data adds City information for the salespeople, sorts data by City, SID and OID in order and stores data in multi-zone composite table orders2.ctx. In real-world situations, a first field by which data will be first grouped can contain cities people come from, departments employees work in, transaction type codes, banks via which orders are paid, etc. Data should be arranged by the first field and then the user field.

Now we perform a query using pseudo table orders2.ctx:

 

A

B

1

=create(file,zone, date).record(["pseudo/orders2.ctx", [1,2], "OTime"])

=pseudo(A1).import()

2

=create(file,zone, user, date).record(["pseudo/orders2.ctx",  [1,2] , "SID", "OTime"])

=pseudo(A2).import()

A1’s pseudo table gets data from the multi-zone composite table and merges them by the first field. Here’s B1’s result:

The retrieved data is merged by the multi-zone composite table’s first field City, during which records of two years of same city will be put together but won’t be sorted by seller ID.

In this case, we can set a user field user. A2 specifies SID as the user field when defining a pseudo table. Now the pseudo table gets data in B2 as follows:

In the result table, data is merged first by the first field – as the user field is already defined, during which records having same first field values are sorted the second time by the user field.

User-defined fields

Apart from using fields of the original data tables, we can define fields through adding column field in the pseudo table definition record. For example:

 

A

1

[Sales,Technology,R&D,Financial,Admin]

2

=create(name,pseudo,enum).record(["DeptID","Dept",A1])

3

=create(name,bits).record(["Bools",["IfMarried","IfLady"]])

4

=create(file,column).record(["pseudo/emps.ctx",A2|A3])

5

=pseudo(A4)

6

=A5.select(Dept=="Sales").import()

7

=A5.import(EID, Name, DeptID, Dept)

8

=A5.select(!IfLady && IfMarried).import()

9

=A5.import(EID, Name, Gender, IfLady,Married,IfMarried)

Two fields are defined in the above code. A2 defines a Dept field:

In the column definition record, name is a field name. When it is a field in the original data table, it is a real field. It also can be a pseudo field generated from a real field. If values of the real field DeptID need to be transformed into corresponding department names, we call such a pseudo field enumerated pseudo field. In the column definition record again, we define enumerated pseudo field name through the pseudo field. The transformation converts department IDs into corresponding values in a specified sequence, which is configured under enum field, by matching IDs to values of the sequence. The matching begins from place 1.

A3 defines another field. In the original data table, Bools field stores data of two binary fields – Gender and Married. Below is A3’s column definition record:

The binary field that corresponds to each bit will be retrieved according to the definition. The pseudo fields under Bools field are called binary dimension pseudo fields. Their names are defined through bits field in the column definition record. Pseudo field names cannot be same as original fields, we use new names IfMarried and IfLady. The definition arranges fields under Bools from low order to high order. Values of each field are bool value true or false. A binary dimension pseudo field can store 16 binary fields at most.

A4 defines a pseudo table by setting data source (file field) and column definitions (column field):

A5 generates a pseudo table according to A4’s definition. The previously defined column field will play its part for getting data for the pseudo table. A6 selects employees in Sales department using the enumerated pseudo field Dept:

Here T.import() function retrieves records from the pseudo table and returns only the original fields by excluding the pseudo fields. Yet the selection according to the pseudo field is successful as the DeptID for all employees is 1.

In order to return a pseudo field, we need to specify the desired fields in T.import() function. A7, for instance, returns the following query result:

The result set shows clearly the correspondence between values of the enumerated pseudo field and those of the corresponding real field.

A8 performs the filtering operation to retrieve records of married male employees using the binary dimension pseudo field and return the following result:

A query on binary dimension pseudo field needs to specify the column names. Below is A9’s result:

We can also use delete/update/append functions to maintain a pseudo table using user-defined fields:

 

A

1

[Sales,Technology,R&D,Financial,Admin]

2

=create(name,pseudo,enum).record(["DeptID","Dept",A1])

3

=create(name,bits).record(["Bools",["IfMarried","IfLady"]])

4

=create(file,column).record(["pseudo/emps.ctx",A2|A3])

5

=pseudo(A4)

6

=A5.select(EID<5)

7

=A6.select(Dept=="Sales").import(EID,Name,Dept,IfMarried,IfLady)

8

>A7.run(Dept="Technology", IfLady=!IfLady)

9

>A5.update(A7)

10

=A5.import(EID,Name,Dept,IfMarried,IfLady)

11

=A5.import()

A5’s pseudo table uses an enumerated pseudo field and a binary dimension pseudo field. A7 selects employee records where EID is less than 5 and gets the following result:

A8 modifies their departments and genders, as well as executes the update. A10 retrieves records from the updated pseudo table. Note that Dept field and IfLady field of the first three records have been changed after update in the following A10’s result:

We can check in A11 the new data after modification in the composite table file the pseudo table is using:

The update automatically calculates the actual data according to the current values when trying to change values in a pseudo table’s pseudo field, and updates the data to the corresponding composite table. As the update action is based on Bools instead of Gender and Married fields used for look-up, their values become null after the update.

 

At times a field in the original data table associates with records of another table. The field is thus called foreign key column. For instance:

 

A

1

=create(file).record(["pseudo/emps.ctx"])

2

=pseudo(A1).select(DeptID==1).memory().keys(EID)

3

=create(name,dim,fkey).record(["SID",A2,["SID"]])

4

=create(name,exp).record(["OYear","year(OTime)"])

5

=create(file,zone,column).record(["pseudo/orders.ctx",[1],A3|A4])

6

=pseudo(A5)

7

=A6.select(left(SID.Name, 4)=="Jack").import(SID, OTime, OYear, Amount)

8

=A7.new(SID.EID:SID, SID.Name:Name, OTime, OYear, Amount)

9

=A8.import(SID.EID:SID, SID.Name:Name, OTime, OYear, Amount)

A2 gets records of employees in Sales department from the employee pseudo table, loads them into the memory as an in-memory table using T.memory() function, and sets EID as the in-memory table’s primary key:

A3 adds a user-defined pseudo table field:

When a pseudo table field is a foreign key field, we should set dimension table field dim and foreign key field fkey for it. In this example the real field is the foreign key field, so the fkey configuration can be omitted. Here the dimension table field is A2’s salespeople table. A dimension table can be an in-memory table generated from a pseudo table, or any ordinary memory table, table sequence or clustered in-memory table. Here name is the name of the foreign key field, which, in the current case, is the original field name SID. We can also set a new name for the foreign key field, like Seller.

A4 adds a user-defined field:

The field name OYear does not exist in the multi-zone composite table file. It is an ordinary pseudo field. We can define its expression under exp field in the column definition record to calculate its values from a real field.

A5 generates a pseudo table definition record:

A7 gets orders records of an employee named Jack from A6’s pseudo table:

Values of OYear are displayed in the result table. As we mentioned, querying a pseudo field requires specifying the target field name explicitly. In the result set, SID field values are transformed to corresponding records in the dimension table. Just double-click a value to see the corresponding record. For the convenience of viewing, A8 generates a reference table using A6’s query result set:

Besides filtering, a pseudo table’s foreign key field and pseudo field can also be used for queries by specifying the desired fields in import function. For instance, A9 gets salespeople’s IDs and names from the dimension table and orders datetime and amount from the primary table, as shown below:

During the association between a pseudo table’s foreign key field and the dimension table, the latter’s joining field can be a time key, as the following join program shows:

 

A

B

1

=demo.query("select distinct(NAME) from CITIES")

=A1.(NAME)

2

=create(City, FactoryID, PTime)

=8.(elapse@m(date(2020,1,1), (~-1)*3)+rand(3))

3

>B1.run(B2.(if(#<2 || rand()>0.5, A2.insert(0, B1.~, rand(5)+1, B2.~))))

>file("pseudo/FacSupply.btx").export@b(A2)

4

=create(name,dim,fkey,tkey).record(["Product",A2, ["City"], "OTime"])

>A2.keys@t(City;PTime)

5

=create(file,zone,user,date,column).record(["pseudo/orders2.ctx",[1,2],"SID","OTime", A4])

=pseudo(A5)

6

=A5.select(Amount>1100)

=A6.import()

7

=A6. import(City, SID, OTime, Amount, Product.FactoryID:Factory, Product.PTime:PTime)

 

B1 generates a sequence of city names. A2 creates a product supply table. A3 prepares testing data where a random one of five supplying factories supplies products to a certain city at the beginning of each quarter and by 2020-01-01 the first round of supply to all cities should be finished. After A3’s data insertion is completed, A2 has the following supply records:

B3 stores the supply file as a bin file named FacSupply.btx for the convenience of use. B4 sets primary key for table sequence A2, where City is the basic key and PTime is the time key.

A4 adds a user-defined field:

The corresponding dimension table of foreign key field Product is A2’s table sequence. A pseudo table field is set under fkey to correspond to the dimension table’s ordinary key. Since the table sequence has a time key, a pseudo table field needs to be set under tkey to point to this time key.

Below is A5’s pseudo table definition record:

B5 creates a pseudo table using A5’s definition. A6 queries B5 to get order records where Amount is greater than 1100. B6 imports records from A6:

In the retrieved records, Product field contains the supply record corresponding to each batch of products.

In order to view details of the result table of foreign key association, A7 retrieves specified fields from A6’s transaction records, including the factory ID and supply time, as shown below:

According to the above result table, each order record can relate and obtain the its product supply information through the association between time keys.

 

A pseudo table can be generated from in-memory table sequence, in-memory table and cluster in-memory table, too. With these source tables, definition of the pseudo table structure uses a var field to specify name of the variable stored in the in-memory table rather than using the file field to specify the source file name. For example:

 

A

1

>tab=demo.query("select * from CITIES")

2

=create(var).record(["tab"])

3

=pseudo(A2)

4

=A3.select(STATEID==3)

5

=A4.import()

A1 retrieves CITIES table from the database and stores the result table sequence in variable tab. A2 defines a pseudo table using an in-memory table. A4 gets information of cities whose STATEID is 3. A5 imports the selected data as follows:

A pseudo table defined based on an in-memory table has same uses as any other pseudo tables.