groups()

Read(452) Label: groups,

Here’s how to use groups() function.

A .groups()

Description:

Group a table sequence and then get the aggregate result cumulatively.

Syntax:

A.groups(x:F,…;y:G,…)

Note:

The function groups and aggregates table sequence A simultaneously by one or multiple fields/expressions to generate a new table sequence with F,.. G… as the fields. Namely, during the traversal through members of A, they will be placed to the corresponding result set one by one while a result set is aggregated cumulatively. Compared with the method of first grouping and then aggregation represented by A.group(x:F,…;y:G,…) function, the function has a better performance.

Options:

 

@o

Group records by comparing adjacent ones, which is equal to the merging operation, and the result set won’t be sorted.

 

@n

x gets assigned with group numbers which can be used to define the groups. @n and @o are mutually exclusive.

 

@u

Do not sort the result set by x. It doesn’t work with @o/@n

 

@i

x is a boolean expression. If the result of x is true, then start a new group. There is only one x.

 

@m

Use parallel algorithm to handle data-intensive or computation-intensive tasks; no definite order for the records in the result set; can’t be used with @o and @i options.

 

@0

Discard the group over which the result of grouping expression x is null

 

@h

Used over a grouped table with each group ordered to speed up grouping

@t

Return an empty table sequence with data structure if the grouping and aggregate operation over the sequence returns null

 

@b

Enable returning a resut set containing aggregates only but not subgroups

 

Parameters:

A

A sequence.

x

Grouping expression.

F

Field name of the result table sequence.

y

y is the function with which A is traversed. When y is an aggregate function, it only supports sum/count/max/min/top/avg/iterate/icount/median. When the function work with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted. When y isn’t an aggregate function, perform calculation over only the first record in each group.

G

Summary field name in the result table sequence.

Return value:

Post-grouping table sequence.

Example:

 

A

 

1

=demo.query("select * from SCORES where CLASS = 'Class one'")

2

=A1.groups(STUDENTID:StudentID;sum(SCORE):TotalScore)

group by a single field.

3

=demo.query("select * from SCORES")

 

4

=A3.groups(CLASS:Class,STUDENTID:StudentID;sum(SCORE):TotalScore)

group by multiple fields.

5

=A3.groups@m(STUDENTID:StudentID;sum(SCORE):TotalScore)

Use @m option to increase performance of big data handling.

6

=A3.groups@o(STUDENTID:StudentID;sum(SCORE):TotalScore)

Only compare and merge with the neighboring element, and the result set is not sorted.

7

=demo.query("select * from STOCKRECORDS where STOCKID<'002242'")

 

8

=A7.groups@n(if(STOCKID=="000062",1,2):StockID;sum(CLOSING):TotalPrice)

The value of x is the group sequence number.

9

=demo.query("select * from EMPLOYEE")

 

10

=A9.groups@u(STATE:State;count(STATE):TotalScore)

Do not sort result set by the sorting field.

11

=A9.groups@i(STATE=="California":IsCalifornia;count(STATE):count)

Start a new group when STATE=="California".

12

=A3.groups(CLASS:Class,STUDENTID:StudentID;iterate(~~*2,10): Score1)

Perform iterate operation within each group

13

=file("D:\\Salesman.txt").import@t()

14

=A13.groups@0(Gender:Gender;sum(Age):Total)

Discard groups where Gender values are nulls

15

=file("D:/emp10.txt").import@t()

For data file emp10.txt, every 10 records are ordered by DEPT

16

=A15.groups@h(DEPT:dept;sum(SALARY):bouns)

A15 is grouped and ordered by DEPT, for which @h option is used to speed up grouping

17

=A4.groups(STUDENTID:StudentID;SUBJECT,sum(SCORE):SUMSCORE)

Parameter y isn’t an aggregate function, so the function performs operation over the first record; the result is English

18

=demo.query("select * from SCORES where CLASS = 'Class three'")

Return an empty table sequence

19

=A18.groups@t(STUDENTID:StudentID;sum(SCORE):TotalScore)

Return an empty table sequence with the data structure

20

=A1.groups@b(STUDENTID:StudentID;sum(SCORE):TotalScore)

Perform grouping & aggregate over A1’s table sequence and return a column of aggregates

 

Related functions:

A.group(xi,)

A.group(x:F,...;y:G,…)

ch. groups()

Description:

Group records in a channel.

Syntax:

ch.groups(x:F,…;y:G…;n)

Note:

The function groups records in channel ch according to grouping expression x, by which the records are ordered, to get a channel having F,...G,… fields.

Sort records in the new channel by x. Values of G field are the results of computing expression y, which is an aggregate function executed on ch, over each group. It aims to fetch the grouping result set from the channel.

Options:

@n

With the option the value of expression x is a group number, which points to the desired group

@u

Won’t sort the resulting set by expression x; the option and @n are mutually exclusive

Parameters:

ch

Channel

x

Grouping expression, by which an aggregation over the whole grouped set is performed if x:F is omitted. In that case the semicolon “;” should not be omitted

F

Field names of the resulting table sequence

y

An aggregate function on channel ch, which only supports sum/count/max/min/top /avg/iterate; the parameter Gi should be given up if function iterate(x,a;Gi,…) is used

G

The aggregate fields in the resulting table sequence

Return value:

Channel

Example:

 

A

 

1

=demo.cursor("select * from EMPLOYEE ")

 

2

=channel()

Create a channel

3

=channel()

Create a channel

4

=channel()

Create a channel

5

=channel()

Create a channel

6

=A1.push(A2,A3,A4,A5)

Be ready to push the data in A1’s cursor into channel A2, A3 and A5, but the action needs to wait.

7

=A2.groups(;sum(SALARY):TotalSalary)

As x:F is omitted, calculate the sum of salaries of all employees

8

=A3.groups(DEPT:dept;sum(SALARY):TotalSalary)

Group and sort records by DEPT field

9

=A4.groups@n(if(GENDER=="F",1,2):SubGroups;sum(SALARY):TotalSalary)

The value of x is group number; put records where GENDER is “F” into the first group and others into the second group, and then aggregate each group

10

=A5.groups@u(STATE:State;count(STATE):count)

Won’t sort the resulting set by grouping field

11

A1.select(month(BIRTHDAY)==2)

 

12

A11.fetch()

Attach a fetch operation to A11’s cursor

13

=A2.result()

14

=A3.result()

15

=A4.result()

16

=A5.result()

cs. groups()

Description:

Group records in a cluster cursor, sort them by the grouping field and peform aggregation over each group and add each aggregate to the result set.

Syntax:

cs.groups(x:F,…;y:G…;n)

Note:

The function groups records in a cluster cursor by expression x, sorts result by the grouping field, and calculates the aggregate value on each group.

This creates a new table sequence consisting of fields F,...G,… and sorted by the grouping field x.The G field gets values by computing y – the aggregate function with which records of cs is aggregated – on each group.

Options:

@c

Perform the group operation over data in every node and compose the result sets into a cluster memory table in the segmentation way of the cursor; suppport a cluster dimension table

@d

Copy the result sets into nodes to compose a cluster duplicate memory table

Parameters:

cs

Records in a cluster cursor

x

Grouping expression; if omitting parameters x:F, aggregate the whole set; in this case, the semicolon “;” must not be omitted

F

Field name in the result table sequence

y

An aggregate function on cs, which only supports sum/count/max/min/top /avg/iterate; when the function works with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted

G

Aggregate field name in the result table sequence

n

The specified maximum number of groups; stop executing the function when the number of data groups is bigger than n to prevent memory overflow; the parameter is used in scenarios when it is predicted that data will be divided into a large number of groups that are greater than n.

 

Return value:

A table sequence/cluster memory table

Example:

 

A

 

1

=file("emp1.ctx":[2],"192.168.0.111:8281")

Below is emp1.ctx:

2

=A1.open()

Open a cluster composite table

3

=A2.cursor()

Return a cluster cursor

4

=A3.groups(Dept:dept;count(Name):count)

Group data by DEPT and perform aggregation

 

 

 

A

 

1

[192.168.0.110:8281,192.168.18.143:8281]

 

2

=file("emp.ctx":[1,2], A1)

 

3

=A2. open ()

Open a cluster composite table

4

=A3.cursor()

Create a cluster cursor

5

=A4.groups(GENDER:gender;sum(SALARY):totalSalary)

Group data by GENDER and perform aggregation and return result as a table sequence

6

=A3.cursor()

 

7

=A6.groups@c(GENDER:gender;sum(SALARY):totalSalary).dup()

Retain the way of segmentation of the distributed cursor and return a cluster memory table

8

=A3.cursor()

 

9

=A8.groups@d(GENDER:gender;sum(SALARY):totalSalary).dup()

Copy the result sets onto nodes to compose a cluster duplicate memory table

Related functions:

A.group(xi,…)

A.group(x:F,…;y:G,…)

A.groups()

cs.groupx()

cs.groups()

Description:

Group records in a cursor.

Syntax:

cs.groups(x:F,…;y:G…;n)

Note:

The function groups records in a cluster cursor by expression x, sorts result by the grouping field, and calculates the aggregate value on each group. This creates a new table sequence consisting of fields F,...G,… and sorted by the grouping field x.The values of F field are the value of x field of the first record in each group and G field gets values by computing y – the aggregate function with which records of cs is aggregated – on each group. The aggregation over a cluster cursor will first be performed by the main process on the local machine and the result will then be returned to the machine that initiates the invocation; the process is called reduce.

Options:

@n

The value of grouping expression is group number used to locate the group; you can use n to specify the number of groups and generate corresponding number of zones first.

@u

Do not sort the result set by the grouping expression; it doesn’t work with @n

@o

Compare each record only with its neighboring record to group, which is equivalent to the merge operation, and won’t sort the result set

 

@i

With this option, the function only has one parameter x that is a boolean expression; start a new group if its result is true

 

@h

Used over a grouped table with each group ordered to speed up grouping

 

@b

Enable returning a resut set containing aggregates only but not subgroups

 

Parameters:

cs

Records in a cursor/cluster cursor

x

Grouping expression; if omitting parameters x:F, aggregate the whole set; in this case, the semicolon “;” must not be omitted

F

Field name in the result table sequence

y

An aggregate function on cs, which only supports sum/count/max/min/top/avg/iterate; when the function works with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted

G

Aggregate field name in the result table sequence

n

The specified maximum number of groups; stop executing the function when the number of data groups is bigger than n to prevent memory overflow; the parameter is used in scenarios when it is predicted that data will be divided into a large number of groups that are greater than n; it doesn’t work with @n option. Support a multicuror when this parameter is absent.

Return value:

Post-grouping table sequence

Example:

 

A

 

1

=demo.cursor("select * from SCORES where CLASS = 'Class one'")

 

2

=A1.groups(;sum(SCORE):TotalScore)

As parameters x:F absent, calculate the total score of all students

3

=demo.cursor("select * from FAMILY")

 

4

=A3.groups(GENDER:gender;sum(AGE):TotalAge)

Group and order data by specified fields

5

=demo.cursor("select * from STOCKRECORDS where STOCKID<'002242'")

 

6

=A5.groups@n(if(STOCKID=="000062",1,2):SubGroups;sum(CLOSING):ClosingPrice)

The value of grouping expression is group number; put records whose STOCKID is “000062” to the first group and others to the second group; and meanwhile aggregate each group

7

=demo.cursor("select * from EMPLOYEE")

 

8

=A7.groups@u(STATE:State;count(STATE):Total)

The result set won’t be sorted by the grouping field

9

=demo.cursor("select * from EMPLOYEE")

 

10

=A9.groups@o(STATE:State;count(STATE):Total)

Compare each record with its next neighbor and won’t sort the result set

11

=demo.cursor("select * from EMPLOYEE")

 

12

=A11.groups@i(STATE=="California":IsCalifornia;count(STATE):count)

Start a new group if the current record meets the condition STATE=="California"

13

=file("D:/emp10.txt").cursor@t()

For data file emp10.txt, every 10 records are ordered by DEPT

14

=A13.groups@h(DEPT:DEPT;sum(SALARY):bouns)

As A13 is grouped and ordered by DEPT, use @h option to speed up grouping

15

=demo.query("select  * from employee")

 

16

=A15.cursor@m(3)

Return a multicursor

17

=A16.groups(STATE:state;sum(SALARY):salary)

Group the multicursor with groups function

18

=demo.cursor("select * from scores where class = 'Class one'")

Return a cursor

19

=A18.groups@b(STUDENTID:StudentID;sum(SCORE):TotalScore)

Perform grouping & aggregate over the table sequence stored in A18’s cursor and return a column of aggregates

T.groups( x : F,… ; y : G …; n )

Description:

Group and summarize records in a pseudo table.

Syntax:

T. groups(x:F,…;y:G…;n)

Note:

The function groups records in a pseudo table, sorts groups by the grouping field, and perform aggregates group by group in the sorted order to return to the result set.

It returns a table sequence ordered by grouping expression x and consisting of F,... fields and G,… fields, which are the results of performing aggregate function y on each group.

Options:

@n

Enable locating a value according to parameter x, which is a group number in this case and allow generating corresponding space first; n is the configured number of total groups

@u

Disable sorting the result set by the grouping expression; mutually exclusive with @n option

@o

Enable grouping by comparing the current grouping field value with the next nearest value, which is equivalent to a merge,f and disable sorting on the result set

@i

Enable creating a new grouping when the result of the only parameter x, which is a boolean expression, is true

@h

Used to segment ordered data to increase grouping efficiency

@b

Enable returning a result set containing only the aggregate column without grouped data columns

Parameters:

T

A pseudo table

x

Grouping expression; perform aggregate on the whole table when parameters x:F are absent (the semicolon should be retained though)

F

Field names in the result table sequence

y

An aggregate function, which is um/count/max/min/top/avg/iterate; to use iterate(x,a;Gi,…) function, parameter Gi should be absent

G

The aggregatge field name in the restul table sequence

n

The specified maximum number of groups to be divided into; stop the grouping action when the number of existing groups exceeds n. It is pesent to prevent memory overflow when the estimated number of groups is rather large; generally the real number of groups is slightly greater than n

Return value:

A table sequence

Example:

 

A

 

1

=file("emp.ctx")

 

2

=A1.open()

 

3

=A2.pseudo()

Generate a pseudo table

4

=A3.groups(DEPT:dept;avg(SALARY):AVG_SALARY)

Group records of A3’s pseudo table by DEPT field, calculate average on SALARY field in each group, and return result as a table sequence consisting of dept field and AVG SALARY field and ordered by dept

5

=A3.groups@n(if(GENDER=="F",1,2):GenderGroup;avg(SALARY):AVG_SALARY)

Devide records of A3’s pseudo table into two groups according to whether or not GENDER is F, and calculate average salary for each group

6

=A3.groups@u(DEPT:dept;avg(SALARY):AVG_SALARY)

Use @u option to skip sorting on the grouping result

7

=A3.groups@o(DEPT:dept;avg(SALARY):AVG_SALARY)

Use @o option to group by comparing neighboring value only and disable sorting on the result set

8

=A3.groups@i(GENDER=="M":isMAN;count(EID):Count)

Create a new group if the current record meets the condition GENDER=="M"

9

=A3.groups@b(DEPT:dept;avg(SALARY):AVG_SALARY)

Use @b option to keep only the aggregate column in the result set without the grouped data columns