groups()

Read(342) Label: groups,

Here’s how to use groups() function.

A .groups()

Description:

Group a table sequence and then get the aggregate result cumulatively.

Syntax:

A.groups(x:F,…;y:G,…)

Note:

The function groups and aggregates table sequence A simultaneously by one or multiple fields/expressions to generate a new table sequence with F,.. G… as the fields. Namely, during the traversal through members of A, they will be placed to the corresponding result set one by one while a result set is aggregated cumulatively. Compared with the method of first grouping and then aggregation, the function has a better performance.

Options:

@o

Group records by comparing adjacent ones, which is equal to the merging operation, and the result set won’t be sorted.

@n

x gets assigned with group numbers which can be used to define the groups. @n and @o are mutually exclusive.

@u

Do not sort the result set by x. It doesn’t work with @o/@n

@i

x is a bool expression. If the result of x is true, then start a new group. There is only one x.

@m

Use parallel algorithm to handle data-intensive or computation-intensive tasks; no definite order for the records in the result set; can’t be used with @oi options.

@0

Discard the group over which the result of grouping expression x is null

@h

Used over a grouped table with each group ordered to speed up grouping

Parameters:

A

A sequence.

x

Grouping expression.

F

Field name of the result table sequence.

y

y is the function with which A is traversed. When y is an aggregate function, it only supports sum/count/max/min/top/avg/iterate/icount/median. When the function work with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted. When y isn’t an aggregate function, perform calculation over only the first record in each group.

G

Summary field name in the result table sequence.

Return value:

Post-grouping table sequence.

Example:

 

A

 

1

=demo.query("select * from SCORES where CLASS = 'Class one'")

2

=A1.groups(STUDENTID:StudentID;sum(SCORE):TotalScore)

group by a single field.

3

=demo.query("select * from SCORES")

 

4

=A3.groups(CLASS:Class,STUDENTID:StudentID;sum(SCORE):TotalScore)

group by multiple fields.

5

=A3.groups@m(STUDENTID:StudentID;sum(SCORE):TotalScore)

Use @m option to increase performance of big data handling.

6

=A3.groups@o(STUDENTID:StudentID;sum(SCORE):TotalScore)

Only compare and merge with the neighboring element, and the result set is not sorted.

7

=demo.query("select * from STOCKRECORDS where STOCKID<'002242'")

 

8

=A7.groups@n(if(STOCKID=="000062",1,2):StockID;sum(CLOSING):TotalPrice)

The value of x is the group sequence number.

9

=demo.query("select * from EMPLOYEE")

 

10

=A9.groups@u(STATE:State;count(STATE):TotalScore)

Do not sort result set by the sorting field.

11

=A9.groups@i(STATE=="California":IsCalifornia;count(STATE):count)

Start a new group when STATE=="California".

12

=A3.groups(CLASS:Class,STUDENTID:StudentID;iterate(~~*2,10): Score1)

Perform iterate operation within each group

13

=file("D:\\Salesman.txt").import@t()

14

=A13.groups@0(Gender:Gender;sum(Age):Total)

Discard groups where Gender values are nulls

15

=file("D:/emp10.txt").import@t()

For data file emp10.txt, every 10 records are ordered by DEPT

16

=A15.groups@h(DEPT:dept;sum(SALARY):bouns)

A15 is grouped and ordered by DEPT, for which @h option is used to speed up grouping

17

=A4.groups(STUDENTID:StudentID;SUBJECT,sum(SCORE):SUMSCORE)

Parameter y isn’t an aggregate function, so the function performs operation over the first record; the result is English

Note:

Compared with A.group(x:F,…;y:G,…), A.groups(x:F,…;y:G,…) computes in a accumulative way and thus gets a better performance.

Related functions:

A.group(xi,)

A.group(x:F,...;y:G,…)

ch. groups()

Description:

Group records in a channel.

Syntax:

ch.groups(x:F,…;y:G…;n)

Note:

The function groups records in channel ch according to grouping expression x, by which the records are ordered, to get a channel having F,...G,… fields. Sort records in the new channel by x. Values of F field consist of values of x field in the first record of every group, and values of G field are the results of computing expression y, which is an aggregate function executed on cs, over each group. It aims to fetch the grouping result set from the channel.

Options:

@n

With the option the value of expression x is a group number, which points to the desired group

@u

Won’t sort the resulting set by expression x; the option and @n are mutually exclusive

Parameters:

ch

Channel

x

Grouping expression, by which an aggregation over the whole grouped set is performed if x:F is omitted. In that case the semicolon should not be omitted

F

Field names of the resulting table sequence

y

An aggregate function which only supports sum/count/max/min/top /avg/iterate; the parameter Gi should be given up if function iterate(x,a;Gi,…) is used

G

The aggregate fields in the resulting table sequence

Return value:

Channel

Example:

 

A

 

1

=demo.cursor("select * from EMPLOYEE ")

 

2

=channel()

Create a channel

3

=channel()

Create a channel

4

=channel()

Create a channel

5

=channel()

Create a channel

6

=A1.push(A2,A3,A4,A5)

Be ready to push the data in A1’s cursor into channel A2, A3 and A5, but the action needs to wait.

7

=A2.groups(;sum(SALARY):TotalSalary)

As x:F is omitted, calculate the sum of salaries of all employees

8

=A3.groups(DEPT:dept;sum(SALARY):TotalSalary)

Group and sort records by DEPT field

9

=A4.groups@n(if(GENDER=="F",1,2):SubGroups;sum(SALARY):TotalSalary)

The value of x is group number; put records where GENDER is “F” into the first group and others into the second group, and then aggregate each group

10

=A5.groups@u(STATE:State;count(STATE):count)

Won’t sort the resulting set by grouping field

11

A1.select(month(BIRTHDAY)==2)

 

12

A11.fetch()

Attach a fetch operation to A11’s cursor

13

=A2.result()

14

=A3.result()

15

=A4.result()

16

=A5.result()

cs. groups()

Description:

Group records in a cluster cursor, sort them by the grouping field and peform aggregation over each group and add each aggregate to the result set.

Syntax:

cs.groups(x:F,…;y:G…;n)

Note:

The function groups records in a cluster cursor by expression x, sorts result by the grouping field, and calculates the aggregate value on each group. This creates a new table sequence consisting of fields F,...G,… and sorted by the grouping field x.The values of F field are values of x field of the first record in each group and G field gets values by computing y – the aggregate function with which records of cs is aggregated – on each group.

Options:

@c

Perform the group operation over data in every node and compose the result sets into a cluster memory table in the segmentation way of the cursor; suppport a cluster dimension table

@d

Copy the result sets into nodes to compose a cluster duplicate memory table

Parameters:

cs

Records in a cluster cursor

x

Grouping expression; if omitting parameters x:F, aggregate the whole set; in this case, the semicolon (;) must not be omitted

F

Field name in the result table sequence

y

Aggregate function that only supports sum/count/max/min/top /avg/iterate; when the function works with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted

G

Aggregate field name in the result table sequence

n

The specified maximum number of groups; stop executing the function when the number of data groups is bigger than n to prevent memory overflow; the parameter is used in scenarios when it is predicted that data will be divided into a large number of groups that are greater than n.

 

Return value:

A table sequence/cluster memory table

Example:

 

A

 

1

=file@n("D:/emp1.ctx","192.168.0.111:8281")

Below is emp1.ctx:

2

=A1.create()

Create a cluster composite table

3

=A2.cursor()

Return a cluster cursor

4

=A3.groups(Dept:dept;count(Name):count)

Group data by DEPT and perform aggregation

 

 

 

A

 

1

[192.168.0.110:8281,192.168.18.143:8281]

 

2

=file@0("emp.ctx", A1)

emp.ctx is a distributed file on nodes 192.168.0.110 and 192.168.18.143

3

=A2.create()

Open a cluster composite table

4

=A3.cursor()

Create a cluster cursor

5

=A4.groups(GENDER:gender;sum(SALARY):totalSalary)

Group data by GENDER and perform aggregation and return result as a table sequence

6

=A3.cursor()

 

7

=A6.groups@c(GENDER:gender;sum(SALARY):totalSalary).dup()

Retain the way of segmentation of the distributed cursor and return a cluster memory table

8

=A3.cursor()

 

9

=A8.groups@d(GENDER:gender;sum(SALARY):totalSalary).dup()

Copy the result sets onto nodes to compose a cluster duplicate memory table

Related functions:

A.group(xi,…)

A.group(x:F,…;y:G,…)

A.groups()

cs.groupx()

cs.groups()

Description:

Group records in a cursor.

Syntax:

cs.groups(x:F,…;y:G…;n)

Note:

The function groups records in a cluster cursor by expression x, sorts result by the grouping field, and calculates the aggregate value on each group. This creates a new table sequence consisting of fields F,...G,… and sorted by the grouping field x.The values of F field are the value of x field of the first record in each group and G field gets values by computing y – the aggregate function with which records of cs is aggregated – on each group. The aggregation over a cluster cursor will first be performed by the main process on the local machine and the result will then be returned to the machine that initiates the invocation; the process is called reduce.

Options:

@n

The value of grouping expression is group number used to locate the group; you can use n to specify the number of groups and generate corresponding number of zones first.

@u

Do not sort the result set by the grouping expression; it doesn’t work with @n

@o

Compare each record only with its neighboring record to group, which is equivalent to the merge operation, and won’t sort the result set

 

@i

With this option, the function only has one parameter x that is a bool expression; start a new group if its result is true

 

@h

Used over a grouped table with each group ordered to speed up grouping

 

Parameters:

cs

Records in a cursor/cluster cursor

x

Grouping expression; if omitting parameters x:F, aggregate the whole set; in this case, the semicolon (;) must not be omitted

F

Field name in the result table sequence

y

Aggregate function that only supports sum/count/max/min/top/avg/iterate; when the function works with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted

G

Aggregate field name in the result table sequence

n

The specified maximum number of groups; stop executing the function when the number of data groups is bigger than n to prevent memory overflow; the parameter is used in scenarios when it is predicted that data will be divided into a large number of groups that are greater than n; it doesn’t work with @n option. Support a multicuror when this parameter is absent.

Return value:

Post-grouping table sequence

Example:

 

A

 

1

=demo.cursor("select * from SCORES where CLASS = 'Class one'")

 

2

=A1.groups(;sum(SCORE):TotalScore)

As parameters x:F absent, calculate the total score of all students

3

=demo.cursor("select * from FAMILY")

 

4

=A3.groups(GENDER:gender;sum(AGE):TotalAge)

Group and order data by specified fields

5

=demo.cursor("select * from STOCKRECORDS where STOCKID<'002242'")

 

6

=A5.groups@n(if(STOCKID=="000062",1,2):SubGroups;sum(CLOSING):ClosingPrice)

The value of grouping expression is group number; put records whose STOCKID is “000062” to the first group and others to the second group; and meanwhile aggregate each group

7

=demo.cursor("select * from EMPLOYEE")

 

8

=A7.groups@u(STATE:State;count(STATE):Total)

The result set won’t be sorted by the grouping field

9

=demo.cursor("select * from EMPLOYEE")

 

10

=A9.groups@o(STATE:State;count(STATE):Total)

Compare each record with its next neighbor and won’t sort the result set

11

=demo.cursor("select * from EMPLOYEE")

 

12

=A11.groups@i(STATE=="California":IsCalifornia;count(STATE):count)

Start a new group if the current record meets the condition STATE=="California"

13

=file("D:/emp10.txt").cursor@t()

For data file emp10.txt, every 10 records are ordered by DEPT

14

=A13.groups@h(DEPT:DEPT;sum(SALARY):bouns)

As A13 is grouped and ordered by DEPT, use @h option to speed up grouping

15

=demo.query("select  * from employee")

 

16

=A15.cursor@m(3)

Return a multicursor

17

=A16.groups(STATE:state;sum(SALARY):salary)

Group the multicursor with groups function