A. groups()

Description:

Group a table sequence and then get the aggregate result cumulatively.

Syntax:

A.groups(x:F,…;y:G,…)

Note:

The function groups and aggregates table sequence A by expression x to generate a new table sequence with F,.. G… as the fields. Namely, during the traversal through members of A, they will be placed to the corresponding result set one by one while a result set is aggregated cumulatively. Compared with the method of first grouping and then aggregation represented by A.group(x:F,…;y:G,…) function, the function has a better performance.

Option:

@o

Group records by comparing adjacent ones, which is equal to the merging operation, and the result set won’t be sorted

@n

x gets assigned with group numbers which can be used to define the groups. @n and @o are mutually exclusive

@u

Do not sort the result set by x. It doesn’t work with @o/@n

@i

x is a Boolean expression. If the result of x is true, then start a new group. There is only one x

@m

Use parallel algorithm to handle data-intensive or computation-intensive tasks; no definite order for the records in the result set; can’t be used with @o and @i options

@0

Discard the group over which the result of grouping expression x is null

@h

Used over a grouped table with each group ordered to speed up grouping

@t

Return an empty table sequence with data structure if the grouping and aggregate operation over the sequence returns null

@z(…;…;n)

Split the sequence according to groups during parallel computation, and the multiple threads share a same result set; in this case HASH space will not be dynamically adjusted; parameter n is HASH space size, whose value can be default

@e

Return a table sequence consisting of results of computing function y; expression x is a field of sequence A and y is a function on A; the result of y must be one record of A and y only supports maxp, minp and top@1 when it is an aggregate function

Parameter:

A

A sequence

x

Grouping expression

F

Field name of the result table sequence

y

y is the function with which A is traversed. When y is an aggregate function, it only supports sum/count/max/min/top/avg/iterate/icount/median/maxp/minp/concat/var. When the function work with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted. When y isn’t an aggregate function, perform calculation over only the first record in each group

G

Summary field name in the result table sequence

Return value:

Post-grouping table sequence

Example:

 

A

 

1

=demo.query("select * from SCORES where CLASS = 'Class one'")

2

=A1.groups(STUDENTID:StudentID;sum(SCORE):TotalScore)

Group by a single field.

3

=demo.query("select * from SCORES")

 

4

=A3.groups(CLASS:Class,STUDENTID:StudentID;sum(SCORE):TotalScore)

group by multiple fields.

5

=A3.groups@m(STUDENTID:StudentID;sum(SCORE):TotalScore)

Use @m option to increase performance of big data handling.

6

=A3.groups@o(STUDENTID:StudentID;sum(SCORE):TotalScore)

Only compare and merge with the neighboring element, and the result set is not sorted.

7

=demo.query("select * from STOCKRECORDS where STOCKID<'002242'")

 

8

=A7.groups@n(if(STOCKID=="000062",1,2):StockID;sum(CLOSING):TotalPrice)

The value of x is the group ordinal number.

9

=demo.query("select * from EMPLOYEE")

 

10

=A9.groups@u(STATE:State;count(STATE):TotalScore)

Do not sort result set by the sorting field.

11

=A9.groups@i(STATE=="California":IsCalifornia;count(STATE):count)

Start a new group when STATE=="California".

12

=A3.groups(CLASS:Class,STUDENTID:StudentID;iterate(~~*2,10): Score1)

Perform iterate operation within each group.

13

=file("D:\\Salesman.txt").import@t()

14

=A13.groups@0(Gender:Gender;sum(Age):Total)

Discard groups where Gender values are nulls.

15

=file("D:/emp10.txt").import@t()

For data file emp10.txt, every 10 records are ordered by DEPT.

16

=A15.groups@h(DEPT:dept;sum(SALARY):bouns)

A15 is grouped and ordered by DEPT, for which @h option is used to speed up grouping.

17

=A1.groups(STUDENTID:StudentID;SUBJECT,sum(SCORE):SUMSCORE)

Parameter y isn’t an aggregate function, so the function performs operation over the first record.

18

=demo.query("select * from SCORES where CLASS = 'Class three'")

Return an empty table sequence.

19

=A18.groups@t(STUDENTID:StudentID;sum(SCORE):TotalScore)

Return an empty table sequence with the data structure.

 

 

A

 

1

=demo.query("select * from SCORES")

 

2

=A1.groups@z(STUDENTID:StudentID;sum(SCORE):TotalScore;5)

Split A1’s sequence according to groups during parallel computation; the hash space size is 5.

 

 

A

 

1

=demo.query("select EID,NAME,GENDER,DEPT,SALARY from employee")

 

2

=A1.groups(DEPT;minp(SALARY))

Execute aggregate function minp() and return A3’s records.

3

=A1.groups@e(DEPT;minp(SALARY))

Return a table sequence consisting of result records of computing minp(SALARY).

Related function:

A.group(xi,)

A.group(x:F,...;y:G,…)