A. groups()

Read（1092） Like（0） Label: cumulatively, table sequence, group and aggregate,

Description:

Group a table sequence and then get the aggregate result cumulatively.

Syntax:

A.groups(x:F,…;y:G,…)

Note:

The function groups and aggregates table sequence A by expression x to generate a new table sequence with F,.. G… as the fields. Namely, during the traversal through members of A, they will be placed to the corresponding result set one by one while a result set is aggregated cumulatively. Compared with the method of first grouping and then aggregation represented by A.group(x:F,…;y:G,…) function, the function has a better performance.

Option:

@o	Group records by comparing adjacent ones, which is equal to the merging operation, and the result set won’t be sorted
@n	x gets assigned with group numbers which can be used to define the groups. @n and @o are mutually exclusive
@u	Do not sort the result set by x. It doesn’t work with @o/@n
@i	x is a Boolean expression. If the result of x is true, then start a new group. There is only one x
@m	Use parallel algorithm to handle data-intensive or computation-intensive tasks; no definite order for the records in the result set; can’t be used with @o and @i options
@0	Discard the group over which the result of grouping expression x is null
@h	Used over a grouped table with each group ordered to speed up grouping
@t	Return an empty table sequence with data structure if the grouping and aggregate operation over the sequence returns null
@z(…;…;n)	Split the sequence according to groups during parallel computation, and the multiple threads share a same result set; in this case HASH space will not be dynamically adjusted; parameter n is HASH space size, whose value can be default
@e	Return a table sequence consisting of results of computing function y; expression x is a field of sequence A and y is a function on A; the result of y must be one record of A and y only supports maxp, minp and top@1 when it is an aggregate function

Parameter:

A	A sequence
x	Grouping expression
F	Field name of the result table sequence
y	y is the function with which A is traversed. When y is an aggregate function, it only supports sum/count/max/min/top/avg/iterate/icount/median/maxp/minp/concat/var. When the function work with iterate(x,a;G_i,…) function, the latter’s parameter Gi should be omitted. When y isn’t an aggregate function, perform calculation over only the first record in each group
G	Summary field name in the result table sequence

Return value:

Post-grouping table sequence

Example:

	A
1	*=demo.query("select from SCORES where CLASS = 'Class one'")**
2	=A1.groups(STUDENTID:StudentID;sum(SCORE):TotalScore)	Group by a single field.
3	*=demo.query("select from SCORES")**
4	=A3.groups(CLASS:Class,STUDENTID:StudentID;sum(SCORE):TotalScore)	group by multiple fields.
5	=A3.groups@m(STUDENTID:StudentID;sum(SCORE):TotalScore)	Use @m option to increase performance of big data handling.
6	=A3.groups@o(STUDENTID:StudentID;sum(SCORE):TotalScore)	Only compare and merge with the neighboring element, and the result set is not sorted.
7	*=demo.query("select from STOCKRECORDS where STOCKID<'002242'")**
8	=A7.groups@n(if(STOCKID=="000062",1,2):StockID;sum(CLOSING):TotalPrice)	The value of x is the group ordinal number.
9	*=demo.query("select from EMPLOYEE")**
10	=A9.groups@u(STATE:State;count(STATE):TotalScore)	Do not sort result set by the sorting field.
11	=A9.groups@i(STATE=="California":IsCalifornia;count(STATE):count)	Start a new group when STATE=="California".
12	*=A3.groups(CLASS:Class,STUDENTID:StudentID;iterate(~~2,10): Score1)**	Perform iterate operation within each group.
13	=file("D:\\Salesman.txt").import@t()
14	=A13.groups@0(Gender:Gender;sum(Age):Total)	Discard groups where Gender values are nulls.
15	=file("D:/emp10.txt").import@t()	For data file emp10.txt, every 10 records are ordered by DEPT.
16	=A15.groups@h(DEPT:dept;sum(SALARY):bouns)	A15 is grouped and ordered by DEPT, for which @h option is used to speed up grouping.
17	=A1.groups(STUDENTID:StudentID;SUBJECT,sum(SCORE):SUMSCORE)	Parameter y isn’t an aggregate function, so the function performs operation over the first record.
18	*=demo.query("select from SCORES where CLASS = 'Class three'")**	Return an empty table sequence.
19	=A18.groups@t(STUDENTID:StudentID;sum(SCORE):TotalScore)	Return an empty table sequence with the data structure.

	A
1	*=demo.query("select from SCORES")**
2	=A1.groups@z(STUDENTID:StudentID;sum(SCORE):TotalScore;5)	Split A1’s sequence according to groups during parallel computation; the hash space size is 5.

	A
1	=demo.query("select EID,NAME,GENDER,DEPT,SALARY from employee")
2	=A1.groups(DEPT;minp(SALARY))	Execute aggregate function minp() and return A3’s records.
3	=A1.groups@e(DEPT;minp(SALARY))	Return a table sequence consisting of result records of computing minp(SALARY).

Related function:

A.group(xi,…)

A.group(x:F,...;y:G,…)