groups()

Read(549) Label: groups,

Here’s how to use groups() function.

A .groups()

Description:

Group a table sequence and then get the aggregate result cumulatively.

Syntax:

A.groups(x:F,…;y:G,…)

Note:

The function groups and aggregates table sequence A simultaneously by one or multiple fields/expressions to generate a new table sequence with F,.. G… as the fields. Namely, during the traversal through members of A, they will be placed to the corresponding result set one by one while a result set is aggregated cumulatively. Compared with the method of first grouping and then aggregation represented by A.group(x:F,…;y:G,…) function, the function has a better performance.

Options:

@o

Group records by comparing adjacent ones, which is equal to the merging operation, and the result set won’t be sorted.

@n

x gets assigned with group numbers which can be used to define the groups. @n and @o are mutually exclusive.

@u

Do not sort the result set by x. It doesn’t work with @o/@n

@i

x is a Boolean expression. If the result of x is true, then start a new group. There is only one x.

@m

Use parallel algorithm to handle data-intensive or computation-intensive tasks; no definite order for the records in the result set; can’t be used with @o and @i options.

@0

Discard the group over which the result of grouping expression x is null

@h

Used over a grouped table with each group ordered to speed up grouping

@t

Return an empty table sequence with data structure if the grouping and aggregate operation over the sequence returns null

@b

Enable returning a resut set containing aggregates only but not subgroups

Parameters:

A

A sequence.

x

Grouping expression.

F

Field name of the result table sequence.

y

y is the function with which A is traversed. When y is an aggregate function, it only supports sum/count/max/min/top/avg/iterate/icount/median. When the function work with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted. When y isn’t an aggregate function, perform calculation over only the first record in each group.

G

Summary field name in the result table sequence.

Return value:

Post-grouping table sequence.

Example:

 

A

 

1

=demo.query("select * from SCORES where CLASS = 'Class one'")

2

=A1.groups(STUDENTID:StudentID;sum(SCORE):TotalScore)

group by a single field.

3

=demo.query("select * from SCORES")

 

4

=A3.groups(CLASS:Class,STUDENTID:StudentID;sum(SCORE):TotalScore)

group by multiple fields.

5

=A3.groups@m(STUDENTID:StudentID;sum(SCORE):TotalScore)

Use @m option to increase performance of big data handling.

6

=A3.groups@o(STUDENTID:StudentID;sum(SCORE):TotalScore)

Only compare and merge with the neighboring element, and the result set is not sorted.

7

=demo.query("select * from STOCKRECORDS where STOCKID<'002242'")

 

8

=A7.groups@n(if(STOCKID=="000062",1,2):StockID;sum(CLOSING):TotalPrice)

The value of x is the group sequence number.

9

=demo.query("select * from EMPLOYEE")

 

10

=A9.groups@u(STATE:State;count(STATE):TotalScore)

Do not sort result set by the sorting field.

11

=A9.groups@i(STATE=="California":IsCalifornia;count(STATE):count)

Start a new group when STATE=="California".

12

=A3.groups(CLASS:Class,STUDENTID:StudentID;iterate(~~*2,10): Score1)

Perform iterate operation within each group

13

=file("D:\\Salesman.txt").import@t()

14

=A13.groups@0(Gender:Gender;sum(Age):Total)

Discard groups where Gender values are nulls

15

=file("D:/emp10.txt").import@t()

For data file emp10.txt, every 10 records are ordered by DEPT

16

=A15.groups@h(DEPT:dept;sum(SALARY):bouns)

A15 is grouped and ordered by DEPT, for which @h option is used to speed up grouping

17

=A4.groups(STUDENTID:StudentID;SUBJECT,sum(SCORE):SUMSCORE)

Parameter y isn’t an aggregate function, so the function performs operation over the first record; the result is English

18

=demo.query("select * from SCORES where CLASS = 'Class three'")

Return an empty table sequence

19

=A18.groups@t(STUDENTID:StudentID;sum(SCORE):TotalScore)

Return an empty table sequence with the data structure

20

=A1.groups@b(STUDENTID:StudentID;sum(SCORE):TotalScore)

Perform grouping & aggregate over A1’s table sequence and return a column of aggregates

Related functions:

A.group(xi,)

A.group(x:F,...;y:G,…)

ch. groups()

Description:

Group records in a channel.

Syntax:

ch.groups(x:F,…;y:G…;n)

Note:

The function groups records in channel ch according to grouping expression x, by which the records are ordered, to get a channel having F,...G,… fields.

Sort records in the new channel by x. Values of G field are the results of computing expression y over each group. It aims to fetch the grouping result set from the channel.

Options:

@n

With the option the value of expression x is a group number, which points to the desired group

@u

Won’t sort the resulting set by expression x; the option and @n are mutually exclusive

Parameters:

ch

Channel

x

Grouping expression, by which an aggregation over the whole grouped set is performed if x:F is omitted. In that case the semicolon “;” should not be omitted

F

Field names of the resulting table sequence

y

An aggregate function on channel ch, which only supports sum/count/max/min/top /avg/iterate; the parameter Gi should be given up if function iterate(x,a;Gi,…) is used

G

The aggregate fields in the resulting table sequence

Return value:

Channel

Example:

 

A

 

1

=demo.cursor("select * from EMPLOYEE ")

 

2

=channel()

Create a channel

3

=channel()

Create a channel

4

=channel()

Create a channel

5

=channel()

Create a channel

6

=A1.push(A2,A3,A4,A5)

Be ready to push the data in A1’s cursor into channel A2, A3 and A5, but the action needs to wait.

7

=A2.groups(;sum(SALARY):TotalSalary)

As x:F is omitted, calculate the sum of salaries of all employees

8

=A3.groups(DEPT:dept;sum(SALARY):TotalSalary)

Group and sort records by DEPT field

9

=A4.groups@n(if(GENDER=="F",1,2):SubGroups;sum(SALARY):TotalSalary)

The value of x is group number; put records where GENDER is “F” into the first group and others into the second group, and then aggregate each group

10

=A5.groups@u(STATE:State;count(STATE):count)

Won’t sort the resulting set by grouping field

11

A1.select(month(BIRTHDAY)==2)

 

12

A11.fetch()

Attach a fetch operation to A11’s cursor

13

=A2.result()

14

=A3.result()

15

=A4.result()

16

=A5.result()

cs. groups()

Description:

Group records in a cluster cursor, sort them by the grouping field and peform aggregation over each group and add each aggregate to the result set.

Syntax:

cs.groups(x:F,…;y:G…;n)

Note:

The function groups records in a cluster cursor by expression x, sorts result by the grouping field, and calculates the aggregate value on each group.

This creates a new table sequence consisting of fields F,...G,… and sorted by the grouping field x.The G field gets values by computing y on each group.

Options:

@c

Perform the group operation over data in every node and compose the result sets into a cluster memory table in the segmentation way of the cursor; suppport a cluster dimension table

Parameters:

cs

Records in a cluster cursor

x

Grouping expression; if omitting parameters x:F, aggregate the whole set; in this case, the semicolon “;” must not be omitted

F

Field name in the result table sequence

y

An aggregate function on cs, which only supports sum/count/max/min/top /avg/iterate; when the function works with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted

G

Aggregate field name in the result table sequence

n

The specified maximum number of groups; stop executing the function when the number of data groups is bigger than n to prevent memory overflow; the parameter is used in scenarios when it is predicted that data will be divided into a large number of groups that are greater than n.

 

Return value:

A table sequence/cluster memory table

Example:

 

A

 

1

=file("emp1.ctx","192.168.0.111:8281")

Below is emp1.ctx:

2

=A1.open()

Open a cluster composite table

3

=A2.cursor()

Return a cluster cursor

4

=A3.groups(Dept:dept;count(Name):count)

Group data by DEPT and perform aggregation

 

 

 

A

 

1

[192.168.0.110:8281,192.168.18.143:8281]

 

2

=file("emp.ctx":[1,2], A1)

 

3

=A2. open ()

Open a cluster composite table

4

=A3.cursor()

Create a cluster cursor

5

=A4.groups(GENDER:gender;sum(SALARY):totalSalary)

Group data by GENDER and perform aggregation and return result as a table sequence

6

=A3.cursor()

 

7

=A6.groups@c(GENDER:gender;sum(SALARY):totalSalary).dup()

Retain the way of segmentation of the distributed cursor and return a cluster memory table

Related functions:

A.group(xi,…)

A.group(x:F,…;y:G,…)

A.groups()

cs.groupx()

cs.groups()

Description:

Group records in a cursor.

Syntax:

cs.groups(x:F,…;y:G…;n)

Note:

The function groups records in a cluster cursor by expression x, sorts result by the grouping field, and calculates the aggregate value on each group. This creates a new table sequence consisting of fields F,...G,… and sorted by the grouping field x.The values of F field are the value of x field of the first record in each group and G field gets values by computing yon each group. The aggregation over a cluster cursor will first be performed by the main process on the local machine and the result will then be returned to the machine that initiates the invocation; the process is called reduce.

Options:

@n

The value of grouping expression is group number used to locate the group; you can use n to specify the number of groups and generate corresponding number of zones first.

@u

Do not sort the result set by the grouping expression; it doesn’t work with @n

@o

Compare each record only with its neighboring record to group, which is equivalent to the merge operation, and won’t sort the result set

@i

With this option, the function only has one parameter x that is a boolean expression; start a new group if its result is true

@h

Used over a grouped table with each group ordered to speed up grouping

@b

Enable returning a resut set containing aggregates only but not subgroups

Parameters:

cs

Records in a cursor

x

Grouping expression; if omitting parameters x:F, aggregate the whole set; in this case, the semicolon “;” must not be omitted

F

Field name in the result table sequence

y

An aggregate function on cs, which only supports sum/count/max/min/top/avg/iterate; when the function works with iterate(x,a;Gi,…) function, the latter’s parameter Gi should be omitted

G

Aggregate field name in the result table sequence

n

The specified maximum number of groups; stop executing the function when the number of data groups is bigger than n to prevent memory overflow; the parameter is used in scenarios when it is predicted that data will be divided into a large number of groups that are greater than n; it doesn’t work with @n option. Support a multicuror when this parameter is absent.

Return value:

Post-grouping table sequence

Example:

 

A

 

1

=demo.cursor("select * from SCORES where CLASS = 'Class one'")

 

2

=A1.groups(;sum(SCORE):TotalScore)

As parameters x:F absent, calculate the total score of all students

3

=demo.cursor("select * from FAMILY")

 

4

=A3.groups(GENDER:gender;sum(AGE):TotalAge)

Group and order data by specified fields

5

=demo.cursor("select * from STOCKRECORDS where STOCKID<'002242'")

 

6

=A5.groups@n(if(STOCKID=="000062",1,2):SubGroups;sum(CLOSING):ClosingPrice)

The value of grouping expression is group number; put records whose STOCKID is “000062” to the first group and others to the second group; and meanwhile aggregate each group

7

=demo.cursor("select * from EMPLOYEE")

 

8

=A7.groups@u(STATE:State;count(STATE):Total)

The result set won’t be sorted by the grouping field

9

=demo.cursor("select * from EMPLOYEE")

 

10

=A9.groups@o(STATE:State;count(STATE):Total)

Compare each record with its next neighbor and won’t sort the result set

11

=demo.cursor("select * from EMPLOYEE")

 

12

=A11.groups@i(STATE=="California":IsCalifornia;count(STATE):count)

Start a new group if the current record meets the condition STATE=="California"

13

=file("D:/emp10.txt").cursor@t()

For data file emp10.txt, every 10 records are ordered by DEPT

14

=A13.groups@h(DEPT:DEPT;sum(SALARY):bouns)

As A13 is grouped and ordered by DEPT, use @h option to speed up grouping

15

=demo.query("select  * from employee")

 

16

=A15.cursor@m(3)

Return a multicursor

17

=A16.groups(STATE:state;sum(SALARY):salary)

Group the multicursor with groups function

18

=demo.cursor("select * from scores where class = 'Class one'")

Return a cursor

19

=A18.groups@b(STUDENTID:StudentID;sum(SCORE):TotalScore)

Perform grouping & aggregate over the table sequence stored in A18’s cursor and return a column of aggregates