Using SQL

Read(3524) Label: sql, database, query,

esProc allows using SQL to retrieve data from databases and then perform further analyses and computations based on the orginal database query result to solve the complicated problems which are difficult to deal with in SQL alone.

4.1.1 Database connection and disconnection

Accessing a database with SQL requires connecting to it in the first place. Usually, there are two ways to connect to the database: direct connection in the data source manager or connection through calling a special function in a cellset.

 

A

1

=connect("demo")

A database connected in the data source manager can be referenced by its name and is valid while the connection is on; for a database connected through a function, the connection object will be stored as cell value and can be referenced using the cell name; the database is valid before the referenced connection object closes.

 

A

1

=connect("demo")

2

=demo.query("select * from STATES")

3

=A1.query("select * from STATES")

4

>A1.close()

There are two ways of database disconnection as well. Besides calling db.close() function to close the connection, another way is to close the specified data source in data source manager:

4.1.2 Basic use of SQL statements

Using db.query() function can execute SQL commands in a specified database. A SQL statement could contain various subqueries and database functions.

 

A

1

=connect("demo")

2

=A1.query("select * from STATES where ABBR like 'N%' order by POPULATION desc")

3

>A1.close()

A2 finds states whose abbreviations begin with N, and sorts them by population in descending order. Result is as follows:

A SQL statement can also use cellset data as parameters:

 

A

1

=connect("demo")

2

[CA,ME,NM,SC,LA]

3

=A1.query("select * from STATES where ABBR in (?) order by AREA",A2)

4

>A1.close()

A3 finds states whose abbreviations are contained in the specified sequence and sorts them by area in ascending order. Result is as follows:

In particular, query function will return only the first-found record by using @1 option with it. In this case, the query result will be returned as a sequence comprising field values of the first record. If the record has only one field, a single value will be returned. For example:

 

A

1

=demo.query@1("select * from CITIES")

2

=demo.query@1("select NAME from CITIES where STATEID=5")

Here are results of A1 and A2 with @1 option added:

 

Note that the return value is a single value or a sequence, instead of a table sequence, by using this option.

4.1.3 SQL statements that return no results

If the database operations are performed by a SQL which returns no result set, like create, update, delete and other SQL statements, then db.execute() function should be used. Besides, since these operations don’t assign values to cells, their expressions will begin with ">" instead of "=". For example:

 

A

1

=connect("demo")

2

>A1.execute("update STATES set ABBR='CAA' where ABBR='CA'")

3

=A1.query("select * from STATES where NAME = 'California'")

4

>A1.close()

After the statement in A2 modifies records in the database table STATES, query result of A3 is as follows

SQL statements that return no results can use parameters, too:

 

A

1

=connect("demo")

2

CA

3

CAA

4

>A1.execute("update STATES set ABBR=? where ABBR=?",A2,A3)

5

=A1.query("select * from STATES where NAME = 'California'")

6

>A1.close()

The statement in A4 restores the modified record in database STATES to the original. The query result of A5 is as follows:

4.1.4 Executing SQL statements directly from cells

Besides db.query(sql) and db.execute(sql) functions, esProc also uses $(db)sql;… statement to directly execute a SQL statement. If (db) is omitted, connect to the last database used. If parameters are used in the sql statement, put them after the semicolon. This method doesn’t add an equal sign before the sql statement and surround the statement with double quotation marks; nor does it support @1 option any more. In this case, execute function or query function is not needed to tell the statement whether to return a result set or not. A SELECT statement will return a result set, but other statements will return different values respectively. For example:

 

A

1

$(demo)select * from STATES where ABBR like 'N%' order by POPULATION desc

2

[CA,ME,NM,SC,LA]

3

$select * from STATES where ABBR in (?) order by AREA;A2

The code is functionaly equal to the first two programs in Basic use of SQL statements and gets the same result when executed. A3 doesn’t specify the data source name and thus connect to the last database used, which is still (demo).

Another situation:

 

A

1

$(demo) create table TESTTEMP (ID int, NAME varchar(20))

2

$(demo) insert into TESTTEMP values (2,'Tom Smith')

3

=demo.query("select * from TESTTEMP")

4

$(demo)update TESTTEMP set NAME='Jane White' where ID=2

5

=demo.query("select * from TESTTEMP")

6

$(demo)drop table TESTTEMP

A1 creates a new table sequence. A2 inserts a record into it. A4 modifies the data of a record and A6 drops the table sequence. The updates on the database in A3 and A5 are as follows:

 

Though A1, A2, A4 and A6 all execute a SQL statement, they return values instead. This is different from the examples in the above section. Here’re their return values:

 

 

Values of A1 and A6 mean that the SQL statements return no result sets. Values of A2 and A4 indicate that each has updated one record.

The format of $(db)sql;…. makes more concise code. Note that parameters should be introduced by a semicolon and different types of results could be returned.

4.1.5 Using SQL query results

Various operations, like filtering, sorting and aggregation, can be performed based on the SQL query results in esProc in order to increase query efficiency or to solve some complicated problems.

In the following examples, executions are performed based on A1’s query result obtained by connecting to data source demo in data source manager:

 

A

1

=demo.query("select * from STATES order by POPULATION desc")

The first is filtering the data to select states with certain abbreviations:

 

A

1

=demo.query("select * from STATES order by POPULATION desc")

2

[CA,ME,NM,SC,LA]

3

=A1.select(A2.pos(ABBR)>0)

4

=A1.select(A2.contain (ABBR))

A3 uses A2.pos(ABBR)>0 to check whether a state abbreviation is included in the specified sequence. Or you can use A.contain(x) function to do the same thing. The expressions in A3 and A4 are equal and they get the same result as follows:

Also, aggregation computations can be performed on the data. For example, counting the states whose abbreviations begin with C:

 

A

1

=demo.query("select * from STATES order by POPULATION desc")

2

=A1.count(left(ABBR,1)=="C")

More significantly, you can group the database data according to certain one or more criteria, like grouping the data according to the initials of abbreviations:

 

A

1

=demo.query("select * from STATES order by POPULATION desc")

2

=A1.group(left(ABBR,1))

A2 groups the data according to the initial of each state’s abbreviation. Double-click each group and see details.

Different from the SQL grouping operation that doesn’t produce real groups, esProc grouping operation is a real one, which enables further computations more conveniently. For example, selecting groups that contain three or more states to calculate the total number and population of the states in each of these groups:

 

A

1

=demo.query("select * from STATES order by POPULATION desc")

2

=A1.group(left(ABBR,1))

3

=A2.select(~.count()>=3)

4

=A3.new(left(ABBR,1):Initial,~.count():Count,~.sum(POPULATION):TotalPopulation)

Here’s the final result of A4:

4.1.6 Common SQL statements and their esProc counterparts

1) Select * from

 

A

1

=demo.query("select * from EMPLOYEE")

Here’s the query result is:

2) Select … from

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.new(EID, NAME, SURNAME, GENDER, BIRTHDAY, DEPT)

3

=demo.query("select EID, NAME, SURNAME, GENDER, BIRTHDAY, DEPT from EMPLOYEE")

Here specified fields are retrieved from the table. A2 and A3 have the same query results:

3As

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.new(EID, NAME+" "+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)

3

=demo.query("select EID, NAME+' '+SURNAME as FULLNAME, GENDER, year(now())-year(BIRTHDAY) as AGE, DEPT from EMPLOYEE")

Here FULLNAME is obtained according to NAME and SURNAME and AGE is calculated according to BIRTHDAY. A2 and A3 have almost the same query results:

Note that AGE is calculated in A3 by simply subtracting years. The expression will be even more complicated in order to get the exact ages since SQL hasn’t functions specific for calculating ages.

4Where

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.new(EID, NAME+" "+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)

3

=A2.select(AGE<35)

4

=demo.query("select EID, NAME+' '+SURNAME as FULLNAME, GENDER, year(now())-year(BIRTHDAY) as AGE, DEPT from EMPLOYEE where year(now())-year(BIRTHDAY)<30")

Here employees who are younger than 35 years old are selected. esProc can perform computations using a ready-made result. Here’s A3’s query result:

A4 performs the same query in SQL but the syntax is far more complicated. Besides, it cannot get an accurate result due to the inexact method of calculating ages.

5count, sum, avg, max and min functions

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.new(EID, NAME+" "+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)

3

=A2.count(AGE<35)

4

=demo.query("select count(EID) from EMPLOYEE where year(now())- year(BIRTHDAY)-(case when month(now())<month(BIRTHDAY) then 1 WHEN month(now())=month(BIRTHDAY) and day(now())< day(BIRTHDAY) then 1 else 0 end)<35")

Here the number of employees who are younger than 35 years old is calculated. Based on the ready-made result, esProc performs the computation. Here’s A3’s query result:

This time, A4 uses a more exact yet more complicated method to compute AGE and gets a same query result as A3. But as it jams both the filtering and aggregate operations into one statement, with no ready-made result that can be used, the code becomes very cumbersome.

The uses of SQL functions, such as sum, avg, max and min, are similar to that of count.

6Distinct

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.id(DEPT)

3

=demo.query("select distinct DEPT from EMPLOYEE")

Here the departments in which employees are working are found. A2 and A3 have the same results. Here’s the query result:

7Order by

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.new(EID, NAME+" "+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)

3

=A2.select(AGE<35).sort(-AGE, FULLNAME ).new(FULLNAME,AGE)

4

=demo.query("select FULLNAME, AGE from (select NAME+' '+SURNAME as FULLNAME, year(now())-year(BIRTHDAY)-(case when month(now())<month(BIRTHDAY) then 1 WHEN month(now())=month(BIRTHDAY) and day(now())<day(BIRTHDAY) then 1 else 0 end) as AGE from EMPLOYEE) where AGE<35 order by AGE desc, FULLNAME")

Here employees who are younger than 35 years old are found and sorted by age in descending order, with those of the same ages sorted by FULLNAME in alphabetically ascending order. A3 and A4 get the same query results as follows:

Although A4 tries to simplify the SQL statement with a nested query, it is still complicated because of the awkward age calculation and the fact that there isn’t a ready-made result for use.

8And, or, not and <>

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.new(EID, NAME+" "+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)

3

=A2.select(AGE<35 && left(FULLNAME,1)=="S").new(FULLNAME, AGE)

4

=demo.query("select FULLNAME, AGE from (select NAME+' '+SURNAME as FULLNAME, year(now())-year(BIRTHDAY)-(case when month(now())<month(BIRTHDAY) then 1 WHEN month(now())=month(BIRTHDAY) and day(now())<day(BIRTHDAY) then 1 else 0 end) as AGE from EMPLOYEE) where AGE<35 and left(FULLNAME, 1)='S'")

Here employees who are younger than 35 years old and whose initials of full names are S are requested. Here’s the result:

In esProc, the operator && is used to represent the AND condition and two equal signs == are used to determine whether things are equal or not. These are in line with the customs of many programing languages. Besides, esProc uses the operator "||" to mean or , "!" to mean not , and "!=" to mean <> .

9) Like

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.select(like(NAME,"*a")).new(NAME+" "+SURNAME:FULLNAME)

3

=demo.query("select NAME+' '+SURNAME as FULLNAME from EMPLOYEE where NAME like '%a'")

Here the full names of employees whose names are ended by a are desired. Here’s the query result:

The use of like function requires different wildcard characters for different databases. In this example, percent mark "%" is used to represent zero or more arbitrary characters; while in some databases, asterisk "*" is used to represent the same things. esProc handles the wildcard matching by defining the same syntax for any database.

10)  group operation

 

A

1

=demo.query("select * from EMPLOYEE")

2

=A1.new(NAME+" "+SURNAME: FULLNAME, DEPT).group(DEPT)

3

=A1.groups(DEPT;count(~):ECOUNT)

4

=demo.query("select DEPT, count(*) as ECOUNT from EMPLOYEE group by DEPT order by DEPT")

In the above cellset esProc uses the group function to group records by department. The result is as follows:

It can be seen that the result of data grouping with esProc is multiple groups of records. esProc can use these groups to perform further computations as required.

A3 handles data grouping and aggregation straightforwardly with an esProc function while A4 does the job with SQL. Though they get the same results, SQL doesn’t define an independent "group" concept, and thus can’t help performing an aggregation after data grouping during a query. Here is the final result: