Simple SQL

Read(90) Label: simple sql,

Here’s how to use simple SQL.

 

Concept:

A simple SQL is used to query data in a data file, which is represented directly using the table namefile name.extensionand which can be treated as an ordinary table sequence. It is so called because the syntax is similar to that of database queries.

The file types simple SQL supports include txt, csv, xlsx, xls, btx (a bin file) and ctx (a composite file). Each involved data file is supposed to have headers. Both the relative path, which is relative to the esProc main directory, and the absolute path can be used in a simple SQL query.

Similar to familiar SQL statements, simple SQl can be used in db.query(sql) and $(db)sql. While the ordinary SQL queries data from the database through a specified to-be-connected data source db, the simple SQL queries data from a file system without specifying data source name. It just leaves data source parameter db null with syntax connect().query(sql)/$()sql.

Syntax of simple SQL:

with T as (x) 

Define a table for data file using an esProc script; the query over the file x returns a table sequence or a cursor.

select x F,…

Select data from a table where x is a field or an expression and F is its alias.

.from T T’

Use a defined table T; T’ is the alias of this table and can be omitted.

from fn T

Use file fn, whose format can be txt, csv, xls, xlsx and btx (a bin file), directly as table T; the file is opened as a cursor, and by default considered to have headers; use a relative path, which is relative to esProc main directory, or an absolute path to access file fn.

from {x}

x is an expression that can be executed within the current cellset; it should be an esProc expression that returns a table sequence or a cursor.

as

Can be used before a table alias.

where

The conditional filtering.

join, left join and full join

Associative operations that handle all data in the memory when join operator is present; the corresponding function is xjoin().

/* parallel (n) */

Perform a parallel processing when there isn’t a join and when target files are btx (bin file) or ctx (composite table).

group by

Grouping operation with groups() function, whose result set is supposed to be able to fit into the memory.

group by n 

Perform grouping by the nth expression in the select statement; n should be a constant.

having

Grouping and filtering.

order by

Sorting operations in, by default, ascending order.

order by n

Perform sorting by the nth expression in the select statement; n should be a constant.

distinct

Synonym of SQL DISTINCT clause; support only a single table.

and, or, not, in, is null and

case when else end

Used in SQL style; in only supports sets of constants and doesn’t support subqueries.

 

between

A data range sandwiched by two values, such as value f1 that falls between 1 and 3, which can be written as f1 >= 1 && f1 <= 3.

like

Fuzzy query, which supports the following wildcard characters:

%  Match multiple characters;

_  Match a single character;

[chracter,…] Match any single character in the character list;

[!chracter,…] or [^chracter,…] Won’t match any single character in the character list;

The operator does not support subqueries.

into fn

Write the result set to file fn, whose format, which can be txt, csv and btx, is determined by its extension;

Check whether the result set and the target file have same structure if the latter already exists; then perform appending if they have same structure, otherwise error will be reported;

Use a relative path, which is relative to esProc main directory, or an absolute path to access the file.

?, ?i

Parameters in a simple SQL statement; i represents the ith parameter.

aggregate functions

sum, count, max, min and avg.

set operations

union, union all, intersect and minus, which are supposed to be able to be wholly handled in the memory.

subquery

All subqueries are supposed to be able to be wholly executed in memory except for the from statement; support non-parameter subqueries.

compute functions

Defined under sqltranslate() function (without database name and have corresponding SQL expressions); do not support window functions.

top n

Get first n records.

limit n offset m

Skip m records to get n records.

other functions

esProc functions conforming to syntax.

 

String functions:

 

LOWER(str)

Convert to lower case

UPPER(str)

Convert to upper case

LTRIM(str)

Delete whitespaces on the leftmost side

RTRIM(str)

Delete whitespaces on the rightmost side

TRIM(str)

Delete whitespaces on both sides

SUBSTR(str,start,len)
SUBSTR(str,start)

Return a substring

LEN(str)

Return length of a string

INDEXOF(sub,str[,start])

Return position of a substring

LEFT(str,len)

Return substring of the specified length beginning from the leftmost of a string

RIGHT(str,len)

Return substring of the specified length beginning from the rightmost of a string

CONCAT(str1,str2)

Concatenate two strings

CONCAT(s1,s2,…)

Concatenate multiple strings

REPLACE(str,sub,rplc)

Replace a substring by another string

 

Numeric functions:

 

ABS(x)

Return absolute value

ACOS(x)

Return arc cosine

ASIN(x)

Return arc sine

ATAN(x)

Return arc tangent

ATAN2(x,y)

Return arc tangent

CEIL(x)

Return the smallest integer not less than x

COS(x)

Return cosine value

EXP(x)

Return the base of e to the power of x

FLOOR(x)

Return the biggest integer that not greater than x

LN(x)

Return the natural logarithm

LOG10(x)

Return the logarithm with base 10

MOD(x,m)

Return the remainder of x divided by m

POWER(x,y)

Return the value of x raised to the power of y

ROUND(x,n)

Return x rounded to n digits from the decimal point

SIGN(x)

Return the sign for x

SIN(x)

Return the sine value

SQRT(x)

Return the square root of x

TAN(x)

Return the tangent value

TRUNC(x,n)

Return x truncated to n decimal places

RAND(seed)

Return a random number

 

Date&Time functions:

 

YEAR(d)

Return the year

MONTH(d)

Return the month

DAY(d)

Return the day of the month

HOUR(d)

Return the hour

MINUTE(d)

Return the minute

SECOND(d)

Return the second

QUARTER(d)

Return the quarter

TODAY()

Return the date

NOW()

Return the current time

ADDYEARS(d,n)

Add years

ADDMONTHS(d, n)

Add months

ADDDAYS(d, n)

Add days

ADDHOURS(d, n)

Add hours

ADDMINUTES(d, n)

Add minutes

ADDSECONDS(d, n)

Add seconds

DAYOFYEAR(d)

Return the day of year

WEEKOFYEAR(d)

Return the calendar week of the date

 

Conversion functions:

 

ASCII(str)

Return ASCII code of the left-most character in a string

CHR(n)

Convert an integer to characters

INT(x)

Convert a string or a number to integer

DECIMAL(x,len,scale)

Convert a string or a number to number

TIMESTAMP(str)

Convert the format string of yyyy-mm-dd hh24:mi:ss to datetime

NUMTOCHAR(d)

Convert a number to string

DATETOCHAR(date)

Convert a date into a string in the format of yyyy-mm-dd hh24:mi:ss

 

Others:

 

NULLIF(x1,x2)

Return null if x1is equal to x2, otherwise return x1

COALESCE(x1,…)

Return the first non-null parameter

COUNTIF(x1,…,xn)

Count a set of data objects meeting the specified criteria

Example:

When used in db.query(sql):

 

A

 

1

=connect()

 

2

=A1.query("select * from Persons.txt")

3

=A1.query("select * from D:/Orders.txt")

4

=A1.query("select Id_P, sum(OrderNo) from Orders.csv group by 1 ")

Group by Id_P field

5

=A1.query("select Id_P, OrderNo from Orders.csv order by 2 ")

Sort by OrderNo field in default ascending order

6

=A1.query("select * into p1.txt from Persons.csv ")

Write the query result set to file p1.txt

7

=A1.query("select * from Persons.csv where Id_P=? or Id_P>? ",2,2)

Retrieve records where Id_P is equal to or greater than 2

8

=A1.query("select * from Persons.csv where Id_P=?1 or Id_P>?2 ",2,2)

Retrieve records where Id_P is equal to or greater than 2 and get same result as above; ?i represents the ith parameter

 

9

=A1.query("with persons as (file(\"D:/Persons.btx\").import@b()) select * from persons ")

Perform a query over Persons.btx, name the result set persons, and perform a second query over persons; this can optimize SQL huge data processing

 

10

=A1.query("select /*+parallel(20)*/ count(*) from Persons.btx")

Perform a query through a join Count records in table Persons.btx with parallel processing

11

=A1.query("select  *  from Persons.txt  P  join Orders.txt  O  on P.Id_P = O.Id_P  ")

Perform a query through a join

12

=A1.query("select distinct(Id_P),name  from Orders.txt  ")

Get records where Id_P values are unique

13

=A1.query(“select CASE  id_P  when 1  then 'one' when 2 then 'tow' else 'other'  end from p1.txt”)

case when statement is used here

14

=A1.query("select * from Persons.btx  where city  like  'N%' ")

Use like syntax to find from file Persons.btx the record where the first character of city field is N

15

=A1.query("select * from Persons.btx  where Name  like  '_a_t%' ")

Use like syntax to find from file Persons.btx the record where the second character of Name field is a and where the fourth character is t

16

=A1.query("select * from Persons.btx  where Name  like  '%[us]%' ")

Use like syntax to find from file Persons.btx the record where Name field contains character u or s

17

=A1.query("select * from Persons.btx  where Name  like  '%[!a]%' ")

Use like syntax to find from file Persons.btx the record where Name field does not contain character a

18

=A1.query("select * from Persons.btx  where Name  like  '%[^s]%' ")

Use like syntax to find from file Persons.btx the record where Name field does not contain character s

 

 

A

 

1

=file("score1.txt")

 

2

=file("score2.txt")

 

3

=A1.cursor@t()

 

4

=A2.cursor@t()

 

5

=[A3, A4].mcursor@t()

 

6

=connect().query("select CLASS, max(SCORE)  avg_score  from {A5} where SUBJECT='math' group by CLASS")

Use from {x} syntax, where x is a cursor

 

When used in $()sql:

 

A

 

1

$()select * from Persons.txt

Query data in file Persons.txt and return result as a table sequence

2

$select * from D:/Orders.txt

Query data using an absolute path and return result as a table sequence

3

$select * from Persons.csv where Id_P=? or Id_P>?;2,2

Retrieve records where Id_P is equal to or greater than 2

4

$select  *  from Persons.txt  P  join Orders.txt  O  on P.Id_P = O.Id_P

Perform a query through a join

5

$select  distinct(Id_P)  from Orders.btx

Get records where Id_P values are unique