Simple SQL

Read(3224) Label: simple sql,

Here’s how to use simple SQL.

Concept:

A simple SQL is used to query data in a data file, which is represented directly using the table namefile name.extensionand which can be treated as an ordinary table sequence. It is so called because the syntax is similar to that of database queries.

 

The file types simple SQL supports include txt, csv, xlsx, xls and btx (a bin file). Each involved data file is supposed to have headers. Both the relative path, which is relative to the esProc main directory, and the absolute path can be used in a simple SQL query.

 

Similar to familiar SQL statements, simple SQl can be used in db.query(sql) and $(db)sql. While the ordinary SQL queries data from the database through a specified to-be-connected data source db, the simple SQL queries data from a file system without specifying data source name. It just leaves data source parameter db null with syntax connect().query(sql)/$()sql.

Syntax of simple SQL:

with T as (x) 

Define a table for data file using an esProc script; the query over the file x returns a table sequence or a cursor

select x F,…

Select data from a table where x is a field or an expression, and F is its alias.

_file

The field is file name when data is imported from a file; it won’t be defined when data comes from other types of sources

_ext

The filename extension.

_size

The size of a file.

_date

The last modified date of a file.

*

The symbol * represents fields in the file only; it does not contain file attributes.

.from T T’

Use a defined table T; T’ is the alias of this table and can be omitted.

from fn T

Use file fn, whose format can be txt, csv, xls, xlsx and btx (a bin file), directly as table T; the file is opened as a cursor, and by default considered to have headers; use a relative path, which is relative to esProc main directory, or an absolute path to access file fn.

Support using wild character in the file name, where * matches zero or multiple characters and ? matches a single character; T can correspond to multiple files and be treated as the union of multiple same-structure data files; do not support accessing the subdirectory.

from {x}

x is an expression that can be executed within the current cellset; it should be an esProc expression that returns a table sequence or a cursor.

as

Can be used before a table alias.

where

The conditional filtering.

join, left join and full join

Associative operations that handle all data in the memory when join operator is present; the corresponding function is xjoin().

/*+parallel (n)*/

Perform a parallel processing when there isn’t a join and when target files are btx (bin file) or ctx (composite table) .

group by

Grouping operation with groups() function, whose result set is supposed to be able to fit into the memory.

group by n 

Perform grouping by the nth expression in the select statement; n should be a constant.

having

Grouping and filtering.

order by

Sorting operations in, by default, ascending order.

order by n

Perform sorting by the nth expression in the select statement; n should be a constant.

distinct

Synonym of SQL DISTINCT clause; support only a single table.

and, or, not, in, is null and

case when else end

Used in SQL style; in only supports sets of constants and doesn’t support subqueries.

 

between

A data range sandwiched by two values, such as value f1 that falls between 1 and 3, which can be written as f1 >= 1 && f1 <= 3

like

Fuzzy query, which supports the following wildcard characters:

%  Match multiple characters;

_  Match a single character;

The operator does not support subqueries.

into fn

Write the result set to file fn, whose format, which can be txt, csv and btx, is determined by its extension;

Check whether the result set and the target file have same structure if the latter already exists; then perform appending if they have same structure, otherwise error will be reported;

Use a relative path, which is relative to esProc main directory, or an absolute path to access the file.

into ${x}.ext

Can use a macro string in the file name and concatenate the result of calculating the macro expression into the file name; support exporting to multiple files.

?, ?i

Parameters in a simple SQL statement; i represents the ith parameter.

aggregate functions

sum, count, max, min and avg.

set operations

union, union all, intersect and minus, which are supposed to be able to be wholly handled in the memory.

subquery

All subqueries are supposed to be able to be wholly executed in memory except for the from statement; support non-parameter subqueries.

compute functions

Defined under sqltranslate() function (without database name and have corresponding SQL expressions); do not support window functions.

top n

Get first n records.

limit n offset m

Skip m records to get n records.

other functions

esProc functions conforming to syntax.

 

String functions:

 

LOWER(str)

Convert to lower case.

UPPER(str)

Convert to upper case.

LTRIM(str)

Delete whitespaces on the leftmost side.

RTRIM(str)

Delete whitespaces on the rightmost side.

TRIM(str)

Delete whitespaces on both sides.

SUBSTR(str,start,len)
SUBSTR(str,start)

Return a substring.

LEN(str)

Return length of a string.

INDEXOF(sub,str[,start])

Return position of a substring.

LEFT(str,len)

Return substring of the specified length beginning from the leftmost of a string.

RIGHT(str,len)

Return substring of the specified length beginning from the rightmost of a string.

CONCAT(str1,str2)

Concatenate two strings.

CONCAT(s1,s2,…)

Concatenate multiple strings.

REPLACE(str,sub,rplc)

Replace a substring by another string.

 

Numeric functions:

 

ABS(x)

Return absolute value.

ACOS(x)

Return arc cosine.

ASIN(x)

Return arc sine.

ATAN(x)

Return arc tangent.

ATAN2(x,y)

Return arc tangent.

CEIL(x)

Return the smallest integer not less than x.

COS(x)

Return cosine value.

EXP(x)

Return the base of e to the power of x.

FLOOR(x)

Return the biggest integer that not greater than x.

LN(x)

Return the natural logarithm.

LOG10(x)

Return the logarithm with base 10.

MOD(x,m)

Return the remainder of x divided by m.

POWER(x,y)

Return the value of x raised to the power of y.

ROUND(x,n)

Return x rounded to n digits from the decimal point.

SIGN(x)

Return the sign for x.

SIN(x)

Return the sine value.

SQRT(x)

Return the square root of x.

TAN(x)

Return the tangent value.

TRUNC(x,n)

Return x truncated to n decimal places.

RAND(seed)

Return a random number.

 

Date&Time functions:

 

YEAR(d)

Return the year.

MONTH(d)

Return the month.

DAY(d)

Return the day of the month.

HOUR(d)

Return the hour.

MINUTE(d)

Return the minute.

SECOND(d)

Return the second.

QUARTER(d)

Return the quarter.

TODAY()

Return the date.

NOW()

Return the current time.

ADDYEARS(d,n)

Add years.

ADDMONTHS(d, n)

Add months.

ADDDAYS(d, n)

Add days.

ADDHOURS(d, n)

Add hours.

ADDMINUTES(d, n)

Add minutes.

ADDSECONDS(d, n)

Add seconds.

DAYOFYEAR(d)

Return the day of year.

WEEKOFYEAR(d)

Return the calendar week of the date.

 

Conversion functions:

 

ASCII(str)

Return ASCII code of the left-most character in a string.

CHR(n)

Convert an integer to characters.

INT(x)

Convert a string or a number to integer.

DECIMAL(x,len,scale)

Convert a string or a number to number.

TIMESTAMP(str)

Convert the format string of yyyy-mm-dd hh24:mi:ss to datetime.

NUMTOCHAR(d)

Convert a number to string.

DATETOCHAR(date)

Convert a date into a string in the format of yyyy-mm-dd hh24:mi:ss.

 

Others:

 

NULLIF(x1,x2)

Return null if x1is equal to x2, otherwise return x1.

COALESCE(x1,…)

Return the first non-null parameter.

COUNTIF(x1,…,xn)

Count a set of data objects meeting the specified criteria.

 

Note:

In a simple SQL statement, a string used for field name, file path, file name or alias should be enclosed by double quotation marks when it contains one or more special identifiers or has a non-underline or non-letter beginning character, otherwise error will be reported. When using quotation marks, make sure they are escaped if needed.

Example:

When used in db.query(sql):

 

A

 

1

=connect()

 

2

=A1.query("select * from Persons.txt")

3

=A1.query("select * from D:/Orders.txt")

4

=A1.query("select Id_P, sum(OrderNo) from Orders.csv group by 1 ")

Group by Id_P field.

5

=A1.query("select Id_P, OrderNo from Orders.csv order by 2 ")

Sort by OrderNo field in default ascending order.

6

=A1.query("select * into p1.txt from Persons.csv ")

Write the query result set to file p1.txt.

7

=A1.query("select * from Persons.csv where Id_P=? or Id_P>? ",2,2)

Retrieve records where Id_P is equal to or greater than 2.

8

=A1.query("select * from Persons.csv where Id_P=?1 or Id_P>?2 ",2,2)

Retrieve records where Id_P is equal to or greater than 2 and get same result as above; ?i represents the ith parameter.

 

9

=A1.query("with persons as (file(\"D:/Persons.btx\").import@b()) select * from persons ")

Perform a query over Persons.btx, name the result set persons, and perform a second query over persons; this can optimize SQL huge data processing.

 

10

=A1.query("select /*+parallel(20)*/ count(*) from Persons.btx")

Perform a query through a join Count records in table Persons.btx with parallel processing.

11

=A1.query("select  *  from Persons.txt  P  join Orders.txt  O  on P.Id_P = O.Id_P  ")

Perform a query through a join.

12

=A1.query("select distinct(Id_P),name  from Orders.txt  ")

Get records where Id_P values are unique.

13

=A1.query(“"select CASE  id_P  when 1  then 'one' when 2 then 'tow' else 'other'  end from p1.txt")

case when statement is used here.

14

=A1.query("select * from Persons.btx  where city  like  'N%' ")

Use like syntax to find from file Persons.btx the record where the first character of city field is N.

15

=A1.query("select * from Persons.btx  where Name  like  '_a_t%' ")

Use like syntax to find from file Persons.btx the record where the second character of Name field is a and where the fourth character is t.

 

 

A

 

1

=file("score1.txt")

 

2

=file("score2.txt")

 

3

=A1.cursor@t()

 

4

=A2.cursor@t()

 

5

=[A3, A4].mcursor@t()

 

6

=connect().query("select CLASS, max(SCORE)  avg_score  from {A5} where SUBJECT='math' group by CLASS")

Use from {x} syntax, where x is a cursor.

When used in $()sql:

 

A

 

1

$()select * from Persons.txt

Query data in file Persons.txt and return result as a table sequence.

2

$select * from D:/Orders.txt

Query data using an absolute path and return result as a table sequence.

3

$select * from Persons.csv where Id_P=? or Id_P>?;2,2

Retrieve records where Id_P is equal to or greater than 2

4

$select  *  from Persons.txt  P  join Orders.txt  O  on P.Id_P = O.Id_P

Perform a query through a join.

5

$select  distinct(Id_P)  from Orders.btx

Get records where Id_P values are unique.

 

 

A

 

1

$()select * from Order_Wines.txt

2

$()select * from Order_Foods.txt

3

$()select * from Order_*.txt

Order_Wines.txt and Order_Foods.txt are two data files of same structure; use wild character * to match file names.

4

$()select _file,_size,_ext,_date,* from Order_*.txt

Query a data file’s fields, and its name, extension, size and the last modified time.

5

>f1="C1,C2,C3"

 

6

$()select *  into  ${f1}.txt  from City.txt

Write data queried from City.txt to a file; f1 is the macro. Below are the three data files generated under the main directory:

When there is/are special identifiers or a string has non-underline or non-letter character in a simple SQL statement:

 

A

 

1

=connect().query("select * from \"D:/2f/cities.txt\" ")

As the first character of the file path is a letter, the path is enclosed by double quotation marks; since there are also another pair of double quotation marks in the outermost layer, this double quotation marks are escaped.

2

$select * from "D:/1cities.txt"

As the first character of the file name is neither an undeline nor a letter, the name is enclosed by double quotation marks; no need to escape the quotation marks here.

3

$select "@ID" from D:/cities.txt

As the field name contains a special character, it is enclosed by double quotation marks.