Simple SQL

Read（2916） Like（0） Label: simple sql,

Here’s how to use simple SQL.

Concept:

A simple SQL is used to query data in a data file, which is represented directly using the table name【file name.extension】and which can be treated as an ordinary table sequence. It is so called because the syntax is similar to that of database queries.

The file types simple SQL supports include txt, csv, xlsx, xls and btx (a bin file). Each involved data file is supposed to have headers. Both the relative path, which is relative to the esProc main directory, and the absolute path can be used in a simple SQL query.

Similar to familiar SQL statements, simple SQl can be used in db.query(sql) and $(db)sql. While the ordinary SQL queries data from the database through a specified to-be-connected data source db, the simple SQL queries data from a file system without specifying data source name. It just leaves data source parameter db null with syntax connect().query(sql)/$()sql.

Syntax of simple SQL:

with T as (x)	Define a table for data file using an esProc script; the query over the file x returns a table sequence or a cursor
select x F,…	Select data from a table where x is a field or an expression, and F is its alias
_file	The field is file name when data is imported from a file; it won’t be defnied when data comes from other types of sources
_ext	The filename extension
_size	The size of a file
_date	The last modified date of a file
*	The symbol * represents fields in the file only; it does not contain file attributes
.from T T’	Use a defined table T; T’ is the alias of this table and can be omitted
from fn T	Use file fn, whose format can be txt, csv, xls, xlsx and btx (a bin file), directly as table T; the file is opened as a cursor, and by default considered to have headers; use a relative path, which is relative to esProc main directory, or an absolute path to access file fn. Support using wild character in the file name, where * matches zero or multile characters and ? matches a single character; T can correspond to multiple files and be treated as the union of multiple same-structure data files; do not support accessing the subdirectory
from {x}	x is an expression that can be executed within the current cellset; it should be an esProc expression that returns a table sequence or a cursor
as	Can be used before a table alias
where	The conditional filtering
join, left join and full join	Associative operations that handle all data in the memory when join operator is present; the corresponding function is xjoin()
/+parallel (n)/	Perform a parallel processing when there isn’t a join and when target files are btx (bin file) or ctx (composite table)
group by	Grouping operation with groups() function, whose result set is supposed to be able to fit into the memory
group by n	Perform grouping by the n^th expression in the select statement; n should be a constant
having	Grouping and filtering
order by	Sorting operations in, by default, ascending order
order by n	Perform sorting by the n^th expression in the select statement; n should be a constant
distinct	Synonym of SQL DISTINCT clause; support only a single table
and, or, not, in, is null and case when else end	Used in SQL style; in only supports sets of constants and doesn’t support subqueries
between	A data range sandwiched by two values, such as value f1 that falls between 1 and 3, which can be written as f1 >= 1 && f1 <= 3
like	Fuzzy query, which supports the following wildcard characters: % Match multiple characters; _ Match a single character; The operator does not support subqueries
into fn	Write the result set to file fn, whose format, which can be txt, csv and btx, is determined by its extension; Check whether the result set and the target file have same structure if the latter already exists; then perform appending if they have same structure, otherwise error will be reported; Use a relative path, which is relative to esProc main directory, or an absolute path to access the file
into ${x}.ext	Can use a macro string in the file name and concatenate the result of calculating the macro expresson into the file name; support exporting to multiple files
?, ?i	Parameters in a simple SQL statement; i represents the i^th parameter
aggregate functions	sum, count, max, min and avg
set operations	union, union all, intersect and minus, which are supposed to be able to be wholly handled in the memory
subquery	All subqueries are supposed to be able to be wholly executed in memory except for the from statement; support non-parameter subqueries
compute functions	Defined under sqltranslate() function (without database name and have corresponding SQL expressions); do not support window functions
top n	Get first n records
limit n offset m	Skip m records to get n records
other functions	esProc functions conforming to syntax
*String functions:*
LOWER(str)	Convert to lower case
UPPER(str)	Convert to upper case
LTRIM(str)	Delete whitespaces on the leftmost side
RTRIM(str)	Delete whitespaces on the rightmost side
TRIM(str)	Delete whitespaces on both sides
SUBSTR(str,start,len) SUBSTR(str,start)	Return a substring
LEN(str)	Return length of a string
INDEXOF(sub,str[,start])	Return position of a substring
LEFT(str,len)	Return substring of the specified length beginning from the leftmost of a string
RIGHT(str,len)	Return substring of the specified length beginning from the rightmost of a string
CONCAT(str1,str2)	Concatenate two strings
CONCAT(s1,s2,…)	Concatenate multiple strings
REPLACE(str,sub,rplc)	Replace a substring by another string
*Numeric functions:*
ABS(x)	Return absolute value
ACOS(x)	Return arc cosine
ASIN(x)	Return arc sine
ATAN(x)	Return arc tangent
ATAN2(x,y)	Return arc tangent
CEIL(x)	Return the smallest integer not less than x
COS(x)	Return cosine value
EXP(x)	Return the base of e to the power of x
FLOOR(x)	Return the biggest integer that not greater than x
LN(x)	Return the natural logarithm
LOG10(x)	Return the logarithm with base 10
MOD(x,m)	Return the remainder of x divided by m
POWER(x,y)	Return the value of x raised to the power of y
ROUND(x,n)	Return x rounded to n digits from the decimal point
SIGN(x)	Return the sign for x
SIN(x)	Return the sine value
SQRT(x)	Return the square root of x
TAN(x)	Return the tangent value
TRUNC(x,n)	Return x truncated to n decimal places
RAND(seed)	Return a random number
*Date&Time functions:*
YEAR(d)	Return the year
MONTH(d)	Return the month
DAY(d)	Return the day of the month
HOUR(d)	Return the hour
MINUTE(d)	Return the minute
SECOND(d)	Return the second
QUARTER(d)	Return the quarter
TODAY()	Return the date
NOW()	Return the current time
ADDYEARS(d,n)	Add years
ADDMONTHS(d, n)	Add months
ADDDAYS(d, n)	Add days
ADDHOURS(d, n)	Add hours
ADDMINUTES(d, n)	Add minutes
ADDSECONDS(d, n)	Add seconds
DAYOFYEAR(d)	Return the day of year
WEEKOFYEAR(d)	Return the calendar week of the date
*Conversion functions:*
ASCII(str)	Return ASCII code of the left-most character in a string
CHR(n)	Convert an integer to characters
INT(x)	Convert a string or a number to integer
DECIMAL(x,len,scale)	Convert a string or a number to number
TIMESTAMP(str)	Convert the format string of yyyy-mm-dd hh24:mi:ss to datetime
NUMTOCHAR(d)	Convert a number to string
DATETOCHAR(date)	Convert a date into a string in the format of yyyy-mm-dd hh24:mi:ss
*Others:*
NULLIF(x1,x2)	Return null if x1is equal to x2, otherwise return x1
COALESCE(x1,…)	Return the first non-null parameter
COUNTIF(x1,…,xn)	Count a set of data objects meeting the specified criteria

Note:

In a simple SQL statement, a string used for field name, file path, file name or alias should be enclosed by double quotation marks when it contains one or more special identifiers or has a non-underline or non-letter beginning character, otherwise error will be reported. When using quotation marks, make sure they are escaped if needed.

Example:

When used in db.query(sql):

	A
1	=connect()
2	=A1.query("select * from Persons.txt")
3	=A1.query("select * from D:/Orders.txt")
4	=A1.query("select Id_P, sum(OrderNo) from Orders.csv group by 1 ")	Group by Id_P field.
5	=A1.query("select Id_P, OrderNo from Orders.csv order by 2 ")	Sort by OrderNo field in default ascending order.
6	=A1.query("select * into p1.txt from Persons.csv ")	Write the query result set to file p1.txt.
7	=A1.query("select * from Persons.csv where Id_P=? or Id_P>? ",2,2)	Retrieve records where Id_P is equal to or greater than 2.
8	=A1.query("select * from Persons.csv where Id_P=?1 or Id_P>?2 ",2,2)	Retrieve records where Id_P is equal to or greater than 2 and get same result as above; ?i represents the i^th parameter.
9	=A1.query("with persons as (file(\"D:/Persons.btx\").import@b()) select * from persons ")	Perform a query over Persons.btx, name the result set persons, and perform a second query over persons; this can optimize SQL huge data processing.
10	=A1.query("select /+parallel(20)/ count(*) from Persons.btx")	Perform a query through a join Count records in table Persons.btx with parallel processing.
11	=A1.query("select * from Persons.txt P join Orders.txt O on P.Id_P = O.Id_P ")	Perform a query through a join.
12	=A1.query("select distinct(Id_P),name from Orders.txt ")	Get records where Id_P values are unique.
13	=A1.query(“"select CASE id_P when 1 then 'one' when 2 then 'tow' else 'other' end from p1.txt")	case when statement is used here.
14	=A1.query("select * from Persons.btx where city like 'N%' ")	Use like syntax to find from file Persons.btx the record where the first character of city field is N.
15	=A1.query("select * from Persons.btx where Name like '_a_t%' ")	Use like syntax to find from file Persons.btx the record where the second character of Name field is a and where the fourth character is t.

	A
1	=file("score1.txt")
2	=file("score2.txt")
3	=A1.cursor@t()
4	=A2.cursor@t()
5	=[A3, A4].mcursor@t()
6	=connect().query("select CLASS, max(SCORE) avg_score from {A5} where SUBJECT='math' group by CLASS")	Use from {x} syntax, where x is a cursor.

When used in $()sql:

	A
1	$()select * from Persons.txt	Query data in file Persons.txt and return result as a table sequence.
2	$select * from D:/Orders.txt	Query data using an absolute path and return result as a table sequence.
3	$select * from Persons.csv where Id_P=? or Id_P>?;2,2	Retrieve records where Id_P is equal to or greater than 2
4	$select * from Persons.txt P join Orders.txt O on P.Id_P = O.Id_P	Perform a query through a join.
5	$select distinct(Id_P) from Orders.btx	Get records where Id_P values are unique.

	A
1	$()select * from Order_Wines.txt
2	$()select * from Order_Foods.txt
3	$()select * from Order_*.txt	Order_Wines.txt and Order_Foods.txt are two data files of same structure; use wild character * to match file names.
4	$()select _file,_size,_ext,_date,* from Order_*.txt	Query a data file’s fields, and its name, extension, size and the last modified time.
5	>f1="C1,C2,C3"
6	$()select * into ${f1}.txt from City.txt	Write data queried from City.txt to a file; f1 is the macro. Below are the three data files generated under the main directory:

When there is/are special identifiers or a string has non-underline or non-letter character in a simple SQL statement:

	A
1	=connect().query("select * from \"D:/2f/cities.txt\" ")	As the first character of the file path is a letter, the path is enclosed by double quotation marks; since there are also another pair of double quotation marks in the outermost layer, this double quotation marks are escaped.
2	$select * from "D:/1cities.txt"	As the first character of the file name is neither an undeline nor a letter, the name is enclosed by double quotation marks; no need to escape the quotation marks here.
3	$select "@ID" from D:/cities.txt	As the field name contains a special character, it is enclosed by double quotation marks.