Bin Files


esProc mainly works with two types of data files: text files and bin files. Bin files store data in a compressed binary format whose encoding costs little CPU time, so they take up less space than uncompressed text files and can be read more efficiently. This makes them the better choice for data files.

The Text Files article discussed how to manipulate text data in esProc. Here we'll deal with the manipulation of bin files.

Comparison between bin files and text files

The two types of files are used in almost the same way, except that related functions such as import, export and cursor need the @b option when handling bin files. Here's an example:

The two files, PersonnelInfo.btx and PersonnelInfo.txt, hold the same personnel information in binary format and text format respectively. Each file contains 100,000 records with 6 fields. The bin file occupies less hard disk space than the text file does.
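Why a packed and lightly compressed encoding tends to be smaller than plain text can be illustrated outside esProc. The following Python sketch is not the btx format: the three fields, the list of states, the fixed-width packing and the use of zlib are all assumptions made for illustration, and the generated records are random stand-ins for the real personnel data.

```python
import csv
import io
import random
import struct
import zlib

def make_records(n, seed=0):
    """Generate n hypothetical (id, age, state) records; not the real PersonnelInfo data."""
    rng = random.Random(seed)
    states = ["California", "Texas", "Ohio", "Florida", "New York"]
    return [(i, rng.randint(20, 60), rng.choice(states)) for i in range(1, n + 1)]

def encode_text(records):
    """Encode records as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "Age", "State"])
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")

def encode_binary(records):
    """Pack records into fixed-width binary fields, then compress them with zlib."""
    states = sorted({r[2] for r in records})
    index = {s: i for i, s in enumerate(states)}
    # Store the state names once, so each record only needs a 1-byte index.
    header = ("\t".join(states) + "\n").encode("utf-8")
    # 4-byte id, 1-byte age, 1-byte state index (little-endian, no padding).
    packed = b"".join(
        struct.pack("<IBB", rid, age, index[st]) for rid, age, st in records
    )
    # Level 1 is zlib's fastest setting, used here as a stand-in for cheap compression.
    return zlib.compress(header + packed, 1)

records = make_records(100_000)
text_bytes = encode_text(records)
bin_bytes = encode_binary(records)
print(len(text_bytes), len(bin_bytes))
```

In this sketch the saving comes from two sources: numbers are stored in a few bytes instead of as digit strings, and repeated values such as state names are replaced by small indexes before compression.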

Let’s look at the data retrieval based on the two types of files:

 

     A
1    =now()
2    =file("PersonnelInfo.btx")
3    =A2.cursor@b()
4    =A3.groups(State;count(~):Count)
5    =interval@ms(A1,now())

In the cellset, A3 creates a cursor on the bin file using the @b option. A5 computes the time (in milliseconds) spent on the grouping and aggregation over the bin file:

 

     A
1    =now()
2    =file("PersonnelInfo.txt")
3    =A2.cursor@t()
4    =A3.groups(State;count(~):Count)
5    =interval@ms(A1,now())

The above code performs the same operation on the text file; here A3 uses the @t option, which reads the first line as field names. Here's the time taken to do it:

The two cellsets perform the same grouping and aggregation, counting the employees of each state, on the bin file and the text file respectively, and A5 computes the time taken in milliseconds. As the results show, data is retrieved significantly faster from the bin file than from the text file, so storing data as a bin file is the more efficient choice in esProc.
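For readers less familiar with cellset syntax, the measurement pattern used in both cellsets (take a start time, stream the records, count them per state, report the elapsed milliseconds) can be sketched in Python. This is only an analogy: the sample data and the ID and Name fields are hypothetical, and the function does not read btx files.

```python
import io
import time
from collections import Counter

def count_by_state(lines):
    """Stream tab-separated lines with a header row and count records per State."""
    header = next(lines).rstrip("\n").split("\t")
    col = header.index("State")
    counts = Counter()
    for line in lines:
        counts[line.rstrip("\n").split("\t")[col]] += 1
    return counts

# Hypothetical sample standing in for a text file with a header row.
sample = io.StringIO("ID\tName\tState\n1\tAnn\tOhio\n2\tBob\tTexas\n3\tCid\tOhio\n")

start = time.perf_counter()             # plays the role of =now() in A1
counts = count_by_state(sample)         # plays the role of the cursor plus groups()
elapsed_ms = (time.perf_counter() - start) * 1000  # plays the role of interval@ms
print(dict(counts), f"{elapsed_ms:.3f} ms")
```

Because the records are consumed one at a time, memory use stays small regardless of file size, which is the same reason the cellsets read through a cursor.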

The esProc installation package includes a btx Viewer. Run BTX.exe in the esProc\bin directory to view a btx file. The following window pops up:

Click the Open icon to open a bin file (the default extension is btx). The file can be viewed page by page:

Click the buttons on the toolbar to go to the first page, the previous page, the next page, or the last page.

Click Tool>Option on the menu to set btx Viewer properties, such as the number of rows displayed per page:

 

Retrieving data with the cursor by segment

A common approach to processing a big data file is to split it into segments and compute each segment separately. Both text files and bin files can be imported by segment by adding a file segmentation parameter. The Text Files article explains how to generate a table sequence from a text file by importing data by segment; importing data through a cursor works almost identically. For example:

 

     A
1    =file("PersonnelInfo.btx")
2    =A1.cursor@b(;1:5)
3    =A2.fetch()
4    =A1.cursor@b(;2:5)
5    =A4.fetch()

Both A2 and A4 use a file segmentation parameter, such as 1:5, when generating the cursor. Before data can be retrieved from a bin file by segment, the file must be prepared with the export@z function; that is, PersonnelInfo.btx is generated by the export() function with the @z option. According to the file segmentation parameter, the cursor data is divided into 5 parts; A2 returns the 1st part and A4 returns the 2nd. A3 and A5 fetch the data of each part as follows:

The PersonnelInfo.btx file has 100,000 records in total. The five parts into which it is divided are approximately the same size but don't necessarily contain the same number of records. When performing segmental retrieval, esProc automatically adjusts the range of data being retrieved so that no record is split, as with the 1st and the 2nd segments in the example. This ensures that the segments are contiguous and that each record is retrieved exactly once.

Segmental retrieval works the same way on a text file and a bin file, except that the @b option is omitted for the former. When retrieving data by segment from a bin file, the @z option can also divide the file into multiple segments that correspond to different groups of data. For detailed information, see Group Cursor. Segmenting data by group is unique to bin files; it can't be used on text files.
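The boundary adjustment described above can be sketched in Python for a line-based text file. This is not esProc's implementation, and the helper names segment_range and read_segment are hypothetical. It only shows one common way to get the same guarantee: the file is cut into n byte ranges of roughly equal size, and every cut point is moved forward to the start of the next complete record.

```python
import os
import shutil
import tempfile

def segment_range(path, k, n):
    """Return (start, end) byte offsets of segment k of n (1-based), aligned to line starts."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        def align(pos):
            # Move a raw cut point forward to the start of the next full line.
            if pos <= 0:
                return 0
            if pos >= size:
                return size
            f.seek(pos - 1)
            f.readline()  # finish the line that contains byte pos - 1
            return f.tell()
        return align(size * (k - 1) // n), align(size * k // n)

def read_segment(path, k, n):
    """Read the complete records that belong to segment k of n."""
    start, end = segment_range(path, k, n)
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return data.decode("utf-8").splitlines()

# Hypothetical sample file: 1,000 records of varying length, no header row.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "records.txt")
lines = [f"{i}\tState{i % 7}" for i in range(1, 1001)]
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("\n".join(lines) + "\n")

parts = [read_segment(path, k, 5) for k in range(1, 6)]
print([len(p) for p in parts])  # record counts per segment, not necessarily equal
shutil.rmtree(tmpdir)
```

Because the end of segment k and the start of segment k+1 are computed from the same cut point, the segments neither overlap nor leave gaps, while the number of records in each one depends on how long the records are.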