chardetect()

Description:

Auto-identify the character set used for a string or a text file.

Syntax:

chardetect(fn,cs)

Note:

When no options are used, the function identifies the character set of the specified text file. Supported character sets include UTF-8, GBK, UTF-16LE and UTF-16BE.

 

Text files originally encoded in GBK, GB2312 or GB18030 are all identified as GB18030.
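
The folding of the GB family into one reported name for text files can be pictured with a short Python sketch. The helper name `reported_charset` and the alias set are assumptions made for illustration only; they are not part of esProc, and the actual implementation may differ.

```python
# Hypothetical illustration: for text files, GBK, GB2312 and GB18030
# are all reported under the single name GB18030.
GB_FAMILY = {"GBK", "GB2312", "GB18030"}

def reported_charset(detected: str) -> str:
    """Map a detected charset name to the name that would be reported."""
    if detected.upper() in GB_FAMILY:
        return "GB18030"
    # Any other name is passed through unchanged.
    return detected
```

Reporting the superset is a reasonable design choice, since GB18030 is defined as an extension of the older GBK and GB2312 encodings.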

 

When the input is a string or binary value representing (Traditional) Chinese, Japanese or Korean characters, more than one character set may match, because the character sets for these three languages overlap.

 

By default, the function returns the first matching character set. When parameter cs is present, it returns the first eligible character set in the cs list.
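
Why several character sets can be eligible for the same bytes, and how a candidate list narrows the answer, can be sketched in Python with trial decoding. This is not esProc's detection algorithm, which is not described here; the function names `eligible_charsets` and `first_eligible` are hypothetical, and the sketch assumes a charset counts as eligible when the bytes decode under it without error.

```python
def eligible_charsets(data: bytes, candidates):
    """Return the candidates, in their given order, under which data decodes cleanly."""
    matches = []
    for name in candidates:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Invalid byte sequence for this charset, or unknown charset name.
            continue
        matches.append(name)
    return matches

def first_eligible(data: bytes, candidates):
    """Return the first eligible candidate, or None if nothing matches."""
    matches = eligible_charsets(data, candidates)
    return matches[0] if matches else None

# Bytes of "你好" encoded in GBK; they are not valid UTF-8.
sample = "你好".encode("gbk")
print(eligible_charsets(sample, ["utf-8", "gbk"]))
print(first_eligible(sample, ["utf-8", "gbk"]))
```

Because double-byte East Asian encodings share much of the same byte range, other candidates such as Big5 or CP949 may accept the same bytes as well, which is why the order of the candidate list matters when only the first match is returned.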

 

fn is interpreted as a URL if it begins with "http://" or "https://".
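
The prefix rule can be expressed as a one-line check. The Python sketch below uses a hypothetical helper `is_url` and assumes a case-sensitive match on the two prefixes; whether esProc compares case-insensitively is not stated here.

```python
def is_url(fn: str) -> bool:
    """Return True if fn starts with one of the two URL prefixes named above."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return fn.startswith(("http://", "https://"))
```

Under this rule a value such as "http://www.baidu.com" is treated as a URL, while "d:/UTF8.xml" is treated as a file name.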

 

When parameter cs is present, only character sets included in the cs list are returned.

Option:

@v

Identify the character set of fn itself when fn is a string or a binary value

@a

Return all eligible character sets as a list; by default only the first eligible one is returned

Parameter:

fn

The string or binary value to be identified, or the name, file object or URL of the text file to be identified

cs

The list of available character sets; can be omitted

Return value:

A charset value or a sequence of charset values

Example:

 

     A
1    >www="http://www.baidu.com"
2    =chardetect(www)                            UTF-8.
3    =chardetect@v("abc一二三123")                GB-2312.
4    >file1="d:/UTF8.xml"                        The file uses the UTF-8 character set.
5    >file2="d:/UTF16LE.xml"                     The file uses the UTF-16LE character set.
6    =chardetect(file1)                          UTF-8; parameter fn is a file name.
7    =file(file2)
8    =chardetect(A7)                             UTF-16LE; parameter fn is a file object.
9    =chardetect@v("你好")                        GB2312.
10   =chardetect@av("你好")                       Return a list of all eligible character sets.
11   =chardetect@v("你好",["Big5","CP949"])       Return the first eligible character set value in the cs list: Big5.
12   =chardetect@va("你好",["Big5","CP949"])      Return all eligible character sets in the cs list.