chardetect()

Description:

Auto-identify the character set used for a string or a text file.

Syntax:

chardetect(param)

Note:

When no options are used, the function identifies the character set used for a specified text file. Supported character sets include UTF-8, GBK, UTF-16LE and UTF-16BE.

For text files whose original character set is GBK, GB2312 or GB18030, the function identifies the character set as GB18030. When identifying the character set of a string or binary value that represents Traditional Chinese, Japanese or Korean characters, the function may return multiple possible character sets, because the character sets for the three languages overlap.

Options:

@v

Identify the character set used for a string or a binary value

Parameters:

param

The string or binary value to be identified, the name of the text file to be identified, or the file object or URL of the text file to be identified

Return value:

A charset value or a sequence of charset values

Example:

 

     A                                 
1    >www="http://www.baidu.com"
2    =chardetect(www)                  UTF-8
3    =chardetect@v("abc一二三123")      GB-2312
4    >file1="d:/UTF8.xml"
5    >file2="d:/UTF16LE.xml"
6    =chardetect(file1)                UTF-8
7    =file(file2)
8    =chardetect(A7)                   UTF-16LE
9    =chardetect@v("你好")
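
A detected value can be passed on when opening the file, so that its content is decoded correctly. The grid below is a minimal sketch, not a documented recipe. It assumes that chardetect() returns a single charset value for this file rather than a sequence, that the value can be supplied as the cs parameter of file(fn:cs), and that the file d:/UTF16LE.xml (the path reused from the example) exists.

     A                              
1    >fn="d:/UTF16LE.xml"           File path reused from the example; assumed to exist
2    =chardetect(fn)                Detected charset, assumed to be a single value
3    =file(fn:A2)                   Open the file with the detected charset (assumed usage of cs)
4    =A3.read()                     Read the file content as a string

If chardetect() returns a sequence of candidates instead, one of them has to be chosen before it is passed to file().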