Description:
Auto-identify the character set used for a string or a text file.
Syntax:
chardetect(param)
Note:
When no option is used, the function identifies the character set of the specified text file. Supported character sets include UTF-8, GBK, UTF-16LE and UTF-16BE.
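As an illustration of how UTF-8, UTF-16LE and UTF-16BE files can be told apart, the Python sketch below checks for a leading byte order mark (BOM). This is only one common technique and is not a description of how chardetect works internally. The helper name `sniff_bom` is hypothetical, and data without a BOM simply yields `None`.

```python
import codecs

# BOM signatures for the Unicode encodings named in the note above.
# UTF-32 is deliberately left out of this sketch; its little-endian BOM
# begins with the UTF-16LE BOM and would be mislabelled here.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes):
    """Return the charset named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Usage: build a small UTF-16LE payload with an explicit BOM and sniff it.
payload = codecs.BOM_UTF16_LE + "abc".encode("utf-16-le")
print(sniff_bom(payload))
```

A BOM check is cheap but incomplete: many UTF-8 files carry no BOM, so a real detector has to fall back on inspecting the byte patterns themselves.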
Text files whose original character set is GBK, GB2312 or GB18030 are identified as GB18030. When identifying a string or binary value that represents Traditional Chinese, Japanese or Korean characters, the function may return multiple possible character sets, because the character sets for these three languages overlap.
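The overlap described above can be observed outside esProc with a short Python sketch that tries to decode the same bytes under several East Asian character sets and keeps every one that succeeds. The function name `candidate_charsets` and the list of candidate codecs are assumptions made for this illustration. It is not the algorithm used by chardetect.

```python
def candidate_charsets(
    data: bytes,
    charsets=("utf-8", "gb18030", "big5", "shift_jis", "euc_kr"),
):
    """Return every charset in `charsets` that decodes `data` without error."""
    matches = []
    for name in charsets:
        try:
            data.decode(name)  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        matches.append(name)
    return matches


# Usage: the same GBK-encoded bytes may be valid in more than one charset,
# which is why a detector can only report a list of candidates.
sample = "你好".encode("gbk")
print(candidate_charsets(sample))
```

A successful decode only shows that the bytes are legal in that character set, not that the resulting text is meaningful, so short inputs tend to produce more candidates than long ones.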
Option:
@v | Identify the character set used for a string or a binary value |
Parameter:
param | The string or binary value to be identified, the name of the text file to be identified, or the file object/URL of the text file to be identified |
Return value:
A charset value or a sequence of charset values
Example:
| A | |
1 | >www="http://www.baidu.com" | |
2 | =chardetect(www) | UTF-8 |
3 | =chardetect@v("abc一二三123") | GB-2312 |
4 | >file1="d:/UTF8.xml" | The file uses the UTF-8 character set |
5 | >file2="d:/UTF16LE.xml" | The file uses the UTF-16LE character set |
6 | =chardetect(file1) | UTF-8 |
7 | =file(file2) | |
8 | =chardetect(A7) | UTF-16LE |
9 | =chardetect@v("你好") | |