Here is the source code for a function I wrote many moons ago as an exercise in learning about byte order marks (BOMs):
public function integer of_getfileencoding (string as_filename, ref encoding as_encoding);
// Determines how a file is encoded by examining the Byte Order Mark (BOM) at
// the beginning of a file. The start of the file has to be read in stream mode,
// otherwise the system skips over the BOM.
//
// There are five BOMs; a file that starts with none of them is treated as ANSI:
// 1. UTF-32 Big Endian (BE)     x0000FEFF  (byte values 0,0,254,255)   Not supported by PB
// 2. UTF-32 Little Endian (LE)  xFFFE0000  (byte values 255,254,0,0)   Not supported by PB
// 3. UTF-16 Big Endian (BE)     xFEFF      (byte values 254,255)       Recognized by PB
// 4. UTF-16 Little Endian (LE)  xFFFE      (byte values 255,254)       Recognized by PB (default for PB10 & higher)
// 5. UTF-8                      xEFBBBF    (byte values 239,187,191)   Recognized by PB
// 6. ANSI                       Any byte sequence not listed above     Recognized by PB
//
// Arguments:
// String as_filename The path, name & extension of the file to be examined.
// Encoding as_encoding [passed by reference] The encoding used by the file.
//
// Returns: Integer
// RC = 1 -> The file's encoding was determined successfully.
// RC = -1 -> Error or unsupported encoding; the encoding argument (passed by reference) is null.
Integer li_filenum
Long ll_bytesread
Byte lbyte[]
Blob lblob
li_filenum = -1
SetNull(as_encoding)
if not FileExists(as_filename) then
    Return -1
end if
// Open the file to be examined in Stream Mode.
li_filenum = FileOpen(as_filename, StreamMode!, Read!, Shared!)
if li_filenum = -1 then
    Return -1
end if
// Read the first four bytes of the file (where the BOM resides) into a blob.
ll_bytesread = FileReadEx(li_filenum, lblob, 4)
FileClose(li_filenum)
// A failed read, or a file shorter than four bytes, is treated as an error.
if ll_bytesread < 4 then
    Return -1
end if
// Copy the four bytes in the blob into a byte array for easy examination.
lbyte = GetByteArray(lblob)
// Does the file begin with a recognized BOM?
if lbyte[1] = 0 and lbyte[2] = 0 and lbyte[3] = 254 and lbyte[4] = 255 then
    Return -1 // UTF 32 BE not supported by PB
elseif lbyte[1] = 255 and lbyte[2] = 254 and lbyte[3] = 0 and lbyte[4] = 0 then
    Return -1 // UTF 32 LE not supported by PB
elseif lbyte[1] = 254 and lbyte[2] = 255 then
    as_encoding = EncodingUTF16BE!
elseif lbyte[1] = 255 and lbyte[2] = 254 then
    as_encoding = EncodingUTF16LE!
elseif lbyte[1] = 239 and lbyte[2] = 187 and lbyte[3] = 191 then
    as_encoding = EncodingUTF8!
else
    // No recognizable BOM, so this file is ANSI encoded.
    as_encoding = EncodingANSI!
end if
Return 1
end function
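
A rough sketch of how the function might be called, assuming the calling script lives in the same object as of_getfileencoding. The file path and the local variable names are made up for illustration. The detected encoding is passed back to FileOpen so the file can then be read in line mode:

```powerscript
// Hypothetical caller; the path and variable names are examples only.
Encoding le_encoding
Integer li_rc, li_filenum
String ls_filename, ls_line

ls_filename = "C:\temp\sample.txt"

li_rc = of_getfileencoding(ls_filename, le_encoding)
if li_rc = 1 then
    // Reopen the file in line mode, telling PB which encoding to expect.
    li_filenum = FileOpen(ls_filename, LineMode!, Read!, Shared!, Replace!, le_encoding)
    if li_filenum <> -1 then
        FileReadEx(li_filenum, ls_line) // Reads the first line of text.
        FileClose(li_filenum)
    end if
else
    // RC = -1: file missing, unreadable, too short, or UTF-32 encoded.
    MessageBox("File Encoding", "Could not determine a supported encoding.")
end if
```

Because the function sets as_encoding to null on failure, check the return code before using the encoding value.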
HTH, John