1. Yuri Denshchik
  2. PowerBuilder
  3. Friday, 30 August 2019 16:26 PM UTC

Hello, 

We have to convert blob data into a string. The blob content could be encoded in UTF-8, ANSI or UTF-16LE.

How can we identify which encoding to use for the string conversion?

Thank you,

Yuri

Daniel Vivier
  1. Sunday, 15 May 2022 12:49 PM UTC
  2. PowerBuilder
  3. # 1

See also this newer thread for some comments and a complex work-around I came up with to handle unknown files that might or might not have the expected Byte Order Marks: https://community.appeon.com/index.php/qna/q-a/opening-unicode-files-without-boms

Roland Smith
  1. Monday, 4 May 2020 13:45 PM UTC
  2. PowerBuilder
  3. # 2

This might be helpful:

https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-istextunicode

Function boolean IsTextUnicode ( ref blob lpv, long iSize, ref long lpiResult ) Library "advapi32.dll"
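
A call might look roughly like the sketch below. The function name of_looks_like_utf16 and its blob argument ab_data are hypothetical, and the value 15 is the IS_TEXT_UNICODE_UNICODE_MASK constant from the Windows SDK headers. Keep in mind that IsTextUnicode is a statistical guess about UTF-16 only; it does not help to tell UTF-8 from ANSI.

```powerscript
// Hypothetical function: boolean of_looks_like_utf16 (blob ab_data)
// Requires the external function declaration shown above.
blob lblb_copy
long ll_tests
boolean lb_result

if IsNull(ab_data) then return false
if Len(ab_data) = 0 then return false

lblb_copy = ab_data   // local copy, because the declaration passes the buffer by reference
ll_tests = 15         // IS_TEXT_UNICODE_UNICODE_MASK (0x000F): run the UTF-16 LE tests only

lb_result = IsTextUnicode(lblb_copy, Len(lblb_copy), ll_tests)
// On return, ll_tests holds the flags of the tests that passed

return lb_result
```

If it returns true, String(ab_data, EncodingUTF16LE!) would be the conversion to try first.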

Roland Smith
  1. Friday, 30 August 2019 18:50 PM UTC
  2. PowerBuilder
  3. # 3

You could add another column which contains a code to tell you the format. The process that stores the blob should know what the format is and could update the format column.
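
As a rough sketch of that idea: the table documents, its columns doc_id, doc_data and doc_encoding, and the code values "UTF8" and "UTF16LE" are all made up for illustration, and the default SQLCA transaction is assumed to be connected.

```powerscript
// Hypothetical table: documents (doc_id, doc_data blob, doc_encoding varchar)
blob lblb_data
string ls_code, ls_text
long ll_id = 1   // hypothetical key value
encoding le_enc

// Read the format code written by the process that stored the blob
SELECT doc_encoding INTO :ls_code FROM documents WHERE doc_id = :ll_id;
if SQLCA.SQLCode <> 0 then return

// Blob columns need SELECTBLOB
SELECTBLOB doc_data INTO :lblb_data FROM documents WHERE doc_id = :ll_id;
if SQLCA.SQLCode <> 0 then return

// Map the stored code to a PowerBuilder encoding
choose case ls_code
	case "UTF8"
		le_enc = EncodingUTF8!
	case "UTF16LE"
		le_enc = EncodingUTF16LE!
	case else
		le_enc = EncodingANSI!
end choose

ls_text = String(lblb_data, le_enc)
```

This only works for rows written after the column exists; older rows would still need some kind of detection.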

Yuri Denshchik
  1. Friday, 30 August 2019 18:01 PM UTC
  2. PowerBuilder
  3. # 4

The blob is from the database ...

John Fauss
  1. Friday, 30 August 2019 17:57 PM UTC
  2. PowerBuilder
  3. # 5

If the blob's contents are being read from a file, you can use the FileEncoding( filename ) PowerScript function.
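
For the file case, a minimal sketch could look like this. The file path is hypothetical. Note that, as the comments below discuss, FileEncoding has been reported to return ANSI for UTF-8 files that have no byte order mark, so the result is only trustworthy when a BOM is present.

```powerscript
string ls_file = "C:\temp\sample.txt"   // hypothetical path
string ls_text
encoding le_enc
integer li_file
blob lblb_data

// Ask PowerBuilder which encoding it detects for the file
le_enc = FileEncoding(ls_file)
if IsNull(le_enc) then return   // assumed here to mean the file could not be examined

// Read the raw bytes in stream mode
li_file = FileOpen(ls_file, StreamMode!, Read!, Shared!)
if li_file = -1 then return
FileReadEx(li_file, lblb_data)
FileClose(li_file)

// Convert with the detected encoding; the blob may still start with the BOM bytes
ls_text = String(lblb_data, le_enc)
```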

Comment
  1. Mark Goldsmith
  2. Wednesday, 6 May 2020 16:13 PM UTC
Thank you for posting this reference to the bug submission, Miguel. I spoke with Bruce Armstrong at Elevate 2019 about an issue I was having with the FileOpen function not interpreting accents properly unless I coded it a certain way, which I worked out through trial and error. He suggested I submit a bug. I was going to, after I couldn't find an existing submission (I was searching on FileOpen), but I never got around to it.

Hopefully the fix for FileEncoding will also fix FileOpen by default; if not, hopefully FileOpen will be addressed as well.



Here is the code I used that ultimately worked:



li_file_handle = FileOpen(ls_path + This.Text(li_counter), StreamMode!, Read!, LockReadWrite!, Append!, EncodingANSI!) //EncodingANSI! required for accents
li_chars_read = FileReadEx(li_file_handle,lb_blob)
ls_total_file = String(lb_blob, EncodingUTF8!) //EncodingUTF8! required for accents



At the time it struck me as odd to have to open the file with ANSI encoding in order to properly read a UTF-8 encoded file, but doing so, along with the conversion, ended up working. I thought it might be related to byte order marks and very strict handling of the presence or absence of a BOM, but thanks to your bug submission I now know it is a bug.



Regards,

  1. Thierry Del Fiore
  2. Wednesday, 6 May 2020 16:29 PM UTC
Hi Mark,



I have the same problem and I found the same workaround.



https://community.appeon.com/index.php/qna/q-a/fileencoding-utf8-file-returns-ansi



My problem is that the input files are sometimes in ANSI and sometimes in UTF-8.

The program doesn't know which.

So the conversion String(lb_blob, EncodingUTF8!) cannot always be used (if the file is ANSI, I get Asian characters).

So did you find a workaround to determine the right file encoding?
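
One possible approach when there is no BOM is to check whether the bytes form well-formed UTF-8, since ANSI text containing accented characters rarely does. Below is a minimal sketch under that assumption. The function name of_is_valid_utf8 and its argument ab_data are hypothetical (nothing like this is built into PowerBuilder), the check is simplified (it does not reject overlong or surrogate encodings), and calling BlobMid once per byte will be slow on large blobs.

```powerscript
// Hypothetical function: boolean of_is_valid_utf8 (blob ab_data)
// Returns true when every byte sequence in ab_data is well-formed UTF-8.
long ll_pos, ll_len
integer li_follow, li_i
byte lbt_byte

if IsNull(ab_data) then return false

ll_len = Len(ab_data)
ll_pos = 1

do while ll_pos <= ll_len
	lbt_byte = Byte(BlobMid(ab_data, ll_pos, 1))
	
	// Work out how many continuation bytes the lead byte announces
	if lbt_byte < 128 then
		li_follow = 0   // plain ASCII
	elseif lbt_byte >= 194 and lbt_byte <= 223 then
		li_follow = 1   // 2-byte sequence
	elseif lbt_byte >= 224 and lbt_byte <= 239 then
		li_follow = 2   // 3-byte sequence
	elseif lbt_byte >= 240 and lbt_byte <= 244 then
		li_follow = 3   // 4-byte sequence
	else
		return false    // stray continuation byte or invalid lead byte
	end if
	
	// The sequence must not run past the end of the data
	if ll_pos + li_follow > ll_len then return false
	
	// Every continuation byte must be in the range 128..191
	for li_i = 1 to li_follow
		lbt_byte = Byte(BlobMid(ab_data, ll_pos + li_i, 1))
		if lbt_byte < 128 or lbt_byte > 191 then return false
	next
	
	ll_pos += li_follow + 1
loop

return true
```

If the function returns true, String(lb_blob, EncodingUTF8!) should be safe to use; otherwise fall back to String(lb_blob, EncodingANSI!). Pure ASCII data passes the check, which is harmless because it converts the same way under both encodings.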





  1. Mark Goldsmith
  2. Wednesday, 6 May 2020 23:33 PM UTC
Hi Thierry, for my current need I know in advance that the files I receive will be UTF-8 encoded (without BOM), so I don't have the issue that you do. I am also automating this, as the files come from an external source, so thankfully I don't have to look at the files before processing them. If you or your users aren't automating the process and are in a position to view the file before accepting it, then you could write some code to preview it using the different encodings (taking care when testing with UTF-8 because of the missing-BOM issue) and, once it "looks" as it should, finalize the process.



If not, then the only thing I can think of is to use Roland's suggestion of IsTextUnicode, although I believe someone in the other Q&A thread said that it didn't work, or didn't work reliably enough.



Regards,

Yuri Denshchik
  1. Friday, 30 August 2019 17:22 PM UTC
  2. PowerBuilder
  3. # 6

I manually wrote the code below. I'd expect PB to identify the encoding natively when converting a blob to a string.

 

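encoding enc
byte b1, b2

// Default when no UTF-16 byte order mark is found; ANSI data is not detected here
enc = EncodingUTF8!

if not isNull(ab_data) then
	if len(ab_data) >= 2 then
		// Read the first two bytes to look for a UTF-16 BOM
		b1 = byte(blobmid(ab_data,1,1))
		b2 = byte(blobmid(ab_data,2,1))
		
		if b1 = 255 and b2 = 254 then
			// FF FE: UTF-16 little endian
			enc = EncodingUTF16LE!
		elseif b1 = 254 and b2 = 255 then
			// FE FF: UTF-16 big endian
			enc = EncodingUTF16BE!
		end if
	end if
end if

return enc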