1. Subramanyam Kalivarapu
  2. PowerBuilder
  3. Monday, 5 August 2019 18:52 PM UTC

Hello All

I used the syntax below, but it does not convert the file encoding from ANSI! to UTF8!.

fileopen(as_filename,LineMode!, Write!,LockReadWrite!,Replace!,EncodingUTF8!)

The statement above returns -1 if I pass EncodingUTF8! and returns 1 if the last argument is EncodingANSI!.

My requirement is to convert a file from ANSI! to UTF8!.

 

Please suggest.

 

Thanks

Subramanyam K.

Chris Pollach @Appeon
  1. Thursday, 15 August 2024 00:33 AM UTC
  2. PowerBuilder
  3. # 1

Hi Daniel,

  Because HTML files have extended escape characters, you cannot use Line Mode. Instead, open the file in Stream Mode and, as John correctly states, do not specify the Encoding argument.  HTH

Regards ... Chris

Comment
  1. Benjamin Gaesslein
  2. Thursday, 15 August 2024 12:29 PM UTC
The problem is that PowerBuilder always expects a byte-order mark (BOM) when opening a file with EncodingUTF8! and simply returns -1 when it does not find one. The Unicode standard neither requires nor recommends a BOM for UTF-8 files, so many programs do not write one. UTF-8 files without a BOM are perfectly valid, but PB cannot open them without jumping through some hoops. To read a BOM-less UTF-8 file into a properly encoded string, you have to do a little dance:



- Open the file as ANSI in StreamMode: filenum = FileOpen(filename, StreamMode!, Read!, LockWrite!, Replace!, EncodingANSI!)

- Read the contents into a blob: FileReadEx(filenum, blobvariable)

- Convert the blob variable into a string: utf8string = String( blobvariable, EncodingUTF8! )
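
The three steps above can be put together roughly as follows. This is only a sketch: the file path and all variable names (ls_filename, li_fnum, lblb_raw, ls_text) are made up for illustration, and it assumes the code runs in a script or function that has no return value.

```powerscript
// Hypothetical names and path, for illustration only
string  ls_filename = "C:\temp\sample.htm"
integer li_fnum
long    ll_bytes
blob    lblb_raw
string  ls_text

// Step 1: open as ANSI in stream mode, so FileOpen does not insist on a UTF-8 BOM
li_fnum = FileOpen(ls_filename, StreamMode!, Read!, LockWrite!, Replace!, EncodingANSI!)
if li_fnum = -1 then
	MessageBox("Error", "Could not open " + ls_filename)
	return
end if

// Step 2: read the raw bytes of the file into a blob, then close the file
ll_bytes = FileReadEx(li_fnum, lblb_raw)
FileClose(li_fnum)

if ll_bytes < 0 then
	// -1 is an error; -100 means end of file was hit before anything was read
	MessageBox("Error", "Nothing could be read from " + ls_filename)
	return
end if

// Step 3: interpret the bytes as UTF-8 when converting to a PB string
ls_text = String(lblb_raw, EncodingUTF8!)
```

Checking both return values matters here, because a failed FileOpen or FileReadEx would otherwise only show up later as an empty string.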
  1. Daniel Seguin
  2. Thursday, 15 August 2024 19:52 PM UTC
Thanks a lot Benjamin! Works great.

With this solution my é stayed an é in the string variable.

Then I just had to call my function to replace extended letters in html-iso-8859 format.

  1. Benjamin Gaesslein
  2. Monday, 19 August 2024 06:53 AM UTC
Glad I could help! This approach in reverse is also the only way to have a PB app create a UTF-8 text file without a BOM, which was the original goal that led me to figure this out a while ago.
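
A rough sketch of that reverse direction, under the same caveats: the path, the sample text, and the variable names are invented for illustration, and the code assumes a script or function with no return value. It is worth confirming in a hex editor that the resulting file contains exactly the bytes you expect.

```powerscript
// Hypothetical names, path, and sample text, for illustration only
string  ls_filename = "C:\temp\output.txt"
string  ls_text = "Daniel Séguin"
integer li_fnum
blob    lblb_utf8

// Encode the string as UTF-8 bytes in memory
lblb_utf8 = Blob(ls_text, EncodingUTF8!)

// Open as ANSI in stream mode, so the bytes are written exactly as they are in the blob
li_fnum = FileOpen(ls_filename, StreamMode!, Write!, LockWrite!, Replace!, EncodingANSI!)
if li_fnum = -1 then
	MessageBox("Error", "Could not open " + ls_filename + " for writing")
	return
end if

// Write the UTF-8 bytes and close the file
if FileWriteEx(li_fnum, lblb_utf8) < 0 then
	MessageBox("Error", "Could not write to " + ls_filename)
end if
FileClose(li_fnum)
```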
Daniel Seguin
  1. Thursday, 15 August 2024 00:07 AM UTC
  2. PowerBuilder
  3. # 2

Hello,

 

I am having the same issue as explained above.

ls_tempfile = trim(ls_TempDir) + "\" + trim(as_rte_mode) + ".htm"
rte_autrs.savedocument( ls_tempfile, FileTypeHTML! )

// read the contents of the HTML file we just created into memory
li_fnum = FileOpen(ls_tempfile, LineMode!, Write!, LockReadWrite!, Replace!, EncodingUTF8!)
li_linestatus = FileRead(li_fnum, ls_line)
do while  li_linestatus <> -100
	ls_html = ls_html + trim(ls_line) + ls_cr
	li_linestatus = FileRead(li_fnum, ls_line)
loop
FileClose(li_fnum)
FileDelete(ls_tempfile)

When I run this in the debugger, li_fnum is -1.

The generated HTML file is in UTF-8. It contains an é in Séguin, which shows up correctly in the file.

When I open the file with li_fnum = FileOpen(ls_tempfile, LineMode!, Read!) instead, I get li_fnum = 1, but the é is transformed into a different value in the string variable ls_line:

<p lang="en-US" style="text-indent:0pt;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:10pt;">Bonjour salut </span><span style="font-size:10pt;font-style:italic;">HOLA</span><span style="font-size:10pt;"> Daniel Séguin</span></p> 

 

<?xml version="1.0" encoding="UTF-8" ?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta content="TX31_HTM 31.0.1103.500" name="GENERATOR" />
<title></title>
</head>
<body style="font-family:'Arial';font-size:12pt;text-align:left;">
<p lang="en-US" style="text-indent:0pt;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:10pt;">Bonjour salut </span><span style="font-size:10pt;font-style:italic;">HOLA</span><span style="font-size:10pt;"> Daniel Séguin</span></p>
</body>
</html>

 

Question:

Does this mean that when I open an HTML file, even though the file appears to be UTF-8, I cannot use the encoding parameter and therefore cannot read the UTF-8 characters properly?

 

John Fauss
  1. Tuesday, 6 August 2019 00:48 AM UTC
  2. PowerBuilder
  3. # 3

I think the issue is that you are expecting the file encoding argument to direct how some kind of data conversion is to be performed. That is not the purpose of this argument. This argument tells PB that the file you want to open for READING uses the specified file encoding.

Here is a key sentence from the FileOpen help topic:

If you specify the optional encoding argument and the existing file does not have the same encoding, FileOpen returns -1.

Comment
  1. Arnd Schmidt
  2. Thursday, 15 August 2024 10:49 AM UTC
Yes, Daniel should check whether a BOM exists, or use the FileEncoding() method to get the real encoding of the file, before opening it or doing any further processing.
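
As a sketch of that check, something along these lines could pick the open mode from what FileEncoding() reports. The path and variable names are hypothetical. Note that a file without a BOM is reported as EncodingANSI! whether it really is ANSI or BOM-less UTF-8, so that branch opens the file as raw bytes and leaves the interpretation to the caller.

```powerscript
// Hypothetical names and path, for illustration only
string   ls_filename = "C:\temp\sample.htm"
encoding le_enc
integer  li_fnum

// Ask PB which encoding it detects for the file
le_enc = FileEncoding(ls_filename)

choose case le_enc
	case EncodingUTF8!, EncodingUTF16LE!, EncodingUTF16BE!
		// A BOM was detected, so passing the same encoding to FileOpen should succeed
		li_fnum = FileOpen(ls_filename, LineMode!, Read!, LockWrite!, Replace!, le_enc)
	case EncodingANSI!
		// No BOM: the file may be ANSI or BOM-less UTF-8, so open it as raw bytes
		li_fnum = FileOpen(ls_filename, StreamMode!, Read!, LockWrite!, Replace!, EncodingANSI!)
	case else
		// No usable result (for example, the file does not exist)
		li_fnum = -1
end choose

if li_fnum = -1 then
	MessageBox("Error", "Could not open " + ls_filename)
end if
```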
  1. Roland Smith
  2. Thursday, 15 August 2024 14:31 PM UTC
It could be that the file doesn't have the BOM characters in its first three bytes (the UTF-8 BOM is EF BB BF).