Phil Jorgensen | PowerBuilder | Wednesday, 31 January 2024, 22:33 UTC

Hello,

I have a question about string encoding behavior when sending an HTML message through SMTP. Here is a description of the workflow:

First, we read in an HTML file as a message template. We also read a customizable message, stored as a value in the database, through a DataWindow. We then use Replace to insert the custom message into the template before passing the entire string to the SMTP service.
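The workflow above can be sketched outside PowerBuilder. This is a minimal Python sketch, not PowerScript, and it assumes a hypothetical `{{CUSTOM_MESSAGE}}` placeholder and that the encoding of each source is known. It illustrates that the Replace step itself is encoding-neutral: text only gets mixed up when bytes are decoded with the wrong encoding.

```python
# Hypothetical placeholder; the real template's marker is not shown in the post.
PLACEHOLDER = "{{CUSTOM_MESSAGE}}"


def build_body(template_bytes: bytes, template_encoding: str,
               message_bytes: bytes, message_encoding: str) -> str:
    """Decode each source with its own encoding, then substitute.

    Once both pieces are proper strings, str.replace never changes
    their encoding; problems come from the bytes-to-string steps.
    """
    template = template_bytes.decode(template_encoding)
    message = message_bytes.decode(message_encoding)
    return template.replace(PLACEHOLDER, message)


# Illustration only: a UTF-8 template and a message stored as Windows-1252.
template_bytes = "<p>Bonjour, {{CUSTOM_MESSAGE}} déjà</p>".encode("utf-8")
message_bytes = "café".encode("cp1252")
body = build_body(template_bytes, "utf-8", message_bytes, "cp1252")
# body == "<p>Bonjour, café déjà</p>"
```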

The issue I'm having is that the resulting email seems to contain two different encodings. Both the template HTML file and the custom message inserted into it contain accented characters, but in the received message the accented characters from the custom message are not displayed correctly, while those from the template are.

Using Notepad++ to switch between UTF-8 and ANSI encodings seems to switch which characters are displayed correctly, which is odd.
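One way this symptom can arise is a single byte stream in which two segments were encoded differently. The Python sketch below (an illustration, assuming "ANSI" means Windows-1252) builds such a stream and decodes it both ways: each view renders one segment correctly and garbles the other, which matches the "switching" seen in Notepad++. It does not say which segment was wrong in Phil's case.

```python
# Illustrative only: one byte stream with two differently encoded segments.
template_part = "Template: café".encode("utf-8")   # é -> C3 A9
custom_part = " | Custom: café".encode("cp1252")   # é -> E9 (single byte)
mixed = template_part + custom_part

# View the same bytes "as UTF-8" and "as ANSI" (Windows-1252).
as_utf8 = mixed.decode("utf-8", errors="replace")  # lone E9 is invalid UTF-8
as_ansi = mixed.decode("cp1252")                   # C3 A9 becomes "Ã©"

# as_utf8 == "Template: café | Custom: caf\ufffd"
# as_ansi == "Template: cafÃ© | Custom: café"
```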

My understanding is that PowerBuilder 2021 uses Unicode internally for all processing, so how is this possible? Is there a way to re-encode the string as Unicode to ensure consistency?

Thank you,

Phil

Accepted Answer
Phil Jorgensen | PowerBuilder | Tuesday, 6 February 2024, 18:51 UTC

Thanks for the response, Chris.

I was able to solve my problem. I traced each string through the process with a hex editor to see which bytes were actually there.
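A hex editor is the right tool for this kind of tracing. As a rough scripted equivalent, the hypothetical Python helper `trace_non_ascii` below lists each non-ASCII character in a string with its code point and UTF-8 bytes. A correctly decoded "é" shows up as one character, U+00E9, while "é" whose UTF-8 bytes were read as Windows-1252 shows up as two characters, "Ã" (U+00C3) and "©" (U+00A9).

```python
def trace_non_ascii(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, UTF-8 bytes as hex) for each non-ASCII char."""
    rows = []
    for ch in text:
        if ord(ch) > 0x7F:
            rows.append((ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ")))
    return rows


good_rows = trace_non_ascii("café")
# [("é", "U+00E9", "c3 a9")]

# "café" after its UTF-8 bytes were misread as Windows-1252:
mangled_rows = trace_non_ascii("cafÃ©")
# [("Ã", "U+00C3", "c3 83"), ("©", "U+00A9", "c2 a9")]
```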

The root causes of my issues are as follows:

1) In the string holding the email template read from file, I noticed extra bytes inside the UTF-8 characters, such as "83 c2". This led me to an article explaining that the string had likely been "double-UTF-8 encoded," so I tested and found that PowerBuilder was treating the UTF-8 template file as ANSI. My temporary solution for testing is to read the file into a Blob (which PB treats as ANSI) and then convert that Blob to a string as if it were already UTF-8 encoded:

        string ls_email_template_path
        string ls_email_body
        integer li_FileNum
        long ll_file_bytes
        blob lb_ansi
        ls_email_template_path = "c:\path\to\template.html"
        li_FileNum = FileOpen(ls_email_template_path, TextMode!, Read!, Shared!)
        IF li_FileNum = -1 THEN
            MessageBox("Error", "Unable to open " + ls_email_template_path)
        ELSE
            // Read the raw file contents into a blob
            ll_file_bytes = FileReadEx(li_FileNum, lb_ansi)
            FileClose(li_FileNum)
            // FORCE TO UTF8: the file is read as ANSI but is actually UTF-8 encoded
            ls_email_body = String(lb_ansi, EncodingUTF8!)
        END IF
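The "83 c2" bytes fit the double-encoding explanation. The Python sketch below (not PowerScript) reproduces the bug for "é", assuming the machine's ANSI code page is Windows-1252: the UTF-8 bytes are misread as ANSI characters and then encoded to UTF-8 a second time. The second function shows the idea behind the workaround above, which is to keep the raw bytes and decode them as UTF-8 once.

```python
def double_utf8(text: str) -> bytes:
    """Reproduce the bug: UTF-8 bytes misread as Windows-1252, then re-encoded."""
    utf8_bytes = text.encode("utf-8")       # é -> c3 a9
    misread = utf8_bytes.decode("cp1252")   # c3 a9 -> "Ã©" (file treated as ANSI)
    return misread.encode("utf-8")          # "Ã©" -> c3 83 c2 a9


def read_as_utf8(raw: bytes) -> str:
    """The workaround's idea: take the raw bytes and decode them as UTF-8."""
    return raw.decode("utf-8")


doubled = double_utf8("é")
# doubled.hex(" ") == "c3 83 c2 a9", which contains the "83 c2" pair
fixed = read_as_utf8("é".encode("utf-8"))
# fixed == "é"
```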

The long-term solution is to save the template files as UTF-8 with a BOM, which removes the ambiguity when PowerBuilder reads them.
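Converting existing templates could be done once with a small script. The Python sketch below uses hypothetical helper names (`add_utf8_bom`, `ensure_utf8_bom`). It prefixes the UTF-8 BOM (EF BB BF) only when it is missing, and it refuses to tag bytes that are not valid UTF-8, so an ANSI file is not mislabelled.

```python
import codecs
from pathlib import Path


def add_utf8_bom(data: bytes) -> bytes:
    """Return data prefixed with the UTF-8 BOM (EF BB BF) if it lacks one.

    Raises UnicodeDecodeError if data is not valid UTF-8, so an ANSI
    file is never silently marked as UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data
    data.decode("utf-8")  # validation only; raises on non-UTF-8 input
    return codecs.BOM_UTF8 + data


def ensure_utf8_bom(path: Path) -> None:
    """Rewrite a template file in place so that it starts with a UTF-8 BOM."""
    path.write_bytes(add_utf8_bom(path.read_bytes()))


tagged = add_utf8_bom("<p>café</p>".encode("utf-8"))
# tagged[:3] == b"\xef\xbb\xbf"
```

Calling `add_utf8_bom` on data that already has a BOM returns it unchanged, so running the script twice is harmless.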

2) The next issue was that although all of the characters were now encoded consistently, they were still displayed incorrectly in the resulting email. I believe this issue is related to our SMTP pipeline, and as a workaround I am deliberately "double-encoding" the HTML message text as UTF-8:

        // Workaround: double-encode the text as UTF-8
        blob lb_msg
        // Convert the string to its UTF-8 bytes...
        lb_msg = Blob(ls_email_body, EncodingUTF8!)
        // ...then reinterpret those bytes as ANSI characters
        ls_email_body = String(lb_msg, EncodingANSI!)
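Why would double-encoding help? The Python sketch below models one hypothetical pipeline fault that is consistent with the symptom; it is an assumption, not a diagnosis of the actual SMTP service. It assumes the pipeline converts the body to ANSI (Windows-1252) bytes while the message is still declared as UTF-8. Under that assumption, pre-encoding the text the way the PowerScript workaround does cancels out the faulty step.

```python
# Hypothesis only: models ONE possible pipeline fault, not the real service.
ANSI = "cp1252"  # assumed ANSI code page


def faulty_pipeline(body: str) -> bytes:
    """Assumed fault: the body is converted to ANSI bytes, but the message
    is still declared as UTF-8, so the mail client decodes it as UTF-8."""
    return body.encode(ANSI)


def pre_encode_workaround(body: str) -> str:
    """Python analogue of Blob(..., EncodingUTF8!) followed by String(..., EncodingANSI!)."""
    return body.encode("utf-8").decode(ANSI)


body = "Café crème"
broken = faulty_pipeline(body).decode("utf-8", errors="replace")
# broken == "Caf\ufffd cr\ufffdme"
repaired = faulty_pipeline(pre_encode_workaround(body)).decode("utf-8")
# repaired == "Café crème"
```

In this sketch, characters whose UTF-8 bytes include a value that Windows-1252 leaves undefined (for example "Á", encoded as C3 81) make `pre_encode_workaround` raise `UnicodeDecodeError`, which is one more reason to treat the workaround as temporary.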

The fact that this works tells me that the message text is being re-encoded incorrectly somewhere in the pipeline, but that is an issue for another day.

Comment
Chris Pollach @Appeon | Tuesday, 6 February 2024, 22:09 UTC
Hi Phil;

Thank you for sharing the workaround you created!

Yes, I have run into ANSI file challenges in my PB past too, when handling ANSI files from outside sources.

I had to get a bit creative too on some data massaging. ;-)

We look forward to your further insights in this area.

Regards .. Chris
Benjamin Gaesslein | Wednesday, 7 February 2024, 13:30 UTC
FWIW, PowerBuilder's FileOpen and FileReadEx functions can only read a file directly as UTF-8 when it has the UTF-8 BOM set (the first three bytes must be EF BB BF). However, most editors that save UTF-8 files do not write a BOM, because it is not strictly necessary, and some software (looking at you, SQLPlus) cannot handle the BOM at all.

Using FileOpen with EncodingANSI! and then reading the contents into a blob that you later interpret as UTF-8 is also the only way I've found to work around the problem of opening UTF-8 files that have no BOM.
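The approach above boils down to reading raw bytes and deciding the encoding yourself instead of relying on the BOM check. The same decision logic is sketched below in Python with a hypothetical `decode_template` helper. The Windows-1252 fallback is an assumed heuristic: valid UTF-8 is preferred, and anything else is treated as ANSI. It is a heuristic because an ANSI file can, rarely, also happen to be valid UTF-8.

```python
def decode_template(raw: bytes) -> str:
    """Decode file bytes, preferring UTF-8 with or without a BOM.

    "utf-8-sig" strips a leading BOM if present and otherwise behaves like
    plain UTF-8. The Windows-1252 fallback is an assumption for files
    that are not valid UTF-8.
    """
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


with_bom = b"\xef\xbb\xbf" + "café".encode("utf-8")
without_bom = "café".encode("utf-8")
ansi_file = "café".encode("cp1252")
# decode_template(...) returns "café" for all three inputs
```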
Chris Pollach @Appeon | PowerBuilder | Thursday, 1 February 2024, 01:51 UTC

Hi Phil;

Even though Appeon PB is Unicode by design, your PB apps can process data in ANSI, UTF-8, or UTF-16LE encoding. It all depends on which of these encodings your apps specify when calling the related functions.
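To make the three encodings concrete: the same character produces different bytes in each one, so every bytes-to-string step (for example a String(blob, ...) call with EncodingANSI!, EncodingUTF8!, or EncodingUTF16LE!) has to use the encoding the bytes were actually written in. A small Python illustration, assuming ANSI is Windows-1252:

```python
def encoded_forms(text: str) -> dict[str, str]:
    """Hex bytes of text in ANSI (assumed cp1252), UTF-8, and UTF-16LE."""
    return {enc: text.encode(enc).hex(" ") for enc in ("cp1252", "utf-8", "utf-16-le")}


forms = encoded_forms("é")
# forms == {"cp1252": "e9", "utf-8": "c3 a9", "utf-16-le": "e9 00"}
```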

The type of data encoding is also a factor when dealing with your DBMS. So I would check your application's code for any file processing, plus the DML handling of any Blob column data in the DB where the HTML is stored and retrieved. HTH
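The DBMS point can be illustrated with any database that stores HTML as a blob: the encoding is chosen once when the string is written as bytes and must be chosen again, identically, when the bytes are read back. The Python sketch below uses an in-memory SQLite database as a stand-in for the real DBMS; the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS; table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_template (id INTEGER PRIMARY KEY, html BLOB)")

html = "<p>Café crème</p>"
# Write: the application picks an encoding when converting string -> bytes.
conn.execute("INSERT INTO email_template (id, html) VALUES (1, ?)",
             (html.encode("utf-8"),))

# Read: the same choice must be made again when converting bytes -> string.
(raw,) = conn.execute("SELECT html FROM email_template WHERE id = 1").fetchone()
matching = raw.decode("utf-8")     # "<p>Café crème</p>"
mismatched = raw.decode("cp1252")  # "<p>CafÃ© crÃ¨me</p>"
conn.close()
```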

Regards ... Chris 
