Hi Ahmed,
Daryl's speculation that the issue is related to encodings is correct. As you can see in the picture below, the text file you provided opens as UTF-8 (without a BOM), but PowerBuilder requires UTF-8 with BOM in order to open it successfully with FileOpen() and EncodingUTF8!. If you manually re-save the file as UTF-8 with BOM (with Notepad, Notepad++, etc.) and then try to process it, I suspect it will work. You may wish to see whether you can control the encoding that Apache Tika uses when creating the text file.
If you can't control the encoding, there are a couple of options I can think of:
- Once Apache Tika creates the file, open it and add the BOM bytes (EF BB BF) to the start of the file before you try to process it, or
- The following code will allow you to get around the absence of the BOM bytes:
li_file_handle = FileOpen(ls_path_filename, StreamMode!, Read!, LockReadWrite!, Append!, EncodingANSI!)
ll_chars_read = FileReadEx(li_file_handle,lblb_file_contents)
ls_contents_as_utf8 = String(lblb_file_contents, EncodingUTF8!)
The three lines above open the file in stream mode with ANSI encoding, read the contents of the file into a Blob variable, and then convert the blob contents to a string using UTF-8 encoding. You can now process what's contained in ls_contents_as_utf8.
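For completeness, here is a slightly fuller sketch of the same approach with the variable declarations and some basic error checking added. The declarations, the MessageBox() texts and the bare RETURN (which assumes the code lives in an event or function with no return value) are my assumptions, not something taken from your application, so adjust them to suit.

```powerscript
// Sketch only: ls_path_filename is assumed to be declared already and to
// hold the full path of the text file produced by Apache Tika.
integer li_file_handle
long    ll_chars_read
blob    lblb_file_contents
string  ls_contents_as_utf8

// Open as a raw byte stream; Append! is just the write-mode placeholder
// needed to reach the encoding argument.
li_file_handle = FileOpen(ls_path_filename, StreamMode!, Read!, LockReadWrite!, Append!, EncodingANSI!)
IF li_file_handle = -1 THEN
	MessageBox("File error", "Could not open " + ls_path_filename)
	RETURN
END IF

// Read the whole file into the blob, then release the handle.
ll_chars_read = FileReadEx(li_file_handle, lblb_file_contents)
FileClose(li_file_handle)

// FileReadEx returns -1 on error and -100 if end-of-file is hit
// before anything is read (e.g. an empty file).
IF ll_chars_read < 0 THEN
	MessageBox("File error", "Could not read " + ls_path_filename)
	RETURN
END IF

// Interpret the raw bytes as UTF-8 text.
ls_contents_as_utf8 = String(lblb_file_contents, EncodingUTF8!)
```

Closing the handle straight after the read means the file is not left locked if the later processing fails.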
HTH...regards,
Mark