1. John Vanleeuwe
  2. PowerBuilder
  3. Friday, 16 October 2020 17:36 PM UTC

Hi all,

We receive a string (copied and pasted) into an MLE.

From this MLE string we need to filter out all characters with ASCII values higher than 126.

We have a solution, something like this:

string ls_a

ls_a = mle_1.text

for i = 1 to Len(ls_a)
     character = Mid(ls_a, i, 1)
     If Asc(character) > 0 or ASC(character) > 126 Then
     Else
          ls_a = Replace(ls_a, i, 1, " " )
     End If
     yield()
next

So far so good, but try this with 5,000,000 characters :)

It's taking hours :)  How can I speed up this process? Is there any way I can use a find function, anything? Other possible solutions? And no, I am not in charge of the data being copied into our app... I just need to filter it out.

TIA

John

 

 

Accepted Answer
John Fauss
  1. Tuesday, 20 October 2020 04:20 AM UTC
  2. PowerBuilder
  3. # Permalink

Greetings, John -

I'm also a little late to the dance, but I love a technical challenge, so I wanted to give this a shot.

You don't always have to resort to C++ or C# (.Net) DLL's to obtain good performance...

I've created a non-visual object named n_ascii containing a function that examines the contents of a blob that contains a Unicode string and replaces any character in the blob that is outside of the 7-bit ASCII code set (values 0 to 127) with a replacement character (such as a blank). It's all PB code with a few calls to the RtlMoveMemory WinAPI function, and it's pretty speedy. On my 8-year-old PC (no speed demon, to be sure), it examines a blob containing a string with roughly 4.5 million characters (approx. 9 million bytes) and replaces 40,000 characters interspersed throughout the string in a little over 2 seconds. Please be aware your mileage may vary.

The technique is pretty similar to what Benjamin describes in his post, but I copy the blob contents in 1,000,000-character segments (chunks) into an UnsignedInteger array so that I can examine one value per character instead of two separate bytes, and then take action on any array element with a value > 127.
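
For readers who want to see the chunked idea in code, here is a minimal sketch in C. It is not the n_ascii PowerScript from the attachment. It assumes the blob holds UTF-16LE text and that the host is little-endian (as on Windows x86/x64), so two bytes copied into a uint16_t form one character code. The function name scrub_in_chunks and its parameters are made up for illustration, and memcpy stands in for the RtlMoveMemory calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration: scan a UTF-16LE byte buffer in chunks of
   `chunk_units` 16-bit values and overwrite every value above 127 with
   `replacement`. Returns the number of values replaced (0 if the work
   array cannot be allocated). Assumes a little-endian host. */
size_t scrub_in_chunks(unsigned char *blob, size_t byte_len,
                       size_t chunk_units, uint16_t replacement)
{
    size_t total_units = byte_len / 2;   /* two bytes per 16-bit value */
    size_t replaced = 0;

    if (chunk_units == 0)
        return 0;

    uint16_t *chunk = malloc(chunk_units * sizeof *chunk);
    if (chunk == NULL)
        return 0;

    for (size_t start = 0; start < total_units; start += chunk_units) {
        size_t n = total_units - start;
        if (n > chunk_units)
            n = chunk_units;

        /* Copy one segment of the blob into the 16-bit work array. */
        memcpy(chunk, blob + start * 2, n * 2);

        size_t changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (chunk[i] > 127) {        /* one comparison per character */
                chunk[i] = replacement;
                changed++;
            }
        }

        /* Write the segment back only if something was replaced. */
        if (changed > 0)
            memcpy(blob + start * 2, chunk, n * 2);

        replaced += changed;
    }

    free(chunk);
    return replaced;
}
```

Comparing one 16-bit value per character, rather than a low byte and a high byte separately, halves the number of tests in the inner loop, which is the point of the UnsignedInteger array described above.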

I've included the n_ascii NVO in a small app that is based on Roland Smith's stringclass free code sample attached to this post. The app uses Roland's n_stringclass NVO to construct the string, but the n_ascii object does NOT require any other objects/classes. The sample app was developed in PB 2017 R2.

Cheers, John

Attachments (1)
Comment
  1. Miguel Leeuwe
  2. Monday, 26 October 2020 17:19 PM UTC
Awesome John!
Benjamin Gaesslein
  1. Monday, 19 October 2020 07:35 AM UTC
  2. PowerBuilder
  3. # 1

PowerBuilder treats strings as UTF-16LE, so you can convert the string into a byte array and loop through that in steps of two, which is much quicker than converting each individual character to its ASCII value. This works reasonably well for strings with a few million characters:

 

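ulong lul_len
ulong lul_index
string ls_output
blob lblb_string
byte lby_bytearray[]

// Convert the UTF-16LE string to a byte array: two bytes per code unit, low byte first
lblb_string = blob(as_input)
lby_bytearray = GetByteArray(lblb_string)
lul_len = upperbound(lby_bytearray)

// Walk the array two bytes at a time
for lul_index = 1 to lul_len step 2
	// A low byte above 126 or a non-zero high byte is outside the wanted range
	if lby_bytearray[lul_index] > 126 or lby_bytearray[lul_index + 1] > 0 then
		// Overwrite with a space (low byte 32, high byte 0)
		lby_bytearray[lul_index] = 32
		lby_bytearray[lul_index + 1] = 0
	end if
next

// Convert the byte array back to a string
lblb_string = blob(lby_bytearray)
ls_output = String( lblb_string )

return ls_output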

 

It crashed at 26 million characters, though. For extremely large strings you could split the input into multiple strings and process the parts sequentially, or look into multithreading.

Comment
  1. John Vanleeuwe
  2. Monday, 26 October 2020 11:20 AM UTC
Thank you.
Miguel Leeuwe
  1. Monday, 19 October 2020 04:35 AM UTC
  2. PowerBuilder
  3. # 2

Hi John,

As a follow up on my previous comment "Instead of checking on every character of the pasted text, loop through all characters above asc(126). I'll post a new answer ..." here you go (see attached):

Probably too late, since you needed it last Friday, but I've made a DLL that works (and it's pretty fast too).
I tried to build an MSI installer for it also, but I think it doesn't work (yet), so for now you'd have to register it using regasm.exe.

I've zipped everything up, including a small app (PB 12.6) with a main window and a button where I make 2 calls to the replace function.

To register the DLL you have to run this command from an administrator command prompt:

"%WINDIR%\Microsoft.NET\Framework\v4.0.30319\regasm.exe" "C:\fullpathToTheDLL\CSReplace.dll" /codebase

Microsoft .NET Framework Assembly Registration Utility version 4.8.4084.0
for Microsoft .NET Framework version 4.8.4084.0
Copyright (C) Microsoft Corporation. All rights reserved.

Types registered successfully

(might be different paths, depending on where your regasm.exe is and where you copy the CSReplace.dll).

If you're interested, tomorrow I'll have a look if I can get the MSI installer working so you don't have to run regasm on your end users' computers.

In my test app I've used a file called "dirs.txt" which is not included.

regards

Attachments (1)
Comment
  1. Miguel Leeuwe
  2. Monday, 19 October 2020 04:38 AM UTC
BTW: in my example app, I'm also replacing ascii code 126, which is not correct, as you want to start replacing from 127.
  1. Miguel Leeuwe
  2. Monday, 19 October 2020 10:09 AM UTC
Another BTW: if you call my .NET function and the replacement string itself falls within the range of characters being replaced, you'll end up in an endless loop, so beware of that.

Your example of ranges and replacement character should work though.
  1. John Vanleeuwe
  2. Monday, 26 October 2020 11:19 AM UTC
Thank you very much Miguel !
Olan Knight
  1. Sunday, 18 October 2020 15:08 PM UTC
  2. PowerBuilder
  3. # 3

John -

 DO NOT attempt this task in PowerBuilder. Use Java. Create a Java Class:
       https://stackoverflow.com/questions/39800324/java-replace-ascii-char

Use the RUN command to execute the Java class.

Done.


Olan

John Vanleeuwe
  1. Sunday, 18 October 2020 12:41 PM UTC
  2. PowerBuilder
  3. # 4

Hi guys,


I am at my wits' end :(

Can anyone help me with changing the string functions into blob functions, please?

I need to come up with a solution before tomorrow!

https://onlineasciitools.com/validate-ascii   This site does almost what I want: they validate a copied string of 2 MB for non-ASCII characters in less than 2 seconds.

This would be step 1: check if there are any.

The second step would be replacing them with an ASCII character, e.g. a space.
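
Step 1 on its own is a single pass that can stop at the first hit. A minimal sketch in C, assuming the text is available as a buffer of 16-bit UTF-16 code units; the function name first_above and the max_code parameter are hypothetical, chosen so the same routine can use either 126 or 127 as the cutoff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the zero-based index of the first code unit
   greater than `max_code`, or -1 if no such code unit exists. */
ptrdiff_t first_above(const uint16_t *text, size_t length, uint16_t max_code)
{
    for (size_t i = 0; i < length; i++) {
        if (text[i] > max_code)
            return (ptrdiff_t)i;   /* stop at the first offending character */
    }
    return -1;                     /* everything is within range */
}
```

If this returns -1, the replacement pass (step 2) can be skipped entirely, which is the cheap case when the pasted text is already clean.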

 

 

 

 

tia

John

 

 

 

Comment
  1. mike S
  2. Sunday, 18 October 2020 14:14 PM UTC
maybe start with roland's string class:

https://topwizprogramming.com/freecode_stringclass.html

and add your functionality to that
Andrew Barnes
  1. Friday, 16 October 2020 23:07 PM UTC
  2. PowerBuilder
  3. # 5

Do you have anybody on your team that is comfortable with C/C++?  That would be a pretty easy function to code in C and the string processing would verily blaze away compared to PowerScript. You then call the DLL function using an external function declaration.
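
A minimal sketch of what such a C function could look like. It assumes the PowerBuilder string arrives as a buffer of 16-bit UTF-16 code units (PowerBuilder strings are Unicode, as noted elsewhere in this thread). The name replace_above and its parameters are hypothetical, and the Windows-specific export and calling-convention decorations a real DLL needs are left out so the core loop stays portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replaces, in place, every UTF-16 code unit greater
   than `max_code` with `replacement`. Returns the number of replacements. */
size_t replace_above(uint16_t *text, size_t length,
                     uint16_t max_code, uint16_t replacement)
{
    size_t replaced = 0;

    for (size_t i = 0; i < length; i++) {
        if (text[i] > max_code) {
            text[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}
```

Because the buffer is modified in place and its length never changes, nothing has to be allocated or copied. On the PowerBuilder side this would be paired with an external function declaration that passes the string by reference; the exact declaration depends on how the DLL is built, so it is not shown here.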

Comment
  1. John Vanleeuwe
  2. Saturday, 17 October 2020 07:18 AM UTC
Hi Andrew,



not really , anyone on here can make me a pricing offer for this :)





Grts

John
Chris Pollach @Appeon
  1. Friday, 16 October 2020 17:54 PM UTC
  2. PowerBuilder
  3. # 6

Hi John;

  1. The command "Asc(character) > 0" does not make sense if your App is only looking for > 126 values.
  2. The POS () command would be faster than checking every character in a Loop.
  3. Try to convert the String datum into a Blob data type and then use the BlobXxxxx() functions to process the replacements.

HTH

Regards ... Chris

Comment
  1. John Vanleeuwe
  2. Friday, 16 October 2020 18:07 PM UTC
Thanks Chris as usual :)



1. True.

2. Pos can only work if I know what I am looking for, right? I want to keep only Asc values between 0 and 126. I think that looping through all possible Asc values (higher than 126) will be slower, no? What is the max Asc value possible?

3. will have a look in this



Thanks.

John
  1. Chris Pollach @Appeon
  2. Friday, 16 October 2020 18:26 PM UTC
Hi John ... then it should be a little faster if you use only one ASC() command ...

Int li_asc

li_asc = Asc (character)

If li_asc > 126 THEN

But still, looping through K's of characters like this would be slow.
  1. Miguel Leeuwe
  2. Sunday, 18 October 2020 23:13 PM UTC
Instead of checking on every character of the pasted text, loop through all characters above asc(126). I'll post a new answer ...
mike S
  1. Friday, 16 October 2020 17:52 PM UTC
  2. PowerBuilder
  3. # 7

use blobs instead of string if you want 100% PB.

 

if using 2019rx, you can use a .net assembly to do the work

 

or do it in a c dll.

Comment
  1. John Vanleeuwe
  2. Friday, 16 October 2020 18:07 PM UTC
Thanks Mike, still on 2017 R3 here. I'll keep it in mind when we do the migration.



Grts

John
