1. Andrew Davis
  2. PowerBuilder
  3. Tuesday, 15 September 2020 09:40 AM UTC

Morning

I hope you can help. I am trying to search for (look up) records by a name field in a database, but I am having a problem with special letters.

For example, I want to be able to search for 'Caus' and have both of the names below show up, so that the special character Ç is matched as well. I have the same issue with ó. How do I treat them the same as the plain C and o?

Çaushi 

Caushi

 

Mónica

Monica

regards

Andrew

 

John Fauss
  1. Friday, 16 October 2020 19:59 PM UTC
  2. PowerBuilder
  3. # 1

Hi, Andrew -

Well, I had some free time recently while getting acquainted with and enjoying my newborn grandson, so I created a non-visual object (NVO) that implements the ASCII Folding Filter functionality. It was developed using PB 2017. A Zip file containing the pbl, workspace and target is attached.

The app is a very simple, single-window application that illustrates how the of_FoldToASCII function in the NVO can be used.

Please Note: My conversion of the Lucene.Net class definition is not designed or intended for mass filtering of large amounts of Unicode text.

It performs the same translation coded in the Apache/Lucene.Net class. You supply an input string, a starting position and the length (number of characters) to be examined; the arguments are similar to those of the PowerScript Mid function. The translated input string is returned. There are two alternative versions of the of_FoldToASCII function (overloads) with fewer arguments.

I hope you find the NVO and the sample application helpful. Enjoy!

John

Attachments (1)
Comment
  1. Benjamin Gaesslein
  2. Tuesday, 20 October 2020 07:41 AM UTC
Nice!
  1. Andrew Davis
  2. Wednesday, 21 October 2020 07:40 AM UTC
John

Thank you for this, I look forward to trying the code out.

I hope you are continuing to enjoy your grandson's company. Let's hope the world gets back to normal soon; we are unable to visit each other's homes here at the moment.

regards and thanks again

Andrew
John Fauss
  1. Monday, 28 September 2020 18:44 PM UTC
  2. PowerBuilder
  3. # 2

Greetings, Andrew -

In order to accomplish this, I believe you need two things:

1. A means of replacing characters that carry diacritics in Unicode strings with their plain ASCII-127 "equivalent" characters. Perhaps something akin to the Java ASCIIFoldingFilter class in the org.apache.lucene.analysis package.

2. If the strings stored in the DBMS contain diacritics and you want to search them, then you would likely need or want a copy of that data in the DBMS that already has the diacritics replaced, for performance reasons.
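
As a rough illustration of both points (this is neither PowerScript nor the Lucene algorithm), here is a minimal Python sketch. It assumes that Unicode NFKD decomposition is an acceptable stand-in for a full folding filter, and it uses an in-memory SQLite table in place of the real DBMS. The table `person`, the column `name_folded`, and the helpers `fold`, `build_demo_db` and `search` are hypothetical names made up for this example.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Fold text for searching: decompose, drop combining marks, casefold."""
    # NFKD splits e.g. "Ç" into "C" plus a combining cedilla.
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks have a non-zero combining class; drop them.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() makes the comparison case-insensitive (and maps "ß" to "ss").
    return stripped.casefold()


def build_demo_db() -> sqlite3.Connection:
    """Create a demo table holding each name plus a pre-folded copy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, name_folded TEXT)")
    names = ["Çaushi", "Caushi", "Mónica", "Monica", "Smith"]
    conn.executemany(
        "INSERT INTO person (name, name_folded) VALUES (?, ?)",
        [(n, fold(n)) for n in names],
    )
    return conn


def search(conn: sqlite3.Connection, term: str) -> list:
    """Prefix search against the folded column, returning the original names."""
    pattern = fold(term) + "%"
    rows = conn.execute(
        "SELECT name FROM person WHERE name_folded LIKE ? ORDER BY rowid",
        (pattern,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(search(db, "Caus"))
    print(search(db, "Mon"))
```

The folded column is filled once, when a row is written, so the search itself stays an ordinary LIKE query. Decomposition alone does not cover letters such as ø or ł, which have no decomposed form, so a complete solution would still need an explicit mapping table like the one in the Lucene filter.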

You probably would only want to perform this kind of search when the user selects a special "Advanced Search" option.

It would be terrific if someone had a PB-compatible version of the ASCIIFoldingFilter function and was able to share it, but I'm not aware of anyone publishing such a thing. If you can find/use source code that does this or similar, it might be possible to port the code to PowerScript or to C++ and make it available to PowerScript via PBNI.

Good luck!

Regards, John

Comment
  1. Benjamin Gaesslein
  2. Tuesday, 29 September 2020 09:10 AM UTC
There's a C# version here, maybe this can be used somehow.

https://lucenenet.apache.org/docs/3.0.3/dd/d7c/_token_filter_8cs_source.html
  1. John Fauss
  2. Monday, 5 October 2020 15:12 PM UTC
You need the ASCIIFoldingFilter class, not the TokenFilter class.

https://lucenenet.apache.org/docs/3.0.3/d8/d86/_a_s_c_i_i_folding_filter_8cs_source.html

This is the code that "folds", or translates, the Unicode characters into their closest ASCII-127 counterparts. It is a considerably larger challenge to translate into PowerScript. Good luck!
Benjamin Gaesslein
  1. Monday, 28 September 2020 13:43 PM UTC
  2. PowerBuilder
  3. # 3

I don't think there's any existing PB functionality that does this; you'd have to program it yourself. Step one would probably be the creation of a lookup table containing every possible variant of all letters and their corresponding "standard" ASCII character. Then you can use this table to convert both the search string and the database results to pure ASCII strings and compare the results.
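
To make the lookup-table idea concrete, here is a small sketch in Python rather than PowerScript. The table covers only a handful of letters chosen for illustration; a real table would need every variant you expect to meet. The names `FOLD_TABLE`, `to_ascii` and `matches` are hypothetical, and the prefix comparison is just one possible way to "compare the results".

```python
# Illustrative lookup table: accented letter -> plain ASCII replacement.
# Deliberately incomplete; extend it with every variant the data can contain.
FOLD_TABLE = str.maketrans({
    "Ç": "C", "ç": "c",
    "Ó": "O", "ó": "o",
    "Ö": "O", "ö": "o",
    "É": "E", "é": "e",
    "ß": "ss",  # one character may map to several
})


def to_ascii(text: str) -> str:
    """Replace every character found in the table; leave the rest untouched."""
    return text.translate(FOLD_TABLE)


def matches(search_term: str, candidate: str) -> bool:
    """Convert both sides with the same table, then do a case-insensitive prefix test."""
    return to_ascii(candidate).lower().startswith(to_ascii(search_term).lower())


if __name__ == "__main__":
    names = ["Çaushi", "Caushi", "Mónica", "Monica"]
    print([n for n in names if matches("Caus", n)])
```

Converting the search string with the same table matters as much as converting the stored names, so that a user who types 'Çaus' and one who types 'Caus' get the same hits.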

Phil Wallingford
  1. Monday, 21 September 2020 16:28 PM UTC
  2. PowerBuilder
  3. # 4

What method/tool are you using to search?

René Ullrich
  1. Tuesday, 15 September 2020 14:40 PM UTC
  2. PowerBuilder
  3. # 5

Hi Andrew,

Yes, I know about this and other issues like ß = ss.

I think it's a feature :-)

 

Comment
  1. Andrew Davis
  2. Tuesday, 15 September 2020 14:56 PM UTC
So do you know a way of searching with normal characters that shows both the normal C and the C with the squiggle (Ç)?