1. John Vanleeuwe
  2. PowerBuilder
  3. Thursday, 14 November 2019 10:49 AM UTC

Hi guys,

 

I have a file with millions of records. I want to use the ImportFile function for this, but the columns have fixed lengths.

 

How can I process this file when there is no column delimiter?

 

TIA

John

 

 

Roland Smith
  1. Thursday, 14 November 2019 11:29 AM UTC
  2. PowerBuilder
  3. # 1

1. Read the file one line at a time.

2. Use the Mid and Trim functions to parse out each value.

3. Combine the values into another string variable like this:

ls_import = ls_value1 + "~t" + ls_value2 + "~t" + ls_value3

4. Import the line:

dw_import.ImportString(ls_import)
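
A minimal PowerScript sketch of that loop, assuming a hypothetical layout of three columns at positions 1-10, 11-18 and 19-30 and an example file path (dw_import is the DataWindow from step 4; adjust positions and widths to your real file):

// Hypothetical fixed-width layout: value1 = pos 1-10, value2 = pos 11-18, value3 = pos 19-30
integer li_file
long    ll_rows
string  ls_line, ls_import, ls_value1, ls_value2, ls_value3

li_file = FileOpen("c:\data\import.txt", LineMode!, Read!)
IF li_file = -1 THEN RETURN

DO WHILE FileRead(li_file, ls_line) >= 0
    // Cut each fixed-width column out of the line and strip the padding
    ls_value1 = Trim(Mid(ls_line, 1, 10))
    ls_value2 = Trim(Mid(ls_line, 11, 8))
    ls_value3 = Trim(Mid(ls_line, 19, 12))

    // Rebuild the row as a tab-delimited string and feed it to the DataWindow
    ls_import = ls_value1 + "~t" + ls_value2 + "~t" + ls_value3
    ll_rows = ll_rows + dw_import.ImportString(ls_import)
LOOP

FileClose(li_file)

Calling ImportString() once per line will be slow over millions of rows; batching a few thousand lines into one "~r~n"-separated string per ImportString() call, and wrapping the loop in dw_import.SetRedraw(FALSE) / SetRedraw(TRUE), cuts the overhead considerably.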

 

Comment
  1. Miguel Leeuwe
  2. Friday, 15 November 2019 04:44 AM UTC
I like this idea, but it will probably be slow when manipulating the strings with PB.
Michael Kramer
  1. Thursday, 14 November 2019 2:12 PM UTC
  2. PowerBuilder
  3. # 2

For inspiration

A few years back we migrated from Oracle 8 on a Unix box to "modern" MSSQL. Large tables were 8M - 20M rows. Everything was fixed format. Date formats were complete nonsense to MSSQL. And we knew some code tables could have "unexpected" codes. We had no influence on file formats.

We ended up doing the import for each of the large files via staging tables, because the sheer data volume meant we had to limit data round trips between the client app and the database.

  1. Staging table #1: INITIAL LOAD
    1. ImportRowID IDENTITY column (PKEY)
    2. LineData VARCHAR(xxx) wide enough to hold each imported line.
  2. Staging table #2: IMPORT DATA
    1. ImportRowID
    2. One varchar column for each imported data field
    3. A status column to help later filtering/statistics
  3. Table #3: IMPORT MESSAGES
    1. NOTE: Potentially multiple messages regarding the same row
    2. Each row holds {msgID, ImportRowID, Severity, Message}
  4. Table #4: FINAL DATA
    1. ImportRowID
    2. Each data column in correct data type
    3. Rows in error get no final row; rows with warnings get a final row, but the user may want to exclude them
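
Purely for illustration, staging tables #1 and #2 could look like this (SQL Server syntax issued here from PowerScript dynamic SQL; every table name, column name and width is an assumption, and in practice the DBA would create them once):

// One-time setup sketch for staging table #1 (raw lines) and #2 (split columns)
// Note: DDL usually requires SQLCA.AutoCommit = TRUE
string ls_sql

ls_sql = "CREATE TABLE ImportRaw ( " + &
         "ImportRowID int IDENTITY(1,1) PRIMARY KEY, " + &
         "LineData varchar(600) NOT NULL )"
EXECUTE IMMEDIATE :ls_sql USING SQLCA;

ls_sql = "CREATE TABLE ImportData ( " + &
         "ImportRowID int PRIMARY KEY, " + &
         "CustomerNo varchar(10), " + &
         "OrderDate varchar(8), " + &
         "Amount varchar(12), " + &
         "RowStatus int NOT NULL DEFAULT 0 )"
EXECUTE IMMEDIATE :ls_sql USING SQLCA;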

Import process

  1. Import full data file into table #1
    Import to this table is trivial but takes a long time due to the data volume.
  2. Split data into separate columns
    Basically one big INSERT … SELECT … - and therefore FAST! (See the sketch just after this list.)
  3. Validate data
    This is the complex part.
    1. Code values need to be checked
    2. Some codes need translation
    3. Dates etc. need format/validity check
    4. etc.
  4. Transform to correct data types
    1. Another big INSERT … SELECT … - and therefore FAST!
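
A hedged sketch of step 2, the big INSERT … SELECT that cuts each raw line into columns with SUBSTRING (column names and positions are assumptions matching the staging-table sketch above):

// Step 2 sketch: split the raw lines into per-column staging rows in one set-based statement
string ls_sql

ls_sql = "INSERT INTO ImportData (ImportRowID, CustomerNo, OrderDate, Amount) " + &
         "SELECT ImportRowID, " + &
         "RTRIM(SUBSTRING(LineData, 1, 10)), " + &
         "RTRIM(SUBSTRING(LineData, 11, 8)), " + &
         "RTRIM(SUBSTRING(LineData, 19, 12)) " + &
         "FROM ImportRaw"
EXECUTE IMMEDIATE :ls_sql USING SQLCA;
IF SQLCA.SQLCode <> 0 THEN
    // inspect SQLCA.SQLErrText and abort the import
END IF
COMMIT USING SQLCA;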

User starts the import - then waits, because importing 8M - 15M rows takes time even if it is "fast". We let the user filter/sort messages and sift through the imported data, deciding whether some rows should be accepted despite warnings. The user has to confirm the import before the app finally copies from "FINAL DATA" into the destination tables.

We did many validations "per column" for performance reasons, like the one below:

INSERT INTO ImportMessage(ImportRowID, Severity, Message) 
SELECT ImportRowID, 2, ('Unknown code ' + CodeXX)
FROM ImportData 
WHERE CodeXX NOT IN (SELECT CodeValue FROM CodeTable WHERE CodeType = 'XX')

This validates millions of rows in a fraction of the time it takes a client app to process the data one row at a time.
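
The transform step (#4) follows the same set-based pattern. A sketch, assuming SQL Server 2012+ for TRY_CONVERT, yyyymmdd dates, and that Severity >= 2 marks a rejecting error (all assumptions):

// Step 4 sketch: cast to real data types, leaving out rows that were flagged with errors
string ls_sql

ls_sql = "INSERT INTO FinalData (ImportRowID, CustomerNo, OrderDate, Amount) " + &
         "SELECT d.ImportRowID, d.CustomerNo, " + &
         "TRY_CONVERT(date, d.OrderDate, 112), " + &
         "TRY_CONVERT(decimal(12,2), d.Amount) " + &
         "FROM ImportData d " + &
         "WHERE NOT EXISTS (SELECT 1 FROM ImportMessage m " + &
         "WHERE m.ImportRowID = d.ImportRowID AND m.Severity >= 2)"
EXECUTE IMMEDIATE :ls_sql USING SQLCA;
COMMIT USING SQLCA;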

HTH /Michael

 

Comment
  1. Miguel Leeuwe
  2. Friday, 15 November 2019 04:46 AM UTC
So basically you have a table with just one column, wide enough to hold a full row? That's great! Once imported, you can use SQL to insert the data in the end-table using substrings, etc. Nice one as long as your data can fit in a single column!
  1. Michael Kramer
  2. Friday, 15 November 2019 11:31 AM UTC
True. We had fixed lengths well above 500 chars. Anyway, if the fixed width is very large, you could use varchar(max).

The load-to-table function validated the length of each line. Any unexpected length was flagged and the import aborted, since it could be an attempt to attack our database using some buffer-overrun technique.

The most difficult part was getting information on code values we found in the data that, we were told, hadn't been in use for 15+ years. Nobody knew their meaning.
David Peace (Powersoft)
  1. Thursday, 14 November 2019 4:35 PM UTC
  2. PowerBuilder
  3. # 3

Hi

You have not said which DB you are targeting, but I would use the database's own tools for this job; they process much quicker.

MS SQL has a bulk import tool (bcp / BULK INSERT).

Oracle has a data load tool (SQL*Loader).

SQL Anywhere has an import wizard tool (and the LOAD TABLE statement).
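
To give one concrete, hypothetical example: with MS SQL the server can pull the whole fixed-width file into a one-column staging table in a single BULK INSERT, driven from PowerScript if you like. The table RawLines(LineData varchar(600)), the path, and the assumption that '|' never occurs in the data are all illustrative:

// Sketch: let SQL Server read the file itself instead of pushing rows from the client.
// The path must be visible to the SQL Server service, and bulk-load permission is required.
string ls_sql

ls_sql = "BULK INSERT RawLines " + &
         "FROM 'D:\feeds\import.txt' " + &
         "WITH (ROWTERMINATOR = '\n', FIELDTERMINATOR = '|', TABLOCK)"
EXECUTE IMMEDIATE :ls_sql USING SQLCA;

If the staging table has extra columns (an IDENTITY key, for example), BULK INSERT needs a format file to skip them; bcp format files and Oracle's SQL*Loader control files handle fixed-width layouts directly.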

 

Why try and reinvent the wheel?

 

Cheers

David

Chris Pollach @Appeon
  1. Thursday, 14 November 2019 5:54 PM UTC
  2. PowerBuilder
  3. # 4

Hi John;

   I would check with your DBA team. For example, in Oracle I bypassed this by using Oracle's "External Table" feature. This allows you to define a table object that maps to an external file. Your apps still send Oracle regular DML statements, but Oracle does the file access for you and then returns a regular ANSI result set, even for updates.

FYI: https://oracle-base.com/articles/9i/external-tables-9i
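
Once the DBA has created such an external table (the fixed-width column positions live in its ACCESS PARAMETERS clause), the PowerBuilder side is plain SQL. A sketch against hypothetical ext_import and customer_load tables:

// Sketch: to the client, the external table behaves like any other table
long ll_count
SELECT COUNT(*) INTO :ll_count FROM ext_import USING SQLCA;

// Or move the data into a real table entirely server-side, much like the staging approach above
string ls_sql
ls_sql = "INSERT INTO customer_load SELECT * FROM ext_import"
EXECUTE IMMEDIATE :ls_sql USING SQLCA;
COMMIT USING SQLCA;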

Food for thought.
Regards ... Chris
