1. John Vanleeuwe
  2. PowerBuilder
  3. Thursday, 14 November 2019 10:49 AM UTC

Hi guys,

I have a file with millions of records. I want to use the ImportFile function for this, but the columns have fixed lengths.

How can I process this file when there is no column delimiter?

TIA

John

Chris Pollach @Appeon
  1. Thursday, 14 November 2019 17:54 UTC
  2. PowerBuilder
  3. # 1

Hi John;

   I would check with your DBA team. For example, in Oracle I got around this by using Oracle's "External Table" feature. This allows you to define a Table object that maps to an external file. Your apps still send Oracle regular DML statements, but Oracle does the file access for you and then returns you a regular ANSI result set. Even for updates as well.

FYI: https://oracle-base.com/articles/9i/external-tables-9i
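
Once the DBA has defined the external table, the PowerBuilder side is just ordinary SQL. A minimal PowerScript sketch - the table names ext_fixed_file and import_target and their columns are purely hypothetical, and the external table itself has to be created by the DBA over your fixed-width file:

// Hypothetical sketch: copy the external table's rows into a real table.
string ls_sql

ls_sql = "INSERT INTO import_target (cust_id, cust_name, amount) " + &
         "SELECT cust_id, cust_name, amount FROM ext_fixed_file"

EXECUTE IMMEDIATE :ls_sql USING SQLCA;

IF SQLCA.SQLCode <> 0 THEN
    MessageBox("Import failed", SQLCA.SQLErrText)
ELSE
    COMMIT USING SQLCA;
END IF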

Food for thought.
Regards ... Chris

David Peace (Powersoft)
  1. Thursday, 14 November 2019 16:35 UTC
  2. PowerBuilder
  3. # 2

Hi

You have not said which DB you are targeting, but I would use the database's own tools for this job. They process the data much quicker.

MS SQL Server has a bulk import tool (bcp / BULK INSERT).

Oracle has a data load tool (SQL*Loader).

SQL Anywhere has an import wizard tool.
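
For example, with MS SQL Server the whole load can be done server-side with a BULK INSERT statement, which you could even trigger from PowerScript. A rough sketch only - the table, data file and format file are made up (the format file is what describes the fixed column widths), the file paths must be visible to the SQL Server machine, and the login needs bulk-load permission:

// Hypothetical sketch: let SQL Server do the heavy lifting.
string ls_sql

ls_sql = "BULK INSERT dbo.import_target " + &
         "FROM 'D:\loads\bigfile.txt' " + &
         "WITH (FORMATFILE = 'D:\loads\bigfile.fmt', TABLOCK)"

EXECUTE IMMEDIATE :ls_sql USING SQLCA;

IF SQLCA.SQLCode <> 0 THEN MessageBox("Bulk load failed", SQLCA.SQLErrText)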

 

Why try to reinvent the wheel?

 

Cheers

David

Michael Kramer
  1. Thursday, 14 November 2019 14:12 UTC
  2. PowerBuilder
  3. # 3

For inspiration

A few years back we migrated from Oracle 8 on a Unix box to "modern" MSSQL. Large tables were 8M - 20M rows. Everything was fixed format. Date formats were complete nonsense to MSSQL. And we knew some code tables could have "unexpected" codes. We had no influence on file formats.

We ended up doing the import for each of the large files via staging tables, because the sheer data volume meant we had to restrict data round trips between the client app and the database.

  1. Staging table #1: INITIAL LOAD
    1. ImportRowID IDENTITY column (PKEY)
    2. LineData VARCHAR(xxx) wide enough to hold each imported line.
  2. Staging table #2: IMPORT DATA
    1. ImportRowID
    2. One varchar column for each imported data field
    3. A status column to help later filtering/statistics
  3. Table #3: IMPORT MESSAGES
    1. NOTE: Potentially multiple messages regarding the same row
    2. Each row holds {msgID, ImportRowID, Severity, Message}
  4. Table #4: FINAL DATA
    1. ImportRowID
    2. Each data column in correct data type
    3. Rows in error get no final row; rows with warnings get a final row, but the user may want to ignore them

Import process

  1. Import full data file into table #1
    Importing into this table is trivial but takes a long time due to the data volume.
  2. Split data into separate columns
    Basically one big INSERT … SELECT … - and therefore FAST! (See the sketch after this list.)
  3. Validate data
    This is the complex part.
    1. Code values need to be checked
    2. Some codes need translation
    3. Dates etc. need format/validity check
    4. etc.
  4. Transform to correct data types
    1. Another big INSERT … SELECT … - and therefore FAST!
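
To illustrate step 2: the split is just string slicing done by the server in one statement. A sketch run from PowerScript - ImportStaging (the raw-line table), the column names and the positions/widths 1-10, 11-40, 41-48 are invented for illustration; only ImportData matches the naming above:

// Hypothetical example of step 2: slice the fixed-width LineData column
// of the raw staging table into the varchar columns of ImportData.
string ls_sql

ls_sql = "INSERT INTO ImportData (ImportRowID, CustCode, CustName, OrderDate) " + &
         "SELECT ImportRowID, " + &
         "       RTRIM(SUBSTRING(LineData,  1, 10)), " + &
         "       RTRIM(SUBSTRING(LineData, 11, 30)), " + &
         "       RTRIM(SUBSTRING(LineData, 41,  8)) " + &
         "FROM ImportStaging"

EXECUTE IMMEDIATE :ls_sql USING SQLCA;

IF SQLCA.SQLCode = 0 THEN
    COMMIT USING SQLCA;
END IF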

The user starts the import - then waits, because importing 8M - 15M rows takes time even if it is "fast". We let the user filter/sort messages and sift through the imported data - deciding whether some rows should be accepted despite warnings. The user has to confirm the import before the app finally copies from "FINAL DATA" into the destination tables.

We did many validations "per column" for performance reasons, like the one below:

INSERT INTO ImportMessage(ImportRowID, Severity, Message) 
SELECT ImportRowID, 2, ('Unknown code ' + CodeXX)
FROM ImportData 
WHERE CodeXX NOT IN (SELECT CodeValue FROM CodeTable WHERE CodeType = 'XX')

This validates millions of rows in a fraction of the time it would take a client app to process the data one row at a time.

HTH /Michael

 

Comment
  1. Miguel Leeuwe
  2. Friday, 15 November 2019 04:46 AM UTC
So basically you have a table with just one column, wide enough to hold a full row? That's great! Once imported, you can use SQL to insert the data into the end table using substrings, etc. Nice one, as long as your data can fit in a single column!
  1. Michael Kramer
  2. Friday, 15 November 2019 11:31 AM UTC
True. We had a fixed length well above 500 chars. Anyway, if the fixed width is very large, you could use varchar(max).

The load-to-table function validated the length of each line. Any unexpected length was flagged and the import aborted, since it could be an attempt to attack our database using some buffer-overrun technique.

The most difficult part was getting info on Code Values we found in data that we were told hadn't been in use for 15+ years. Nobody knew the meaning.
Roland Smith
  1. Thursday, 14 November 2019 11:29 AM UTC
  2. PowerBuilder
  3. # 4

1. Read the file one line at a time.

2. Use the Mid and Trim functions to parse out each value.

3. Combine the values into another string variable like this:

ls_import = ls_value1 + "~t" + ls_value2 + "~t" + ls_value3

4. Import the line:

dw_import.ImportString(ls_import)
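
Putting the four steps together, a minimal sketch - the file name, the 10/20/8 column widths and the DataWindow control dw_import are placeholders, so adjust the Mid() positions to your file layout:

integer li_file, li_rc
long    ll_rows
string  ls_line, ls_import
string  ls_value1, ls_value2, ls_value3

// Open the fixed-width file in line mode (path is a placeholder)
li_file = FileOpen("C:\data\import.txt", LineMode!)
IF li_file = -1 THEN RETURN

li_rc = FileRead(li_file, ls_line)
DO WHILE li_rc >= 0
    IF li_rc > 0 THEN
        // Cut the fixed-width columns out of the line (example widths)
        ls_value1 = Trim(Mid(ls_line,  1, 10))
        ls_value2 = Trim(Mid(ls_line, 11, 20))
        ls_value3 = Trim(Mid(ls_line, 31,  8))

        // Rebuild the line as tab-delimited and import it
        ls_import = ls_value1 + "~t" + ls_value2 + "~t" + ls_value3
        ll_rows += dw_import.ImportString(ls_import)
    END IF
    li_rc = FileRead(li_file, ls_line)
LOOP
FileClose(li_file)

MessageBox("Import", String(ll_rows) + " rows imported")

For millions of rows this loop will be the bottleneck (see Miguel's comment below), so the database-side options in the other answers may scale better.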

 

Comment
  1. Miguel Leeuwe
  2. Friday, 15 November 2019 04:44 AM UTC
I like this idea, but it will probably be slow when manipulating the strings with PB.