1. John Bailey
  2. PowerBuilder
  3. Friday, 10 April 2020 14:04 PM UTC

A daily CSV file is imported into the Oracle database via a DataWindow file import
and then saved to the database via a DW Update (PB 2017 R2). Simple, but not fast.
The file has about 105K rows, and in the office the import takes around 15 minutes,
which the users have accepted and plan their day around.

Working from home, which we are all doing now, the import takes 90 minutes, obviously
due to the now-extended network path, and the users are not happy.

So I am looking for a solution without having to completely rewrite the process
(not that there is much to the process).

I do not have permission to write stored procs in the database (oh my, too dangerous),
and I do not have an Oracle bulk-load (BCP-style) tool nor access to Oracle APEX, so I am
looking for what I can do in PB using DWs.

The table being loaded has a 4-column primary key (don't get me started), and the table
is truncated by an in-line delete before saving the DW, but the commit is not made
until after the DW update.

I am pretty sure the PB DW update performs verification overhead before sending data
to the database, looking for issues such as "row already changed", etc. All rows in
the DW are new, but I am not sure whether inserts trigger these checks too.

If so, will committing the table truncation before the update speed up the process? 
Then of course, I would lose the ability to fall back upon failure. Worth the risk if it works.

Also, I have never had a "duplicate key" error from the incoming data, so for this DW,
if I add a computed dummy key, tell the DW that it is the primary key, and mark it as not
updateable (so it is not saved to the database) in the update properties, would that
eliminate the extra overhead in PB?

Thoughts, Ideas, Suggestions?

Miguel Leeuwe
  1. Monday, 13 April 2020 12:11 PM UTC
  2. PowerBuilder
  3. # 1

Just wondering: could an async db connection be helpful in this case?

In addition to all the great advice I'm seeing here already: the way transaction logs and tempdb space are configured might also be improvable, which could speed things up. (I'm not a DBA, but I know it matters.)
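For illustration only, a minimal sketch of what enabling an asynchronous connection might look like, assuming the Oracle interface in use supports the Async DBParm (check the Connection Reference for your specific driver; the DBMS value below is just an example):

// Hedged sketch: Async=1 lets the client continue working while a database
// operation executes; the default is synchronous (Async=0).
SQLCA.DBMS   = "ORA"        // example Oracle interface name; use your actual one
SQLCA.DBParm = "Async=1"    // must be set before CONNECT
CONNECT USING SQLCA;
IF SQLCA.SQLCode <> 0 THEN
	MessageBox("Connect failed", SQLCA.SQLErrText)
END IF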

Chris Pollach @Appeon
  1. Monday, 13 April 2020 00:30 AM UTC
  2. PowerBuilder
  3. # 2

Hi John;

  I did a big project about a decade ago that imported many CSV files from the various Telco companies that supply telecommunications services to the Canadian government. Many of these CSV files were the same size as yours or larger. Importing data volumes like yours was a long, tedious process, and it degraded further as more Telco companies came on board.

  The target of this system was to import all the Telco data ASAP and then reconcile the Telcos' billing information for every phone call against what the Canadian government's records were tracking.

  To solve this intensive problem, I designed the system around Oracle's External Table feature (ETF). The ETF feature allows you to attach a CSV file directly to an Oracle DBMS as a table definition (a minimal DDL sketch follows below). The PB applications were changed to:

  • FTP the CSV files to the Oracle server.
  • Execute DDL commands to create & modify the Oracle ETF table(s) to match the FTP'ed CSV files.
  • Execute DML commands to reconcile the CSV data via standard SQL on the ETF tables.
  • Produce reconciliation reports & work queues of phone records to be reviewed by the Telco companies.
  • Receive updated billing-adjustment CSV files to process.

The above design reduced the processing of these CSV files from about a week to hours.
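For reference, a rough sketch of the kind of external-table DDL involved, issued here from PowerScript via EXECUTE IMMEDIATE. The directory object, table, file, and column names are placeholders, and creating external tables requires privileges (and a server-side DIRECTORY object) that a DBA would normally have to grant:

// Hedged sketch: csv_dir is a hypothetical Oracle DIRECTORY object pointing at the
// folder on the database server that holds the uploaded CSV file.
string ls_ddl
ls_ddl = "CREATE TABLE stg_daily_csv_ext ( col1 VARCHAR2(50), col2 VARCHAR2(50), col3 NUMBER ) " + &
         "ORGANIZATION EXTERNAL ( TYPE ORACLE_LOADER DEFAULT DIRECTORY csv_dir " + &
         "  ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE FIELDS TERMINATED BY ',' ) " + &
         "  LOCATION ('daily_load.csv') )"
EXECUTE IMMEDIATE :ls_ddl USING SQLCA;
IF SQLCA.SQLCode <> 0 THEN
	MessageBox("DDL failed", SQLCA.SQLErrText)
END IF

Once the external table exists, the CSV can be queried or copied with ordinary SQL (for example, an INSERT ... SELECT into the real table), which is what the reconciliation DML above relies on.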

FYI: External Tables

HTH ... Food for thought.  ;-)

Regards... Chris

Comment
  1. Roland Smith
  2. Monday, 13 April 2020 11:17 AM UTC
And if the data needed to go into a real table, you could do an INSERT ... SELECT.
Yiannis Papadomichelakis
  1. Sunday, 12 April 2020 12:45 PM UTC
  2. PowerBuilder
  3. # 3

Hi John

I don't think the overhead is related to DW verification; you can check this by enabling database tracing. I think your problem is related to network latency. If thousands of inserts have to be executed and, as we know, the DW sends them one by one, the delay will be huge, because for each insert PB waits for its response (success or error code) before sending the next one.

Since you are not allowed to use Oracle stored procedures / packages, my advice would be to build anonymous PL/SQL blocks that each insert a few hundred rows, using just one call.

An anonymous PL/SQL block can look like this:

Begin
  INSERT INTO TBL_OWNER.MY_TABLE(V1, V2, V3)
  VALUES('1', '2', '3');

  INSERT INTO TBL_OWNER.MY_TABLE(V1, V2, V3)
  VALUES('1', '2', '3');

  INSERT INTO TBL_OWNER.MY_TABLE(V1, V2, V3)
  VALUES('1', '2', '3');
End;

 

The obvious disadvantage is that you have to build this string manually. Also keep in mind that anonymous PL/SQL blocks have a size limit (about 64 KB, I think), so you have to do something like the following script:


long   r
string sPLSQL

for r = 1 to dw_1.RowCount()
	// At the start of every 500-row chunk, send the previous block and begin a new one
	if mod(r, 500) = 1 then
		if len(sPLSQL) > 0 then
			sPLSQL += "End;"
			execute immediate :sPLSQL using SQLCA;
		end if
		sPLSQL = "Begin~r~n"
	end if

	// (values containing quotes would need escaping; error handling: see the note below)
	sPLSQL += "INSERT INTO TBL_OWNER.MY_TABLE (V1) " + &
	          "VALUES ('" + dw_1.GetItemString(r, "COL1") + "');~r~n"
next

// Send whatever remains in the last, partially filled block
if len(sPLSQL) > 0 then
	sPLSQL += "End;"
	execute immediate :sPLSQL using SQLCA;
end if
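One small addition to the sketch above (a suggestion, not part of the original post): check SQLCA.SQLCode after each EXECUTE IMMEDIATE so a failed block stops the load, and commit only once at the end so the whole import stays all-or-nothing:

// Hedged addition: place this after each EXECUTE IMMEDIATE in the loop above
IF SQLCA.SQLCode <> 0 THEN
	ROLLBACK USING SQLCA;
	MessageBox("Bulk insert failed", SQLCA.SQLErrText)
	RETURN
END IF

// ...and once the final block has been sent successfully:
COMMIT USING SQLCA;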

 

You will be surprised by the speed improvement of this approach!

 

 

John Bailey
  1. Friday, 10 April 2020 16:49 PM UTC
  2. PowerBuilder
  3. # 4

Thanks for your response. Much appreciated.

To answer your question, we are using a VPN connection to the network. The exe file resides on a server, but of course it is run on the user's client machine. The data file resides on a shared drive. I will need to run a test to see how long it takes just to load the DataWindow from the file. I hadn't considered that copying the file to a local drive might speed things up.
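A quick way to measure that split, for illustration (ls_path is a placeholder for wherever the CSV lives; CPU() returns milliseconds since the application started):

// Hedged timing sketch to separate file-import time from database-update time
long ll_start, ll_import_ms, ll_update_ms
string ls_path   // placeholder: path to the CSV file

ll_start = CPU()
dw_1.ImportFile(ls_path)          // load the CSV into the DataWindow buffer
ll_import_ms = CPU() - ll_start

ll_start = CPU()
IF dw_1.Update() = 1 THEN
	COMMIT USING SQLCA;
ELSE
	ROLLBACK USING SQLCA;
END IF
ll_update_ms = CPU() - ll_start

MessageBox("Timing", "Import: " + String(ll_import_ms) + " ms~r~nUpdate: " + String(ll_update_ms) + " ms")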

I did consider trying to get an app running on a server to do the file load, so the path would be server-to-server for both the file and the database connection. However, I didn't want to attempt that now, working remotely. There would be hurdles with security groups, server DBAs, shared-drive owners, and other mean people. After all, the current solution works. I can try when we are back together. I do think users would be delighted to have one less thing to do and have the file loaded automatically in the early AM.

I hadn't considered the DB overhead of the indexes, etc. So I could create a duplicate table with no keys, DW-import into that table (after truncation, with no need for rollback), and then, once it is loaded, just do a table-to-table copy.
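A rough sketch of that staging approach, for illustration; stg_daily_load and daily_table are placeholder names, and the keyless staging table would have to be created (once) by someone with the privileges to do so:

// Hedged sketch: the DW imports the CSV into stg_daily_load (no keys/indexes) first,
// then one server-side statement copies everything into the real table.
string ls_copy
ls_copy = "INSERT INTO daily_table SELECT * FROM stg_daily_load"
EXECUTE IMMEDIATE :ls_copy USING SQLCA;
IF SQLCA.SQLCode = 0 THEN
	COMMIT USING SQLCA;
ELSE
	ROLLBACK USING SQLCA;
	MessageBox("Copy failed", SQLCA.SQLErrText)
END IF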

Thanks for the help.

Comment
  1. mike S
  2. Friday, 10 April 2020 17:24 PM UTC
--> The exe file resides on a server, but of course it is run on the user's client machine

You should not run it this way even on a LAN; it is way too slow. Always install and run the exe on the client machine.

You could also run an RDP session to the office and run the app on an office server or workstation.
  1. Yiannis Papadomichelakis
  2. Monday, 13 April 2020 18:51 PM UTC
The index overhead is the same whether you run on the local network or over the internet (VPN), so I wouldn't consider changing that.

The fact that the file to be imported is not on the client's machine will increase the time needed to populate the datawindow. You need to figure out (it's very easy) whether the delays occur when loading the file or when updating the datawindow.

My bet is on the update...! ;)
John Fauss
  1. Friday, 10 April 2020 14:58 PM UTC
  2. PowerBuilder
  3. # 5

Greetings, John -

Is the app running on the user's home PC/laptop? Citrix? Remote Desktop? Where does the file being imported reside during the import? Are users accessing the company network via a VPN? VPNs are typically slow, so if the data being imported has to make its way to the DBMS over the VPN, or down to the laptop and then back up to the DBMS, that's a potential bottleneck.

The data import is part of an application, I assume. Does it have to be? Since stored procedures are evil/taboo, could the import process be encapsulated in a small, stand-alone PB application that just does the import and is scheduled to run automatically on an in-house server?

I'd also look at using TRUNCATE TABLE to clear the table instead of in-line deletes, if that is workable for your scenario. It is MUCH faster. If you are absolutely sure there will be no primary key violations in the imported data, I'd also look into temporarily disabling PK & FK indexes, constraints, triggers, etc., importing/inserting the data, and then restoring/enabling everything. That way the table indexes are rebuilt once at the end instead of thousands of times.
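For illustration, a minimal TRUNCATE sketch from PowerScript; note that in Oracle, TRUNCATE is DDL and commits implicitly (so the fall-back-on-failure safety net mentioned in the question goes away), and it needs privileges that may not be granted. The table name is a placeholder:

// Hedged sketch: TRUNCATE requires the appropriate privilege and, being DDL,
// cannot be rolled back once executed.
string ls_sql
ls_sql = "TRUNCATE TABLE daily_table"
EXECUTE IMMEDIATE :ls_sql USING SQLCA;
IF SQLCA.SQLCode <> 0 THEN
	MessageBox("Truncate failed", SQLCA.SQLErrText)
END IF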

Reduce unnecessary overhead in the DBMS and take the VPN out of the data path as much as possible.

Food for thought.

Regards, John
