POWERBUILDER AND GRAPH DATABASES
Because I could find nothing, anywhere, regarding PowerBuilder (PB) and graph databases (GDB), I thought I’d provide some information following some dabbling with this emerging (well, emerged now) technology.
I developed a keen interest in GDBs after a contracting friend who does some work for me had attended a NEO4J course – he contacted me and said the application I develop would be ideally suited for a GDB.
I looked at NEO4J and quickly realised the code would not fit with PB and my application. Then another friend advised that MS SQL Server had introduced basic graph DB features in its 2017 version, with enhancements expected to follow. I gave it a try.
(I should point out that I’m not an experienced programmer - self-taught, I work alone so I can’t learn from peers, too busy on my work to go off and learn new skills, long in the tooth, and need to get a twelve-year-old to change my digital watch at daylight savings. So the caveat is that experienced programmers may know ways far better than I’ll lay out here.)
First, after attaching PB to SQL Server 2017, I was easily able to create AS NODE and AS EDGE tables using the PB Database Painter. The only issue I could find was in the DB Painter: if the table was populated and I elected to edit the data, I got an SQLSTATE=42000 error message – ‘Invalid pseudocolumn on the graph DB $node column’. I had to go to ISQL to look at the table’s data.
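For readers who have not seen the graph syntax, this is a minimal, hand-written sketch of the kind of node and edge tables involved (the table and column names here are hypothetical, purely for illustration):

-- A node table is an ordinary table with AS NODE appended;
-- SQL Server adds a hidden $node_id pseudocolumn to it automatically.
CREATE TABLE person
( ip_no int NOT NULL,
  full_name varchar(100) NOT NULL
) AS NODE;

-- An edge table can be declared with no columns of its own;
-- SQL Server supplies the $edge_id, $from_id and $to_id pseudocolumns.
CREATE TABLE knows AS EDGE;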
So far, so good. However, I did not want to re-write my whole application because I had to introduce new table names and, moreover, if I was to go down this GDB path, I would want a smooth transition for my clients, who would want to retain their data. There was no feature in SQL Server to convert an existing, populated table to an ‘AS NODE’ table (none that I could find – see the list of shortfalls in SQL Server 2017 in the introduction of https://www.youtube.com/watch?v=_KEmK_4ZBYQ ). Surely this feature will be in a following version. Thus my immediate challenge was to convert existing, populated tables to AS NODE.
But first, by way of background, a bit about the application I develop to explain the environment. It processes, for law enforcement agencies, telephone call data, financial statements, social media data, and the plethora of data these agencies extract from seized mobile phones. Hence there are many entity (node) tables in the application – people, organisations, addresses, telephone numbers, events (including call records), multimedia (video, audio and photographs extracted from mobile phones), documents, financial transactions, cell towers, and so on.
They all involve many-to-many relationships and are linked via the one table that also has a column that describes the relationship – in essence, ‘edges’.
The number of relationships in this table can grow very quickly. For example, a single telephone number could be related to: its subscriber, its service provider, an address, call records, cell tower locations, Wi-Fi ping locations, other contacted telephone numbers, call records and time-date events, the document it came in, the particular project or investigation, its IMEI number, and so on. (You can understand why my friend could see the application was suited to a GDB.)
Because this application was constantly evolving with ongoing technology changes in mobile phones, each new version I put out invariably incorporated database changes coded into a function in the start-up of the application – see the PBDJ article at https://pbdj.sys-con.com/node/2446520
So, I used this convenience to place functions that automate the conversion of the node-potential entity tables to AS NODE, with a view to keeping the same table names so as to stay in sync with the many datawindow and datastore objects, and embedded SQL, within the application, as well as retaining the data in the tables.
I also had to be mindful that any of my clients could have added an index to a table, adjusted the width of a column, or even added one or more columns to a table. Stated differently, I simply could not use my development tables as a model for any SQL ‘CREATE TABLE … AS NODE’ code.
In essence, the process I utilised involved nine steps:
- Created a coded list of the tables I wished to convert to AS NODE, and put the list into a table to loop through, as well as to record the success or otherwise of each conversion.
- Looped through the table list, using SYSTEM data that described each table, to put together CREATE SQL for the respective tables, for the primary keys and indices, as well as INSERT SQL to copy the data between tables.
- Used this generated CREATE code to create a temporary table in the image of the existing entity table.
- Copied the original table data to the temporary table using the generated INSERT SQL.
- Dropped the original table.
- Modified the generated CREATE and INSERT code by swapping the original and temporary table names to re-create the original table ‘AS NODE’.
- Copied the data back from the temporary table to the new AS NODE table.
- Added the primary key and indices back to the re-created table (I found it easier for debugging to separate the primary key and index creation from a single CREATE TABLE statement).
- Dropped the temporary table. (A simplified SQL sketch of this pattern follows the list.)
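To make the nine steps concrete, here is a simplified, hand-written sketch of the pattern for one hypothetical ‘person’ table (my real code generates all of this SQL from the system catalogue, as shown further below):

-- 1. Create a temporary table in the image of the original.
CREATE TABLE g_person ( ip_no int NOT NULL, full_name varchar(100) NOT NULL );
-- 2. Copy the data out of the original table.
INSERT INTO g_person ( ip_no, full_name ) SELECT ip_no, full_name FROM person;
-- 3. Drop the original and re-create it, same name, AS NODE.
DROP TABLE person;
CREATE TABLE person ( ip_no int NOT NULL, full_name varchar(100) NOT NULL ) AS NODE;
-- 4. Copy the data back, restore the key and indices, and drop the temporary table.
INSERT INTO person ( ip_no, full_name ) SELECT ip_no, full_name FROM g_person;
ALTER TABLE person ADD CONSTRAINT pk_person PRIMARY KEY CLUSTERED ( ip_no );
DROP TABLE g_person;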
Once the transition was accomplished, I ran the application and it worked fine relying on data retrieved from the new same-name AS NODE tables.
One of the tables I converted was for documents. The content of each document was stored in a blob column. I did not know what to expect here because, in my research, I recalled warnings about dealing with blobs in GDBs. But not a problem - all the blob data was retained in the new AS NODE document table.
I have many ‘look-up’ tables in the application and saw no need to convert these to AS NODE. I did, however, convert the city, state, and country tables: a person (node), say, could be related via an edge in several different ways to a city, state, or country.
Creating the edges from the relational tables was, and remains, a bigger challenge as I have several hundred different relationships. In very basic terms, simply to get some testing data, I used the relationship code-name from the relationship table as the EDGE name, and looped through the table populating the edge ‘$from_id’ and ‘$to_id’ with the corresponding ‘$node_id’ of the entity/node tables.
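To give a rough idea of what that loop produces, assume a relationship row that links a person to a telephone number as its subscriber. The generated edge SQL ends up along these lines (the ‘zsubscriber_of’ edge, the ‘person’ node table, and the ‘relationship’ column names are illustrative only – my real schema differs):

CREATE TABLE zsubscriber_of AS EDGE;
-- Populate $from_id and $to_id from the $node_id of the two node tables,
-- driven by the rows in the existing relational relationship table.
INSERT INTO zsubscriber_of ( $from_id, $to_id )
SELECT p.$node_id, t.$node_id
FROM person p, telephone_line t, relationship r
WHERE r.relationship_code = 'SUBSCRIBER_OF'
AND r.from_ip_no = p.ip_no
AND r.to_ip_no = t.ip_no;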
(I gave the EDGE table names an added ‘z’ prefix so they were grouped at the bottom of the table list in the DB Painter, and not scattered amongst the NODE and other table names. SQL Server separates the graph tables from standard tables in its database tree-view – not a current option in the PB DB Painter.)
For further testing, I created a datawindow object selecting data from the new tables and related the data by using, in place of the conventional join conditions in the WHERE clause, the GDB ‘MATCH’ predicate based on the EDGE tables. The datawindow worked fine.
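The SQL behind that test datawindow was of this general shape (again, the node, edge, and column names here are illustrative, not my actual schema):

SELECT p.full_name, t.complete_number
FROM person p, zsubscriber_of s, telephone_line t
WHERE MATCH ( p-(s)->t );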
This has given me heart to work in my spare time to incrementally convert the application to a GDB. During this time, SQL Server, I trust, will add further functionality to its GDB features.
Using a GDB will also enable me to scale the application into a fully functional case management and analytical research tool embracing many entity (node) types. A typical question within law enforcement could be an ad hoc query like: “Give me any person who was in the vicinity of The Dog and Bone Hotel at Hicksville on 23 March 2018 and is associated with a red Ford.” A real challenge in conventional SQL; not so hard, I’m led to believe, with GDB MATCH SQL.
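Purely as an illustration – the node and edge tables below are invented for the purpose – that kind of question starts to look manageable with MATCH:

SELECT per.full_name
FROM person per, zwas_near wn, location loc, zassociated_with aw, vehicle v
WHERE MATCH ( loc<-(wn)-per-(aw)->v )
AND loc.name = 'The Dog and Bone Hotel'
AND wn.event_date = '2018-03-23'
AND v.colour = 'red'
AND v.make = 'Ford';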
One reference in particular I found useful:
https://www.youtube.com/watch?v=_KEmK_4ZBYQ
If it helps, listed below is the SQL code (purloined from internet searches) behind the three datastores I used to extract the system data needed to build the CREATE, INSERT, primary key, and index SQL.
Some of the code relating to building the CREATE and INSERT SQL is also listed.
Bill Beale
select column_name,
is_nullable,
data_type,
character_maximum_length,
character_octet_length,
numeric_precision,
numeric_precision_radix,
numeric_scale
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = :as_table
order by ordinal_position;
_________________________________________________
SELECT KU.table_name as TABLENAME,column_name as PRIMARYKEYCOLUMN, TC.CONSTRAINT_NAME, tc.constraint_type
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS TC
INNER JOIN
INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS KU
ON TC.CONSTRAINT_TYPE = 'PRIMARY KEY' AND
TC.CONSTRAINT_NAME = KU.CONSTRAINT_NAME AND
KU.table_name=:as_table_name
ORDER BY KU.TABLE_NAME, KU.ORDINAL_POSITION;
________________________________________________________
SELECT
IndexName = i.Name,
IndexType = i.type_desc,
ColumnOrdinal = Ic.key_ordinal,
ColumnName = c.name,
ColumnType = ty.name
FROM
sys.indexes i
INNER JOIN
sys.tables t ON t.object_id = i.object_id
INNER JOIN
sys.index_columns ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
INNER JOIN
sys.columns c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
INNER JOIN
sys.types ty ON c.system_type_id = ty.system_type_id
WHERE
t.name = :as_table_name
ORDER BY
t.Name, i.name, ic.key_ordinal;
Using this data, I looped through the table list and generated the CREATE SQL for the temporary table. I gave the temporary table name a ‘g_’ prefix, e.g. ‘g_telephone_line’.
This is a sample, code-generated CREATE for the temporary table:
IF NOT EXISTS (SELECT * FROM sysobjects WHERE name = 'g_telephone_line' AND xtype = 'U')
CREATE TABLE g_telephone_line(
ip_no numeric(16,0) NOT NULL, complete_number varchar(60) NOT NULL, country_code varchar(5) NULL,
area_code varchar(5) NULL, local_number varchar(30) NOT NULL, telephone_type int NULL,
connected_date datetime NULL, connected_end_date datetime NULL, exchange numeric(10,0) NULL,
use_type varchar(30) NULL, service_status varchar(15) NULL, list_status varchar(15) NULL,
sp_ip_no int NULL, carrier_ip_no int NULL, numeric_number bigint NULL, composite char(1) NULL,
purpose varchar(100) NULL, processing_value varchar(40) NULL, engineering_type int NULL,
type_of_service varchar(10) NULL, service_status_code varchar(10) NULL,
service_status_date datetime NULL, sim_password varchar(25) NULL
) AS NODE;
This is the code-generated INSERT used to copy the data between the tables (the code swaps the two table names for the transfer of the data from the temporary table back to the final table):
INSERT INTO g_telephone_line(
ip_no, complete_number, country_code, area_code, local_number, telephone_type, connected_date,
connected_end_date, exchange, use_type, service_status, list_status, sp_ip_no, carrier_ip_no,
numeric_number, composite, purpose, processing_value, engineering_type, type_of_service,
service_status_code, service_status_date, sim_password )
SELECT
ip_no, complete_number, country_code, area_code, local_number, telephone_type, connected_date,
connected_end_date, exchange, use_type, service_status, list_status, sp_ip_no, carrier_ip_no,
numeric_number, composite, purpose, processing_value, engineering_type, type_of_service,
service_status_code, service_status_date, sim_password
FROM telephone_line
WHERE ip_no IS NOT NULL;
The code-generated primary key and index SQL, applied only to the final table:
ALTER TABLE telephone_line ADD CONSTRAINT PK__telephone_line__286302EC PRIMARY KEY CLUSTERED (ip_no);
CREATE NONCLUSTERED INDEX ix_completenumber ON telephone_line( complete_number );
This is perhaps the nub (not the complete code) of the SQL generation as it loops through the datastore rows for the columns of a table:
// Pull the column description for the current row of the system-data datastore
ls_column_name = TRIM( lds_system_table_column_description.GetItemString( ll_c, "column_name"))
ls_is_nullable = Upper(TRIM( lds_system_table_column_description.GetItemString( ll_c, "is_nullable")))
ls_data_type = TRIM( lds_system_table_column_description.GetItemString( ll_c, "data_type"))
li_max_column_length = lds_system_table_column_description.GetItemNumber( ll_c, "character_maximum_length")
li_character_octet_length = lds_system_table_column_description.GetItemNumber( ll_c, "character_octet_length")
li_numeric_precision = lds_system_table_column_description.GetItemNumber( ll_c, "numeric_precision")
li_numeric_precison_radix = lds_system_table_column_description.GetItemNumber( ll_c, "numeric_precision_radix")
li_numeric_scale = lds_system_table_column_description.GetItemNumber( ll_c, "numeric_scale")
// Accumulate the column list used in the generated INSERT ... SELECT
ls_transfer_string = ls_transfer_string + " " + ls_column_name + ", "
If ll_c = 1 Then
ls_first_column = ls_column_name
End If
// Accumulate the column definition for the generated CREATE TABLE
ls_create_string = ls_create_string + ls_column_name + " " + ls_data_type
If ls_data_type = 'int' OR ls_data_type = "text" OR ls_data_type = "real" OR ls_data_type = "float" OR ls_data_type = 'uniqueidentifier' OR ls_data_type = "ntext" &
OR ls_data_type = "bit" OR ls_data_type = 'tinyint' OR ls_data_type = "bigint" OR ls_data_type = "image" &
OR ls_data_type = 'timestamp' OR ls_data_type = "money" OR ls_data_type = "smallmoney" &
OR ls_data_type = 'xml' OR ls_data_type = "date" &
OR ls_data_type = "hierarchyid" OR ls_data_type = "geometry" OR ls_data_type = "geography" THEN
// do nothing at this point - these data types don't take a length specification
Else
If NOT(IsNull(li_max_column_length)) Then
If li_max_column_length = -1 Then
ls_create_string = ls_create_string + "(max)"
Else
ls_create_string = ls_create_string + "(" + STRING(li_max_column_length) + ")"
End If
ElseIf NOT(IsNull(li_numeric_precision)) Then
ls_create_string = ls_create_string + "(" +STRING(li_numeric_precision)
If NOT(IsNull(li_numeric_scale)) Then
ls_create_string = ls_create_string + "," + STRING(li_numeric_scale) + ")"
Else
ls_create_string = ls_create_string + ")"
End If
End If
End If
If ls_is_nullable = 'NO' Then
ls_create_string = ls_create_string + ' NOT NULL, '
ElseIf ls_is_nullable = 'YES' Then
ls_create_string = ls_create_string + ' NULL, '
End If