Using FOP to Handle Formatting of Large Text Blocks in DataWindow Output

It's been the bane of PowerBuilder development since day one. You have a DataWindow that contains one or more text blocks that, when previewed for printing or printed, span a page boundary. The result: the DataWindow incorrectly handles portions of the text. You may find some text on the first page that is repeated on the next page, or some text may be missing entirely, or the text may end up overwriting subsequent report objects. The bottom line is that the results are unusable, and it often takes a great deal of tweaking to get adequate results. The good news is that I found at least one method of addressing the issue. The bad news is that the path to get there was rather convoluted. This article describes both.

Beginning with PowerBuilder 9, the DataWindow has had the capability of exporting into XSLFO (XSL Formatting Objects) format. XSLFO is an XML vocabulary that contains formatting information in addition to the data. It is intended for use by a FOP (formatting objects processor), which renders it into any number of final output formats (e.g., PDF, RTF, or HTML). In some sense it's similar to HTML, in that it contains formatting information and the syntax is somewhat similar. If you want to learn more about the actual structure and syntax of XSLFO documents, I recommend the tutorial on the w3schools site: http://w3schools.sinsixx.com/xslfo/default.asp.htm. Referring to that may be helpful when I start discussing some of the current limitations of PowerBuilder's XSLFO implementation. There is another good tutorial at the RenderX site: www.renderx.com/tutorial.html. If you want to work directly with XSLFO documents and render them to PDF manually to see the result, I'd recommend the EditiX editor: www.editix.com/

The primary reason that PowerBuilder supports XSLFO is so that rendering to a PDF can be done through the XSLFOP! method rather than the Distiller! method. Sybase supplies the 0.20.4 version of Apache FOP with the product; you'll find it in the %SYBASEHOME%\Shared\PowerBuilder\fop-0.20.4 directory. Since PowerBuilder is already capable of generating a PDF through XSLFOP!, and XSLFO is capable of handling large text blocks easily, the original idea was to take the XSLFO output from the DataWindow and tweak it to ensure that the large text blocks were handled correctly before handing it over to the FOP for conversion to a PDF.

That original approach was based on a faulty assumption. I expected that, as in HTML, the bounding area of a text block might have inappropriately calculated dimensions, but that the various areas within the document would be independent of one another apart from their sequence. What actually turns out to be the case is that when PowerBuilder generates XSLFO, it generates absolute coordinates for everything in the document. That's good for making sure that the generated PDF is an exact representation of the original DataWindow. Unfortunately, it's exactly what we don't want: with every object pinned to a fixed position, a text block that needs more room can't push the objects that follow it further down the page.

As it turns out, there are a number of other areas in which PowerBuilder's XSLFO implementation is a bit anemic. They include:
• If the DataWindow contains a Page n of nnn computed column, what gets exported is:

Page of X

where X is a fixed value. We don't want that in our documents, because once we get the large text flowing correctly, the total number of pages is likely to change from what PowerBuilder originally calculated.

It is possible to create a similar compute in XSLFO. We just need to add an empty block element with an ID to the end of the flow, and then, where the total page count belongs, add an fo:page-number-citation whose ref-id refers to that block's ID.

• If there is a header and a footer on a DataWindow, the region-body gets defined with a margin-bottom the same size as the region-after. That's the way it should be. However, there's no matching margin-top to match the height of the region-before. There's also no extent defined for the

• If there is a header and footer on the DataWindow, the simple-page-master gets created with the following sections in the following order:

The region-body is supposed to be declared before the region-before in a simple-page-master.

• The DataWindow I was working with for testing included radio buttons for one of the columns. When those were rendered in the XSLFO, it was done with a block-container followed directly by an instream-foreign-object. An instream-foreign-object isn't a valid child element of a block-container. There is supposed to be an fo:block between the block-container and the instream-foreign-object.
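Two of the items above, the page count and the radio button markup, can be sketched as small post-processing steps. This is only an illustration, written in Python with the standard library's ElementTree rather than in PowerBuilder. The function names and the "last-page" ID are hypothetical; the fo: element and attribute names are the standard XSLFO ones.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name ElementTree uses for an fo: tag."""
    return "{%s}%s" % (FO_NS, tag)


def add_page_n_of_total(footer_block, flow, last_id="last-page"):
    """Put 'Page n of N' in footer_block, leaving N for the FOP to resolve."""
    footer_block.text = "Page "
    current = ET.SubElement(footer_block, fo("page-number"))
    current.tail = " of "
    ET.SubElement(footer_block, fo("page-number-citation"), {"ref-id": last_id})
    # An empty block at the very end of the flow: whatever page it lands on
    # is the last page, so citing it yields the total page count.
    ET.SubElement(flow, fo("block"), {"id": last_id})


def wrap_foreign_objects(root):
    """Insert an fo:block between block-container and instream-foreign-object."""
    wrapped = 0
    for container in list(root.iter(fo("block-container"))):
        for index, child in enumerate(list(container)):
            if child.tag == fo("instream-foreign-object"):
                wrapper = ET.Element(fo("block"))
                container.remove(child)
                container.insert(index, wrapper)
                wrapper.append(child)
                wrapped += 1
    return wrapped


if __name__ == "__main__":
    footer = ET.Element(fo("block"))
    flow = ET.Element(fo("flow"), {"flow-name": "xsl-region-body"})
    ET.SubElement(flow, fo("block")).text = "Report body goes here."
    add_page_n_of_total(footer, flow)
    print(ET.tostring(footer, encoding="unicode"))
    print(ET.tostring(flow, encoding="unicode"))
```

Because the citation is resolved by the FOP at render time, the total stays correct even when the large text blocks reflow onto more pages than PowerBuilder originally calculated.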

Perhaps one of the reasons that the PowerBuilder implementation looks the way it does is that it targets an older version of Apache FOP that only implements a rather limited subset of the 1.0 W3C Recommendation for the format. The more recent version of Apache FOP (0.93) supports more of the 1.0 W3C Recommendation as well as portions of the 1.1 Working Draft.

One critical limitation I found in the 0.20.4 version of Apache FOP is that it doesn't support non-absolute positioning, which is essential for our custom implementation to work. PowerBuilder doesn't calculate the boundaries for the large text blocks correctly, but that doesn't mean we want to be responsible for calculating it on our own. The reason we want to use XSLFO is to remove the need to do the positioning calculation altogether and just let the FOP handle it on its own. We're going to use the 0.93 version of Apache FOP for this implementation. More on that later.

Since the XSLFO that the DataWindow natively generates is lacking for our purposes, my next approach was to create an XML export template that simply generated XSLFO rather than just XML. If you've checked out the w3schools site or are already familiar with XSLFO, you'll know that the tags all begin with "fo:". One thing I discovered is that if you include such a tag prefix in a PowerBuilder XML export template, PowerBuilder won't export anything. It won't throw an error or return an error message, but it also won't export the data. You'll need to leave the "fo:" prefix off of the tags and then add them on once the export is complete.
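The "add the prefix afterwards" step could look something like the following. It is a rough sketch in Python, not the code I used. The function name add_fo_prefix is hypothetical, and it assumes the export template names its root element "root" and that the exported text contains no CDATA sections with literal "<" characters.

```python
import re

FO_NS = "http://www.w3.org/1999/XSL/Format"

# Start of an opening or closing tag, e.g. "<block" or "</block".
# Declarations and comments ("<?xml", "<!--") don't match because the name
# must start with a letter; names that already carry a prefix are skipped
# by the negative lookahead on ":".
_TAG_START = re.compile(r"<(/?)([A-Za-z][\w.-]*)(?![\w.:-])")


def add_fo_prefix(xml_text):
    """Prefix every unprefixed tag with 'fo:' and declare the namespace."""
    prefixed = _TAG_START.sub(r"<\1fo:\2", xml_text)
    # Declare the fo namespace on the root element if it isn't there yet.
    return re.sub(
        r"<fo:root\b(?![^>]*xmlns:fo)",
        '<fo:root xmlns:fo="%s"' % FO_NS,
        prefixed,
        count=1,
    )


if __name__ == "__main__":
    exported = (
        '<?xml version="1.0"?>'
        '<root><block font-size="10pt">Hello</block></root>'
    )
    print(add_fo_prefix(exported))
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply to a file that was already fixed up.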

However, for a number of reasons, primarily because I wanted something I could use more generically rather than having to custom design an XML export template for every DataWindow I needed to handle, I ended up creating a custom class to do the XSLFO export. If you had to generate XML or HTML from PowerBuilder before it was a native feature, you're probably familiar with the process. The main difference is the tags you use to surround the data. Some comparison to HTML might be in order (see Table 1). Obviously some of the formatting options will change; these are just what I used.
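To give a feel for what such a hand-rolled export produces, here is a minimal sketch in Python (my class was written in PowerBuilder, and this is not its code). The function name rows_to_xslfo, the page size, and the formatting attributes are all assumptions chosen for illustration. The point is that every value becomes an ordinary fo:block inside the flow, with no coordinates, so the FOP decides where page breaks fall.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

FO_NS = "http://www.w3.org/1999/XSL/Format"


def rows_to_xslfo(rows):
    """Render each row (a dict of label -> text) as flowing fo:block elements."""
    parts = [
        '<fo:root xmlns:fo="%s">' % FO_NS,
        "<fo:layout-master-set>",
        '<fo:simple-page-master master-name="page"'
        ' page-height="11in" page-width="8.5in"'
        ' margin-top="1in" margin-bottom="1in"'
        ' margin-left="1in" margin-right="1in">',
        "<fo:region-body/>",
        "</fo:simple-page-master>",
        "</fo:layout-master-set>",
        '<fo:page-sequence master-reference="page">',
        '<fo:flow flow-name="xsl-region-body">',
    ]
    for row in rows:
        for label, text in row.items():
            # escape() protects &, < and > in the data from being read as markup.
            parts.append(
                '<fo:block font-weight="bold">%s</fo:block>' % escape(label)
            )
            parts.append(
                '<fo:block space-after="6pt">%s</fo:block>' % escape(str(text))
            )
    parts += ["</fo:flow>", "</fo:page-sequence>", "</fo:root>"]
    return "\n".join(parts)


if __name__ == "__main__":
    document = rows_to_xslfo([{"Notes": "A long comment that may span pages. " * 50}])
    ET.fromstring(document)  # raises ParseError if the result is not well formed
    print(document[:200])
```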

IBM has a much more detailed document describing the correlation between XSLFO and HTML at www.ibm.com/developerworks/library/x-xslfo2app/.

At this point, we have a method of generating XSLFO in the format we want and we can use Apache FOP 0.93 to render it to PDF. You can obtain more information on Apache FOP at their Website: http://xmlgraphics.apache.org/fop/. In particular, the download mirrors are listed at www.apache.org/dyn/closer.cgi/xmlgraphics/fop from which you'll want to grab fop-0.93-bin-jdk1.4.zip.

What I didn't want to deal with, though, was making sure an appropriate JVM was configured on every workstation I wanted to deploy this to. Some folks were able to take the 0.20.4 release of Apache FOP and convert it to a .NET assembly using J#. They've made it available at http://sourceforge.net/projects/nfop/. My first approach was to try to do the same thing with the 0.93 version of Apache FOP. After struggling with that for a while, I finally gave up and used IKVM instead.

IKVM is a Java Virtual Machine for .NET available at www.ikvm.net/. When using it, compiling Apache FOP 0.93 to a .NET assembly was as easy as running the following at a command line on the Apache FOP JAR file and its supporting JAR files:

ikvmc -nowarn:0109 -out:fop-0.93.dll -reference:IKVM.GNU.Classpath.dll lib\*.jar build\fop.jar

You can download IKVM from the following location: http://sourceforge.net/project/showfiles.php?group_id=69637. I now have a .NET version of Apache FOP I can use, but what I also want is a wrapper that makes it simple to use. I also really want a class with a single CreatePDF method that takes two arguments: the filename of my XSLFO file and the name of the PDF file I want to generate. I wrote a wrapper assembly using C#, the source for which is provided as Listings 1 and 2.

There are a couple of things worth noting in Listing 1. There are several calls to java.lang.System.setProperty that tell the JVM to use different libraries to work with XML than the ones it defaults to. Those are followed by an explicit call to create a particular SAX parser factory, get a handle to its ClassLoader, and then set the ContextClassLoader of the current thread to that ClassLoader. I'm doing that because each .NET assembly has its own class loader. My instructions to use particular libraries are only valid for the assembly I'm working in, whereas I need the assembly I'm wrapping to follow those instructions. Setting the current thread's class loader to the same one as the assembly I'm wrapping resolves the class loader issue without requiring editing of the wrapped assembly's code. For more information on the issue, you might see this thread in the IKVM developer's mailing list: http://ikvm-developers.narkive.com/OCwc9cf0/problem-with-calling-fop-c-from-another-assembly.

Now we have a COM-visible assembly that can be called from PowerBuilder through its COM Callable Wrapper (CCW). You would need to run REGASM on the assembly to generate the registry entries that PowerBuilder will need to treat it as an OLE Automation Object. For your convenience, I've included those registry entries as Listing 3. Alternatively, you could ensure that they are part of the manifest for the assembly and then use a manifest file for your application to reference them so that registry entries on the client machine are not necessary. I've explained how to do that for CCWs in an earlier PBDJ article.

All we need to do now is call it. Listing 4 shows how to call the assembly through OLE Automation. It then calls an OpenPDF function to display the resulting PDF to the user. That function is just an alias for the ShellExecute function in the Windows API, the source for which is provided in Listing 5.

We're in the home stretch now. The only remaining problem is this: What happens if an error occurs within one of the .NET assemblies? Perhaps our XSLFO isn't valid or we supplied the wrong filename. The assembly throws an exception, and the COM routines within Windows capture it and return an error to PowerBuilder. When COM captures that exception, it relays the specific information that was provided as part of the exception only to applications that use early binding. Then it clears out the information, which means that subsequent calls to the GetLastError function in the Windows API can't see it. The problem is that PowerBuilder implements OLE Automation through late binding. We know an error occurred, but by the time we find out it's too late to determine what the error was.

This is an issue for any CCW that is used from PowerBuilder, and fortunately there is a way to address it. PowerBuilder isn't the only development tool that has this problem. In fact, Microsoft's own ASP (not the more recent ASP.NET) implements CCWs through late-binding OLE Automation. The issue, and its solution, was covered in an MSDN article in January of 2005: "Get Seamless .NET Exception Logging from COM Clients Without Modifying Your Code": http://msdn.microsoft.com/msdnmag/issues/05/01/ExceptionLogging/. You can get pretty bogged down if you try to follow all the details as they describe their solution, perhaps even more bogged down than you're probably already feeling at this point in the article.

The good news is that it's trivial to take advantage of their solution. Simply supply the five files that are part of their solution with your application. One of them (NetInteropServicesEngine.DLL) is a COM object that you need to register by running regsvr32 on it. Just before you make a call to a CCW, make a simple OLE automation call to start up the exception handling routines and when you are done you shut them back down again through an OLE automation call. In my sample I'm doing it through the open (see Listing 6) and close (see Listing 7) events of the parent window. When the routines are running, CCW exceptions are captured before COM can handle them and logged to the application even before they are released for COM to process.

That's it. The code is actually fairly simple, although the road that got us there was rather long. I'll also be making the sample code available on CodeXchange on the Sybase Website if you simply want to download and run it.

--This article was originally published on PBDJ.
