PollyReports vs. Geraldo Reports — A Correction

Some time back I made a post about the development of PollyReports, and I gave code line counts based on Robin Parmar’s lines-of-code counter which ascribed a truly huge number of lines to Geraldo.  While I knew it was more complex than PollyReports, I began to feel that there had to be some mistake… it just couldn’t be THAT big.

So I took Robin’s program apart and rewrote it, keeping his (or is it her?) line counting mechanism intact but altering the traversal scheme so that only *.py files would be counted, and so that they would be listed in a fashion similar to the Unix/Linux du command.  Using the current 1.5.1 version of PollyReports, the module itself weighs in at 262 actual code lines, 388 total lines (including comments and docstrings).  Using the version of Geraldo that I have downloaded, the total count for source files (excepting the effectively empty tests folder) is 1,785 actual lines of code, 4,885 total lines including comments and docstrings.  I’m pretty sure that the code I abstracted from Robin’s script is not correct in all cases; the docstring detector will not detect all docstrings, and may be confused by some literal string assignments (basically, if you put three double quotes on a line by themselves, you’ll confuse it).  However, these counts do seem more reasonable.
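For the curious, here is a minimal sketch of the kind of counter described above. It is not Robin's code and not my rewrite of it, just an illustration under stated assumptions: the names count_lines and count_tree are made up for this example, and the docstring detection is deliberately naive, so it has much the same blind spots (a triple-quoted string assigned to a variable will fool it).

```python
import os


def count_lines(path):
    """Return (code_lines, total_lines) for one Python source file."""
    code = total = 0
    in_docstring = False
    with open(path, encoding="utf-8", errors="replace") as f:
        for raw in f:
            total += 1
            line = raw.strip()
            if in_docstring:
                # Naive rule: a line ending in three double quotes closes it.
                if line.endswith('"""'):
                    in_docstring = False
                continue
            if not line or line.startswith("#"):
                continue  # blank lines and comments are not code
            if line.startswith('"""'):
                # Opens a docstring unless it also closes on the same line.
                if not (len(line) >= 6 and line.endswith('"""')):
                    in_docstring = True
                continue
            code += 1
    return code, total


def count_tree(root):
    """Print du-style counts for *.py files under root; return the totals."""
    totals = {}
    # Bottom-up walk, so each directory can add up its subdirectories.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        code = total = 0
        for name in sorted(filenames):
            if name.endswith(".py"):
                full = os.path.join(dirpath, name)
                c, t = count_lines(full)
                print(f"{c:8d} {t:8d}  {full}")
                code += c
                total += t
        for d in dirnames:
            sub_code, sub_total = totals.get(os.path.join(dirpath, d), (0, 0))
            code += sub_code
            total += sub_total
        totals[dirpath] = (code, total)
        print(f"{code:8d} {total:8d}  {dirpath}{os.sep}")
    return totals.get(root, (0, 0))
```

Calling count_tree on a source directory prints one line per file and one per directory, with the grand total for the root on the last line.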

Measured in actual code lines (1,785 against 262), Geraldo is almost 7 times the size of PollyReports.  That is still pretty big, but nowhere near the more than 340 times I originally reported.  I think Robin’s code may have been tallying the documentation files as well as the actual Python code.

Feeling much better now… throwing out Geraldo in favor of Polly

Gee, it sounds like I’ve changed my sexual orientation or something.  But it’s not like that at all.  As I noted in my previous post, I’ve found significant flow issues with Geraldo Reports which I have found rather more intractable than I then thought.  So I got to thinking, in my best Jeremy Clarkson mode, “how hard can it be?”

This morning I hacked out the first version of PollyReports.py.  You can see it here:

https://github.com/Solomoriah/PollyReports

The current version handles detail bands and page headers and footers.  I intend to add grandtotal and subtotal bands shortly.

With this module I’m taking a different approach than that applied by the developer(s) of Geraldo Reports.  First of all, PollyReports will never be as ambitious.  If I can manage to do so at all, PollyReports will always be contained within a single source file.  I am a bear of little brain and prefer my code small and simple.  In fact, I’m trying to follow the adage to create the “simplest thing that can possibly work.”

By contrast, Geraldo Reports has numerous source files, with the generator modules separated from the formatting modules.  I’ve felt from the start that this was not necessary.  PollyReports is designed around Reportlab, but does not import it at all; rather, it assumes that the Canvas object you pass into it will follow the Reportlab Canvas interface.  Pure duck typing.  Creating a wrapper that implements that interface adequately for PollyReports’ purposes shouldn’t be all that difficult; though I have no current plans to do so, I can easily imagine wrapping my MSWinPrint.py module in that way.
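To make the duck typing idea concrete, here is a small sketch. The method names setFont, drawString, line, showPage and save are real Reportlab Canvas methods, but everything else is an assumption for illustration: RecordingCanvas and print_lines are hypothetical names, and I am not claiming this is the exact subset of the interface PollyReports calls.

```python
class RecordingCanvas:
    """Hypothetical stand-in that mimics a few Reportlab Canvas methods.

    It draws nothing; it only records the calls made on each page.
    """

    def __init__(self):
        self.pages = []      # completed pages, each a list of recorded calls
        self._current = []   # calls made on the page in progress
        self.saved = False

    def setFont(self, name, size):
        self._current.append(("setFont", name, size))

    def drawString(self, x, y, text):
        self._current.append(("drawString", x, y, text))

    def line(self, x1, y1, x2, y2):
        self._current.append(("line", x1, y1, x2, y2))

    def showPage(self):
        # Finish the current page and start an empty one.
        self.pages.append(self._current)
        self._current = []

    def save(self):
        self.saved = True


def print_lines(canvas, lines, lines_per_page=50, top=750, leading=14):
    """Write lines of text using only the methods named above.

    Nothing here imports Reportlab; any object with these methods will do.
    """
    canvas.setFont("Helvetica", 10)
    y = top
    count = 0
    for text in lines:
        if count == lines_per_page:
            canvas.showPage()
            # Reportlab forgets font state at a page break, so set it again.
            canvas.setFont("Helvetica", 10)
            y, count = top, 0
        canvas.drawString(72, y, text)
        y -= leading
        count += 1
    canvas.showPage()
    canvas.save()
```

The same print_lines function should accept a real Reportlab Canvas or the stand-in without caring which it got, and a wrapper around some other output device would slot in the same way.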

Right now, in fact, PollyReports.py imports nothing (except in the test rig, where Reportlab’s Canvas is imported).  Though it’s not really a good idea, doing:

from PollyReports import *

would likely work just fine for most people… there’s just not that much in PollyReports’ namespace, and I don’t plan to put much there.  I’m trying to implement all my utility functions as methods to avoid any excess names being imported.

So anyway, sayonara, Geraldo Reports.  It was fun while it lasted.  Well, not so much there at the end… like many relationships, this one is ending on a sour note.

Geraldo gives me a headache

My last two commits had to do with generator flow issues.  It all comes down to generators/base.py, and it’s giving me a headache… it still doesn’t work right.

The reports I’m generating have multiple levels of group header bands.  Each active group header band should reprint on each new page; this didn’t work correctly sometimes, mainly if a group header pushed a detail band off the page.  So I fixed that, no problem.

But then, I found that child bands would flow off the page if the parent band landed at the bottom (as utils.calculate_size() did not take child bands into account).  I added that calculation to the height check I had already committed, and it worked.  No problem.

Except… there’s still a problem.  If you use auto_expand_height, the generator obligingly adjusts the space consumed by the band.  But this additional space is consumed in the render_band function, after my extra size check.  I can’t just move the size check down, as the band will have then been rendered; I need to advance the page (and reprint those darn group headers) before that happens.

Gah.  So I’m stuck, for now.  One of the reports I’m generating has a subreport that expands the detail band, and this sometimes screws up the page position (and those headers).  I can’t see any way to avoid having the subreport, and the whole reason I used a child band was to ensure that a specific block of information floats at the bottom below that subreport.

I’m coming to the conclusion that Geraldo is structurally flawed.  Here’s how I see the flow working correctly:

  • Each row of the data source generates an in-memory band structure of some sort.  A simple list would actually work, where every item in the list would have a relative vertical position and height associated with it.  The whole band, including subreports and child bands, would be generated into this structure, and the height could be calculated by a simple iterative bounding box algorithm.  If you were really clever, the band structure could adjust the height automatically each time an element was generated into it.
  • But, the band structure I’ve just described hasn’t been rendered yet, just queued.  The generator loop would check to see if there is enough room left on the page for this band; if there is not, it would trigger a new page, re-render the current group header bands, and only then render the detail band.
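The two steps above can be sketched roughly as follows. This is a toy model, not Geraldo code: a band is a plain list of (top, height, text) tuples, y is measured downward from the top of the page for simplicity (the opposite of Reportlab's coordinates), and band_height and paginate are names invented for this example.

```python
def band_height(band):
    """Bounding-box height of a queued band: the lowest bottom edge."""
    return max((top + height for top, height, _ in band), default=0)


def paginate(rows, make_band, page_height, header=None):
    """Lay out one band per row, measuring each band before rendering it.

    Returns a list of pages; each page is a list of (y, text) pairs.
    """
    pages = [[]]
    y = 0

    def render(band):
        nonlocal y
        for top, _, text in band:
            pages[-1].append((y + top, text))
        y += band_height(band)

    if header:
        render(header)
    for row in rows:
        band = make_band(row)  # queued only; nothing is on the page yet
        if y + band_height(band) > page_height:
            # Not enough room: new page, reprint the header, then the band.
            pages.append([])
            y = 0
            if header:
                render(header)
        render(band)
    return pages
```

Because the whole band, child elements and all, is built before the fit test, the page break always happens ahead of the band rather than in the middle of it. A band taller than a full page would still overflow; the sketch makes no attempt to split one.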

The problem I see with Geraldo is, obviously, that this isn’t being done.

But Geraldo represents a lot of otherwise-good work.  I don’t want to just throw it away.  Can it be fixed?  Possibly…

What needs doing is that whole render-the-detail-first thing.  If, instead of rendering directly into the Reportlab canvas, the detail band could be rendered into a sort of side-canvas or pseudo-canvas first, the correct size could then be figured accurately.  One of the main issues with this is the fact that the generator is looping; my fixes described above involve breaking out of the loop before the rendering takes place.  I suspect I’ll have to render into my side-canvas, breaking if necessary, and thus causing the band to render twice (as the next pass through the loop would find the same record ready to process, just as it does now).

I don’t like that solution on the surface; it will be important to ensure that functions called from detail band rendering do not produce side effects, since they would effectively double up.  (This “shouldn’t” happen, of course.)  I’ll also have to verify that this procedure won’t throw off any running totals being maintained by Geraldo… and this is something that certainly will happen, so I can’t handwave it.
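Here is a toy illustration of the doubling-up hazard and one possible guard against it. None of this is Geraldo's actual code: render_row, measure_then_render and the commit flag are hypothetical, the "canvas" is just a list of strings, and the running total is a plain dict.

```python
LINE_HEIGHT = 12


def render_row(row, canvas, totals, commit=True):
    """Draw one row into canvas (a list of strings) and return its height.

    Updating the running total is a side effect, applied only when commit
    is true.
    """
    canvas.append(f"{row['name']}: {row['amount']}")
    notes = row.get("notes", [])
    for note in notes:
        canvas.append(f"    {note}")
    if commit:
        totals["amount"] += row["amount"]
    return LINE_HEIGHT * (1 + len(notes))


def measure_then_render(row, page, totals, guarded):
    """Render once into a scratch canvas to measure, then once for real.

    With guarded=False the measuring pass also updates the totals, which is
    exactly the doubling-up problem.
    """
    scratch = []
    height = render_row(row, scratch, totals, commit=not guarded)
    render_row(row, page, totals, commit=True)
    return height
```

With guarded=False every row is added to the total twice; with guarded=True only the real pass counts. A flag like this only helps for side effects the renderer controls itself, so anything called from inside the band would still need checking by hand.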

Egad.  Time to call it a night, and hope I wake up smart enough to do this.

More Changes and Changes of Plan

I’ve spent some time getting to know the innards of Geraldo, though I’ll admit I don’t understand the flow completely yet.  The Rule class I recently added (to allow me to draw a horizontal line with the right end at BAND_WIDTH) works well enough; I’d still like to make Line work with the “magic” BAND_WIDTH parameter, though.  I just can’t think of a clean way to do it.

I also looked over the forks.  Other than those changes I’ve applied manually (the ones previously mentioned, posted in the original repository’s issues), only joaoalf’s tree had any changes I thought I wanted.  So I merged them, and I’ll be testing the results for the next couple of weeks.

I haven’t done any cutting on the generators, etc. as I said I might.  I’m on the fence about cutting ties to the original work.

Geraldo Reports – My Plans

I’ve begun using Geraldo Reports in several current commercial jobs, and I’m finding that it’s not, well, exactly right for me.  But it’s so close.  I’ve forked it on Github:

https://github.com/Solomoriah/geraldo

You can, of course, track back to the original author’s project from there.  I’m just getting started on Github, but I’m liking it a lot.

What’s wrong with Geraldo?

First of all, it’s not being maintained.  It’s been 8 months since anything’s been done to it.  There are outstanding pull requests and issues, but there’s no sign that any of the needed work will ever be done.

Second, it’s a victim of premature abstraction.  It’s an excellent report generator, tied fairly tightly to Reportlab, but it also has all these added things… a plain text report generator and an XML exporter, for instance.  I can’t see any reason I’d ever want to use it to do an export… it’s like killing flies with a sledgehammer.

So I’ve started my own fork, as I’ve said.  I’ve applied the changes mentioned in the last three issues posted to the original project, and added a feature I needed for my production project.  It’s now possible to mark a group footer or the report summary to print on a new page by defining force_new_page_before (in the same way as the usual force_new_page variable).

As time goes on, I plan to strip out the features I don’t think “belong” in the package.  I don’t know when or if I’ll get to that part, though… despite feeling that they don’t belong, their presence doesn’t disturb me all that much.