PollyReports Tutorial

I’ve noticed that acceptance of a new software module or package for developers in the Open Source/Free Software world is greatly affected by the availability of a good tutorial. I mean, it seems obvious, doesn’t it? But I’ve also noticed that the original author of a project rarely writes a good tutorial.

EDIT 6/20/2012: I’ve moved the tutorial to PyPI; find it here:

http://packages.python.org/PollyReports/tutorial.html

What do they say about battle plans?

So, after posting that PollyReports was ready for use, I actually used it last night with a small report for one of my clients.  Turns out, it still needed work.

But now, it works.  There were a couple of things I had just plain forgotten, like… what if there are newlines in an Element’s text?  Answer: break up the text into lines and print them one after the other in vertical alignment, using the given font size and leading to space them out.  What about page numbers?  Well, oops.  I’ve added a sysvar parameter to Element initialization that can be used to access any of the parent Report’s variables.  All I really want is “Report.pagenumber” but I can see that there may well be other uses for this.
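A minimal sketch of that line-splitting idea, not the actual PollyReports code. The function name draw_multiline and the RecordingCanvas stand-in are hypothetical, and it assumes a Reportlab-style canvas where y grows upward, so each successive line is drawn one "leading" lower. Falling back to the font size when no leading is given is also just an assumption.

```python
class RecordingCanvas:
    """Hypothetical stand-in that records calls instead of drawing."""

    def __init__(self):
        self.calls = []

    def setFont(self, name, size):
        self.calls.append(("setFont", name, size))

    def drawString(self, x, y, text):
        self.calls.append(("drawString", x, y, text))


def draw_multiline(canvas, x, y, text, font=("Helvetica", 10), leading=None):
    """Draw each line of `text` below the previous one; return the height used."""
    name, size = font
    if leading is None:
        leading = size  # assumption: default leading equals the font size
    canvas.setFont(name, size)
    lines = text.split("\n")
    for i, line in enumerate(lines):
        # Same x for every line keeps them vertically aligned.
        canvas.drawString(x, y - i * leading, line)
    return len(lines) * leading


canvas = RecordingCanvas()
height = draw_multiline(canvas, 72, 700, "first line\nsecond line", leading=12)
```

The returned height is what a band would need to reserve for the element once its text has been split.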

Though this particular report didn’t use it, I have other clients’ reports that use Geraldo’s event system (mainly so the user wouldn’t decide a slow-generating report was borked).  Rather than add all those event hooks to PollyReports, I added just one: an onrender parameter on Element, which is automatically passed to the Renderer when it’s instantiated.  When Renderer.render is called (i.e. when the data is actually output), onrender is called with a single parameter, a reference to the Renderer.  Assuming you called that parameter “obj”, the Element which spawned the Renderer is accessible as obj.parent, and the Report as obj.parent.report.
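Here is a rough sketch of how that hook hangs together. The three classes are simplified stand-ins written for this example, not the real PollyReports classes; only the names onrender, parent, report and pagenumber come from the description above. Whether the hook fires before or after the output is an assumption here.

```python
class Report:
    """Stand-in report; only the page number matters for this sketch."""

    def __init__(self):
        self.pagenumber = 1


class Renderer:
    """Stand-in renderer created by an element for one piece of output."""

    def __init__(self, parent, text, onrender=None):
        self.parent = parent
        self.text = text
        self.onrender = onrender

    def render(self, output):
        output.append(self.text)
        if self.onrender is not None:
            self.onrender(self)  # single argument: the renderer itself


class Element:
    """Stand-in element that hands its hook to each renderer it creates."""

    def __init__(self, text, onrender=None):
        self.text = text
        self.onrender = onrender
        self.report = None  # set once the element belongs to a report

    def generate(self):
        return Renderer(self, self.text, self.onrender)


progress = []


def note_progress(obj):
    # obj.parent is the element; obj.parent.report is the report.
    progress.append(obj.parent.report.pagenumber)


report = Report()
element = Element("Hello", onrender=note_progress)
element.report = report

printed = []
element.generate().render(printed)
```

A progress indicator for a slow report would do its updating inside a callback like note_progress.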

Making progress…

Wow, PollyReports.py is already usable!

I started on PollyReports yesterday morning, and as of right now, it’s usable.  It’s true, PollyReports lacks some functionality from Geraldo Reports, but as I said in my post yesterday, that was the plan.  Keep it simple, Stanley, or something to that effect.

Using Robin Parmar’s lines-of-code counter found here, I’ve counted the code lines in both projects.  Geraldo Reports consists of 90,138 lines of code (in my current fork, which is pretty close to the standard in terms of length), while PollyReports has just 1,345 lines.  These are the “minimal” numbers, with comments and blank lines ignored, and they include all the Python files in each of the respective directories.  This includes the sample data file for PollyReports… which is 1,002 code-lines long.

The actual PollyReports.py is 382 lines long, including comments and blank lines!

I’m pretty proud of Polly.  She’s managed to mature nicely while keeping her girlish figure.  I’m sure, as time goes by, she’ll gain a little more weight, but hopefully she’ll never get close to the mass of Geraldo.

Okay, enough silliness.  What is still missing?  Two things come to mind:

1.  A means of adding fonts other than the standard PDF fonts.  Geraldo Reports handled this internally… but Polly doesn’t “know” you are using Reportlab, nor import any parts of it directly.  Therefore, if you want nonstandard fonts, register them with Reportlab before you pass your canvas to PollyReports.  You’ll be able to call on those fonts using whatever names you have registered, just as normal when using Reportlab directly.

What does this buy me?  The ability to use a wrapper and run PollyReports with something other than Reportlab.  The less of Reportlab’s API the wrapper has to replicate, the easier it will be.  Here’s the whole list of Canvas methods and attributes PollyReports uses:

canvas.drawRightString()
canvas.drawString()
canvas.line()
canvas._pagesize
canvas.restoreState()
canvas.saveState()
canvas.setFont()
canvas.setLineWidth()
canvas.setStrokeGray()
canvas.showPage()
canvas.translate()

There’s just no need to add anything to that list, other than perhaps the rect() method at some point (for a Box class, no doubt).

2.  Subreports.  I can’t think of a clean way to handle subreports, since there must be some way to retrieve the external recordset.  Perhaps an Element subclass where you register a callback to get the data?  Hmm.  Might do it just that way.

Anyway, I’m very pleased with this project.  I expect to be using PollyReports for several of my custom software clients very soon.

Are you interested in PollyReports?  Let me know!

Feeling much better now… throwing out Geraldo in favor of Polly

Gee, it sounds like I’ve changed my sexual orientation or something.  But it’s not like that at all.  As I noted in my previous post, I’ve found significant flow issues with Geraldo Reports which I have found rather more intractable than I then thought.  So I got to thinking, in my best Jeremy Clarkson mode, “how hard can it be?”

This morning I hacked out the first version of PollyReports.py.  You can see it here:

https://github.com/Solomoriah/PollyReports

The current version handles detail bands and page headers and footers.  I intend to add grandtotal and subtotal bands shortly.

With this module I’m taking a different approach than that applied by the developer(s) of Geraldo Reports.  First of all, PollyReports will never be as ambitious.  If I can manage to do so at all, PollyReports will always be contained within a single source file.  I am a bear of little brain and prefer my code small and simple.  In fact, I’m trying to follow the adage to create the “simplest thing that can possibly work.”

By contrast, Geraldo Reports has numerous source files, with the generator modules separated from the formatting modules.  I’ve felt from the start that this was not necessary.  PollyReports is designed around Reportlab, but does not import it at all; rather, it assumes that the Canvas object you pass into it will follow the Reportlab Canvas interface.  Pure duck typing.  Creating a wrapper that implements that interface adequately for PollyReports’ purposes shouldn’t be all that difficult; though I have no current plans to do so, I can easily imagine wrapping my MSWinPrint.py module in that way.
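To give an idea of how small such a wrapper could be, here is a sketch of a canvas look-alike that only logs what it is asked to draw. The class name LoggingCanvas and the helper draw_title are made up for this example; the method names follow Reportlab’s Canvas, and a real wrapper would forward the calls to some other output device instead of a list.

```python
class LoggingCanvas:
    """Hypothetical wrapper exposing a small Reportlab-like Canvas surface."""

    def __init__(self, pagesize=(612, 792)):
        self._pagesize = pagesize
        self.pages = [[]]  # one list of logged operations per page

    def _log(self, *op):
        self.pages[-1].append(op)

    def saveState(self):
        self._log("saveState")

    def restoreState(self):
        self._log("restoreState")

    def translate(self, dx, dy):
        self._log("translate", dx, dy)

    def setFont(self, name, size):
        self._log("setFont", name, size)

    def setLineWidth(self, width):
        self._log("setLineWidth", width)

    def setStrokeGray(self, gray):
        self._log("setStrokeGray", gray)

    def drawString(self, x, y, text):
        self._log("drawString", x, y, text)

    def drawRightString(self, x, y, text):
        self._log("drawRightString", x, y, text)

    def line(self, x1, y1, x2, y2):
        self._log("line", x1, y1, x2, y2)

    def showPage(self):
        self.pages.append([])  # start logging a fresh page


def draw_title(canvas, text):
    """Works with any object that quacks like the canvas above."""
    canvas.saveState()
    canvas.setFont("Helvetica", 12)
    canvas.drawString(72, canvas._pagesize[1] - 72, text)
    canvas.restoreState()
    canvas.showPage()


canvas = LoggingCanvas()
draw_title(canvas, "Sample")
```

Nothing in draw_title knows or cares whether it was handed a Reportlab Canvas or this stand-in, which is the whole point of the duck typing.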

Right now, in fact, PollyReports.py imports nothing (except in the test rig, where Reportlab’s Canvas is imported).  Though it’s not really a good idea, doing:

from PollyReports import *

would likely work just fine for most people… there’s just not that much in PollyReports’ namespace, and I don’t plan to put much there.  I’m trying to implement all my utility functions as methods to avoid any excess names being imported.

So anyway, sayonara, Geraldo Reports.  It was fun while it lasted.  Well, not so much there at the end… like many relationships, this one is ending on a sour note.

Geraldo gives me a headache

My last two commits had to do with generator flow issues.  It all comes down to generators/base.py, and it’s giving me a headache… it still doesn’t work right.

The reports I’m generating have multiple levels of group header bands.  Each active group header band should reprint on each new page; this didn’t work correctly sometimes, mainly if a group header pushed a detail band off the page.  So I fixed that, no problem.

But then, I found that child bands would flow off the page if the parent band landed at the bottom (as utils.calculate_size() did not take child bands into account).  I added that calculation to the height check I had already committed, and it worked.  No problem.

Except… there’s still a problem.  If you use auto_expand_height, the generator obligingly adjusts the space consumed by the band.  But this additional space is consumed in the render_band function, after my extra size check.  I can’t just move the size check down, as the band will have then been rendered; I need to advance the page (and reprint those darn group headers) before that happens.

Gah.  So I’m stuck, for now.  One of the reports I’m generating has a subreport that expands the detail band, and this sometimes screws up the page position (and those headers).  I can’t see any way to avoid having the subreport, and the whole reason I used a child band was to ensure that a specific block of information floats at the bottom below that subreport.

I’m coming to the conclusion that Geraldo is structurally flawed.  Here’s how I see the flow working correctly:

  • Each row of the data source generates an in-memory band structure of some sort.  A simple list would actually work, where every item in the list would have a relative vertical position and height associated with it.  The whole band, including subreports and child bands, would be generated into this structure, and the height could be calculated by a simple iterative bounding box algorithm.  If you were really clever, the band structure could adjust the height automatically each time an element was generated into it.
  • But, the band structure I’ve just described hasn’t been rendered yet, just queued.  The generator loop would check to see if there is enough room left on the page for this band; if there is not, it would trigger a new page, re-render the current group header bands, and only then render the detail band.
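The two steps above can be sketched roughly as follows. This is not Geraldo code; the function names, the (top, height) item format, and the single fixed-height header are all assumptions made to keep the example short.

```python
def band_height(items):
    """Bounding-box height of a queued band.

    `items` is a list of (top, height) pairs, relative to the band's top.
    """
    return max((top + height for top, height in items), default=0)


def layout(bands, page_height, header_height):
    """Place each queued band, starting a new page when it will not fit.

    Returns a list of pages; each page is a list of (label, y) entries.
    The header is re-emitted at the top of every page.
    """
    pages = []
    y = None
    for label, items in bands:
        needed = band_height(items)
        if y is None or y + needed > page_height:
            # New page: reprint the header before the detail band.
            pages.append([("header", 0)])
            y = header_height
        pages[-1].append((label, y))
        y += needed
    return pages


bands = [
    ("row1", [(0, 20)]),
    ("row2", [(0, 20), (20, 40)]),  # detail plus a tall child band
    ("row3", [(0, 20)]),
]
pages = layout(bands, page_height=100, header_height=10)
```

Because the height is known before anything is drawn, the page-break decision never has to be undone.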

The problem I see with Geraldo is, obviously, that this isn’t being done.

But Geraldo represents a lot of otherwise-good work.  I don’t want to just throw it away.  Can it be fixed?  Possibly…

What needs doing is that whole render-the-detail-first thing.  If, instead of rendering directly into the Reportlab canvas, the detail band could be rendered into a sort of side-canvas or pseudo-canvas first, the correct size could then be figured accurately.  One of the main issues with this is the fact that the generator is looping; my fixes described above involve breaking out of the loop before the rendering takes place.  I suspect I’ll have to render into my side-canvas, breaking if necessary, and thus causing the band to render twice (as the next pass through the loop would find the same record ready to process, just as it does now).

I don’t like that solution on the surface; it will be important to ensure that functions called from detail band rendering do not produce side effects, since they would effectively double up.  (This “shouldn’t” happen, of course.)  I’ll also have to verify that this procedure won’t throw off any running totals being maintained by Geraldo… and this is something that certainly will happen, so I can’t handwave it.
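To make the double-render worry concrete, here is a small sketch. Nothing in it is Geraldo’s API: RunningTotal and render_row are hypothetical, and the fixed row height is an assumption. It shows a total being bumped twice by a measuring pass, and one possible way around it (measuring against a throwaway copy).

```python
import copy


class RunningTotal:
    """Hypothetical accumulator updated as each row is rendered."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


def render_row(row, total, out):
    """Render one row into `out`; note the side effect on `total`."""
    total.add(row["amount"])
    out.append("%s %d" % (row["name"], total.value))
    return 12  # assumed fixed height per row, in points


row = {"name": "widget", "amount": 5}

# Naive double render: the measuring pass also bumps the real total.
naive = RunningTotal()
render_row(row, naive, [])  # measuring pass into a throwaway list
render_row(row, naive, [])  # real pass; the total has now counted the row twice

# Safer: measure against a copy so only the real pass touches the total.
safe = RunningTotal()
height = render_row(row, copy.deepcopy(safe), [])
page = []
render_row(row, safe, page)
```

Copying state for the measuring pass is only one option; it gets expensive if the state is large, which is part of why I’m wary of the whole approach.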

Egad.  Time to call it a night, and hope I wake up smart enough to do this.

More Changes and Changes of Plan

I’ve spent some time getting to know the innards of Geraldo, though I’ll admit I don’t understand the flow completely yet.  The Rule class I recently added (to allow me to draw a horizontal line with the right end at BAND_WIDTH) works well enough; I’d still like to make Line work with the “magic” BAND_WIDTH parameter, though.  I just can’t think of a clean way to do it.

I also looked over the forks.  Other than those changes I’ve applied manually (the ones previously mentioned, posted in the original repository’s issues), only joaoalf’s tree had any changes I thought I wanted.  So I merged them, and I’ll be testing the results for the next couple of weeks.

I haven’t done any cutting on the generators, etc. as I said I might.  I’m on the fence about cutting ties to the original work.

Geraldo Reports – My Plans

I’ve begun using Geraldo Reports in several current commercial jobs, and I’m finding that it’s not, well, exactly right for me.  But it’s so close.  I’ve forked it on Github:

https://github.com/Solomoriah/geraldo

You can, of course, track back to the original author’s project from there.  I’m just getting started on Github, but I’m liking it a lot.

What’s wrong with Geraldo?

First of all, it’s not being maintained.  It’s been 8 months since anything’s been done to it.  There are outstanding pull requests and issues, but there’s no sign that any of the needed work will ever be done.

Second, it’s a victim of premature abstraction.  It’s an excellent report generator, tied fairly tightly to Reportlab, but it also has all these added things… a plain text report generator and an XML exporter, for instance.  I can’t see any reason I’d ever want to use it to do an export… it’s like killing flies with a sledgehammer.

So I’ve started my own fork, as I’ve said.  I’ve applied the changes mentioned in the last three issues posted to the original project, and added a feature I needed for my production project.  It’s now possible to mark a group footer or the report summary to print on a new page by defining force_new_page_before (in the same way as the usual force_new_page variable).

As time goes on, I plan to strip out the features I don’t think “belong” in the package.  I don’t know when or if I’ll get to that part, though… despite feeling that they don’t belong, their presence doesn’t disturb me all that much.

 

Thoughts on the Evolution of Python (the language)

I’m a fan of Python. I’ve been using Python since the 1.5.2 version. The first time I looked at it, though, I hated it. I thought, what a stupid way to design a language… white space at the beginnings of lines actually has meaning to the program. I didn’t look any further than that, with memories of FORTRAN coding forms dancing in my head.

But then Linux Journal devoted an issue to Python, and Eric Raymond, a fellow I have tremendous respect for, wrote an article on the language. He told me, clearly and elegantly, why I really wanted to be using this language.

So I downloaded and printed out the manuals, and I gave it another try… and I loved it. I found the whitespace thing was a non-issue. Python was at once elegant and pragmatic, easy to write and easy to read. It’s been called “executable pseudocode,” and for good reason.

I’ve written thousands of lines of Python (which, if you’re familiar with the language, you’ll realize is roughly equal to hundreds of thousands of lines of C), and I can read all those programs easily, even years later.

When 2.0 came out, and 2.1 after it, I liked them. Some of the warts I had found and disliked in the language just evaporated, and the backwards compatibility level was very high. I’ve upgraded to each new version fairly soon after release, up to the 2.6 version. But I noticed, as time went on, that Python was absorbing syntax and semantics from other languages. This is nothing really new… remember I said that Python is pragmatic. If the other guys have a good way to do something, it’s entirely reasonable for Python to borrow it.

But some of those new goodies are really hard to read, in my opinion. List comprehensions, for instance, are just ugly. Saving a few lines of code is not much of a win to me, especially in a language that is already as concise as it really needs to be, and I always have to look up in the docs when I see one to figure out what’s happening. Don’t post a comment and tell me how stupid I am… I’ll admit it. I’m a bear of little brain, but writing in a subset of Python which might be considered “1.5 with the warts off,” I’ve written a bunch of very useful code. I think I have good enough reason for my opinion on the matter.
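For anyone who hasn’t run into one, here is a small made-up example of a list comprehension next to the explicit loop it replaces. The variable names are just for illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

# List comprehension: squares of the even numbers, in one line.
squares = [n * n for n in numbers if n % 2 == 0]

# The same thing written as a plain loop.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)
```

Both produce the same list; the loop costs three more lines, which is the trade I’m happy to make.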

For the most part, this didn’t matter to me. I can still write that subset of Python in the 2.6 version and it works just fine. Backwards compatibility remains really good.

Then came Python 3.0, sometimes called Python 3000 (which was sort of the code name for the project to create it). Python 3.0 breaks a lot of things; it’s not backward compatible, and this was by design. I understand the desire by Guido and company to do this. But I don’t like the results.

Sure, they say, the 2.x line will be supported for some time to come, in parallel with 3.x. I’ve seen this before, though, and it doesn’t work that well. Since modules written for one version need to be modified to work with the other, module authors must maintain two versions (if they care at all). In practice, this means that as soon as a module author has upgraded his or her development system to 3.x, the 2.x module versions will fall to the bottom of the priority queue (or fall off entirely). 2.x will become a wasteland, module-wise.

I’m not sure how this will affect me, but I do know that all the modules and all the applications I’ve written are thoroughly incompatible with 3.x. Worse, as I no longer have a Windows development environment at all (just Linux), I can’t build my own 3.x module binaries for the Windows modules I’ve published on my website. As a non-upgrader, I’m beginning to feel like an outcast in the Python world.

It sucks, honestly.