When a merged cell with borders is created in Apache POI it tends to look like this:

There is an FAQ on this here, but it’s not clear what’s actually required. I thought it meant create cells around the merged cell, but that is wrong (and would prohibit adjacent merged cells from having borders).

Looking at the file produced when a merged cell is given a border in Excel it is clear that what is actually required is the creation of more cells within the merged cell.

So, to achieve this:

I have to:

  • Create the cell at 1,1 and set the desired value for the merged cell.
  • Add a merged region from 1,1 to 3,3.
  • Create these cells without setting any value: [1,2]; [1,3]; [2,1]; [2,3]; [3,1]; [3,2]; [3,3].
  • Set top borders on the top cells, left borders on the left cells, right borders on the right cells and bottom borders on the bottom cells.
    Note that for a 3×3 merged cell this implies that each of the constituent cells has a unique cell style – in practice this may not be necessary as full borders on all cells should have the same effect.
 

This is pretty obvious when you think about it, but of course, when it’s being annoying one doesn’t think about it enough.

In order to instruct eclipse to refresh its cache of plugins it needs to be run with the “-clean” argument. So upgrading a plugin should be a case of:

  1. Delete the old plugin.
  2. Copy the new plugin to the same location.
  3. Run eclipse with the -clean argument

I did that, and found that my plugin had been completely ignored.

The problem was that eclipse (in a fit of bad planning) maintains the plugin cache below its own directory, and if that directory is in a read-only location the -clean argument cannot work.

The fix is simply to ensure that eclipse can write to its own directory, either by changing the permissions or by running in an elevated manner (both are bad, but the former is much less bad).

 

There is a post on the BIRT DevShare that puts forward one approach for dynamic number formats in BIRT. In this post I put forward another approach that I claim is the “right way” :) , and then explain the merits of the other approach.

Virgil’s approach allows you to write script that performs the formatting, my approach is simply to change the number format each time it is rendered.

In my report I have a scripted dataset that outputs the following:

Currency Amount Rate
£ 3141.59265358979 1
$ 3141.59265358979 0.620578379
¥ 3141.59265358979 0.008192255
3141.59265358979 0.878428696

(the Rate column is irrelevant for this solution, it’s just there so I can easily demonstrate that the Amount ends up as a number in Excel, by multiplying by the rates (which google told me were the correct exchange rates this morning :) ))

The output I want from my report is:

Currency Amount Rate
£ £3141.59 1
$ $3141.59 0.620578379
¥ ¥3141.59 0.008192255
€3141.59 0.878428696

Where the number format of the Amount column depends on the string listed in the Currency column. To do this the OnRender event for the Data element on the report becomes (with plenty of debug spew):

java.lang.System.err.println( 'onRender' );
java.lang.System.err.println( this.getRowData().getColumnValue('Currency') );
java.lang.System.err.println( this.getRowData().getColumnValue('Currency').charCodeAt(0) );
var currency = this.getRowData().getColumnValue('Currency');
this.getStyle().numberFormat = currency + "###0.00{RoundingMode=HALF_UP}";

Why?

The benefit of this approach (and the root of my claim that it’s the “right way”) is that it changes the number format, rather than replacing a number with a string. For most emitters this is irrelevant (i.e. it makes absolutely no difference in HTML or PDF) and changing the data format like this might even confuse an emitter; but if you are outputting to a spreadsheet there is a world of difference between a number and a string – and the latter is not much use if you want to multiply by the rate.

I would recommend using this approach if the field you are reformatting is numeric and you may want to have your report output to a spreadsheet and you know that your chose spreadsheet emitters won’t fall over, otherwise I’d go for Virgil’s approach. Personally all my reports may be output as spreadsheets and my emitter will support this approach as soon as I upload the fix (as, I think, after limited testing, does the default Excel emitter), so I’ll go for this approach whenever the field is numeric and Virgil’s approach if it is not.
Virgil’s approach is slightly simpler as it uses a complex expression rather than a script.

 

There are three ways in which a BIRT emitter can handle pagination:

  • no-pagination
    The emitter will be instructed to create a new page at the beginning of the report, and to end that page at the end of the report.
    This could be used for an HTML page that contains the entire report (it makes more sense to let the browser handle pagination).
  • paper-size-pagination
    The emitter will be told to start a new page based on the size of the paper and the size of elements added to that page.
    It is the responsibility of the emitter to ensure that the things it renders are actually at the size that the layout engine expects, otherwise pagination will be a mess.
    This is used for the PDF emitter, where the output is destined to end up on actual paper.
  • page-break-pagination
    The emitter will be told to start a new page only in response to an explicit page-break (page-break-before or page-break-after) appearing in the report.
    This is needed for my Excel emitters, where I want to enable the report designer to lay elements out on separate Excel worksheets.

There are two ways in which a BIRT report can be produced:

  • A single RunAndRenderTask
  • A RunTask followed by a RenderTask

The BIRT servlet defaults to using a RunAndRenderTask, whilst my own report server always uses separate Run and Render tasks.

Whilst working on v0.3 of my Excel emitters, where I changed the pagination from no-pagination (everything on a single worksheet) to page-break-pagination (separate sheets used in the control of the report design) I found that everything worked correctly when reports were run from the designer, but that my pagination was being ignored when run from my own API calls.
I walked through the RenderTask source code and could not see how pagination would ever work correctly unless I set HTML_PAGINATION, but that would cause large tables to also generate page breaks, which the designer was not doing.

Eventually, after much scratching of head and debugging through BIRT code, I realised that the difference was that the designer was using a RunAndRenderTask whilst I was using a RenderTask.

The code for setting pagination in RunAndRender is completely different from that used in Render, and the result is that a RenderTask cannot support page-break-pagination.

I filed a bug, but that still left me with broken pagination.

The solution was quite simple: use my own private RenderTask, that is identical to the BIRT RenderTask except that it has:

else if ( ExtensionManager.PAGE_BREAK_PAGINATION
        .equals( pagination ) )
{
    if ( !paged )
    {
        long pageNumber = iter.next( );
        if ( pageNumber != 1 )
        {
            layoutEngine.setLayoutPageHint( getPageHint(
                    pagesExecutor, pageNumber ) );
        }
        setFilteredPageNumber( filteredTotalPage,
                totalPage,
                pageNumber );
        layoutEngine.layout( executor, report, emitter, true );
    }

(the only change there is the final parameter of layout is changed from false to true).

I have to construct my RenderTask myself, rather than calling createRenderTask, but that’s simple enough:

public IRenderTask createRenderTask( IReportDocument reportDocument ) {
    return new RenderTask( (ReportEngine)reportEngine, reportDocument );
}
 

Unfortunately there is no maven repository with jodconverter v3 snapshots in it.

In order work around this I chose to install it into our local repository, which required a couple of changes to the pom.

Enabling deployment

The first step is to tell the pom the location of the repository:

<project>
  ...
  <distributionManagement>
    <repository>
      <id>our-repo</id>
      <name>Our Internal Repository</name>
      <url>http://our-nexus:8081/nexus/content/repositories/releases/</url>
    </repository>
    <snapshotRepository>
      <id>our-repo</id>
      <name>Our Internal Snapshot Repository</name>
      <url>http://our-nexus:8081/nexus/content/repositories/snapshots/</url>
    </snapshotRepository>
  </distributionManagement>
  ...
</project>

With that a simple “mvn deploy” will deploy the pom and jar to the repository, but it will ignore source and docs.

Deploying Source and Javadocs

Given that jodconverter is a nicely built maven project it’s a shame to exclude its source and docs from the repository, so to add them:

<?xml version="1.0" encoding="UTF-8"?>
<project>
  ...
  <build>
    <plugins>
      ...
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-source-plugin</artifactId>
        <executions>
          <execution>
            <id>attach-sources</id>
            <goals>
              <goal>jar</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-javadoc-plugin</artifactId>
        <executions>
          <execution>
            <id>attach-javadocs</id>
            <goals>
              <goal>jar</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    ...
    </plugins>
  </build>
  ...
</project>

After doing this you are still subject to the warnings about using 3rd party snapshots, but now you are in control of everything so you can ensure that you don’t get bitten.

 

The reasons for this are well documented (try here or here) but, when you need it, it’s irritating to discover that you cannot upload 3rd party snapshots to Nexus.

I got as far as creating a new 3rd party snapshot repository and faffing around with it for far too long before discovering this.

My solution was (as suggested, though i didn’t find that out until later) to check out the source, build it myself and install it to my local repository.

 

In BIRT if you define a Flat File Data Source you have to provide it with an absolute path, and if you try to cheat by going in to the XML and changing the path to a relative one the data simply won’t be found.
This is very frustrating if you want to ship a set of data with a report, in my case as one of the tests for my Excel Emitters.

The solution is a little bit of scripting in the beforeFactory method:

var path;
if( reportContext.getHttpServletRequest() != null ) {
	var request = reportContext.getHttpServletRequest().getParameterMap();
	path = request.get( "__report" )[0].toString();
}
if( path!= null){
	if( path.contains( "/" ) ) {
		path = path.substring( 0, path.lastIndexOf( "/" ) );
	} else if( path.contains( "\\" ) ) {
		path = path.substring( 0, path.lastIndexOf( "\\" ) );
	}
}
// java.lang.System.err.println( "resourcePath = " + path );
this.getDataSource( "Mission Data Source" ).setPrivateDriverProperty( "HOME", path );

That’s great if you are running your report from a normal environment (where reportContext.getHttpServletRequest() will not be null), but what if you are running it from a custom ReportEngine host?
Well, in that case you need to add the path to the report to the appContext:

if( filepath != null ) {
    @SuppressWarnings("unchecked")
    Map<String,Object> appContext = (Map<String,Object>)reportRunTask.getAppContext();
    if( appContext == null ) {
        appContext = new HashMap<String,Object>();
        reportRunTask.setAppContext(appContext);
    }
    appContext.put("__report", filepath);
}

Which only leaves the small problem of how to find the correct path to the report design file.
In a normal environment finding the path to the report design file shouldn’t be too difficult, but if the report design is a resource in an OSGi bundle (i.e. in an eclipse plugin unit test) you have a few more hoops to jump through:

String filepath = null;
if( Activator.getContext() != null ) {
    URL bundleLocation = new URL(Activator.getContext().getBundle().getLocation());
    // System.err.println( "Activator.getContext().getBundle().getLocation() = " + bundleLocation );
    String bundleLocationFile = bundleLocation.getFile();
    if(bundleLocationFile.startsWith("file:/")) {
        bundleLocationFile = bundleLocationFile.substring(6);
    }
    // System.err.println( "bundleLocationFile = " + bundleLocationFile );

    URL resourceLocation = this.getClass().getResource( filename );
    String resourceLocationFile = resourceLocation.getFile();
    // System.err.println( "resourceLocationFile = " + resourceLocationFile );

    filepath = bundleLocationFile + "bin" + resourceLocationFile;
    // System.err.println( "filepath = " + filepath );
}

It’s worth having those println method calls in there so you see what you are getting for the bundle and resource location – the values will almost certiainly change with different OSGi hosts, but that works for my plugins unit tests.

And of course, now the beforeFactory script needs modifying to pick up from either the Servlet request or the appContext:

var path;
if( reportContext.getHttpServletRequest() != null ) {
	var request = reportContext.getHttpServletRequest().getParameterMap();
	path = request.get( "__report" )[0].toString();
} else if( reportContext.getAppContext() != null ) {
	path = reportContext.getAppContext().get( "__report" );
}
if( path!= null){
	if( path.contains( "/" ) ) {
		path = path.substring( 0, path.lastIndexOf( "/" ) );
	} else if( path.contains( "\\" ) ) {
		path = path.substring( 0, path.lastIndexOf( "\\" ) );
	}
}
// java.lang.System.err.println( "resourcePath = " + path );
this.getDataSource( "Mission Data Source" ).setPrivateDriverProperty( "HOME", path );
 

There are, to my knowledge, five existing emitters for BIRT that output reports in a format understood by Excel. Unfortunately all four of them have significant issues for my purposes:

  • The built-in emitter.
    The built-in emitter does not output Excel format files, it outputs files in an XML format that Excel can understand, as a result newer versions of Excel complain whenever the files are opened.
    The files also have problems with page layout that I could not work around (specifically wide reports would be cut off) and do not display charts or logos.
  • The Tribix emitter.
    The Tribix emitter solves many of the problems with the built-in emitter, but tries too hard to produce an output document that looks identical to the original report.
    As a result the output spreadsheets contain a lot of very small rows/columns that are introduced purely to make the content line up visibly as it would on a PDF. Unfortunately this results in headers appearing in one column with data appearing in another – sparse crosstabs are particularly bad in this regard with data failing to line up – which all means that the output from the Tribix emitters is not actually very useful as a spreadsheet.
  • The Arctorus emitters.
    When I last tried them, the Arctorus emitters behaved suspiciously similarly to the Tribix emitter, but also support XLSX and cost money.
    It is my understanding that their v3 emitters no longer attempt pixel-perfect layout, and that may make them the best around, but they still cost money and I believe they still create/merge rows/columns for layout.
  • The Actuate 11 emitter.
    Unfortunately I haven’t tried the new Actuate 11 emitter, because it comes as part of the Actuate 11 system that is priced out of our range.
  • RameshS’ emitter
    A fellow entrant in the Getting Started with BIRT contest.
    My understanding is that RameshS’ emitter is based on the built in source code, but writing to native XLS files (I don’t think it handles XLSX, but it’s not clear from my reading).
    I have not tried this emitter, but my expectation is it still creates/merges rows/columns for layout, and its use of Excel formatting seems to be more primitive than mine (images are restricted to a single cell, auto row heights rely on Excel behaviour) – but it claims to have smaller file sizes than Tribix, which may mean it beats me too.
    If your only complaint about the built in emitter is the file format then give this a go.

So, after discovering Apache POI and realising that BIRT emitters are not so complicated after all, I set about writing an Excel emitter that met our needs.
The design aims for the SpudSoft BIRT emitter are:

  • It must be useful as a spreadsheet, cells should all line up, there should be no blank rows/columns for formatting, there should be no page breaks.
  • It must open in Excel without complaint, preferably support both XLS and XLSX files.
  • It must display images (logos and charts are required).
  • It must include any formatting that does not clash with the other requirements.
  • It must work in the BIRT runtime environment, as well as in eclipse.

Known Limitations

Obviously, given that I’m explicitly favouring spreadsheetiness over formatting, there are going to be some limitations:

  • BIRT allows you to put multiple things into a cell (i.e. two different labels), my emitter will try to output the content of both, but the formatting will only be an amalgamation.
    In particular this means that nested tables (or tables within grids) will not display correctly.
    I’m open to suggestions as to how to resolve this, but I haven’t actually got any reports that use nested tables yet, so I haven’t worried about it.
  • BIRT allows you to have two tables next to each other, horizontally or vertically, with different row/column height/widths, on a spreadsheet I should just take the maximum.
  • Formatting will be handled on an as-needs basis – when I have a report that needs support for something I’ll make sure I can have it :)
  • Column widths will often be left at their default, rather than trying to persuade things to fit (because headings and images often overlap).

If you find more limitations or other issues please let me know (either via this blog or by filing an issue on bitbucket), and if can’t live with these limitations let me know how you think I should resolve them and I’ll see what I can do.

In short, my emitter will produce good results with all the reports I’ve seen (better, I hope, than BIRT or Tribix), but I could easily produce a report that it could not cope with at all (that BIRT and Tribix probably could cope with).

Download

The emitter is all GPLv3 and the source is available on BitBucket.
The binary is available from https://bitbucket.org/yaytay/spudsoft-birt-excel-emitters/downloads.
And instructions for installation are available here.

Before throwing that binary directly into your production environment you are warned to take good notice of the version number!
You could well be the first person other than myself to use the emitter, it is bound to have faults ranging from slight limitations to major crashers.
If you do find issues with it please report them at https://bitbucket.org/yaytay/spudsoft-birt-excel-emitters/issues.

Update v0.3

Version 0.3 generates new Excel sheets in response to explicit page-breaks.
Sheets will take their names from the last named table seen on that page (if there are no named tables and there is only a single sheet that sheet will be named with the title of the report).

There is a bug in BIRTs RenderTask that prevents page-break-pagination working. There are two ways around this:

  • The simple option is to use the RunAndRenderTask, which does paginate correctly.
  • If you need to use a separate RenderTask the only option I could find is to write a custom RenderTask.

Update v0.4

Version 0.4 has been through a substantial refactor in order to make it possible to implement nested tables without having spaghetti code.
The result is more explicitly an FSM than it was previously, with state being represented by an instance of a “handler” class – typically a new handler is created in a startThing method call and removes itself in the corresponding endThing method call.
The main changes on top of the refactor are:

  • Nested Tables
    Currently these only work if the child table is the only thing on given row (more specifically, there must be no other cells on the same row, so the cell must be merged across columns). This is a significant limitation, but does allow a great many designs to work that couldn’t work before.
    Note that my emitter still does not (and never will) merge or insert columns to try to make things line up.
  • Rotated Text
    Large crosstabs are often unwieldly in spreadsheets because the column headings tend to be quite long, meaning that columns are wide and thus very little of the sheet is visible at any time.
    The fix for this in the spreadsheet world is to rotate the column headings to an angle so multiple headings overlap and the columns can be much narrower.
    This option is now available from my emitter.
    BIRT does not have any built in option for rotating text, and any generic solution would have to generate the text as an image, which is not much use for a spreadsheet.
    To inform the emitter that text should be rotated you have to set a User Property called “ExcelEmitter.Rotation” on the text/label control that is to be rotated, for example:

    <label id="57">
        <list-property name="userProperties">
            <structure>
                <property name="name">ExcelEmitter.Rotation</property>
                <property name="type">string</property>
            </structure>
        </list-property>
        <property name="ExcelEmitter.Rotation">60</property>
        <property name="textAlign">right</property>
        <text-property name="text">Flights</text-property>
    </label>
    
  • Auto Column Widths
    An oft-requested feature from my No1 client, and one that I’ve been putting off because reports as spreadsheets often don’t work with auto-column widths as they often have titles in the early rows that are expected to flow over multiple cells (without using merged cells).
    My solution is to only base auto column widths on cells in the details band of tables, and for performance reasons I only consider the first 4 rows there.
    Auto column widths will not make columns narrower (if you want them narrower specify the width in the report design) and will not be used at all if the width is specified in the report design.
  • Row Grouping
    A simple change – table groups in the report are now configured as row groups in Excel.

Update v0.5

A bit anti-climactic after version 0.4, version 0.5 has a bug fix to set the top and bottom margins correctly and one new feature:

  • Force Auto Column Widths
    After adding support for auto column widths and turning it off by default I go and find a situation where it needs to be be turned on – specifically if you have columns that you want hidden unless they contain data. By forcing auto column width calculations the column widths can be set to zero and they will only become wider if there is data to fill them.
    As with text rotation, this is controlled by a User Property, this time a boolean property called “ExcelEmitter.ForceAutoColWidths”.

Footnote

Finally I’d like to thank my employer, GTI Recruiting Solutions, for permitting me to release this.

 

It’s taken me quite a while to get this right, and until yesterday I was using something that was close, but wrong, so here’s the details.

When images are placed in a spreadsheet using Apache POI they are located using an instance of the ClientAnchor class.
The ClientAnchor consists of eight values, accessed by getters/setters:

    short Col1;
    short Col2;
    int Row1;
    int Row2;
    int Dx1;
    int Dx2;
    int Dy1;
    int Dy2;

Between them these eight values represent the top left and bottom right corner of the image. The Row/Col values are the (zero-based) indices of the cells in which the image starts/ends, and the Dx/Dy values are the offsets within those cells. Hopefully this diagram makes this clearer:
ClientAnchor diagram

Here the Row/Col fields have the following values:

Col1 1
Row1 2
Col2 5
Row2 5

Unfortunately the DX/DY values have different units for XLS (HSSF) and XLSX (XSSF) spreadsheets, so I’ll handle them separately.

HSSFClientAnchor

For the HSSFClientAnchor the DX/DY values represent fractions of the total size of the cell, and for added fun they are different fractions.
The value of DX is in [0, 1023] and the value of DY is in [0, 255].
So if the offset of the image is known in millimetres, and the size of the column is known in millimetres, calculation of DX is as simple as:

return (int)( 1023.0 * widthMM / colWidthMM );

XSSFClientAnchor

For the XSSFClientAnchor the DX/DY values are in EMUs (that’s EMUs not emus). There are, by definition, 36000 EMUs per millimetre, so the calculation of DX becomes:

return (int)( 36000 * widthMM );
 

Not a problem, just something that you might need to be aware of, in Apache POI XSSFWorkbook.write(OutputStream) closes the stream, whilst HSSFWorkbook.write(OutputStream) doesn’t.

I found this because I have a TemporaryFile class that cleans itself up after being closed (more specifically, after the OutputStream has been closed you can get an InputStream, and after that has been closed it deletes the file) – and it only permits one OutputStream to be got.
It was trivial to work around the issue, I just had to keep hold of the OutputStream rather than ask for it again, that way I close the same OutputStream (which does nothing) and the TemporaryFile is unaffected.

© 2012 Stuff and Things Suffusion theme by Sayontan Sinha