29 Nov 2011

This blog has moved

Ad-Hockery is no longer updated at this location. Please redirect bookmarks to blog.freeside.co for more awesome content.

24 Nov 2011

Fear & loathing in functional testing land

As projects grow the two things I’ve repeatedly found to be particularly painful have been functional testing and data fixtures. I might write up some thoughts on data fixtures another time but what follows is a brain-dump of my troubled relationship with functional testing.

Disclaimers: I have more questions than answers and I’m completely open to the idea that I’m doing it all wrong. I’m not trying to diss any tool or technique here. I have spent a lot of time over the last few years writing functional test coverage so I think I at least have some perspective on the issues if no clue how to solve them.

When I say functional testing I mean in the Grails sense of an end-to-end test via the browser. Some people are quite resistant to writing such tests, and whilst I agree that as much testing as possible should be done at as low a level as possible, there are certain things that can really only be tested in that way. I’m also a big fan of the GOOS approach of working outside in: starting with a (failing) functional test that defines the desired behaviour in a user-centric way, building in to the low-level implementation with its unit tests, then coming back out to watch the original end-to-end test (hopefully) pass.

Why is functional testing difficult?

Test development cadence

The main issue I find when constructing functional tests is what I’ll call the test development cadence; that is the time it takes to go round the loop (and sub-loops) of

  1. write a bit of test
  2. watch it fail
  3. write some code to make it work
  4. watch it still fail
  5. figure out if your expectation is wrong or your code doesn’t work
  6. fix it
  7. repeat last 3 steps n times
  8. watch it pass
  9. celebrate

With a unit test that time is typically fast, a keystroke in an IDE and the results are available in at most a couple of seconds. Functional tests are considerably slower. Even assuming you can optimise so that the application is running and you can run the test from an IDE then Selenium has to start up a browser instance, queries execute, views need to render, etc. In the worst case you’re switching to the terminal and using grails test-app functional: Blarg or equivalent then waiting for the webapp to start up before the test can even start to run and shut down again before the report is generated.

A slow test development cadence leads to distraction (checking Twitter, making coffee, getting drawn into a discussion of the finer points of mixing an old fashioned, etc.) and distraction leads to context-switching which slows things still further.

Test diagnostics

GOOS makes a great point about the importance of test diagnostics, suggesting that the TDD mantra of “red, green, refactor” should be replaced with “red, decent diagnostics, green, refactor”. When a test breaks, especially one someone else wrote (or that I wrote more than a week ago and have consequently lost all recollection of), I want to be able to see what’s broken without re-running the test with added logging or resorting to a debugger. Testing further from the browser hurts diagnostics as you can’t see what’s not working and so have to rely solely on the quality of your assertion output. That’s not an easy thing to get right. Geb’s output when a waitFor condition fails is just Condition did not pass in x seconds. Even with direct assertions and nice power assert output it’s not always clear whether the functionality didn’t work or the expectation is incorrect. Selenese is by no means great in this regard (a humble Condition timed out isn’t much help) but at least you can step back with the Selenium IDE and watch the damn thing not working much more easily.

Bad test diagnostics coupled with a slow test development cadence make for a horrible experience.

The quest for the functional testing holy grail

The most productive I’ve ever been when writing functional tests has been when using Selenium IDE. That’s quite an admission for someone who’s spent a considerable amount of time & energy over the last few years trying to find or build something better!

The test development cadence is fast. Really fast. When you’re writing tests with Selenium IDE (and I do mean write them, I’ve almost never used the recording functionality) the app is running, the browser is running and you can execute the test, a portion of the test or an individual command very quickly. You can step through the test, set breakpoints, etc. When using a framework like Grails that lets you make changes to the app without restarting you can rock along pretty rapidly.

That said, the downsides are not inconsiderable:

  • Abstraction is typically poor; you’re dealing with fine details of page structure (DOM element ids, CSS selectors) and copy ‘n pasting sequences of commands that would in a sane world be defined as functions or macros. You can write custom JavaScript commands but with considerable limitations such as the fact that any wait for x step must be the last thing the command does. Lack of abstraction means lack of maintainability. As the project goes on any change in page rendering probably means picking apart a bunch of tests that fail not because the functionality they are testing has stopped working but because the implementation of that functionality has changed.
  • Atomicity is difficult. Because each test probably requires a few lines of setup it’s tempting for developers to add new assertions to an existing test. This violates the principle of having a single (logical) assertion per test. Part of the problem I think is that with Java, Groovy, Ruby, etc. each file can contain multiple tests whereas with Selenese each file is a single test script. The right thing to do is to have lots of small Selenese test files but it’s tempting to fall into the trap of munging tests together into the testing equivalent of a run-on sentence. One of the worst side-effects of this is that as a test suite grows it becomes really hard to identify where certain features are tested and to find redundancy or obsolete tests.

Despite these significant failings writing tests in Selenium IDE is very effective. Maintaining a suite of such tests is another matter. On a long-running project the failings of Selenese tests compound rapidly as the suite grows. The reason I created the Grails Selenium RC plugin was to try to build something I could use in future projects that would combat the failings of Selenese. I wanted to use a ‘real’ language with selection and iteration and to be able to build a robust abstraction so that tests are not dealing with the fine details of page markup. Geb is another step along this road. It provides a nice way of defining page objects and modules and handles things like tracking which page class is the ‘current’ one and how and when that should change.

What do I want from a functional testing tool/language?

I’m convinced that the goal of writing tests in the same language as the application is a pretty vapid one. Working inside one’s comfort zone is all very well but too many times I’ve seen things tested using Selenium or Geb that would be better tested with a unit-level JavaScript test. I’m guilty of this myself. I’m a better Groovy coder than I am a JavaScript coder so it’s initially easier to break out a new Geb spec than a new Jasmine spec. Functional testing tools are really bad at testing fine-grained JavaScript behaviour, though. These sorts of tests are really flaky; false negatives are a fact of life. They’re wastefully slow as well. JavaScript unit tests are fast, faster than Groovy unit tests. As a Grails developer I’ve looked enviously at how fast tests run in Rails apps but that’s nothing compared to watching a couple of hundred Jasmine tests run in under a second. To get back to the point, I have no problem with writing my functional tests in something other than Groovy if I can hit my goals of productivity and maintainability.

I was at one time convinced that the ability to use loops and conditional statements in Groovy made it a more suitable testing language than Selenese but honestly, how often are such constructs really required for tests? The single most essential thing for a maintainable suite of functional tests is the ability to create a decent abstraction. Without that you’ll be building brittle tests that fail when the implementation changes 100 times more often than they fail because the functionality they’re testing is actually broken.

Abstraction is key

The abstraction layer needs to be powerful but simple. I’ve seen test suites crippled by badly written page object models and I’m starting to feel that the whole idea is too formalized. Building Geb content definitions with deeply nested layers of Module types is time consuming & difficult. With Selenium RC there’s not even the page transition logic Geb provides so you end up having to write that as well (probably getting it wrong or implementing it in several different ways in different places).

I can’t help thinking the page object model approach is coming at the problem from the wrong angle. Instead of abstracting the UI shouldn’t we be abstracting the behaviour? After all the goal is to have tests that describe how users interact with the application rather than how the various components that make up the application relate to one another. I’d rather have a rusty toolbox of lightweight macros and UI module definitions than a glittering palace of a page component model that I find awkward to use, extend or change. The abstraction has to be there - when I change the implementation I don’t want to spend half a day finding and fixing 100 subtly different CSS selectors scattered throughout the tests - but I don’t think it has to be particularly deep.
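To make the idea of a shallow, behaviour-first abstraction a little more concrete, here is a minimal sketch in plain Groovy. Everything in it is hypothetical: the Driver interface, the selectors and the loginAs macro are invented for illustration and are not part of Selenium, Geb or any real application. A recording stub stands in for the browser so the script runs on its own.

```groovy
// Hypothetical minimal driver; a real suite would delegate to Selenium or Geb.
interface Driver {
    void open(String path)
    void type(String selector, String value)
    void click(String selector)
}

// Selectors live in one place, so a markup change means one edit.
class Ui {
    static final Map<String, String> LOGIN = [
        username: "#username",
        password: "#password",
        submit  : "#login button"
    ]
}

// Behaviour-level macros: tests say what the user does, not how the page is built.
class Behaviours {
    final Driver driver

    Behaviours(Driver driver) {
        this.driver = driver
    }

    void loginAs(String username, String password) {
        driver.open("/login")
        driver.type(Ui.LOGIN.username, username)
        driver.type(Ui.LOGIN.password, password)
        driver.click(Ui.LOGIN.submit)
    }
}

// Stand-in for a browser that simply records the commands it receives.
class RecordingDriver implements Driver {
    final List<String> log = []

    void open(String path) { log << ("open " + path) }
    void type(String selector, String value) { log << ("type " + selector + " " + value) }
    void click(String selector) { log << ("click " + selector) }
}

def driver = new RecordingDriver()
new Behaviours(driver).loginAs("alice", "secret")
driver.log.each { println it }
```

The point of the sketch is the depth: one flat map of selectors and one class of macros, rather than a hierarchy of page and module types.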

Where do I go from here?

A better Selenese?

An interesting possibility for creating better Selenese tests is the UI-Element extension library that allows a UI abstraction layer to be built on top of Selenese. It also introduces the concept of rollup rules (parameterized macros) that are a more powerful way of abstracting command sequences than custom Selenese commands. From what I’ve seen the tool support in Selenium IDE looks impressive too. I need an opportunity to use UI-Element seriously but it certainly appears promising.

The most impressive Selenium extension I’ve seen is Steve Cresswell’s Natural Language Extensions that layers something like JBehave’s feature definition language on top of Selenese. Energized Work used this on a couple of projects (unfortunately not ones I was involved with) and I’ve heard great stories of how it enabled really rich cooperation between developers, QA and project stakeholders. I was pleasantly surprised with how simple the underlying code appeared to be given the radical difference in the test language.

Other options?

The tools I really need to look into are:

  • Cucumber which syntactically looks like the answer to my prayers. I want to see how fast the test development cadence is. Since there’s now a pure JVM implementation I really have no excuse for not getting up to speed with it pronto.
  • FuncUnit is much lower level and I’m not sure how easy it would be to build an effective abstraction layer that kept the tests readable and maintainable but it’s fast and runs right in the browser which are potentially compelling advantages.

26 Aug 2011

Grails Gotcha: Beware HEAD requests when rendering binary output in controllers

Although most Grails controllers render HTML, JSON or XML output it is possible to use them to render binary data as well. We use a controller to render images uploaded by editors into our content management interface. The theory is simple enough, instead of using the render dynamic method or returning a model the controller action just writes bytes directly to the HTTP response stream. Our action looked something like this:

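// assumes: import static javax.servlet.http.HttpServletResponse.*
def show = {
    def image = Image.read(params.id)
    if (image) {
        // write the raw image bytes straight to the HTTP response
        response.contentType = image.contentType
        response.outputStream.withStream { stream ->
            stream << image.bytes
        }
    } else {
        response.sendError SC_NOT_FOUND
    }
}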

This seemed to work well enough. However when writing a test I noticed an odd thing. I was using RESTClient to scrape resource URLs out of a page and make a HEAD request against each of them to ensure the URLs were valid. JavaScript and CSS files were working fine but all the non-static images in the page were getting 404s. Initially I suspected a data setup problem and spent some time ensuring my test was setting data up properly. It was only once I put some debug logging in the controller action that I saw that the controller was actually loading images. The 404 was not coming from the else block in the action as I initially assumed. I tried changing the RESTClient call from head to get and suddenly the image URLs started working!

Once I did that I realised what the problem was. An HTTP HEAD request does not expect a response body; in fact a server receiving a HEAD request must not return one. The response stream that our controller action is writing to is, when the request method is HEAD, actually a no-op stream. When the action completes Grails checks to see if anything has been committed to the response stream and, since nothing has, assumes that we want to render a view by convention. You can probably see where this is going now. The convention is that the request gets forwarded to grails-app/views/<controller>/<action>.gsp which of course does not exist. The forwarded request sets a response code of 404 because there is no GSP!

We caught this bug in our app completely by accident but it could actually have been quite serious. Caching proxies and CDNs may well use a HEAD request to revalidate content and on getting a 404 assume that the URL is no longer valid. If the 404 response itself then gets cached we could get broken images on our site because the CDN tells the client browser there's nothing there.

The solution is simple enough. I changed the controller action to simply set a 200 response code when it gets a HEAD request for a valid image:

def show = {
    def image = Image.read(params.id)
    if (image) {
        if (request.method == "HEAD") {
            // HEAD: answer without streaming any bytes so Grails does not go looking for a GSP
            render SC_OK
        } else {
            response.contentType = image.contentType
            response.outputStream.withStream { stream ->
                stream << image.bytes
            }
        }
    } else {
        response.sendError SC_NOT_FOUND
    }
}

A neater solution might be to use Grails’ support for mapping actions to request methods so that GET and HEAD requests dispatch to different actions.
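As a rough sketch of what that might look like, a URL mapping can key the action on the HTTP method. The "/image/$id" path, the image controller name and the exists action are all assumptions made up for this example, and I have not checked that HEAD is accepted as a key in every Grails version.

```groovy
// grails-app/conf/UrlMappings.groovy
class UrlMappings {
    static mappings = {
        // Path, controller and action names are illustrative only.
        "/image/$id"(controller: "image") {
            // GET streams the bytes; HEAD goes to an action that only checks the image exists.
            action = [GET: "show", HEAD: "exists"]
        }
    }
}
```

The hypothetical exists action would then contain only the lookup and the status handling, leaving show free of the request method check.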

18 Aug 2011

Data-driven variation with Spock

Spock’s where: block is commonly used with a data table but can also be driven by any Iterable data. It’s worth bearing in mind that the data driving the where: block doesn’t have to be hardcoded, it can be dynamic. For example, today we implemented a spec to ensure that every table in our database schema has a primary key (because it’s required by HA-JDBC and not automatically added by GORM on join tables). In this spec the where: block is driven by the list of table names read from the database metadata.

Something like this could be done with JUnit, of course. A test could iterate over the table names and assert that each has a primary key. However, such a test would fail fast whereas with the power of Spock’s @Unroll annotation the spec creates a separate test result for each database table and will run each individually regardless of whether any others pass or fail. The command line output from this spec will be enough to tell you which tables do not have primary keys as @Unroll puts the table name right in the test name.

The other great thing about this spec is that it doesn’t require maintenance; as we add more domain classes to our app the spec will automatically check the associated tables.
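A spec along those lines might look something like the sketch below. It is not the spec from our project: the class name, the helper method and the JDBC connection details are placeholders, and the "TABLE" type filter is an assumption that may need adjusting for a given database. It relies only on the standard java.sql.DatabaseMetaData calls getTables and getPrimaryKeys.

```groovy
import groovy.sql.Sql
import spock.lang.Shared
import spock.lang.Specification
import spock.lang.Unroll

class PrimaryKeySpec extends Specification {

    // Placeholder connection details (assumption); point this at the real schema.
    @Shared Sql sql = Sql.newInstance("jdbc:h2:mem:testDb", "sa", "", "org.h2.Driver")

    def cleanupSpec() {
        sql.close()
    }

    @Unroll
    def "the #tableName table has a primary key"() {
        given: "the primary key columns reported by the JDBC metadata"
        def keys = sql.connection.metaData.getPrimaryKeys(null, null, tableName)

        expect: "at least one primary key column"
        keys.next()

        cleanup:
        keys?.close()

        where: "one iteration per table found at run time"
        tableName << tableNames(sql)
    }

    // Reads table names from the live schema rather than hardcoding them.
    private static List<String> tableNames(Sql sql) {
        def names = []
        def tables = sql.connection.metaData.getTables(null, null, "%", ["TABLE"] as String[])
        try {
            while (tables.next()) {
                names << tables.getString("TABLE_NAME")
            }
        } finally {
            tables.close()
        }
        names
    }
}
```

Because the where: block is fed by a method call, the number of iterations is decided when the spec runs, and @Unroll reports each table as its own result.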

2 Aug 2011

Avoiding accidental i18n in Grails

We’re developing an app that’s exclusively for a UK audience so i18n really isn’t an issue for us. However recently we got bitten by some i18n creeping in where we didn’t want it. Specifically, when using Grails’ g:formatDate tag the default behaviour is to format the date according to the Locale specified in the user’s Accept-Language header. Even though we are explicitly specifying a format pattern for the date, Java is aware of localized day names for some languages so the output can vary. The result is that on a page full of English text there suddenly appears a Spanish or Swedish day name. What makes things worse is that, as we use server-side content caching and a CDN, if a user with a non-English Accept-Language header is the first to see a particular page or bit of dynamically retrieved content then the cache is primed and until it expires everyone will see the non-English day name text.

The solution in a Grails app is as simple as replacing Spring’s standard localeResolver bean with an instance of FixedLocaleResolver. Just add the following to the beans block in grails-app/conf/spring/resources.groovy:

localeResolver(org.springframework.web.servlet.i18n.FixedLocaleResolver, Locale.UK)

This changes the way Spring works out the request locale and any locale-aware tags should just fall into place.
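The underlying behaviour is easy to reproduce outside Grails. This small Groovy script is only an illustration using java.text.SimpleDateFormat directly, not the tag's actual implementation; it formats the same date with the same explicit pattern under three locales.

```groovy
import java.text.SimpleDateFormat

// 2 August 2011 was a Tuesday.
def date = new GregorianCalendar(2011, Calendar.AUGUST, 2, 12, 0).time

// Same explicit pattern every time; only the locale changes.
def dayName = { Locale locale -> new SimpleDateFormat("EEEE", locale).format(date) }

assert dayName(Locale.UK) == "Tuesday"
assert dayName(new Locale("es")) == "martes"
assert dayName(new Locale("sv")) == "tisdag"
```

Pinning the locale, which is what FixedLocaleResolver does for the request, makes the output independent of whoever happens to ask first.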

21 May 2011

Resources plugin and modular web components for Grails apps

I recently gave a talk on ‘Building progressive UIs with Grails’ at gr8conf in Copenhagen and was really pleased with the feedback & comments I received afterwards. There was a question asked at the end that I felt in retrospect I could have answered better, though. I had mentioned that the Grails AJAX tags were something that should be avoided as they write script blocks directly into the page and event handlers directly onto HTML elements. I was pitching an approach based on a clean separation of semantic markup and script enhancements, and inline script violates that separation.

I was asked if there might be any kind of tags that could be developed to provide a more appropriate replacement. I answered that since modern JavaScript frameworks such as jQuery make decorating elements with AJAX functionality so easy, I didn’t think there was much point.

I wouldn’t want anyone to understand me to mean that creating taglibs and GSP templates for modular components of your pages is a bad thing. I’d just advocate keeping the script out of them. Custom tags that write out markup or delegate to templates can be really useful for building complex reusable modules. Pairing those with external JavaScript files that enhance the generated markup would be very effective.

If you’re using the resources plugin (and really… you should be) then there’s a really neat way to tie the taglib or template to the JavaScript file as well. I did briefly mention this in my talk but it’s worth expanding on here as it’s a technique with a lot of potential.

The resources plugin’s r:use tag doesn’t write out anything directly to the page but rather adds a resource module to a list that will get written out at the appropriate place in the page when the r:layoutResources tag is used. This means modular components throughout the page can declare a dependency on JavaScript and CSS resources by simply calling r:use. In a complex app this can be a real boon as it might not be obvious from the top level GSP exactly which modules are going to get rendered. Also if a module is later added to or removed from a page you don’t need to worry about fiddling around with resource declarations in the top level GSP. Taglibs or templates become real drop in components whilst maintaining a nice clean separation of markup and script. Even better, if you use the same resource dependency multiple times the plugin ensures the resources are actually only linked once.

Since r:layoutResources is typically used in a SiteMesh layout the resources your module depends on can even be ones that need to appear in the head of the document. The SiteMesh layout is rendered after the GSP it decorates so any r:use calls will already have been made.
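Here is a rough sketch of how the pieces could fit together. The carousel module, its file names, the ui namespace and the template path are all invented for the example, and the jquery dependency assumes a module of that name has been declared elsewhere (for instance by the jQuery plugin).

```groovy
// grails-app/conf/ApplicationResources.groovy (module and file names are made up)
modules = {
    carousel {
        dependsOn "jquery"
        resource url: "js/carousel.js"
        resource url: "css/carousel.css"
    }
}

// grails-app/taglib/CarouselTagLib.groovy (hypothetical tag library)
class CarouselTagLib {
    static namespace = "ui"

    def carousel = { attrs, body ->
        // Declare the dependency; nothing is written to the page at this point.
        r.use(module: "carousel")
        // Markup only, with no inline script.
        out << render(template: "/components/carousel", model: [items: attrs.items])
    }
}
```

Any GSP that uses the hypothetical ui:carousel tag then pulls in the script and stylesheet without the top level page having to know about them.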

26 Mar 2011

CSS box model hacking

Want to make an HTML element fill 100% of its parent’s width but also give it a border and/or some padding? Since the width of an element is exclusive of its border and padding this can be a pain. However there’s a fairly simple CSS solution that works cross-browser.

Here’s an example. Both input boxes are set to width: 100% and have padding: 5px. The first input shows the problem. Because the padding and border are added to the width of the element it overflows its container. The box model of the second input has been modified so that the padding and border are inside the declared width.

The trick to modifying the box model is to set box-sizing: border-box. Unfortunately that’s not a cross-browser property, only Opera supports it at the moment. To get the same effect in other browsers you will need to set the browser-specific versions as well:

   -moz-box-sizing: border-box;
-webkit-box-sizing: border-box;
    -ms-box-sizing: border-box;
        box-sizing: border-box;

Note that Internet Explorer only supports the -ms-box-sizing property from version 8 upwards so you should probably be judicious with this technique or use an alternative method to get a similar effect in IE7 and below.