Friday, February 26, 2010

Inevitable and avoidable rework

Without really thinking about it until now, I've been seeing two types of technical debt. The first is the quick solution implemented with dirty code. I consider this to be irresponsible. That's not to say I won't do it, just that if I decide I should do it I make sure the necessary people understand the consequences and that it's an irresponsible action to take.

The second is a natural byproduct of emergent design and YAGNI (yet) decisions. It's the debt that surfaces when a system outgrows implementations resulting from previous decisions, which were the right ones to make at the time based on the information available (because they did not compromise quality or the health of the code in any way).

Irresponsible debt creates avoidable rework; it's failure demand. It's bad, it smells and it needs to be cleaned up because, if left to fester, it's going to slow us down and divert capacity away from meeting the value demand.

The debt that surfaces because the system is maturing creates inevitable rework. It's necessary to do this rework on a regular basis to keep the emergent design relevant, the code habitable, to prevent obsolescence, perhaps increase reuse, and reduce risks and medium to long-term costs. I think most people try to roll this debt into feature cards and that's the right policy. We prefer to do that if we can. However, we've become too good at writing cards to be less than 2 days (which helps smooth the flow) and sometimes it's not possible to absorb inevitable rework into a feature card and keep it under 2 days (the way we like it). And of course, sometimes, the rework just doesn't relate to any features, e.g. upgrading to the latest Grails framework. So this gets written on a blue card. By definition this is failure demand too. But that's harsh, don't you think? I have a weird take on this because I insist that the system is recognized and treated as a stakeholder, and as such it values certain things and makes its own demands. One of the things it values is to be kept healthy. But I'm not hung up on this rework being classified as failure demand providing it's being managed effectively.

As I mentioned in my previous post, completing some inevitable rework on a regular basis (and assuming you're not being irresponsible ;) helps reduce the remaining rework. We can see this in action in the chart below.


[Chart: Rework. Originally uploaded by energizr]


The blue and pink lines show the remaining technical debt and defects that are either work-in-process or queued inventory (i.e. completed but not released). The blue and pink bars show the technical debt that has been repaid and the defects that have been fixed. Think of these bars pulling the remaining rework down keeping it small and preferably fairly steady. And, of course, assuming there's throughput satisfying the value demand then the team is effective.

It's useful to track the remaining technical debt and defects in statistical process control charts. The natural process limits help to distinguish signs of system instability from normal variation. When the limits are breached, investigate what's happened to understand how the system may have changed. Watch for trending beyond the breach, as it's likely to reveal more information to help you. Use these events to identify improvements.
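As a rough illustration of how natural process limits can be derived, here is a minimal Python sketch. It assumes an XmR (individuals and moving range) chart, where the limits sit at the mean plus or minus 2.66 times the average moving range; the post doesn't say which chart type was actually used. The weekly counts and the function names are made up for the example, and the limits are computed from a baseline of weeks 7 to 14 only.

```python
from statistics import mean


def natural_process_limits(baseline):
    """Return (lower, centre, upper) limits for an XmR chart.

    Assumes XmR-style limits: centre line +/- 2.66 * average moving range.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    centre = mean(baseline)
    # Moving range: absolute difference between consecutive weeks.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    spread = 2.66 * mean(moving_ranges)
    # Card counts cannot be negative, so clamp the lower limit at zero.
    return max(0.0, centre - spread), centre, centre + spread


def breaches(weekly_counts, lower, upper):
    """Return the (week, count) pairs that fall outside the limits."""
    return [(week, count) for week, count in weekly_counts.items()
            if count < lower or count > upper]


# Hypothetical remaining technical debt cards per week (not the real data).
remaining_debt = {7: 4, 8: 5, 9: 3, 10: 4, 11: 6, 12: 4, 13: 5, 14: 3,
                  15: 5, 16: 9}

# Baseline: weeks 7 to 14 inclusive.
baseline = [remaining_debt[week] for week in range(7, 15)]
lower, centre, upper = natural_process_limits(baseline)

print(f"centre={centre:.2f} lower={lower:.2f} upper={upper:.2f}")
print("breaches:", breaches(remaining_debt, lower, upper))
```

With this made-up data only week 16 falls outside the limits, which would be the cue to go and find out what changed in the system.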


[Chart: Technical Debt SPC. Originally uploaded by energizr]

[Chart: Defects SPC. Originally uploaded by energizr]

(Incidentally, the process limits were calculated between weeks 7 and 14 because week 6 saw the system change. Up to the end of week 6 all the software completed became queued inventory. This was then flushed to throughput and released, enabling a weekly release from then on.)


Wednesday, February 24, 2010

A simple measure of effectiveness

In the Lean manufacturing world there's a measurement called First-Time-Through (FTT), which monitors whether a cell is making products right the first time. It's a measurement of the effectiveness of the cell's standardized work and shows the percentage of product made without any need for rework or scrap.
FTT = ( Total units processed - Rejects or Reworks ) / Total units processed
If the standardized work is adhered to, the product will be made right first time and FTT will be 100%. However, flawed materials, faulty components and operator error all contribute to rework and scrap.
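To make the formula concrete, here is a small Python sketch. The function name and the unit counts are invented for illustration; only the formula itself comes from the definition above.

```python
def first_time_through(total_units: int, rejects_or_reworks: int) -> float:
    """Return FTT as a fraction between 0 and 1."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    if not 0 <= rejects_or_reworks <= total_units:
        raise ValueError("rejects_or_reworks must be between 0 and total_units")
    return (total_units - rejects_or_reworks) / total_units


# Hypothetical cell: 200 units processed, 14 of them reworked or scrapped.
ftt = first_time_through(200, 14)
print(f"FTT = {ftt:.0%}")  # FTT = 93%
```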

Who cares about parallels between manufacturing and software development? I was just interested to read about FTT because I've been thinking for a while now about the effectiveness of software teams ... at an operational level, let's say. I've long considered an effective team as one that is able to sustain throughput (i.e. the number of cards released to production that deliver value) while fixing defects immediately and repaying technical debt to keep the amount of rework small.

I consider technical debt and defects to be rework, and technical debt to be a natural byproduct of software development. It stems from earlier decisions, based on what we knew at the time, and requires attention later when the system has outgrown the outcomes of those decisions. It is necessary rework that keeps the emerging design relevant and the software healthy and habitable, reducing risks and medium to long-term costs. Defects are basically mistakes. They happen. How we create software determines whether we have a small and manageable amount of rework or a crippling amount of rework. If we're responsible, skilled and bake quality into code we can minimize rework to technical debt and occasional defects. If we're irresponsible and cut corners, or we're rubbish and write crap code, then rework can become so large that the only viable option is to cancel or start again.

Technical debt requires careful management and continuous investment while defects should be fixed as soon as they are found. A proportion of a team's capacity is therefore always expended doing an amount of rework. That's a good thing providing:

  • the completed rework is small compared to the throughput so that capacity mostly focuses on value demand, and
  • the completed rework is enough to keep the remaining rework small compared to the throughput, thus minimizing further failure demand.

(Throughput excludes repaid technical debt and fixed defects that went live.)

On a weekly basis then, the throughput in relation to the remaining technical debt and defects might be a useful measure of a team's effectiveness.
Effectiveness = ( Throughput - Rework ) / Throughput

where

Throughput = Number of cards released to production that deliver value
Rework = Number of technical debt and defect cards in inventory and work-in-process

I’ve pushed various teams’ data through and the charts seem to correlate with the events described in my historical notes. Here's a chart based on a small, experienced team working on a small project for 3 months.


[Chart: Effectiveness. Originally uploaded by energizr]


You can see there wasn't any throughput in the first 4 weeks as completed cards queued up in inventory. In week 5 that inventory was flushed to become throughput as the first cut was released. Effectiveness then varied with the weekly releases until week 10, which saw the team 100% effective with no rework cards in inventory or work-in-process. In week 12, however, effectiveness dropped to -33% because 1 technical debt card was work-in-process and 3 fixed defects were queued in inventory while only 3 cards were released.
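The week 12 figure can be checked with a few lines of Python. This is only a sketch of the formula: the function name is mine, returning None for a week with no throughput is my assumption about how to treat the division by zero, and the week 10 throughput of 5 cards is invented because the actual number isn't given.

```python
from typing import Optional


def effectiveness(throughput: int, rework: int) -> Optional[float]:
    """Effectiveness = (Throughput - Rework) / Throughput.

    Returns None when nothing was released, since the ratio is undefined
    (an assumption; the formula does not cover this case).
    """
    if throughput == 0:
        return None
    return (throughput - rework) / throughput


# Week 12: 3 cards released; 1 technical debt card in work-in-process
# plus 3 fixed defects queued in inventory gives 4 rework cards.
week_12 = effectiveness(throughput=3, rework=1 + 3)
print(f"Week 12: {week_12:.0%}")  # Week 12: -33%

# Week 10: no rework cards at all (throughput of 5 is hypothetical).
week_10 = effectiveness(throughput=5, rework=0)
print(f"Week 10: {week_10:.0%}")  # Week 10: 100%
```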

Although it's perhaps a simplistic indicator, do you think it's useful as a measure of effectiveness (i.e. a team's ability to deliver value and stay healthy)? Or is it utter tosh? Can it be refined (without complicating it)?


Tuesday, February 23, 2010

There's more to done than the green dot

So you're working on a user story with your pair, developing vertical slices and getting feedback from the tester and customer as you progress. You're ticking off the acceptance criteria as they're satisfied by the emerging functionality. Awesome! Everything is tickety-boo. Then the customer realizes that something is missing and asks for that something to be incorporated into the card. What do you do?

You could just refuse and ask him to write a new story and prioritize it accordingly. You could say yes, have a discussion, write the new acceptance criteria on the back of the card and carry on. Some people say no because the additional work will exceed the original estimate for the story. Some people say no because they won't be able to finish the story with the new criteria by the showcase and the card will slop.

I used to say slop was bad. I stopped saying that some time ago when I started to focus on limiting work-in-process.

I can't say what the right thing to do is because situations are different. I do say that the discussion with the customer must happen so that everyone involved understands, quite simply, whether the story will make more sense for users and provide them with greater value if the additional something is included. Perhaps the customer is pushing his luck. Or maybe he's got a point.

I always say stories are an invitation to a conversation. In the last year we've started to frame these conversations in the context of users because we've been using iteration to explore interaction designs and improve user experiences. Given the users' perspective, I came to realize that stories are also a journey taken with the customer to explore options and learn more about users. It's easy to write acceptance criteria but it's difficult to express what user experience will really work until we see a few different ones (and ideally validate them with real users). As a result, I am seeing more conversations where the customer or designers want to add something to a story when it's in play. I believe this to be a good thing providing the discussion happens and everyone agrees that the resultant delivery will be better for users.

The goal is about users and satisfying their needs, delighting them if possible with every story delivered. There's context, a bigger picture, a system and that involves how the users interact with it. It's not about velocity, estimates or slop. And it's not about ticking off the acceptance criteria and getting the green dot. That's all just process.


Wednesday, February 10, 2010

Without accountability there can be no solidarity

Over the past two years I've been seeing teams fail because people are not holding one another accountable. People tell me they are scared of being perceived to blame and so instead they say nothing. I asked some people why they don't hold people accountable. They responded with things like: "I'm really uncomfortable doing that." Or "I'm not good at saying that kind of stuff. I'm just a developer." And I empathize. I really do. I'm uncomfortable holding people accountable too. I'm guessing everyone probably is to some degree. And by the way, I possess those developer genes. That said, I still think these responses are phooey! Being able to communicate is a basic human skill. We all do it, admittedly some better than others, but just because something is difficult doesn't mean we should stop doing it. How will we learn if we don't practice?

Saying nothing rather than speaking up is the worst thing we could do. I see two reasons why. First, everyone has missed a great opportunity to learn something. If something goes wrong, and it does - often - then those accountable are expected to discuss their part in the events, because their knowledge is needed to improve the way we work. And second, restraint leads to pent-up frustration, even anger. Over time, perhaps bickering starts and fissures appear in the team. People start talking about others behind their backs, which really is blaming, and eventually what we've held back for so long probably comes blurting out in a damaging way.

So what's really stopping people holding others accountable? Is it just a misunderstanding of the difference between blame and accountability? This is something I'm struggling with.

To be accountable means to accept responsibility and be answerable for any actions or decisions. However, to be blamed is to be assigned responsibility for a fault in a way that deserves censure. But this still doesn't make it clear for me. I like to think the difference between accountability and blame is in the intent. Think of accountability as a handshake between people whereas blame goes in one direction.

The intention of holding people accountable is to understand, with them, the nature of the failure, its context and how it came to be. Those people questioning the actions value the participation of those responsible for the actions because they have useful information. And together they achieve clarity to explore solutions so that everyone may work to prevent similar failures in the future. The sole intent of blaming people is to identify the culprits and impose punishment. There is unwillingness to engage in a collaborative and objective analysis of the events. Instead judgment has already been passed based on a personal interpretation of events.

I think fearing accountability and staying silent perpetuates the very thing people are seeking to avoid - blame. Holding people accountable is not optional. We need to take it easy and be gentle but we must start holding people accountable. We shouldn't overreact to people's reactions. They may feel like they are being blamed based on their past experiences, so we must work extra hard to communicate our intentions as positive and constructive, framed within the context of learning. And we must keep at it. Eventually the blame-free culture of accountability we thought we had will emerge for real in a healthier team with a newfound honesty and integrity.

PS. I'd be really interested to hear your thoughts on this subject and any stories you have to tell. I still seek to understand this notion of accountability better.


Thursday, February 04, 2010

Sitemap on the wall

Full size sitemap on the wall


Tuesday, February 02, 2010

Petition against recurring Government IT incompetence

Isn't it about time we started calling the civil service and the Government to account for the repeated failures and wasted money in Public IT projects?

Don't delay! Sign the petition to the PM.


Monday, February 01, 2010

Integration Testing: The Story Continues

Over the past few months I've been reading the 'Integration Tests Are A Scam' series of articles by J.B. Rainsberger and following some of the responses to it, such as this one by Steve Freeman. I put in my 2 cents a few days ago, which I've reproduced here:
Interesting series of articles & comments. I also read Steve Freeman’s article in response to the same topic. It’s got me thinking about how we work and I thought I’d take the time to describe it here.

You define an integration test as “… any test whose result (pass or fail) depends on the correctness of the implementation of more than one piece of non-trivial behavior.” We have many such components that exhibit such non-trivial behaviour in the products we create, many of which are not developed by us. And we have integration tests to verify they work. I’m not just talking about 3rd party libraries and frameworks here, I’m referring to the whole system: caching layers, load balancers, DNS servers, CDNs, virtualization etc. When we build software it only becomes a product or service for our users when it has been deployed into a suitable environment; an environment that typically contains more than just the software we have written and packaged. Since our users’ experience and perception of quality result from their interaction with a deployed instance of the whole system, not just their interaction with the software at a unit level, we have come to value end-to-end integration testing. I believe there’s merit in testing these components in symphony and will attempt to clarify what kind of integration testing I’m talking about.

For a given piece of functionality we write an executable acceptance test in human readable form (for web projects we typically use some domain-specific extensions to Selenium, for services we have used FIT and its ilk, sometimes we roll our own if there’s nothing expressive enough available). We run it against a deployed version of the application (usually local though not always) which typically has a running web/application server and database. The test fails. We determine what endpoint needs to be created/enhanced and then we switch context down into unit-test land. A typical scenario would involve enhancing a unit test for the url mappings, adding one for the controller, then one for any additional service, domain object etc. When we’re happy and have tested and designed each of the required units we jump back up a level and get our acceptance test to progress further. The customer steers the development effort as he sees vertical ‘slices’ of functionality emerge. The acceptance test is added to a suite for that functional area. The continuous build system will then execute that test against a fully deployed (but scaled down) replica of the production environment, with hardware load balancer, vlans, multiple nodes (session affinity) and so forth. Any additional environmental monitoring (e.g. Nagios alerting) is also done as part of this development effort and is deployed into the test environment along with the updated code.

Setting up the infrastructure to do this kind of testing takes investment, both initial and ongoing. The continuous build needs to be highly ‘parallelized’ so you get feedback from a checkin in 10 mins or less (we’re heavy users of virtualization, usually VMware or OpenVZ). The individual acceptance test suites need to be kept small enough to run quickly before check-in.

Benefits of this approach

  • The continuous context-switch between acceptance test and unit test is key to our staying focused on delivering what the customer actually wants.

  • The customer has multiple feedback points that he can learn from and use to steer the development effort.

  • It confirms that the whole system works together – networking, DNS, load balancing, automated deployment, session handling, database replication etc.

  • We create additional ‘non-functional’ acceptance tests that automatically exercise other aspects of the system such as fail-over and recovery.

  • Upgrades to parts of the system (switches, load balancers, web caches, library versions, database server versions etc.) can be tested in a known and controlled way.

We’ve caught a number of integration-related issues using this approach (a few examples: broken database failover due to missing primary keys, captcha validation not working due to a web cache not behaving correctly, data not persisting because one database server had the wrong locale) and stopped them before they have reached our users. We have used the feedback as a basis for improving our products and their delivery at a system level.

OK this reply has now become far too long :-/ It would of course be good to discuss this in person sometime :)

J.B.'s taken the time out to respond and it seems that there's a lot of common ground. Maybe there's a language problem here in developer land? Do we need some clear common definitions in this area?
