Testing Too Late: A Case Study
Today’s article dives into a real-world example of testing too late: the challenges it created and the solutions we put in place.
Last week, I discussed the problems of testing too late and other software testing anti-patterns. This week, I’m illustrating some of those challenges by sharing a real-world example from my own experience. At one point in my career, a QA team I managed fell into a cycle of testing roughly eight weeks behind the development team. What began as a minor two-week lag soon escalated, turning us into an unintended bottleneck.
If you missed last week’s article, check it out for more on testing too late and other software testing anti-patterns.
What Happened?
When we rolled out Jira, the VP of Engineering wanted to use sprints to track only development work. He wanted pretty burn-down charts showing that his part of the organization was delivering on time.
To accommodate this, we created our own QA sprints in Jira. Once the dev team finished a two-week sprint, they handed over a release package to us. We’d run our own two-week sprint, and then the code would be released to production. Since we had a more or less monthly deployment schedule before this, it all seemed feasible.
Readers, I’m sure some of you already see the problems with this.
The Problem(s)
Initially, it worked okay. But as time went on, we fell further and further behind. Several factors contributed, but the main ones were releases that needed more testing time because of the nature of the changes, and time lost waiting on bug fixes.
While development and testing efforts are often correlated, they aren’t always. What can be coded in two weeks may not be testable within two weeks.
Once the code was handed off, QA often had to wait for bug fixes, which weren’t planned for in the developers’ current sprint. When these delays happened, the QA team experienced unexpected downtime.
Developers, meanwhile, had to stop their current work to fix bugs from several releases back, which meant a lot of context-switching for them.
As the gap widened, more and more time passed since a developer had initially worked on the code in question. Each bug required more time to fix, as the code was no longer fresh in the developer’s mind. Remember, the earlier you can identify and fix a bug, the less costly it is.1
And because we (QA) had to understand the future features coming to us, we were often interrupted to participate in design reviews for what the dev team was currently working on, even if it would be over a month before we’d test it. This meant a lot of context-switching for QA, too.
And the dev team’s definition of done lacked testing! What good is a burn-down chart showing Jira stories as done if no one has verified the code actually works? Or if that code won’t be in production for another eight weeks?
For each two-week sprint, the dev team created a release branch and a deployable release package. So, when we were eight weeks behind, this was equal to approximately four releases. If developers were working on version N, QA was testing version N-4. When a bug needed to be fixed, the developers had to merge the bug fix to each release branch from N-4 to N. And for a production hotfix, yet another merge was required.
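To make that merge overhead concrete, here’s a minimal sketch of what propagating a single bug fix looked like. The branch names and the scripted cherry-pick workflow are my assumptions for illustration, not the team’s actual tooling:

```python
import subprocess

# Hypothetical branch names -- one release branch per two-week sprint.
# QA is testing the oldest of these while developers work on main (version N).
RELEASE_BRANCHES = [
    "release/sprint-41",  # N-4: the version QA is currently testing
    "release/sprint-42",  # N-3
    "release/sprint-43",  # N-2
    "release/sprint-44",  # N-1
    "main",               # N: what developers are working on now
]


def git(*args: str) -> None:
    """Run a git command, stopping on the first failure (e.g. a merge conflict)."""
    subprocess.run(["git", *args], check=True)


def propagate_fix(fix_commit: str) -> None:
    """Apply one bug fix to every branch from N-4 up to N."""
    for branch in RELEASE_BRANCHES:
        git("switch", branch)
        git("cherry-pick", fix_commit)  # each branch is another chance for a conflict
        git("push", "origin", branch)


# A single QA-reported bug against N-4 touches five branches;
# a production hotfix adds yet another merge on top of this.
# propagate_fix("abc1234")
```

Multiply that by every bug QA found in a release, and the cost of being four versions behind adds up quickly.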
When product managers worked with engineering to plan what would be worked on next, those features didn’t make it to production for eight to ten weeks, even if the development work could be done within a two-week sprint.
The Solution(s)
The dev team manager and I wanted to reduce redundancy and context-switching inefficiencies and get code to production faster. We also wanted to align these teams with the mobile release schedule, shipping every two weeks, with dev and QA work happening within the same sprint.
Here are some of the process changes we made:
We created a parent sprint in Jira for each release, with dev and QA sprints as child sprints. This allowed the VP of Engineering to keep his dev-specific charts. (Note: It was not worth the overhead, and we ultimately eliminated it in favor of a single sprint with dev and QA subtasks.)
We started adding QA story points to each story and doing the same capacity planning as the dev side for each sprint (see the sketch after this list).
Acceptance criteria were added to each story.
The definition of done was changed to include testing. If it wasn’t tested, it wasn’t done. This meant that a story might not make it into the release package for the sprint if testing wasn’t completed. When that happened, it would be re-estimated and included in a future sprint.
All scrum rituals included developers and QA, not just developers. We eliminated the QA-only stand-ups and retros and merged them with the dev team.
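For the story-point change in particular, the idea was that a story only fits in a sprint if both the dev side and the QA side have capacity for it. Here’s a rough sketch of that check; the numbers, field names, and focus factor are assumptions for illustration, not our actual figures:

```python
from dataclasses import dataclass

# Illustrative numbers only -- not the real team's velocity or focus factor.
SPRINT_DAYS = 10           # two-week sprint
FOCUS_FACTOR = 0.7         # fraction of time spent on sprint work vs. meetings, support, etc.
POINTS_PER_PERSON_DAY = 1  # assumed conversion from person-days to story points


@dataclass
class Story:
    name: str
    dev_points: int
    qa_points: int  # QA effort is estimated separately, not assumed equal to dev effort


def team_capacity(headcount: int) -> float:
    return headcount * SPRINT_DAYS * FOCUS_FACTOR * POINTS_PER_PERSON_DAY


def fits_in_sprint(stories: list[Story], dev_headcount: int, qa_headcount: int) -> bool:
    """A sprint plan only holds if BOTH dev and QA capacity can absorb the stories."""
    dev_load = sum(s.dev_points for s in stories)
    qa_load = sum(s.qa_points for s in stories)
    return dev_load <= team_capacity(dev_headcount) and qa_load <= team_capacity(qa_headcount)


# Example: a feature that is cheap to build but expensive to test
# can still blow the QA side of the plan.
print(fits_in_sprint([Story("export report", dev_points=3, qa_points=8)],
                     dev_headcount=4, qa_headcount=2))
```

The point isn’t the formula itself; it’s that what can be coded in two weeks may not be testable in two weeks, so both estimates have to go into the plan.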
We felt these changes were necessary to get the entire team (developers and QA engineers) working together in the same sprint, shipping a release every two weeks. We had buy-in from the CTO to make the changes. However, catching up was hard when we were four releases behind. For this to work, we needed to ship the latest release the developers had worked on and then start fresh in a new sprint with the new process in place.
To catch up, we allocated an entire sprint primarily to design and research on the dev side. Developers were also expected to deliver prompt bug fixes; we even asked them to pitch in with testing.
It wasn’t easy, but in the end, we were successful!
Summary
As you can see, many process issues contributed to testing too late.
Ultimately, the most prominent issues were poor collaboration and communication between developers and QA, driven by separate sprints, charts, and scrum rituals, along with the inefficiencies of constant context-switching and the long gap between committing code and getting it into production.
Here are some indicators we later used to measure and track improvements as we iterated on our processes (a rough sketch of how a few of them can be computed follows the list):
Change lead time2: Before, our change lead time was very high (eight weeks). Afterward, we brought it down to under two weeks, as most code was deployed at the end of the sprint.
Deployment frequency3: Before, this was lower than desired (monthly at best). Afterward, we consistently hit our goal of deploying every two weeks, with a few planned exceptions.
Failed deployment recovery time4: While we didn’t track this before making these changes, anecdotally, we know it improved afterward. It was much easier to identify what caused a failed deployment when deploying the most recent release package rather than one that was many versions behind.
Sprint capacity: After making these changes and giving them time to become routine, we saw that we had a higher capacity to take on work in each sprint, for both developers and QA. This was likely due to less context-switching and less effort to fix bugs when they were found closer to commit time.
Change fail percentage5: This is another metric we didn’t initially track. However, we later started tracking it and using it as one measurement of quality. We had a large dashboard in the office with a “Days Since” counter, which showed the number of days since the last major production issue or bug. Our goal was to keep this number high.
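If you want to track indicators like these yourself, here’s a minimal sketch of how they can be derived from a list of deployment records. The record fields and the calculations are assumptions for illustration, not the exact definitions or tooling we used:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    first_commit_at: datetime  # when the oldest change in the release was committed
    deployed_at: datetime
    failed: bool               # caused a major production issue


def change_lead_time(deploys: list[Deployment]) -> timedelta:
    """Average time from first commit in a release to that release reaching production."""
    deltas = [d.deployed_at - d.first_commit_at for d in deploys]
    return sum(deltas, timedelta()) / len(deltas)


def deployment_frequency(deploys: list[Deployment], period_days: int) -> float:
    """Deployments per week over the observed period."""
    return len(deploys) / (period_days / 7)


def change_fail_percentage(deploys: list[Deployment]) -> float:
    """Share of deployments that caused a major production issue."""
    return 100 * sum(d.failed for d in deploys) / len(deploys)


def days_since_last_failure(deploys: list[Deployment], today: datetime) -> int | None:
    """The 'Days Since' counter on the office dashboard (None if no failure yet)."""
    failures = [d.deployed_at for d in deploys if d.failed]
    return (today - max(failures)).days if failures else None
```

However you calculate them, the trend matters more than the exact numbers: shorter lead time, more frequent deployments, and fewer failed changes were what told us the new process was working.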
Conversation Starters:
Have you ever been part of a team where testing was significantly delayed? How did it impact your project, and what solutions did you implement to address the issue?
How does your team handle bug fixes from previous releases while managing current sprints? What techniques have you used to minimize context-switching?
How does your team define "done" for a story or task? Do you include testing in your definition, and if so, how has it impacted your workflow?
Like this article? Click the ♥️ button or leave a comment 💬.
xo,
Brie
PS. If you’d like to support my writing and my work on QUALITY BOSS, you can show appreciation by leaving me a tip through Ko-fi.
Coaching/Mentoring: Interested in working with me? Book a call here to get started.
My areas of expertise and interest are leadership development, conquering impostor syndrome, values exploration, goal setting, and creating habits & systems. And, of course, Quality Engineering. 🐞
One of four key DORA Metrics.
Another DORA Metric.
And another DORA Metric.
The final DORA Metric.