Bug Triage

Sun, December 6, 2009, 07:56 PM under SoftwareProcess
In this blog post brain dump, I'll attempt to describe the process my team tries to follow when dealing with new bug reports (specifically, code defect reports). This is not official Microsoft policy, just the way we do things… if you do things differently and want to share, you can do so at the bottom in the comments (or on your blog).

Feature Triage Team
A subset of the feature crew, the triage team (which has representations from the PM, Dev and QA disciplines), looks at all unassigned bugs at regular intervals. This can be weekly or daily (or other frequency) dependent on which part of the product cycle we are in and what the untriaged bug load looks like. They discuss each bug considering the evidence and make a decision of whether the bug goes from Not Yet Assigned to Assigned (plus the name of the DEV to fix this) or whether it goes from Active to Resolved (which means it gets assigned back to the requestor for closure or further debate if they were not present at the triage meeting). Close to critical milestones, the feature triage team needs to further justify bugs they take to additional higher-level triage teams.

Bug Opened = Not Yet Assigned
Someone (typically an SDET from the QA team) creates the bug item (e.g. in TFS), ensuring they populate all the relevant fields including: Title, Description, Repro Steps (including the Actual Result at the end of the steps), attachments of code and/or screenshots, Build number that they observed the issue in, regression details if applicable, how it was found, if a test case exists or needs to be created etc. They also indicate their opinion on the Priority and Severity. The bug status is left as Not Yet Assigned.

"Issue" versus "Fix for issue"
The solution to some bugs is easy to determine, e.g. "bug: the column name is misspelled". Obviously the fix is to correct the spelling – still, the triage team should be explicit and enter the correct spelling in the bug's Description. Note that a bad bug name here would be "bug: fix the spelling of the column" (it describes the solution, rather than the problem).

Other solutions are trickier to establish, e.g. "bug: the column header is not accessible (can only be clicked on with the mouse, not reached via keyboard)". What is the correct solution here? The last thing to do is leave this undetermined and just assign it to a developer. The solution has to be entered in the description. Behind this type of a bug usually hides a spec defect or a new feature request.

The person opening the bug should focus on describing the issue, rather than the solution. The person indicates what the fix is in their opinion by stating the Expected Result (immediately after stating the Actual Result). If they have a complex suggested solution, that should be split out in a separate part, but the triage team has the final say before assigning it. If the solution is lengthy/complicated to describe, the bug can be assigned to the PM. Note: the strict interpretation suggests that any bug with no clear, obvious solution is always a hole in the spec and should always go to the PM. This also ensures the spec gets updated.

Not Yet Assigned -> Not Yet Assigned (on someone else's plate)
If the bug is observed in our feature, but the cause is actually another team, we change the Area Path (which is the way we identify teams in TFS) and leave it as Not Yet Assigned. The triage team may add more comments as appropriate including potentially changing the repro steps. In some cases, we may even resolve the bug in our area path and open a new bug in the area path of the other team.

Even though there is no action on a dev on the team, the bug still needs to be tracked. One way of doing this is to implement some notification system that informs the team when the tracked bug changed status; another way is to occasionally run a global query (against all area paths) for bugs that have been opened by a member of the team and follow up with the current owners for stale bugs.

Not Yet Assigned -> Resolved
This state transition can only be made by the Feature Triage Team.

0. Sometimes the bug description is not clear and in that case it gets Resolved as More Information Needed, so the original requestor can provide it.
After understanding what the bug item is about, the first decision is to determine whether it needs to go to a dev.

1. If it is a known bug, it gets resolved as "Duplicate" and linked to the existing bug.

2. If it is "By Design" it gets resolved as such, indicating that the triage team does not think this is a bug.

3. If the bug does not repro on latest bits, it is resolved as "No Repro"

4. The most painful: If it is decided that we cannot fix it for this release it gets resolved as "Postponed" or "Won't Fix". The former is typically due to resources and time constraints, while the latter is due to deciding that it is not important enough to consume our resources in any release (yes, not all bugs must be fixed!). For both cases, there are other factors that contribute to the decision such as: existence of a reasonable workaround, frequency we expect users to encounter the issue, dependencies on other team to offer a solution, whether it breaks a core scenario, whether it prohibits customer feedback on a major feature, is it a regression from a previous release, impact of the fix on other partner teams (e.g. User Education, User Experience, Localization/Globalization), whether this is the right fix, does the fix impact performance goals, and last but not least, severity of bug (e.g. loss of customer data, security threat, crash, hang). The bar for fixing a bug goes up as the release date approaches. The triage team becomes hardnosed about which bugs to take, while the developers are busy resolving assigned bugs thus everyone drives for Zero Bug Bounce (ZBB). ZBB is when you have 0 active bugs older than 48 hours.

Not Yet Assigned -> Assigned
If the bug is something we decide to fix in this release and the solution is known, then it is assigned to a DEV. This is either the developer that will do the work, or a Lead that can further assign it to one of his developer team based on a load balancing algorithm of their choosing.

Sometimes, the triage team needs the dev to do some investigation work before deciding whether to take the fix; similarly, the checkin for the fix may be gated on code review by the triage team. In these cases, these instructions are provided in the comments section of the bug and when the developer is done they notify the triage team for final decision.

Additionally, a Priority and Severity (from 0 to 4) has to be entered, e.g. a P0 means "drop anything you are doing and fix this now" whereas a P4 is something you get to after all P0,1,2,3 bugs are fixed.

From a testing perspective, if the bug was found through ad-hoc testing or an external team, the decision is made whether test cases should be added to avoid future regressions. This is communicated to the QA team.

Assigned -> Resolved
When the developer receives the bug (they should be checking daily for new bugs on their plate looking at bugs in order of priority and from older to newer) they can send it back to triage if the information is not clear. Otherwise, they investigate the bug, setting the Sub Status to "Investigating"; if they cannot make progress, they set the Sub Status to "Blocked" and discuss this with triage or whoever else can help them get unblocked. Once they are unblocked, they set the Sub Status to "Working on Solution"; once they are code complete they send a code review request, setting the Sub Status to "Fix Available". After the iterative code review process is over and everyone is happy with the fix, the developer checks it in and changes the state of the bug from Active (and Assigned to them) to Resolved (and Assigned to someone else).

The developer needs to ensure that when the status is changed to Resolved that it is assigned to a QA person. For example, maybe the PM opened the bug, but it should be a QA person that will verify the fix - the developer needs to manually change the assignee in that case. Typically the QA person will send an email to the original requestor notifying them that the fix is verified.

Resolved -> ??
In all cases above, note that the final state was Resolved. What happens after that? The final step should be Closed. The bug is closed once the QA person verifying the fix is happy with it. If the person is not happy, then they change the state from Resolved to Active, thus sending it back to the developer. If the developer and QA person cannot reach agreement, then triage can be brought into it. An easy way to do that is change the status back to Not Yet Assigned with appropriate comments so the triage team can re-review.

It is important to note that only QA can close a bug. That means that if the opener of the bug was a PM, when the bug gets resolved by the dev it may land on the PM's plate and after a quick review, the PM would re-assign to an SDET, which is the only role that can close bugs. One exception to this is if the person that filed the bug is external: in that case, we leave it Resolved and assigned to them and also send them a notification that they need to verify the fix. Another exception is if specialized developer knowledge is needed for verifying the bug fix (e.g. it was a refactoring suggestion bug typically not observable by the user) in which case it is fine to have a developer verify the fix, and ideally a different developer to the one that opened the bug.

Other links on bug triage
A quick search reveals that others have talked about this subject, e.g. here, here, here, here and here.

Your take?
If you have other best practices your team uses to deal with incoming bug reports, feel free to share in the comments below or on your blog.