Food for Thought: Getting to the Root of Customer Reported Problems
An e-newsletter published by Software Quality Consulting, Inc.
May 2005, Vol. 2 No. 5
To view a web version of this newsletter, click on the following link:
http://www.swqual.com/newsletter/vol2/no5/vol2no5.html.
--------------------------------------------------------------------------------
Welcome to Food for Thought(TM), an e-newsletter from Software Quality
Consulting (http://www.swqual.com/?Intro). I've created free subscriptions for
my valued business contacts. If you find this newsletter informative, I
encourage you to continue reading. Feel free to pass this newsletter along to
colleagues by clicking this Forward Email link
(http://ui.constantcontact.com/roving/sa/fp.jsp?plat=i&p=f&m=sctz69n6). If
you've received this newsletter from a colleague and would like to subscribe,
please click this Enter New Subscription link
(http://www.swqual.com/newsletter/Subscribe.htm). If you don't wish to receive
this newsletter, click the SafeUnSubscribe(TM) link at the bottom of this
newsletter, and you won't be bothered again.
Your continued feedback on this newsletter is most welcome. Please send
your comments and suggestions to [email protected].
--------------------------------------------------------------------------------
In This Month's Topic, I discuss using Root Cause Analysis to find the
real cause of Customer Reported Problems...
Regular features to look for each month are:
- Monthly Morsels
Hints, tips, techniques and reference info related to this month's topic
- Calendar
Conferences, workshops, and meetings of interest to software engineers,
QA engineers and anyone interested in software development
--------------------------------------------------------------------------------
***This Month's Topic***
GETTING TO THE ROOT OF CUSTOMER REPORTED PROBLEMS
Of all the kinds of problems that software development organizations face,
Customer Reported Problems (CRPs) are clearly the most important. This is
because CRPs represent potential gaps in your knowledge of how your
customers use your software. CRPs may be the result of deficiencies in
your product marketing, software development, test, or fulfillment
processes. CRPs can often result in unplanned releases that are both
disruptive and expensive.
When the underlying causes of CRPs are not fully understood, they can
result in poor solutions that often create more problems than they solve.
Nothing frustrates customers more than a supplier who is unable to resolve
problems quickly and correctly.
MOTIVATION
By now, we should all know that the sooner a problem is found, the easier
and less costly it is to fix. Barry Boehm [1] demonstrated this almost 25
years ago. Current data [2] suggest that even the most experienced
developers inject one defect for every 10 lines of code they write. While
effective testing can find up to 95% of these defects prior to release,
that still leaves roughly one defect in every 200 lines of code for
customers to find.
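To make the arithmetic concrete, here is a small Python sketch. The 100,000-line product size is a hypothetical figure chosen for illustration, and the function name `escaped_defects` is likewise hypothetical; the rates are the ones quoted above (one defect per 10 lines from [2], and the best-case 95% removal by testing).

```python
def escaped_defects(loc: int, lines_per_defect: int = 10, removal_pct: int = 95) -> tuple[int, int]:
    """Estimate (injected, escaped) defects for a product of `loc` lines.

    Assumes one defect is injected per `lines_per_defect` lines and that
    testing removes `removal_pct` percent of them before release.
    """
    injected = loc // lines_per_defect
    # Integer arithmetic keeps the estimate free of floating-point noise.
    escaped = injected * (100 - removal_pct) // 100
    return injected, escaped


# Hypothetical 100,000-line product at the rates quoted above.
injected, escaped = escaped_defects(100_000)
print(f"{injected} defects injected, {escaped} left for customers to find")
```

Even at the best-case removal rate, that leaves hundreds of potential CRPs; dropping `removal_pct` to 90 doubles the number left for customers.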
When customers find critical defects in your software, it is very
disruptive, not only for your customers but for your software development
organization as well. Unplanned releases to fix CRPs divert expensive
development resources from tasks that generate revenue (new features, new
products, etc.) to tasks that don't generate revenue (bug fixes). Unplanned
releases are clearly not good for your bottom line.
CRPs represent more than just defects. CRPs should be broadly defined to
include any failure of software and services (including code,
documentation, installation, customization, fulfillment, training, etc.)
that negatively impacts customers.
ROOT CAUSE ANALYSIS
Working in safety-critical industries has allowed me to become familiar
with several tools not routinely used in the commercial software
development industry. One such tool is called Root Cause Analysis (RCA).
This tool is commonly used within a Six-Sigma framework.
I've adapted the traditional RCA Process to make it work effectively
within typical software development organizations. RCA helps people
understand WHAT happened, and WHY and HOW an event (a CRP) occurred.
Overview
RCA is routinely used to investigate the cause of major disasters
including:
- Airline crashes
- Space Shuttle accidents
- Chemical and nuclear plant disasters
RCA helps us:
- understand causes of customer dissatisfaction
- understand the what, the why, and the how...
- reduce rework by preventing recurrence
- identify process weaknesses
- improve customer satisfaction
In applying RCA to a typical software development organization, we need to
keep in mind that finding the root cause of a CRP may be difficult
because:
- We often have an incomplete problem definition
- Causal relationships are unknown
- We tend to focus on finding quick solutions and assigning blame
Let's now look at terms specific to the RCA process.
TERMINOLOGY
The RCA Process uses the following terms:
Event - Any failure of software and services (including code,
documentation, installation, customization, training, fulfillment,
etc.) that impacts customers. A CRP is an example of an event.
Causal Factors - Factors that contribute to occurrence of an event.
Causal Relationships - Cause and effect sequence in which a specific
action creates a condition that contributes to or results in an
event.
Corrective Action - Specific actions taken to eliminate the root cause of
a CRP. There are two kinds of Corrective Actions (CAs):
- Immediate CA is taken soon after a CRP is reported to help the
customer recover (examples: workaround, hot fix, etc.)
- Long Term CA is taken to prevent recurrence. Long Term CA results in
changes to processes and procedures
Root Cause - Cause that, if corrected, prevents recurrence of this and
similar CRPs.
Attributes of root causes:
- Represent specific underlying causes of events...
- Can be reasonably identified...
- Can be fixed by Management...
- Lead to effective corrective actions...
Let's look a bit closer at the attributes of root causes:
- Root Causes represent specific underlying causes of CRPs
- The goal of RCA is to find specific underlying causes
- The more specific the investigation is about why a CRP occurred, the
easier it will be to arrive at CAs that will prevent recurrence
- Root Causes can be reasonably identified...
- The RCA investigation must be cost-effective
- A good RCA Process helps keep ROI high
- Root Causes can be fixed by Management...
- Management needs to know exactly why a CRP occurred before effective
CA can be taken to prevent recurrence
- Vague root causes such as "user error", "software failure", or
"external factors" are not helpful because Management can't do much
about them
- Root Causes lead to effective Corrective Actions...
- Corrective actions are directly related to the identified root causes
- Vague corrective actions mean specific root cause was not found
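Two of these attributes can be screened mechanically before the team spends time debating a candidate: whether the stated cause is one of the vague catch-alls named above, and whether it leads to any corrective action at all. The Python sketch below assumes a hypothetical `RootCause` record and `screening_problems` helper; the other two attributes (reasonable identification, and Management's ability to fix the cause) still call for the team's judgment.

```python
from dataclasses import dataclass, field

# Catch-all causes named in the text; a team would extend this list over time.
VAGUE_CAUSES = {"user error", "software failure", "external factors"}


@dataclass
class RootCause:
    statement: str
    corrective_actions: list[str] = field(default_factory=list)


def screening_problems(cause: RootCause) -> list[str]:
    """Return reasons a candidate root cause is not yet actionable (empty if none)."""
    problems = []
    if cause.statement.strip().lower() in VAGUE_CAUSES:
        problems.append("too vague for Management to act on")
    if not cause.corrective_actions:
        problems.append("no corrective action tied to this cause")
    return problems


vague = RootCause("User error")
specific = RootCause(
    "Requirement was in SRS but was ambiguous",
    # Hypothetical corrective action, for illustration only.
    ["Add an ambiguity check to the SRS review checklist"],
)
print(screening_problems(vague))     # both problems reported
print(screening_problems(specific))  # []
```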
Now that we have some terms defined, let's look at the Root Cause Analysis
Process.
RCA PROCESS OVERVIEW
The RCA Process consists of investigating, understanding, and categorizing
the underlying root causes of observed CRPs. It is best performed by a
small cross-functional team and can be easily incorporated into your
Defect Triage Process.
The RCA Process includes a detailed analysis based on factual
information obtained from:
- Available documents and records
- Interviews with staff and customers
- Brainstorming sessions with staff
The RCA Process also uses simple tools, including:
- Why Trees
- Pareto Analysis
An effective RCA Process helps determine appropriate and effective
corrective actions by identifying both an Immediate Corrective Action
(what should be done today to resolve the CRP) and Long Term Corrective
Action (what should be done to prevent recurrence).
In applying the RCA Process, the Triage Team starts with a specific CRP
and asks:
- What is it about the way we operate that allowed this CRP to occur?
Most root causes are found in the way we operate. That includes:
- Who does what
- How things get done
- Why we behave the way we do
The Triage Team asks questions about "Who does what", "How things get
done", and "Why we behave the way we do" in order to gather the factual
information needed to identify the real root causes.
In asking these questions, the Triage Team uses a tool called the Why
Tree. Why Trees are similar to Fault Trees in that the CRP is placed at
the top. We then ask "Why did this happen?" and start drilling down into
"Who does what", "How things get done", and "Why we behave the way we do".
At each level, the team continues to ask "Why?", usually at least five
times (though for simpler problems, fewer than five Whys may suffice).
The following illustrates a partially completed Why Tree for a simple
problem:
(see the image in the HTML version:
http://www.swqual.com/newsletter/vol2/no5/vol2no5.html)
Answers to Why questions may need to be determined from documents (like
Functional Specifications, Test Plans, User Manuals, etc.), from records
(like test results, shipping invoices, etc.), from interviews with staff
and customers, and from brainstorming sessions.
The information shown in green circles on the Why Tree example represents
probable root causes. The Triage Team reaches consensus on the most
probable root cause(s). Often, there will be more than one root cause.
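Since the Why Tree image is not reproduced in this text version, here is one way to represent the same structure in Python. It is a minimal sketch: `WhyNode` and the example CRP are hypothetical, and leaves stand in for the green circles (branches where the team had no further answer). The leaves are only candidates; the Triage Team still has to reach consensus on which are the most probable root causes.

```python
from dataclasses import dataclass, field


@dataclass
class WhyNode:
    """One node of a Why Tree; each child answers "Why did this happen?"."""
    text: str
    children: list["WhyNode"] = field(default_factory=list)

    def why(self, answer: str) -> "WhyNode":
        """Record an answer to "Why?" beneath this node and return the new node."""
        child = WhyNode(answer)
        self.children.append(child)
        return child


def candidate_root_causes(node: WhyNode) -> list[str]:
    """Leaves (no further answer to "Why?") are the candidate root causes."""
    if not node.children:
        return [node.text]
    causes = []
    for child in node.children:
        causes.extend(candidate_root_causes(child))
    return causes


def whys_asked(node: WhyNode) -> int:
    """Number of "Why?" levels along the deepest branch below this node."""
    if not node.children:
        return 0
    return 1 + max(whys_asked(child) for child in node.children)


# Hypothetical, partially completed Why Tree.
crp = WhyNode("Customer: report totals are wrong after upgrading")
cause = crp.why("Rounding logic changed in the new release")
cause.why("Rounding requirement was in SRS but was not tested")
crp.why("Customer kept an old report template")

print(candidate_root_causes(crp))
print(whys_asked(crp))  # 2, well short of five, so keep drilling down
```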
Using the Why Tree, the Triage Team develops an Immediate CA (which could
be a workaround, hot fix, patch, new CDs, new doc, etc.). The team also
identifies effectiveness checks that can determine if the Immediate CA,
once implemented, has effectively resolved the CRP.
Once the Immediate CA is implemented and the effectiveness checks are
satisfactory, the Triage Team decides if a Long Term CA is needed. A Long
Term CA would be appropriate if the root cause points to systemic
problems. If so, they begin to develop a Long Term CA. The team does this
by:
- Reviewing existing processes and procedures
- Identifying process weaknesses directly related to root cause
- Identifying potential process and procedure changes
- Identifying long term effectiveness checks
Once the team has completed work on the Long Term CA, it can be presented
to Management and implemented. The team then collects data to determine if
the long term effectiveness checks are satisfactory.
Now let's identify the specific steps needed to perform an effective RCA.
RCA PROCESS STEPS
Step 1 - Data Collection
The majority of time spent analyzing events will be spent gathering data.
Complete information and a thorough understanding of events are required
to identify causal factors and real root causes.
- Data collection begins with an accurate statement of what occurred in
the Customer's words:
- Descriptions of CRPs are sometimes "filtered" by Technical Support. It
is critical that the original problem, stated in the Customer's words,
is recorded and reviewed by the team to prevent wasted effort...
- Data collection will initially be sketchy; use the Why Tree to
identify additional data to collect...
- Collect general information about Customer. Some examples:
- Is this Customer a power user or novice?
- Has this Customer received training?
- Is this Customer's use and/or data unique?
- Collect information about Customer's environment. Some examples:
- Does this Customer have a standard release or customer-specific
release?
- What are their platform/database/operating system releases?
- Have they received hot fixes recently? Are they installed?
Your Technical Support staff should gather this information by using a
checklist of questions to ask when Customers report problems.
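A checklist like this is easy to keep as data, so that an incomplete CRP record is visible before it reaches the Triage Team. In the Python sketch below, the questions are taken from the Step 1 examples, while `CrpIntake`, its fields, and the CRP identifier are hypothetical; in practice the checklist would live in your defect-tracking system.

```python
from dataclasses import dataclass, field

# Questions taken from the Step 1 examples; extend to suit your products.
CHECKLIST = [
    "Problem statement in the Customer's own words",
    "Is this Customer a power user or novice?",
    "Has this Customer received training?",
    "Is this Customer's use and/or data unique?",
    "Standard release or customer-specific release?",
    "Platform/database/operating system releases",
    "Recent hot fixes received? Are they installed?",
]


@dataclass
class CrpIntake:
    crp_id: str
    answers: dict[str, str] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Checklist questions that still have no (non-blank) answer."""
        return [q for q in CHECKLIST if not self.answers.get(q, "").strip()]


intake = CrpIntake("CRP-0001")  # hypothetical identifier
intake.answers[CHECKLIST[0]] = "Totals on the monthly report are off by a few cents."
intake.answers[CHECKLIST[1]] = "   "  # blank answers still count as missing
print(len(intake.missing()))  # 6
```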
Step 2 - Determine What Happened
The Triage Team starts with the CRP in the Customer's words and asks "Why
did this happen?" As they start to drill down, they create the Why Tree
and continue asking "Why?" until there are no more answers. Usually, you
need to ask "Why?" a minimum of 5 times.
This process will identify additional information to collect. For example:
- Was the requirement defined in the Software Requirements Spec?
- Was the requirement ambiguous?
- Was the requirement tested? If so, how?
- Was the testing effective?
- Was user training provided? Was it effective?
- Are there platform, environment, or configuration issues?
When the team is satisfied that they have answered all the relevant
questions and gathered all relevant information, the team is then ready to
identify potential root causes.
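The rule of asking "Why?" until there are no more answers can be expressed as a simple loop. This Python sketch follows a single branch of the Why Tree; `drill_down` and the example answers are hypothetical, and the `minimum` of 5 is the rule of thumb from the text rather than a hard limit. The loop also stops if an answer repeats an earlier statement, since a circular answer means the team is going around in circles rather than drilling down.

```python
from typing import Callable, Optional


def drill_down(
    crp: str, why: Callable[[str], Optional[str]], minimum: int = 5
) -> tuple[list[str], bool]:
    """Ask "Why?" repeatedly, starting from the CRP in the Customer's words.

    `why` returns the team's answer for a statement, or None when there is no
    further answer. Returns the chain of statements and whether at least
    `minimum` Whys were answered.
    """
    chain = [crp]
    while True:
        answer = why(chain[-1])
        if answer is None or answer in chain:  # no more answers, or a circular one
            break
        chain.append(answer)
    return chain, len(chain) - 1 >= minimum


# Hypothetical answers gathered so far; dict.get returns None for unanswered Whys.
answers = {
    "Report totals are wrong": "Rounding logic changed",
    "Rounding logic changed": "Change was not covered by a test",
}
chain, enough = drill_down("Report totals are wrong", answers.get)
print(" -> ".join(chain))
print(enough)  # False: only 2 Whys answered, so more data is needed
```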
Step 3 - Root Cause Identification
Based on the Why Tree, the Triage Team reviews results and identifies the
most probable root causes. The team ensures that the most probable root
causes meet the following criteria:
- They represent specific underlying causes of events...
- They can be reasonably identified...
- They can be fixed by Management...
- They can lead to effective corrective actions...
Once the team is satisfied that they have identified the most probable
root cause(s), they document their results.
With this information, the team can then identify an Immediate CA. These
actions can be taken immediately to help resolve the original CRP.
Effectiveness checks are included as part of the Immediate CA.
Step 4 - Long Term Corrective Action
Once an Immediate CA is implemented and determined to be effective, the
Triage Team decides if a Long Term CA is warranted. Usually, root causes
that identify underlying systemic problems are good candidates. Also, once
root causes are identified, they should be added to a list, as illustrated
below:
Example Root Cause List
1 Requirement was defined in SRS but not tested
2 Requirement was tested but test was inadequate
3 Requirement was not defined in SRS
4 Requirement was in SRS but was ambiguous
5 Code was incorrect; code review not held
6 Code was incorrect; code review held but didn't catch it
7 Installation or configuration issues...
8 Version compatibility issues...
9 User training issues...
10 etc...
The Pareto Principle tells us that, in many cases, 80% of all problems
result from only 20% of root causes. Performing a Pareto Analysis based on
the Root Cause List can help determine what areas should be the focus of
Long Term CA in order to keep the ROI high.
The following example illustrates a simple Pareto Analysis of observed
root causes and their associated CRPs.
Example of Simple Pareto Analysis of Observed Root Causes
CRP RC #1 RC #2 RC #3 RC #4
10002 x x
10014 x x x
10045 x
10345 x x x
16778 x
17889
18779 x
19921 x
19992
20001 x
total 2 7 2 2
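The tally behind this table, and the ranking a Pareto Analysis produces from it, can be sketched in a few lines of Python. Only the column totals from the table above are used as input, because the text version does not preserve which CRPs carry which root causes; the function names `tally` and `pareto` are hypothetical. Note that the cumulative shares are shares of all 13 marks, not of the 10 CRPs, since a single CRP can have more than one root cause.

```python
from collections import Counter


def tally(crp_causes: dict[str, set[str]]) -> Counter:
    """Count, for each root cause, how many CRPs it was found in (the "total" row)."""
    counts = Counter()
    for causes in crp_causes.values():
        counts.update(causes)
    return counts


def pareto(counts: Counter) -> list[tuple[str, int, float]]:
    """Rank root causes by count, with the cumulative share of all marks."""
    total = sum(counts.values())
    ranked, running = [], 0
    for cause, n in counts.most_common():
        running += n
        ranked.append((cause, n, running / total))
    return ranked


# Column totals from the example table above.
totals = Counter({"RC #1": 2, "RC #2": 7, "RC #3": 2, "RC #4": 2})
for cause, n, share in pareto(totals):
    print(f"{cause}: {n:2d}   cumulative {share:.0%}")
```

RC #2 alone accounts for just over half of all marks (7 of 13), which is why it is the natural target for a Long Term CA.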
From this analysis, it is clear that addressing Root Cause #2, which
appears in 7 of the 10 CRPs, with a Long Term CA would have the highest
ROI. The Triage Team would identify and propose a Long Term CA and present
recommendations to Management, along with effectiveness checks. Once the
Long Term CA is implemented, data is collected and reviewed by the Triage
Team to ensure that the systemic issues have been effectively eliminated.
IN SUMMARY...
Incorporating RCA into your Triage Process can lead to several benefits:
- Increases your ability to discover real root causes
- Helps identify WHAT, WHY, and HOW
- Leads to effective immediate and long term corrective actions
- Improves Customer Satisfaction
- Reduces rework and unplanned releases
By incorporating Root Cause Analysis into your Triage Process, the
resolution of your CRPs will be more effective and your customers will
certainly be happier.
Till next time...
--------------------------------------------------------------------------------
***Monthly Morsel***
Every month in this space you'll find additional information related to
this month's topic.
References:
[1] Boehm, B., Software Engineering Economics, Prentice-Hall, 1981.
[2] Humphrey, W., A Discipline for Software Engineering, Addison-Wesley,
1995.
[3] Gano, D., et al., Apollo Root Cause Analysis: A New Way of
Thinking, Apollonian Publications, 1999.
[4] US Dept of Energy, Root Cause Analysis Guidance Document,
DOE-NE-STD-1004-92, February 1992
A technical guidance document on how to perform traditional root cause
analysis, primarily used for investigating nuclear power plant
accidents. (http://www.eh.doe.gov/techstds/standard/nst1004/nst1004.pdf)
[5] Rooney, J., and Vanden Heuvel, L., "Root Cause Analysis for
Beginners", ASQ Quality Progress, July 2004, pp. 45-53.
A good paper to read for an overview of the traditional Root Cause
Analysis process...
(http://www.asq.org/pub/qualityprogress/past/0704/qp0704rooney.pdf)
[6] Fagerhaug, T., and Andersen, B. (ed.), Root Cause Analysis:
Simplified Tools and Techniques, ASQ Quality Press, 1999.
On-line Resources:
Software Forensics Centre at the School of Computing Science,
Middlesex University (UK)
An interesting site with lots of resources related to software
failures (http://www.cs.mdx.ac.uk/research/SFC/index.html)
ASQ Quality Press
Search ASQ's on-line bookstore for books and resources on Root Cause
Analysis (http://qualitypress.asq.org/index.html)
--------------------------------------------------------------------------------
***Calendar***
Every month, you'll find news here about local and national events that
are of interest to the software community...
Software Quality Calendar
There are many organizations that sponsor monthly meetings, workshops,
and conferences of interest to software professionals. Find out what's
happening... (http://www.swqual.com/links/upcoming.html)
Workshops Offered by Software Quality Consulting
Software Quality Consulting offers workshops in many topics related to
software process improvement. Get more info...
(http://www.swqual.com/seminars/courses.html)
--------------------------------------------------------------------------------
***About SQC***
Software Quality Consulting provides consulting, training, and auditing
services tailored to meet the specific needs of clients. We help clients
fine-tune their software development processes and improve the quality of
their software products. The overall goal is to help clients achieve
Predictable Software Development(TM), so that organizations can consistently
deliver quality software with promised features in the promised timeframe.
To learn more about how we can help your organization, visit our web site
(http://www.swqual.com/) or send us an email ([email protected]).
--------------------------------------------------------------------------------
I hope this newsletter has been informative and helpful. Your comments and
feedback are most welcome. Send me your feedback... ([email protected])
Thanks,
Steve Rakitin
[email protected]
Food for Thought and Predictable Software Development are trademarks of Software
Quality Consulting, Inc.
Copyright © 2005. Software Quality Consulting, Inc. All rights reserved. Graphic
design by Sage Studio