INTRODUCTION
Quite often, test managers are expected to answer such questions as:
- Why does testing take so long?
- Why has the test process not been completed yet?
- How many defects can I still expect during production?
- How many re-tests are still required?
- When can testing be stopped?
- When will the test team start the execution of the test?
- Tell me what exactly you are up to!
- What is the quality of the system that you have tested?
- When can I start production?
- How can it be that the previous test project was much faster?
- What did you actually test?
- How many defects have been found and what is their status?
Giving well-founded, factually based answers to these types of questions is not easy. Most of the questions can be answered with reference to the periodic reports as described in Report (AST). Such reports can only be created on the basis of correctly recorded, relevant data, which is converted into information and then used to answer the above-mentioned questions.
Metrics on the quality of the test object and the progress of the test process are of great importance to the test process. They are used to manage the test process, to substantiate test advice and to compare systems or test processes with each other. Metrics are also important for improving the test process, for example in assessing the consequences of particular improvement measures by comparing data from before and after the measures were adopted.
To summarise, a test manager should record a number of items in order to be able to pass well-founded judgement on the
quality of the object under test as well as on the quality of the test process itself. The following sections describe
a structured approach for arriving at a set of test metrics.
GQM METHOD IN SIX STEPS
There are various ways of arriving at a particular set of metrics. The most common is the Goal-Question-Metric (GQM) method [Basili, 1994]. This is a top-down method in which one or more goals are formulated, for example: what information should I collect in order to answer the questions posed in the introduction? From each goal, questions are derived that constitute the basis for the metrics. The collected metrics should provide the answers to those questions, and the answers indicate, among other things, whether the goal has been achieved. The summary of the GQM method described below focuses in particular on the test aspect. The GQM process is described in six steps. This is a concise description that includes only those items that are relevant to the test manager. For a more detailed description, please refer to the aforementioned GQM literature.
Step 1: Defining the goals
Measuring purely for the sake of measuring is pointless. Clear and realistic goals should be set beforehand. We
distinguish two types of goals:
- Knowledge goals (knowing where we are now). These goals are expressed in words such as evaluate, predict, or monitor. For example, “Evaluate how many hours are actually spent on re-testing” or “Monitor the test coverage”. The goal here is to gain insight.
- Improvement goals (where do we want to go). These goals are expressed in words such as increase, decrease, improve, or achieve. Setting such goals suggests that we know there are shortcomings in the present test process or the present environment and that we want to improve these.
An example of such an improvement goal is obtaining a 20% saving on the number of testing hours at a constant test
coverage within a period of 18 months. In order to ascertain this, the following two knowledge goals should be aimed
at:
- Insight into the total number of testing hours per project.
- Insight into the achieved test coverage per project.
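As an illustration of how these two knowledge goals feed back into the improvement goal, the sketch below compares a baseline project with a later, comparable project; the figures and the coverage values are hypothetical, and the check itself is only a minimal example in Python.

    # Hypothetical figures: check whether the improvement goal (a 20% saving on
    # testing hours at constant test coverage) has been achieved, using the two
    # knowledge goals above.
    baseline = {"test_hours": 1200, "coverage": 0.85}  # project before the improvement measures
    current = {"test_hours": 950, "coverage": 0.85}    # comparable project after the measures

    saving = 1 - current["test_hours"] / baseline["test_hours"]
    coverage_constant = current["coverage"] >= baseline["coverage"]

    print(f"Saving on testing hours: {saving:.0%}")                            # 21%
    print(f"Coverage held constant : {coverage_constant}")                     # True
    print(f"Improvement goal met   : {saving >= 0.20 and coverage_constant}")  # True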
It is important to investigate whether the goals and the (test) maturity of the organisation match. It is pointless to
aim at achieving a certain test coverage if the necessary resources (knowledge, time and tools) are not
available.
Example - Knowing where we are now - Goal: Provide insight into the quality of the test
object.
Step 2: Asking questions per goal
For each goal, several questions have to be asked. The questions are formulated in such a way that they act as a specification of a metric. For each question, it can also be established who is responsible for supplying the test metrics.
From the above goal, various questions can be derived. We will limit the number of questions in this example to
three.
Example:

Step 3: From questions to metrics
The relevant metrics are derived from the questions and together form the full set of metrics gathered during the test process.
Example:
By asking the right questions, we arrive at the correct set of metrics for a certain goal. It is important to define
and specify each metric correctly. For example, what exactly is a defect?
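To make steps 1 to 3 concrete, the sketch below records a goal, its questions and the metrics derived from them in a small data structure. The goal matches the example above, but the questions and metric names are illustrative assumptions rather than the set from the original example.

    # A minimal GQM tree for steps 1-3; the questions and metrics are
    # illustrative assumptions, not a prescribed set.
    gqm = {
        "goal": "Provide insight into the quality of the test object",
        "questions": [
            {"question": "How many defects have been found, and how severe are they?",
             "metrics": ["number of defects per severity category", "defect status counts"]},
            {"question": "How much of the test object has been covered?",
             "metrics": ["test cases executed versus specified", "requirements covered"]},
            {"question": "How stable is the object under re-test?",
             "metrics": ["number of re-tests per defect", "defects reopened after a re-test"]},
        ],
    }

    for q in gqm["questions"]:
        print(q["question"])
        for metric in q["metrics"]:
            print("  metric:", metric)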
Step 4: Data collection and analysis
During the test process a variety of data is collected. One way of keeping things simple is to use forms/templates (if
possible in electronic form). The data should be complete and easy to interpret. In the design of these forms,
attention should be paid to the following points:
- Which metrics are collected on the same form.
- Validation: how easy is it to check whether the data is complete and correct.
- Traceability: forms supplied with the date, project ID, configuration management data, data collector, etc. Take into consideration that it is sometimes necessary to preserve this data for a long time.
- Possibility of electronic processing.
As soon as the data is collected, it should be analysed, since at this point it is still possible to make corrections. Waiting too long reduces the chance of repairing the data. Consider, for example, the possibility that time has been booked under an incorrect activity code.
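As a minimal sketch of such an electronic form, the snippet below validates a collected record for completeness and plausibility before it enters the data set; the field names, the activity codes and the plausibility limit are assumptions made for the illustration.

    from datetime import date

    # Hypothetical activity codes and required (traceability) fields on the form.
    VALID_ACTIVITIES = {"planning", "preparation", "specification", "execution", "completion"}
    REQUIRED_FIELDS = ("date", "project_id", "collector", "tmap_phase", "activity", "hours")

    def validate_record(record: dict) -> list:
        """Return a list of problems; an empty list means the record can be accepted."""
        problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in record]
        if not problems:
            if record["activity"] not in VALID_ACTIVITIES:
                problems.append(f"unknown activity code: {record['activity']}")
            if not 0 < record["hours"] <= 10:
                problems.append("implausible number of hours for a single entry")
        return problems

    record = {"date": date(2024, 3, 4), "project_id": "PRJ-12", "collector": "tester A",
              "tmap_phase": "execution", "activity": "execution", "hours": 6}
    print(validate_record(record) or "record accepted")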
Step 5: Presentation and distribution of the measurement data
The collected measurements are used both in the test reports on the quality of the product under test and in those on
the test process. Proper feedback is also of importance to the motivation of those involved and the validation of the
measured data.
Step 6: Relating the measurement data to the questions and goals
This last step investigates to what extent the indicators (the answers to the questions) offer sufficient insight into whether the goals have been achieved. The outcome may be the starting point for a new GQM cycle. In this way, the test process is continually improved.
HINTS AND TIPS
When metrics are being collected, the test manager should take the following issues into account:
- Start with a limited set of metrics and build it up slowly.
- Keep the metrics simple. The definition should appeal to the intuition of those involved. For example, try to avoid the use of a variety of formulas. The more complicated the formulas, the more difficult they are to interpret.
- Choose metrics that are relatively simple to collect and easily accepted. The more difficult it is to collect data, the greater the chance that it will not be accepted.
- Collect data electronically as much as possible. This is the quickest way of data collection and avoids the introduction of manual errors into the data set.
- Keep an eye on the motivation of the testers to record accurately. In the case of time registration, for example, it sometimes happens that incorrect (read: not yet fully booked) codes are used.
- Avoid complicated statistical techniques and models during presentations. Allow the type of presentation to depend on the data presented (tables, diagrams, pie charts, etc.).
- Provide feedback to the testers as quickly as possible. Show them what you do with the information.
PRACTICAL STARTING SET OF TEST METRICS
Below is an indication of what test managers embarking on a “metrics programme” should start with. The metrics set
described is a starting set that can be used in practice with little cost and effort.
- Registration of hours, using activity codes. Register the following for each tester: date, project, TMap phase, activity and number of hours. A “Comments” field is recommended, making it possible to check whether the data has been entered correctly. Registering the hours in this way enables you to obtain insight into the time spent on each TMap phase (see figure 1); a small sketch of such a registration follows figure 2 below. It also enables the client to check the progress of the test process. It is advisable to compile this type of timesheet on a weekly basis for projects that last up to three or four months. For projects that last longer than half a year, this can be done on a fortnightly basis. For projects that last longer than a year, it is best to report on a monthly basis.
- Collect data about the test deliverables (test plans, test scripts, etc.), the test basis and the test object. Record the following: document name, delivery date, TMap phase upon delivery, version and a characteristic that says something about the quantity. This may be the number of test cases for the test scripts, or the number of pages for the other documents. For the test basis, the number of user requirements can be taken as a quantity characteristic.
- Report on the progress of the defects. An example of this type of reporting is shown in figure 2:

Figure 1: Example of hours spent on test process, per TMap phase

Figure 2: Example of a progress overview of defects
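As a minimal sketch of the hours registration mentioned in the first bullet above, the snippet below stores each timesheet entry with the fields named there and aggregates the hours per TMap phase, which yields the kind of overview shown in figure 1; the entries themselves are hypothetical.

    from collections import defaultdict

    # Hypothetical timesheet entries: (date, project, TMap phase, activity, hours, comments)
    timesheet = [
        ("2024-03-04", "PRJ-12", "Preparation", "intake test basis", 4, ""),
        ("2024-03-04", "PRJ-12", "Specification", "write test scripts", 4, ""),
        ("2024-03-05", "PRJ-12", "Specification", "write test scripts", 6, ""),
        ("2024-03-06", "PRJ-12", "Execution", "run test scripts", 8, "re-test of defect 12"),
    ]

    hours_per_phase = defaultdict(int)
    for _date, _project, phase, _activity, hours, _comments in timesheet:
        hours_per_phase[phase] += hours

    for phase, hours in hours_per_phase.items():
        print(f"{phase:<15} {hours:>3} h")  # overview per TMap phase, as in figure 1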
These elementary metrics (hours, documents and defects) can be used to assess the productivity of the test process.
Note that this productivity should be seen in relation to the required effort and size of the test project. Example: in
the first ten hours of testing we may find more defects per hour than in 400 hours of further testing, simply because
the first defects are found more quickly than the last ones.
The following metrics regarding productivity can be derived from this elementary set:
- number of defects per hour (and per hour of test execution)
- number of test cases carried out per hour
- number of specified test scripts per hour (and per hour of test specification)
- number of defects per test script
- ratio of hours spent over the TMap phases.
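A small sketch of how these derived productivity metrics follow from the elementary set of hours, documents and defects; the input figures are assumptions chosen only to show the arithmetic.

    # Hypothetical elementary data collected during one test project.
    hours_total = 400          # all hours booked on the test process
    hours_execution = 150      # of which: test execution
    hours_specification = 120  # of which: test specification
    test_cases_run = 600
    test_scripts = 40
    defects_found = 90

    print(f"defects per hour                : {defects_found / hours_total:.2f}")
    print(f"defects per execution hour      : {defects_found / hours_execution:.2f}")
    print(f"test cases carried out per hour : {test_cases_run / hours_execution:.1f}")
    print(f"test scripts specified per hour : {test_scripts / hours_specification:.2f}")
    print(f"defects per test script         : {defects_found / test_scripts:.1f}")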
If the number of function points or the number of ‘kilo lines of code’ (KLOC) of the object under test is known, the
following numbers can be calculated:
- number of test hours per function point (or KLOC)
- number of defects per function point (or KLOC)
- number of test cases per function point (or KLOC).
For the test basis we can establish the following metrics:
- number of test hours per page of the test basis
- number of defects per page of the test basis
- number of test cases per page of the test basis
- number of pages of the test basis per function point.
When it is known how many defects occur in production during the first three months, the following metric can be
determined:
- Defect detection effectiveness of a test level: the number of defects found in a test level divided by the total number of defects present. This metric is also called the Defect Detection Percentage (DDP). In calculating the DDP, the following assumptions apply:
  - all the defects are included in the calculation
  - the weighted severity of the defects is not included in the calculation
  - after the first three months of the system being in production, barely any defects are present in the system.
The DDP can be calculated both per test level and overall. The DDP per test level is calculated by dividing the number
of found defects from the relevant test level by the sum of this number of found defects and the number of found
defects from the subsequent test level(s) and/or the first three months of production. The overall DDP is
calculated by dividing the total number of found defects (from all the test levels together) by the sum of this number
of found defects and the found defects from the first three months of production.
Example - DDP calculations

Test level               | Found defects
System test (ST)         | 100
Acceptance test (AT)     | 60
3 months of production   | 40

DDP ST (after the AT is carried out)    : 100 / (100 + 60) = 63%
DDP ST (after 3 months of production)   : 100 / (100 + 60 + 40) = 50%
DDP AT (after 3 months of production)   : 60 / (60 + 40) = 60%
DDP overall                             : (100 + 60) / (100 + 60 + 40) = 80%
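A minimal sketch in Python of the DDP calculations above, using the figures from the example; the function is ours, but the formula follows the definition given in the text.

    def ddp(defects_in_level, defects_found_later):
        # Defect Detection Percentage: defects found in a level divided by that
        # number plus the defects found in later levels and/or early production.
        return defects_in_level / (defects_in_level + defects_found_later)

    st, at, production = 100, 60, 40  # figures from the example above

    print(f"DDP ST (after the AT)         : {ddp(st, at):.1%}")               # 62.5%, rounded to 63% in the text
    print(f"DDP ST (after 3 months prod.) : {ddp(st, at + production):.1%}")  # 50.0%
    print(f"DDP AT (after 3 months prod.) : {ddp(at, production):.1%}")       # 60.0%
    print(f"DDP overall                   : {ddp(st + at, production):.1%}")  # 80.0%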
Some causes of a high or low DDP may be:
- High DDP:
  - the tests have been carried out very accurately
  - the system has not yet been used much
  - the subsequent test level was not carried out accurately.
- Low DDP:
  - the tests have not been carried out accurately
  - the test basis was not right, and consequently neither were the tests derived from it
  - the quality of the test object was poor (it contained too many defects to find them all within the time available)
  - the testing time has been shortened.
By recording the above-mentioned metrics, supplemented here and there with particular items, we arrive at the following
list of metrics.
METRICS LIST
The following (non-exhaustive) list mentions a number of commonly used metrics, which can be used as indicators for pronouncing on the quality of the object under test, or for measuring the quality of the test process and comparing it against the organisation’s standard. All the indicators can of course also be used in the report to the client (a small calculation sketch follows the list):
- Number of defects found - The number of defects found per unit of testing time.
- Executed instructions - Ratio between the number of tested program instructions and the total number of program instructions. Tools that can produce such metrics are available.
- Number of tests - Ratio between the number of tests and the size of the system (for example expressed in function points). This indicates how many tests are necessary in order to test a part.
- Number of tested paths - Ratio between the number of tested paths and the total number of logical paths present.
- Number of defects during production - This gives an indication of the number of defects not found during the test process.
- Defect detection effectiveness - The total number of defects found during testing, divided by the total number of defects (estimated partly on the basis of production data).
- Test costs - Ratio between the test costs and the total development costs. A prior definition of the various costs is essential.
- Cost per detected defect - Total test costs divided by the number of defects found.
- Budget utilisation - Ratio between the budget and the actual cost of testing.
- Test efficiency - The number of required tests versus the number of defects found.
- Degree of automation of testing - Ratio between the number of tests carried out manually and the number of tests carried out automatically.
- Number of defects found (relative) - The ratio between the number of defects found and the size of the system (in function points or KLOC) per unit of testing time.
- Defects as a result of modifications that are not tested - Defects caused by modifications that were not tested, as a proportion of the total number of defects arising as a result of changes.
- Defects after tested modifications - Defects caused by modifications that were tested, as a proportion of the total number of defects arising as a result of changes.
- Savings of the test - Indicates how much has been saved by carrying out the test. In other words, what would the losses have amounted to if the test had not been carried out?
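As a sketch of how a few of these indicators could be computed from figures a test manager typically already records, the snippet below derives the test-cost ratio, the cost per detected defect, the budget utilisation and the degree of test automation; all input figures are hypothetical.

    # Hypothetical project figures.
    test_costs = 80_000          # total cost of the test process
    development_costs = 400_000  # total development costs
    test_budget = 90_000
    defects_found = 120
    manual_tests = 300
    automated_tests = 200

    print(f"Test costs / development costs  : {test_costs / development_costs:.0%}")
    print(f"Cost per detected defect        : {test_costs / defects_found:.0f}")
    print(f"Actual test cost versus budget  : {test_costs / test_budget:.0%}")
    print(f"Manual tests per automated test : {manual_tests / automated_tests:.1f}")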