My testing blog

On software ... On testing ... On technology ... On software testing technology.

Monday, April 2, 2012

April Fools’ Day

The best April Fools’ Day joke is to make no joke: I started a brand new technical blog, focused on software development. This bold course of action will give new impetus to this blog as well.

Enjoy!

Saturday, June 12, 2010

Mouse Testing

I wrote in a previous article how important it is for exploratory testing to be able to try ideas quickly and to use the simplest environments, tools and languages. I also recommended PowerShell as the testing language for .NET.

In this article I introduce an alternative to testing with PowerShell. The alternative consists of running C# tests with only two right clicks of the mouse. In other words, mouse testing!

Ingredient No. 1: CS-Script

CS-Script is a simple but effective scripting engine that transforms C# into a scripting language, with all the associated advantages. Although this is not the place to describe CS-Script, a few details are worth mentioning:

  • Embedded dependencies: CS-Script allows C# files to specify their own dependencies. Goodbye project files, welcome incremental development.
  • Windows Explorer integration: CS-Script adds a special entry with the most common functionality to the contextual menu of Windows Explorer. This makes compiling a script as easy as a right click of the mouse. Moreover, the CS-Script sub-menu is customizable.
  • Conversion to Visual Studio projects: as the project represented by a script and all its dependencies grows, the user can convert it into a Visual Studio project. The original C# code remains unchanged.

The simplest way to see the power of CS-Script is to try it. Install CS-Script and save the following content into triangle.cs:

class Triangle
{
    public static bool Is(int a, int b, int c)
    {
        return a > 0 && b > 0 && c > 0 &&
               a < b + c &&
               b < a + c &&
               c < a + b;
    }
}

Go to Windows Explorer, right click triangle.cs and then select CS-Script | Compile to | DLL (Debug). CS-Script will produce triangle.dll and triangle.pdb.

Ingredient No. 2: NUnit

NUnit is the de facto standard for unit testing and, among its many qualities, it allows tests to be executed with a right click of the mouse. We shall rely on this feature to do “mouse testing”.

Before proceeding any further, CS-Script must be able to find the NUnit DLLs that the tests use. For our example, simply copy the .\bin\net-2.0\framework\nunit.framework.dll library from the installation directory of NUnit into the same directory as triangle.cs.

Writing and executing the tests

Create a file named triangle_t.cs in the same directory as triangle.cs with the following content:

//css_import triangle;

using NUnit.Framework;

[TestFixture]
class TriangleTest
{
    [Test]
    public void Fail000()
    {
        Assert.IsFalse(Triangle.Is(0, 0, 0));
    }
}

Right click triangle_t.cs in Windows Explorer and select CS-Script | Compile to | DLL (Debug). Then right click triangle_t.dll in Windows Explorer and select Run Tests. Windows Explorer will launch NUnit and from there you can execute the Fail000 test.

When two is too much

If issuing two right clicks to do testing feels like too much of a burden, then the whole thing (compilation and NUnit launch) can be carried out with just one right click of the mouse. This version requires a little more work but it’s a one-time effort.

Do the following steps:

  1. Within the directory of CS-Script, go to .\bin\Lib\ShellExtensions\CS-Script and create a directory named “28.Test with” there (use a higher number if 28 is already used).
  2. Within the “28.Test with” directory create a file named 00.NUnit.c.cmd with the following content:

    @echo off

    rem -------------------------------------------------
    rem WARNING! Change the path of NUnit accordingly
    set NUNIT_PATH=%ProgramFiles%\nunit 2.5.5\bin\net-2.0
    rem -------------------------------------------------

    set CSPATH=
    set CSNAME=
    rem Use the dp modifier so that CSPATH keeps the drive letter of the script
    for %%I in (%1) do (
        set CSPATH=%%~dpI
        set CSNAME=%%~nI
        )
    set DLLNAME=%CSPATH%%CSNAME%.dll

    @echo Copying NUnit libraries into "%CSPATH%" ...
    copy /y "%NUNIT_PATH%\framework\*.dll" "%CSPATH%"
    if ERRORLEVEL 1 goto Error
    copy /y "%NUNIT_PATH%\lib\*.dll" "%CSPATH%"
    if ERRORLEVEL 1 goto Error

    @echo Compiling %1 ...
    cscs /cd %1
    if ERRORLEVEL 1 goto Error

    @echo Launching NUnit with "%DLLNAME%" ...
    "%NUNIT_PATH%\nunit.exe" "%DLLNAME%"

    :Error
    pause

  3. Replace the path to NUnit with the directory where NUnit resides on your computer (see the WARNING comment in the script above).
  4. In Windows Explorer right click triangle_t.cs and then select CS-Script | Test with | NUnit. The underlying script will compile and run the tests under NUnit. Note: this command copies all the DLLs from the NUnit distribution into the directory of the script.

Conclusions

Software testing requires the ability to try new ideas quickly as the test engineer explores the program under test. Simple, straightforward tools are essential for such an endeavor.

With the aid of CS-Script and NUnit the software test engineer can compile and run .NET tests quickly and efficiently with just a few mouse clicks, without the need of a fully fledged development environment.

Tuesday, April 13, 2010

Model-Based Long Haul Testing

“Long-haul” is an important testing technique aiming to simulate product usage over an extended period of time. This kind of testing is necessary because some difficult bugs in software show up only under long-haul usage.

Model-based testing is an advanced testing methodology that promises to increase the productivity of the professional tester. I discussed some aspects of modeling and model-based testing in several previous articles.

Model-based testing implies that the test developer has less control over test scenarios in exchange for increased productivity. This may seem at odds with long-haul testing, which requires more control over the test runs.

Can we reconcile the two approaches? This article tries to give an answer.

What exactly is long haul testing?

Informally speaking, long haul testing is a repetition of selected functional scenarios. The goal is to emulate long-term usage – days, weeks or months – without having to wait days, weeks or months. Another goal is to detect special software bugs, such as memory leaks, that accumulate slowly.

Let us consider Microsoft Word users. People using documents heavily may open dozens of files a day within a single Word session. We may assume that some of these people keep Word open for days or even weeks in a row. This means that Word must be able to open and then close hundreds or even thousands of documents in a single session.

We can test this via automation: we write a test script that opens and then closes a document. Then we run the script 10000 times. If each round takes less than a second then we are done in less than three hours.

Something more complex – like open, edit a little and close? No problem. We write a little test script doing just that (open, edit a little, close), we run it 10000 times and voilà, we have long haul testing for open-edit a little-close.

It seems long haul is no big deal: we take a scenario, we run it many times, we check that the system under test is nice and happy. If it is, we’re done. If not, we file a bug.
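The naive approach described above can be sketched in a few lines. This is only an illustration: the NaiveLongHaul class is hypothetical, the scenario delegate stands for whatever automation opens and closes the document, and the memory check covers managed memory only (a leak in native code would need operating system counters instead).

```csharp
using System;

public static class NaiveLongHaul
{
    // Runs 'scenario' the given number of times and returns the growth
    // of managed memory in bytes (negative if memory shrank).
    public static long Run(int rounds, Action scenario)
    {
        long before = GC.GetTotalMemory(true);   // force a collection first
        for (int i = 0; i < rounds; i++)
        {
            scenario();
        }
        return GC.GetTotalMemory(true) - before;
    }
}
```

A call such as NaiveLongHaul.Run(10000, OpenAndCloseDocument), where OpenAndCloseDocument is your own automation method, repeats the scenario and gives a first hint about slow memory growth.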

However, this is not what happens in real life. It is true that people do open and close documents and it is true that they usually edit a little in between. However, it’s not the same kind of editing every time. Yet, the dumb repetition of the same scenario over and over again does just that: it repeats. Quite boring.

The question is: can we diversify the enacted scenarios to resemble as much as possible what happens in the actual usage extended over a long period of time?

Yes, we can, and this is where modeling comes into play.

Models as simplified behavior

A model of a software system may be seen as a simplification of the system’s behavior. Therefore, analyzing the behavior of a model means, in fact, analyzing the behavior of the actual system. Consequently, long haul testing may be seen as enforcing those model behaviors that last as long as possible, up to a limit imposed by the tester.

How this “long lasting” behavior gets represented depends heavily upon the model. For state-based models, this means finding long paths within the state graph – cycles are great for this purpose. For models written in a functional style, this means detecting recursions. For models represented as iterative executions, this means detecting loops.

This article deals with state-based models with NModel because this is the framework I’ve been talking about in my previous posts.

Long paths! How long?

When we are talking about long paths within the state graph of a model we’re actually talking about chains of states that do not end or that end as late as possible. It is impossible to process the whole state graph of a real life model since the number of states is virtually infinite. All we can do is try to avoid the end states as much as possible.

Avoiding the end

Fortunately, NModel lets us know whether a state is final or not. Unfortunately, we do not have any means to “think in reverse”, i.e. to go backwards from a final state towards the states that lead to it. In the absence of such “reverse thinking” we have to resort to probabilities.

Theoretically we can assign to any state S the probability of the event “the automaton reaches a final state when commencing from S”. However, computing this probability is virtually impossible. We have to approximate it - we’ll see below how.

Given that it’s hard to assess the states themselves since they come in enormous numbers, we have to replace them with state transitions (actions), whose number is finite. We used this technique with frequency-based testing.

Putting in buckets

Assuming we are talking about actions from this point on, we can approximate the above-mentioned probabilities by using a recursively defined set of distinct collections (“buckets” of actions):

  1. each action that leads to a final state at least once goes to Bucket1.
  2. if an action A gets followed by an action B from BucketN, then:
     2.1 if A doesn’t belong to any “bucket”, then it goes to Bucket[N+1].
     2.2 if A belongs to BucketM, then it’s moved (if necessary) to Bucket[min(M, N+1)].

Any action that remains outside the “buckets” may be considered as belonging to a “bucket” with a very high index, like Bucket[Count+1], where Count is the number of buckets. Obviously, the set of “buckets” must persist between test runs for the whole system to have any meaning.
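The rules above can be sketched as a small table that maps each action to its “bucket” index. This is a minimal illustration, not NModel code: the BucketTable class and its method names are hypothetical, actions are identified by name only, Count is taken to be the highest index in use, and persistence between test runs is left out.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of NModel: keeps the "bucket" index of each action.
public class BucketTable
{
    private readonly Dictionary<string, int> index = new Dictionary<string, int>();

    // Rule 1: an action that led to a final state goes to Bucket1.
    public void RecordFinal(string action)
    {
        index[action] = 1;
    }

    // Rule 2: action 'a' was followed by action 'b'.
    public void RecordFollowedBy(string a, string b)
    {
        int n;
        if (!index.TryGetValue(b, out n)) return; // 'b' is outside the buckets: nothing to learn

        int m;
        if (!index.TryGetValue(a, out m) || n + 1 < m)
        {
            index[a] = n + 1; // rule 2.1, or rule 2.2 when N+1 is smaller than M
        }
    }

    // Assumption: the number of buckets is the highest index in use.
    public int BucketCount
    {
        get
        {
            int max = 0;
            foreach (int n in index.Values)
            {
                if (n > max) max = n;
            }
            return max;
        }
    }

    // Actions outside the buckets count as belonging to the bucket after the last one.
    public int BucketOf(string action)
    {
        int n;
        return index.TryGetValue(action, out n) ? n : BucketCount + 1;
    }
}
```

Feeding the table with every observed pair of consecutive actions makes the indexes “bubble up” over repeated runs.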

When executed repeatedly, the transitions from one state to another make the actions “bubble up” from “bucket” to “bucket” towards Bucket1 until the whole system eventually stabilizes.

At that moment the “bucket” system tells us how probable it is for a certain action to lead to a final state: the lower the index of the “bucket”, the higher the probability that the action leads to a final state. Naturally, Bucket1 corresponds to the probability 1 of the event “may lead to a final state”.

In (Markov) chains

Not all the actions are equal within the same “bucket”. Some actions lead to “buckets” placed further away from Bucket1, other actions get closer to Bucket1 (but no closer than the “bucket” immediately next to them, otherwise they move to another “bucket” further up).

It is important to choose those actions that lead to “buckets” as far away as possible from Bucket1. How can we know upfront to what “bucket” a certain action will lead?

We cannot know that precisely since the actual states get constructed on the fly. Yet, we can keep an average target index obtained from averaging the indexes of all the “buckets” that the action has led to in the past. The greater this average target index is, the lower the chances that the action will lead to a final state, so the more eligible that action should become. This approach resembles the state machines with probabilistic transitions known as Markov chains (hence the title of the section).

Because the “bucket” of a certain action changes over time, it is not recommended to keep an average of all the target indexes from the very beginning but it’s better to use a formula that gives more weight to the newest occurrences and gradually “forgets” the oldest ones.

An appropriate formula is:

Avg_N = Avg_{N-1} + (Target_N - Avg_{N-1}) * K

where Target_N is the current index of the target “bucket”, Avg_{N-1} is the previous average index and Avg_N is the new average index. K is a number greater than 0 and smaller than 1.

By expanding the recursive formula above one can see that it’s actually a weighted sum of all the previous target indexes, the index observed i steps ago having the weight K*(1-K)^i. Because these weights form a geometric progression with a ratio smaller than 1, the older indexes gradually get “forgotten”. A value of K closer to 0 produces more stability but also more latency (more past values are relevant).
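A minimal sketch of this running average follows. The AverageTargetIndex class is hypothetical, and seeding the average with the first observed index is my assumption, since the formula itself does not say how the very first value is obtained.

```csharp
using System;

// Hypothetical helper: exponentially weighted average of target "bucket" indexes.
public class AverageTargetIndex
{
    private readonly double k;
    private bool hasValue;

    public double Value { get; private set; }

    public AverageTargetIndex(double k)
    {
        if (k <= 0.0 || k >= 1.0) throw new ArgumentOutOfRangeException("k");
        this.k = k;
    }

    public void Update(int targetIndex)
    {
        if (!hasValue)
        {
            Value = targetIndex;   // assumption: the first observation seeds the average
            hasValue = true;
        }
        else
        {
            Value = Value + (targetIndex - Value) * k;   // new = old + (target - old) * K
        }
    }
}
```

With K = 0.5, the observations 4, 2, 2 give the averages 4, 3 and 2.5: the first value fades by half at every step.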

Choosing wisely

The “bucket” system adorned with average target indexes makes choosing the next action an easy task: we scan the “buckets” from the highest index down towards Bucket1 until we reach a “bucket” that has at least one action that can follow the current action. From that “bucket” we choose the eligible action with the highest average target index.
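The selection rule can be sketched as follows. The ActionInfo and ActionChooser types are hypothetical. Scanning the “buckets” from the highest index downwards and then taking the best average is equivalent to picking, among the eligible actions, the one with the greatest (bucket, average) pair, which is what the loop below does.

```csharp
using System.Collections.Generic;

// Hypothetical record of what is known about one action.
public class ActionInfo
{
    public string Name;
    public int Bucket;             // index of the action's "bucket"
    public double AverageTarget;   // average target index of the action
}

public static class ActionChooser
{
    // 'eligible' holds the actions that can follow the current one.
    // Returns null when no action is eligible.
    public static ActionInfo Choose(IEnumerable<ActionInfo> eligible)
    {
        ActionInfo best = null;
        foreach (ActionInfo a in eligible)
        {
            if (best == null
                || a.Bucket > best.Bucket
                || (a.Bucket == best.Bucket && a.AverageTarget > best.AverageTarget))
            {
                best = a;
            }
        }
        return best;
    }
}
```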

Assuming the (“bucket index”, “average target index”) pair is a fairly good approximation of the chance to reach a final state from a given state, it follows that the final states get avoided without the cost of exploring the state space in its entirety.

The “bucket” system is not perfect, of course. The probability approximations given by the (“bucket index”, “average target index”) pairs are pretty coarse in the beginning so the first runs may not be particularly long.

Yet, as the runs repeat, the “bucket” system stabilizes and it yields longer and longer sequences – up to detecting and following infinite cycles within the state graph.

Avoiding boredom

The bucket system is efficient at avoiding final states – hence producing longer paths – yet it has a major drawback: it is completely ignorant of how often a certain action has been selected. Recall that the average target index of an action evaluates only how far away from the final states the action will lead and not how many times the action has been executed.

The result of this ignorance is that the system may get stuck within an infinite cycle without ever trying to escape because choosing the same action or group of actions over and over again doesn’t correlate with the probability of leading towards a final state. Moreover, the chance of such dull repetitions grows tremendously if the cycle at fault is in the proximity of the start state.

So, we must provide a mechanism to “spice up” the selection of actions so that the system under test doesn’t get “bored” from being exercised the same way for too long.

The next section suggests some ways to do it.

“Spicing up” test scenarios

There is more than one way to increase the variety of path selection and to produce more lively scenarios. We discuss several of them, from the simplest to the most complex.

Randomizing

The first method is to randomize. If two actions belong to the same “bucket” and have about the same average target index, then choose one randomly.

Pros: the method is simple.
Cons: it doesn’t really avoid infinite loops if all the eligible actions lead to such loops.
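A possible sketch of the random choice, for candidates that are already known to share a “bucket”. The RandomTieBreak class and the tolerance parameter (what “about the same” means) are assumptions made for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RandomTieBreak
{
    // Returns the position of a randomly chosen candidate among those whose
    // average target index is within 'tolerance' of the best one, or -1 if
    // there are no candidates.
    public static int Pick(IList<double> averages, double tolerance, Random random)
    {
        if (averages.Count == 0) return -1;

        double best = averages[0];
        for (int i = 1; i < averages.Count; i++)
        {
            if (averages[i] > best) best = averages[i];
        }

        List<int> nearBest = new List<int>();
        for (int i = 0; i < averages.Count; i++)
        {
            if (best - averages[i] <= tolerance) nearBest.Add(i);
        }

        return nearBest[random.Next(nearBest.Count)];
    }
}
```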

Adding frequencies to the “buckets”

The second method is to combine the average target index with the frequency computed according to the method shown in a previous post. We can do that in at least two ways:

  1. we choose first by average target index (rounded to integers) and then by frequency.
  2. we compute a number based on average target index and frequency and we choose based on that number. A simple way to compute that number is to divide the average target index by the frequency. It is not advisable to use a different operation, because both the average target index and the inverse of the frequency are akin to probability measures, whereas their combination is akin to intersecting probabilistic events.

The first way preserves the probability of avoiding a final state better. The second way preserves the chance to avoid boring infinite loops better.

Pros: the method preserves the general framework based on “buckets”.
Cons: if all the actions within a certain “bucket” lead to infinite cycles there’s still a chance of getting stuck in unproductive repetitions.
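Both ways of combining the two measures can be sketched as below. The FrequencyChoice class is hypothetical, the frequency is assumed to be a plain execution count, and treating a never executed action as having frequency 1 is my own choice to avoid a division by zero.

```csharp
using System;

public static class FrequencyChoice
{
    // First way: compare the rounded averages, then prefer the lower frequency.
    // Returns a positive number when action A should be preferred over action B.
    public static int CompareFirstWay(double avgA, int freqA, double avgB, int freqB)
    {
        int byAverage = Math.Round(avgA).CompareTo(Math.Round(avgB));
        if (byAverage != 0) return byAverage;
        return freqB.CompareTo(freqA);   // the less frequent action ranks higher
    }

    // Second way: a single score, the average target index divided by the frequency.
    public static double Score(double averageTarget, int frequency)
    {
        return averageTarget / Math.Max(frequency, 1);   // assumption: 0 counts as 1
    }
}
```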

Combining “buckets”, target indexes and frequencies together

The third method consists of maintaining a value f(bucket_index, average_target_index, frequency) for each action and choosing the eligible action with the highest value of f.

That function f must have the following properties:

  • it must increase as the bucket index increases.
  • it must increase as the average target index increases.
  • it must decrease as the frequency increases.

Here are some possible forms for f:

  1. linear: f(Bucket, Avg, Freq) = A*Bucket + B*Avg - C*Freq
  2. geometric: f(Bucket, Avg, Freq) = (A*Bucket*Avg) / (B*Freq)
  3. exponential: f(Bucket, Avg, Freq) = (Bucket*Avg)^(A/Freq)
  4. inverse-exponential: f(Bucket, Avg, Freq) = (Bucket*Avg)^(1 - Freq/M)
  5. rational: f(Bucket, Avg, Freq) = A*Bucket*Avg*(1 - 1/(M - Freq))

A, B and C are positive coefficients. M is a positive number larger than max(Freq)+1. We can keep M large enough by initially choosing an arbitrary value and then increasing it whenever a frequency surpasses it.

It should be noted that the f number should decrease smoothly with the frequency, otherwise the chains of states get curtailed too early. Unfortunately, only the last two formulas from above satisfy this condition.

Pros: the method ensures that testing is not stuck in infinite cycles since each cycle “erodes” over time.
Cons: choosing an appropriate f function for a given state machine may not be easy or even possible.
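The five candidate forms can be written down as plain functions. This is a sketch under my reading of the formulas: the exponential forms are taken as (Bucket*Avg) raised to A/Freq and to 1 - Freq/M, the Scoring class is hypothetical, and the coefficients are passed in as parameters.

```csharp
using System;

public static class Scoring
{
    public static double Linear(double bucket, double avg, double freq,
                                double a, double b, double c)
    {
        return a * bucket + b * avg - c * freq;
    }

    // freq must be greater than 0
    public static double Geometric(double bucket, double avg, double freq,
                                   double a, double b)
    {
        return (a * bucket * avg) / (b * freq);
    }

    // freq must be greater than 0; decreases with freq only when bucket * avg > 1
    public static double Exponential(double bucket, double avg, double freq, double a)
    {
        return Math.Pow(bucket * avg, a / freq);
    }

    // m must be larger than the largest frequency plus 1
    public static double InverseExponential(double bucket, double avg, double freq, double m)
    {
        return Math.Pow(bucket * avg, 1.0 - freq / m);
    }

    // m must be larger than the largest frequency plus 1
    public static double Rational(double bucket, double avg, double freq,
                                  double a, double m)
    {
        return a * bucket * avg * (1.0 - 1.0 / (m - freq));
    }
}
```

Tabulating each function for Freq = 1, 2, 3, ... with your own coefficients is a quick way to see how early a given form starts to “erode” a cycle.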

The dangers of getting too high

The previous section shows how we can use action frequencies to diversify the long-haul scenarios by “eroding” cycles that get exercised too much. Also, the “erosion” may go very smoothly initially, thus protecting the cycles from getting curtailed too early.

Using unlimited action frequencies has a drawback, though: the impact of a single change decreases over time as the value of the frequency gets higher and higher. For this reason it is better to limit the frequencies. The simplest way is to fix the M coefficient and, whenever some frequency reaches M-1, to divide all the frequencies by a value greater than 1.
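A sketch of such a capped frequency table follows. The FrequencyTable class is hypothetical, and the use of an integer divisor (with integer division) is a simplification of mine.

```csharp
using System;
using System.Collections.Generic;

public class FrequencyTable
{
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();
    private readonly int m;         // the fixed limit M
    private readonly int divisor;   // a value greater than 1

    public FrequencyTable(int m, int divisor)
    {
        if (m < 3) throw new ArgumentOutOfRangeException("m");
        if (divisor < 2) throw new ArgumentOutOfRangeException("divisor");
        this.m = m;
        this.divisor = divisor;
    }

    // Frequency of an action; 0 if the action was never executed.
    public int this[string action]
    {
        get
        {
            int n;
            counts.TryGetValue(action, out n);
            return n;
        }
    }

    public void Increment(string action)
    {
        int n = this[action] + 1;
        counts[action] = n;
        if (n >= m - 1) Rescale();   // some frequency reached M-1
    }

    private void Rescale()
    {
        // Iterate over a copy of the keys because the dictionary is modified.
        foreach (string key in new List<string>(counts.Keys))
        {
            counts[key] = counts[key] / divisor;   // integer division
        }
    }
}
```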

Conclusions

Long haul testing is an important part of quality assurance because it simulates usage over a long period of time and it uncovers software errors hard to detect by other means.

Long haul testing and model-based testing seem to be at odds because long haul testing requires more control over the test runs on the part of the tester whereas model-based testing implies less control over scenario generation.

This article proposes a method to reconcile long haul testing with model-based testing by using a system based on probabilistic classes named “buckets” combined with frequency considerations to preserve the variety of test scenarios during the long runs.

Sunday, March 28, 2010

Testing Modal Forms with NModel

The testing of graphical user interfaces is one of the most sensitive and – I dare say – unpleasant aspects of software testing.

While testing GUI manually provides instant gratification to the professional tester – who can explore and try new ideas right on the spot – it certainly produces headaches to project managers: GUI testing is one of the slowest, sometimes boring, almost always time consuming and difficult to automate areas of quality assurance.

In other words, not something to brag about.

The NModel tool provided by Microsoft Research, which I wrote about in my previous post, was not created with GUI testing in mind. Still, it may be used for testing graphical user interfaces, provided that the test developer has a good GUI automation library and a good understanding of modeling basics.

In this article I address the problem of testing modal forms with NModel. The testing of modal dialog boxes is the simplest kind of GUI testing. Future articles will address other areas of GUI testing with NModel.

The truth about dialogs

A modal window (or modal form or modal dialog window or modal dialog box) is a graphical element in a window-based computer interface that serves as a data gateway from the user to the computer.

Dialog boxes have the following characteristics:

  • they have clear creation and destruction times.
  • the hosting application freezes during the lifetime of a modal dialog box.
  • dialog boxes have one purpose only: to make the user give information.
  • a dialog box has two possible outcomes: either the user provides the information or the user declines to give the information. The OK and Cancel buttons usually fill these roles.

Besides these main behavioral characteristics, modal dialog boxes may also:

  • provide data validation for some or all the controls within.
  • provide final data validation upon data approval (when the user presses OK).
  • enable and/or disable some controls.
  • provide correlation between controls (for example, a list box may be dynamically populated based on some other value).
  • accept or refuse resizing. Resizing brings the issue of control migration/anchoring/resize as well as the issue of text representation.
  • contain sub-variants, i.e. sub-dialog boxes hosted by the same dialog box. It is the case of tabbed dialog boxes.
  • exhibit asynchronous elements like timers and progress bars.

Given all these characteristics, modal dialog boxes are an ideal candidate for state-based modeling and model-based automated testing:

  • the start and end moments of a dialog box can be easily modeled with state variables.
  • there is usually only one event happening at a time (the exceptions are the asynchronous elements, which are quite rare). This makes a state-based representation easy.
  • the controls are usually (but not always) independent from each other, making the state space of a dialog box quite large. The size of the state space makes thorough testing a daunting task especially when some controls are not fully independent from each other.
  • once thoroughly tested, a modal dialog box can be modeled from that point on simply as a function returning a tuple of values.
  • automated model-based testing may produce race conditions that are nearly impossible to reproduce manually - yet revealing profound design defects in event handlers.
  • because NModel takes care of computing the actual state machine of a model, modeling a dialog box is as easy as matching all the controls in the dialog box with a corresponding variable-action pair.

Alpha and Omega

The first thing to do when modeling a modal dialog box is to represent its beginning and its end. It’s easy to do this in NModel:

  • provide two state variables in the model, the first one telling whether the dialog box has been created and the second one telling whether the dialog box has been closed:

    static bool open = false;
    static bool closed = false;
  • create an action named Open that “opens” the dialog box:

    static bool OpenEnabled()
    {
        return !open && !closed; // the dialog has never existed
    }

    [Action]
    static void Open()
    {
        // pretend here to “open” the dialog; it is enough to reset the state variables, although NModel does it
        open = true; // do not forget to set “open” to ‘true’!
    }
  • create an action named Close (it may be replaced by an OK - Cancel couple) that “closes” the dialog box:

    static bool CloseEnabled()
    {
        return open;
    }

    [Action]
    static void Close()
    {
        // pretend here to “close” the dialog; it is enough to validate data if Close() represents ‘exit with valid data’
        open = false; // do not forget to set “open” to ‘false’ !
        closed = true; // do not forget to set “closed” to ‘true’ !
    }

  • prefix the enabling condition of any action with an AND-ed open so that no action takes place before the dialog box gets created. Assuming our action is named Act() and its enabling condition is provided by function MyCond, replace:

    static bool ActEnabled()
    {
        return MyCond();
    }

    with:

    static bool ActEnabled()
    {
        return open && MyCond();
    }

  • make sure that only Open and Close change open and closed.

Meat on the bones

Once the start and end events have been modeled, we may proceed with modeling the rest of the dialog’s functionality.

There’s no single way to do it, yet some guidelines are good to follow:

  • allow one state variable per independent or partially independent control.

    Independent controls may be acted upon by the user independently from other controls. Partially independent controls differ from totally independent controls by the fact that their value may depend on other controls.

    You may ignore the controls which are totally dependent on other controls unless you desire to test their values.
  • provide one state action for each independent or partially independent control. Have the action accept an argument of the same type as the data in the control. Within the action, set the value of the control and change the value of any other control that depends upon the current one.
  • make sure that no action can be called before the dialog box has been “created” (see the section Alpha and Omega above).
  • it is good practice to provide a validation method AssertValid and to call it at the end of each action. This precaution makes sure there are no nonsensical transitions between states. The AssertValid method should assert on open and closed:

    // requires: using System.Diagnostics;
    [Conditional("DEBUG")]
    static void AssertValid()
    {
        Debug.Assert(!open || !closed); // the dialog is never both open and closed
        // … other assertions …
    }
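To make the guidelines more concrete, here is a sketch of a model for an imaginary dialog box with a check box and a text box that depends on it. Everything in it is hypothetical (the dialog, the ProxyDialogModel class and its members), and it is written as plain C# so that it runs without NModel. In an actual NModel model program the action methods would carry the [Action] attribute.

```csharp
using System.Diagnostics;

public static class ProxyDialogModel
{
    static bool open = false;
    static bool closed = false;
    static bool useProxy = false;      // independent control (check box)
    static string proxyAddress = "";   // partially independent control (text box)

    public static bool OpenEnabled() { return !open && !closed; }
    public static void Open() { open = true; AssertValid(); }

    public static bool CloseEnabled() { return open; }
    public static void Close() { open = false; closed = true; AssertValid(); }

    public static bool SetUseProxyEnabled() { return open; }
    public static void SetUseProxy(bool value)
    {
        useProxy = value;
        if (!useProxy) proxyAddress = "";   // the dependent control changes as well
        AssertValid();
    }

    // The text box can be edited only while the check box is ticked.
    public static bool SetProxyAddressEnabled() { return open && useProxy; }
    public static void SetProxyAddress(string value)
    {
        proxyAddress = value;
        AssertValid();
    }

    public static string ProxyAddress { get { return proxyAddress; } }

    [Conditional("DEBUG")]
    static void AssertValid()
    {
        Debug.Assert(!open || !closed);
        Debug.Assert(useProxy || proxyAddress == "");
    }
}
```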

When complexity unfolds

A dialog box may contain tabs, which increase the complexity of the form: the user may navigate freely between tabs. This ability to go back and forth as desired adds a potentially infinite chain of state clusters (each cluster corresponding to a tab being selected), which in turn makes the state machine of a tabbed dialog box very complex.

So, it seems that dialog boxes with tabs cannot be modeled as shown above.

Fortunately, there is a way out. The secret is that there is nothing miraculous about tabs: they usually exist for graphical convenience when there are too many controls to host within a form. Otherwise they are like little dialog boxes on their own.

So, the simplest method to model tabbed dialog boxes is to create a model for each tab in isolation and then to create an all-encompassing model for the entire form at the end.

The final model doesn’t have to contain all the internals of the tab models. On the contrary, each tab may be abstracted away as a function returning a tuple of values or – in NModel parlance – as an action receiving as arguments the values corresponding to the controls within the tab. In fact, the final model emulates the navigation from one tab to another and not much more.
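As an illustration, the all-encompassing model of an imaginary dialog box with two tabs could look like the sketch below. The tabs, the controls and the TabbedDialogModel class are all hypothetical, and the code is plain C# that runs without NModel. Each tab is reduced to one action that receives the values of its controls, and the remaining actions only navigate between tabs.

```csharp
public static class TabbedDialogModel
{
    static bool open = false;
    static bool closed = false;
    static int currentTab = 0;     // 0 = "General", 1 = "Advanced"
    static string name = "";       // value produced by the "General" tab
    static bool verbose = false;   // values produced by the "Advanced" tab
    static int retries = 0;

    public static bool OpenEnabled() { return !open && !closed; }
    public static void Open() { open = true; currentTab = 0; }

    public static bool CloseEnabled() { return open; }
    public static void Close() { open = false; closed = true; }

    // Navigation between tabs.
    public static bool SelectTabEnabled(int tab)
    {
        return open && (tab == 0 || tab == 1) && tab != currentTab;
    }
    public static void SelectTab(int tab) { currentTab = tab; }

    // Each tab is abstracted as a single action taking the values of its controls.
    public static bool FillGeneralEnabled() { return open && currentTab == 0; }
    public static void FillGeneral(string newName) { name = newName; }

    public static bool FillAdvancedEnabled() { return open && currentTab == 1; }
    public static void FillAdvanced(bool newVerbose, int newRetries)
    {
        verbose = newVerbose;
        retries = newRetries;
    }

    // The tuple of values the whole dialog box would return.
    public static string Snapshot()
    {
        return name + "|" + verbose + "|" + retries;
    }
}
```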

The good test driver

As shown in my previous post, NModel needs an interface between the model and the system under test in order to do testing. This interface is named the test driver and it must implement the IStepper interface:

// code extracted from the “NModel book”
namespace NModel.Conformance
{
    public interface IStepper
    {
        void Reset();
        CompoundTerm DoAction(CompoundTerm term);
    }
}

The DoAction method does all the work upon the system under test whereas term contains all the information necessary for the action. Reset brings the system under test to the initial state.

Acting upon the system under test means sending mouse clicks and keystrokes to the modal dialog box. It is very important to do it this way instead of calling the form’s methods directly, since it’s necessary to emulate the actual user actions as closely as possible.

So far, so good. Provided the test developer has a good UI automation library at hand, the task seems over.

However, because emulating UI actions may be tricky, it is highly recommended to create yet another interface: one between the test driver and the system under test. This new interface sits in front of the system under test while exposing part of its functionality. I call it the front.

The front

The role of the front is to provide support for acting upon the system under test completely outside the NModel framework.

Such a separation is necessary in order to be able to test the GUI automation workings of each action in isolation, under the control of the test developer (by using NUnit, for instance). As a bonus, these tests make an excellent battery of smoke tests (or BVTs, build verification tests) for the dialog box.

Let us assume that the Act action from above accepts a string argument. In that case, the front class and the test driver class should contain something like this:

public static class SUTFront
{
    public static void Open() { /* open the dialog */ }
    public static void Close() { /* close the dialog */ }
    public static void Act(string s) { /* do something with ‘s’ */ }
}

class SUTDriver : IStepper
{
    public void Reset() { /* usually call SUTFront.Close() here */ }
    public CompoundTerm DoAction(CompoundTerm term)
    {
        switch (term.Name)
        {
            case "Open": SUTFront.Open(); break;
            case "Close": SUTFront.Close(); break;
            case "Act":
                string s = term[0] as string;
                Debug.Assert(s != null); /* only if ‘s’ may not be ‘null’ */
                SUTFront.Act(s);
                break;
            …
        }
        …
    }
    …
}

So, with this approach, the actual test driver is SUTFront, while SUTDriver acts only as a dispatcher. The benefit is that SUTFront can be used for other kinds of testing, apart from NModel.

It is strongly recommended to test SUTFront thoroughly before integrating it into SUTDriver. Programmatic GUI automation is not as trivial as it seems. The next section shows some of the glitches.

Traps and pitfalls in programmatic GUI automation

Programmatic GUI automation is difficult because no matter how good a GUI automation library is, ultimately an automation library is not the same as a human user – neither in speed nor in intelligence.

These differences raise some specific challenges.

Trap no. 1: neither human nor non-human

On the one hand, we want the emulated behavior to resemble human actions as much as possible. On the other hand, we want it to differ so that we can take advantage of the automatic nature of our tools in order to reveal more defects.

Let us consider mouse clicks, for example. Once we issue a mouse click programmatically, how long should we wait before issuing another event?

If we wait longer, then we get closer to the slow motions of a human user – but we might not be able to reproduce important race conditions. If we wait less, then we might be able to catch legitimate bugs from race conditions but we may also raise a lot of false positives - because it takes a non-zero time until the dialog box transitions from one valid state to another valid state.

Where’s the line between the two?

The truth is, there is no general answer; it depends on each case. I personally favor closeness to human behavior even though some race conditions might escape uncaught.

Trap no. 2: robotic take-over

As efficient as model-based testing is, it is dangerous to rely solely on it for GUI testing. In fact, it is dangerous to rely exclusively on any kind of automated GUI testing.

Human check-up is necessary, either because automating is sometimes too costly or because automation is plainly not possible. For example, no GUI automation library can do verification of text meaning or usability verification. Such things require human intervention or Artificial Intelligence techniques that are beyond today’s state of the art.

Trap no. 3: model luring

Models are addictive. They are pretty good at luring the IT professional into believing that gain with no sweat lies right around the corner. Hence, the danger of building a model for any kind of problem, no matter how trivial, is non-negligible.

The test developer should strive to keep things simple. If a dialog box has only a few controls, if the controls are independent of each other and there is little dynamic behavior involved, then modeling is most likely inappropriate. Writing a plain vanilla test suite that simply exercises the form is a better choice in such cases.

We do have hammers, too. It doesn’t mean that everything should look like a nail.

Trap no. 4: Chinese speaking

In my native tongue, the saying “you speak Chinese to me” means “I don’t understand anything from what you are saying”. Without care, adopting model-based techniques may lead to a very unproductive “Chinese” way of “speaking”.

The test developer using models should not forget that the outside world cares nothing about them. Our colleagues and managers want results with little concern for the method used. They want the results in their terms, not ours – and it is our duty to provide the translation - otherwise we become “Chinese speakers”.

This means that, in the end, we must provide tests – and other artifacts – that can be executed and/or used by anyone, outside the realm of modeling.

Conclusions

The GUI testing of modal dialog boxes can be accomplished with model-based techniques provided that the test developer has good-quality GUI automation libraries and understands the basics of modeling.

This article shows how NModel can be used for GUI testing of modal forms, along with the traps and pitfalls that one may face when tackling GUI testing with the aid of models.

Monday, March 8, 2010

Frequency-Based Test Strategies with NModel

As I have shown in my previous post, NModel is a viable framework for model-based testing and analysis of software programs. In this article I explore various possibilities to develop test strategies that can be implemented for the ct.exe tool of NModel.

How testing works with NModel

The idea behind testing with NModel is very simple: the test developer constructs a model of the system under test and then constructs a test driver which translates the state transitions into actual commands upon the system under test. NModel is capable of traversing the state space of the model while issuing the calls upon the system under test via the test driver.

If the state space is small and the total number of paths throughout the state space is manageable, then it matters less how the paths get generated since NModel will eventually generate all the possible paths (i.e. test scenarios).

Usually the state space is extremely large so, in real life, it matters a lot how the paths get generated. The NModel element that controls how the paths get generated is called a test strategy. NModel comes with one strategy out of the box but the test developer can build other strategies on his own.

The IStrategy interface

This interface forms the basis for any NModel test strategy. It has the following declaration in C#:

// code extracted from the “NModel book”
namespace NModel.Conformance
{
    public interface IStrategy
    {
        Set<Symbol> ActionSymbols { get; }
        IState      CurrentState { get; }
        bool        IsInAcceptingState { get; }
        void        Reset();
        bool        IsActionEnabled(Action action, out string reason);
        Action      SelectAction(Set<Symbol> actionSymbols);
        void        DoAction(Action action);
    }
}

The methods of IStrategy are self-explanatory (for information on these methods see the “NModel book” at page 201 and following). The most important method of the interface is SelectAction(Set<Symbol>) since this method dictates how the ct.exe tool chooses actions during continuous testing.

The Strategy class

This class is the “standard” strategy offered by NModel. When selecting an action it simply makes a random choice. While this strategy is simple and ensures pretty good state coverage, it may be inefficient: nothing prevents a state from being selected over and over again. Circumventing this disadvantage requires recording the visited states one way or another.

The following sections explore possible ways to do that recording.

A dead path: storing the state paths

The simplest way to avoid repeating the same states over and over again is to check the current state path (which is the chain of states starting with the initial state and ending with the current state) against previously generated state paths. Tempting as it is, such a strategy fails for any non-trivial program: while the number of states may be large, the number of state paths is usually huge.
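To get a feeling for how quickly state paths outnumber states, here is a minimal sketch in plain C#. It does not use NModel at all: the PathCounter class and the successor table are hypothetical, and states are reduced to integers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of NModel: states are integers and the
// model is given as a table listing the successors of each state.
public static class PathCounter
{
    // Counts the state paths that start in `state` and make exactly `steps` transitions.
    public static long CountPaths(IDictionary<int, int[]> successors, int state, int steps)
    {
        if (steps == 0) return 1;   // the path made of `state` alone
        long total = 0;
        foreach (int next in successors[state])
            total += CountPaths(successors, next, steps - 1);
        return total;
    }
}
```

For a toy model with only three states, each reachable from any other (and from itself), there are already 3^10 = 59049 distinct paths of ten transitions, which is what makes storing paths impractical.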

Numeric escape: using frequencies

Storing entire state paths means, in fact, storing too much information. If we settle for less when it comes to the assurance of not duplicating the paths, then it’s possible to store less information while still keeping a high chance of path non-duplication.

Any state path is nothing else than an ordered collection of states. Obviously we want to generate different, new collections with each step without keeping the whole collection. How can we do that?

The idea is to replace the collection of states with another piece of information which maintains – at least partially – the identity of the collection without storing its elements. There is more than one solution to this problem, yet a very simple one is based on frequencies: if we choose the state that hasn’t occurred much lately, then there is a high chance that we do not repeat a previously generated state path.

Frequencies have two big advantages: they are fixed in size and they are very easy to update. State frequencies can be considered in various ways. The next sections reveal those ways.

Brute state frequencies

Since the state path is made of states, it is only natural to consider brute state frequencies first. This means that the method SelectAction(Set<Symbol>) selects the action leading to the least frequent state. Upon encountering more than one state of the lowest frequency the strategy may choose randomly, hence increasing the chance of non-duplication.

This solution looks better than keeping whole paths yet the problem is not solved: the number of states may still be very large, virtually infinite. Maintaining a dictionary of state-frequency pairs simply does not scale for real-life cases.

This problem has a solution if we think that a state doesn’t come from nowhere. A state occurs from one of the following sources:

  1. it is specified as an initial state of the abstract state machine.
  2. it is obtained from another state by applying an action upon the abstract state machine.

So, for any given state S, there is a correspondence between the next state and the action applied upon the state machine while being in state S. This correspondence leads to another use of frequencies which has the merit of scaling to any number of states - hence for any abstract state machine – despite being less precise than state frequencies.

Action frequencies

The idea is simple: method SelectAction(Set<Symbol>) selects the least frequent action from the list of eligible actions. Upon encountering more than one action of the lowest frequency the strategy may choose randomly - hence increasing the chance of non-duplication.

Unlike state frequencies, this method does scale since the number of actions is fixed for any given model. So, space consumption is O(1).
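The selection rule can be sketched in a few lines of plain C#. This is only an illustration under simplifying assumptions: ActionFrequencySelector is a hypothetical class, actions are represented by their names, and nothing here comes from the NModel library. In a real strategy, this logic would presumably sit behind SelectAction(Set<Symbol>).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, independent of NModel: actions are plain strings here.
public class ActionFrequencySelector
{
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();
    private readonly Random random;

    public ActionFrequencySelector(int seed) { random = new Random(seed); }

    // Returns the least frequently chosen action among the enabled ones,
    // breaking ties randomly, and records the choice.
    public string Select(IList<string> enabledActions)
    {
        if (enabledActions == null || enabledActions.Count == 0)
            throw new ArgumentException("No enabled actions.", nameof(enabledActions));

        int min = enabledActions.Min(a => CountOf(a));
        List<string> candidates = enabledActions.Where(a => CountOf(a) == min).ToList();
        string chosen = candidates[random.Next(candidates.Count)];
        counts[chosen] = CountOf(chosen) + 1;
        return chosen;
    }

    // Number of times the given action has been selected so far.
    public int CountOf(string action)
    {
        int n;
        return counts.TryGetValue(action, out n) ? n : 0;
    }
}
```

The dictionary holds one counter per action, so its size is bounded by the number of actions in the model no matter how long the test run lasts.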

Balancing actions against states

Choosing the next action based on action frequencies is simple and practical but it has a major drawback: not all actions are created equal. The number of states they generate varies greatly, especially since, in the case of NModel, action methods accept arguments; hence there is a very wide range of changes that each action is able to perform upon the state machine.

To make things clearer, let us consider two actions A and B, both applicable upon the same state S. Let us assume that A generates 100 states and B generates one state only. Let us assume that any time we choose an action we increment the appropriate frequency number by 1.

It is quite obvious that this policy favors B over A by a ratio of 100 to 1, since the single state generated by B is 100 times more likely to be selected than any one state generated by A. So, differentiating between A and B in terms of their chances of being selected is a must. The simplest way to do it is to take into account the number of states that each action generates.

Unfortunately, it is nearly impossible to accurately predict the number of states generated by an action because:

  1. the number of states generated by an action may depend upon the previous states.
  2. action methods may have parameters and the number of generated states may depend on those parameters in ways that make any prediction very hard to make.
  3. the states are not equally important. For instance, out of the 100 states generated by action A, most likely the number of truly interesting states is smaller (say, 20), all the other states being variants of the interesting cases.

So, instead of using the exact number of states generated by each action, we may use an action weight – representing an estimate of the number of relevant states that each action produces.

With this weight taken into account, using frequencies is easy: each time an action gets selected, its frequency number gets incremented by 1/weight instead of 1. Obviously, the frequency number can no longer be an integer.
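A minimal sketch of the weighted variant follows. The WeightedFrequencySelector class is hypothetical and independent of NModel; the weights are assumed to be estimates supplied by the test developer, and ties are resolved by list order only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, independent of NModel: actions are plain strings and
// the weights are estimates supplied by the test developer.
public class WeightedFrequencySelector
{
    private readonly IDictionary<string, double> weights;
    private readonly Dictionary<string, double> frequencies = new Dictionary<string, double>();

    public WeightedFrequencySelector(IDictionary<string, double> weights)
    {
        this.weights = weights;
    }

    // Picks the enabled action with the lowest weighted frequency
    // (the first one in list order on a tie) and adds 1/weight to its frequency.
    public string Select(IList<string> enabledActions)
    {
        string chosen = enabledActions.OrderBy(a => FrequencyOf(a)).First();
        frequencies[chosen] = FrequencyOf(chosen) + 1.0 / weights[chosen];
        return chosen;
    }

    // Weighted frequency accumulated so far (0 for an action never selected).
    public double FrequencyOf(string action)
    {
        double f;
        return frequencies.TryGetValue(action, out f) ? f : 0.0;
    }
}
```

With a weight of 4 for A and 1 for B, action A ends up being selected four times as often as B, which is the intended compensation for A generating more relevant states.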

Some recollection: using pairs

The method of using action frequencies is simple and practical, yet it ignores the previous history of actions completely. We don’t want to record the whole previous history (otherwise we end up with the full path problem from above), but we might need to recall the last state and/or action and include it in the computation of frequencies.

In other words we might want to use frequencies of pairs. Pairs of what? The next sections will tell.

Frequencies of state-action pairs

We can keep frequencies of state-action pairs instead of action frequencies. However, because the number of states that a certain action can act upon may be large, we can do the following:

  1. compute H(S) where S is the state and H(S) is a numeric hash value for state S.
  2. replace state S by the value H(S) % K (remainder of the integer division H(S)/K) where K is the maximum number of distinct states that we want to keep in consideration.

With this method, space consumption is something like O(K) which is still O(1) for a constant K.
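The two steps above can be sketched as follows. StateActionFrequencies is a hypothetical class that does not depend on NModel; it assumes the caller can provide some integer hash of the state, for example the result of GetHashCode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, independent of NModel: the caller supplies a hash of
// the state (for example the result of GetHashCode) and the action name.
public class StateActionFrequencies
{
    private readonly int bucketCount;   // K in the text
    private readonly Dictionary<Tuple<int, string>, int> counts =
        new Dictionary<Tuple<int, string>, int>();

    public StateActionFrequencies(int bucketCount)
    {
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException(nameof(bucketCount));
        this.bucketCount = bucketCount;
    }

    // H(S) % K, computed on the unsigned value because C# hash codes may be
    // negative and the % operator would then return a negative remainder.
    public int Bucket(int stateHash)
    {
        return (int)(unchecked((uint)stateHash) % (uint)bucketCount);
    }

    // Increments the frequency of the (state bucket, action) pair.
    public void Record(int stateHash, string action)
    {
        var key = Tuple.Create(Bucket(stateHash), action);
        counts[key] = CountOf(stateHash, action) + 1;
    }

    // Frequency of the (state bucket, action) pair so far.
    public int CountOf(int stateHash, string action)
    {
        int n;
        return counts.TryGetValue(Tuple.Create(Bucket(stateHash), action), out n) ? n : 0;
    }
}
```

States whose hashes fall into the same bucket share a counter. That loss of precision is the price paid for keeping the table bounded by K times the number of actions.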

Frequencies of action-action pairs

If using the threshold K seems arbitrary, then keeping frequencies of action-action pairs is another option. This means that the strategy keeps the frequencies of (A1, A2) pairs where A1 and A2 are actions that may occur in natural succession during continuous testing.

With this method, space consumption is something like O(N²) which is still O(1) since N (the number of actions) is constant.
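As an illustration, here is a minimal sketch of a selector driven by action-action pairs. ActionPairSelector is a hypothetical class, actions are plain strings, and the empty string stands for "no previous action"; none of this is NModel API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, independent of NModel: actions are plain strings.
public class ActionPairSelector
{
    private readonly Dictionary<Tuple<string, string>, int> pairCounts =
        new Dictionary<Tuple<string, string>, int>();
    private string previous = "";   // empty string: no action has run yet

    // Picks the enabled action that has followed the previous action least often
    // (the first one in list order on a tie) and records the new pair.
    public string Select(IList<string> enabledActions)
    {
        string chosen = enabledActions.OrderBy(a => CountOf(previous, a)).First();
        pairCounts[Tuple.Create(previous, chosen)] = CountOf(previous, chosen) + 1;
        previous = chosen;
        return chosen;
    }

    // Number of times `second` has been selected right after `first`.
    public int CountOf(string first, string second)
    {
        int n;
        return pairCounts.TryGetValue(Tuple.Create(first, second), out n) ? n : 0;
    }
}
```

Compared with plain action frequencies, the selector now tends to vary the order in which actions follow each other, not just how often each one is called.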

Added benefit: knowing how you’re doing and when to stop

Using frequency-based test strategies with NModel has the added benefit that we know all the time how many times each action has been called. One can use this information to develop various test metrics and policies to stop testing or evaluate the quality of testing:

  • the average frequency of action calls gives a clue on how well the implementation has been tested. For example, if we have a test run with an average of 10 calls per action and another test run with an average of 10000 calls per action, we can assume that the latter run has been 1000 times more thorough than the former.
  • individual call frequencies may tell when to stop. For instance, one may choose to stop when each action has been called at least 10000 times, or when the average frequency is 5000 with a lower limit of 1000 calls per action.
  • one can get a clue in real time about the dynamics of continuous testing by following the evolution of action frequencies over time. For example, if a certain action lags behind its peers then it is obvious that a certain part of the state space gets under-tested. In such a case, refining or replacing the test strategy is necessary.
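A stopping policy of the kind described in the list above can be sketched as follows. StopPolicy and its parameters are hypothetical names; the thresholds are whatever the test developer decides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, independent of NModel: call counts are keyed by action name.
public static class StopPolicy
{
    // True when every action has been called at least `minCallsPerAction` times
    // and the average number of calls per action is at least `minAverageCalls`.
    public static bool ShouldStop(IDictionary<string, int> callCounts,
                                  int minCallsPerAction, double minAverageCalls)
    {
        if (callCounts.Count == 0) return false;   // nothing has been tested yet
        return callCounts.Values.Min() >= minCallsPerAction
            && callCounts.Values.Average() >= minAverageCalls;
    }
}
```

With a lower limit of 1000 calls and a required average of 5000, a run where one action has been called 999 times keeps going even if the average is already reached.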

Conclusions

Continuous testing with the ct.exe tool of NModel is a viable method to achieve high quality testing with low costs. NModel offers to the test developer the possibility to develop new test strategies within this framework.

Selecting the next state based on frequencies is a simple yet effective method to ensure a wide variety of action calls with the added benefit that the accumulated numbers may form the basis for various quantitative and qualitative metrics.

Monday, February 15, 2010

Model-Based Software Testing and Analysis with C#

If software testing is checking conformance to specifications or looking for discrepancies between expected behavior and actual behavior, then model-based testing is a promising way to do it. In my opinion, models may be the way to transform software testing from an art into a science, a strong reason to like them.

In the case of .NET, the NModel set of tools produced by Microsoft Research and freely distributed through CodePlex can be of great help when it comes to modeling. Based on the theory of abstract state machines, NModel has the advantage of using a programming language to express the behavior of the system under test. That is, no UML diagrams or fancy graphical editors. On the other hand, the programmatic approach makes NModel less attractive for non-programmers despite the fact that it does include a visualization tool as part of its suite.

The standard book on NModel is Model-based software testing and analysis with C# by Jacky et al. (the “NModel book”). In this article I give a brief overview of the book while emphasizing the perceived deficiencies of NModel as a tool for modeling real-life systems.

A little bit of history

Yuri Gurevich is a theoretical computer scientist who developed the theory of evolving algebras. At Microsoft Research he led a team that developed a specification language named AsmL, based on abstract state machines (AsmL comes from Abstract State Machine Language). Abstract state machines are state machines working with arbitrary data. As a result, ASMs are capable of expressing a much broader range of computations than FSMs (finite state machines), because the latter suffer from “state explosion” (unmanageable growth in the number of states).

AsmL provides executable specifications and is embeddable into Word documents via an extension developed by the same team at Microsoft Research. I used AsmL and it is a great language: simple and elegant, it offers great constructs to express abstract concepts while maintaining a programmatic look and feel. I might say it’s the specification heaven for the computer programmer.

Despite its elegance, AsmL has a major drawback: it is neither a programming nor a modeling language. It was never meant to be one; it was only supposed to accompany specifications written in natural language, not to provide ways to construct real systems or models thereof.

NModel fills the gap. It consists of a set of tools and nothing else, and it leverages .NET to the fullest extent: anyone who can program .NET can write viable models in C# or any other .NET language (thus, the title of the “NModel book” is somewhat misleading; one can construct models in Visual Basic.NET or C++/CLI as easily as in C#). But NModel is not essentially different from AsmL; it’s founded upon the same sound theory of abstract state machines.

NModel: a brief description

NModel consists of a library and a set of tools. The library contains elements to specify the model. One such element is the ActionAttribute attribute, which marks a method as a state transition. The tools have various purposes: graphic state explorer, test generator, test runner.

With NModel, the designer does not represent states explicitly. He represents data and changes upon the data, and NModel computes the states automatically. This is a great advantage over graphical modeling tools that start with states and end with data. With NModel, the designer models functionality directly; he doesn’t start from functionality to build diagrams only to arrive back at functionality. To facilitate state analysis, NModel does offer a graphical exploration tool, though.

The model, in NModel’s sense, is nothing else than a .NET custom library. The modeled elements are classes. The state transitions are methods and the state data consists of class fields. This makes it possible to explore the model apart from NModel and, in fact, NModel itself encourages that. For example, NModel exhibits only one algorithm for state traversal but anyone can implement other algorithms with ease since NModel offers all that’s necessary to explore the state space in any desired manner.
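To illustrate the principle of "data plus guarded changes, with states computed automatically", here is a minimal sketch in plain C#. It deliberately avoids the NModel library: CounterModel and Explorer are hypothetical, the state is a single integer field, and the exploration is an ordinary breadth-first search written by hand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini-model, NOT the NModel API: the state data is one integer
// field and each action is a guarded update of that field.
public class CounterModel
{
    public int Value;                                   // state data

    public bool IncEnabled() { return Value < 3; }      // guard of Inc
    public void Inc() { Value++; }                      // state transition

    public bool ResetEnabled() { return Value > 0; }    // guard of Reset
    public void Reset() { Value = 0; }                  // state transition
}

public static class Explorer
{
    // Breadth-first exploration from the initial state 0: the states are never
    // listed by hand, they fall out of applying the enabled actions to the data.
    public static HashSet<int> ReachableStates()
    {
        var seen = new HashSet<int> { 0 };
        var queue = new Queue<int>();
        queue.Enqueue(0);
        while (queue.Count > 0)
        {
            int current = queue.Dequeue();
            foreach (int next in Successors(current))
                if (seen.Add(next)) queue.Enqueue(next);
        }
        return seen;
    }

    // Applies every enabled action to a fresh copy of the state.
    private static IEnumerable<int> Successors(int value)
    {
        var model = new CounterModel { Value = value };
        if (model.IncEnabled()) { model.Inc(); yield return model.Value; }
        model = new CounterModel { Value = value };
        if (model.ResetEnabled()) { model.Reset(); yield return model.Value; }
    }
}
```

The designer wrote only a field, two updates and two guards; the four reachable states (0 to 3) are discovered by the exploration.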

Besides automatic state construction, NModel offers tools for model-based testing. Firstly, the designer writes stubs that act upon the system under test as dictated by the current state. Secondly, he uses the test generator that traverses the state space and, while traversing, acts upon the system under test via the above-mentioned stubs.

This is much more productive than writing test scenarios by hand because the tester can leverage the exploratory algorithms of NModel to generate a larger number of scenarios in a shorter period of time while obtaining better coverage of possible states and transitions.

The “NModel book”: a short overview

NModel comes accompanied by thorough documentation of the library but it lacks a tutorial explaining how to use everything. The “NModel book” fills this role.

The book is divided into five main parts:

  1. Overview
  2. Systems with Finite Models
  3. Systems with Complex State
  4. Advanced Topics
  5. Appendices

Overview

This section of the NModel book introduces the reader to the topic of modeling. It explains the role of modeling in analysis and testing, it gives examples of how modeling does a better job at detecting some errors than traditional methods and it shows how modeling is useful in some cases that are harder to tackle by other means (such as design defects).

Systems with Finite Models

This section covers the issue of state machines with a small number of states, the ones that can be explored exhaustively. Topics like analysis, modeling, exploration, selection and testing are included.

Systems with Complex State

This section begins to reveal the power of abstract state machines and their advantage over finite state machines: the states of ASMs may contain complex data. The section covers modeling, analysis and testing systems with complex states.

Advanced Topics

As the name suggests, this section covers advanced topics: model composition, modeling objects (useful in conjunction with the OOP paradigm) as well as handling non-determinism.

Appendices

There are two kinds of appendices:

  • a library reference, and
  • a description of the tools included with NModel:
    • mpv, the Model Program Viewer
    • otg, the Offline Test Generator
    • ct, the Conformance Tester

NModel: friendly criticism

The first criticism of NModel is acknowledged by the authors themselves: the tool offers no traversal algorithm other than the postman tour algorithm. However, a programmer can create new traversal algorithms with what the NModel library offers.

The second criticism of NModel is the lack of a way to express parallelism. NModel offers the composition mechanism (which is, in fact, a Cartesian product over the state spaces adorned with some rudimentary “synchronism”) but the mechanism is so awkward that the authors themselves use it solely for state selection and not much else.

The third criticism of NModel is the lack of a way to express structural composition: to be able to construct the whole from its parts. For example, provided we have a sub-model A and a sub-model B, there’s no way to produce a model C composed of A and B working together. Needless to say, C should be at a higher level of abstraction than the sheer composition A x B.

OBS: NModel does have a mechanism for state abstraction, i.e. for grouping a (potentially infinite) number of states into a single one. But for an abstraction based on structural composition as explained above, there is no support. That is, if one wants the model C from above, he must write it from scratch by hand.

The fourth criticism of NModel is the lack of a way to express structural decomposition: to be able to break the whole into its parts. For example, provided we have a model A, there’s no way to break it into sub-models B and C that, when “put together”, do the same thing as A does. Needless to say, B and C should be at a lower level of abstraction than a sheer division of A’s states into two separate groups.

OBS: NModel does have a mechanism for state transition refinement via features and composition. Yet, such refinement applies to isolated states and transitions. It doesn’t work per entire groups of states, let alone per entire models. Most importantly, such refinement is not automatic with the exception of exploiting action matching via identical action names (the basic “synchronism” mechanism mentioned above). While yielding some relief in terms of decomposition, state transition refinement does not preclude the need for more elaborate ways to express structural decomposition as it happens in traditional top-down design methodologies.

The fifth criticism of NModel is the poor support it offers for the well-known categories of testing. The software testing industry has long established several distinct kinds of testing: parameter, limits, boundary, functional, stress, robustness, etc. While the test designer can implement them via test strategies, there is not much more support beyond that.

The sixth criticism of NModel is the poor support for repro scenarios. Testing with NModel follows two paradigms: predefined testing, quite reduced in scope and applicable to models with a small number of states, and on-the-fly testing, much more powerful and customizable via test strategies. By far, the best option is on-the-fly testing in terms of both coverage and chances to find defects. Yet, when it comes to reproducing faulty scenarios, the support offered by NModel is sparse and cumbersome to use.

Conclusions

Model-based testing is a promising method to bring rigor and precision into the realm of test design. Despite requiring a higher level of expertise and more effort upfront, using models pays off due to a higher productivity in terms of test case generation and feature coverage.

NModel is a free suite of tools provided by Microsoft Research that can be used for model-based analysis and testing in the .NET ecosystem. While the NModel library is well documented by a help system provided with the tools, it is the “NModel book” that contains a gentle yet thorough introduction to the world of modeling in general and NModel usage in particular.

Monday, February 8, 2010

How Much Intelligence in Software Testing?

The issue of the required level of technical expertise in software testing is a sensitive one, especially when we compare software testers with their friends (and foes) from software development.

I dare to say that the universally accepted “wisdom” tells that testers are less qualified professionals than developers, that one becomes a tester if he’s not good enough to be a developer, that it’s expected that good testers eventually move to development and that professional testing is a disposable asset because, after all, one has the option to hire school students to do the job.

In this article I explore briefly the issue of tester intelligence then I move on to possible ways to optimize the usage of human intelligence in software testing.

Cogito, ergo sum

“I think, therefore I am” said Descartes many years ago and this statement, so true for any human endeavor, is even more valid for that complex and brain-intensive activity named software construction. Indeed, building software is hard, complex and, unfortunately, very error-prone.

What about software testing?

When it comes to sheer verification of conformance to specs, software testing clearly requires a lower level of expertise than development: we don’t care how much intelligence has been put into a piece of software as long as it conforms to the specs.

However, when it comes to deliberately hunting for malfunctions in software, things change tremendously. The tester is no longer the patient, hard-working bookkeeper of features from above. He’s more like a hacker, trying to exploit any little clue extracted from the system under test.

A hacker is anything but dumb and so is the creative professional tester. Creative software testing is a highly intelligent activity requiring cognitive processes as complex as the ones used for development (but different, of course). Needless to say, such good testers are hard to find.

Alas, like any intelligent human activity, creative software testing is expensive. Are there ways to optimize it?

Yes, there are.

When the artificial flavor is good

When the software tester is looking for bugs, he’s faced with a non-decidable search problem whose space is infinite:

  • it is a search problem because the tester has to search the space of all possible test scenarios and select only the ones leading to bugs.
  • it is a non-decidable problem because the decision whether a certain behavior is indeed defective resides outside the testing process since it usually requires input from external sources (the developer, the rest of the team).
  • it is an infinite-space search problem because the total number of possible scenarios is extremely large, practically infinite.

Does that sound familiar? If it doesn’t, it should: chess players, stock traders, investment businessmen, military or economic strategists, business executives, medical diagnosticians and many others have been playing with such things for years – and they are all endowed with pretty darn good levels of intelligence.

Not only that: such problems have been tackled by computer science, too. The field is named Artificial Intelligence (A.I.); it deals with problems that are intractable by ordinary methods and it has yielded notable successes in domains like medical diagnosis, oil drilling, automated vision, game playing and others.

Does Artificial Intelligence have any place in software testing? Some attempts have been made so far, yet there is definitely room for improvement.

Let’s take a closer look.

A proven path: theorem proving

Theorem proving is a field of Artificial Intelligence that deals with the automated means to prove that a certain statement is true or false. Rooted in mathematical logic, theorem proving employs sophisticated methods of symbolic processing.

The principle of using theorem proving in software testing is very simple:

  • testing conformance to specifications is equivalent to: given a software program P and a set of specifications S, is the statement “P implies S” true? In other words, does S hold for every run of P?
  • bug-oriented testing is equivalent to: given a software program P and a set of specifications S, is the statement “P and not S” true? In other words, are there runs of P that break the specs S?

A theorem prover cannot work with a computer program in its direct, binary form, nor can it understand specs in natural language. So, a test system based on theorem proving must perform some transformations upon both the system under test and the specs to bring them in line with what the theorem prover can work upon.

These transformations are problematic and they wonderfully reveal the limits of the method:

  • not every system’s internals are expressible as a formula that can be processed automatically. Hence, some approximations must take place, approximations that introduce a difference between the system under test and its formal expression.
  • not every specification in natural language can be translated into a format that can be processed automatically. Again, some approximations must take place.

To summarize the method in terms of pros and cons, one can say:

  • Pros: it is a precise and rigorous method, based on the time-tested, solid foundation of mathematical logic.
  • Cons: it requires approximations and it relies on access to the system’s internals. This means it cannot do black-box testing.

A career in modeling

Capturing the essentials of a system’s behavior, apart from specifications, can be achieved via models. Curiously, despite modeling being essential to other branches of engineering, the software industry has recognized its importance only recently.

A software model is an artifact that represents, in simplified form, the behavior of the software system being modeled. Think of a model as the maquette of the software system being built.

Software models can be executable or non-executable. Executable models may be state-based, i.e. they try to mimic the most important states of the system as well as the transitions from one state to another. When each state specifies the invariants that the system must satisfy, the model becomes a verification tool, too.

Models are viable for software testing by the means of Artificial Intelligence because:

  • the elements of a model (states, transitions) can be processed by automatic means out of the box, and
  • the behavior of a model is much simpler than the actual behavior, hence it is easier to process automatically.

As simple as they are, models are still too complex to be explored exhaustively. Their state space is big enough to require Artificial Intelligence techniques, especially from the realm of search algorithms.

Because they resemble more what the user sees (in contrast with the theorem proving approach, where the system under test becomes a mathematical formula) they are closer to the real-life work of the tester. However, because they rely on search algorithms empirical in nature (i.e. not as theoretically sound as mathematical logic) model-based testing does not enjoy the same theoretical precision that theorem proving has.

To summarize the method in terms of pros and cons, one can say that:

  • Pros: it represents the behavior of a system and not the internals, therefore it supports black-box testing better.
  • Cons: it adds extra work to build the model, it is not as theoretically sound as theorem proving, and modeling requires ignoring details of the system that may prove important later on.

Darwin was right

The previous methods of testing rely on a view of the system under test that originates from:

  • the translation of the system under test into a formalism that can be maneuvered by the theorem prover, or
  • the manually built representation of the system’s behavior as a model.

Either way, the view of the system under test is static: it does not evolve over time. Once the view is constructed, the test engine uses it unchanged for as long as testing goes on. Anyone who wants a better, more detailed view of the system under test must reconstruct it from scratch.

Such an approach obviously involves a lot of work just to recode information that already exists in the system. Wouldn’t it be nice to have a testing system that doesn’t require that much information upfront but learns the system while testing it?

Such a system would depend pretty much on how the learned information gets represented. One way to represent that information consists in the test scenarios themselves, since any test scenario does represent some information about the system under test. This means that having more scenarios is having more information and having a lot of scenarios is having a lot of information.

Such a system would not store all the scenarios but only the ones containing the most test-related information, i.e. the ones revealing the most defects or exercising the most system states. To avoid writing all the scenarios by hand, the system has to start with a finite set of hand-crafted scenarios and generate new scenarios automatically, based on the existing ones.

One method of generating new scenarios is by “mixing” existing ones to produce “offspring” while retaining the best “children” and discarding the sub-optimal ones. Since the “mixing” is akin to how the genes of a child’s parents mix to produce the genome of the child, these methods are called genetic algorithms.

These algorithms make up an evolutionary model of software testing because they work upon a population of test scenarios that, with each new generation, becomes better fit to the “environment” represented by the system under test.

The success or failure of genetic algorithms relies on two elements:

  • the individual actions which, when applied in succession, form the scenario. These actions are atomic, mostly stateless and mostly independent computational units that try to mimic an atomic piece of functionality of the system under test. Unfortunately, not all systems support such an atomic, sequential and independent decomposition.
  • the mixing procedure, which must be fast in order to ensure a high rate of generational renewal of the population. If the above-mentioned actions are truly stateless and independent, the mixing procedure is simple. Yet, a higher rate of inter-dependent actions requires more intelligent mixing procedures, which may prove too slow.
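The evolutionary loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the action names, the toy fitness function (which counts the distinct states a scenario drives a made-up system through), the single-point crossover and the selection scheme are all hypothetical choices. The sketch also leaves out mutation, which practical genetic algorithms usually include.

```python
import random

ACTIONS = ["coin", "push", "kick", "reset"]  # hypothetical atomic actions


def fitness(scenario):
    """Toy fitness: how many distinct states the scenario exercises."""
    level, seen = 0, {0}
    for action in scenario:
        if action == "coin":
            level = min(level + 1, 5)
        elif action == "push":
            level = max(level - 1, 0)
        elif action == "reset":
            level = 0
        # "kick" leaves the toy system unchanged
        seen.add(level)
    return len(seen)


def crossover(mother, father, rng):
    """Single-point mixing: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]


def evolve(population, generations, rng):
    """Breed children each generation and keep only the fittest scenarios."""
    size = len(population)
    for _ in range(generations):
        children = []
        for _ in range(size):
            mother, father = rng.sample(population, 2)
            children.append(crossover(mother, father, rng))
        # Parents compete with children, so the best fitness never decreases.
        population = sorted(population + children, key=fitness, reverse=True)[:size]
    return population


rng = random.Random(0)
seed = [[rng.choice(ACTIONS) for _ in range(8)] for _ in range(6)]
final = evolve(seed, generations=20, rng=rng)
print(max(map(fitness, seed)), max(map(fitness, final)))
```

The mixing here is cheap precisely because the toy actions are independent: any head can be glued to any tail and still form a valid scenario. With inter-dependent actions, crossover would need to repair or reject invalid offspring.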

To summarize the method in terms of pros and cons, one can say that:

  • Pros: it requires little effort upfront and the more it runs the more intelligent, better informed results it produces.
  • Cons: performance degrades for systems whose functionality does not support decomposition into atomic, stateless and independent actions.

Unexplored paths

The previous sections show three fields of Artificial Intelligence that have been used in software testing. Some other fields of AI are good candidates as well, although to my knowledge they haven’t been considered as such to date.

This section presents them, along with the reasons why considering them for software testing makes sense.

Rule-based expert systems

Rule-based expert systems are one field of Artificial Intelligence that has enjoyed considerable commercial success. Expert systems have been used in areas like medicine and mining and, albeit very expensive, they have saved their users large amounts of money.

A rule-based expert system has two parts:

  • a set of “rules” processed by an inference engine that represents the “thinking” of the system.
  • a set of “facts” that represents the “memory” of the system.

When presented with a problem, the expert system applies the rules to the facts in order to draw a conclusion related to the problem. The conclusion may consist of a yes/no answer, a sequence of steps leading to a result or an explanation for a certain conclusion.

We may consider that a hypothetical expert system for software testing should have the following elements:

  • a set of “rules” indicating standard procedures to test a testable element.
  • a set of “facts” containing knowledge about testable elements: UI controls, APIs, protocols, data structures, hardware ports, etc.

When presented with a problem – i.e. a description of the system in terms of both structure and functionality – such an expert system would yield test procedures while explaining the reasons for choosing them. Expected outcomes would be: producing a test plan, yielding several standard test scenarios or proposing quality metrics.
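A minimal sketch of how such a system might reason is shown below, using simple forward chaining: a rule fires when all its premises are known facts, and its conclusion becomes a new fact. The rules, the fact names and the infer function are hypothetical and exist only to illustrate the mechanism. A real expert system shell would offer far richer rule languages.

```python
# Each rule: (premises, conclusion, reason). All content is illustrative.
RULES = [
    ({"text_input"}, "test_empty_value",
     "text inputs may be submitted blank"),
    ({"password_field"}, "text_input",
     "a password field is a kind of text input"),
    ({"password_field"}, "test_masking",
     "typed characters must not be displayed"),
    ({"numeric_input"}, "test_boundary_values",
     "numeric ranges tend to fail at their limits"),
]


def infer(facts, rules=RULES):
    """Forward chaining: fire rules until no new conclusion can be drawn.

    Returns the derived conclusions in firing order, each with its reason,
    so the system can explain why it proposes a given test procedure.
    """
    known = set(facts)
    explanations = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion, reason in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                explanations.append((conclusion, reason))
                changed = True
    return explanations


for conclusion, reason in infer({"password_field"}):
    print(f"{conclusion}: {reason}")
```

Note that test_empty_value is never stated for password fields directly. It follows from the intermediate fact text_input, which is the kind of chained reasoning, with its explanation, that the paragraph above expects from the expert system.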

Neural networks

Neural networks are a field of Artificial Intelligence that has been used with success in pattern recognition. They have proved capable of recognizing patterns like shapes, images, handwriting or even voices. Their principle consists in a decision scheme based on the parallel work of tiny decision-making elements named neurons, which are inter-linked in various ways. More complex patterns require more complex neural networks.

Neural networks have been designed with static information in mind. This means that they can recognize patterns like images or handwriting but they are not fit to classify motion (unless, of course, motion is decomposed into individual frames, although I am not sure whether such an approach has been tried as of today). When thinking of testing, this means that reasoning upon the dynamics of a software system is unlikely to succeed with neural networks.

Yet, neural networks could be used to reason upon static aspects of software, such as GUI layout. For example, writing a program that tells whether the GUI controls of a form are harmoniously arranged is nearly impossible to do in conventional ways, yet it becomes feasible with neural networks.

Case-based reasoning

Case-based reasoning is a new field in Artificial Intelligence that deals with problems that resist an analytical description and hence aren’t tractable by analytic processes, no matter how advanced. Case-based reasoning is the automatic counterpart of the “that’s the way we do it” we hear so often in real life from people with great empirical experience who know they are right but cannot explain why.

A case-based reasoning system consists of a large database of problems, their characteristics and their resolutions. These are the cases. When a new problem arrives, the system tries to match the new situation to one of the existing cases and to propose a solution based on existing precedents. The solution bears no logical explanation since there is no apparent, logical correlation between a problem, its characteristics and its resolution. Yet, it works because a link does exist, even though it is intractable by computational means.

Case-based reasoning may work for software testing considering that software developers, being all humans, most likely make similar mistakes when faced with a similar design. Hence, a case-based testing system doesn’t need to deeply analyze the system’s structure or behavior: provided with a description of the design, the case-based testing system might search the database of preceding cases to pinpoint the most probable vulnerabilities.
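The retrieval step could look roughly like the sketch below, which ranks stored cases by how much their design characteristics overlap with the new design. The case database, the feature names and the use of Jaccard similarity as the matching measure are assumptions chosen for illustration. A real system would need a much larger case base and probably a more refined similarity measure.

```python
# Hypothetical case base: (design characteristics, weakness found in the past).
CASES = [
    ({"multithreaded", "shared_cache"}, "race conditions on cache update"),
    ({"user_input", "sql_database"}, "injection through unvalidated input"),
    ({"file_upload", "user_input"}, "oversized or malformed files"),
]


def jaccard(a, b):
    """Overlap between two feature sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(design, cases=CASES, top=2):
    """Return the weaknesses of the stored cases closest to the new design."""
    ranked = sorted(cases, key=lambda case: jaccard(design, case[0]), reverse=True)
    return [(weakness, jaccard(design, features)) for features, weakness in ranked[:top]]


new_design = {"user_input", "sql_database", "web_form"}
for weakness, score in most_similar(new_design):
    print(f"{score:.2f}  {weakness}")
```

The system offers no explanation beyond "a similar design failed this way before", which matches the precedent-based nature of the approach.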

Conclusions

Software testing, like any other creative endeavor, requires a significant amount of intelligence. Various fields of Artificial Intelligence have been used to replace or complement the intelligence of software testers which, like any human intelligence, is slow, expensive and error-prone.

This article presented an overview of several usages of Artificial Intelligence in software testing while proposing other fields of AI as good candidates for the same purpose, along with reasons for such proposals.