Testing and the Catch2 framework

Why testing?

Anybody who writes code for some purpose (whether as a researcher, a software engineer, or in any other profession) will eventually reach the point where others rely on their code. Bugs in software can be dangerous or even deadly. Additionally, users do not enjoy software that is buggy and crashes, and fixing bugs once the software is in production is very costly. Most importantly, good engineers take pride in building things that work well and are robust.

The key to writing working software is developing good tests. In this course we follow an approach called test-driven development. As you write code, you will also write companion tests. These tests are used to verify that the code you just finished writing works as intended.

This strategy is sometimes called “test-as-you-go”. You work in small steps, test thoroughly, and only move on after you have confirmed correctness and fixed all issues. The beauty of this approach is that each step is relatively straightforward and easy to debug. Imagine the opposite approach: you write hundreds of lines of code, the code does not work, and now you need to figure out which of those hundreds of lines isn’t working as expected! That is the sort of frustration we want to help you avoid as you continue to develop your skills as programmers.

Catch2

For CS3380, you will use a widely used testing framework named Catch2 to test your code. Catch2 provides a simple, clean approach to writing and running test cases.

Here is an example of how you might see the Catch2 framework used in the starter code of an assignment.

#include <catch2/catch_test_macros.hpp>

#include <string>

// reversed(str) returns a copy of str with its characters in reverse order.
std::string reversed(std::string s)
{
    std::string result;
    // Walk the string from back to front, appending each character.
    // The cast guards against unsigned wraparound when s is empty.
    for (int i = static_cast<int>(s.length()) - 1; i >= 0; --i) {
        result += s[i];
    }
    return result;
}

// Test Cases
PROVIDED_TEST("Demonstrate different Catch2 use cases") 
{
    CHECK(reversed("but") == "tub");
    CHECK(reversed("lsu") == "usl");
}

When we provide tests for you in the starter code, each test case is marked as PROVIDED_TEST. The string argument in parentheses describes the purpose of the test, and the code block that follows (enclosed in curly braces) defines the actual test behavior.

When you add your own test cases, you will mark your test code blocks with STUDENT_TEST instead. A STUDENT_TEST has exactly the same functionality and structure as a PROVIDED_TEST; the name simply distinguishes the tests you’ve written yourself from those we provide, for the benefit of your grader. You will see many examples of this in the following sections.

CHECK

Within the code block, the most commonly used test macro is CHECK, which confirms that the wrapped condition holds. A typical use of CHECK is to take a value produced by your code, e.g. the return value from a call to one of your functions, and confirm that the value matches the expected outcome. For example, in the above code, CHECK is used to compare the result of the call reversed("but") to the string "tub". If the two are indeed equal, the test passes. If they do not match, the test is reported as a failure.

For example, after adding your own tests (using the STUDENT_TEST identifier as previously mentioned) to the above file, it could look something like this:

#include <catch2/catch_test_macros.hpp>

#include <string>

// reversed(str) returns a copy of str with its characters in reverse order.
std::string reversed(std::string s)
{
    std::string result;
    // Walk the string from back to front, appending each character.
    // The cast guards against unsigned wraparound when s is empty.
    for (int i = static_cast<int>(s.length()) - 1; i >= 0; --i) {
        result += s[i];
    }
    return result;
}

// Test Cases
PROVIDED_TEST("Demonstrate different Catch2 use cases") 
{
    CHECK(reversed("but") == "tub");
    CHECK(reversed("lsu") == "usl");
}

STUDENT_TEST("my added cases not covered by the provided tests") 
{
    CHECK(reversed("racecar") == "racecar");
    CHECK(reversed("") == "");
    CHECK(reversed("123456789") == "987654321");
}

Important note: You should never modify the PROVIDED_TEST tests – these are the same tests that will be used for grading, so it is not in your best interest to change them. If you want to test different scenarios, always add new tests and mark those with the STUDENT_TEST tag.

CHECK_THROWS

You pass an expression to CHECK_THROWS, which evaluates the expression and observes whether evaluating it throws an exception to report an error. If an exception is thrown, the test passes. If not, the test fails, and the report notes that the expression failed to trigger an exception. CHECK_THROWS is used in the specific situation of confirming that your code handles error conditions as expected.
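
For example, suppose you have a function that divides two integers and throws an exception on division by zero. A minimal sketch of how CHECK_THROWS is used follows; checked_divide is a hypothetical example function, not part of any assignment.

#include <stdexcept>

// Hypothetical example function: throws std::domain_error when asked
// to divide by zero.
int checked_divide(int a, int b)
{
    if (b == 0) {
        throw std::domain_error("division by zero");
    }
    return a / b;
}

STUDENT_TEST("checked_divide() reports an error on division by zero")
{
    // Passes only if evaluating the expression throws an exception.
    CHECK_THROWS(checked_divide(10, 0));
}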

CHECK_NOTHROW

This macro is exactly the opposite of CHECK_THROWS. You pass it an expression, and if the expression runs to completion without throwing an exception, the test passes. However, if evaluating the expression throws an exception somewhere along the way, the test case reports failure. CHECK_NOTHROW is used in situations where you want to confirm that functions behave properly on correct input.
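
Continuing the hypothetical checked_divide sketch from above, CHECK_NOTHROW can confirm that valid input is handled without an error:

STUDENT_TEST("checked_divide() runs cleanly on valid input")
{
    // Passes only if evaluating the expression does not throw.
    CHECK_NOTHROW(checked_divide(10, 2));
}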

BENCHMARK

Catch2 also has support for simple execution timing.

To time an operation, evaluate the expression within the macro BENCHMARK, as shown below. (Depending on how your project is set up, you may also need the header that provides BENCHMARK; in Catch2 v3 it is catch2/benchmark/catch_benchmark.hpp.)

STUDENT_TEST("Time operation vector sort on tiny input")
{
    std::vector<int> v = {3, 7, 2, 45, 2, 6, 3, 56, 12};
    BENCHMARK("sorting a vector") 
    {
        return std::ranges::sort(v);
    };
}

The argument to BENCHMARK is used to label this timing result to distinguish from other results. The code block after the BENCHMARK() macro is the expression to evaluate. BENCHMARK will start a new timer, evaluate the expression, stop the timer, and report the elapsed time.

Please note that the block associated with the BENCHMARK expression must end with a semicolon (;). You might see unrelated compilation errors otherwise.

You can have more than one use of BENCHMARK within a test case. Each operation is individually evaluated and timed. The code below demonstrates the use of BENCHMARK in a loop to time sorting successively larger vectors.

STUDENT_TEST(
    "Time operation vector sort over a range of input sizes")
{
    for (int size = 50000; size < 1000000; size *= 2) 
    {
        // fill vector with random values
        std::vector<int> v;
        for (int i = 0; i < size; i++) 
        {
            v.push_back(random_integer(1, 1000)); 
        }

        // measure how long sorting takes
        BENCHMARK("sorting the vector") 
        {
            return std::ranges::sort(v);
        };
    }
}

By default, a test case that uses BENCHMARK will be reported as Correct as long as the expression being evaluated does not result in an error or crash. If you want to verify the actual correctness of the result as well as time it, you can mix in regular use of CHECK into the test case as shown below:

STUDENT_TEST(
    "Time operation vector sort on tiny input and verify is sorted")
{
    std::vector<int> v = {3, 7, 2, 45, 2, 6, 3, 56, 12};
    BENCHMARK("sorting the vector") 
    {
        return std::ranges::sort(v);
    };
    CHECK(std::ranges::is_sorted(v));
}

Debugging a failing test

The goal you are shooting for is for all of your tests to pass. However, if you get a failed test result, don’t look at it as bad news – this test result is news you can use. The failing test case indicates that you have identified a specific operation that behaves counter to your expectations. This means you know where to focus your attention.

Dig into that test case under the debugger to analyze how it has gone astray. Set a breakpoint inside the test code block, at or just before the failing CHECK statement.

Below you can see a screenshot of setting a breakpoint on a statement. Simply click on the left border next to the line number in VSCode (a red dot will appear) on the line where you would like execution of your program to stop. Once you run your code in the debugger, execution will halt there, allowing you to inspect all local and global variables or to step through the program line by line to diagnose the issue at hand.

[screenshot: a breakpoint set in the VSCode editor]

Now run the tests under the debugger. When the program stops at the breakpoint, single-step through the code while watching the variables pane to observe the changing state of your variables, using the same technique you practiced in the debugging tutorial in Assignment 0.

After you understand the failure and apply a fix, run that test again. When you see the test now pass, you can celebrate having squashed that bug!

Test-driven development

We highly recommend employing test-driven development when working on your assignments. To do so, follow these steps:

  • identify a small, concrete task (bug to fix, feature to add, desired change in behavior)
  • construct tests for the desired outcome, add them to the file in which you’re currently working, and verify the current code fails these tests
  • implement the changes in your code to complete the task
  • re-run your newly added tests and verify they now succeed
  • test the rest of the system (by running all tests) to verify you didn’t inadvertently break something else

You change only a small amount of code at once and validate your results with carefully constructed tests before and after. This keeps your development process moving forward while ensuring you have a functional program at each step!

Test cases and grading

The Catch2 framework will be supplied with each assignment, and there will be some initial test cases provided in the starter project, but you will also be expected to add your own tests.

You will submit your tests along with the code, and the grader’s review will consider the quality of your tests. We will also provide comments on your tests to help you improve your testing approach. Please incorporate our feedback into future assignments; it will improve your grade and, more importantly, your effectiveness as a programmer. We guarantee future employers will appreciate your ability to write good tests and well-tested code!

Here are some things we look for in good tests:

  • Are the tests comprehensive? Is all the functionality tested?
  • Where possible, are the tests self-contained and independent?
  • Did you anticipate potential problems, tricky cases, and boundary conditions?
  • Did you develop the tests in a good order? Did you test basic functionality before more advanced functionality? Did you take small, carefully chosen steps?

You may want to follow these general guidelines:

  • Use the CHECK macros instead of plain assert for nicer error messages. If you don’t do this, you’ll see that a test failed but not what values caused it to fail.
  • Don’t cram too much into one test; test one thing at a time.
  • Don’t mix tests for different types or different functions in your unit tests.
  • Avoid making tests depend on one another. Don’t call tests from tests. Factor out common code into helper functions and call those (see the sketch after this list).
  • Test “your” code, not libraries.
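
As a sketch of those last two guidelines, shared setup can be pulled into a plain helper function that each independent test calls. (build_sample_vector is a hypothetical name used only for illustration.)

#include <algorithm>
#include <vector>

// Hypothetical helper holding common setup code; note it is not itself a test.
std::vector<int> build_sample_vector()
{
    return {3, 7, 2, 45, 2, 6, 3, 56, 12};
}

STUDENT_TEST("sorting the sample vector orders its elements")
{
    std::vector<int> v = build_sample_vector();
    std::ranges::sort(v);
    CHECK(std::ranges::is_sorted(v));
}

STUDENT_TEST("sorting preserves the number of elements")
{
    std::vector<int> v = build_sample_vector();
    std::ranges::sort(v);
    CHECK(v.size() == 9);
}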

Common questions

Should each CHECK be in a STUDENT_TEST(...) code block of its own or can I list several within one code block?

For tests that are closely related, it may be convenient to group them together in the same code block under one test name. The tests will operate as one combined group and show up in the report as one aggregate success (if all pass) or one failure (if at least one fails).

However, there are advantages to separating each individual test case into its own code block. You will be able to choose a clear, specific name for each block, and the separation isolates each test so you can easily identify exactly which cases are passing and which are failing.
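
For example, the cases from the earlier STUDENT_TEST for reversed() could be split into separate blocks, so that a failure report names the exact scenario:

STUDENT_TEST("reversed() returns the empty string for empty input")
{
    CHECK(reversed("") == "");
}

STUDENT_TEST("reversed() leaves a palindrome unchanged")
{
    CHECK(reversed("racecar") == "racecar");
}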

When the instructions say to “add 2 tests”, do we count each STUDENT_TEST or each CHECK?

Each use of CHECK is counted as a test case. Read the answer to the previous question for some things to consider when deciding whether to group multiple test cases under a single STUDENT_TEST block or keep them separate.

What happens if my test case is bogus or malformed?

When testing your code, you should construct each test case so that a correct implementation will pass. A test case that has been written to “fail” given a correct implementation is considered “bogus”. If a test case is bogus, it is usually asking the wrong question.

Suppose you have written an is_even function to determine whether a number is even and you wish to test its correctness. You have written the bogus test case below, which is designed to fail if the is_even function returns false on an odd input.

STUDENT_TEST("Test is_even() on odd numbers should fail") 
{
    CHECK(is_even(3));
}

If you run the above test on a correct implementation of is_even, the test will fail and the result is reported as FAILED. The only way to “pass” this test would be with a broken implementation of is_even. Confirming whether your function is actually correct becomes very confusing if your test case is bogus.

Instead, you want to ask the question, “Does is_even return true for an even number?” and “Does is_even return false for an odd number?” The tests below are correct ways to test for both true and false results.

STUDENT_TEST("Test is_even() on even number should return true") 
{
    CHECK(is_even(8));
}

STUDENT_TEST("Test is_even() on odd number should return false") 
{
    CHECK(!is_even(13));
}

Both of these tests will pass for a correct implementation of is_even. In short, make sure to design your tests to pass, not fail, on a correct implementation in order to demonstrate that your code is working.