In our previous post, Decoding Formal Description of a Simple Train Control System, we examined how to interpret a formal model defined through a Kripke structure for a simple train control system—the Ceiling Speed Monitoring Controller (CSMC). Now, we’re taking the next step by using this model to build a powerful test suite to verify an AI-generated Python implementation of the controller.

With AI tools like ChatGPT increasingly involved in code generation, rigorously testing this code is essential. So, we’ll develop the test suite and see how good the AI-generated code really is—buckle up, and let’s start testing!

Reference implementation

Before we can start testing, we need a concrete implementation of the system. Taking a modern approach, we’ll use an AI tool, ChatGPT, to provide us with a reference implementation in Python.

We’ll give the ChatGPT 4o model access to the original article, Decoding Formal Description of a Simple Train Control System, to provide context. In addition, we will restate the transition relation, $R$ , which defines the system’s behavior as follows:

$R((V_{\text{est}}, V_{\text{MRSP}}, \ell, W, EB), (V'_{\text{est}}, V'_{\text{MRSP}}, \ell', W', EB')) \equiv \\ \quad \varphi_0: (\ell = \text{NS} \land V_{\text{est}} \leq V_{\text{MRSP}} \land \ell' = \text{NS} \land W' = W \land EB' = EB) \lor \\ \quad \varphi_1: (\ell = \text{NS} \land V_{\text{est}} > V_{\text{MRSP}} \land \ell' = \text{WS} \land W' = 1 \land EB' = EB) \lor \\ \quad \varphi_2: (\ell = \text{NS} \land (g_{\text{ebi1}} \lor g_{\text{ebi2}}) \land \ell' = \text{IS} \land W' = 1 \land EB' = 1) \lor \\ \quad \varphi_3: (\ell = \text{WS} \land V_{\text{est}} > V_{\text{MRSP}} \land \neg(g_{\text{ebi1}} \lor g_{\text{ebi2}}) \land \ell' = \text{WS} \land W' = 1 \land EB' = EB) \lor \\ \quad \varphi_4: (\ell = \text{WS} \land V_{\text{est}} \leq V_{\text{MRSP}} \land \ell' = \text{NS} \land W' = 0 \land EB' = 0) \lor \\ \quad \varphi_5: (\ell = \text{WS} \land (g_{\text{ebi1}} \lor g_{\text{ebi2}}) \land \ell' = \text{IS} \land W' = W \land EB' = 1) \lor \\ \quad \varphi_6: (\ell = \text{IS} \land V_{\text{est}} > 0 \land \ell' = \text{IS} \land W' = W \land EB' = 1) \lor \\ \quad \varphi_7: (\ell = \text{IS} \land V_{\text{est}} = 0 \land \ell' = \text{NS} \land W' = 0 \land EB' = 0)$

Refer to the Transition relation $R$ section in the previous blog if the transition relation is unclear.

Next, we provided ChatGPT with the following prompt to generate a Python implementation:

"Can you write a Python program that will simulate the behavior of the system described by this transition relation? Make it a CLI utility that allows the user to set initial V_MRSP and guards, then prompts the user for V_est, shows the state of the system, and exits cleanly when CTRL-C is pressed."

Below is the Python program generated by ChatGPT, which we will now test.

train_control_system.py

import sys
import signal

class TrainControlSystem:
    def __init__(self, V_MRSP, gebi1_threshold=15, gebi2_threshold=7.5):
        self.V_MRSP = V_MRSP
        self.g_ebi1_threshold = gebi1_threshold  # Threshold for gebi1 condition
        self.g_ebi2_threshold = gebi2_threshold  # Threshold for gebi2 condition
        self.V_est = 0
        self.state = "NS"  # Initial state: NS (Normal State)
        self.W = 0  # Warning
        self.EB = 0  # Emergency Brake

    def transition(self):
        # Determine emergency brake conditions
        gebi1 = self.V_MRSP > 110 and self.V_est > self.V_MRSP + self.g_ebi1_threshold
        gebi2 = self.V_MRSP <= 110 and self.V_est > self.V_MRSP + self.g_ebi2_threshold

        # Apply transition rules based on the current state
        if self.state == "NS":
            if self.V_est <= self.V_MRSP:
                self.state = "NS"  # phi_0
            elif self.V_est > self.V_MRSP:
                self.state, self.W = "WS", 1  # phi_1
            elif gebi1 or gebi2:
                self.state, self.EB = "IS", 1  # phi_2

        elif self.state == "WS":
            if self.V_est > self.V_MRSP and not (gebi1 or gebi2):
                self.state, self.W = "WS", 1  # phi_3
            elif self.V_est <= self.V_MRSP:
                self.state, self.W, self.EB = "NS", 0, 0  # phi_4
            elif gebi1 or gebi2:
                self.state, self.EB = "IS", 1  # phi_5

        elif self.state == "IS":
            if self.V_est > 0:
                self.state, self.W, self.EB = "IS", 1, 1  # phi_6
            elif self.V_est == 0:
                self.state, self.W, self.EB = "NS", 0, 0  # phi_7

        # Print the current state after the transition
        print(f"State: {self.state}, V_est: {self.V_est}, V_MRSP: {self.V_MRSP}, W: {self.W}, EB: {self.EB}")

    def set_V_est(self, V_est):
        self.V_est = V_est
        self.transition()

def signal_handler(sig, frame):
    print("\nExiting program.")
    sys.exit(0)

def main():
    # Register the signal handler for clean exit on CTRL-C
    signal.signal(signal.SIGINT, signal_handler)

    # Prompt the user for initial setup values
    V_MRSP = float(input("Enter the initial maximum restricted speed (V_MRSP): "))
    gebi1_threshold = float(input("Enter the emergency brake guard threshold for V_MRSP > 110 (default is 15): ") or 15)
    gebi2_threshold = float(input("Enter the emergency brake guard threshold for V_MRSP <= 110 (default is 7.5): ") or 7.5)

    # Initialize the system with these parameters
    system = TrainControlSystem(V_MRSP, gebi1_threshold, gebi2_threshold)

    print("\nTrain Control System initialized.")
    print("Press CTRL-C to exit.\n")

    # Continuously prompt for V_est and update the system state
    while True:
        try:
            V_est = float(input("Enter the estimated speed (V_est): "))
            system.set_V_est(V_est)
        except ValueError:
            print("Invalid input. Please enter a numerical value for V_est.")

if __name__ == "__main__":
    main()

When you run the program, it will prompt you to enter V_MRSP and two thresholds used to determine when emergency braking should be applied. Afterward, it will ask for the estimated current speed of the train, V_est, display the system’s current state, and then wait for the next V_est reading. You can stop the program at any time by pressing CTRL-C.

1	$ python train_control_system.py

Enter the initial maximum restricted speed (V_MRSP): 100
Enter the emergency brake guard threshold for V_MRSP > 110 (default is 15): 15
Enter the emergency brake guard threshold for V_MRSP <= 110 (default is 7.5): 7.5

Train Control System initialized.
Press CTRL-C to exit.

Enter the estimated speed (V_est):  80
State: NS, V_est: 80.0, V_MRSP: 100.0, W: 0, EB: 0
Enter the estimated speed (V_est):

Building test input sequences

With the reference implementation in hand, we’re ready to verify that it meets the system’s behavioral requirements. Fortunately, since we have a formal definition of the system, these requirements are as precise as they can be.

Refer to "Decoding Formal Description of a Simple Train Control System" for the full formal model.

What should we use from the formal definition to build our test sequences? The answer is straightforward: we need the set of atomic propositions $AP$ to create input equivalence class partitions (IECP).

For a guide on creating input equivalence class partitions, see Using Atomic Propositions and Equivalence Classes (Part 1) and Using Atomic Propositions and Equivalence Classes (Part 2).

Below is our set of atomic propositions:

$AP = \{ V_{\text{est}} = 0, V_{\text{est}} > V_{\text{MRSP}}, V_{\text{MRSP}} > 110, V_{\text{est}} > V_{\text{MRSP}} + 7.5, V_{\text{est}} > V_{\text{MRSP}} + 15, \\ \quad \ell = \text{NS}, \ell = \text{WS}, \ell = \text{IS}, W, EB \}$

To construct input equivalence class partitions, we only need to focus on the propositions directly related to the inputs $V_{\text{est}}$ and $V_{\text{MRSP}}$ . Therefore, our input set becomes:

$AP_I = \{ V_{\text{est}} = 0, V_{\text{est}} > V_{\text{MRSP}}, V_{\text{MRSP}} > 110, V_{\text{est}} > V_{\text{MRSP}} + 7.5, V_{\text{est}} > V_{\text{MRSP}} + 15 \}$

Let’s construct the initial table of input equivalence class partitions (IECP). This table includes every combination of each atomic proposition related to the inputs being either true or false, representing the Cartesian product of all possible input states.

For example,

$IECP = \{ V_{\text{est}} = 0, \neg(V_{\text{est}} = 0) \} \times \\ \quad \{ V_{\text{est}} > V_{\text{MRSP}}, \neg(V_{\text{est}} > V_{\text{MRSP}}) \} \times \\ \quad \{ V_{\text{MRSP}} > 110, \neg(V_{\text{MRSP}} > 110) \} \times \\ \quad \{ V_{\text{est}} > V_{\text{MRSP}} + 7.5, \neg(V_{\text{est}} > V_{\text{MRSP}} + 7.5) \} \times \\ \quad \{ V_{\text{est}} > V_{\text{MRSP}} + 15, \neg(V_{\text{est}} > V_{\text{MRSP}} + 15) \}$

Now, we can construct a table explicitly listing all $2^5 = 32$ possible classes. From this set, we can then filter out combinations that have logical conflicts and select specific values for $V_{\text{est}}$ and $V_{\text{MRSP}}$ that satisfy each class. Alternatively, we could visualize these combinations graphically, as shown below.

For $V_{\text{MRSP}} > 110$ , we can use the following values:

$V_{\text{MRSP}} = 120, V_{\text{est}} = \{0, 50, 123, 132, 140\}$

For $V_{\text{MRSP}} \leq 110$ , we can use the following values:

$V_{\text{MRSP}} = 100, V_{\text{est}} = \{0, 50, 105, 113, 120\}$

These values are chosen to cover each region of interest. Since the estimated speed cannot simultaneously fall into multiple regions, most of the $2^5 = 32$ combinations are invalid. Only $10$ classes remain as valid.

Another way to work this out

If we assign each atomic proposition a short $p_i$ name, we get the following definitions:

$p_0$ : $V_{\text{MRSP}} > 110$ — The MRSP is greater than 110.
$p_1$ : $V_{\text{est}} = 0$ — The estimated speed of the train is zero.
$p_2$ : $V_{\text{est}} > V_{\text{MRSP}}$ — The estimated speed exceeds the maximum restricted speed (MRSP).
$p_3$ : $V_{\text{est}} > V_{\text{MRSP}} + 7.5$ — The estimated speed exceeds the MRSP by more than 7.5.
$p_4$ : $V_{\text{est}} > V_{\text{MRSP}} + 15$ — The estimated speed exceeds the MRSP by more than 15.

From the table that includes all $32$ classes, only the following valid combinations remain:

Class	$p_0$	$p_1$	$p_2$	$p_3$	$p_4$	Explanation
1	$p_0$	$p_1$	$\neg p_2$	$\neg p_3$	$\neg p_4$	$V_{\text{MRSP}} > 110$ , $V_{\text{est}} = 0$
2	$p_0$	$\neg p_1$	$\neg p_2$	$\neg p_3$	$\neg p_4$	$V_{\text{MRSP}} > 110$ , no thresholds exceeded
3	$p_0$	$\neg p_1$	$p_2$	$\neg p_3$	$\neg p_4$	$V_{\text{MRSP}} > 110$ , $V_{\text{est}} > V_{\text{MRSP}}$ without exceeding other thresholds
4	$p_0$	$\neg p_1$	$p_2$	$p_3$	$\neg p_4$	$V_{\text{MRSP}} > 110$ , $V_{\text{est}} > V_{\text{MRSP}}$ exceeding 7.5 but not 15
5	$p_0$	$\neg p_1$	$p_2$	$p_3$	$p_4$	$V_{\text{MRSP}} > 110$ , exceeding all thresholds
6	$\neg p_0$	$p_1$	$\neg p_2$	$\neg p_3$	$\neg p_4$	$V_{\text{MRSP}} <= 110$ , $V_{\text{est}} = 0$
7	$\neg p_0$	$\neg p_1$	$\neg p_2$	$\neg p_3$	$\neg p_4$	$V_{\text{MRSP}} <= 110$ , no thresholds exceeded
8	$\neg p_0$	$\neg p_1$	$p_2$	$\neg p_3$	$\neg p_4$	$V_{\text{MRSP}} <= 110$ , $V_{\text{est}} > V_{\text{MRSP}}$ without exceeding other thresholds
9	$\neg p_0$	$\neg p_1$	$p_2$	$p_3$	$\neg p_4$	$V_{\text{MRSP}} <= 110$ , $V_{\text{est}} > V_{\text{MRSP}}$ exceeding 7.5 but not 15
10	$\neg p_0$	$\neg p_1$	$p_2$	$p_3$	$p_4$	$V_{\text{MRSP}} <= 110$ , exceeding all thresholds

Then, you simply select values for each input that satisfy the conditions of each input equivalence class partition.

With our test input values defined, we can start writing the test code to automate all possible user actions, including:

Starting and stopping the program
Setting V_MRSP and threshold values
Entering estimated speed values
Verifying the normal status state
Verifying the warning status state
Verifying the emergency braking (intervention) status state

Since we’re working with a command-line interface (CLI) program, we’ll use the testflows.connect module to control it with the send and expect methods. Additionally, we’ll define a State class to encapsulate all elements of the controller’s state into a single object. This class will be crucial when we implement the controller’s behavior model, which will serve as a test oracle to provide the expected state for any combination of controller behavior.

Here is the implementation of all actions, including the State class, in Python:

import re
from testflows.core import *
from testflows.combinatorics import *
from testflows.connect import Shell

class State:
    def __init__(self, State, V_est, V_MRSP, W, EB):
        self.State = State
        self.V_est = float(V_est)
        self.V_MRSP = float(V_MRSP)
        self.W = int(W)
        self.EB = int(EB)

    def __str__(self):
        return f"State={self.State},V_est={self.V_est},V_MRSP={self.V_MRSP},W={self.W},EB={self.EB}"


@TestStep(Given)
def start_program(self):
    """Start the train controm system program."""
    with Shell() as bash:
        with bash("python train_control_system.py", asynchronous=True) as controller:
            self.context.app = controller.app
            try:
                yield
            finally:
                with By("stopping the program with CTRL-C"):
                    self.context.app.child.send("\03")
                    self.context.app.child.expect("Exiting program.")
                    self.context.app.child.expect("\n", escape=False)
                    bash.expect(bash.prompt)


@TestStep(When)
def enter_speed(self, v_est):
    """Enter the estimated speed and then
    read and return current system state."""
    self.context.app.child.send(str(v_est))
    self.context.app.child.expect(str(v_est))
    self.context.app.child.expect("Enter the estimated speed \(V_est\)\:")
    # State: NS, V_est: 60.0, V_MRSP: 100.0, W: 0, EB: 0
    pattern = re.compile(
        r"State: (?P<State>\w+), "
        r"V_est: (?P<V_est>[\d.]+), "
        r"V_MRSP: (?P<V_MRSP>[\d.]+), "
        r"W: (?P<W>\d+), "
        r"EB: (?P<EB>\d+)"
    )
    return State(**pattern.match(self.context.app.child.before.strip()).groupdict())


@TestStep(Given)
def enter_vmrsp_and_thresholds(self, v_mrsp, g_ebi1_threshold=15, g_ebi2_threshold=7.5):
    """Enter V_MRSP and guard 1 and 2 thresholds."""
    app = self.context.app

    app.child.expect("Enter the initial maximum restricted speed \(V_MRSP\)\:")
    app.child.send(str(v_mrsp))
    app.child.expect("\n", escape=False)

    app.child.expect(
        "Enter the emergency brake guard threshold for V_MRSP > 110 \(default is 15\)"
    )
    app.child.send(str(g_ebi1_threshold))
    app.child.expect("\n", escape=False)

    app.child.expect(
        "Enter the emergency brake guard threshold for V_MRSP <= 110 \(default is 7.5\)"
    )
    app.child.send(str(g_ebi2_threshold))
    app.child.expect("\n", escape=False)

    app.child.expect("Press CTRL-C to exit.")
    app.child.expect("\n", escape=False)
    app.child.expect("Enter the estimated speed \(V_est\)\:")


@TestStep(Then)
def expect_normal_status(self, state, W, EB):
    """Expect normal status."""
    assert state == "NS", f"expected state NS but got {state}"
    assert W == 0, f"expected W=0 but got {W}"
    assert EB == 0, f"expected EB=0 but got {EB}"


@TestStep(Then)
def expect_warning_status(self, state, W, EB):
    """Expect warning status."""
    assert state == "WS", f"expected state WS but got {state}"
    assert W == 1, f"expected W=1 but got {W}"
    assert EB == 0, f"expected EB=0 but got {EB}"


@TestStep(Then)
def expect_emergency_brake_status(self, state, W, EB):
    """Expect emergency brake status."""
    assert state == "IS", f"expected state IS but got {state}"
    assert W == 1, f"expected W=1 but got {W}"
    assert EB == 1, f"expected EB=1 but got {EB}"

The behavior model

To check the numerous combinations of inputs—potentially hundreds—we need a behavior model that can accurately compute the expected state of the system for each case.

Refer to Combinatorial Testing: Writing Behavior Model for an introduction to writing a behavior model.

This behavior model is derived from the system’s transition relation $R$ . Here it is again for easy reference, allowing you to compare it directly with the code implementation below:

$R((V_{\text{est}}, V_{\text{MRSP}}, \ell, W, EB), (V'_{\text{est}}, V'_{\text{MRSP}}, \ell', W', EB')) \equiv \\ \quad \varphi_0: (\ell = \text{NS} \land V_{\text{est}} \leq V_{\text{MRSP}} \land \ell' = \text{NS} \land W' = W \land EB' = EB) \lor \\ \quad \varphi_1: (\ell = \text{NS} \land V_{\text{est}} > V_{\text{MRSP}} \land \ell' = \text{WS} \land W' = 1 \land EB' = EB) \lor \\ \quad \varphi_2: (\ell = \text{NS} \land (g_{\text{ebi1}} \lor g_{\text{ebi2}}) \land \ell' = \text{IS} \land W' = 1 \land EB' = 1) \lor \\ \quad \varphi_3: (\ell = \text{WS} \land V_{\text{est}} > V_{\text{MRSP}} \land \neg(g_{\text{ebi1}} \lor g_{\text{ebi2}}) \land \ell' = \text{WS} \land W' = 1 \land EB' = EB) \lor \\ \quad \varphi_4: (\ell = \text{WS} \land V_{\text{est}} \leq V_{\text{MRSP}} \land \ell' = \text{NS} \land W' = 0 \land EB' = 0) \lor \\ \quad \varphi_5: (\ell = \text{WS} \land (g_{\text{ebi1}} \lor g_{\text{ebi2}}) \land \ell' = \text{IS} \land W' = W \land EB' = 1) \lor \\ \quad \varphi_6: (\ell = \text{IS} \land V_{\text{est}} > 0 \land \ell' = \text{IS} \land W' = W \land EB' = 1) \lor \\ \quad \varphi_7: (\ell = \text{IS} \land V_{\text{est}} = 0 \land \ell' = \text{NS} \land W' = 0 \land EB' = 0)$

class Model:
    """Ceiling Speed Monitoring Contoller (CSMC) behavior model."""

    def __init__(self, V_MRSP_threshold=110, g_ebi1_threshold=15, g_ebi2_threshold=7.5):
        self.V_MRSP_threshold = V_MRSP_threshold
        self.g_ebi1_threshold = g_ebi1_threshold
        self.g_ebi2_threshold = g_ebi2_threshold

    def g_ebi1(self, current_state):
        """Guard 1 condition."""
        return (
            current_state.V_MRSP > self.V_MRSP_threshold
            and current_state.V_est > current_state.V_MRSP + self.g_ebi1_threshold
        )

    def g_ebi2(self, current_state):
        """Guard 2 condition."""
        return (
            current_state.V_MRSP <= self.V_MRSP_threshold
            and current_state.V_est > current_state.V_MRSP + self.g_ebi2_threshold
        )

    def expect_normal_status(self, behavior):
        """Expect normal status."""
        current_state = behavior[-1]
        prev_state = behavior[-2]
        if prev_state.State == "NS":
            if current_state.V_est <= current_state.V_MRSP:
                return expect_normal_status
        elif prev_state.State == "WS":
            if current_state.V_est <= current_state.V_MRSP:
                return expect_normal_status
        elif prev_state.State == "IS":
            if current_state.V_est == 0:
                return expect_normal_status

    def expect_warning_status(self, behavior):
        """Expect warning status."""
        current_state = behavior[-1]
        prev_state = behavior[-2]
        if prev_state.State in ("NS", "WS"):
            if current_state.V_est > current_state.V_MRSP:
                if not (self.g_ebi1(current_state) or self.g_ebi2(current_state)):
                    return expect_warning_status

    def expect_emergency_brake_status(self, behavior):
        """Expect emergency brake status."""
        current_state = behavior[-1]
        prev_state = behavior[-2]
        if prev_state.State in "NS":
            if self.g_ebi1(current_state) or self.g_ebi2(current_state):
                return expect_emergency_brake_status
        elif prev_state.State == "WS":
            if self.g_ebi1(current_state) or self.g_ebi2(current_state):
                return expect_emergency_brake_status
        elif prev_state.State == "IS":
            if current_state.V_est > 0:
                return expect_emergency_brake_status

    def expect(self, behavior):
        """Return expected behavior."""
        return (
            self.expect_normal_status(behavior)
            or self.expect_warning_status(behavior)
            or self.expect_emergency_brake_status(behavior)
        )

To use this model, we’ll define a new check_result action that takes a specified behavior and uses the model to determine the expected state of the controller.

@TestStep(Then)
def check_result(self, behavior, model):
    """Check the result of the train control system."""
    current_state = behavior[-1]
    expect = model.expect(behavior)
    debug(f"{current_state},expect={expect.__name__ if expect else None}")
    expect(state=current_state.State, W=current_state.W, EB=current_state.EB)

Writing the test to check all equivalence classes

With the inputs selected to cover each equivalence class, we can now write the actual test. Since the system is stateful, we need to account for transitions between states as the system progresses. Given that the system has only three internal states, we can apply the Pigeonhole principle, just as we did in the Determining the minimum number of calls section when testing the memory function in the previous article.

The test itself will consist of two parts: the TestOutline, which defines the overall structure of the test, and individual checks for specific combinations of $V_{\text{MRSP}}$ , threshold values, and sequences of estimated speed readings.

Here is the outline:

@TestOutline(Scenario)
def check_combination(self, v_mrsp, g_ebi1_threshold, g_ebi2_threshold, speed_sequence):
    """Check specific combination of the controller behavior."""

    model = Model(g_ebi1_threshold=g_ebi1_threshold, g_ebi2_threshold=g_ebi2_threshold)
    behavior = []

    with Given("I start the train control system program"):
        start_program()

    with And("I set V_MRSP and guard thresholds"):
        enter_vmrsp_and_thresholds(
            v_mrsp=v_mrsp,
            g_ebi1_threshold=g_ebi1_threshold,
            g_ebi2_threshold=g_ebi2_threshold,
        )
        # add the initial state
        behavior.append(State(State="NS", V_est=0, V_MRSP=v_mrsp, W=0, EB=0))

    for i, v_est in enumerate(speed_sequence):
        with When(f"step {i}: speed is {v_est}"):
            state = enter_speed(v_est=v_est)
            behavior.append(state)
            check_result(behavior=behavior, model=model)

The outline above will be used in the check_all_combinations test, which will generate all the necessary combinations to test each value of $V_{\text{MRSP}}$ , threshold settings, and estimated speed values corresponding to each selected $V_{\text{MRSP}}$ .

@TestFeature
def check_all_combinations(self):
    """Test train control system."""
    v_mrsps = {
        100: (0,50,123,132,140),
        120: (0,50,105,113,120)
    }
    thresholds = [(15, 7.5)]
    i = 0

    for v_mrsp, (g_ebi1_threshold, g_ebi2_threshold) in product(list(v_mrsps.keys()), thresholds):
        speeds = v_mrsps[v_mrsp]
        for speed_sequence in product(speeds, repeat=3):
            Combination(
                f"#{i}: V_MRSP={v_mrsp},g_ebi1={g_ebi1_threshold},g_ebi2={g_ebi2_threshold},speeds={speed_sequence}",
                test=check_combination,
            )(
                v_mrsp=v_mrsp,
                g_ebi1_threshold=g_ebi1_threshold,
                g_ebi2_threshold=g_ebi2_threshold,
                speed_sequence=speed_sequence,
            )
            i += 1

With all the pieces in place, we’re ready to assemble the full test program.

The full test program

Here is the complete test program, which we can use to verify the accuracy of the reference implementation produced by ChatGPT.

test_train_control_system.py

import re
from testflows.core import *
from testflows.combinatorics import *
from testflows.connect import Shell


class State:
    def __init__(self, State, V_est, V_MRSP, W, EB):
        self.State = State
        self.V_est = float(V_est)
        self.V_MRSP = float(V_MRSP)
        self.W = int(W)
        self.EB = int(EB)

    def __str__(self):
        return f"State={self.State},V_est={self.V_est},V_MRSP={self.V_MRSP},W={self.W},EB={self.EB}"


@TestStep(Given)
def start_program(self):
    """Start the train controm system program."""
    with Shell() as bash:
        with bash("python train_control_system.py", asynchronous=True) as controller:
            self.context.app = controller.app
            try:
                yield
            finally:
                with By("stopping the program with CTRL-C"):
                    self.context.app.child.send("\03")
                    self.context.app.child.expect("Exiting program.")
                    self.context.app.child.expect("\n", escape=False)
                    bash.expect(bash.prompt)


@TestStep(When)
def enter_speed(self, v_est):
    """Enter the estimated speed and then
    read and return current system state."""
    self.context.app.child.send(str(v_est))
    self.context.app.child.expect(str(v_est))
    self.context.app.child.expect("Enter the estimated speed \(V_est\)\:")
    # State: NS, V_est: 60.0, V_MRSP: 100.0, W: 0, EB: 0
    pattern = re.compile(
        r"State: (?P<State>\w+), "
        r"V_est: (?P<V_est>[\d.]+), "
        r"V_MRSP: (?P<V_MRSP>[\d.]+), "
        r"W: (?P<W>\d+), "
        r"EB: (?P<EB>\d+)"
    )
    return State(**pattern.match(self.context.app.child.before.strip()).groupdict())


@TestStep(Given)
def enter_vmrsp_and_thresholds(self, v_mrsp, g_ebi1_threshold=15, g_ebi2_threshold=7.5):
    """Enter V_MRSP and guard 1 and 2 thresholds."""
    app = self.context.app

    app.child.expect("Enter the initial maximum restricted speed \(V_MRSP\)\:")
    app.child.send(str(v_mrsp))
    app.child.expect("\n", escape=False)

    app.child.expect(
        "Enter the emergency brake guard threshold for V_MRSP > 110 \(default is 15\)"
    )
    app.child.send(str(g_ebi1_threshold))
    app.child.expect("\n", escape=False)

    app.child.expect(
        "Enter the emergency brake guard threshold for V_MRSP <= 110 \(default is 7.5\)"
    )
    app.child.send(str(g_ebi2_threshold))
    app.child.expect("\n", escape=False)

    app.child.expect("Press CTRL-C to exit.")
    app.child.expect("\n", escape=False)
    app.child.expect("Enter the estimated speed \(V_est\)\:")


@TestStep(Then)
def expect_normal_status(self, state, W, EB):
    """Expect normal status."""
    assert state == "NS", f"expected state NS but got {state}"
    assert W == 0, f"expected W=0 but got {W}"
    assert EB == 0, f"expected EB=0 but got {EB}"


@TestStep(Then)
def expect_warning_status(self, state, W, EB):
    """Expect warning status."""
    assert state == "WS", f"expected state WS but got {state}"
    assert W == 1, f"expected W=1 but got {W}"
    assert EB == 0, f"expected EB=0 but got {EB}"


@TestStep(Then)
def expect_emergency_brake_status(self, state, W, EB):
    """Expect emergency brake status."""
    assert state == "IS", f"expected state IS but got {state}"
    assert W == 1, f"expected W=1 but got {W}"
    assert EB == 1, f"expected EB=1 but got {EB}"


@TestStep(Then)
def check_result(self, behavior, model):
    """Check the result of the train control system."""
    current_state = behavior[-1]
    expect = model.expect(behavior)
    debug(f"{current_state},expect={expect.__name__ if expect else None}")
    expect(state=current_state.State, W=current_state.W, EB=current_state.EB)


class Model:
    """Ceiling Speed Monitoring Contoller (CSMC) behavior model."""

    def __init__(self, V_MRSP_threshold=110, g_ebi1_threshold=15, g_ebi2_threshold=7.5):
        self.V_MRSP_threshold = V_MRSP_threshold
        self.g_ebi1_threshold = g_ebi1_threshold
        self.g_ebi2_threshold = g_ebi2_threshold

    def g_ebi1(self, current_state):
        """Guard 1 condition."""
        return (
            current_state.V_MRSP > self.V_MRSP_threshold
            and current_state.V_est > current_state.V_MRSP + self.g_ebi1_threshold
        )

    def g_ebi2(self, current_state):
        """Guard 2 condition."""
        return (
            current_state.V_MRSP <= self.V_MRSP_threshold
            and current_state.V_est > current_state.V_MRSP + self.g_ebi2_threshold
        )

    def expect_normal_status(self, behavior):
        """Expect normal status."""
        current_state = behavior[-1]
        prev_state = behavior[-2]
        if prev_state.State == "NS":
            if current_state.V_est <= current_state.V_MRSP:
                return expect_normal_status
        elif prev_state.State == "WS":
            if current_state.V_est <= current_state.V_MRSP:
                return expect_normal_status
        elif prev_state.State == "IS":
            if current_state.V_est == 0:
                return expect_normal_status

    def expect_warning_status(self, behavior):
        """Expect warning status."""
        current_state = behavior[-1]
        prev_state = behavior[-2]
        if prev_state.State in ("NS", "WS"):
            if current_state.V_est > current_state.V_MRSP:
                if not (self.g_ebi1(current_state) or self.g_ebi2(current_state)):
                    return expect_warning_status

    def expect_emergency_brake_status(self, behavior):
        """Expect emergency brake status."""
        current_state = behavior[-1]
        prev_state = behavior[-2]
        if prev_state.State in "NS":
            if self.g_ebi1(current_state) or self.g_ebi2(current_state):
                return expect_emergency_brake_status
        elif prev_state.State == "WS":
            if self.g_ebi1(current_state) or self.g_ebi2(current_state):
                return expect_emergency_brake_status
        elif prev_state.State == "IS":
            if current_state.V_est > 0:
                return expect_emergency_brake_status

    def expect(self, behavior):
        """Return expected behavior."""
        return (
            self.expect_normal_status(behavior)
            or self.expect_warning_status(behavior)
            or self.expect_emergency_brake_status(behavior)
        )


@TestOutline(Scenario)
def check_combination(self, v_mrsp, g_ebi1_threshold, g_ebi2_threshold, speed_sequence):
    """Check specific combination of the controller behavior."""

    model = Model(g_ebi1_threshold=g_ebi1_threshold, g_ebi2_threshold=g_ebi2_threshold)
    behavior = []

    with Given("I start the train control system program"):
        start_program()

    with And("I set V_MRSP and guard thresholds"):
        enter_vmrsp_and_thresholds(
            v_mrsp=v_mrsp,
            g_ebi1_threshold=g_ebi1_threshold,
            g_ebi2_threshold=g_ebi2_threshold,
        )
        # add the initial state
        behavior.append(State(State="NS", V_est=0, V_MRSP=v_mrsp, W=0, EB=0))

    for i, v_est in enumerate(speed_sequence):
        with When(f"step {i}: speed is {v_est}"):
            state = enter_speed(v_est=v_est)
            behavior.append(state)
            check_result(behavior=behavior, model=model)


@TestFeature
def check_all_combinations(self):
    """Test train control system."""
    v_mrsps = {
        100: (0,50,123,132,140),
        120: (0,50,105,113,120)
    }
    thresholds = [(15, 7.5)]
    i = 0

    for v_mrsp, (g_ebi1_threshold, g_ebi2_threshold) in product(list(v_mrsps.keys()), thresholds):
        speeds = v_mrsps[v_mrsp]
        for speed_sequence in product(speeds, repeat=3):
            Combination(
                f"#{i}: V_MRSP={v_mrsp},g_ebi1={g_ebi1_threshold},g_ebi2={g_ebi2_threshold},speeds={speed_sequence}",
                test=check_combination,
            )(
                v_mrsp=v_mrsp,
                g_ebi1_threshold=g_ebi1_threshold,
                g_ebi2_threshold=g_ebi2_threshold,
                speed_sequence=speed_sequence,
            )
            i += 1


@TestModule
def regression(self):
    "Exhaustively test train control system." ""
    Feature(test=check_all_combinations)()


if main():
    regression()

Running the test program

The moment of truth has arrived! Is ChatGPT’s implementation of the controller correct? Let’s run our comprehensive test program:

1	python3 test_train_control_system.py -o slick

First bug

And, either unfortunately or fortunately for our test program, we catch the first bug!

➤ Module regression
  ➤ Feature check all combinations
    ✔ Combination #0﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 0）
    ✔ Combination #1﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 50）
    ✘ Combination #2﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 123）, AssertionError
  ✘ Feature check all combinations, AssertionError
✘ Module regression, AssertionError
  AssertionError: expected state IS but got WS

The output indicates that we’ve found a case where we expected the controller to enter the emergency braking (intervention) status (IS) state, but instead, it was in the warning status (WS) state. Not good!

After reviewing the code, we identify the issue:

# Apply transition rules based on the current state
if self.state == "NS":
    if self.V_est <= self.V_MRSP:
        self.state = "NS"  # phi_0
    elif self.V_est > self.V_MRSP:
        self.state, self.W = "WS", 1  # phi_1
    # BUG: this needs to be checked before self.V_est > self.V_MRSP!
    elif gebi1 or gebi2: 
        self.state, self.EB = "IS", 1  # phi_2

The code checks the gebi1 or gebi2 condition after self.V_est > self.V_MRSP, which is incorrect! It should first check for gebi1 or gebi2.

Second bug

Let’s apply the fix to train_control_system.py and try again.

But we encounter another issue!

➤ Module regression
  ➤ Feature check all combinations
    ✔ Combination #0﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 0）
    ✔ Combination #1﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 50）
    ✘ Combination #2﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 123）, AssertionError
  ✘ Feature check all combinations, AssertionError
✘ Module regression, AssertionError
  AssertionError: expected W=1 but got 0

This time, the test program reveals that the warning signal (W) was not set when expected. Upon further investigation, we find that the controller entered the emergency braking (intervention) status state (IS) without setting the warning signal (W) to alert the driver.

It turns out the same block of code is to blame again—it had two bugs!

# Apply transition rules based on the current state
if self.state == "NS":
    if self.V_est <= self.V_MRSP:
        self.state = "NS"  # phi_0
    elif self.V_est > self.V_MRSP:
        self.state, self.W = "WS", 1  # phi_1
    # BUG: this needs to be checked before self.V_est > self.V_MRSP!
    elif gebi1 or gebi2: 
        # BUG: warning singal self.W is not set to 1 when moving to the IS state!
        self.state, self.EB = "IS", 1  # phi_2

Fixing both bugs

After applying both fixes to the code, we finally get the following:

# Apply transition rules based on the current state
if self.state == "NS":
    if self.V_est <= self.V_MRSP:
        self.state = "NS"  # phi_0
    elif gebi1 or gebi2: 
        self.state, self.W, self.EB = "IS", 1, 1  # phi_2
    elif self.V_est > self.V_MRSP:
        self.state, self.W = "WS", 1  # phi_1

Running the test program again, we see that it no longer fails immediately. This time, it actually completes successfully!

$ python3 test_train_control_system.py -o slick
➤ Module regression
  ➤ Feature check all combinations
    ✔ Combination #0﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 0）
    ✔ Combination #1﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 50）
    ✔ Combination #2﹕ V_MRSP=100,g_ebi1=15,g_ebi2=7․5,speeds=（0, 0, 123）
    ...
    ✔ Combination #247﹕ V_MRSP=120,g_ebi1=15,g_ebi2=7․5,speeds=（120, 120, 105）
    ✔ Combination #248﹕ V_MRSP=120,g_ebi1=15,g_ebi2=7․5,speeds=（120, 120, 113）
    ✔ Combination #249﹕ V_MRSP=120,g_ebi1=15,g_ebi2=7․5,speeds=（120, 120, 120）
  ✔ Feature check all combinations
✔ Module regression
...
1 module (1 ok)
1 feature (1 ok)
250 combinations (250 ok)

The test program successfully checked all $2 \times 5^3 = 250$ input combinations.

Conclusion

In this post, we created a comprehensive test based on the formal description of the Ceiling Speed Monitoring Controller (CSMC) discussed in the Decoding Formal Description of a Simple Train Control System article. By using the set of atomic propositions derived from the Kripke structure and creating input equivalence class partitions from it, our test successfully caught two critical bugs in the AI-generated reference implementation.

These bugs were subtle and would have been challenging to identify through visual code review alone. However, by employing rigorous testing techniques rooted in formal methods, we effectively identified both issues. That said, this test program does not guarantee the discovery of all possible bugs in any given implementation of the controller. The test program achieves thorough coverage only when the set of atomic propositions aligns perfectly with the model. Since we cannot always know the actual atomic propositions for a specific implementation, additional testing techniques are necessary to uncover any hidden atomic propositions that might exist in the code. This is what makes testing both challenging and rewarding!

Stop Bashing Bugs – It's All Your Fault!

You’re running your tests, and there it is again — a red light, a failing check, you’ve found anothe...

What Do Condition, Proposition, Predicate, Property, and Invariant Really Mean?

In software testing, you often hear terms like “condition,” “proposition,” “predicate,” “property,” ...

Transforming Testing Behavior Models into Formal Models

Most software systems do not come with formal models. However, formal models are invaluable for cons...

Decoding Formal Description of a Simple Train Control System

In scientific research on software testing, you often come across papers filled with complex mathema...

Using Atomic Propositions and Equivalence Classes (Part 1)

Atomic propositions and equivalence classes offer a practical solution for making testing manageable...

Using Atomic Propositions and Equivalence Classes (Part 2)

Building on the foundations from Part 1 🛸, where we introduced atomic propositions and equivalence ...

Combinatorial Testing: Writing Behavior Model

Combinatorial testing significantly increases the test coverage of software systems, achieving level...

Sketching Combinations For Combinatorial Tests

Writing combinatorial tests usually requires that the author of the test plans for combinatorial tes...

Combinatorial Testing: The Introduction

According to the US National Institute of Standards and Technology’s (NIST) page on Combinatorial Te...

BDD or not to BDD, or TDD, or even ATDD. How does it affect your QA team?

How do software development processes such as behavior-driven development (BDD), test-driven develop...

Get Your Software Covered Using Covering Arrays

For software systems with any non-trivial number of parameters, exhaustively testing all possible co...

Working With Requirements Just Like With Code

“Requirements are the starting point of any design”, this is a quote from the presentation given by ...

Making Your Tests Better or How to Break Your Tests Into Steps

Writing tests is not as easy task as it may seem at first glance. In many cases, writing a good test...

Testing Simple Train Control System Using Its Formal Description