SBFT/ICST Tool Competition: Self-Driving Car Testing

Aryan Prakash

University of Bern

Switzerland

Christian Birchler

University of Bern

Switzerland

Tommaso Fulcini

Politecnico di Torino

Italy

Luigi Sterace

Università degli Studi di Napoli Federico II

Italy

Sebastiano Panichella

University of Bern

Switzerland

AI4I - The Italian Institute of Artificial Intelligence for Industry

Italy

Self-Driving Car Competition

Context

Cost
Replicability

Realism
Reliability

BeamNG.tech Simulator

How is a test defined?

When is a test failing or passing?

Passed

Failed

Regression Testing

[Figure: a grid of test cases shown in four steps: all tests in gray; a subset marked in orange; the orange tests removed and another subset marked in green; only the green tests remaining.]

Last year's competition:

What tests should be selected?

How should tests be selected?

This year:

How should tests be prioritized?

Competition code on GitHub
Troubleshooting with Issues and Discussion Forum
Docker Images of all Tools!

Infrastructure

%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true }}}%%
sequenceDiagram
    Evaluator ->>+ ToolX: initialize
    ToolX -->>- Evaluator: ok
    Evaluator ->>+ ToolX: prioritize
    ToolX -->>- Evaluator: return priority

%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true }}}%%
block-beta
    columns 3
    d("Evaluator (Docker Container)"):1
    blockArrowId5<["gRPC"]>(x)
    g("ToolX (Docker Container)"):1
    block:group3:3
        docker("Docker")
    end
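The initialize/prioritize exchange in the sequence diagram can be sketched in plain Python. This is only an illustration under stated assumptions: the `Tool` protocol, `ReverseTool`, and `run_evaluation` are hypothetical names, the tool runs in-process instead of behind gRPC in a separate Docker container, and the real evaluator may measure time differently.

```python
import time
from typing import Iterable, Iterator, Protocol


class Tool(Protocol):
    """Hypothetical in-process stand-in for a tool's gRPC endpoint."""

    def initialize(self, oracles: Iterable[tuple[str, bool]]) -> bool: ...

    def prioritize(self, test_ids: Iterable[str]) -> Iterator[str]: ...


class ReverseTool:
    """Toy tool: ignores the oracles and returns the tests in reverse order."""

    def initialize(self, oracles):
        return True

    def prioritize(self, test_ids):
        yield from reversed(list(test_ids))


def run_evaluation(tool, oracles, test_ids):
    """Call initialize, then prioritize, timing each phase separately."""
    start = time.perf_counter()
    ok = tool.initialize(oracles)
    time_to_initialize = time.perf_counter() - start
    if not ok:
        raise RuntimeError("tool failed to initialize")

    start = time.perf_counter()
    order = list(tool.prioritize(test_ids))  # drain the reply stream
    time_to_prioritize = time.perf_counter() - start
    return order, time_to_initialize, time_to_prioritize
```

A call such as `run_evaluation(ReverseTool(), [("a", False)], ["a", "b", "c"])` returns the prioritized order together with the two elapsed times.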

Protocol Buffers

						
syntax = "proto3";

service CompetitionTool {
  rpc Name(Empty) returns (NameReply) {}

  rpc Initialize (stream Oracle) returns (InitializationReply) {}

  // bidirectional streaming for high flexibility
  rpc Prioritize (stream SDCTestCase) returns (stream PrioritizationReply) {}
}

message Empty {}

message NameReply {
  string name = 1;
}

message Oracle {
  SDCTestCase testCase = 1;
  bool hasFailed = 2;
}

message SDCTestCase {
  string testId = 1;
  repeated RoadPoint roadPoints = 2;
}

message RoadPoint {
  int64 sequenceNumber = 1;
  float x = 2;
  float y = 3;
}

message InitializationReply {
  bool ok = 1;
}

message PrioritizationReply {
  string testId = 1;
}
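To make the `Prioritize` contract concrete (test cases in, test IDs out in priority order), here is a minimal sketch of one possible heuristic. Everything in it is an assumption for illustration: the dataclasses only mirror the messages above and are not the generated gRPC stubs, and ranking by total turning angle is a made-up baseline, not a method used by the competition or by any participating tool.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RoadPoint:
    """Plain mirror of the RoadPoint message (not the generated stub)."""
    sequence_number: int
    x: float
    y: float


@dataclass
class SDCTestCase:
    """Plain mirror of the SDCTestCase message (not the generated stub)."""
    test_id: str
    road_points: list[RoadPoint]


def total_turning(points: list[RoadPoint]) -> float:
    """Sum of absolute heading changes (radians) along the road."""
    pts = sorted(points, key=lambda p: p.sequence_number)
    headings = [
        math.atan2(b.y - a.y, b.x - a.x)
        for a, b in zip(pts, pts[1:])
        if (a.x, a.y) != (b.x, b.y)  # skip repeated points
    ]
    total = 0.0
    for h1, h2 in zip(headings, headings[1:]):
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        delta = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi
        total += abs(delta)
    return total


def prioritize(test_cases: Iterable[SDCTestCase]) -> Iterator[str]:
    """Yield test IDs, most winding road first (hypothetical heuristic)."""
    ranked = sorted(
        test_cases, key=lambda tc: total_turning(tc.road_points), reverse=True
    )
    for tc in ranked:
        yield tc.test_id
```

In a real tool, the same generator shape would sit inside the servicer method for `Prioritize`, reading from the request stream and yielding `PrioritizationReply` messages.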

						
What are the evaluation metrics?


from dataclasses import dataclass


@dataclass
class EvaluationReport:
    """Holds the evaluation metrics of a tool."""

    test_suite_cnt: int
    benchmark: str
    time_to_initialize: float
    time_to_prioritize_tests: float
    tool_name: str
    time_to_first_fault: float | None
    time_to_last_fault: float | None
    apfd: float   # Average Percentage of Faults Detected
    apfdc: float  # cost-cognizant variant of APFD
Benchmark

SensoDat: Simulation-based Sensor Dataset of Self-driving Cars

32,580 Test Cases
Generated by three test generators

Experiments

The experiments are conducted on a virtual machine (VM) with 16 GB of RAM and eight virtual CPUs.

Results of SBFT Competition

ITEP4SDC

Ali ihsan Güllü, Faiz Ali Shah, Dietmar Pfahl

University of Tartu, Estonia

Thank you all!

Any feedback and/or ideas are welcome for future editions.

github.com/christianbirchler-org/sdc-testing-competition