SBFT/ICST Tool Competition: Self-Driving Car Testing

Aryan Prakash
University of Bern
Switzerland
Christian Birchler
University of Bern
Switzerland
Tommaso Fulcini
Politecnico di Torino
Italy
Luigi Sterace
Università degli Studi di Napoli Federico II
Italy
Sebastiano Panichella
University of Bern
Switzerland
AI4I - The Italian Institute of Artificial Intelligence for Industry
Italy

Self-Driving Car Competition

Context

  • Cost
  • Replicability
  • Realism
  • Reliability
BeamNG.tech Simulator

How is a test defined?

When is a test failing or passing?

 

Passed

 

Failed

Regression Testing

[Animation: a grid of gray cells representing a test suite; a few cells turn orange and disappear, then a subset of the remaining cells turns green and only those are kept.]

Last year's competition:
What tests should be selected?
How should tests be selected?
This year:
How should tests be prioritized?
  • Competition code on GitHub
  • Troubleshooting with Issues and Discussion Forum
  • Docker Images of all Tools!

Infrastructure

%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true }}}%%
sequenceDiagram
    Evaluator ->>+ ToolX: initialize
    ToolX -->>- Evaluator: ok
    Evaluator ->>+ ToolX: prioritize
    ToolX -->>- Evaluator: return priority
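
The exchange in the sequence diagram can be sketched from the evaluator's side in plain Python, without gRPC or Docker. This is only an illustration: `Tool`, `ReverseTool`, and `run_round` are hypothetical names, the oracle data is reduced to `(test_id, has_failed)` pairs, and the check that every test is ranked exactly once is an assumption rather than a documented rule of the competition.

```python
import time
from typing import Iterable, Protocol


class Tool(Protocol):
    """Hypothetical in-process stand-in for a tool container."""

    def initialize(self, oracles: Iterable[tuple[str, bool]]) -> bool: ...

    def prioritize(self, test_ids: Iterable[str]) -> Iterable[str]: ...


class ReverseTool:
    """Placeholder tool: accepts any oracle data and reverses the input order."""

    def initialize(self, oracles):
        return True

    def prioritize(self, test_ids):
        return reversed(list(test_ids))


def run_round(tool: Tool, oracles: list[tuple[str, bool]], test_ids: list[str]):
    """Run initialize, then prioritize, and time each call separately."""
    start = time.perf_counter()
    ok = tool.initialize(iter(oracles))
    time_to_initialize = time.perf_counter() - start
    if not ok:
        raise RuntimeError("tool rejected the initialization data")

    start = time.perf_counter()
    order = list(tool.prioritize(iter(test_ids)))
    time_to_prioritize_tests = time.perf_counter() - start

    # Assumption: a valid answer ranks every submitted test exactly once.
    if sorted(order) != sorted(test_ids):
        raise ValueError("prioritization must contain each test exactly once")
    return order, time_to_initialize, time_to_prioritize_tests
```

Calling `run_round(ReverseTool(), [("t0", True)], ["a", "b", "c"])` returns the reversed order together with the two wall-clock durations.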

				 		
%%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true }}}%%
block-beta
    columns 3
    d("Evaluator (Docker Container)"):1
    blockArrowId5<["gRPC"]>(x)
    g("ToolX (Docker Container)"):1
    block:group3:3
        docker("Docker")
    end

				 		

Protocol Buffers

						
syntax = "proto3";

service CompetitionTool {
  rpc Name(Empty) returns (NameReply) {}

  rpc Initialize (stream Oracle) returns (InitializationReply) {}

  // bidirectional streaming for high flexibility
  rpc Prioritize (stream SDCTestCase) returns (stream PrioritizationReply) {}
}

message Empty {}

message NameReply {
  string name = 1;
}

message Oracle {
  SDCTestCase testCase = 1;
  bool hasFailed = 2;
}

message SDCTestCase {
  string testId = 1;
  repeated RoadPoint roadPoints = 2;
}

message RoadPoint {
  int64 sequenceNumber = 1;
  float x = 2;
  float y = 3;
}

message InitializationReply {
  bool ok = 1;
}

message PrioritizationReply {
  string testId = 1;
}
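
To make the message flow concrete, here is a minimal Python sketch of a tool that follows the service definition above without any gRPC plumbing. The dataclasses mirror the `RoadPoint`, `SDCTestCase`, and `Oracle` messages. The class `CurvinessTool` and its heuristic (rank roads with a larger total turning angle first) are hypothetical illustrations, not part of the competition code or of any submitted tool.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RoadPoint:
    sequence_number: int
    x: float
    y: float


@dataclass
class SDCTestCase:
    test_id: str
    road_points: list[RoadPoint]


@dataclass
class Oracle:
    test_case: SDCTestCase
    has_failed: bool


def total_turning(test: SDCTestCase) -> float:
    """Sum of absolute heading changes (in radians) along the road."""
    pts = sorted(test.road_points, key=lambda p: p.sequence_number)
    headings = [math.atan2(b.y - a.y, b.x - a.x) for a, b in zip(pts, pts[1:])]
    total = 0.0
    for h1, h2 in zip(headings, headings[1:]):
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        delta = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi
        total += abs(delta)
    return total


class CurvinessTool:
    """Hypothetical tool mirroring Name, Initialize, and Prioritize."""

    def name(self) -> str:
        return "curviness-sketch"

    def initialize(self, oracles: Iterable[Oracle]) -> bool:
        # A real tool could learn from the labeled tests; this sketch only counts them.
        self.seen = sum(1 for _ in oracles)
        return True

    def prioritize(self, tests: Iterable[SDCTestCase]) -> Iterator[str]:
        # Yield test IDs one by one, like a stream of PrioritizationReply messages.
        for test in sorted(tests, key=total_turning, reverse=True):
            yield test.test_id


straight = SDCTestCase("straight", [RoadPoint(i, float(i), 0.0) for i in range(4)])
bend = SDCTestCase(
    "bend",
    [RoadPoint(0, 0.0, 0.0), RoadPoint(1, 1.0, 0.0),
     RoadPoint(2, 1.0, 1.0), RoadPoint(3, 0.0, 1.0)],
)
tool = CurvinessTool()
tool.initialize([Oracle(straight, False)])
print(list(tool.prioritize([straight, bend])))  # ['bend', 'straight']
```

Yielding IDs from a generator loosely matches the streaming shape of `Prioritize`; in a real submission the gRPC stubs generated from the `.proto` file would take the place of these hand-written classes.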
						
					
						
						
					
What are the evaluation metrics?

from dataclasses import dataclass


@dataclass
class EvaluationReport:
    """Holds the evaluation metrics of a tool."""

    test_suite_cnt: int
    benchmark: str
    time_to_initialize: float
    time_to_prioritize_tests: float
    tool_name: str
    time_to_first_fault: float | None
    time_to_last_fault: float | None
    apfd: float  # Average Percentage of Faults Detected
    apfdc: float  # cost-cognizant variant of APFD
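
As a reference point for the `apfd` field, the textbook definition is APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of tests, m the number of faults, and TF_i the 1-based position of the first test that reveals fault i. The sketch below assumes that every failing test counts as its own fault; that simplification and the function name `apfd` are illustrative assumptions, and the competition's evaluator may compute the metric differently.

```python
def apfd(prioritized_ids: list[str], failing_ids: set[str]) -> float:
    """Textbook APFD, treating each failing test as a distinct fault (assumption)."""
    n = len(prioritized_ids)
    m = len(failing_ids)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test and one fault")

    # 1-based rank of every test in the prioritized order.
    position = {tid: i for i, tid in enumerate(prioritized_ids, start=1)}
    missing = failing_ids - set(position)
    if missing:
        raise ValueError(f"failing tests missing from the ordering: {sorted(missing)}")

    tf_sum = sum(position[tid] for tid in failing_ids)
    return 1.0 - tf_sum / (n * m) + 1.0 / (2 * n)


print(apfd(["a", "b", "c", "d"], {"a", "b"}))  # 0.75
print(apfd(["a", "b", "c", "d"], {"c", "d"}))  # 0.25
```

Placing the two failing tests first yields 0.75, while placing them last yields 0.25, so a higher value means faults are found earlier in the ordering.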
					

Benchmark

SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
  • 32,580 Test Cases
  • Generated by three test generators

Experiments

The experiments are conducted on a virtual machine (VM) with 16 GB of RAM and eight virtual CPUs.

Results of SBFT Competition

ITEP4SDC

Ali ihsan Güllü, Faiz Ali Shah, Dietmar Pfahl
University of Tartu, Estonia

Thank you all!

Any feedback and/or ideas are welcome for future editions.

github.com/christianbirchler-org/sdc-testing-competition