Software-defined cars will need various mechanisms to keep the vehicle safe and operational under all circumstances. Proprietary solutions for these mechanisms require large verification efforts and are hard to integrate with diverse software architectures. Is there a standardized software framework for safety-critical distributed communications?
For many years, traditional automotive systems have accumulated weakly programmable electronic control units (ECUs) that each perform an isolated function. Advanced automotive design, however, is now evolving towards flexible and interoperable software distributed across only a few (zonal) processors. The distributed software performs the coordinated tasks of automated driving, infotainment, powertrain and body control, while sharing processors, networks and sensors to reduce system cost. The transition to software-defined cars is one of the most significant trends in the automotive industry, making software features a key differentiator.
To compete in this market, car manufacturers need to quickly and easily build modular distributed applications, which require programmable, reliable and cost-effective semiconductor devices to run on. Therefore, standardized software platforms with easy-to-use application programming interfaces (APIs), such as POSIX and AUTOSAR, are becoming more popular. A key component in these software platforms is the middleware, the software layer between the various operating systems and high-level applications (see the figure below). Simply put, the middleware is a software library that enables distributed system components to communicate with each other. The safety of software-defined cars depends heavily on the middleware and the underlying network processors for reliable real-time data communication among distributed processes.
State-of-the-art automated driving (AD) systems often adopt a dual-channel architecture for redundancy, i.e. a fallback channel is implemented next to the main channel that controls the AD system in normal situations. If the main channel fails, vehicle control switches over to the fallback channel. This enhances both the safety and the availability of the AD system. Such an architecture requires a safety checker to verify the health status of the main channel and to trigger a safety mechanism, such as a safe stop of the vehicle, when necessary. The safety checker’s computation and communication are themselves safety-critical, which places high demands on its fault tolerance and reliability.
The NXP S32G vehicle network processors are an ideal fit for implementing highly reliable AD systems with various safety mechanisms. The Arm® Cortex®-A53 cores in the S32G offer high-performance computing capabilities, and the ASIL D Cortex-M7 safety cores are suitable for running safety-critical functionality in lockstep mode. Moreover, the SJA1110 Ethernet switch integrated on the S32G GoldBox reference design for service-oriented gateways offers time-sensitive networking (TSN) features that give the higher-level AD applications distributed on the network real-time and reliable communication.
In addition to the high-integrity hardware, the Data Distribution Service (DDS) middleware running across the Cortex-A53 and Cortex-M7 cores in the S32G manages the data and communication of the distributed system. DDS is a middleware protocol based on the publish-subscribe pattern and is standardized by the Object Management Group® (OMG). DDS has been integrated into various key automotive platform ecosystems, such as AUTOSAR Adaptive and ROS2. DDS provides low-latency data connectivity, reliability and scalable data-centric communication. Moreover, DDS comes with a rich set of built-in quality of service (QoS) policies that control DDS behavior, such as resource consumption and communication reliability. To learn the fundamentals of DDS and the QoS policies, you can try the interactive Shapes demo application or view the demo video.
Note that DDS for extremely resource-constrained environments is implemented using the OMG DDS-XRCE protocol. This is a client-to-agent protocol, meaning that the DDS-XRCE client node talks to the DDS network via an external agent node. DDS-XRCE is ideal for developing lightweight DDS applications for IoT devices, but the agent can become a single point of failure when used in safety-critical systems. RTI Connext® DDS Micro running on the S32G Cortex-M7, however, talks directly to the full-fledged DDS network without any bridge or broker, thus eliminating this single point of failure. RTI Connext DDS Micro can also be built and integrated in ISO 26262 automotive safety contexts up to ASIL D.
A few DDS QoS policies are particularly interesting for implementing a redundant automated driving channel, notably Deadline, Liveliness, Exclusive Ownership and Ownership Strength.
The DDS built-in QoS policies are ready for use once the DDS middleware layer is in place. This eases the development process and greatly improves the interoperability and reusability of the software components. Several variations of DDS distributions exist to suit the different system requirements of the distributed AD components. Implementing DDS across the distributed AD system both establishes a common communication and data management framework and provides increased system diversity with little effort. In addition, a system built on top of DDS can be modeled and configured using a single DDS XML file. The XML file format makes system development easier and helps architects and application developers design the software-defined car at the system level.
When combined properly, DDS QoS policies can be used to enable various fault handling mechanisms and safety measures against performance limitations. The DDS middleware layer establishes a common framework for all the AD components running on top of it. Safety mechanisms at different scales, such as the fail-over to a complete redundant AD channel or the seamless takeover of individual components, can be implemented without much engineering effort. Below we elaborate on the safety mechanisms implemented in our proof-of-concept demo setup.
Fail-over is a widely used safety mechanism in safety-critical systems. It often relies on fail-silent components, which stop producing output when they fail. Typically, when the main AD channel silently fails, the system should fall back to the redundant safety channel, which maneuvers the vehicle to a safe state. This mechanism can be implemented using the DDS Liveliness and Ownership QoS policies. If the vehicle control DataWriter in the main channel silently fails or loses communication with the rest of the system, the samples produced by the safety channel’s DataWriter, which has a lower ownership strength than the main channel’s DataWriter, automatically become visible to the vehicle actuators and start controlling the vehicle seamlessly. Meanwhile, the safety checker monitors the change in DDS network Liveliness caused by the failed DataWriter. Recovery mechanisms, such as a reboot, can be implemented based on this diagnostic information.
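The fail-over behavior can be illustrated with a small simulation. The sketch below is plain Python, not a DDS API: the Writer class, the visible_sample function, the strengths, the 100 ms lease and the sample values are all hypothetical, chosen only to mimic how Exclusive Ownership combined with a Liveliness lease selects which DataWriter's samples a DataReader sees. Publishing a sample is assumed to assert liveliness.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Writer:
    """Hypothetical stand-in for a DDS DataWriter (not a real DDS API)."""
    name: str
    strength: int                       # models the Ownership Strength value
    lease_ms: int                       # models the Liveliness lease duration
    last_assert_ms: Optional[int] = None
    last_sample: Optional[str] = None

    def publish(self, now_ms: int, sample: str) -> None:
        # In this simplified model, publishing also asserts liveliness.
        self.last_assert_ms = now_ms
        self.last_sample = sample

    def is_alive(self, now_ms: int) -> bool:
        if self.last_assert_ms is None:
            return False
        return now_ms - self.last_assert_ms <= self.lease_ms


def visible_sample(writers: List[Writer], now_ms: int) -> Optional[Tuple[str, str]]:
    """Return (owner, sample) of the strongest writer that is still alive."""
    alive = [w for w in writers if w.is_alive(now_ms)]
    if not alive:
        return None
    owner = max(alive, key=lambda w: w.strength)
    return owner.name, owner.last_sample


main = Writer("main_channel", strength=10, lease_ms=100)
safety = Writer("safety_channel", strength=5, lease_ms=100)
writers = [main, safety]

# t = 0 ms: both channels publish; the actuators follow the main channel.
main.publish(0, "follow_lane")
safety.publish(0, "safe_stop")
before = visible_sample(writers, now_ms=50)

# The main channel fails silently; only the safety channel keeps publishing.
safety.publish(200, "safe_stop")
after = visible_sample(writers, now_ms=250)

# What a safety checker watching liveliness would observe at t = 250 ms.
lost = [w.name for w in writers if not w.is_alive(250)]

print(before)  # ('main_channel', 'follow_lane')
print(after)   # ('safety_channel', 'safe_stop')
print(lost)    # ['main_channel']
```

In this model no explicit switch command is needed: once the main channel's lease expires, the lower-strength safety writer becomes the owner by arbitration alone, which is the property the fail-over mechanism relies on.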
Even when the failing AD component is not fail-silent, a takeover safety mechanism can be implemented to actively overrule the malfunctioning or unreliable component without compromising system availability. The takeover can be realized using the DDS Exclusive Ownership and Ownership Strength QoS policies. These QoS policies determine which DataWriter's samples are delivered to the DataReader. When the safety checker detects that the primary DataWriter is not operating properly, for example because it misses the Deadline or sends out-of-range data, it can trigger a healthy DataWriter with a higher ownership strength to send data to the DataReader.
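A minimal sketch of the takeover logic follows, again as a plain-Python simulation rather than a DDS API. The Sample and ExclusiveReader classes, the check function, the 50 ms Deadline, the 30 degree steering bound and the strength values are hypothetical assumptions made for illustration. The reader model keeps only the ownership arbitration and ignores liveliness.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEADLINE_MS = 50             # assumed Deadline period
STEERING_LIMIT_DEG = 30.0    # assumed plausibility bound for the data


@dataclass
class Sample:
    writer: str
    strength: int            # Ownership Strength of the sending writer
    t_ms: int                # publication time in milliseconds
    steering_deg: float


def check(prev: Optional[Sample], cur: Sample) -> Optional[str]:
    """Return a fault label for the primary writer, or None if it looks healthy."""
    if abs(cur.steering_deg) > STEERING_LIMIT_DEG:
        return "out_of_range"
    if prev is not None and cur.t_ms - prev.t_ms > DEADLINE_MS:
        return "deadline_missed"
    return None


class ExclusiveReader:
    """Accepts samples only from the strongest writer seen so far."""

    def __init__(self) -> None:
        self.owner_strength = -1
        self.accepted: List[Tuple[str, float]] = []

    def on_sample(self, s: Sample) -> None:
        if s.strength >= self.owner_strength:
            self.owner_strength = s.strength
            self.accepted.append((s.writer, s.steering_deg))


primary_stream = [
    Sample("primary", 10, 0, 5.0),
    Sample("primary", 10, 40, 7.0),
    Sample("primary", 10, 80, 95.0),    # implausible steering angle
    Sample("primary", 10, 120, 96.0),   # primary keeps publishing faulty data
]

reader = ExclusiveReader()
backup_active = False
fault = None
prev = None
for s in primary_stream:
    reader.on_sample(s)
    if not backup_active:
        fault = check(prev, s)
        backup_active = fault is not None
    if backup_active:
        # The backup writer publishes with a higher strength and takes ownership.
        reader.on_sample(Sample("backup", 20, s.t_ms + 1, 0.0))
    prev = s

print(fault)            # out_of_range
print(reader.accepted)
```

In this run the first faulty sample (95.0) still reaches the reader, because the checker can only react after observing it; from then on the higher-strength backup owns the data and the later faulty sample (96.0) is dropped. How quickly a real checker reacts depends on the system and is not modeled here.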
DDS Deadline, Liveliness, Exclusive Ownership and Ownership Strength can be combined to implement a hybrid mechanism that takes advantage of both the fail-over and the takeover mechanisms. For example, by monitoring the DDS network Liveliness, the safety checker can trigger the fail-over mechanism when a node fails silently, or activate the takeover mechanism when a running node is not fail-silent and publishes faulty data or misses the Deadline. Transient faults in the system can also be handled by seamlessly switching between the main channel and the safety channel, thanks to their different Ownership Strength values.
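The hybrid decision can be summarized as a small function. This is a sketch under stated assumptions: the Action enum, the decide function and its three boolean inputs are hypothetical names that stand for the status information a safety checker could derive from the Liveliness and Deadline QoS policies and from its own plausibility checks on the data.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    FAIL_OVER = "fail_over"
    TAKE_OVER = "take_over"


def decide(alive: bool, deadline_missed: bool, data_plausible: bool) -> Action:
    """Pick the safety mechanism for the monitored main-channel writer."""
    if not alive:
        # The node failed silently: ownership falls back to the
        # lower-strength safety writer, so only fail-over handling is needed.
        return Action.FAIL_OVER
    if deadline_missed or not data_plausible:
        # The node is still running but misbehaves: a healthy writer with
        # a higher strength must actively overrule it.
        return Action.TAKE_OVER
    return Action.NONE


print(decide(alive=False, deadline_missed=False, data_plausible=True))  # Action.FAIL_OVER
print(decide(alive=True, deadline_missed=True, data_plausible=True))    # Action.TAKE_OVER
print(decide(alive=True, deadline_missed=False, data_plausible=True))   # Action.NONE
```

Checking liveliness first reflects the distinction made above: a silent node needs no overruling, whereas a live but faulty node does.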
To evaluate our DDS-based safety mechanisms on the S32G in a realistic setup, NXP teamed up with a team of automotive engineering experts at Real-Time Innovations (RTI). RTI is a leading software framework provider for autonomous systems, marketing a family of DDS products and tools called Connext DDS. Together, we integrated the NXP safety checker into an Autonomous Valet Parking (AVP) demonstration based on Autoware.Auto, an open source project by the Autoware Foundation. The demo shows how the vehicle drives itself into a valet parking lot. Autoware.Auto is a full-fledged end-to-end automated driving framework based on ROS2, which uses DDS as its underlying middleware.
The architecture of our hardware-in-the-loop evaluation demo setup is shown in the figure below:
In our evaluation setup, we injected faults similar to real-life issues into the AD system and observed how our DDS-based safety mechanisms handle the situation. The demo video below shows how our safety checker monitors, detects and reacts to system faults such as software crashes, power loss and network connection loss.
To cope with the transition to software-defined cars, automotive system software needs to be modular, reliable and scalable. As shown in our Autoware.Auto AVP experiments, the NXP S32G ASIL D Cortex-M7 processor cores are well suited to serve as a safety checker in automated driving systems. The RTI Connext DDS middleware contributes by offering a communication framework for both powerful processors and resource-constrained microcontrollers across the automotive system. With its rich set of quality-of-service policies, DDS enables safety mechanisms in a software-defined car with low engineering effort and high interoperability.
Jochen Seemann is an embedded software architect at NXP Semiconductors, the Netherlands. He graduated from the Baden-Württemberg Cooperative State University in Applied Computer Science and has 5 years of experience as a full-stack software developer for industrial PC interfaces. Jochen has a further 5 years of experience as a software engineer and architect for an automotive Tier 1 supplier, working on IVI and automated driving products. Additionally, he has contributed to the open source Qt framework.
Yuting Fu is a system engineer at NXP Semiconductors, the Netherlands. She holds a Master's degree in Embedded Systems from Eindhoven University of Technology and TU Berlin. Yuting is an author of 3 scientific publications in the area of vehicle-level safety mechanisms for automated driving systems. Furthermore, she is a certified IEC 61508 Functional Safety Professional.
Andrei Terechko is a senior principal architect at NXP Semiconductors, the Netherlands. Andrei has 15 years of experience in multinational corporations and 10 years in startup companies. Currently, he focuses on safety mechanisms and architectures for automated driving. Andrei is a co-author of 15 patents, 20+ international publications and public presentations.
Emilio Guijarro is a senior automotive applications engineer at Real-Time Innovations (RTI), with over 15 years of experience in the defense and automotive industries, including automotive infotainment systems. In 2019 he joined RTI to work on the integration of DDS in automotive use cases and in specific development environments, including the AUTOSAR ecosystem.