1、 Safety issues of autonomous driving
As autonomous driving technology develops rapidly, its safety has become a major public concern. With assisted driving features becoming more common, the related accident data has also drawn wide discussion. According to published statistics, Tesla recorded by far the largest number of assisted-driving accidents between July 2021 and May 2022. The accuracy of these figures is disputed, however, partly because some brands with a small installed base, such as Aptiv, may not have reported their accidents. Relevant statistics are also hard to obtain in China, where posts about accidents are sometimes deleted, so the completeness and authenticity of the data cannot be guaranteed.
How to ensure the safety of autonomous driving - knowledge explanation of the integrated driving and parking series
2、 Analysis of the Relationship between Assisted Driving Functions and Accidents
Even so, analysing the accident data is still worthwhile. Tesla's high accident count suggests that its system may have weaknesses in some respects, and the per-brand distribution shows large differences in assisted-driving safety across manufacturers. Looking at how different types of assisted-driving function affect accident frequency, both basic ADAS (L0 warning functions) and advanced ADAS (L1 functions such as LKA and AEB) clearly reduce accidents, on highways as well as on other roads. Consider a manual-transmission car with no parking sensors or similar aids: the driver must concentrate intensely, and a lapse during a turn can easily lead to a collision with a vehicle hidden in a blind spot. A system with warning functions can alert the driver in time when another vehicle is in the blind spot, effectively preventing this type of accident. Likewise, LKA (Lane Keeping Assist) and AEB (Automatic Emergency Braking) have proven useful in real driving: many users report on forums that these functions helped them avoid accidents.
For advanced ADAS (L2 functions), however, the safety picture is less clear. These functions can help in some situations; Huawei's assisted-driving system, for example, has successfully triggered emergency avoidance in certain scenarios. But there have also been accidents, such as vehicles driving into roadside construction zones or misjudging unusual situations. Overall, basic ADAS and L1 functions improve driving safety to some extent, while the safety of advanced ADAS still needs further research and improvement.
On the insurance side, premiums have been rising gradually as electric vehicles become more widespread. Moreover, liability for serious accidents involving L3 and L4 vehicles has not yet been clearly settled. Even companies that hold L4 operating licences are currently allowed to operate only on designated road sections, which greatly limits the deployment of autonomous driving technology.
3、 Key areas and standards for autonomous driving safety
To ensure the safety of autonomous driving, the automotive industry usually focuses on three key areas: functional safety, safety of the intended functionality (SOTIF), and cybersecurity, covered respectively by ISO 26262, ISO 21448, and ISO/SAE 21434. Each standard constrains the autonomous driving system from a different angle, with the common aim of reducing risk and improving the system's reliability and safety.
Functional safety is concerned with avoiding harm to people and vehicles when a system function fails during operation because of a hardware fault or software error. It grew out of the system-safety concept developed by NASA and aims to eliminate sources of harm at the functional-design stage, focusing on personal injury and functional failure. For example, if a component in a vehicle's electronic control system malfunctions and causes unintended braking or steering, a serious accident may result. Functional safety identifies such potential hazards and reduces their risk through a set of design and analysis methods, such as Hazard Analysis and Risk Assessment (HARA).
Information security (cybersecurity) focuses on protecting vehicle data from spoofing, tampering, leakage, and similar threats. Examples include a stolen personal ID being used to drain a bank account, a vehicle operating-system image whose integrity has been compromised so that the machine fails to boot, and a denial-of-service attack that leaves the vehicle without a response when it requests steering, multimedia, or other functions during operation. Such incidents harm individuals (financial loss, damage to reputation) and can also disrupt normal vehicle operation and thus endanger driving safety. In some cases a security breach can lead to a safety problem: if the AEB code is tampered with, for example, the function may stop working and people may be injured or killed. Strictly speaking, however, this is a functional anomaly caused by a security breach, which differs from the functional-safety case in which the function fails on its own.
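The image-integrity example above can be made concrete. A production secure-boot chain verifies an asymmetric signature with a public key anchored in ROM or the security island; the sketch below instead uses an HMAC-SHA256 from the Python standard library as a simpler, symmetric stand-in, purely to show the principle that a modified image is detected and rejected before it runs. The key, the image bytes, and the function names are hypothetical.

```python
import hashlib
import hmac


def compute_tag(image: bytes, key: bytes) -> bytes:
    """HMAC-SHA256 over the whole image (symmetric stand-in for a signature)."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_image(image: bytes, expected_tag: bytes, key: bytes) -> bool:
    """Return True only if the image still matches the tag recorded at build time."""
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(compute_tag(image, key), expected_tag)


# Hypothetical key and image, for illustration only
KEY = b"demo-key-normally-held-in-a-security-island"
image = b"vehicle-os-image-v1"
tag = compute_tag(image, KEY)

tampered = image[:-1] + b"X"
print(verify_image(image, tag, KEY))     # True: image unchanged, boot may proceed
print(verify_image(tampered, tag, KEY))  # False: integrity broken, refuse to boot
```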
SOTIF (safety of the intended functionality) addresses hazards caused by functional insufficiencies, design flaws, or logic errors in scenarios that were not anticipated during design. Take Tesla's collisions with white trucks as an example: if the perception system was not designed to recognise a white truck, it may fail to respond correctly when it encounters one, leading to an accident. Conventional cars have relatively simple functions, so unanticipated situations are rare. An autonomous driving system, by contrast, has a long processing chain with many stages, any of which may contain design defects or insufficient capability, so SOTIF problems are becoming increasingly prominent. Decision-logic errors and user misuse (such as activating a function under conditions where it is not allowed) can both cause accidents. As autonomous driving technology advances, SOTIF becomes ever more important.
In practice, correctly attributing each type of safety issue is essential. Faults in electrical and electronic (E/E) systems usually fall under functional safety, because they stem from hardware or software problems within the system itself. Attacks that successfully exploit vehicle security vulnerabilities are a cybersecurity matter, since they concern data security and system access control. Issues such as insufficient awareness of performance limitations (with or without reasonably foreseeable misuse) and incorrect human-machine interfaces (for example, user confusion or user overload) must be judged case by case; they fall mainly under SOTIF but may also involve functional safety. Hazards caused by the system technology itself, such as eye damage from a laser, are also safety-relevant, although they are usually handled by dedicated standards such as laser-safety standards rather than by ISO 26262. Attacks arriving via active infrastructure, V2V communication, external devices, or cloud services are cybersecurity issues.
4、 Specific implementation of functional safety
(1) Defining functional safety goals and performing HARA
Clear functional safety goals are the foundation of functional safety in autonomous driving. They should be defined at the vehicle level, for example: the vehicle must not steer, accelerate, or decelerate unintentionally. To define these goals systematically, a Hazard Analysis and Risk Assessment (HARA) is carried out.
HARA analyses the system's functions to identify possible malfunctions and failure modes, then combines them with driving situations to determine hazardous events. For example, an adaptive cruise control (ACC) function should brake in time when another vehicle cuts in ahead; if it fails to do so, that is a hazardous event. Each hazardous event is rated for severity, exposure, and controllability. Severity is the potential seriousness of the injury or damage the event could cause. Exposure is the probability of being in a driving situation in which the failure could cause harm, that is, how likely the relevant operating condition is. Controllability is the ability of the driver and other road users involved to avoid the harm through a timely reaction. Combining these three ratings yields the risk level of the hazardous event, expressed as an Automotive Safety Integrity Level (ASIL), from ASIL A (lowest) to ASIL D (highest); events below ASIL A are rated QM (quality management). Each ASIL carries its own safety requirements and methods: the higher the risk, the higher the ASIL and the more stringent the requirements.
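The mapping from the three ratings to an ASIL can be sketched in a few lines. This uses a widely used shortcut that reproduces the ISO 26262-3 ASIL table (any class 0 gives QM; otherwise the sum S+E+C of 10, 9, 8, 7 gives D, C, B, A). The function name is hypothetical, and the ratings in the usage lines (S3, E4, C2 for "ACC fails to brake on a cut-in") are illustrative assumptions, not values from an actual HARA.

```python
def determine_asil(s: int, e: int, c: int) -> str:
    """Return 'QM' or ASIL 'A'-'D' for severity S0-S3, exposure E0-E4, controllability C0-C3.

    A class of 0 in any dimension gives QM; otherwise the sum S+E+C of
    10, 9, 8 or 7 gives D, C, B or A, and anything lower gives QM.
    """
    if not (0 <= s <= 3 and 0 <= e <= 4 and 0 <= c <= 3):
        raise ValueError("expected S0-S3, E0-E4, C0-C3")
    if 0 in (s, e, c):
        return "QM"
    return {10: "D", 9: "C", 8: "B", 7: "A"}.get(s + e + c, "QM")


# Hypothetical ratings for "ACC fails to brake when a vehicle cuts in"
print(determine_asil(3, 4, 2))  # C
print(determine_asil(3, 4, 3))  # D
print(determine_asil(1, 2, 1))  # QM
```

In a real project the ratings themselves, not the lookup, are where the engineering effort goes; the table only makes the result reproducible and reviewable.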
(2) Decompose goals and develop safety requirements
Once the safety goals and their ASILs are set, they must be broken down to every level of the system, including hardware and software. This is a top-down process: starting from the vehicle-level safety goal, the malfunction analysis and HARA fix the vehicle-level ASIL, which is then allocated step by step to subsystems and to hardware and software modules. For ACC, once HARA has set the ASIL, the next step is to write Functional Safety Requirements (FSRs), which are safety requirements on the function itself. To keep ACC working correctly in all situations, the FSRs must cover the sensors (radar, camera) as well as how the controller processes data and makes decisions. Concretely: the radar must output accurate target information, the camera must output a correct perception video stream, and the ADCU (the ACC controller) must validate the radar and camera data to guard against erroneous or corrupted inputs. All of these are functional safety requirements for ACC.
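When a requirement is allocated to two redundant elements, ISO 26262-9 permits ASIL decomposition, for example splitting ASIL D into two ASIL B(D) parts. The sketch below, with a hypothetical function name, checks whether a proposed two-way split matches one of the listed schemes; those schemes are exactly the pairs whose level indices add up to the original level. The standard also requires sufficient independence between the two elements, which has to be argued separately and is not checked here.

```python
LEVELS = {"QM": 0, "A": 1, "B": 2, "C": 3, "D": 4}


def is_listed_decomposition(original: str, part1: str, part2: str) -> bool:
    """True if original -> part1(original) + part2(original) is a listed scheme.

    Listed schemes: D -> C+A, B+B, D+QM; C -> B+A, C+QM; B -> A+A, B+QM;
    A -> A+QM. Independence of the two elements is NOT checked here.
    """
    if original == "QM":
        return False  # nothing to decompose
    return LEVELS[part1] + LEVELS[part2] == LEVELS[original]


print(is_listed_decomposition("D", "B", "B"))  # True: e.g. two independent B(D) channels
print(is_listed_decomposition("D", "B", "A"))  # False: B+A does not cover D
```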
The FSRs are then refined into Technical Safety Requirements (TSRs), which state what the system architecture and technical solution must do to meet the FSRs. Inside the ADCU, for instance, correct decisions require that the SoC send correct target information, that the MCU cross-check the SoC's targets, and that an internal monitor ensure any brake command sent out of the ADCU is complete and correct. The system architecture must be designed to satisfy these technical safety requirements. If the existing architecture does not meet functional safety requirements, it must be reworked, for example by adding checksums and monitoring modules and by using protected communication mechanisms. These measures turn an architecture that was not designed for functional safety into one that is.
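The "checksums plus protected communication" measure can be illustrated with a simplified end-to-end protection scheme in the spirit of AUTOSAR E2E: the sender adds a rolling alive counter and a CRC, and the receiver (here, the MCU checking messages from the SoC) rejects frames that are corrupted, repeated, or out of sequence. The frame layout, class and function names, and payload are hypothetical; only the CRC-8 parameters (SAE J1850) are a standard choice. This is not an implementation of any specific AUTOSAR E2E profile.

```python
def crc8_sae_j1850(data: bytes) -> int:
    """CRC-8 with SAE J1850 parameters: poly 0x1D, init 0xFF, MSB first, final XOR 0xFF."""
    crc = 0xFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x1D) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0xFF


def protect(payload: bytes, counter: int) -> bytes:
    """Sender side. Hypothetical frame layout: [counter][payload ...][crc]."""
    body = bytes([counter & 0x0F]) + payload
    return body + bytes([crc8_sae_j1850(body)])


class E2EReceiver:
    """Receiver side: reject frames whose CRC is wrong or whose counter did not advance by 1 (mod 16)."""

    def __init__(self) -> None:
        self.last_counter = None

    def check(self, frame: bytes) -> bool:
        body, crc = frame[:-1], frame[-1]
        if crc8_sae_j1850(body) != crc:
            return False  # corrupted in transit: discard without touching counter state
        counter = body[0] & 0x0F
        expected = None if self.last_counter is None else (self.last_counter + 1) % 16
        self.last_counter = counter  # resynchronise on every frame with a valid CRC
        # a repeated or skipped counter means a lost or duplicated frame
        return expected is None or counter == expected


rx = E2EReceiver()
brake_request = b"\x01\x2c"  # hypothetical payload
f0, f1 = protect(brake_request, 0), protect(brake_request, 1)
print(rx.check(f0))  # True
print(rx.check(f1))  # True
print(rx.check(f1))  # False: same counter again (repeated frame)
bad = bytearray(protect(brake_request, 2))
bad[1] ^= 0x01  # flip one payload bit
print(rx.check(bytes(bad)))  # False: CRC mismatch
```

The counter guards against lost or stuck messages (a safety concern); defending against a deliberate replay by an attacker is a cybersecurity task and needs cryptographic authentication on top.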
(3) The role of functional safety engineers
Functional safety engineers play a key role in the design and implementation process. Besides knowing ISO 26262, they must perform the HARA, derive FSRs and TSRs for the specific system, and take part in designing and refining the system architecture. When working with the teams responsible for individual modules, they must define each module's safety requirements and interface specifications so that the system as a whole is safe. They must also describe and document the functional safety requirements in detail so that they can be traced and verified during development, testing, and acceptance. For testing, functional safety engineers set the test principles and pass criteria that guide the test engineers, who then test the system thoroughly against those requirements, paying particular attention to how abnormal situations are handled and how the system behaves under different operating conditions. For example, if the requirement that the SoC output correct targets is verified by computing the result on two CPUs, the functional safety engineer must state in the test criteria how to decide whether the system still meets its safety requirements when the two results disagree.
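One way to make the dual-CPU pass criterion concrete is to specify two numbers: how large a difference still counts as agreement, and how many consecutive disagreements are tolerated before a fault is latched. The sketch below encodes such a criterion; the class name, the tolerance, the debounce count, and the sample distances are all hypothetical. In a real project they would be derived from sensor accuracy and the fault tolerant time interval.

```python
from dataclasses import dataclass


@dataclass
class DualChannelMonitor:
    """Pass/fail criterion for comparing one value computed on two CPUs.

    tolerance: largest |a - b| still counted as agreement (hypothetical units).
    max_mismatches: consecutive disagreements that latch a fault.
    """
    tolerance: float
    max_mismatches: int
    mismatch_count: int = 0
    fault: bool = False

    def update(self, a: float, b: float) -> bool:
        """Process one cycle; returns False once a fault has been latched."""
        if self.fault:
            return False  # latched until an explicit reset (e.g. next power cycle)
        if abs(a - b) <= self.tolerance:
            self.mismatch_count = 0  # agreement clears the debounce counter
        else:
            self.mismatch_count += 1
            if self.mismatch_count >= self.max_mismatches:
                self.fault = True  # e.g. request a degraded mode or driver takeover
        return not self.fault


# Hypothetical target distances (m) from the two CPUs over seven cycles
mon = DualChannelMonitor(tolerance=0.5, max_mismatches=3)
cycles = [(20.0, 20.1), (20.0, 21.0), (20.0, 21.2), (20.0, 20.2),
          (20.0, 25.0), (20.0, 25.0), (20.0, 25.0)]
print([mon.update(a, b) for a, b in cycles])
# [True, True, True, True, True, True, False]
```

Two isolated mismatches followed by agreement do not trip the monitor, while three in a row do; writing the criterion this way gives the test engineer an unambiguous expected result for each injected scenario.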
5、 Hardware-related functional safety
(1) Chip Design and Selection
On the hardware side, chip design and selection are crucial to meeting functional safety requirements. High-performance SoCs can generally reach only ASIL B, so achieving ASIL D at the system level usually requires a Safety MCU. Common solutions are an external MCU (such as NVIDIA Orin-X + Infineon TC397) or a built-in safety island (such as the TI TDA4 with its integrated Cortex-R5F cores). Safety MCU cores usually come in pairs and achieve high diagnostic coverage through instruction-level lockstep. In a lockstep pair, both cores execute the same instructions and a comparator checks their outputs; a mismatch indicates a fault. Because both cores do the same work, the usable computing power is that of a single core. This approach has been proven over many years in microcontrollers and lower-complexity microprocessors and is highly reliable. With its high software and hardware safety and good real-time performance, the Safety MCU typically runs vehicle-level data exchange, diagnostics, and control algorithms. A functional-safety island can also run AUTOSAR or another safety operating system and host safety-monitoring tasks offloaded from the multi-core main OS, while a security island handles tasks such as secure boot, symmetric and asymmetric encryption and decryption, signing and signature verification, and key storage and destruction. To keep the Safety MCU independent and safe, its internal buses, peripheral interfaces, and power supply are usually isolated from the main chip.
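Lockstep is implemented in hardware, with the comparison done on every clock cycle, but its logic can be shown with a toy software model: two "cores" compute the same result from the same input, a comparator checks the results bit for bit, and a single injected bit flip in one core is caught in the cycle where it occurs. The model also shows why a lockstep pair delivers the compute of only one core. All names and the sample computation are hypothetical, and this is a conceptual sketch, not a model of any particular MCU.

```python
from typing import Callable, List, Optional


def run_lockstep(step: Callable[[int], int], inputs: List[int],
                 fault_at: Optional[int] = None) -> Optional[int]:
    """Toy model of a dual-core lockstep pair.

    Both 'cores' run the same step on the same input and a comparator checks
    the two results bit for bit every cycle. fault_at injects a single bit
    flip into core B's result at that cycle. Returns the cycle in which the
    comparator detected a mismatch, or None if none occurred.
    """
    for cycle, x in enumerate(inputs):
        result_a = step(x)  # core A
        result_b = step(x)  # core B: identical work, so no extra usable compute
        if cycle == fault_at:
            result_b ^= 0x1  # simulated transient fault, e.g. a particle-induced bit flip
        if result_a != result_b:
            return cycle  # hardware would raise an error signal to the safety logic here
    return None


def control_law(x: int) -> int:
    """Hypothetical integer computation standing in for real control code."""
    return 3 * x + 1


print(run_lockstep(control_law, list(range(10))))              # None
print(run_lockstep(control_law, list(range(10)), fault_at=4))  # 4
```

Unlike a comparison between two different algorithms, lockstep cores run identical code, so any difference at all is a fault and no tolerance band is needed.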
(2) Limitations of chip safety certification
Chip-level functional safety certification has its limits, however. When Infineon's MCUs are certified to ASIL D, for example, the certification usually assumes a Safety Element out of Context (SEooC): the chip is treated as a closed environment and only its own functions, such as CPU computation, memory storage, and communication, are considered, not external applications such as ACC or the RTE. Under these assumptions the MCU addresses internal problems such as CPU computation failures, memory read/write errors, internal bus communication faults, ADC instability, and clock errors, but that does not mean it can guarantee that ACC or other functions work correctly. Correct CPU computation helps ACC execute correctly, but the relationship between the chip's internal safety design and vehicle-level functions is complex, so the safety of a vehicle function cannot simply be inferred from the chip's ASIL.
(3) Hardware Failure Mode Analysis
Common hardware fault categories include single-point faults, residual faults, and multiple-point faults. For any fault in a hardware element, the first question is whether it is safety-related. If it is, the next step is to determine its failure mode and whether a safety mechanism controls that failure mode. For example, if a state machine in the SoC fails, commands may not be issued, which could violate a safety goal. If a safety mechanism such as a monitoring module detects the fault in time and reacts, for example by triggering the CPU's safety mechanism to restart or reset, the impact on system safety is reduced. Even with safety mechanisms in place, some residual faults remain, so the residual failure rate must be calculated to check whether the system meets the target ASIL. Hardware failure mode analysis therefore has to weigh all of these factors to ensure that the system's safety meets the design requirements.
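The residual-failure-rate calculation can be sketched as follows. For each failure mode, the residual rate is the part not caught by its safety mechanism, λ × (1 − DC), and the single-point fault metric is SPFM = 1 − Σλ_residual / Σλ, compared against the ISO 26262-5 targets (≥ 90 % for ASIL B, ≥ 97 % for C, ≥ 99 % for D). This is a simplified sketch: it assumes every listed mode can violate the safety goal directly and ignores safe faults, the latent-fault metric, and PMHF. The failure modes, FIT rates, and coverages are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    fit: float  # failure rate in FIT (failures per 10**9 device-hours)
    dc: float   # diagnostic coverage of the safety mechanism for this mode, 0.0-1.0


def residual_fit(fm: FailureMode) -> float:
    """Failure rate that escapes the safety mechanism (dc = 0 means a pure single-point fault)."""
    return fm.fit * (1.0 - fm.dc)


def spfm(modes: List[FailureMode]) -> float:
    """Single-point fault metric, simplified: every listed mode is assumed able to
    violate the safety goal directly; safe faults and latent faults are ignored."""
    total = sum(m.fit for m in modes)
    return 1.0 - sum(residual_fit(m) for m in modes) / total


SPFM_TARGET = {"B": 0.90, "C": 0.97, "D": 0.99}  # ISO 26262-5 target values

# Hypothetical failure rates and coverages for an SoC-based controller
modes = [
    FailureMode("SoC state machine stuck", fit=10.0, dc=0.99),  # external watchdog/monitor
    FailureMode("RAM bit errors", fit=50.0, dc=0.99),           # ECC
    FailureMode("clock drift", fit=5.0, dc=0.90),               # clock monitor
]
value = spfm(modes)
print(f"SPFM = {value:.2%}")  # SPFM = 98.31%
print([asil for asil, t in SPFM_TARGET.items() if value >= t])  # ['B', 'C']
```

With these assumed numbers the clock monitor's 90 % coverage alone contributes almost half of the residual rate, which is why the design would fall short of ASIL D until that mechanism is improved.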
(4) Redundant design
Redundancy is an important way to improve safety in autonomous driving hardware architectures. In general, a functionally safe design starts from clear safety goals and meets the corresponding functional and technical safety requirements. A typical L3 hardware architecture, for example, includes a main controller, a backup controller, LiDAR, braking and steering systems, and multiple radars and cameras. To improve reliability, these components are often made redundant, for example with redundant communication links and dual steering systems. Whether redundancy is actually needed must be considered carefully, though. For simpler applications, such as a vehicle that runs only ACC, excessive redundancy wastes resources and contributes little to the safety goals. In L4 and higher systems, by contrast, the stricter safety requirements make dual or even multiple redundancy (dual chips, dual sensors, and so on) more common. This adds cost and complexity but effectively improves fault tolerance, allowing the system to keep operating when some components fail. In short, redundancy should be designed with overall performance, cost, and reliability in mind rather than pursued for its own sake.
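The benefit of duplication can be estimated with a short calculation. Assuming a constant failure rate λ, one channel survives time t with probability R = e^(−λt); a 1-out-of-2 pair (either channel suffices) fails only if both channels fail, so R_pair = 1 − (1 − R)². The function names and the 1000 FIT / 10,000 h figures are hypothetical.

```python
import math


def channel_reliability(fit: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate given in FIT."""
    lam = fit * 1e-9  # convert FIT to failures per hour
    return math.exp(-lam * hours)


def one_out_of_two(r: float) -> float:
    """Reliability of a 1oo2 pair (either channel suffices), assuming independent failures."""
    return 1.0 - (1.0 - r) ** 2


# Hypothetical: a 1000 FIT controller over 10,000 operating hours
r = channel_reliability(1000.0, 10_000.0)
print(f"single channel failure probability: {1 - r:.2e}")                # 9.95e-03
print(f"dual channel failure probability:   {1 - one_out_of_two(r):.2e}")  # 9.90e-05
```

The roughly hundredfold improvement holds only if the two channels fail independently. A shared power supply or the same software bug in both channels is a common-cause failure that defeats this arithmetic, which is why the isolation of buses, interfaces, and power described for the Safety MCU matters.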
6、 Different approaches among chip manufacturers
Chip manufacturers take different approaches. Infineon, Texas Instruments, and similar vendors perform detailed safety analysis and design during chip development and publish comprehensive safety manuals, such as Infineon's TC397 safety manual and TI's corresponding documents. These manuals contain extensive information on the chip's functional safety design and are an important reference for subsequent application development. Qualcomm and others adopt a safety-island approach in their SoCs, running a real-time safety operating system (a safety RTOS) on the island. Qualcomm has also tailored Linux to better meet the real-time and safety requirements of autonomous driving, achieving deterministic scheduling and improving safety and reliability. Qualcomm is comparatively weak in safety documentation, however, and its internal material is less complete. Horizon Robotics and other vendors build on proven practices from established manufacturers, such as lockstep, to reduce development cost and time, but they still need to adapt and optimise these techniques to the characteristics and needs of their own products.
7、 Challenges facing L3 autonomous driving
Achieving L3 autonomous driving is an important milestone in the development of the technology, but it faces many challenges. In 2017, the Audi A8 was launched as the world's first production car equipped with an L3 function, Traffic Jam Pilot (TJP), marking a major breakthrough for autonomous driving in series-production vehicles. In 2020, however, Audi announced that it would drop the L3 function from the next-generation A8, a decision that drew wide attention across the industry.
Meanwhile, Daimler has announced an L3 ALKS (Automated Lane Keeping System) function. Even so, current L3 functions still have limitations. The L3 lane-keeping function, for example, lets the driver take their hands off the wheel and eyes off the road under certain conditions, yet the driver must still stay alert to the road ahead and be ready to respond to emergencies. Even with ACC, the need for driver attention cannot be eliminated entirely, which shows that L3 technology still needs further refinement in practice to improve its safety and reliability.