Reliability analysis on the train control system in the CTCS-3 operating mode

Lijuan Shi (Tongji University, Shanghai, China)

Jian Wang (Tongji University, Shanghai, China)

Smart and Resilient Transportation

ISSN: 2632-0487

Article publication date: 4 March 2021

Issue publication date: 4 May 2021

Abstract

Purpose

This paper aims to study the reliability of the high-speed train operation control system in the Chinese Train Control System Level 3 (CTCS-3) operating mode.

Design/methodology/approach

The dynamic fault tree and Bayesian network methods are adopted to analyze the reliability and weak points of the CTCS-3 system.

Findings

First, a physical architecture and a data flow diagram of the CTCS-3 system are established according to its typical structure and functions. Second, the dynamic fault tree of the CTCS-3 system is constructed. Considering the prior probability of the bottom events and the existence of dynamic redundancy, the dynamic fault tree is transformed into a Bayesian network. The reliability of the CTCS-3 system is evaluated from the prior probabilities, and the weak points that affect system reliability are identified from the posterior probabilities computed by the Bayesian network. Finally, it is shown that the impact of the onboard subsystem on the reliability of the CTCS-3 system is generally greater than that of the ground subsystem. The two weakest modules in the onboard subsystem are the driver-machine interface (DMI) and the balise transmission module (BTM), and the weakest one in the ground subsystem is the Balise. The analysis results are generally consistent with the malfunctions observed in the field operation of China’s high-speed railway.

Originality/value

(1) By forward reasoning, the reliability of the train operation control system in the CTCS-3 operating mode is shown to meet the standard requirements.

(2) Through backward reasoning, it is found that the failure of the onboard subsystem leads to a greater probability of failure of the train control system.

(3) The DMI, BTM and automatic train protection computer unit modules are weak components in the onboard subsystem. The vital digit input & output, train interface unit and train security gateway are rarely covered in previous research, but the results in this paper show that these three modules are also weak components in the subsystem and require attention.

Citation

Shi, L. and Wang, J. (2021), "Reliability analysis on the train control system in the CTCS-3 operating mode", Smart and Resilient Transportation, Vol. 3 No. 1, pp. 25-36. https://doi.org/10.1108/SRT-10-2020-0019

Publisher

Emerald Publishing Limited

License

Published in Smart and Resilient Transportation. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

CTCS-3 is the Chinese Train Control System Level 3, used at operating speeds over 300 km/h. It comprises an onboard subsystem and a ground subsystem. The onboard subsystem includes a CTCS-3 level control unit and a CTCS-2 level control unit. When the train is operating at CTCS-3 level, the vehicle equipment mainly uses the Global System for Mobile Communications-Railway (GSM-R) network to receive monitoring information from the radio block center (RBC). Only when the RBC or the GSM-R network fails is the onboard equipment downgraded to use the information provided by the CTCS-2 level ground equipment to monitor train operation. This paper assumes that high-speed trains are all operated in CTCS-3 mode. Therefore, it is necessary to analyze the reliability of the train control system of high-speed trains in this mode.

To analyze the reliability of the train control system, we first need to understand its working principle. Therefore, we used the fault tree, which is a good way to express fault modes clearly. Fault tree analysis (FTA) is a logical and graphical method for assessing the likelihood that a combination of fault events causes an accident, and it has been widely used to analyze the reliability of complex systems. However, typical FTA usually assumes that faults are independent of each other and does not consider the dynamic logic of the system. Therefore, for a high-speed train control system with redundant components and a complicated structure, traditional FTA is unable to meet the requirements of reliability analysis. In recent years, the dynamic fault tree (DFT) has been widely used in industry. It adds new dynamic logic gates such as the priority-AND gate, the sequence enforcing (SEQ) gate and the SPARE gate. DFT overcomes disadvantages of the fault tree (FT), such as the assumption that events must be independent. There are many studies on DFT analysis, and the technologies developed can be summarized as three types. The first is the Markov chain-based method, which has proved to be a valid tool for analyzing non-repairable systems with exponential time-to-failure. However, the number of states in the Markov chain increases dramatically with the number of system components, leading to state space explosion (Portinale and Bobbio, 2013). Second, with the development of computer technology, Monte Carlo simulation has been widely used to analyze DFT, and it can accommodate any time-to-failure distribution. However, the evaluation accuracy is determined by the number of simulations. When the system and its fault tree are complex, the simulation consumes a lot of time and computing resources (Yevkin, 2016).

The third method converts the DFT into an equivalent Bayesian network (BN), which expresses the dependency between nodes and supports both forward and backward reasoning. Qin et al. (2016) proposed several methods to analyze the reliability of the train system, and the reliability network model is one of them. Based on the functional relationship network model of the high-speed train system, the importance of the components in the reliability network and the connectivity of the components in the network are analyzed. BN not only avoids state space explosion, but also yields the importance of the top node and intermediate nodes through its bidirectional reasoning. Przytula and Thompson (2000) comprehensively introduced the process of Bayesian network construction, and successfully applied the model to the system diagnosis of diesel locomotives, satellite communication systems and satellite testing equipment, using the bidirectional reasoning function of BN. Khakzad et al. (2011) compared the similarities and differences between fault trees and BN. Because BN supports both forward and backward reasoning, it has a wider range of application. Especially when considering multi-modal faults and common cause failures, BN is more flexible (Khakzad et al., 2011). Su and Che (2013) used FT and BN to analyze the reliability of the CTCS-3 train control system, but ignored several important components and paid no attention to the redundant structure of the components. Flammini et al. (2006) first analyzed the reliability of the lineside, onboard and trackside subsystems using FT and then used BN to analyze the reliability of the entire European train control system. The conclusion was that although each part met the reliability requirements, some components might not need such high reliability, which could save expenses.

Pai and Joanne (2001) used BN to analyze the reliability of the network, and through backward reasoning, the impact of the software’s framework and reliability on the network could be obtained. Kabir et al. (2014) converted the fault tree of a ship’s fuel distribution system into a BN and analyzed the reliability of the system under the assumption that the components can exist in three states.

In this paper, BN is combined with DFT. The logical relationships between the system and its components are captured in the DFT, and BN is used to assess the reliability and identify the combinations of component failures that lead to train control system failure, which can help to improve the reliability of the train control system under certain circumstances.

The remainder of the paper is organized as follows: after a brief overview of the CTCS-3 train control system, two models are developed in Section 2. Section 3 describes how the reliability assessment is developed, including building DFT and BN. Section 4 presents the conclusion and hints for future work.

2. Models of Chinese train control system level-3

2.1 Physical architecture of Chinese train control system level 3

When the train operates in CTCS-3 mode, GSM-R carries the information exchanged between the onboard and ground systems. The train control system monitors train speed and running interval in real time, provides overspeed protection and supervises the safe operation of the train using the target-distance continuous speed control mode and the equipment's brake override system. As shown in Figure 1, the content in the red box is the scope of this paper. The modules of the vehicle system include the vital computer (VC), train interface unit (TIU), driver-machine interface (DMI), juridical recorder unit, speed and distance unit (SDU), safe transmission unit (STU-V), balise transmission module (BTM), compact antenna unit (CAU), GSM-R, radar and speed sensor (Ss). In addition, according to fault data statistics, three nodes are particularly important: vital digit input & output (VDX), train security gateway (TSG) and TIU. The ground system includes the Balise, lineside electronic unit (LEU), train control center (TCC), RBC and temporary speed restriction system (TSRS). The hardware of CTCS-3 300H train control equipment adopts a distributed structure design, and the function of each module is relatively independent. To improve the reliability and security of the train control system, the system adopts a redundant configuration. In the vehicle subsystem, the speed distance process (SDP), SDU, VDX, GSM-R, STU-V, radar and Ss are in hot standby redundancy, so the system can still operate normally if any one of these components fails. VC, BTM, CAU, DMI and TIU are in cold standby redundancy (Di et al., 2010), so if any one of them fails, it takes some time to start the standby module. TSG is a single system. In the ground subsystem, TCC is in cold standby redundancy, TSRS and LEU are in hot standby redundancy, and Balise and RBC are treated as single systems when building the model.

2.1.1 Vehicle equipment

VC: The kernel of the CTCS-3 onboard system. When the train operates at the CTCS-3 level, it accepts the route description and movement authority (MA) transmitted by the RBC, and calculates the mode control curve in combination with the train position determined by the ground Balise. The actual speed and position of the train are monitored against the mode curve, and the VC intervenes when the train is overspeeding.
SDP: Processes speed and distance data.
SDU: Includes SDU1 and SDU2, each of which is connected to an axle Ss and a Doppler radar as power. When the train is running, the SDU receives the pulse signals collected by the Ss and the radar, converts them into digital data and sends the data to the SDP for processing through the multifunction vehicle bus.
VDX: A fail-safe unit, including VDX1 and VDX2, used for outputting emergency braking and collecting brake feedback. VDX1 and VDX2 work in the form of guard collection. Only when the output and recovery are correct can the VDX work normally. Otherwise, the system outputs the emergency brake unconditionally.
GSM-R: Provides the transmission channel between the RBC and the onboard system.
CAU: The antenna of the BTM, which receives telegrams from the Balise.
STU-V: A secure wireless transmission unit that is responsible for encrypting and securely transmitting wireless data between onboard and ground equipment.
BTM: Receives the Balise information through the CAU, then verifies and decodes the received message and sends it to the VC.
TSG: Mainly handles data transmission for the core modules of the onboard equipment and realizes data exchange between modules.
R&Ss: The radar and speed sensor are connected to the SDU and transmit the collected speed pulse signals to it.
DMI: Displays information for the driver, allows the driver to input relevant data and raises alarms in specific situations.
TIU: Connects the VC and the train.

2.1.2 Ground equipment

TCC: Realizes the track circuit coding function and transmits the train occupancy information to the RBC.
RBC: Generates information such as the MA and line descriptions based on the information provided by other ground equipment and on its interaction with onboard equipment, and transmits it via GSM-R to the onboard equipment of the trains within its control range.
Balise: Transfers information such as positioning, level conversion and over-phase area to the onboard equipment. The Balise transmits the same information as that sent over GSM-R.
TSRS: Manages temporary speed restrictions (TSR) and delivers temporary speed limit information to the RBC and TCC, respectively.
LEU: A data acquisition and processing unit that forms a message from the changed data whenever the data change and sends it to the Balise for transmission.

2.2 Data flow diagram of Chinese train control system level 3

The real-time data flow between the modules in CTCS-3 mode, including the specific information transmitted by each module, is shown in Figure 2. The onboard subsystem receives track occupancy, train positioning, line information, speed limit information, etc. from the ground subsystem in real time. After VC processing, the train speed monitoring and operation mode are generated to realize the safety protection of the train. Meanwhile, the onboard subsystem sends data such as location and train operation status to the RBC through GSM-R, and the RBC then forwards the data to the TCC and CTC.

3. Methodologies

3.1 Dynamic fault tree

DFT adds the priority-AND gate, the SEQ gate, the standby or spare (SPARE) gate and the functional dependency gate to the traditional FT. Based on the actual situation of the system, this paper mainly uses two kinds of dynamic gates, namely, HotSpare and ColdSpare. SPARE gates model one or more principal components that can be replaced by one or more spares with the same functionality. For HotSpare, multiple identical components run at the same time. If one component fails, the system still runs normally. The system fails only when all components fail. For ColdSpare, only the master component is running, and the standby component is initially not running. When the master component fails, the standby component needs to be started to recover the system. According to the physical architecture and data flow diagram in Section 2, the DFT of Figure 3 is constructed, with A for the intermediate node, B for the bottom event and C for the fault phenomenon. Table 1 explains the meaning of each node.
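The difference between the two spare gates can be illustrated with a minimal sketch. It is not taken from the paper: it assumes two identical, non-repairable units with exponential lifetimes, perfect and instantaneous switching, and a cold spare that cannot fail while dormant. The function names and the numeric values are hypothetical.

```python
import math


def hot_spare_reliability(lam: float, t: float) -> float:
    """Two identical units run in parallel; the gate fails only when both have failed."""
    unit_unreliability = 1.0 - math.exp(-lam * t)
    return 1.0 - unit_unreliability ** 2


def cold_spare_reliability(lam: float, t: float) -> float:
    """Master runs first; an unpowered spare takes over when the master fails.

    Assumes perfect switching and a spare that cannot fail while dormant, so the
    lifetime of the pair is the sum of two exponential lifetimes (Erlang-2).
    """
    return math.exp(-lam * t) * (1.0 + lam * t)


if __name__ == "__main__":
    lam = 1.0e-5  # hypothetical failure rate per hour
    t = 1.0e5     # hypothetical mission time in hours
    print(f"hot spare : {hot_spare_reliability(lam, t):.4f}")
    print(f"cold spare: {cold_spare_reliability(lam, t):.4f}")
```

Under these idealized assumptions the cold spare pair is at least as reliable as the hot spare pair for the same unit failure rate, because the dormant unit accumulates no operating time before it is needed.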

3.2 Bayesian network

BN is a directed acyclic graph composed of nodes and arcs. Nodes represent variables, arcs represent causal relationships between nodes, and a conditional probability table represents the quantitative relationships between nodes. In this paper, each node is the failure of a module, and the conditional probabilities express the conditions under which the system fails. The input value of the BN is the failure rate λ of each component.

BN is a mathematical model based on probabilistic reasoning with a robust foundation in probability theory. The joint probability distribution describes the probability of all possible combinations of states for multiple random variables $X_1, \ldots, X_n$. The formula is $P(x_1, x_2, \ldots, x_n) = \prod_{k=1}^{n} P(x_k \mid x_1, \ldots, x_{k-1}) = P(x_n \mid x_1, \ldots, x_{n-1}) \cdots P(x_2 \mid x_1) P(x_1)$. Conditional probability indicates the probability of occurrence of B given that A has occurred. The formula is $P(B \mid A) = \frac{P(AB)}{P(A)} = \frac{P(B) P(A \mid B)}{P(A)}$ with $P(A) > 0$, where $P(A)$ and $P(B)$ are prior probabilities and $P(B \mid A)$ is the posterior probability. Therefore, BN supports both forward reasoning and backward reasoning. Forward reasoning calculates the probability of the result given the prior probabilities of the causes. Backward reasoning calculates the probability of each cause given that the top event has happened. In the reliability evaluation of the CTCS-3 train control system, forward reasoning is used to calculate the reliability of the train control system in a specific scenario, and backward reasoning is used to calculate the probability of the causes leading to system failure.
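Forward and backward reasoning can be made concrete on a toy network. The sketch below is only an illustration under stated assumptions: two independent root causes B1 and B2 feed a child C through an OR relationship, the prior values are hypothetical, and inference is done by brute-force enumeration of the joint distribution rather than with the HUGIN software used in the paper.

```python
from itertools import product

# Hypothetical prior failure probabilities of two independent root causes.
PRIOR = {"B1": 0.02, "B2": 0.05}


def p_c_given(b1: int, b2: int) -> float:
    """CPT of the child C for an OR relationship: C fails if either parent fails."""
    return 1.0 if (b1 or b2) else 0.0


def joint(b1: int, b2: int, c: int) -> float:
    """Chain rule: P(b1, b2, c) = P(c | b1, b2) * P(b2) * P(b1)."""
    p_b1 = PRIOR["B1"] if b1 else 1.0 - PRIOR["B1"]
    p_b2 = PRIOR["B2"] if b2 else 1.0 - PRIOR["B2"]
    p_c = p_c_given(b1, b2) if c else 1.0 - p_c_given(b1, b2)
    return p_c * p_b2 * p_b1


def forward() -> float:
    """Forward reasoning: marginal probability that C fails."""
    return sum(joint(b1, b2, 1) for b1, b2 in product((0, 1), repeat=2))


def backward(cause: str) -> float:
    """Backward reasoning: P(cause failed | C failed), by Bayes' theorem."""
    idx = {"B1": 0, "B2": 1}[cause]
    numerator = sum(
        joint(b1, b2, 1)
        for b1, b2 in product((0, 1), repeat=2)
        if (b1, b2)[idx] == 1
    )
    return numerator / forward()


if __name__ == "__main__":
    print(f"P(C fails)            = {forward():.4f}")
    print(f"P(B1 failed | C fails) = {backward('B1'):.4f}")
    print(f"P(B2 failed | C fails) = {backward('B2'):.4f}")
```

The posterior values need not sum to one, because both causes can be failed at the same time. The cause with the larger posterior is the more likely culprit once the failure of C has been observed.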

3.3 Mapping DFT to Bayesian net

Mapping DFT to BN includes graphical mapping and numerical mapping. In the graphical mapping, the bottom event, the intermediate event and the top event correspond to the root node, the intermediate node and the leaf node of the BN, respectively. In the numerical mapping, the conditional probability table is used to represent the logical relationship between the child node and the parent node. In the reliability analysis of this paper, after the corresponding BN is constructed, the input of the BN is the failure rate of each component. Each DFT gate maps to a different conditional probability table, and the mapping rules are shown in Figure 4. In the conditional probability table of subgraph (a), 1 indicates a fault and 0 indicates normal operation. C is a hot standby structure, in which B1 is the main component and B2 is the standby component. The state of C becomes 1 only when both B1 and B2 fail. For a cold standby structure, as in subgraph (b), B1 is the main component, so B2 is initially not started. When B1 fails, the time needed to start B2 is very short, so the switching time is ignored, the cold standby structure is regarded as a single system and its failure rate is taken as half of that of a single component. For an AND gate, as in subgraph (c), C fails only when both B1 and B2 have failed.
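The numerical mapping can be sketched in a few lines. This is an illustrative reading of the rules described above, not the authors' implementation: the function names are hypothetical, parents are assumed independent, and the cold standby rule simply encodes the stated simplification of halving the unit failure rate.

```python
from itertools import product


def and_gate_cpt() -> dict:
    """CPT for an AND gate or hot standby pair: C = 1 only if B1 = 1 and B2 = 1."""
    return {(b1, b2): float(b1 and b2) for b1, b2 in product((0, 1), repeat=2)}


def cold_spare_equivalent_rate(lam: float) -> float:
    """Cold standby pair collapsed to one node with half the unit failure rate."""
    return lam / 2.0


def child_failure_probability(cpt: dict, p_b1: float, p_b2: float) -> float:
    """Marginalise the CPT over two independent parents to get P(C = 1)."""
    total = 0.0
    for (b1, b2), p_c in cpt.items():
        w1 = p_b1 if b1 else 1.0 - p_b1
        w2 = p_b2 if b2 else 1.0 - p_b2
        total += p_c * w1 * w2
    return total


if __name__ == "__main__":
    cpt = and_gate_cpt()
    for parents, p_c in sorted(cpt.items()):
        print(parents, p_c)
    # Hypothetical parent failure probabilities.
    print(child_failure_probability(cpt, 0.1, 0.2))
    # Hypothetical unit failure rate per hour.
    print(cold_spare_equivalent_rate(2.0e-6))
```

One way to read the halving rule is that an ideal cold standby pair of units with rate λ has a mean life of 2/λ, which equals the mean life of a single exponential unit with rate λ/2. This is an interpretation of the simplification, not a derivation given in the text.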

3.4 Quantitative analysis

The DFT of CTCS-3 is converted to a BN following the steps in Section 3.3, as shown in Figure 5.

The train control system is a repairable system, and the failure characteristics of each component follow the exponential distribution. The failure rates of the bottom events and intermediate events are taken as the input of the BN, as shown in Table 2. The data are from Su and Che (2013) and Di et al. (2010). Among them, C1, C2, C4 and C7 are the parent nodes of cold standby structures, and their input is half of the failure rate of the child nodes (Wang and Ding, 2017).

3.4.1 Forward reasoning.

According to the principle of forward reasoning of BN, the failure rate of CTCS-3 is calculated by using BN software HUGIN.

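When the train is running in CTCS-3 mode, the resulting failure rate is $\lambda = 0.987 \times 10^{-6}$/h. With the exponential failure density $f(t) = \lambda e^{-\lambda t}$, the mean time between failures is $\mathrm{MTBF} = \int_0^{\infty} t f(t)\,dt = \frac{1}{\lambda}\Gamma(2) = \frac{1}{\lambda} = 1.013 \times 10^{6}$ h. According to the standard, the mean time between failures for high-speed trains must satisfy $\mathrm{MTBF} \ge 10^{5}$ h, so the requirement is met. When the high-speed train is operated for $t = 10^{5}$ h in the CTCS-3 mode, the reliability is $R(t) = \int_{t}^{\infty} f(\tau)\,d\tau = e^{-\lambda t} = 0.906$.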
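The two quantities follow directly from the exponential model and can be checked numerically. The sketch below uses the failure rate and the operating time quoted in the text. The function names are hypothetical and the 10⁵ h threshold is the requirement stated above.

```python
import math


def mtbf(lam: float) -> float:
    """Mean time between failures of an exponential model: integral of t*f(t) = 1/lam."""
    return 1.0 / lam


def reliability(lam: float, t: float) -> float:
    """Probability of surviving to time t under an exponential model: exp(-lam * t)."""
    return math.exp(-lam * t)


if __name__ == "__main__":
    lam = 0.987e-6       # system failure rate per hour, as quoted in the text
    t = 1.0e5            # operating time in hours, as quoted in the text
    required_mtbf = 1e5  # requirement in hours, as quoted in the text

    print(f"MTBF = {mtbf(lam):.4e} h")
    print(f"meets requirement: {mtbf(lam) >= required_mtbf}")
    print(f"R(t) = {reliability(lam, t):.3f}")
```

Evaluating 1/λ for the quoted λ gives about 1.013 × 10⁶ h, and the reliability at 10⁵ h evaluates to about 0.906.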

3.4.2 Backward reasoning.

According to the principle of backward reasoning of BN, given that the CTCS-3 system has failed, the probabilities that the fault was caused by the ground subsystem and by the vehicle subsystem are 0.4012 and 0.5988, respectively. That is to say, the onboard subsystem is more likely to cause the train control system to fail.

For the ground subsystem, assuming that it has failed, backward reasoning is also used to calculate the failure probability of each of its components, as shown in Table 3. The Balise provides a large amount of fixed and variable information to the vehicle equipment, and its failure is the most probable cause of a ground subsystem failure. It is therefore the weakest component of the ground subsystem.

Similarly, assuming that the onboard subsystem has failed, backward reasoning is used to calculate the failure probability of each component of the onboard system, as shown in Table 4. As can be seen from the table, the DMI is the onboard component most likely to be faulty, followed by the BTM and the automatic train protection computer unit (ATPCU). Combined with Table 3, it is found that the Balise, LEU and BTM have high probabilities of failure, so the Balise information transmission channel deserves particular attention. In addition, VDX, TIU and TSG are rarely mentioned in the current literature, but their failure probabilities in this paper are relatively large, so they also deserve attention.

4. Conclusion

This paper studies the reliability of the train operation control system in the CTCS-3 operating mode. By constructing physical architecture and data flow diagram, the information transmission of the train in CTCS-3 operating mode and functions of the various components involved are described. Based on the above two models, the DFT is established and converted into BN according to the corresponding principle. Using the forward reasoning and backward reasoning functions of BN, the reliability and weak modules of the train control system are analyzed. The research conclusions are as follows:

(1) By forward reasoning, the reliability of the train operation control system in the CTCS-3 operating mode is shown to meet the standard requirements.
(2) Through backward reasoning, it is found that the failure of the onboard subsystem leads to a greater probability of failure of the train control system.
(3) The DMI, BTM and ATPCU modules are weak components in the onboard subsystem. VDX, TIU and TSG are rarely covered in previous research, but the results in this paper show that these three modules are also weak components in the subsystem and require attention.

Figures

Figure 1.

Physical architecture of CTCS-3

Figure 2.

Data flow diagram of CTCS-3 train control system

Figure 3.

Dynamic fault tree

Figure 4.

The rule of mapping DFT to BN

Figure 5.

Bayesian net

Table 1.

Meaning of each node

Node	Description	Node	Description	Node	Description
C1	Interruption with Balise	B1	Master BTM failure	B20	Standby Ss failure
C2	VC failure	B2	Master CAU failure	B21	Standby Radar failure
C3	Interruption with RBC	B3	Standby BTM failure	B22	Standby SDP failure
C4	DMI failure	B4	Standby CAU failure	B23	Master VDX failure
C5	Speed and distance unit failure	B5	Master ATPCU failure	B24	Standby VDX failure
C6	VDX failure	B6	Standby ATPCU failure	B25	TSG failure
C7	TIU failure	B7	Master STU-V failure	B26	Standby TIU failure
C8	TSRS failure	B8	Standby STU-V failure	B27	Master TIU failure
C9	LEU failure	B9	Master GSM-R radio failure	B28	TSRS1 failure
C10	TCC failure			B29	TSRS2 failure
A1	Master BTM unit failure	B10	Master GSM-R antenna failure	B30	TSRS3 failure
A2	Standby BTM unit failure	B11	Standby GSM-R radio failure	B31	TSRS4 failure
A3	STU-V failure	B12	Standby GSM-R antenna failure	B32	Master LEU failure
A4	GSM-R failure	B13	Master DMI failure	B33	Standby LEU failure
A5	Master GSM-R failure	B14	Standby DMI failure	B34	RBC failure
A6	Standby GSM-R failure	B15	Master SDU failure	B35	Balise failure
A7	Master Speed and distance unit failure	B16	Master Ss failure	B36	Master TCC failure
A8	Standby Speed and distance unit failure	B17	Master Radar failure	B37	Standby TCC failure
A9	Master TSRS unit failure	B18	Master SDP failure
A10	Master TSRS unit failure	B19	Standby SDU failure

Table 2.

Failure rate of each node

Node	Description	Failure rate(/h)	Node	Description	Failure rate(/h)
B7	Master STU-V failure	1.80 × 10^–5	B22	Standby SDP failure	3.19 × 10^–5
B8	Standby STU-V failure	1.80 × 10^–5	B23	Master VDX failure	1.02 × 10^–7
B9	Master GSM-R radio failure	1.20 × 10^–5	B24	Standby VDX failure	1.02 × 10^–7
B10	Master GSM-R antenna failure	1.45 × 10^–8	B25	TSG failure	1.03 × 10^–7
B11	Standby GSM-R radio failure	1.20 × 10^–5	B28-31	TSRS failure	3.20 × 10^–6
B12	Standby GSM-R antenna failure	1.45 × 10^–8	B34	RBC failure	5.00 × 10^–8
B15	Master SDU failure	2.50 × 10^–9	B35	Balise failure	2.90 × 10^–6
B16	Master Ss failure	5.50 × 10^–8	B36-37	TCC failure	2.50 × 10^–8
B17	Master Radar failure	1.80 × 10^–8	C1	Interruption with Balise	1.04 × 10^–6
B18	Master SDP failure	3.19 × 10^–5	C2	VC failure	7.45 × 10^–7
B19	Standby SDU failure	2.50 × 10^–9	C4	DMI failure	2.50 × 10^–6
B20	Standby Ss failure	5.50 × 10^–8	C7	TIU failure	1.05 × 10^–7
B21	Standby Radar failure	1.80 × 10^–8	C8	LEU failure	2.02 × 10^–6

Table 3.

Probability of each component failure when ground system fails

Ground system
Balise	LEU	RBC	TSRS	TCC
0.6614	0.3225	0.1608	1.31 × 10^–5	1.99 × 10^–10

Table 4.

Probability of each component failure when onboard system fails

Component	Probability	Component	Probability
DMI	0.5327	SDP	1.996 × 10^–4
BTM	0.2205	STU-V	6.90 × 10^–5
ATPCU	0.1587	GSM-R antenna	3.99 × 10^–5
VDX	0.0435	GSM-R radio	3.07 × 10^–5
TIU	0.0224	Ss	3.40 × 10^–7
TSG	0.0219	Radar	1.20 × 10^–7
CAU	2.99 × 10^–4	SDU	1.56 × 10^–8

References

Di, L.Q., Yuan, X.E. and Wang, Y.N. (2010), “Research on the evaluation method for the RAM goals of CTCS-3”, China Railway Science, Vol. 31 No. 6, pp. 92-97.

Flammini, F., Marrone, S., Mazzocca, N. and Vittorini, V. (2006), “Modeling system reliability aspects of ERTMS/ETCS by fault trees and Bayesian networks”, European Safety and Reliability Conference, pp. 18-22.

Kabir, S., Walker, M. and Papadopoulos, Y. (2014), Reliability Analysis of Dynamic Systems by Translating Temporal Fault Trees into Bayesian Networks. Model-Based Safety and Assessment, Springer International Publishing.

Khakzad, N., Khan, F. and Amyotte, P. (2011), “Safety analysis in process facilities: comparison of fault tree and Bayesian network approaches”, Reliability Engineering and System Safety, Vol. 96 No. 8, pp. 925-932.

Pai, G. and Joanne, B.D. (2001), “Enhancing software reliability estimation using bayesian networks and fault trees”, Conference: International Symposium on Software Reliability Engineering (ISSRE).

Portinale, L. and Bobbio, A. (2013), “Bayesian networks for dependability analysis: an application to digital control reliability”, Computer Science, pp. 551-558.

Przytula, K.W. and Thompson, D. (2000), “Construction of Bayesian networks for diagnostics”, Proceedings of IEEE aerospace conference, Vol. 5, pp. 193-200.

Qin, Y., Lin, S., Wantong, L.I., Yong, F.U. and Jia, L. (2016), “Research on safety reliability analysis and evaluation method of high-speed train system”, Electric Drive for Locomotives.

Su, H. and Che, Y. (2013), “Reliability assessment on CTCS-3 train control system using faults trees and Bayesian networks”, International Journal of Control and Automation, Vol. 6 No. 4, pp. 271-292.

Wang, J. and Ding, B-Z. (2017), “The reliability prediction of the standby system composed of two component”, Journal of CAEIT, Vol. 12 No. 4, pp. 428-431.

Yevkin, O. (2016), “An efficient approximate Markov chain method in dynamic fault tree analysis”, Quality and Reliability Engineering International, Vol. 32 No. 4, pp. 1509-1520.

Acknowledgements

The authors would like to acknowledge the support of the research program of Comprehensive Support Technology for Railway Network Operation (2018YFB1201403), which is a subproject of the Advanced Railway Transportation Special Project belonging to the 13th Five-Year National Key Research and Development Plan funded by the Ministry of Science and Technology of China.

Corresponding author

Lijuan Shi can be contacted at: shilijuan150@tongji.edu.cn