Improvement of C5.0 algorithm using internet of things with Bayesian principles for food traceability systems

Balamurugan Souprayen (Annamalai University, Chidambaram, India)
Ayyasamy Ayyanar (Annamalai University, Chidambaram, India)
Suresh Joseph K (Pondicherry University, Puducherry, India)

Modern Supply Chain Research and Applications

ISSN: 2631-3871

Article publication date: 11 December 2020

Issue publication date: 14 May 2021

Abstract

Purpose

Food traceability aims to preserve the quality of the raw material supply, reduce losses and lower system complexity.

Design/methodology/approach

The proposed hybrid algorithm supports food traceability by making accurate predictions and improving the handling of time-related data. The internet of things is used to track and trace food quality and to check the data acquired from manufacturers and consumers.

Findings

To cope with current economic conditions and the growth of the global food supply chain, the authors propose efficient food traceability techniques based on the internet of things and obtain a solution for data prediction.

Originality/value

The internet of things is used to track and trace food quality and to check the data acquired from manufacturers and consumers. The experimental analysis shows that the proposed algorithm achieves a higher accuracy rate, a shorter execution time and a lower error rate.

Citation

Souprayen, B., Ayyanar, A. and K, S.J. (2021), "Improvement of C5.0 algorithm using internet of things with Bayesian principles for food traceability systems", Modern Supply Chain Research and Applications, Vol. 3 No. 1, pp. 2-23. https://doi.org/10.1108/MSCRA-07-2020-0019

Publisher: Emerald Publishing Limited

Copyright © 2020, Balamurugan Souprayen, Ayyasamy Ayyanar and Suresh Joseph K

License

Published in Modern Supply Chain Research and Applications. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at http://creativecommons.org/licences/by/4.0/legalcode


1. Introduction

Food is the primary energy source of human civilization, and over the past decade its quality and safety have been a major concern throughout the world, especially in China, for several reasons (Liu et al., 2016). For example, the 2008 Sanlu melamine milk powder scandal shocked the public because it affected thousands of infants and caused a number of deaths (Wen et al., 2018). Another event that shocked Chinese society occurred in 2011, when meat products from Shuanghui, China's largest meat supplier, were found to contain clenbuterol hydrochloride, a drug that is forbidden as a food additive in China (Lin et al., 2019a; Abad et al., 2009). It is therefore very important to develop technologies that ensure food safety across the entire food supply chain (FSC), which includes manufacturing, processing, warehousing, shipping, storage and distribution.

To address these issues from a technical perspective, a food traceability system is needed that can monitor the complete life cycle of food, including the production, processing, transport, storage and sale of foodstuffs (Lin et al., 2019b), all of which involve numerous trust issues. Many studies around the globe have introduced technologies such as the internet of things (IoT) to help consumers recognise food quality and safety concerns (Li et al., 2019). IoT is the idea of connecting everything at all times, and it is expected to change human life dramatically in the future (Liang et al., 2019). IoT technologies should be capable of providing solutions to the traceability, tracking and manageability concerns of the FSC. IoT can help resolve food quality and safety problems by monitoring the nutrient value of each product throughout its lifetime and by providing useful information that makes the chain easier to manage and more secure (Tolba and Altameem, 1331).

Sensors can improve an IoT system's perception of its surroundings and of other parameters (Tsang et al., 2019). The environmental conditions of the food traceability system are measured using sensors and low-cost techniques based on an inexpensive computer board with fast communication to the system. Connectivity has been established across transport systems, agriculture, energy use, security and privacy, building management, embedded systems, industrial systems (Etim and Lota, 2016), pervasive computing, smart homes (Feng et al., 2017) and health care applications (Riazul Islam et al., 2015). As the volume of data obtained from IoT devices increases, big data processing and monitoring remain a significant problem for IoT applications. Big data handling is usually assisted by data compression methods, which are likely to reduce the volume of unnecessary data (Xiao et al., 2018).

Big data is data collected from mobile Internet communication devices, social networking, video sharing, IoT sensors, smart devices and so on. It consists of a wide-ranging collection of datasets whose analysis, manipulation and efficient storage demand a scalable architecture (Chen et al., 2015). Sensors and precision instruments are scattered across the globe. Once these devices are deployed, a large volume of data is transferred to a centralized storage location for processing. Interpreting such data correctly under real-time requirements calls for decisions based on objective analysis (Wang et al., 2020). Data mining and machine learning techniques make it possible to reach the best decisions about individuals and issues. The large quantities of data that IoT devices generate must be explored, processed and disseminated according to application parameters in order to provide reliable, usable and error-free information for data analysis, sound decisions and problem avoidance (Tran et al., 2012).

Traceability is part of public security and sustainable development (Chen et al., 2019). The main measure in food-related management is control over the operation of the entire supply chain. If a food safety problem arises during production, effective management allows it to be identified easily (Liu et al., 2019).

The main contributions of the paper are:

  1. Construction of cost-effective methods for managing the FSC from manufacturer to customer, which flag and report any abnormal food condition.

  2. Support for the manufacturing cycle by supplying sufficient data and confirmation to all customers.

  3. Reliable retrieval of information from the data within communications and applications, which filters out unnecessary details that would affect consumer comfort and even economic development.

The rest of the paper is organized as follows. Section 2 reviews the related work, Section 3 presents the proposed work, Section 4 presents the performance evaluation, and the final section concludes the paper.

2. Background and related work

Every stage of the food supply chain management system has been examined and documented meticulously to improve food safety. HACCP is a preventive methodology for removing chemical contaminants in the production system (Food Safety Management Sy, 2014). The Internet of Vehicles was introduced so that vehicles can communicate in real time over a sensor network with a wide communication range, with the related software implementing the connectivity (Ryan, 2014). One methodology (Borthakur et al., 2017) describes the relationship between big data and business analytics. A smart environment has evolved in which data is transmitted over the smart IoT network. To reach the right decision, business analytics applies a decision-making model to the data gathered from IoT devices. The authors conclude that data analytics in business delivers the right decision at the right time and is a key to business success.

One study (Alam et al., 2016) presents a complete review of the use of the C5.0 algorithm for clinical speech data. The C5.0 algorithm is applied to data obtained from smart devices within a fog-based design. This showed the potential of big data for smart media applications. The effectiveness of algorithms based on data mining techniques was also assessed. ID3 and C4.5 achieved enhanced performance, mostly through increased memory capacity and high operating speed. ANN and DLANN show the highest accuracy by modelling high-level data abstractions but are computationally expensive (Meidan et al., 2017). To identify authorised devices, machine learning algorithms have been applied to IoT device data. Random prediction is applied to features extracted from collected network traffic data. An inventory is defined in order to monitor IoT devices specifically, and multi-class classifiers are tested for each device class. The best classification of the details is recorded as the most reliable result (Singh and Gupta, 2014).

The problems and methods of a comparative study of three classification algorithms have been discussed. The analysis uses various data sets from UCL data set repositories. In the experiments, the C5.0 algorithm showed improved performance in all cases (HSSINA et al., 2014). A filter introduces a new significant attribute that characterizes the decision made for each instance by a given classification algorithm (Patil et al., 2012). The classification algorithm is either built from an initial batch of data or loaded from a serialized model file. Mathematically, the filter is a special subset of a partially ordered set. The filter is said to converge when its lower limit is low and its upper limit is high (Kaur et al., 2015). C5.0 is a decision tree method used for exploratory processing of the gathered data. C5.0 is a probabilistic modelling technique that is applied to fairly large volumes of data in data mining (Krishnan et al., 2020). The purpose of this algorithm is to detect the number of groups determined by the data set itself.

The FSC has been used to minimize food waste in developed countries. Accounting for environmental aspects leads to more effective resource utilization. Life cycle management has been used to assess environmental impact and customer satisfaction in real-time scenarios (Carino et al., 2020). The FSC has also improved the economic aspects of food service and helps patients benefit from enhanced food safety. An appraisal tool has been used to assess the quality of studies drawn from environmental science and evaluation databases (Kay and Janssen, 2019). Traceability in the FSC has been used to deliver high-quality food globally in complex situations. Blockchain has been implemented to provide trusted traceability in the FSC, and boundaries have been identified to categorize traceability quality for effective independent governance (Xu et al., 2020). The security of the ecosystem is maintained through the management of natural resources and food. Food supply networks cover the food components and help keep the ecosystem safe. Optimized mathematical models have been developed that make use of the IoT (Abid HaleemKhan and Khan, 2019). An FSC with traceability functionality has been developed to maintain customer confidence. Grey relational methodology is used to identify the relationship between customers and supply chain management (Gu et al., 2017).

The rest of this section will review the related work on different methods, technologies and applications for smart FSC and IoT.

A new method was developed using a Material Conscious and Information Network (MCIN)-based smart agriculture architecture, which differs from the current vertical architecture and covers development, management and commerce (Kaur, 2016). This architecture enhances current agriculture and strongly stimulates electronic commerce combined with production and marketing. The adoption of IoT in agriculture and food has also been reviewed comprehensively, including its implementation structure, considerations and implications. The results show that using IoT in fields and orchards can help farmers benefit from a wide range of technologies. Many researchers have surveyed traditional agricultural IoT sensor monitoring networks built on a cloud computing backbone. These surveys show that precision farming sensor networks are widely used to measure agriculture-related information such as temperature, humidity, soil pH, soil nutrition and water level, so farmers can monitor their crops and equipment remotely by phone and computer (Cambra et al., 2017).

A smart IoT communication system has been designed for use as a low-cost controller, together with a novel fuzzy computational algorithm for smart IoT irrigation systems. All data collected by the microcontroller is sent to a cloud database for statistics and processing (Kokkonis et al., 2017; Kinjal et al., 2018). A new IoT application called “Smart Irrigation Analysis” provides the end user with remote field irrigation analysis that improves on traditional field crop irrigation. Cloud data is analysed, and an irrigation-related graph report is produced so that farmers can later decide which crop to sow (Hsu et al., 2008).

Traceability is very important for ensuring food safety for consumers within the FSC. In recent years, many solutions based on emerging ICT technologies have been proposed to improve the traceability of animals, plants and food products. An RFID-enabled traceability system for the live fish supply chain has been implemented and deployed for trial in a live fish logistics centre, and the results are valuable for practical reference (Tian, 2016). A traceability system for agro-food using RFID (Radio-Frequency IDentification) and Blockchain technology has also been proposed. That work analysed the advantages and disadvantages of using RFID and Blockchain to build the agro-food traceability system and demonstrated the system's construction process (Zinas et al., 2017).

An open source IoT system for monitoring cows has been implemented using the LoRaWAN architecture for long-range communication, along with a study of the system architecture of high-level cattle tracking systems (Carbone et al., 2018). A new approach has also been proposed that would lead to trusted cooperative applications and services within agro-food chains. It uses Blockchain to enhance transparency, information flow and management capacity, allowing farmers to interact better with other parts of the supply chain, particularly the consumer. By proposing a new food-on-demand model, the authors expect the research to yield better-performing value chains (https://catalog.data.gov/dataset/nyserda-new-york-offshore-wind-supply-chain-dataset-9b665).

3. Proposed work

3.1 System model

Figure 1 illustrates food traceability from the manufacturer to the customer and the shared methodology behind the traceability process. In the tracking direction, farmers are connected with manufacturers, and transport regulation plays an important role in establishing food traceability. Distribution enterprises pass the goods to retail enterprises and finally to the customer; tracing this path is called backtracking. Tracking and backtracking run simultaneously to form food traceability. Food traceability plays an extremely important role in people's wellbeing, social security and growth. It is an essential indicator of risk management for food safety and an efficient technology for monitoring the whole supply chain. If a food safety concern arises during processing, it can be traced back to its source so that the issue is located and effective governance is established. The food safety traceability scheme not only records and tracks livestock from birth through the feeding process (feeding and control, disease prevention, treatment) to the slaughterhouse, but also covers food items on the consumer market (supermarket), where customers can query the breeding, slaughtering, harvesting and processing of food products via a food-specific identification code.

Sensors can improve an IoT system's perception of its surroundings and of several associated parameters. In the food traceability system, the environmental conditions are measured using sensors and low-cost techniques based on an inexpensive computer board with fast communication to the system. Connectivity has been established across transport systems, the agriculture sector, energy utilization, security and privacy, building management, embedded systems, wireless systems, pervasive computing, wireless sensor networks, smart cities and healthcare applications.

Traceability and environment detection are the two vital parameters for food-related supply chain management. The traceability of the FSC system maintains customer confidence in product quality. The production unit makes products according to customer needs.

To achieve this goal, several requirements are defined:

  1. Sensors that are small, low cost and easy to handle.

  2. Lightweight and easily movable.

  3. A minimum amount of persistence.

  4. Comfort and Consistent data.

  5. Consists of the data about the product set and available resources.

  6. Permanent monitoring system for food traceability.

  7. Ensure the decision indication.

  8. Storing the data about the procedure of the production system and the communication way to other systems.

  9. Transmit the output to the communication system so that the data are viewed only in the presented format.

  10. Enhanced system to eliminate the vulnerability in this food traceability system.

The Enterprise Resource Planning system is combined with IoT to share FSC-related data. The IoT framework is responsible for connecting users with the supply chain devices. The entire process is shown in Figure 2.

3.2 Enhanced C5.0 Bayesian network

First, the entire FSC simulation must be studied for traceability and the overall improvement planned. Combining the C5.0 classifier with Bayesian theory yields an efficient tree generation, pruning and optimization algorithm that can produce near-optimal decision trees. This paper proposes an algorithm that adopts C5.0 as the classifier and uses a post-pruning step based on Bayesian posterior theory as a precision enhancer. Figure 3 shows the cycle that the proposed algorithm follows to generate, prune and optimize the decision tree.

Each attribute of the records in the training data set has an associated information gain. The C5.0 classifier selects the attribute with the highest information gain and uses it as the splitting attribute. This step is applied recursively to generate multiple subsets. Ultimately, a tree-like structure is created whose hierarchy encodes the classification of the training set. The proposed algorithm therefore needs splitting criteria to decide how to break a node into a tree structure.

Entropy is used to measure the impurity of a food node. For the class values i at node t, it is defined as:

$$\mathrm{Entropy}(t) = -\sum_{i} p(i \mid t)\,\log_2 p(i \mid t) \tag{1}$$

The Gini index measures the divergence between the probability distributions of the values of the target food attribute in the FSC. It is used as an impurity measure and is defined as:

$$\mathrm{GiniIndex}(t) = 1 - \sum_{i} \left[p(i \mid t)\right]^2 \tag{2}$$

The classification error is computed as:

$$\mathrm{ClassificationError}(t) = 1 - \max_{i}\left[p(i \mid t)\right] \tag{3}$$

where p(i | t) denotes the fraction of records belonging to class i at a given node t.
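The three impurity measures in Eqs (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function names and the example labels (`safe`, `contaminated`) are assumptions introduced here.

```python
import math
from collections import Counter

def class_fractions(labels):
    """Return p(i|t): the fraction of records at a node belonging to each class."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def entropy(labels):
    """Eq. (1): sum over classes of -p * log2(p)."""
    return sum(-p * math.log2(p) for p in class_fractions(labels))

def gini_index(labels):
    """Eq. (2): one minus the sum of squared class fractions."""
    return 1.0 - sum(p * p for p in class_fractions(labels))

def classification_error(labels):
    """Eq. (3): one minus the largest class fraction."""
    return 1.0 - max(class_fractions(labels))

# Hypothetical node holding four food records, three safe and one contaminated
node = ["safe", "safe", "safe", "contaminated"]
print(round(entropy(node), 3))       # 0.811
print(gini_index(node))              # 0.375
print(classification_error(node))    # 0.25
```

All three measures are zero for a pure node and largest when the classes are evenly mixed, which is why any of them can serve as a splitting criterion.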

Information gain is an impurity-based criterion that uses entropy as the impurity measure. It is the difference between the manufacturer entropy and the consumer entropy:

$$\mathrm{InfoGain} = \mathrm{Entropy}(\text{manufacturer}) - \mathrm{Entropy}(\text{consumer}) \tag{4}$$

The gain ratio normalizes the information gain as follows:

$$\mathrm{GainRatio} = \frac{\mathrm{InfoGain}}{\mathrm{Entropy}} \tag{5}$$

Impurity measures such as entropy and the Gini index tend to favour food attributes that have many distinct values. The gain ratio is therefore used to evaluate the quality of a split. Each splitting criterion has its own analysis and rules, depending on its function and type properties.
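As a hedged illustration of information gain and gain ratio, the sketch below follows the usual C4.5-style formulation: the entropy after the split is the size-weighted average over the partitions, and the normalizer is the entropy of the partition sizes (the split information). Reading Eqs (4) and (5) this way is an assumption, and all names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (base 2) of a list of labels."""
    total = len(items)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(items).values())

def information_gain(parent_labels, partitions):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    n = len(parent_labels)
    weighted = sum(len(part) / n * entropy(part) for part in partitions)
    return entropy(parent_labels) - weighted

def gain_ratio(parent_labels, partitions):
    """Information gain divided by the split information (assumed normalizer)."""
    n = len(parent_labels)
    split_info = sum(-(len(p) / n) * math.log2(len(p) / n) for p in partitions if p)
    if split_info == 0:
        return 0.0  # a split that keeps all records together carries no information
    return information_gain(parent_labels, partitions) / split_info

# Hypothetical records split by a storage attribute into two groups
parent = ["ok", "ok", "ok", "bad", "bad", "bad"]
by_storage = [["ok", "ok", "ok"], ["bad", "bad", "bad"]]
print(information_gain(parent, by_storage), gain_ratio(parent, by_storage))
```

A split that separates the classes perfectly, as in this example, recovers the full parent entropy as gain; a split whose partitions have the same class mix as the parent yields a gain of zero.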

The Gini index may run into problems when the domain of the target food attribute is relatively wide. In this case, a binary criterion called the twoing criterion may be employed. It is defined as:

$$\mathrm{TwoingCriteria}(t) = P_L P_R \left(\sum_{i} \left|p(i \mid t_L) - p(i \mid t_R)\right|\right) \tag{6}$$

At this point, the assumption behind Bayes' principle is introduced, namely independence among the features. The evidence is therefore split into independent parts. If two events X and Y are independent, then

$$P(X, Y) = P(X)\,P(Y) \tag{7}$$

Consequently, we obtain the result:

$$P(y \mid x_1, x_2, \ldots, x_n) = \frac{P(x_1 \mid y)\,P(x_2 \mid y) \cdots P(x_n \mid y)\,P(y)}{P(x_1)\,P(x_2) \cdots P(x_n)} \tag{8}$$

which can be written as

$$P(y \mid x_1, x_2, \ldots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1)\,P(x_2) \cdots P(x_n)} \tag{9}$$

Since the denominator remains constant for a given input, that term can be removed:

$$P(y \mid x_1, x_2, \ldots, x_n) \propto P(y)\prod_{i=1}^{n} P(x_i \mid y) \tag{10}$$

For all possible values of the class variable y, we need to build a food model that finds the probability of the known set of inputs and picks the output with the highest probability. This can be expressed mathematically as:

$$y = \arg\max_{y} P(y)\prod_{i=1}^{n} P(x_i \mid y) \tag{11}$$

Finally, the task is to calculate P(y) and P(x_i | y), where P(y) is called the class probability and P(x_i | y) the conditional probability. Different Bayesian networks differ mainly in the assumptions they make regarding the distribution of P(x_i | y).

A Bayesian network supports two-way reasoning: inputs (manufacturers) can be used to compute outputs (consumers) and vice versa. Given the known values of some food nodes, the network evaluates the probability distribution of the target nodes in order to predict what food is needed or to determine the likely causes behind the products generated. The analysis of observations is an important source for decision making. It can identify the variable that has the most control over consumer preference. That is, if the contribution of a variable is large, consumers are more likely to buy. Mutual information is a measure of dependency between two random variables and is well suited to forecasting food data with the Bayesian network. It is the reduction in the uncertainty of one variable due to knowledge of the other, and vice versa. The mutual information between two variables A and B is given by:

$$D(A, B) = \sum_{a,b} P(a, b)\,\log\frac{P(a, b)}{P(a)\,P(b)} \tag{12}$$

where P(a, b) is the joint probability distribution of A and B, and P(a) and P(b) are the marginal probability distributions of A and B respectively. D(A, B) measures the influence of A on B: the larger the value of D(A, B), the stronger the influence of A on B. The variables are then ranked by their value of D(A, B), and the element with the highest priority should be given extra attention and precedence in the real-time production process. A food prediction model is then built to determine the probability of the observed inputs for all possible values of the class variable, and the output with maximum probability is chosen. This can be expressed as:

$$B = \arg\max_{B} P(B)\prod_{i=1}^{n} P(A_i \mid B) \tag{13}$$
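The dependency measure D(A, B) can be estimated from paired observations by counting, and the variables can then be ranked by it. The sketch below is an illustration under stated assumptions: probabilities are plain relative frequencies, the logarithm is taken to base 2 (the text does not fix a base), and the attribute names and values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate D(A, B) from a list of (a, b) observations using relative frequencies."""
    n = len(pairs)
    joint = Counter(pairs)
    marginal_a = Counter(a for a, _ in pairs)
    marginal_b = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = marginal_a[a] / n
        p_b = marginal_b[b] / n
        total += p_ab * math.log2(p_ab / (p_a * p_b))
    return total

def rank_by_influence(columns, target):
    """Rank attribute columns by their mutual information with the target column."""
    scores = {name: mutual_information(list(zip(values, target)))
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical observations: storage fully determines quality, carrier does not
columns = {
    "storage": ["cold", "cold", "warm", "warm"],
    "carrier": ["A", "B", "A", "B"],
}
quality = ["fresh", "fresh", "spoiled", "spoiled"]
print(rank_by_influence(columns, quality))
```

In this toy data set the storage attribute shares one full bit of information with quality while the carrier shares none, so storage would be given priority.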

To establish that the attribute tuple X from the classification rule matches one of the class labels (R_1, R_2, …, R_n), we need to show that X belongs to R_a. This holds if and only if

$$P(R_a \mid X) > P(R_i \mid X), \quad 1 \le i \le n,\; i \ne a$$

The prior probability P(R_k) of every class label and the probability P(X | R_k) are then calculated:

$$P(R_k) = \frac{|R_{k,D}|}{|D|}, \quad k = 1, 2, \ldots, n \tag{14}$$

where R_k (k = 1, …, n) is the class label attribute with n different classes, R_{k,D} is the set of tuples in training set D that belong to class R_k, |D| is the number of tuples in D, and |R_{k,D}| is the number of tuples in R_{k,D}. The value of P(X | R_a) can be calculated from the data set D.

Using Bayes' theorem,

$$P(R_a \mid X) = \frac{P(X \mid R_a)\,P(R_a)}{P(X)} \tag{15}$$

The attribute tuple X can be classified into class R_a only if the value of P(R_a | X) is the maximum. In that case the branch should not be pruned; otherwise, the branch should be pruned.
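The pruning test can be sketched as follows. This is a minimal illustration rather than the authors' code: the likelihood P(X | R_k) is computed under the attribute-independence assumption with add-one smoothing (both assumptions made here), and the function names and training records are hypothetical.

```python
from collections import Counter

def class_priors(labels):
    """Eq. (14): P(R_k) = |R_k,D| / |D|."""
    n = len(labels)
    return {cls: count / n for cls, count in Counter(labels).items()}

def likelihood(x, cls, rows, labels):
    """P(X | R_k) as a product over attributes, with add-one smoothing (assumed)."""
    class_rows = [row for row, label in zip(rows, labels) if label == cls]
    p = 1.0
    for j, value in enumerate(x):
        matches = sum(1 for row in class_rows if row[j] == value)
        distinct = len({row[j] for row in rows})
        p *= (matches + 1) / (len(class_rows) + distinct)
    return p

def keep_branch(x, branch_class, rows, labels):
    """Keep the branch only if its class has the largest posterior for tuple x.

    P(X) is the same for every class, so comparing P(X | R_k) * P(R_k) suffices.
    """
    priors = class_priors(labels)
    scores = {cls: likelihood(x, cls, rows, labels) * prior
              for cls, prior in priors.items()}
    return max(scores, key=scores.get) == branch_class

# Hypothetical training set D: (temperature, packaging) -> class label
rows = [("cold", "sealed"), ("cold", "sealed"), ("warm", "open"),
        ("warm", "open"), ("cold", "open")]
labels = ["safe", "safe", "unsafe", "unsafe", "safe"]

print(keep_branch(("cold", "sealed"), "safe", rows, labels))    # True: keep
print(keep_branch(("cold", "sealed"), "unsafe", rows, labels))  # False: prune
```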

The Bayesian network can fully express the replacement of logic gates by conditional probabilities. Applying the conditional probability method makes full use of historical data and of the prior probabilities of food traceability to improve data accuracy. Analysing the performance of C5.0 with quantitative Bayesian network methods helps in analysing food contamination in actual operation and supports in-depth research on the influencing factors and risk transmission links of its operating mechanism and traceability. Some reasonable suggestions are put forward to solve problems such as the unbalanced supply and demand of fresh agricultural products, and difficulties in continuous supply under seasonal consumption peaks or emergency management conditions caused by information asymmetry, natural disasters, food safety, etc.

This paper proposes a hierarchical technique called the C5.0 Bayesian Network (C5.0 BN), which allows specific sets of entities to be combined with other distinct ones. Although this feature has not previously been adopted in IoT data logic, we have developed it here to help improve product quality, as shown in Figure 4. The enhanced C5.0 Bayesian model for the stages of food supply chain management follows the analytical approach typical of incremental production.

C5.0 BN is a high-accuracy classification method that combines a decision tree with Bayes' theorem. It uses averaged global accuracy as the measure of goodness when inducing the tree structure, and chooses the local classifier that is most specific to the target instance to make the decision. It mainly introduces a pruning strategy based on local accuracy estimation. Instead of directly using the most specific local classifier (usually the classifier in a leaf node) for classification, our pruning strategy uses local accuracy to guide the selection of the local classifier.

3.2.1 Enhanced C5.0 Bayesian modelling algorithm

C5.0 BN is a novel development of Bayesian network algorithms based on decision trees. It is constructed from a set of conditional attributes and training cases, and the resulting decision trees can then be used to classify subsequent test cases. C5.0 BN has been developed as an improved version of the well-regarded and widely used C4.5 classifier, and has many important advantages over its predecessor. C5.0 BN is a classification algorithm suitable for very large data sets. In execution time, performance and precision-recall, it is better than C4.5. The C5.0 BN model works by splitting the food quality training data on the attribute that has the greatest effect. C5.0 BN also retains the useful attributes and omits the others from the food quality training data.

In this paper, the quality training data is used to construct the C5.0 decision tree that predicts the food outcomes under study. The algorithm keeps the resulting decision trees small and accepts numeric attributes, missing values and noisy data. To handle continuous attributes, it creates a threshold and then splits the records into those whose attribute value is above the threshold and those whose value is less than or equal to it. After the decision tree has been formed, the C5.0 Bayesian network goes back through it and attempts to remove branches that do not help by replacing them with leaf nodes. This paper improves C5.0 classifier accuracy by applying a Bayesian post-pruning technique. Using Bayesian posterior theory, the decision tree created by C5.0 is checked and all branches that do not meet the necessary requirements are removed. The following steps describe the proposed algorithm:

Input: Target Attribute, Example, Attribute

Begin Procedure c5.0BN ()

  a. divide tag;

  b. divide := 0;

  c. For every a ∈ S

       i. ā = pre_regions(a);

  d. If closure (ā) 0 then

       i. divide (a);

       ii. divide := 1;

  e. End if

  f. connect.process connect.controller;

  g. process.start complete;

  h. If connect.controller complete then

       i. shift.process;

  i. End if

  j. connect.process unification;

  k. Evaluate(result);

  l. End for

  m. End Procedure

Output: A tree of post-pruned decisions.
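The threshold handling of continuous attributes described in Section 3.2.1 can be illustrated with a short sketch. It tries candidate thresholds between consecutive sorted values and keeps the one with the highest information gain. Using midpoints as candidates and the temperature readings below are assumptions made for illustration; this is not the authors' procedure.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the binary split value <= t versus value > t."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between equal values
        t = (low + high) / 2  # midpoint candidate (an assumption of this sketch)
        left = [label for value, label in pairs if value <= t]
        right = [label for value, label in pairs if value > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Hypothetical storage temperatures in degrees Celsius and observed food state
temperatures = [2.0, 3.5, 4.0, 8.0, 9.5, 11.0]
states = ["ok", "ok", "ok", "spoiled", "spoiled", "spoiled"]
print(best_threshold(temperatures, states))  # (6.0, 1.0)
```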

3.3 Splitting formation

C5.0 chooses the splitting variable that maximizes the gain ratio. Tracing the path from the root node (manufacturer) to a particular leaf node (consumer) yields a rule whose conditions are the tests along that path. Traversing all the leaf nodes in this way produces a rule set, which is a textual description of the decision tree that was created. The C5.0 algorithm is as follows:

Input: Target Attribute, Example, Attribute

  • Step 1: Analyse the reference cases.

  • Step 2: Use the training data to create a decision tree.

  • Step 3: Choose the highest information gain value.

  • Step 4: Use the decision tree to assign a class to each element in the dataset; applying a given tuple to a decision tree is relatively straightforward.

Output: A decision tree.
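The four steps and the root-to-leaf rule extraction can be sketched with a small tree builder. This is a simplified, ID3-style illustration that splits categorical attributes on the highest information gain (Step 3); it is not the C5.0 implementation itself, which additionally uses the gain ratio, pruning and boosting. All names and records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(rows)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def build_tree(rows, labels, attrs):
    """Step 2: grow the tree recursively; leaves hold the majority class."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # Step 3
    node = {"attr": best, "children": {}}
    remaining = [a for a in attrs if a != best]
    for value in sorted({row[best] for row in rows}):
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        node["children"][value] = build_tree(
            [r for r, _ in subset], [y for _, y in subset], remaining)
    return node

def classify(tree, row):
    """Step 4: follow the branches that match the record until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into a (conditions, class) rule."""
    if not isinstance(tree, dict):
        return [(conditions, tree)]
    rules = []
    for value, child in tree["children"].items():
        rules.extend(extract_rules(child, conditions + ((tree["attr"], value),)))
    return rules

# Step 1: hypothetical reference cases
rows = [{"temp": "cold", "seal": "intact"}, {"temp": "cold", "seal": "broken"},
        {"temp": "warm", "seal": "intact"}, {"temp": "warm", "seal": "broken"}]
labels = ["safe", "safe", "unsafe", "unsafe"]

tree = build_tree(rows, labels, ["temp", "seal"])
print(extract_rules(tree))
```

In this sketch `classify` raises a `KeyError` for an attribute value that never appeared in training; a fuller implementation would fall back to the majority class at that node.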

3.4 Calculating the probabilities for food traceability management

Traceability is a key pillar in providing a perception of safety. Further, in terms of firm behaviour, the cost of penalties (e.g. infringement notices, prohibition, seizure, and plant closure), loss of reputation or prestige, and the probability of detecting unsafe food (e.g. food-borne illness surveillance) improves the cost-benefit equation for traceability systems.

At this point, an assumption is added to Bayes' theorem for the food data, namely independence among the food attributes. The evidence is therefore divided into independent parts.

If any two events X and Y are independent, then

$$P(X, Y) = P(X)\,P(Y) \tag{16}$$

Hence, we reach the result:

$$P(y \mid x_1, x_2, \ldots, x_n) = \frac{P(x_1 \mid y)\,P(x_2 \mid y) \cdots P(x_n \mid y)\,P(y)}{P(x_1)\,P(x_2) \cdots P(x_n)} \tag{17}$$

which can be expressed as:

$$P(y \mid x_1, x_2, \ldots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1)\,P(x_2) \cdots P(x_n)} \tag{18}$$

Since the denominator remains constant for a given input, that term can be removed:

$$P(y \mid x_1, x_2, \ldots, x_n) \propto P(y)\prod_{i=1}^{n} P(x_i \mid y) \tag{19}$$

We now need to build a decision classifier that finds the probability of the known set of inputs for all possible values of the class variable y and picks the output with maximum probability. This can be expressed mathematically as:

$$y = \arg\max_{y} P(y)\prod_{i=1}^{n} P(x_i \mid y) \tag{20}$$

Finally, the task is to calculate P(y) and P(x_i | y), where P(y) is called the class probability and P(x_i | y) the conditional probability. The different naive Bayes classifiers differ mainly in the assumptions they make about the distribution of P(x_i | y).
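A small worked example may make the decision rule concrete. All numbers are hypothetical and chosen only for illustration: two classes (safe, unsafe), two observed attributes x_1 = cold storage and x_2 = sealed packaging, with assumed class probabilities P(safe) = 0.6 and P(unsafe) = 0.4, and assumed conditional probabilities P(x_1 | safe) = 0.8, P(x_2 | safe) = 0.6, P(x_1 | unsafe) = 0.25 and P(x_2 | unsafe) = 0.25.

$$P(\text{safe})\,P(x_1 \mid \text{safe})\,P(x_2 \mid \text{safe}) = 0.6 \times 0.8 \times 0.6 = 0.288$$

$$P(\text{unsafe})\,P(x_1 \mid \text{unsafe})\,P(x_2 \mid \text{unsafe}) = 0.4 \times 0.25 \times 0.25 = 0.025$$

The larger score belongs to the class safe, so the rule selects y = safe. Because the omitted denominator is the same for both classes, normalizing the two scores gives the posterior directly:

$$P(\text{safe} \mid x_1, x_2) = \frac{0.288}{0.288 + 0.025} \approx 0.92$$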

3.5 Bayesian classifier influences food traceability management

Food monitoring at the manufacturing stage focuses on the defect tracking process, and a model based on Bayesian Network (BN) analysis is a useful tool for examining this problem. The BN associations are built from the data that the device collects from the sensor. Evidence that no contamination was found during transportation makes Chemical Contamination2 independent of Biological Contamination1. This follows from the Markov property, under which Biological Contamination1 affects the producer, dealer and shopkeeper nodes. Using Bayes' rule, it is possible not only to examine and estimate the probable origin of a food defect, but also to recognise how contamination may spread, for example to Biological Contamination2 and Biological Contamination3, as shown in Figure 5.
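The idea of estimating the probable origin of a defect with Bayes' rule can be sketched as follows. The sketch assumes that the defect originates at exactly one of the three nodes and uses invented prior and likelihood values; it illustrates the calculation only and is not the network shown in Figure 5.

```python
def posterior_over_origins(priors, likelihoods):
    """Bayes' rule: P(origin | defect) = P(defect | origin) * P(origin) / P(defect)."""
    joint = {o: priors[o] * likelihoods[o] for o in priors}
    evidence = sum(joint.values())  # P(defect), summed over mutually exclusive origins
    return {o: j / evidence for o, j in joint.items()}

# Hypothetical numbers: the share of handling time at each node (prior) and
# the probability that a defect is observed if that node is the source.
priors = {"producer": 0.5, "dealer": 0.3, "shopkeeper": 0.2}
likelihoods = {"producer": 0.02, "dealer": 0.05, "shopkeeper": 0.10}

posterior = posterior_over_origins(priors, likelihoods)
most_likely = max(posterior, key=posterior.get)
print(most_likely, posterior)
```

With these illustrative values the shopkeeper node becomes the most probable origin even though its prior is the smallest, because its assumed defect likelihood is the highest.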

3.6 Food quality monitoring procedure

Irrespective of how the automated approach for the FSC information system is developed, the container is linked to the CPU, and the temperature and humidity readings of the sensors are monitored. The most significant use of the proposed research is to develop the auxiliary sensors within the constraints of the related instruments. The complete system is user friendly: it works with barcode reader apps, and Bluetooth is attached to the RFID. Web-related services are provided over the Web and GSM. The sensor is responsible for identifying variations in contact between RFIDs. Each time the car rotates, the sensor inspects the RFID data. After inspecting the available data, the monitoring system may review the information.

The safety system establishes connectivity to the database by specifying the essential constraints for the sensor measurements. IoT data is used to monitor the food products. The real-time view of food traceability management is updated to establish whether or not the commodity is defective. The entire system is powered by an independent power supply. The proposed research is used with the corresponding freight framework, using the sensor networks.

4. Performance evaluation

In this section, we discuss the steps of the study and the experiments carried out to obtain the results. The data was collected by applying industrial Demand Response (DR) through the IoT. The data is intended for facility energy management systems and can be used for academic purposes. The NYSERDA supply chain dataset [44] is used for the performance evaluation. The proposed model is trained using the classifier, and the training dataset carries the parameters for variable selection and the validation process, which are used to produce an efficient result. The dataset contains 16,382 instances, split into 11,467 for training and 4,915 for testing, and includes the 7 attributes itemized below:

Demand_Response {Numeric}, area {Numeric}, season {Numeric}, energy {Numeric}, cost {Numeric}, pair_no {Numeric} and distance {Numeric} (Figure 8).

The dataset is divided into training and testing sets. The proposed C5.0 BN is trained on the training set and produces the output. Bayes' theorem is used to construct the decision. The simulation parameters are listed in Table 1.

Using the RStudio IDE with the Java programming language, the experimental data is filtered to remove the missing or erroneous values generated during data collection. After the characteristic variables were selected through correlation analysis, the original data was divided in the ratio 7:3. The 70% group (training set) is used to train the decision tree. The remaining 30% group (test set) is used to verify the tree's classification accuracy. A flow chart of the steps in generating the decision tree is shown in Figure 6.
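The 7:3 split described above can be sketched in a few lines of Python. The shuffling, the seed and the function name `split_70_30` are assumptions for this illustration; the paper does not state how records were assigned to each group. Record indices stand in for the actual dataset rows.

```python
import random

def split_70_30(records, seed=0):
    """Shuffle the records and split them 70% / 30% into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    cut = len(shuffled) * 7 // 10          # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]

# 16,382 instances, as reported for the dataset; indices stand in for the real rows.
train, test = split_70_30(range(16382))
print(len(train), len(test))  # prints "11467 4915"
```

The resulting sizes match the training and testing set sizes listed in Table 1.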

4.1 Analysis of quality parameters

4.1.1 Accuracy

According to the findings obtained, the quality of the decision tree improves significantly once the Bayesian Network Classifier is combined with the C5.0 algorithm. Table 2 and Figure 7 present the accuracy of ID3, C4.5 and C5.0 and of the proposed C5.0 BNA algorithm. The accuracy of the proposed C5.0 BNA algorithm has clearly improved.

It is evident from Table 2 that the C5.0 Bayesian Network Algorithm (C5.0 BNA) has higher classification accuracy than the earlier classification algorithms ID3, C4.5 and C5.0. The accuracy comparison for the decision tree classifiers is shown in Figure 7.

4.1.2 Memory utilization

Table 3 shows the memory used by ID3, C4.5 and C5.0 and by the proposed C5.0 BNA algorithm, and compares them. The relative memory consumption of the proposed C5.0BN and of ID3, C4.5 and C5.0 is shown in Figure 8. According to the findings, the memory usage of the ID3, C4.5 and C5.0 algorithms is larger than that of the proposed C5.0BN algorithm.

It is evident from Table 3 that the C5.0 Bayesian Network (C5.0BN) algorithm consumes less memory than the earlier classification algorithms ID3, C4.5 and C5.0.

4.1.3 Training time

Based on the results collected, the training time required by the proposed C5.0BN is greater than that of ID3, C4.5 and C5.0; in terms of training time, ID3, C4.5 and C5.0 therefore perform better. Figure 9 shows the training times. The algorithms implemented were ID3, C4.5, C5.0 and the proposed C5.0BN.

It is evident from Table 4 that the C5.0 Bayesian Network (C5.0BN) algorithm has a longer training time than the earlier classification algorithms ID3, C4.5 and C5.0.

4.1.4 Search time

The relative search time of the proposed algorithm and of ID3, C4.5 and C5.0 is shown in Table 5. The proposed C5.0BN takes less search time than the ID3, C4.5 and C5.0 algorithms. According to these figures, the proposed classification algorithm is more efficient than ID3, C4.5 and C5.0. The comparative search time for the decision tree algorithms is also shown in Figure 10.

It is evident from Table 5 that the C5.0 Bayesian Network has a lower search time than earlier classification algorithms such as ID3, C4.5 and C5.0.

4.1.5 Error rate

Table 6 compares the error rate of the traditional C5.0 algorithm with that of the improved C5.0BN algorithm. The improved algorithm gives fewer errors. The percentage error rates of the proposed algorithm and of the traditional C5.0 decision tree algorithm are shown in Figure 11.

It is evident from Table 6 that the C5.0 Bayesian Network algorithm has a lower error rate than earlier classification algorithms such as ID3, C4.5 and C5.0.

The study also estimates the quality of the FSC; the proposed procedure shows increased sensitivity relative to the original value. Figure 12 illustrates the quality management of the FSC. It mainly uses past data to predict market conditions, but market demand depends on a range of complex factors, including the quality of facilities, customer groups and government plans.

Figure 13 demonstrates the efficiency of the FSC. Four algorithms are evaluated, and the results show that the proposed C5.0BN compares well with the ID3, CART and C4.5 algorithms. For correctly classified instances, C5.0BN performs 1.6 percent better than C4.5, 2.8 percent better than CART and 5.5 percent better than ID3.

In the results analysis, the proposed C5.0BN performs best among the decision tree algorithms considered. Overall, C5.0BN is the most accurate of the classifiers compared, based on the measurements of efficiency, accuracy and error rate. Table 7 shows the overall performance of the four algorithms.

4.2 Implications of the work

The experiments show that, on FSC problems, C5.0BN performs better than the other algorithms in terms of memory consumption, search time, error rate and accuracy, especially when the problem dimension becomes bigger, although its training time is longer (Table 7). The main reason is that C5.0BN has some good features compared with other algorithms. Firstly, C5.0BN is a hybrid decision algorithm that combines the C5.0 algorithm with a Bayesian network approach. Well-organized decisions can therefore be made when IoT is used to monitor and effectively track the quality and safety of food. C5.0 helps to provide a good balance between global and local search ability for the algorithm. In this regard, C5.0 BN is a novel development of the Bayesian Network model that is focused on decision trees and is constructed from a directory of conditional attributes and test case locations; the decision trees may then be used to identify subsequent test case sets. The experiments in Section 4 show that this mechanism can significantly improve the performance of the algorithm.

The classification performance of the Bayesian network classifier, compared with non-Bayesian classifiers on real-world problem data, exceeded that of the ID3 algorithm and the random forest and was competitive with C5.0 and a neural network, reaching close to 99 percent correct classification. Also, for each of the folds in the five-fold cross-validation experiment, the Bayesian network classifier shows less variability than the neural network because of the limited number of edges in the network structure. In general, the C5.0 BN classifier with its incremental learning method tries to overcome the bias/variance dilemma associated with overfitting, thereby improving the generalization power.

5. Conclusion

With the fast development of processing services in the FSC, attaining the optimal marketing service composition across a large number of manufacturers, distributors and retailers has become a significant concern. We have proposed a hybrid algorithm to tackle the FSC problem. The algorithm is implemented to address the limitations of the current FSC, to prevent food defects from exceeding dangerous levels and to tell consumers when and where safety controls should be applied for the best results. It is also implemented to provide efficient food traceability management using IoT. Quality is maintained from the producer through to the customer by effective transportation. The proposed methodology has been implemented for abnormal food conditions. Unwanted data has been removed, which supports customer health and economic growth. The proposed work achieves reduced computational complexity and hardware utilization. The efficient food traceability methodology is used to trace food products back to the producer.

5.1 Future enhancement

In future work, the decision tree classifier algorithm may be explored on other datasets to achieve higher accuracy. The decision tree classifier algorithms can also be analysed with further measures on the training and test sets, such as the F-measure, TP rate, ROC curve, precision and the Kappa value. In future, the algorithm will also be compared with ensemble algorithms such as random forest and with the Chi-square automatic interaction detection (CHAID) algorithm.
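Several of the measures named above can be computed directly from a binary confusion matrix. The following Python sketch shows one way to do so; the counts are invented for illustration and are not results from this study, and the function does not guard against empty classes (division by zero).

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, TP rate (recall), F-measure and Cohen's kappa."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    tp_rate = tp / (tp + fn)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    # Agreement expected by chance, from the row and column totals.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "tp_rate": tp_rate,
        "f_measure": f_measure,
        "kappa": kappa,
    }

# Hypothetical confusion matrix for a defect / no-defect classifier.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The Kappa value corrects accuracy for agreement that would occur by chance, which makes it a useful complement to accuracy when the classes are imbalanced.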

Figures

Figure 1. Procedure for food traceability

Figure 2. Architecture for food based internet of things

Figure 3. Operational view of algorithm

Figure 4. Analytical approach based on Bayesian in FSC

Figure 5. Traceability with Bayesian network

Figure 6. Steps of work

Figure 7. Comparison of accuracy measure for decision tree algorithms

Figure 8. Comparison of memory consumption for decision tree algorithms

Figure 9. Comparison of training time for decision tree algorithms

Figure 10. Comparison of search time for decision tree algorithms

Figure 11. Comparison of error rate for decision tree algorithms

Figure 12. Quality management of the FSC

Figure 13. Sustainability of the FSC

Table 1. Simulation parameters

| Simulation parameter | Meaning |
|---|---|
| Software | ANYLOGISTIX |
| Output generator | J Response |
| Total instances | 16,382 |
| Training set | 11,467 |
| Testing set | 4,915 |
| FSC | 1,000 |
| Number of instances | 10 |
| Classifier functions | 445 |

Table 2. Comparison of accuracy measure for decision tree classifier algorithms

| Number of experiment | ID3 (%) | C4.5 (%) | C5.0 (%) | C5.0 BNA (%) |
|---|---|---|---|---|
| 1 | 71.5 | 86 | 80.5 | 85 |
| 2 | 78 | 81 | 85 | 89 |
| 3 | 74 | 79.5 | 83 | 93.5 |
| 4 | 77 | 80.5 | 84 | 88.5 |
| 5 | 81 | 85 | 89 | 94 |
| 6 | 82.5 | 86.8 | 90.5 | 94.8 |
| 7 | 76.5 | 81.5 | 87.5 | 94 |

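Table 3. Comparison of memory consumption for decision tree algorithms

| Number of experiment | ID3 (KB) | C4.5 (KB) | C5.0 (KB) | C5.0 BNA (KB) |
|---|---|---|---|---|
| 1 | 31,211 | 30,621 | 29,817 | 29,021 |
| 2 | 34,825 | 34,327 | 33,823 | 33,627 |
| 3 | 36,013 | 36,171 | 36,263 | 36,681 |
| 4 | 39,461 | 38,726 | 37,461 | 36,726 |
| 5 | 39,251 | 38,726 | 38,271 | 37,726 |
| 6 | 35,628 | 34,734 | 33,928 | 32,834 |
| 7 | 41,528 | 40,173 | 39,928 | 39,173 |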

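Table 4. Comparison of training time for decision tree algorithms

| Number of experiment | ID3 (ms) | C4.5 (ms) | C5.0 (ms) | C5.0 BNA (ms) |
|---|---|---|---|---|
| 1 | 3.2 | 4.3 | 5.9 | 7.5 |
| 2 | 7.1 | 8.5 | 7.9 | 9.2 |
| 3 | 5.1 | 6.2 | 7.4 | 8.6 |
| 4 | 4.2 | 6.5 | 9.3 | 12.3 |
| 5 | 4.2 | 5.1 | 6.9 | 8.03 |
| 6 | 3.9 | 6.87 | 8.3 | 10.56 |
| 7 | 4.8 | 6.9 | 9.52 | 12.08 |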

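Table 5. Comparison of search time for decision tree algorithms

| Number of experiment | ID3 (ms) | C4.5 (ms) | C5.0 (ms) | C5.0 BNA (ms) |
|---|---|---|---|---|
| 1 | 5.2 | 4.3 | 3.3 | 2.1 |
| 2 | 7.1 | 6.5 | 5.4 | 3.6 |
| 3 | 6.1 | 5.2 | 3.8 | 2.4 |
| 4 | 7.2 | 6.5 | 4.8 | 3.5 |
| 5 | 7.8 | 6.1 | 5.3 | 4.3 |
| 6 | 6.9 | 6.3 | 4.6 | 3.7 |
| 7 | 8.8 | 6.9 | 5.4 | 4.8 |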

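Table 6. Comparison of error rate for decision tree algorithms

| Number of experiment | ID3 (%) | C4.5 (%) | C5.0 (%) | C5.0 BNA (%) |
|---|---|---|---|---|
| 1 | 29.5 | 23.5 | 19.14 | 16 |
| 2 | 20.87 | 17.89 | 14.9 | 11.45 |
| 3 | 19.21 | 18.12 | 17.83 | 17.1 |
| 4 | 22.14 | 19.45 | 15.98 | 11.51 |
| 5 | 21.54 | 17.45 | 11.01 | 5.78 |
| 6 | 16.25 | 13.56 | 9.56 | 5.45 |
| 7 | 23.47 | 18.56 | 12.23 | 6.02 |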

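Table 7. Overall comparison of performance of decision tree algorithms

| Sl. no | Parameters | C5.0 BN | C5.0 | C4.5 | ID3 |
|---|---|---|---|---|---|
| 1 | Accuracy (%) | 92.21 | 86.78 | 81.67 | 76.58 |
| 2 | Error rate (%) | 11.01 | 15.81 | 19.17 | 27.09 |
| 3 | Memory consumption (KB) | 35,112 | 35,482 | 36,101 | 36,879 |
| 4 | Training time (ms) | 9.12 | 7.2 | 6.5 | 5.2 |
| 5 | Search time (ms) | 3.25 | 4.85 | 5.19 | 6.27 |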

References

Abad, E., Palacio, F., Nuin, M., González de Zárate, A., Juarros, A., Gómez, J.M. and Marco, S. (2009), “RFID smart tag for traceability and cold chain monitoring of food: demonstration in an intercontinental fresh fish logistic chain”, Journal of Food Engineering, Vol. 93 No. 4, pp. 394-399.

Abid Haleem, Khan, S. and Khan, M.I. (2019), “Traceability implementation in food supply chain: a grey-DEMATEL approach”, Information Processing in Agriculture, Vol. 6 No. 3, pp. 335-348.

Alam, F., Mehmood, R., Katib, I. and Albeshri, A. (2016), “Analysis of eight data mining algorithms for smarter Internet of Things (IoT)”, Procedia Computer Science, Vol. 98, pp. 437-442.

Borthakur, D., Dubey, H., Mahler, N.C.L. and Mankodiya, K. (2017), “Smart fog: fog computing framework for unsupervised clustering analytics in wearable internet of things”, Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, November 14–16, 2017, Montreal, Canada, pp. 472-476, 978-1-5090-5991-1.

Cambra, C., Sendra, S., Lloret, J. and Garcia, L. (2017), “An IoT service-oriented system for agriculture monitoring”, Proceeding of the 2017 IEEE International Conference on Communications (ICC'17), pp. 1-6.

Carbone, A., Davcev, D., Mitreski, K., Kocarev, L. and Stankovski, V. (2018), “Blockchain based distributed cloud fog platform for IoT supply chain management”, Proceedings of the Eighth International Conference on Advances in Computing, Electronics and Electrical Technology (CEET'18), pp. 51-58, doi: 10.15224/978-1-63248-144-3-37.

Carino, S., Porter, J., Malekpour, S. and Collins, J. (2020), “Environmental sustainability of hospital foodservices across the food supply chain: a systematic review”, Journal of the Academy of Nutrition and Dietetics, Vol. 120 No. 5, pp. 825-873.

Chen, F., Deng, P., Wan, J., Zhang, D., Vasilakos, A.V., et al. (2015), “Data mining for the internet of things: literature review and challenges”, International Journal of Distributed Sensor Networks, Vol. 2015, pp. 1-14.

Chen, L., Lu, Y. and Zhao, R. (2019), “Analysis and application of modern supply chain system in China”, Modern Supply Chain Research and Applications (MSCRA), 2631-3871.

Etim, I.E. and Lota, J. (2016), “Power control in cognitive radios, internet-of things (IoT) for factories and industrial automation”, Proceedings of Annual Conference IEEE, IEEE Industrial Electronics Society, pp. 4701-4705.

Feng, S., Setoodeh, P. and Haykin, S. (2017), “Smart home: cognitive interactive people-centric internet of things”, IEEE Communications Magazine, Vol. 55 No. 2, pp. 34-39.

Food Safety Management System (2014), “The amber valley”, available at: http://www.ambervalley.gov.uk/health-and-social-care/food-safety/food-safety management-system.aspx.

Gu, X., Chai, Y., Liu, Y., Shen, J., Huang, Y. and Nan, Y. (2017), “A MCIN-based architecture of smart agriculture”, International Journal of Crowd Science, Vol. 1 No. 3, pp. 237-248, doi: 10.1108/IJCS-08-2017-0017.

Hssina, B., Merbouha, A., Ezzikouri, H. and Erritali, M. (2014), “A comparative study of decision tree ID3 and C4.5”, International Journal of Advanced Science and computer Applications (IJACSA), Special Issue on Advances in Vehicular Ad Hoc Networking and Applications, pp. 13-19, doi: 10.14569/SpecialIssue.2014.040203.

Hsu, Y., Chen, A. and Wang, C. (2008), “A RFID-enabled traceability system for the supply chain of live fish”, Proceeding of the 2008 IEEE International Conference on Automation and Logistics, pp. 81-86, doi: 10.1109/ICAL.2008.4636124.

Kaur, K. (2016), “The agriculture Internet of things: a review of the concepts and implications of implementation”, International Journal of Recent Trends in Engineering and Research (IJRTER), Vol. 02, p. 04.

Kaur, D., Bedi, R. and Gupta, K.S. (2015), “Review of decision tree data mining algorithms: ID3 and C4.5”, International Conference on Information Technology and Computer Science.

Kay, B. and Janssen, M.F.W.H.A. (2019), “Boundary conditions for traceability in food supply chains using blockchain technology”, International Journal of Information Management, p. 101969.

Kinjal, A.R., Patel, B.S. and Bhatt, C.C. (2018), “Smart irrigation: towards next generation agriculture”, Studies in Big Data Book Series, in Dey, N., Hassanien, A., Bhatt, C., Ashour, A. and Satapathy, S. (Eds), Internet of Things and Big Data Analytics toward Next-Generation Intelligence, Springer, Cham, Vol. 30, doi: 10.1007/978-3-319-60435-0_11.

Kokkonis, G., Kontogiannis, S. and Tomtsis, D. (2017), “A smart IoT fuzzy irrigation system”, IOSR Journal of Engineering, Vol. 06 No. 2017, pp. 15-21.

Krishnan, R., Agarwal, R., Bajada, C. and Arshinder, K. (2020), “Redesigning a food supply chain for environmental sustainability – an analysis of resource use and recovery”, Journal of Cleaner Production, Vol. 242, p. 118374.

Li, G., Xu, G., Sangaiah, A.K., Wu, J. and Li, J. (2019), “EdgeLaaS: edge learning as a service for knowledge-centric connected healthcare”, IEEE Network, Vol. 33 No. 6, pp. 37-43.

Liang, H., Wu, J., Mumtaz, S., Li, J., Lin, X. and Wen, M. (2019), “MBID: micro-blockchain-based geographical dynamic intrusion detection for V2X”, IEEE Communications Magazine, Vol. 57 No. 10, pp. 77-83.

Lin, Q., Wang, H., Pei, X. and Wang, J. (2019a), “Food safety traceability system based on blockchain and EPCIS”, IEEE Access, Vol. 7, pp. 20698-20707.

Lin, X., Li, J., Wu, J., Liang, H. and Yang, W. (2019b), “Making knowledge tradable in edge-AI enabled IoT: a consortium blockchain-based efficient and incentive approach”, IEEE Transactions on Industrial Informatics, Vol. 15 No. 12, pp. 6367-6378.

Liu, Y., Han, W., Zhang, Y., Li, L., Wang, J. and Zheng, L. (2016), “An Internet-of-Things solution for food safety and quality control: a pilot project in China”, Journal of Industrial Information Integration, Vol. 3, pp. 1-7.

Liu, W., Wang, D., Long, S., Shen, X. and Shi, V. (2019), “Service supply chain management: a behavioural operations perspective”, Modern Supply Chain Research and Applications (MSCRA), 2631-3871.

Meidan, Y., Bohadana, M., Shabtai, A., Ochoa, M., Tippenhauer, N.O., et al. (2017), “Detection of unauthorized IOT devices using machine learning techniques”, Cryptography Secur, Vol. 1, pp. 1-13.

Patil, N., Lathi, R. and Chitre, V. (2012), “Comparison of C5.0 & CART classification algorithms using pruning Technique”, International Journal of Engineering Research and Technology (IJERT), Vol. 1 No. 4, pp. 1-5.

Riazul Islam, S.M., Kwak, D., Humaun Kabir, M., Hossain, M. and Kwak, K.S. (2015), “The Internet of Things for health care: a comprehensive survey”, IEEE Access, Vol. 3, pp. 678-708.

Ryan, J.M. (2014), Guide to Food Safety and Quality during Transportation: Controls, Standards and Practices, Elsevier, Academic Press.

Singh, S. and Gupta, P. (2014), “Comparative study ID3, CART and C4.5 decision tree algorithm: a survey”, International Journal of Advanced Information Science and Technology (IJAIST), Vol. 27 No. 27, pp. 97-103.

Tian, F. (2016), “An agri-food supply chain traceability system for China based on RFID and blockchain technology”, Proceeding of the 13th International Conference on Service Systems and Service Management (ICSSSM'16), pp. 1-6, doi: 10.1109/ICSSSM.2016.7538424.

Tolba, A. and Altameem, A. (2019), “A three-tier architecture for securing IoV communications using vehicular dependencies”, IEEE Access, Vol. 7, pp. 61331-61341.

Tran, T.L., Peng, L., Diao, Y., McGregor, A. and Liu, A. (2012), “CLARO: modeling and processing uncertain data streams”, VLDB Journal, Vol. 21 No. 5, pp. 651-676.

Tsang, Y.P., Choy, K.L., Wu, C.H. and Ho, G.T.S. (2019), “Multi-objective mapping method for 3D environmental sensor network deployment”, IEEE Communications Letters, Vol. 23 No. 7, pp. 1231-1235.

Wang, M., Sobhan, A., Lincoln, C. and Bill Wang, W. (2020), “Logistics innovation capability and its impacts on the supply chain risks in the Industry 4.0 era”, Modern Supply Chain Research and Applications (MSCRA), 2631-3871, doi: 10.1108/MSCRA-07-2019-0015.

Wen, Z., Hu, S., Clercq, D.D., Beck, M.B., Zhang, H., Zhang, H., Fei, F. and Liu, J. (2018), “Design implementation and evaluation of an Internet of Things (IoT) network system for restaurant food waste management”, Waste Management, Vol. 73, pp. 26-38.

Xiao, L., Wan, X., Lu, X., Zhang, Y. and Wu, D. (2018), “IoT security techniques based on machine learning”, Cryptography Secur, Vol. 1, pp. 1-20.

Xu, W., Zhang, Z., Wang, H., Yang, Yi and Zhang, Y. (2020), “Optimization of monitoring network system for Eco safety on Internet of Things platform and environmental food supply chain”, Computer Communications, Vol. 151, pp. 320-330.

Zinas, N., Kontogiannis, S., Kokkonis, G., Valsamidis, S. and Kazanidis, I. (2017), “Proposed open source architecture for Long Range monitoring. The case study of cattle tracking at Pogoniani”, Proceedings of the 21st Pan-Hellenic Conference on Informatics(PCI 2017), Article 57, 6, ACM, New York, NY. doi: 10.1145/3139367.3139437.

Corresponding author

Balamurugan Souprayen can be contacted at: chella40978@gmail.com
