Novel comparative methodology of hybrid support vector machine with meta-heuristic algorithms to develop an integrated candlestick technical analysis model

Armin Mahmoodi (Department of Aerospace Engineering, Carleton University, Ottawa, Canada)

Leila Hashemi (Department of Mechanical and Aerospace Engineering, Carleton University, Ottawa, Canada)

Amin Mahmoodi (Department of Mechanical Engineering, Islamic Azad University of Izeh, Izeh, Iran)

Benyamin Mahmoodi (Department of Industrial Engineering, Islamic Azad University of Izeh, Izeh, Iran)

Milad Jasemi (Stephens College of Business, University of Montevallo, Montevallo, Alabama, USA)

Journal of Capital Markets Studies

ISSN: 2514-4774

Article publication date: 8 December 2023


Abstract

Purpose

This study aims to predict stock market signals accurately. To this end, the stock market is analysed with Japanese candlestick technical analysis, which is combined with a support vector machine (SVM) and the following meta-heuristic algorithms: particle swarm optimization (PSO), the imperialist competition algorithm (ICA) and the genetic algorithm (GA).

Design/methodology/approach

Among the developed algorithms, the most effective one is chosen to determine probable sell and buy signals. To validate the designed models, the authors also compare their results with those of the same basic models from three previous articles. In the first model, PSO is used to search the solution space thoroughly and at high running speed. In the second model, SVM and ICA are examined together, where ICA improves the SVM parameters. Finally, in the third model, SVM and GA are studied, where GA acts as the optimizer and feature selection agent.

Findings

The results indicate that the prediction accuracy of all new models is high for only six days. According to the confusion matrices, the SVM-GA and SVM-ICA models correctly predicted more sell signals, and the SVM-PSO model correctly predicted more buy signals. Overall, SVM-ICA showed better performance than the other models when the implemented models were executed.

Research limitations/implications

The long period covered by the data, 2013–2021, makes the analysis of the input data challenging. The inputs must be adjusted with respect to the conditions.

Originality/value

In this study, two approaches have been developed within a candlestick model, a raw-based and a signal-based approach, in which the hit rate is determined by the percentage of correct evaluations of the stock market for a 16-day period.

Citation

Mahmoodi, A., Hashemi, L., Mahmoodi, A., Mahmoodi, B. and Jasemi, M. (2023), "Novel comparative methodology of hybrid support vector machine with meta-heuristic algorithms to develop an integrated candlestick technical analysis model", Journal of Capital Markets Studies, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/JCMS-04-2023-0013

Publisher: Emerald Publishing Limited

License

Published in Journal of Capital Markets Studies. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) license. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this license may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

Tracking stock price movements offers several advantages; hence investors and scientists invest time in studying this area. Precise forecasting has some prerequisites, such as correct data about the stock market and its changes, trend prediction and an account of the random-walk behavior of a stock series. However, even when all these factors are known, stock price prediction remains difficult because of the non-linear fluctuation of the stock market; to overcome this issue, investors and financial analysts need reliable tools (Jasemi et al., 2011a, b; Mahmoodi et al., 2023b, c). Fortunately, artificial intelligence (AI) can address this issue, since it is able to analyze non-linear relations and to handle the dominant uncertainty in the stock market.

Through AI, new prediction methods that are more precise than the previous ones have been achieved, although they are not free of drawbacks. Prediction methods are categorized into two classes: fundamental and technical analyses. Fundamental analysis studies various factors with great influence on the stock market that are mostly unavailable, such as micro-economic, macro-economic, political and even psychological factors. Technical analysis uses previous patterns for making new predictions; nevertheless, these patterns are not easily noticeable (Xiao et al., 2013). The unprecedented advances of the digital era have also made forecasting a technological matter. The most reliable and commonly used approaches are currently based on artificial neural networks (ANNs) and recurrent neural networks, which belong to machine learning (Olivia, 2008). In many instances the most challenging task is to train a deep neural network that generalizes well to new data. To tackle this issue, some methods such as cross-validation (regularization) or Bayesian methods have been proposed (MacKay, 1992).

SVM is a novel method recognized for supervised learning and for overcoming these limitations. It supports both classification and regression, and it is efficient in practice since it rests on a solid theoretical foundation. SVM helps to yield globally optimal solutions, whereas ANN frequently yields locally optimal solutions. In SVM, each data item is assigned to a point in n-dimensional space (n is the number of available features of the dataset), in which the value of each feature is the value of a specific coordinate. It classifies the data by finding the hyperplane that separates the two classes; hence the precision of the support vectors is closely related to the setting of the parameters. The interest of investors in using machine learning methods, such as Japanese candlestick forecasting models, is based on the above-mentioned positive points, including its optimization methods. For example, a supervised feed-forward neural network has been applied by Jasemi et al. (2011a, b) and Mahmoodi et al. (2023b, c), Barak et al. (2015) use a Wrapper ANFIS-ICA as a fuzzy neural network, and Ahmadi et al. (2016) apply a NARX neural network as the analyst for their candlestick models. In all the studies mentioned above, meta-heuristic algorithms were applied to find the suitable number of variables, and computational intelligence methods were applied for stock price forecasting. Among them, PSO has been implemented in their research methods, which indicates its increasing prevalence in predictive models. In most of these studies optimization is done by optimizers that aim to yield local and global results. These optimizers act like the crossover operations used by the genetic algorithm (GA) (Mahmoodi et al., 2023b, c, 2021; Mehrjoo et al., 2014). In this way, by choosing the most suitable optimizer, the fitness function in PSO yields the optimum solution.

What differentiates particle swarm optimization (PSO) from evolutionary computing is that potential solutions are flown through hyperspace. In the swarm optimization concept, particles accelerate toward better solutions, whereas evolutionary computation schemes operate directly on potential solutions, which are represented as locations in hyperspace (Kennedy, 2003).

Due to the lack of literature focusing on SVM, this study investigates SVM along with three meta-heuristic algorithms. The objective is an optimization model that predicts the movement of stock prices for General Motors company, with attention to the direct effect of the combination of input variables and to the accuracy of such predictions (Ahmadi et al., 2018; Mahmoodi et al., 2023b, c). PSO, GA and the imperialist competition algorithm (ICA) are used as optimizers.

The contributions of this study can be summarized in the following points.

New machine learning methods for obtaining the most suitable SVM parameters.
Comprehensive analysis of candlestick coefficients to select the most suitable signal forecast methods.
Implementation of the PSO-SVM model in two different periods to analyze the model's performance.
Comparison of the results with three other developed models, namely the SVM-GA, SVM-ICA and ANN (the base study) approaches, to examine the consistency and reliability of the model.

The remainder of this paper is organized as follows.

The literature review is presented in the second section. In the third section, the background and previous studies are introduced, which explain this work completely. The three new models of the study, as well as their conceptual basics, are explained in section 4. Section 5 runs the models with real data and presents the outcomes. Section 6 presents the final discussion of the study, and the references are covered in section 7.

2. Literature review

Researchers and financial investors have so far shown the strong impact that stock markets and influential factors have on the economic structures of countries. These factors also matter in the determination of prices in a market. Many techniques and procedures have been developed, which are analyzed in three parts: technical, fundamental and combined analysis. Additionally, each of these analyses is structured along different aspects such as machine learning, the nature of the data sources, accuracy, error criteria and heuristic or meta-heuristic modeling approaches.

Farahani and Rahimi (2021) applied an ANN to predict the stock index and used meta-heuristic algorithms, namely social spider optimization (SSO) and the bat algorithm (BA), to train it. For feature selection they used GA, with technical indexes as input data.

A new meta-heuristic method, a learning algorithm motivated by the performance of financial institutes, has been used to predict a trading company's stock price. In this method the trader acts as a weak learner and provides the companies with slight information. Kumar et al. (2020) used ANNs, fuzzy logic and GAs for training on the data and for feature selection, and introduced an intelligent method to predict stock prices. Hegazy et al. (2013) presented a machine learning approach for 13 financial datasets that predicts stock prices with the PSO algorithm and the least squares support vector machine (LS-SVM). The results were then compared with the neural network algorithm and Levenberg–Marquardt (LM). Like the current research, most of these studies apply a combination of technical methods and meta-heuristic methods. In this study, however, the min-max method is used for data pre-processing and the wrapper method for feature selection. Additionally, a neural network, SVM and a nonlinear autoregressive network were used as predictors, and mean squared error and hit rate were applied as performance criteria. The model in this research has been organized along the following dimensions.

The data collection is the same as in Ahmadi et al. (2018) and Mahmoodi et al. (2023b, c).
The input data of the candlestick technical trading strategies was analysed by SVM.
For optimization of the SVM parameters and feature selection, GA, the colonial competition algorithm (ICA) and the PSO algorithm have been used for training and testing on the data.

Finally, the performance and the accuracy of each presented hybrid model were evaluated by the hit rate index and then compared with each other.

Even though many studies have been performed in this regard, their focus has been on choosing predictive methods rather than on selecting the input data from the candlestick chart. This study considers two types of datasets, as in the research of Jasemi et al. To obtain different and improved results, the new hybrid SVM-PSO model has been used, and the achieved precision is compared with the studies of Barak et al. (2015), Jasemi et al. (2011a, b) and Ahmadi et al. (2018). Several studies have investigated the advantages of candlesticks in predicting the stock market (Lee and Jo, 1999; Xie et al., 2012; Lan et al., 2011).

Soft computing methods are popularly implemented for stock market problems (Barak et al., 2017) because the stock market is a nonlinear system. They are useful tools for prediction in such turbulent areas, where the nonlinear behavior must be captured. Intelligent systems such as neural networks, fuzzy systems and GA, or hybrid models, are commonly used to foresee financial implications. Forecasting problems for the financial time series of stock market funds are now commonly addressed by artificial neural networks and SVM (Anbalagan and Maheswari, 2015). Many studies combine evolutionary techniques with classification mechanisms; nevertheless, even after the development of many efficient models, a few negative points remain in ANNs (Dahal et al., 2015; De Campos et al., 2016; Kuo et al., 2011). Their learning process, which is based on strong likelihood, leads to a lack of reproducibility, which is why new approaches based on robust statistical principles, such as SVM, are preferred by many researchers (Fernandez-Lozano et al., 2013). SVM, a supervised learning method, has recently become more popular as one of the most advanced regression and classification methods, owing to its minimization of structural risk and its high practical efficiency (Huang et al., 2005). Given these advantages and the grounding of SVM in Vapnik's statistical learning theory, much research concentrates on this theory and its applications. Some researchers also use SVM for time series prediction (Tay and Cao, 2001; Huang et al., 2005).

The SVM introduced by Vapnik (1995, 1998) is a machine learning approach that has been applied on problems of non-linear prediction because of its great performance (Wang et al., 2003).

It has been widely used in pattern recognition, regression and time series forecasting despite not being the best choice for researchers.

For instance, Tay and Cao (2001) applied SVM to time series prediction. They concluded that SVM yields better results than a multi-layer back-propagation neural network on financial time series. In another example,

Examining these studies shows that, although prediction in all recent research has been implemented via justified approaches including various meta-heuristic methods, there has been no comprehensive comparison between them, and in none of them was the PSO algorithm superior in prediction. In this study, in addition to presenting a developed SVM-PSO algorithm, GA and ICA are used to optimize the parameters of SVM, which has not been done in the literature.

3. Methodology

The objective of this research is to achieve precision through suitable forecast models for stock market trading signals. To this end, three structures are used to analyze the technical adjustment based on the background explained previously. Each of these models is explained in two different parts. The general methodology is shown in Figure 1.

3.1 Input data

As mentioned above, the input dataset of this research is retrieved from the insights introduced by Jasemi et al. (2011a, b) and Mahmoodi et al. (2023a). Under these two approaches, the daily stock prices, including the low, high, open and close prices, are converted into 15 and 24 indicators.

3.2 The introduction of the models

3.2.1 SVM-PSO

3.2.1.1 Development of the initial particles in PSO

An X matrix with (nf+2)×np dimensions is built to generate the initial particle positions, where nf equals the number of features (15 in the raw approach and 24 in the signal approach) and np equals the number of particles (18). Each column of the X matrix therefore gives the position of one particle in the (nf+2)-dimensional space.

Figure 2 reports one column of the X matrix. The first two components of this column are C and Gamma, the SVM hyperparameters. Parameter C is initialized within the bounds C_min = 0 and C_max = 100. The Gamma parameter and the other components take a random number from the uniform distribution over [0, 1). From the third component onward, a feature is selected for SVM training if its value is higher than 0.05; otherwise, it is not selected. Only the second feature is selected from the four features shown in Figure 6. The velocity matrix V has the same dimensions as the position matrix X, and its components are drawn from a normal distribution with a mean of 0 and a standard deviation of 0.1.
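The initialization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `init_swarm` and `selected_features` are hypothetical, C is assumed to be drawn uniformly from [C_min, C_max], and each particle is stored as one list (one column of X).

```python
import random

def init_swarm(nf, n_particles=18, c_min=0.0, c_max=100.0, seed=35):
    """Build positions X and velocities V, one list of nf + 2 values per particle."""
    rng = random.Random(seed)
    positions, velocities = [], []
    for _ in range(n_particles):
        # Component 0 is C (assumed uniform in [c_min, c_max]).
        c = rng.uniform(c_min, c_max)
        # Component 1 is Gamma, the remaining nf components are feature scores in [0, 1).
        rest = [rng.random() for _ in range(nf + 1)]
        positions.append([c] + rest)
        # Velocities follow a normal distribution with mean 0 and standard deviation 0.1.
        velocities.append([rng.gauss(0.0, 0.1) for _ in range(nf + 2)])
    return positions, velocities

def selected_features(position, threshold=0.05):
    """Indices (0-based) of the features whose score exceeds the threshold."""
    return [i for i, score in enumerate(position[2:]) if score > threshold]

X, V = init_swarm(nf=15)  # raw approach: 15 features, 18 particles
# Four feature scores of which only the second exceeds 0.05.
print(selected_features([50.0, 0.3, 0.01, 0.9, 0.02, 0.03]))
```

The last line mirrors the four-feature illustration in the text, where only the second feature passes the threshold.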

3.2.1.2 Objective function

Figure 3 shows how a particle moves in the response space. Only two of the (nf+2) dimensions are drawn for display. When moving, each particle considers its previous direction of movement. In addition, it takes into account the best position that it has achieved so far (P_best) and the best position that all the particles have achieved (g_best), and updates its position based on steps nine and ten of the pseudo code. Because the best position of each particle and of all the particles is tracked, an objective function whose minimum output lies at the best position is required. The objective function used in this research is given in relation 1.

$$\text{Objective Function} = -\text{accuracy} \tag{1}$$

Given that PSO naturally seeks the minimum, the negative of the accuracy has been used as the objective function. Note that the accuracy is that of the six-day signal.
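As a rough illustration of relation 1 and of the particle movement, the sketch below negates the accuracy and applies one update step. The pseudo code of the study is not reproduced here, so the standard inertia-weight PSO update is assumed; `pso_step` and its argument names are hypothetical.

```python
import random

def objective(accuracy):
    """Relation 1: PSO minimizes, so the accuracy is negated."""
    return -accuracy

def pso_step(x, v, p_best, g_best, w, c1=2.0, c2=2.0, rng=random):
    """Return the updated (position, velocity) of one particle.

    Assumes the standard inertia-weight update: the new velocity mixes the
    previous velocity, the pull toward the particle's own best position and
    the pull toward the swarm's best position.
    """
    new_v = []
    for xi, vi, pi, gi in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        new_v.append(w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi))
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v

# A particle that already sits on both best positions with zero velocity stays put.
print(pso_step([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0], w=0.4))
```

A full run would also clip the components to their bounds after each step and refresh P_best and g_best from the objective values.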

3.2.2 SVM-GA

3.2.2.1 Initial chromosome creation in GA

Each chromosome in the trained model is modeled in the same way as each particle in the PSO algorithm, except that the first and second genes are uniform random numbers in the [l_b, u_b] range. Therefore, Figure 1 can be considered a chromosome. Further, each chromosome has a fitness value equal to the output of the objective function given in relation 1.

3.2.2.2 Parent selection and crossover

In each of the n_iter iterations, crossover is performed first and its children are produced. Figure 4 shows how parents are selected and crossed over. First, the selection is made among the current 20 chromosomes via the roulette wheel algorithm. Since this algorithm gives a higher chance of selection to chromosomes with high fitness, whereas the objective function seeks the minimum, the reverse fitness of each chromosome is used for this algorithm. Then, the chromosome of each of the two selected parents is divided into two segments via the single-point crossover method. The first child takes the first segment from the first parent and the second segment from the second parent, and the second child takes the first segment from the second parent and the second segment from the first parent. This operation is repeated n_cross/2 times, so that n_cross children are produced, where n_cross is obtained from relation (2).

$$n_{cross} = \left\lfloor 2 \times \left\lfloor \frac{n_{chromosomes} \times p_c}{2} \right\rfloor \right\rfloor \tag{2}$$

In relation 2, if p_c is chosen to be 0.9 and n_chromosomes is ten, multiplying them and dividing by two gives 4.5. After the floor operation, 4 is multiplied by 2, and the result is 8. Therefore, n_cross is always an even number, divisible by 2, as required by parent selection and crossover in each iteration.
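The arithmetic of relation 2 and the crossover step can be checked with a small sketch. The helper names are hypothetical, and the "reverse fitness" used as the roulette weight is assumed here to be the negated objective value (that is, the accuracy); the text does not spell out the exact transformation.

```python
import math
import random

def n_cross(n_chromosomes, p_c):
    """Relation 2: the number of crossover children, always even."""
    return 2 * math.floor(n_chromosomes * p_c / 2)

def roulette_pick(fitness, rng):
    """Pick one index; weights are the negated fitness values (assumption).

    Fitness is -accuracy, so the weights are positive and a more accurate
    chromosome gets a larger slice of the wheel.
    """
    weights = [-f for f in fitness]
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]

def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at one random cut point."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the chromosome
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

print(n_cross(10, 0.9))  # worked example from the text
print(n_cross(20, 0.8))  # settings used for SVM-GA
```

With the values from the worked example the function returns 8, and with n_chromosomes = 20 and p_c = 0.8 it returns 16.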

3.2.2.3 Mutation

After the crossover children have been produced, the mutation children are produced. n_mut children are produced in this step, where n_mut is obtained from relation 3.

$$n_{mut} = \left\lfloor n_{chromosomes} \times p_m \right\rfloor \tag{3}$$

For each mutation, one chromosome is randomly selected among the 20 current chromosomes, and one child is produced as a copy of its components. Then, a uniform mutation takes place in which two of the child's genes are randomly selected. If a selected gene is one of the first two genes (the C and Gamma parameters), a uniform random number in the range [l_b, u_b) is added to it. Otherwise, a random number in the [0, 1) range is added to it. Note that the first two genes must not take a value smaller than l_b, and the other genes must not take a value less than 0 or higher than 1. Thus, a bounds check is performed, and invalid values are changed to the boundary values. Figure 5 shows one mutation.
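A minimal sketch of relation 3 and the mutation step follows. The function names are hypothetical; the bounds check applies only the limits named in the text (a lower bound l_b for C and Gamma, and [0, 1] for the feature genes).

```python
import math
import random

def n_mut(n_chromosomes, p_m):
    """Relation 3: the number of mutation children."""
    return math.floor(n_chromosomes * p_m)

def mutate(chromosome, l_b, u_b, rng):
    """Copy a chromosome and perturb two randomly chosen genes."""
    child = list(chromosome)
    for idx in rng.sample(range(len(child)), 2):
        if idx < 2:
            # C or Gamma: add a uniform number from [l_b, u_b), keep it >= l_b.
            child[idx] = max(child[idx] + rng.uniform(l_b, u_b), l_b)
        else:
            # Feature gene: add a uniform number from [0, 1), clip to [0, 1].
            child[idx] = min(max(child[idx] + rng.random(), 0.0), 1.0)
    return child

print(n_mut(20, 0.2))  # settings used for SVM-GA
child = mutate([1.0, 0.5, 0.2, 0.9, 0.4], 1e-10, 10000.0, random.Random(35))
```

With n_chromosomes = 20 and p_m = 0.2 this yields four mutation children per iteration.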

3.2.2.4 Selection of chromosomes for the next generation

To do so, the current chromosomes are first merged with the chromosomes obtained from crossover and mutation. After that, the n_chromosomes chromosomes with the best fitness values are selected from the list ordered by fitness.
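The survivor selection amounts to a merge followed by a sort and a cut. The sketch below assumes a hypothetical `fitness_fn` that returns the objective value of relation 1, so that smaller values are better.

```python
def next_generation(current, children, fitness_fn, n_chromosomes=20):
    """Merge parents and children, then keep the n_chromosomes best ones."""
    merged = current + children
    merged.sort(key=fitness_fn)  # smaller objective value (-accuracy) is better
    return merged[:n_chromosomes]

# Toy usage: the first gene doubles as the fitness value.
survivors = next_generation([[3], [1]], [[2], [0]], lambda c: c[0], n_chromosomes=2)
print(survivors)
```

Because the current chromosomes take part in the merge, the best solution found so far is never lost between generations.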

3.2.3 SVM-ICA

3.2.3.1 Initial population creation

First, n_countries colonies are produced, each of which takes the parameters of Figure 6, as in the SVM-GA and SVM-PSO methods. The initial values of the parameters are set in the same way as in SVM-GA. The fitness value of each colony is the output of the objective function in relation 1. After the colonies have been produced, they are ordered by fitness value, and the first n_imp are selected as imperialists. The remaining colonies are shuffled once to balance the power of the imperialists and are then assigned to the imperialists in turn. That is, the first remaining colony is assigned to the first imperialist, the second remaining colony to the second imperialist, and so on up to imperialist n_imp. Remaining colony n_imp + 1 is again assigned to the first imperialist, and this process continues to the last colony. Figure 10 shows a sample of this process.
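The assignment of colonies to imperialists can be sketched as a sort, a shuffle and a round-robin pass. The function name `form_empires` and the choice to work on country indices are illustrative assumptions.

```python
import random

def form_empires(fitness, n_imp=5, seed=35):
    """Return {imperialist index: [colony indices]} from a list of fitness values."""
    # Order the countries from best (lowest objective value) to worst.
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    imperialists, colonies = order[:n_imp], order[n_imp:]
    # Shuffle the remaining colonies once to balance the imperialists' power.
    random.Random(seed).shuffle(colonies)
    empires = {imp: [] for imp in imperialists}
    # Round-robin: colony k goes to imperialist k mod n_imp.
    for k, colony in enumerate(colonies):
        empires[imperialists[k % n_imp]].append(colony)
    return empires

# 30 countries and 5 imperialists, as in the SVM-ICA settings; toy fitness values.
empires = form_empires([-(i / 100) for i in range(30)])
print({imp: len(cols) for imp, cols in empires.items()})
```

With 30 countries and 5 imperialists, every imperialist starts with five colonies.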

3.2.3.2 Assimilation

To move a colony toward its imperialist, (nf+2) uniform random numbers in the [0, 1) range are generated, one per component. Then, the new position of the colony is calculated from relation 4, in which β is the assimilation coefficient and U(0, 1) is a uniform random number between 0 and 1.

$$\vec{x}_{colony} = (\vec{x}_{imperialist} - \vec{x}_{colony}) \times \beta\, U(0,1) \tag{4}$$

It should be noticed that the first two members of colony position (C and Gamma) should not be less than l_b, and the rest of the components should not pass over the [0, 1] range. Therefore, the bounds checking procedure occurs after relation 4.
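A possible reading of the assimilation step is sketched below. It assumes that the displacement of relation 4 is added to the colony's current position, as in the standard ICA formulation; that reading, along with the function name `assimilate`, is an assumption rather than something the text states.

```python
import random

def assimilate(colony, imperialist, beta, l_b, rng):
    """Move a colony toward its imperialist and apply the bounds check."""
    moved = []
    for idx, (xc, xi) in enumerate(zip(colony, imperialist)):
        # One uniform random number in [0, 1) per component.
        step = beta * rng.random() * (xi - xc)
        value = xc + step  # assumption: the step is added to the current position
        if idx < 2:
            value = max(value, l_b)  # C and Gamma must not fall below l_b
        else:
            value = min(max(value, 0.0), 1.0)  # feature scores stay in [0, 1]
        moved.append(value)
    return moved

new_position = assimilate([10.0, 1.0, 0.2, 0.8], [20.0, 2.0, 0.6, 0.4],
                          beta=0.5, l_b=1e-10, rng=random.Random(35))
```

With β = 0.5, each component moves at most halfway toward the corresponding component of the imperialist.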

3.2.3.3 Revolution

In this phase, revolution happens for each colony of each imperialist with probability p_revolve. Some of the (nf+2) components that describe the current position of the colony are selected randomly. If a selected component is one of the first two, it takes a uniform random value in the [l_b, u_b) range; otherwise, it takes one in the [0, 1) range.

3.2.3.4 Exchange

After the two previous steps have been performed on the colonies, if a colony has a better fitness value than its imperialist, the colony and the imperialist swap positions. To do so, for each imperialist, the fitness value of its best colony is compared with its own fitness value, and if the colony's value is lower, the exchange is performed.

3.2.3.5 Imperialist total fitness

Since the previous three steps are expected to change the overall fitness value of each imperialist, it is updated according to relation 5, in which ξ is the coefficient applied to the mean cost of the colonies and n is the number of colonies of the imperialist.

$$\text{TotalFitness}(\text{Imperialist}) = \text{fitness}(\text{imperialist}) + \frac{\xi}{n}\sum_{i=1}^{n}\text{fitness}(\text{colony}_i) \tag{5}$$
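Relation 5 can be evaluated directly, as in the sketch below. The function name is hypothetical, the default coefficient of 0.1 is an illustrative choice, and returning the imperialist's own fitness when it has no colonies is an assumption made to avoid a division by zero.

```python
def total_fitness(imperialist_fitness, colony_fitnesses, xi=0.1):
    """Relation 5: imperialist fitness plus xi times the mean colony fitness."""
    n = len(colony_fitnesses)
    if n == 0:
        return imperialist_fitness  # assumption: no colonies, no correction term
    return imperialist_fitness + xi * sum(colony_fitnesses) / n

# Toy values: imperialist at -0.80, colonies at -0.70 and -0.60.
print(total_fitness(-0.80, [-0.70, -0.60]))
```

For these toy values the mean colony fitness is -0.65, so the total is -0.80 + 0.1 × (-0.65) = -0.865, up to floating-point rounding.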

3.2.3.6 Imperialist competition

The weakest colony of the weakest imperialist is selected in this step and assigned to one of the other imperialists by the roulette wheel algorithm. It should be noted that the reverse fitness value is used for the roulette wheel algorithm, so that a lower fitness value has a higher chance of being selected. Figure 7 illustrates this process for the case of five empires.

After the worst colony has been reassigned, the worst empire might be left with only its imperialist. In this scenario, the imperialist is treated as a colony and is assigned to another empire by the roulette wheel algorithm.
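The competition step can be sketched as follows. All names are hypothetical, empires are held as a dictionary from imperialist to its colonies, and the roulette weights are assumed to be the negated total fitness values, which are positive as long as the fitness is a negative accuracy.

```python
import random

def roulette(candidates, total_fit, rng):
    """Pick one empire; a lower (more negative) fitness gets a larger weight."""
    weights = [-total_fit[c] for c in candidates]  # assumption: weights are -fitness
    return rng.choices(candidates, weights=weights, k=1)[0]

def compete(empires, total_fit, colony_fit, rng):
    """Move the weakest colony of the weakest empire; dissolve the empire if empty."""
    if len(empires) < 2:
        return empires
    weakest = max(empires, key=lambda imp: total_fit[imp])  # highest objective value
    others = [imp for imp in empires if imp != weakest]
    if empires[weakest]:
        colony = max(empires[weakest], key=lambda c: colony_fit[c])
        empires[weakest].remove(colony)
        empires[roulette(others, total_fit, rng)].append(colony)
    if not empires[weakest]:
        # Only the imperialist is left: it is treated as a colony of another empire.
        del empires[weakest]
        empires[roulette(others, total_fit, rng)].append(weakest)
    return empires

empires = {"A": ["a1", "a2"], "B": ["b1"]}
total_fit = {"A": -0.85, "B": -0.70}
colony_fit = {"a1": -0.80, "a2": -0.75, "b1": -0.60}
print(compete(empires, total_fit, colony_fit, random.Random(35)))
```

In this toy case empire B loses its only colony and is then absorbed by A, which is the single-imperialist scenario described above.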

3.3 Calculate the total number of signals and hit rate

Implementation measures can be categorized into two parts: statistical and non-statistical.

In this research, the statistical measures are the more popular, and the most popular among them is the hit rate (Atsalakis and Valavanis, 2009). Hit rate is defined as (number of successes)/(total signals). It is worth mentioning that if the hit rate is higher than 51%, the model is useful (Lee, 2009). In this step, the model outputs are used to determine the sell signals, the buy signals and the total number of signals, and the number of correct signals during a six-day period is calculated. Since this study is based on Jasemi et al. (2011a, b), all details are set according to that study, and reading that paper is recommended for a better understanding.
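The hit rate definition and the 51% usefulness threshold translate into a few lines of Python. The function names are hypothetical, and the counts in the usage line are made-up illustration values.

```python
def hit_rate(n_success, n_signals):
    """Hit rate = (number of successes) / (total signals)."""
    if n_signals == 0:
        raise ValueError("hit rate is undefined when no signal was produced")
    return n_success / n_signals

def is_useful(rate, threshold=0.51):
    """A model is regarded as useful when its hit rate exceeds 51%."""
    return rate > threshold

rate = hit_rate(3, 4)  # illustration values: 3 correct signals out of 4
print(rate, is_useful(rate))
```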

4. Results and discussion

4.1 Experimental results of models

4.1.1 Results of SVM-PSO

Values used are as below.

n_iter = 500
n_particles = 18
c₁ = c₂ = 2
w_min = 0.4
w_max = 1.4
C_min = 0
C_max = 100

4.1.1.1 Results of accuracy of the implementation SVM-PSO model raw approach and signal approach

Table 1 reports the parameters and the selected features in the raw approach and the signal approach of the SVM-PSO model. The trained model in these two approaches has achieved the mean accuracy of 78.72 and 79.19%, respectively.

4.1.1.2 Confusion matrix SVM-PSO

Figure 8 shows the one-day signal prediction of the SVM-PSO model, where the labels zero, one and two denote sell, buy and neutral signals, respectively. According to the confusion matrix, the SVM-PSO model shows a relatively appropriate performance on buy and sell signals and selects an appropriate signal with an accuracy of 52.19%. To increase this accuracy, a six-day aggregation has been used. That is, if the model predicts a buy signal but the value decreases the next day, giving a negative financial return, the signal is still considered correct when there is a positive financial return on at least one day from the second day until the sixth day. In other words, the predicted signal may prove correct, with a positive financial return, at any point within the following six days. Using this technique, the model accuracy on this dataset increases to 79.82%.

It should be noticed that the neutral signal is not considered for calculating accuracy because if the model exports the neutral signal, the user does not observe any disadvantage. On the other hand, if the buy and sell signal is exported, but the price does not change, it will be considered a disadvantage because of the person's investment. Thus, the last column of the confusion matrix is removed to calculate accuracy.
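The six-day rule and the exclusion of neutral signals can be sketched as follows. The sketch assumes that a buy (sell) signal counts as correct when the price rises above (falls below) the price on the signal day within the horizon; the exact definition of the financial return in the study may differ, and the names are hypothetical.

```python
SELL, BUY, NEUTRAL = 0, 1, 2

def first_hit_day(signal, entry_price, future_prices, horizon=6):
    """Return the 1-based day on which the signal first proves correct, else None."""
    for day, price in enumerate(future_prices[:horizon], start=1):
        if signal == BUY and price > entry_price:
            return day
        if signal == SELL and price < entry_price:
            return day
    return None

def six_day_accuracy(signals, prices, horizon=6):
    """Share of buy/sell signals that prove correct within the horizon."""
    hits = total = 0
    for t, signal in enumerate(signals):
        if signal == NEUTRAL:
            continue  # neutral signals are left out of the accuracy
        total += 1
        window = prices[t + 1:t + 1 + horizon]
        if first_hit_day(signal, prices[t], window, horizon) is not None:
            hits += 1
    return hits / total if total else 0.0

prices = [10, 9, 8, 11, 12, 13, 14, 15, 16]
print(six_day_accuracy([BUY, NEUTRAL, SELL], prices))
```

In the toy series the buy signal is wrong on the first two days but proves correct on the third, while the sell signal never does, so half of the non-neutral signals count as correct.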

4.1.1.3 Bar chart SVM-PSO

Figure 9 (a) shows the accuracy and the number of selected features for the SVM-PSO model in the raw approach, and Figure 9 (b) shows them for the signal approach. According to this figure, selecting seven features to train the model is the most frequent outcome in the raw approach, occurring nine times across the datasets; however, it occurs only once in the signal approach. The best accuracy in each of these approaches (signal approach and raw approach) occurs in dataset 15, with 10 and 6 selected features and accuracies of 85.37 and 86.95%, respectively.

4.1.1.4 Produced signals by SVM-PSO model in raw approach and signal approach

Table 2 shows the signals correctly predicted by the SVM-PSO model in each dataset. For example, in dataset 1, the model has produced 248 signals, each of which is a qualitative indicator of an increase, a decrease or no change in the stock. If the predicted signal is a buy signal, the stock price must increase within the following six days at most, and if it is a sell signal, the stock price must decrease within the following six days at most. Of the 248 signals exported by the model, 114 prove correct within one day. In other words, if the signal is a buy, the price increases the next day, and if the signal is a sell, the price decreases the next day. Among the 134 signals that were not predicted correctly for the next day, 39 were correctly predicted within the following two days. Put another way, if a signal is a buy, the price increases within the following two days, and if it is a sell, the price decreases within the following two days. This process continues until the sixth day. In total, of the 248 signals exported by the model in dataset 1, 205 prove correct, which gives an accuracy of 82.66%.

4.1.2 Results of SVM-GA

Values used are as below.

n_iter = 50
p_c = 0.8
p_m = 0.2
n_chromosomes = 20
u_b = 10,000
l_b = 1×10⁻¹⁰

4.1.2.1 Results of accuracy of the implementation SVM-GA model raw approach and signal approach

Table 3 reports the parameters, the selected features and their accuracy for the available 48 datasets in the SVM-GA model. The mean accuracy achieved for the raw approach and the signal approach is 79.46 and 80.10%, respectively, which indicates that the signal approach is preferable to the raw approach in this model.

4.1.2.2 Bar chart SVM-GA

Figure 10 (a) and (b) show the accuracy and the selected features for training the SVM-GA model in the raw approach and the signal approach, respectively. In terms of accuracy, the best performance in the raw approach occurs in dataset 48 with 84.80% accuracy, selecting six of the available 15 features. On the other hand, the worst performance occurs in dataset 10 with 71.60% accuracy, selecting ten features. For the signal approach, the corresponding figures are 84.58% accuracy in dataset 48, selecting seven features, and 74.89% accuracy in dataset 34, selecting nine features. In terms of frequency, the features used to train the models have occurred six times in the raw approach and 14 times in the signal approach.

4.1.2.3 Produced signals by SVM-GA model in raw approach and signal approach

Table 4 shows the signals correctly predicted by the SVM-GA model in each dataset. As can be seen, in dataset 1, of the 236 exported signals that have been produced correctly by the model, 114 signals prove correct within one day. This indicates that, like the SVM-PSO model, the SVM-GA combination in our model performs with high accuracy.

4.1.3 Results of SVM-ICA

Values used are as below.

max_iter = 100
n_countries = 30
n_imp = 5
beta = 0.5
zeta = 0.1
p_revolve = 0.7
u_b = 10,000
l_b = 1×10⁻¹⁰

4.1.3.1 Results of accuracy of the implementation SVM-ICA model raw approach and signal approach

Table 5 reports the accuracy, the features chosen for training the model and its parameters. The average accuracy in the raw approach and the signal approach is 80.66 and 81.59%, respectively.

4.1.3.2 Bar chart SVM-ICA

Figure 11 (a) and (b) report the accuracy and the selected features for the SVM-ICA model in the raw approach and the signal approach, respectively. The maximum accuracy of the raw approach occurs in dataset 48 and reaches 84.98%, with eight features selected to train the model. Further, the maximum accuracy of the signal approach occurs in dataset 48 with 85.77% accuracy, also with eight features selected. In terms of frequency, the most common selection in the raw approach is eight features, occurring 11 times, and in the signal approach it is nine features, occurring 13 times.

4.1.3.3 Signals produced by SVM-ICA model in raw approach and signal approach

Table 6 shows the signals correctly predicted by the SVM-ICA model in each dataset. Of the 232 correctly predicted signals, 112 occurred within one day. In this comparison, therefore, the SVM-ICA model performed more weakly than the other two hybrid models (SVM-GA and SVM-PSO).

4.2 Model comparison and best result

Figures 12–14 report the accuracy achieved by the models and how often each accuracy occurs across the raw approach and the signal approach combined (96 datasets). According to the figures, the SVM-ICA and SVM-GA models have the highest frequencies, with an accuracy of 80% occurring in 20 of the 96 datasets. The SVM-PSO has performed better than the other models in one respect: it achieved the best single-dataset accuracy, 86.96%.
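The frequency of model accuracies can be tallied by grouping per-dataset accuracies into bins and counting the datasets in each bin. The following is a minimal sketch of that counting; the one-percentage-point bin width, the rounding down to the lower bin edge and the sample accuracies are assumptions for illustration and are not taken from the article's tables.

```python
from collections import Counter


def accuracy_frequency(accuracies, bin_width=1.0):
    """Count datasets per accuracy bin, keyed by the lower edge of the bin."""
    bins = Counter()
    for acc in accuracies:
        lower_edge = int(acc // bin_width) * bin_width
        bins[lower_edge] += 1
    return dict(sorted(bins.items()))


# Placeholder accuracies in percent (hypothetical, one value per dataset).
sample_accuracies = [80.2, 80.7, 81.5, 84.9, 79.9]
frequency = accuracy_frequency(sample_accuracies)
print(frequency)
```

Applied to the 96 accuracies of each model, a count of this kind gives the bar heights of a frequency chart.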

It should be noted that seed = 35 was used for training all the models on all the datasets, so that identical initial random values are selected and results do not differ from one run to another.
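The effect of fixing the seed can be shown with a small sketch: two runs that build their initial random population from the same seed start from identical values. Only the seed value 35 comes from the text; the population shape and the function name are hypothetical.

```python
import random

SEED = 35  # seed value reported in the text


def initial_population(seed, size=5, n_features=15):
    """Build a random binary population from a dedicated, seeded generator."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(size)]


run_a = initial_population(SEED)
run_b = initial_population(SEED)
print(run_a == run_b)  # True: the same seed reproduces the same start
```

Using a dedicated generator object rather than the module-level state keeps the initialization repeatable even when other code draws random numbers in between.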

According to Table 7, the developed models have performed better than the previous models on one-day signals. For the six-day signals, using the ICA as a heuristic algorithm (Barak et al., 2015) performs better than the other models. However, the GA developed in this article has performed better than the other algorithms.

4.2.1 Heatmap comparison

Figures 15 and 16 compare the accuracy of the models in the raw approach and the signal approach. SVM-ICA performs best overall. However, the models in the raw approach do not perform well on datasets 3, 10 and 34, and accuracy improves on these datasets when the signal approach is used. It is also observed that the models perform differently on different datasets. For example, the SVM-ICA model performs better on most datasets, while the SVM-PSO model performs better than the other models on dataset 15.

4.2.2 Confusion matrix comparison

Figure 17 compares the confusion matrices of the different models on dataset 37 in the raw approach. In the figure, (1) is the SVM-PSO model, (2) is the SVM-GA model and (3) is the SVM-ICA model. Based on the confusion matrices, the SVM-GA and SVM-ICA models correctly predict more sell signals, and the SVM-PSO model correctly predicts more buy signals in this dataset.
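A comparison of this kind can be read off the diagonal of each confusion matrix. The sketch below computes the per-class recall of two models and reports which model correctly predicts more signals of each class. The two-class layout, the model names and all counts are hypothetical placeholders and are not the values shown in Figure 17.

```python
LABELS = ["buy", "sell"]


def per_class_recall(matrix, labels):
    """matrix[i][j] counts samples with true label i predicted as label j."""
    recalls = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recalls[label] = matrix[i][i] / row_total if row_total else 0.0
    return recalls


def best_model_per_class(matrices, labels):
    """For each class, name the model with the most correct predictions."""
    return {label: max(matrices, key=lambda name: matrices[name][i][i])
            for i, label in enumerate(labels)}


# Hypothetical confusion matrices (rows: true class, columns: predicted).
matrices = {
    "model_1": [[40, 10], [20, 30]],
    "model_2": [[30, 20], [10, 40]],
}

recalls = {name: per_class_recall(m, LABELS) for name, m in matrices.items()}
winners = best_model_per_class(matrices, LABELS)
print(recalls)
print(winners)
```

With these placeholder counts, model_1 is stronger on buy signals and model_2 on sell signals, which mirrors the kind of complementary behaviour described above.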

5. Conclusion

Among the implemented models, the SVM-ICA has performed better than the others. Further, considering Figure 14, the numbers of correctly predicted sell and buy signals differ noticeably between models. Thus, an ensemble approach can be used to improve performance: three different models are each trained with different initial random values, and a majority vote is taken when producing a signal. This process provides higher accuracy, but it requires considerably more execution time.
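The majority voting step of such an ensemble can be sketched as follows. The function and variable names and the sample predictions are hypothetical; in the proposed setting, each list would hold the signals produced by one trained model for the same sequence of days.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine equal-length label sequences by taking the most common label.

    If every model disagrees, the label of the first model is kept, because
    Counter.most_common orders ties by first occurrence.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical per-day signals from three models.
model_a = ["buy", "sell", "buy", "sell"]
model_b = ["buy", "buy", "sell", "sell"]
model_c = ["sell", "buy", "buy", "sell"]

ensemble_signal = majority_vote([model_a, model_b, model_c])
print(ensemble_signal)  # ['buy', 'buy', 'buy', 'sell']
```

With three models and two signal classes a tie cannot occur, so every day receives the label that at least two models agree on. The extra cost noted above comes from training all three models rather than from the vote itself.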

Figures

Figure 1

Research method

Figure 2

Demonstration of one column in matrix X

Figure 3

Movement of a particle in the solution space

Figure 4

Parent selection and crossover

Figure 5

Parent selection and mutation

Figure 6

Assigning colonies to the imperialists for the first time

Figure 7

Moving weakest colony in weakest empire to another empire

Figure 8

Confusion matrix of SVM-PSO in dataset 39 – raw approach

Figure 9

Accuracy and number of selected features in SVM-PSO

Figure 10

Accuracy and number of selected features in SVM-GA

Figure 11

Accuracy and number of selected features in SVM-ICA

Figure 12

Prediction raw approach accuracy of different methods

Figure 13

Prediction signal approach accuracy of different methods

Figure 14

Frequency of model accuracies

Figure 15

Heatmap diagram of models over 48 datasets – raw approach

Figure 16

Heatmap diagram of models over 48 datasets – signal approach

Figure 17

Confusion matrix comparison of different models