Framework For Developing Intelligent Machine Learning Solutions To Maintain Security In Cyber World

Cyber Security (CS) is an important domain in Information Technology (IT), regardless of the techniques used for digitalization. Securing cyberspace and cyber systems is a high priority, since no organization is willing to compromise its confidentiality and privacy. Several security principles, tools and protocols are available for identifying and detecting security threats. However, most of them rely on signature-based identification, which leads to slow and less accurate threat detection. Many Advanced Persistent Threats (APTs) go unnoticed, which can severely damage the reputation of an organization. Machine Learning (ML) solutions can therefore be applied in cyber security to identify and predict both known and newly emerging threats effectively. This paper introduces the phases involved in developing intelligent solutions for cyber security systems with the application of machine learning.

Keywords: Cyber Security (CS), digitalization, Advanced Persistent Threats (APTs), Machine Learning (ML), intelligent solutions

1. Introduction:

This paper gives a basic introduction to Cyber Security (CS) and its various domains. It also outlines the basics of Machine Learning (ML), which is a part of Artificial Intelligence (AI). With advances in computing power and the capability to handle large datasets, Machine Learning can provide precise solutions for detecting and preventing threats. A brief insight is provided into the effectiveness of combining Machine Learning and Cyber Security to build intelligent threat detection and prevention systems. An overview of the processes involved in designing such intelligent systems is also given; these processes may differ depending on the domain in which the model is to be applied.

2. Overview on Cyber Security:

Cyber Security refers to the protection of cyber systems such as laptops, mobile phones, tablets, client systems, server systems, network devices, etc. Security is also needed to protect confidential data that has high value for the organization that owns it.

Figure 2.1 Domains of Cyber Security

With the advancement of technology, many common operations such as communication, ordering food, shopping, paying bills and transferring money have become much easier for users. This has also brought tremendous market growth for merchants, who can now reach a far larger share of the public. As user behaviour shifts from physical activity to digital activity, the need to protect systems has changed accordingly over time. Many intruders try to take advantage by sniffing and spoofing the confidential details of online users, which compromises the privacy of users and creates distrust of merchants among users.

3. Overview on Machine Learning (ML):

Machine Learning is a part of the Artificial Intelligence (AI) domain. AI refers to the automation of tasks by applying human-like intelligence. Machine Learning likewise automates tasks (T) by applying human-like intelligence, but it also learns and improves from the experience (E) gained by performing those tasks, and its effectiveness is measured through performance (P). Tom Mitchell defined machine learning as follows: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E”.

Figure 3.1 Characteristics of Machine Learning

Thus, the machine learning process has three important characteristics: Tasks (T), Experience (E), and Performance (P). A major factor in implementing machine learning in any domain is the size of the dataset. When the dataset is large, the algorithms can be trained effectively and can provide accurate results on real-time data.

4. Collaboration of Machine Learning (ML) and Cyber Security (CS):

Security is an important concern in every part of human life, and of course for evolving new technologies. Every technology and technological device requires security in various forms to protect itself from various threats. Cyber security is a universal term for security, covering network security, mobile security, data security, application security and much more. It also varies with the mode of operation: wired, wireless or virtual.

The most important principles of cyber security are the CIA triad (Confidentiality, Integrity, and Availability). Due to technological advancement and the emergence of Advanced Persistent Threats (APTs), several other principles have emerged, such as authentication, authorization, non-repudiation, accountability, risk assessment, awareness, timeliness, security management and much more.

Cyber security represents the protection of hosts such as web servers, mobile phones, tablets, routers, switches, etc. that are connected to the network. These devices face threats and vulnerabilities in various forms, such as illegal use of a resource, data modification, and injection of malicious code like viruses, worms, Trojan horses, and Remote Access Trojans (RATs). More advanced types of threats include DDoS (Distributed Denial of Service) attacks, phishing URLs, network anomalies and much more.

A recent report published by McAfee Labs (December 2018) shows the growth rate of new malware and URL phishing in detail. The rate is rising alarmingly, and it can cause serious havoc to organizations and their associated assets.

Figure 4.1 McAfee Labs (December 2018) Report on new malicious code growth rate

The main reasons for combining machine learning with cyber security are to:

  1. Analyse large datasets and understand patterns
  2. Accurately predict advanced threats
  3. Reduce human intervention, and thus much of the time consumed
  4. Respond quickly and in a timely manner
  5. Recognize attacks at their initial stages and prevent them from spreading across the entire organization.

Traditional cyber security operations mostly rely on signature-based detection. With the implementation of machine learning algorithms, the accuracy of threat detection can be improved with less human intervention. Many of the tasks are performed by these intelligent systems, which saves security analysts' time. Machines have the capacity to analyse large datasets and extract patterns from them, which is a time-consuming task for security analysts. Like humans, machines can be trained, in their case with large datasets, and can be tuned to improve their accuracy.
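The limitation of signature-based detection can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a signature is the SHA-256 digest of a known malicious payload; the payload bytes and the `KNOWN_BAD` set are hypothetical, and real products use richer signature formats.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical signature database: digests of payloads already known to be malicious.
KNOWN_BAD = {sha256_hex(b"malicious-payload-v1")}


def signature_match(payload: bytes) -> bool:
    """Flag a payload only if its digest is already in the signature database."""
    return sha256_hex(payload) in KNOWN_BAD


original = b"malicious-payload-v1"
variant = b"malicious-payload-v2"  # same threat, one byte changed

print(signature_match(original))  # True
print(signature_match(variant))   # False
```

A one-byte change produces a different digest, so the variant is missed. This is the gap that models trained on behavioural features aim to close.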

Early prediction of threats assists in mitigating risk and the theft of sensitive data on network devices. Using machine learning, these threats can be detected with a high accuracy rate and fewer false positives.

5. Developing an Intelligent Security Model:

The advantage of combining machine learning and cyber security can be achieved only when an appropriate security model is designed. The approach for implementing machine learning in an operating environment to generate an intelligent security system includes nine phases.

(i) Problem statement

(ii) Data collection

(iii) Feature extraction

(iv) Generate training and testing data

(v) Selection of model

(vi) Train the model

(vii) Test the model

(viii) Model optimization

(ix) Model usage

The first phase is the problem statement, in which the domain to which the machine learning solution will be applied is identified. It also specifies the Tasks (T), the associated experience (E), and the performance (P) measure to be used. In cyber security, the problem statement can be detecting attacks that happen in networks or in any other domain area.

The second phase is data collection. Selecting appropriate data is essential for solving any machine learning problem effectively. Relevant data should be collected based on the problem statement for the particular area. Various data collection tools and sources are available, such as Wireshark, tcpdump, WinDump, NetFlow, log files, etc. Data collection on a network can be packet-based, flow-based or log-based.
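To make the difference between packet-based and flow-based collection concrete, the following Python sketch groups individual packet records into flows keyed by the usual 5-tuple. The `Packet` record and its fields are simplified assumptions for illustration, not the output format of any of the tools named above, and the addresses come from documentation ranges.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical, simplified packet record (not a real capture format)."""
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int


def aggregate_flows(packets: Iterable[Packet]) -> dict:
    """Group packets by 5-tuple and summarise each group as one flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0,
                                 "first_seen": float("inf"),
                                 "last_seen": float("-inf")})
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flow = flows[key]
        flow["packets"] += 1
        flow["bytes"] += p.size
        flow["first_seen"] = min(flow["first_seen"], p.timestamp)
        flow["last_seen"] = max(flow["last_seen"], p.timestamp)
    return dict(flows)


packets = [
    Packet(0.00, "192.0.2.10", "198.51.100.5", 50000, 22, "tcp", 60),
    Packet(0.05, "192.0.2.10", "198.51.100.5", 50000, 22, "tcp", 120),
    Packet(0.10, "192.0.2.11", "198.51.100.5", 50001, 80, "tcp", 500),
]
flows = aggregate_flows(packets)
for key, stats in flows.items():
    print(key, stats)
```

Flow records are far smaller than raw packet captures, at the cost of discarding payload detail. Which granularity is appropriate depends on the problem statement from the first phase.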

The third phase is the most important stage in developing a machine learning model, i.e. feature extraction. Data collected during the second phase may be noisy and redundant, and some of it may not be useful for generating a machine learning solution. It is necessary to pre-process the data before it is used in training and testing the machine learning solution. Pre-processing includes removing erroneous data, formatting the data in a pre-defined structure, sampling the useful and needed data, breaking the data down into parts that support effective prediction, and rescaling data measured in different units and value ranges. Some of the features that are necessary for generating machine learning based security solutions include server error rate, same service rate, connection to same port rate, number of failed login attempts, etc.
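As a rough illustration of how such features might be computed, the Python sketch below derives four of them from a short window of connection records, relative to the most recent connection. The record fields (`service`, `dst_port`, `error`, `login_failed`) and the windowing scheme are assumptions made for this example, not a prescribed schema.

```python
def extract_features(window):
    """Compute rate features over a window of connection records.

    `window` is a list of dicts ordered oldest to newest; rates are
    measured relative to the most recent connection in the window.
    """
    if not window:
        raise ValueError("window must contain at least one connection")
    current = window[-1]
    n = len(window)
    return {
        "same_service_rate": sum(c["service"] == current["service"] for c in window) / n,
        "same_port_rate": sum(c["dst_port"] == current["dst_port"] for c in window) / n,
        "error_rate": sum(c["error"] for c in window) / n,
        "failed_logins": sum(c["login_failed"] for c in window),
    }


# Hypothetical window: three SSH attempts with failed logins, one normal HTTP request.
window = [
    {"service": "ssh", "dst_port": 22, "error": False, "login_failed": True},
    {"service": "ssh", "dst_port": 22, "error": True, "login_failed": True},
    {"service": "http", "dst_port": 80, "error": False, "login_failed": False},
    {"service": "ssh", "dst_port": 22, "error": False, "login_failed": True},
]
features = extract_features(window)
print(features)
```

Each raw record becomes a fixed-length numeric vector, which is the form most learning algorithms expect as input.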

The fourth phase is generating data for training and testing. After feature extraction, the collected data has to be divided into training data and testing data. The same piece of data should not be used to both train and test the machine learning solution. The training data is used to instruct, or train, the machine learning solution, through which it gains experience in solving real-time problems. The testing data is used to evaluate the performance of the solution. According to Soma Halder and Sinan Ozdemir, the training data should include 60% labelled (known) data and 40% unlabelled (unknown) data, whereas the testing data should include 40% labelled (known) data and 60% unlabelled (unknown) data.
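The requirement that no record appears in both sets can be sketched as a simple hold-out split in Python. The 30% test fraction and the fixed seed are illustrative assumptions; the sketch does not implement the labelled/unlabelled proportions quoted above.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle the records and split them into disjoint training and testing sets."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))  # stand-ins for feature vectors
train, test = train_test_split(records)
print(len(train), len(test))  # 7 3
```

Shuffling before the split avoids accidentally testing only on the most recent traffic, although for time-dependent data a chronological split may be the better choice.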

The fifth phase is the selection of a machine learning solution for security. Several algorithms are available for applying machine learning in security, such as k-means clustering, classification, Support Vector Machines (SVM), random forest, decision trees, Classification and Regression Trees (CART), etc. Many researchers have studied and detailed the effectiveness of each algorithm in different security domains. Thus, the models can be tried and tested in order to adopt the most suitable solution.
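One way to "try and test" candidate models is to fit each on the training data and keep the one that scores best on held-out data. The Python sketch below does this with two deliberately simple stand-ins, a majority-class baseline and a nearest-centroid classifier. Neither is one of the algorithms listed above; they are used only to keep the example self-contained, and the feature values are invented.

```python
from collections import Counter


def majority_baseline(train):
    """Always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_centroid(train):
    """Predict the label whose mean feature vector is closest to x."""
    sums, counts = {}, Counter()
    for x, y in train:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def select_model(candidates, train, validation):
    """Fit every candidate and return the best name plus all validation scores."""
    scores = {name: accuracy(fit(train), validation) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical features: (failed_logins, error_rate)
train = [((0, 0.0), "normal"), ((1, 0.1), "normal"), ((0, 0.05), "normal"),
         ((8, 0.6), "attack"), ((10, 0.9), "attack")]
validation = [((1, 0.0), "normal"), ((9, 0.7), "attack"),
              ((7, 0.8), "attack"), ((0, 0.1), "normal")]

candidates = {"majority_baseline": majority_baseline,
              "nearest_centroid": nearest_centroid}
best, scores = select_model(candidates, train, validation)
print(best, scores)
```

Including a trivial baseline among the candidates is a useful sanity check: a model that cannot beat it on held-out data is not worth deploying.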

The sixth phase is training the model that was selected earlier based on the security domain problem. The dataset prepared for training is used to teach the algorithm the behaviour and process patterns of the problem statement. For example, in regression the algorithm starts from an arbitrary line and repeatedly adjusts it until the most efficient solution is obtained.
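The regression example can be sketched in Python as gradient descent on the mean squared error: the line starts at slope 0 and intercept 0 and is nudged on each pass in the direction that reduces the error. The learning rate, epoch count and toy data are assumptions chosen so that this small example converges; they are not recommended settings.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current line on every training point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2 / n) * sum(errors)
        # Move both parameters a small step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1, so the fit should approach those values.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
```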

The seventh phase is testing the model. The data set aside for testing is applied to check the performance of the machine learning solution.
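Performance on the testing data is commonly summarised with a confusion matrix and the metrics derived from it. The Python sketch below computes accuracy, precision, recall and false positive rate for a binary attack/normal task. The label names and the sample predictions are invented for illustration.

```python
def evaluate(y_true, y_pred, positive="attack"):
    """Compute basic detection metrics from true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # false alarm
        elif truth == positive:
            fn += 1  # missed attack
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


y_true = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "normal"]
y_pred = ["attack", "attack", "normal", "normal", "normal", "normal", "attack", "normal"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead when attacks are rare, so recall (attacks caught) and false positive rate (false alarms raised) are usually reported alongside it.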

The eighth phase is optimization of the model. The objective is not only to develop, evaluate and implement the intelligent solution, but also to tune its performance to the desired level of accuracy. The higher the rate of threat prediction, the lower the chance of data compromise. Thus, this step should not be treated as optional.
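One simple form of optimization is tuning the decision threshold of a model that outputs a threat score. The Python sketch below tries a few candidate thresholds on validation data and keeps the one with the highest F1 score. The scores, labels, candidate thresholds and the choice of F1 as the objective are all assumptions for illustration.

```python
def f1_at(threshold, scores, labels):
    """F1 score when every score >= threshold is flagged as an attack."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 on the given data."""
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


# Hypothetical validation set: model scores and whether each event was an attack.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9]
labels = [False, False, False, True, False, True, True]

best_threshold = tune_threshold(scores, labels, candidates=[0.3, 0.5, 0.7])
print(best_threshold)  # 0.7
```

The threshold should be tuned on data separate from the final testing set, otherwise the reported performance will be optimistic.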

The final phase is implementing the model for real-time usage. The real-time dataset is constantly applied to the machine learning solution. If the attack is known, it can be easily identified, since it may be included in the training or testing data. If the attack is not known, it can still be identified based on its features, which are defined during the feature extraction process.
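Real-time usage can be pictured as a loop that scores each incoming event and raises an alert when the score crosses a threshold. In the Python sketch below, `toy_score` is a hypothetical placeholder for a trained model, and the event fields and the threshold value are likewise assumptions.

```python
def monitor(events, score, threshold):
    """Yield an alert for every event whose score reaches the threshold."""
    for event in events:
        value = score(event)
        if value >= threshold:
            yield {"event": event, "score": value}


def toy_score(event):
    """Placeholder for a trained model: scales failed logins into [0, 1]."""
    return min(1.0, event["failed_logins"] / 10)


# Hypothetical stream of already-extracted features, one dict per event.
events = [
    {"src": "192.0.2.10", "failed_logins": 0},
    {"src": "192.0.2.20", "failed_logins": 9},
    {"src": "192.0.2.30", "failed_logins": 2},
    {"src": "192.0.2.40", "failed_logins": 12},
]

alerts = list(monitor(events, toy_score, threshold=0.5))
for alert in alerts:
    print(alert["event"]["src"], alert["score"])
```

Because the model scores features rather than matching exact payloads, an event can be flagged even if that specific attack never appeared in the training data, provided its features resemble those of known attacks.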

6. Conclusion:

Cyber security is a vast domain, so generating a single solution for all of its problems is not an easy task. When a single solution is applied to security systems, its framework becomes an easy target for attackers. Thus, multiple layers of solutions have to be applied to obtain better intelligent solutions. Though machine learning enhances the performance of intelligent security solutions, it is not a silver bullet for security. Nevertheless, the probability, accuracy and speed of predicting and detecting an attack improve when machine learning solutions are used in building security systems.

7. References:

  1. S. Mukkamala, A. Sung, and A. Abraham, “Cyber security challenges: Designing efficient intrusion detection systems and antivirus tools,” in Enhancing Computer Security with Smart Technology, V. R. Vemuri, Ed. New York, NY, USA: Auerbach, 2005, pp. 125–163.
  2. A. Patcha and J.-M. Park, “An overview of anomaly detection techniques: Existing solutions and latest technological trends,” Comput. Netw., vol. 51, no. 12, pp. 3448–3470, Aug. 2007.
  3. A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1153–1176, 2nd Quart., 2016.
  4. M. I. Jordan and T. M. Mitchell, “Machine learning: Trends, perspectives, and prospects,” Science, vol. 349, no. 6245, pp. 255–260, 2015.
  5. D. Moon, H. Im, I. Kim, and J. H. Park, “DTB-IDS: An intrusion detection system based on decision tree using behavior analysis for preventing APT attacks,” J. Supercomput., vol. 73, no. 7, pp. 2881–2895, 2017.
  6. S. Jo, H. Sung, and B. Ahn, “A comparative study on the performance of intrusion detection using decision tree and artificial neural network models,” J. Korea Soc. Digit. Ind. Inf. Manage., vol. 11, no. 4, pp. 33–45, 2015.
  7. V. Benjamin, W. Li, T. Holt, and H. Chen. Exploring threats and vulnerabilities in hacker web: Forums, irc and carding shops. In Intelligence and Security Informatics (ISI), 2015 IEEE International Conference on, pages 85–90. IEEE, 2015.
  8. C. M. Bishop and I. Ulusoy. Object recognition via local patch labelling. In Deterministic and Statistical Methods in Machine Learning, pages 1–21, 2004.
  9. A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT ’98, pages 92–100, New York, NY, USA, 1998. ACM.
  10. S. Chakrabarti, K. Punera, and M. Subramanyam. Accelerated focused crawling through online relevance feedback. In Proceedings of the 11th international conference on World Wide Web, pages 148–159. ACM, 2002.
  11. (ISC)². ‘The 2015 (ISC)² Global Information Security Workforce Study.’
  12. Enterprise, Symantec. ‘Internet Security Threat Report 2015.’ (2016).
  13. L. Fourie, S. Pang, T. Kingston, et al. ‘The global cyber security workforce: an ongoing human capital crisis.’ (2014).
  14. K. Evans and F. Reeder. “A Human Capital Crisis in Cybersecurity: Technical Proficiency Matters”. CSIS, 2010.
  15. K. Francis and W. Ginsberg, The Federal Cybersecurity Workforce: Background and Congressional Oversight Issues for the Departments of Defense and Homeland Security.
  16. McAfee Labs Report, March 2018.

