Intrusion Detection Model Using Machine Learning Algorithm On Big Data Environment


Author: M Ruthvik Mohan

ABSTRACT:
An Intrusion Detection Model (IDM) using a Machine Learning (ML) algorithm in a Big Data environment is a method for identifying and preventing unauthorized access to a computer system. The IDM applies an ML algorithm to large data sets, or “Big Data,” to find patterns and anomalies that may indicate a security breach. These patterns and anomalies are then used to build a model that can detect intrusions in real time. Running in a Big Data environment lets the IDM process and analyze large volumes of data quickly and accurately, which makes it more effective at detecting and preventing intrusions. The approach can be applied in industries such as finance, healthcare, and government to improve the security of sensitive information. The proposed Spark Chi SVM model is compared to the Chi Logistic Regression classifier, and the results show that the Spark Chi SVM model achieves high performance, reduces training time, and is efficient for handling Big Data.
EXISTING SYSTEM:
Several existing systems apply an Intrusion Detection Model (IDM) based on a Machine Learning (ML) algorithm in a Big Data environment. One example is the Apache Mahout project, an open-source framework for building scalable ML algorithms. It includes an implementation of the Random Forest algorithm, which can be used for intrusion detection in a Big Data environment. Another example is the Hadoop Distributed File System (HDFS) together with the MapReduce framework, which can be used to store and process large amounts of data. Together, these systems show that ML-based IDMs running in Big Data environments can detect and prevent intrusions by analyzing large volumes of data.
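To make the MapReduce model concrete, the following single-process Python sketch mimics its three phases (map, shuffle, reduce) by counting how many records carry each class label. It is an illustration only, not Hadoop's API: the function names are hypothetical, the records are shortened toy strings rather than full KDD99 rows, and on a real cluster each phase would run in parallel over HDFS blocks.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (label, 1) pair for one CSV record; the label is the last field."""
    label = record.strip().split(",")[-1].rstrip(".")
    return [(label, 1)]


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Shortened toy records; real KDD99 rows have 41 features before the label.
records = [
    "0,tcp,http,SF,181,5450,normal.",
    "0,icmp,ecr_i,SF,1032,0,smurf.",
    "0,icmp,ecr_i,SF,1032,0,smurf.",
]
print(map_reduce(records))  # prints: {'normal': 1, 'smurf': 2}
```

Because the map and reduce functions only see one record or one key at a time, the framework is free to distribute them across machines; only the shuffle needs to move data between nodes.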
PROPOSED METHOD:
The proposed model in this study is the Spark Chi SVM model, illustrated in Figure 1. It consists of the following steps:
1. Loading the dataset and converting it into a Resilient Distributed Dataset (RDD) and a DataFrame in Apache Spark.
2. Data preprocessing to prepare the data for analysis.
3. Feature selection to identify relevant features for the analysis.
4. Training the Spark Chi SVM model on the training dataset.
5. Testing and evaluating the model on the KDD99 dataset.
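The steps above can be sketched on a single machine in plain Python. The sketch assumes that the "Chi" in Spark Chi SVM refers to chi-squared feature selection followed by a linear SVM. Spark MLlib provides distributed versions of both (for example `ChiSqSelector` and `LinearSVC` in `pyspark.ml`), but the functions below are hypothetical stand-ins, not the paper's code. Feature values are treated as categories when scoring, as a chi-squared test requires, so continuous KDD99 columns would normally be discretized first. Min-max scaling, full-batch sub-gradient descent on the regularized hinge loss, and the six-row data set are all assumed choices for illustration.

```python
from collections import Counter


def chi_square(column, labels):
    """Pearson chi-squared statistic between one discrete feature and the class label."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    value_totals = Counter(column)
    label_totals = Counter(labels)
    stat = 0.0
    for value, v_count in value_totals.items():
        for label, l_count in label_totals.items():
            expected = v_count * l_count / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def select_top_k(rows, labels, k):
    """Feature selection: indices of the k features with the highest chi-squared score."""
    n_features = len(rows[0])
    scores = [chi_square([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def min_max_scale(rows):
    """Preprocessing: rescale each column to [0, 1]; constant columns become 0."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)] for row in rows]


def train_linear_svm(rows, labels, epochs=200, lr=0.1, reg=0.01):
    """Linear SVM: full-batch sub-gradient descent on the L2-regularized hinge loss.
    Labels are 0/1 and are mapped to -1/+1 internally."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]
        grad_b = 0.0
        for x, label in zip(rows, labels):
            y = 1 if label == 1 else -1
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # point violates the margin: its hinge term is active
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def accuracy(w, b, rows, labels):
    return sum(predict(w, b, x) == y for x, y in zip(rows, labels)) / len(labels)


# Toy data: feature 0 matches the label, feature 1 is noise, feature 2 is constant.
rows = [[0, 1, 5], [0, 0, 5], [0, 1, 5], [1, 0, 5], [1, 1, 5], [1, 0, 5]]
labels = [0, 0, 0, 1, 1, 1]

selected = select_top_k(rows, labels, k=1)
reduced = min_max_scale([[r[j] for j in selected] for r in rows])
w, b = train_linear_svm(reduced, labels)
print(selected, accuracy(w, b, reduced, labels))  # prints: [0] 1.0
```

On this toy data, feature 0 separates the classes perfectly and receives the highest chi-squared score, so it is the only feature kept. A real evaluation (step 5) would score the model on a held-out test split rather than on the training rows.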

DATASET DESCRIPTION:
The proposed model is evaluated on the KDD99 dataset. The data used contain 494,021 instances, which matches the size of the widely used 10% subset of KDD99. Each instance has 41 feature attributes plus a “class” attribute that indicates whether the instance is normal or an attack. Table 1 describes the KDD99 dataset attributes and class labels.
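A record can be turned into a numeric feature vector and a binary label roughly as follows. This is a minimal sketch under stated assumptions: it assumes the raw comma-separated KDD99 layout, in which the second to fourth columns (protocol_type, service, flag) are symbolic and the last field is a class name ending in a period, such as `normal.` or `smurf.`; it maps every non-normal class to the attack label 1; and the `parse_record` helper and its integer encoding of symbolic values are hypothetical choices, not the paper's preprocessing. The two sample rows are synthetic.

```python
# Zero-based positions of the symbolic columns protocol_type, service and flag.
CATEGORICAL_COLUMNS = (1, 2, 3)
NUM_FEATURES = 41


def parse_record(line, vocab):
    """Split one KDD99 CSV line into 41 numeric features and a binary label.

    Symbolic values are replaced by integer codes stored in `vocab`
    (column index -> {value: code}), which grows as new values appear.
    The label is 0 for 'normal.' and 1 for any attack class.
    """
    fields = line.strip().split(",")
    if len(fields) != NUM_FEATURES + 1:
        raise ValueError(f"expected {NUM_FEATURES + 1} fields, got {len(fields)}")
    features = []
    for i, value in enumerate(fields[:NUM_FEATURES]):
        if i in CATEGORICAL_COLUMNS:
            codes = vocab.setdefault(i, {})
            features.append(float(codes.setdefault(value, len(codes))))
        else:
            features.append(float(value))
    label = 0 if fields[-1].rstrip(".") == "normal" else 1
    return features, label


# Synthetic rows in the KDD99 layout: 6 leading fields, 35 zero-valued fields, label.
vocab = {}
normal_row = "0,tcp,http,SF,181,5450," + ",".join(["0"] * 35) + ",normal."
attack_row = "0,icmp,ecr_i,SF,1032,0," + ",".join(["0"] * 35) + ",smurf."
x1, y1 = parse_record(normal_row, vocab)
x2, y2 = parse_record(attack_row, vocab)
print(len(x1), y1, y2)  # prints: 41 0 1
print(vocab)  # prints: {1: {'tcp': 0, 'icmp': 1}, 2: {'http': 0, 'ecr_i': 1}, 3: {'SF': 0}}
```

In Spark, the same per-line function could be applied to every element of an RDD with `map`, which is how step 1 of the proposed method would typically turn raw text into structured records.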

Published paper, engpaper.com