Data Driven based Malicious URL Detection using Explainable AI

Saranda Poddar, Deepraj Chowdhury, Ashutosh Dhar Dwivedi*, Raghava Rao Mukkamala

*Corresponding author for this work

Publication: Contribution to book/anthology/report; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

With the ever-increasing reach of the internet and its growing accessibility through many types of devices, the spread of malware, phishing attempts, and similar threats has steadily increased, along with their level of sophistication. It is therefore important to research methods for preventing such attacks on systems and users. Malicious URLs are a common way for attackers to compromise a system. To cover the variety of attack vectors used by malicious websites, 21 features were extracted from 651,191 URLs to train the proposed model. A two-stage stacked ensemble learning model, based on gradient boosting methods and random forest, was trained and tested on a 70:30 split of the 651,191 URLs and achieved an accuracy of 97%. Explainable AI (XAI) was then used to explain how the model works and to study the impact of each of the 21 features on the four class predictions (benign, defacement, phishing, and malware).
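
The abstract does not list the 21 features, so the following is only a minimal sketch of what lexical URL feature extraction of this kind might look like. The function name `extract_features`, the specific features, and the `SUSPICIOUS_WORDS` list are illustrative assumptions, not the authors' feature set.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list; the paper's actual features are not given in the abstract.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "account")


def extract_features(url: str) -> dict:
    """Return a small dictionary of lexical features for one URL (illustrative only)."""
    parsed = urlparse(url)
    if not parsed.netloc:
        # URLs without a scheme are parsed as a bare path; prefixing "//"
        # makes urlparse treat the leading part as the host.
        parsed = urlparse("//" + url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(url.count(c) for c in "@?=%&-_"),
        "uses_https": int(parsed.scheme == "https"),
        # 1 if the host looks like a dotted IPv4 address rather than a domain name
        "host_is_ip": int(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None),
        "has_suspicious_word": int(any(w in lowered for w in SUSPICIOUS_WORDS)),
    }


if __name__ == "__main__":
    for sample in ("http://192.168.0.1/login.php?id=1", "www.example.com/index.html"):
        print(sample, extract_features(sample))
```

Feature dictionaries like these could be stacked into a numeric table, split 70:30 into training and test sets, and passed to a classifier. The stacked ensemble and the XAI analysis described in the abstract are not reproduced here.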
Original language: English
Title: 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
Number of pages: 7
Place of publication: Los Alamitos, CA
Publisher: IEEE
Publication date: 2023
Pages: 1266-1272
ISBN (Print): 9781665494267
ISBN (Electronic): 9781665494250
DOI
Status: Published - 2023
Event: 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications. IEEE TrustCom 2022 - Wuhan, China
Duration: 9 Dec 2022 to 11 Dec 2022
Conference number: 21
http://www.ieee-hust-ncc.org/2022/TrustCom/index.html

Conference

Conference: 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications. IEEE TrustCom 2022
Number: 21
Country/Territory: China
City: Wuhan
Period: 09/12/2022 to 11/12/2022
Internet address
Name: IEEE International Conference on Trust Security and Privacy in Computing and Communications
ISSN: 2324-898X

Keywords

  • Malicious URL detection
  • Ensemble-learning
  • Random forest
  • Gradient boosting
  • Explainable AI
