Data Driven based Malicious URL Detection using Explainable AI

Saranda Poddar, Deepraj Chowdhury, Ashutosh Dhar Dwivedi*, Raghava Rao Mukkamala

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

With the ever-increasing reach of the internet and its growing accessibility through many types of devices, malware, phishing attempts, and similar attacks have steadily increased in prevalence and sophistication. It is therefore important to research methods that prevent such attacks on systems and users. Malicious URLs are a common way for attackers to compromise a system. To account for the variety of attack vectors used by malicious websites, 21 features were extracted from 651,191 URLs to train the proposed model. A two-stage stacked ensemble learning model, based on gradient boosting methods and random forest, was trained and tested on a 70:30 split of the 651,191 URLs and achieved an accuracy of 97%. Explainable AI (XAI) was then used to explain how the model works and to study the impact of each of the 21 features on the four class predictions (benign, defacement, phishing, and malware).
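The abstract does not list the 21 features, so the following is only a minimal sketch of what lexical URL feature extraction of this kind might look like. The function name `extract_url_features` and the eight features it returns are illustrative assumptions, not the feature set used in the paper.

```python
import re
from urllib.parse import urlparse

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")
# Loose dotted-quad check; does not validate that each octet is <= 255.
_IPV4_RE = re.compile(r"(\d{1,3}\.){3}\d{1,3}")


def extract_url_features(url: str) -> dict:
    """Return a few illustrative lexical features for one URL.

    Hypothetical feature set: the paper's actual 21 features are not
    given in the abstract.
    """
    # urlparse only fills in the host when a scheme or a leading "//"
    # is present, so scheme-less URLs get a "//" prefix before parsing.
    to_parse = url if _SCHEME_RE.match(url) else "//" + url
    parsed = urlparse(to_parse)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "digit_count": sum(ch.isdigit() for ch in url),
        "dot_count": url.count("."),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ipv4": int(bool(_IPV4_RE.fullmatch(host))),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


if __name__ == "__main__":
    for example in ("http://192.168.0.1/login.php?id=1", "example.com/a/b"):
        print(example, extract_url_features(example))
```

Each URL becomes a fixed-length numeric record, which is the form a tree-based classifier such as a random forest or gradient boosting model expects as input.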
Original language: English
Title of host publication: 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
Number of pages: 7
Place of Publication: Los Alamitos, CA
Publisher: IEEE
Publication date: 2023
Pages: 1266-1272
ISBN (Print): 9781665494267
ISBN (Electronic): 9781665494250
Publication status: Published - 2023
Event: 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications. IEEE TrustCom 2022 - Wuhan, China
Duration: 9 Dec 2022 to 11 Dec 2022
Conference number: 21
http://www.ieee-hust-ncc.org/2022/TrustCom/index.html

Conference

Conference: 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications. IEEE TrustCom 2022
Number: 21
Country/Territory: China
City: Wuhan
Period: 09/12/2022 to 11/12/2022
Series: IEEE International Conference on Trust Security and Privacy in Computing and Communications
ISSN: 2324-898X

Keywords

  • Malicious URL detection
  • Ensemble-learning
  • Random forest
  • Gradient boosting
  • Explainable AI
