A new intelligent system for malicious URLs detection

https://doi.org/10.55214/25768484.v9i2.4650

Authors

  • Hayder Raeed Hekmat AL-Shawk Dept. of Information Systems, Faculty of Computer and Information, Mansoura University, Egypt.
  • Ibrahim M. El-Hasnony Dept. of Information Systems, Faculty of Computer and Information, Mansoura University, Egypt.
  • Hazem M. El-Bakry Dept. of Information Systems, Faculty of Computer and Information, Mansoura University, Egypt.

In cybersecurity, recognizing and mitigating malicious URLs is a paramount challenge because such URLs carry a range of cyber threats, including phishing, malware distribution, and fraud. This paper aims to create a URL detection system that employs machine learning and data mining methods. The proposed system comprises several steps: data acquisition, preprocessing, feature selection, URL tokenization, and classification. First, we acquire a recent dataset that contains both malicious and benign URLs, each described by 87 numerical features. The features are preprocessed with a standard scaler so that the model is not biased toward features with larger numeric ranges. Next, the Fick's Law Algorithm (FLA), a metaheuristic optimization algorithm, is used for feature selection, with the accuracy of a Light Gradient Boosting Machine (LGBM) as its fitness function; this reduces the number of features by 50%. The URLs are tokenized and encoded into a feature vector using Bidirectional Encoder Representations from Transformers (BERT). The BERT feature vector, combined with the FLA-selected features, is fed to a Categorical Boosting (CatBoost) classifier, which achieves 96.59% accuracy, 96.75% precision, 96.41% recall, and a 96.58% F1-score. In validation, the system outperforms the other machine learning and deep learning methods it was compared against, as well as previous studies that used the same dataset. The proposed system is an effective and efficient approach for detecting malicious URLs, safeguarding digital assets, and ensuring the integrity of online environments.
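The preprocessing, feature-combination, and evaluation steps described in the abstract can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation: the FLA search, the LGBM fitness evaluation, the BERT encoder, and the CatBoost model are not reproduced here. The function names (`standard_scale`, `combine_features`, `classification_metrics`, `f1_from`), the toy embedding, and the 0/1 feature mask are hypothetical placeholders for the vectors those components would produce. The last line checks that the reported F1-score is consistent with the reported precision and recall.

```python
"""Hypothetical sketch of steps named in the abstract: standard scaling,
combining a URL embedding with selected numeric features, and the reported
evaluation metrics. Not the authors' code; FLA, BERT and CatBoost are omitted."""
from statistics import fmean, pstdev


def standard_scale(column):
    """Z-score one feature column: (x - mean) / std, using the population
    standard deviation (as scikit-learn's StandardScaler does)."""
    mu = fmean(column)
    sigma = pstdev(column)
    if sigma == 0:  # constant column: map every value to 0.0
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def combine_features(embedding, numeric_row, mask):
    """Concatenate a URL embedding (placeholder for a BERT vector) with the
    numeric features whose mask bit is 1 (placeholder for an FLA-chosen subset)."""
    selected = [value for value, keep in zip(numeric_row, mask) if keep]
    return list(embedding) + selected


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive (malicious) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(standard_scale([1.0, 2.0, 3.0]))
    print(combine_features([0.1, 0.2], [5.0, 6.0, 7.0, 8.0], [1, 0, 1, 0]))
    # Consistency check on the abstract's figures: P = 96.75 %, R = 96.41 %
    print(round(100 * f1_from(0.9675, 0.9641), 2))  # → 96.58
```

In a full pipeline, a library scaler such as scikit-learn's `StandardScaler` would typically be fitted on the training split only and then applied to the test split, so that test-set statistics do not leak into training.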

How to Cite

AL-Shawk, H. R. H., El-Hasnony, I. M., & El-Bakry, H. M. (2025). A new intelligent system for malicious URLs detection. Edelweiss Applied Science and Technology, 9(2), 1374–1390. https://doi.org/10.55214/25768484.v9i2.4650

Section

Articles

Published

2025-02-14