Enhancing Generalization in Phishing URL Detection via a Fine-Tuned BERT-Based Multimodal Approach

Yi Wei, Masaya Nakayama, Yuji Sekiya

Open Access

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3591843

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Phishing is a prevalent cyber threat that aims to steal confidential information through delivery media such as emails, instant messages, and social media posts, and it often results in severe identity theft and financial loss. Traditional detection approaches typically rely on machine learning models combined with handcrafted, specialized feature engineering. More recent efforts have shifted toward fully automated, featureless strategies that apply deep learning end to end. However, many existing studies are highly data-dependent and lack practical validation across diverse scenarios, which leaves their real-world applicability uncertain. In this study, we propose a multimodal approach that enhances the generalization capability of phishing URL detection systems by combining a fine-tuned BERT model for URL processing with auxiliary external features derived from public Internet resources. In addition, we share a publicly available dataset named PhishMail, which contains 8,937 phishing samples collected over an extended period from daily incoming malicious emails. This dataset serves as a testing resource for simulating zero-day detection scenarios. The proposed framework is evaluated through extensive experiments, and results across multiple cross-dataset evaluations demonstrate significant improvements in detection effectiveness and generalization capability.
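The abstract does not say how the URL representation and the auxiliary features are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes feature-level fusion by concatenation followed by a linear scoring head. The function `toy_url_encoder` is a hypothetical stand-in for the fine-tuned BERT encoder, and the two auxiliary features, the weights, and the example URL are all invented for illustration.

```python
import math
from typing import List, Sequence


def toy_url_encoder(url: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a fine-tuned BERT URL encoder.

    Hashes character trigrams into a fixed-size, L2-normalized vector.
    A real system would return the model's pooled URL representation.
    """
    vec = [0.0] * dim
    for i in range(len(url) - 2):
        trigram = url[i:i + 3]
        # Deterministic bucket index (built-in hash() is salted per process).
        bucket = sum(ord(c) * (k + 1) for k, c in enumerate(trigram)) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuse(url_vec: Sequence[float], aux_features: Sequence[float]) -> List[float]:
    """Assumed fusion step: concatenate the two modalities into one vector."""
    return list(url_vec) + list(aux_features)


def phishing_score(fused: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear head followed by a sigmoid, giving a score in (0, 1)."""
    if len(fused) != len(weights):
        raise ValueError("fused vector and weights must have the same length")
    z = sum(x * w for x, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Usage with made-up inputs; none of these values come from the paper.
url = "http://login-example.test/verify"
aux = [0.0, 1.0]  # hypothetical: [has_valid_tls, domain_registered_recently]
fused = fuse(toy_url_encoder(url), aux)
weights = [0.1] * 8 + [-1.5, 2.0]  # illustrative, untrained weights
score = phishing_score(fused, weights, bias=-0.5)
print(f"fused dimension = {len(fused)}, score = {score:.3f}")
```

In a trained system the weights would be learned jointly with, or on top of, the encoder. Concatenation is only one of several possible fusion strategies and is used here because it is the simplest to show.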
