A Domain-Agnostic Framework for Visual Element Detection from High-Level Descriptions
Author(s) - Maroun Ayli, Youssef Bakouny, Nader Jalloul, Hani Seifeddine, Rima Kilany
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3613058
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Modern web applications feature dynamic, visually complex interfaces that undermine the reliability and maintainability of traditional automated testing frameworks. These conventional approaches often rely on fixed Document Object Model (DOM) locators, such as XPath or CSS selectors, which are highly sensitive to minor user interface (UI) changes. As a result, tests break frequently, which increases maintenance costs and reduces accessibility for non-technical users. To address these challenges, we propose a unified, lightweight framework that combines a domain-specific restricted natural language for intuitive, high-level test specification with a vision-language module that detects UI elements in real time directly from live screenshots. This eliminates the brittleness of fixed locators and enables robust test execution even under frequent UI changes. To validate our approach, we implemented and deployed a fully functional prototype, demonstrating that the system operates efficiently on consumer-grade hardware without relying on cloud services, offering a highly cost-effective solution suitable for enterprise use. A key enabler of our framework is its streamlined dataset construction process: we introduce a hybrid data acquisition strategy that combines automated annotation from large-scale web crawling with a scalable human-in-the-loop refinement pipeline, significantly reducing manual effort while maintaining high data quality. Notably, the framework is domain-agnostic and can be readily adapted to other disciplines that require efficient vision-language fine-tuning, broadening its potential impact beyond web testing. Experimental evaluations show substantial improvements in test robustness, maintenance overhead, and execution efficiency, establishing our framework as a practical and scalable solution for next-generation automated UI testing.
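
To make the idea of a restricted natural language concrete, the following minimal Python sketch parses a single test step into a structured action plus a textual description of the target element, which a vision-language detector could then locate in a screenshot. The abstract does not specify the grammar, so the step syntax, the `Step` class, and the `parse_step` function are all hypothetical illustrations and not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar (an assumption, not taken from the paper):
#   <action> the "<label>" <element kind> [with "<value>"]
STEP_PATTERN = re.compile(
    r'^(?P<action>click|type|check)\s+the\s+"(?P<label>[^"]+)"\s+'
    r'(?P<kind>button|field|checkbox|link)'
    r'(?:\s+with\s+"(?P<value>[^"]*)")?$',
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Step:
    """One parsed test step (hypothetical structure)."""
    action: str
    label: str
    kind: str
    value: Optional[str] = None

    def description(self) -> str:
        # Text prompt that a visual detector could receive in place of
        # an XPath or CSS selector.
        return f'{self.kind} labeled "{self.label}"'


def parse_step(line: str) -> Step:
    """Parse one restricted-language step, rejecting anything off-grammar."""
    match = STEP_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid step: {line!r}")
    return Step(
        action=match["action"].lower(),
        label=match["label"],
        kind=match["kind"].lower(),
        value=match["value"],  # None when the optional clause is absent
    )


if __name__ == "__main__":
    for text in ['Click the "Login" button',
                 'type the "Email" field with "user@example.com"']:
        step = parse_step(text)
        print(step.action, "->", step.description(), step.value)
```

Because the step carries only a description of what the element looks like, nothing in it depends on the page's DOM structure; resolving that description to screen coordinates would be the detector's job.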
