
Automated Testing for Service-Oriented Architecture: Leveraging Large Language Models for Enhanced Service Composition
Author(s) -
Mahsun Altin,
Behcet Mutlu,
Deniz Kilinc,
Altan Cakir
Publication year - 2025
Publication title -
ieee access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3571994
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
This article explores the application of Large Language Models (LLMs), including proprietary models such as OpenAI’s ChatGPT 4o and ChatGPT 4o-mini, Anthropic’s Claude 3.5 Sonnet and Claude 3.7 Sonnet, and Google’s Gemini 1.5 Pro, Gemini 2.0 Flash, and Gemini 2.0 Flash-Lite, as well as open-source alternatives including Qwen2.5-14B-Instruct-1M, and commercially accessed models such as DeepSeek R1 and DeepSeek V3, which were tested via APIs despite having open-source variants, to automate validation and verification in Application Programming Interface (API) testing within a Service-Oriented Architecture (SOA). Our system compares internal responses from the Enuygun Web Server against third-party API outputs in both JSON and XML formats, validating critical parameters such as flight prices, baggage allowances, and seat availability. We generated 100 diverse test scenarios across varying complexities (1–4 flight results) by randomly altering request and response parameters. Experimental results show that Google Gemini 2.0 Flash achieved high accuracy (up to 99.98%) with the lowest completion time (85.34 seconds), while Qwen2.5-14B-Instruct-1M exhibited limited capability in processing complex formats. Models such as OpenAI’s ChatGPT and Anthropic’s Claude Sonnet models also demonstrated strong performance in single-flight validation scenarios, making them suitable for low-latency, high-precision tasks. Our findings indicate that some open-source models can offer promising cost-effective alternatives, though performance significantly varies. This integration of LLMs reduced manual workload, improved test scalability, and enabled real-time validation across large-scale datasets. As LLM technologies mature, we anticipate further advances in automation, accuracy, and efficiency in software validation systems.
Empowering knowledge with every search
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom