Open Access
Operational Benchmarking of ML Models for Fraud Detection: A Comparative Study on AWS EC2 and ECS
Author(s) -
Lucas H. Benevides E Braga,
Rodrigo Marins Piaba
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3620247
Subject(s) - Aerospace; Bioengineering; Communication, Networking and Broadcast Technologies; Components, Circuits, Devices and Systems; Computing and Processing; Engineered Materials, Dielectrics and Plasmas; Engineering Profession; Fields, Waves and Electromagnetics; General Topics for Engineers; Geoscience; Nuclear Engineering; Photonics and Electrooptics; Power, Energy and Industry Applications; Robotics and Control Systems; Signal Processing and Analysis; Transportation
Credit card fraud detection increasingly depends not just on model accuracy but on real-time deployment performance and scalability. While many studies benchmark algorithmic improvements in classification precision, this work shifts focus to the operational deployment layer, a critical yet underexplored aspect of machine learning (ML) systems engineering. We present a comparative benchmark of traditional ML models (Logistic Regression, XGBoost, LightGBM), deep learning architectures (CNN, LSTM, Transformers), and stacking ensembles across three AWS-based deployment configurations: EC2 virtual machines, single-task ECS Fargate containers, and autoscaled ECS Fargate with load balancing. Rather than measuring predictive accuracy, our experiments systematically evaluate inference latency, throughput (requests per second), reliability, and scalability under load, using containerized Flask APIs exposed via REST endpoints. Load tests were executed with varying user concurrency to simulate production behavior. Results show that while traditional ML models offer the lowest latencies, deep learning and stacked models demonstrate superior scalability under ECS autoscaling, achieving latency reductions of up to 65% and throughput improvements of over 100%. We also analyze the impact of the Python runtime version (3.9 vs. 3.12) and estimate AWS cost-performance trade-offs. This work provides a reproducible framework for evaluating model-serving infrastructure and offers practical deployment guidance for practitioners building real-time fraud detection systems on AWS.
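The load-testing methodology the abstract describes — issuing requests at several user-concurrency levels and recording latency and throughput — can be sketched in plain Python. This is a minimal illustration, not the paper's actual harness: the `fake_inference` function is a hypothetical stand-in for an HTTP POST to the containerized Flask `/predict` endpoint, and the request counts and concurrency levels are assumptions.

```python
# Sketch of a concurrency-sweep load test: call a model endpoint from
# N simulated users and report median latency and overall throughput.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def fake_inference(_payload):
    # Hypothetical stand-in for POSTing a transaction to the Flask
    # REST endpoint; sleeps to simulate ~2 ms of model inference.
    time.sleep(0.002)
    return {"fraud_score": 0.0}

def run_load_test(n_requests, concurrency):
    latencies = []

    def timed_call(i):
        start = time.perf_counter()
        fake_inference({"amount": i})
        # list.append is thread-safe, so worker threads can share this.
        latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_call, range(n_requests)))
    wall = time.perf_counter() - wall_start

    return {
        "concurrency": concurrency,
        "p50_ms": statistics.median(latencies) * 1000,
        "throughput_rps": n_requests / wall,
    }

if __name__ == "__main__":
    # Sweep simulated user concurrency, as in the paper's experiments.
    for c in (1, 8, 32):
        print(run_load_test(n_requests=200, concurrency=c))
```

Against a real deployment, `fake_inference` would be replaced by an HTTP client call, and the sweep would be repeated per model and per deployment configuration (EC2, single-task ECS, autoscaled ECS) to produce the comparative latency/throughput figures.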
