From LLMs to Randomness: Analyzing Program Input Efficacy with Resource and Language Metrics

Gavin Black, Eric Yocam, Varghese Vaidyan, Gurcan Comert, Yong Wang

Open Access

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3571205

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Security-focused program testing typically concentrates on crash detection and code coverage while overlooking additional system behaviors that can impact program confidentiality and availability. To address this gap, we propose a statistical framework that combines embedding-based anomaly detection, resource usage metrics, and resource-state distance measures to systematically profile software behaviors beyond traditional coverage-based methods. Leveraging over 5 million labeled samples from 50 Python programs, we evaluate how these independent scoring terms distinguish among different sources of input, including Large Language Model (LLM)-generated inputs, and demonstrate how standard statistical tests (e.g., Kolmogorov–Smirnov and Kendall’s τ) confirm their effectiveness. Our findings show that LLM-generated samples can trigger diverse behaviors but are often less effective at exploring resource usage dynamics (CPU, memory) compared with conventional fuzzing. However, combining LLM outputs with existing techniques broadens behavior coverage and reveals commonalities among the outputs of commercial LLMs. We provide open-source tools for this evaluation framework, demonstrating the potential to refine software testing by integrating behavior metrics into security-testing workflows.
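The abstract names two standard tests for checking whether scoring terms separate input sources. As a rough illustration only (this is not the authors' released tooling), the sketch below computes the two-sample Kolmogorov–Smirnov statistic between the score distributions of two input sources, and Kendall's τ-b rank correlation between two scoring terms measured on the same inputs. The function names `ks_statistic` and `kendall_tau_b` are hypothetical, and the CPU-time and score values are made-up toy numbers, not data from the paper. P-values are deliberately omitted.

```python
import math
from itertools import combinations


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: D = max_x |F_a(x) - F_b(x)|.

    Both samples must be non-empty. Returns the statistic only (no p-value).
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Step through the merged sorted values; after consuming every copy of x
    # from both samples, i/n and j/m are the empirical CDFs evaluated at x.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF is 1 and the other CDF only rises
    # toward 1, so the gap can only shrink: the maximum is already recorded.
    return d


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (tie-corrected), O(n^2) pair scan."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical per-input CPU times (seconds) for two input sources.
    fuzz_cpu = [0.12, 0.30, 0.45, 0.80, 1.10]
    llm_cpu = [0.10, 0.11, 0.15, 0.20, 0.22]
    print(f"KS D = {ks_statistic(fuzz_cpu, llm_cpu):.2f}")

    # Hypothetical paired values from two scoring terms over the same inputs.
    anomaly_score = [1, 2, 3, 4, 5]
    resource_score = [1, 3, 2, 5, 4]
    print(f"Kendall tau-b = {kendall_tau_b(anomaly_score, resource_score):.2f}")
```

A large D suggests the two sources produce differently distributed scores, while τ-b near 0 suggests two scoring terms rank inputs independently. In practice, `scipy.stats.ks_2samp` and `scipy.stats.kendalltau` return these statistics together with p-values; the pure-Python version above only avoids the dependency.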
