Open Access
LLMYOLOEdge: An Edge-IoT Aware Novel Framework for Integration of YOLO with Localized Quantized Large Language Models
Author(s) - Partha Pratim Ray, Mohan Pratap Pradhan, Shuai Li
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3613423
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Abstract - Deploying multimodal AI on severely resource-constrained hardware remains challenging due to tight latency, memory, and privacy requirements. We present LLMYOLOEdge, a fully on-device framework that integrates YOLO object detection (YOLOv8/11/12 variants) with quantized instruction-tuned LLMs (Qwen2.5:0.5b-instruct for reliable image-reference extraction and Granite3-MoE:1b-instruct for concise textual summarization), served locally via Ollama and orchestrated by a lightweight Flask API on a Raspberry Pi 4B as a resource-constrained edge-Internet of Things (IoT) setup. The system employs grammar-constrained, multi-shot prompting to guarantee structured JSON outputs and instruments every stage with fine-grained metrics for rigorous statistical analysis. On the Raspberry Pi 4B, LLMYOLOEdge sustains real-time operation while preserving data locality. Across extensive trials, we observe significant performance differences among YOLO backbones: yolo11n.pt yields the shortest inference latency (1013.241 ms), whereas yolo12s.pt minimizes extractor prompt-evaluation time (19.242 × 10⁹ ns). The multi-shot extractor achieves perfect URL/path extraction accuracy (100%), outperforming a baseline Granite3-MoE approach (88.89%). One-way ANOVA with Tukey's HSD and pairwise t-tests confirm these effects at p < 0.001, establishing both efficiency and accuracy gains under strict resource budgets. Our contributions are threefold: (i) an integrated, privacy-preserving multimodal pipeline that runs entirely on commodity edge hardware; (ii) a principled prompting strategy that removes brittle parsing failure modes; and (iii) a reproducible evaluation suite reporting per-stage latencies, throughput, and correctness. Code and assets are available at https://github.com/ParthaPRay/yolo_ollama_raspberrypi, offering a practical template for edge-native multimodal AI deployments.
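
As a rough illustration of the pipeline the abstract describes (local YOLO detection plus Ollama-served LLMs behind a Flask API), the following minimal Python sketch shows one way the stages could be wired together on-device. The /analyze endpoint name, the ask_ollama helper, the prompt wording, and the use of Ollama's built-in JSON output mode are illustrative assumptions rather than the authors' implementation; the paper's grammar-constrained, multi-shot prompting and per-stage instrumentation are omitted, and the actual code is in the linked repository.

import json

import requests
from flask import Flask, jsonify, request
from ultralytics import YOLO

# Minimal illustrative sketch of an on-device YOLO + Ollama pipeline behind Flask.
# Assumes Ollama is running locally on its default port and that the models named
# in the abstract have already been pulled onto the Raspberry Pi.
app = Flask(__name__)
detector = YOLO("yolo11n.pt")  # one of the YOLOv8/11/12 backbones evaluated in the paper
OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_ollama(model: str, prompt: str, json_mode: bool = False) -> str:
    """Send a single non-streaming prompt to the local Ollama server and return its text."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if json_mode:
        payload["format"] = "json"  # constrain the reply to valid JSON
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]


@app.route("/analyze", methods=["POST"])  # hypothetical endpoint name
def analyze():
    user_text = request.json["text"]

    # Stage 1: extract the image path/URL from the user request as structured JSON
    # (a simplified stand-in for the paper's grammar-constrained, multi-shot Qwen2.5 extractor).
    extraction = ask_ollama(
        "qwen2.5:0.5b-instruct",
        f'Return only JSON of the form {{"image": "<path-or-url>"}} for this request: {user_text}',
        json_mode=True,
    )
    image_ref = json.loads(extraction)["image"]

    # Stage 2: run YOLO object detection entirely on-device.
    results = detector(image_ref)
    labels = [detector.names[int(box.cls)] for box in results[0].boxes]

    # Stage 3: summarize the detections with the Granite3-MoE summarizer.
    summary = ask_ollama(
        "granite3-moe:1b-instruct",
        f"Summarize these detected objects in one sentence: {labels}",
    )
    return jsonify({"detections": labels, "summary": summary})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Under these assumptions, a request such as POST /analyze with body {"text": "What is in /home/pi/images/cat.jpg?"} would return the detected labels and a one-sentence summary, with all inference performed locally on the device.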
