Open Access
Reasoning Beyond Length Limits: Improving Accuracy in Long-Context Question Answering with Small-Scale Language Models
Author(s) - Minyoung Kyoung, Joon-Ho Lim, Youngsoo Kim
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3617449
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Long-context question answering (QA) remains a significant challenge, particularly when using small-scale language models (SLLMs) with limited computational capacity. Despite their efficiency, SLLMs often struggle to capture complex reasoning patterns and synthesize information from lengthy documents due to constraints in context size and inference depth. Traditional retrieval-augmented generation (RAG) approaches offer partial relief but typically fall short when precise reasoning across multiple passages is required. In this study, we present a novel, lightweight framework designed to improve long-context QA performance in SLLMs by combining two key strategies: (1) instruction-tuned embedding-based retrieval for extracting semantically aligned context, and (2) a question rephrasing mechanism that decomposes complex queries into stepwise subquestions. This dual strategy enables structured reasoning without the need for additional training or model fine-tuning. Experiments on the LongBench and LongBench v2 benchmarks demonstrate consistent performance improvements, with gains of up to 5% over strong baselines. Our method is model-agnostic, effective across diverse input lengths and task difficulties, and compatible with a wide range of SLLMs, including LLaMA and GLM. The proposed approach offers a practical, generalizable solution for deploying robust QA systems in resource-constrained environments.
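To make the abstract's two-part design concrete, the sketch below illustrates one plausible reading of the pipeline: instruction-tuned embedding retrieval over document chunks, plus decomposition of a complex question into stepwise subquestions that are answered and then synthesized. This is a minimal sketch under stated assumptions, not the authors' implementation: the embedding model (intfloat/e5-base-v2 with its "query:"/"passage:" prefixes), the prompts, and the `generate` callable are all illustrative stand-ins for whatever components the paper actually uses.

```python
# Sketch of the two strategies named in the abstract:
# (1) instruction-tuned embedding retrieval, (2) question rephrasing
# into stepwise subquestions. Requires no training or fine-tuning,
# matching the abstract's claim. All names below are assumptions.
from typing import Callable, List

import numpy as np
from sentence_transformers import SentenceTransformer

# E5-style models are instruction-tuned via "query:"/"passage:" prefixes;
# this choice is an assumption standing in for the paper's embedder.
embedder = SentenceTransformer("intfloat/e5-base-v2")


def retrieve(question: str, chunks: List[str], k: int = 4) -> List[str]:
    """Return the k chunks most semantically aligned with the question."""
    q = embedder.encode([f"query: {question}"], normalize_embeddings=True)
    d = embedder.encode([f"passage: {c}" for c in chunks],
                        normalize_embeddings=True)
    scores = (q @ d.T)[0]            # cosine similarity (vectors normalized)
    top = np.argsort(-scores)[:k]    # indices of the k highest scores
    return [chunks[i] for i in top]


def decompose(question: str, generate: Callable[[str], str]) -> List[str]:
    """Rephrase a complex question into ordered subquestions.
    `generate` is any SLLM completion function (hypothetical here)."""
    prompt = ("Break the question into simpler, ordered subquestions, "
              f"one per line.\nQuestion: {question}\nSubquestions:")
    return [s.strip() for s in generate(prompt).splitlines() if s.strip()]


def answer(question: str, chunks: List[str],
           generate: Callable[[str], str]) -> str:
    """Answer each subquestion over its retrieved context, then synthesize."""
    notes = []
    for sub in decompose(question, generate):
        ctx = "\n".join(retrieve(sub, chunks))
        notes.append(generate(f"Context:\n{ctx}\n\nAnswer briefly: {sub}"))
    steps = "\n".join(notes)
    return generate(f"Given these findings:\n{steps}\n\n"
                    f"Answer the original question: {question}")
```

Because both stages operate purely through prompting and off-the-shelf embeddings, the sketch is model-agnostic in the same sense the abstract describes: `generate` could wrap any small-scale model such as LLaMA or GLM.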
