Open Access

Improving Semantic Parsing and Text Generation through Multi-Faceted Data Augmentation

Author(s) - Muhammad Saad Amin, Luca Anselma, Alessandro Mazzei

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3593857

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

The increasing use of large language models has heightened the demand for larger datasets in natural language processing (NLP). Although various augmentation techniques are used to increase data quantity, many introduce noise or struggle with structurally complex inputs such as Discourse Representation Structures (DRS). This study introduces novel data augmentation techniques for both semantic parsing (Text-to-DRS) and text generation (DRS-to-Text), including named entity augmentation, lexical substitution using WordNet, and grammatical transformation through changes in tense. The proposed methods considerably expanded the Parallel Meaning Bank (PMB) dataset while preserving semantic accuracy and contextual relevance. The augmentation increased both gold and silver instances by a factor of 9, resulting in over 1.3 million new examples. We evaluated four transformer models (ByT5, mT5, T5, and mBART) on this augmented dataset. The experiments showed substantial improvements across multiple performance metrics: for semantic parsing, a 17.65% increase in SMATCH (F1) score; for text generation, improvements of 14.38% in BLEU score and 6.43% in METEOR score. These improvements highlight the effectiveness of the proposed augmentation methods in boosting model capabilities for complex neural semantic parsing and generation tasks.
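To make the idea of named entity augmentation concrete, here is a minimal Python sketch. It is not the authors' implementation, which the abstract does not describe. It assumes a simplified, PMB-style clause notation in which a name appears as a quoted lowercase literal, and the function names `augment_named_entity` and `augment_pair` are hypothetical. The point it illustrates is that the sentence and its meaning representation must be changed together so the pair stays aligned.

```python
import re

def augment_named_entity(text, drs, old_name, new_name):
    """Swap one named entity consistently in a sentence and its DRS clauses.

    Assumption: the DRS is a simplified clause string where the name is a
    quoted, lowercased, single-token literal (e.g. "tom").
    """
    # Replace the surface form in the sentence (whole-word matches only).
    pattern = rf"\b{re.escape(old_name)}\b"
    new_text = re.sub(pattern, lambda _match: new_name, text)
    # Replace the quoted name literal in the meaning representation.
    new_drs = drs.replace(f'"{old_name.lower()}"', f'"{new_name.lower()}"')
    return new_text, new_drs

def augment_pair(text, drs, old_name, candidates):
    """Create one new (text, DRS) pair per candidate replacement name.

    Candidates should be of the same entity type as the original (here,
    male first names) so the remaining clauses stay semantically valid.
    """
    return [
        augment_named_entity(text, drs, old_name, candidate)
        for candidate in candidates
        if candidate != old_name
    ]

# Illustrative example pair (hand-written, not taken from the PMB).
text = "Tom is sleeping."
drs = "\n".join([
    'b1 REF x1',
    'b1 Name x1 "tom"',
    'b1 male "n.02" x1',
    'b2 REF e1',
    'b2 sleep "v.01" e1',
    'b2 Agent e1 x1',
])

pairs = augment_pair(text, drs, "Tom", ["John", "Paul"])
for new_text, new_drs in pairs:
    print(new_text)
    print(new_drs)
    print()
```

Each original pair yields one new pair per candidate name, which is how a substitution-based scheme can multiply dataset size. Lexical substitution and tense changes would follow the same pattern, with the added requirement that the corresponding concept or tense clause in the DRS is updated alongside the text.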
