
Improving Semantic Parsing and Text Generation through Multi-Faceted Data Augmentation
Author(s) -
Muhammad Saad Amin,
Luca Anselma,
Alessandro Mazzei
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3593857
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
The increasing use of large language models has heightened the demand for larger datasets in natural language processing (NLP). Various augmentation techniques are used to increase data quantity, but many introduce noise or struggle with structurally complex inputs such as Discourse Representation Structures (DRS). This study introduces novel data augmentation techniques for both semantic parsing (Text-to-DRS) and text generation (DRS-to-Text), including named entity augmentation, lexical substitution using WordNet, and grammatical transformations through changes in tense. The proposed methods considerably expand the Parallel Meaning Bank (PMB) dataset while maintaining semantic accuracy and contextual relevance. The augmentation increased both gold and silver instances by a factor of 9, resulting in over 1.3 million new examples. We evaluated four transformer models (byT5, mT5, T5, and mBART) on this augmented dataset and observed substantial improvements across multiple performance metrics. For semantic parsing, the SMATCH (F1) score increased by 17.65%; for text generation, the BLEU score improved by 14.38% and the METEOR score by 6.43%. These improvements highlight the effectiveness of the proposed augmentation methods in boosting model performance on complex neural semantic parsing and generation tasks.
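To make the idea of named entity augmentation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified, hypothetical clause-style DRS loosely modeled on PMB clausal notation (lines such as 'b2 Name x2 "paris"'); the exact format, the helper names swap_named_entity and augment, and the candidate name pool are all illustrative assumptions. The key point it shows is that the entity is swapped in the sentence and in its meaning representation together, so every augmented pair stays aligned.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """A text/DRS pair. The DRS is a hypothetical, simplified list of clauses."""
    text: str
    drs: tuple[str, ...]


def swap_named_entity(ex: Example, old: str, new: str) -> Example:
    """Replace one named entity in both the sentence and its DRS.

    Only clauses containing ' Name ' (assumed format:
    '<box> Name <var> "<lowercased name>"') are edited on the DRS side.
    """
    word = re.compile(rf"\b{re.escape(old)}\b")
    new_text = word.sub(new, ex.text)
    old_q, new_q = f'"{old.lower()}"', f'"{new.lower()}"'
    new_drs = tuple(
        line.replace(old_q, new_q) if " Name " in line else line
        for line in ex.drs
    )
    return Example(new_text, new_drs)


def augment(ex: Example, old: str, pool: list[str]) -> list[Example]:
    """Produce one variant per candidate entity, skipping the original name."""
    return [swap_named_entity(ex, old, cand) for cand in pool if cand != old]


seed = Example(
    "Tom visited Paris.",
    (
        'b1 REF x1', 'b1 Name x1 "tom"', 'b1 male "n.02" x1',
        'b2 REF x2', 'b2 Name x2 "paris"', 'b2 city "n.01" x2',
    ),
)
# Hypothetical pool of same-type (city) names; choosing candidates of matching
# type avoids contradicting other clauses such as the 'city' concept.
variants = augment(seed, "Paris", ["Rome", "Berlin", "Paris"])
for v in variants:
    print(v.text, [c for c in v.drs if " Name " in c])
```

Restricting replacements to entities of the same type is a design choice in this sketch: swapping "Tom" for a female name, for example, would leave a clause like 'male "n.02"' inconsistent with the new text.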