Open Access
Domain based Chunking
Author(s) -
Nilamadhaba Mohapatra,
Namrata Sarraf,
Swapna Sarit Sahu
Publication year - 2021
Publication title -
International Journal on Natural Language Computing
Language(s) - English
Resource type - Journals
eISSN - 2319-4111
pISSN - 2278-1307
DOI - 10.5121/ijnlc.2021.10401
Subject(s) - computer science , chunking , artificial intelligence , natural language processing , transformer , sentence , annotation , named entity recognition , grammar , security token , linguistics
Chunking means splitting a sentence into tokens and then grouping them in a meaningful way. For high-performance chunking systems, transformer models have proved to be the state-of-the-art benchmark. Chunking as a task requires a large-scale, high-quality annotated corpus in which each token is attached to a particular tag, similar to Named Entity Recognition tasks. These tags are later used in conjunction with pointer frameworks to find the final chunk. For a specific domain problem, manually annotating a large, high-quality training set becomes highly costly in terms of time and resources. When the domain is specific and diverse, cold starting becomes even more difficult because of the large number of manually annotated queries needed to cover all aspects. To overcome this problem, we applied a grammar-based text-generation mechanism where, instead of annotating individual sentences, we annotate grammar templates. We defined various templates corresponding to different grammar rules. To create a sentence, we used these templates along with the rules, where symbol or terminal values were chosen from the domain data catalog. This allowed us to create a large number of annotated queries, which were used to train an ensemble transformer-based deep neural network model [24]. We found that grammar-based annotation solved domain-based chunking in input query sentences without any manual annotation, achieving a classification F1 score of 96.97% when classifying the tokens of out-of-template queries.
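The template-expansion idea described above can be illustrated with a minimal sketch. The grammar symbols (`METRIC`, `TIME`, `REGION`), the catalog values, and the BIO tagging scheme below are all hypothetical stand-ins, not the paper's actual templates or catalog; the sketch only shows how annotating a template once can yield many token-tagged training queries.

```python
import random

# Hypothetical domain data catalog: terminal values for each grammar symbol.
CATALOG = {
    "METRIC": ["revenue", "profit", "churn rate"],
    "TIME": ["last quarter", "2020", "this month"],
    "REGION": ["Europe", "Asia"],
}

# Grammar templates: each slot is either a literal word (tagged O)
# or a catalog symbol whose expansion is tagged with its chunk label.
TEMPLATES = [
    ["show", "METRIC", "for", "TIME"],
    ["compare", "METRIC", "in", "REGION", "for", "TIME"],
]

def generate(template, rng):
    """Expand one template into a (tokens, tags) pair in BIO style."""
    tokens, tags = [], []
    for slot in template:
        if slot in CATALOG:
            words = rng.choice(CATALOG[slot]).split()
            tokens.extend(words)
            # First word of the chunk gets B-, continuation words get I-.
            tags.extend([f"B-{slot}"] + [f"I-{slot}"] * (len(words) - 1))
        else:
            tokens.append(slot)
            tags.append("O")
    return tokens, tags

# Sampling many expansions yields a large annotated corpus for training.
rng = random.Random(0)
corpus = [generate(rng.choice(TEMPLATES), rng) for _ in range(1000)]
```

Because each template is annotated once at the symbol level, every expansion inherits correct token-level tags for free, which is what removes the manual-annotation bottleneck.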
