Open Access
Domain based Chunking
Author(s) -
Nilamadhaba Mohapatra,
Namrata Sarraf,
Swapna Sarit Sahu
Publication year - 2021
Publication title -
International Journal on Natural Language Computing
Language(s) - English
Resource type - Journals
eISSN - 2319-4111
pISSN - 2278-1307
DOI - 10.5121/ijnlc.2021.10401
Subject(s) - computer science , chunking , artificial intelligence , natural language processing , transformer , sentence , annotation , named entity recognition , grammar , security token , linguistics
Chunking means splitting a sentence into tokens and then grouping them in a meaningful way. For high-performance chunking systems, transformer models have proved to be the state-of-the-art benchmark. Chunking as a task requires a large-scale, high-quality annotated corpus in which each token is attached to a particular tag, similar to Named Entity Recognition tasks. These tags are later used in conjunction with pointer frameworks to find the final chunk. For a specific domain problem, manually annotating a large, high-quality training set becomes highly costly in terms of time and resources. When the domain is specific and diverse, cold starting becomes even more difficult because of the large number of manually annotated queries needed to cover all aspects. To overcome this problem, we applied a grammar-based text-generation mechanism where, instead of annotating individual sentences, we annotate grammar templates. We defined various templates corresponding to different grammar rules. To create a sentence, we used these templates along with the rules, where symbol or terminal values were chosen from the domain data catalog. This allowed us to create a large number of annotated queries, which were used to train an ensemble transformer-based deep neural network model [24]. We found that grammar-based annotation solved domain-based chunking in input query sentences without any manual annotation, achieving a classification F1 score of 96.97% when classifying the tokens of out-of-template queries.
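The template-expansion idea described above can be illustrated with a minimal sketch. The grammar symbols (`METRIC`, `TIME`, `REGION`), the catalog values, and the BIO tagging scheme below are all hypothetical stand-ins, not the paper's actual templates or catalog; the sketch only shows how annotating a template once can yield many token-tagged training queries.

```python
import random

# Hypothetical domain data catalog: terminal values for each grammar symbol.
CATALOG = {
    "METRIC": ["revenue", "profit", "churn rate"],
    "TIME": ["last quarter", "2020", "this month"],
    "REGION": ["Europe", "Asia"],
}

# Grammar templates: each slot is either a literal word (tagged O)
# or a catalog symbol whose expansion is tagged with its chunk label.
TEMPLATES = [
    ["show", "METRIC", "for", "TIME"],
    ["compare", "METRIC", "in", "REGION", "for", "TIME"],
]

def generate(template, rng):
    """Expand one template into a (tokens, tags) pair in BIO style."""
    tokens, tags = [], []
    for slot in template:
        if slot in CATALOG:
            words = rng.choice(CATALOG[slot]).split()
            tokens.extend(words)
            # First word of the chunk gets B-, continuation words get I-.
            tags.extend([f"B-{slot}"] + [f"I-{slot}"] * (len(words) - 1))
        else:
            tokens.append(slot)
            tags.append("O")
    return tokens, tags

# Sampling many expansions yields a large annotated corpus for training.
rng = random.Random(0)
corpus = [generate(rng.choice(TEMPLATES), rng) for _ in range(1000)]
```

Because each template is annotated once at the symbol level, every expansion inherits correct token-level tags for free, which is what removes the manual-annotation bottleneck.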
