
Question Answering Module Leveraging Heterogeneous Datasets
Author(s) -
Abinaya Govindan,
Gyan Ranjan,
Amit Verma
Publication year - 2021
Publication title -
International Journal on Natural Language Computing
Language(s) - English
Resource type - Journals
eISSN - 2319-4111
pISSN - 2278-1307
DOI - 10.5121/ijnlc.2021.10601
Subject(s) - computer science, variety (cybernetics), pipeline (software), question answering, search engine indexing, parsing, information retrieval, context (archaeology), world wide web, data science, artificial intelligence, paleontology, biology, programming language
Question answering has been a well-researched area of NLP in recent years. Users increasingly need to query the wide variety of information available to them, whether structured or unstructured. In this paper, we propose a question answering module that (a) consumes a variety of data formats through a heterogeneous data pipeline, which ingests data from product manuals, technical data forums, internal discussion forums, groups, etc.; (b) addresses practical challenges faced in real-life situations by pointing to the exact segment of the manual or chat thread that can resolve a user query; and (c) provides segments of text when deemed relevant, based on the user query and business context. Our solution provides a comprehensive and detailed pipeline composed of data ingestion, data parsing, indexing, and querying modules, and it handles a wide range of data sources such as text, images, tables, community forums, and flow charts. Our studies on a variety of business-specific datasets demonstrate the need for custom pipelines like the proposed one to solve several real-world document question-answering problems.
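The ingestion, parsing, indexing, and querying stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so everything here is an assumption made for illustration, including the `Segment` record, the `parse_manual` and `parse_forum` helpers, the `SegmentIndex` class, the sample data, and the simple term-frequency and inverse-document-frequency scoring. It only shows how segments from two source types might share one index so that a query returns the exact segment and where it came from.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One retrievable unit of text (hypothetical record, not from the paper)."""
    source_type: str  # e.g. "manual" or "forum"
    source_id: str    # which document or thread the segment came from
    locator: str      # position inside the source, e.g. a heading or post number
    text: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_manual(source_id, raw):
    """Split a plain-text manual into one segment per blank-line-separated section.

    The first line of each section is treated as its heading.
    """
    segments = []
    for block in raw.strip().split("\n\n"):
        heading, _, body = block.partition("\n")
        segments.append(Segment("manual", source_id, heading.strip(), body.strip()))
    return segments


def parse_forum(source_id, posts):
    """Turn each post of a forum thread into its own segment."""
    return [
        Segment("forum", source_id, f"post {i}", post)
        for i, post in enumerate(posts, start=1)
    ]


class SegmentIndex:
    """A tiny inverted index over segments from any source type."""

    def __init__(self):
        self.segments = []
        self.term_counts = []
        self.postings = defaultdict(set)  # term -> ids of segments containing it

    def add(self, segments):
        for seg in segments:
            seg_id = len(self.segments)
            counts = Counter(tokenize(seg.text + " " + seg.locator))
            self.segments.append(seg)
            self.term_counts.append(counts)
            for term in counts:
                self.postings[term].add(seg_id)

    def query(self, question, top_k=1):
        """Return up to top_k (segment, score) pairs, best match first."""
        n = len(self.segments)
        scores = Counter()
        for term in set(tokenize(question)):
            ids = self.postings.get(term, ())
            if not ids:
                continue
            idf = math.log(1 + n / len(ids))  # rarer terms weigh more
            for seg_id in ids:
                scores[seg_id] += self.term_counts[seg_id][term] * idf
        return [(self.segments[i], s) for i, s in scores.most_common(top_k)]


# Made-up sample data standing in for a product manual and a forum thread.
MANUAL = (
    "Resetting the device\n"
    "Hold the reset button for ten seconds.\n"
    "\n"
    "Updating firmware\n"
    "Download the firmware image and upload it from the admin page."
)
THREAD = [
    "My router keeps dropping wifi.",
    "Changing the wifi channel fixed the dropping for me.",
]

index = SegmentIndex()
index.add(parse_manual("router-manual", MANUAL))
index.add(parse_forum("thread-17", THREAD))

best, score = index.query("How do I reset the device?")[0]
print(best.source_type, best.source_id, best.locator)
```

Keeping the source type and locator on every segment is what lets a single query point back to a specific manual section or forum post. A production system would likely replace the keyword scoring with a stronger retriever and add parsers for the other formats the abstract lists, such as tables, images, and flow charts.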