SECTOR: A Neural Model for Coherent Topic Segmentation and Classification
Author(s) - Sebastian Arnold, Rudolf Schneider, Philippe Cudré-Mauroux, Felix A. Gers, Alexander Löser
Publication year - 2019
Publication title - Transactions of the Association for Computational Linguistics
Language(s) - English
Resource type - Journals
ISSN - 2307-387X
DOI - 10.1162/tacl_a_00261
When searching for information, a human reader first glances over a document, spots relevant sections, and then focuses on a few sentences for resolving her intention. However, the high variance of document structure complicates the identification of the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section. Our deep neural network architecture learns a latent topic embedding over the course of a document. This can be leveraged to classify local topics from plain text and segment a document at topic shifts. In addition, we contribute WikiSection, a publicly available data set with 242k labeled sections in English and German from two distinct domains: diseases and cities. From our extensive evaluation of 20 architectures, we report a highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain, scored by our SECTOR long short-term memory model with Bloom filter embeddings and bidirectional segmentation. This is a significant improvement of 29.5 points F1 over state-of-the-art CNN classifiers with baseline segmentation.
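
To make the idea of segmenting "at topic shifts" concrete, the following is a minimal sketch, not the authors' method. It assumes each sentence already has a topic score vector (here hand-written; in SECTOR such vectors would come from the trained network), places a boundary wherever the cosine distance between neighboring vectors exceeds a threshold, and labels each section with its highest-scoring topic on average. The function names, the threshold of 0.5, and the two-topic example are all hypothetical choices made for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (1.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def segment_at_topic_shifts(vectors, threshold=0.5):
    """Return the sentence indices at which a new section starts.

    A boundary is placed before sentence i when its vector differs from
    the previous sentence's vector by more than `threshold` (an assumed,
    untuned value).
    """
    boundaries = [0] if vectors else []
    for i in range(1, len(vectors)):
        if cosine_distance(vectors[i - 1], vectors[i]) > threshold:
            boundaries.append(i)
    return boundaries


def label_sections(vectors, boundaries, topics):
    """Return (start, end, topic) triples, with `end` exclusive.

    Each section is labeled with the topic whose mean score over the
    section's sentences is highest.
    """
    sections = []
    ends = boundaries[1:] + [len(vectors)]
    for start, end in zip(boundaries, ends):
        span = vectors[start:end]
        means = [sum(col) / len(span) for col in zip(*span)]
        best = max(range(len(means)), key=means.__getitem__)
        sections.append((start, end, topics[best]))
    return sections


# Hypothetical per-sentence topic scores for a four-sentence document.
topics = ["symptoms", "treatment"]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

boundaries = segment_at_topic_shifts(vectors)
sections = label_sections(vectors, boundaries, topics)
print(sections)
```

In this toy input the first two sentences score high on one topic and the last two on the other, so the sketch finds a single topic shift between them. A fixed threshold on adjacent sentences is the simplest possible boundary rule and is sensitive to noise; it is used here only to illustrate the general idea.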
