Thesaurus structure, descriptive parameters, and scale
Author(s) - Losee, Robert
Publication year - 2016
Publication title - Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23544
Subject(s) - thesaurus, computer science, information retrieval, set (abstract data type), controlled vocabulary, vocabulary, natural language processing, linguistics, philosophy, programming language
A thesaurus contains a set of terms or features that may be used to represent recorded information, including prose documents or scientific data sets. The focus of this work is on the basic structural nature of a thesaurus itself, not on how people develop a thesaurus or how a thesaurus affects retrieval performance. Thesauri in this research are developed automatically in a simulation from sets of randomly or exhaustively generated documents. Each thesaurus is generated by the Thesaurus Generator software from a set of several hundred documents, and thousands of different document sets are used as input to the Thesaurus Generator, producing thousands of thesauri. Thus, each data point in the accompanying graphs is based on thousands of generated thesauri. The characteristics of this large number of thesauri are studied so that the relationships between thesaurus parameters can be determined. Some rules governing these relationships are suggested, addressing factors such as tree height and width, the number of tree roots in a thesaurus, and the number of terms available for the vocabulary. How these parameters scale as vocabularies grow is also addressed. These results apply to various information systems that contain features with hierarchical relationships, including many thesauri and ontologies.
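
The descriptive parameters named in the abstract (number of tree roots, tree height, tree width) can be illustrated with a small sketch. This is not the Thesaurus Generator and does not reproduce the paper's method; the function name `forest_parameters`, the term-to-broader-term input format, and the definitions used here (height counted in levels, width taken as the largest number of terms on any single level) are assumptions made only for illustration.

```python
from collections import defaultdict, deque


def forest_parameters(parent_of):
    """Return (root_count, height, width) for a term hierarchy.

    parent_of maps each term to its broader term, or to None for a root.
    Assumed definitions (hypothetical, not taken from the paper):
      height = number of levels in the deepest tree (a lone root has height 1)
      width  = largest number of terms found on any one level of the forest
    """
    children = defaultdict(list)
    roots = []
    for term, parent in parent_of.items():
        if parent is None:
            roots.append(term)
        else:
            children[parent].append(term)

    # Breadth-first walk from every root, counting terms per level.
    level_counts = defaultdict(int)
    queue = deque((root, 1) for root in roots)
    while queue:
        term, depth = queue.popleft()
        level_counts[depth] += 1
        for child in children[term]:
            queue.append((child, depth + 1))

    height = max(level_counts, default=0)
    width = max(level_counts.values(), default=0)
    return len(roots), height, width


# A toy hierarchy with two roots; the terms are invented for the example.
toy_thesaurus = {
    "science": None,
    "physics": "science",
    "biology": "science",
    "optics": "physics",
    "art": None,
}
print(forest_parameters(toy_thesaurus))  # (2, 3, 2)
```

Computing these values for many generated hierarchies of different vocabulary sizes is one way the scaling relationships described above could be explored.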
