Self‐organizing maps for latent semantic analysis of free‐form text in support of public policy analysis
Author(s) - Till Bernie C., Longo Justin, Dobell A. Rod, Driessen Peter F.
Publication year - 2013
Publication title - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.506
H-Index - 47
eISSN - 1942-4795
pISSN - 1942-4787
DOI - 10.1002/widm.1112
Subject(s) - computer science, cluster analysis, information retrieval, unstructured data, latent semantic analysis, context (archaeology), topic model, document clustering, natural language processing, artificial intelligence, big data, data mining, paleontology, biology
The huge amount of free‐form unstructured text in the blogosphere, its increasing rate of production, and its shrinking window of relevance present serious challenges to the public policy analyst who seeks to take public opinion into account. Most of the tools that address this problem use XML tagging and other Web 3.0 approaches, which do not address the actual content of blog posts and the associated commentary. We give a tutorial review of latent semantic analysis and the self‐organizing map, as considered in this context, and show how to apply the self‐organizing map over a probabilistic latent semantic space to the problem of completely unsupervised clustering of unstructured text in such a way as to be entirely independent of spelling, grammar, and even source language. This provides an algorithm suitable for clustering free‐form commentary with a well‐structured test environment. The algorithm is applied to academic paper abstracts instead, treated as unstructured text as though they were blog posts, because this set of documents has a known ground truth. The algorithm constructs a word category map and a document map in which words with similar meaning and documents with similar content are clustered together.

WIREs Data Mining Knowl Discov 2014, 4:71–86. doi: 10.1002/widm.1112

This article is categorized under:
Algorithmic Development > Web Mining
Application Areas > Government and Public Sector
Technologies > Structure Discovery and Clustering
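The document-map idea in the abstract, a self‐organizing map trained over a latent semantic space, can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in plain SVD-based latent semantic analysis on a TF-IDF matrix in place of the probabilistic latent semantic space the article uses, and it tokenizes on whitespace. The function names (`tfidf_matrix`, `lsa_embed`, `train_som`, `map_documents`), the grid size, and the linear decay schedules are all hypothetical choices made for illustration.

```python
import numpy as np


def tfidf_matrix(docs):
    """Return a documents-by-terms TF-IDF matrix and its vocabulary."""
    tokenized = [d.lower().split() for d in docs]  # assumption: whitespace tokens
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            counts[i, index[w]] += 1.0
    doc_freq = (counts > 0).sum(axis=0)
    idf = np.log(len(docs) / doc_freq) + 1.0
    return counts * idf, vocab


def lsa_embed(X, k):
    """Project documents onto the top-k right singular vectors (plain LSA,
    used here as a stand-in for a probabilistic latent semantic space)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:k].T
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms == 0.0, 1.0, norms)  # unit-length rows


def train_som(Z, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows-by-cols SOM with online updates; returns the unit weights."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = Z[rng.integers(0, len(Z), size=rows * cols)].copy()  # init from samples
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(Z)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Z)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
            bmu = np.argmin(((W - Z[i]) ** 2).sum(axis=1))  # best-matching unit
            grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))       # Gaussian neighbourhood
            W += lr * h[:, None] * (Z[i] - W)
            step += 1
    return W


def map_documents(Z, W):
    """Return the index of the best-matching SOM unit for each document."""
    d2 = ((Z[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


if __name__ == "__main__":
    docs = [
        "tax policy budget",
        "budget tax reform",
        "neural network training",
        "network training data",
    ]
    X, vocab = tfidf_matrix(docs)
    Z = lsa_embed(X, k=2)
    W = train_som(Z, rows=2, cols=2)
    print(map_documents(Z, W))  # one SOM unit index per document
```

Documents that land on the same or neighbouring units are the ones the map treats as similar in content. Applying the same steps to the transposed matrix (terms as rows) would give a rough analogue of the word category map, again only as an illustration.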
