A computation study on contextual self-organizing maps for subset data integration
Author(s) - Holly Deann Baiotto
Publication year - 2020
Language(s) - English
Resource type - Dissertations/theses
DOI - 10.31274/etd-20200902-9
Subject(s) - computation, computer science, data science, data mining, algorithm
As sensing and data collection capabilities have increased dramatically in recent years, many fields, from medicine to entertainment to engineering, have to rethink how products are designed, delivered, and maintained. In engineering, data is everywhere, and its use as a decision aid in the constant stream of tradeoff decisions is critical to delivering more robust products and services accurately and efficiently. Thus, there is a critical need for intelligent methods that analyze and visualize large datasets in ways that enable human understanding. One method that has proven effective in this endeavor is the self-organizing map (SOM). However, SOMs require substantial computational resources and time to train, making them impractical for large datasets or for datasets that grow over time. If this issue could be overcome, the approach could be widely adopted. This thesis studies the concept of using a subset of data to represent the characteristics of a full dataset via a SOM. The correlation between a subset SOM and a full-dataset SOM was studied on two test cases. The percent difference of node weights was used to compare map representations between the partial and full datasets, and a node alignment process was designed and implemented to enable a more accurate comparison of two SOMs. One hundred comparisons of node weights from subset and full-dataset maps were completed per test case. Results showed that pairing node weights by row and column designation did not accurately compare two different SOMs. The alignment process was then performed on ten samples of map comparisons per test case. The aligned nodes provided a much more accurate comparison of SOMs from partial and full datasets. The results of this study show that, with a good representative subset of data, map training can reach nodal weights very similar to those obtained using the full dataset. This
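The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the percent-difference formula or the alignment procedure, so the symmetric percent difference and the greedy nearest-weight matching below are assumptions, and all function names and sample weights are hypothetical. Each map is represented as a flat list of node weight vectors in row-major order.

```python
import math

def percent_difference(a, b):
    """Mean symmetric percent difference between two weight vectors.

    Assumed formula: |x - y| / ((|x| + |y|) / 2) * 100, averaged over components.
    """
    total = 0.0
    for x, y in zip(a, b):
        denom = (abs(x) + abs(y)) / 2.0
        total += 0.0 if denom == 0 else abs(x - y) / denom * 100.0
    return total / len(a)

def pair_by_position(som_a, som_b):
    """Pair nodes that share the same row/column position (same flat index)."""
    return [(i, i) for i in range(len(som_a))]

def align_greedy(som_a, som_b):
    """Assumed alignment: repeatedly pair the closest unused nodes (Euclidean)."""
    candidates = sorted(
        (math.dist(wa, wb), i, j)
        for i, wa in enumerate(som_a)
        for j, wb in enumerate(som_b)
    )
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)

def mean_percent_difference(som_a, som_b, pairs):
    """Average percent difference over a list of (index_a, index_b) node pairs."""
    return sum(percent_difference(som_a[i], som_b[j]) for i, j in pairs) / len(pairs)

# Hypothetical 2x2 maps: the subset map holds nearly the same weights as the
# full map, but its nodes sit at different grid positions.
full_map = [[0.10, 0.20], [0.90, 0.80], [0.50, 0.50], [0.30, 0.70]]
subset_map = [[0.31, 0.69], [0.52, 0.49], [0.88, 0.81], [0.11, 0.19]]

by_position = mean_percent_difference(full_map, subset_map, pair_by_position(full_map, subset_map))
aligned = mean_percent_difference(full_map, subset_map, align_greedy(full_map, subset_map))
print(f"paired by position: {by_position:.1f}%  aligned: {aligned:.1f}%")
```

In this toy case the two maps encode almost the same weights at different grid positions, so pairing by position reports a large difference while the aligned pairing reports a small one, which mirrors the effect the abstract describes. A globally optimal assignment (for example, the Hungarian algorithm) could replace the greedy matching.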