Analysing a quality‐of‐life survey by using a coclustering model for ordinal data and some dynamic implications
Author(s) -
Selosse Margot,
Jacques Julien,
Biernacki Christophe,
Cousson-Gélie Florence
Publication year - 2019
Publication title -
Journal of the Royal Statistical Society: Series C (Applied Statistics)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.205
H-Index - 72
eISSN - 1467-9876
pISSN - 0035-9254
DOI - 10.1111/rssc.12365
Subject(s) - ordinal data, cluster analysis, ordinal scale, dimension (graph theory), computer science, perspective (graphical), set (abstract data type), data set, artificial intelligence, statistics, psychology, mathematics, machine learning, data mining, pure mathematics, programming language
Summary - The data set that motivated this work is a psychological survey on women affected by a breast tumour. Patients replied at different stages of their treatment to questionnaires with answers on an ordinal scale. The questions relate to aspects of their life referred to as ‘dimensions’. To assist psychologists in analysing the results, it is useful to highlight the structure of the data set. The clustering method achieves this by creating groups of individuals that are depicted by a representative of the group. From a psychological position, it is also useful to observe how questions may be clustered. The simultaneous clustering of both patients and questions is called ‘coclustering’. However, placing questions in the same group when they are not related to the same dimension does not make sense from a psychological perspective. Therefore, constrained coclustering was performed to prevent questions of different dimensions from being placed in the same column cluster. The evolution of coclusters over time was then investigated. The method uses a constrained latent block model embedding a probability distribution for ordinal data. Parameter estimation relies on a stochastic expectation–maximization algorithm associated with a Gibbs sampler, and the integrated completed likelihood–Bayesian information criterion is used to select the number of coclusters.
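
The constraint described in the summary (questions from different dimensions must never share a column cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not fit a latent block model; it only checks whether a given column partition respects the constraint. The function name, the question identifiers and the dimension labels are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def mixed_dimension_clusters(question_dimension, column_cluster):
    """Return the column clusters that contain questions from more than one dimension.

    question_dimension: dict mapping each question to its dimension (hypothetical labels).
    column_cluster: dict mapping each question to its column-cluster index.
    An empty result means the partition satisfies the constraint.
    """
    dims_in_cluster = defaultdict(set)
    for question, cluster in column_cluster.items():
        dims_in_cluster[cluster].add(question_dimension[question])
    return {c: dims for c, dims in dims_in_cluster.items() if len(dims) > 1}


# Hypothetical toy data: four questions belonging to two dimensions.
question_dimension = {"q1": "anxiety", "q2": "anxiety", "q3": "fatigue", "q4": "fatigue"}

# Each column cluster stays within a single dimension, so the constraint holds.
valid_partition = {"q1": 0, "q2": 1, "q3": 2, "q4": 2}

# Cluster 0 mixes both dimensions, so the constraint is violated.
invalid_partition = {"q1": 0, "q2": 0, "q3": 0, "q4": 1}

print(mixed_dimension_clusters(question_dimension, valid_partition))
print(mixed_dimension_clusters(question_dimension, invalid_partition))
```

A dimension may be split across several column clusters (as with q1 and q2 in the valid partition); the constraint only forbids merging across dimensions.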