
Collecting conversations: three approaches to obtaining user-to-user communications data from virtual environments
Author(s) -
Mika Lehdonvirta,
Vili Lehdonvirta,
Akira Baba
Publication year - 2011
Publication title -
Journal of Virtual Worlds Research
Language(s) - English
Resource type - Journals
ISSN - 1941-8477
DOI - 10.4101/jvwr.v3i3.807
Subject(s) - representativeness heuristic , computer science , variety (cybernetics) , avatar , active listening , sampling frame , data collection , phone , resource (disambiguation) , data science , multimedia , world wide web , human–computer interaction , artificial intelligence , psychology , statistics , linguistics , philosophy , mathematics , communication , social psychology , population , demography , computer network , sociology
Transcripts of conversations are a valuable research resource in social sciences and can be used to make inferences about subjects’ behavior and intentions. Large-scale communications records can be coded and analyzed statistically for generalizable results. Virtual environments are a good place to gather communications records, because they exhibit a wide variety of subject behaviors. However, compared to traditional channels such as forums and chat rooms, virtual environments can be more challenging to obtain data from. In this article, we describe three approaches to collecting user-to-user communications data from virtual environments: requesting back-end records from the operator of the environment, recruiting “data donors” among the users, and setting up researchers’ own “listening posts”. The data collection approaches are evaluated empirically in Uncharted Waters Online, a Japanese massively-multiplayer game. Avatar gender ratio is used as a diagnostic variable to compare the representativeness of the resulting data sets. Both data donors and listening posts yielded data with a gender ratio that corresponds to the back-end records, but the back-end gender ratios differed significantly between two different servers. We conclude that all three approaches can be statistically viable: the choice of method depends more on desired sampling scope and on practical factors such as resources and timetable; but when defining a sampling frame, it cannot be assumed that one server is necessarily representative of the whole platform.
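
The representativeness check described in the abstract, comparing avatar gender ratios between two data sets, can be sketched as a test of homogeneity on a 2x2 table of counts. The abstract does not state which test the authors used, so the Pearson chi-square test below is an assumption, and the function name and the counts in the usage lines are hypothetical placeholders, not figures from the study.

```python
import math


def gender_ratio_chi_square(female_a, male_a, female_b, male_b):
    """Pearson chi-square test of homogeneity on a 2x2 table (1 degree of freedom).

    Arguments are avatar counts by gender in two data sets, for example
    back-end records versus a listening-post sample. Every row and column
    total must be non-zero, otherwise an expected count is zero.
    Returns (chi2, p_value).
    """
    observed = [[female_a, male_a], [female_b, male_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [female_a + female_b, male_a + male_b]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if both data sets shared one gender ratio.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for illustration only, not data from the article.
chi2, p = gender_ratio_chi_square(50, 50, 30, 70)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A small p-value would suggest that the two data sets have different gender ratios, which is the kind of difference the abstract reports between the two servers; a large p-value is consistent with, but does not prove, a shared ratio.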