Classifieds: Coding open-ended responses using machine learning

Friday 6th November, 11:45 - 13:15

Using supervised classification for categorizing answers to an open-ended question on panel participation motivation

Ms Anna-Carolina Haensch (GESIS Leibniz Institute for the Social Sciences) - Presenting Author
Dr Bernd Weiss (GESIS Leibniz Institute for the Social Sciences)
Ms Katja Bitz (University of Mannheim)

Our research illustrates how supervised learning models can be used to categorize a considerable number of answers to an open-ended question on survey participation motivation. This question on survey participation motivation has been asked annually in the GESIS Panel since 2014. The GESIS Panel is a German probability-based mixed-mode access panel that has around 5,400 panelists. Over time, around 25,000 answers based on six panel waves have been collected.

Conceptual frameworks (Brueggen 2011) behind survey participation motives often differentiate between extrinsic motives (e.g., incentives, need for recognition) and intrinsic motives (e.g., interest, enjoyment, helping, giving an opinion), and also between different foci of motivation: oneself (e.g., enjoyment or curiosity) or others (e.g., helping/obligation). However, while there has been research on respondents' own reasons to participate in surveys (Singer 2011), little is known about possible changes in motivation over years of panel participation. The respondents' answers are of high interest since they offer more insight into panel participation, and into the underlying motives of panelists in particular, and hence offer possibilities for better panel management in the future.

The coding scheme used to categorize the survey participation motives is based on similar coding schemes in the literature (Brueggen 2011) but has been adapted to fit the needs of the GESIS Panel. The number of answers (25,000) would make coding by hand very time-intensive. We therefore had two coders code only a much smaller sample of 2,500 answers and used a supervised classification algorithm (Support Vector Machine) to classify the remaining answers (see also Schonlau 2016 for other examples of semi-automated categorization of open-ended questions). The answers to the open-ended questions in the GESIS Panel are particularly suited for automated classification since they are, for the most part, one-dimensional. This is due to the questionnaire design: instructions were to give only one reason per answer box, and panelists were able to fill three separate answer boxes in total.
Preliminary 10-fold evaluation results show excellent performance; macro measures are all around 0.85 for precision, recall, and the F measure. Also, for most of the categories, micro measures are good. Preliminary results show that significant parts of the panelists are motivated by interest, curiosity, incentives, and the need for recognition, but also by the wish to help the scientific community, political leaders, or even society in general. We have yet to conduct analyses examining (intra-personal) changes in motivation and the relationship between different survey motivations and panel attrition.
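
As an illustration of the evaluation design mentioned above, the following is a minimal sketch of a 10-fold split over 2,500 hand-coded answers and of macro averaging. It is not the authors' pipeline: the function names `k_fold_indices` and `macro_average`, the random seed, and the example scores are assumptions made for illustration, and the classifier itself is left out.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so each item is held out exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def macro_average(per_category_scores):
    """Unweighted mean over categories: rare categories count as much as common ones."""
    return sum(per_category_scores.values()) / len(per_category_scores)


# 2,500 hand-coded answers, as in the abstract; each fold holds out 250 of them.
splits = list(k_fold_indices(2500, k=10))

# Hypothetical per-category F scores, only to show the averaging step.
example_macro_f = macro_average({"interest": 0.8, "incentives": 0.9})
```

In each round a classifier would be trained on the 2,250 training answers and scored on the 250 held-out ones; the per-category scores are then averaged without weighting by category size.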

A framework for using machine learning to support qualitative data coding

Ms Amanda Smith (RTI International) - Presenting Author
Mr Peter Baumgartner (RTI International)
Dr Murrey Olmsted (RTI International)
Ms Dawn Ohse (RTI International)

Open-ended survey questions provide qualitative data that are useful for a multitude of reasons. These data can help survey researchers add depth or nuance to quantitative findings and can also be used to better understand phenomena before drafting future measures. However, qualitative data analysis is labor intensive, and researchers often lack the needed time and resources to take full advantage of the benefits these data can provide. As a result, open-ended questions are often underutilized in analysis of survey data, or even omitted entirely from surveys.

To address these issues, we evaluated a method to support qualitative analysis using machine learning to auto-code open-ended survey responses. Specifically, we used human-coded qualitative data from an open-ended question on a 2018 employee survey to train a model that predicted codes to apply to data from the same question on the 2019 survey. The model employed transfer learning and was based on the BERT architecture. The model-predicted codes were then adjudicated by human coders well-versed in the code definitions and coding structure. During the adjudication process, each coded response was reviewed to ensure that the model-predicted codes were accurate and complete. Precision (i.e., of the predicted codes, what percent were correct) and recall (i.e., of the codes that should have been applied, what percent were predicted) were calculated for each code to determine the accuracy of coding predictions. We found that precision values for the 24 codes in the coding frame ranged from 88% to 98% (95.6% overall), with recall ranging from 56% to 94% (81.5% overall). The majority of codes had precision and recall values above 90%. Results suggest this is a promising approach that can be used to support traditional coding methods and has the potential to alleviate some of the burden associated with qualitative data analysis.
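
The per-code precision and recall defined above can be sketched as follows, assuming each response carries a set of model-predicted codes and a set of adjudicated codes. The function name `per_code_precision_recall` and the code labels in the toy data are hypothetical and do not come from the study's coding frame.

```python
from collections import Counter


def per_code_precision_recall(predicted, adjudicated):
    """Return {code: (precision, recall)} for multi-label coding.

    predicted, adjudicated: parallel lists with one set of codes per response.
    A value is None when its denominator is zero (code never predicted or never due).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, gold in zip(predicted, adjudicated):
        for code in pred & gold:
            tp[code] += 1  # predicted and confirmed by the adjudicator
        for code in pred - gold:
            fp[code] += 1  # predicted but removed by the adjudicator
        for code in gold - pred:
            fn[code] += 1  # missed by the model, added by the adjudicator
    results = {}
    for code in set(tp) | set(fp) | set(fn):
        n_predicted = tp[code] + fp[code]
        n_due = tp[code] + fn[code]
        precision = tp[code] / n_predicted if n_predicted else None
        recall = tp[code] / n_due if n_due else None
        results[code] = (precision, recall)
    return results


# Toy data with made-up code labels.
predicted = [{"pay", "workload"}, {"pay"}, {"management"}, set()]
adjudicated = [{"pay"}, {"pay", "benefits"}, {"management"}, {"workload"}]
scores = per_code_precision_recall(predicted, adjudicated)
```

Here `scores["pay"]` is `(1.0, 1.0)`, while `scores["benefits"]` is `(None, 0.0)` because the code was never predicted but should have been applied once.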

Training deep learning models with active learning framework to classify “other (please specify)” comments

Ms Xin (Rosalynn) Yang (Westat) - Presenting Author
Ms Ting Yan (Westat)
Mr David Cantor (Westat)

Adding an “other (please specify)” comment box to a closed-ended item is widely done by researchers to capture answers provided by respondents when existing response categories do not apply. However, it is often time-consuming to analyze these items when a large number of responses is collected. The analysis usually involves multiple coders manually reviewing the comments and looking for eligible new categories. In some cases, when respondents provide additional information about an existing category that they have chosen, it is also the coder’s job to code such comments back to the original response categories. Previous research in automating text classification has yet to provide an efficient solution to aid coding tasks of this nature.
This study proposes a framework for classifying these “other (please specify)” comments in a time-efficient manner with natural language processing (NLP) and deep learning. To do this, comments are transformed into vectors via text embeddings from neural network models trained on auxiliary NLP tasks. Then, the machine selects a few comments based on a selection mechanism for a human coder to identify whether existing categories are applicable, and/or to create new eligible categories if needed. Under the active learning framework, the machine fits and updates neural network models each time additional labeled comments are provided, and stops updating the model when additional comments can no longer improve model performance.
In this study, we will present a use case for 3,000 comments from an item in the interviewer observations instrument that asks interviewers what security measures they see on a housing unit other than the ones listed as response categories. We will evaluate the machine-labeled results against human codes with multi-label classification metrics and discuss cost implications of the proposed framework. Solutions to a few practical challenges, such as methods for dealing with unbalanced data, will also be discussed.
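
The select, label, update, and stop cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the neural network models, smallest-margin (uncertainty) sampling stands in for the unspecified selection mechanism, the `patience` stopping rule is an assumption, the `oracle` function simulates the human coder, and the 2-D vectors and the labels "lock" and "camera" are made up.

```python
import math


def fit_centroids(labeled):
    """Fit a nearest-centroid model: mean vector per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, vec):
    """Return the label whose centroid is closest to vec."""
    return min(model, key=lambda label: math.dist(model[label], vec))


def margin(model, vec):
    """Gap between the two nearest centroids; a small gap means an uncertain comment."""
    dists = sorted(math.dist(centroid, vec) for centroid in model.values())
    return dists[1] - dists[0] if len(dists) > 1 else 0.0


def accuracy(model, validation):
    return sum(predict(model, vec) == label for vec, label in validation) / len(validation)


def active_learning(seed, pool, oracle, validation, batch_size=2, patience=2):
    """Query the most uncertain comments until accuracy stops improving."""
    labeled, pool = list(seed), list(pool)
    model = fit_centroids(labeled)
    best, stale = accuracy(model, validation), 0
    while pool and stale < patience:
        pool.sort(key=lambda vec: margin(model, vec))           # most uncertain first
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(vec, oracle(vec)) for vec in batch]        # "human coder" labels the batch
        model = fit_centroids(labeled)                          # refit with the new labels
        score = accuracy(model, validation)
        if score > best:
            best, stale = score, 0
        else:
            stale += 1                                          # no improvement this round
    return model, len(labeled)


# Toy "embeddings" with two hypothetical categories.
seed = [((0.0, 0.0), "lock"), ((1.0, 1.0), "camera")]
pool = [(0.1, 0.2), (0.9, 0.8), (0.45, 0.5), (0.6, 0.55), (0.2, 0.1), (0.8, 0.9)]
validation = [((0.3, 0.3), "lock"), ((0.7, 0.7), "camera"),
              ((0.4, 0.5), "lock"), ((0.6, 0.5), "camera")]
oracle = lambda vec: "lock" if vec[0] + vec[1] < 1.0 else "camera"

model, n_labeled = active_learning(seed, pool, oracle, validation)
```

The cost argument rests on `n_labeled`: the loop stops asking the coder for labels once extra batches no longer raise validation accuracy, so part of the pool is never hand-coded.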

Measuring the validity of open-ended questions: Application of unsupervised learning methods

Professor Eric Plutzer (Penn State University)
Professor Burt Monroe (Penn State University) - Presenting Author

The rapid adoption of large-N surveys with open-ended questions has occurred without the same knowledge base that informs the design of forced-choice questions. More specifically, we know relatively little about how question wording affects the quality of answers to open-ended survey questions. In this paper, we introduce a new method that allows investigators to measure convergent and discriminant validity of open-ended questions. We utilize text-as-data methods that estimate the relative frequency of word use across one or more partitions, while accounting for sampling variability. Specifically, we apply an algorithm initially developed to analyze political documents (Monroe, Colaresi and Quinn’s “fightin’ words” metric, 2008) to a corpus of over 50,000 answers to open-ended questions. We apply this unsupervised machine learning method to answers from six different open-ended questions in order to generate a similarity metric. This metric is then used, in the conventional way, to assess discriminant and convergent validity. We show how this can help researchers investigate the impact of question wording experiments and, more generally, assess the validity of open-ended questions as measures of their intended constructs. The method is flexible and can be applied in many survey research settings.
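
For readers unfamiliar with the word-level statistic, the following is a minimal sketch of a "fightin' words" style comparison: the difference in smoothed log-odds of each word between two partitions, divided by its approximate standard error. It assumes a symmetric Dirichlet prior with a small weight per word (`prior_strength` is an illustrative choice), and the function name and toy answers are hypothetical. How the authors turn such statistics into a similarity metric is not described in the abstract and is not shown here.

```python
import math
from collections import Counter


def fightin_words_z(tokens_a, tokens_b, prior_strength=0.01):
    """Z-scored log-odds difference per word between partitions A and B.

    Positive values mark words used relatively more in A, negative values in B.
    Assumes the combined vocabulary has at least two distinct words.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    alpha_0 = prior_strength * len(vocab)  # total prior mass over the vocabulary
    z_scores = {}
    for word in vocab:
        y_a = counts_a[word] + prior_strength  # smoothed count in A
        y_b = counts_b[word] + prior_strength  # smoothed count in B
        log_odds_a = math.log(y_a / (n_a + alpha_0 - y_a))
        log_odds_b = math.log(y_b / (n_b + alpha_0 - y_b))
        variance = 1.0 / y_a + 1.0 / y_b       # approximate variance of the difference
        z_scores[word] = (log_odds_a - log_odds_b) / math.sqrt(variance)
    return z_scores


# Toy answers to two hypothetical open-ended questions.
tokens_a = "economy jobs economy taxes jobs economy".split()
tokens_b = "health care health insurance care health".split()
z_ab = fightin_words_z(tokens_a, tokens_b)
z_ba = fightin_words_z(tokens_b, tokens_a)
```

Dividing by the standard error is what accounts for sampling variability: a rare word with a large raw log-odds difference receives a smaller score than a frequent word with the same difference.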

Writerly Respondents: Explaining Nonresponse and Response Length for Open-Ended Questions

Mr Arnold Lau (Pew Research Center) - Presenting Author

Open-ended questions in surveys, where respondents are asked to freely input text, are fertile ground for various machine learning approaches seeking to classify and analyze large volumes of text at once. However, open-ended questions also have higher item nonresponse rates than questions with a fixed set of choices (and if forced to answer, many respondents may provide nonsensical or low-effort responses). There are many reasons to believe intuitively that the subset of people who respond in coherent and detailed ways to an open-ended question is not representative of the larger sample, let alone the population being described, which makes inferences based on their responses unrepresentative as well. Previous studies have shown that factors such as answering a survey on a mobile device, being younger, being less educated, and being employed may make people less likely to answer an open-ended question. However, some of these factors are correlated with one another, and some may be more important than others. Furthermore, not all open-ended questions are the same. Some are best answered with several sentences, while for others a couple of words or even a numeric response will suffice. Some ask the respondent to express their feelings or opinions about a broad issue, while others ask the respondent to elaborate on a previous closed-ended answer. For open-ended prompts asking for lengthier responses, respondent characteristics can affect not only whether they respond but also, if they do so, how many words they use. Respondents who use more words, or a greater variety of words, can disproportionately affect the results of machine learning methods such as topic modeling, which becomes a problem when seeking not just to describe a corpus of text but to claim that the text represents a larger population. This study seeks to leverage years of data from Pew Research Center’s American Trends Panel, a nationally representative panel of randomly selected U.S. adults whose panelists participate via self-administered web surveys at least once a month. The study examines how question characteristics, respondent characteristics and device type come together to affect item nonresponse and response length in open-ended survey questions.