I'm sensing there's another APP for that too!

Friday 13th November, 10:00 - 11:30

An end-to-end statistical process with mobile network data for Official Statistics

Dr David Salgado (Statistics Spain (INE)) - Presenting Author
Dr Luis Sanguiao (Statistics Spain (INE))
Dr Sandra Barragán (Statistics Spain (INE))
Dr Bogdan Oancea (University of Bucharest)
Mr Marian Necula (Statistics Romania (INS))

Mobile network data has been shown to provide a rich source of information for statistics in multiple domains, such as demography, tourism, and urban planning. However, incorporating this data source into the routine production of official statistics is demanding, since a range of highly entangled issues (access, methodology, IT tools, quality, etc.) must be solved beforehand. One-shot studies with concrete data sets are not enough for this; a standard statistical production process must be put in place.

In the context of the European Statistical System (ESS), different initiatives have been launched to prepare both European and national statistical systems to use mobile network data as a source for the production of a number of official statistics. The project ESSnet on Big Data is one of these initiatives embracing the efforts from more than 20 European countries. In this context, we are building a production framework with mobile network data, among other sources.

Sustainable access to this data is still being worked out, with initiatives extending even beyond the statistical public administration. This has driven us to develop a mobile network event data simulator that produces user-defined, parametrizable scenarios for a population of individuals carrying mobile devices that connect to a mobile telecommunication network across a chosen geographical territory.

This has allowed us to propose an end-to-end statistical process going from the raw telco data to the final estimates of interest. This process is structured into evolvable modules detaching the strongly technological layer underlying this data source from the necessary statistical analysis producing outputs of interest. This architecture is called the ESS Reference Methodological Framework for Mobile Network Data.

Each of these modules deals with a different aspect of this data source. We apply hidden Markov models for the geolocation of mobile devices, use Wilks’ theorem on this model to disambiguate devices belonging to the same individual, compute aggregate numbers of individuals using probability theory, and model hierarchically the integration of auxiliary information from the telco market and official data to produce final estimates of the number of individuals across different territorial regions in the target population.
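As an illustration of the geolocation step only, the following is a minimal sketch of forward filtering in a generic hidden Markov model. It is not the authors' implementation: the three-tile grid, the two antennas, and every probability below are hypothetical values chosen for the example.

```python
# Toy hidden Markov model for device geolocation.
# Hidden state: the grid tile a device is in. Observation: the antenna it connects to.
# All tiles, antennas, and probabilities are illustrative assumptions.

def forward_filter(prior, transition, emission, observations):
    """Return the filtered posterior P(tile at t | observations up to t) for each t."""
    n = len(prior)
    belief = list(prior)
    posteriors = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: propagate the belief through the movement model.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update: weight each tile by the probability of seeing this antenna from it.
        belief = [belief[j] * emission[j][obs] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        posteriors.append(belief)
    return posteriors


# Uniform prior over three tiles arranged in a row.
prior = [1 / 3, 1 / 3, 1 / 3]

# transition[i][j]: probability of moving from tile i to tile j in one time step.
transition = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]

# emission[j][a]: probability that a device in tile j connects to antenna a.
emission = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.1, 0.9],
]

# Observed antenna IDs over four time steps.
observations = [0, 0, 1, 1]

posteriors = forward_filter(prior, transition, emission, observations)
for t, p in enumerate(posteriors):
    print(t, [round(x, 3) for x in p])
```

In this toy run the posterior mass shifts away from the first tile once the device starts connecting to the second antenna. A production system would need a far larger grid and emission probabilities derived from radio propagation, which this sketch does not attempt.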

The proposal has been applied to synthetic data and is complemented by software tools and quality indicators that monitor the performance of the process. So far, this exercise has covered the estimation of population density and origin-destination matrices.

We present an illustrative example of the execution of these production modules comparing results with the simulated ground truth, thus assessing the performance of each production module.

Using a mobile app when surveying highly mobile populations: Panel attrition, consent and interviewer effects in a survey of refugees

Dr Simon Kühne (Universität Bielefeld) - Presenting Author
Mr Jannes Jacobsen (DIW Berlin)

Panel attrition poses major threats to the survey quality of panel studies. This is especially true for highly mobile populations. Empirically, immigrants move more frequently than native populations. Therefore, panel studies on migration cohorts have to deal with an increased risk of panel attrition. Many features have been introduced to keep panel attrition as low as possible. Based on a random sample of refugees to Germany, a highly mobile population, we investigate whether using a mobile phone application in a panel survey improves address quality and response behavior. Various features, including geo-tracking and collecting email addresses, are tested. Additionally, we investigate respondent and interviewer effects on the consent to download the app and share GPS geo-positions. Our findings show that neither geo-tracking nor the provision of email addresses nor the collection of address changes through the app improves address quality substantially. We further show that interviewers play an important role in convincing the respondents to install and use the app, whereas respondent characteristics are largely insignificant. Our findings provide new insights into the usability of mobile phone applications and help determine whether they are a useful tool to decrease panel attrition.

Making time count: A machine learning approach to predict time-use from sensor-based signal data

Dr Talip Kilic (World Bank) - Presenting Author
Dr Seyit Höcük (CentERdata - Tilburg University)
Mr Pradeep Kumar (CentERdata - Tilburg University)
Dr Natalia Kieruj (CentERdata - Tilburg University)
Mr Joris Mulder (CentERdata - Tilburg University)
Dr Alberto Zezza (World Bank)

Collecting objective measures of physical activity has become increasingly popular, accurate, and affordable due to advances in technology. In particular, accelerometers are being extensively used in physical activity studies. However, applications in low- and middle-income countries are scant, and these methods are rarely used in large-scale socio-economic surveys. Currently, these methods appear to offer the best available compromise between precision and affordability for measuring physical activity in large-scale socio-economic surveys, especially in their ability to capture individual-level activity without relying on proxy respondent reporting.

For this research, we make use of data collected by an ActiGraph sensor-based physical activity tracker on 415 adults (aged 15+) from 215 rural households in two districts in Malawi. This is part of a broader methodological survey experiment on the measurement of agricultural labor. Complementing the activity tracking data, each adult responded to a 24-hour recall time-use module. Furthermore, the weight and height of each subject were measured. In addition, detailed information on labor market outcomes, agricultural production, consumption and expenditures was collected.

In this paper we explore the feasibility of imputing time-use data through predictive modeling using unstructured physical activity tracking output. This is achieved by utilizing a state-of-the-art deep learning framework and leveraging the latest advances in machine learning. Since collecting accurate time-use data is both cost- and supervision-intensive, it will be operationally relevant for survey practitioners to know whether time allocation to human activities can be predicted from physical activity sensor output as part of a survey set-up.

Recent research has shown that employing Artificial Intelligence (AI), and in particular deep learning, is more useful in predicting human activities from body-worn sensors than traditional methods (Mulder et al. 2020; Wang et al. 2019; Willetts et al. 2018; Yang et al. 2015; Yang 2009). In this paper we rely on a (deep) convolutional neural network (CNN) to train a predictive model on a representative and balanced labelled dataset so that the model can generalize well.

We train the supervised deep learning model on the readily available time-use data. For the model we use a range of attributes, including physical tracking output from the sensors and other individual and household information. We explore whether the model exhibits any heterogeneity by age group and by gender. Using the model, we derive predicted measures of time allocation to aggregate groups of activities for the rest of the sample at the minute level. These are then compared to the observed measures for this sample as a whole and by gender.

The findings of the paper will be relevant for advancing the agenda on the introduction of sensor-based, objective measurements of physical activity in large-scale socio-economic surveys. This type of data collection would be useful for a broad range of applications, from public health, to food security and poverty, to time-use and labor productivity studies.

The Combination of survey and health app data: Sharing behavior, quality assessment, and validation of survey-based health indicators

Ms Evgenia Kapousouz (University of Illinois at Chicago) - Presenting Author
Mr Christoph Beuthner (GESIS)
Mr Florian Keusch (University of Mannheim)
Mr Henning Silber (GESIS)
Mr Bernd Weiß (GESIS)
Mr Timothy Johnson (University of Illinois at Chicago)

Digitalization has opened entirely new possibilities concerning available data sources for empirical social research. This study makes use of the fact that an increasing number of people use smartphone devices for daily communication and many other activities. Against this background, our study explores the linkage of survey and health app data and the benefits of doing so. The three main research questions to be explored are: (1) who is willing to share additional health data, (2) what is the quality of these additional data, and (3) can health app data be successfully used to validate survey-based health indicators? Data were collected in December 2019 within a German non-probability web survey, in which 1,085 respondents who reported having an iPhone were asked whether they were willing to share their health data. Data sharing could be done by first downloading the data from one’s smartphone and then uploading them via a data-sharing platform. This data-sharing method is often referred to as data donation because the respondents freely give away their data in support of the research. A randomized incentive experiment with four different groups (conditional incentive of 0, 2.5, 5, or 10 Euros for uploading the data) was included to identify the most effective monetary incentive for the data-sharing request. In total, we were able to obtain 107 health app data sets, which resulted in a data-sharing rate of 9.9 percent. To address our research questions, we will present an assessment of nonresponse and examine factors which influence data-sharing willingness (e.g., incentive, privacy concerns, the importance of science, frequency of physical activity, technical skills, and age). We will also examine the amount and quality of the data obtained. Most importantly, we will analyze the data to validate survey-based health indicators (e.g., reported body weight and reported physical activity).

Our presentation will conclude with a discussion of the opportunities and challenges of this new data source. In addition, we will give practical advice on data linkage involving survey and health app data.
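The data-sharing rate reported in this abstract can be reproduced from the two counts it gives (107 data sets from 1,085 invited respondents). The short sketch below does that arithmetic; the Wilson score interval is an addition for illustration and is not something the abstract reports, and the function names are hypothetical.

```python
import math


def sharing_rate(shared, invited):
    """Share of invited respondents who uploaded a data set."""
    return shared / invited


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage).

    Assumption: this interval is added for illustration only; the abstract
    reports the point estimate, not an interval.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts taken from the abstract.
rate = sharing_rate(107, 1085)
low, high = wilson_interval(107, 1085)

print(f"data-sharing rate: {rate:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

Because the sample is a non-probability web survey, such an interval describes sampling noise under an idealized model only and says nothing about selection into the panel.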

Giving respondents a choice: does it increase sharing of sensor data?

Dr Bella Struminskaya (Utrecht University) - Presenting Author
Dr Peter Lugtig (Utrecht University)
Mr Goran Ilic (Utrecht University)

Collecting picture data using smartphone cameras can provide detailed information about people’s everyday life and behavior. These data can potentially reduce respondent burden, since fewer questions have to be asked, and improve measurement accuracy for phenomena that are difficult to express in words. However, respondents have to be willing to take pictures and share them with researchers. Previous studies show that hypothetical willingness to use one’s smartphone camera to take pictures is rather high, but when it comes to actually taking and sharing them with researchers, the rates depend on the content of the picture. To our knowledge, no studies have examined the data quality and the usefulness of picture data to answer substantive research questions. To study the willingness of participants to take pictures in the context of a survey, we conducted a randomized experiment in a Dutch online panel of the general population. Smartphone owners (n about 1800) were assigned to one of three conditions: 1) providing picture data, 2) answering questions, 3) a choice between providing pictures and answering questions. Within the survey on living conditions, respondents were asked to take pictures of their garden/balcony, their heating appliance, and their favorite place in the house. The first two pictures aim at estimating the green space available to respondents and the type of heating. Respondents collected about 3.8 GB of picture data. First, we assess the willingness rates depending on the framing of the request (giving respondents a choice vs. not). Second, drawing on the rich data from the previous panel waves, we study the selectivity that the request (offering a choice vs. not) produces. Third, we assess the picture data quality depending on the condition respondents were in.
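The three-condition design described in this abstract can be illustrated with a small random-assignment sketch. This is a generic balanced randomization under stated assumptions, not the procedure the authors used: the condition labels, the exact sample size of 1800, the seed, and the function name are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical labels for the three experimental conditions.
CONDITIONS = ("pictures", "questions", "choice")


def assign_conditions(respondent_ids, seed=0):
    """Randomly assign respondents to conditions with near-equal group sizes."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled respondents round-robin into the conditions.
    return {
        rid: CONDITIONS[i % len(CONDITIONS)]
        for i, rid in enumerate(shuffled)
    }


# Assumed sample size; the abstract only says "n about 1800".
assignment = assign_conditions(range(1800))
group_sizes = Counter(assignment.values())
print(group_sizes)
```

Shuffling first and then dealing round-robin keeps the groups balanced in size while leaving each respondent's condition random.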