Poster session 2
* Marked poster recordings will not be available beyond Nov. 20
|Friday 20th November, 11:45 - 13:15|
Improved regression-in-ratio estimators in estimating the population means in simple random sampling with empirical extreme maximum and minimum values in survey statistics
Dr Peter Ogunyinka (Olabisi Onabanjo University, Ago-Iwoye, Nigeria) - Presenting Author
Mr Emmanuel Ologunleko (University of Ibadan, Ibadan, Nigeria)
Professor Ademola Sodipo (University of Ibadan, Ibadan, Nigeria)
Significant improvements have been made to regression-in-ratio estimators in simple random sampling in survey statistics. However, such estimators over-estimate or under-estimate in the presence of an extreme maximum or minimum value in the survey data, respectively. This study proposes three regression-in-ratio estimators (¯y_1, ¯y_2 and ¯y_3) that correct the over-estimation or under-estimation that arises when there are extreme values in the survey data. The bias and Mean Square Error (MSE) expressions are established. Theoretical comparison confirms the conditional efficiency of the proposed estimators relative to the reviewed estimators. A further empirical comparison, with twenty-six simulated populations comprising high and low extreme maximum values, ascertains the asymptotic sensitivity of the proposed estimators to different magnitudes of extreme values. Two of the three proposed estimators (¯y_1 and ¯y_2) proved more biased than the corresponding reviewed estimators, while one (¯y_3) proved less biased than the corresponding reviewed estimator. The proposed estimators are asymptotically efficient, with smaller variances and MSEs than the reviewed estimators. Finally, the ranking of percentage relative efficiency shows that the three proposed estimators (¯y_1, ¯y_2 and ¯y_3) are 120%, 119% and 120% efficient, respectively, over the corresponding reviewed estimators. A sample survey method to test for significant extreme values before applying an extreme-value correction is suggested for further study.
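The abstract does not give the formulas of the proposed estimators, but the sensitivity they address can be illustrated with the classical ratio estimator on simulated data. This is a sketch with made-up numbers, not the authors' estimators:

```python
import random

def ratio_estimate(y_sample, x_sample, x_pop_mean):
    """Classical ratio estimator of the population mean of y:
    ybar_R = (ybar / xbar) * Xbar, using a known auxiliary mean Xbar."""
    ybar = sum(y_sample) / len(y_sample)
    xbar = sum(x_sample) / len(x_sample)
    return (ybar / xbar) * x_pop_mean

random.seed(42)
# Auxiliary variable x and study variable y, roughly proportional.
x = [random.uniform(10, 20) for _ in range(200)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]
x_mean = sum(x) / len(x)

sample_idx = random.sample(range(200), 30)
ys = [y[i] for i in sample_idx]
xs = [x[i] for i in sample_idx]

clean = ratio_estimate(ys, xs, x_mean)
# Inject one extreme maximum y value into the sample: the estimator
# is pulled upward, the over-estimation the abstract describes.
ys_extreme = ys[:-1] + [10 * max(y)]
contaminated = ratio_estimate(ys_extreme, xs, x_mean)
print(clean, contaminated)
```

An extreme minimum value would pull the estimate downward in the same way, which is the symmetric under-estimation case.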
Saving sex for marriage: Understanding the complexity between sexual abstinence and marital bliss based on social media and survey data
Mr Emmanuel Olamijuwon (University of the Witwatersrand) - Presenting Author
Professor Clifford Odimegwu (University of the Witwatersrand)
Almost a third of females who have been in a relationship have experienced physical or sexual violence. Although several studies from diverse contexts have examined the predictors of intimate partner violence, the pervasiveness of the problem highlights the need for more research to identify possible mechanisms through which the problem persists.
This study combines social media data with the 2018 Nigeria Demographic and Health Survey (NDHS). In our analysis, we use causal loop diagrams (CLDs) to illustrate, from young people’s perspectives, the pathways through which saving sex until marriage might contribute to marital bliss. The Facebook group comprised more than 176,461 young adults, mostly from African countries, and yielded a total of 3,482 posts and comments related to sexual abstinence and marital bliss between June 1, 2018 and May 31, 2019. We used survey data from 1,817 couples sampled in the NDHS to validate the CLD and the underlying conceptual thinking.
Young adults believe that sexual abstinence could affect the timing of the first marital birth and lead to sexual and marital satisfaction for women. Women who save sex for marriage are also perceived to have high levels of self-control and discipline, which may reduce their likelihood of engaging in marital infidelity, boost their partner’s trust and confidence, and reduce the likelihood of partner control. Women who have their partner’s trust and confidence are also unlikely to experience any form of physical, emotional or psychological violence by their husband, all of which contribute to marital satisfaction.
Binomial logistic regression models were fitted to examine associations between couples’ premarital sexual experience and women’s experience of intimate partner violence. We observed that unions in which only the woman was a virgin do not differ significantly from unions in which neither partner was a virgin in terms of women’s experience of sexual, emotional and physical violence, adjusting for covariates. On the other hand, women in unions in which only the partner had no premarital sexual experience are less likely to experience emotional (β = -0.62, CI: -1.09; -0.16) or physical (β = -0.93, CI: -1.50; -0.36) violence than women in unions in which both partners have some premarital sexual experience. Similarly, women in unions in which neither partner had premarital sexual experience are less likely to experience emotional (β = -0.57, CI: -0.99; -0.15) or physical (β = -1.05, CI: -1.54; -0.55) violence than women in unions in which both partners have some premarital sexual experience.
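A minimal sketch of the kind of binomial logit behind such estimates, fitted by Newton-Raphson on simulated data. The union-type coding mirrors the comparison above, but the data and coefficients are illustrative, not the NDHS estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Union type: 0 = both partners sexually experienced (reference),
# 1 = only woman virgin, 2 = only partner virgin, 3 = both virgins.
union = rng.integers(0, 4, n)
X = np.column_stack([np.ones(n)] + [(union == k).astype(float) for k in (1, 2, 3)])
# Simulated "true" effects on the log-odds of violence (made up):
beta_true = np.array([-0.5, 0.0, -0.9, -1.0])
p = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

# Newton-Raphson iterations for the logit MLE (what a stats package
# does internally when fitting a binomial logistic regression).
beta = np.zeros(4)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    W = mu * (1 - mu)
    grad = X.T @ (y - mu)
    H = (X * W[:, None]).T @ X
    beta += np.linalg.solve(H, grad)

# 95% Wald confidence intervals from the inverse observed information.
se = np.sqrt(np.diag(np.linalg.inv(H)))
ci = np.column_stack([beta - 1.96 * se, beta + 1.96 * se])
print(beta.round(2))
```

With enough observations the fitted coefficients recover the simulated negative effects for the "only partner virgin" and "both virgins" categories, the same pattern of contrasts reported above.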
Our study holds important implications for the design of effective interventions addressing cultural attitudes and women’s experience of intimate partner abuse. It also contributes to scholarship by showing that social media data such as Facebook offer a new possibility for understanding the complexity of social issues that affect women’s health and well-being. Furthermore, by adopting a systems thinking approach, we were able to make our assumptions explicit, identify new hypotheses and test them against survey data.
The ecosystem of technologies for data collection, survey and analysis in the social sciences
Dr Daniela Duca (SAGE Publishing) - Presenting Author
Mrs Katie Metzler (SAGE Publishing)
The growth in born-digital data, combined with increasingly accessible means of developing software, has resulted in a proliferation of tools to support the research lifecycle, especially for social data research. To understand the variety of tools and their key uses, we reviewed 418 software applications and packages used by social science researchers. This paper explores who leads the development of these tools, where the supporting communities and investors are, and what challenges users and creators face. Of the 418 tools we found, close to 50% are based in the United States, and just over 50% are free for researchers to use. The tools are developed by private companies (50%), big tech (5%), the public sector, or individuals as side projects (45%). Only 10% of the key people involved in designing and developing these tools were women. When it comes to supporting the development of research tools, a growing number of communities, organisations and consortia offer guidance, training and some form of financial support. Among these are the Software Sustainability Institute, the discipline-specific Digital Methods Initiative, NumFOCUS and Pelagios Commons, and the regional NeCTAR and CESSDA. Where tools have applicability or a primary focus in the business intelligence world, we find top venture capital firms involved, such as Sequoia and Index Ventures. With the exception of Prolific, only a few startups coming out of university incubators target academics specifically. Although large, our list of tools is biased towards English-based survey, social media data, text mining, annotation, and qualitative data analysis tools. Looking at surveying tools, we note that successful ones like Qualtrics, SurveyMonkey and Typeform do the basic job of survey management, offer an easier-to-use interface for designing complex questionnaires, interoperate with games and experimental sites, and help recruit participants.
The next most fascinating development, however, will address the effectiveness and efficiency of these surveys. Matt Salganik and his team developed allourideas.org, a free surveying tool that enables researchers to engage respondents in contributing to the survey’s development while also collecting answers. A group of computer scientists and social scientists from the University of Wisconsin-Madison developed NEXT, a surveying tool powered by an algorithm that adapts the survey sample and questions as more people answer them, getting better results faster without having to rerun the survey. Academics are at the forefront of these projects, developing new methodologies and tools that will eventually be taken up by the private sector. To do that more effectively, they need a community of users, financial support, and consortia and other organisations able to host and scale up their tools. This will ensure that more researchers can use and build on existing tools, and enable the development of sustainable models and the growth of the user community.
How delivering personalized survey results can change the attitude towards surveys: The case of developer ecosystem personalized infographics
Ms Aleksandra Makeeva (Junior marketing analyst, JetBrains s.r.o.) - Presenting Author
Mr Vladimir Volokhonsky (Analyst, JetBrains s.r.o.)
The number of surveys grows with each passing day. Surveys have become part of everyday life, from small feedback forms all the way through to complex censuses. We now find ourselves in a situation where the audience is tired of surveys and their willingness to participate has decreased dramatically. In these conditions, where the “survey brand” has lost popularity, it is getting harder to reach our audience and collect appropriate data.
Of course, there are survey panels and “professional” respondents, but these channels have a number of limitations. Also, there are some well-known ways to draw attention to our surveys and reach the target audience, for example, raffles, gifts, guaranteed monetary incentives, sample products. However, the use of such incentives also has limitations and may affect data quality.
Sharing the survey results and findings with respondents is also used to motivate people to participate, especially in academic surveys. Since people like to feel they are making a significant contribution by being involved, it seems to be an effective way of engaging respondents, which simultaneously resolves most of the problems connected with the use of panels and monetary prizes.
How can we make this communication through delivering survey results even more effective and enjoyable for all parties involved? We argue that in these terms surveys are very similar to marketing communication (even if the survey itself is not marketing). Over the last decade there has been a notable trend towards personalization among marketers all over the world. There is evidence of strong and measurable growth in loyalty towards brands that use personalization in their marketing communications. So why not use personalization for delivering survey results?
Personalization gives respondents more of an incentive to participate. Most importantly, it helps to leave a positive impression not only of the study but also of the very practice of taking surveys. Respondents no longer think filling out a survey is a waste of time because their individual responses barely affect the overall results; instead, they provide more meaningful answers in order to receive relevant personal results. Eventually, this could make people more loyal to the “survey brand”.
Not long ago, we began producing mostly automated personalized survey results in our company. In this talk we will explain the reasons for this decision in more detail, give an overview of the infrastructure that helps automate the process, and describe the challenges we faced. We will also compare the effectiveness of personalized results against general infographics and discuss how we could do better.
Predicting race and ethnicity from the sequence of characters in a name
Dr Gaurav Sood (Sunnyvale Labs) - Presenting Author
Mr Suriyan Laohaprapanon (Appeler, Inc.)
To answer questions about racial inequality, we often need a way to infer race and ethnicity from a name. Until now, the bulk of the focus has been on optimally exploiting the last-name list provided by the Census Bureau. But there is more information in first names, especially for African Americans. To estimate the relationship between full names and race, we exploit the Florida voter registration data and the Wikipedia data (Ambekar et al. 2009). In particular, we model the relationship between the sequence of characters in a name and race and ethnicity using Long Short-Term Memory (LSTM) networks. Our out-of-sample (OOS) precision and recall for the full-name model estimated on the Florida voter registration data are .83 and .84, respectively. This compares to OOS precision and recall of .79 and .81 for the last-name-only model. As expected, the gains are asymmetric. Recall is considerably better for Asians and non-Hispanic Blacks with the full name (.49 and .43, respectively, compared to .41 and .21). The precision with which we predict non-Hispanic Blacks is also considerably higher: it is 9 points higher for the full-name model. To illustrate the use of this method, we apply our approach to campaign finance data to estimate the share of donations made by people of various racial groups. We find that relying on the census last-name data understates racial differences in contributions because of its higher error rate. For instance, based on the census last-name data, in 2010 about 83.5% of contributions were made by Whites. The commensurate number based on the Florida full-name model was about 3 percentage points higher, at 86.5%. For Blacks, we see a similar story. Based on the census last-name data, about 10.2% of contributed money came from Blacks; based on the Florida full-name model, the number is about 2.3 percentage points lower, a sizable 22.2% relative change.
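The authors model the character sequence with LSTMs; as a lightweight stand-in, character n-grams feeding a logistic regression capture the same intuition that spelling patterns carry racial and ethnic signal. The names and labels below are invented for illustration, not drawn from the Florida data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set (hypothetical names and labels, for illustration only).
names = ["jose garcia", "maria hernandez", "juan martinez",
         "wei zhang", "mei chen", "hiroshi tanaka",
         "john smith", "emily johnson", "william brown",
         "lakisha washington", "deshawn jefferson", "keisha banks"]
labels = ["hispanic"] * 3 + ["asian"] * 3 + ["nh_white"] * 3 + ["nh_black"] * 3

# Character n-grams approximate the "sequence of characters" signal that
# the paper's LSTM learns end-to-end; this is a simpler stand-in model.
model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(names, labels)
print(model.predict(["carlos garcia", "li zhang"]))
```

With realistic training data (millions of voter records), evaluation would then report per-group precision and recall, as in the figures above.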
Survey of digital assets for project-based studies
Dr Gloria Miller (maxmetrics) - Presenting Author
Projects are a key vehicle for economic and social action, and a source of innovation, research, and organizational change. They can be the size of the gross domestic product of a small nation or larger than the organizations participating in them. In 2019, the World Bank assessed 214 projects worth approximately $71 billion (IEG World Bank Project Performance Ratings, 2020) and argued that private sector investment and the leveraging of digital technologies are crucial for boosting economic growth. Projects offer situations and digital assets that can be analyzed through a multitude of theoretical lenses. This research is a survey of the digital assets available through a project: specifically, what is available, why it would be a relevant source for research, and who could be the sources of the data.
A project is a temporary organization with a set of actors working together over a limited, pre-determined period of time for a given outcome. Its existence can be explained and studied using philosophical underpinnings such as the Newtonian understanding of time, space and activity; through archetypes of project contexts such as project-based organizations, project-supported organizations, or project networks; or through the investigation of project processes or actors (Geraldi & Söderlund, 2016; Jensen, Thuesen, & Geraldi, 2016; Lundin, 2016). Furthermore, projects can be used for studies of stakeholder engagement, project performance, and individual and group performance. The majority of project-based research relies on qualitative methods such as literature reviews, surveys, and case studies. Thus, there is a call for new research approaches that investigate the actual or lived experience (Drouin, Müller, & Sankaran, 2013; Geraldi & Söderlund, 2016).
The use of historical data from structured project repositories for cost estimation is well-known in project management. However, little project research uses text mining, machine learning, topic analysis, or social network analysis against those data sources for other research purposes. While email data has been used as an alternative to survey data, even that research has not fully exploited the data for insights into other social intricacies. This research uses a literature review and interviews to compile a survey of digital assets available for research through a project-context, including suggestions as to why the data source would be relevant and counterparties that may be able to provide the data.
Drouin, N., Müller, R., & Sankaran, S. (2013). Novel Approaches to Organizational Project Management Research: Translational and Transformational. Denmark: Copenhagen Business School Press.
Geraldi, J., & Söderlund, J. (2016). Project studies and engaged scholarship: Directions towards contextualized and reflexive research on projects. International Journal of Managing Projects in Business, 9(4), 767-797. doi:10.1108/IJMPB-02-2016-0016
IEG World Bank Project Performance Ratings. (2020). Retrieved from: http://ieg.worldbankgroup.org/ratings
Jensen, A., Thuesen, C., & Geraldi, J. (2016). The projectification of everything: Projects as a human condition. Project Management Journal, 47(3), 21-34.
Lundin, R. A. (2016). Project society: paths and challenges. Project Management Journal, 47(4), 7-15.
Developing a tool suite for managing large scale cross-national web surveys within the framework of the European Open Science Cloud
Dr Gianmaria Bottoni (ESS HQ, City University of London) - Presenting Author
Professor Rory Fitzgerald (ESS HQ, City University of London)
Professor Nicolas Sauger (Sciences Po)
Dr Genevieve Michaud (Sciences Po)
Dr Quentin Agren (Sciences Po)
The European Social Survey European Research Infrastructure Consortium recently experimented with the world's first input-harmonised, probability-based cross-national web panel in three countries by recruiting panel members who had taken part in the face-to-face survey. The experiment took place in Estonia, Great Britain and Slovenia (the CRONOS web panel).
A key challenge identified during the CRONOS experiments was the absence of a sample management system that was well suited for use in a multi-country environment and which could also meet data protection requirements. In addition, handling multiple language versions in a harmonised way proved difficult.
This paper will describe work developing a sample management system for a cross-national web panel that meets the needs of different surveys in a complex multi-national environment and which also links seamlessly to a survey platform. The work is being conducted under the Social Science and Humanities Open Science Cloud (SSHOC) H2020 project.
Proposals for content will be outlined, with the key fields for sample management presented. In addition, functionality such as contact modes (SMS, postal and e-mail) and user accounts will be discussed, and user profile rights will be outlined. The paper will discuss how the system links with the commercial software Qualtrics via its API. The approach taken for managing survey administration across these two tools in multiple countries will be showcased, with examples from testing for CRONOS-2, being conducted in 12 countries later in 2020, used to highlight the opportunities and limitations of the suite.
The tested software will in due course be made available on the SSH Open Marketplace and Workbench for installation by third parties, along with related documentation.
Validating measures of employment related information in surveys using linked administrative data
Dr Manfred Antoni (Institute for Employment Research (IAB)) - Presenting Author
Ms Nadine Bachbauer (Institute for Employment Research (IAB))
Dr Corinna Frodermann (Institute for Employment Research (IAB))
Dr Débora Maehler (GESIS - Leibniz-Institute for the Social Sciences)
Mr Knut Wenzig (German Institute for Economic Research (DIW))
Existing validation studies often use data sets that link survey data with other data sources at the level of individual respondents (see, e.g., Antoni et al. 2019, Bollinger 1998). Analyses then examine the extent and direction of measurement errors in survey data, usually by determining deviations from information in other data sources that are known to be less affected by measurement errors for certain variables (e.g., earnings, frequency and dating of events or duration of labour market conditions). The common feature of such analyses is that they are limited to the measurement errors in single or very few variables and to the information collected in only one survey. This, however, affects the generalizability and the potential benefit of such results for questionnaire design or practical fieldwork of future surveys. For example, a limitation to the validation of an individual study makes it more difficult to assess the extent of the measurement error determined in comparison to other surveys. For survey practice, on the other hand, it would be important to learn which different ways of collecting a variable (e.g., question wording, filtering questions, use of preloads as memory anchors) lead to the smallest deviation from the "true" value.
To close this gap, we compare measurement errors for a number of employment related variables across different surveys. In order to keep the data with which we compare the survey information consistent, we limit our analyses to survey datasets that have been linked to the same source of German administrative data, the Integrated Employment Biographies of the Institute for Employment Research (IAB). By doing so, we are able to include survey data of the National Educational Panel Study (NEPS), the Panel Study Labour Market and Social Security (PASS), the Programme for the International Assessment of Adult Competencies-Longitudinal (PIAAC-L) study and the Socio-Economic Panel (SOEP) in our analyses.
One of the methodological challenges is to identify employment related variables that are included in all five data sources and measured in a comparable way. This enables us to harmonise these variables and to compare the following employment related variables: earnings, dating (start/end), duration or frequency of unemployment, job search or participation in active labour market policy measures. These variables are collected very reliably in the administrative data of the IAB, which is why the information contained there can be regarded as ground truth data.
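The validation step reduces to linking records by person identifier and measuring the deviation of the survey report from the administrative benchmark. A toy sketch with invented earnings values (the real IAB linkage is far more involved):

```python
import pandas as pd

# Illustrative linked data: survey-reported vs administrative earnings.
survey = pd.DataFrame({"pid": [1, 2, 3, 4],
                       "earnings_survey": [2300, 3100, 1950, 4200]})
admin = pd.DataFrame({"pid": [1, 2, 3, 4],
                      "earnings_admin": [2400, 3000, 1950, 3900]})

linked = survey.merge(admin, on="pid", validate="one_to_one")
# Treat the administrative record as the "ground truth" benchmark.
linked["error"] = linked["earnings_survey"] - linked["earnings_admin"]
linked["rel_error"] = linked["error"] / linked["earnings_admin"]
print(linked[["pid", "error", "rel_error"]])
print("mean signed error:", linked["error"].mean())
```

Repeating this computation across harmonised variables from NEPS, PASS, PIAAC-L and SOEP, all linked to the same administrative source, is what allows the cross-survey comparison of measurement error described above.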
Antoni, M., Bela, D., & Vicari, B. (2019). Validating Earnings in the German National Educational Panel Study. Determinants of Measurement Accuracy of Survey Questions on Earnings. methods, data, analyses, 13(1), 59-90. doi:10.12758/mda.2018.08
Bollinger, C. R. (1998). Measurement Error in the Current Population Survey: A Nonparametric Look. Journal of Labor Economics, 16(3), 576-594. doi:10.1086/209899
The impact of regional holidays on early job-seeker registrations
Dr Gerald Seidel (Federal Employment Agency) - Presenting Author
Early job-seeker registrations, which are enforced by the German Social Code and reflect the number of terminated job contracts, are an important indicator of the labor market. Therefore, I analyse the impact of regional holidays on this indicator. I approximate the number of early job-seeker registrations by the entries of employed persons to the official status ‘job-seeking’ at the Federal Employment Agency.
The results of my RegARIMA analysis indicate that most regional holidays significantly (and to a plausible degree) reduce the number of early job-seeker registrations. In contrast, only Reformation Day turns out to be insignificant for most German Länder (states). I check the latter result for robustness by exploiting the variation in regional holiday legislation due to the 500th Reformation anniversary. The overall missing effect of Reformation Day on early job-seeker registrations might (partly) be explained by the fact that it coincides with the last day of a month (October 31st).
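The regression part of such a RegARIMA model can be sketched as OLS with a holiday dummy among the regressors; this sketch uses simulated registration counts and omits the ARIMA error term:

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 400
t = np.arange(n_days)
holiday = np.zeros(n_days)
holiday[rng.choice(n_days, 12, replace=False)] = 1.0  # regional holidays

# Simulated daily registrations: level + weekly cycle + holiday dip + noise.
true_holiday_effect = -120.0
y = (500 + 30 * np.sin(2 * np.pi * t / 7)
     + true_holiday_effect * holiday + rng.normal(0, 15, n_days))

# OLS with the holiday dummy and a weekly harmonic as regressors
# (a RegARIMA model would additionally model the residual as ARIMA).
X = np.column_stack([np.ones(n_days),
                     np.sin(2 * np.pi * t / 7), np.cos(2 * np.pi * t / 7),
                     holiday])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(beta[-1], 1))  # estimated holiday effect on registrations
```

The coefficient on the dummy is the estimated holiday effect; an insignificant coefficient, as found for Reformation Day, would show up as an estimate indistinguishable from zero relative to its standard error.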
Combining survey data and bibliographic records: Advantages and obstacles
Mr Justus Rathmann (University of Zurich) - Presenting Author
Professor Heiko Rauhut (University of Zurich)
Knowledge is codified and communicated through scientific publications. In addition, authorship of publications enables ideas to be attributed to scientists and to generate a reputation for them. This abstract focuses on the question of how best to combine survey and publication data and furthermore how to address the problems in the process of combining these data sources.
Bibliometrics is concerned with the statistical analysis of (scientific) publications, such as books, journal articles or patents. Studies of bibliometrics often have two conflicting goals: if one wants to maximize the information on authors, one is dependent on surveys; if one wants to maximize the information on publications, one is dependent on publication records (e.g. Web of Science or Scopus). Both data sources have different advantages and disadvantages, thus, combining both data sources enables research questions to be answered that were previously considered difficult to address.
In surveys, data on the socio-demographics and attitudes of authors can easily be collected. However, the collection of publication data is often subject to errors and does not yield much detailed information. Very experienced scientists, in particular, can hardly estimate how many publications or citations they have accumulated. Information on article metadata, such as the number of authors or the journal of an article, cannot in most cases be reliably collected for several different articles. In surveys alone, therefore, only superficial bibliometric analyses are feasible.
For studies based only on publication data sets, however, the reverse is true. Although the gender and age of an author can be approximated using their first name or year of first publication, it is not possible to obtain attitudes or more personal data. Instead, these data sets offer a great variety of publication-specific information on, for example, author order, keywords, or citations.
While there are clear benefits from combining these data sources, there are nevertheless some challenges. Besides legal issues, such as data protection legislation, the unambiguous identification of authors is complicated. When authors share the same name, publications can easily be misassigned. This can be mitigated by plausibility checks using survey data. Nevertheless, assignment issues remain for authors who have changed their names, for example by marriage, or who have changed the use of their first names.
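Such a plausibility check can be sketched as follows; the similarity measure, thresholds and records here are illustrative, not the authors' actual linkage procedure:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Crude string similarity between two author name strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def plausible_match(survey_name, survey_affiliation, candidates,
                    name_threshold=0.85):
    """Keep bibliographic author records whose name is close enough and
    whose affiliation agrees with the survey answer (illustrative rule)."""
    return [c for c in candidates
            if name_similarity(survey_name, c["name"]) >= name_threshold
            and survey_affiliation.lower() in c["affiliation"].lower()]

candidates = [
    {"name": "A. Mueller", "affiliation": "University of Zurich"},
    {"name": "A. Mueller", "affiliation": "TU Munich"},
    {"name": "B. Schmidt", "affiliation": "University of Zurich"},
]
matches = plausible_match("A. Mueller", "Zurich", candidates)
print(matches)
```

The survey-provided affiliation disambiguates the two same-named authors, which is exactly the kind of misassignment the check is meant to catch.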
In addition, the response rate to surveys among scientists is often rather low. Because respondents may be reluctant to give their permission to link their survey data to external data sources, it is important to have a large survey sample.
We invited 150,000 scientists working in Austria, Germany, and Switzerland to participate in the Zurich Survey of Academics, which focuses on work and research environments, norms and practices of authorship, publication strategies, publication bias, and scientific misconduct. At the end of the survey, participants were asked whether they agreed to link their survey data with publication data and with data provided on their websites. This new data set thus offers a unique combination of innovative survey methods, publication data and publicly available website data.
Big analytics and segmentation: A new framework for single datasets and for integrating survey and big data
Dr Richard Timpone (Ipsos) - Presenting Author
Mr Jonathan Kroening (Ipsos)
As Big Data altered the face of research, the same defining factors of Volume, Velocity and Variety reflect changes in opportunities of analytic data exploration as well. Improvements in algorithms and computing power provide the foundation for platforms that explore masses of models to identify new insights for research goals. We previously introduced this as the concept of Big Analytics (Timpone, Yang and Kroening 2018).
Extending the idea of Big Analytics, we developed our Segmentation Evaluation System (SES) to evaluate over 1000 different solutions from a single dataset to identify those best suited for research goals. This provides new opportunities in both social science research and business settings for deeper understanding of differences among groups of individuals.
The array of models is evaluated on success criteria chosen for the specific research problem. These include criteria focused on single-dataset evaluation, such as segment cohesion or the number of features needed to create a typing tool, but also allow explicitly identifying solutions that both meet research standards for the segmented dataset and perform well when linking to other survey and Big Data databases.
This new conceptual framework contrasts with traditional approaches that run single solutions (whether with traditional methods like k-means or ML solutions like Self-Organizing Maps). No one method is best suited across the board and this approach provides fit for purpose methods to bridge the art and science of segmentation.
The SES platform includes diverse clustering methods (k-means, hierarchical clustering, ensemble clustering, Self-Organizing Maps, latent class, affinity propagation, non-negative matrix factorization among others) and multiple distance measures (Euclidean, angular distance, generalized distance measures and random forest dissimilarity).
Given the large number of segmentation solutions, the key becomes their evaluation. We demonstrate how different solutions vary as the research goals of a segmentation change. Criteria for success in the framework include factors such as segment differentiation and cohesion, that are central to segmentation, as well as criteria such as profiling variable ownership, segment reproducibility, and practical criteria like the overall fit of survey questions to create a typing tool and how few items can be used to create a robust typing tool.
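The evaluate-many-solutions idea can be sketched with off-the-shelf clustering methods and a single cohesion/separation criterion (silhouette); the SES platform described above scores many more methods, distance measures and criteria at once:

```python
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data standing in for a segmentation basis (e.g. survey attitudes).
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Run an array of candidate solutions and score each one.
solutions = {}
for k in (2, 3, 4, 5, 6):
    for name, model in [
        (f"kmeans_k{k}", KMeans(n_clusters=k, n_init=10, random_state=0)),
        (f"ward_k{k}", AgglomerativeClustering(n_clusters=k)),
    ]:
        labels = model.fit_predict(X)
        solutions[name] = silhouette_score(X, labels)

best = max(solutions, key=solutions.get)
print(best, round(solutions[best], 3))
```

In a full evaluation system the ranking would combine several criteria (cohesion, reproducibility, typing-tool length, linkage performance) rather than a single score, and the "best" solution would depend on the research goal.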
These latter criteria make this Big Analytics approach superior to traditional segmentations for integrating survey data with other sources, including Big Data. We have successfully leveraged SES for online behavioral segmentations that also differentiate on attitudinal items from a survey to ensure actionability. In the other direction, creating need- and attitude-based survey segments and using criteria on how well they type on hooks that link to other types of databases (from CRM to online media personas) ensures better linkage than building a segmentation and then trying post hoc to predict segments in other databases.
Beyond the theory, we show how this framework has been used in practical cases to identify more actionable solutions in general as well as linking across databases as a clear exemplar of the vision of BigSurv, and the explicit linkage of Survey and Big Data.
From big data to (trusted) smart surveys
Professor Markus Zwick (Institute for Research and Development in Federal Statistics) - Presenting Author
Ms Shari Stehrenberg (Institute for Research and Development in Federal Statistics)
In order to reap the benefits of the data revolution, the European Statistical System (ESS) launched the ESSnet Big Data I and as a follow-up project, the ESSnet Big Data II based on the Scheveningen Memorandum. Both ESSnet projects have their foundation in the Big Data Action Plan and Roadmap (BDAR), which was adopted by the ESS in 2014.
The overall objective of both ESSnet Big Data projects is to further prepare the ESS for integrating big data sources into the production of official statistics. Meanwhile, the ESS has explored various non-traditional data sources and the first results have been delivered. Some National Statistical Institutes, as well as Eurostat, have established sections on their websites to publish the results of these experimental statistics.
Both ESSnet Big Data projects are data-related, with a focus on how new and non-traditional data sources could be integrated into the production of official statistics. With the Bucharest Memorandum, the ESS took the step forward from Big Data to Trusted Smart Statistics (TSS). With the TSS concept, the research interest is wider: besides new non-traditional data, it is also relevant how the ESS can use digitalisation to further enhance the production of traditional data such as surveys.
With the ESSnet Smart Surveys 2020-2021, twelve NSIs started a project to research the opportunities of mobile applications to further digitalise surveys. The Federal Statistical Office of Germany coordinates the project.
By the term “smart surveys” we refer to surveys that use smart personal devices, equipped with sensors and mobile applications. The concept of smart surveys goes well beyond the mere use of web-based (online) data collection that essentially transforms the paper questionnaire into an electronic version. Smart surveys involve dynamic and continuous interaction with the respondent and with his/her personal device(s).
The term “trusted smart surveys” refers to an augmentation of smart surveys by technological solutions that collectively increase their degree of trustworthiness and hence their acceptance by citizens. Constituent elements of a trusted smart survey are the strong protection of personal data based on privacy-preserving computation solutions, and full transparency and auditability of the processing algorithms. (Trusted) smart surveys will increase the attractiveness of participating in a survey, not only because they reduce the time needed to fill out a questionnaire, but also because participants receive individualised incentives.
A second goal of the ESSnet Smart Surveys is to define the specifications for a European Platform supporting the use of shared smart survey solutions and furthermore to assess the usage of applications for European social surveys, such as the Time Use Survey (TUS) or the Household Budget Survey (HBS). Both surveys are considered to be quite burdensome to respondents and to be prone to low recall as well as underreporting errors.
The presentation will give insight into the ESS concept of Trusted Smart Statistics, with a special focus on how this concept is used within the ESSnet Smart Surveys.
Predicting basic human values from digital traces on social media
Mr Mikhail Bogdanov (National Research University) - Presenting Author
A number of studies demonstrate that some human traits and attributes are predictable from digital traces on social media; probably the most studied are personality traits, which meta-analyses have shown to be predictable from such traces. However, only a few studies attempt to predict human values from digital traces, although values are more socially constructed than personality traits and might therefore be reflected in people’s social media profiles. Moreover, most studies use digital traces from globally popular platforms such as Facebook and Twitter; substantially fewer studies employ data from local social media websites.
In this study, we try to fill these gaps by predicting Schwartz’s Basic Human Values using digital traces from the Russian social network platform “Vkontakte” (an analogue of Facebook). Our analysis is based on data from a nationally representative cohort panel study, “Trajectories in Education and Careers” (TrEC), which follows the cohort of eighth graders of 2011 who participated in the international study “Trends in International Mathematics and Science Study” (TIMSS). The average age of these respondents is now 23 years. We use survey data from the most recent wave, with values measured by Schwartz’s Basic Human Values approach, together with subscriptions to public pages and groups on Vkontakte. Vkontakte is the leading social media platform in Russia, with over 90% of Russian youth registered on it.
We employed different machine learning algorithms (random forest, boosting, regularized regression, etc.) to predict the Basic Human Values from subscriptions to public pages on this platform and found that values can be predicted from digital traces with accuracy similar to that reported for personality traits.
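The prediction setup described above can be sketched in miniature: respondents are rows of a binary subscription matrix, a value score is the target, and a regularized regression (one of the algorithm families the authors name) maps subscriptions to predicted scores. Everything here is simulated and hypothetical; the sample sizes, the ridge penalty, and the use of a predicted-observed correlation as the accuracy measure are assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 500 respondents x 40 public pages (binary subscriptions)
n, p = 500, 40
X = rng.binomial(1, 0.3, size=(n, p)).astype(float)
true_w = rng.normal(0, 1, p)
y = X @ true_w + rng.normal(0, 2.0, n)   # stand-in for a Schwartz value score

# Hold-out split
X_tr, X_te, y_tr, y_te = X[:400], X[400:], y[:400], y[400:]

# Ridge (regularized) regression in closed form:
# w = (X'X + lambda*I)^-1 X'y
lam = 1.0
w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ y_tr)

# Accuracy as the correlation between predicted and observed scores,
# the metric commonly reported in this literature
r = np.corrcoef(X_te @ w, y_te)[0, 1]
```

With real subscription data the feature matrix is far wider and sparser, which is exactly why regularization (or tree ensembles such as random forests and boosting) is needed.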
Using spatial big data to unpack neighbourhood effects on social wellbeing
Professor Chan-Hoong Leong (Singapore University of Social Sciences) - Presenting Author
This study examines how the social environment of neighbourhoods shapes social trust, immigrant perception, and emotional resilience. The social environment is defined from the geo-locations of public residential apartments known to have a high concentration of ethnic minorities, immigrant groups, and low housing resale prices (i.e., lower-income neighbourhoods). The data measuring the spatial clustering of ethnic and migrant communities and of residents of lower socio-economic status are obtained from various online platforms managed by the Singapore housing authorities. Using Geographic Information Systems, the spatial data are first transformed into a continuous raster format, and then overlaid and integrated with a large national survey measuring various aspects of social and individual wellbeing, including social trust, emotional resilience, and support for multiculturalism. The combination of survey and spatial big data demonstrated a complex web of mutual interdependence between individual profiles, measured in the survey, and environmental features. Neighbourhoods with a higher concentration of minority ethnic groups reported lower social trust. On the other hand, neighbourhoods with higher immigrant density demonstrated strong mutual trust, emotional resilience, and support for multicultural policies. The presence of immigrants in a neighbourhood moderated the impact of minority ethnic concentration. The findings are discussed in the context of Singapore’s public housing policies, the limitations of traditional multilevel research models, and the modifiable areal unit problem.
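The rasterize-then-overlay step described in the abstract can be sketched as follows: point-level attributes are averaged into grid cells, and each survey respondent is then assigned the value of the cell they fall in. This is a minimal numpy sketch with fabricated coordinates on an arbitrary 10x10 grid; real GIS work would use projected coordinates and dedicated tooling, and the grid size and attribute are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical flats with coordinates and a minority-share attribute
flat_xy = rng.uniform(0, 10, size=(1000, 2))
minority_share = rng.uniform(0, 1, 1000)

# Rasterize: average the attribute within each cell of a 10x10 grid
cells = np.floor(flat_xy).astype(int)
raster_sum = np.zeros((10, 10))
raster_n = np.zeros((10, 10))
np.add.at(raster_sum, (cells[:, 0], cells[:, 1]), minority_share)
np.add.at(raster_n, (cells[:, 0], cells[:, 1]), 1)
raster = np.divide(raster_sum, raster_n,
                   out=np.zeros_like(raster_sum), where=raster_n > 0)

# Overlay: look up the raster value at each survey respondent's location,
# attaching a neighbourhood-context variable to each survey record
resp_xy = rng.uniform(0, 10, size=(200, 2))
ij = np.floor(resp_xy).astype(int)
resp_context = raster[ij[:, 0], ij[:, 1]]
```

The choice of cell size in this step is exactly where the modifiable areal unit problem mentioned at the end of the abstract enters: the same points aggregated on a coarser or shifted grid can yield different contextual values.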
Measuring attitudes and behaviors toward the 2020 Census across time
Dr Yazmin Trejo (U.S. Census Bureau) - Presenting Author
Mrs Jennifer Hunter Childs (U.S. Census Bureau)
As part of the effort for the 2020 Census in the United States, researchers designed a survey called the “2020 Census Attitudes Survey.” The goal of this survey is to track public opinion before and during the census data collection, the bulk of which is scheduled to take place from March through July. The survey was conducted monthly from September to December 2019 and will be conducted weekly from January to June 2020, in English and Spanish. This paper reports on measured survey trends, including intention to participate in the census, census awareness, knowledge, and potential participation concerns, for the general population and across groups (e.g., age, education, sex, race, and ethnicity). The measurement of behaviors and attitudes associated with the intention to participate will serve as a baseline to inform day-to-day decisions for the operations of the communications campaign. The survey uses a combination of a nationally representative telephone sample of the U.S. population and a nonprobability sample drawn from online panels. This survey offers a unique opportunity to compare public opinion with actual participation in a mandatory civic activity: the act of filling out the census.
Subjective wellbeing and the intention to emigrate: A cross-national analysis of 157 countries, 2006-2017
Dr Tatiana Karabchuk (UAE University) - Presenting Author
Dr Marina Selini Katsaiti (UAE University)
Mrs Karin Johnson (University of California Riverside)
The core of the migration literature examines the processes by which people migrate and their experiences during and after migration. However, there is little work that explains what factors influence whether a person intends to emigrate to another country. This study addresses that gap by investigating to what extent individual subjective wellbeing and the broader social environment affect the likelihood that someone wishes to leave their home country. As a first step, the paper fits hierarchical linear models to Gallup Poll data across 157 countries for the years 2006 to 2017; as a second step, it tests machine learning models on the big cross-national data across years. We hypothesize that greater levels of subjective wellbeing will reduce the intention to migrate abroad, but that even when wellbeing is high, people in a restrictive or ineffective social context will be more likely to wish to migrate than residents of a country with a more effective social system. Furthermore, we hypothesize that the results will show a gradient of intentionality based on the region in which a person lives. These findings have three implications: first, they describe patterns of migration and how they change over time in relation to individual- and country-level factors; second, they broaden our understanding of migration push factors beyond economic hardship or conflict; and third, they suggest how existing programs in a home country might be modified to improve welfare, and how reception policies in destination countries might facilitate migrants’ social, economic, and cultural contribution. An additional methodological contribution of this paper lies in its comparative test: traditional econometric models vs. machine learning techniques.
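The first analytical step above, a hierarchical (multilevel) model of individuals nested in countries, can be sketched via its simplest form: a random-intercept model, approximated here by partial pooling of country means toward the grand mean. The data are fabricated, the variances are crude assumptions, and this empirical-Bayes shrinkage is only an illustration of the idea behind hierarchical linear models, not the authors' estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: respondents nested in countries; outcome is intention
# to emigrate, predictor is subjective wellbeing (both standardized)
n_countries, n_per = 30, 100
country = np.repeat(np.arange(n_countries), n_per)
alpha = rng.normal(0, 0.5, n_countries)            # true country intercepts
wellbeing = rng.normal(0, 1, n_countries * n_per)
intent = alpha[country] - 0.4 * wellbeing + rng.normal(0, 1, country.size)

# Random-intercept idea via partial pooling: shrink each country mean
# toward the grand mean in proportion to its reliability
grand = intent.mean()
country_means = np.array([intent[country == c].mean()
                          for c in range(n_countries)])
sigma2_within = 1.0                    # assumed residual variance (crude)
tau2_between = country_means.var()     # crude between-country variance
shrink = tau2_between / (tau2_between + sigma2_within / n_per)
alpha_hat = grand + shrink * (country_means - grand)
```

With only a handful of respondents per country the shrinkage factor drops and country estimates are pulled strongly toward the grand mean, which is the practical advantage of multilevel models over fitting each country separately in unbalanced cross-national data like the Gallup Poll.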
Quality guidelines for the acquisition and usage of big data
Dr Alexander Kowarik (Statistics Austria) - Presenting Author
Dr Magdalena Six (Statistics Austria)
The increasing knowledge and experience within the European Statistical System (ESS) in the acquisition, processing, and use of new data sources now provides a clearer picture of quality demands. These quality-related experiences are used by the ESSnet Big Data II to formulate guidelines for NSIs that already use and/or plan to use new data sources for the production of official statistics. Looking at the statistical production process, the usage of new data sources mostly affects quality aspects of processes related to input and throughput. Taking this into account, the guidelines concentrate on the input and throughput phases of the statistical production process.
With new data sources, access to input data as well as its processing makes it necessary to consider new and very source- and data-specific sub-processes. The variety of sub-processes is much broader than with traditional data sources: what is relevant for one data class and one form of data access might be of no interest for others. We therefore decided to develop a modular approach for the structure of the quality guidelines, allowing producers to focus on the guidelines relevant for the intended form of data access and the intended data usage, taking into account the peculiarities of a specific data class.