Getting your estimates on point! Even more survey calibration approaches in the era of probability and nonprobability surveys

Friday 20th November, 11:45 - 13:15

Investigation of nonresponse bias and representativeness in the first cross-national probability based online panel (CRONOS)

Dr Olga Maslovskaya (University of Southampton) - Presenting Author
Dr Peter Lugtig (Utrecht University)
Professor Gabi Durrant (University of Southampton)

Relevance and Research Question: We live in a digital age in which the use of technology is widespread, and surveys have begun to adopt technology for data collection. Falling response rates and pressure to reduce survey costs are driving a move towards online data collection across the world. Evidence is needed to demonstrate that an online data collection strategy works and produces reliable data that can be used with confidence for policy decisions. No research has so far assessed nonresponse bias and representativeness in online probability-based panels. This paper addresses that gap by exploring representativeness and nonresponse bias across waves in the first cross-national online probability-based panel (CRONOS).
Methods and Data: CRONOS panel data were collected on the back of the European Social Survey (ESS) Round 8. After completing the ESS face-to-face interview, respondents in three countries (Estonia, Great Britain and Slovenia) who were 18 years old or older were invited to participate in seven 20-minute online surveys over a time period of twelve months.
We employ R-indicators as well as other indicators to assess representativeness across waves in a cross-national context in the CRONOS data. The analysis allows the results to be compared over time as well as across the three countries.
Results: The results suggest that R-indicators are important tools for assessing representativeness across time and between countries, and that they should be used in survey practice to monitor and improve representativeness during data collection for online panels. Representativeness differs over time within each country and also across the three countries analysed. In all three countries, respondents with lower levels of education and those in the oldest age category contribute more to the lack of representativeness; in Great Britain, the youngest age group does as well.
Added value: This paper proposes an innovative approach to the use of R-indicators in the context of representativeness. This approach can be adopted in survey practice to monitor and improve representativeness. Recommendations are provided for future probability-based online panels.
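
For readers unfamiliar with R-indicators, the sketch below may help. It assumes the standard definition R = 1 - 2S(ρ), where S(ρ) is the (design-weighted) standard deviation of the estimated response propensities ρ. The function name and the propensity values are illustrative assumptions and are not taken from the CRONOS analysis.

```python
import numpy as np

def r_indicator(propensities, weights=None):
    """R = 1 - 2 * S(rho): 1 means every unit has the same response propensity."""
    rho = np.asarray(propensities, dtype=float)
    w = np.ones_like(rho) if weights is None else np.asarray(weights, dtype=float)
    mean_rho = np.average(rho, weights=w)
    # Weighted variance of the propensities with an (N - 1) denominator
    variance = np.sum(w * (rho - mean_rho) ** 2) / (np.sum(w) - 1.0)
    return 1.0 - 2.0 * np.sqrt(variance)

# Hypothetical estimated propensities for one wave (made-up numbers)
r_equal = r_indicator([0.6, 0.6, 0.6, 0.6])      # no variation in propensities
r_unequal = r_indicator([0.9, 0.9, 0.3, 0.3])    # strong variation in propensities
print(r_equal, r_unequal)
```

In practice the propensities would come from a response model (for example a logistic regression on auxiliary variables), and the indicator would be recomputed for each wave and country to track representativeness.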

Retrospective causal inference via elapsed time-weighted matrix completion, with an evaluation of the effect of the Schengen Area on the labour market of border regions

Dr Jason Poulos (Duke University) - Presenting Author

We propose a strategy of retrospective causal inference in panel data settings where (1) there is a continuous outcome measured before and after a single binary treatment; (2) there exists a group of units exposed to treatment during a subset of periods (switch-treated) and a group of units always exposed to treatment (always-treated), but no group that is never exposed to treatment; and (3) the elapsed treatment duration, z, differs across groups.

The potential outcomes under treatment for the switch-treated in the pre-treatment period are missing, and we impute these values via nuclear-norm regularized least squares using the observed (i.e., factual) outcomes. The imputed values can be interpreted as the counterfactual outcomes of the switch-treated had they been always-treated. Differencing the counterfactual outcomes from the factual outcomes can be interpreted as the effect of not having assigned treatment to the switch-treated in the pre-treatment period.

A possible complication for our strategy arises when the evolution of the potential outcomes under treatment for the two groups might not be only influenced by calendar time, but also by z. The latter is particularly important if the treatment effect takes time before stabilizing in a new “steady state” equilibrium. We address this problem by weighting the loss function of the matrix completion estimator so that more weight is placed on the loss for factual outcomes with higher values of z.
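
The weighted estimator described above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the objective 0.5 * Σ w_it (Y_it - L_it)² + λ‖L‖_* over observed entries, solves it by proximal gradient descent with singular value soft-thresholding, and uses a hypothetical weight function that increases linearly in z. The toy data, the start value of z for always-treated units, and all function names are assumptions.

```python
import numpy as np

def soft_threshold_svd(A, lam):
    """Proximal operator of lam * nuclear norm: shrink singular values by lam."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def weighted_matrix_completion(Y, observed, weights, lam=0.01, n_iter=2000):
    """Minimise 0.5 * sum_obs w_it (Y_it - L_it)^2 + lam * ||L||_* by proximal gradient."""
    W = np.where(observed, weights, 0.0)
    W = W / W.max()                    # weights in [0, 1], so a unit step size is valid
    Y0 = np.where(observed, Y, 0.0)    # missing entries never enter the loss
    L = np.zeros(Y.shape)
    for _ in range(n_iter):
        L = soft_threshold_svd(L + W * (Y0 - L), lam)
    return L

# Toy panel (assumed): 3 switch-treated units adopt at period 3, 3 always-treated units
n_units, n_periods, n_switch, switch_time = 6, 8, 3, 3
t = np.arange(n_periods)
Y_treated = np.outer(np.linspace(1.0, 2.0, n_units), np.linspace(0.5, 1.5, n_periods))

observed = np.ones((n_units, n_periods), dtype=bool)
observed[:n_switch, :switch_time] = False      # treated outcomes missing before adoption

# Elapsed treatment duration z; always-treated units are assumed treated 5 periods earlier
z = np.empty((n_units, n_periods))
z[:n_switch] = np.clip(t - switch_time + 1, 0, None)
z[n_switch:] = t + 5
weights = 0.5 + 0.5 * z / z.max()              # hypothetical choice: more weight for higher z

L_hat = weighted_matrix_completion(Y_treated, observed, weights)
imputed = L_hat[:n_switch, :switch_time]       # counterfactual "had they been always-treated"
print(np.round(imputed, 2))
```

The regularisation strength `lam` would normally be chosen by cross-validation on held-out observed entries; it is fixed here only to keep the sketch short.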

We apply the proposed strategy to study the impact of the visa policy of the Schengen Area on the labour market of border regions. We first aggregate over 2.2 million individual labour market decisions from the Eurostat Labour Force Survey to the region level for regions always-treated and switch-treated by the policy during the period of 2004 to 2018. We then estimate the effect of not implementing the policy on the probability of working in any bordering region for switch-treated regions. Preliminary results indicate that the share of the labour market working in bordering regions would have been about 0.5% larger had the switch-treated regions adopted the policy prior to 2008.

On informative sampling and informative nonresponse

Dr Abdulhakeem Eideh (Al-Quds University) - Presenting Author

The main purpose of this paper is to consider how to account for the joint effects of an informative sampling design and informative nonresponse in the analysis of survey data (estimation and prediction) under single-stage sampling. We combine two methodologies used in model-based survey sampling: prediction of the finite population total under informative sampling and full response (see Sverchkov and Pfeffermann, 2004), and prediction of the finite population total when the sampling design is noninformative and the nonresponse mechanism is nonignorable (see Eideh, 2012). For this purpose, we use the response distribution and the relationships between the moments of the superpopulation, sample, sample-complement, response, and nonresponse distributions (see Eideh, 2016) for semiparametric prediction of the finite population total, construct a new test for informative nonresponse based on the product of sampling weights and propensity scores, and introduce a new measure of the representativeness of the response set. The derived semiparametric best linear unbiased predictors of the finite population total use the observations of the study variable for the response set, the values of the auxiliary variables and their population totals, the sampling weights, and the propensity scores. The main outcomes of the study are: first, the treatment of informative nonresponse as informative sampling; second, the result that most predictors known from model-based survey sampling can be derived as special cases of the general theory presented in this paper; third, the construction of a new test of nonignorable nonresponse; and fourth, the proposed new measure of the representativeness of the response set. For illustration, we apply the new theory under several well-known population models in model-based survey sampling, namely homogeneous populations, populations with regression structure, the ratio population model, the linear population model, and the multiple linear population model (see Ray and Clark, 2012).
Finally, new calibration weights are derived that take into account the informative sampling design and nonignorable nonresponse.

Double regression with post-stratification (DRP) for analyzing high-dimensional survey data

Mr Eli Ben-Michael (UC Berkeley) - Presenting Author
Professor Avi Feller (UC Berkeley)
Professor Erin Hartman (UCLA)

An important challenge in modern survey research is to find calibrated weights when covariates are high dimensional, especially when deep interactions are important. Traditional approaches, like raking, can perform poorly in this setting, typically balancing a small number of marginal distributions while failing to balance higher-order interactions. In this paper, we propose a class of generalized regression estimators that combine calibrated weights with a (multilevel) outcome model. We first construct an approximate calibration weighting estimator that enforces tight balance constraints for marginal balance and looser constraints for higher-order interactions; we then correct for the bias due to the relaxed constraints via an outcome model. This bias-corrected estimator is driven primarily by the weights where data is plentiful, relying instead on the outcome model when extrapolation is necessary. We also show that the approximate calibration estimator has a dual representation as a multilevel model for survey response. Thus, we view our approach as a generalization of standard Multilevel Regression with Post-Stratification (MRP). Since we also allow for a multilevel model for the weights, we refer to our proposal as Double Regression with Post-Stratification (DRP). We assess the performance of this method with extensive simulation studies and apply it to a recent large-scale survey of political attitudes.
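
The limitation of raking mentioned above can be seen in a small example. The sketch below implements plain raking (iterative proportional fitting) on a two-way table; it is not the DRP estimator, and the sample counts, population table, and function name are made-up assumptions. After raking, the weighted margins match the population, but the joint cells (the interaction) do not.

```python
import numpy as np

def rake(sample_counts, row_targets, col_targets, n_iter=100):
    """Iterative proportional fitting: one weight per cell, margins matched in turn."""
    w = np.ones_like(sample_counts, dtype=float)
    for _ in range(n_iter):
        # Scale each row so the weighted row totals hit the row targets
        w *= (row_targets / (w * sample_counts).sum(axis=1))[:, None]
        # Scale each column so the weighted column totals hit the column targets
        w *= (col_targets / (w * sample_counts).sum(axis=0))[None, :]
    return w

# Hypothetical 2x2 table (e.g. age group by education), made-up numbers
sample_counts = np.array([[30.0, 10.0], [20.0, 40.0]])
population = np.array([[25.0, 25.0], [25.0, 25.0]])
row_targets = population.sum(axis=1)
col_targets = population.sum(axis=0)

w = rake(sample_counts, row_targets, col_targets)
weighted = w * sample_counts
print("weighted margins:", weighted.sum(axis=1), weighted.sum(axis=0))
print("weighted cells:\n", np.round(weighted, 2))
```

Raking preserves the odds ratio of the sample table, so the weighted cells keep the sample's interaction structure even though both margins agree with the population. Balancing the interaction as well requires constraints on the joint cells, which is what the approach in the abstract relaxes and then corrects with an outcome model.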

Combining probability and non-probability samples using propensity modeling and small area estimation: Choosing the key set of dependent variables

Ms Vicki Pineau (NORC at the University of Chicago) - Presenting Author

Augmenting probability samples with non-probability samples is a fast-growing research area, with the goal of improving the cost efficiency of survey estimation without loss of statistical accuracy. Through case studies and simulations, the authors have evaluated several estimation methods from each of three general approaches: quasi-randomization, superpopulation modeling, and doubly robust (Valliant, 2019). Our evaluations show that these methods produce comparable point estimates, but Propensity Weighting (quasi-randomization) and Small Area Modeling (doubly robust) exhibit better properties in terms of bias reduction and confidence interval coverage (Ganesh et al., 2017; Yang et al., 2018, 2019). To take advantage of both methods, we have further investigated a combined Propensity Weighting plus Small Area Modeling method, where Propensity Weighting is used to estimate the probability of being a non-probability sample respondent and Small Area Modeling is used to correct for potential biases in the non-probability sample via models commonly used in small area estimation. For this combined method to work efficiently and accurately across the majority of outcome variables in a survey, the selection of key dependent variables for the Small Area Modeling portion of the approach is critical, as it is impractical to have a model for each response variable. In this presentation, we first review the method of combining Propensity Weighting with Small Area Modeling for estimation with both probability and non-probability samples (presented at BigSurv 2018, ESRA 2019 and JSM 2019). Second, we evaluate methods to choose a small set of key dependent variables for the Small Area Modeling portion of the estimation approach. We investigate the properties of the various estimators using several general population studies and a simulation study.
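
The Propensity Weighting step described above can be illustrated with a deliberately simple sketch. It assumes a saturated propensity model over a handful of covariate cells rather than the models used by the authors, and it leaves out the Small Area Modeling correction entirely. The function name, cell codes, design weights, and outcomes are all hypothetical.

```python
import numpy as np

def pseudo_weights(nonprob_cells, prob_cells, prob_weights, n_cells):
    """Inverse-odds weights for non-probability respondents from a cell-based propensity.

    Assumes every cell contains at least one non-probability respondent.
    """
    # The design-weighted probability sample estimates the population count per cell
    pop_hat = np.bincount(prob_cells, weights=prob_weights, minlength=n_cells)
    nonprob_count = np.bincount(nonprob_cells, minlength=n_cells)
    # Saturated model: P(unit is a non-probability respondent | cell)
    propensity = nonprob_count / (nonprob_count + pop_hat)
    cell_weight = (1.0 - propensity) / propensity
    return cell_weight[nonprob_cells]

# Made-up data: two covariate cells, coded 0 and 1
prob_cells = np.array([0, 0, 1, 1, 1])
prob_weights = np.array([100.0, 100.0, 50.0, 50.0, 100.0])
nonprob_cells = np.array([0, 0, 0, 0, 1])
y_nonprob = np.array([1.0, 1.0, 0.0, 1.0, 0.0])

w = pseudo_weights(nonprob_cells, prob_cells, prob_weights, n_cells=2)
unweighted_mean = y_nonprob.mean()
weighted_mean = np.average(y_nonprob, weights=w)
print(w, unweighted_mean, weighted_mean)
```

With a saturated cell model the inverse-odds weight reduces to the estimated population count divided by the number of non-probability respondents in the cell. A regression-based propensity model would replace the cell proportions when there are many covariates.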