BigSurv20 Big Data Challenge
Jan-Philipp Kolb, Laurent Smeets, Qixiang Fang, Shivam Sharam aka Team TRAVEL ESCAPE. The jury appreciated the amount of technical work accomplished by team Travel Escape in the short time frame of the Big Data Challenge. The extra research done on the CMAP dataset showed the contestants' scientific view of their problem and provided a nice validation of their approach. For a short presentation, go to minute 11 of the recording of the closing session.
A special thanks goes to our Big Data Challenge Jury: Craig Hill (RTI), Jolien Oomens (CBS), Sofie De Broe (CBS, chair), Tom Smith (ONS).
Like BigSurv18, BigSurv20 will have a live data science event sponsored by Statistics Netherlands: the Big Data Challenge. To avoid privacy issues and legal procedures, the Big Data Challenge will, unlike BigSurv18, not use existing data. Instead, BigSurv20 teams will be invited to create and evaluate smart systems that generate big data. They will need to demonstrate at the conference that the systems are feasible in practice and that the resulting big data are sufficient to solve the challenge that was posed to them.
The Big Data Challenge will run between November 6 and December 4, with pitches of the challenges at the opening event (on November 6) and presentation of the winning solutions at the closing event (on December 4). If you have any questions, simply reach out to us at [email protected].
How Can I/Can We Participate In The Challenge?
Registration for the Big Data Challenge is closed. Participants will be contacted by e-mail before the start of the conference.
What Are The Challenges?
CHALLENGE 1 - HOUSEHOLD CONSUMPTION (owner: Eurostat, contact: Konstantinos Giannakouris): Annual information on household expenditure is available from national accounts compiled through a macroeconomic approach. An alternative source for analysing household expenditure is the household budget survey (HBS). Data collection involves a combination of one or more interviews and diaries or logs maintained by households and/or individuals, generally on a daily basis. The challenge aims at making the HBS smart. How can we use smart sensors, geolocation and relevant contextual information, combined with data collection modalities such as smart “diary-based” questions and smart payments, to collect information in a privacy-preserving way and provide estimates for specific expenditure codes related to household expenditure? Payments can be made using smartphones, smart cards or conventional methods. Can the expenditure related to a place (e.g. a restaurant) be estimated based on a “profiling” process? Contextual information should be used to combine geolocation with smart, relevant questions that collect qualitative and quantitative data.
CHALLENGE 2 – TRAVEL MOTIVE (owner: Dutch Ministry of Infrastructure/Statistics Netherlands, contact: Ole Mussmann & Barry Schouten): Travel surveys ask respondents to record their travels for a specified time period. Days are divided into tracks and stops. A track is associated with a mode of transportation (bicycle, car, train, etc.) and a stop is associated with a purpose (study, work, leisure, etc.). The challenge is about deriving the (likely) purpose of a stop based on automated location-time measurements in dedicated smartphone apps. These predicted purposes can then be suggested to respondents for validation and/or used in the estimation of travel statistics. Can one predict the likely purpose of a stop? In addition, can one improve the accuracy of predictions based on respondent checks/corrections?
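To make the two questions of Challenge 2 concrete, the following is a minimal Python sketch of one possible baseline, not a method prescribed by the challenge owners. It guesses a purpose from the start hour and duration of a stop, and lets respondent corrections override that guess for later stops at roughly the same location. The class and field names, the time thresholds and the grid size are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    lat: float           # latitude of the stop in degrees
    lon: float           # longitude of the stop in degrees
    start_hour: int      # local hour (0-23) at which the stop began
    duration_min: float  # length of the stop in minutes


class StopPurposePredictor:
    """Hypothetical baseline: a rule-based guess refined by respondent corrections."""

    def __init__(self, cell_decimals: int = 3):
        # Rounding coordinates to 3 decimals groups stops into grid cells
        # on the order of 100 m (an illustrative choice, not a requirement).
        self.cell_decimals = cell_decimals
        self.confirmed = defaultdict(Counter)  # grid cell -> purpose counts

    def _cell(self, stop: Stop) -> tuple:
        return (round(stop.lat, self.cell_decimals),
                round(stop.lon, self.cell_decimals))

    def _rule_guess(self, stop: Stop) -> str:
        # Illustrative thresholds only: a long stop that starts in the
        # morning is guessed to be work, everything else leisure.
        if 7 <= stop.start_hour <= 10 and stop.duration_min >= 240:
            return "work"
        return "leisure"

    def predict(self, stop: Stop) -> str:
        # Prefer the purpose respondents confirmed most often for this cell.
        votes = self.confirmed.get(self._cell(stop))
        if votes:
            return votes.most_common(1)[0][0]
        return self._rule_guess(stop)

    def correct(self, stop: Stop, purpose: str) -> None:
        # Store a respondent check/correction for the cell of this stop.
        self.confirmed[self._cell(stop)][purpose] += 1


predictor = StopPurposePredictor()
morning_stop = Stop(lat=52.0912, lon=5.1214, start_hour=9, duration_min=300)
first_guess = predictor.predict(morning_stop)    # rule-based guess
predictor.correct(morning_stop, "study")         # respondent corrects it
later_stop = Stop(lat=52.09118, lon=5.12142, start_hour=9, duration_min=280)
second_guess = predictor.predict(later_stop)     # now uses the correction
```

A real solution would likely replace the hand-written rule with a trained classifier and add contextual data about the location, but the feedback loop from respondent corrections to later predictions would have the same shape.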
CHALLENGE 3 – LABOUR FORCE SKILLS AND JOB VACANCIES (owner: Statistics Netherlands, contact: Fannie Cobben & Martine Mooij): COVID-19 pointed at the urgency of matching those who seek employment with emerging job vacancies. The need to match supply and demand in the labour market has long been recognized and is topical in many countries. The urgency of the COVID crisis has shown that rapid matching should happen at skills rather than profession level, as people in certain professions who have been afflicted by the crisis may have skills to offer for labour market activities where the demand for their skills has increased. An additional challenge could be to create a labour force skills set for the entire population and to identify which inflow (recent graduates) and outflow (retired people) can be observed. Can we then, in the labour force skills set, measure the differences between supply and demand on a population level? (Can one go even a step further and measure distances between desirable skills and available skills and identify niches and risks?) The challenge seeks answers to such questions by using a mix of public online data, data donation and perhaps additional surveys.
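As a rough illustration of what measuring differences and distances at skills level could mean, the Python sketch below computes a per-skill gap between demand and supply and a Jaccard distance between two skill sets. The function names and the toy skill sets are hypothetical, and the Jaccard distance is only one of many possible measures. The challenge description does not prescribe any of them.

```python
from collections import Counter


def skill_gap(seekers, vacancies):
    """Return demand minus supply per skill (positive means a shortage)."""
    supply = Counter(skill for skills in seekers for skill in skills)
    demand = Counter(skill for skills in vacancies for skill in skills)
    all_skills = sorted(set(supply) | set(demand))
    return {skill: demand[skill] - supply[skill] for skill in all_skills}


def jaccard_distance(available, desired):
    """Distance in [0, 1] between two skill sets (0 means identical sets)."""
    available, desired = set(available), set(desired)
    union = available | desired
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(available & desired) / len(union)


# Toy data, invented for illustration only.
seekers = [{"python", "sql"}, {"nursing"}]
vacancies = [{"nursing", "first aid"}, {"nursing"}]

gap = skill_gap(seekers, vacancies)
distance = jaccard_distance({"python", "sql"}, {"sql", "statistics"})
```

Here `gap` maps each skill to the number of vacancies asking for it minus the number of job seekers offering it, so skills with positive values are candidates for the niches and risks the challenge mentions.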
CHALLENGE 4 – CONSEQUENCES OF COVID-19 FOR STUDY CHOICE (owner: Studiekeuze123, contact: Pauline Thoolen): Prospective students visit open days at educational institutions, participate in matching procedures and receive information necessary to be able to prepare well for making an informed choice for a follow-up study. Because of Covid-19 and the intelligent lock-down, open days and other information sessions were cancelled. But educational institutions may have succeeded in arranging online alternatives, such as chat sessions with students and teachers, virtual guided tours, webinars, etc. An important aspect of a sound study choice is whether the expectation of the prospective student matches reality. It is even more important that the new student is able to bond with the educational institution and the course itself. Higher educational courses are likely to continue their lessons online in the autumn of 2020. The challenge is to answer, using existing data or by gathering new data, what the effect of Covid-19 on the choice of study has been. Covid-19 seems to have increased the intake of students in higher education, due to the limitations on travelling during a gap year. How did these students make their study choice? Will they switch more frequently or leave education earlier? The challenge is supported by Studiekeuze123, the official study choice website in the Netherlands. They help people step by step to choose the course that suits them best and provide all information necessary to be able to prepare well for making an informed choice for a follow-up study.
How Does My Team Win The Challenge?
There are two sets of requirements that teams must follow to participate in the Challenge: agreeing to and signing the Conditions for Participation documents, and adhering to the Challenge Rules. The Conditions for Participation need to be signed by participating teams at the start of the event. Each participating team will receive the Conditions for Participation documents in advance, so they have a chance to read through them and ensure they understand what they are agreeing to. The Challenge Rules will be posted here before October 1, 2020.
Teams will be scored on the following four aspects: Novelty, Feasibility, Scalability, and Usability. A jury made up of at least 3 experts from across disciplines will score each team. Jury members will be announced before October 1, 2020.
The winning team will be awarded a gift card for each participating member and the opportunity to present their solution at the main conference in a plenary session on December 4.
- Craig Hill (Senior Vice President Survey, Computing, and Statistical Sciences at RTI, USA)
- Sofie de Broe (director Center for Big Data statistics at Statistics Netherlands)
- Tom Smith (Managing director Data Science campus at the Office for National Statistics, UK)
- Jolien Oomens (data scientist at Statistics Netherlands)
What Can My Team Win?
Prizes, sponsored by Statistics Netherlands, are available for the winning team and the runner-up. The winning team will be awarded a prize of 800 Euro. The runner-up team will get 400 Euro. Both teams will get the opportunity to present their ideas at the main conference in a plenary session on December 4.
Important Information For Challenge Owners and Responding Teams
Big data challenge dates:
- November 6 to November 27
Structure and format:
- A team consists of three or four members
- On November 6, teams are asked to score the four challenges in order of preference. The challenge organizing committee will allocate the challenges and account for the preferences as much as possible
- The BigSurv20 conference will offer all teams online workspaces (most likely through Slack) to meet and exchange ideas
- Challenge owners are available for email questions throughout the challenge. Each Friday, they will join meetings with the teams to answer questions and to comment on progress and intermediate results
- Teams remain owners of their solutions, code and systems they develop during the challenge, but the ideas behind their contributions must be presented and, consequently, become public property
- Data challenge owners cannot be data challenge team members
- Participation in the Big Data Challenge is free of charge
What you need to do:
- If you would like to participate as a team or individual, register between September 1, 2020 and October 23, 2020
- October 23: Closing date for registration for Big Data Challenge teams and individual team members
- November 6: Start of Big Data Challenge (pitches of challenge owners and choice and assignment of challenges to teams)
- November 13 and 20: Intermediate meetings of teams with challenge owners
- November 27: Open presentation of solutions to Big Data Challenge jury and challenge owners
- December 4: Plenary presentation of winning solutions at main conference
What if You Have Questions?
Simply reach out to us at [email protected].