Distancia-Covid Survey: Third Preliminary Report

Report Authors

John Palmer, Universitat Pompeu Fabra (UPF)

Ramona Ottow, Universitat Pompeu Fabra (UPF)

Frederic Bartumeus, Centre d’Estudis Avançats de Blanes (CEAB-CSIC) & CREAF

Distancia-Covid Project Principal Investigators

José J. Ramasco (coordinator), Instituto de Física Interdisciplinar y Sistemas Complejos (IFISC-CSIC)

Frederic Bartumeus (coordinator), Centre d’Estudis Avançats de Blanes (CEAB-CSIC) & CREAF

Alvaro López García, Instituto de Física de Cantabria (IFCA-CSIC)

Diego Ramiro Fariñas, Instituto de Economía, Geografía y Demografía (IEDG-CSIC)

Sandro Meloni, Instituto de Física Interdisciplinar y Sistemas Complejos (IFISC-CSIC)

David Alonso, Centre d’Estudis Avançats de Blanes (CEAB-CSIC)

John Palmer (external PI), Universitat Pompeu Fabra (UPF)

26 July 2021

Introduction

This report provides results from the first three waves of the Distancia-Covid Survey launched on 14 May 2020 under the CSIC-funded project “Impacto de las medidas de distanciamiento social sobre la expansión de la epidemia de Covid-19 en España.” It relies on the survey responses received from the launch date through 10 January 2021. This period encompasses three “waves” during which the survey was disseminated through social media and other channels. As described further below, Survey Wave 1 ran from 14 May 2020 through 10 June 2020, Survey Wave 2 ran from 24 July 2020 through 31 August 2020, and Survey Wave 3 ran from 14 December 2020 through 10 January 2021. (Note that the Survey Waves should not be confused with the waves of the pandemic.)

The vast majority of the responses received during Survey Wave 1 came during the time in which Spain still had social distancing measures in effect but was transitioning away from the extensive restrictions on mobility and social contacts that had been put into place with the state of alarm decreed on 14 March 2020. The state of alarm lasted until 21 June 2020, and Spanish territories were moving, at varying rates, through the three phases of the de-escalation process during the Wave 1 period analyzed here. All of the responses in Survey Wave 2 were received when most restrictions had been lifted and there was no longer a state of alarm in effect. In addition, Survey Wave 2 ends on 31 August in order to coincide with the end of the traditional summer vacation period and avoid overlapping with the September transition back to work and school. Survey Wave 3 brackets Spain’s winter holiday period: it started on 14 December, when schools were still open and most people were still working, spanned the entire school holiday period, and ended on 10 January, after which most schools had reopened and people were back at work.

Survey Design

The survey was designed by the Distancia-Covid team in order to better understand changing patterns of human mobility and social contacts in Spain in the context of the Covid-19 pandemic. Many of the questions draw on the approach taken by the POLYMOD study (Mossong et al. 2008; Prem, Cook, and Jit 2017), and were developed in coordination with researchers in other countries working on similar surveys related to social mixing (Del Fava et al. 2020; Feehan and Mahmud 2020; Perrotta et al. 2020).

The survey was distributed in Spanish, Catalan, Galician, Basque, and English using Kobo Toolbox¹. Respondents accessed the survey at https://distancia-covid.csic.es/encuesta, where it remains available at the time of writing. Respondents are able to access the survey questions only if they first provide informed consent.

The sampling design was non-random, based entirely on people self-selecting into the respondent pool by connecting to the survey URL online. The survey URL was distributed through press releases, Twitter, Whatsapp, and other channels by members of the project team and institutional press offices, and it appears to have propagated through digital networks reasonably well, reaching all provinces in Spain and a relatively wide segment of the population (see further below).

As of 11 January 2021 there were 10127 valid submissions, 4402 in Survey Wave 1, 2560 in Survey Wave 2, and 3165 in Survey Wave 3. Initial data cleaning was done to improve the interpretability of variable names and generate additional variables calculated from the original ones. Among other things, an imputed usual postal code variable was created based on the two postal code questions in the survey, which asked respondents to list their current and usual postal codes. The imputed variable takes the value of the usual postal code when this has been provided. When it has not been provided it takes the value of the current postal code on the assumption that these are the same in these cases. In addition, province variables were created based on the first two digits of the postal code responses.
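The imputation and province-derivation rules just described can be sketched as follows. This is a minimal illustration, not the project's cleaning code; the helper names and the treatment of empty strings as missing answers are assumptions. Postal codes are kept as strings so that leading zeros (e.g. Barcelona's "08") survive.

```python
from typing import Optional


def impute_usual_postal_code(current: Optional[str], usual: Optional[str]) -> Optional[str]:
    """Return the usual postal code if provided, otherwise fall back to the current one.

    Hypothetical helper: assumes missing answers arrive as None or empty strings.
    """
    if usual:  # usual postal code was provided
        return usual
    # No usual code: assume the current and usual places of residence coincide.
    return current or None


def province_code(postal_code: Optional[str]) -> Optional[str]:
    """Spanish postal codes start with the two-digit province code (01-52)."""
    if not postal_code or len(postal_code) < 2:
        return None
    return postal_code[:2]


# Example: a respondent who gave only a current postal code in Girona (17xxx).
imputed = impute_usual_postal_code(current="17300", usual=None)
print(imputed, province_code(imputed))
```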

Descriptive Statistics

This section provides descriptive statistics of the survey submissions received to date, with distinctions made between the three Survey Waves as appropriate. Throughout the text and plots, “NA” is used to denote missing data due to respondents declining to answer certain questions on the survey. It should be noted that these statistics are not necessarily representative of the population given the non-random sampling design. Population estimates are now being made using multilevel regression with poststratification, as described in the next section.

Temporal Distribution

Most survey submissions were made soon after the survey was released and promoted in each Survey Wave. Figure 1 shows the timing of submissions as a histogram with the data aggregated in 1-hour bins. As can be seen, there were several sub-waves of submissions within each of the three main Survey Waves. There is also a clear daily cycle, with submissions dropping off at night as one would expect; this cycle becomes visible when zooming in on the interactive plot.

Figure 1: Histogram of survey start times binned by the hour. Survey Wave 1 is shown in red; Survey Wave 2 is shown in green; Survey Wave 3 is shown in blue.
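The 1-hour binning behind Figure 1 amounts to flooring each survey start time to the hour and counting. The sketch below is illustrative only; it assumes naive timestamps already expressed in Spanish local time, which matters because the daily cycle is only visible in local time.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(start_times: list[datetime]) -> Counter:
    """Count submissions per 1-hour bin by flooring each start time to the hour."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in start_times)


# Hypothetical start times (not survey data).
starts = [
    datetime(2020, 5, 14, 10, 5),
    datetime(2020, 5, 14, 10, 47),
    datetime(2020, 5, 14, 23, 59),
]
for hour, n in sorted(hourly_counts(starts).items()):
    print(hour.isoformat(), n)
```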

Geographic Distribution

Based on the imputed usual postal code variable, survey respondents appear to have had their usual places of residence distributed across Spain, with at least one respondent in each province. (This mostly also corresponded to their current places of residence, although 1108 respondents listed different current and usual postal codes, and of these, 690 are in different provinces.)

In absolute terms, most respondents reported their usual places of residence in Madrid or Barcelona, as shown in Figure 2. Relative to the province residential populations (taken from the padrón, Spain’s municipal population register), the greatest sampling fraction is from Girona, followed by Toledo, Bizkaia, Barcelona, and Castellón, as shown in Figure 3.

Figure 2: Number of people sampled in province during each Survey Wave.

Figure 3: Province sampling fractions. Percentage of each province’s residential population sampled in each Survey Wave. Province populations are based on the padrón.
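A sampling fraction is simply the number of respondents divided by the province population. The sketch below uses made-up provinces "A" and "B" to show why a province with modest absolute numbers can rank above one that contributed more respondents; it is an illustration, not the code used for Figure 3.

```python
def sampling_fractions(respondents: dict[str, int], population: dict[str, int]) -> dict[str, float]:
    """Percentage of each province's padrón population that responded."""
    return {prov: 100 * respondents.get(prov, 0) / pop for prov, pop in population.items()}


# Hypothetical counts, for illustration only (not survey or padrón data).
respondents = {"A": 300, "B": 1_000}
population = {"A": 600_000, "B": 5_000_000}
fractions = sampling_fractions(respondents, population)
ranking = sorted(fractions, key=fractions.get, reverse=True)
print(fractions, ranking)
```

Province B has more respondents in absolute terms, but A has the larger sampling fraction (0.05% versus 0.02%).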

Age and Gender

The survey respondents also represented a broad cross-section of ages, ranging from 18 (the minimum age for participation) up to 92. The median age of respondents was 46 in Survey Wave 1, 47 in Survey Wave 2, and 43 in Survey Wave 3. The middle 50% of respondent ages ran from 36 to 57 in Survey Wave 1, 38 to 56 in Survey Wave 2, and 36 to 51 in Survey Wave 3. The survey’s gender question provided binary response options of male or female in order to match the phrasing of Spain’s labor force survey (Encuesta de población activa),² which is being used for poststratification. There were both male and female respondents in nearly every age group in all waves. In Survey Wave 1, 62% of respondents identified themselves as female, 35% as male, and 2% declined to respond to the gender question. In Survey Wave 2, 65% of respondents identified themselves as female, 34% as male, and 1% declined to respond to the gender question. In Survey Wave 3, 67% of respondents identified themselves as female, 32% as male, and 1% declined to respond to the gender question. Figure 4 provides a population pyramid of male and female respondents. Note that Survey Wave 2 had relatively fewer respondents below 38 years old than the other two Survey Waves.
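The median and the “middle 50%” reported above are the 50th percentile and the 25th to 75th percentile range of respondent ages. A minimal sketch with hypothetical ages follows; note that software packages use different default quartile conventions, which can shift the bounds slightly in small samples.

```python
import statistics


def age_summary(ages: list[int]) -> dict[str, float]:
    """Median and middle 50% (25th to 75th percentile) of respondent ages."""
    q1, _, q3 = statistics.quantiles(ages, n=4)  # default 'exclusive' method
    return {"median": statistics.median(ages), "q1": q1, "q3": q3}


# Hypothetical ages (not survey data).
print(age_summary([18, 30, 36, 46, 57, 70, 92]))
```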

Figure 4: Age and gender distribution of survey respondents.

Education

The survey asked respondents to report their highest level of education, divided into four levels. Submissions were received from people reporting all four levels, with most reporting undergraduate or graduate level, so a relatively large proportion of respondents had high education levels. Figure 5 shows reported education levels by gender.

Figure 5: Distribution of respondent education levels by gender.

Work

The survey also asked respondents to report their “occupation or type of work” as well as “the activity of the establishment in which [they] work.” The distribution of responses to the occupation question is shown in Figure 6, with the labels on the x-axis corresponding to the following response options (abbreviated chart label, followed by the full response option shown in the English version to respondents):

military: Military occupations; Armed forces
directors: Directors and managers; Business and Public Administration Management
scientists: Scientific and intellectual technicians and professionals
support techs: Support Technicians and Professionals
admin: Accounting, Administrative, and Other Office Employees; Administrative type employees
caterers: Catering, personal, protection and trade vendor workers
skilled agricultural: Skilled workers in the agricultural, livestock, forestry and fishing sectors; Skilled workers in agriculture and fishing
skilled manufacturers: Artisans and skilled workers in manufacturing and construction industries (except facilities and machinery operators); Artisans and skilled workers in manufacturing, construction, and mining industries, except facilities and machinery operators
machinery operators: Plant and machinery operators and assemblers
unskilled: Elementary occupations; Unskilled workers
other: Other

The categories in Figure 6 are ordered according to their relative prevalence during Survey Wave 1. We observe a relatively large proportion of scientists in all three Survey Waves, likely related to the dissemination strategy of the survey, which mainly relied on academic social networks. We can also see similar distributions of the other occupational categories, with the exception of the “other” category, which dropped from Survey Wave 1 to Survey Wave 2 (in proportion to the others), and the non-response “NA” category, which rose. Presumably this reflects some combination of (1) variation by respondents in the decision of whether to choose “other” or simply not to respond when they did not see a category fitting their occupation, (2) an increase in unemployment and job instability leading respondents to see themselves as less attached to a particular occupational category, (3) survey fatigue or loss of motivation to respond to all questions due to the length and changing nature of the pandemic, and (4) changes in the networks through which the survey propagated.

Figure 6: Distribution of respondent occupations by gender.

The distribution of responses to the work activity question is shown in Figure 7, with the labels on the x-axis corresponding to the following response options (abbreviated chart label, followed by the full response option shown in the English version to respondents):

Agriculture: Agriculture, forestry and fishing
Food: Food, textile, leather, wood and paper industry
Extractive: Extractive industries, oil refining, chemical, pharmaceutical, rubber and plastics industries, electricity, gas, steam and air conditioning supply, water supply, waste management. Metallurgy
Construction: Construction of machinery, electrical equipment and transport material. Industrial installation and repair
Building: Building
Wholesale: Wholesale and retail trade and its facilities and repairs. Auto repair, hospitality
Transportation: Transportation and storage. Information and communications
Financial: Financial intermediation, insurance, real estate activities, professional, scientific, administrative and other services
Public: Public administration and education
Health: Health activities
Other services: Other services
Other: Other

As with the occupation plot, Figure 7 shows the activity categories on the x-axis in the order of their prevalence in the Survey Wave 1 responses. In this case, the highest proportion of responses falls in the Public category, again very likely reflecting the networks through which the survey was distributed. The relative proportions of the other categories are somewhat less stable across the Survey Waves, and we again see a large increase in the non-response “NA” category, which may be explained in the same way as in the occupation case above.

Figure 7: Distribution of respondent work activities by gender.

Country of Birth

Most respondents reported that they were born in Spain (94%). Of those who reported being born outside Spain, the top five countries of birth were Argentina (12% of non-natives), Italy (8% of non-natives), Germany (6% of non-natives), France (6% of non-natives), and the UK (4% of non-natives).

Continuing to work

For Survey Wave 1, the survey asked respondents, “Are you continuing to work during the lockdown?” The distribution of responses is summarized in Figure 8, with the labels on the x-axis corresponding to the following response options (abbreviated chart label, followed by the full response option shown in the English version to respondents):

no: No
all remote: Yes, I work remotely from home 100%
some remote: Yes, I work remotely part time
all in-person: Yes, I work outside my home

For Survey Waves 2 and 3, the question was modified to reflect the end of the “lockdown” and to better account for the variety of working and non-working situations. The question in these waves was, “What is your current employment status?” The distribution of responses is summarized in Figure 9, with the labels on the x-axis corresponding to the following response options (abbreviated chart label, followed by the full response option shown in the English version to respondents):

all remote: I am employed (full time or part time) and do all my work from home
all in-person: I am employed (full time or part time) and do all of my work outside my home
some remote: I am employed (full time or part time) and do some of my work from home and some outside my home
unemployed: I am unemployed
retired: I am retired or unable to work
student: I am a student and not working

Figure 8: Responses in Survey Wave 1 to question: Are you continuing to work during the lockdown?

Figure 9: Responses in Survey Waves 2 and 3 to question: What is your current employment status?

ICT Resources

Nearly all respondents reported owning or living with someone who owns an information and communication technology (ICT) device, with personal computers being most prevalent, followed by smart phones and then tablets (Figure 10). Most respondents reported multiple devices. Most respondents (>60%) also reported being constantly connected to the internet, and most of the rest reported being connected several times per day (Figure 11).

Figure 10: Proportion of respondents reporting that they or someone they live with owns particular ICT devices. The x-axis shows options from which respondents could select one or more; bars show proportion of respondents who included each option in their response. (In many cases respondents included more than one option.)

Figure 11: Distribution of responses to the question about how many times per day respondents connect to the internet.

Trips out of home

As one way of assessing levels of mobility, respondents were asked about the trips they had taken out of their dwellings during the past week. Figure 12 shows the distribution of number of trips reported. The maximum value listed in the responses was 10,000, but this was omitted from the analysis as obviously erroneous. Several people reported 50 or more trips (including two reporting 100) and these were retained, as they reflect plausible behavioral patterns (e.g., delivery work). In Survey Wave 1, the mean and median number of reported trips were both 5. For Survey Wave 2, the mean was 9 and the median was 7. For Survey Wave 3, the mean was 8 and the median was 7. Overall, 80% reported having gone out between 1 and 7 times during Survey Wave 1, 61% reported this during Survey Wave 2, and 65% reported this during Survey Wave 3. The mode of the distribution (most frequent value) in all three Survey Waves was 7 trips, presumably because many people actually tend to go out once per day (even during the confinement period) or because 7 is simply the rough estimate many people use to answer the question. Reports of more than 7 trips accounted for 15% of responses in Survey Wave 1, 37% in Survey Wave 2, and 33% in Survey Wave 3.

Figure 12: Relative proportions of the number of trips out of dwelling during past week.
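The trip statistics above (mean, median, mode, and the shares reporting 1 to 7 trips or more than 7) can be computed as in the sketch below. The cutoff `max_plausible` is a hypothetical parameter: the report drops only the single value of 10,000 and keeps reports of 50 to 100 trips, so any cutoff between 100 and 10,000 reproduces that rule. The remaining share (neither 1 to 7 nor more than 7) corresponds to respondents reporting zero trips.

```python
import statistics


def summarize_trips(trips: list[int], max_plausible: int = 1000) -> dict[str, float]:
    """Summarize weekly trip counts after dropping implausible values.

    max_plausible is a hypothetical cutoff chosen for illustration.
    """
    kept = [t for t in trips if t <= max_plausible]
    n = len(kept)
    return {
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "mode": statistics.mode(kept),
        "share_1_to_7": sum(1 <= t <= 7 for t in kept) / n,
        "share_over_7": sum(t > 7 for t in kept) / n,
    }


# Hypothetical responses (not survey data); 10_000 is dropped as erroneous.
print(summarize_trips([0, 3, 5, 7, 7, 7, 10, 10_000]))
```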

Respondents were also asked about the farthest distance they had traveled on any of these trips as well as all of their destinations and safety precautions. The distributions of responses are shown in Figures 13, 14, and 15.

In Survey Wave 1, nearly 80% of respondents reported having traveled less than 10 km from their home and nearly 40% reported having traveled less than 1 km. The most frequent destination was stores, followed by public spaces and workplaces. Nearly all respondents reported taking some sort of safety precaution, with masks, social distancing, and handwashing being the most frequent. In Survey Waves 2 and 3, smaller proportions of respondents reported a farthest distance below 1 km or below 10 km, and larger proportions reported a farthest distance above 10 km. Destinations in Survey Waves 2 and 3 were more diverse than in Survey Wave 1, but stores remained the most frequent destination. As for safety precautions while traveling during Survey Waves 2 and 3, masks, social distancing, and handwashing were again the most frequent. There was also a decrease in the proportion of respondents reporting use of gloves in Survey Waves 2 and 3 compared to Survey Wave 1.

Figure 13: Relative proportions of farthest distance traveled out of home during past week.

Figure 14: Relative proportions of destinations of trips during past week. The x-axis shows options from which respondents could select one or more; bars show proportion of respondents who included each option in their response. (In many cases respondents included more than one option.)

Figure 15: Relative proportions of precautions employed on trips taken during past week. The x-axis shows options from which respondents could select one or more; bars show proportion of respondents who included each option in their response. (In many cases respondents included more than one option.)
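For the select-multiple questions behind Figures 10, 14, and 15, each bar is the share of respondents who included a given option, so the bars can sum to more than 1. A minimal sketch with hypothetical responses follows; counting respondents who selected nothing in the denominator is an assumption, since how non-response was handled is not specified here.

```python
from collections import Counter


def option_proportions(responses: list[set[str]]) -> dict[str, float]:
    """Share of respondents who selected each option in a select-multiple question.

    Proportions can sum to more than 1 because one respondent may tick several options.
    """
    n = len(responses)
    counts = Counter(option for selected in responses for option in selected)
    return {option: count / n for option, count in counts.items()}


# Hypothetical responses (not survey data).
responses = [{"mask", "distancing"}, {"mask"}, {"mask", "handwashing", "gloves"}, set()]
props = option_proportions(responses)
print(props["mask"], sum(props.values()))
```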

Households

An important source of information about social mixing comes from the sizes and age structures of people’s households (defined here as the group of people with whom they were residing at the time of the survey submission). Figure 16 shows the distribution of the number of co-residents reported by respondents in each autonomous community and city. These raw data are very noisy, due in part to the non-random sampling design and the small number of respondents from some autonomous communities/cities (particularly, for example, Ceuta and Melilla). (Modeled population estimates are provided in Figure 18.)

Figure 16: Distribution of co-residents by autonomous community/city. The y-axis has been truncated at 0.3 to aid visualization.

Out-of-home contacts

Relevant social mixing also occurs outside the home. Respondents were asked to report the number and ages of the people with whom they had contact on the previous day. Following the POLYMOD approach, contacts were defined for respondents as: “EITHER a two-way conversation with three or more words in the physical presence of another person, OR physical skin-to-skin contact (for example a handshake, hug, kiss or contact sports).” The distribution of the reported numbers of contacts is shown in Figure 17. Note the relatively large proportion of respondents reporting 0 contacts in Survey Wave 1 compared to Survey Waves 2 and 3. Although all of the responses in Survey Wave 1 were received during the de-escalation process, this appears to reflect the effect of the extensive restrictions on mobility and social contacts of the previous months. As with all of these descriptive statistics, however, we need to be extremely cautious in making any population inferences directly from the raw data, as we know the samples are not representative. (Modeled population estimates are provided below in Figures 20, 21, and 22.)

Figure 17: Distribution of daily out-of-home contacts.

Population Estimates

The project team is now using multilevel regression with poststratification (MRP) (Zhang et al. 2014; Downes et al. 2018; Park, Gelman, and Bafumi 2004) to make population-level estimates from the survey data. Preliminary results are offered here and have already been incorporated into several epidemiological models. We focus here on social mixing patterns because of their obvious relevance to understanding Covid-19 dynamics. We consider in-home and out-of-home contacts, distinguishing between co-residents and non-co-residents.

MRP is a statistical method that has the potential to produce reliable population-level estimates from non-representative samples (Downes et al. 2018; Wang et al. 2015; Del Fava et al. 2020). The approach relies on multilevel modeling to first estimate an outcome of interest for different combinations or cells of respondent characteristics. MRP then uses model predictions and poststratification to generate population-level estimates based on knowledge of the relative proportion of each cell in the total population (Downes et al. 2018).
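The poststratification step can be written as a population-weighted average of the cell-level model predictions, \(\hat\theta = \sum_j N_j \hat\theta_j / \sum_j N_j\), where \(N_j\) is the population size of cell \(j\). The sketch below illustrates that weighting with two hypothetical cells; the cell names and values are made up, and the project's actual cells combine more characteristics.

```python
def poststratify(cell_estimates: dict[str, float], cell_sizes: dict[str, int]) -> float:
    """Population-level estimate as the population-weighted average of cell estimates.

    cell_estimates: model prediction for each cell (e.g. expected daily contacts).
    cell_sizes: number of people in each cell in the population (e.g. from the EPA).
    """
    total = sum(cell_sizes.values())
    return sum(cell_estimates[c] * n for c, n in cell_sizes.items()) / total


# Hypothetical cells and values, for illustration only.
estimates = {"female_20-24": 4.0, "male_20-24": 6.0}
sizes = {"female_20-24": 300, "male_20-24": 100}
print(poststratify(estimates, sizes))  # 4.5
```

Because the weights come from the population rather than the sample, over-represented groups in the survey (for example, the high share of female respondents) do not distort the population estimate, provided the model predicts well within each cell.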

In our case, key outcomes of interest are (1) the number of co-residents in each household, (2) the probability of having had an out-of-home contact during a given 24-hour period, and (3) the number of such contacts in that period. The respondent characteristics used to create the cells are taken from survey questions that provide information also obtained from Spain’s large, representative labor force survey (Encuesta de población activa),³ from which the population proportions needed for poststratification are taken.

We use multilevel negative binomial regression models for the mean of the count response variables (both in-home co-residents and out-of-home contacts), conditional on poststratification cells. We use a multilevel logistic regression model to estimate the probability of having any out-of-home contact (again conditional on these cells).

We assume the random variable representing the number of co-residents or out-of-home contacts for each individual \(i\) follows a negative binomial distribution. We model its expectation with a log link, which keeps the predicted mean positive, and define the multilevel model for the expected number of co-residents or out-of-home contacts using random intercepts for occupation, province of residence, and response date. (The date intercept is included to account for potential temporal autocorrelation arising from the network structure along which the survey was distributed; model predictions are then made for a hypothetical unobserved date within each wave.)
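One way to write this count model, in our own notation (the symbols are illustrative and not taken from the project’s code), assuming normally distributed random intercepts as in standard multilevel models:

\[
y_i \sim \operatorname{NegBinomial}(\mu_i, \phi), \qquad
\log \mu_i = \beta_0 + \mathbf{x}_i^\top \boldsymbol{\beta} + \alpha^{\mathrm{occ}}_{o[i]} + \alpha^{\mathrm{prov}}_{p[i]} + \alpha^{\mathrm{date}}_{d[i]},
\]
\[
\alpha^{\mathrm{occ}}_{o} \sim \mathcal{N}(0, \sigma^2_{\mathrm{occ}}), \qquad
\alpha^{\mathrm{prov}}_{p} \sim \mathcal{N}(0, \sigma^2_{\mathrm{prov}}), \qquad
\alpha^{\mathrm{date}}_{d} \sim \mathcal{N}(0, \sigma^2_{\mathrm{date}}).
\]

Here \(\mathbf{x}_i\) collects the fixed effects (gender and five-year age group in the co-resident model; education and five-year age group in the pooled out-of-home contacts model), and \(o[i]\), \(p[i]\), and \(d[i]\) index respondent \(i\)’s occupation, province, and response date. Under the common mean-dispersion parameterization, \(\operatorname{Var}(y_i) = \mu_i + \mu_i^2/\phi\), which lets the model accommodate more variability than a Poisson model with the same mean.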

In the co-resident model for all contact ages pooled, fixed effects are included for gender and five-year age group, whereas in the out-of-home contacts model with pooled contact ages, fixed effects are included for education level and five-year age group. Age-specific contact models include random effects for respondent, education, occupation, respondent’s five-year age group, contact’s 10-year age group, gender, the crossed effects of gender, respondent’s age group, and contact’s age group, as well as for province of residence and response date as discussed above.

In order to model the probability of any out-of-home contact, we first defined a random variable representing the occurrence of any contact for each individual \(i\), following a Bernoulli distribution with probability \(\pi_i\). We then fit a multilevel logistic regression with random intercepts for occupation, province of residence, and date (as in the count models) and fixed effects of education and five-year age group.
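In the same illustrative notation as for the count models (our symbols, not the project’s code), with \(z_i = 1\) if respondent \(i\) reported any out-of-home contact and \(\mathbf{w}_i\) holding the education and five-year age group indicators:

\[
z_i \sim \operatorname{Bernoulli}(\pi_i), \qquad
\operatorname{logit}(\pi_i) = \log\frac{\pi_i}{1-\pi_i} = \gamma_0 + \mathbf{w}_i^\top \boldsymbol{\gamma} + \eta^{\mathrm{occ}}_{o[i]} + \eta^{\mathrm{prov}}_{p[i]} + \eta^{\mathrm{date}}_{d[i]},
\]

with each set of random intercepts assumed to be drawn from \(\mathcal{N}(0, \tau^2)\) with its own variance. Inverting the link gives \(\pi_i = 1/\bigl(1 + e^{-(\gamma_0 + \mathbf{w}_i^\top \boldsymbol{\gamma} + \eta^{\mathrm{occ}}_{o[i]} + \eta^{\mathrm{prov}}_{p[i]} + \eta^{\mathrm{date}}_{d[i]})}\bigr)\), which is always between 0 and 1.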

We fit all models in R (R Core Team 2020) using Stan and the rstanarm package (Stan Development Team 2015, 2016), with the default priors described in the rstanarm 2.21.1 documentation.⁴

After fitting these models, we made population-level estimates by sampling from the posterior predictive distributions in proportion to the corresponding cell sizes in the labor force survey data. As a comparison, we also modeled the co-resident outcome directly from the labor force survey, using the same count model described above.
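The resampling step might look roughly like the sketch below. The exact procedure is not specified here, so assigning each simulated person to a cell with probability proportional to its labor force survey size, and then taking one posterior predictive draw from that cell, is an assumption; the cell names and draws are hypothetical.

```python
import random


def population_draws(
    predictive_draws: dict[str, list[int]],
    cell_sizes: dict[str, int],
    n_samples: int,
    seed: int = 0,
) -> list[int]:
    """Simulate outcomes for randomly chosen members of the population.

    Each simulated person is assigned to a cell with probability proportional to the
    cell's population size, then given one posterior predictive draw from that cell.
    """
    rng = random.Random(seed)
    cells = list(cell_sizes)
    weights = [cell_sizes[c] for c in cells]
    chosen = rng.choices(cells, weights=weights, k=n_samples)
    return [rng.choice(predictive_draws[c]) for c in chosen]


# Hypothetical posterior predictive draws for two cells (not model output).
draws = {"cell_a": [0, 1, 2, 3], "cell_b": [5, 8, 13]}
sizes = {"cell_a": 900, "cell_b": 100}
sample = population_draws(draws, sizes, n_samples=10_000)
print(sum(sample) / len(sample))
```

The pooled `sample` approximates the population distribution of the outcome, from which means, medians, and histograms such as those in Figures 18 and 21 can be computed.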

Households

Starting with the number of co-residents each respondent reported, we estimate a population-level distribution of co-resident counts for people aged 20 and over. This is shown in Figure 18. As a comparison, Figure 19 shows the same estimates based directly on Spain’s labor force survey (Encuesta de población activa) for each quarter of 2019 and 2020. Comparing Figures 18 and 19, we see that the Distancia-Covid survey estimates (using MRP) match very closely the estimates obtained from the much larger, more representative labor force survey. Looking across the three Survey Waves of the Distancia-Covid survey in Figure 18, we see very little difference in the distribution of co-residents. The labor force survey estimates in Figure 19 indicate that this pattern has been quite stable over the past two years.

Figure 18: Distancia-Covid Survey Waves 1-3: Estimated distribution of co-residents by autonomous community/city. Estimates are limited to population aged 20 and over. X-axis shows number of co-residents and bars indicate estimated proportion of each population residing with this number of co-residents.

Figure 19: EPA 2019-20: Estimated distribution of co-residents by autonomous community/city. Estimates are limited to population aged 20 and over in order to match the sample used in the Distancia-Covid survey.

Out-of-Home Contacts

For out-of-home contacts we use the survey responses to estimate the distribution and age-structured contact matrix for the population aged 20 and over.

Probability of Any Contact

Since a large number of respondents in Wave 1 reported no out-of-home contacts at all on the previous day, we start by simply estimating the probability of any out-of-home contact. Figure 20 shows the estimated probabilities and the 90% credible intervals for these estimates for each province in each wave. We see a clear increase in all provinces in the probability of having had any out-of-home contact.

Figure 20: Estimated probability of having any out-of-home contact on the previous day for each province. Estimates are limited to population aged 20 and over. They are indicated by the points with 90% credible intervals shown by the lines.
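A central 90% credible interval is bounded by the 5th and 95th percentiles of the posterior draws. The sketch below uses simulated Beta draws as a hypothetical stand-in for one province’s posterior distribution; it illustrates the interval calculation only, not the model output behind Figure 20.

```python
import random
import statistics


def credible_interval_90(draws: list[float]) -> tuple[float, float]:
    """Central 90% interval: the 5th and 95th percentiles of the posterior draws."""
    cuts = statistics.quantiles(draws, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


# Hypothetical posterior draws for one province's probability of any contact.
rng = random.Random(1)
draws = [rng.betavariate(60, 40) for _ in range(4000)]
low, high = credible_interval_90(draws)
print(round(statistics.mean(draws), 3), round(low, 3), round(high, 3))
```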

Number of Contacts

Figure 21 shows the estimated distribution of the number of out-of-home contacts for the total population aged 20 and over. The mean number of contacts increases with each Survey Wave, from 3 (Survey Wave 1) to 5 (Survey Wave 2) to 6 (Survey Wave 3). More interesting, however, is how the shape of the distribution changes, with greater variability relative to the mean in Survey Waves 1 and 3 than in Survey Wave 2. Survey Wave 3 has the highest absolute variability (a standard deviation of 10, compared with 6 in Survey Waves 1 and 2), with a long upper tail representing people with many contacts. But the Survey Wave 3 distribution also places a lot of weight on low contact numbers (note that the y-axis is on a log scale). Thus, while the mean for this Survey Wave is higher than for the others, the median is only 2. In contrast, the estimated median number of contacts is 4 in Survey Wave 2 and 1 in Survey Wave 1. We see similar patterns when we examine these estimated distributions by age and occupation in Figures 22 and 23.

It should be noted that the contact distribution estimates for Survey Wave 3 include some unrealistically high values (the maximum is 1392), which result from the high variability of the responses and the stochastic nature of the model. The plots shown here have the x-axis truncated at 300 to aid visualization, since only a tiny proportion of estimates (0.0003%) exceed this value. Even if we truncate the estimates at this value, or even at the maximum number of out-of-home contacts actually reported on the survey (120), the mean, median, and standard deviation of the distributions remain the same when rounded as above (and differ only slightly if we include additional digits).
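The truncation check can be sketched as comparing summary statistics of the simulated contact counts with and without a cap. Here “truncating” is read as setting every draw above the cap to the cap; dropping those draws would be another reading, and the draws are hypothetical. In this tiny toy sample the extreme value is 10% of the draws, so the cap moves the mean a lot; with only 0.0003% of draws above 300, as in Survey Wave 3, the effect disappears at the rounding used above.

```python
import statistics
from typing import Optional


def contact_summary(draws: list[int], cap: Optional[int] = None) -> dict[str, float]:
    """Mean, median, and SD of simulated daily contact counts.

    If `cap` is given, every draw above it is set to `cap` (one possible reading of
    "truncating" the estimates).
    """
    values = draws if cap is None else [min(x, cap) for x in draws]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


# Hypothetical toy draws with one extreme value (not model output).
draws = [0, 0, 1, 2, 2, 3, 5, 8, 20, 1392]
raw = contact_summary(draws)
capped = contact_summary(draws, cap=300)
print(raw["median"], capped["median"])
print(raw["mean"], capped["mean"])
```

The median is unaffected by the cap, which is why it is the most robust of the three statistics to a handful of extreme simulated values.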

Apart from this technical modeling issue, however, the question of high contact numbers is of great interest. High numbers of out-of-home contacts would be expected in certain occupations, particularly in the service sector or in manufacturing jobs involving large numbers of workers on factory floors. This can be seen in the raw survey responses as well as in the model estimates (Figure 23), and we are currently exploring this question further.

Figure 21: Distribution of estimated number of daily contacts per person. Estimates are limited to population aged 20 and over.

Figure 22: Distribution of estimated number of daily contacts per person by 5-year age group of the reference person. Estimates are limited to population aged 20 and over. Panels are labelled by the lower age of each group.

Figure 23: Distribution of estimated number of daily contacts per person by occupation. Estimates are limited to population aged 20 and over. Occupations abbreviated as: admin = admin (Accounting, Administrative, and Other Office Employees; Administrative type employees), ctrrs = caterers (Catering, personal, protection and trade vendor workers), drctr = directors (Directors and managers; Business and Public Administration Management), mchno = machinery operators (Plant and machinery operators and assemblers), mltry = military (Military occupations; Armed forces), scnts = scientists (Scientific and intellectual technicians and professionals), sklla = skilled agricultural (Skilled workers in the agricultural, livestock, forestry and fishing sectors; Skilled workers in agriculture and fishing), skllm = skilled manufacturers (Artisans and skilled workers in manufacturing and construction industries (except facilities and machinery operators); Artisans and skilled workers in manufacturing, construction, and mining industries, except facilities and machinery operators), spprt = support techs (Support Technicians and Professionals), unskl = unskilled (Elementary occupations; Unskilled workers)

Total Contacts

For epidemiological models, an age-specific estimate of total contacts (both co-residents and non-co-residents, in-home and out-of-home) is often most useful. We make such estimates for all Survey Waves by combining the EPA co-resident contact estimates with estimates of non-co-resident contacts drawn from the survey. We include here very rough estimates for the age groups not included in the survey (under 18 years old), which are based on proportionally distributing contact ages from the other age groups across these younger ages. This is a reasonable starting point in the absence of other sources of information about these younger age groups, but the estimates should be treated with caution. In particular, for periods when schools were in session, these age groups surely had more contacts than estimated here, and this may be best approximated using average classroom sizes in schools.

Figure 24 shows the estimated age-structured total contact matrix for the population in each wave. The x-axis indicates the 5-year age group of reference for the estimate (“self age group”), while the y-axis indicates the 10-year age groups of the estimated contacts for the reference groups. Cell colors indicate the mean number of contacts each reference group is estimated to have had with each contact age group on a hypothetical day during the wave period. Hovering the cursor over the cells also shows medians and the central 90% of the contact distributions. These matrices are not symmetric because population sizes vary by age group: the total number of contacts between two groups is the same whichever group it is counted from, but it is divided among different numbers of people in each group.
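In symbols, if C[i, j] is the mean daily number of contacts a person in group i has with group j and N_i is the size of group i, population-level consistency requires N_i · C[i, j] = N_j · C[j, i]. Survey-based estimates need not satisfy this exactly, and a common adjustment averages the two directions. The sketch below shows that adjustment; it is a generic technique, not necessarily the procedure used for Figure 24, and the function name `reciprocal_contact_matrix` and the two-group numbers are hypothetical.

```python
import numpy as np


def reciprocal_contact_matrix(contacts: np.ndarray, population: np.ndarray) -> np.ndarray:
    """Adjust a mean-contact matrix so that total contacts balance between groups.

    contacts[i, j]: mean daily contacts of one person in group i with group j.
    population[i]: number of people in group i.
    Returns C* satisfying population[i] * C*[i, j] == population[j] * C*[j, i].
    """
    pop = np.asarray(population, dtype=float)
    totals = contacts * pop[:, None]       # total daily contacts from group i to group j
    balanced = (totals + totals.T) / 2.0   # average the two directions (symmetric)
    return balanced / pop[:, None]         # convert back to per-person means


# Hypothetical example: a small group of 100 people and a large group of 400.
contacts = np.array([[2.0, 4.0],
                     [1.2, 3.0]])
population = np.array([100, 400])
adjusted = reciprocal_contact_matrix(contacts, population)
print(adjusted)
```

Here the adjusted matrix has 4.4 contacts per person from the small group to the large one but only 1.1 in the other direction, even though both correspond to the same 440 daily contacts; this is exactly why the matrices are not symmetric.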

We observe a high mean number of daily contacts along the diagonal (as is also the case for the estimated household contact matrix); that is, people tend to have contact with others in their own age group. We also observe increases in estimated mean contacts from Survey Wave 1 to Survey Wave 2 and then to Survey Wave 3. The differences between Survey Waves 1 and 2 are difficult to distinguish from the colors but can be seen from the values shown when the cursor is hovered over each cell. The differences are more evident from the colors in the matrix for Survey Wave 3, with the highest values appearing for contacts between people in their 40s, presumably reflecting a combination of household structure and work activity during this period.

In general, the estimated mean numbers of daily contacts are rather small. It should also be noted (as with the descriptive statistics) that these estimates are based on cross-sectional data that do not capture how the number of contacts each person has varies over time. Thus, an estimate of 0.5 could reflect variation within individual contact patterns over time, with a person of age X having contact with a person of age Y on average every 2 days. Alternatively, it could reflect variation at the population level, with some people having one or more such contacts every day and others having none. In fact, these estimates surely reflect variation at both levels, but the available data do not allow us to differentiate between them.
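The two interpretations of a mean of 0.5 can be made concrete with a small simulation. The population size, the 30-day window, and the two contact schedules below are hypothetical; the point is only that a single-day cross-section cannot tell them apart.

```python
import numpy as np

N_PEOPLE, N_DAYS = 1000, 30  # hypothetical population size and observation window
person = np.arange(N_PEOPLE)[:, None]
day = np.arange(N_DAYS)[None, :]

# Scenario A, within-person variation: everyone has one contact every other day,
# with half the population on even days and the other half on odd days.
scenario_a = ((person + day) % 2 == 0).astype(int)

# Scenario B, between-person variation: half the population has one contact
# every day, and the other half never has any.
scenario_b = np.broadcast_to((person < N_PEOPLE // 2).astype(int), (N_PEOPLE, N_DAYS))

# A one-day cross-sectional survey only observes a single column (day 0 here).
for name, schedule in [("A", scenario_a), ("B", scenario_b)]:
    one_day_mean = schedule[:, 0].mean()
    per_person = schedule.mean(axis=1)  # each person's long-run average
    print(name, one_day_mean, per_person.min(), per_person.max())
```

Both scenarios yield a one-day mean of 0.5, yet the per-person averages are 0.5 for everyone in Scenario A and either 0 or 1 in Scenario B; only repeated (longitudinal) observations of the same people would distinguish them.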

Figure 24: Estimated mean number of total daily contacts by age group.

Availability for epidemiological models

The Distancia-Covid Group continues to analyze these data to better understand social mixing patterns across ages, occupations, and other collected variables, to build up a network-focused analysis, and to prepare data to feed into a variety of epidemiological models, ranging from agent-based models to classical SEIR compartmental models. The contact estimates shown here are already being used to make a number of epidemiological models more realistic, to provide insight into how these models may be affected by changing contact networks, and to improve predictions about future scenarios. These estimates are available upon request and will soon be placed in an open access repository.
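As an illustration of how an age-structured contact matrix can enter a classical SEIR compartmental model, the sketch below uses the standard force of infection λ_i = β Σ_j C[i, j] · I_j / N_j with a forward-Euler time step. It is a generic textbook formulation, not a description of the specific models that use these estimates; the transmission probability `beta`, the rates `sigma` and `gamma`, the function name `seir_step`, and the two-group contact matrix are all hypothetical.

```python
import numpy as np


def seir_step(S, E, I, R, contacts, beta, sigma, gamma, dt=1.0):
    """One forward-Euler step of an age-structured SEIR model.

    contacts[i, j]: mean daily contacts of a person in group i with group j.
    beta: hypothetical probability of transmission per contact with an infectious person.
    sigma: 1 / mean latent period; gamma: 1 / mean infectious period (per day).
    """
    N = S + E + I + R
    # Force of infection on group i: contacts with each group j, weighted by
    # the infectious fraction of group j.
    foi = beta * contacts @ (I / N)
    new_exposed = foi * S * dt
    new_infectious = sigma * E * dt
    new_recovered = gamma * I * dt
    return (S - new_exposed,
            E + new_exposed - new_infectious,
            I + new_infectious - new_recovered,
            R + new_recovered)


# Illustrative two-group setup (not survey estimates).
contacts = np.array([[2.0, 4.4],
                     [1.1, 3.0]])
S = np.array([99.0, 399.0])
E = np.zeros(2)
I = np.array([1.0, 1.0])
R = np.zeros(2)
for _ in range(100):
    S, E, I, R = seir_step(S, E, I, R, contacts, beta=0.05, sigma=0.25, gamma=0.2)
print(S, E, I, R)
```

Swapping in the contact matrix for a different Survey Wave changes only the `contacts` argument, which is what makes wave-specific matrices convenient for comparing scenarios.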

Acknowledgements

Special thanks to Ane Calvo, Jose A. Costoya, and Manuel Pereira, for translating the survey into Basque and Galician, and to Wiebke Weber, Dennis Feehan, Ayesha Mahmud, Emilio Zagheni, and Jorge Cimentada for suggestions and feedback on the survey design.

References

Del Fava, Emanuele, Jorge Cimentada, Daniela Perrotta, André Grow, Francesco Rampazzo, Sofia Gil-Clavel, and Emilio Zagheni. 2020. “The Differential Impact of Physical Distancing Strategies on Social Contacts Relevant for the Spread of COVID-19.” medRxiv. https://doi.org/10.1101/2020.05.15.20102657.

Downes, Marnie, Lyle C. Gurrin, Dallas R. English, Jane Pirkis, Dianne Currier, Matthew J. Spittal, and John B. Carlin. 2018. “Multilevel Regression and Poststratification: A Modeling Approach to Estimating Population Quantities From Highly Selected Survey Samples.” American Journal of Epidemiology 187 (8): 1780–90. https://doi.org/10.1093/aje/kwy070.

Feehan, Dennis, and Ayesha Mahmud. 2020. “Quantifying Interpersonal Contact in the United States During the Spread of COVID-19: First Results from the Berkeley Interpersonal Contact Study.” medRxiv. https://doi.org/10.1101/2020.04.13.20064014.

Mossong, Joël, Niel Hens, Mark Jit, Philippe Beutels, Kari Auranen, Rafael Mikolajczyk, Marco Massari, et al. 2008. “Social contacts and mixing patterns relevant to the spread of infectious diseases.” Edited by Steven Riley. PLoS Medicine 5 (3): 0381–91. https://doi.org/10.1371/journal.pmed.0050074.

Park, David K., Andrew Gelman, and Joseph Bafumi. 2004. “Bayesian Multilevel Estimation with Poststratification: State-Level Estimates from National Polls.” Political Analysis 12 (4): 375–85. https://doi.org/10.1093/pan/mph024.

Perrotta, Daniela, André Grow, Francesco Rampazzo, Jorge Cimentada, Emanuele Del Fava, Sofia Gil-Clavel, and Emilio Zagheni. 2020. “Behaviors and Attitudes in Response to the COVID-19 Pandemic: Insights from a Cross-National Facebook Survey.” medRxiv. https://doi.org/10.1101/2020.05.09.20096388.

Prem, Kiesha, Alex R. Cook, and Mark Jit. 2017. “Projecting social contact matrices in 152 countries using contact surveys and demographic data.” Edited by Betz Halloran. PLoS Computational Biology 13 (9): e1005697. https://doi.org/10.1371/journal.pcbi.1005697.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Stan Development Team. 2015. Stan Modeling Language User’s Guide and Reference Manual, Version 2.10.0. http://mc-stan.org/.

———. 2016. “rstanarm: Bayesian applied regression modeling via Stan.” http://mc-stan.org/.

Wang, Wei, David Rothschild, Sharad Goel, and Andrew Gelman. 2015. “Forecasting Elections with Non-Representative Polls.” International Journal of Forecasting 31 (3): 980–91.

Zhang, X., J. B. Holt, H. Lu, A. G. Wheaton, E. S. Ford, K. J. Greenlund, and J. B. Croft. 2014. “Multilevel Regression and Poststratification for Small-Area Estimation of Population Health Outcomes: A Case Study of Chronic Obstructive Pulmonary Disease Prevalence Using the Behavioral Risk Factor Surveillance System.” American Journal of Epidemiology 179 (8): 1025–33. https://doi.org/10.1093/aje/kwu018.