Description: Metered data, also called “web-tracking data”, is generally collected from a sample of participants who willingly install or configure, on their devices, technologies that track the digital traces left when people go online (e.g. URLs visited). Since metered data allows web browsing to be observed unobtrusively and without relying on self-reports, it has been proposed as a potentially useful tool to measure online behaviors.
Like survey data, metered data is normally used to measure specific constructs and then make inferences about a target population. To operationalize these constructs into measurements, researchers must specify which pieces of information from participants’ tracked online behaviors will be used. For instance, to measure the “number of visits to political articles”, the visits to URLs classified as containing “political articles” must be identified and added up. While previous research shows that most survey design decisions can affect the validity of survey questions, little is known about whether the same holds for metered data.
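The “number of visits to political articles” example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domain list, the visit records, the function name `count_political_visits` and the `min_seconds` threshold are all hypothetical and are not taken from the study.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts treated as containing "political articles".
POLITICAL_DOMAINS = {"politics.example.com", "news.example.org"}

# Hypothetical tracked visits for one participant: (url, seconds on page).
visits = [
    ("https://politics.example.com/election-results", 45),
    ("https://shop.example.net/cart", 120),
    ("https://news.example.org/parliament-vote", 3),
]


def count_political_visits(visits, domains, min_seconds=0):
    """Count visits whose host is on the list and that last at least min_seconds."""
    return sum(
        1
        for url, seconds in visits
        if urlparse(url).hostname in domains and seconds >= min_seconds
    )


# Two operationalizations of the same construct.
n_all = count_political_visits(visits, POLITICAL_DOMAINS)
n_long = count_political_visits(visits, POLITICAL_DOMAINS, min_seconds=5)
```

Even in this toy case the two operationalizations disagree, since one counts the three-second visit and the other does not. That is the kind of design decision whose effect on validity is at issue.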
Using a three-wave online survey conducted in opt-in panels in 2021 in Spain, Portugal and Italy, matched at the individual level with metered data, we first discuss the design decisions that should be considered when creating metered data measures (e.g. how long must a visit last to be counted?). Second, by operationalizing a set of specific constructs (e.g. time spent visiting political articles) using different combinations of design decisions, we estimate the convergent and discriminant validity of an extensive range of measurements (up to 432 per concept). Finally, through the application of random forests of regression trees, we estimate the influence of the different design choices on the validity of the measurements.
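How combinations of design decisions multiply into many measurements of one concept can be sketched as follows. The decisions, their options and the toy visit records are hypothetical placeholders; they are not the decisions examined in the study and do not reproduce its 432 combinations.

```python
from itertools import product

# Hypothetical design decisions, each with hypothetical options.
design_space = {
    "min_seconds": [0, 5, 30],       # minimum duration for a visit to count
    "url_list": ["narrow", "broad"],  # which classification of URLs to use
    "metric": ["visits", "seconds"],  # count visits or sum time spent
}

# Hypothetical visits, flagged by whether each URL list includes them.
visits = [
    {"narrow": True, "broad": True, "seconds": 45},
    {"narrow": False, "broad": True, "seconds": 10},
    {"narrow": False, "broad": False, "seconds": 120},
    {"narrow": True, "broad": True, "seconds": 3},
]


def measure(visits, min_seconds, url_list, metric):
    """Compute one measurement under one combination of design choices."""
    kept = [v for v in visits if v[url_list] and v["seconds"] >= min_seconds]
    if metric == "visits":
        return len(kept)
    return sum(v["seconds"] for v in kept)


# One measurement per combination: 3 x 2 x 2 = 12 in this toy design space.
names = list(design_space)
measurements = {
    combo: measure(visits, **dict(zip(names, combo)))
    for combo in product(*design_space.values())
}
```

Each key of `measurements` records the design choices behind one value, so the choices can later serve as predictors of a validity estimate attached to that measurement.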
Our results can help researchers decide what to consider when operationalizing constructs into metered data measurements, by shedding light on the mechanisms through which different design decisions can affect their validity.