A highlight from Measuring Web Search Behavior

Data Skeptic | Automatic Transcript

Whenever I look over somebody's shoulder, always with permission, and watch what they're doing on the Internet, invariably their behavior is a little different than mine. I'm more likely to open information in a new tab rather than the current tab. Someone close to me is constantly selecting things, not to copy and paste, just to highlight, which I find odd. We all use our machines slightly differently, and we definitely each use our browsers and search engines in slightly different ways. I consider myself an above-average Internet user in terms of my technical merits and how much I use the Internet, but you know, more than 50% of people also think they're above average. So who knows? Well, my two guests today probably know. They had access to a large dataset, and we'll talk about how they got it: a combination of web tracking and later survey data. When they blend these two data sources, a number of insights become available about the different ways different demographic groups use search engines. We'll get into those details and more on today's interview.

So my name is Aleksandra Urman, or Sasha, which is also the nickname I go by. I'm a postdoctoral researcher at the University of Zurich in Switzerland. I work with the Social Computing Group at the Department of Informatics, though my background generally is in the social sciences, so my work sits generally in between social science and computer science. My primary research area currently, I would say, is research on web search and the HCI aspects of that. So this is like algorithmic bias in web search, but also how users interact with it. And another stream of my research is political communication on social media platforms, broadly defined, so to say.

Yes, my name is Mykola Makhortykh. I am an Alfred Landecker Lecturer at the University of Bern, specifically working at the Institute of Communication and Media Studies. One of the central projects that I am working on right now deals with the impact of algorithmic systems, such as web search engines or recommender systems, on Holocaust memory. But I also have a bunch of other interests, which deal more broadly with information retrieval systems, their potential biases, and the implications for the public sphere, especially information behavior in relation to politics, but also to historical information.

Well, how did your collaboration come together?

Well, I was actually doing my PhD at the same institute, but we didn't meet there. We met before Mykola joined, I think a year or so before, when I was still doing my PhD and Mykola was already a postdoc, in Amsterdam at that time, if I recall correctly. We met basically at a conference, and we talked about a paper that we could potentially collaborate on, based on, I think, what we both were presenting, and then we just kind of started collaborating remotely via email. And then Mykola joined the institute because there was a position open that was pretty fitting.

I would say the main paper I invited you guys on to discuss is "You Are How (and Where) You Search: Comparative Analysis of Web Search Behavior Using Web Tracking Data." It caught my attention right away, but neither of you works at Google, which has most of the web tracking data. How do you get started on a project like this?

So essentially, we were both working there at the time.
I was still in Bern, and we were working on that web tracking project more generally. Essentially, it's a joint project between Germany and Switzerland, the goal of which was to collect browsing data overall, not focused on web search specifically, just browsing data from the users who agreed to participate in it. So they basically installed a plugin that would record all of their browsing except for a dedicated block list, which was a list of sensitive websites: we didn't record anything on their visits to banking or insurance sites, to adult websites, and things like that. So everything else was recorded, unless they pressed a button saying, don't record me for the next 15 minutes, and they could press it as much as they wanted to.

So we had this data, collected for different projects that deal more generally with people's information consumption and news consumption online. And since we're both more interested in web search, as we do a lot of work on web search bias, not focusing on the users but focusing on the search engines themselves, we saw this as an opportunity to use this data to look at the other side that we hadn't explored before: the user side, how users actually search. Previous studies were mostly based on either eye tracking data, so small lab studies where people come in, there is an eye tracker, and the researchers look at what people look at on web search pages, which is nice for seeing in more detail what people do, but the ecological validity of these studies is a little bit lower, kind of not too high. And another stream of research was, historically, log-based studies, for example at Google, or studies that were done in the early 2000s at some other search engines, not Google, where researchers from these companies, or academic researchers who were given access, would work just from the transaction logs of one search engine and basically check what people click on, what they do. And we had this trove of data that allowed us to look, at scale and in real life, at multiple search engines at the same time, so bringing together the benefits, so to say, of these two previous methodologies.

You'd mentioned being able to collect some of that data from the Chrome plugin. Could you expand a bit on how you got people involved in that study?

The paper is basically one of the outputs of a larger project, which was done by the two universities, the University of Bern, with the team led by Silke Adam, and the University of Koblenz-Landau, with the team led by Michaela Maier. The idea was to basically recruit a representative sample of German and Swiss citizens and then also invite them to share their data using the plugin system. So pretty much we collaborated with a market research company, which has online panels in the two countries, the German one and the Swiss one, and we asked them to recruit a sample of participants. Then each participant was basically asked to express his or her consent, or lack of consent, to be tracked, and naturally quite a number of people didn't agree to participate, which was expected because, first of all, it's quite a novel way of researching information behavior. But second, it's also still, I would say, quite a sensitive and intrusive way of actually studying people's behavior.
But in the end, we actually received quite a number of participants who agreed to participate both in the tracking component and also in the survey component. And basically this group of people was the group that we actually worked with throughout the project, and based on whose data we wrote this paper.

And what does that dataset turn into at the end of the day? Do you have a list of URLs or something richer?

The dataset is richer. Essentially, the plugin was developed within this research project, and it also records snapshots of the HTML pages, essentially as a user sees them. It's not just the set of URLs; you also have all the HTML, even though it's naturally quite messy. If you've worked with this kind of data, you know it's quite difficult to extract stuff, but with web search specifically, it's easier in a way because we have a limited number of
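Since the clip ends on the point about pulling structured information out of those HTML snapshots, here is a minimal, purely illustrative sketch of what that kind of extraction can look like. It is not the authors' actual pipeline: BeautifulSoup, the file name, and the "title nested inside a link" pattern are all assumptions about a simplified, Google-style result layout, and a real parser would need engine-specific templates.

```python
# Illustrative only: extract organic results from a saved SERP HTML snapshot.
# The markup pattern assumed here (an <h3> title nested inside an <a> link)
# is a simplification; each search engine has its own, frequently changing
# templates, so a real pipeline would keep one parser per engine.
from bs4 import BeautifulSoup


def extract_results(html: str) -> list[dict]:
    """Return a list of {'title': ..., 'url': ...} dicts found in the snapshot."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for title_tag in soup.find_all("h3"):
        link = title_tag.find_parent("a")  # result titles usually sit inside their link
        if link and link.get("href", "").startswith("http"):
            results.append({
                "title": title_tag.get_text(strip=True),
                "url": link["href"],
            })
    return results


if __name__ == "__main__":
    # "serp_snapshot.html" is a hypothetical file name for one recorded page.
    with open("serp_snapshot.html", encoding="utf-8") as f:
        for result in extract_results(f.read()):
            print(result["title"], "->", result["url"])
```

The guests' point is exactly why this is tractable: search result pages are far more uniform than arbitrary web pages, so targeted extraction from the messy recorded HTML is feasible at scale.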
