For over two decades, third-party cookies have been the cornerstone of a seedy, shadowy, multi-billion dollar advertising surveillance industry on the web. Winding down tracking cookies and other persistent third-party identifiers is long overdue. But as the foundations shift beneath the ad industry, its biggest players are determined to weather the storm.
Google is leading the charge to replace third-party cookies with a new suite of technologies, collectively called the Privacy Sandbox, for targeting ads on the internet. Some of the tech giant’s proposals indicate that it has not learned the lessons from the ongoing backlash against the surveillance business model (1).
And as promised, today we will talk about one of those proposals: FLoC, or Federated Learning of Cohorts, which is perhaps the most ambitious, and potentially the most harmful.
With FLoC, Google aims to make the browser itself do what third-party profiling trackers used to do: boil down users’ recent browsing activity into a behavioral label, then share it with advertisers and websites. While the technology may avoid some of the privacy risks that came with third-party cookies, it would create new ones in the process. It may also intensify several of the worst non-privacy problems with behavioral ads, including predatory and discriminatory targeting.
Google’s pitch is that a web with FLoC would be better for privacy than the web we have today. But that framing rests on a false premise: that we must choose between ‘old tracking’ and ‘new tracking.’ We need not choose either.
Instead of reinventing the tracking wheel, we should imagine a better world without the myriad problems of targeted ads.
We stand at a crossroads. Behind us lies the era of third-party cookies, perhaps the internet’s biggest mistake. Ahead of us are two possible futures.
In one, users decide what information they share with each site they interact with. No one needs to worry that their past browsing will be held against them, or leveraged to manipulate them, when they open the next tab.
In the other, each user’s behavior follows them from one site to another as a label: inscrutable at a glance, but rich with meaning to those in the know. Their recent history, distilled into a few bits, would be ‘democratized’ and shared with the nameless parties that take part in serving each web page. Users would begin every interaction with a confession: here is what I have been up to this week. Please treat me accordingly.
The internet must reject FLoC and other misguided attempts to ‘reinvent’ behavioral targeting. We urge Google to abandon FLoC and redirect its efforts toward building a genuinely user-friendly web (2).
What is FLoC?
In 2019, Google presented the Privacy Sandbox (3), its vision for the future of privacy on the web. The project centers on a suite of cookieless protocols designed to satisfy the many use cases that third-party cookies currently serve for advertisers.
Google offered its proposals to the W3C, the standards-making body for the web (4). These proposals have primarily been discussed in the Web Advertising Business Group, a body composed largely of ad-tech vendors.
Google designed FLoC to help advertisers perform behavioral targeting without third-party cookies. A FLoC-enabled browser would collect information about its user’s browsing habits, then use that information to assign the user to a group, or ‘cohort.’ Users with similar browsing habits would be grouped into the same cohort. Each user’s browser would share a cohort ID, indicating which group they belong to, with advertisers and websites. According to the proposal, each cohort would contain at least a few thousand users.
In simpler terms, your FLoC ID would be a concise summary of your recent activity on the web.
In its proof of concept (5), Google used the domains of the sites each user visited as the basis for grouping them. It then used SimHash, a locality-sensitive hashing algorithm, to create the cohorts. SimHash can be computed locally on each user’s device, so there is no need for a central server to collect behavioral data. However, a central administrator could still play a role in enforcing privacy guarantees.
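To make the idea concrete, here is a minimal SimHash sketch in Python. This is not Google’s implementation: the 64-bit width, the SHA-256 feature hash, and the example domains are all assumptions for illustration. The key property is that similar sets of visited domains produce hashes that differ in few bit positions:

```python
import hashlib

def simhash(features, bits=64):
    """Compute a SimHash over a set of string features (e.g. visited domains).

    Each feature's hash casts a +1/-1 vote per bit position; the sign of
    the vote total decides the output bit. Similar feature sets therefore
    yield hashes with small Hamming distance.
    """
    totals = [0] * bits
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            totals[i] += 1 if (h >> i) & 1 else -1
    result = 0
    for i in range(bits):
        if totals[i] > 0:
            result |= 1 << i
    return result

# Two users with overlapping browsing histories land on nearby hashes.
a = simhash(["news.example", "sports.example", "mail.example"])
b = simhash(["news.example", "sports.example", "shop.example"])
hamming = bin(a ^ b).count("1")  # small distance means similar habits
```

Grouping users then amounts to bucketing nearby hashes, for instance by taking a short prefix of the hash as the cohort ID.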
Google also proposed that a central actor count the number of users assigned to each cohort, to prevent any cohort from being too small or too identifying. Cohorts that are too small can be merged with similar cohorts until each one represents enough users.
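That merging step can be sketched as follows, under assumed details: cohort IDs are plain integers, ‘nearest’ means the numerically closest cohort ID (a stand-in for whatever similarity measure a real server would use), and the threshold is configurable:

```python
from collections import Counter

def enforce_min_cohort_size(assignments, min_size=2000):
    """Merge undersized cohorts into their nearest neighbor until every
    cohort meets the minimum-size (k-anonymity) threshold.

    `assignments` maps user_id -> cohort_id (an integer).
    """
    counts = Counter(assignments.values())
    while len(counts) > 1 and min(counts.values()) < min_size:
        small = min(counts, key=counts.get)              # smallest cohort
        others = [c for c in counts if c != small]
        nearest = min(others, key=lambda c: abs(c - small))
        # Reassign every member of the small cohort to its neighbor.
        for user, cohort in assignments.items():
            if cohort == small:
                assignments[user] = nearest
        counts[nearest] += counts.pop(small)
    return assignments
```

The loop repeats until every remaining cohort clears the threshold, so no group is small enough to be identifying on its own.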
FLoC could also perform clustering based on page content or URLs instead of domains. And instead of SimHash, it could use a federated learning-based system, as the name implies, to generate the cohorts. It is also unclear how many possible cohorts there will be.
We do know that Google used 8-bit cohort identifiers in its experiment, meaning there were only 256 possible cohorts. In practice, the number could be much higher: the documentation proposes (6) a 16-bit cohort ID comprising four hexadecimal characters. The more cohorts there are, the more specific they become, and longer cohort IDs would mean advertisers learn more about each user’s interests and have an easier time fingerprinting them.
One thing Google has specified is duration. FLoC would recalculate its cohorts weekly, each time using data from the previous week’s browsing. That makes FLoC cohorts less useful as long-term identifiers, but it also makes them a more potent signal of how users behave over time.
New Privacy Issues
FLoC is part of a suite of technologies meant to deliver targeted ads in a privacy-preserving way. But its core design involves sharing new information with advertisers, so it is no surprise that it also creates new privacy risks.
The first problem is fingerprinting. Browser fingerprinting is the practice of collecting many discrete pieces of information from a user’s browser to create a stable, unique identifier for that browser. The more ways your browser looks or acts different from others, the easier it is to fingerprint.
Google has promised that most FLoC cohorts will comprise thousands of users each, so a cohort ID alone should not distinguish you from other users like you. Even so, it gives fingerprinters a massive head start: a tracker that starts with your FLoC cohort only has to distinguish your browser from a few thousand others, rather than a few hundred million.
In information-theoretic terms, FLoC cohorts will contain several bits of entropy (7): up to 8 bits in Google’s proof of concept. This information is all the more potent because it is unlikely to be correlated with other data the browser exposes, which will make it much easier for trackers to put together a unique fingerprint of FLoC users.
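Back-of-the-envelope arithmetic shows why even a few bits matter. The numbers below are assumptions for illustration: a population of 300 million browsers, and the 16-bit cohort ID floated in the proposal rather than the 8-bit trial version:

```python
import math

population = 300_000_000   # assumed number of browsers a tracker must tell apart
cohort_bits = 16           # a 16-bit cohort ID carries up to 16 bits of entropy

# Bits needed to single out one browser in the whole population.
bits_needed = math.log2(population)            # about 28.2 bits

# Entropy a fingerprinter still needs after learning the cohort ID.
remaining_bits = bits_needed - cohort_bits     # about 12.2 bits

# Average anonymity set: users left per cohort once the ID is known.
anonymity_set = population / 2 ** cohort_bits  # a few thousand users
```

In other words, the cohort ID alone could hand a fingerprinter more than half the entropy it needs, shrinking the search from hundreds of millions of browsers to a few thousand.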
Google has acknowledged this challenge and has pledged to solve it as part of its broader ‘Privacy Budget’ (8) plan for tackling fingerprinting in the long term. Solving fingerprinting is an admirable goal, and the proposal is a promising avenue to pursue. However, according to the FAQ (10), the plan is ‘an early-stage proposal and does not yet have a browser implementation.’ Meanwhile, Google has already started testing FLoC in the Chrome 90 update it released last week.
Fingerprinting is notoriously difficult to stop. Browsers such as Tor and Safari have engaged in years-long wars of attrition against trackers, sacrificing large chunks of their feature sets to shrink fingerprinting attack surfaces. Such mitigations generally involve restricting or trimming away unnecessary sources of entropy, which is exactly what FLoC is. Google should not create new fingerprinting risks until it has figured out how to deal with the existing ones.
The second problem is less easily explained away: the technology will share new personal data with trackers who can already identify users. For FLoC to be useful to advertisers, a user’s cohort must necessarily reveal information about their behavior.
The project’s GitHub page acknowledges this upfront:
‘The API democratizes access to some data about a user’s general browsing history (and thus, general interests) to any site that opts into it. Sites that know a person’s PII, such as when people sign in using their email address, can record and reveal their cohort. This means that information about a user’s interests may eventually become public.’
As mentioned above, FLoC cohorts should not work as identifiers by themselves. But any company able to identify a user in other ways, say, by offering ‘login with Google’ services to sites around the web, will be able to tie the information it learns from FLoC to that individual’s profile.
In this way, two categories of data may be exposed:
- Specific information about a user’s browsing history. Trackers may reverse-engineer the cohort-assignment algorithm to determine that any user in a given cohort probably, or definitely, visited specific sites.
- General information about interests and demographics. Trackers may learn that users in a given cohort are disproportionately likely to be a specific type of person. For instance, one cohort may overrepresent users who are young, female, and Black; another may be composed largely of middle-aged users, or of LGBTQ+ youth.
This means every site a user visits will have a good idea of what kind of person they are on first contact, without having to track them across the web. Moreover, as a user’s FLoC cohort updates over time, advertisers and sites that can identify the user in other ways will also be able to track how their browsing changes. Remember: a FLoC cohort is nothing more than a summary of a user’s recent browsing activity.
Users should have the right to present different aspects of their identity in different contexts. If we visit a site for medical information, we might trust it with information about our health, but there is no reason it needs to know our politics. Likewise, if we visit a retail website, it should not need to know whether we have recently read up on treatment for depression. FLoC erodes this separation of contexts, presenting the same behavioral summary to every site we interact with.
Going Beyond Privacy
Google designed FLoC to prevent one particular threat: the individualized profiling that cross-context identifiers enable today. FLoC and the other proposals aim to stop trackers from accessing specific pieces of information that they can link to specific people. But as we have discussed, FLoC may help trackers in several ways regardless. And even if Google could iterate on its design and eliminate those risks, the harms of targeted ads are not limited to privacy violations.
The core objective of FLoC is at odds with other civil rights.
The power to target is the power to discriminate. By definition, targeted advertisements allow advertisers to reach some people while excluding others. Businesses can use a targeting system to decide who gets to see a job posting or a loan offer just as easily as to advertise bags.
Over the years, the targeted advertising machinery has frequently been used for discrimination, exploitation, and harm. The ability to target users based on religion, ethnicity, age, or ability enables discriminatory ads for housing, credit, and jobs. Targeting based on location, demographics, and political affiliation helps purveyors of politically motivated disinformation and voter suppression. And all kinds of behavioral targeting increase the risk of convincing scams (11).
Facebook, Google, and several other ad platforms already try to restrict certain uses of their targeting tools. For instance, Google limits advertisers’ ability to target people in ‘sensitive interest categories’ (12). But these efforts often fall short; determined actors usually find loopholes in platform-wide restrictions on certain kinds of ads or targeting.
Even with absolute power over what data can be used to target whom, platforms are frequently unable to prevent abuse of their technologies. And FLoC would use an unsupervised algorithm (13) to create its clusters, meaning no one would have direct control over how people are grouped. Ideally, for advertisers, FLoC would create cohorts that share meaningful interests and behaviors. But online behavior is linked to all kinds of sensitive characteristics: demographics such as gender, age, ethnicity, and income; the ‘big five’ personality traits; even mental health.
FLoC would likely group users along some of these axes too. A cohort may also directly reflect visits to sites related to financial hardship, substance abuse, or support for trauma survivors.
Google has proposed monitoring the system’s outputs to check for correlations with its list of sensitive categories (13). If it finds that a particular cohort is too closely tied to a particular sensitive category, the administrative server can choose new algorithm parameters and tell users’ browsers to regroup themselves.
This solution is both Orwellian and Sisyphean. To monitor how FLoC cohorts correlate with sensitive categories, Google would need to run massive audits using data about users’ gender, race, age, religion, health, and financial status. Whenever it found a cohort that correlated too strongly with any of those axes, it would have to reconfigure the entire algorithm and test again, hoping that no other ‘sensitive categories’ are implicated in the new version. This is a far harder version of a problem it is already attempting, and frequently failing, to solve (14).
In a world with FLoC, it may be harder to target individuals directly based on gender, age, or income, but it won’t be impossible. Trackers with access to auxiliary information about users will learn the ‘meaning’ of FLoC cohorts, and the kinds of people they contain, through experiment and observation. Those determined to discriminate will still be able to.
Moreover, such behavior would be even harder for platforms and regulators to police than it already is. Advertisers with sinister motives would have plausible deniability: after all, they are not directly targeting protected categories, only reaching users based on their behavior. The whole system becomes more opaque to regulators and users alike.
A Word to Google
If our readers browse the web, they will find many other voices agreeing with ours. Plenty of others have called FLoC ‘the opposite of privacy-preserving technology.’ We hope such pieces shed light on FLoC’s fundamental flaws and cause Google to reconsider.
Several issues raised on the official GitHub page echo the concerns (15, 16, 17) highlighted in this article. Yet Google has pressed ahead with testing the system while making no fundamental changes.
It has already started pitching FLoC to advertisers, boasting that it is a 95% effective replacement for third-party-cookie-based targeting (18). And starting with Chrome 89, rolled out on March 2, it has deployed the technology for a trial run: a ‘small’ portion of Chrome users, still likely millions of people, have been or will be assigned to test it (19).
Make no mistake: if Google implements FLoC in Chrome, it will likely give everyone involved ‘options.’ The system will probably be opt-in for the advertisers who benefit from it and opt-out for the users it hurts. Knowing Google, it will promote this as a step forward for ‘transparency and user control,’ fully aware that the vast majority of users will not understand how the system works and that only a few will go out of their way to turn it off.
Google will pat itself on the back for ushering in a new, private era on the web, free of the evil third-party cookie: a technology that Google itself helped extend well past its shelf life while making billions in the process (20).
It does not have to be this way. The essential parts of the Privacy Sandbox, such as dropping third-party identifiers and fighting fingerprinting, would change the internet for the better. Google can choose to dismantle the old scaffolding of surveillance without replacing it with something new and uniquely harmful.
With that, we emphatically reject a future with FLoC. It is not the world we want, nor the one we deserve. Google needs to learn the right lessons from the era of third-party tracking and design its browser to work for users, not for advertisers.