Google’s tagline was “don’t be evil” until May this year, when it quietly removed the catchphrase from its code of conduct. The take of German search engine Cliqz is more like “assume we will all turn evil”. Cliqz CEO Marc Al-Hames believes in preparing for the worst. He doesn’t collect or store any identifiable data from internet users and, as a consequence, he’s not asking you to trust him with it. “If the government or a hacker gets access to our servers, or if an employee turns evil or if Cliqz turns evil as a company, the data we have at our backend should never be sufficient to identify a user or what a user has been doing,” Al-Hames said in an interview with Sifted.
Founded in 2008, Cliqz aims to deliver direct and relevant results to internet users, without collecting and selling their every digital move. With their headquarters in Munich, Germany, Cliqz’ user base is predominantly German but in 2018 they began to expand their search index to cover France and the US. Tech behemoth Google dominates search – over 90% globally – but with data breaches and surveillance scares, alternative privacy-centric search engines are becoming an increasingly attractive option. American search engine DuckDuckGo is getting more queries a day on average than ever before and it is now the seventh most popular search engine, according to NetMarketShare. Last month, France’s National Assembly and Army Ministry announced that they would be adopting Qwant, a French-German search engine that does not track users, as their default search engine instead of Google as part of their drive to retain “digital sovereignty” – keeping hold of citizen data to avoid becoming one of Google’s “digital colonies”.
Cliqz is an anti-tracking search engine. How do you manage to make searches relevant to users and how do you train algorithms if you don’t collect user data?
I think the question is not whether you collect data, but what type of data you collect. What we need to know is: where do people travel on the web? So we need to know statistics about the type of websites people visit and how long they stay on these websites, because that tells you something about relevance and about how to train your search engine algorithm.
Until a couple of years ago the fundamental approach of everyone in the web was “let me collect basically everything”. They would set up a profile called Marc Al-Hames that will record everything that Marc Al-Hames is doing, and then make money with the data maybe, as well as mine the data to train algorithms. We take a different approach to that: only collect what you really need, so data minimisation (we don’t collect data and filter at the back end) and secondly, never collect anything related to a person.
So what type of information do you collect exactly? Would you collect location, website visits, purchases?
We collect all of that, but never in connection with any other data that could identify someone. For example, we would not connect a location to an individual. And the location would be at the city level – not a precise location. We split up data into atomic units: a person from Munich (1) has searched for Marc Al-Hames (2) and clicked on Marc Al-Hames’s LinkedIn profile (3). We wouldn’t make a connection between these three because together they could be sufficient to identify someone.
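The idea of splitting one search event into unlinked “atomic units” can be illustrated with a minimal sketch. This is not Cliqz’s actual code: the function name, record fields and the shuffling step are all assumptions made for illustration. The point it shows is that each record carries no user, session or event identifier, so the three facts cannot be joined back together on the server.

```python
import random


def split_into_atomic_records(city, query, clicked_url):
    """Hypothetical helper: turn one search event into three independent records.

    No record contains a user ID, session ID or shared event ID, so a server
    receiving them has nothing to join them on.
    """
    records = [
        {"type": "location", "city": city},      # city level only, no precise position
        {"type": "query", "text": query},
        {"type": "click", "url": clicked_url},
    ]
    # Assumption: shuffle so that arrival order is not an implicit link.
    random.shuffle(records)
    return records


records = split_into_atomic_records(
    "Munich", "Marc Al-Hames", "https://www.linkedin.com/"
)
for record in records:
    print(record)
```

In a real system the records would presumably also need to be sent at different times and over separate connections, since timing and network addresses can act as hidden links; the sketch leaves that out.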
Does unlinking data in this way compromise the quality of search?
No, we can learn as much as we need for the search engine as if we used personal data – it’s just a bit more complicated and takes a bit more effort. By the way, Google didn’t record as much data in the beginning – they only started later on. For personalisation, there is a user profile on your device, but we don’t have access to that, only you. This means that if, say, you’re looking for a hotel in Munich, we would send you back quite a few results and your device would filter them based on your information.
And does not collecting personal data compromise your ability to make a profit?
There’s a yes and no. Obviously, having more data is a nice thing to create a monopoly, yet not having personal data does not stop you from creating a business. So Cliqz, for example, monetises through “MyOffrz”, which is a targeted ad model. On your device, you have a profile and we send certain offers we have to your device. Let’s say Amazon have an iPhone X with a Vodafone contract and it’s very cheap at the moment. We send this to all of our users and the moment you start searching for an iPhone, your device will recognise that and show you our offer. So you can highly target an ad, but we’ve never learned that you are the person that looked for one.
We do things the other way round to Google and Facebook. They do all this from their server: so they collect your profile data and as you’re interested in an iPhone, they will send you the ad. We send the ad to every one of our users and then the offers are filtered by your device. It’s the same effect but the way it is done in the background is completely private.
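The “send to everyone, filter on the device” model described above can be sketched roughly as follows. This is an illustrative assumption, not the MyOffrz implementation: the `Offer` class, the keyword matching and both function names are hypothetical. What it shows is that the server returns the same offer list to every user, and the matching against the user’s search happens only locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    title: str
    keywords: frozenset  # terms that should trigger the offer


def broadcast_offers():
    """Server side (hypothetical): the same list goes to every device."""
    return [
        Offer("iPhone X with contract", frozenset({"iphone", "smartphone"})),
        Offer("Running shoes sale", frozenset({"shoes", "running"})),
    ]


def select_offers_on_device(offers, local_query):
    """Device side (hypothetical): match offers against a query that never leaves the device."""
    terms = set(local_query.lower().split())
    return [offer for offer in offers if offer.keywords & terms]


matching = select_offers_on_device(broadcast_offers(), "cheap iPhone X")
print([offer.title for offer in matching])
```

The trade-off in this design is bandwidth for privacy: every device downloads offers it will never show, but the server never learns which user matched which offer.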
Would your ad targeting tool tap into data from other apps held on your device?
No, we obviously don’t do that. It is the typical Google and Facebook approach to collect as much information as possible. They usually put the Google Analytics pixel or the Facebook framework into other people’s apps to collect that information and build a profile that is as comprehensive as possible. Sure, if you look at the ad market it allows them to show more ads than we do because they have this kind of information. We don’t have that, so if you go to the Amazon app and shop for shoes we won’t get that information. I have to say, it’s not required to build a profitable business and except for more profit it doesn’t serve a lot of purpose.
How do you ensure that your privacy mechanism is watertight?
It’s obviously a risk that we make a mistake, and you do make honest mistakes. So what do we do? There are a couple of principles. Number one is that all our code that is running on the computers is open source, so everyone can actually verify what we do and can tell us if we are making mistakes. Secondly, the data that is flowing from the user to us is also open, so people can monitor it and see what we do. Thirdly, we do constant reviews of our data. So every three months we actually invite someone to Cliqz and they are challenged to identify a user. And all this should help keep us honest and also help us discover if we made a mistake, because we are humans as well. But the difference is that if we made a mistake we’re obviously fixing it immediately, because we have no intention to collect that type of data. It’s a difficult challenge – it’s not that you install privacy by design once and then it’s done. We actually have a large team of engineers dedicated only to making sure that what we do is actually still guaranteeing privacy.
I’m always a bit worried if something is only compliant. If you take GDPR, Google is the most compliant company in the world, it doesn’t mean it’s the most private company in the world, it just happens that they know exactly what to do. (Here is a Cliqz take on why “Google is the biggest beneficiary of GDPR” and why users can’t just rely on GDPR to protect their privacy.)
You started Cliqz in 2008 and went into full web search in 2013, so this was five years ahead of GDPR. Has the introduction of the regulation increased interest in Cliqz from users?
Five years ago people smiled at privacy, and you still have a lot of people who smile at privacy but it’s taken very seriously now, and we see it actually at all angles. You see that all the discussion in the last year especially, it has just created more awareness about the topic and we do see this, both in terms of website traffic as well as an increase in user numbers. We don’t have full numbers, but it is significant and whenever there is an article about privacy or about one of the privacy scandals like Cambridge Analytica, we can see a spike of 20-30% on that day in total in scores and user polls. It is significant but while I wouldn’t call it a niche topic, it’s not yet a mass market topic.
Germany, where you are based, is seen as being very privacy oriented in terms of regulatory culture. Do you think German users are more concerned about privacy?