Almost any kind of AI startup will get the attention of investors today in Europe. Founders are racing to apply algorithmic decision-making to healthcare, education, public safety, credit assessment, employment selection and more. This puts humans at ever-greater risk of arbitrary (and erroneous) judgements, something that’s easy to forget when Europe’s startup ecosystem is awash in cash.
But founders leveraging AI should not forget that one of the biggest risks to their business is a customer calling them out, whether for biased results or for using data in ways the customer never agreed to. That reckoning cannot be postponed forever.
The revenue loss that comes with a loss of customer trust can derail growth plans and destroy a company’s credibility. When even successful products (like Amazon’s Rekognition and HireVue’s facial analysis) have to be shut down because the data underlying the models proves to be skewed, unreliable or prohibitively difficult to fix, the usual result is fleeing clients.
How can you avoid this risk? You might not be the technical founder or the technical talent in your startup, but as a founder, you should know what your product is doing and what it’s not doing. These are six questions you can ask to keep yourself — or anyone building “AI-driven” solutions — honest.
Knowing versus prediction
AI typically involves statistical modelling of data, with high and low-certainty prediction outcomes, like “will a customer click on this book after buying that book?” or “does a credit applicant with these characteristics pose a higher risk of default?” Since algorithmic models are based on historical data, you should always ask: What does this AI solution allow you to know with 100% certainty and what does it predict?
For instance: When an early customer of the newly released Apple Card, David Heinemeier Hansson, received 20x the credit limit of his wife, the algorithm offered certainty that men had, historically, received higher credit limits than women. It merely predicted that this particular woman’s credit limit should be lower than that of her husband.
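The distinction can be made concrete with a toy sketch. The data, groups and figures below are entirely hypothetical; the point is that the historical averages are facts you know with certainty, while a naive model that scores new applicants from those averages is making a prediction that simply reproduces the past.

```python
# Hypothetical historical credit records: (gender, credit limit).
records = [("m", 20000), ("m", 18000), ("f", 2000), ("f", 2500)]

def mean_limit(gender):
    """KNOWN with certainty: what the historical data actually contains."""
    limits = [limit for g, limit in records if g == gender]
    return sum(limits) / len(limits)

def predict_limit(gender):
    """PREDICTED: a naive model scoring a *new* applicant by the group
    average just extrapolates the historical disparity forward."""
    return mean_limit(gender)

print(mean_limit("m"))     # a fact about the past
print(predict_limit("f"))  # a guess about the future, built from that fact
```

The model is not wrong about history; it is wrong to treat history as destiny, which is exactly the question a founder should be asking.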
Identify bias in the data
How do you identify types of bias inherent in the data collection process that may skew results? When Amazon collected data to feed its AI hiring model, it used its own past hiring data as the “ground truth” dataset. This skewed dataset resulted in the hiring algorithm systematically downgrading women candidates. (Amazon has since scrapped the tool.)
An algorithm can only treat the data it is fed as ground truth. If something is biased in the collection or labelling of training data, this bias will always determine the end result, no matter what modelling technique you use. An algorithm trained on US or EU images would quickly learn to identify women in white dresses and veils as brides. But without human intervention and/or tonnes of additional training data, that same algorithm would likely fail to identify Indian women wearing a traditional red-coloured lehenga — even with a veil — as a bride.
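One practical step follows from this: audit the label distribution across demographic groups before any model is trained. A minimal sketch, using hypothetical hiring records and an invented `hired` label:

```python
# Hypothetical training rows: demographic group and historical outcome.
training_rows = [
    {"gender": "m", "hired": 1},
    {"gender": "m", "hired": 1},
    {"gender": "m", "hired": 0},
    {"gender": "f", "hired": 0},
    {"gender": "f", "hired": 0},
]

def hire_rate(rows, group):
    """Fraction of positive labels for one demographic group."""
    subset = [r for r in rows if r["gender"] == group]
    return sum(r["hired"] for r in subset) / len(subset)

# A large gap means the "ground truth" itself is skewed, and any model
# trained on it will learn to reproduce that gap.
gap = hire_rate(training_rows, "m") - hire_rate(training_rows, "f")
print(f"hire-rate gap between groups: {gap:.2f}")
```

No modelling technique downstream can undo a gap that is baked into the labels; it can only be addressed at collection and labelling time.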
Do the people implementing data classification have relevant specialist expertise? If the AI will be used to determine, say, psychological characteristics, are licensed psychologists classifying the data? If medical analyses are being made, is there a doctor to oversee classification?
Perhaps the safest way for founders to police a dataset for use in interconnected global markets is to ensure the involvement of humanities experts in the building and testing of “AI” algorithms. Involving an ethnography or anthropology specialist to analyse data collected from one population or geography to predict the behaviours and characteristics of other populations can go a long way to de-risk your use of datasets in general.
What’s the data pipeline process? How do you get from raw data to some kind of result? Is there a simple explanation for what the algorithm does? For example, if you are building an AI hiring model, what CVs do you start with? Which characteristics and demographics are included in those CVs? How do you get from millions of CVs with various factors to the “model” for who would be the best-qualified candidate?
Typical use cases
Does a “typical” or average use case look similar (including outcomes) across sectors/geographies and why? If an “AI” algorithm works well in theory, does it perform as well as expected when the model is applied to a different geography, demography or sector? For instance, if a model recommends a set of verbal, facial and physical cues to signify truthfulness — are these same cues applicable to older individuals? If not, why might people above a certain age not qualify as “typical” use cases?
What does an exceptional (or “edge”) case look like and why? Exceptions are to be expected for data models that pertain to humans — but the exact characteristics by which they are classified as “atypical” should be clearly explained. If, for example, all of the outlier or “edge” cases for hiring psychology interns belong to protected classes — they are all women, they are all ethnic minorities, they are all educated in non-Western institutions — then that dataset may have underlying biases.
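This check, too, can be automated as a first pass. The sketch below uses invented case data: it compares each group’s share of the flagged “edge” cases against its share of the whole dataset, on the assumption that a large mismatch is a red flag that “atypical” is standing in for a protected attribute.

```python
from collections import Counter

# Hypothetical cases: (flagged_as_outlier, demographic group).
cases = [
    (True, "f"), (True, "f"), (True, "f"),
    (False, "m"), (False, "m"), (False, "f"), (False, "m"),
]

outlier_groups = Counter(g for is_out, g in cases if is_out)
all_groups = Counter(g for _, g in cases)

# Share of each group among outliers vs. in the dataset overall.
outlier_share = {g: outlier_groups.get(g, 0) / sum(outlier_groups.values())
                 for g in all_groups}
overall_share = {g: all_groups[g] / sum(all_groups.values())
                 for g in all_groups}

for group in all_groups:
    print(f"{group}: {outlier_share[group]:.0%} of outliers, "
          f"{overall_share[group]:.0%} of all cases")
```

Here one group supplies every outlier while making up only part of the dataset, which is exactly the pattern described above that should prompt a closer look at the underlying data.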
The power of predictive software and its uses should never defy explanation. The historical data with which “AI” algorithms are trained can and does serve as guardrails for high-speed decision-making. But, like toilet paper inconveniently stuck to your shoe, an algorithm’s dataset may drag along plenty of unwanted, biased or unreliable debris.
Making the AI sausage
“AI-solution” purveyors who can’t produce answers to the questions above are refusing to show how their AI sausage is made. The newly proposed EU AI Act may soon make a lack of transparency on algorithmic systems a serious problem for European startups, especially those that offer AI solutions for the health, finance, social benefits, policing, employment or education sectors.
Luckily, creating strategies to mitigate data bias or data deficiencies is a human’s job. Founders and funders of “AI” ventures in Europe can do themselves a favour by asking these questions early and often.
Nakeema Stefflbauer is the division director, digital client services for ERGO and the founder of FrauenLoop, an NGO computer programming school for women in Germany. Samantha Edds is senior data scientist at Yelp in Germany.