October 28, 2021

Dear startups: before you worry about AI, do data right

Many startups think AI can make their products pop, but often they need to focus on their data instead

Patrik Liu Tran

It seems like all startups today are eager to have an AI/machine learning (ML) strategy. But do they really need one?

In 2019, a survey found that 40% of European companies that classify themselves as “AI companies” don’t use artificial intelligence in a way that is “material” to their businesses. Some of these companies likely used the AI label to attract talent and funding, while others thought they needed to work with AI but struggled to find relevant applications.

It might sound less important, but for most startups, being obsessed with data is far more essential than struggling to build ML capabilities.


Determining if AI/ML makes sense

As an artificial intelligence and machine learning expert, I’ve advised hundreds of companies, from big corporations to early-stage startups, on this question. When companies ask me if they should use AI/ML, my reply is always: “Why will your company succeed? Is it because of a novel business idea, operational excellence or technology?”

If the answer is the novelty of the business idea (like Klarna, a buy now, pay later pioneer) or operational excellence (like Bolt, which was founded years after companies such as Uber and needed to be faster and more efficient than its competitors to succeed), technology should be a support function that enables the business rather than drives it in the early days. Advanced technology such as AI/ML should therefore not consume too much attention early on.

On the other hand, if the answer is technology (like TikTok, which has been heavily dependent on its algorithms for addictive content curation), AI/ML might be one of the technologies worth investing in from the start. 

Collecting high-quality data

What if AI/ML is right for your startup? To build sophisticated AI/ML (now or down the road), you need relevant data of high quality in order to train algorithms. Here are a few considerations when doing that: 

  1. Define use cases: To know what data is relevant, it’s essential to understand what you want to do with it and to have hypotheses about potential use cases. Many companies fail to identify use cases where AI/ML is a suitable solution, which signals that they should not invest additional time into AI/ML for the time being. If the competency to identify potential use cases is missing in-house, startups can initially rely on external resources such as advisers with prior hands-on experience for help.
  2. Collect relevant data of high quality: Once you have clear use cases, you need to set up the right data stack to collect appropriate data for them. Some companies may already have collected relevant high-quality data proactively.
  3. Get a data engineer: One of the main mistakes startups make is hiring machine learning engineers, analysts and data scientists very early without even thinking about recruiting data engineers. All too often, this halts all progress because there is no valuable data to work with, and these hires are forced to become data engineers out of necessity. Since they are not trained for those tasks, the work takes a lot of time and the result is suboptimal. Data engineers are the ones responsible for, among other things, setting up the company’s infrastructure for data collection (often a modern data stack) and ensuring that the data being collected is relevant, reliable and high quality. This is why data engineering has become the fastest-growing job in tech, with 70% more open roles at companies in data engineering than in data science.
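
To make "relevant, reliable and high quality" a little more concrete, here is a minimal sketch in Python of the kind of check a data engineer might run on collected records. Everything in it is an assumption made for illustration: the field names (`user_id`, `event`, `timestamp`), the sample records and the two rules (required fields present, timestamp parseable) are hypothetical, and a real data stack would typically rely on dedicated tooling rather than a hand-written script.

```python
from collections import Counter
from datetime import datetime

# Hypothetical schema: the fields every collected record is assumed to need.
REQUIRED_FIELDS = ("user_id", "event", "timestamp")

# Hypothetical sample of collected records, two of them flawed on purpose.
SAMPLE_RECORDS = [
    {"user_id": "u1", "event": "signup", "timestamp": "2021-10-28T09:00:00"},
    {"user_id": "", "event": "purchase", "timestamp": "2021-10-28T09:05:00"},
    {"user_id": "u3", "event": "purchase", "timestamp": "yesterday"},
]


def validate_record(record):
    """Return a list of quality problems in one record (empty if clean)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    timestamp = record.get("timestamp")
    if timestamp:
        try:
            datetime.fromisoformat(timestamp)
        except (TypeError, ValueError):
            problems.append("bad timestamp")
    return problems


def quality_report(records):
    """Count clean records and tally each kind of problem across a batch."""
    problem_counts = Counter()
    clean = 0
    for record in records:
        problems = validate_record(record)
        if problems:
            problem_counts.update(problems)
        else:
            clean += 1
    return {
        "total": len(records),
        "clean": clean,
        "problems": dict(problem_counts),
    }


if __name__ == "__main__":
    print(quality_report(SAMPLE_RECORDS))
```

Tracking a report like this over time is one simple way to notice when a data source starts to degrade, well before anyone tries to train a model on it.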

After building a proper pipeline for data collection, the next step depends on the defined use cases. Use cases related to analytics and data mining (such as finding relevant patterns in high-dimensional data) usually involve data analysts.

Such analyses, whether recurring or ad hoc, usually produce a report that gives insight into the business. A report could, for example, segment customers to better understand their needs, which could inform areas such as product development and marketing.
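
As a rough illustration of what such a segmentation report could look like, here is a small Python sketch. The customer records, the field names (`orders`, `total_spend`) and the segment thresholds are all hypothetical and chosen only for the example. Real analyses would use the company's own data and often more sophisticated methods than fixed rules.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records; names and numbers are made up.
SAMPLE_CUSTOMERS = [
    {"id": "c1", "orders": 1, "total_spend": 20.0},
    {"id": "c2", "orders": 3, "total_spend": 90.0},
    {"id": "c3", "orders": 7, "total_spend": 400.0},
    {"id": "c4", "orders": 2, "total_spend": 50.0},
]


def segment(customer):
    """Assign a customer to a segment using assumed order-count thresholds."""
    if customer["orders"] >= 5:
        return "loyal"
    if customer["orders"] >= 2:
        return "returning"
    return "new"


def segment_summary(customers):
    """Report customer count and average spend for each segment."""
    spend_by_segment = defaultdict(list)
    for customer in customers:
        spend_by_segment[segment(customer)].append(customer["total_spend"])
    return {
        name: {"customers": len(spend), "avg_spend": mean(spend)}
        for name, spend in spend_by_segment.items()
    }


if __name__ == "__main__":
    for name, stats in segment_summary(SAMPLE_CUSTOMERS).items():
        print(name, stats)
```

Even a simple breakdown like this only works if the underlying order data is complete and consistent, which is the point of getting data collection right first.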

If the defined use cases relate to deploying an ML model in production for automated predictions and inferences, data scientists and ML engineers are often involved in building the models, putting them into production, and ensuring that the entire process is robust and kept up to date.

Data quality trumps any algorithm

Most startups should not be too concerned about AI/ML, especially if their main unique selling proposition (USP) is a novel business idea or operational excellence. If technology is their main USP, they are more likely to apply ML in a relevant manner.

The next steps for those companies are to identify potential use cases and collect relevant, high-quality data for them. To apply machine learning efficiently and in a meaningful way, you need good data. Good data also benefits companies beyond ML, since it improves all data-driven use cases, including business intelligence, analytics and general data-driven decision-making.

So startups, before you dive into an AI/ML strategy, try being obsessed with data first.