8 Top Data Startups

    More than a decade ago, Marc Andreessen wrote a prescient essay in the Wall Street Journal titled “Why Software Is Eating The World,” which described the many industries being disrupted by software. It set the stage for the megatrend of cloud computing.

    His thesis could just as well apply to data, and the opportunity there may be even larger. Data is becoming a competitive advantage for many companies.

    Yet data can be difficult to process, and it is common for AI and analytics projects to fail or underperform.

    The good news is that a number of startups are developing tools to help companies with their data journeys.

    Here’s a look at eight of them to put on your radar. Databricks is not on the list, since it has grown so large that its next step is likely an IPO. Even in a slowing market, though, some of these companies have reached billion-dollar “unicorn” valuations.

    People Data Labs

    People Data Labs (PDL) is focused on B2B and professional data. By processing resumes, the company has been able to provide valuable insights for recruiting, market research, sales and marketing.

    “We see every company in the world building data solutions,” said PDL CEO Sean Thorne. “This is a rapidly growing market.”

    The company does not focus on selling flat files of leads or contacts, which is the traditional approach. Instead, it uses a data-as-a-service model and is part of the AWS Data Exchange platform, which makes it easier for customers to consume the data in a usable format.

    In 2021, PDL raised $45 million in a Series B round of funding.

    Airbyte

    Airbyte is rethinking the data integration market. Its technology is built on an open source platform, which has supercharged adoption and innovation. More than 20,000 companies use the system, and the community includes about 7,000 data practitioners.

    A key to Airbyte is that it can handle virtually any data pipeline, from database replication to long-tail and custom connectors. There is no need for in-house data engineers to maintain these pipelines.

    Last year, the company raised more than $181 million.

    Imply

    The founders of Imply are the creators of Apache Druid, which is an open source database system for high-performance, real-time analytics. This experience has been critical in evolving the technology and tailoring it to the needs of enterprise customers.

    Imply’s target users are software developers, who can use it to build sophisticated analytics applications.

    “While adoption of Druid started with digital natives like Netflix, Airbnb and Pinterest, increasingly enterprises in the Fortune 1000 are recognizing the value of analytics applications as a way of differentiating their businesses,” said Fangjin Yang, CEO and cofounder, Imply. “And that’s what’s fueling the tremendous market opportunity for our category of real-time analytics databases.”

    This year, the company raised $100 million at a $1.1 billion valuation.

    MinIO

    A majority of data is unstructured, which can be difficult to store and manage.

    This is where MinIO comes in. Its software gets more than 1 million Docker pulls per day, and more than half of the Fortune 500 use the technology.

    “The market for MinIO’s object storage product can be described simply: everywhere AWS S3 isn’t,” said Garima Kapoor, COO and cofounder, MinIO. “Even accounting for AWS’s size, this is a massive market. MinIO delivers AWS S3-like infrastructure across any cloud, virtual or bare-metal deployment scenario.”

    To date, the company has raised $126 million.

    Cribl

    A major challenge for enterprises is dealing with diverse sources of data. But for Cribl, this has been a great opportunity. The company has built an open and interoperable platform to manage data better and get more value from it.

    “What we hear from our IT and security customers is that they have an array of important tools they use across the enterprise but none of those tools talk to one another,” said Nick Heudecker, Senior Director, Market Strategy & Competitive Intelligence, Cribl. “Cribl’s solutions are open by design, seek to connect the disparate parts of the data ecosystem (such as complementing tools like Datadog, Exabeam, and Elastic), and give customers choice and control over all the event data that flows through their corporate IT systems.”

    For fiscal year 2021, the company more than tripled its customer count. Ten of the Fortune 50 companies have signed on.

    Cribl has raised a total of $254 million since inception.

    Observable

    Observable operates a SaaS platform for real-time data collaboration, visualization and analysis. The founders started the company out of frustration with constant “tool hopping” between existing data products, which made the work error-prone, tedious and slow.

    Observable is JavaScript-native, which helps to lower the learning curve. The company also has the benefit of a large community of 5 million users. This has resulted in the largest public library of data visualizations.

    In all, the company has raised $46.1 million.

    Reltio

    Reltio offers a cloud-native platform for master data management (MDM). The category has many legacy players, such as Informatica, Tibco, IBM, SAP and Oracle, and Reltio sees an opportunity to disrupt it.

    “We have various integration options, including a low-code/no-code solution, that allow for rapid deployment and time to value,” said Manish Sood, founder and CTO, Reltio. “Our system also uses machine learning to discover deeper data insights and improve data quality. Then there is built-in workflow management, which helps simplify compliance requirements and improve information stewardship productivity.”

    The company counts 14 of the Fortune 100 as customers. To date, it has raised $237 million and is valued at more than $1.7 billion.

    TigerGraph

    TigerGraph offers a graph database platform for advanced analytics and AI on connected data. The technology has diverse applications, such as anti-money laundering, fraud detection, IoT (Internet of Things) and network analysis.

    Traditional analytics systems are built on relational databases, which can be expensive and rigid. They can also make it harder to leverage next-generation analytics like deep learning.

    This is why graph databases are becoming more popular. “Customers want to model their data from the viewpoint of the customer, supplier, or whatever entity they want to analyze and how they interact with the company across systems like CRMs, procurement, logistics and so on,” said Todd Blaschka, COO, TigerGraph.

    Last year, the company raised $105 million in a Series C round.

    A tougher market in 2022?

    2022 may not give us as many eye-popping funding rounds, but if any area stays strong, it’s likely to be the startups fueling the data analytics craze.

    Tom Taulli
    Tom Taulli is the author of Artificial Intelligence Basics: A Non-Technical Introduction, The Robotic Process Automation Handbook: A Guide to Implementing RPA Systems and Modern Mainframe Development: COBOL, Databases, and Next-Generation Approaches (will be published in February). He also teaches online courses for Pluralsight.