10 Top Data Companies

    The term “data company” is certainly broad. It could easily include giant social networks like Meta. The company has perhaps one of the world’s most valuable data sets, which includes about 2.94 billion monthly active users (MAUs). Meta also has many of the world’s elite data scientists on its staff.

    But for purposes of this article, the term will be narrower. The focus will be on those operators that build platforms and tools to leverage data – one of the most important technologies in enterprises these days.

    Yet even this category still has many companies. For example, if you do a search for data analytics on G2, you will see results for over 2,200 products.

    So any list of top data companies will be, well, imperfect. Regardless, there are companies that are really in a league of their own, from established names to fast-growing startups, publicly traded and privately held. Let’s take a look at 10 of them.

    Also see our picks for Top Data Startups.


    In 2009, a group of computer scientists at the University of California, Berkeley, created the open source project that became Apache Spark. The goal was to develop a distributed system for processing data across a cluster of machines.

    From the start, the project saw lots of traction, as there was a huge demand for sophisticated applications like deep learning. The project’s founders would then go on to create a company called Databricks.

    The platform combines a data warehouse and data lakes, which are natively in the cloud. This allows for much more powerful analytics and artificial intelligence applications. There are more than 7,000 paying customers, such as H&M Group, Regeneron and Shell. Last summer, the ARR (annual recurring revenue) hit $600 million.

    About the same time, Databricks raised $1.6 billion in a Series H funding and the valuation was set at a stunning $38 billion. Some of the investors included Andreessen Horowitz, Franklin Templeton and T. Rowe Price Associates. An IPO is expected at some point, but even before the current tech stock downturn, the company seemed in no hurry to test the public markets.

    We’ve included Databricks on our lists of the Top Data Lake Solutions, Top DataOps Tools and the Top Big Data Storage Products.


    SAS (Statistical Analysis System), long a private company, is one of the pioneers of data analytics. The origins of the company actually go back to 1966 at North Carolina State University. Professors created a program that performed statistical functions using the IBM System/360 mainframe. But when government funding dried up, SAS would become a company.

    It was certainly a good move. SAS would go on to become the gold standard for data analytics. Its platform allows for AI, machine learning, predictive analytics, risk management, data quality and fraud management.

    Currently, there are 80,800 customers, including 88 of the top 100 companies on the Fortune 500. There are 11,764 employees and revenues hit $3.2 billion last year.

    SAS is one of the world’s largest privately held software companies. Last summer, SAS was in talks to sell to Broadcom for $15 billion to $20 billion. But the co-founders decided to stay independent. Although the company has remained private since its founding in 1976, they are now planning an IPO by 2024.

    It should surprise absolutely no one that SAS made our list of the top data analytics products.


    Snowflake, which operates a cloud-based data platform, pulled off the largest IPO for a software company in late 2020. It raised a whopping $3.4 billion. The offering price was $120 and it surged to $254 on the first day of trading, bringing the market value to over $70 billion. Not bad for a company that was about eight years old.

    Snowflake stock would eventually go above $350. But of course, with the plunge in tech stocks, the company’s stock price would also come under extreme pressure. It would hit a low of $110 a few weeks ago.

    Despite all this, Snowflake continues to grow at a blistering pace. In the latest quarter, the company reported an 85% spike in revenues to $422.4 million and the net retention rate was an impressive 174%. The customer base, which was over 6,300, had 206 companies with capacity arrangements that led to more than $1 million in product revenue in the past 12 months.

    Snowflake started as a data warehouse. But the company has since expanded its offerings to include data lakes, cybersecurity, collaboration, and data science applications. Snowflake has also been extending its reach to on-premises storage, for example by querying S3-compatible systems without moving the data.

    Snowflake still appears to be in the early stages of its opportunity. According to its latest investor presentation, the total addressable market is about $248 billion.

    Like Databricks, Snowflake made our lists of the best Data Lake, DataOps and Big Data Storage tools.


    Founded in 2003, Splunk is the pioneer in collecting and analyzing large amounts of machine-generated data. This makes it possible to create highly useful reports and dashboards.

    A key to the success of Splunk is its vibrant ecosystem, which includes more than 2,400 partners. There is also a marketplace that has over 2,400 apps.

    A good part of the focus for Splunk has been on cybersecurity. By using real-time log analysis, a company can detect outliers or unusual activities.
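    To make the idea of spotting outliers in log data concrete, here is a minimal sketch in Python. It is not Splunk's method or query language. It is a simple z-score check over hypothetical per-minute counts of failed logins, and the function name `find_outliers`, the sample data and the threshold are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_outliers(counts, threshold=3.0):
    """Return the indices of values that sit more than `threshold`
    sample standard deviations away from the mean."""
    if len(counts) < 2:
        return []  # not enough data to estimate a spread
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical failed-login counts per minute, parsed from an auth log.
failed_logins = [4, 5, 3, 6, 4, 5, 4, 90, 5, 4, 3, 5]

# Minutes whose count deviates strongly from the rest.
suspicious_minutes = find_outliers(failed_logins, threshold=2.5)
print(suspicious_minutes)
```

    A production system would likely use rolling windows and more robust statistics, since a single large spike inflates the standard deviation and can hide smaller anomalies.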

    Yet the Splunk platform has shown success in many other categories. For example, the technology helps with cloud migration, application modernization, and IT modernization.

    In March, Splunk announced a new CEO, Gary Steele. Prior to this, he was CEO of Proofpoint, a fast-growing cloud-based security company.

    In his first earnings report as CEO, Steele said: “Splunk is a system of record that’s deeply embedded within customers’ businesses and provides the foundation for security and resilience so that they can innovate with speed and agility. All of this translated to a massive, untapped, unique opportunity, from which I believe we can drive long-term durable growth while progressively increasing operating margins and cash flow.”


    While there is a secular change towards the cloud, the reality is that many large enterprises still have significant on-premises footprints. A key reason for this is compliance. There is a need to have much more control over data because of privacy requirements.

    But there are other areas where data fragmentation is inevitable. This is the case for edge devices and streaming from third parties and partners.

    Cloudera – another one of our top data lake solutions – has built a platform designed for a hybrid data strategy. This means that customers can take advantage of their data wherever it resides, whether on-premises, in the cloud or at the edge.

    Holger Mueller at Constellation Research praises Cloudera’s reliance on the open source Apache Iceberg technology for the Cloudera Data Platform.

    “Open source is key when it comes to most infrastructure-as-a-service and platform-as-a-service offerings, which is why Cloudera has decided to embrace Apache Iceberg,” Mueller said. “Cloudera could have gone down a proprietary path, but adopting Iceberg is a triple win. First and foremost, it’s a win for customers, who can store their very large analytical tables in a standards-based, open-source format, while being able to access them with a standard language. It’s also a win for Cloudera, as it provides a key feature on an accelerated timeline while supporting an open-source standard. Last, it’s a win for Apache, as it gets another vendor uptake.”

    Last year, Cloudera reported revenues over $1 billion. Its thousands of customers include over 400 governments, the top ten global telcos and nine of the top ten healthcare companies.

    Also read: Top Artificial Intelligence (AI) Software for 2022


    The founders of MongoDB were not from the database industry. Instead, they were pioneers of Internet ad networks. The team – which included Dwight Merriman, Eliot Horowitz and Kevin Ryan – created DoubleClick, which launched in 1996. As the company quickly grew, they had to create their own custom data stores and realized that traditional relational databases were not up to the job.  

    There needed to be a new type of approach, which would scale and allow for quick innovation. So when they left DoubleClick after selling the company to Google for $3.1 billion, they went on to develop their own database system. It was based on an open source model and this allowed for quick distribution.

    The underlying technology relied on a document model, part of a broader category of non-relational databases known as NoSQL. It provided a more flexible way for developers to code their applications. It was also optimized for enormous transactional workloads.

    The MongoDB database has since been downloaded more than 265 million times. The company has also added the types of features required by enterprises, such as high performance and security.  

    During the latest quarter, revenues hit $285.4 million, up 57% on a year-over-year basis. There are over 33,000 customers.

    To keep up the growth, MongoDB is focused on taking market share away from the traditional players like Oracle, IBM and Microsoft. To this end, the company has built the Relational Migrator. It visually analyzes relational schemas and transforms them into NoSQL databases.
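    The shift from relational tables to a document model can be illustrated with a short Python sketch. This is not how the Relational Migrator works internally and it does not use any MongoDB API. It only shows the general idea of folding rows from two hypothetical tables, `customers` and `orders`, into nested documents. All names and data are invented for the example.

```python
from collections import defaultdict

# Hypothetical rows from two relational tables linked by customer_id.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.5},
]


def to_documents(customers, orders):
    """Embed each customer's orders inside a single customer document."""
    orders_by_customer = defaultdict(list)
    for order in orders:
        # Drop the foreign key; the nesting now expresses the relationship.
        orders_by_customer[order["customer_id"]].append(
            {"order_id": order["order_id"], "total": order["total"]}
        )
    return [
        {
            "_id": c["customer_id"],
            "name": c["name"],
            "orders": orders_by_customer[c["customer_id"]],
        }
        for c in customers
    ]


documents = to_documents(customers, orders)
print(documents[0])
```

    In the relational form, reading a customer with their orders requires a join. In the document form, one lookup returns everything, which is part of the flexibility developers tend to cite.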


    When engineers Jay Kreps, Jun Rao and Neha Narkhede worked at LinkedIn, they had difficulties creating infrastructure that could handle data in real time. They evaluated off-the-shelf solutions but nothing was up to the job.

    So the LinkedIn engineers created their own software platform. It was called Apache Kafka and it was open sourced. The software allowed for high-throughput, low latency data feeds.
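    The core idea behind this kind of data feed is an append-only log that many consumers read at their own pace. The Python sketch below is an in-memory toy, not the Apache Kafka API. The class names `TopicLog` and `Consumer` are hypothetical, and real systems add partitioning, persistence and replication on top of this idea.

```python
class TopicLog:
    """An append-only sequence of records, addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position, independent of other consumers."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only past what was read
        return batch


log = TopicLog()
for event in ["page_view", "click", "purchase"]:
    log.append(event)

# Two independent readers of the same feed.
analytics = Consumer(log)
billing = Consumer(log)

first_batch = analytics.poll(max_records=2)
billing_batch = billing.poll()
print(first_batch, billing_batch)
```

    Because records are never removed when read, a slow consumer does not block a fast one, and a new consumer can replay the feed from the beginning.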

    From the start, Apache Kafka was popular. And the LinkedIn engineers saw an opportunity to build a company around this technology in 2014. They called it Confluent.

    The open source strategy was certainly spot on. Over 70% of the Fortune 500 use Apache Kafka.

    But Confluent has also been smart in building a thriving developer ecosystem. There are over 60,000 meet-up members across the globe. The result is that developers outside Confluent have continued to build connectors, new functions and patches.

    In the most recent quarter, Confluent reported a 64% increase in revenues to $126 million. There were also 791 customers with $100,000 or more in ARR, up 41% on a year-over-year basis.


    Founded in 2010, Datadog started as an operator of a real-time unified data platform. But this certainly was not the last of its new applications.

    The company has been an innovator – and has also been quite successful in winning adoption for its technologies. The other categories Datadog has entered include infrastructure monitoring, application performance monitoring, log analysis, user experience monitoring, and security. The result is that the company is one of the top players in the fast-growing market for observability.

    Datadog’s software is not just for large enterprises. In fact, it is available for companies of any size.

    Thus, it should be no surprise that Datadog has been a super-fast grower. In the latest quarter, revenues soared by 83% to $363 million. There were also about 2,250 customers with more than $100,000 in ARR, up from 1,406 a year ago.

    A key success factor for Datadog has been its focus on breaking down data silos. This has meant much more visibility across organizations.  It has also allowed for better AI.

    The opportunity for Datadog is still in the early stages. According to analysis from Gartner, spending on observability is expected to go from $38 billion in 2021 to $53 billion by 2025.

    See the Top Observability Tools & Platforms


    Traditional data integration tools rely on the Extract, Transform and Load (ETL) approach. But this approach does not handle modern challenges well, such as the sprawl of cloud applications and storage.

    What to do? Well, entrepreneurs George Fraser and Taylor Brown set out to create a better way. In 2013, they cofounded Fivetran and got the backing of the famed Y Combinator program.

    Interestingly enough, they originally built a tool for Business Intelligence (BI). But they quickly realized that the ETL market was ripe for disruption.

    In terms of the product development, the founders wanted to greatly simplify the configuration. The goal was to accelerate the time to value for analytics projects. Actually, they came up with the concept of zero configuration and maintenance. The vision for Fivetran is to make “business data as accessible as electricity.”

    Last September, Fivetran announced a stunning round of $565 million in venture capital. The valuation was set at $5.6 billion and the investors included Andreessen Horowitz, General Catalyst, CEAS Investments, and Matrix Partners.


    Kevin Stumpf and Mike Del Balso met at Uber in 2016 and worked on the company’s AI platform, which was called Michelangelo ML. The technology allowed the company to scale thousands of models in production. Just some of the use cases included fraud detection, arrival predictions and real-time pricing.

    This was based on the first feature store. It allowed for quickly spinning up ML features that were based on complex data structures.

    However, this technology still relied on a large staff of data engineers and scientists. In other words, a feature store was mostly for the mega tech operators.

    But Stumpf and Del Balso thought there was an opportunity to democratize the technology. This became the focus of their startup, Tecton, which they launched in 2019.

    The platform has gone through various iterations. Currently, it is essentially a platform to manage the complete lifecycle of ML features. The system handles storing, sharing and reusing features. This allows for the automation of pipelines for batch, streaming and real-time data.
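    For readers unfamiliar with the concept, a feature store can be pictured as a registry of feature definitions plus a lookup table of computed values. The Python sketch below is a deliberately tiny in-memory version. It is not Tecton's API, and the class `FeatureStore`, its methods and the sample features are all hypothetical.

```python
class FeatureStore:
    """Stores feature definitions once so many models can reuse them."""

    def __init__(self):
        self._definitions = {}  # feature name -> function of raw data
        self._values = {}       # (entity id, feature name) -> computed value

    def register(self, name, fn):
        """Define a feature once, by name."""
        self._definitions[name] = fn

    def materialize(self, entity_id, raw):
        """Compute every registered feature for one entity and store it."""
        for name, fn in self._definitions.items():
            self._values[(entity_id, name)] = fn(raw)

    def get_features(self, entity_id, names):
        """Serve stored feature values, for example to a model at inference."""
        return {name: self._values[(entity_id, name)] for name in names}


store = FeatureStore()
store.register("txn_count", lambda txns: len(txns))
store.register(
    "avg_txn_amount",
    lambda txns: sum(txns) / len(txns) if txns else 0.0,
)

# Hypothetical raw transaction amounts for one user.
store.materialize("user_42", [20.0, 30.0, 100.0])

features = store.get_features("user_42", ["txn_count", "avg_txn_amount"])
print(features)
```

    A fraud model and a pricing model could both request `avg_txn_amount` by name, instead of each team rebuilding the same pipeline.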

    In July, Tecton announced a Series C funding round for $100 million. The lead investor was Kleiner Perkins. There was also participation from Snowflake and Databricks.

    Read next: 5 Top VCs For Data Startups

    Tom Taulli
    Tom Taulli is the author of Artificial Intelligence Basics: A Non-Technical Introduction, The Robotic Process Automation Handbook: A Guide to Implementing RPA Systems and Modern Mainframe Development: COBOL, Databases, and Next-Generation Approaches (will be published in February). He also teaches online courses for Pluralsight.
