Rethinking Databases: Why Data’s ‘Shape’ May Matter Most


    For decades, the database decision came down to picking a relational database from one of several vendors, but that era is ending, argues Neo Technology CEO Emil Eifrem. Relational databases are no longer the default. Instead, Eifrem says, your choice of data management technology will hinge on one question: What is the shape of your data?

    “They were all the same type of database,” Eifrem says in a recent ZDNet article. “But that era has gone forever and it will never come back because data is just so big and so irregularly shaped now that you’re always going to be able to get a hundred times improvement, a thousand times improvement, a million times improvement if you get a data technology that is shaped like the shape of your data.”

    This talk of ‘data shapes’ is common lingo among those familiar with graph databases, which is what Neo Technology produces. What makes graph databases useful is structure, specifically the data’s structure, according to Andrew C. Oliver, president and founder of Open Software Integrators.

    Ironically, it is graph databases, not relational ones, that let you put relationships back into the equation, specifically the “distance” of relationships, or how many hops separate two connected items.

    “Relationships matter as much as, if not more than, the data itself,” Oliver explained in an InfoWorld column. “By using the RDBMS for everything over the last few decades, the industry has done the equivalent of using a list for every data structure. You wouldn’t use only one data structure for every type of data in memory, why would you do that just because you’re storing the data?”
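    Oliver’s analogy can be made concrete with a small in-memory sketch in Python. It is a hypothetical illustration, not a benchmark of any real RDBMS (which would normally add indexes): the same “who follows whom” relationships are stored once as a flat list of rows and once as an adjacency map, and a two-hop lookup counts how many relationship entries each approach touches. The names `FOLLOWS`, `two_hop_scan`, `build_adjacency` and `two_hop_adjacency`, and the sample data, are invented for this example.

```python
from collections import defaultdict

# Hypothetical sample data: (follower, followed) pairs, stored the way
# rows in a flat two-column table would be.
FOLLOWS = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("alice", "erin"),
    ("erin", "frank"),
]


def two_hop_scan(rows, person):
    """Accounts two hops from `person`, found by rescanning every row per hop.

    Returns (result_set, rows_examined).
    """
    examined = 0
    first_hop = set()
    for follower, followed in rows:  # hop 1: full pass over the list
        examined += 1
        if follower == person:
            first_hop.add(followed)
    second_hop = set()
    for follower, followed in rows:  # hop 2: another full pass
        examined += 1
        if follower in first_hop:
            second_hop.add(followed)
    return second_hop - first_hop - {person}, examined


def build_adjacency(rows):
    """Index each person's outgoing relationships once, up front."""
    adjacency = defaultdict(set)
    for follower, followed in rows:
        adjacency[follower].add(followed)
    return adjacency


def two_hop_adjacency(adjacency, person):
    """Same query, touching only edges attached to nodes already reached.

    Returns (result_set, edges_examined).
    """
    first_hop = adjacency.get(person, set())  # .get does not insert keys
    examined = len(first_hop)
    second_hop = set()
    for friend in first_hop:
        for candidate in adjacency.get(friend, set()):
            examined += 1
            second_hop.add(candidate)
    return second_hop - first_hop - {person}, examined


if __name__ == "__main__":
    result, cost = two_hop_scan(FOLLOWS, "alice")
    print("list scan:", sorted(result), "rows examined:", cost)
    result, cost = two_hop_adjacency(build_adjacency(FOLLOWS), "alice")
    print("adjacency:", sorted(result), "edges examined:", cost)
```

    For `alice`, both versions return `carol` and `frank`, but the list version examines all 10 rows (five per hop) while the adjacency version touches only 4 edges. Picking the structure that matches the data, rather than one structure for everything, is the point Oliver is making.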

    Whereas relational databases are table-based, graph databases store data as nodes, properties and edges. According to CMS Wire, that makes them work more like the human brain, letting you build a graph of connections among people, objects and data.
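    As a rough sketch of that node/property/edge model, the Python below keeps nodes and typed, directed edges that each carry a dictionary of properties, and answers the “distance” question by breadth-first search, counting the fewest edges between two nodes. The class and function names, the `KNOWS` relationship type and the sample people are hypothetical; real graph databases such as Neo4j provide this through their own query languages and storage engines, not through code like this.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                      # e.g. "Person"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    rel_type: str                   # e.g. "KNOWS"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph: nodes plus directed, typed edges."""

    def __init__(self):
        self.nodes = {}       # node_id -> Node
        self.out_edges = {}   # node_id -> list of outgoing Edge objects

    def add_node(self, node):
        self.nodes[node.node_id] = node
        self.out_edges.setdefault(node.node_id, [])

    def add_edge(self, edge):
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("add both endpoint nodes before the edge")
        self.out_edges[edge.source].append(edge)

    def hop_distance(self, start, goal):
        """Fewest edges from start to goal (breadth-first search), or None."""
        if start == goal:
            return 0
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            current, dist = queue.popleft()
            for edge in self.out_edges.get(current, []):
                if edge.target == goal:
                    return dist + 1
                if edge.target not in seen:
                    seen.add(edge.target)
                    queue.append((edge.target, dist + 1))
        return None


def example_graph():
    """Hypothetical sample data: a short chain of KNOWS relationships."""
    g = PropertyGraph()
    for name in ("ann", "ben", "cat", "dan"):
        g.add_node(Node(name, "Person", {"name": name.title()}))
    g.add_edge(Edge("ann", "ben", "KNOWS", {"since": 2012}))
    g.add_edge(Edge("ben", "cat", "KNOWS", {"since": 2013}))
    g.add_edge(Edge("cat", "dan", "KNOWS", {"since": 2014}))
    return g


if __name__ == "__main__":
    graph = example_graph()
    print(graph.hop_distance("ann", "dan"))  # 3
    print(graph.hop_distance("dan", "ann"))  # None: edges are directed
```

    Because each node keeps its own list of outgoing edges, following a relationship is a direct lookup rather than a join, which is one way to read the claim that a graph “shaped like” connected data handles connection-heavy questions well.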

    That’s also what makes them particularly useful for data-driven decision makers: it’s easier to explore causal links and look for patterns of behavior. That, in turn, supports informed predictions about a person’s actions, whether that person is a customer, a fraudster, a terrorist or a hacker.

    Cognitive Scale CEO Akshay Sabhikhi, who previously led IBM’s Smarter Care initiative, told Datanami that this kind of analysis supports four types of insights:

    • What happened
    • Why it happened
    • What you should do about it
    • What’s going to happen if you don’t act

    On a more practical level, graph databases also require much less code, which Oliver says can shorten turnaround time and reduce software bugs.

    While Datanami calls Neo Technology the “undisputed giant” in graph databases, a recent article lists other players:


    Open Source Solutions:

    • Titan, a distributed transactional graph database
    • Apache Giraph, which runs on Hadoop and uses MapReduce to process data
    • GraphX, which is included with Apache Spark and comes preloaded with algorithms

    Commercial Solutions:

    • Neo4j by Neo Technology
    • SPARQLverse by SPARQLcity
    • GraphLab
    • Cray, whose YarcData appliance ships with a graph database
    • Sparksee (formerly DEX), which was developed by the Polytechnic University of Catalonia and is being commercialized by Sparsity Technologies

    There are also hybrid NoSQL databases with graph-like capabilities, such as MarkLogic, ArangoDB and Sqrrl, the article notes.

    Graph databases, NoSQL databases and Hadoop-based solutions create more infrastructure options to support different data analytics needs. Hence, Eifrem says, it’s misleading for anyone to contend that one database can solve all your data problems. Instead, it will be up to data architects to decide when to stick with old-school relational databases and when to choose one of the newer options.

    “None of these new databases is horizontally better than any one of the others,” Eifrem told ZDNet. “But the really interesting question is not, ‘Is this one faster than the other?’ but ‘In what situations are they used?’ — and that comes down to the shape of the data.”

    Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist.

    Loraine Lawson is a freelance writer specializing in technology and business issues, including integration, health care IT, cloud and Big Data.
