Big Data Analytics
The first steps toward achieving a lasting competitive edge with Big Data analytics.
It's a boy! But do you know why?
Having a male child isn't as simple as a 50/50 chance. Did you know, for instance, that the birth of boys has been on a downward trend since 1971? Or that black and Native American mothers have fewer male children than Asians and Caucasians? Or that your chance of having a boy goes down as the mother or father ages? And what's up with those 3,316 boys born to 89-year-old men in 1989?
Sue Ranney knows why. The VP of product development at Revolution Analytics found the answer by analyzing 22 years' worth of raw data, 70 gigabytes of it, on her laptop to demonstrate how Revolution R Enterprise's RevoScaleR Big Data analysis package works.
Revolution Analytics is a predictive analysis company that takes the open-source R language and adds enterprise support for it. Data scientists are something of a rare breed, much rarer than, say, .NET programmers. So one reason companies use Revolution Analytics is to have a data scientist build apps that process the data, which can then be embedded in a BI dashboard or even an Excel spreadsheet.
The app can be stored on the Revolution R Enterprise server and made accessible through a Web services API. A .NET programmer can then embed it using a .NET client API that the solution provides, explained David Smith, the company's vice president of marketing, in a recent interview.
Smith says the newest release, Revolution R Enterprise 5.0, includes even more support for making the R programming language enterprise-friendly and easier for IT to manage. Among the new features are Hadoop integration certified with the Cloudera CDH3 distribution, integration with the Microsoft HPC server platform for high-performance distributed computing, and LDAP support for better security.
But perhaps most significantly, version 5.0 supports multiple nodes for distributed/parallel computing, which can be used for high-performance statistical modeling.
"If somebody wanted to do that on 10 billion rows of data, they could farm that problem out to a cluster of five, 10, 20 or 50 machines running on the Microsoft HPC server framework and really reduce the processing time required to do those types of computations," Smith said. "We did a test where we ran a regression on 10 billion rows of data using just five machines. These are just off-the-shelf machines, not a very high hardware cost, but we were able to do that computation in just 90 seconds. You can do that same kind of thing in SAS, but you'd have to do it with hardware that costs up toward a million dollars or so."
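To give a feel for why a regression can be farmed out to several machines at all, here is a minimal Python sketch of one common approach. Each machine reduces its own chunk of rows to a handful of running sums, and only those sums are combined to fit the line. This is an illustration under stated assumptions, not a description of how RevoScaleR works internally. The names `Sums`, `summarize`, `fit` and `fit_chunked` are hypothetical, and the example handles only a single predictor.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple

Row = Tuple[float, float]  # (x, y); hypothetical row layout for this sketch


@dataclass
class Sums:
    """Running sums that fully determine the fit of y = intercept + slope * x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other: "Sums") -> "Sums":
        # Combining two chunks is plain addition, so order does not matter.
        return Sums(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                    self.sxx + other.sxx, self.sxy + other.sxy)


def summarize(chunk: Iterable[Row]) -> Sums:
    """Work done on one machine: a single pass over its own rows."""
    s = Sums()
    for x, y in chunk:
        s.n += 1
        s.sx += x
        s.sy += y
        s.sxx += x * x
        s.sxy += x * y
    return s


def fit(total: Sums) -> Tuple[float, float]:
    """Ordinary least squares from the combined sums; returns (intercept, slope)."""
    denom = total.n * total.sxx - total.sx ** 2
    if denom == 0:
        raise ValueError("cannot fit a line: x has no variance")
    slope = (total.n * total.sxy - total.sx * total.sy) / denom
    intercept = (total.sy - slope * total.sx) / total.n
    return intercept, slope


def fit_chunked(chunks: Sequence[Iterable[Row]]) -> Tuple[float, float]:
    """Summarize each chunk separately, then merge. In a real cluster the
    summarize calls would run on different machines; here they run in a loop."""
    total = Sums()
    for chunk in chunks:
        total = total + summarize(chunk)
    return fit(total)


# Usage: synthetic rows on the line y = 2 + 3x, split into three uneven chunks.
rows = [(float(x), 2.0 + 3.0 * x) for x in range(10)]
intercept, slope = fit_chunked([rows[:3], rows[3:7], rows[7:]])
```

Because each chunk boils down to five numbers no matter how many rows it holds, very little data has to travel between machines, and adding machines shortens the pass over the rows.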
The company reports impressive performance benefits. Researchers at Michigan State University managed to cut a three-and-a-half-month analytical project to a little over one week using Revolution R on Microsoft HPC. Smith expects that level of performance benefit to carry over into the commercial sector.