The Real Life of a Data Scientist

In early 2012, a group of Stanford University researchers interviewed 35 data analysts from 25 organizations across a variety of sectors, including health care, retail, marketing and finance, and identified the various challenges data scientists face in the data analysis process.

Although data scientists are in high demand and the role has been hailed as one of the hottest professions of the 21st century, much of their work is dominated by the time-consuming process of converting data into a usable form. The data analysis process involves four tasks (discovery, transformation, modeling and reporting), and data scientists spend as much as 60 to 80 percent of their time in the transformation stage.

In this slideshow, Trifacta, a provider of productivity platforms for data analysis, takes you through each of these tasks in greater detail, highlighting the pain points data scientists face at each stage. Clearly, data scientists need tools that simplify the data analysis process while also increasing their productivity and collaboration.

Here is a closer look at the day-to-day activities of a data scientist, as identified by Trifacta.

Discovery: Acquiring data necessary to complete an analysis task.

Before data scientists start to transform data for analysis, they must first acquire the necessary data. Often, however, that data is distributed across multiple databases and the organization lacks documentation or search capabilities, so data scientists must rely on database administrators or others for help.

Transformation: Manipulating the acquired data for analysis, diagnosing data quality and understanding what assumptions can be made.

Transformation is the most time-consuming component of the analysis process. It involves reformatting and validating data to make it palatable for databases and visualization tools, diagnosing the data for quality issues, and working out what assumptions can be made about it. In this phase, data scientists encounter numerous challenges, including data sets that contain missing, erroneous or extreme values. If these problems go undetected, the assumptions data scientists make about such data sets can turn out to be wrong and lead to misleading results.
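
To make the quality-diagnosis step concrete, here is a minimal Python sketch that sorts one column of raw values into the three problem categories named above. It assumes the column should hold numbers (a hypothetical "age" field), treats `None` or blank strings as missing, treats unparseable or out-of-range entries as erroneous (the 0 to 120 bound is a hypothetical plausibility range), and flags extreme values with the common 1.5 × IQR (Tukey fence) rule. The function name `diagnose_column` is illustrative only; this is not how Trifacta or any particular tool performs the diagnosis.

```python
import math
import statistics


def diagnose_column(values, lo=None, hi=None, k=1.5):
    """Sort raw column values into missing, erroneous and extreme (by index).

    lo/hi: optional plausibility bounds (hypothetical, domain-specific).
    k: IQR multiplier for the outlier fences (1.5 is a common convention).
    """
    missing, erroneous, numeric = [], [], []
    for i, raw in enumerate(values):
        # Missing: no value at all, or an empty/whitespace-only string.
        if raw is None or (isinstance(raw, str) and raw.strip() == ""):
            missing.append(i)
            continue
        # Erroneous: cannot be parsed as a finite number ...
        try:
            x = float(raw)
        except (TypeError, ValueError):
            erroneous.append(i)
            continue
        if not math.isfinite(x):
            erroneous.append(i)
            continue
        # ... or falls outside the assumed plausible range.
        if (lo is not None and x < lo) or (hi is not None and x > hi):
            erroneous.append(i)
            continue
        numeric.append((i, x))

    # Extreme: valid values outside the fences Q1 - k*IQR and Q3 + k*IQR.
    extreme = []
    if len(numeric) >= 4:
        q1, _, q3 = statistics.quantiles([x for _, x in numeric], n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        extreme = [i for i, x in numeric if x < low or x > high]

    return {"missing": missing, "erroneous": erroneous, "extreme": extreme}


# Hypothetical raw "age" column as it might arrive from a source system.
ages = ["34", "29", "", None, "41", "abc", "-5", "38",
        "33", "250", "36", "31", "95"]
report = diagnose_column(ages, lo=0, hi=120)
print(report)
# {'missing': [2, 3], 'erroneous': [5, 6, 9], 'extreme': [12]}
```

Note that the flagged extreme value (95) is a plausible age. A flag like this is a prompt to check an assumption about the data, not proof of an error, which is why this stage takes so much of a data scientist's judgment.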

Modeling: Constructing a model of the assembled data.

The biggest difficulty in constructing a model is understanding how relevant each data set is to a given analysis task. When data scientists reach this stage, they often find that their data has not been completely transformed and must return to the transformation (wrangling) stage before they can identify useful patterns or relationships. They also find that many existing analytics packages, tools and algorithms do not scale to the size of their data sets.

Reporting: Sharing insights gained from the data.

Because the assumptions made during analysis are often poorly documented, reports can be hard to distribute and consume, which can affect how their results are interpreted. And because readers have little or no knowledge of how the original input data was transformed, many reports do not allow for interactive verification or sensitivity analysis.
