    The Real Life of a Data Scientist

    In early 2012, a group of Stanford University researchers interviewed 35 data analysts from 25 organizations across a variety of sectors, including health care, retail, marketing and finance, and identified the various challenges data scientists face in the data analysis process.

    Although data scientists are in high demand and their profession is hailed as one of the hottest of the 21st century, much of their work is dominated by the time-consuming process of converting data into a usable form. The data analysis process involves four tasks (discovery, transformation, modeling and reporting), with data scientists spending as much as 60 to 80 percent of their time in the data transformation stage.

    In this slideshow, Trifacta, a provider of productivity platforms for data analysis, takes you through each of these tasks in greater detail, highlighting the pain points data scientists face at each stage. The findings point to a need for tools that simplify the data analysis process while increasing productivity and collaboration among data scientists.

    Discovery: Acquiring data necessary to complete an analysis task.

    Before data scientists start to transform data for analysis, they must first acquire the necessary data. However, they often find that data is distributed across multiple databases, and that their organization lacks documentation or search capabilities, requiring them to rely on database administrators or others for help.   

    Transformation: Manipulating the acquired data for analysis, diagnosing data quality and understanding what assumptions can be made.

    Transformation, the most time-consuming component of the analysis process, involves reformatting and validating data to make it palatable for databases and visualization tools, diagnosing the data for quality issues and working out what assumptions can be made about it. In the transformation phase, data scientists encounter numerous challenges, including data sets that may contain missing, erroneous or extreme values. As a result, the assumptions that data scientists make about such data sets can turn out to be wrong or misleading.
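
    As a rough illustration of the diagnosis step described above, the following sketch sorts the cells of one raw column into missing, erroneous and extreme values. It is not taken from the Stanford study or from Trifacta's tooling: the function name `diagnose_column`, the sample `ages` column and the choice of an interquartile-range rule for "extreme" are all assumptions made for the example.

```python
import math
import statistics


def diagnose_column(raw_values, iqr_factor=1.5):
    """Sort cell positions into missing, erroneous, extreme and ok.

    Hypothetical helper for illustration only. "Extreme" is defined here
    with an interquartile-range fence, which is one common convention,
    not the only one.
    """
    report = {"missing": [], "erroneous": [], "extreme": [], "ok": []}
    parsed = []  # (position, number) for every cell that parses cleanly

    for position, cell in enumerate(raw_values):
        text = "" if cell is None else str(cell).strip()
        if text == "":
            report["missing"].append(position)
            continue
        try:
            number = float(text)
        except ValueError:
            report["erroneous"].append(position)
            continue
        if not math.isfinite(number):
            # "nan" and "inf" parse as floats but are not usable values
            report["erroneous"].append(position)
            continue
        parsed.append((position, number))

    numbers = [number for _, number in parsed]
    if len(numbers) >= 4:
        q1, _, q3 = statistics.quantiles(numbers, n=4)
        spread = q3 - q1
        low, high = q1 - iqr_factor * spread, q3 + iqr_factor * spread
    else:
        # Too few values to estimate quartiles; flag nothing as extreme
        low, high = float("-inf"), float("inf")

    for position, number in parsed:
        bucket = "extreme" if number < low or number > high else "ok"
        report[bucket].append(position)
    return report


# Hypothetical raw column, as it might arrive from a CSV export
ages = ["34", "29", "", "41", "abc", "38", "31", "420", None, "36"]
report = diagnose_column(ages)
```

    A report like this only flags candidates. Whether a value such as 420 is a typo or a legitimate outlier is exactly the kind of assumption the analyst still has to check and document.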

    Modeling: Constructing a model of the assembled data.

    The biggest difficulty in constructing a model is understanding the relevance of each data set to a given analysis task. When data scientists get to this stage, they often find their data has not been completely transformed and must go back to the transformation (wrangling) stage before they can identify useful patterns or relationships. Data scientists also find that during this stage, many existing analytics packages, tools or algorithms do not scale with the size of their data sets.

    Reporting: Sharing insights gained from the data.

    Because the assumptions made during analysis are often poorly documented, data scientists may find it hard to distribute and consume reports, which can affect the interpretation of results. Many reports carry little to no record of how the original input data was transformed, so they do not allow for interactive verification or sensitivity analysis.
