Data Engineering, Preparation, and Labeling for AI 2019

Document ID: CGR-DE100 | Last Updated: Jan. 31, 2019

$995.00

Abstract: It has always been true in computing that garbage in means garbage out, and this is especially true of machine learning data. In this report, Cognilytica evaluates the requirements for three kinds of solutions: data preparation solutions that clean, augment, and otherwise enhance data for machine learning; data engineering solutions that give organizations a way to move and handle large volumes of data; and data labeling solutions that add the annotations required to use data for training machine learning models.
Key Findings:
  • The market for AI- and machine learning-relevant data preparation solutions exceeded $500M in 2018 and is projected to grow to $1.2B by the end of 2023.
  • Data preparation and engineering tasks consume over 80% of the time spent on most AI and machine learning projects.
  • The market for third-party data labeling solutions was $150M in 2018 and is projected to grow to over $1B by 2023.
  • For every dollar spent on third-party data labeling, five dollars are spent on internal data labeling efforts: over $750M in 2018, growing to over $2B by the end of 2023.
  • For every dollar spent on third-party data labeling solutions, two dollars are spent on internal data efforts that support or enhance those labeling efforts.
  • AI projects involving object and image recognition, autonomous vehicles, and text and image annotation are the most common workloads for data labeling efforts.
  • Within the next two years, all competitive data preparation tools will include machine learning-based augmented intelligence as a core part of the offering.
  • The human in the loop is not going away any time soon for data labeling and AI quality control.
Premium PDF. Source: Copyright © Cognilytica LLC
Key Vendors Included in this Report:
  • CloudFactory
  • Figure Eight
  • iMerit
  • Melissa Data
  • Paxata
  • Trifacta
Report Details:
  • 24 Pages
  • 14 Charts
Table of Contents
  • Executive Summary 2
    • Key Findings 2
  • Market Overview 3
    • Defining the Problem 3
      • Data Engineering 4
      • Data Preparation 5
        • AI-Relevant Data Preparation Solution Requirements 6
        • Data Preparation Use Cases 6
      • Data Labeling 6
        • Data Labeling Solution Provider Requirements 7
        • Data Labeling Use Cases 9
  • Cognilytica AI Vendor Classification 9
    • About the Cognilytica Vendor Classification System 10
  • Global Market Size for Data Preparation 10
  • Key Vendors 12
    • Key Data Preparation Vendors 12
      • Melissa Data 12
      • Paxata 14
      • Trifacta 16
    • Key Data Labeling Vendors 17
      • CloudFactory 17
      • Figure Eight 19
      • iMerit 20
    • Notes on Vendor Inclusion 22
  • Future Market Trends and Predictions 22
    • Data Preparation Market Predictions and Trends 22
    • Data Labeling Market Predictions and Trends 23
Although Cognilytica believes that the results, conclusions, and analysis produced in support of this report are well informed, comprehensive, and reasonable, Cognilytica cannot guarantee future results, the accuracy of market predictions, or the applicability of conclusions to the report purchaser’s or reader’s business. Moreover, Cognilytica does not assume responsibility for the accuracy and completeness of such statements. The information derived in this report consists of statements of opinion only, and Cognilytica shall not be held liable in any manner for any conclusions or actions taken pursuant to this report. The information contained herein has been obtained from sources believed to be reliable. Cognilytica shall have no liability for errors, omissions, or inadequacies in the information contained herein or for interpretations thereof. The report purchaser and/or reader assumes sole responsibility for the selection of these materials to achieve its intended results. The opinions expressed herein are subject to change without notice. Cognilytica does not make its research methods, underlying data, sources, or means and methods of analysis open to inquiry, evaluation, or examination.