Cloudera, Inc. has announced the general availability of the Cloudera Data Science Workbench, its self-service tool for data scientists.
“We are entering the golden age of machine learning and it’s all about the data. However, data scientists continue to struggle to build and test new analytics projects as fast as they would like, particularly in large scale environments,” said Charles Zedlewski, senior vice president, Products at Cloudera. “The Data Science Workbench is a self-service tool that accelerates the ability to build, scale and deploy machine learning solutions using the most powerful technologies. This means that data scientists now have the freedom to share, collaborate and manage their data in a way that best suits them and their enterprise, resulting in an easier and faster path to production.”
With Python, R, and Scala directly in the web browser, Cloudera Data Science Workbench delivers a self-service data science experience. It gives users the ability to download and experiment with the latest libraries and frameworks in customizable project environments. Cloudera Data Science Workbench is both secure and compliant, with support for Hadoop authentication, authorization, encryption, and governance.
The Office for National Statistics (ONS), the UK’s largest independent producer of official statistics, is aiming to use the Cloudera Data Science Workbench to create repeatable, accurate, and transferable statistical research. “We have seen a decreased time in developing models and better visibility in tracking progress and results,” says Simon Sandford-Taylor, Chief Technology Officer. “We think that Cloudera Data Science Workbench has the potential to accelerate our release calendar and better share best practices.”
Cloudera’s Data Science Workbench integrates with many deep learning frameworks, including BigDL, a deep learning library for Apache Spark open sourced by Intel. Built from the ground up to run on distributed Spark/Hadoop infrastructure and performance-optimized for Intel Xeon processors (leveraging the Intel Math Kernel Library), BigDL works directly within Cloudera’s Data Science Workbench.
“Enterprise customers require a cohesive platform to scale their analytics solutions and maximize their investments. BigDL’s native integration with Apache Spark brings the world of deep learning to the Apache Spark ecosystem and higher value to enterprise customers,” said Michael Greene, vice president and general manager of System Technologies and Optimization in the Software and Services Group, Intel Corporation. “The BigDL framework will help enterprise customers better utilize existing investments to build their analytics capabilities with optimized performance on Intel architecture.”
Integrating BigDL into Data Science Workbench lets users apply deep learning libraries and techniques on CPU architecture without any additional hardware considerations or separate environments. The combination provides a convenient way to create Spark data science pipelines natively and integrate them with the BigDL deep learning library and other Spark/Hadoop components on the Cloudera Data Science Workbench.