NORDICS20 - Assets

Deep Learning on AWS

Amazon Web Services Resources EMEA


libraries, GPU drivers, and system libraries to speed up and scale machine learning on Amazon EC2 instances. The Base AMI comes with the CUDA 9 environment installed by default. However, you can also switch to a CUDA 8 environment using simple one-line commands.

AWS Deep Learning Containers

AWS provides a broad choice of compute to accelerate deep learning training and inference. Customers can choose fully managed services through Amazon SageMaker or take a do-it-yourself (DIY) approach with Deep Learning AMIs. DIY is a popular option among researchers and applied machine learning practitioners working at the framework level.

In the last few years, using Docker containers has become popular because this approach allows deploying custom ML environments that run consistently in multiple environments. However, building and testing a Docker container is difficult and error-prone. It can take days to build a Docker container because of software dependencies and version compatibility issues. Further, it requires specialized skills to optimize the Docker container image to scale and distribute machine learning jobs across a cluster of instances. The process must be repeated whenever a new version of the software or a driver becomes available.

With AWS Deep Learning Containers (AWS DL Containers), AWS has extended the DIY offering for advanced ML practitioners by providing Docker container images for deep learning that are preconfigured with frameworks such as TensorFlow and Apache MXNet. AWS takes care of the undifferentiated heavy lifting that is involved in building and optimizing Docker containers for deep learning.

AWS DL Containers are tightly integrated with Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS). You can deploy AWS DL Containers on Amazon ECS and Amazon EKS in a single click and use them to scale and accelerate your machine learning jobs on multiple frameworks. Amazon ECS and Amazon EKS handle all the container orchestration required to deploy and scale the AWS DL Containers on clusters of virtual machines. Today, AWS DL Containers are available for TensorFlow and Apache MXNet. The container images are available for both CPUs and GPUs, for Python 2.7 and 3.6, and for both training and inference, with Horovod support for distributed training on TensorFlow.
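
To make the range of available images concrete, the following Python sketch enumerates the combinations named above: two frameworks, CPU or GPU, Python 2.7 or 3.6, and training or inference. It is a minimal illustration, not an AWS API. The function names and the `repository:tag` label format are hypothetical and do not reflect the official image naming scheme, and the Horovod check assumes the text means Horovod applies only to TensorFlow training images.

```python
from itertools import product

# Options taken from the description above. The label format built below is
# hypothetical and is NOT the official AWS DL Containers naming scheme.
FRAMEWORKS = ("tensorflow", "mxnet")
JOB_TYPES = ("training", "inference")
PROCESSORS = ("cpu", "gpu")
PYTHON_VERSIONS = ("py27", "py36")


def image_variants():
    """Return every framework/job type/processor/Python combination as a dict."""
    return [
        {"framework": fw, "job_type": job, "processor": proc, "python": py}
        for fw, job, proc, py in product(
            FRAMEWORKS, JOB_TYPES, PROCESSORS, PYTHON_VERSIONS
        )
    ]


def illustrative_name(variant):
    """Build a hypothetical 'repository:tag' label for one variant."""
    return "{framework}-{job_type}:{processor}-{python}".format(**variant)


def supports_horovod(variant):
    """Assumption: Horovod applies only to TensorFlow training images."""
    return (
        variant["framework"] == "tensorflow"
        and variant["job_type"] == "training"
    )


if __name__ == "__main__":
    # Print one line per variant, flagging those assumed to include Horovod.
    for v in image_variants():
        note = " (Horovod)" if supports_horovod(v) else ""
        print(illustrative_name(v) + note)
```

Treating the options as independent axes shows why maintaining these images by hand is costly: even this small matrix yields sixteen variants, each of which would need rebuilding when a framework or driver is updated.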
