implementations. Challenges arise from the complexity of most neural networks, the high dimensionality of the datasets, and the scale of the infrastructure needed to train large models on large volumes of training data. To accommodate these challenges, you need elasticity and performance in your compute and storage infrastructure.
On AWS, you can choose to build your neural network from the ground up with the AWS Deep Learning Amazon Machine Image (AWS DL AMI), which comes preconfigured with TensorFlow, PyTorch, Apache MXNet, Chainer, Microsoft Cognitive Toolkit, Gluon, Horovod, and Keras, enabling you to quickly deploy and run any of these frameworks and tools at scale. Additionally, you can choose the preconfigured AWS Deep Learning Containers (AWS DL Containers), Docker images preinstalled with deep learning frameworks (TensorFlow and Apache MXNet), and run them on Amazon Elastic Kubernetes Service (Amazon EKS), self-managed Kubernetes, Amazon Elastic Container Service (Amazon ECS), or directly on Amazon Elastic Compute Cloud (Amazon EC2). Lastly, you can take advantage of the Amazon SageMaker Python SDK, which provides open source APIs and containers to train and deploy models in Amazon SageMaker with several different machine learning and deep learning frameworks. We will discuss the most common solutions and patterns using these services in the second half of this paper.
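As an illustration of that last option, the following is a minimal sketch of training and deploying a model with the SageMaker Python SDK. The IAM role ARN, entry-point script, S3 path, and framework versions shown here are placeholders, not values from this paper; substitute your own.

```python
from sagemaker.tensorflow import TensorFlow

# Placeholder values: substitute your own IAM role, training script, and S3 bucket.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

estimator = TensorFlow(
    entry_point="train.py",          # your TensorFlow training script
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",   # GPU instance type for training
    framework_version="2.11",        # illustrative framework/Python versions
    py_version="py39",
)

# Start a managed training job that reads data from Amazon S3.
estimator.fit({"training": "s3://your-bucket/training-data/"})

# Deploy the trained model to a real-time inference endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```

The same pattern applies to the other framework estimators in the SDK; only the estimator class and script change.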
Step 4. Train, Retrain, and Tune the Models
Training neural networks differs from traditional machine learning implementations because the model needs to learn the mapping function from inputs to outputs via function approximation in a nonconvex error space with many "good" solutions. Since we cannot directly compute the optimal set of weights via a closed-form solution (as we can with simple linear regression models), and we cannot obtain global convergence guarantees, training a neural network can be challenging and usually requires much more data and compute resources than other machine learning algorithms.
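To make the contrast concrete, here is a small self-contained sketch on toy data (NumPy only, not taken from this paper) comparing the closed-form normal-equation solution available for linear regression with the iterative gradient-descent loop that neural network training relies on. On this convex toy problem both approaches converge to the same weights; real neural network losses are nonconvex, which is what removes the global convergence guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2 + 3x plus noise (illustrative only).
X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]   # bias column + one feature
y = X @ np.array([2.0, 3.0]) + rng.normal(0, 0.1, 100)

# Linear regression: the optimal weights have a closed-form (normal equation) solution.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Neural-network-style training: no closed form, so iterate with gradient descent
# on the loss surface (convex here, nonconvex for real neural networks).
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    grad = 2 / len(y) * X.T @ (X @ w - y)   # gradient of the mean squared error
    w -= lr * grad

print(w_closed, w)   # both recover approximately [2, 3] on this convex toy problem
```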
AWS provides a variety of tools and services to simplify the training process for your neural networks. Throughout this paper, we will discuss several options: running a self-managed deep learning environment on Amazon EC2, running a deep learning environment on Amazon EKS or Amazon ECS, or using the fully managed Amazon SageMaker service for deep learning. All of these environments can use GPU-powered hardware to reduce training time and training cost.
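For example, a self-managed environment on Amazon EC2 could be provisioned programmatically with the AWS SDK for Python (Boto3), as in the sketch below. The AMI ID and key pair name are placeholders; replace them with the Deep Learning AMI ID for your Region and your own key pair.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder AMI ID: look up the current AWS Deep Learning AMI for your Region.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder Deep Learning AMI ID
    InstanceType="p3.2xlarge",         # GPU-powered instance type
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",             # placeholder key pair for SSH access
)

print(response["Instances"][0]["InstanceId"])
```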
In addition to the model design discussed in Step 2. Choose and Optimize Your
Algorithm, you also have the option of setting hyperparameters before starting the