NORDICS20 - Assets

Deep Learning on AWS

Amazon Web Services Resources EMEA


Amazon EC2 P3 instances provide a powerful platform for deep learning by leveraging 64 vCPUs using the custom Intel Xeon E5 processors, 488 GB of RAM, and up to 25 Gbps of aggregate network bandwidth leveraging Elastic Network Adapter (ENA) technology. We will discuss ENA in detail in the later sections.

GPUs are faster than CPUs and can saturate both the network and the CPUs during a training job. The network bandwidth and the number of vCPUs on a training instance can become a bottleneck and may prevent you from achieving higher GPU utilization. Amazon EC2 P3dn.24xlarge GPU instances, the latest addition to the P3 instance family, have up to 4x the network bandwidth of P3.16xlarge instances and are purpose-built to address this limitation.

These enhancements to Amazon EC2 P3 instances not only optimize performance on a single instance but also reduce the time to train deep learning models. This is accomplished by scaling out individual jobs across several instances that leverage up to 100 Gbps of network throughput between training instances. AWS is the first cloud provider to deliver 100 Gbps of networking throughput, which helps remove data transfer bottlenecks and optimizes the utilization of GPUs to provide maximum instance performance.

The doubling of GPU memory from 16 GB to 32 GB per GPU provides the flexibility to train more advanced and larger machine learning models as well as process larger batches of data, such as 4K images for image classification and object detection systems.

For a comparison of P3 instance configurations and pricing information, see Amazon EC2 P3 Instance Product Details.

AWS Inferentia

Making predictions using a trained machine learning model, a process called inference, can drive as much as 90% of the compute costs of the application. Inference is where the value of ML is delivered. This is where speech is recognized, text is translated, object recognition in video occurs, manufacturing defects are found, and cars are driven.

Amazon Elastic Inference solves these problems by allowing you to attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 or Amazon SageMaker instance type with no code changes. With Amazon Elastic Inference, you can now choose the instance type that is best suited to the overall CPU and memory needs of your application, and then separately configure the amount of inference
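
As a rough illustration of the P3 network bandwidth figures quoted earlier (up to 25 Gbps on P3.16xlarge versus up to 100 Gbps on P3dn.24xlarge), the sketch below estimates the ideal time to move a dataset between instances at each rate. The 500 GB dataset size, the function name `transfer_seconds`, and the `efficiency` parameter are hypothetical, chosen only for illustration; real throughput depends on protocol overhead, storage, and instance placement, so the numbers are a lower bound rather than a measurement.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Ideal time in seconds to move size_gb gigabytes over a link_gbps link.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the
    nominal link rate that is actually achieved; 1.0 means the full rate.
    """
    if size_gb < 0 or link_gbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("size must be >= 0, link rate > 0, efficiency in (0, 1]")
    size_gigabits = size_gb * 8  # 1 byte = 8 bits
    return size_gigabits / (link_gbps * efficiency)


# Nominal peak bandwidths taken from the text above.
P3_16XLARGE_GBPS = 25
P3DN_24XLARGE_GBPS = 100

# Assumed dataset size, for illustration only.
DATASET_GB = 500

t_p3 = transfer_seconds(DATASET_GB, P3_16XLARGE_GBPS)
t_p3dn = transfer_seconds(DATASET_GB, P3DN_24XLARGE_GBPS)

print(f"P3.16xlarge   (25 Gbps):  {t_p3:.0f} s")
print(f"P3dn.24xlarge (100 Gbps): {t_p3dn:.0f} s")
print(f"Speedup: {t_p3 / t_p3dn:.1f}x")
```

Under these assumptions the ideal transfer time falls from 160 seconds to 40 seconds, which matches the "up to 4x" bandwidth ratio stated in the text.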
