Deep Learning on AWS

Placement Groups

A placement group is an AWS mechanism for reducing latency between Amazon EC2 instances. It groups instances running in the same Availability Zone so that they are placed as close together as possible, reducing latency and improving throughput.

Elastic Fabric Adapter

Elastic Fabric Adapter (EFA) is a network interface for Amazon EC2 instances that enables customers to run high performance computing (HPC) applications requiring high levels of inter-instance communication, such as deep learning at scale on AWS. It uses a custom-built operating system bypass technique to enhance the performance of inter-instance communication, which is critical to scaling HPC applications. With EFA, HPC applications using popular HPC technologies like the Message Passing Interface (MPI) can scale to thousands of CPU cores. EFA supports the open-standard libfabric APIs, so applications that use a supported MPI library can be migrated to AWS with little or no modification.

EFA is available as an optional EC2 networking feature that you can enable on c5n.18xlarge and p3dn.24xlarge instances at no additional cost. You can use Open MPI 3.1.3 (or later) or NCCL 2.3.8 (or later) plus the OFI driver for NCCL. Instances use EFA to communicate within a VPC subnet, and the security group must have ingress and egress rules that allow all traffic within the security group. Each instance can have a single EFA, which can be attached when the instance is launched or while it is stopped.

Amazon Elastic Inference

Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75%. Currently, Amazon Elastic Inference supports TensorFlow, Apache MXNet, and ONNX models, with more frameworks coming soon. To use any other deep learning framework, export your model using ONNX, and then import it into MXNet.
You can then use your model with Amazon Elastic Inference as an MXNet model. Amazon Elastic Inference is designed to be used with AWS-enhanced versions of TensorFlow Serving or Apache MXNet. These enhanced versions of the frameworks are automatically built into containers when you use the Amazon SageMaker Python SDK, or you can download them as binary files and import them into your own Docker containers.
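The placement group setup described above can be sketched against the EC2 API. This is a minimal illustration, not the whitepaper's own code: the group name, AMI ID, and instance type below are placeholders, and the dictionary is built as a plain function so the request shape is visible. The group itself would first be created with `ec2.create_placement_group(GroupName=..., Strategy="cluster")` using boto3.

```python
def cluster_placement_launch_params(group_name, ami_id, instance_type, count):
    """Build keyword arguments for ec2.run_instances that co-locate
    instances in a 'cluster' placement group: a single Availability Zone,
    packed close together for low latency and high throughput.
    All argument values are placeholders chosen by the caller."""
    return {
        "ImageId": ami_id,              # e.g. a Deep Learning AMI ID
        "InstanceType": instance_type,  # e.g. "p3dn.24xlarge"
        "MinCount": count,              # all-or-nothing capacity request
        "MaxCount": count,
        "Placement": {"GroupName": group_name},
    }
```

Passing these keyword arguments to `ec2.run_instances(**params)` launches every instance in the named group; requesting them in a single call makes it more likely that EC2 can find capacity close together.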
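The two EFA requirements noted earlier, requesting the adapter at launch and opening the security group to itself, can be sketched the same way. These are assumed request shapes for the EC2 API (the interface spec goes in the `NetworkInterfaces` list of `run_instances`; the rule goes in `IpPermissions` for both `authorize_security_group_ingress` and `authorize_security_group_egress`), with placeholder IDs:

```python
def efa_network_interface(subnet_id, security_group_id):
    """Network-interface spec requesting an Elastic Fabric Adapter.
    Each instance supports a single EFA, so it sits at device index 0."""
    return {
        "DeviceIndex": 0,
        "InterfaceType": "efa",  # this is what makes the interface an EFA
        "SubnetId": subnet_id,
        "Groups": [security_group_id],
    }


def efa_self_referencing_rule(security_group_id):
    """Permission allowing ALL traffic between members of the same
    security group, which EFA requires for inter-instance communication
    within the VPC subnet. Must be added as both ingress and egress."""
    return {
        "IpProtocol": "-1",  # -1 means all protocols and ports
        "UserIdGroupPairs": [{"GroupId": security_group_id}],
    }
```

The rule references the security group's own ID, so traffic is open only among instances in that group, not to the wider network.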
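Attaching an Elastic Inference accelerator through the Amazon SageMaker Python SDK comes down to one extra argument on a model's `deploy()` call, `accelerator_type`. The sketch below only assembles those arguments; the instance and accelerator sizes are placeholder examples, not recommendations:

```python
def elastic_inference_deploy_args(instance_type, accelerator_type):
    """Arguments for a SageMaker model's .deploy() call: a modest CPU
    instance paired with an Elastic Inference accelerator, instead of a
    full (more expensive) GPU instance. Values are caller-chosen examples."""
    return {
        "initial_instance_count": 1,
        "instance_type": instance_type,        # e.g. "ml.m5.large"
        "accelerator_type": accelerator_type,  # e.g. "ml.eia1.medium"
    }
```

A deployment would then look like `model.deploy(**elastic_inference_deploy_args("ml.m5.large", "ml.eia1.medium"))`, where `model` is a SageMaker framework model (for example an MXNet model) built from one of the AWS-enhanced containers described above.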
