
Deep Learning on AWS


Amazon SageMaker Neo for Model Optimization

Once you have a trained model, you may want to deploy it in the cloud, at the edge, or on mobile devices. An inference request has to travel through the client HTTP stack, over the network, and through the web server and application server stacks before it finally reaches the inference endpoint. Given the latency introduced by all of these layers, only a small fraction of time remains to compute the inference and serve it back to the client before the delay starts to impact the user experience. It is therefore always desirable to get maximum performance out of the inference endpoint.

Improving the performance of an inference endpoint is a complex problem. First, executing the model's computation graph is a compute-intensive task. Second, optimizing machine learning models for inference requires tuning for the specific hardware and software configuration on which the model is deployed. For optimal performance, you must know the hardware architecture, instruction set, memory access patterns, and input data shapes, among other factors. For traditional software, compilers and profilers handle this tuning; for deep learning model deployment, it becomes a manual trial-and-error process.

Amazon SageMaker Neo can help you eliminate the time and effort required to tune a model for a specific software and hardware configuration by automatically optimizing TensorFlow, Apache MXNet, PyTorch, ONNX, and XGBoost models for deployment on ARM, Intel, and NVIDIA processors. The list of supported deep learning frameworks, model formats, and chipsets will continue to grow.

Amazon SageMaker Neo consists of a compiler and a runtime. First, the Amazon SageMaker Neo APIs read models and parse them into a standard format, converting framework-specific functions and operations into a framework-agnostic intermediate representation. Next, the compiler performs a series of optimizations on the model graph. Then it generates binary code for the optimized operations. Amazon SageMaker Neo also provides a lean runtime for each target platform and source framework that loads and executes the compiled model. Finally, Amazon SageMaker Neo is available as open source under the Apache Software License as the Neo-AI project, enabling you to customize the software for different devices and applications.
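To make the compiler side concrete, the sketch below assembles the parameters for SageMaker's CreateCompilationJob API, which is what triggers a Neo compilation. This is a minimal sketch, not a complete workflow: the job name, IAM role ARN, S3 URIs, input shape, and target device are all hypothetical placeholders you would replace with your own values.

```python
# Sketch of the request shape for sagemaker:CreateCompilationJob, which
# drives a Neo compilation. All names, ARNs, and S3 URIs are hypothetical
# placeholders -- substitute your own.
import json

def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri):
    """Assemble the parameters for a Neo compilation job."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Name and shape of the model's input tensor (framework-specific).
            "DataInputConfig": json.dumps({"data": [1, 3, 224, 224]}),
            "Framework": "MXNET",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": "ml_c5",  # an Intel-based cloud target
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }

# With boto3 and AWS credentials configured, this would be submitted as:
#   boto3.client("sagemaker").create_compilation_job(**request)
request = build_compilation_request(
    "neo-demo-job",
    "arn:aws:iam::123456789012:role/SageMakerNeoRole",
    "s3://example-bucket/model.tar.gz",
    "s3://example-bucket/compiled/",
)
```

The compiled artifact lands in the S3 output location, ready to be deployed to an endpoint or downloaded to an edge device that matches the chosen target.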
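On the runtime side, the Neo-AI project publishes DLR, a lean open source runtime for loading and executing compiled models. The sketch below assumes the `dlr` package is installed and that a Neo-compiled model has been extracted to a local directory; the model directory, input tensor name, and shape are hypothetical.

```python
# Sketch of running a Neo-compiled model with the open source DLR runtime
# (Neo-AI project). Model directory, input name, and shape are hypothetical.
import numpy as np

def make_input(name, shape):
    """Build the input dict DLR expects: tensor name -> float32 array."""
    return {name: np.random.rand(*shape).astype("float32")}

def run_compiled_model(model_dir, batch):
    """Load a compiled model from model_dir and run one inference."""
    import dlr  # imported lazily so the sketch parses without dlr installed
    model = dlr.DLRModel(model_dir, dev_type="cpu")
    return model.run(batch)

batch = make_input("data", (1, 3, 224, 224))
# outputs = run_compiled_model("./compiled_model", batch)  # hypothetical path
```

Because the runtime only needs to load and execute the already-optimized binary, it is much smaller than a full framework, which is what makes it practical on edge and mobile targets.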
