Using the cloud for large-scale data-parallel processing offers cost and maintenance advantages. On the other hand, GPGPU is often the best option for acceleration, especially when machine learning is involved. Amazon AWS provides a small but scalable GPU instance. Tightly coupled with the NVIDIA GRID solution, it allows a CUDA implementation of your data processing model.
Companies that consume data at large scale work hand in hand with Amazon to develop and enhance cloud solutions under the so-called "Big Data" theme. So when Netflix, for example, implements its machine learning stack in its quest for customer patterns, Amazon provides full support and an on-demand hardware solution. But not everyone is a big company like Netflix, and even in this example the prototype was first implemented outside the cloud. So why would you need the cloud? It depends on whether your data already lives there, or whether it all has to be moved in. For a few terabytes of data, the question is worth asking.
It is worth comparing a custom desktop solution against its cloud counterpart.
(The following information is current as of the time of writing, May 2014. Pricing and specifications are subject to change.)
The AWS g2.2xlarge instance offers a GPU with 1,536 CUDA cores and 4 GB of memory, on top of 8 vCPUs (Xeon E5-2670) and 15 GB of RAM on a Linux AMI. The GPU specification corresponds to a GeForce GTX 680 with 4 GB of VRAM. This video card sells for $600 USD (¥61,000). As a comparison, one full month of g2.2xlarge use costs $668 USD (¥70,000). So each month of cloud use costs roughly the equivalent of adding another card to a custom desktop build.
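The break-even arithmetic above can be sketched in a few lines. This is only a rough illustration using the May 2014 prices quoted here, which will drift over time:

```python
# Break-even sketch: renting a g2.2xlarge around the clock
# versus buying GTX 680 cards for a desktop rig (May 2014 prices).
monthly_cloud_usd = 668.0  # one full month of g2.2xlarge on-demand use
gtx680_usd = 600.0         # retail price of a 4 GB GeForce GTX 680

# How many desktop cards one month of cloud rental would buy.
cards_per_month = monthly_cloud_usd / gtx680_usd
print(f"{cards_per_month:.2f}")  # prints "1.11"

# Cumulative hardware gap after six months of continuous use:
# each month the cloud bill exceeds the price of one more card.
six_month_gap = 6 * (monthly_cloud_usd - gtx680_usd)
print(round(six_month_gap))  # prints "408"
```

Of course this ignores the host machine, power, and upkeep on the desktop side, and data transfer costs on the cloud side; it is only meant to show the order of magnitude.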
One more point worth underlining is that the AWS GPU instance doesn't come with any documentation. AWS is based on a high level of virtualization, and NVIDIA has made it understood that its GRID system requires vDGA for direct access to the GPU card in a virtual environment. This leads us to think that virtualization is handled by VMware vDGA. At the time of this writing, Red Hat's KVM GPU direct access is still in beta in their Enterprise Linux 7 release, so everything suggests that vDGA is the default implementation of GPU access. vDGA allows one user per GPU. In that case, you may wonder how you can prototype multi-GPU memory sharing or cluster-based GPGPU. Without full support from Amazon engineers, it seems daunting.
Here’s a block diagram of the NVIDIA GRID GPU in the g2 instance:
In conclusion, we could say that cloud GPGPU is still in beta and largely reserved for the major players advertised on Amazon's product page. What, then, is the target market for such a cloud technology? Parallel computing on GPUs relies on a highly customized hardware/software stack, so porting it to the cloud adds work and complexity that must be weighed in the budget. But as the technology matures, this complexity will surely diminish in the coming months. From many points of view, it represents the future.