How to Set Up Training for a Sparse Neural Radiance Grid on AWS EC2

Zain Raza
7 min read · Apr 5, 2022


Photo by Alexander Sinn on Unsplash

An EC2 Setup Guide

“Sparse Neural Radiance Grids” (abbreviated “SNeRG”) [1] are no doubt one of the hottest inventions to come out of last October’s ICCV 2021, and with good reason: SNeRG can render 3D models without actually needing to store 3D triangle meshes on disk. This makes it a particularly exciting area for 3D reconstruction research. As the project page for the paper shows, it renders the 3D model in the web browser in real time, with nearly photorealistic graphical quality. You can hear one of the paper’s authors, Peter Hedman, explain:

Source: Peter Hedman [2].

In just a few years, we might start seeing these algorithms all over the place, from how we train self-driving cars to augmented reality applications.

Sooo, the question you might ask is: how can I start using SNeRG in my AI projects?

I personally have been using it in my current role just about every day for the past month, and this blog is the setup guide I wish I had when I started. In it, I am going to share everything you need to set up a training pipeline for SNeRG on AWS EC2, so you can easily start running your own experiments.

Note: because this blog is specifically aimed towards those new to AWS, I’ll also share bits of info that should be helpful for setting up AI experiments in general on the platform.

Why AWS?

Let’s be real for a moment: SNeRG is a large neural network. For most of you, trying to train it on your own laptop will take at least a week (and I am not exaggerating). Who has time for that?

Therefore, let’s set up our training on Amazon Web Services, aka AWS. It’s a popular cloud computing platform, which means you get to use other people’s computers (for us today, that means graphics processing units, aka GPUs) to run your machine learning workloads. While it is not free to use, in my experience it was well worth it not to have to go out and buy all the GPUs I would otherwise need on my own. The process is fairly straightforward, though some steps (which I’ve marked with ***) are more involved than others:

Enter AWS

  1. Sign up at https://aws.amazon.com. You’ll need a credit card ready, although you’re not charged for anything upfront.
  2. Choose EC2 from under the “Services” dropdown menu (located in the top left): this is where we’ll select the type of hardware accelerator (called an “instance” in the AWS documentation) to run our training on. Note: AWS has another service called SageMaker for machine learning, but for now we’ll focus on EC2, since it’s cheaper :).
  3. On the next page, click on “Launch Instances” — it’s the big orange button in the top right.

Get Ready to Launch!
You should be on a screen that prompts you to “Choose an Amazon Machine Image (AMI).” It sounds like a mouthful, but all this “image” does is tell the EC2 instance (which we’ll choose in a later step) exactly how we want to set up our server.

Why this is cool: with an AMI, we don’t need to manually set up the instance ourselves! Basically, the AMI acts like a list of instructions to the EC2 instance: it tells it exactly what kind of environment (meaning the various software packages, device drivers, and any other dependencies we might need) to provide when training our AI algorithm. Personally, I feel it might be more appropriate to call it a recipe than an image!

  1. AWS offers several special kinds of AMIs built specifically for training deep learning models; these Deep Learning AMIs (DLAMIs) are what we’ll use today. Search for “Deep Learning” and you’ll see several options. While any of these could work, in my experience the Deep Learning AMI (Amazon Linux 2) is the best. Note that the version of this DLAMI I used at the time of writing was 60, but AWS does update it fairly often. (If you prefer the command line, there’s a CLI sketch for finding it just below.)
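
If you prefer the command line, you can also find these images with the AWS CLI. Here’s a minimal sketch, assuming you have the AWS CLI installed and configured (via `aws configure`); the name filter matches the DLAMI naming scheme at the time of writing:

```bash
# List Amazon's recent "Deep Learning AMI (Amazon Linux 2)" images,
# sorted oldest-to-newest, showing the name and AMI ID of the last five.
aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=Deep Learning AMI (Amazon Linux 2)*" \
  --query "sort_by(Images, &CreationDate)[-5:].[Name,ImageId]" \
  --output table
```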

***Getting a GPU Instance***

Now, we need to choose the right instance type for the job. Just as a hammer can’t drive a screw, not every instance type suits every workload: we want one that makes it as easy as possible to run our neural network on the DLAMI environment. For the majority of you who are using DLAMIs on AWS for the first time, I would suggest the g3s.xlarge instance type.
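
To sanity-check that choice, here’s an optional AWS CLI query (the console shows the same info) that prints the specs we care about:

```bash
# Print the vCPU count, memory, and GPU names for the g3s.xlarge.
aws ec2 describe-instance-types \
  --instance-types g3s.xlarge \
  --query "InstanceTypes[0].{vCPUs:VCpuInfo.DefaultVCpus,MemoryMiB:MemoryInfo.SizeInMiB,GPUs:GpuInfo.Gpus[].Name}" \
  --output json
```

You should see 4 vCPUs, roughly 30 GiB of memory, and a single NVIDIA M60 GPU.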

But there’s actually one problem we need to fix before we can request a g3s.xlarge, or any other GPU instance: by default, AWS does not allow new users to request one when they first sign up for the service. This is because the g3s.xlarge is, well, a very large, non-free-tier instance; it actually uses 4 virtual CPUs under the hood, and as a new user your quota for this instance family is 0 vCPUs. Ideally, this restriction helps new AWS users avoid renting more server time than they actually need, and paying more as a result.
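
You can check your current quota yourself before filing anything. Here’s a sketch using the Service Quotas CLI; the quota code below is what AWS calls “Running On-Demand G and VT instances” at the time of writing, so double-check it against your account if the call errors out:

```bash
# Look up your vCPU quota for on-demand G instances; a Value of 0.0 means
# you can't launch a g3s.xlarge yet. L-DB2E81BA is the quota code for
# "Running On-Demand G and VT instances" (verify it for your account).
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-DB2E81BA \
  --query "Quota.{Name:QuotaName,vCPUs:Value}"
```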

Fortunately, for our purposes we can request a limit increase, so you can still get your hands dirty with model training:

  1. Head over to this URL (fill in the variables with your own AWS region, e.g. “us-east-1”), to open a new support case in AWS:
https://<your-aws-region>.console.aws.amazon.com/support/home?region=<your-aws-region>&skipRegion=true#/case/create?issueType=service-limit-increase&limitType=service-code-ec2-instances&serviceLimitIncreaseType=ec2-instances&type=service_limit_increase

2. Be sure to fill out the form accurately — for “Primary Instance Type”, you can select “All G instances.” The description doesn’t need to be long (but keep in mind, real people answer these requests).

You may need to wait a day or two for someone to get back to you. This is hard, but try to be patient; believe me, in a few years we’ll all look back at today and laugh about how long this process used to take :).

Waiting is also a good opportunity to catch up on sleep. 😴 Photo by Jordan Whitt on Unsplash.

3. Once your request is approved, we’re ready to roll! Follow the steps above to select your DLAMI again, whichever one you’ll be using for this instance. On the next page, you should see a LONG list of instance types. The g3s.xlarge is pretty far down the list, so I would recommend using a Cmd+F search (or Ctrl+F on Windows) to jump down to its row, and then click the checkbox at the very left.

4. For now, we don’t need to get too caught up in additional configuration details (if you’re interested, this video by Jeff Heaton shares plenty of useful advice on that). Otherwise, you can click the blue “Review and Launch” button at the bottom and (at last) turn on the instance! The CLI equivalent of this launch is sketched below.
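
For reference, the whole console launch flow boils down to a single `run-instances` call in the CLI. Everything below except the instance type is a placeholder (hypothetical AMI ID, key pair, and security group) that you’d swap for the values in your own account:

```bash
# Launch one g3s.xlarge from your chosen DLAMI.
# ami-..., my-key-pair, and sg-... are placeholders: use your own values.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type g3s.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-0123456789abcdef0 \
  --count 1
```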

5. The final step is to actually log in (or “connect”) to your EC2 instance. Obviously we can’t go to an AWS data center and do this in person; instead, we’ll connect remotely over SSH. For those of you on macOS, I’d recommend following along with this video by Pythoholic to learn how. For those of you on Windows, the Jeff Heaton video I linked above provides a great explanation.
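
Whichever guide you follow, the command at the heart of the process looks like this (the key path and hostname are placeholders from your own console; `ec2-user` is the default username on Amazon Linux 2 AMIs):

```bash
# SSH refuses private keys with loose file permissions, so lock it down first.
chmod 400 ~/.ssh/my-key-pair.pem

# Connect, using the public DNS name shown in the EC2 console under "Connect".
ssh -i ~/.ssh/my-key-pair.pem ec2-user@<your-instance-public-dns>
```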

Your First SNeRG

If I’ve done my job well, you should now have sufficient computing power to train powerful neural networks like SNeRG in just a few hours.

To run such an experiment, though, we still need to complete a few steps inside the actual EC2 instance:

  1. Clone the source code for the SNeRG algorithm
  2. Configure your environment (install Python dependencies, set environment variables, etc.)
  3. Pull in a dataset of training/test images
  4. Run the train.py script (which the SNeRG authors were kind enough to include in the GitHub repo!)

This process can get fairly involved, so as an example of how to do it, I’ve provided the following commands for you to run inside your EC2 instance. These will run SNeRG on the demo dataset provided by the SNeRG authors:

Run the following commands inside your EC2 instance to try out your first SNeRG experiment!
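
Since the repo may have changed since I wrote this, treat the sketch below as an outline of those four steps rather than copy-paste gospel; the demo dataset download link, the exact dependency list, and the exact training flags all live in the SNeRG README, so defer to it if anything here doesn’t match:

```bash
# 1. Clone the source code for the SNeRG algorithm.
git clone https://github.com/google-research/snerg.git
cd snerg

# 2. Configure the environment. The DLAMI ships with conda; the environment
#    name and Python version here are my own choices, not requirements.
conda create --name snerg python=3.8 -y
conda activate snerg
pip install -r requirements.txt  # the repo's dependency list

# 3. Pull in the demo dataset of training/test images. The README links to
#    the download; extract it somewhere like ./data (path is a placeholder).
mkdir -p data
# ...download the demo scenes and extract them into ./data here...

# 4. Run the training script. The flag names follow the pattern in the
#    SNeRG README; double-check them against the version you cloned.
python -m snerg.train \
  --data_dir=data/nerf_synthetic/lego \
  --train_dir=checkpoints/lego \
  --config=configs/blender
```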

Please see the README file of the SNeRG GitHub repo if you’re curious about how this source code actually functions. There’s a lot in there, so I personally would say it’s a good way to get exposure to more advanced Python syntax and computer graphics concepts :)

Last but not least: to keep your costs down, please be sure to switch your EC2 instance to the “Stopped” state when you’re done playing around with it!
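
You can do that from the console (“Instance state” → “Stop instance”), or with one line of CLI (the instance ID is a placeholder):

```bash
# Stop (don't terminate!) the instance: you keep the disk and can restart
# later, but you stop paying for compute. Substitute your own instance ID.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
```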

Part 1 Conclusion

Booyah! You’ve just taken another step toward becoming an AI scientist. To summarize, if you followed along closely, you now have the tools to:

  1. Set up GPU instances on AWS EC2 with an appropriate DLAMI
  2. Connect to an EC2 instance remotely using an SSH connection
  3. Run the sequence of commands needed to train SNeRG

If I’ve piqued your curiosity, feel free to leave a comment on this post. I can answer any questions you have, or make a follow-up blog on anything you want to see expanded, e.g. how to set up a SNeRG experiment with your own custom dataset, logging metrics in TensorBoard, etc. Otherwise, happy SNeRG’in!

References

[1] P. Hedman, P. P. Srinivasan, B. Mildenhall, J. T. Barron, and P. Debevec, “Baking Neural Radiance Fields for Real-Time View Synthesis,” arXiv:2103.14645 [cs], Mar. 2021. Accessed: Feb. 17, 2022. [Online]. Available: http://arxiv.org/abs/2103.14645.

[2] P. Hedman, “Baking Neural Radiance Fields for Real-Time View Synthesis,” project page, 2021. https://nerf.live (accessed Apr. 06, 2022).


Zain Raza

Software engineer interested in A.I., and sharing stories on how tech can allow us to be more human.