Classifying 3D Shapes using Keras on FloydHub

(Note: I originally wrote this article in June 2017. At the time, FloydHub’s introductory offer was 100 hours of free GPU time. I added FloydHub to the article so that people without access to GPUs could run the experiment. But since their new free tier only includes CPU time, you’ll now need to find access to a GPU on your own.)

In this tutorial, we’ll train a 3D convolutional neural network (CNN) to classify CAD models of real-world objects.

We’ll be implementing our neural network using Keras. If you’ve never used it before, read “30 seconds to Keras” first to get a feel for the syntax and style of the API.

When we finish our Keras code, we’ll run it on FloydHub, an up-and-coming platform for running deep learning experiments in the cloud.

About the data

We’ll be using ModelNet10, from the Princeton Vision and Robotics Group. ModelNet10 contains 4,899 CAD models across 10 categories of common household objects: bathtubs, dressers, toilets, etc. Each example is given as an OFF file, but the standard approach is to discretize each CAD model into a 30 x 30 x 30 grid of voxels, or volumetric pixels (think Minecraft).

The data is already split into a training set (of size 3,991) and a test set (of size 908).

The ModelNet website showcases a leaderboard of current best results on ModelNet10. We’ll be using the neural network architecture presented in this paper by Xu and Todorovic. Their reported result: 88% classification accuracy.

We’ll first need to download the data and voxelize it. If you’d like to skip to training the neural net, you can download the data here (2 MB).

Data prep

The Xu and Todorovic paper describes how we should discretize the ModelNet10 data:

Each shape is represented as a set of binary indicators corresponding to 3D voxels of a uniform 3D grid centered on the shape. The indicators take value 1 if the corresponding 3D voxels are occupied by the 3D shape; and 0, otherwise. Hence, each 3D shape is represented by a binary three-dimensional tensor. The grid size is set to 30 × 30 × 30 voxels. The shape size is normalized such that a cube of 24 × 24 × 24 voxels fully contains the shape, and the remaining empty voxels serve for padding in all directions around the shape.
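In other words, each voxelized shape starts out as a 24 x 24 x 24 binary occupancy grid, and padding 3 empty voxels on every side brings it up to 30 x 30 x 30 (24 + 3 + 3 = 30). Here’s a quick illustration of that arithmetic with a dummy grid (just for intuition; it’s not part of the pipeline):

import numpy as np

# Dummy 24 x 24 x 24 occupancy grid standing in for a voxelized shape
shape_grid = np.ones((24, 24, 24), dtype=np.int32)

# Pad 3 empty voxels on every side: 24 + 3 + 3 = 30
padded = np.pad(shape_grid, 3, 'constant')
print(padded.shape)  # (30, 30, 30)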

Download

We’ll start by grabbing the data:

# About 550 MB compressed, 2.2 GB uncompressed
wget http://3dshapenets.cs.princeton.edu/ModelNet10.zip
unzip ModelNet10.zip

Take a look around! The directory structure should be self-explanatory.

Voxelize

Next, we’ll use binvox to voxelize the CAD models. Download the binvox executable for your OS and put it somewhere in your PATH. Then:

for f in ModelNet10/*/*/*.off; do binvox -d 24 -cb "$f"; done

(Note: If you’re running binvox on a headless server, you may need to fake a display buffer using Xvfb. See the binvox website for details.)

The -d 24 tells binvox to output a 24 x 24 x 24 voxel grid. The -cb tells it to center the shape in the grid.

Give this command some time to run. You may see pop-up windows flicker into and out of existence. When it’s done, you should have 4,899 .binvox files:

$ find ModelNet10/ -type f -name '*.binvox' | wc -l
4899

Aggregate

Our last step is to aggregate the 4,899 data files into a single NumPy archive, for ease of access. We’ll also pad the data using np.pad so that each example is 30 x 30 x 30.

We’ll use binvox_rw to read the .binvox files as NumPy arrays. Download binvox_rw.py. Then, in a Python file:

import os

import numpy as np

import binvox_rw

ROOT = 'ModelNet10'
CLASSES = ['bathtub', 'bed', 'chair', 'desk', 'dresser',
           'monitor', 'night_stand', 'sofa', 'table', 'toilet']

# We'll put the data into these arrays
X = {'train': [], 'test': []}
y = {'train': [], 'test': []}

# Iterate over the classes and train/test directories
for label, cl in enumerate(CLASSES):
    for split in ['train', 'test']:
        examples_dir = os.path.join('.', ROOT, cl, split)
        for example in os.listdir(examples_dir):
            if 'binvox' in example:  # Ignore OFF files
                with open(os.path.join(examples_dir, example), 'rb') as file:
                    data = np.int32(binvox_rw.read_as_3d_array(file).data)
                    padded_data = np.pad(data, 3, 'constant')
                    X[split].append(padded_data)
                    y[split].append(label)

# Save to a NumPy archive called "modelnet10.npz"
np.savez_compressed('modelnet10.npz',
                    X_train=X['train'],
                    X_test=X['test'],
                    y_train=y['train'],
                    y_test=y['test'])
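If you want to sanity-check the archive (optional; this isn’t part of the original script), reload it and confirm that the array shapes match the train/test split sizes mentioned earlier:

import numpy as np

data = np.load('modelnet10.npz')
print(data['X_train'].shape)  # expect (3991, 30, 30, 30)
print(data['X_test'].shape)   # expect (908, 30, 30, 30)
print(data['y_train'].shape)  # expect (3991,)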

Graph construction and training

In our training script, we begin by reading and shuffling the data:

import numpy as np
from sklearn.utils import shuffle

data = np.load('modelnet10.npz')
X, Y = shuffle(data['X_train'], data['y_train'])
X_test, Y_test = shuffle(data['X_test'], data['y_test'])

We’ll be doing 10-way classification, so our neural network will end with a 10-way softmax. That means we need to one-hot encode our targets, which are currently integers between 0 and 9. Keras comes with a nice one-liner for this:

import keras

Y = keras.utils.to_categorical(Y, 10)
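For example, a target of 3 becomes a length-10 vector with a 1 in position 3 (counting from 0). The dtype and printed formatting of the returned array can vary across Keras versions:

print(keras.utils.to_categorical([3], 10))
# roughly: [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]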

The Xu-Todorovic architecture is

Our best found model consists of three convolutional layers and one fully-connected layer. Our first layer has 16 filters of size 6 and stride2 [sic]; the second layer has 64 filters of size 5 and stride 2; the third layer has 64 filters of size 5 and stride 2; the last fully-connected layer has C hidden units [where C is the number of classes].

which translates into

from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape
from keras.layers.convolutional import Conv3D

model = Sequential()
model.add(Reshape((30, 30, 30, 1), input_shape=(30, 30, 30)))
model.add(Conv3D(16, 6, strides=2, activation='relu'))
model.add(Conv3D(64, 5, strides=2, activation='relu'))
model.add(Conv3D(64, 5, strides=2, activation='relu'))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
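It’s worth checking the layer output shapes with model.summary(). With Keras’s default 'valid' padding, each convolution maps a spatial size n to (n - filter) / stride + 1, so the grid should shrink 30 -> 13 -> 5 -> 1. The summary below is paraphrased, showing only the expected output shapes:

model.summary()
# Expected output shapes (batch dimension omitted):
#   Reshape -> (30, 30, 30, 1)
#   Conv3D  -> (13, 13, 13, 16)   # (30 - 6) / 2 + 1 = 13
#   Conv3D  -> (5, 5, 5, 64)      # (13 - 5) / 2 + 1 = 5
#   Conv3D  -> (1, 1, 1, 64)      # (5 - 5) / 2 + 1 = 1
#   Flatten -> (64,)
#   Dense   -> (10,)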

Then we compile our graph and start training:

model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(lr=0.001),
              metrics=['accuracy'])
model.fit(X, Y, batch_size=256, epochs=30, verbose=2,
          validation_split=0.2, shuffle=True)
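If you’d like to keep the trained weights around (optional; not part of the original walkthrough), Keras can serialize the whole model to a single HDF5 file, which requires the h5py package:

model.save('modelnet10_cnn.h5')  # hypothetical filename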

To evaluate our model on the test set, we compute the most likely class for each example using the softmax outputs:

from sklearn.metrics import accuracy_score

Y_test_pred = np.argmax(model.predict(X_test), axis=1)
print('Test accuracy: {:.3f}'.format(accuracy_score(Y_test, Y_test_pred)))

Because the classes are imbalanced, the standard accuracy metric for ModelNet10 is average per-class accuracy. We can compute this from the confusion matrix: the diagonal counts the correctly classified examples of each class, each row sum counts the total examples of that class, and averaging their ratio over the 10 classes gives the metric:

from sklearn.metrics import confusion_matrix

conf = confusion_matrix(Y_test, Y_test_pred)
avg_per_class_acc = np.mean(np.diagonal(conf) / np.sum(conf, axis=1))

print('Confusion matrix:\n{}'.format(conf))
print('Average per-class accuracy: {:.3f}'.format(avg_per_class_acc))

The complete code is here. Try it out! You should get around 86% average per-class accuracy.

Training on FloydHub

FloydHub is a YC-backed startup specializing in ML infrastructure and deployment. A self-proclaimed “Heroku for deep learning”, FloydHub lets you launch jobs from the command line and then monitor them through a web UI.

To start, create a FloydHub account, install the CLI, and log into the CLI.

To upload the ModelNet data to FloydHub, create a new dataset called modelnet on the FloydHub website. Move the modelnet10.npz file into its own directory. (If you’re using the reference code, this has already been done for you: you’ll find the file under data/.) Then cd into the directory and run the following:

floyd data init modelnet
floyd data upload

Run floyd data status to check that the dataset was uploaded successfully.

Next, create a new project called classify-modelnet. Move the training script (main.py) into its own directory (src/). Then cd into the directory and run

floyd init classify-modelnet
floyd run --env keras --gpu --data [dataset-name] \
    "python main.py /input/modelnet10.npz /output"

Be sure to replace [dataset-name] with the full name of the ModelNet dataset. It should look something like [username]/datasets/modelnet/1. (You can’t just use the short name, modelnet.)

Your job should now be either queued or running. After it starts running, you can view the output using floyd logs -t [job-name]. You can also monitor the job at the FloydHub website.

When the job finishes, you should see the confusion matrix and average per-class accuracy for the test set:

2017-05-26 05:27:18,497 INFO - 
2017-05-26 05:27:19,199 INFO - Test accuracy: 0.879
2017-05-26 05:27:19,200 INFO - Confusion matrix:
2017-05-26 05:27:19,200 INFO - [[36  7  0  0  0  0  0  5  2  0]
2017-05-26 05:27:19,201 INFO - [ 1 99  0  0  0  0  0  0  0  0]
2017-05-26 05:27:19,201 INFO - [ 0  3 96  0  0  0  0  0  1  0]
2017-05-26 05:27:19,202 INFO - [ 2  0  1 60  3  1  4  4 11  0]
2017-05-26 05:27:19,202 INFO - [ 0  1  1  0 78  2  4  0  0  0]
2017-05-26 05:27:19,203 INFO - [ 0  0  2  1  1 95  1  0  0  0]
2017-05-26 05:27:19,203 INFO - [ 0  1  0  1 22  0 56  0  5  1]
2017-05-26 05:27:19,204 INFO - [ 0  0  0  0  1  0  1 98  0  0]
2017-05-26 05:27:19,204 INFO - [ 0  1  0 16  0  0  1  0 82  0]
2017-05-26 05:27:19,205 INFO - [ 0  0  2  0  0  0  0  0  0 98]]
2017-05-26 05:27:19,205 INFO - Average per-class accuracy: 0.866
2017-05-26 05:27:19,547 INFO -

If your job seems to be stuck in the queue, try killing the job with floyd stop and then rerunning it.