Table of Contents

Introduction

Deep learning model deployment doesn’t end with the training of a model. Especially with edge devices and the variety of processors, there can be many steps to get a network running on an embedded device. With the OpenCV AI Kit, I have camera modules with a Myriad X chip on the same board. It is a module well-suited for deep learning inference with what should be minimal bottleneck of image data transfer.

My OAK-1

I now have an OAK-1 (well, an earlier run MegaAI module), and I want to familiarize myself with the device. While the docs from the designers of the OAK-1 Luxonis show they have a model zoo, I am more interested in how to use your own networks with the device. Considering its design, the OAK devices are most suited for inference directly on image with minimal (if any) pre-processing. Tasks like keypoint and descriptor extraction (like Superpoint) and object detection (like YOLO) are good candidates for tasks to be taken up on this module. Since YOLO is already supported in the model zoo, I’ve gone through the steps of converting the pretrained pytorch Superpoint network to the Movidius Blob format for the Myriad X. Because not all architectures of networks are the same or even originate from the same framework, this does not illustrate the process of adapting all models. I will discuss some considerations you may need to make, but YMMV!

Also my OAK-1

Conversion Process

My code for performing all these steps can be found in my github repo, but I’ll put snippets here.

Prerequisites and Considerations

Since the Myriad X is the target, you first need to install the Intel OpenVINO Toolkit. At the time of writing, you should use version 2020.1 or 2020.2, especially if you’re planning to use the DepthAI python library to use the OAK device. Assuming one has already has a pretrained model or will be training a model, there some things to be aware of that will affect the amount of work in later steps. You should be aware of the following:

1. Check on layer compatibility for each intermediate step.

Layers may be incompatible or not have an implementatino between different conversion steps. In this work, I removed some incompatible layers in the ONNX conversion step and re-implemented them in numpy. When converting from ONNX to IR, you may face more layers that are not implemented. When you can’t run the full model on the OAK device due to incompatibile layers, you could:

  • Implement the layers for OpenVINO tools
  • Remove the layers from the Myriad X computation and run further computation on the host of OAK device
  • Train a model that doesn’t have such incompatibilities

2. Adjust your model to meet your runtime needs.

The Myriad X is for inference, but it’s still a more limited resource than your desktop or server GPU. Figure out what adjustments you can make (i.e. batch size, input resolution, etc.) and tune these parameters when training or converting models to sufficiently run on the Myriad X

Convert Pytorch Model to ONNX

The first step is to convert the pytorch model to ONNX as an intermediate representation. Intel’s IR models do not have a direct conversion from pytorch, but they do have conversion tooling from ONNX. As mentioned earlier, some layers or operations may not be supported in the conversion to ONNX. While Superpoint is mostly a CNN, the l2-norm is not implemented in ONNX. It’s used as a final operation on the coarse descriptor output, so I just re-implement it and absorb it into the post-processing of the model output that happens in superpoint_frontend.py.

The pytorch library already supports conversion to onnx. Part of the conversion script is reproduced below to show the relevant steps. Of note, the opset_version should be 10 for the supported OpenVINO versions at the time of writing.

import numpy as np
import torch


import network
from superpoint_frontend import reduce_l2

weights_path = 'weights/superpoint_v1.pth'
h = 100  # height of input image in pixels
w = 100  # width of input image in pixels

# Create model object.
pt_model = network.SuperPointNet()
pytorch_total_params = sum(p.numel() for p in pt_model.parameters())
print('Total number of params: ', pytorch_total_params)

# Initialize model with the pretrained weights
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
  map_location = None
pt_model.load_state_dict(torch.load(weights_path, map_location=map_location))
pt_model.eval()

# Create random input and run on model for onnx trace.
x = torch.randn(batch_size, 1, h, w, requires_grad=True)
torch_out = pt_model(x)

# Export the model
onnx_filename = os.path.join(output_dir, "superpoint_{}x{}.onnx".format(h, w))
torch.onnx.export(
  pt_model,               # model being run
  x,                         # model input (or a tuple for multiple inputs)
  onnx_filename,   # where to save the model (can be a file or file-like object)
  export_params=True,        # store the trained parameter weights inside the model file
  opset_version=10,          # the ONNX version to export the model to
  do_constant_folding=True,  # whether to execute constant folding for optimization
  input_names = ['input'],   # the model's input names
  output_names = ['semi', 'desc'], # the model's output names
  )
# Output ONNX model is saved to "output/superpoint_100x100.onnx", expecting an input image of [1,1,100,100]

Convert ONNX Model to Intel Intermediate Representation (IR)

Now that I have an ONNX model, I need to convert it to Intel’s IR. This IR can be used as is if targeting other devices and will serve as the input to a final step of converting to the Movidius Blob to be run on the Myriad X. I will use the OpenVINO model optimizer tools. Assuming OpenVINO is installed in /opt/intel/openvino, the following script would need to be run:

python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_shape [1,1,100,100] --input_model output/superpoint_100x100.onnx --output_dir output --data_type FP16
# FP16 data type is due to the Myriad X expecting weights and layers to be fp16 ops
# Output model file (superpoint_100x100.xml) and weights file (superpoint100x100.bin) to directory "output/"

We have an example script to use this model. We should expect an output like this:

run_openvino.py output

Generate Intel IR to Movidius Blob

Finally, I can now get our model in the Myriad X format.

/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/myriad_compile -m output/superpoint_100x100.xml -o output/superpoint_100x100.blob -ip U8 -VPU_MYRIAD_PLATFORM VPU_MYRIAD_2480 -VPU_NUMBER_OF_SHAVES 4 -VPU_NUMBER_OF_CMX_SLICES 4
# Output model to "output/superpoint_100x100.blob"

You also will need json file called “superpoint_100x100.json” for the OAK device DepthAI api to use the model. This is automated in onnx conversion script. This json file defines the output tensors of the model for parsing (but multiple output tensors may require more involved processing, see run_oak.py for an example).

superpoint_100x100.json

{
  "tensors":
  [
    {       
      "output_tensor_name": "semi",
      "output_dimensions": [1, 65, 12, 12],
      "output_entry_iteration_index": 0,
      "output_properties_dimensions": [0],
      "property_key_mapping": [],
      "output_properties_type": "f16"
    },
    {       
      "output_tensor_name": "desc",
      "output_dimensions": [1, 256, 12, 12],
      "output_entry_iteration_index": 0,
      "output_properties_dimensions": [0],
      "property_key_mapping": [],
      "output_properties_type": "f16"
    },          
  ]
}

I don’t have an example output here because I couldn’t get similar performance that I got out of the IR model. Some things to explore for this particular case might be looking at how OAK-1 does frame pre-processing as I subsample from the full frame and the OAK-1 seems like it’s center cropping instead. Another aspect is that I notice some speed issues, and a smaller trained model would improve that aspect. (Perhaps trying using a smaller CNN and distilling)

Conclusion

I have taken the superpoint pretrained model and adapted it to function on an OAK device. I share some obstacles that may be common in the chain conversion process and provide some suggestions to tackle them. I hope that this tutorial provides a useful blueprint for deploying your own models for the OAK device or other Intel device targets.