Custom Annotation for Roboflow Pre-trained Models with CV2

Roboflow offers a great playground of computer vision models, including object detection and tracking. Its web interface provides dataset creation and online labeling tools that help any team label faster and collaborate more easily.

In addition to the tools for custom labeling, a huge community also publishes datasets and pre-trained models for object detection, classification, instance segmentation, keypoint detection, and semantic segmentation, among many other tasks.

Today, we will use the "rock-paper-scissors-sxsw/14" model to detect a hand in an image and classify it as Rock, Paper, or Scissors.

This is the image we will use for our little experiment; feel free to download it and save it as rock.jpg.

Open Google Colab

Go to Google Colab and create a new notebook, then upload rock.jpg to the root directory.

Loading Libraries

To use Roboflow's API, we need to create an account and get an API key. The following code installs the inference-sdk library.

!pip install inference-sdk

Now, let's load the required libraries. We will use inference_sdk to call the pre-trained model hosted on Roboflow, cv2 to load the rock.jpg image and draw the bounding box, and matplotlib to display the image in the notebook.

import inference_sdk
import cv2
import matplotlib.pyplot as plt

Call Roboflow's Inference API

This code is provided by Roboflow as part of the snippets it offers for each model. It uses the InferenceHTTPClient to call the model hosted remotely on Roboflow. The client.infer method streams the image to the Roboflow API and returns a JSON response with the bounding box, class label, and confidence score.

from inference_sdk import InferenceHTTPClient

CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="PUT-YOUR-API-KEY-HERE"
)

result = CLIENT.infer('rock.jpg', model_id="rock-paper-scissors-sxsw/14")

result

The result object is a JSON-like dictionary with the following structure:

{'time': 0.04682349800009433,
 'image': {'width': 1024, 'height': 768},
 'predictions': [{'x': 552.5,
   'y': 383.5,
   'width': 385.0,
   'height': 371.0,
   'confidence': 0.5471271276473999,
   'class': 'Rock',
   'class_id': 1,
   'detection_id': '90730496-ac1c-4f26-af09-b03b33409047'}]}
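Before drawing anything, it is worth guarding against an empty predictions list, since the model may detect nothing in a given image. Here is a minimal sketch using a hard-coded stand-in for the API response (the values mirror the output above):

```python
# Hard-coded stand-in for the JSON returned by CLIENT.infer
result = {
    "time": 0.0468,
    "image": {"width": 1024, "height": 768},
    "predictions": [
        {"x": 552.5, "y": 383.5, "width": 385.0, "height": 371.0,
         "confidence": 0.547, "class": "Rock", "class_id": 1},
    ],
}

predictions = result.get("predictions", [])
if predictions:
    # Keep the detection with the highest confidence score
    best = max(predictions, key=lambda p: p["confidence"])
    print(f"{best['class']} ({best['confidence']:.2f})")  # Rock (0.55)
else:
    print("No hands detected")
```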

Custom Annotation with CV2

Using this JSON structure, we will have cv2 (OpenCV) draw a rectangle for the bounding box and a text label showing the class and confidence score.

# Load the image
image = cv2.imread("rock.jpg")

# Get image dimensions
image_width = image.shape[1]
image_height = image.shape[0]

# Get prediction data from JSON
x = result['predictions'][0]['x']
y = result['predictions'][0]['y']
width = result['predictions'][0]['width']
height = result['predictions'][0]['height']

# Calculate coordinates for the bounding box
x_min = int(x - width / 2)
y_min = int(y - height / 2)
x_max = int(x + width / 2)
y_max = int(y + height / 2)

# Draw bounding box on the image
color = (0, 255, 0)  # Green color for the bounding box
thickness = 4
image_with_box = cv2.rectangle(image, (x_min, y_min), (x_max, y_max), color, thickness)

# draw label
text = result['predictions'][0]['class'] + ' ' + str(round(result['predictions'][0]['confidence'],2))
position = (x_min, y_min-10)  # Bottom-left corner of the text (x, y)
font = cv2.FONT_HERSHEY_SIMPLEX
font_scale = 1
color = (255, 0, 0)  # Blue color in BGR
thickness = 2
line_type = cv2.LINE_AA

# Add text to the image
cv2.putText(image, text, position, font, font_scale, color, thickness, line_type)

# Display the annotated image
# OpenCV loads images in BGR order, so convert to RGB for matplotlib
plt.imshow(cv2.cvtColor(image_with_box, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()

This code translates the center-based coordinates returned by the Roboflow model into the corner coordinates cv2 expects. The resulting image is shown below.

Summary

The most involved part of this exercise is drawing the annotation ourselves. Roboflow's pre-trained models and tools make it easy to create custom datasets and train models online quickly with very little code. Roboflow offers a free account with limited functionality (enough to run this example!). Other plans for computer vision tasks start at $250 USD per month.

So what if you want to do this yourself without Roboflow's help? You will need an annotation tool (like COCO Annotator, or the Ultralytics annotator if you are using one of the YOLO flavors) and an object detection framework such as YOLOv9 to train your models on GPUs from one of your favorite cloud providers, such as AWS, Google Cloud, or Azure.