Multi-object instance segmentation and classification using Mask R-CNN (Inception V2 backbone) with GAN-based mask refinement for precise object silhouette extraction.
Standard semantic segmentation assigns a class label to each pixel but cannot distinguish between separate instances of the same class. Instance segmentation provides both class labels and unique instance masks for every object in a scene — critical for robotics, autonomous driving, medical image analysis, and augmented reality.
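The distinction is easy to see on a toy example: a semantic label map merges same-class objects into one region, while instance segmentation keeps one mask per object (illustrative NumPy sketch, not part of the repo):

```python
import numpy as np

# A 1x6 toy "image" containing two separate objects of the same class (id 1).
# Semantic segmentation collapses both into a single label map...
semantic = np.array([1, 1, 0, 0, 1, 1])

# ...while instance segmentation keeps one binary mask per object.
instance_masks = [
    np.array([1, 1, 0, 0, 0, 0]),  # instance A
    np.array([0, 0, 0, 0, 1, 1]),  # instance B
]

# The instance masks partition exactly the pixels the semantic map labels 1.
assert ((instance_masks[0] | instance_masks[1]) == (semantic == 1)).all()
print(len(instance_masks), "instances of class 1")  # -> 2 instances of class 1
```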
This project implements a full instance segmentation pipeline:
- Detection + segmentation: Mask R-CNN with Inception V2 backbone, pre-trained on MS-COCO 80 classes
- Mask refinement: GAN-based approach to progressively improve mask fidelity and eliminate background clutter
- Inference via OpenCV DNN: no TensorFlow runtime dependency for deployment — uses the frozen inference graph via `cv2.dnn`
```mermaid
flowchart TD
    A[Input Image / Video Frame] --> B[OpenCV DNN Module\ncv2.dnn.readNetFromTensorflow\nfrozen_inference_graph.pb]
    B --> C[Mask R-CNN\nInception V2 Backbone]
    C --> D[Detection Branch\nBounding Boxes\nClass Labels\nConfidence Scores]
    C --> E[Segmentation Branch\n28×28 Binary Masks\nPer Instance]
    D --> F[Non-Maximum Suppression\nConfidence threshold 0.5\nNMS IoU threshold 0.4]
    E --> F
    F --> G[Resize Masks to\nBounding Box Dimensions]
    G --> H[Apply Binary Mask\nto Image Region-of-Interest]
    H --> I[GAN Mask Refinement\nProgressively improve\nsilhouette fidelity]
    I --> J[Clean Instance Masks\nBackground-free objects]
    D --> K[Annotated Output\nClass label + confidence\nColored instance overlays]
    J --> K
```
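The Non-Maximum Suppression stage in the diagram can be sketched independently of the network. Below is a minimal greedy NMS over `[x1, y1, x2, y2]` boxes using the diagram's thresholds — an illustrative pure-Python version, not the repo's code:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.4):
    """Greedy NMS: keep the highest-scoring boxes, drop overlapping ones."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thresh),
        key=lambda i: scores[i], reverse=True,
    )
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept

# Two near-duplicate detections of one object, plus one distinct box
boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [300, 300, 380, 380]]
scores = [0.9, 0.75, 0.8]
print(nms(boxes, scores))  # -> [0, 2]: the near-duplicate box 1 is suppressed
```

In the actual pipeline the frozen graph runs this suppression internally; the sketch only shows what the thresholds in the diagram control.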
The model detects and segments 80 object categories, including:

person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, sports ball, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush, ...
```shell
git clone https://github.com/ashish-code/Image-Instance-Segmentation.git
cd Image-Instance-Segmentation
pip install opencv-contrib-python numpy
```

Download the Mask R-CNN frozen model:

```shell
# Download from TF Model Zoo or ModelZoo.co
# Place frozen_inference_graph.pb in the models/ directory
wget -O models/frozen_inference_graph.pb \
  https://modelzoo.co/model/mask-r-cnn-inception-v2
```

```python
import cv2
import numpy as np

def load_model(model_path: str, config_path: str = None):
    """Load Mask R-CNN from a frozen TF graph using OpenCV DNN."""
    if config_path:
        # Optional .pbtxt graph config describing the network topology
        return cv2.dnn.readNetFromTensorflow(model_path, config_path)
    return cv2.dnn.readNetFromTensorflow(model_path)
```
```python
def run_instance_segmentation(
    image,
    model_path: str = "models/frozen_inference_graph.pb",
    confidence_threshold: float = 0.5,
    nms_threshold: float = 0.4,
    net=None,
):
    """
    Run Mask R-CNN instance segmentation on a single image.

    Accepts a file path or an in-memory BGR frame, plus an optional
    preloaded net so video loops avoid reloading the model per frame.
    Returns the image and the detected instances with bounding boxes,
    class labels, confidence scores, and binary segmentation masks.
    """
    # Load image (pass-through if a frame was given directly)
    if isinstance(image, str):
        image = cv2.imread(image)
    H, W = image.shape[:2]

    # Load class names (MS-COCO labels, one per line)
    with open("models/mscoco_labels.txt") as f:
        class_names = [line.strip() for line in f]

    # Load the model unless a preloaded net was supplied
    if net is None:
        net = load_model(model_path)

    # Prepare the input blob (BGR -> RGB swap, no mean subtraction, no crop)
    blob = cv2.dnn.blobFromImage(
        image, swapRB=True, crop=False,
        size=(W, H), mean=(0, 0, 0)
    )
    net.setInput(blob)

    # Forward pass: detections and 28x28 per-class soft masks
    boxes, masks = net.forward(["detection_out_final", "detection_masks"])

    # Parse detections
    instances = []
    for i in range(boxes.shape[2]):
        score = float(boxes[0, 0, i, 2])
        if score < confidence_threshold:
            continue
        class_id = int(boxes[0, 0, i, 1])
        # Box coordinates are normalized to [0, 1]; clamp to image bounds
        x1 = max(0, int(boxes[0, 0, i, 3] * W))
        y1 = max(0, int(boxes[0, 0, i, 4] * H))
        x2 = min(W - 1, int(boxes[0, 0, i, 5] * W))
        y2 = min(H - 1, int(boxes[0, 0, i, 6] * H))

        # Extract the per-class soft mask, resize to the box, binarize
        mask = masks[i, class_id]
        mask = cv2.resize(mask, (x2 - x1 + 1, y2 - y1 + 1))
        mask = (mask > 0.5).astype(np.uint8)

        instances.append({
            "class_id": class_id,
            "class_name": class_names[class_id],
            "confidence": score,
            "bbox": (x1, y1, x2, y2),
            "mask": mask,
        })
    return image, instances


# Run on a sample image
image, instances = run_instance_segmentation(
    "samples/street_scene.jpg",
    confidence_threshold=0.5,
)
print(f"Detected {len(instances)} instances:")
for inst in instances:
    print(f"  {inst['class_name']}: {inst['confidence']:.2f} @ {inst['bbox']}")
```
```python
import cv2

from segmentation import draw_instances, load_model, run_instance_segmentation

net = load_model("models/frozen_inference_graph.pb")
cap = cv2.VideoCapture(0)  # or a video file path

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Reuse the preloaded net so the model isn't reloaded every frame
    _, instances = run_instance_segmentation(frame, net=net)
    annotated = draw_instances(frame, instances, alpha=0.5)
    cv2.imshow("Instance Segmentation", annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

| Parameter | Default | Description |
|---|---|---|
| `confidence_threshold` | `0.5` | Minimum detection confidence to retain |
| `nms_threshold` | `0.4` | IoU threshold for Non-Maximum Suppression |
| `mask_threshold` | `0.5` | Pixel probability threshold for the binary mask |
| `model_path` | `models/frozen_inference_graph.pb` | Path to the frozen TF graph |
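The `mask_threshold` parameter binarizes each instance's soft 28×28 mask from the segmentation branch: pixels whose predicted probability exceeds the threshold become foreground. A toy illustration:

```python
import numpy as np

# Soft per-pixel probabilities for a (toy) 2x3 instance mask
soft_mask = np.array([[0.10, 0.62, 0.48],
                      [0.55, 0.90, 0.30]], dtype=np.float32)

mask_threshold = 0.5
binary_mask = (soft_mask > mask_threshold).astype(np.uint8)
print(binary_mask)
# [[0 1 0]
#  [1 1 0]]
```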
- He, K. et al. (2017). Mask R-CNN. ICCV.
- Szegedy, C. et al. (2016). Rethinking the Inception Architecture for Computer Vision. CVPR (Inception V2).
- Lin, T.Y. et al. (2014). Microsoft COCO: Common Objects in Context. ECCV.
MIT License — see LICENSE for details.