Object Detection and Classification Approaches

Literature Review and Comparative Study

Object detection and classification are fundamental tasks in computer vision, used in applications like facial recognition, autonomous driving, medical imaging, and defense systems.
Object Detection: Identifies objects in images, providing bounding boxes.
Classification: Assigns labels to detected objects.

Modern approaches leverage neural networks for improved accuracy and scalability.
Single-Shot Detection (one-stage, e.g. YOLO) vs Dual-Shot Detection (two-stage, e.g. the R-CNN family).
YOLO (You Only Look Once): Well suited to high-speed detection, though typically less accurate on small objects.
Region-Based CNNs (R-CNN): High accuracy but slower. Faster R-CNN adds a region proposal network (RPN) to speed up proposal generation.
Mask R-CNN: Adds segmentation capabilities, enhancing shape analysis at a higher computational cost.

Side-scan sonar (SSS) images present unique challenges for detection and classification.
Challenges: High noise levels, low resolution, texture variability, objects with similar shapes and textures, and limited annotated data.

Real-Time Performance: YOLO excels in speed, making it suitable for real-time detection, while Faster R-CNN is more accurate in complex scenes.
Multi-Object Detection: Faster R-CNN handles cluttered scenes effectively.
Noise Adaptability: Domain adaptation and fine-tuning are essential for SSS data, where they address noise and data scarcity.

Classification Metrics:
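- Accuracy: Proportion of correct predictions.
- Precision & Recall: Precision is the share of predicted positives that are correct (penalizes false positives); recall is the share of actual positives that are found (penalizes false negatives).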
- F1 Score: Harmonic mean of precision and recall.
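
As an illustration of the classification metrics above, here is a minimal sketch that derives all four from confusion-matrix counts. The function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

With these counts, precision (0.80) is lower than recall (about 0.89), and F1 sits between the two.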
Object Detection Metrics:
- IoU (Intersection over Union): Measures overlap between predicted and true bounding boxes.
- mAP (Mean Average Precision): Average precision (area under the precision-recall curve) averaged across classes and, in some benchmarks, across several IoU thresholds.
- Detection Speed: Frames per second (FPS), crucial for real-time tasks.
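
To make the IoU definition concrete, the following sketch computes it for two axis-aligned boxes. It assumes boxes are given as (x_min, y_min, x_max, y_max) in pixel coordinates; the dataset's actual annotation format may differ, and the example boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes.
predicted = (0, 0, 10, 10)
ground_truth = (5, 5, 15, 15)
print(iou(predicted, ground_truth))  # 25 / 175, about 0.143
```

A detection is usually counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, with 0.5 being a common choice.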

The dataset contains 1170 side-scan sonar images collected using a 900–1800 kHz Marine Sonic dual-frequency side-scan sonar mounted on a Teledyne Marine Gavia Autonomous Underwater Vehicle (AUV).
All the images were carefully analyzed and annotated with the image coordinates of the Bounding Box (BB) of each detected object. Objects are divided into two classes: NOn-Mine-like BOttom Objects (NOMBO) and MIne-Like COntacts (MILCO).

Year	Images	MILCO	NOMBO
2010	345	22	12
2015	120	238	175
2017	93	28	2
2018	564	95	46
2021	48	49	0

Table 1. Summary of the dataset.
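
The per-year rows of Table 1 can be totalled with a short script to check them against the stated dataset size and to see the class balance. The variable names below are illustrative; the numbers are copied from the table.

```python
# Rows copied from Table 1: (year, images, MILCO objects, NOMBO objects).
TABLE_1 = [
    (2010, 345, 22, 12),
    (2015, 120, 238, 175),
    (2017, 93, 28, 2),
    (2018, 564, 95, 46),
    (2021, 48, 49, 0),
]

total_images = sum(row[1] for row in TABLE_1)
total_milco = sum(row[2] for row in TABLE_1)
total_nombo = sum(row[3] for row in TABLE_1)
# Share of annotated objects that belong to the MILCO class.
milco_share = total_milco / (total_milco + total_nombo)

print(total_images, total_milco, total_nombo, round(milco_share, 3))
```

The image count sums to 1170, matching the stated dataset size. The annotations comprise 432 MILCO and 235 NOMBO objects, so roughly 65% of annotated objects are MILCO.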

Non-mine image

Mine image

Initial experiments on sonar datasets reveal key insights:
Dataset Details: 825 sonar images split into 80% training and 20% testing.
Classification Results:
- Before Training: Accuracy of 0.27, indicating poor performance.
- After Training: Accuracy improved to 0.91 with a ResNet-based classifier.
Detection Challenges: Pretrained YOLOv8 struggled with noisy SSS data, emphasizing the need for domain adaptation and fine-tuning.
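
The 80/20 split described above could be reproduced along the following lines. This is a sketch under stated assumptions: the image identifiers, the seed, and the function name are hypothetical, and the slides do not say how the original split was drawn.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle deterministically and split into (train, test) lists."""
    shuffled = list(items)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the 825 sonar images.
image_ids = [f"img_{i:04d}" for i in range(825)]
train_ids, test_ids = split_dataset(image_ids)
print(len(train_ids), len(test_ids))  # 660 165
```

With 825 images, this gives 660 for training and 165 for testing. For an imbalanced dataset, a split stratified by class may be preferable to a purely random one.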

Future efforts aim to address current limitations:
Enhanced Datasets: Include diverse, annotated sonar images to improve training.
Synthetic Data Generation: Augment datasets with realistic synthetic images.
Advanced Models: Explore fine-tuned YOLO and R-CNN variants for SSS data.
These steps will enhance model reliability and accuracy in real-world applications.
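
As a much simpler stand-in for realistic synthetic image generation, the sketch below shows two basic augmentations: a left-right flip that also updates the bounding box, and multiplicative (speckle-like) noise. It assumes grayscale images stored as lists of rows with 0-255 pixel values and boxes given as (x_min, y_min, x_max, y_max) edge coordinates. Whether flipping is appropriate for a particular SSS setup is also an assumption, and all names are illustrative.

```python
import random


def hflip(image, box):
    """Mirror an image left-right and adjust its bounding box to match."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    x_min, y_min, x_max, y_max = box
    # x edges are mirrored about the image width; y edges are unchanged.
    return flipped, (width - x_max, y_min, width - x_min, y_max)


def add_speckle(image, sigma=0.2, seed=0):
    """Scale each pixel by (1 + Gaussian noise), clamped to 0..255."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(p * (1 + rng.gauss(0, sigma))))) for p in row]
        for row in image
    ]


# Tiny made-up 2x4 image with an object in the left column.
image = [[200, 10, 10, 10],
         [180, 10, 10, 10]]
box = (0, 0, 1, 2)

flipped, flipped_box = hflip(image, box)
noisy = add_speckle(flipped)
print(flipped_box)  # (3, 0, 4, 2)
```

Augmentations like these only perturb existing samples. They complement the synthetic data generation proposed above but do not replace it.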
