Innovation Series: Advanced Science (ISSN 2938-9933, CNKI Indexed)

Volume 3 · Issue 3 (2026)

User-Guided Instance-Level Data Augmentation and Detection-Aware Optimization Framework

 

Daohu Zhang, Dongxiang Fu*

School of Optical Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Corresponding Author: Dongxiang Fu (fudx@usst.edu.cn)

 

Abstract: Deep learning–based object detection models have made significant progress in industrial vision and robotic perception, but their performance depends heavily on large-scale, high-quality annotated data. To address the frequent emergence of new object categories and the high cost of annotation, this paper proposes a user-guided instance-level data augmentation and detection-aware optimization framework for extremely few-shot scenarios. The method uses limited human–computer interaction to guide a segmentation model in extracting target instances and applies a mask quality evaluation mechanism to filter valid samples; semantic-consistency-aware instance-level copy-paste is then combined with adaptive illumination enhancement to generate diverse training data. In addition, a detection-aware feedback mechanism exploits model error information to guide data generation in a closed loop, further improving robustness. Experimental results show that, under few-shot settings, the proposed method significantly outperforms conventional data augmentation strategies in precision and recall while substantially reducing manual annotation cost; its engineering feasibility and stability are further validated in real vision-guided robotic-arm grasping experiments. Under the COCO 10-shot setting, the method improves mAP@50 by 15.2% and exhibits stronger robustness under complex illumination conditions.
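The mask-filtering and copy-paste stages described above can be sketched as follows. This is an illustrative toy in NumPy, not the authors' implementation: the compactness-based `mask_quality` score and the `copy_paste` helper are our own assumptions standing in for the paper's mask quality evaluation mechanism and instance-level copy-paste, respectively.

```python
import numpy as np


def mask_quality(mask: np.ndarray, min_area: int = 100) -> float:
    """Toy mask-quality score in [0, 1]: reject tiny masks, then score
    boundary compactness (4*pi*area / perimeter^2), so ragged or
    fragmented masks score low and can be filtered out."""
    area = int(mask.sum())
    if area < min_area:
        return 0.0
    # Crude perimeter estimate: foreground pixels missing a 4-neighbor.
    padded = np.pad(mask, 1)
    has_all_neighbors = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                         & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~has_all_neighbors).sum())
    compactness = 4 * np.pi * area / max(perimeter, 1) ** 2
    return float(min(compactness, 1.0))


def copy_paste(src_img: np.ndarray, src_mask: np.ndarray,
               dst_img: np.ndarray, top_left: tuple[int, int]) -> np.ndarray:
    """Paste the masked instance from src_img onto a copy of dst_img,
    with the mask's top-left corner placed at top_left = (row, col)."""
    out = dst_img.copy()
    h, w = src_mask.shape
    y, x = top_left
    region = out[y:y + h, x:x + w]          # view into the output image
    region[src_mask] = src_img[src_mask]    # transfer only instance pixels
    return out
```

In the full framework, only instances whose masks pass the quality threshold would be pasted, and paste locations would additionally be constrained by semantic consistency with the destination scene.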

 

Keywords: Object detection; Data augmentation; Few-shot learning; Instance segmentation; YOLOv8; Segment Anything Model

 

