Ranjan Sapkota, Dawood Ahmed, and Manoj Karkee, “Comparing YOLOv8 and Mask RCNN for object segmentation in complex orchard environments,” preprint, Dec. 2023 [Online]. Available at: https://arxiv.org/abs/2312.07935. [Accessed: December 14, 2023]
@techreport{sapkota_comparing_2023,
type = {preprint},
title = {Comparing {YOLOv8} and {Mask} {RCNN} for object segmentation in complex orchard environments},
url = {https://arxiv.org/abs/2312.07935},
language = {en},
urldate = {2023-12-14},
author = {Sapkota, Ranjan and Ahmed, Dawood and Karkee, Manoj},
month = dec,
year = {2023},
arxivdoi = {10.48550/arXiv.2312.07935},
freepdf = {https://arxiv.org/pdf/2312.07935.pdf},
tldr = {Evaluation of YOLOv8 and Mask R-CNN on two datasets and two different tasks. Datasets are color images of production apple trees: Dataset 1 from the dormant season (leafless trees) and Dataset 2 from the growing season with fruitlets. Tasks are single-class instance segmentation of fruitlets from Dataset 2, and multi-class instance segmentation of branches and tree trunks from Dataset 1. Total of ~1550 images, all manually annotated and split into train / val / test sets; models trained on this data. The references to other works using YOLO-N or Mask R-CNN in orchard environments are useful. Concludes that YOLOv8 works better in these environments than Mask R-CNN, with better precision and recall and lower inference times.}
}
Instance segmentation, an important image processing operation for automation in agriculture, is used to precisely delineate individual objects of interest within images, which provides foundational information for various automated or robotic tasks such as selective harvesting and precision pruning. This study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation under varying orchard conditions across two datasets. Dataset 1, collected in the dormant season, includes images of dormant apple trees, which were used to train multi-object segmentation models delineating tree branches and trunks. Dataset 2, collected in the early growing season, includes images of apple tree canopies with green foliage and immature (green) apples (also called fruitlets), which were used to train single-object segmentation models delineating only immature green apples. The results showed that YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of 0.5. Specifically, for Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 for all classes. In comparison, Mask R-CNN demonstrated a precision of 0.81 and a recall of 0.81 for the same dataset. With Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of 0.97. Mask R-CNN, in this single-class scenario, achieved a precision of 0.85 and a recall of 0.88. Additionally, the inference times for YOLOv8 were 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms for Mask R-CNN, respectively.
These findings show YOLOv8's superior accuracy and efficiency in machine learning applications compared to two-stage models, specifically Mask R-CNN, which suggests its suitability in developing smart and automated orchard operations, particularly when real-time applications are necessary, as in robotic harvesting and robotic thinning of immature green fruit.
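For a quick side-by-side reading of the figures in the abstract, the short Python sketch below tabulates them and derives two summary numbers. The F1 scores and speedup ratios are computed here from the reported precision, recall, and inference times; they are not values stated in the paper, and the names `REPORTED`, `f1_score`, and `speedup` are made up for this illustration.

```python
# Figures as reported in the abstract (confidence threshold 0.5).
# Keys are (dataset, model); the layout of this table is an assumption
# made for illustration only.
REPORTED = {
    ("Dataset 1", "YOLOv8"): {"precision": 0.90, "recall": 0.95, "inference_ms": 10.9},
    ("Dataset 1", "Mask R-CNN"): {"precision": 0.81, "recall": 0.81, "inference_ms": 15.6},
    ("Dataset 2", "YOLOv8"): {"precision": 0.93, "recall": 0.97, "inference_ms": 7.8},
    ("Dataset 2", "Mask R-CNN"): {"precision": 0.85, "recall": 0.88, "inference_ms": 12.8},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (derived, not reported)."""
    return 2 * precision * recall / (precision + recall)


def speedup(reported: dict, dataset: str) -> float:
    """Ratio of Mask R-CNN to YOLOv8 inference time for one dataset."""
    slow = reported[(dataset, "Mask R-CNN")]["inference_ms"]
    fast = reported[(dataset, "YOLOv8")]["inference_ms"]
    return slow / fast


if __name__ == "__main__":
    for (dataset, model), m in REPORTED.items():
        f1 = f1_score(m["precision"], m["recall"])
        print(f"{dataset:9s} {model:10s} P={m['precision']:.2f} "
              f"R={m['recall']:.2f} F1={f1:.3f} t={m['inference_ms']:.1f} ms")
    for dataset in ("Dataset 1", "Dataset 2"):
        print(f"{dataset}: YOLOv8 is {speedup(REPORTED, dataset):.2f}x faster")
```

The ratios only compare the two reported inference times; they say nothing about hardware, batch size, or image resolution, none of which the abstract specifies.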
tl;dr: Evaluation of YOLOv8 and Mask R-CNN on two datasets and two different tasks. Datasets are color images of production apple trees: Dataset 1 from the dormant season (leafless trees) and Dataset 2 from the growing season with fruitlets. Tasks are single-class instance segmentation of fruitlets from Dataset 2, and multi-class instance segmentation of branches and tree trunks from Dataset 1. Total of about 1550 images, all manually annotated and split into train / val / test sets; models trained on this data. The references to other works using YOLO-N or Mask R-CNN in orchard environments are useful. Concludes that YOLOv8 works better in these environments than Mask R-CNN, with better precision and recall and lower inference times.
Comparing YOLOv8 and Mask RCNN for object segmentation in complex orchard environments.
arXiv: http://doi.org/10.48550/arXiv.2312.07935
h/t Amy Tabb
pdf: https://arxiv.org/pdf/2312.07935.pdf