YOLOv8L Wins — RT-DETR Doesn't Beat It
On every headline metric, YOLOv8L 1000ep beats RT-DETR 1000ep: mAP50 0.971 vs 0.964, mAP50-95 0.831 vs 0.789, precision 0.979 vs 0.953, recall 0.950 vs 0.929. The convolutional architecture with larger capacity outperforms the transformer on this dataset.
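The head-to-head can be sanity-checked as a per-metric comparison. A minimal sketch (metric values copied from the summary above; the `winner` helper is just illustrative, not part of any run tooling):

```python
# Headline validation metrics at 1000 epochs, copied from the run summaries.
yolov8l = {"mAP50": 0.971, "mAP50-95": 0.831, "precision": 0.979, "recall": 0.950}
rtdetr  = {"mAP50": 0.964, "mAP50-95": 0.789, "precision": 0.953, "recall": 0.929}

def winner(name_a: str, name_b: str, metrics_a: dict, metrics_b: dict) -> dict:
    """Return which model wins each metric (higher is better for all four)."""
    return {m: name_a if metrics_a[m] > metrics_b[m] else name_b for m in metrics_a}

wins = winner("YOLOv8L", "RT-DETR", yolov8l, rtdetr)
print(wins)  # YOLOv8L takes all four metrics
```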
RT-DETR Beats YOLOv8L on BG→Varroa FP
RT-DETR 1000ep gets 171 BG→varroa false positives vs YOLOv8L's 98, so it doesn't win the FP benchmark either. However, this is a massive improvement over the 555 at 200ep, confirming the earlier run was simply undertrained. The transformer attention is doing real work on the varroa/background distinction.
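The BG→varroa count is a single off-diagonal cell of the confusion matrix. A sketch of reading it, assuming rows = predicted class and columns = true class with background as the last index (Ultralytics' layout); all counts except the two BG→varroa figures quoted above are illustrative placeholders, not the real matrix:

```python
# Confusion-matrix sketch: rows = predicted class, cols = true class, with
# background as the last index. Only the BG->varroa cells match the text;
# every other count is an illustrative placeholder.
# classes: 0 = bee, 1 = varroa, 2 = background
cm_rtdetr_1000ep = [
    [6858,   0, 120],  # predicted bee
    [   0, 410, 171],  # predicted varroa
    [ 762,  12,   0],  # predicted background
]

VARROA, BACKGROUND = 1, 2
bg_to_varroa_fp = cm_rtdetr_1000ep[VARROA][BACKGROUND]
print(bg_to_varroa_fp)  # 171

# Relative drop from the 200-epoch run's 555 BG->varroa FPs:
fp_drop = 1 - bg_to_varroa_fp / 555
print(f"{fp_drop:.0%} fewer BG->varroa false positives")
```

Read this way, the 555 → 171 improvement is roughly a two-thirds reduction in background-as-varroa confusions between 200ep and 1000ep.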
RT-DETR Has a Bee Miss Problem
The big red flag: 762 bees predicted as background (bee recall ~0.90 normalised). YOLOv8L has essentially none of these misses; bees are saturated across all YOLO runs. RT-DETR is trading bee detection confidence for varroa sensitivity in a way YOLO never did. The val cls_loss also spikes hard after ep820, suggesting the model started overfitting its classification head late in training.
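The ~0.90 figure follows directly from recall = TP / (TP + FN). A quick check, assuming the 762 background-predicted bees are the only bee misses; the true-positive count used here is the implied value, an assumption rather than a number from the logs:

```python
def recall(tp: int, fn: int) -> float:
    """Per-class recall from confusion-matrix counts: TP / (TP + FN)."""
    return tp / (tp + fn)

bee_misses = 762   # bees predicted as background (false negatives, from the text)
bee_tp = 6858      # implied true-positive count (assumption, not from the logs)
print(round(recall(bee_tp, bee_misses), 2))  # 0.9
```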
Val Cls Loss Spike — Overfit Signal
Val cls_loss reached its minimum at epoch 820 (0.275), then rose sharply to 0.407 by ep1000. This is the RT-DETR equivalent of the YOLO val DFL divergence: the classification head is overfitting. Best mAP50 actually landed at ep818 (0.969), suggesting the weights saved at that checkpoint would be stronger than the final model.
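Picking that stronger checkpoint is a matter of scanning the per-epoch log. A sketch using the stdlib `csv` module; the column names follow Ultralytics' `results.csv` format (`val/cls_loss`, `metrics/mAP50(B)`), but the rows below are a toy stand-in around the epochs discussed above, not the real log:

```python
import csv
import io

# Toy stand-in for a run's results.csv (column names follow Ultralytics'
# format; the values are illustrative, built around the epochs quoted above).
results_csv = """epoch,val/cls_loss,metrics/mAP50(B)
818,0.276,0.969
820,0.275,0.968
900,0.350,0.966
1000,0.407,0.964
"""

rows = list(csv.DictReader(io.StringIO(results_csv)))
best_loss = min(rows, key=lambda r: float(r["val/cls_loss"]))   # min val cls_loss
best_map = max(rows, key=lambda r: float(r["metrics/mAP50(B)"]))  # max mAP50
print(best_loss["epoch"], best_map["epoch"])  # 820 818
```

Note the two optima need not coincide: here min val cls_loss sits at ep820 while peak mAP50 sits at ep818, which is why checkpoint selection should key on the metric you actually care about.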
mAP50-95 Still Rising at ep1000
Like YOLOv8L, RT-DETR's mAP50-95 was still climbing at epoch 1000 (best at ep999: 0.789). Box localisation is still improving even as classification starts to overfit. This is a consistent pattern across both architectures on this dataset.
Production Candidate: YOLOv8L
YOLOv8L 1000ep is the clear winner: best mAP50, best mAP50-95, fewest false positives, highest precision and recall, and no bee miss problem. The best checkpoint (ep783 weights) is the strongest model in the series. RT-DETR would need architectural tuning or more data to close the gap.