ICIP 2023 Challenge Session « CD-COCO »

International Conference on Image Processing (ICIP), 8-11 October 2023, Kuala Lumpur, Malaysia



Challenge Title: Object detection under uncontrolled acquisition environment and scene context constraints

Image acquisition conditions can significantly affect high-level computer vision tasks such as object detection, object recognition, object segmentation, depth estimation, scene understanding, and object tracking, to name a few. Improvements in sensor quality and in deep learning methods have increased robustness against distortions, allowing many computer vision algorithms to reach suitable performance. However, even with new sensor technologies and deep learning approaches, performance remains quite limited in real applications where the visual scene contains both local and global distortions. This is the case, for example, in autonomous vehicles, video surveillance, and medical robotics.

Several object detection benchmark datasets have been proposed [1-3]. The most popular benchmark is the MS-COCO dataset [6]. The performance of object detection models is generally evaluated using the mean Average Precision (mAP) metric. However, such experiments generally consider only global image distortions. For a better assessment of the robustness of object detection models, it is important to also consider local distortions and the complexity of the scenes observed in real environments. Databases that include such scenarios are more realistic and more reliable.

To this end, we built a database of images with various global and local distortions, taking into account relevant features of the scene context to make the images more realistic. Our dedicated dataset comprises original and distorted images from the well-known MS-COCO dataset. The synthetic distortions are generated with several types and severity levels chosen with respect to the scene context.

Important: The selected teams will be invited to take part in a joint paper summarizing the top proposed solutions, to be submitted for publication in an IEEE Transactions journal.

It is important to note that the performance of most deep learning-based computer vision algorithms is limited when they are trained on image databases that do not contain distortions [4]. Indeed, image databases dedicated to benchmarking computer vision algorithms generally do not include real scenarios with distortions caused by the acquisition conditions. The robustness of learning-based computer vision algorithms therefore depends on how representative and rich the databases are in terms of distortions [5]. This is even more visible in real applications, where distortions are more complex and heterogeneous than synthetically generated ones. Usually, deep learning methods improve their robustness through data augmentation or dedicated architecture design. The first solution, retained in our study, adds synthetic distortions to the training set to accustom the network to perturbations.

This challenge will perform the first comprehensive benchmark of the impact of realistic synthetic distortions on the performance of current object detection methods. This study will provide a reliable prediction of the performance of these methods in real applications thanks to the realism and coherence of our Complex Distorted COCO dataset (CD-COCO). Using the MS-COCO database as the source of original images let us use its ground truth annotations to design local distortions and build on a database that is well known for object detection. In addition, we generated complex and realistic distortions by carefully choosing the distortion type and tuning the distortion parameters with respect to the scene context and the distortion type. More generally, this comprehensive benchmark has the potential to be an important contribution to the field. Developing more effective and efficient computer vision algorithms with such a benchmark will also contribute significantly to addressing the challenges of real-world industrial applications, such as robotics.

The CD-COCO database is available to ICIP 2023 challenge competitors who have registered via this link, so that they can test their methods against different distortions and severity levels. The proposed methods must localize objects as accurately as possible and determine their classes in a reasonable time. The duration of the detection process will be taken into account in the performance evaluation. Participants are required to submit easy-to-read, commented code for their algorithm (preferably in MATLAB or Python), along with a document summarizing their method and its steps. The code should include an executable script and a corresponding README file that allow us to test the solution on our CD-COCO test set. Illustrative results may be submitted to demonstrate the effectiveness of the solution. Participants must also report the execution time of their solution and their system configuration, so that execution times can be normalized between competitors. The submitted methods should therefore aim to reach the following goals:

  • Detect the presence of objects
  • Determine which class they belong to
  • Determine their location as precisely as possible in bounding boxes
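The three goals above map onto the fields of a single detection record. As an illustration only, the sketch below uses the COCO detection results convention (an image id, a category id, a [x, y, width, height] box in pixels, and a confidence score). The challenge's actual submission format is not specified in this text, and the helper name `make_detection` and the sample values are hypothetical.

```python
import json


def make_detection(image_id, category_id, box_xywh, score):
    """Build one detection record in the COCO results convention.

    box_xywh is [x_min, y_min, width, height] in pixels.
    Hypothetical helper, used here for illustration only.
    """
    x, y, w, h = box_xywh
    if w <= 0 or h <= 0:
        raise ValueError("box width and height must be positive")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return {
        "image_id": int(image_id),        # image in which the object was found
        "category_id": int(category_id),  # class the object belongs to
        "bbox": [float(x), float(y), float(w), float(h)],  # object location
        "score": float(score),            # detector confidence
    }


# Made-up sample values; category id 1 is "person" in COCO.
detections = [make_detection(42, 1, [10, 20, 100, 200], 0.9)]
results_json = json.dumps(detections)
```

Serializing the list with `json.dumps` gives a results file that standard COCO evaluation tools can read, which is one plausible way to hand predictions to an evaluator.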

The submitted methods will be assessed according to the official COCO mAP metric, which characterizes a method’s precision by its ability to detect objects and locate them accurately. The accuracy and speed criteria will be summarized in a ratio describing the efficiency of the proposed solutions. Furthermore, all proposed methods will be tested on our lab computer with the same GPU to normalize the execution time. The distorted test sets contain images subjected to various distortions, either at random severity levels or at severity levels that increase progressively from set to set. The evaluation will thus have two parts:

  • A general test set with all distortion types at random severity levels.
  • Test sets for each distortion type, each with a specific severity level that increases linearly from set to set.
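The COCO mAP metric rests on the intersection over union (IoU) between a predicted box and a ground-truth box, and averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that IoU building block for [x, y, width, height] boxes, not the full metric. The exact accuracy/speed ratio used by the organizers is not given in this text, so `efficiency_ratio` (mAP divided by seconds per image) is purely an assumed form for illustration.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds over which COCO averages precision.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def efficiency_ratio(map_score, seconds_per_image):
    """ASSUMED form of an accuracy/speed ratio: mAP per second per image."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return map_score / seconds_per_image


# Two 10 x 10 boxes shifted by 5 pixels overlap on half their area.
example_iou = iou_xywh([0, 0, 10, 10], [5, 0, 10, 10])
```

A prediction counts as correct at a given threshold only if its IoU with a ground-truth box of the same class reaches that threshold, so tighter localization pays off at the higher thresholds.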

The CD-COCO dataset used in this challenge session is derived from the well-known MS-COCO dataset, which contains 164K images split into three sets: a training set of 118K images, a validation set of 5K images, and a test set of 41K images. We applied dedicated distortion types at specific severity levels to the training set according to the scene context of each image. The choice of distortion type is correlated with the scene type (indoor/outdoor) and the scene context (the objects present and the scene depth). Likewise, the distortion severity level is assigned according to the object type and position (pixel and depth) for local distortions or atmospheric distortions (rain and haze). For example, haze and rain cannot be present in indoor scenes, and object motion blur should be correlated with the object’s velocity, which depends on the object type and its position in the scene. Thus, the distortion severity level based on the object type should reflect the object’s sensitivity to a given distortion. In turn, the object position and scene depth make it possible to account for scene specificity, so that the distortions are coherent with the scene context and type.

Important: The link to access the dataset will only be provided to registered participants.
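The scene-context rules described above can be pictured as a simple filter over the list of distortion types. The sketch below encodes only the one rule this text states explicitly (no rain or haze in indoor scenes). The function name `allowed_distortions` and the string labels are hypothetical, and this is not the generation code used to build CD-COCO.

```python
# The ten distortion types listed for CD-COCO (D1 to D10); labels are illustrative.
DISTORTIONS = [
    "image_compression", "noise", "contrast_change", "rain", "haze",
    "motion_blur", "defocus_blur", "local_backlight",
    "local_motion_blur", "local_defocus_blur",
]

# Weather phenomena that the text rules out for indoor scenes.
OUTDOOR_ONLY = {"rain", "haze"}


def allowed_distortions(scene_type):
    """Return the distortion types compatible with a scene type."""
    if scene_type not in ("indoor", "outdoor"):
        raise ValueError("scene_type must be 'indoor' or 'outdoor'")
    if scene_type == "outdoor":
        return list(DISTORTIONS)
    return [d for d in DISTORTIONS if d not in OUTDOOR_ONLY]


indoor_choices = allowed_distortions("indoor")
```

Further rules, such as tying motion blur strength to object type and depth, would extend the same idea with per-object inputs.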

Parameter                    Value
---------------------------  ------------------------------------------------------------
Number of images per set     118K for training and 5K for validation
Acquisition conditions       2 (Day, Night)
Scene type                   2 (Indoor, Outdoor)
Image resolution             640 x 480
Image category               RGB images
Number of distortions        10
Number of distortion levels  10
Distortion types             (D1) Image compression, (D2) Noise (additive white Gaussian noise), (D3) Contrast changing, (D4) Rain, (D5) Haze, (D6) Motion blur (camera motion), (D7) Defocus blur, (D8) Local backlight illumination, (D9) Local motion blur, (D10) Local defocus blur
Object types                 80 object categories from the COCO dataset (person, bicycle, car, etc.)
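As a rough picture of how one distortion type combines with a severity level, the sketch below applies additive white Gaussian noise (D2) to a row of 8-bit intensities. The linear mapping from level to standard deviation and the value of `max_sigma` are assumptions made for illustration; the actual CD-COCO parameters are not given in this text.

```python
import random


def level_to_sigma(level, max_sigma=50.0, num_levels=10):
    """Map a severity level (1..num_levels) linearly to a noise std.

    max_sigma is an assumed value, not a CD-COCO parameter.
    """
    if not 1 <= level <= num_levels:
        raise ValueError("level out of range")
    return max_sigma * level / num_levels


def add_awgn(pixels, sigma, rng):
    """Add zero-mean Gaussian noise to 8-bit intensities, clipped to [0, 255]."""
    noisy = []
    for p in pixels:
        value = p + rng.gauss(0.0, sigma)
        noisy.append(int(round(min(255.0, max(0.0, value)))))
    return noisy


rng = random.Random(0)  # fixed seed so the sketch is reproducible
row = [0, 64, 128, 255]
noisy_row = add_awgn(row, level_to_sigma(3), rng)
```

Passing the random generator in explicitly keeps the noise reproducible, which matters when the same distorted set has to be regenerated for every competitor.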

Our CD-COCO dataset comprises local distortions such as motion blur, defocus blur, and backlight illumination applied to objects or specific areas. It is worth noting that the weighting and magnitude of each distortion are adjusted according to the position of the object in the observed scene: both the 2D spatial position and the depth are taken into account when applying the synthetic distortions. The database also contains global distortions related to camera parameters and characteristics, such as noise sensitivity, defocus, or instabilities, and those related to acquisition conditions, such as atmospheric turbulence, lossy compression artifacts, motion blur, or uncontrolled lighting. Among the atmospheric and weather factors affecting image acquisition quality, we consider rain and haze. The other factors, related to the limitations of camera sensors, are mainly noise sensitivity, contrast sensitivity, and spatial resolution. Global blur may result from camera motion and/or optical defocus, whereas local motion blur results from moving objects. Our dataset is detailed in the tables (see Tables 1 and 2).

Distortion                    Distortion type           Scene type influence  Depth influence  Object type influence
----------------------------  ------------------------  --------------------  ---------------  ---------------------
Image compression             Global/Acquisition        No                    No               No
Noise                         Global/Acquisition        No                    No               No
Contrast changing             Global/Atmospheric        Yes                   Yes              No
Rain                          Global/Atmospheric        Yes                   Yes              No
Haze                          Global/Atmospheric        Yes                   Yes              No
Motion blur                   Global/Camera conditions  No                    No               No
Defocus blur                  Global/Camera conditions  No                    No               No
Local backlight illumination  Local/Scene conditions    No                    Yes              No
Local defocus blur            Local/Scene conditions    No                    Yes              No
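To illustrate what a depth influence of "Yes" can mean for haze, the sketch below uses the standard atmospheric scattering model, I = J·t + A·(1 − t), with transmission t = exp(−β·d) for scene depth d. Whether CD-COCO uses this exact model is not stated in this text; the function name `add_haze`, the scattering coefficient `beta`, and the airlight value are assumptions chosen for illustration.

```python
import math


def add_haze(pixel, depth, beta=0.1, airlight=1.0):
    """Apply haze to one intensity in [0, 1] at a given scene depth.

    depth and beta are in arbitrary, mutually consistent units.
    Farther pixels have lower transmission and drift toward the airlight.
    """
    if depth < 0 or beta < 0:
        raise ValueError("depth and beta must be non-negative")
    transmission = math.exp(-beta * depth)
    return pixel * transmission + airlight * (1.0 - transmission)


near = add_haze(0.2, depth=5.0)   # mild haze on a close object
far = add_haze(0.2, depth=50.0)   # strong haze on a distant object
```

Raising `beta` strengthens the haze at every depth, so a severity level could be mapped to `beta` while the depth map keeps the effect consistent with the scene geometry.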

The following team will run the challenge session:

University Paris Saclay, France

  • Ayman Beghdadi, PhD candidate
  • Malik Mallem, Professor
  • Lotfi Beji, Associate Professor

Norwegian University of Science and Technology (NTNU), Norway

  • Faouzi Alaya Cheikh, Professor
  • Mohib Ullah, Postdoc Fellow
  • Adane N. Tarekegn, Postdoc Fellow

University Sorbonne Paris Nord, France

  • Zuheng Ming, Assistant Professor
  • Borhene E. Dakkar, Postdoc Fellow
  • Zohaib A. Khan, Research scientist
  • Azeddine Beghdadi, Professor

Registration opening: January 22, 2023
Training data available: January 31, 2023
Testing data available: March 1, 2023
Challenge paper submission (optional): April 26, 2023
Solutions/code submission: April 30, 2023
Challenge paper acceptance notification: June 21, 2023
Camera ready submission of accepted challenge papers (optional): July 5, 2023
Announcement of the winners: at IEEE ICIP 2023

The coordinators will contact potential sponsors to support 1-3 awards for the competition winners.

  1. Beghdadi, A., Mallem, M., & Beji, L. (2022). Benchmarking performance of object detection under image distortions in an uncontrolled environment. In 2022 IEEE International Conference on Image Processing (ICIP). IEEE.
  2. Beghdadi, A., Qureshi, M. A., Dakkar, B. E., Gillani, H. H., Khan, Z. A., Kaaniche, M., Ullah, M., & Cheikh, F. A. (2022, October). A new video quality assessment dataset for video surveillance applications. In 2022 IEEE International Conference on Image Processing (ICIP) (pp. 1521-1525). IEEE.
  3. Bezzine, I., Khan, Z. A., Beghdadi, A., Almaadeed, N., Kaaniche, M., Almaadeed, S., Bouridane, A., & Alaya Cheikh, F. (2020). Video quality assessment dataset for smart public security systems. In Proceedings of the 23rd IEEE-INMIC, Bahawalpur, Pakistan, 5-7 November 2020.
  4. Michaelis, C., Mitzkus, B., Geirhos, R., Rusak, E., Bringmann, O., Ecker, A. S., … & Brendel, W. (2019). Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484.
  5. Hendrycks, D., & Dietterich, T. (2019). Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261.
  6. Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., … & Zitnick, C. L. (2014, September). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740-755). Springer, Cham.


