How to annotate your Image Classification Dataset at minimum cost?

This website is based on the paper “Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets”, accepted to CVPR 2021 as an oral presentation. For more details, see the project website and the code.

In this demo, we walk you through the key decisions you face as a task designer who wants to annotate an image classification dataset at minimum cost. We start from a given unlabeled dataset and a set of classes of interest. Which level of worker expertise do you hire for? Do you add gold standard questions for annotators? Which aggregation method do you use? If a computer vision (henceforth, CV) model is used in the loop, how frequently do you update it?
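
These decisions can be bundled into a single pipeline configuration. Below is a minimal sketch in Python; every name is illustrative, not taken from the released code:

```python
from dataclasses import dataclass

@dataclass
class AnnotationConfig:
    """Key design decisions for a crowdsourced labeling pipeline.

    All field names are hypothetical; the released code may differ.
    """
    worker_expertise: str = "average"   # e.g. "low", "average", "expert"
    use_gold_standard: bool = True      # seed questions with known answers
    aggregation: str = "worker_skill"   # e.g. "majority_vote", "worker_skill"
    model_update_every: int = 1         # retrain the CV model every k batches

# Example: hire experts and retrain the CV model every other batch.
config = AnnotationConfig(worker_expertise="expert", model_update_every=2)
```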

  Take a glance at your dataset

Now you have an unlabeled dataset of 21,740 images, and you want to hire workers to help you annotate it. After some data cleaning, you end up with 16 (+1) classes of interest and are ready to start annotating.

Here are the classes you have:

To make the task more realistic, we add around 10% additional images from other classes as distractors.
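
As an illustration of this step, here is a sketch of mixing roughly 10% distractors into the unlabeled pool; `in_class_images` and `distractor_pool` are hypothetical lists of image paths:

```python
import random

def add_distractors(in_class_images, distractor_pool, ratio=0.10, seed=0):
    """Return the unlabeled pool with ~`ratio` distractor images mixed in."""
    rng = random.Random(seed)
    n_extra = int(len(in_class_images) * ratio)
    extras = rng.sample(distractor_pool, n_extra)
    mixed = in_class_images + extras
    rng.shuffle(mixed)
    return mixed
```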


  Worker Configuration
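
At its simplest, a worker's expertise can be modeled as a single probability of answering correctly, and gold standard questions (images whose labels you already know, seeded into the task) give an empirical estimate of that probability. The sketch below uses this deliberately simple model; the paper itself infers richer per-worker, per-class skills:

```python
import random

def simulate_worker(true_label, classes, accuracy, rng=random):
    """Answer correctly with probability `accuracy`, otherwise pick a
    uniformly random wrong class (a deliberately crude error model)."""
    if rng.random() < accuracy:
        return true_label
    return rng.choice([c for c in classes if c != true_label])

def estimate_skill(answers, gold):
    """Estimate a worker's accuracy from gold standard questions.

    `answers` maps image id -> the worker's answer; `gold` maps image id
    -> known label for the seeded gold questions."""
    scored = [img for img in gold if img in answers]
    if not scored:
        return None  # no evidence about this worker yet
    return sum(answers[img] == gold[img] for img in scored) / len(scored)
```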



  How to Aggregate Labels


  Greedy task assignment with inferred worker skills
  Early stopping when the number of confidently labeled images declines for 5 consecutive steps (both sketched below)
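
As a rough illustration of these two settings: skill-weighted aggregation can be approximated by a log-odds-weighted vote, and the early-stop rule by tracking the per-step count of confidently labeled images. This is a simplification of the paper's probabilistic model, and all inputs here are hypothetical:

```python
import math
from collections import defaultdict

def weighted_vote(labels, skills, n_classes):
    """Aggregate one image's labels, weighting each worker's vote by the
    log-odds of their estimated accuracy (a Dawid-Skene-style shortcut).

    `labels` is a list of (worker_id, class) pairs; `skills` maps
    worker_id -> estimated accuracy in (0, 1)."""
    scores = defaultdict(float)
    for worker, cls in labels:
        p = min(max(skills.get(worker, 0.5), 1e-3), 1 - 1e-3)
        # log-odds of being right vs. guessing among the other classes
        scores[cls] += math.log(p * (n_classes - 1) / (1 - p))
    best = max(scores, key=scores.get)
    m = max(scores.values())  # softmax with max-shift for stability
    total = sum(math.exp(s - m) for s in scores.values())
    confidence = math.exp(scores[best] - m) / total
    return best, confidence

def should_stop(confident_counts, patience=5):
    """Stop once the count of confident images has declined for
    `patience` consecutive steps."""
    recent = confident_counts[-(patience + 1):]
    return len(recent) == patience + 1 and all(
        later < earlier for earlier, later in zip(recent, recent[1:]))
```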

  Results



Performance (compared with ImageNet ground truth)



Your labels:



Images with a green border were annotated fewer than 3 times; images with a red border were annotated more than 3 times.
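
If you want to reproduce this bookkeeping offline, here is a sketch; `labels`, `ground_truth`, and the per-image annotation counts are assumed to be plain Python dicts, and the demo does not specify how exactly 3 annotations is colored:

```python
def accuracy(labels, ground_truth):
    """Fraction of aggregated labels matching (hypothetical) ImageNet
    ground truth, over the images present in both dicts."""
    common = [img for img in labels if img in ground_truth]
    if not common:
        return float("nan")
    return sum(labels[img] == ground_truth[img] for img in common) / len(common)

def border_color(n_annotations, threshold=3):
    """Green border: fewer than `threshold` annotations; red: more."""
    if n_annotations < threshold:
        return "green"
    if n_annotations > threshold:
        return "red"
    return "none"  # exactly `threshold`: unspecified in the demo
```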