For example, when we want to train a human detection model for street camera views, what is the best annotation strategy?
In most real-world situations, images will contain partially occluded silhouettes. If only the lower half of a body is visible, or only the upper half, should all such cases be annotated equally?
The best approach starts with an analysis of the specific use case. If the purpose of the model is privacy masking, we want it to detect the upper half of the body so that it can blur the face accordingly; annotating silhouettes whose upper half is obscured would then be unnecessary.
For this reason, it is crucial to properly define the scope and context of the detection task before starting annotation. The annotation guidelines, in other words the strategy, should be as precise as possible to avoid inconsistencies in the dataset, and should record every edge case encountered along with the decision made for it. An iterative cycle of annotation and training is also good practice, as the first conclusions about model performance may force a change in the overall approach. To facilitate this process, we can use attributes to enrich the annotated objects with additional information and properties. In this way, we can easily filter the dataset without re-annotating it, observe how the model behaves, and tune it to the desired performance.
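As a minimal sketch of this attribute-based filtering, assuming a simple list-of-dicts annotation format (the attribute names `occlusion` and `visible_part` are hypothetical, not from any specific tool), the idea might look like:

```python
# Hypothetical annotations enriched with attributes at labeling time.
annotations = [
    {"id": 1, "label": "person",
     "attributes": {"occlusion": "none", "visible_part": "full"}},
    {"id": 2, "label": "person",
     "attributes": {"occlusion": "partial", "visible_part": "upper"}},
    {"id": 3, "label": "person",
     "attributes": {"occlusion": "partial", "visible_part": "lower"}},
]

def filter_by_attribute(anns, key, allowed):
    """Keep only annotations whose attribute `key` has a value in `allowed`."""
    return [a for a in anns if a["attributes"].get(key) in allowed]

# For a privacy-masking model we might keep only silhouettes whose
# upper body is visible, without re-annotating anything:
upper_visible = filter_by_attribute(annotations, "visible_part", {"full", "upper"})
print([a["id"] for a in upper_visible])  # → [1, 2]
```

Because the attributes were captured up front, switching the model's target (say, from privacy masking to crowd counting) only changes the filter, not the annotation work.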
In our experience, up to 70% of computer vision projects fail because of an overly optimistic approach to the training data. Moreover, it often happens that the team's only strategy is to build the largest possible dataset, so with limited resources they replicate the data through augmentation, which often proves fatal to model quality. Our job is not only to annotate the data but also to provide the knowledge of how to prepare it: one of the more difficult tasks in MLOps processes, but one with a very quick return on investment.