
How to manage missing and null image annotations

May 3, 2022

Explaining data annotation for machine learning

Data annotation in machine learning involves labeling data so that it represents the outcomes you want your ML model to predict. It includes labeling, marking, tagging and transcribing datasets with the attributes you want your machine learning system to learn to recognize.

What are the challenges of image annotations? 

From identifying objects to making sure they are recognizable to computer vision models, a number of challenges can arise during image annotation. Because each object has to be labeled with cuboids, bounding boxes, lines or whatever other annotation method the client requires, capturing the actual context of a scene for object recognition can be a challenging job. Some of these challenges are:

1. Inaccuracy of labelled data:

It can be very taxing for humans to annotate a dataset without errors, especially when labeling image features, because it is often difficult to identify every property of an image that is useful for describing it. Errors can leave the model unable to work out what has been labeled or how to interpret it, which reduces its effectiveness. Accurate data labeling takes a lot of work, especially with large volumes of unstructured data.

2. Biased results:

Human judgement is flawed and biased, and this can create problems in annotation tasks, because annotators may introduce their own personal biases when defining labels for datasets.

3. It is time consuming:

If your project is a large one, image annotation could take weeks to complete. This includes the time and effort needed to create annotation guidelines and to train human annotators, which could affect or even delay project timelines.

Types of annotation errors

1. Incorrect class: When an object is classified incorrectly, e.g. a house being labeled as a bus.

2. Missing annotation: When an object annotation is omitted where it should exist. 

3. Redundant annotation: When annotation is done for an object that doesn't require it.

4. Incorrect attribute: When an attribute or state of an object is not described correctly, e.g. a walking pedestrian is labeled as standing.

5. Incorrect annotation size: When an object's annotation differs from its actual dimensions.
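
Some of these error types can be caught, at least partially, with automated sanity checks. The following is a minimal sketch, not a complete validator. It assumes a hypothetical annotation format (a dict with "label" and "box" keys, where the box is (x_min, y_min, x_max, y_max) in pixels) and a hypothetical label set. Checks like these only catch obvious cases such as unknown class names, empty or out-of-bounds boxes, exact duplicates, and images with no annotations at all; a house labeled as a bus still needs human review.

```python
ALLOWED_CLASSES = {"house", "bus", "pedestrian"}  # hypothetical label set


def check_image(annotations, width, height):
    """Return a list of (index, problem) tuples for one image's annotations.

    Each annotation is assumed to be a dict with "label" and "box" keys,
    where box = (x_min, y_min, x_max, y_max) in pixels.
    """
    problems = []
    if not annotations:
        # An image with no annotations may be a missing annotation.
        problems.append((None, "no annotations: possible missing annotation"))
    seen = set()
    for i, ann in enumerate(annotations):
        label, box = ann.get("label"), ann.get("box")
        if label not in ALLOWED_CLASSES:
            problems.append((i, "unknown class"))
        if box is None:
            problems.append((i, "missing box"))
            continue
        x0, y0, x1, y1 = box
        if x1 <= x0 or y1 <= y0:
            problems.append((i, "empty or inverted box"))
        elif x0 < 0 or y0 < 0 or x1 > width or y1 > height:
            problems.append((i, "box outside image"))
        key = (label, tuple(box))
        if key in seen:
            # Same label and same box twice: likely a redundant annotation.
            problems.append((i, "duplicate annotation"))
        seen.add(key)
    return problems


annotations = [
    {"label": "bus", "box": (10, 10, 50, 40)},
    {"label": "bus", "box": (10, 10, 50, 40)},   # exact duplicate
    {"label": "car", "box": (0, 0, 700, 20)},    # unknown class, too wide
]
problems = check_image(annotations, width=640, height=480)
empty_image_problems = check_image([], width=640, height=480)
print(problems)
```

Flagged items are candidates for review rather than confirmed errors: an image with no annotations might genuinely contain no objects of interest.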

The difference between missing and null annotations

In most result tables, a null is interpreted as an unknown value. A null value is a special marker used in a dataset to indicate that a data value does not exist in the database; simply put, it denotes a value that we do not know. Missing values, by contrast, occur when no data value is stored at all for a variable or participant under observation. This could be a single empty cell or an entire absent row. Both are among the most common mistakes encountered in data annotation.
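
The distinction can be illustrated with a small sketch. It assumes annotation records are stored as Python dicts with a hypothetical "label" field: a field that is present but set to None (or NaN) is treated as null, while a field that was never stored is treated as missing. Real annotation formats may encode the two cases differently.

```python
import math


def field_status(record, key):
    """Return 'missing', 'null', or 'present' for one field of a record."""
    if key not in record:
        return "missing"   # no value was stored at all
    value = record[key]
    if value is None:
        return "null"      # explicit marker: the value is unknown
    if isinstance(value, float) and math.isnan(value):
        return "null"      # NaN is a common null marker in numeric data
    return "present"


records = [
    {"image": "img_001.jpg", "label": "bus"},
    {"image": "img_002.jpg", "label": None},   # null label
    {"image": "img_003.jpg"},                  # label never stored
]
statuses = [field_status(r, "label") for r in records]
print(statuses)  # ['present', 'null', 'missing']
```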

How to handle missing and null annotations

1. Delete rows or columns with missing values: Missing values can be handled by deleting the rows or columns that contain nulls. If more than half of the rows in a column are null, the entire column can be dropped. However, this method can discard a lot of information, and it does not work well when missing values make up a large part of the dataset.

2. Use algorithms that support missing values: Most ML models today do not support missing values out of the box, but some, such as k-NN when paired with a distance measure that skips missing entries, are versatile enough to ignore missing values in the dataset. Others, like Naive Bayes, can also accommodate missing values when making a prediction.

3. Fill in missing values with the mean or median: Here, each missing value is replaced by the mean or median of the observed values for that variable. This method avoids the loss of data that comes with deleting rows or columns. Substituting either of these two approximations (mean, median) is a statistical approach to handling missing values.
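
The three approaches above can be sketched in plain Python. This is an illustration under stated assumptions, not a production implementation: the dataset is a list of equal-length numeric rows, None and NaN both count as missing, and all function names are hypothetical. The distance function shows one way a k-NN-style comparison can skip missing coordinates, rescaling the result so vectors with fewer shared coordinates are not favored.

```python
import math
import statistics


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete_rows(rows):
    """Method 1 (rows): keep only rows with no missing values."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


def drop_sparse_columns(rows, max_missing_fraction=0.5):
    """Method 1 (columns): drop columns with too many missing values."""
    if not rows:
        return []
    keep = [
        j for j in range(len(rows[0]))
        if sum(is_missing(row[j]) for row in rows) / len(rows) <= max_missing_fraction
    ]
    return [[row[j] for j in keep] for row in rows]


def nan_aware_distance(a, b):
    """Method 2 (sketch): Euclidean distance over coordinates present in both."""
    pairs = [(x, y) for x, y in zip(a, b) if not is_missing(x) and not is_missing(y)]
    if not pairs:
        return math.nan  # nothing to compare
    scale = len(a) / len(pairs)  # compensate for skipped coordinates
    return math.sqrt(scale * sum((x - y) ** 2 for x, y in pairs))


def impute(rows, strategy="median"):
    """Method 3: replace missing values with the column mean or median."""
    if not rows:
        return []
    stat = statistics.median if strategy == "median" else statistics.mean
    fills = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        # A column with no observed values cannot be imputed; leave it as None.
        fills.append(stat(observed) if observed else None)
    return [[fills[j] if is_missing(v) else v for j, v in enumerate(row)] for row in rows]


rows = [
    [10.0, 20.0, 1.0],
    [12.0, None, None],
    [None, 26.0, None],
    [50.0, 24.0, None],
]
complete_rows = drop_incomplete_rows(rows)   # only the first row survives
dense_columns = drop_sparse_columns(rows)    # third column is dropped
median_filled = impute(rows, "median")
mean_filled = impute(rows, "mean")
```

In this toy dataset, deleting incomplete rows leaves a single row, which illustrates the information loss mentioned above. The median fill for the first column is 12.0 while the mean fill is 24.0, because the mean is pulled upward by the value 50.0.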

Most datasets have missing or null values that you need to handle intelligently if you want to create an accurate and versatile AI model that will be able to handle your business needs.
