Missing Data

Before taking any action, we first tested whether these values are missing completely at random (MCAR). Using the MCAR test in R, we have the following hypotheses:
H0: Data is missing completely at random
Ha: Data is not missing completely at random


The p-value is 1 in both datasets, so we fail to reject the null hypothesis and therefore treat the missing values as missing completely at random.
Handling Missing Data
The following approach was used to deal with the missing values:
- Responses from participants who did not answer the questions about their gender and/or their classification level were removed, because we did not want to carry those responses forward. This removed 0.01 of the total responses in 2014 and only 4 rows in 2019.
- We then examined the percentage of missing values in each row of both datasets. Rows with more than 5% missing data were removed.
- The remaining responses, with at most 5% missing values, were imputed using “mice”, with the missing values replaced by the median.
After these steps, neither year's dataset contains any missing data.
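
The three handling steps can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not reproduce the “mice” workflow used for the actual analysis. The column names (gender, classification, q1 to q20), the toy rows, and the helper functions are all hypothetical, and the median is assumed to be taken per column over the observed values.

```python
from statistics import median

# Assumed names for the two required demographic columns (hypothetical).
REQUIRED = ("gender", "classification")


def missing_fraction(row):
    """Share of fields in one response that are missing (None)."""
    return sum(value is None for value in row.values()) / len(row)


def clean(rows, required=REQUIRED, max_missing=0.05):
    """Apply the three steps to a list of dict rows and return new rows."""
    # Step 1: drop respondents who skipped a required demographic question.
    kept = [dict(r) for r in rows if all(r[c] is not None for c in required)]
    # Step 2: drop rows with more than 5% of their values missing.
    kept = [r for r in kept if missing_fraction(r) <= max_missing]
    if not kept:
        return kept
    # Step 3: fill each remaining gap with the median of the observed
    # values in the same column (assumed interpretation of "median").
    for col in (c for c in kept[0] if c not in required):
        observed = [r[col] for r in kept if r[col] is not None]
        if not observed:
            continue  # nothing to base a median on; leave the column as is
        fill = median(observed)
        for r in kept:
            if r[col] is None:
                r[col] = fill
    return kept


def make_row(gender, classification, answers):
    """Build one toy response with 2 demographic fields plus the answers."""
    row = {"gender": gender, "classification": classification}
    row.update({f"q{i + 1}": a for i, a in enumerate(answers)})
    return row


rows = [
    make_row("F", "senior", [4] * 20),
    make_row("M", "junior", [None] + [2] * 19),          # 1 of 22 missing: kept
    make_row(None, "junior", [3] * 20),                  # no gender: dropped
    make_row("F", "freshman", [None, None] + [5] * 18),  # 2 of 22 missing: dropped
    make_row("M", "sophomore", [5] * 20),
]
cleaned = clean(rows)
```

With 22 fields per row, one missing answer is about 4.5% of the row and passes the 5% threshold, while two missing answers are about 9.1% and do not. The input rows are copied before imputation, so the original list is left unchanged.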
