Background
People around the world express emotions through facial expressions. To understand someone's emotion, one does not have to speak that person's language: the feelings can be read from their face. Paul Ekman analyzed exactly this aspect of human behavior: whether facial expressions of emotions are universal.
Aim
The aim is to determine whether a convolutional neural network can be built that correctly classifies the emotion shown in each picture.
Dataset
The dataset used can be downloaded from Kaggle.
The dataset consists of 48x48 pixel pictures of people. Each picture is labeled with one of 7 emotion categories: 0 = Angry, 1 = Disgust, 2 = Fear, 3 = Happy, 4 = Sad, 5 = Surprise, 6 = Neutral.
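The mapping between numeric labels and emotion names can be expressed as a small lookup table. The helper function name below is a hypothetical convenience, not part of the original notebook:

```python
# Mapping from the dataset's numeric class labels to emotion names,
# as listed in the dataset description.
EMOTION_LABELS = {
    0: "Angry",
    1: "Disgust",
    2: "Fear",
    3: "Happy",
    4: "Sad",
    5: "Surprise",
    6: "Neutral",
}

def label_name(label: int) -> str:
    """Return the emotion name for a numeric class label."""
    return EMOTION_LABELS[label]
```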
The data set is divided into training, validation, and two separate test sets: private and public. The full training set provided in the Kaggle challenge contained 28,709 images. These images were then split into training and validation sets, with 35% of the images used for validation and 65% for training.
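The 65%/35% split described above could be sketched as follows; the function name and fixed random seed are illustrative assumptions, not the notebook's actual code:

```python
import numpy as np

def split_train_validation(images, labels, val_fraction=0.35, seed=42):
    """Shuffle and split paired arrays into training and validation
    subsets. The report uses a 65%/35% train/validation split."""
    n = len(images)
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)          # random shuffle of indices
    n_val = int(n * val_fraction)       # size of the validation set
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (images[train_idx], labels[train_idx],
            images[val_idx], labels[val_idx])
```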
Technology
For this project the notebook was created and run online on the Paperspace Gradient website, which lets the user choose a machine with whatever GPU and CPU configuration is needed. To run the models at a moderate speed, a GPU with 16 GiB of memory is needed, along with 8 CPUs and 30 GiB of RAM.
To prepare the data, the pandas and NumPy libraries were used. To create the models, various TensorFlow modules were imported into the notebook.
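A typical preparation step for this kind of data is parsing each image into a NumPy array ready for a CNN. The sketch below assumes the pixel data is stored as space-separated strings of grayscale values, as in the Kaggle challenge CSV; the function names are illustrative:

```python
import numpy as np

def pixels_to_array(pixel_string, size=48):
    """Convert one space-separated pixel string into a size x size
    float array, scaled from [0, 255] down to [0, 1]."""
    values = np.array(pixel_string.split(), dtype=np.float32)
    return values.reshape(size, size) / 255.0

def load_images(pixel_strings):
    """Stack parsed images into an (N, 48, 48, 1) array, where the
    trailing axis is the single grayscale channel a CNN expects."""
    images = np.stack([pixels_to_array(p) for p in pixel_strings])
    return images[..., np.newaxis]
```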
Benefits
Even though the model has shortcomings in its predictions, an accuracy of over 63% was achieved by testing various implementations of the model, varying the batch size, number of epochs, dropout rates, and filter sizes.
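Trying combinations of those hyperparameters can be organized as a simple grid. The specific values below are illustrative assumptions; the report does not list the exact values that were tried:

```python
from itertools import product

# Hypothetical tuning grid over the hyperparameters mentioned above.
batch_sizes = [64, 128, 256]
dropout_rates = [0.25, 0.4, 0.5]
filter_sizes = [3, 5]

# Every combination of the three hyperparameters, to train and
# evaluate one at a time.
configs = [
    {"batch_size": b, "dropout": d, "filter_size": f}
    for b, d, f in product(batch_sizes, dropout_rates, filter_sizes)
]
```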
Drawbacks
After evaluating a couple of images, it is noticeable that some images have wrong labels. This prevents a trained model from predicting reliably. To eliminate this issue, all images would need to be re-evaluated to check whether their labels are correct, and the models rerun.
Challenges
The challenge that persisted throughout the notebook was boosting the accuracy on the validation and test data. The first model created had an accuracy of about 50%. The model that served as the basis for the final model started off at around 56% accuracy, and the final model ended at 63% accuracy on the test data. This is an increase, but not one that makes the model suitable for reliably predicting facial expressions.
Results
The final model was trained for 200 epochs with a batch size of 128. It had an accuracy of 61.3% on the public test set and 63.56% on the private test set. The model has multiple Conv2D layers combined with max-pooling, dropout, and batch-normalization layers.
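The described architecture could be sketched as below. The filter counts, dropout rates, and layer ordering are illustrative assumptions consistent with the layer types named above, not the report's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_emotion_cnn(input_shape=(48, 48, 1), num_classes=7):
    """Sketch of a CNN with stacked Conv2D blocks, batch
    normalization, max pooling, and dropout, ending in a 7-way
    softmax over the emotion classes."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training would then use the report's settings, e.g.:
# model.fit(train_x, train_y, batch_size=128, epochs=200,
#           validation_data=(val_x, val_y))
```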