The results of the 2017 EmotioNet Challenge are now available
Please send any questions to martinez.158@osu.edu with subject line “EmotioNet Challenge.”
The EmotioNet Challenge website is: http://cbcsl.ece.ohio-state.edu/EmotionNetChallenge/index.html
Research groups that have designed or are developing algorithms for the analysis of facial expressions are encouraged to participate in this challenge. The competition has two tracks. You may decide to participate in a single track or in both tracks.
Track 1: This track requires the identification of 12 action units (AUs). The AUs included in the challenge are: 1, 2, 4, 5, 6, 9, 12, 17, 20, 25, 26, 43.
Training data: The EmotioNet database includes 950,000 images with annotated AUs. These were annotated with the algorithm described in [1]. You can train your system using this set. You can also use any other annotated dataset you think appropriate. This dataset has been used to successfully train a variety of classifiers, including several deep networks.
Optimization data: We also include 25,000 manually annotated AUs. You may want to use this dataset to see how well your algorithm works or to optimize the parameters of your algorithm.
Verification phase: Participants will have access to a server where they can test their algorithm. Participants will receive a unique access code. Each participant will be able to test their algorithm twice. Comparative results against those of the 2017 EmotioNet Challenge [4] will be provided.
Challenge phase: Participants will be able to connect to the server one final time to complete the final test. The test dataset used in this phase will be different from the one in the verification phase. The results of this phase will be used to compute the final scores of the challenge.
Evaluation: Identification accuracy of AUs will be measured using two criteria – accuracy and F-scores. Algorithms will then be classified from first (best) to last based on an ordinal ranking of their performance on the mean of the recognition of all AUs. Formally, these criteria are defined as follows.
Accuracy is a measurement of closeness to the true value. We will compute recognition accuracy of each AU, i.e., $accuracy_i$ corresponds to the accuracy of your algorithm in identifying that an image has AU i active/present, with i= 1, 2, 4, 5, 6, 9, 12, 17, 20, 25, 26, 43. Formally,
\[accuracy_i=\frac{\sum \mbox{true positives}_i+\sum \mbox{true negatives}_i}{\sum \mbox{total population}}\]
where $\mbox{true positives}_i$ are AU $i$ instances correctly identified in the test images, $\mbox{true negatives}_i$ are images correctly labeled as not having AU $i$ active/present, and total population is the total number of test images.
The mean accuracy is $accuracy= m^{-1}\sum accuracy_i$, and the standard deviation is $\sigma$, with $\sigma^2=m^{-1} \sum(accuracy_i-accuracy)^2$, where $m$ is the number of AUs.
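The accuracy definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' scoring code: the data layout (one set of active AUs per image) and the function names `per_au_accuracy` and `mean_and_std` are assumptions made for the example, and the standard deviation is taken as the square root of the mean squared deviation.

```python
from math import sqrt

# The 12 AUs evaluated in track 1.
AUS = [1, 2, 4, 5, 6, 9, 12, 17, 20, 25, 26, 43]

def per_au_accuracy(truth, pred, aus=AUS):
    """Accuracy of each AU: (true positives + true negatives) / number of images.

    truth, pred: lists with one set of active AUs per test image
    (hypothetical layout chosen for this sketch).
    """
    n = len(truth)
    acc = {}
    for au in aus:
        # An image counts as correct when presence/absence of the AU matches.
        correct = sum((au in t) == (au in p) for t, p in zip(truth, pred))
        acc[au] = correct / n
    return acc

def mean_and_std(values):
    """Mean and population standard deviation over the m AUs."""
    values = list(values)
    m = len(values)
    mean = sum(values) / m
    variance = sum((v - mean) ** 2 for v in values) / m
    return mean, sqrt(variance)

# Toy example with four images and only AUs 1 and 2.
truth = [{1}, {1, 2}, set(), {2}]
pred = [{1}, {2}, set(), {1, 2}]
acc = per_au_accuracy(truth, pred, aus=[1, 2])
mean_acc, std_acc = mean_and_std(acc.values())
```

In the toy example AU 1 is labeled correctly in two of four images and AU 2 in all four, so the mean accuracy is 0.75.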
F-scores will also be computed for each AU before computing the mean and standard deviation. The F-score of AU $i$ is given by
\[F_{\beta_i}=(1+\beta^2)\frac{precision_i \cdot recall_i}{\beta^2 precision_i+recall_i}\]
where $precision_i$ is the fraction of detections of AU $i$ that are correct, $recall_i$ is the number of correct recognitions of AU $i$ over the actual number of images with AU $i$ active, and $\beta$ defines the relative weight of recall with respect to precision. We will use $\beta=.5,1,2$. $\beta=.5$ gives more importance to precision (this is useful in applications where false positives are costly), $\beta=2$ emphasizes recall (which is important in applications where false negatives are unwanted), and $\beta=1$ provides a measure where recall and precision are equally relevant. We will compute the mean $F_{\beta}=m^{-1}\sum F_{\beta_i}$.
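As a rough illustration of the F-score formula, the sketch below computes $F_\beta$ for one AU from its counts of true positives, false positives and false negatives. The function name `f_beta` and the choice to return 0 when precision or recall is undefined are assumptions of this example; the challenge description does not state how such cases are handled.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score for one AU from raw counts.

    tp: images where the AU is active and was detected
    fp: images where the AU was detected but is not active
    fn: images where the AU is active but was missed
    """
    # Assumption: undefined precision/recall (empty denominator) is scored as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom

# Toy counts: precision = 6/8 = 0.75, recall = 6/10 = 0.6.
scores = {beta: f_beta(tp=6, fp=2, fn=4, beta=beta) for beta in (0.5, 1, 2)}
```

With precision above recall, as in the toy counts, the score is highest for $\beta=.5$ and lowest for $\beta=2$, which matches the roles of $\beta$ described above.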
The final ranking of all participants will be given by the average of accuracy and $F_1$, i.e., $\mathbf{\mbox{final ranking}}=.5(accuracy+F_1 )$. We may also provide rankings for each of the individual measurements (i.e., $accuracy$ and $F_\beta$, with $\beta=.5,1,2$) as well as the number of times each algorithm wins in the classification of each AU using these evaluations. But only the final ranking will be used to order submissions from first to last.
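The ranking rule for track 1 can be sketched as follows. The team names and numbers are hypothetical and serve only to show the arithmetic; `final_score` and `rank` are illustrative helper names, not part of any official tooling.

```python
def final_score(mean_accuracy, mean_f1):
    """Track 1 final score: the average of mean accuracy and mean F1."""
    return 0.5 * (mean_accuracy + mean_f1)

def rank(entries):
    """Order entries from first (best) to last by final score.

    entries: dict mapping a team name to (mean accuracy, mean F1).
    """
    scored = {name: final_score(acc, f1) for name, (acc, f1) in entries.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical entries, not real challenge results.
entries = {
    "team_a": (0.90, 0.50),
    "team_b": (0.85, 0.60),
    "team_c": (0.80, 0.40),
}
ranking = rank(entries)
```

Here `team_b` ranks first even though `team_a` has the higher accuracy, because its better $F_1$ raises the average.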
A variety of additional experiments might be performed on the data to better understand the limitations of current algorithms, but these will have no effect on the final ranking.
Recognition of basic and compound emotions: This track is for algorithms that can recognize emotion categories in face images. You can identify the emotion category based on the detection of AUs, but you can also use any other system (e.g., one that uses shape or appearance, such as [2]). You do not need to participate in track 1 to be eligible to participate in this track. Of course, you may want to participate in track 1 and not in this one. Or you may wish to participate in both.
Training data: A subset of the images in the EmotioNet database correspond to basic and compound emotions. The EmotioNet database includes a file with annotations. These annotations are given by the algorithms described in references [1,2]. You can use this dataset and any other manually or automatically annotated dataset to train your system. The database of [2] provides a large number of manually annotated images in lab conditions and can be downloaded by following this online form: http://cbcsl.ece.ohio-state.edu/dbform_compound.html. Note that only post-doctoral researchers and faculty members can apply for this dataset.
Optimization data: You may want to use this dataset to determine how well your algorithm works on images in the wild or to optimize the parameters of your algorithm.
Verification phase: Participants will be able to use their unique code to test their algorithms on a server. Participants will be able to test their algorithms once.
Challenge phase: Participants will use their unique access code to access the dataset on the server and provide their emotion labels. These results will be the final scores of the challenge.
Evaluation: Mean $F_1$ and Unweighted Average Recall (UAR) will be used as criteria in this track. $F_1$ is defined as in track 1, except that $i$ now corresponds to emotion category $i$ rather than AU $i$.
Unweighted Average Recall (UAR) is defined as follows: for an emotion category $i$, the Recall of a classifier is defined as,
\[Recall_i=\frac{\mbox{true positives}_i}{\mbox{total positive}_i}\]
where $\mbox{true positives}_i$ is the number of correctly classified samples for emotion category i and $\mbox{total positive}_i$ is the actual number of images in the testing set with this emotion as ground truth.
Then Unweighted Average Recall (UAR) is calculated as $\mbox{UAR} = m^{-1}\sum \mbox{Recall}_i$, where m is the number of emotion categories.
The final ranking of all participants will be given by the average of $\mbox{UAR}$ and $F_1$, i.e., $\mathbf{\mbox{final ranking}}=.5(\mbox{UAR}+F_1 )$. As in track 1, we may also provide rankings for each of the individual measurements, but only the $\mathbf{\mbox{final ranking}}$ will be used to order submissions from first to last.
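The track 2 measures can be sketched in the same way. The example below assumes exactly one emotion label per image, which is a simplification made for illustration; the function name `track2_scores` and the toy labels are likewise assumptions. A category with no correct predictions is given an $F_1$ of 0.

```python
def track2_scores(truth, pred, categories):
    """Return (UAR, mean F1, final score) for single-label predictions.

    truth, pred: lists with one emotion label per image (assumed layout).
    categories: the emotion categories to average over.
    """
    recalls, f1s = [], []
    for c in categories:
        tp = sum(t == c and p == c for t, p in zip(truth, pred))
        fn = sum(t == c and p != c for t, p in zip(truth, pred))
        fp = sum(t != c and p == c for t, p in zip(truth, pred))
        # Recall_c = true positives / total positives in the ground truth.
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
        # F1 written in terms of counts: 2*tp / (2*tp + fp + fn).
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    m = len(categories)
    uar = sum(recalls) / m
    mean_f1 = sum(f1s) / m
    return uar, mean_f1, 0.5 * (uar + mean_f1)

# Toy example with three categories and six images.
truth = ["happy", "happy", "sad", "sad", "sad", "angry"]
pred = ["happy", "sad", "sad", "sad", "sad", "angry"]
uar, mean_f1, final = track2_scores(truth, pred, ["happy", "sad", "angry"])
```

Because every category contributes equally to the average, UAR does not reward a classifier for favoring the most frequent category, which plain accuracy can do on imbalanced data.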
The EmotioNet database can be accessed here: http://cbcsl.ece.ohio-state.edu/dbform_emotionet.html
EmotioNet database: The training and verification sets are available now. This includes 975,000 images of facial expressions in the wild. This dataset is defined in reference [1]. Manual annotations of AUs on 25,000 images are included (i.e., the optimization set).
Registration phase: Registration for the challenge starts July 1st and closes October 24th, 2018. To register please visit: registration form
Verification phase: Participants will have access to the testing server from September 3rd to October 29th, 2018.
Challenge phase: Participants will have access to the server from November 5th to November 9th, 2018.
Results will be announced in early November.
The results of the 2017 EmotioNet Challenge are detailed in [4].
IMPORTANT NOTE: All results will be posted on the EmotioNet website and might be published in papers and included in press releases. By participating in this challenge, you and your institution/company assume all responsibilities, liability and costs. You can only participate using algorithms developed by you and your co-authors. Neither proprietary nor classified concepts/information should be included in your submission. Analyses of the results given by all algorithms might be extended without prior notice and published on websites and in papers and press releases. These will NOT change the outcome of the challenge (i.e., the rankings will be determined using the methodology described above) but are useful statistics that will help the community better understand the strengths and limitations of each algorithm and the area as a whole. By participating you agree to all terms and conditions stated on the website of the challenge.
The results of the EmotioNet challenge are summarized here (reference [4]).
Last November we announced the 2017 EmotioNet Challenge. Training and verification data were made available in mid November and early December for tracks 1 and 2, respectively.
Research groups and companies interested in participating in the challenge were required to register. 37 groups signed up and received the training and verification sets.
The testing data was sent to the 37 participants in early February. Teams had a few days to process the data and submit their automatic annotations – AUs in track 1 and emotion categories in track 2.
Of the initial 37 groups, only 4 groups successfully completed track 1 on time. Only 2 groups completed track 2 before the deadline. The results are summarized below. Additional results and a detailed analysis of the results will be published in a working paper within a few weeks.
Group | Final score
--- | ---
I2R-CCNU-NTU-2 | .728985
JHU | .710087
I2R-CCNU-NTU-1 | .702322
I2R-CCNU-NTU-3 | .69608
Note: Final score takes a value between 0 and 1, with 1 the best possible score and 0 the worst one. The final score is the average of accuracy and F1 score.
Group | Accuracy
--- | ---
I2R-CCNU-NTU-2 | .8215
I2R-CCNU-NTU-1 | .783667
I2R-CCNU-NTU-3 | .776583
JHU | .771417
Group | F1 | F2 | F.5
--- | --- | --- | ---
JHU | .6405 | .635416667 | .638083333
I2R-CCNU-NTU-2 | .639833333 | .624916667 | .64325
I2R-CCNU-NTU-1 | .629583333 | .625416667 | .635083333
I2R-CCNU-NTU-3 | .622833333 | .620333333 | .626583333
Group | Final score (2017 measurement) | Final score (2018 measurement)
--- | --- | ---
NTechLab | 0.596767708 | 0.249
JHU | 0.479914583 | 0.1395
Note: Final score takes a value between 0 and 1, with 1 the best and 0 the worst possible scores, respectively. The final score (2017 measurement) is the average of accuracy and F1 score. The final score (2018 measurement) is the average of Unweighted Average Recall (UAR) and F1 score. In parentheses, we show the final score for those images the group was able to analyze.
Group | Accuracy | UAR
--- | --- | ---
NTechLab | 0.9415 | 0.243
JHU | 0.8358125 | 0.137
Group | F1 | F2 | F.5
--- | --- | --- | ---
NTechLab | 0.254969 | 0.25981875 | 0.257669
JHU | 0.142375 | 0.12695625 | 0.181725
32 participants registered for the 2018 EmotioNet Challenge. Of these, 7 participants completed the verification phase for track 1, and 5 for track 2. 6 participants completed the challenge phase for track 1, and 3 participants for track 2. The results of the challenge phase are summarized below. Additional results and a detailed analysis of the results will be published in a working paper within a few weeks.
Group | mean Accuracy | F1 | Final Score
--- | --- | --- | ---
PingAn-GammaLab | .9446 | .5659 | .7553
VisionLabs | .9207 | .4229 | .6718
MIT | .9298 | .4125 | .6711
University of Washington | .8869 | .3730 | .6300
PingAn-Tech | .8694 | .3747 | .6221
University of Denver | .8576 | .2296 | .5436
Note: Final score takes a value between 0 and 1, with 1 the best possible score and 0 the worst one. The final score is the average of accuracy and F1 score.
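As a worked check of the note above, the short sketch below recomputes the 2018 track 1 final scores from the reported mean accuracy and $F_1$ values and compares them with the reported final scores. The numbers are copied from the results listed above; the variable and function names are illustrative only, and small differences are expected because the published values are rounded to four decimals.

```python
# (mean accuracy, F1, reported final score) for the 2018 track 1 entries.
RESULTS_2018_TRACK1 = {
    "PingAn-GammaLab": (0.9446, 0.5659, 0.7553),
    "VisionLabs": (0.9207, 0.4229, 0.6718),
    "MIT": (0.9298, 0.4125, 0.6711),
    "University of Washington": (0.8869, 0.3730, 0.6300),
    "PingAn-Tech": (0.8694, 0.3747, 0.6221),
    "University of Denver": (0.8576, 0.2296, 0.5436),
}

def recompute(results):
    """Final score of each entry as the average of accuracy and F1."""
    return {name: 0.5 * (acc + f1) for name, (acc, f1, _) in results.items()}

def max_discrepancy(results):
    """Largest gap between a recomputed and a reported final score."""
    computed = recompute(results)
    return max(abs(computed[name] - reported)
               for name, (_, _, reported) in results.items())

computed = recompute(RESULTS_2018_TRACK1)
order = sorted(computed, key=computed.get, reverse=True)
worst_gap = max_discrepancy(RESULTS_2018_TRACK1)
```

The recomputed scores agree with the reported ones to within rounding, and sorting by them reproduces the published order.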
Group | UAR | F1 | Final Score
--- | --- | --- | ---
VisionLabs | .3774 | .3391 | .3582
Peking University | .2758 | .209 | .2424
University of Arkansas | .2257 | .1222 | .174
Situ | .0296 | .0741 | .0518
Note: Final score takes a value between 0 and 1, with 1 the best and 0 the worst possible scores, respectively. The final score (2018 measurement) is the average of Unweighted Average Recall (UAR) and F1 score.
Do I need to publish my results/algorithm?
No. Participating in this challenge does NOT mean you need to publish any paper describing your algorithm or results. Of course, you are welcome to publish papers describing any algorithm you have developed to participate in this challenge or the results obtained on an existing algorithm and submit it to the journal, conference, workshop or symposium of your choice.
Where can I publish my algorithm and results?
You can publish your algorithm and results wherever you feel is best suited. Or you can decide not to publish them. Your results will however be posted on websites and might appear in articles and press releases.
Can I participate using an already published algorithm?
Yes. There are no restrictions on who or which algorithm can participate, but the algorithm must be yours. You cannot participate using an algorithm derived/implemented by someone else. Your implementation can of course use open access code available on websites (e.g., OpenCV, GitHub).
Can I participate anonymously?
Participation in this challenge requires registration. This includes your name and the University, Institution or Company where you work. Only the name of your institution will be made available online and in papers, and only after the release of the results. If there are multiple entries from the same institution, the name of the institution will be followed by a dash and a number, e.g., OSU-1, OSU-2, etc.
Can companies participate?
Yes, but the results will be made publicly available on websites and in papers and press releases.
If I register but do not participate in the challenge phase, will I be listed on the website?
No.
I have detected an error/typo in a file, what should I do?
Contact us asap at Martinez.158@osu.edu.
[1] Benitez-Quiroz, C. F., Srinivasan, R., & Martinez, A. M. (2016). EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of IEEE International Conference on Computer Vision & Pattern Recognition (CVPR'16), Las Vegas, NV, USA.
[2] Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15), E1454-E1462.
[3] Benitez-Quiroz, C. F., Liu, Y., & Martinez, A. M. (2016). Recognition of Action Units in the Wild with Deep Nets. ICCV 2017.
[4] Benitez-Quiroz, C. F., Srinivasan, R., Feng, Q., Wang, Y., & Martinez, A. M. (2017). EmotioNet Challenge: Recognition of facial expressions of emotion in the wild. arXiv preprint arXiv:1703.01210