Capstone · ENGI-981B · Memorial University

Can My Phone Tell If I'm Happy?

A fully functional iOS app that detects your emotional state in real time using facial expression recognition. No cloud, no data upload: all inference runs on your phone.

SwiftUI · Core ML · Vision · Firebase · MobileViT v2 · Python

What is this project?

I teamed up with Soe to build an iOS app that uses AI to detect your emotions from a selfie and track your mood over time. This app tells you whether you look happy, sad, angry, surprised, or somewhere in between.

Is it 100% accurate? No. Is it fun and occasionally very wrong? Absolutely. That's kind of the point.


Why does the app think it knows your feelings?

The theory goes back to a psychologist named Paul Ekman, who argued in the 1970s that six basic emotions — happiness, sadness, anger, fear, disgust, and surprise — are universally recognised across cultures. A child in Tokyo and a grandparent in Brazil would make roughly the same face when surprised. This gave researchers a shared framework for labelling expression data, and that labelled data is what makes training a classifier possible.

[Figure: App screenshot · Ekman's six basic emotions]

The app recognises seven categories: the six above, plus neutral, for those of us who have perfected the resting face.


Where did the training data come from, and what's wrong with it?

The model was trained on three public datasets: FER2013, RAF-DB, and AffectNet. Each brings something different.

FER2013

35,000+ grayscale images · 48×48 px · from Google Image Search · labelled by hand. Large and standardised, but low resolution and noisy.

Large benchmark

RAF-DB

Smaller but more realistic · high-res images · varied lighting, head angles, glasses.

High quality

AffectNet

400,000+ images · essential volume for a Transformer to avoid overfitting.

Heavyweight

Honest downside: all three datasets skew toward certain demographics and are heavily imbalanced. "Happy" accounts for a disproportionate chunk of the samples, while "Disgust" and "Fear" are quite rare. The model reflects that: it's very good at Happy, and it basically cannot find Disgust.

[Figure: Dataset class distribution · Dataset imbalance chart]
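To make the imbalance concrete, here is a minimal sketch of one common mitigation: inverse-frequency class weights, which make rare classes count for more in the training loss. This is not necessarily what we used in training. The label counts are invented for illustration and are not the real FER2013, RAF-DB, or AffectNet distributions, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical label counts, made up for illustration only.
# They are NOT the real dataset distributions.
labels = (
    ["happy"] * 500 + ["neutral"] * 250 + ["sad"] * 100 + ["angry"] * 80
    + ["surprise"] * 40 + ["fear"] * 20 + ["disgust"] * 10
)


def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


weights = inverse_frequency_weights(labels)

# Print from the most common class (smallest weight) to the rarest.
for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls:<9} weight = {w:.2f}")
```

With these made-up counts, a "disgust" sample is weighted 50 times more heavily than a "happy" one. Weights like these can help, but they cannot invent variety that the rare classes do not have.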

Why train our own model instead of using an API?

Two reasons. First, it's more interesting. Second, running inference through a cloud API means every photo you take gets sent to a server. For an app that's literally tracking your emotional state, that felt like the wrong call.

Training our own model and bundling it directly into the iOS app means inference runs entirely on-device. Your face stays on your phone. No data leaves. The tradeoff is that the model has to be small enough to live on a phone without slowing everything down, which is why we chose MobileViT v2.

Shipped ✓

MobileViT v2

Size: 70 MB
Latency: 25–60 ms
Type: Vision Transformer
Consistency: High

Custom CNN

Size: 9.4 MB
Latency: 5–15 ms
Type: Convolutional NN
Consistency: Medium

[Figure: MobileViT v2 architecture]
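As a rough sanity check on "real time", the latency figures in the comparison above can be turned into an upper bound on inferences per second. This is a back-of-the-envelope sketch that assumes inference is the only cost per frame; camera capture, face detection, and UI work are ignored, so real frame rates would be lower.

```python
# Latency ranges copied from the comparison above, in milliseconds.
latency_ms = {
    "MobileViT v2": (25, 60),
    "Custom CNN": (5, 15),
}


def max_fps(latency_ms_value):
    """Upper bound on inferences per second if inference is the only cost."""
    return 1000.0 / latency_ms_value


fps_ranges = {}
for name, (best, worst) in latency_ms.items():
    # The slowest latency gives the lowest rate, the fastest gives the highest.
    fps_ranges[name] = (max_fps(worst), max_fps(best))
    low, high = fps_ranges[name]
    print(f"{name}: about {low:.0f} to {high:.0f} inferences per second")
```

Even at its slowest, MobileViT v2 works out to roughly 17 inferences per second under this assumption, which is plenty for a mood readout that does not need to update on every camera frame.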

How did it actually perform?

On our own real-world test set of 290 images, the app successfully detected a face and ran a prediction 93% of the time. Across the seven emotion classes, overall accuracy landed around 70 to 72%.

The honest breakdown: Happy and Angry were reliably detected. Surprise was solid. Neutral was decent. Sad was shakier, and Fear was mediocre. Disgust was a disaster: the model essentially gave up on it and defaulted to Angry whenever it was confused, which was often.

93% · Face detection success rate
70–72% · Overall accuracy across 7 classes
70 MB · On-device model size
25 ms · Minimum inference time per image

Happy: ~88%
Angry: ~79%
Surprise: ~72%
Neutral: ~65%
Sad: ~58%
Fear: ~45%
Disgust: ~18%

The Disgust result is the most interesting failure mode. When a model is uncertain about a rare class, it tends to collapse toward a better-represented class that looks similar. In our case, that was Angry. Fixing it properly requires better data more than better architecture.
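One way to see how much the weak classes matter is to average the per-class figures without weighting them by how often each class appears. The sketch below does that with the approximate percentages listed above. It assumes those figures are per-class accuracy measured on the same test set, which is an interpretation and not something stated explicitly here.

```python
# Approximate per-class accuracy, in percent, copied from the breakdown above.
per_class = {
    "Happy": 88,
    "Angry": 79,
    "Surprise": 72,
    "Neutral": 65,
    "Sad": 58,
    "Fear": 45,
    "Disgust": 18,
}

# Macro average: every class counts equally, however rare it is.
macro_average = sum(per_class.values()) / len(per_class)

# The class that drags the average down the most.
weakest = min(per_class, key=per_class.get)

print(f"Macro average: {macro_average:.1f}%")
print(f"Weakest class: {weakest} ({per_class[weakest]}%)")
```

The unweighted mean comes out at about 60.7%, below the 70 to 72% overall figure. Under the assumption above, that gap is what you would expect if the test set contains more examples of the classes the model handles well.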
[Figure: Confusion matrix · Accuracy breakdown by category]
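A confusion matrix is where a collapse like "Disgust becomes Angry" shows up. The sketch below shows how to read it off: for each true class, find the most common wrong prediction. The matrix is a reduced four-class example with invented counts, not the project's actual results, and the function name is hypothetical.

```python
LABELS = ["angry", "disgust", "fear", "happy"]

# Rows are true classes, columns are predicted classes, both in LABELS order.
# Hypothetical counts for illustration only, NOT the project's results.
confusion = [
    [40, 2, 3, 5],    # true angry
    [30, 8, 4, 3],    # true disgust
    [10, 2, 20, 8],   # true fear
    [2, 0, 1, 60],    # true happy
]


def top_confusion(matrix, labels):
    """Map each true class to (most common wrong prediction, its share of the row)."""
    result = {}
    for i, row in enumerate(matrix):
        # Skip the diagonal entry, which holds the correct predictions.
        wrong = [(count, labels[j]) for j, count in enumerate(row) if j != i]
        count, label = max(wrong)
        result[labels[i]] = (label, count / sum(row))
    return result


result = top_confusion(confusion, LABELS)
for true_label, (predicted, share) in result.items():
    print(f"true {true_label:<8} most often mistaken for {predicted} ({share:.0%} of samples)")
```

In this invented example, most true "disgust" samples land in the "angry" column, which is the pattern described above.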

Recognition

MoodTracker was awarded Best Capstone Project of the graduating class.

[Figure: Best Capstone Project Award]