GoEmotions & the model behind it
The dataset, architecture decisions, known limitations, and the person who built it.
The Dataset
GoEmotions is a large-scale emotion dataset from Google Research: about 58k Reddit comments, each labeled by trained human raters with one or more of 27 fine-grained emotion categories, or with Neutral.
Dataset Facts
Emotion Distribution
Emotion Group Breakdown
Model Card
Base Model
DistilBERT
Parameters
~67M
Classes
28 (27 emotions + Neutral)
Samples
58k
Fine-tuning
Full fine-tune
Hosted on
Hugging Face Hub
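As a rough check on the parameter figure, the size of a DistilBERT classifier can be worked out from its layer shapes. The sketch below assumes the published distilbert-base-uncased configuration (30,522-token vocabulary, 768-dimensional hidden states, 6 layers, 3,072-dimensional feed-forward layers), the standard pre-classifier plus classifier head, and 28 output classes. None of these values are stated on this page, so treat them as assumptions rather than the model's confirmed configuration.

```python
# Back-of-envelope parameter count for a DistilBERT sequence classifier.
# Assumed sizes: the published distilbert-base-uncased configuration.
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
DIM = 768
HIDDEN_DIM = 3072
N_LAYERS = 6
NUM_CLASSES = 28  # assumption: 27 emotions plus Neutral


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def count_parameters(num_classes: int = NUM_CLASSES) -> dict:
    # Token embeddings + position embeddings + embedding LayerNorm.
    embeddings = VOCAB_SIZE * DIM + MAX_POSITIONS * DIM + layer_norm(DIM)
    # Query, key, value, and output projections.
    attention = 4 * linear(DIM, DIM)
    # Two-layer feed-forward network.
    feed_forward = linear(DIM, HIDDEN_DIM) + linear(HIDDEN_DIM, DIM)
    # Each transformer block also has two LayerNorms.
    block = attention + feed_forward + 2 * layer_norm(DIM)
    encoder = N_LAYERS * block
    # Classification head: pre-classifier layer, then the output layer.
    head = linear(DIM, DIM) + linear(DIM, num_classes)
    return {
        "embeddings": embeddings,
        "encoder": encoder,
        "head": head,
        "total": embeddings + encoder + head,
    }


if __name__ == "__main__":
    for name, value in count_parameters().items():
        print(f"{name:>10}: {value:,}")
```

Under these assumptions the encoder and embeddings account for about 66.4M parameters and the head adds roughly 0.6M, so the total rounds to 67M.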
Known Limitations
⚠️ Reddit-trained
Trained exclusively on Reddit comments. May not generalise to formal, medical, or domain-specific text where tone and vocabulary differ significantly.
⚠️ English only
Not evaluated on any non-English language. Results on multilingual input are undefined and likely poor.
⚠️ Max 512 tokens
Long texts are silently truncated to 512 WordPiece tokens before inference. Content beyond that boundary is ignored.
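One possible way to avoid losing the tail of a long text is to split it into overlapping windows and classify each window separately. The sketch below is an illustration only, not something this model does: the function names and the 50-token overlap are hypothetical, and it assumes the two special tokens ([CLS] and [SEP]) count toward the 512-token limit, which leaves 510 tokens for content. In practice the token list would come from the model's own WordPiece tokenizer.

```python
from typing import List

MAX_TOKENS = 512    # model limit stated above
SPECIAL_TOKENS = 2  # assumption: [CLS] and [SEP] count toward the limit


def would_truncate(tokens: List[str]) -> bool:
    """Return True if this token list would be cut off by the model limit."""
    return len(tokens) > MAX_TOKENS - SPECIAL_TOKENS


def split_into_windows(tokens: List[str], overlap: int = 50) -> List[List[str]]:
    """Split tokens into overlapping windows that each fit the model limit."""
    budget = MAX_TOKENS - SPECIAL_TOKENS
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be at least 0 and smaller than the budget")
    if len(tokens) <= budget:
        return [tokens]
    step = budget - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + budget])
        if start + budget >= len(tokens):
            break  # the last window already reaches the end of the text
    return windows


if __name__ == "__main__":
    # Hypothetical stand-in for a tokenized long comment.
    tokens = [f"tok{i}" for i in range(1200)]
    print(would_truncate(tokens))
    print([len(w) for w in split_into_windows(tokens)])
```

How the per-window predictions are combined (for example, averaging scores or taking the maximum per emotion) is a separate design choice that this sketch leaves open.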
⚠️ Class imbalance
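Label frequencies are heavily skewed. Rare emotions such as grief, pride, relief, and nervousness have far fewer training examples than common labels such as neutral and admiration, so the model may underperform on them and on ambiguous edge cases.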
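A common mitigation for skewed label frequencies is to weight each class inversely to how often it appears, so that errors on rare classes cost more during training. The sketch below shows that heuristic on made-up labels. It is not a description of how this model was trained, and the counts are hypothetical rather than real GoEmotions statistics.

```python
from collections import Counter
from typing import Dict, Iterable


def inverse_frequency_weights(labels: Iterable[str]) -> Dict[str, float]:
    """Weight each class by total / (n_classes * count); rare classes weigh more."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Hypothetical toy labels, not real GoEmotions counts.
toy_labels = ["joy"] * 6 + ["anger"] * 3 + ["grief"] * 1
weights = inverse_frequency_weights(toy_labels)

if __name__ == "__main__":
    for label, weight in sorted(weights.items(), key=lambda item: item[1]):
        print(f"{label:>6}: {weight:.3f}")
```

With these weights a perfectly balanced dataset would give every class a weight of 1.0, which makes the values easy to read as "how under- or over-represented is this class".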