About This Project

GoEmotions & the model behind it

The dataset, architecture decisions, known limitations, and the person who built it.

The Dataset

GoEmotions is a large-scale emotion dataset from Google Research: 58k Reddit comments annotated by trained human raters across 27 fine-grained emotion categories.

Dataset Facts

Source: Reddit comments (English)
Size: 58k comments
Labels: 27 emotions
Type: Multi-label
Published by: Google Research
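
Because the dataset is multi-label, a single comment can carry more than one emotion, so targets are usually encoded as multi-hot vectors rather than a single class index. The sketch below illustrates that encoding. The label names are real GoEmotions categories, but the four-label subset, the `to_multi_hot` helper, and the example annotation are illustrative assumptions, not this project's preprocessing code.

```python
# Illustrative subset of the 27 GoEmotions labels (assumption: chosen for brevity).
LABELS = ["admiration", "gratitude", "anger", "confusion"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


def to_multi_hot(emotions: list[str]) -> list[int]:
    """Encode a set of emotion names as a 0/1 vector over LABELS."""
    vector = [0] * len(LABELS)
    for name in emotions:
        vector[LABEL_TO_INDEX[name]] = 1
    return vector


# A hypothetical comment annotated with two emotions at once.
target = to_multi_hot(["admiration", "gratitude"])
print(target)  # [1, 1, 0, 0]
```

A comment with no matching labels maps to an all-zero vector, which a single-label encoding cannot represent.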

Emotion Distribution

Positive: 12/27 · 44%
Negative: 11/27 · 41%
Ambiguous: 4/27 · 15%
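
The group percentages follow directly from the counts. As a quick check, this short sketch recomputes them, rounding to the nearest whole percent; the dictionary and variable names are illustrative.

```python
# Group sizes as stated above: 12 positive, 11 negative, 4 ambiguous.
GROUP_COUNTS = {"positive": 12, "negative": 11, "ambiguous": 4}

total = sum(GROUP_COUNTS.values())  # 27 emotion categories

# Share of each group, rounded to the nearest whole percent.
shares = {group: round(100 * count / total) for group, count in GROUP_COUNTS.items()}
print(total, shares)
```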

Model Card

🧠

Base Model

DistilBERT

⚙️

Parameters

~66M (DistilBERT base)

🏷️

Classes

27

📊

Samples

58k

🔧

Fine-tuning

Full fine-tune

☁️

Hosted on

Hugging Face Hub
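
For a multi-label classifier, a common decoding step is to pass each logit through a sigmoid and keep every label whose probability clears a threshold, rather than taking a single argmax. The sketch below assumes that approach. The 0.5 threshold, the three-label subset, and the logit values are hypothetical; this page does not state how the model's outputs are actually decoded.

```python
import math

LABELS = ["joy", "sadness", "surprise"]  # illustrative subset of GoEmotions labels


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Return every label whose sigmoid probability is at least `threshold`."""
    return [
        label
        for label, logit in zip(LABELS, logits)
        if sigmoid(logit) >= threshold
    ]


# Hypothetical logits for one comment: two labels clear the threshold.
print(decode([2.0, -1.5, 0.3]))  # ['joy', 'surprise']
```

Raising the threshold trades recall for precision, so in practice it is often tuned per label on a validation set.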

Known Limitations

⚠️ Reddit-trained

Trained exclusively on Reddit comments. May not generalise to formal, medical, or domain-specific text where tone and vocabulary differ significantly.

⚠️ English only

The model has not been evaluated on any non-English text. Predictions for non-English or mixed-language input are unvalidated and likely poor.

⚠️ Max 512 tokens

Long texts are silently truncated to 512 WordPiece tokens before inference. Content beyond that boundary is ignored.
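
Silent truncation can be made visible by checking the token count before inference. The sketch below assumes the text has already been split into tokens. Whitespace-split words stand in for WordPiece tokens, and `truncate_with_report` is a hypothetical helper, not part of this project or of any tokenizer library. Real WordPiece counts are usually higher than word counts, and the 512 limit typically includes special tokens.

```python
MAX_TOKENS = 512  # limit stated above


def truncate_with_report(
    tokens: list[str], max_tokens: int = MAX_TOKENS
) -> tuple[list[str], int]:
    """Keep the first `max_tokens` tokens and report how many were dropped."""
    kept = tokens[:max_tokens]
    dropped = len(tokens) - len(kept)
    return kept, dropped


# Stand-in tokenization: whitespace split (assumption; the model uses WordPiece).
tokens = ("word " * 600).split()
kept, dropped = truncate_with_report(tokens)
print(len(kept), dropped)  # 512 88
```

A non-zero `dropped` value is a cue to warn the user or to split the input into several shorter passages.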

⚠️ Class imbalance

Label frequencies in GoEmotions are uneven. Classes with fewer training examples may underperform, particularly on edge cases.
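
One common way to offset class imbalance is to weight each class inversely to its frequency during training. The sketch below shows that calculation. The counts are made-up toy numbers, not GoEmotions statistics, and nothing on this page indicates whether the model was trained with class weights.

```python
# Toy label counts (hypothetical, for illustration only).
counts = {"common_label": 800, "mid_label": 150, "rare_label": 50}

total = sum(counts.values())
num_classes = len(counts)

# Inverse-frequency weight: total / (num_classes * count).
# A class at exactly average frequency gets weight 1.0; rarer classes get more.
weights = {label: total / (num_classes * n) for label, n in counts.items()}

for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
```

Weights like these can be passed to a weighted loss so that mistakes on rare classes cost more than mistakes on common ones.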

Nadipalli Jaswanth

Final year CSE @ NIT Andhra Pradesh · Interning at PepsiCo on agentic AI pipelines

FAANG Prep · LangGraph · RAG Systems · FastAPI · Transformers