About This Project

GoEmotions & the model behind it

The dataset, architecture decisions, known limitations, and the person who built it.

The Dataset

GoEmotions is a large-scale emotion dataset from Google Research: roughly 58k English Reddit comments, each annotated by trained human raters with one or more of 27 fine-grained emotion categories.

Dataset Facts

Source: Reddit comments (English)

Size: 58k comments

Labels: 27 emotions

Type: Multi-label

Published by: Google Research
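
Because the task is multi-label, a comment can carry several emotions at once, so predictions are usually read off per label rather than by picking a single winner. The sketch below shows one common way to do that: apply an independent sigmoid to each logit and keep every label above a threshold. The logit values, the 0.5 threshold, and the helper name `decode_multilabel` are illustrative assumptions, not details taken from this project.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every label whose independent sigmoid probability meets the threshold.

    Unlike softmax + argmax, the labels do not compete with each other,
    so zero, one, or several labels can be returned for the same text.
    """
    return [label for label, logit in logits.items() if sigmoid(logit) >= threshold]


# Hypothetical logits for a single comment (made up for illustration).
example_logits = {"joy": 2.1, "gratitude": 0.8, "anger": -3.0}
predicted = decode_multilabel(example_logits)
```

Raising the threshold trades recall for precision; in practice it is often tuned per label on a validation set.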

Emotion Distribution

Positive: 12 / 27 (44%)

Negative: 11 / 27 (41%)

Ambiguous: 4 / 27 (15%)

Model Card

🧠 Base Model: DistilBERT

⚙️ Parameters

🏷️ Classes

📊 Samples

🔧 Fine-tuning: Full fine-tune

☁️ Hosted on: Hugging Face Hub

Known Limitations

⚠️ Reddit-trained

Trained exclusively on Reddit comments. May not generalise to formal, medical, or domain-specific text where tone and vocabulary differ significantly.

⚠️ English only

Not evaluated on any non-English language. Results on multilingual input are undefined and likely poor.

⚠️ Max 512 tokens

Long texts are silently truncated to 512 WordPiece tokens before inference. Content beyond that boundary is ignored.
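
To make the truncation concrete, here is a minimal sketch that works on plain lists of token IDs. `truncate` shows what is kept and what is dropped at the 512-token boundary, and `sliding_windows` is one possible workaround that splits a long input into overlapping chunks so each can be scored separately. Both helper names and the stride of 256 are assumptions for illustration; this project does not state that it uses windowing. Real tokenizers also spend two of the 512 positions on special tokens, which this sketch ignores.

```python
def truncate(token_ids: list[int], max_len: int = 512) -> tuple[list[int], list[int]]:
    """Split a token sequence into the part the model sees and the part it never sees."""
    return token_ids[:max_len], token_ids[max_len:]


def sliding_windows(token_ids: list[int], max_len: int = 512, stride: int = 256) -> list[list[int]]:
    """Cover the whole sequence with overlapping windows of at most max_len tokens."""
    if not 0 < stride <= max_len:
        raise ValueError("stride must be between 1 and max_len")
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
        start += stride
    return windows


# A stand-in for a 1,000-token comment (IDs are just 0..999 here).
ids = list(range(1000))
kept, dropped = truncate(ids)
windows = sliding_windows(ids)
```

With windowing, per-window predictions still have to be combined, for example by taking the maximum probability per label; that choice is left open here.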

⚠️ Class imbalance

Label frequencies are heavily skewed. Rare emotions such as grief have far fewer training examples than common ones such as admiration, so the model may underperform on them, especially on edge cases.
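
One way to quantify this imbalance, and a common way to soften it, is to count how often each label occurs and derive a per-label positive weight (negatives divided by positives) for a binary cross-entropy loss; PyTorch's `BCEWithLogitsLoss` accepts such values through `pos_weight`. The sketch below uses a tiny made-up sample, not real GoEmotions counts, and nothing here implies that this project trains with weighted loss.

```python
from collections import Counter


def label_counts(examples: list[list[str]]) -> Counter:
    """Count in how many examples each label appears (each label once per example)."""
    counts: Counter = Counter()
    for labels in examples:
        counts.update(set(labels))
    return counts


def positive_weights(examples: list[list[str]], labels: list[str]) -> dict[str, float]:
    """Compute negatives / positives per label; rarer labels get larger weights."""
    counts = label_counts(examples)
    total = len(examples)
    weights = {}
    for label in labels:
        pos = counts.get(label, 0)
        # max(pos, 1) avoids division by zero for labels that never occur.
        weights[label] = (total - pos) / max(pos, 1)
    return weights


# Toy multi-label sample, invented for illustration only.
toy_examples = [
    ["admiration"],
    ["admiration", "joy"],
    ["admiration"],
    ["grief"],
    ["joy"],
    ["admiration"],
]
weights = positive_weights(toy_examples, ["admiration", "joy", "grief"])
```

In this toy sample the rarest label ends up with the largest weight, which is the effect one would want when rare emotions are under-represented.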

Nadipalli Jaswanth

Final year CSE @ NIT Andhra Pradesh · Interning at PepsiCo on agentic AI pipelines

FAANG Prep · LangGraph · RAG Systems · FastAPI · Transformers
