System Architecture

How the model works

A deep dive into the transformer pipeline, from raw text to 27-class emotion output.

📥

Input Layer

Raw text string accepted by the API (max 512 tokens).

string


🔤

Tokenizer

AutoTokenizer: WordPiece tokenization, [CLS] / [SEP] tokens, attention masks.

tokens

WordPiece tokens for: “I am so happy today!”

[CLS] i am so happy today ! [SEP]
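To make the tokenizer step concrete, here is a minimal sketch of greedy longest-match WordPiece splitting in plain Python. The vocabulary is a tiny made-up one and `split_words`, `wordpiece`, and `encode` are hypothetical helper names; the real AutoTokenizer uses a vocabulary of tens of thousands of pieces and more careful text normalization.

```python
# Toy WordPiece tokenizer: greedy longest-match-first over a made-up vocabulary.
TOY_VOCAB = {"[CLS]", "[SEP]", "[UNK]", "i", "am", "so", "happy", "today", "!",
             "un", "##happi", "##ness"}

def split_words(text):
    """Lowercase the text, split on whitespace, and peel punctuation into its own word."""
    words, current = [], ""
    for ch in text.lower():
        if ch.isalnum():
            current += ch
        else:
            if current:
                words.append(current)
                current = ""
            if not ch.isspace():
                words.append(ch)
    if current:
        words.append(current)
    return words

def wordpiece(word, vocab):
    """Split one word into the longest matching vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces

def encode(text, vocab=TOY_VOCAB):
    """Wrap the pieces in [CLS] / [SEP] and build an all-ones attention mask."""
    tokens = ["[CLS]"]
    for word in split_words(text):
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [1] * len(tokens)

tokens, attention_mask = encode("I am so happy today!")
print(tokens)
print(wordpiece("unhappiness", TOY_VOCAB))  # a word split into sub-word pieces
```

Every position in the mask is 1 here because there is no padding; in a padded batch, the padding positions would be 0.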
🧠

BERT Encoder

12 transformer layers · 768 hidden dim · 12 attention heads.

embeddings

Visualization: 12 attention heads, numbered 1 to 12; opacity = attention weight (animated).
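As a rough illustration of what one attention head computes, the sketch below derives the per-head width from the numbers above and evaluates scaled dot-product attention weights on a tiny example. It assumes the 768 hidden dimensions are split evenly across the 12 heads, as in standard BERT; the query and key vectors are made up and only 4-dimensional to keep the example readable.

```python
import math

HIDDEN_DIM = 768                      # from the encoder description
NUM_HEADS = 12
HEAD_DIM = HIDDEN_DIM // NUM_HEADS    # 64 dimensions per head (assumes an even split)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Row i says how strongly token i attends to each token; every row sums to 1."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
        for query in queries
    ]

# Hypothetical 3-token example (a real head would use HEAD_DIM-sized vectors).
queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

weights = attention_weights(queries, keys)
for row in weights:
    print([round(w, 3) for w in row])
```

These per-row weights are the quantity the visualization maps to opacity.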
🎯

Classification Head

Linear(768 → 27) + Sigmoid activation — one node per emotion.

logits

27 sigmoid output neurons, one per emotion

Positive (12) · Negative (11) · Neutral (4)
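The head itself is small enough to sketch in plain Python. The example below assumes the encoder hands the head a single 768-dimensional sentence vector (the source does not say which pooling is used) and uses random stand-in weights, so the numbers are meaningless; it only shows the shapes and the independent per-label sigmoid.

```python
import math
import random

HIDDEN_DIM, NUM_LABELS = 768, 27
NUM_PARAMS = HIDDEN_DIM * NUM_LABELS + NUM_LABELS  # weights + biases = 20,763

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def classification_head(sentence_vector, weight, bias):
    """Linear(768 -> 27) followed by an independent sigmoid per label."""
    logits = [sum(w * h for w, h in zip(row, sentence_vector)) + b
              for row, b in zip(weight, bias)]
    probs = [sigmoid(z) for z in logits]
    return logits, probs

rng = random.Random(0)
# Random stand-ins for trained parameters and for the encoder output.
weight = [[rng.gauss(0.0, 0.02) for _ in range(HIDDEN_DIM)] for _ in range(NUM_LABELS)]
bias = [0.0] * NUM_LABELS
sentence_vector = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN_DIM)]

logits, probs = classification_head(sentence_vector, weight, bias)
print(len(probs), "probabilities; their sum is", round(sum(probs), 2))
```

Unlike a softmax, the 27 probabilities are not constrained to sum to 1, which is what allows several emotions to score high at once.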
📊

Output

Top-K predictions ranked by confidence score.

predictions

Top predictions from sigmoid output

excitement · 94% confidence
joy · 81% confidence
pride · 72% confidence
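Ranking by confidence amounts to a sort over the per-label probabilities. A minimal sketch, with a hypothetical `top_k` helper and made-up scores for five labels:

```python
def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores, deliberately out of order.
labels = ["joy", "sadness", "excitement", "anger", "pride"]
probs = [0.81, 0.03, 0.94, 0.01, 0.72]

for label, score in top_k(probs, labels):
    print(f"{label}: {score:.0%}")
# excitement: 94%
# joy: 81%
# pride: 72%
```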

Training Details


Training Setup

Loss Function: BCEWithLogitsLoss
Optimizer: AdamW
Scheduler: Linear warmup
Metric: F1 (macro)
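For readers unfamiliar with the loss and the metric, the sketch below reimplements both in plain Python. `bce_with_logits` follows the numerically stable form of binary cross-entropy on raw logits, averaged over labels as PyTorch's default `mean` reduction does. `macro_f1` averages per-label F1 with equal weight; scoring a label with no true or predicted positives as 0 is an assumption of this sketch.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from raw logits."""
    # Stable identity: max(x, 0) - x*y + log(1 + exp(-|x|))
    losses = [max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
              for x, y in zip(logits, targets)]
    return sum(losses) / len(losses)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1; inputs are lists of multi-hot rows."""
    num_labels = len(y_true[0])
    scores = []
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # assumption: empty label scores 0
    return sum(scores) / num_labels

# A logit of 0 means "50/50", so the loss against a positive target is log(2).
print(bce_with_logits([0.0], [1.0]))

# Two labels, three examples: label 0 has one miss, label 1 is perfect.
y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1]]
print(macro_f1(y_true, y_pred))  # (2/3 + 1) / 2
```

Macro averaging gives rare emotions the same weight as frequent ones, which matters when label frequencies are uneven.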

Dataset Stats

Total Examples: 58k
Label Type: Multi-label
Source: Reddit comments
Classes: 27 emotions
Split: 80 / 10 / 10
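The sketch below shows one way the multi-label targets and the 80 / 10 / 10 split could be produced. The helper names are hypothetical, the four-label list stands in for the full set of 27, and 58,000 is used as the dataset size only because the source rounds to 58k, so the resulting counts are illustrative.

```python
import random

def multi_hot(active_labels, all_labels):
    """Encode a set of emotion names as a 0/1 vector aligned with all_labels."""
    active = set(active_labels)
    return [1 if name in active else 0 for name in all_labels]

def split_80_10_10(examples, seed=0):
    """Shuffle deterministically, then cut into three parts of 80%, 10%, and 10%."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Four-label stand-in for the real 27-label list.
label_names = ["joy", "excitement", "nervousness", "pride"]
print(multi_hot(["joy", "nervousness"], label_names))  # [1, 0, 1, 0]

# Example IDs stand in for the comments; 58_000 is the rounded figure from the stats.
part_a, part_b, part_c = split_80_10_10(range(58_000))
print(len(part_a), len(part_b), len(part_c))  # 46400 5800 5800
```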

Inference Pipeline

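import torch
from huggingface_hub import snapshot_download

# HF_MODEL_REPO, LABELS (the 27 emotion names), tokenizer and model are defined elsewhere.

# 1. Download model + tokenizer files from the Hugging Face Hub
#    (loading them into `tokenizer` and `model` is not shown here)
snapshot_download(repo_id=HF_MODEL_REPO, local_dir="model_cache")

# 2. Tokenize input text (inputs longer than 512 tokens are truncated)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# 3. Forward pass (no_grad for inference)
#    Assumes `model` returns the raw logits tensor; a stock transformers model
#    returns an output object, in which case use model(...).logits instead.
with torch.no_grad():
    logits = model(input_ids=inputs["input_ids"],
                   attention_mask=inputs["attention_mask"])

# 4. Sigmoid → per-class probabilities
probs = torch.sigmoid(logits).squeeze(0)  # shape: [27]

# 5. Top prediction
top_idx = probs.argmax().item()
label, confidence = LABELS[top_idx], float(probs[top_idx])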

Why Multi-label?

Human emotions rarely occur in isolation. Multi-label classification with sigmoid activation lets each of the 27 emotion nodes fire independently, so a single text can score high on several emotions at once instead of being forced into one class.

“I just found out I got the job! I'm so nervous but incredibly excited.”
excitement 0.91 · joy 0.83 · nervousness 0.74
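To turn independent sigmoid scores into a set of predicted emotions, a common approach is to keep every label above a cutoff. The sketch below applies that to the three scores from the example above; the 0.5 threshold, the `predict_labels` helper, and the extra low-scoring `sadness` entry are assumptions added for illustration.

```python
def predict_labels(scores, threshold=0.5):
    """Return every (label, score) pair at or above the threshold, highest first."""
    kept = [(label, p) for label, p in scores.items() if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# The three scores from the example, plus a made-up low score for contrast.
scores = {"nervousness": 0.74, "sadness": 0.04, "excitement": 0.91, "joy": 0.83}

print(predict_labels(scores))
# [('excitement', 0.91), ('joy', 0.83), ('nervousness', 0.74)]
```

With a softmax and a single argmax, only `excitement` would be reported and the nervousness in the sentence would be lost.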