How the model works
A deep dive into the transformer pipeline, from raw text to 27-class emotion output.
Input Layer
Raw text string accepted by the API. Inputs longer than 512 tokens are truncated.
Tokenizer
AutoTokenizer: WordPiece tokenization, [CLS] / [SEP] tokens, attention masks.
WordPiece tokens for: “I am so happy today!”
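The tokenizer step can be sketched without downloading a model. The following is a minimal illustration of WordPiece's greedy longest-match-first lookup, plus the [CLS] / [SEP] wrapping and attention mask. The vocabulary `TOY_VOCAB` and the helpers `wordpiece` and `encode` are hypothetical, and lowercasing is an assumption; the real AutoTokenizer uses the model's own vocabulary and normalization rules.

```python
import re

# Hypothetical toy vocabulary; the real model ships its own, much larger one.
TOY_VOCAB = {"[PAD]", "[UNK]", "[CLS]", "[SEP]",
             "i", "am", "so", "happy", "today", "!", "un", "##happy"}

def wordpiece(word, vocab=TOY_VOCAB):
    """Split one word into sub-word pieces by greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no match at this position: the whole word becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces

def encode(text, max_length=512):
    """Lowercase (assumed), split on words/punctuation, add [CLS]/[SEP], build the mask."""
    words = re.findall(r"\w+|[^\w\s]", text.lower())
    tokens = [p for w in words for p in wordpiece(w)]
    tokens = ["[CLS]"] + tokens[: max_length - 2] + ["[SEP]"]
    attention_mask = [1] * len(tokens)  # 1 = real token; padding positions would be 0
    return tokens, attention_mask

tokens, mask = encode("I am so happy today!")
print(tokens)
print(mask)
```

With this toy vocabulary every word in the example sentence is a single piece; a word such as "unhappy" would split into `un` and `##happy`.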
BERT Encoder
12 transformer layers · 768 hidden dim · 12 attention heads.
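To make the encoder numbers concrete: a 768-dimensional hidden state split across 12 heads gives each head 768 / 12 = 64 dimensions, and the attention weights for each token come from a softmax over scaled dot products. The pure-Python sketch below shows one head; the function name `attention_weights` and the toy vectors are illustrative and not taken from the model.

```python
import math

HIDDEN_DIM, NUM_HEADS = 768, 12
HEAD_DIM = HIDDEN_DIM // NUM_HEADS  # 64 dimensions per head

def attention_weights(queries, keys):
    """Scaled dot-product attention weights for one head.

    queries, keys: lists of equal-length vectors (one per token).
    Returns one row per query; each row is a softmax over all keys.
    """
    scale = math.sqrt(len(keys[0]))
    rows = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

# Toy 3-token example with 4-dimensional vectors instead of 64.
toy = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
weights = attention_weights(toy, toy)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row sums to 1, which is why a weight can be drawn directly as an opacity between 0 and 1.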
Classification Head
Linear(768 → 27) + Sigmoid activation — one node per emotion.
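The head is a single affine map followed by an element-wise sigmoid. Here is a minimal sketch in plain Python, with a tiny 4 → 3 layer standing in for the 768 → 27 one; the function names and all numbers are made up for illustration, and how the encoder output is pooled into one vector is not specified here.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def classification_head(hidden, weight, bias):
    """Linear layer followed by sigmoid: one independent probability per class.

    hidden: encoder output vector for the text (768 values in the real model)
    weight: one row of weights per class (27 rows in the real model)
    bias:   one bias per class
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weight, bias)]
    return [sigmoid(z) for z in logits]

# Toy stand-in: 4 hidden features → 3 classes.
hidden = [0.5, -1.0, 2.0, 0.0]
weight = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0, 0.0]
probs = classification_head(hidden, weight, bias)
print([round(p, 3) for p in probs])
```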
Output
Top-K predictions ranked by confidence score.
Top predictions from sigmoid output (example): 94%, 81%, and 72% confidence.
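Ranking the sigmoid outputs into a Top-K list is a sort by probability. A small sketch follows; the helper `top_k` is hypothetical, and the label names are placeholders chosen for illustration, not the labels the page displayed.

```python
def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Placeholder labels and scores, for illustration only.
example_labels = ["joy", "excitement", "gratitude", "anger", "neutral"]
example_probs = [0.94, 0.81, 0.72, 0.03, 0.10]

for label, p in top_k(example_probs, example_labels):
    print(f"{p:.0%} · {label}")
```

When the probabilities are held in a PyTorch tensor, `torch.topk(probs, k)` returns the same ranking as a pair of values and indices.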
Training Details
Training Setup
Dataset Stats
Inference Pipeline
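import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# 1. Download model + tokenizer files from the HuggingFace Hub
#    (HF_MODEL_REPO, LABELS, text and the loaded `model` are defined elsewhere)
local_dir = snapshot_download(repo_id=HF_MODEL_REPO, local_dir="model_cache")
tokenizer = AutoTokenizer.from_pretrained(local_dir)
# 2. Tokenize input text (truncated to at most 512 tokens)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
# 3. Forward pass (no_grad for inference)
with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"],
                    attention_mask=inputs["attention_mask"])
# Accept either a raw logits tensor or an output object with a .logits field
logits = getattr(outputs, "logits", outputs)
# 4. Sigmoid → per-class probabilities
probs = torch.sigmoid(logits).squeeze(0)  # shape: [27]
# 5. Top prediction
top_idx = probs.argmax().item()
label, confidence = LABELS[top_idx], float(probs[top_idx])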
Why Multi-label?
Human emotions rarely occur in isolation. Multi-label classification with sigmoid activation lets each of the 27 emotion outputs fire independently, so a single text can score high on several emotions at once. A softmax over the same outputs would force the classes to compete for a total probability of 1.
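The difference is easy to see on a few numbers. The sketch below applies sigmoid and softmax to the same logits and then picks every label above a cutoff; the logits, label names, helper names, and the 0.5 threshold are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_labels(logits, labels, threshold=0.5):
    """Multi-label decision: keep every label whose sigmoid score passes the threshold."""
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]

# Hypothetical logits for three emotions; two of them are strongly active.
labels = ["joy", "surprise", "anger"]
logits = [2.0, 1.5, -3.0]

independent = [sigmoid(z) for z in logits]  # each score lies in (0, 1) on its own
competing = softmax(logits)                 # scores are forced to sum to 1
active = predict_labels(logits, labels)     # every label at or above the threshold

print([round(p, 3) for p in independent])
print([round(p, 3) for p in competing])
print(active)
```

With sigmoid, both "joy" and "surprise" clear the threshold; with softmax, the second-strongest emotion is pushed down because the scores must share a fixed total.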