This is a controlled empirical sweep that stress-tests the limits of multi-epoch training on consumer hardware. We trained a 5,114,304-parameter architecture across a 1-to-8 epoch sweep on a fixed, isolated pool of 20,000,000 high-quality tokens from FineWeb-Edu.
Conventional wisdom in LLM training holds that a model should ideally see each unique token only once, training for a single epoch, in order to generalize well. Our sweep contradicts this assumption for extremely small language models (SLMs) in the megabyte range:
All benchmarks were evaluated zero-shot, and every run used the same learning-rate schedule. Lower perplexity (PPL) and lower training loss are better.
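
To put the setup in perspective, the token budget can be worked out from the two figures quoted above. The following is a minimal sketch, not part of the training code; the function names are hypothetical and the only inputs are the parameter count and pool size stated in this README.

```python
# Figures taken from the description above.
N_PARAMS = 5_114_304       # model parameters
POOL_TOKENS = 20_000_000   # unique tokens in the fixed pool
EPOCHS = range(1, 9)       # the 1-to-8 epoch sweep


def tokens_seen(epochs: int, pool_tokens: int = POOL_TOKENS) -> int:
    """Total tokens processed after `epochs` full passes over the pool."""
    return epochs * pool_tokens


def tokens_per_param(epochs: int, n_params: int = N_PARAMS) -> float:
    """Tokens processed per model parameter after `epochs` passes."""
    return tokens_seen(epochs) / n_params


if __name__ == "__main__":
    for e in EPOCHS:
        print(f"epoch {e}: {tokens_seen(e):>11,} tokens seen, "
              f"{tokens_per_param(e):5.1f} tokens per parameter")
```

One pass over the pool gives the model roughly 3.9 tokens per parameter, and eight passes give roughly 31, all drawn from the same 20,000,000 unique tokens.
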
| Benchmark/Metric | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 | Epoch 6 (🏆 Win) | Epoch 7 | Epoch 8 |
|---|---|---|---|---|---|---|---|---|
| Final Pretrain Loss (↓) | 6.574 | 5.871 | 5.459 | 5.173 | 4.974 | 4.858 | 4.734 | 4.632 |
| BLiMP (Grammar ↑) | 54.01% | 56.31% | 57.85% | 61.07% | 61.53% | 59.11% | 62.17% | 60.67% |
| ARC-Easy (Facts ↑) | 26.56% | 26.77% | 27.78% | 29.17% | 29.80% | 30.30% | 30.26% | 29.63% |
| BoolQ (Reading Logic ↑) | 37.83% | 37.83% | 37.83% | 37.83% | 37.83% | 37.86% | 37.95% | 37.92% |
| Wikitext (Byte PPL ↓) | 5.1477 | 4.2237 | 3.8080 | 3.5645 | 3.4472 | 3.3916 | 3.1886 | 3.1550 |
| EVALUATION STATUS | Underfit | Adapting | Glitch Crisis | Stable | High Quality | OPTIMAL PEAK | Oscillating | Overfit Parrot |
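
The per-metric optimum can be read off the table mechanically. The sketch below copies the scores from the rows above and reports the epoch at which each metric is best; the dictionary keys and the `best_epoch` helper are hypothetical names chosen for illustration.

```python
# Scores copied from the results table; list index 0 is Epoch 1.
RESULTS = {
    "pretrain_loss":     [6.574, 5.871, 5.459, 5.173, 4.974, 4.858, 4.734, 4.632],
    "blimp":             [54.01, 56.31, 57.85, 61.07, 61.53, 59.11, 62.17, 60.67],
    "arc_easy":          [26.56, 26.77, 27.78, 29.17, 29.80, 30.30, 30.26, 29.63],
    "boolq":             [37.83, 37.83, 37.83, 37.83, 37.83, 37.86, 37.95, 37.92],
    "wikitext_byte_ppl": [5.1477, 4.2237, 3.8080, 3.5645, 3.4472, 3.3916, 3.1886, 3.1550],
}

# Metrics marked with a down arrow in the table.
LOWER_IS_BETTER = {"pretrain_loss", "wikitext_byte_ppl"}


def best_epoch(metric: str) -> int:
    """Return the 1-based epoch with the best score for `metric`."""
    scores = RESULTS[metric]
    pick = min if metric in LOWER_IS_BETTER else max
    return scores.index(pick(scores)) + 1


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name:18s} best at epoch {best_epoch(name)}")
```

On these numbers, ARC-Easy is best at Epoch 6, BLiMP and BoolQ at Epoch 7, and both the pretraining loss and Wikitext byte perplexity at Epoch 8.
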
Crucial observation: While training loss falls at every epoch through Epoch 8, benchmark scores do not track it: BLiMP dips at Epoch 6, peaks at Epoch 7, and falls again at Epoch 8, while ARC-Easy declines after Epoch 6. We read this as structural fragmentation setting in past Epoch 7.
Raw generations from each checkpoint, given the prompt "Artificial intelligence is ":
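
The divergence between loss and capability can be checked directly from the table. This is a small sketch using only the pretraining-loss and BLiMP rows; `deltas`, `is_strictly_decreasing`, and `reversals` are hypothetical helper names.

```python
# Rows copied from the results table; list index 0 is Epoch 1.
LOSS = [6.574, 5.871, 5.459, 5.173, 4.974, 4.858, 4.734, 4.632]
BLIMP = [54.01, 56.31, 57.85, 61.07, 61.53, 59.11, 62.17, 60.67]


def deltas(values):
    """Epoch-to-epoch change: values[i + 1] - values[i]."""
    return [round(b - a, 4) for a, b in zip(values, values[1:])]


def is_strictly_decreasing(values):
    """True if every epoch improves on the previous one (lower is better)."""
    return all(d < 0 for d in deltas(values))


def reversals(values):
    """1-based epochs at which a higher-is-better score fell."""
    return [i + 2 for i, d in enumerate(deltas(values)) if d < 0]


if __name__ == "__main__":
    print("loss strictly decreasing:", is_strictly_decreasing(LOSS))
    print("BLiMP drops at epochs:", reversals(BLIMP))
```

The loss decreases at every step, while BLiMP falls at Epochs 6 and 8, so a lower training loss alone does not identify the best checkpoint here.
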
Epoch 1: "...by the year-ponbmi, which they may be well of a single health;, so was that has much to help with a tergs..."
Analysis: Highly fragmented syntax. The model invents gibberish words (e.g. "ponbmi", "tergs") because it is still undertrained.
Epoch 3 (The Glitch Phase): "...Artificial intelligence is בोΩϤЧЉᵸ²хайϻجΠ״..."
Analysis: Transition volatility. The weights appear to begin compressing formatting patterns aggressively, producing dense runs of corrupted tokens.
Epoch 5/6 (Structural Mastery): "...The earliest examples are known for a large number... | Despite its knowledge, the same person's most complexity was only as being..."
Analysis: The character-level corruption is fully repaired. The model spontaneously picks up Markdown tags, layout separators, and table delimiters.
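
One simple way to flag a glitch phase like the Epoch 3 sample automatically is to measure the share of non-ASCII characters in each generation. This is a hypothetical diagnostic for illustration, not something used in the original sweep; it assumes the training data is predominantly English, so a high ratio signals corruption. The sample strings are the ones quoted above.

```python
# Sample generations quoted above, keyed by checkpoint.
SAMPLES = {
    "epoch_1": "...by the year-ponbmi, which they may be well of a single health;, "
               "so was that has much to help with a tergs...",
    "epoch_3": "...Artificial intelligence is בोΩϤЧЉᵸ²хайϻجΠ״...",
    "epoch_5_6": "...The earliest examples are known for a large number... | "
                 "Despite its knowledge, the same person's most complexity "
                 "was only as being...",
}


def non_ascii_ratio(text: str) -> float:
    """Fraction of characters in `text` outside the ASCII range."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 127) / len(text)


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(f"{name:10s} non-ASCII ratio: {non_ascii_ratio(text):.2f}")
```

The Epoch 3 sample stands out with a much higher ratio than the Epoch 1 and Epoch 5/6 samples. Any alert threshold would need tuning on real generations.
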