Pretraining on the Test Set Is All You Need. (arXiv:2309.08632v1 [cs.CL])

Click here to flash read.

Inspired by recent work demonstrating the promise of smaller
Transformer-based language models pretrained on carefully curated data, we
supercharge such approaches by investing heavily in curating a novel, high
quality, non-synthetic data mixture based solely on evaluation benchmarks.
Using our novel dataset mixture consisting of less than 100 thousand tokens, we
pretrain a 1 million parameter transformer-based LLM \textbf{phi-CTNL}
(pronounced ``fictional") that achieves perfect results across diverse academic
benchmarks, strictly outperforming all known foundation models.
\textbf{phi-CTNL} also beats power-law scaling and exhibits a never-before-seen
grokking-like ability to accurately predict downstream evaluation benchmarks'
canaries.

Click here to read this post out

ID: 409799; Unique Viewers: 0

Unique Voters: 0

Total Votes: 0

Votes:

Latest Change: Sept. 19, 2023, 7:31 a.m. Changes:

/u/anonymous

Dictionaries:

Words:

Spaces:

CC:
No creative common's license

Comments: