SKILL-MIX

SKILL-MIX: A Flexible and Expandable Family of Evaluations for AI Models

Dingli Yu1, Simran Kaur1, Arushi Gupta1, Jonah Brown-Cohen2, Anirudh Goyal2, Sanjeev Arora1

1Princeton Language and Intelligence (PLI), Princeton University

2Google DeepMind
Grader: GPT-4
Metrics are reported for each student model at k = 2, 3, 4. Evaluations at k = 5, 6 are skipped if the metric drops below 0.05 at a smaller k. We consider combinations of uncommon skills (those whose occurrence rate in RedPajama is less than 5%) and deduct points for skills whose names are mentioned in the text. For each k, the highest score is highlighted in orange.
Student (generator) k=2 k=3 k=4 k=5 k=6
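The evaluation schedule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `filter_uncommon_skills` and `run_evaluation`, the `evaluate_at_k` callback, and the example skill rates are hypothetical, and the sketch assumes the 0.05 skip rule applies to any smaller k already evaluated. The point deduction for named skills is not modeled here, since the page does not specify how it is computed.

```python
from typing import Callable, Dict, List, Sequence


def filter_uncommon_skills(occurrence_rate: Dict[str, float],
                           max_rate: float = 0.05) -> List[str]:
    """Keep skills whose corpus occurrence rate is below max_rate (5% here)."""
    return sorted(s for s, rate in occurrence_rate.items() if rate < max_rate)


def run_evaluation(evaluate_at_k: Callable[[int], float],
                   always: Sequence[int] = (2, 3, 4),
                   optional: Sequence[int] = (5, 6),
                   threshold: float = 0.05) -> Dict[int, float]:
    """Evaluate k = 2, 3, 4 unconditionally; evaluate larger k only while
    no smaller k has scored below the threshold."""
    results: Dict[int, float] = {}
    for k in always:
        results[k] = evaluate_at_k(k)
    for k in optional:
        # Assumed reading of the rule: skip once any smaller k fell below 0.05.
        if any(score < threshold for score in results.values()):
            break
        results[k] = evaluate_at_k(k)
    return results


if __name__ == "__main__":
    # Hypothetical occurrence rates, for illustration only.
    rates = {"metaphor": 0.02, "irony": 0.10, "modus ponens": 0.049}
    print(filter_uncommon_skills(rates))

    # Hypothetical per-k scores standing in for a real grader.
    fake_scores = {2: 0.9, 3: 0.4, 4: 0.03, 5: 0.0, 6: 0.0}
    print(run_evaluation(lambda k: fake_scores[k]))
```

In this sketch the score at k = 4 falls below 0.05, so k = 5 and k = 6 are never evaluated, which is why a row may have empty cells in those columns.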

Contributing a model

Please see instructions for contributing a model and then fill out this form!

BibTeX


        @inproceedings{DBLP:journals/corr/abs-2310-17567,
          author       = {Dingli Yu and
                          Simran Kaur and
                          Arushi Gupta and
                          Jonah Brown{-}Cohen and
                          Anirudh Goyal and
                          Sanjeev Arora},
          title        = {Skill-Mix: a Flexible and Expandable Family of Evaluations for {AI} models},
          booktitle    = {The Twelfth International Conference on Learning Representations,
                          {ICLR} 2024, Vienna, Austria, May 7-11, 2024},
          year         = {2024}
        }