Introducing Codeset

We're excited to announce the launch of Codeset, a startup focused on accelerating the development of agentic code models through novel training and evaluation datasets.

Our first step toward this goal is a platform that provides easy access to large-scale datasets of reproducible, sandboxed environments built from real-world software engineering tasks.

We believe such a platform can add tremendous value to the ongoing expansion of RL environments beyond toy problems, making it easy to train code models on complex, real-world engineering tasks. First, we aim to make interacting with real-world coding environments as easy as calling an API. Second, we aim to provide better, stronger verification while dramatically reducing the time it takes to verify solutions, which is a major bottleneck today.

We will soon make our platform publicly available, together with a dataset of novel software engineering tasks not found in any other public dataset. This dataset will be available through the Codeset platform from day one.

Stay tuned for more updates, and don't hesitate to reach out with questions or suggestions.