🎮ByteCraft: Generating video games and animations through bytes

Contents: ByteCraft, Examples, The future
ByteCraft
Imagine a world where you can write a prompt describing a video game or animation that you want, and a fully fledged executable file comes out. We make a first attempt at this crazy goal by training a model to generate the bytes of video games and animations!

Our model, 🎮ByteCraft, was made by fine-tuning a 7B-parameter LLM (Qwen2.5) with a 32K-token generation context on 4 GPUs for 4 months. It generates the bytes of video games and animations conditioned on a text description of the desired file. The file can then be saved and opened on your computer!
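As a rough illustration of that last step, the sketch below writes a sequence of generated byte values to disk in binary mode. The `generated` list is a hypothetical stand-in for decoded model output (here just the "FWS" signature of an uncompressed SWF plus a version byte), and `save_bytes` is a name chosen for this example, not part of ByteCraft.

```python
from pathlib import Path
import tempfile

def save_bytes(values, path):
    """Write integer byte values (0-255) to `path` and return the count written."""
    data = bytes(values)  # raises ValueError if any value is outside 0..255
    Path(path).write_bytes(data)
    return len(data)

# Hypothetical stand-in for model output: "FWS" signature + one version byte.
generated = [0x46, 0x57, 0x53, 0x0A]

out_path = Path(tempfile.gettempdir()) / "bytecraft_example.swf"
written = save_bytes(generated, out_path)
print(written, out_path.read_bytes()[:3])
```

The file must be written in binary mode; any text encoding or newline translation would alter the bytes.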
Working in the byte world is extremely challenging because a single wrong byte can break the entire file. Still, ByteCraft can generate some semi-functional and some fully working files. The model is imperfect, but the fact that it generates diverse, readable files suggests that it has some understanding of bytes.
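To make the fragility concrete, here is a small sketch that uses zlib-compressed data as a stand-in for a structured binary file. This is an assumption for illustration; the post does not say which parts of the generated files are compressed. Changing a single byte of the stream's trailing checksum is enough for the decoder to reject the whole payload.

```python
import zlib

payload = b"frame data " * 100
packed = bytearray(zlib.compress(payload))

# Corrupt exactly one byte: the last byte of the Adler-32 checksum trailer.
packed[-1] ^= 0xFF

try:
    zlib.decompress(bytes(packed))
    outcome = "decoded"
except zlib.error:
    outcome = "rejected"

print(outcome)
```

A text model that gets one word wrong still produces readable text; a byte model that gets one byte wrong can produce a file that does not open at all.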
At the byte level, every byte is one token, so a 32 KB file alone would fill the entire 32K-token context. To stretch that budget, we use Byte-Pair Encoding (BPE) to merge bytes into tokens containing 2.29 bytes on average and at most 4-5 bytes. This allows us to generate files as large as 140 KB with 32K tokens in the best case (about 73 KB at the average rate of 2.29 bytes per token).
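The idea behind byte-level BPE can be sketched with a toy merge loop: start with one token per byte, then repeatedly merge the most frequent adjacent pair into a single longer token. This is a minimal illustration, not ByteCraft's actual tokenizer; `bpe_merge_once` and the sample `data` are made up for this example. The last lines redo the capacity arithmetic using the figures quoted above.

```python
from collections import Counter

def bpe_merge_once(tokens):
    """Merge the most frequent adjacent pair of tokens into one token."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    best = max(pairs, key=pairs.get)
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
            merged.append(tokens[i] + tokens[i + 1])  # concatenate the pair
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

data = b"\x00\x01\x00\x01\x00\x01\xff"   # toy input, 7 bytes
tokens = [bytes([b]) for b in data]      # start with one token per byte
for _ in range(2):                       # two merge rounds
    tokens = bpe_merge_once(tokens)

avg_bytes_per_token = len(data) / len(tokens)
print(len(tokens), avg_bytes_per_token)

# Capacity arithmetic with the post's figures (units: thousands).
context_k = 32                           # 32K-token context
avg_capacity_k = context_k * 2.29        # thousands of bytes at the average rate
needed_for_140k = 140 / context_k        # bytes per token needed to reach 140K bytes
print(avg_capacity_k, needed_for_140k)
```

Joining the tokens back together always reproduces the original bytes, so the merges shorten the sequence without losing information.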
Examples of files generated by ByteCraft
There are two examples per section; click on one to start the file.
Note: If your browser doesn’t show the SWF properly, use the direct links. To view an SWF from a direct link, either install the Ruffle browser extension for Firefox or Chrome to play it in your browser, or download the file and open it with the Ruffle app.
Moving checkered patterns (Direct links: 1, 2)
Working memorizations (Direct links: 1, 2)
Weird broken animations (Direct links: 1, 2)
Infinite loading (Direct links: 1, 2)
Characters (Direct links: 1, 2)
Sounds
Others (Direct links: 1, 2)
The future
A parallel exists between ByteCraft and autoregressive molecule generation. Molecules can be represented as SMILES strings and their context length is generally small (10-250 tokens without BPE). We show below some of the progress of molecule generation over time on the Zinc-250K dataset:
- (2016) GVAE: 0.7% valid molecules (<- ByteCraft is here)
- (2017) CVAE: 7.2% valid molecules
- (2018) RVAE: 34.9% valid molecules
- (2021) GFVAE, STGG, and many others: 100% valid molecules, but not always synthesizable
- (2025) STGG+AL: 100% valid molecules with high synthesizability and strong out-of-distribution properties (<- the future ByteCraftv3 is here)
ByteCraft is at the equivalent of GVAE for molecule generation in 2016 but on the much harder problem of generating games and animations at 32K context length. Considering the recent exponential progress in AI, we expect to rapidly move toward the goal of 100% valid generated novel files at high context length.
Keep in mind that the model was trained on extremely limited hardware (4 GPUs for 4 months). Our method scales with compute. The ceiling is far from being reached; we are at the very first stage of a new paradigm.
We hope this crazy project inspires researchers and hobbyists toward the lofty goal of generating games through bytes.