Running Llama on Windows 98

Written by Phil Emmerson | Jan 2, 2025 3:03:34 PM

Hello and welcome to the Netglass.io blog! We're excited to embark on this journey of sharing insights and innovations in the IT realm with you.

Recently, we came across an intriguing experiment by EXO Labs, in which they successfully ran a modern AI language model, Llama, on a vintage Windows 98 Pentium II machine.

This endeavor involved several challenges, including sourcing compatible hardware, transferring necessary files via FTP, and adapting modern code to compile on outdated systems.

The team used Borland C++ 5.02, a 26-year-old IDE and compiler, to modify Andrej Karpathy's llama2.c so that it would build and run on the legacy hardware.

The results were impressive:

- stories260K model: 39.31 tokens/second
- stories15M model: 1.03 tokens/second
- Llama 3.2 1B model: 0.0093 tokens/second

This achievement underscores the potential for running advanced AI models on hardware once considered obsolete, aligning with EXO's mission to make frontier AI accessible beyond traditional data centers.

Their work also highlights the importance of efficient model architectures such as BitNet, which uses ternary weights (-1, 0, +1) to significantly reduce storage requirements and improve energy efficiency. This approach could enable large models to run on minimal hardware, further democratizing AI technology.

At Netglass.io, we're inspired by such innovative approaches that challenge conventional boundaries in technology.

Stay tuned for more insights and discussions on the ever-evolving landscape of technology.


Sources
https://blog.exolabs.net/day-4