PANews reported on March 21 that Tether has announced a cross-platform BitNet LoRA fine-tuning framework within QVAC Fabric, optimizing both training and inference for Microsoft's BitNet (a 1-bit LLM). The framework significantly reduces compute and memory requirements, allowing billion-parameter models to be trained and fine-tuned on laptops, consumer-grade GPUs, and smartphones.
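For context, LoRA makes this feasible by freezing the base weights and training only a small low-rank update per layer, which is what shrinks the trainable footprint enough for consumer hardware. The sketch below is a minimal, generic illustration in PyTorch, not Tether's QVAC Fabric API; the class and parameter names are hypothetical, and in BitNet the frozen base matrix would be ternary/1-bit quantized rather than the fp32 weight used here for simplicity.

```python
# Minimal LoRA sketch: frozen base linear layer + trainable low-rank adapter.
# Illustrative only; not Tether's or Microsoft's implementation.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        # Frozen base weight: in BitNet this would be a quantized
        # (ternary/1-bit) matrix; fp32 here keeps the example simple.
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Trainable low-rank factors: A projects down to rank r, B back up.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base path is frozen; only the rank-r update B @ A receives gradients.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(512, 512, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # ~3% of the layer
```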
The solution is the first to enable fine-tuning of BitNet models on mobile GPUs (including Qualcomm Adreno, Arm Mali, and Apple Bionic). In tests, a 125M-parameter model was fine-tuned in about 10 minutes and a 1B-parameter model in about 1 hour, and the approach scales to 13B-parameter models on mobile devices.

Furthermore, the framework supports heterogeneous hardware from Intel, AMD, and Apple Silicon, achieving 1-bit LLM LoRA fine-tuning on non-NVIDIA devices for the first time. On performance, BitNet models run inference 2 to 11 times faster on mobile GPUs than on CPUs, while cutting memory usage by up to roughly 77.8% compared with traditional 16-bit models.
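The ~77.8% figure is Tether's reported number; as a rough sanity check, the back-of-envelope sketch below compares fp16 weight storage against ternary weights packed at 2 bits each (an assumed packing scheme, not confirmed by the source). Weights alone would save about 87.5%, so an end-to-end saving near 77.8% plausibly reflects some tensors (embeddings, norms, LoRA adapters) remaining in higher precision.

```python
# Back-of-envelope weight-memory comparison for a 1B-parameter model.
# The 2-bits-per-ternary-weight packing is an assumption for illustration;
# real savings depend on which tensors stay in higher precision.
params = 1_000_000_000
fp16_bytes = params * 2            # 16 bits per weight
ternary_bytes = params * 2 / 8     # 2 bits per weight, packed 4 per byte
print(f"fp16:    {fp16_bytes / 2**30:.2f} GiB")   # ~1.86 GiB
print(f"ternary: {ternary_bytes / 2**30:.2f} GiB") # ~0.23 GiB
print(f"saving:  {1 - ternary_bytes / fp16_bytes:.1%}")  # 87.5% on weights alone
```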
Tether stated that the technology has the potential to reduce dependence on high-end compute and cloud infrastructure, push AI training toward decentralization and on-device execution, and lay the groundwork for new application scenarios such as federated learning.
