https://github.com/tloen/llama-int8
https://github.com/facebookresearch/llama/issues/79#issuecomment-1454687232
A fork of the LLaMA inference code that uses int8 quantization to run LLaMA-13B comfortably within 24 GiB of RAM.
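The memory savings in int8 forks like this come from quantizing weights from 16/32-bit floats down to 8-bit integers. A minimal sketch of symmetric absmax int8 quantization, the basic scheme behind such approaches (function names here are illustrative, not from the repo):

```python
import numpy as np

def quantize_absmax_int8(x: np.ndarray):
    # Symmetric absmax quantization: pick a scale so the largest
    # magnitude in x maps to 127, then round to int8.
    # (Assumes x is not all zeros; real code guards that case.)
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original floats.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 3.4, -0.01], dtype=np.float32)
q, scale = quantize_absmax_int8(w)
w_hat = dequantize_int8(q, scale)
```

Each float32 weight shrinks to one byte plus a shared per-tensor (or per-row) scale, roughly quartering memory; the rounding error per weight is at most half the scale.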
https://github.com/go-noah/llama
An example of running LLaMA 7B.