Turkish from İstanbul
Open source enthusiast
Same thing. Inference just uses a lot less memory.
7900XTX
Is it with Automatic1111?
Things have changed.
I can now run Mistral on my Intel iGPU using Vulkan.
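For anyone curious, here’s a minimal sketch of one way to do this, using the llama-cpp-python bindings. It assumes the package was built with the Vulkan backend, and the model filename is just a placeholder for whatever quantized GGUF you have:

```python
# Minimal sketch: run a quantized Mistral GGUF via llama-cpp-python.
# Assumes the wheel was built with the Vulkan backend, e.g. on recent versions:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the Vulkan device (the iGPU)
    n_ctx=4096,       # context window
)

out = llm("Q: What is Vulkan?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```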
At least the Arch people are not shilling for some corp.
Flatpak is crazy inefficient, but at least I can get software that is not yet in the distro repos. It will get better.
I use Ubuntu because my provider offers it by default. It’s not my favourite distro these days, but it gets the job done.
I had a similar question: https://lemmy.ml/post/12919434
Docker makes sense if you are deploying thousands of machines in the cloud. I don’t think it makes as much sense if you have your own hardware.
Some services do have 1-line installers with Docker, so those might be useful. But they usually have 1-line non-Docker installers too.
You can’t shrink a model to 1/8 the size and expect it to run at the same quality. But quantization allows me to move from a cloud GPU to my laptop’s crappy CPU/iGPU, so I’m OK with that tradeoff.
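To put numbers on that 1/8 figure (my assumption: comparing fp16 weights against 2-bit quantization), the back-of-the-envelope arithmetic looks like this:

```python
# Back-of-the-envelope memory math for quantized weights:
#   size ~= parameter_count * bits_per_weight / 8 bytes
# (ignoring small overheads like block scales, activations, and the KV cache)
params = 7e9  # a 7B-parameter model

for bits in (16, 4, 2):
    gib = params * bits / 8 / 2**30
    print(f"{bits:>2}-bit: ~{gib:.1f} GiB")

# 16-bit: ~13.0 GiB  -> needs a serious GPU
#  4-bit: ~3.3 GiB   -> fits in ordinary laptop RAM
#  2-bit: ~1.6 GiB   -> 1/8 of the fp16 size
```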
The most exciting part is that they plan to release multiple models, so I will probably be able to run one on my laptop.
llama.cpp quantizes the heck out of language models, which allows consumer CPUs to run them. My laptop can run most 7B or 13B LLMs with 4-bit quantization (and they are trying to push quantization even further, to 2 or even 1.5 bits!). There’s a toy sketch of the idea below.
The same will happen with Stable Diffusion. Most SD models are still at roughly fp16 precision, and will soon be going lower. I expect we’ll all be running SDXL or larger models at the 4-bit level on our laptop CPUs without breaking a sweat.
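For a feel of what the quantization itself does, here’s a toy 4-bit round-trip in numpy. Real llama.cpp formats like Q4_K are block-wise schemes with per-block scales and other refinements, so treat this as an illustration of the idea, not the actual format:

```python
# Toy symmetric 4-bit quantization of one weight block (illustration only;
# llama.cpp's Q4_K and friends use more elaborate block-wise schemes).
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Map float weights to integers in [-8, 7] plus one float scale."""
    scale = np.abs(w).max() / 7.0  # one scale shared by the whole block
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(32).astype(np.float32)  # one 32-weight block
q, s = quantize_4bit(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f"mean absolute rounding error: {err:.4f}")  # small, but not zero
```

The quality tradeoff is exactly that rounding error: fewer bits means a coarser grid, which is why 2-bit models lose more fidelity than 4-bit ones.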
Orange Pi looks very promising.