Turkish from İstanbul
Open source enthusiast
Same thing. Inference just uses a lot less memory.
7900XTX
Is it with Automatic1111?
Things have changed.
I can now run Mistral on my Intel iGPU using Vulkan.
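For anyone curious, here’s a minimal sketch of one way to do this, using the llama-cpp-python bindings. It assumes the package was built with the Vulkan backend, and the model filename is just a placeholder for whatever quantized GGUF you have:

```python
# Minimal sketch: run a quantized Mistral GGUF via llama-cpp-python.
# Assumes the wheel was built with the Vulkan backend, e.g. on recent versions:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the Vulkan device (the iGPU)
    n_ctx=4096,       # context window
)

out = llm("Q: What is Vulkan?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```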
At least the Arch people are not shilling for some corp.
Flatpak is crazy inefficient, but at least I can get software that is not yet in the distro repos. It will get better.
I use Ubuntu because my provider offers it by default. It’s not my favourite distro these days, but it gets the job done.
I had a similar question: https://lemmy.ml/post/12919434
Docker makes sense if you are deploying thousands of machines in the cloud. I don’t think it makes as much sense if you have your own hardware.
Some services do have 1-line installers with Docker, so those might be useful. But they usually have 1-line non-Docker installers too.
You can’t shrink a model to 1/8 the size and expect it to run at the same quality. But quantization allows me to move from a cloud GPU to my laptop’s crappy CPU/iGPU, so I’m OK with that tradeoff.
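To put numbers on that 1/8 figure (my assumption: comparing fp16 weights against 2-bit quantization), the back-of-the-envelope arithmetic looks like this:

```python
# Back-of-the-envelope memory math for quantized weights:
#   size ~= parameter_count * bits_per_weight / 8 bytes
# (ignoring small overheads like block scales, activations, and the KV cache)
params = 7e9  # a 7B-parameter model

for bits in (16, 4, 2):
    gib = params * bits / 8 / 2**30
    print(f"{bits:>2}-bit: ~{gib:.1f} GiB")

# 16-bit: ~13.0 GiB  -> needs a serious GPU
#  4-bit: ~3.3 GiB   -> fits in ordinary laptop RAM
#  2-bit: ~1.6 GiB   -> 1/8 of the fp16 size
```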
The most exciting part is that they plan to release multiple models, so I will probably be able to run one on my laptop.
llama.cpp quantizes the heck out of language models, which allows consumer CPUs to run them. My laptop can run most 7B or 13B LLMs with 4-bit quantization (and they are trying to push quantization even further, to 2 or even 1.5 bits!). There’s a toy sketch of the idea below.
The same will happen with Stable Diffusion. Most SD models are still at roughly fp16 precision, and will soon be going lower. I expect we’ll all be running SDXL or larger models at the 4-bit level on our laptop CPUs without breaking a sweat.
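For a feel of what the quantization itself does, here’s a toy 4-bit round-trip in numpy. Real llama.cpp formats like Q4_K are block-wise schemes with per-block scales and other refinements, so treat this as an illustration of the idea, not the actual format:

```python
# Toy symmetric 4-bit quantization of one weight block (illustration only;
# llama.cpp's Q4_K and friends use more elaborate block-wise schemes).
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Map float weights to integers in [-8, 7] plus one float scale."""
    scale = np.abs(w).max() / 7.0  # one scale shared by the whole block
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(32).astype(np.float32)  # one 32-weight block
q, s = quantize_4bit(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f"mean absolute rounding error: {err:.4f}")  # small, but not zero
```

The quality tradeoff is exactly that rounding error: fewer bits means a coarser grid, which is why 2-bit models lose more fidelity than 4-bit ones.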
Orange Pi looks very promising.