• 0 Posts
  • 77 Comments
Joined 2 years ago
Cake day: July 14th, 2023

  • Look up “LLM quantization.” The idea is that each parameter is a number; by default each uses 16 bits of precision, but if you store them at lower precision, you use less space and lose some accuracy while still keeping the same parameters. There’s not much quality loss going from 16 bits to 8, but it gets more noticeable as you go lower and lower. (That said, there are ternary models being trained from scratch that use 1.58 bits per parameter and are allegedly just as good as fp16 models of the same parameter count.)

    If you’re using a 4-bit quantization, then you need VRAM equal to roughly half the parameter count in GB (plus some overhead for context). Q4_K_M is better than Q4, but also a bit larger. Ollama generally defaults to Q4_K_M. If you can handle a higher quantization, Q6_K is generally best. If you can’t quite fit it, Q5_K_M is generally better than any other option, followed by Q5_K_S.

    For example, Llama3.3 70B, which has 70.6 billion parameters, has the following sizes for some of its quantizations:

    • q4_K_M (the default): 43 GB
    • fp16: 141 GB
    • q8: 75 GB
    • q6_K: 58 GB
    • q5_K_M: 50 GB
    • q4: 40 GB
    • q3_K_M: 34 GB
    • q2_K: 26 GB

    This is why I run a lot of Q4_K_M 70B models on two 3090s.
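
    You can sanity-check those numbers yourself: file size is roughly parameter count × bits per weight. Here’s a quick sketch in Python; the bits-per-weight figures are approximate effective values I’m assuming for llama.cpp-style K-quants (they mix precisions internally, so the effective rate sits a bit above the nominal bit width):

    ```python
    # Approximate effective bits per weight for common GGUF quantizations.
    # These are rough assumed values, not exact format constants.
    BITS_PER_WEIGHT = {
        "fp16": 16.0,
        "q8_0": 8.5,
        "q6_K": 6.56,
        "q5_K_M": 5.69,
        "q4_K_M": 4.85,
    }

    def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
        """Approximate model size in GB: parameters x bits, divided by 8 bits/byte."""
        return params_billions * bits_per_weight / 8

    # Llama 3.3 70B has 70.6 billion parameters.
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{model_size_gb(70.6, bpw):.0f} GB")
    ```

    Running that reproduces the list above within a GB or so (~141 GB for fp16 down to ~43 GB for q4_K_M), which is why the 2 × 24 GB of a dual-3090 setup comfortably fits a Q4_K_M 70B with room left for context.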

    Generally speaking, there’s not a perceptible quality drop going from 8-bit quantization to Q6_K (though I have heard this is less true with MoE models). Below Q6 there’s a bit of a drop going to Q5, and a bit more going to Q4, but the model’s still decent. Below 4-bit quantizations, you can generally get better results from a smaller-parameter model at a higher quantization.

    TheBloke on Huggingface has a lot of GGUF quantization repos, and most, if not all of them, have a blurb about the different quantization types and which are recommended. When Ollama.com doesn’t have a model I want, I’m generally able to find one there.


  • I recommend a used 3090, as that has 24 GB of VRAM and generally can be found for $800ish or less (at least when I last checked, in February). It’s much cheaper than a 4090 and while admittedly more expensive than the inexpensive 24GB Nvidia Tesla card (the P40?) it also has much better performance and CUDA support.

    I have dual 3090s so my performance won’t translate directly to what a single GPU would get, but it’s pretty easy to find stats on 3090 performance.









  • Ah, I assumed there were some areas where Firefox had been found lacking relative to Chromium browsers.

    For me, the current version of any major browser or fork that gets consistent security updates and can run the full version of uBlock Origin is sufficiently secure for my threat model. Given that, and that they all offer the feature set I want, wouldn’t it be reasonable to avoid Chromium browsers because I don’t want to encourage the Chromium monopoly?

    That’s only a small fraction of why I use Firefox, to be fair, but suppose for argument’s sake that I don’t care about MV3 extensions, Firefox Containers, etc. Would it be so wrong for not wanting there to be a Chromium monopoly to be my reason for choosing Firefox or one of its forks?




  • Sure, but the license is limited to uses that “help you navigate, experience, and interact with online content as you indicate with your use of Firefox.”

    Not sure how ads would help with that.

    AI? Sure, if an AI solution did those things. But it wouldn’t be them training on your data. This would be them using your data in AI-powered services, whether that be search (especially relevant if Google is mandated to stop paying them to default to Google); automatic categorization of your web browsing to make Containers more streamlined and effective; or even just having a completely opt-in AI assistant chatbot that can access data entered elsewhere in Firefox once you activate it.

    Worst case I suspect whatever they add will be things you can simply turn off in settings. Ideally it would be opt-in, of course, or at least prompted-opt-out and disabled until first use.

    And there are plenty of things that aren’t ad or AI-related that this could apply to. Heck, this could be part of a step to consolidate licenses for other products - VPN, Pocket, email anonymizers, etc. - and to enable deeper integration of those into Firefox.




  • They put their repo first on the list.

    Right. And are we talking about the list for OBS or of repos in general? I doubt Fedora sets the priority on a package level. And if they don’t, and if there are some other packages in Flathub that are problematic, then it makes sense to prioritize their own repo over them.

    That said, it’s a different story if those problematic packages come from other repositories, or if there was some other way to keep unofficial builds from showing up first without also deprioritizing official, verified ones like OBS. I haven’t maintained a package on Flathub like the original commenter you replied to, but I don’t get the impression that that’s the case.