KoboldCpp should let you run much larger models by offloading part of the model to system RAM. There’s a fork that supports ROCm for AMD cards: https://github.com/YellowRoseCx/koboldcpp-rocm
Make sure to use quantized models for the best performance, Q4_K_M being the standard.
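If it helps, here’s a rough back-of-the-envelope sketch for guessing how many layers will fit on the GPU, with the rest staying in RAM. The function name, the ~4.85 bits per weight figure for Q4_K_M, and the 1.5 GB reserve for context and buffers are all my own assumptions, not anything KoboldCpp reports, so treat the result as a starting point and adjust from there.

```python
def estimate_gpu_layers(n_params_b, n_layers, vram_gb,
                        bits_per_weight=4.85, reserve_gb=1.5):
    """Rough guess at how many layers of a quantized model fit in VRAM.

    n_params_b      -- parameter count in billions (e.g. 13 for a 13B model)
    n_layers        -- number of transformer layers in the model
    vram_gb         -- total VRAM on the card, in GB
    bits_per_weight -- ASSUMED average for Q4_K_M; other quants differ
    reserve_gb      -- ASSUMED headroom for context cache and scratch buffers
    """
    # Billions of parameters times bytes per parameter gives size in GB.
    model_gb = n_params_b * bits_per_weight / 8
    # Assume (simplification) that every layer is the same size.
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never report more layers than the model actually has.
    layers = min(n_layers, int(usable_gb / per_layer_gb))
    return model_gb, layers


if __name__ == "__main__":
    size_gb, layers = estimate_gpu_layers(n_params_b=13, n_layers=40, vram_gb=8)
    print(f"approx. model size: {size_gb:.1f} GB, layers to try on GPU: {layers}")
```

The layer count is what you’d try as the `--gpulayers` value; if you run out of VRAM, lower it a few layers at a time.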
What’s the deal with Alpine not using GNU? Is it a technical or ideological thing? Or is it another “because we can” type distro?