I’m not really convinced that a GPU backend is needed. Was there ever a comparison of the different CLIP model variants? Or of a graph-optimized or quantized ONNX version?
I think the proposed solution makes a lot of sense for the task at hand if it were integrated on the pict-rs end, but if it ends up on the Lemmy server end instead, further improvements would be worth investigating.