Add warning about caching with cuBLAS

Raven Scott 2023-05-20 23:47:16 +02:00
parent 668b343cbb
commit 927b5c834d


@@ -81,6 +81,8 @@ This will automatically configure the API for you as well as the bot in two sepe
 # Docker Compose with GPU
 This will automatically configure the API that supports cuBLAS and GPU inference for you as well as the bot in two separate containers within a stack.
+
+NOTE: Caching is currently broken for cuBLAS: https://github.com/abetlen/llama-cpp-python/issues/253
 1. `git clone https://git.ssh.surf/snxraven/llama-cpp-python-djs-bot.git` - Clone the repo
 2. `mv docker-compose.yml docker-compose.nogpu.yml; mv docker-compose.gpu.yml docker-compose.yml;` - Move the non-GPU compose file out of the way and enable GPU support
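
For context on step 2, a GPU-enabled Compose file typically reserves the NVIDIA device through the Compose spec's `deploy.resources.reservations.devices` settings. The sketch below is illustrative only; the service name, image, and environment variable are assumptions, not the contents of this repo's actual `docker-compose.gpu.yml`:

```yaml
# Hypothetical sketch of a cuBLAS-enabled API service.
# Names below (llama-api, the image tag, MODEL) are assumed for
# illustration and are not taken from this repository.
services:
  llama-api:
    image: llama-cpp-python:cublas   # assumed image/tag
    environment:
      - MODEL=/models/your-model.bin # assumed variable name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia         # request the NVIDIA runtime
              count: 1               # expose one GPU to the container
              capabilities: [gpu]
```

Per the linked upstream issue, keep the llama-cpp-python prompt cache disabled when running the cuBLAS build until the issue is resolved.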