From 927b5c834d75c3aa69dbb1f624d90b2a72b2a88f Mon Sep 17 00:00:00 2001
From: Raven Scott
Date: Sat, 20 May 2023 23:47:16 +0200
Subject: [PATCH] Add warning about caching with cuBLAS

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 00b3441..a84c937 100644
--- a/README.md
+++ b/README.md
@@ -81,6 +81,8 @@ This will automatically configure the API for you as well as the bot in two sepe
 # Docker Compose with GPU
 This will automatically configure the API that supports cuBLAS and GPU inference for you as well as the bot in two separate containers within a stack.
+NOTE: Caching is currently broken for cuBLAS: https://github.com/abetlen/llama-cpp-python/issues/253
+
 1. `git clone https://git.ssh.surf/snxraven/llama-cpp-python-djs-bot.git` - Clone the repo
 2. `mv docker-compose.yml docker-compose.nogpu.yml; mv docker-compose.gpu.yml docker-compose.yml;` - Move the non-GPU compose file out of the way and enable GPU support
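
For anyone adapting the docker-compose.gpu.yml referenced in step 2: GPU access in a Compose service is normally granted through a device reservation, which requires the NVIDIA Container Toolkit on the host. The sketch below is illustrative only; the service name, image, and mount paths are assumptions, not the contents of the repo's actual compose file.

```yaml
# Minimal sketch of a GPU-enabled Compose service (assumed names and paths).
services:
  llama-api:                      # assumed service name
    image: llama-cpp-python-api   # assumed image; the repo's compose file may differ
    volumes:
      - ./models:/models          # assumed location of the model files
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia      # requires the NVIDIA Container Toolkit on the host
              count: 1
              capabilities: [gpu]
```

With a reservation like this in place, `docker compose up` (after step 2 makes the GPU file the default docker-compose.yml) starts the API container with one GPU attached for cuBLAS inference.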