This code is for a Discord bot that uses a self-hosted, OpenAI-compatible GPT API (served at home via llama-cpp-python; see below) to generate responses to user messages. It listens for messages in two specified Discord channels; when a user sends a message, the bot appends it to that user's conversation history and sends the history to the API to generate a response, which is then sent back to the user in the same channel. The bot uses the Node.js discord.js library to interact with the Discord API and the node-fetch library to make HTTP requests to the completion API.
If the generated response is not empty, the bot sends it back to the user in the same channel. If it is empty, the bot sends a reset message and deletes the conversation history for that user.
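A minimal sketch of that message flow, assuming discord.js v14 and an in-memory history map; the channel IDs, reset message, and token variable name are illustrative assumptions, and `generateResponse` is sketched below:

```js
// Sketch only: channel IDs, history store, and env var names are assumptions.
import { Client, GatewayIntentBits } from 'discord.js';

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
  ],
});

const LISTEN_CHANNELS = ['1111111111111111111', '2222222222222222222']; // hypothetical IDs
const conversations = new Map(); // userId -> [{ role, content }, ...]

client.on('messageCreate', async (message) => {
  if (message.author.bot || !LISTEN_CHANNELS.includes(message.channel.id)) return;

  // Append the user's message to their conversation history.
  const history = conversations.get(message.author.id) ?? [];
  history.push({ role: 'user', content: message.content });

  const response = await generateResponse(history); // sketched below

  if (response) {
    history.push({ role: 'assistant', content: response });
    conversations.set(message.author.id, history);
    await message.channel.send(response);
  } else {
    // Empty response: reset this user's conversation.
    conversations.delete(message.author.id);
    await message.channel.send('Something went wrong, resetting the conversation.');
  }
});

client.login(process.env.DISCORD_TOKEN); // token variable name is an assumption
```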
A generateResponse function sends the request to the GPT API and handles timeouts and other errors so the bot does not hang on a dead request.
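A hedged sketch of such a generateResponse function, using node-fetch with an AbortController-based timeout; the endpoint URL, timeout, and max_tokens values are assumptions to adjust for your setup:

```js
import fetch from 'node-fetch';

// Sketch: the API URL and timeout are assumptions, not the repo's exact values.
async function generateResponse(history, timeoutMs = 60_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const res = await fetch('http://localhost:8000/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: history, max_tokens: 256 }),
      signal: controller.signal,
    });
    const data = await res.json();
    return data.choices?.[0]?.message?.content?.trim() ?? '';
  } catch (err) {
    // AbortError (timeout) or network/API failure: return an empty string so
    // the caller sends the reset message and clears the history.
    console.error('generateResponse failed:', err);
    return '';
  } finally {
    clearTimeout(timer);
  }
}
```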
The HTTP server from https://abetlen.github.io/llama-cpp-python/ is required to use this bot.
llama-cpp-python offers a web server that aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
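For instance, once the server is up (default port 8000), you should be able to query its OpenAI-style completions endpoint directly; the prompt below is just a smoke test:

```js
import fetch from 'node-fetch';

// Quick sanity check against the llama-cpp-python server (default port 8000).
const res = await fetch('http://localhost:8000/v1/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Hello, llama!', max_tokens: 16 }),
});
const data = await res.json();
console.log(data.choices[0].text);
```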
The steps below automatically configure both the API (with cuBLAS and GPU inference support) and the bot, running as two separate containers within a single stack.
1. `git clone https://git.ssh.surf/snxraven/llama-cpp-python-djs-bot.git` - Clone the repo
2. `mv docker-compose.yml docker-compose.nogpu.yml; mv docker-compose.gpu.yml docker-compose.yml;` - Move the non-GPU compose file out of the way and enable GPU support
3. `mv Dockerfile Dockerfile.nongpu; mv Dockerfile.gpu Dockerfile;` - Move the non-GPU Dockerfile out of the way and enable GPU support
4. `cp default.gpu.env .env` - Copy the default GPU .env to its proper location
5. Set `DATA_DIR` in `.env` to the exact location of your model files.
6. Edit `MODEL` in `docker-compose.yml` to ensure the correct model bin is set.
7. Set `N_GPU_LAYERS` to the number of layers you would like to offload to the GPU (see the example below).
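As a concrete example of steps 4-7, the relevant settings might end up looking like this; the paths and model filename are placeholders, not values shipped with the repo:

```sh
# .env (placeholder values, adjust to your setup)
DATA_DIR=/home/user/models   # absolute path to your model files
N_GPU_LAYERS=32              # layers to offload to the GPU
```

```yaml
# docker-compose.yml excerpt (placeholder model filename)
environment:
  MODEL: /models/ggml-model-q4_0.bin
```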