# llama-cpp-python-djs-bot
THIS CODE IS MEANT TO BE SELF HOSTED USING THE LIBRARY: https://abetlen.github.io/llama-cpp-python/
# Description
This code is for a Discord bot that uses a self-hosted, llama.cpp-compatible language model, served through llama-cpp-python's OpenAI-compatible API, to generate responses to user messages. It listens for messages in two specified Discord channels. When a user sends a message, the bot appends it to the conversation history and sends that history to the API to generate a response, which it then posts back to the user in the same channel. The bot uses the Node.js discord.js library to interact with the Discord API and the node-fetch library to make HTTP requests to the model API.
Here is a summary of the main parts of the code:

1. Import required modules and set environment variables using dotenv.
2. Create a new Client instance and set the intents and partials.
3. Define two channel IDs that the bot will listen to.
4. Create a Map to store ongoing conversations with users.
5. Define functions to update the bot's presence status, check if any conversation is busy, and set a conversation as busy or not busy.
6. Listen for the ready event and update the bot's presence status.
7. Listen for the messageCreate event and respond to messages that are sent in the specified channels.
8. When a message is received, check if any conversation is busy. If so, delete the message and send a busy response to the user.
9. If no conversation is busy, append the user message to the conversation history and send it to the API to generate a response.
10. If the response is not empty, send it back to the user in the same channel. If it is empty, send a reset message and delete the conversation history for that user.
11. Define a generateResponse function that sends a request to the API to generate a response and handles timeouts and other errors.
12. Call the generateResponse function within the messageCreate event listener function.
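
The conversation Map and the busy checks described above can be sketched roughly as follows. This is a minimal illustration, not the code in llamabot.js: the function names (`getConversation`, `isAnyConversationBusy`, `setBusy`, `appendMessage`, `resetConversation`) and the shape of the stored objects are assumptions made for the example.

```javascript
// Per-user conversation state: userId -> { messages: [...], busy: boolean }.
// All names below are illustrative and not taken from llamabot.js.
const conversations = new Map();

// Return the state for a user, creating an empty one on first contact.
function getConversation(userId) {
  if (!conversations.has(userId)) {
    conversations.set(userId, { messages: [], busy: false });
  }
  return conversations.get(userId);
}

// True while any user's request is still being processed.
function isAnyConversationBusy() {
  for (const conversation of conversations.values()) {
    if (conversation.busy) return true;
  }
  return false;
}

// Mark a user's conversation as busy (true) or idle (false).
function setBusy(userId, busy) {
  getConversation(userId).busy = busy;
}

// Append one turn ("user" or "assistant") and return the full history.
function appendMessage(userId, role, content) {
  const conversation = getConversation(userId);
  conversation.messages.push({ role, content });
  return conversation.messages;
}

// Drop a user's history, for example after an empty model response.
function resetConversation(userId) {
  conversations.delete(userId);
}
```

In a messageCreate handler, the flow would be: return early with a busy notice if `isAnyConversationBusy()` is true, otherwise call `setBusy(userId, true)`, append the message, request a reply, and call `setBusy(userId, false)` once the reply has been sent.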
![demo](https://media.discordapp.net/attachments/562897071326101515/1095738407826767922/image.png?width=1038&height=660 "demo")
# Backend REQUIRED
The HTTP Server from https://abetlen.github.io/llama-cpp-python/ is required to use this bot.
llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp compatible models with any OpenAI compatible client (language libraries, services, etc).
To install the server package and get started:

`pip install llama-cpp-python[server]`

`export MODEL=./models/your_model.bin`

`python3 -m llama_cpp.server`

Navigate to http://localhost:8000/docs to see the OpenAPI documentation.
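
Once the server is running, a request from Node.js might look like the sketch below. It assumes the server listens on `localhost:8000` and exposes the OpenAI-style `/v1/chat/completions` route; check the OpenAPI page above for the routes and fields your version provides. The `fetchImpl` and `timeoutMs` options are hypothetical additions for this example, and the bot's own generateResponse may be written differently.

```javascript
// Assumed endpoint: the local server address plus the OpenAI-style chat route.
const API_URL = 'http://localhost:8000/v1/chat/completions';

// Send the conversation history to the server and return the reply text,
// or an empty string if the server returned no choices.
// fetchImpl is injectable so the function can be tried without a live server;
// Node 18+ ships a global fetch, and node-fetch offers the same call shape.
async function generateResponse(messages, options = {}) {
  const { fetchImpl = globalThis.fetch, timeoutMs = 60000 } = options;

  // Abort the request if the model takes longer than timeoutMs.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const response = await fetchImpl(API_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages }),
      signal: controller.signal,
    });
    if (!response.ok) {
      throw new Error(`Server responded with HTTP ${response.status}`);
    }
    const data = await response.json();
    // OpenAI-style responses carry the text in choices[0].message.content.
    return data.choices?.[0]?.message?.content?.trim() ?? '';
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}
```

A caller would pass the stored history, for example `await generateResponse([{ role: 'user', content: 'Hello' }])`, and treat a rejected promise as a timeout or server error.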
# Static Usage
1) Install dependencies: ```npm i```
2) Create a .env file: ```cp default.env .env```
3) Edit .env for your needs
4) Go to https://discord.com/developers/applications and enable Privileged Intents for your bot.
5) Run the bot: ```node llamabot.js```
# Docker Compose
This will automatically configure the API and the bot for you in two separate containers within a stack.
1. `git clone https://git.ssh.surf/snxraven/llama-cpp-python-djs-bot.git`
2. `cp default.env .env`
3. Set DATA_DIR in .env to the exact location of your model files.
4. Edit MODEL in docker-compose.yml to ensure the correct model bin is set
5. `docker compose up -d`
# Docker Compose with GPU
This will automatically configure the API with cuBLAS and GPU inference support, as well as the bot, in two separate containers within a stack.
NOTE: Caching for GPU has been fixed.
1. `git clone https://git.ssh.surf/snxraven/llama-cpp-python-djs-bot.git` - Clone the repo
2. `mv docker-compose.yml docker-compose.nogpu.yml; mv docker-compose.gpu.yml docker-compose.yml;` - Move the non-GPU compose file out of the way and enable GPU support
3. `mv Dockerfile Dockerfile.nongpu; mv Dockerfile.gpu Dockerfile;` - Move the non-GPU Dockerfile out of the way and enable GPU support
4. `cp default.gpu.env .env` - Copy the default GPU .env to its proper location
5. Set DATA_DIR in .env to the exact location of your model files.
6. Edit MODEL in docker-compose.yml to ensure the correct model bin is set
7. Set N_GPU_LAYERS to the number of layers you would like to offload to the GPU
8. `docker compose up -d`
Want to make this better? Issue a pull request!