This is a Discord bot that uses a self-hosted, OpenAI GPT-3-compatible language model to generate responses to user messages. It listens for messages in two specified Discord channels; when a user sends a message, the bot appends it to that user's conversation history and sends the history to the API to generate a response, which is then posted back in the same channel. The bot uses the discord.js library for Node.js to interact with the Discord API and node-fetch to make HTTP requests to the model's API.
If the response is not empty, the bot sends it back to the user in the same channel. If it is empty, the bot sends a reset message and deletes that user's conversation history.
A generateResponse function sends the request to the GPT-3-compatible API. If the request times out or another error occurs, the function handles it and returns an empty response so the conversation can be reset.
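A minimal sketch of such a generateResponse helper is shown below. The endpoint URL, model payload fields, and timeout value are illustrative assumptions, not the repo's actual code, and it uses Node 18+'s built-in fetch and AbortController rather than node-fetch to stay self-contained:

```javascript
// Build an OpenAI-style chat payload from a user's conversation history.
// `history` is assumed to be an array of { role, content } objects.
function buildChatPayload(history) {
  return {
    messages: history.map(({ role, content }) => ({ role, content })),
    max_tokens: 256, // assumed limit, not the repo's setting
  };
}

// Send the conversation to the self-hosted, OpenAI-compatible API.
// Aborts the request if the server takes longer than `timeoutMs`,
// and returns '' on any failure so the caller can reset the history.
async function generateResponse(history, timeoutMs = 60000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // URL assumes the llama-cpp-python server's default port.
    const res = await fetch('http://localhost:8000/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(buildChatPayload(history)),
      signal: controller.signal,
    });
    const data = await res.json();
    return data.choices?.[0]?.message?.content ?? '';
  } catch (err) {
    // Timeout or network error: treat as an empty response.
    return '';
  } finally {
    clearTimeout(timer);
  }
}
```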
The HTTP Server from https://abetlen.github.io/llama-cpp-python/ is required to use this bot.
llama-cpp-python offers a web server which aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp compatible models with any OpenAI compatible client (language libraries, services, etc).
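For reference, the server can also be run outside Docker. A hedged example (the model path is a placeholder, and the port shown is the server's default):

```shell
# Start the OpenAI-compatible server (model path is an example)
python3 -m llama_cpp.server --model ./models/ggml-model.bin

# Query it with any OpenAI-style client, e.g. curl:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```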
1. `git clone https://git.ssh.surf/snxraven/llama-cpp-python-djs-bot.git` - Clone the repo
2. `mv docker-compose.yml docker-compose.nogpu.yml; mv docker-compose.gpu.yml docker-compose.yml;` - Move the non-GPU compose file out of the way and enable GPU support
3. `mv Dockerfile Dockerfile.nongpu; mv Dockerfile.gpu Dockerfile;` - Move the non-GPU Dockerfile out of the way and enable GPU support
4. `cp default.gpu.env .env` - Copy the default GPU .env to its proper location
5. Set DATA_DIR in .env to the exact location of your model files.
6. Edit MODEL in docker-compose.yml to ensure the correct model bin is set.
7. Set N_GPU_LAYERS to the number of layers you would like to offload to the GPU.
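Putting the .env steps together, a hypothetical configuration might look like this (all values are examples, not defaults):

```
# .env (example values — adjust to your setup)
DATA_DIR=/home/user/models   # exact location of your model files
N_GPU_LAYERS=32              # layers to offload to the GPU
```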