This repository contains two interdependent Node.js applications, `ai_log.js` and `ai_log_backend.js`. These applications collaborate to monitor NGINX logs, detect potential security threats, and manage conversation history for AI-based log analysis. This README provides a comprehensive guide for setting up, configuring, and operating these applications, along with detailed explanations of the underlying mechanisms and customizable features.
The AI Log Monitoring System is a powerful and extensible solution designed to enhance security monitoring by integrating AI-based analysis into traditional NGINX log management. By continuously tailing log files and utilizing an AI model to analyze log entries in real-time, this system automates the detection of security threats, sends alerts to designated channels (like Discord), and manages conversation history for ongoing log analysis.
The solution is composed of two primary scripts:
- `ai_log.js`: Handles log monitoring, buffering, and sending logs to a backend for AI processing.
- `ai_log_backend.js`: Manages AI-based analysis, conversation history, and exposes an API for interacting with the AI service.
## Prerequisites
Before setting up and running the AI Log Monitoring System, ensure you have the following components installed and configured on your machine:
- **Node.js**: Version 14.x or higher is required to run the JavaScript code.
- **npm**: Version 6.x or higher is needed to manage project dependencies.
- **Docker**: Required for running the AI model, particularly if using a containerized GPT model for processing.
- **NGINX**: The web server generating logs that the system will monitor.
- **Discord**: A Discord webhook URL is necessary for sending security alerts.
Ensure you have administrative privileges on the machine to install and configure these dependencies.
## Environment Variables

Environment variables play a critical role in configuring the behavior of the AI Log Monitoring System. They allow you to customize log directories, adjust token limits, and set up necessary credentials. Create a `.env` file in the root of your project and populate it with the following variables:
```bash
# General Configuration
DEBUG=true # Enable detailed logging output for debugging
WEBHOOKURL=https://discord.com/api/webhooks/your-webhook-id # Discord webhook URL for sending alerts
MAX_CONTENT_LENGTH=2000 # Maximum length for content scraped from web pages
# AI Service Configuration
TIMEOUT_DURATION=100000 # Maximum duration (in ms) for API requests to complete
MAX_TOKENS=8000 # Maximum tokens allowed in a conversation before trimming
TOLERANCE=100 # Extra tokens allowed before forcing a trim of conversation history
```
**Explanation of Environment Variables**:
- **DEBUG**: Enables verbose logging, including debug-level messages. Useful during development and troubleshooting.
- **WEBHOOKURL**: The Discord webhook URL where alerts will be sent. This should be a valid and secure URL.
- **MAX_CONTENT_LENGTH**: Limits the length of content extracted from web pages to avoid overloading the system or sending excessively long messages.
- **TIMEOUT_DURATION**: Sets a timeout for requests to the AI service, preventing the system from hanging indefinitely.
- **MAX_TOKENS**: Caps the total number of tokens (chunks of text, each roughly a word or word fragment) allowed in a conversation with the AI. This prevents the conversation history from consuming excessive memory or processing time.
- **TOLERANCE**: A buffer to avoid hitting the MAX_TOKENS limit exactly, ensuring smoother operation by providing a cushion before trimming conversation history.
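A minimal sketch of how these variables might be read into a config object, with the defaults from the `.env` example above. The function name `loadConfig` and the object shape are illustrative; `ai_log_backend.js` may structure this differently (for example, via the `dotenv` package).

```javascript
// Read configuration from environment variables, falling back to the
// defaults documented above when a variable is unset or not numeric.
function loadConfig(env = process.env) {
  const num = (value, fallback) => {
    const n = parseInt(value, 10);
    return Number.isFinite(n) ? n : fallback;
  };
  return {
    debug: env.DEBUG === 'true',
    webhookUrl: env.WEBHOOKURL || '',
    maxContentLength: num(env.MAX_CONTENT_LENGTH, 2000),
    timeoutDuration: num(env.TIMEOUT_DURATION, 100000),
    maxTokens: num(env.MAX_TOKENS, 8000),
    tolerance: num(env.TOLERANCE, 100),
  };
}
```

Passing the environment in as an argument keeps the loader easy to test and makes the defaults explicit in one place.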
## Ignored IPs and Subnets

The system allows you to specify IP addresses and subnets that should be ignored during log monitoring. This is particularly useful for filtering out trusted sources or known harmless traffic (e.g., public DNS servers).
- **Ignored IPs**: Directly listed IP addresses that should be skipped during processing.
- **Ignored Subnets**: Subnets specified in CIDR notation that represent ranges of IP addresses to be ignored.
To customize the ignored IPs and subnets, modify the `ignoredIPs` and `ignoredSubnets` arrays in `ai_log.js`.
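A hedged sketch of what those arrays might look like, together with a small IPv4 matcher. The example entries (Google and Cloudflare DNS, private ranges) are illustrative; the actual lists and matching logic in `ai_log.js` may differ, for instance by using a library such as `ip-range-check`.

```javascript
// Illustrative ignore lists: individual trusted hosts plus CIDR ranges.
const ignoredIPs = ['8.8.8.8', '1.1.1.1'];
const ignoredSubnets = ['10.0.0.0/8', '192.168.0.0/16'];

// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToLong(ip) {
  return ip.split('.').reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

// Check whether an IP falls inside a CIDR range like "10.0.0.0/8".
function inSubnet(ip, cidr) {
  const [base, bits] = cidr.split('/');
  const mask = bits === '0' ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipToLong(ip) & mask) === (ipToLong(base) & mask);
}

// An IP is skipped if it is listed directly or covered by any subnet.
function isIgnored(ip) {
  return ignoredIPs.includes(ip) || ignoredSubnets.some((s) => inSubnet(ip, s));
}
```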
## Log Monitoring

The `ai_log.js` script uses the `Tail` module to monitor NGINX log files in real-time.
As new lines are added to the logs, the script reads and buffers them. The buffer size is configurable, allowing the system to batch-process logs before sending them to the backend.
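The buffering step can be sketched as follows. The class name, batch size, and wiring shown here are assumptions for illustration, not the exact implementation in `ai_log.js`:

```javascript
// Collect incoming log lines and flush them to the backend in batches.
const BUFFER_SIZE = 16; // lines per batch; tune for your traffic volume

class LogBuffer {
  constructor(onFlush, limit = BUFFER_SIZE) {
    this.lines = [];
    this.limit = limit;
    this.onFlush = onFlush; // called with the joined batch of lines
  }
  add(line) {
    this.lines.push(line);
    if (this.lines.length >= this.limit) this.flush();
  }
  flush() {
    if (this.lines.length === 0) return;
    this.onFlush(this.lines.join('\n'));
    this.lines = [];
  }
}

// With the `tail` npm package this would be wired up roughly as:
//   const { Tail } = require('tail');
//   const tail = new Tail('/dockerData/logs/access.log');
//   tail.on('line', (line) => buffer.add(line));
```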
- **Webhook-Based Alerts**: The system uses a Discord webhook to send alerts as embedded messages.
- **Formatted Messages**: Alerts are formatted with titles, descriptions, and timestamps to ensure they are easy to read and understand.
- **IP Banning Alerts**: When an IP is banned, the system includes the IP address in the Discord alert.
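The alert payload might be assembled as below. The embed fields (`title`, `description`, `timestamp`, `color`) follow Discord's webhook embed schema; the send step assumes Node 18+ for the global `fetch`, and the function names are illustrative:

```javascript
// Build a Discord webhook payload with a single formatted embed.
function buildAlert(title, description, bannedIP) {
  const embed = {
    title,
    description: bannedIP ? `${description}\nBanned IP: ${bannedIP}` : description,
    color: 0xff0000, // red, for security alerts
    timestamp: new Date().toISOString(),
  };
  return { embeds: [embed] };
}

// POST the payload to the webhook URL from the WEBHOOKURL variable.
async function sendAlert(webhookUrl, payload) {
  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
}
```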
## API Endpoints
The backend server (`ai_log_backend.js`) exposes several API endpoints for interacting with the AI service, managing conversation history, and controlling the system.
### POST /api/v1/chat
This endpoint processes incoming NGINX logs by sending them to the AI model for analysis.
- **Request Body**:
  - `message`: A string containing one or more NGINX log lines.
- **Response**:
- A JSON object with the AI's analysis and any detected alerts.
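A hedged example of calling this endpoint from a client. The host, port, and exact response shape are assumptions; `fetchImpl` defaults to Node 18+'s global `fetch` and can be stubbed for testing:

```javascript
// Send a batch of NGINX log lines to the backend for AI analysis.
async function analyzeLogs(message, baseUrl = 'http://localhost:3000', fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/v1/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`chat endpoint returned ${res.status}`);
  return res.json(); // parsed JSON with the AI's analysis and any alerts
}
```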
## Logging

Logging is a critical component of the AI Log Monitoring System, providing insights into system operations, debugging information, and records of security actions.
### Log Levels
The system categorizes logs into different levels to help you quickly identify the nature and severity of messages:
- **INFO**: General information about the system's operations.
- **WARN**: Indications of potential issues that may require attention.
- **ERROR**: Logs generated when an error occurs during processing.
- **SUCCESS**: Messages that indicate successful operations or actions.
- **DEBUG**: Detailed messages intended for debugging purposes, enabled when `DEBUG=true`.
### Log Structure
Each log message includes a timestamp, log level, and message content. For example:
```plaintext
2024-03-12 10:12:33 [INFO] Starting to read log from: /dockerData/logs/access.log
```
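A minimal sketch of a formatter producing lines in this structure; the actual helper in the scripts may differ:

```javascript
// Format a log line as "YYYY-MM-DD HH:MM:SS [LEVEL] message".
function formatLog(level, message, now = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const ts = `${now.getFullYear()}-${pad(now.getMonth() + 1)}-${pad(now.getDate())} ` +
             `${pad(now.getHours())}:${pad(now.getMinutes())}:${pad(now.getSeconds())}`;
  return `${ts} [${level}] ${message}`;
}

// DEBUG-level messages are emitted only when DEBUG=true is set.
function log(level, message) {
  if (level === 'DEBUG' && process.env.DEBUG !== 'true') return;
  console.log(formatLog(level, message));
}
```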
## IP Banning

To protect your infrastructure from repeated attacks, the system can automatically ban IP addresses identified as malicious.
The banning process is executed via shell commands, and the system includes a delay mechanism to prevent overloading the network with too many ban requests in a short period.
## Performance Optimization

The AI Log Monitoring System is designed to be efficient and scalable, handling large volumes of log data with minimal overhead. However, some optimizations can further enhance performance, especially in high-traffic environments.
### Managing Token Limits
By carefully managing the number of tokens in the AI's conversation history, the system prevents memory overuse and ensures faster response times. The `MAX_TOKENS` and `TOLERANCE` variables allow you to fine-tune this behavior.
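The trimming logic can be sketched as below. Token counts are approximated by whitespace-separated words here for simplicity; the real backend likely uses a proper tokenizer, and the function name is illustrative:

```javascript
const MAX_TOKENS = 8000;
const TOLERANCE = 100;

// Crude token estimate: count whitespace-separated words.
const countTokens = (text) => text.split(/\s+/).filter(Boolean).length;

// Drop the oldest messages until the history is back under the limit
// plus the tolerance cushion, keeping at least the most recent message.
function trimHistory(history, maxTokens = MAX_TOKENS, tolerance = TOLERANCE) {
  let total = history.reduce((sum, msg) => sum + countTokens(msg.content), 0);
  while (total > maxTokens + tolerance && history.length > 1) {
    total -= countTokens(history[0].content);
    history = history.slice(1);
  }
  return history;
}
```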
### Efficient Log Parsing
The system uses regular expressions and filtering techniques to parse and analyze log files efficiently. By focusing only on relevant log entries and ignoring unnecessary ones, the system reduces processing time and improves accuracy.
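As one example of this approach, the default NGINX "combined" log format can be parsed with a single regular expression; lines that do not match (and thus carry no useful fields) are skipped. The regex and field names below are illustrative, not necessarily what `ai_log.js` uses:

```javascript
// Match the leading fields of NGINX's default "combined" log format:
// ip - user [time] "METHOD path protocol" status bytes ...
const NGINX_COMBINED =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)/;

function parseLine(line) {
  const m = line.match(NGINX_COMBINED);
  if (!m) return null; // ignore lines that don't match the expected format
  return { ip: m[1], time: m[2], method: m[3], path: m[4], status: Number(m[5]) };
}
```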
## Troubleshooting
If you encounter issues while using the AI Log Monitoring System, this section provides guidance on common problems and how to resolve them.
### Common Issues
- **No Logs Detected**: Ensure that the log directory is correctly specified and that log files are present.
- **AI Service Unresponsive**: Restart the AI service using the `/api/v1/restart-core` endpoint.
- **Excessive False Positives**: Review the ignored IPs and subnets to ensure that known safe traffic is excluded.
### Restarting Services
If the AI service becomes unresponsive or you need to apply changes, use the `/api/v1/restart-core` endpoint to restart the core AI Docker container. This refreshes the model and clears any stale states.
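A hedged sketch of triggering the restart from a client. The HTTP method and host are assumptions — check the route definition in `ai_log_backend.js` — and `fetchImpl` defaults to Node 18+'s global `fetch`:

```javascript
// Ask the backend to restart the core AI Docker container.
async function restartCore(baseUrl = 'http://localhost:3000', fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/v1/restart-core`, { method: 'POST' });
  return res.ok;
}
```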
### Log Analysis
Review the logs generated by the system to identify potential issues. Focus on `ERROR` and `WARN` level messages to spot critical problems quickly. Use the `DEBUG` logs for deeper investigation during development or when troubleshooting specific issues.
## Customization
The AI Log Monitoring System is highly customizable, allowing you to tailor its behavior to your specific needs.
You can extend the log parsing functionality by adding new regular expressions or parsing logic to handle different log formats or detect new types of threats.
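One way to keep such extensions cheap is a pattern table that new rules can be appended to without touching the detection loop. The rule names and regexes below are illustrative examples, not the rules shipped in `ai_log.js`:

```javascript
// Table of named threat patterns; extend it to detect new attack types.
const threatPatterns = [
  { name: 'sql-injection', regex: /(\bUNION\b.*\bSELECT\b|'\s*OR\s*'1'\s*=\s*'1)/i },
  { name: 'path-traversal', regex: /\.\.[\/\\]/ },
];

// Register a new rule without modifying the detection logic.
function addPattern(name, regex) {
  threatPatterns.push({ name, regex });
}

// Return the names of all patterns that match a log line.
function detectThreats(line) {
  return threatPatterns.filter((p) => p.regex.test(line)).map((p) => p.name);
}
```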