<!-- lead -->

Monitoring containerized applications is essential for ensuring optimal performance, diagnosing issues promptly, and maintaining overall system health.
In a dynamic environment where containers can be spun up or down based on demand, a flexible and responsive monitoring solution becomes even more critical. This article delves into how we use the Netdata REST API to generate real-time, visually appealing graphs and an interactive dashboard for each container dynamically. By integrating technologies like Node.js, Express.js, Chart.js, Docker, and WebSockets, we create a seamless monitoring experience that provides deep insights into container performance metrics.

## Example Dynamic Page

https://ssh42113405732790.syscall.lol/
## Introduction

As containerization becomes the backbone of modern application deployment, monitoring solutions need to adapt to the ephemeral nature of containers. Traditional monitoring tools may not provide the granularity or real-time feedback necessary for containerized environments. Netdata, with its powerful real-time monitoring capabilities and RESTful API, offers a robust solution for collecting and accessing performance metrics. By leveraging the Netdata REST API, we can fetch detailed metrics about CPU usage, memory consumption, network traffic, disk I/O, and running processes within each container.

Our goal is to create an interactive dashboard that not only displays these metrics in real time but also gives users the ability to interact with the data, such as filtering processes or adjusting timeframes. To achieve this, we build a backend server that interfaces with the Netdata API, processes the data, and serves it to the frontend, where it's rendered using Chart.js and other web technologies.
## System Architecture

Understanding the system architecture is crucial to grasping how each component interacts to provide a cohesive monitoring solution. The architecture comprises several key components:

1. **Netdata Agent**: Installed on the host machine, it collects real-time performance metrics and exposes them via a RESTful API.
2. **Backend Server**: A Node.js application built with Express.js that serves as an intermediary between the Netdata API and the frontend clients.
3. **Interactive Dashboard**: A web interface that displays real-time graphs and system information, built using HTML, CSS, JavaScript, and libraries like Chart.js.
4. **Docker Integration**: Uses Dockerode, a Node.js Docker client, to interact with Docker containers, fetch process lists, and verify container existence.
5. **Proxy Server**: Routes incoming requests to the appropriate container's dashboard based on subdomain mapping.
6. **Discord Bot**: Allows users to request performance graphs directly from Discord, enhancing accessibility and user engagement.
### Data Flow

- The Netdata Agent continuously collects performance metrics and makes them available via its RESTful API.
- The Backend Server fetches data from the Netdata API based on requests from clients or scheduled intervals.
- The Interactive Dashboard requests data from the Backend Server, which processes and serves it in a format suitable for visualization.
- Docker Integration ensures that the system is aware of the running containers and can fetch container-specific data.
- The Proxy Server handles subdomain-based routing, directing users to the correct dashboard for their container.
- The Discord Bot interacts with the Backend Server to fetch graphs and sends them to users upon request.
## Backend Server Implementation

The backend server is the linchpin of our monitoring solution. It handles data fetching and processing, and serves as an API endpoint for the frontend dashboard and the Discord bot.

### Setting Up Express.js Server

We start by setting up an Express.js server that listens for incoming HTTP requests. The server is configured to handle Cross-Origin Resource Sharing (CORS) to allow requests from different origins, which is essential for serving the dashboard to users accessing it from various domains.
```javascript
const express = require('express');
const cors = require('cors');

const app = express();
const port = 6666;

app.use(cors()); // Enable CORS

app.listen(port, "0.0.0.0", () => {
  console.log(`Server running on http://localhost:${port}`);
});
```
### Interacting with Netdata API

To fetch metrics from Netdata, we define a function that constructs the appropriate API endpoints based on the container ID and the desired timeframe.
```javascript
const axios = require('axios');

const getEndpoints = (containerId, timeframe) => {
  const after = -(timeframe * 60); // Look back `timeframe` minutes (Netdata expects a negative offset in seconds)
  return {
    cpu: `http://netdata.local/api/v1/data?chart=cgroup_${containerId}.cpu&format=json&after=${after}`,
    memory: `http://netdata.local/api/v1/data?chart=cgroup_${containerId}.mem_usage&format=json&after=${after}`,
    // Additional endpoints for io, pids, network...
  };
};
```

We then define a function to fetch data for a specific metric:
```javascript
const fetchMetricData = async (metric, containerId, timeframe = 5) => {
  const endpoints = getEndpoints(containerId, timeframe);
  try {
    const response = await axios.get(endpoints[metric]);
    return response.data;
  } catch (error) {
    console.error(`Error fetching ${metric} data for container ${containerId}:`, error);
    throw new Error(`Failed to fetch ${metric} data.`);
  }
};
```
### Data Processing

Once we have the raw data from Netdata, we need to process it to extract timestamps and values suitable for graphing. The data returned by Netdata is typically in a time-series format, with each entry containing a timestamp and one or more metric values.
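For orientation, a `format=json` response from Netdata has roughly the following shape; the numbers below are purely illustrative, and the exact label set depends on the chart being queried.

```javascript
// Illustrative only: approximate shape of a Netdata /api/v1/data?format=json response.
// extractMetrics() below relies on entry[0] being the timestamp (epoch seconds)
// and entry[1..] being the metric values.
const sampleResponse = {
  labels: ['time', 'user', 'system'],
  data: [
    [1700000000, 1.2, 0.4],
    [1700000001, 1.5, 0.3],
  ],
};
```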
```javascript
const extractMetrics = (data, metric) => {
  const labels = data.data.map((entry) => new Date(entry[0] * 1000).toLocaleTimeString());
  let values;

  switch (metric) {
    case 'cpu':
    case 'memory':
    case 'pids':
      values = data.data.map(entry => entry[1]); // Adjust index based on metric specifics
      break;
    case 'io':
      values = {
        read: data.data.map(entry => entry[1]),
        write: data.data.map(entry => entry[2]),
      };
      break;
    case 'network':
      values = {
        received: data.data.map(entry => entry[1]),
        sent: data.data.map(entry => entry[2]),
      };
      break;
    default:
      values = [];
  }

  return { labels, values };
};
```
### Graph Generation with Chart.js

To generate graphs, we use the `chartjs-node-canvas` library, which allows us to render Chart.js graphs server-side and output them as images.
```javascript
const { ChartJSNodeCanvas } = require('chartjs-node-canvas');

const chartJSMetricCanvas = new ChartJSNodeCanvas({ width: 1900, height: 400, backgroundColour: 'black' });

const generateMetricGraph = async (metricData, labels, label, borderColor) => {
  const configuration = {
    type: 'line',
    data: {
      labels: labels,
      datasets: [{
        label: label,
        data: metricData,
        borderColor: borderColor,
        fill: false,
        tension: 0.1,
      }],
    },
    options: {
      scales: {
        x: {
          title: {
            display: true,
            text: 'Time',
            color: 'white',
          },
        },
        y: {
          title: {
            display: true,
            text: `${label} Usage`,
            color: 'white',
          },
        },
      },
      plugins: {
        legend: {
          labels: {
            color: 'white',
          },
        },
      },
    },
  };

  return chartJSMetricCanvas.renderToBuffer(configuration);
};
```

This function takes the metric data, labels, and graph styling options to produce a PNG image buffer of the graph, which can then be sent to clients or used in the dashboard.
### API Endpoints for Metrics

We define an API endpoint for each metric that clients can request. For example, the CPU usage endpoint:
```javascript
app.get('/api/graph/cpu/:containerId', async (req, res) => {
  const { containerId } = req.params;
  const timeframe = parseInt(req.query.timeframe) || 5;
  const format = req.query.format || 'graph';

  try {
    const data = await fetchMetricData('cpu', containerId, timeframe);
    if (format === 'json') {
      return res.json(data);
    }

    const { labels, values } = extractMetrics(data, 'cpu');
    const imageBuffer = await generateMetricGraph(values, labels, 'CPU Usage (%)', 'rgba(255, 99, 132, 1)');
    res.set('Content-Type', 'image/png');
    res.send(imageBuffer);
  } catch (error) {
    res.status(500).send(`Error generating CPU graph: ${error.message}`);
  }
});
```

Similar endpoints are created for memory, network, disk I/O, and PIDs.
### Full Report Generation

For users who want a comprehensive view of their container's performance, we offer a full report that combines all the individual graphs into one image.
```javascript
app.get('/api/graph/full-report/:containerId', async (req, res) => {
  // Fetch data for all metrics
  // Generate graphs for each metric
  // Combine graphs into a single image using Canvas
  // Send the final image to the client
});
```
By using the `canvas` package's `createCanvas` and `loadImage` functions, we can composite multiple graphs into a single image, adding titles and styling as needed.
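The combination step itself is elided above; as a rough sketch (the `combineGraphs` helper name and the fixed 1900×400 panel size are assumptions, not the original implementation), it might look like this:

```javascript
const { createCanvas, loadImage } = require('canvas');

// Stack previously rendered graph buffers vertically into one PNG.
const combineGraphs = async (graphBuffers) => {
  const width = 1900;
  const rowHeight = 400;
  const canvas = createCanvas(width, rowHeight * graphBuffers.length);
  const ctx = canvas.getContext('2d');

  for (let i = 0; i < graphBuffers.length; i++) {
    const image = await loadImage(graphBuffers[i]);
    ctx.drawImage(image, 0, i * rowHeight, width, rowHeight); // one graph per row
  }

  return canvas.toBuffer('image/png');
};
```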
## Interactive Dashboard

The interactive dashboard provides users with real-time insights into their container's performance. It is designed to be responsive, visually appealing, and informative.

### Live Data Updates

To achieve real-time updates, we use client-side JavaScript to periodically fetch the latest data from the backend server. We use `setInterval` to schedule data fetches every second, or at a longer interval based on performance considerations.
```html
<script>
  async function updateGraphs() {
    const response = await fetch(`/api/graph/full-report/${containerId}?format=json&timeframe=1`);
    const data = await response.json();
    // Update charts with new data
  }

  setInterval(updateGraphs, 1000);
</script>
```
### Chart.js Integration

We use Chart.js on the client side to render graphs directly in the browser. This allows for smooth animations and interactivity.
```javascript
const cpuChart = new Chart(cpuCtx, {
  type: 'line',
  data: {
    labels: [],
    datasets: [{
      label: 'CPU Usage (%)',
      data: [],
      borderColor: 'rgba(255, 99, 132, 1)',
      borderWidth: 2,
      pointRadius: 3,
      fill: false,
    }]
  },
  options: {
    animation: { duration: 500 },
    responsive: true,
    maintainAspectRatio: false,
    scales: {
      x: { grid: { color: 'rgba(255, 255, 255, 0.1)' } },
      y: { grid: { color: 'rgba(255, 255, 255, 0.1)' } }
    },
    plugins: { legend: { display: false } }
  }
});
```
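To tie this back to the `updateGraphs()` polling loop shown earlier, the freshly fetched samples can be pushed into the existing chart and redrawn. The `applyCpuUpdate` helper below is illustrative, and the payload shape is assumed to mirror what `extractMetrics()` produces on the server.

```javascript
// Illustrative helper: push new labels/values into an existing Chart.js instance.
function applyCpuUpdate(chart, labels, values) {
  chart.data.labels = labels;            // timestamps from the backend
  chart.data.datasets[0].data = values;  // CPU usage samples
  chart.update();                        // redraw (uses the 500 ms animation configured above)
}
```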
### Process List Display

An essential aspect of container monitoring is understanding what processes are running inside the container. We fetch the process list using Docker's API and display it in a searchable table.
```javascript
// Backend endpoint
app.get('/api/processes/:containerId', async (req, res) => {
  const { containerId } = req.params;
  try {
    const container = docker.getContainer(containerId);
    const processes = await container.top();
    res.json(processes.Processes || []);
  } catch (err) {
    console.error(`Error fetching processes for container ${containerId}:`, err);
    res.status(500).json({ error: 'Failed to fetch processes' });
  }
});

// Client-side function to update the process list
async function updateProcessList() {
  const processResponse = await fetch(`/api/processes/${containerId}`);
  const processList = await processResponse.json();
  // Render the process list in the table
}
```
We enhance the user experience by adding a search box that allows users to filter the processes by PID, user, or command.
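A minimal client-side filter for that table might look like the following; the `#process-search` input and `#process-table` element IDs are placeholders, not the dashboard's actual markup.

```javascript
// Hide table rows whose text does not contain the current search term.
document.querySelector('#process-search').addEventListener('input', (event) => {
  const term = event.target.value.toLowerCase();
  document.querySelectorAll('#process-table tbody tr').forEach((row) => {
    // Each row holds the PID, user, and command cells rendered by updateProcessList().
    row.style.display = row.textContent.toLowerCase().includes(term) ? '' : 'none';
  });
});
```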
### Visual Enhancements

To make the dashboard more engaging, we incorporate visual elements like particle effects using libraries such as `particles.js`. We also apply a dark theme with styling that emphasizes the data visualizations.
```css
body {
  background-color: #1c1c1c;
  color: white;
  font-family: Arial, sans-serif;
}
```
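For the particle background mentioned above, initialization can be as small as the sketch below, assuming `particles.min.js` is already loaded on the page, a `<div id="particles-js">` placeholder exists, and a config file is served at `/particles.json` (all illustrative).

```javascript
// Load a particles.js preset into the #particles-js container.
particlesJS.load('particles-js', '/particles.json', () => {
  console.log('particles.js background initialized');
});
```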
### Responsive Design

Using Bootstrap and custom CSS, we ensure that the dashboard is responsive and accessible on various devices and screen sizes.
```html
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css" rel="stylesheet">
<div class="container mt-4">
  <!-- Dashboard content -->
</div>
```
## Docker Integration

Docker plays a pivotal role in our system, not just for running the containers but also for providing data about them.

### Fetching Container Information

We use the `dockerode` library to interact with Docker:
```javascript
const Docker = require('dockerode');
const docker = new Docker();

async function containerExists(subdomain) {
  try {
    const containers = await docker.listContainers();
    return containers.some(container => container.Names.some(name => name.includes(subdomain)));
  } catch (error) {
    console.error(`Error checking Docker for subdomain ${subdomain}:`, error.message);
    return false;
  }
}
```
This function checks whether a container corresponding to a subdomain exists, which is essential for routing and security purposes.

### Fetching Process Lists

As mentioned earlier, we can retrieve the list of processes running inside a container:
```javascript
const container = docker.getContainer(containerId);
const processes = await container.top();
```
This allows us to display detailed information about what's happening inside the container, which can be invaluable for debugging and monitoring.

## Proxy Server for Web UI

To provide users with a seamless experience, we set up a proxy server that routes requests to the appropriate container's dashboard based on subdomains.

### Subdomain-Based Routing

We parse the incoming request's hostname to extract the subdomain, which corresponds to a container ID.
```javascript
app.use(async (req, res, next) => {
  const host = req.hostname;
  let subdomain = host.split('.')[0].toUpperCase();

  if (!subdomain || ['LOCALHOST', 'WWW', 'SYSCALL'].includes(subdomain)) {
    return res.redirect('https://discord-linux.com');
  }

  const exists = await containerExists(subdomain);
  if (!exists) {
    return res.redirect('https://discord-linux.com');
  }

  // Proceed to proxy the request
});
```
### Proxying Requests

Using `http-proxy-middleware`, we forward the requests to the backend server's live dashboard endpoint:
```javascript
const { createProxyMiddleware } = require('http-proxy-middleware');

createProxyMiddleware({
  target: `https://g.syscall.lol/full-report/${subdomain}`,
  changeOrigin: true,
  pathRewrite: {
    '^/': '/live', // Rewrite the root path to /live
  }
})(req, res, next);
```

This setup allows users to access their container's dashboard by visiting a URL like `https://SSH42113405732790.syscall.lol`, where `SSH42113405732790` is the container ID.
## Discord Bot Integration

To make the monitoring solution more accessible, we integrate a Discord bot that allows users to request graphs and reports directly within Discord.

### Command Handling

We define a `graph` command that users can invoke to get performance graphs:
```javascript
module.exports = {
  name: "graph",
  description: "Retrieves a graph report for your container.",
  options: [
    // Command options for report type, timeframe, etc.
  ],
  run: async (client, interaction) => {
    // Command implementation
  },
};
```
### User Authentication

We authenticate users by matching their Discord ID with the container IDs stored in our database:
```javascript
let sshSurfID;
connection.query(
  "SELECT uid FROM users WHERE discord_id = ?",
  [interaction.user.id],
  (err, results) => {
    if (err) {
      console.error("Database lookup failed:", err);
      return interaction.editReply("Sorry, something went wrong while looking up your container.");
    }
    if (results.length === 0) {
      interaction.editReply("Sorry, you do not have a container associated with your account.");
    } else {
      sshSurfID = results[0].uid;
    }
  }
);
```
### Fetching and Sending Graphs

Once we have the user's container ID, we fetch the graph image from the backend server and send it as a reply in Discord:
```javascript
const apiUrl = `https://g.syscall.lol/${reportType}/${sshSurfID}?timeframe=${timeframe}`;
const response = await axios.get(apiUrl, { responseType: 'stream' });

// Send the image in the reply
await interaction.editReply({
  files: [{
    attachment: response.data,
    name: `${reportType}_graph.png`
  }]
});
```

This integration provides users with an easy way to monitor their containers without leaving Discord.
## Security Considerations

When building a monitoring system, especially one that exposes container data over the network, security is paramount.

### Access Control

We ensure that only authenticated users can access the data for their containers. This involves:

- Verifying container existence and ownership before serving data.
- Using secure communication protocols (HTTPS) to encrypt data in transit.
- Implementing proper authentication mechanisms in the backend server and Discord bot.
### Input Validation

We sanitize and validate all inputs, such as container IDs, to prevent injection attacks and unauthorized access.
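For example, container IDs can be checked against a strict allow-list pattern before they reach the Netdata or Docker lookups. The alphanumeric format assumed below is based on the example IDs shown earlier, and `app.param` is just one of several places such a check could live.

```javascript
// Reject anything that is not a plain alphanumeric ID of reasonable length.
const isValidContainerId = (id) => /^[A-Za-z0-9]{1,64}$/.test(id);

app.param('containerId', (req, res, next, containerId) => {
  if (!isValidContainerId(containerId)) {
    return res.status(400).json({ error: 'Invalid container ID' });
  }
  next(); // Fall through to the existing route handlers
});
```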
### Rate Limiting

To protect against Denial of Service (DoS) attacks, we can implement rate limiting on API endpoints.
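One straightforward option is the `express-rate-limit` middleware; the one-minute window and 60-request cap below are illustrative values, not settings from the original deployment.

```javascript
const rateLimit = require('express-rate-limit');

// Cap each client IP at 60 requests per minute on the graph API.
const graphLimiter = rateLimit({
  windowMs: 60 * 1000, // 1-minute window
  max: 60,             // requests per window per IP
});

app.use('/api/', graphLimiter);
```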
## Performance Optimizations

To ensure the system performs well under load, we implement several optimizations:

- **Caching**: Cache frequently requested data to reduce load on the Netdata Agent and backend server (see the sketch after this list).
- **Efficient Data Structures**: Use efficient data structures and algorithms for data processing.
- **Asynchronous Operations**: Utilize asynchronous programming to prevent blocking operations.
- **Load Balancing**: Distribute incoming requests across multiple instances of the backend server if needed.
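As a sketch of the caching idea (the 5-second TTL and the in-memory `Map` are illustrative choices, not the production configuration), requests can be memoized around the existing `fetchMetricData` helper:

```javascript
// Memoize Netdata responses keyed by metric, container, and timeframe.
const cache = new Map();
const TTL_MS = 5000; // illustrative: serve cached data for up to 5 seconds

const fetchMetricDataCached = async (metric, containerId, timeframe = 5) => {
  const key = `${metric}:${containerId}:${timeframe}`;
  const hit = cache.get(key);
  if (hit && Date.now() - hit.timestamp < TTL_MS) {
    return hit.data; // Skip the round trip to the Netdata Agent
  }

  const data = await fetchMetricData(metric, containerId, timeframe);
  cache.set(key, { data, timestamp: Date.now() });
  return data;
};
```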
## Future Enhancements

There are several areas where we can expand and improve the monitoring solution:

- **Alerting Mechanisms**: Integrate alerting to notify users of critical events or thresholds being exceeded.
- **Historical Data Analysis**: Store metrics over longer periods for trend analysis and capacity planning.
- **Custom Metrics**: Allow users to define custom metrics or integrate with application-level monitoring.
- **Mobile Accessibility**: Optimize the dashboard for mobile devices or create a dedicated mobile app.
## My Thoughts

By leveraging the Netdata REST API and integrating it with modern web technologies, we have built a dynamic and interactive monitoring solution tailored for containerized environments. The combination of real-time data visualization, user-friendly interfaces, and accessibility through platforms like Discord empowers users to maintain and optimize their applications effectively.

This approach showcases the power of combining open-source tools and technologies to solve complex monitoring challenges in a scalable and efficient manner. As containerization continues to evolve, such solutions will become increasingly vital in managing and understanding the performance of distributed applications.

*Note: The code snippets provided are simplified for illustrative purposes. In a production environment, additional error handling, security measures, and optimizations should be implemented.*