Harnessing Netdata REST API for Dynamic Container Monitoring and Visualization

2024-10-13 04:05:50 -04:00

Monitoring containerized applications is essential for ensuring optimal performance, diagnosing issues promptly, and maintaining overall system health.

In a dynamic environment where containers can be spun up or down based on demand, having a flexible and responsive monitoring solution becomes even more critical. This article delves into how I utilize the Netdata REST API to generate real-time, visually appealing graphs and an interactive dashboard for each container dynamically. By integrating technologies like Node.js, Express.js, Chart.js, Docker, and web sockets, I create a seamless monitoring experience that provides deep insights into container performance metrics.

Example Dynamic Page

https://ssh42113405732790.syscall.lol/

Introduction

As containerization becomes the backbone of modern application deployment, monitoring solutions need to adapt to the ephemeral nature of containers. Traditional monitoring tools may not provide the granularity or real-time feedback necessary for containerized environments. Netdata, with its powerful real-time monitoring capabilities and RESTful API, offers a robust solution for collecting and accessing performance metrics. By leveraging the Netdata REST API, I can fetch detailed metrics about CPU usage, memory consumption, network traffic, disk I/O, and running processes within each container.

Our goal is to create an interactive dashboard that not only displays these metrics in real-time but also provides users with the ability to interact with the data, such as filtering processes or adjusting timeframes. To achieve this, I build a backend server that interfaces with the Netdata API, processes the data, and serves it to the frontend where it's rendered using Chart.js and other web technologies.

System Architecture

Understanding the system architecture is crucial to grasp how each component interacts to provide a cohesive monitoring solution. The architecture comprises several key components:

  1. Netdata Agent: Installed on the host machine, it collects real-time performance metrics and exposes them via a RESTful API.
  2. Backend Server: A Node.js application built with Express.js that serves as an intermediary between the Netdata API and the frontend clients.
  3. Interactive Dashboard: A web interface that displays real-time graphs and system information, built using HTML, CSS, JavaScript, and libraries like Chart.js.
  4. Docker Integration: Utilizing Dockerode, a Node.js Docker client, to interact with Docker containers, fetch process lists, and verify container existence.
  5. Proxy Server: Routes incoming requests to the appropriate container's dashboard based on subdomain mapping.
  6. Discord Bot: Allows users to request performance graphs directly from Discord, enhancing accessibility and user engagement.

Data Flow

  • The Netdata Agent continuously collects performance metrics and makes them available via its RESTful API.
  • The Backend Server fetches data from the Netdata API based on requests from clients or scheduled intervals.
  • The Interactive Dashboard requests data from the Backend Server, which processes and serves it in a format suitable for visualization.
  • Docker Integration ensures that the system is aware of the running containers and can fetch container-specific data.
  • The Proxy Server handles subdomain-based routing, directing users to the correct dashboard for their container.
  • The Discord Bot interacts with the Backend Server to fetch graphs and sends them to users upon request.
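The first hop in this data flow can be sketched as a single request to the Agent's data API. The host name below is an illustrative assumption; the `/api/v1/data` endpoint returns a JSON payload of the form `{ labels: [...], data: [[timestamp, value, ...], ...] }` with timestamps in epoch seconds, which a small helper can split into parallel arrays for graphing:

```javascript
// Illustrative Netdata host; adjust to your deployment.
const NETDATA_HOST = 'http://netdata.local:19999';

// Netdata's /api/v1/data returns { labels: [...], data: [[ts, v1, ...], ...] }.
// Split the rows into parallel timestamp/value arrays for charting.
function toSeries(payload) {
  return {
    timestamps: payload.data.map((row) => row[0]),
    values: payload.data.map((row) => row[1]),
  };
}

async function fetchChart(chart, afterSeconds = 60) {
  const axios = require('axios'); // lazily required; assumed installed
  const url = `${NETDATA_HOST}/api/v1/data?chart=${chart}&format=json&after=-${afterSeconds}`;
  const { data } = await axios.get(url);
  return toSeries(data);
}
```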

Backend Server Implementation

The backend server is the linchpin of our monitoring solution. It handles data fetching, processing, and serves as an API endpoint for the frontend dashboard and the Discord bot.

Setting Up Express.js Server

I start by setting up an Express.js server that listens for incoming HTTP requests. The server is configured to handle Cross-Origin Resource Sharing (CORS) to allow requests from different origins, which is essential for serving the dashboard to users accessing it from various domains.

const express = require('express');
const cors = require('cors');
const app = express();
const port = 6666;

app.use(cors()); // Enable CORS
app.listen(port, "0.0.0.0", () => {
  console.log(`Server running on http://localhost:${port}`);
});

Interacting with Netdata API

To fetch metrics from Netdata, I define a function that constructs the appropriate API endpoints based on the container ID and the desired timeframe.

const axios = require('axios');

const getEndpoints = (containerId, timeframe) => {
  const after = -(timeframe * 60); // Timeframe in seconds
  return {
    cpu: `http://netdata.local/api/v1/data?chart=cgroup_${containerId}.cpu&format=json&after=${after}`,
    memory: `http://netdata.local/api/v1/data?chart=cgroup_${containerId}.mem_usage&format=json&after=${after}`,
    // Additional endpoints for io, pids, network...
  };
};

I then define a function to fetch data for a specific metric:

const fetchMetricData = async (metric, containerId, timeframe = 5) => {
  const endpoints = getEndpoints(containerId, timeframe);
  try {
    const response = await axios.get(endpoints[metric]);
    return response.data;
  } catch (error) {
    console.error(`Error fetching ${metric} data for container ${containerId}:`, error);
    throw new Error(`Failed to fetch ${metric} data.`);
  }
};

Data Processing

Once I have the raw data from Netdata, I need to process it to extract timestamps and values suitable for graphing. The data returned by Netdata is typically in a time-series format, with each entry containing a timestamp and one or more metric values.

const extractMetrics = (data, metric) => {
  const labels = data.data.map((entry) => new Date(entry[0] * 1000).toLocaleTimeString());
  let values;

  switch (metric) {
    case 'cpu':
    case 'memory':
    case 'pids':
      values = data.data.map(entry => entry[1]); // Adjust index based on metric specifics
      break;
    case 'io':
      values = {
        read: data.data.map(entry => entry[1]),
        write: data.data.map(entry => entry[2]),
      };
      break;
    case 'network':
      values = {
        received: data.data.map(entry => entry[1]),
        sent: data.data.map(entry => entry[2]),
      };
      break;
    default:
      values = [];
  }

  return { labels, values };
};

Graph Generation with Chart.js

To generate graphs, I use the chartjs-node-canvas library, which allows us to render Chart.js graphs server-side and output them as images.

const { ChartJSNodeCanvas } = require('chartjs-node-canvas');
const chartJSMetricCanvas = new ChartJSNodeCanvas({ width: 1900, height: 400, backgroundColour: 'black' });

const generateMetricGraph = async (metricData, labels, label, borderColor) => {
  const configuration = {
    type: 'line',
    data: {
      labels: labels,
      datasets: [{
        label: label,
        data: metricData,
        borderColor: borderColor,
        fill: false,
        tension: 0.1,
      }],
    },
    options: {
      scales: {
        x: {
          title: {
            display: true,
            text: 'Time',
            color: 'white',
          },
        },
        y: {
          title: {
            display: true,
            text: `${label} Usage`,
            color: 'white',
          },
        },
      },
      plugins: {
        legend: {
          labels: {
            color: 'white',
          },
        },
      },
    },
  };

  return chartJSMetricCanvas.renderToBuffer(configuration);
};

This function takes the metric data, labels, and graph styling options to produce a PNG image buffer of the graph, which can then be sent to clients or used in the dashboard.

API Endpoints for Metrics

I define API endpoints for each metric that clients can request. For example, the CPU usage endpoint:

app.get('/api/graph/cpu/:containerId', async (req, res) => {
  const { containerId } = req.params;
  const timeframe = parseInt(req.query.timeframe) || 5;
  const format = req.query.format || 'graph';

  try {
    const data = await fetchMetricData('cpu', containerId, timeframe);
    if (format === 'json') {
      return res.json(data);
    }

    const { labels, values } = extractMetrics(data, 'cpu');
    const imageBuffer = await generateMetricGraph(values, labels, 'CPU Usage (%)', 'rgba(255, 99, 132, 1)');
    res.set('Content-Type', 'image/png');
    res.send(imageBuffer);
  } catch (error) {
    res.status(500).send(`Error generating CPU graph: ${error.message}`);
  }
});

Similar endpoints are created for memory, network, disk I/O, and PIDs.
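Since those handlers differ only in labels and colors, the remaining routes can be stamped out by a small factory. This is a hedged sketch, not the exact production code: the `fetchData`, `extract`, and `render` dependencies are injected so the example stands alone, and the label/color values are illustrative.

```javascript
// Per-metric display settings (values here are illustrative assumptions).
const METRICS = {
  memory: { label: 'Memory Usage (MB)', color: 'rgba(54, 162, 235, 1)' },
  network: { label: 'Network Traffic (kb/s)', color: 'rgba(75, 192, 192, 1)' },
  io: { label: 'Disk I/O (KiB/s)', color: 'rgba(255, 206, 86, 1)' },
  pids: { label: 'Process Count', color: 'rgba(153, 102, 255, 1)' },
};

// Returns an Express-style handler; the data-fetching, extraction, and
// rendering functions are passed in rather than hard-coded.
function makeGraphEndpoint(metric, { fetchData, extract, render }) {
  return async (req, res) => {
    const { containerId } = req.params;
    const timeframe = parseInt(req.query.timeframe, 10) || 5;
    try {
      const data = await fetchData(metric, containerId, timeframe);
      if (req.query.format === 'json') return res.json(data);
      const { labels, values } = extract(data, metric);
      const { label, color } = METRICS[metric];
      const image = await render(values, labels, label, color);
      res.set('Content-Type', 'image/png');
      res.send(image);
    } catch (err) {
      res.status(500).send(`Error generating ${metric} graph: ${err.message}`);
    }
  };
}

// Registration then becomes one line per metric, e.g.:
// app.get('/api/graph/memory/:containerId', makeGraphEndpoint('memory', deps));
```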

Full Report Generation

For users who want a comprehensive view of their container's performance, I offer a full report that combines all the individual graphs into one image.

app.get('/api/graph/full-report/:containerId', async (req, res) => {
  // Fetch data for all metrics
  // Generate graphs for each metric
  // Combine graphs into a single image using Canvas
  // Send the final image to the client
});

By using the canvas and loadImage modules, I can composite multiple graphs into a single image, adding titles and styling as needed.
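A minimal sketch of that compositing step might look like the following, assuming the `canvas` package is installed. The panel geometry is computed by a separate pure helper so the layout arithmetic is easy to follow; exact dimensions and the title text are illustrative.

```javascript
// Compute where each metric panel lands when stacked vertically
// under a title band.
function computeLayout(count, panelWidth, panelHeight, titleHeight = 60) {
  const positions = [];
  for (let i = 0; i < count; i++) {
    positions.push({ x: 0, y: titleHeight + i * panelHeight });
  }
  return {
    width: panelWidth,
    height: titleHeight + count * panelHeight,
    positions,
  };
}

// Stack the per-metric PNG buffers onto one canvas and return a single PNG.
async function compositeReport(buffers, panelWidth = 1900, panelHeight = 400) {
  const { createCanvas, loadImage } = require('canvas'); // lazy require
  const layout = computeLayout(buffers.length, panelWidth, panelHeight);
  const canvas = createCanvas(layout.width, layout.height);
  const ctx = canvas.getContext('2d');
  ctx.fillStyle = 'black';
  ctx.fillRect(0, 0, layout.width, layout.height);
  ctx.fillStyle = 'white';
  ctx.font = '30px Arial';
  ctx.fillText('Container Full Report', 20, 40);
  for (let i = 0; i < buffers.length; i++) {
    const img = await loadImage(buffers[i]);
    ctx.drawImage(img, layout.positions[i].x, layout.positions[i].y);
  }
  return canvas.toBuffer('image/png');
}
```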

Interactive Dashboard

The interactive dashboard provides users with real-time insights into their container's performance. It is designed to be responsive, visually appealing, and informative.

Live Data Updates

To achieve real-time updates, I use client-side JavaScript to periodically fetch the latest data from the backend server. I use setInterval to schedule data fetches every second or at a suitable interval based on performance considerations.

<script>
  async function updateGraphs() {
    const response = await fetch(`/api/graph/full-report/${containerId}?format=json&timeframe=1`);
    const data = await response.json();
    // Update charts with new data
  }

  setInterval(updateGraphs, 1000);
</script>

Chart.js Integration

I use Chart.js on the client side to render graphs directly in the browser. This allows for smooth animations and interactivity.

const cpuChart = new Chart(cpuCtx, {
  type: 'line',
  data: {
    labels: [],
    datasets: [{
      label: 'CPU Usage (%)',
      data: [],
      borderColor: 'rgba(255, 99, 132, 1)',
      borderWidth: 2,
      pointRadius: 3,
      fill: false,
    }]
  },
  options: {
    animation: { duration: 500 },
    responsive: true,
    maintainAspectRatio: false,
    scales: {
      x: { grid: { color: 'rgba(255, 255, 255, 0.1)' } },
      y: { grid: { color: 'rgba(255, 255, 255, 0.1)' } }
    },
    plugins: { legend: { display: false } }
  }
});
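Inside the polling loop, each new sample is appended to the chart and old points are trimmed so the x-axis behaves as a fixed-width sliding window. A small helper along these lines (the wiring comment below is illustrative) keeps that logic in one place:

```javascript
// Append the newest sample to a Chart.js-style chart object and drop the
// oldest points once the window is full, so the graph scrolls left.
function appendPoint(chart, label, value, maxPoints = 60) {
  chart.data.labels.push(label);
  chart.data.datasets[0].data.push(value);
  while (chart.data.labels.length > maxPoints) {
    chart.data.labels.shift();
    chart.data.datasets[0].data.shift();
  }
}

// In the browser this is driven by the poll, e.g.:
// appendPoint(cpuChart, new Date().toLocaleTimeString(), latestCpu);
// cpuChart.update();
```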

Process List Display

An essential aspect of container monitoring is understanding what processes are running inside the container. I fetch the process list using Docker's API and display it in a searchable table.

// Backend endpoint
app.get('/api/processes/:containerId', async (req, res) => {
  const { containerId } = req.params;
  try {
    const container = docker.getContainer(containerId);
    const processes = await container.top();
    res.json(processes.Processes || []);
  } catch (err) {
    console.error(`Error fetching processes for container ${containerId}:`, err);
    res.status(500).json({ error: 'Failed to fetch processes' });
  }
});

// Client-side function to update the process list
async function updateProcessList() {
  const processResponse = await fetch(`/api/processes/${containerId}`);
  const processList = await processResponse.json();
  // Render the process list in the table
}

I enhance the user experience by adding a search box that allows users to filter the processes by PID, user, or command.
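The filter itself can be a simple case-insensitive match over the rows returned by `container.top()`. This sketch assumes Docker's default column order of `[UID, PID, PPID, C, STIME, TTY, TIME, CMD]`:

```javascript
// Keep only process rows whose PID, user, or command matches the query
// (case-insensitive). An empty query returns everything.
function filterProcesses(rows, query) {
  const q = query.trim().toLowerCase();
  if (!q) return rows;
  return rows.filter((row) => {
    const [user, pid, , , , , , cmd] = row;
    return (
      String(pid).includes(q) ||
      user.toLowerCase().includes(q) ||
      cmd.toLowerCase().includes(q)
    );
  });
}
```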

Visual Enhancements

To make the dashboard more engaging, I incorporate visual elements like particle effects using libraries like particles.js. I also apply a dark theme with styling that emphasizes the data visualizations.

body {
  background-color: #1c1c1c;
  color: white;
  font-family: Arial, sans-serif;
}

Responsive Design

Using Bootstrap and custom CSS, I ensure that the dashboard is responsive and accessible on various devices and screen sizes.

<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css" rel="stylesheet">
<div class="container mt-4">
  <!-- Dashboard content -->
</div>

Docker Integration

Docker plays a pivotal role in our system, not just for running the containers but also for providing data about them.

Fetching Container Information

I use the dockerode library to interact with Docker:

const Docker = require('dockerode');
const docker = new Docker();

async function containerExists(subdomain) {
  try {
    const containers = await docker.listContainers();
    return containers.some(container => container.Names.some(name => name.includes(subdomain)));
  } catch (error) {
    console.error(`Error checking Docker for subdomain ${subdomain}:`, error.message);
    return false;
  }
}

This function checks whether a container corresponding to a subdomain exists, which is essential for routing and security purposes.

Fetching Process Lists

As mentioned earlier, I can retrieve the list of processes running inside a container:

const container = docker.getContainer(containerId);
const processes = await container.top();

This allows us to display detailed information about what's happening inside the container, which can be invaluable for debugging and monitoring.

Proxy Server for web UI

To provide users with a seamless experience, I set up a proxy server that routes requests to the appropriate container dashboards based on subdomains.

Subdomain-Based Routing

I parse the incoming request's hostname to extract the subdomain, which corresponds to a container ID.

app.use(async (req, res, next) => {
  const host = req.hostname;
  let subdomain = host.split('.')[0].toUpperCase();

  if (!subdomain || ['LOCALHOST', 'WWW', 'SYSCALL'].includes(subdomain)) {
    return res.redirect('https://discord-linux.com');
  }

  const exists = await containerExists(subdomain);
  if (!exists) {
    return res.redirect('https://discord-linux.com');
  }

  // Proceed to proxy the request
});

Proxying Requests

Using http-proxy-middleware, I forward the requests to the backend server's live dashboard endpoint:

const { createProxyMiddleware } = require('http-proxy-middleware');

createProxyMiddleware({
  target: `https://g.syscall.lol/full-report/${subdomain}`,
  changeOrigin: true,
  pathRewrite: {
    '^/': '/live',  // Rewrite the root path to /live
  }
})(req, res, next);

This setup allows users to access their container's dashboard by visiting a URL like https://SSH42113405732790.syscall.lol, where SSH42113405732790 is the container ID.

Discord Bot Integration

To make the monitoring solution more accessible, I integrate a Discord bot that allows users to request graphs and reports directly within Discord.

Command Handling

I define a graph command that users can invoke to get performance graphs:

module.exports = {
  name: "graph",
  description: "Retrieves a graph report for your container.",
  options: [
    // Command options for report type, timeframe, etc.
  ],
  run: async (client, interaction) => {
    // Command implementation
  },
};

User Authentication

I authenticate users by matching their Discord ID with the container IDs stored in our database:

let sshSurfID;
connection.query(
  "SELECT uid FROM users WHERE discord_id = ?",
  [interaction.user.id],
  (err, results) => {
    if (results.length === 0) {
      interaction.editReply("Sorry, you do not have a container associated with your account.");
    } else {
      sshSurfID = results[0].uid;
    }
  }
);

Fetching and Sending Graphs

Once I have the user's container ID, I fetch the graph image from the backend server and send it as a reply in Discord:

const apiUrl = `https://g.syscall.lol/${reportType}/${sshSurfID}?timeframe=${timeframe}`;
const response = await axios.get(apiUrl, { responseType: 'stream' });
// Send the image in the reply
await interaction.editReply({
  files: [{
    attachment: response.data,
    name: `${reportType}_graph.png`
  }]
});

This integration provides users with an easy way to monitor their containers without leaving Discord.

Security Considerations

When building a monitoring system, especially one that exposes container data over the network, security is paramount.

Access Control

I ensure that only authenticated users can access the data for their containers. This involves:

  • Verifying container existence and ownership before serving data.
  • Using secure communication protocols (HTTPS) to encrypt data in transit.
  • Implementing proper authentication mechanisms in the backend server and Discord bot.

Input Validation

I sanitize and validate all inputs, such as container IDs, to prevent injection attacks and unauthorized access.
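A minimal allow-list check is usually enough here. The exact pattern below is an assumption based on container IDs like `SSH42113405732790` (alphanumeric, bounded length); anything that fails it is rejected before touching Docker or Netdata:

```javascript
// Allow-list validation for container IDs taken from URLs or subdomains:
// alphanumeric only, 1-64 characters (pattern is an assumption).
const CONTAINER_ID_RE = /^[A-Za-z0-9]{1,64}$/;

function isValidContainerId(id) {
  return typeof id === 'string' && CONTAINER_ID_RE.test(id);
}

// Used as an early guard in each route:
// if (!isValidContainerId(req.params.containerId)) {
//   return res.status(400).send('Bad container ID');
// }
```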

Rate Limiting

To protect against Denial of Service (DoS) attacks, I can implement rate limiting on API endpoints.
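As a sketch of the idea (in production a maintained middleware such as express-rate-limit would be the usual choice), a fixed-window limiter only needs to track recent request timestamps per client key:

```javascript
// Minimal fixed-window rate limiter: allow at most `max` requests per
// `windowMs` per key (e.g. client IP). Returns a predicate function.
function createRateLimiter({ windowMs = 60000, max = 60 } = {}) {
  const hits = new Map(); // key -> array of request timestamps

  return function isAllowed(key, now = Date.now()) {
    const recent = (hits.get(key) || []).filter((t) => now - t < windowMs);
    if (recent.length >= max) {
      hits.set(key, recent);
      return false;
    }
    recent.push(now);
    hits.set(key, recent);
    return true;
  };
}

// As Express middleware:
// const allowed = createRateLimiter({ windowMs: 60000, max: 120 });
// app.use((req, res, next) =>
//   allowed(req.ip) ? next() : res.status(429).send('Too many requests'));
```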

Performance Optimizations

To ensure the system performs well under load, I implement several optimizations:

  • Caching: Cache frequently requested data to reduce load on the Netdata Agent and backend server.
  • Efficient Data Structures: Use efficient data structures and algorithms for data processing.
  • Asynchronous Operations: Utilize asynchronous programming to prevent blocking operations.
  • Load Balancing: Distribute incoming requests across multiple instances of the backend server if needed.
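The caching point can be sketched as a small TTL memoizer wrapped around any async fetcher, so repeated dashboard polls within the TTL never reach the Netdata Agent (the 2-second TTL and the wiring comment are illustrative):

```javascript
// Memoize an async function with a time-to-live: identical calls within
// ttlMs return the cached value instead of re-fetching.
function cached(fn, ttlMs = 2000, now = Date.now) {
  const store = new Map(); // key -> { at, value }

  return async function (...args) {
    const key = JSON.stringify(args);
    const hit = store.get(key);
    if (hit && now() - hit.at < ttlMs) return hit.value;
    const value = await fn(...args);
    store.set(key, { at: now(), value });
    return value;
  };
}

// e.g. const cachedFetchMetricData = cached(fetchMetricData, 2000);
```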

Future Enhancements

There are several areas where I can expand and improve the monitoring solution:

  • Alerting Mechanisms: Integrate alerting to notify users of critical events or thresholds being exceeded.
  • Historical Data Analysis: Store metrics over longer periods for trend analysis and capacity planning.
  • Custom Metrics: Allow users to define custom metrics or integrate with application-level monitoring.
  • Mobile Accessibility: Optimize the dashboard for mobile devices or create a dedicated mobile app.

My Thoughts

By leveraging the Netdata REST API and integrating it with modern web technologies, I have built a dynamic and interactive monitoring solution tailored for containerized environments. The combination of real-time data visualization, user-friendly interfaces, and accessibility through platforms like Discord empowers users to maintain and optimize their applications effectively.

This approach showcases the power of combining open-source tools and technologies to solve complex monitoring challenges in a scalable and efficient manner. As containerization continues to evolve, such solutions will become increasingly vital in managing and understanding the performance of distributed applications.

Note: The code snippets provided are simplified for illustrative purposes. In a production environment, additional error handling, security measures, and optimizations should be implemented.