# Building a Fast and Efficient URL Shortener

<!-- lead -->
A deep dive into building a URL shortener from scratch

In this post, we’ll take a complete deep dive into building a URL shortener from scratch using Node.js, Express, and MongoDB. URL shorteners are powerful tools that condense lengthy URLs into short, manageable ones that still redirect to the original destination. You’ve probably used services like bit.ly or TinyURL—today, you’ll learn how to create your own.
## The Tech Stack

This URL shortener is powered by:
1. **Node.js** - A JavaScript runtime built on Chrome’s V8 engine, perfect for building scalable network applications.
2. **Express.js** - A minimalistic web framework for Node.js that simplifies server and routing logic.
3. **MongoDB** - A NoSQL database used to store the original and shortened URLs.
4. **ShortId** - A package that generates URL-friendly, non-sequential unique IDs.
5. **Mongoose** - A MongoDB object data modeling (ODM) library that provides schema-based model validation and querying.
### Key Features:

- A RESTful API to create and manage shortened URLs.
- A secure API, validated using API keys.
- Efficient redirection from short URLs to the original URL.
- A scalable, easy-to-integrate MongoDB database to store URL data.

Let’s break down the code.
## Setup and Initialization

We first import the necessary modules:
```javascript
const express = require('express');
const mongoose = require('mongoose');
const shortid = require('shortid');
const Url = require('./models/Url');
require('dotenv').config();
```
### Explanation:

- **express**: To handle HTTP requests and define our routes.
- **mongoose**: To interact with our MongoDB database.
- **shortid**: Generates unique short IDs for the shortened URLs.
- **dotenv**: Manages environment variables, especially useful for storing sensitive data like API keys.
- **Url**: This is the model we’ll define later to store our URL data.

The application runs on port `9043`:
```javascript
const app = express();
const port = 9043;
```
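
The snippet stops before the server is actually started, but presumably a matching `app.listen` call follows; a minimal sketch (the log message is an assumption):

```javascript
// Assumed startup call; not shown in the original listing.
app.listen(port, () => {
  console.log(`URL shortener listening on port ${port}`);
});
```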
## Middleware Setup

To handle incoming JSON requests and validate the API key, we use middleware functions:
```javascript
app.use(express.json());
```
This middleware parses the incoming request bodies as JSON, making it easier to interact with the API data.
## MongoDB Connection

Connecting to MongoDB is essential for our application’s functionality. We use `mongoose.connect()` to establish a connection to a local MongoDB instance:
```javascript
mongoose.connect('mongodb://127.0.0.1:27017/shorturl');
```
### Database Connection Handling:

```javascript
const db = mongoose.connection;

db.on('error', console.error.bind(console, 'connection error:'));
db.once('open', () => {
  console.log('Connected to MongoDB');
});
```
These handlers ensure that any issues connecting to MongoDB are logged immediately, and once the connection succeeds, we print a "Connected to MongoDB" message.
## API Key Validation

Before creating a short URL, we must validate that the request has a valid API key. This middleware checks the `x-api-key` header against a pre-defined key in the `.env` file:
```javascript
const validateApiKey = (req, res, next) => {
  const apiKey = req.headers['x-api-key'];
  if (apiKey && apiKey === process.env.API_KEY) {
    next();
  } else {
    res.status(403).json({ error: 'Forbidden' });
  }
};
```

The request only proceeds if the key is valid; otherwise, it returns a 403 Forbidden status, ensuring only authorized users can create short URLs.
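
To see the middleware from the client side, here is a hypothetical request (it assumes Node 18+ for the built-in `fetch`, an async context, and a server running locally on port `9043`; the key and URL are placeholders):

```javascript
// Hypothetical client call; 403 { error: 'Forbidden' } comes back if the key is wrong.
const res = await fetch('http://localhost:9043/api/shorturl', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': process.env.API_KEY, // must match API_KEY in the server's .env
  },
  body: JSON.stringify({ longUrl: 'https://example.com/a/very/long/path' }),
});
console.log(res.status, await res.json());
```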
## Creating Short URLs

The core functionality is handled by the POST `/api/shorturl` route. It validates the request body, checks for an existing shortened URL, and generates a new one if necessary:
```javascript
app.post('/api/shorturl', validateApiKey, async (req, res) => {
  const { longUrl } = req.body;

  if (!longUrl) {
    return res.status(400).json({ error: 'Invalid URL' });
  }

  try {
    let url = await Url.findOne({ longUrl });

    if (url) {
      return res.json({ shortUrl: `https://s.shells.lol/${url.shortId}` });
    }

    const shortId = shortid.generate();
    url = new Url({
      longUrl,
      shortId,
    });

    await url.save();
    res.json({ shortUrl: `https://s.shells.lol/${shortId}` });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Server error' });
  }
});
```
### What’s happening here?

- **Validation**: If the `longUrl` field is missing from the request body, it returns a 400 Bad Request error.
- **URL Existence Check**: If the long URL already exists in the database, it simply returns the existing shortened URL.
- **New Short ID Generation**: If no record exists for the given URL, we generate a new short ID using the `shortid` library, store it in the database, and return the shortened URL.
## Redirecting Short URLs

When a user visits a short URL, the application looks up the corresponding long URL in the database and redirects them:
```javascript
app.get('/:shortId', async (req, res) => {
  const { shortId } = req.params;

  try {
    const url = await Url.findOne({ shortId });

    if (url) {
      return res.redirect(301, url.longUrl);
    }

    res.status(404).json({ error: 'URL not found' });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: 'Server error' });
  }
});
```
### What’s happening here?

- The **shortId** is extracted from the URL parameters.
- The app searches for a matching record in the database. If it finds one, it sends a 301 redirect to the original long URL.
- If no record is found, it responds with a 404 error, letting the user know that the short URL doesn’t exist.
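
A quick, hypothetical way to verify the redirect locally (again assuming Node 18+; `abc123` stands in for a real short ID):

```javascript
// With redirect: 'manual', fetch returns the 301 itself instead of following it.
const res = await fetch('http://localhost:9043/abc123', { redirect: 'manual' });
console.log(res.status, res.headers.get('location')); // 301 and the original long URL
```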
## Data Model: URL Schema

The data for each shortened URL is stored in MongoDB using the `Url` model, defined as follows:
```javascript
const mongoose = require('mongoose');

const urlSchema = new mongoose.Schema({
  longUrl: {
    type: String,
    required: true,
  },
  shortId: {
    type: String,
    required: true,
    unique: true,
  },
});

module.exports = mongoose.model('Url', urlSchema);
```
### Key Points:

- **longUrl**: The original URL that we want to shorten. It’s a required field.
- **shortId**: The unique, generated identifier for the short URL.

By defining these fields and their constraints, Mongoose ensures that each URL we save is valid and meets the necessary conditions, such as the `shortId` being unique. The `unique` option also puts a unique index on `shortId` in MongoDB, which is what keeps lookups during redirection fast.
## Key Takeaways

**Modular and Scalable Design**: The way we structured this URL shortener allows it to scale easily and handle high traffic loads. Node.js and Express are non-blocking and event-driven, so the service can manage concurrent requests efficiently. MongoDB, a NoSQL database, is highly scalable as well, making it an ideal candidate for storing URL mappings without sacrificing speed.

**Security Considerations**: The API key validation shows how simple security measures can protect your API from unauthorized access. This is a small but critical step in ensuring that only trusted sources interact with your service. Although basic, this approach is a solid foundation that can be expanded to more robust authentication mechanisms, such as OAuth or JWT, as the service grows.

**Database Operations and Efficiency**: MongoDB’s flexibility allows for quick and easy storage of URLs. Because `shortId` carries a unique index, lookups are extremely fast, making redirection nearly instantaneous. The querying and validation mechanisms provided by Mongoose also ensure that operations such as checking for existing URLs or saving new ones are both optimized and safe.

**ShortId for Uniqueness**: The `shortid` library generates compact, unique, URL-friendly strings, which keeps short URLs short while preserving uniqueness. This is an efficient way to avoid collisions and ensure that every short URL points to a different long URL, even in high-volume environments.

**Handling Edge Cases**: Comprehensive error handling makes the system more robust. Whether it’s validating input URLs, handling database errors, or returning meaningful error responses (404s for missing URLs, 400s for bad requests), the application is designed to fail gracefully. This improves the user experience and makes the system easier to debug and maintain.
## My Thoughts

This project demonstrates how you can build a simple yet effective URL shortener using modern web technologies. We focused on essential features like API security, URL shortening, and redirection while keeping the design scalable and easy to extend.

By leveraging tools like Express.js and MongoDB, this solution is not only efficient but also capable of handling significant traffic loads. With additional features like user tracking, analytics, or custom short URLs, you can extend its functionality even further.

Building a URL shortener might seem like a simple project at first glance, but as we've explored in this deep dive, it's packed with technical components that apply to numerous real-world applications. By breaking down each part of the process, from server setup to database interactions and security, this project shows how modern web technologies come together to build a service that is scalable, reliable, and secure.

---

<!-- lead -->

A deep dive into building an external bash-history monitoring microservice.
Docker has become a key tool for deploying and managing applications. However, with the rise of containers comes a significant challenge: inspecting and auditing what occurs inside them. One often-overlooked aspect is command history, specifically the `.bash_history` files. These files can reveal important information about user actions, debugging sessions, or potential security issues, but manually inspecting them across dozens or even hundreds of containers is daunting.

This post presents a programmatic solution using Node.js, Dockerode, and Discord to inspect the `.bash_history` files across an entire Docker infrastructure. It automates the process of tailing these history files, batching their content, and sending notifications to Discord in near real time, with rate limiting and error handling built in.
## Concept Overview

The idea is to automate the process of inspecting `.bash_history` files across Docker containers by continuously tailing these files and pushing the extracted command history to a central logging service, in this case a Discord channel.
This approach allows you to track what commands were executed inside the containers at any point in time, whether they were legitimate debugging sessions or potentially harmful actions. By utilizing Docker's overlay filesystem and a programmatic approach, we can automate the discovery and monitoring of these files. We then send these logs to a remote system (Discord in this example) for further inspection, ensuring that no action goes unnoticed.

This setup covers multiple layers:
1. **Container Inspection**: We gather container information and map their overlay2 filesystems to identify `.bash_history` files.
2. **File Tailing**: We use the `tail` package to continuously monitor these `.bash_history` files in real time.
3. **Batching and Rate Limiting**: The logs are collected and sent in batches to avoid spamming the monitoring system and to respect API rate limits.
4. **Container Name Mapping**: We translate the filesystem paths to meaningful container names, making it easier to identify where each command was executed.
5. **Error Handling and Resilience**: The system is designed to handle file access issues, rate limiting, and other potential pitfalls, ensuring that it remains robust even under challenging conditions.
## The Code Breakdown

### Dockerode Integration

We start by setting up the connection to the Docker API using the Dockerode library. Dockerode allows us to interact with Docker containers, retrieve their metadata, and inspect their filesystems:
```javascript
const Docker = require('dockerode');
const docker = new Docker({ socketPath: '/var/run/docker.sock' });
```
The Docker socket (`/var/run/docker.sock`) provides the necessary interface to communicate with the Docker daemon. Using Dockerode, we can list all containers, inspect their filesystems, and map their overlay2 directories to locate `.bash_history` files.
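
For example, listing every container (running or stopped) takes only a couple of lines; a minimal sketch, assuming an async context:

```javascript
// Print a short ID and the name(s) of every container on the host.
const containers = await docker.listContainers({ all: true });
containers.forEach((c) => console.log(c.Id.slice(0, 12), c.Names));
```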
### Monitoring `.bash_history` Files

The core task of this script is to discover and monitor `.bash_history` files for each container. In Docker, each container’s filesystem is managed by the overlay2 storage driver, which layers filesystem changes. Every container gets its own unique directory in `/var/lib/docker/overlay2`; for example, root’s history file typically ends up at a path like `/var/lib/docker/overlay2/<id>/diff/root/.bash_history`. By scanning these directories, we can find the `.bash_history` files inside each container.

The `scanForBashHistoryFiles` function recursively scans the overlay2 directory:
```javascript
// Not shown in the original snippet: fs and path are needed for the directory walk.
const fs = require('fs');
const path = require('path');

function scanForBashHistoryFiles(directory) {
  fs.readdir(directory, { withFileTypes: true }, (err, files) => {
    if (err) {
      console.error(`Error reading directory ${directory}:`, err);
      return;
    }

    files.forEach((file) => {
      if (file.isDirectory()) {
        const subdirectory = path.join(directory, file.name);
        scanForBashHistoryFiles(subdirectory);
      } else if (file.name === '.bash_history') {
        const filePath = path.join(directory, file.name);
        // e.g. .../overlay2/<overlayId>/diff/root -> pick out the <overlayId> segment
        let overlayId = directory.split('/').slice(-3, -1)[0];
        tailFile(filePath, overlayId); // Tail the file for changes
      }
    });
  });
}
```
This function crawls through the directories in the overlay2 folder, checking for `.bash_history` files. Once a file is found, it triggers the `tailFile` function, which starts monitoring the file for changes.
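
Starting the whole discovery process is then just a matter of pointing the scanner at Docker’s overlay2 root. The post doesn’t show the entry point, so this is an assumed sketch:

```javascript
// Hypothetical entry point. Reading /var/lib/docker requires elevated
// privileges, so the watcher typically runs as root on the host.
scanForBashHistoryFiles('/var/lib/docker/overlay2');
```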
### File Tailing

Tailing a file means continuously monitoring it for new lines. This is critical for real-time logging because we need to capture each new command entered in a container.
```javascript
// Module-level pieces referenced below: the 'tail' package's Tail class and
// a shared map that collects commands per overlay ID until they are flushed.
const { Tail } = require('tail');
const messageGroups = new Map();

function tailFile(filePath, overlayId) {
  const tail = new Tail(filePath);

  tail.on('line', (data) => {
    let messages = messageGroups.get(overlayId) || new Set();
    messages.add(data); // Add new command to the set
    messageGroups.set(overlayId, messages);
  });

  tail.on('error', (error) => {
    console.error(`Error tailing file ${filePath}:`, error);
  });
}
```
The `tailFile` function uses the `tail` package to listen for new lines in the `.bash_history` file. Each new line (representing a command entered in the container) is added to the `messageGroups` map, which organizes messages by overlay ID (`overlayId`). This is critical for keeping logs organized and ensuring each container’s commands are batched and sent separately.
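
The flushing side isn’t shown in the post, but a periodic loop along these lines would drain `messageGroups` into Discord; a sketch, assuming a fixed 10-second interval:

```javascript
// Hypothetical batch flush: send each container's collected commands,
// then drop the entry so the same lines aren't reported twice.
setInterval(() => {
  for (const [overlayId, messages] of messageGroups) {
    if (messages.size > 0) {
      sendToDiscordBatch(overlayId, [...messages]);
      messageGroups.delete(overlayId);
    }
  }
}, 10000);
```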
### Mapping Container Names to Overlay2 IDs

Since the `.bash_history` files exist within obscure overlay2 directory names, it’s crucial to map these IDs back to human-readable container names. This is where the `getContainerNameFromOverlayId` function comes into play:
```javascript
async function getContainerNameFromOverlayId(overlayId) {
  try {
    const containers = await docker.listContainers({ all: true });
    for (const containerInfo of containers) {
      const container = docker.getContainer(containerInfo.Id);
      const inspectData = await container.inspect();
      // The container's writable layer (UpperDir) is where runtime .bash_history
      // writes land, so check it alongside the read-only lower layers.
      const { LowerDir = '', UpperDir = '' } = inspectData.GraphDriver.Data;
      if (UpperDir.includes(overlayId) || LowerDir.includes(overlayId)) {
        return inspectData.Name.replace(/^\//, ''); // names come back as "/name"
      }
    }
  } catch (error) {
    console.error(`Error mapping overlay2 ID to container name for ID ${overlayId}:`, error);
  }
  return overlayId; // fall back to the raw overlay ID
}
```
This function inspects each container, looking for an overlay2 directory that matches the provided `overlayId`. Once found, it extracts and returns the container’s human-readable name. If no match is found, the function simply returns the `overlayId` as a fallback.

### Sending Logs to Discord

To centralize the logs, the system sends command history to a Discord channel via a webhook. Messages are batched and sent periodically to avoid overwhelming the Discord API.
```javascript
const axios = require('axios');

// DISCORD_WEBHOOK_URL is assumed to be configured elsewhere (e.g. via an env variable).
async function sendToDiscordBatch(overlayId, messages) {
  const containerName = await getContainerNameFromOverlayId(overlayId);
  const message = messages.join('\n');

  await axios.post(DISCORD_WEBHOOK_URL, {
    embeds: [
      {
        title: `Container: ${containerName}`,
        description: message,
        color: 0x0099ff,
      },
    ],
  });
}
```
The `sendToDiscordBatch` function batches commands for each container and sends them to a Discord channel using a webhook. Each log message is accompanied by the container’s name, making it easy to identify which container each command belongs to.

### Handling Rate Limiting

To ensure that the Discord API is not overloaded, the code includes logic to handle rate limiting. If too many requests are sent too quickly, Discord will reject them with a 429 status code. This system respects rate limits by queuing messages and retrying them after a specified delay:
```javascript
function handleRateLimitError(response) {
  if (response.status === 429) { // Rate limit exceeded
    // Discord reports the wait in seconds; convert to milliseconds.
    const retryAfter = response.headers['retry-after'] * 1000;
    console.warn(`Rate limit exceeded. Retrying after ${retryAfter} ms.`);
    // processMessageQueue (defined elsewhere in the script) re-sends the queued batches.
    setTimeout(processMessageQueue, retryAfter);
  }
}
```
This ensures that the system remains functional even under heavy usage, without dropping any log data.
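
One subtlety worth noting: axios rejects the promise on any non-2xx response, so in practice this handler would be invoked from a `catch` block; an assumed sketch:

```javascript
// Hypothetical call site: a 429 surfaces as err.response inside the catch.
try {
  await sendToDiscordBatch(overlayId, [...messages]);
} catch (err) {
  if (err.response) {
    handleRateLimitError(err.response);
  }
}
```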
## Robustness and Resilience

This solution is designed with resilience in mind. Containers can come and go, logs can be deleted or rotated, and `.bash_history` files may not always be available. The system handles all these issues gracefully by:
1. **Continuously rescanning** the overlay2 directories for new or deleted `.bash_history` files (see the sketch below).
2. **Handling file access errors**, ensuring that permission issues or missing files do not crash the program.
3. **Managing rate limits** by batching requests and retrying failed ones.

This resilience makes the system suitable for use in production environments where the container landscape is dynamic and constantly changing.
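
A minimal sketch of that rescanning loop, assuming `scanForBashHistoryFiles` is adjusted to call an idempotent `tailFileSafe` wrapper so a rescan never attaches a second tail to the same file (none of this appears in the post):

```javascript
// Hypothetical de-duplication guard around tailFile.
const tailedFiles = new Set();

function tailFileSafe(filePath, overlayId) {
  if (tailedFiles.has(filePath)) return; // already watching this file
  tailedFiles.add(filePath);
  tailFile(filePath, overlayId);
}

// Pick up newly created containers (and their history files) once a minute.
setInterval(() => scanForBashHistoryFiles('/var/lib/docker/overlay2'), 60 * 1000);
```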
## What I found

By automating the inspection of `.bash_history` files across an entire Docker infrastructure, this solution provides a powerful tool for auditing, debugging, and ensuring security compliance. Through the integration of Dockerode, file system monitoring, and Discord for centralized log management, it becomes possible to monitor actions inside containers in real time.
This approach can be extended further with additional logging, command filtering, or even alerting on specific patterns in the bash history. As containers continue to play a key role in modern infrastructure, the ability to inspect and audit their internal state will become increasingly important, and this solution offers a scalable, real-time mechanism to do so.