mirror of https://github.com/neon-mmd/websurfx.git
synced 2024-12-21 20:08:21 -05:00

initial commit
commit 15fc415301

1 .gitignore vendored Normal file
@@ -0,0 +1 @@
/target

24 CONTRIBUTING.org Normal file
@@ -0,0 +1,24 @@
* Things to consider before contributing

** Knowledge Required

- Rust basics.
- Actix-web crate basics.
- Tokio crate and async/await.
- Reqwest crate basics.
- Serde and serde_json crate basics.
- fake_useragent crate basics.
- pyo3/hlua/rlua crate basics.

** Guidelines

- Please be patient.
- Treat everyone with respect ("give respect and take respect").
- Document your code properly, with Rust coding conventions in mind (a short sketch of the intended style follows at the end of this document).
- Provide a brief description of the changes you made in the pull request.
- Provide an appropriate title for the pull request.

*NOTE:* The rolling branch is where all contributions should go. In simple terms, it is the working branch for this project.
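
For the documentation guideline above, a short sketch of the intended style (a hypothetical helper, not part of this codebase) could look like this:

#+begin_src rust
/// Replaces spaces in a search query with `+` so that it can be placed inside
/// an upstream engine url as a query parameter.
///
/// # Arguments
///
/// * `query` - the raw search query typed by the user.
fn encode_query(query: &str) -> String {
    // Hypothetical helper used only to illustrate the documentation style.
    query.replace(' ', "+")
}
#+end_src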

3454 Cargo.lock generated Normal file
File diff suppressed because it is too large

17 Cargo.toml Normal file
@@ -0,0 +1,17 @@
[package]
name = "websurfx"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
reqwest = { version = "*", features = ["json"] }
tokio = { version = "*", features = ["full"] }
serde = { version = "*", features = ["derive"] }
handlebars = { version = "4.2.1", features = ["dir_source"] }
scraper = { version = "*" }
actix-web = { version = "4" }
actix-files = { version = "0.6.2" }
serde_json = { version = "*" }
fake-useragent = { version = "*" }

34 README.org Normal file
@@ -0,0 +1,34 @@
* Websurfx

A lightning fast, privacy respecting, secure [[https://en.wikipedia.org/wiki/Metasearch_engine][meta search engine]] (pronounced as websurface or web-surface /wɛbˈsɜːrfəs/).

* Installation and Testing

To start installing, testing and playing around with the search engine, clone the repo and then *cargo run* as shown below:

#+begin_src shell
git clone https://gitlab.com/NEON-MMD/websurfx.git
cd websurfx
cargo run
#+end_src

Then open your browser of choice, visit [[http://127.0.0.1:8080]], and start playing with it right away.
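
If you prefer checking from the terminal instead, a minimal sanity check of the search route could look like the following (this simply exercises the same /search route the search page uses; the page parameter is optional):

#+begin_src shell
# Query the locally running instance (assumes the default 127.0.0.1:8080 bind address)
curl "http://127.0.0.1:8080/search?q=rust&page=1"
#+end_src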

*Important Note:* Do not refresh the page excessively, send too many requests, or host this on a production server: there is nothing in the code yet to prevent IP blocking (it is still a work in progress), so you could potentially get your IP blocked or banned. The project is also still in the testing phase and is far from being complete and ready to be hosted on production machines.

* Contributing

Contributions are welcome. It does not matter who you are, you can still contribute to the project in your own way :).

** Not a developer but still want to contribute

Here is a [[https://youtu.be/FccdqCucVSI][video]] by Mr. Nick on how to contribute (credit to him as well).

** Developer

If you are a developer, have a look at the [[file:CONTRIBUTING.org][CONTRIBUTING.org]] document for more information.

* License

The project is available under the [[file:LICENSE][GPLv3]] license.

47 goals.org Normal file
@@ -0,0 +1,47 @@
* TODO Goals for v0.1.0

- [ ] Add code to remove NSFW content from search results using a blocklist.

- [ ] Add code to disallow the user from searching sensitive content (similar functionality to the Swisscows search engine) if strict safe search is turned on.

- [ ] Add better error handling code to handle scraping errors, reqwest errors, etc.

- [ ] Add the ability to change colorschemes within the theme style of the page.

  =For example:=

  If the simple theme is selected, there is a choice of 9 different colorschemes, such as catppuccin-mocha, solarized dark, nord, etc.

- [ ] Add random delays and behaviours to emulate human behaviour if needed to evade IP blocking.

- [ ] Add a python/lua config to the search engine, giving more control to the user (server maintainer/administrator).

- [ ] Add a settings page so the user can configure the search engine on the fly and save their preferences using cookies.

- [ ] Add the search engine logo to the index page and to the navbar (on the right-hand side).

- [X] Add code to generate a random user agent to protect the user's privacy.

- [X] Add the duckduckgo engine as an upstream.

- [X] Add at least one searx instance as an upstream engine.

- [X] Add pagination support.

- [X] Add basic handlebars pages and a theme with the catppuccin colorscheme.

* Goals for the future

- Move from handlebars to the faster templating engine /Tera/.

- Add more upstream search engines.

- Add dorking support (like Google).

- Add advanced search functionality and a dropdown menu for it.

- Add more categories to the search engine's search page, like images, files, all, news, maps, etc.

- Add advanced image functionality to the images category (which will be a great aid for content creators, video editors, etc.).

- Add GPT integration (taking inspiration from the langchain Python module, or incorporating it using pyo3) and give the user the choice to add an API key. Giving the user this choice is important because many people are against AI; if they do not want GPT, they simply do not add their API key and the feature stays disabled.

1 public/images/robot-404.svg Normal file
File diff suppressed because one or more lines are too long
After: SVG image, 98 KiB

5 public/robots.txt Normal file
@@ -0,0 +1,5 @@
User-agent: *
Allow: /search
Disallow: /*?*q=*
Disallow: /static
Disallow: /images

12 public/static/catppuccin-mocha.css Normal file
@@ -0,0 +1,12 @@
:root {
  /* catppuccin-mocha colorscheme */
  --bg: #1e1e2e;
  --fg: #cdd6f4;
  --1: #45475a;
  --2: #f38ba8;
  --3: #a6e3a1;
  --4: #f9e2af;
  --5: #89b4fa;
  --6: #f5c2e7;
  --7: #ffffff;
}

10 public/static/index.js Normal file
@@ -0,0 +1,10 @@
let search_box = document.querySelector('input')
function search_web() {
  window.location = `search?q=${search_box.value}`
}

search_box.addEventListener('keyup', (e) => {
  if (e.keyCode === 13) {
    search_web()
  }
})

242 public/static/style.css Normal file
@@ -0,0 +1,242 @@
@import url('./catppuccin-mocha.css');

* {
  padding: 0;
  margin: 0;
  box-sizing: border-box;
}

html {
  font-size: 62.5%;
}

body {
  display: flex;
  flex-direction: column;
  justify-content: space-between;
  align-items: center;
  height: 100vh;
  background: var(--1);
}

/* styles for the index page */

.search-container {
  display: flex;
  flex-direction: column;
  gap: 5rem;
  justify-content: center;
  align-items: center;
}

.search-container div {
  display: flex;
}

/* styles for the search box and search button */

.search_bar {
  display: flex;
}

.search_bar input {
  padding: 1rem;
  width: 50rem;
  height: 3rem;
  outline: none;
  border: none;
  box-shadow: rgba(0, 0, 0, 1);
  background: var(--fg);
}

.search_bar button {
  padding: 1rem;
  border-radius: 0;
  height: 3rem;
  display: flex;
  justify-content: center;
  align-items: center;
  outline: none;
  border: none;
  gap: 0;
  background: var(--bg);
  color: var(--3);
  font-weight: 600;
  letter-spacing: 0.1rem;
}

.search_bar button:active,
.search_bar button:hover {
  filter: brightness(1.2);
}

/* styles for the footer and header */

header {
  background: var(--bg);
  width: 100%;
  display: flex;
  justify-content: right;
  align-items: center;
  padding: 1rem;
}

header ul,
footer ul {
  list-style: none;
  display: flex;
  justify-content: space-around;
  align-items: center;
  font-size: 1.5rem;
  gap: 2rem;
}

header ul li a,
footer ul li a,
header ul li a:visited,
footer ul li a:visited {
  text-decoration: none;
  color: var(--2);
  text-transform: capitalize;
  letter-spacing: 0.1rem;
}

header ul li a {
  font-weight: 600;
}

header ul li a:hover,
footer ul li a:hover {
  color: var(--5);
}

footer div span {
  font-size: 1.5rem;
  color: var(--4);
}

footer div {
  display: flex;
  gap: 1rem;
}

footer {
  background: var(--bg);
  width: 100%;
  padding: 1rem;
  display: flex;
  flex-direction: column;
  justify-content: center;
  align-items: center;
}

/* Styles for the search page */

.results {
  width: 90%;
  display: flex;
  flex-direction: column;
  justify-content: space-around;
}

.results .search_bar {
  margin: 1rem 0;
}

.results_aggregated {
  display: flex;
  flex-direction: column;
  justify-content: space-between;
  margin: 2rem 0;
}

.results_aggregated .result {
  display: flex;
  flex-direction: column;
  margin-top: 1rem;
}

.results_aggregated .result h1 a {
  font-size: 1.5rem;
  color: var(--2);
  text-decoration: none;
  letter-spacing: 0.1rem;
}

.results_aggregated .result h1 a:hover {
  color: var(--5);
}

.results_aggregated .result h1 a:visited {
  color: var(--bg);
}

.results_aggregated .result small {
  color: var(--3);
  font-size: 1.1rem;
  word-wrap: break-word;
  line-break: anywhere;
}

.results_aggregated .result p {
  color: var(--fg);
  font-size: 1.2rem;
  margin-top: 0.3rem;
  word-wrap: break-word;
  line-break: anywhere;
}

.results_aggregated .result .upstream_engines {
  text-align: right;
  font-size: 1.2rem;
  padding: 1rem;
  color: var(--5);
}

/* Styles for the 404 page */

.error_container {
  display: flex;
  justify-content: center;
  align-items: center;
  width: 100%;
  gap: 5rem;
}

.error_container img {
  width: 30%;
}

.error_content {
  display: flex;
  flex-direction: column;
  justify-content: center;
  gap: 1rem;
}

.error_content h1,
.error_content h2 {
  letter-spacing: 0.1rem;
}

.error_content h1 {
  font-size: 3rem;
}

.error_content h2 {
  font-size: 2rem;
}

.error_content p {
  font-size: 1.2rem;
}

.error_content p a,
.error_content p a:visited {
  color: var(--2);
  text-decoration: none;
}

.error_content p a:hover {
  color: var(--5);
}

10 public/templates/404.html Normal file
@@ -0,0 +1,10 @@
{{>header}}
<main class="error_container">
  <img src="images/robot-404.svg" alt="Image of broken robot." />
  <div class="error_content">
    <h1>Aw! snap</h1>
    <h2>404 Page Not Found!</h2>
    <p>Go to <a href="/">search page</a></p>
  </div>
</main>
{{>footer}}

20 public/templates/about.html Normal file
@@ -0,0 +1,20 @@
{{>header}}
<main class="about-container">
  <h1>Websurfx</h1>
  <small>a lightning fast, privacy respecting, secure meta search engine</small>
  <article>
    Lorem ipsum dolor sit amet, officia excepteur ex fugiat reprehenderit enim
    labore culpa sint ad nisi Lorem pariatur mollit ex esse exercitation amet.
    Nisi anim cupidatat excepteur officia. Reprehenderit nostrud nostrud ipsum
    Lorem est aliquip amet voluptate voluptate dolor minim nulla est proident.
    Nostrud officia pariatur ut officia. Sit irure elit esse ea nulla sunt ex
    occaecat reprehenderit commodo officia dolor Lorem duis laboris cupidatat
    officia voluptate. Culpa proident adipisicing id nulla nisi laboris ex in
    Lorem sunt duis officia eiusmod. Aliqua reprehenderit commodo ex non
    excepteur duis sunt velit enim. Voluptate laboris sint cupidatat ullamco ut
    ea consectetur et est culpa et culpa duis.
  </article>
</main>
{{>footer}}

15 public/templates/footer.html Normal file
@@ -0,0 +1,15 @@
<footer>
  <div>
    <span>Powered By <b>Websurfx</b></span><span>-</span><span>a lightning fast, privacy respecting, secure meta
      search engine</span>
  </div>
  <div>
    <ul>
      <li><a href="#">Source Code</a></li>
      <li><a href="#">Issues/Bugs</a></li>
    </ul>
  </div>
</footer>
</body>

</html>

11 public/templates/header.html Normal file
@@ -0,0 +1,11 @@
<!DOCTYPE html>
<html lang="en">
<head>
  <title></title>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <link href="static/style.css" rel="stylesheet" type="text/css" />
</head>

<body>
  <header>{{>navbar}}</header>

7 public/templates/index.html Normal file
@@ -0,0 +1,7 @@
{{>header}}
<main class="search-container">
  <img src="images/fps_logo.png" alt="Websurfx meta-search engine logo" />
  {{>search_bar}}
</main>
<script src="static/index.js"></script>
{{>footer}}

6 public/templates/navbar.html Normal file
@@ -0,0 +1,6 @@
<nav>
  <ul>
    <li><a href="about">about</a></li>
    <li><a href="settings">settings</a></li>
  </ul>
</nav>

20 public/templates/search.html Normal file
@@ -0,0 +1,20 @@
{{>header}}
<main class="results">
  {{>search_bar}}
  <div class="results_aggregated">
    {{#each results}}
    <div class="result">
      <h1><a href="{{this.visitingUrl}}">{{{this.title}}}</a></h1>
      <small>{{this.url}}</small>
      <p>{{{this.description}}}</p>
      <div class="upstream_engines">
        {{#each engine}}
        <span>{{this}}</span>
        {{/each}}
      </div>
    </div>
    {{/each}}
  </div>
</main>
<script src="static/index.js"></script>
{{>footer}}

9 public/templates/search_bar.html Normal file
@@ -0,0 +1,9 @@
<div class="search_bar">
  <input
    type="search"
    name="search-box"
    value="{{this.pageQuery}}"
    placeholder="Type to search"
  />
  <button type="submit" onclick="search_web()">search</button>
</div>

5 public/templates/settings.html Normal file
@@ -0,0 +1,5 @@
{{>header}}
<main class="settings">
  <h1>Page is under construction</h1>
</main>
{{>footer}}

35 src/bin/websurfx.rs Normal file
@@ -0,0 +1,35 @@
use websurfx::server::routes;

use actix_files as fs;
use actix_web::{web, App, HttpServer};
use handlebars::Handlebars;

// The function that launches the main server and handles the routing functionality.
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let mut handlebars: Handlebars = Handlebars::new();

    handlebars
        .register_templates_directory(".html", "./public/templates")
        .unwrap();

    let handlebars_ref: web::Data<Handlebars> = web::Data::new(handlebars);

    HttpServer::new(move || {
        App::new()
            .app_data(handlebars_ref.clone())
            // Serve images and static files (css and js files).
            .service(fs::Files::new("/static", "./public/static").show_files_listing())
            .service(fs::Files::new("/images", "./public/images").show_files_listing())
            .service(routes::robots_data) // robots.txt
            .service(routes::index) // index page
            .service(routes::search) // search page
            .service(routes::about) // about page
            .service(routes::settings) // settings page
            .default_service(web::route().to(routes::not_found)) // error page
    })
    // Start the server on 127.0.0.1:8080
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}

96 src/engines/duckduckgo.rs Normal file
@@ -0,0 +1,96 @@
use std::collections::HashMap;

use reqwest::header::USER_AGENT;
use scraper::{Html, Selector};

use crate::search_results_handler::aggregation_models::RawSearchResult;

// This function scrapes results from the upstream engine duckduckgo and puts all the scraped
// results, like title, visiting_url (the href in the html), engine (which engine it was fetched
// from) and description, into a RawSearchResult. It adds that to a HashMap whose keys are the urls
// and whose values are the RawSearchResult structs, and returns it within a Result enum.
pub async fn results(
    query: &str,
    page: Option<u32>,
    user_agent: &str,
) -> Result<HashMap<String, RawSearchResult>, Box<dyn std::error::Error>> {
    // The page number can be missing or an empty string, so appropriate handling is required
    // so that the upstream server receives a valid page number.
    let url: String = match page {
        Some(page_number) => {
            if page_number <= 1 {
                format!("https://html.duckduckgo.com/html/?q={query}&s=&dc=&v=1&o=json&api=/d.js")
            } else {
                format!(
                    "https://duckduckgo.com/html/?q={}&s={}&dc={}&v=1&o=json&api=/d.js",
                    query,
                    page_number / 2 * 30,
                    page_number / 2 * 30 + 1
                )
            }
        }
        None => format!("https://html.duckduckgo.com/html/?q={query}&s=&dc=&v=1&o=json&api=/d.js"),
    };

    // Fetch the html from the upstream duckduckgo engine.
    // TODO: Write better error handling code to handle the no-results case.
    let results: String = reqwest::Client::new()
        .get(url)
        .header(USER_AGENT, user_agent)
        .send()
        .await?
        .text()
        .await?;

    let document: Html = Html::parse_document(&results);
    let results: Selector = Selector::parse(".result")?;
    let result_title: Selector = Selector::parse(".result__a")?;
    let result_url: Selector = Selector::parse(".result__url")?;
    let result_desc: Selector = Selector::parse(".result__snippet")?;

    let mut search_results: HashMap<String, RawSearchResult> = HashMap::new();

    // Scrape all the results from the html.
    for result in document.select(&results) {
        let search_result: RawSearchResult = RawSearchResult {
            title: result
                .select(&result_title)
                .next()
                .unwrap()
                .inner_html()
                .trim()
                .to_string(),
            visiting_url: format!(
                "https://{}",
                result
                    .select(&result_url)
                    .next()
                    .unwrap()
                    .inner_html()
                    .trim()
            ),
            description: result
                .select(&result_desc)
                .next()
                .unwrap()
                .inner_html()
                .trim()
                .to_string(),
            engine: vec!["duckduckgo".to_string()],
        };
        search_results.insert(
            format!(
                "https://{}",
                result
                    .select(&result_url)
                    .next()
                    .unwrap()
                    .inner_html()
                    .trim()
            ),
            search_result,
        );
    }

    Ok(search_results)
}

2 src/engines/mod.rs Normal file
@@ -0,0 +1,2 @@
pub mod duckduckgo;
pub mod searx;

89 src/engines/searx.rs Normal file
@@ -0,0 +1,89 @@
use std::collections::HashMap;

use reqwest::header::USER_AGENT;
use scraper::{Html, Selector};

use crate::search_results_handler::aggregation_models::RawSearchResult;

// This function scrapes results from the upstream searx instance and puts all the scraped
// results, like title, visiting_url (the href in the html), engine (which engine it was fetched
// from) and description, into a RawSearchResult. It adds that to a HashMap whose keys are the urls
// and whose values are the RawSearchResult structs, and returns it within a Result enum.
pub async fn results(
    query: &str,
    page: Option<u32>,
    user_agent: &str,
) -> Result<HashMap<String, RawSearchResult>, Box<dyn std::error::Error>> {
    // The page number can be missing or an empty string, so appropriate handling is required
    // so that the upstream server receives a valid page number.
    let url: String = match page {
        Some(page_number) => {
            if page_number <= 1 {
                format!("https://searx.work/search?q={query}")
            } else {
                format!("https://searx.work/search?q={query}&pageno={page_number}")
            }
        }
        None => format!("https://searx.work/search?q={query}"),
    };

    // Fetch the html from the upstream searx instance.
    // TODO: Write better error handling code to handle the no-results case.
    let results: String = reqwest::Client::new()
        .get(url)
        .header(USER_AGENT, user_agent)
        .send()
        .await?
        .text()
        .await?;

    let document: Html = Html::parse_document(&results);
    let results: Selector = Selector::parse(".result")?;
    let result_title: Selector = Selector::parse("h3>a")?;
    let result_url: Selector = Selector::parse("h3>a")?;
    let result_desc: Selector = Selector::parse(".content")?;

    let mut search_results: HashMap<String, RawSearchResult> = HashMap::new();

    // Scrape all the results from the html.
    for result in document.select(&results) {
        let search_result: RawSearchResult = RawSearchResult {
            title: result
                .select(&result_title)
                .next()
                .unwrap()
                .inner_html()
                .trim()
                .to_string(),
            visiting_url: result
                .select(&result_url)
                .next()
                .unwrap()
                .value()
                .attr("href")
                .unwrap()
                .to_string(),
            description: result
                .select(&result_desc)
                .next()
                .unwrap()
                .inner_html()
                .trim()
                .to_string(),
            engine: vec!["searx".to_string()],
        };
        search_results.insert(
            result
                .select(&result_url)
                .next()
                .unwrap()
                .value()
                .attr("href")
                .unwrap()
                .to_string(),
            search_result,
        );
    }

    Ok(search_results)
}

3 src/lib.rs Normal file
@@ -0,0 +1,3 @@
pub mod engines;
pub mod server;
pub mod search_results_handler;

25 src/search_results_handler/aggregation_models.rs Normal file
@@ -0,0 +1,25 @@
use serde::Serialize;

#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct SearchResult {
    pub title: String,
    pub visiting_url: String,
    pub url: String,
    pub description: String,
    pub engine: Vec<String>,
}

pub struct RawSearchResult {
    pub title: String,
    pub visiting_url: String,
    pub description: String,
    pub engine: Vec<String>,
}

#[derive(Debug, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct SearchResults {
    pub results: Vec<SearchResult>,
    pub page_query: String,
}

77 src/search_results_handler/aggregator.rs Normal file
@@ -0,0 +1,77 @@
use std::collections::HashMap;

use fake_useragent::{Browsers, UserAgentsBuilder};

use super::aggregation_models::{RawSearchResult, SearchResult, SearchResults};
use crate::engines::{duckduckgo, searx};

// A function that aggregates all the scraped results from the upstream engines and removes
// duplicates. If a result is found to come from two or more engines, their names are put together
// to show which upstream engines it was fetched from. The HashMap is then drained into a struct
// holding all the aggregated results in a vector, together with the query that was used. Keeping
// the query is necessary because otherwise the search bar on the search page would remain empty
// when the search is made from the query url.
//
// For example:
//
// If you search from a url like *https://127.0.0.1/search?q=huston* then the search bar should
// contain the word huston and not remain empty.
pub async fn aggregate(
    query: &str,
    page: Option<u32>,
) -> Result<SearchResults, Box<dyn std::error::Error>> {
    // Generate a random user agent to improve the privacy of the user.
    let user_agent: String = UserAgentsBuilder::new()
        .cache(false)
        .dir("/tmp")
        .thread(1)
        .set_browsers(
            Browsers::new()
                .set_chrome()
                .set_safari()
                .set_edge()
                .set_firefox()
                .set_mozilla(),
        )
        .build()
        .random()
        .to_string();

    let mut result_map: HashMap<String, RawSearchResult> = HashMap::new();

    let ddg_map_results: HashMap<String, RawSearchResult> =
        duckduckgo::results(query, page, &user_agent).await?;
    let searx_map_results: HashMap<String, RawSearchResult> =
        searx::results(query, page, &user_agent).await?;

    result_map.extend(ddg_map_results);

    for (key, value) in searx_map_results.into_iter() {
        if result_map.contains_key(&key) {
            result_map
                .get_mut(&key)
                .unwrap()
                .engine
                .push(value.engine.get(0).unwrap().to_string())
        } else {
            result_map.insert(key, value);
        }
    }

    let mut search_results: Vec<SearchResult> = Vec::new();

    for (key, value) in result_map.into_iter() {
        search_results.push(SearchResult {
            title: value.title,
            visiting_url: value.visiting_url,
            url: key,
            description: value.description,
            engine: value.engine,
        })
    }

    Ok(SearchResults {
        results: search_results,
        page_query: query.to_string(),
    })
}

2 src/search_results_handler/mod.rs Normal file
@@ -0,0 +1,2 @@
pub mod aggregation_models;
pub mod aggregator;

1 src/server/mod.rs Normal file
@@ -0,0 +1 @@
pub mod routes;

79 src/server/routes.rs Normal file
@@ -0,0 +1,79 @@
use std::fs::read_to_string;

use crate::search_results_handler::aggregator::aggregate;
use actix_web::{get, web, HttpRequest, HttpResponse};
use handlebars::Handlebars;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct SearchParams {
    q: Option<String>,
    page: Option<u32>,
}

#[get("/")]
pub async fn index(
    hbs: web::Data<Handlebars<'_>>,
) -> Result<HttpResponse, Box<dyn std::error::Error>> {
    let page_content: String = hbs.render("index", &"").unwrap();
    Ok(HttpResponse::Ok().body(page_content))
}

pub async fn not_found(
    hbs: web::Data<Handlebars<'_>>,
) -> Result<HttpResponse, Box<dyn std::error::Error>> {
    let page_content: String = hbs.render("404", &"")?;

    Ok(HttpResponse::Ok()
        .content_type("text/html; charset=utf-8")
        .body(page_content))
}

#[get("/search")]
pub async fn search(
    hbs: web::Data<Handlebars<'_>>,
    req: HttpRequest,
) -> Result<HttpResponse, Box<dyn std::error::Error>> {
    let params = web::Query::<SearchParams>::from_query(req.query_string())?;
    match &params.q {
        Some(query) => {
            if query.trim().is_empty() {
                Ok(HttpResponse::Found()
                    .insert_header(("location", "/"))
                    .finish())
            } else {
                let results_json: crate::search_results_handler::aggregation_models::SearchResults =
                    aggregate(query, params.page).await?;
                let page_content: String = hbs.render("search", &results_json)?;
                Ok(HttpResponse::Ok().body(page_content))
            }
        }
        None => Ok(HttpResponse::Found()
            .insert_header(("location", "/"))
            .finish()),
    }
}

#[get("/robots.txt")]
pub async fn robots_data(_req: HttpRequest) -> Result<HttpResponse, Box<dyn std::error::Error>> {
    let page_content: String = read_to_string("./public/robots.txt")?;
    Ok(HttpResponse::Ok()
        .content_type("text/plain; charset=ascii")
        .body(page_content))
}

#[get("/about")]
pub async fn about(
    hbs: web::Data<Handlebars<'_>>,
) -> Result<HttpResponse, Box<dyn std::error::Error>> {
    let page_content: String = hbs.render("about", &"")?;
    Ok(HttpResponse::Ok().body(page_content))
}

#[get("/settings")]
pub async fn settings(
    hbs: web::Data<Handlebars<'_>>,
) -> Result<HttpResponse, Box<dyn std::error::Error>> {
    let page_content: String = hbs.render("settings", &"")?;
    Ok(HttpResponse::Ok().body(page_content))
}