This commit is contained in:
Raven Scott 2025-06-07 20:18:16 -04:00
parent 58b6cf5dfc
commit 07a2832c8f

View File

@ -14,7 +14,7 @@ The AI landscape in 2025 is a dystopian fever dream, a chaotic gold rush where t
The transformer architecture, a statistical trick for predicting text, has been inflated into a godlike entity, worshipped with fanatical zeal while ignoring the wreckage it leaves behind. This obsession with scale is collective madness. Models are trained on datasets so colossal - trillions of tokens scraped from the internets cesspool, books, and corporate sludge - that even their creators cant untangle the mess.
Every company, every startup, every wannabe AI guru is unleashing armies of scrapers to plunder the web, hammering servers and destabilizing the digital ecosystem. A March 2025 McKinsey survey reveals that over 80% of organizations deploying generative AI see no tangible enterprise-level impact, suggesting this rush is driven by hype, not results [McKinsey, 2025](https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai). High-profile failures - like Samsung banning ChatGPT after code leaks, Googles Bard hallucinating, Zillows AI pricing flop costing millions, and IBM Watson Healths erroneous cancer recommendations - underscore the chaos [Lakera, 2024](https://www.lakera.ai/blog/risks-of-ai). Were not building progress; were orchestrating a digital apocalypse.
Every company, every startup, every wannabe AI guru is unleashing armies of scrapers to plunder the web, hammering servers and destabilizing the digital ecosystem. High-profile failures - like Samsung banning ChatGPT after code leaks, Googles Bard hallucinating, Zillows AI pricing flop costing millions, and IBM Watson Healths erroneous cancer recommendations - underscore the chaos [Lakera, 2024](https://www.lakera.ai/blog/risks-of-ai). Were not building progress; were orchestrating a digital apocalypse.
## Apples *Illusion of Thinking*: A Flamethrower to AIs Lies
@ -26,7 +26,7 @@ These models are erratic, nailing one puzzle only to choke on a near-identical o
The “thinking processes” LRMs boast are a marketing stunt, revealed by Apple as a chaotic mess of incoherent leaps, dead ends, and half-baked ideas - not thought, but algorithmic vomit. LRMs fail to use explicit algorithms, even when essential, faking it with statistical sleight-of-hand that collapses under scrutiny. This brittleness isnt theoretical: IBM Watson Healths cancer AI made erroneous treatment recommendations, risking malpractice, and Googles Bard hallucinated inaccurate information [Lakera, 2024](https://www.lakera.ai/blog/risks-of-ai).
A January 2025 McKinsey report notes that 50% of employees worry about AI inaccuracy, 51% fear cybersecurity risks, and many cite data leaks, aligning with Apples findings of unreliable outputs [McKinsey, 2025](https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work). A 2023 study in *Frontiers* highlights AIs inability to ethically solve problems or explain results, further questioning its readiness for critical applications [Frontiers, 2023](https://www.frontiersin.org/articles/10.3389/frai.2023.1148154/full). This isnt a warning - its a guillotine.
A January 2025 McKinsey report notes that 50% of employees worry about AI inaccuracy, 51% fear cybersecurity risks, and many cite data leaks, aligning with Apples findings of unreliable outputs [McKinsey, 2025](https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work). This isnt a warning - its a guillotine.
## Amodeis Confession: Were Flying Blind
@ -44,11 +44,11 @@ The AI industrys data addiction is a digital plague, and web scraping is its
But Scrapys non-sane defaults and aggressive concurrency are a death sentence for servers. Its defaults prioritize speed over ethics, overwhelming servers with relentless requests. Small websites, blogs, and forums - run by individuals or small businesses - crash or rack up crippling bandwidth costs. The X post brags about handling “tens of thousands of pages,” but each page is a sledgehammer to someones infrastructure.
The internet thrives on open access, but scraping is strangling it. Websites implement bot protections, paywalls, or IP bans, locking out legitimate users. The X post admits to “bot protections” and “IP bans” as challenges, but Scrapys workarounds escalate this arms race, turning the web into a walled garden. A June 2025 *Nature* article reports that AI-driven scraping is overwhelming academic websites, with sites like DiscoverLife receiving millions of daily hits, slowing them to unusability [Nature, 2025](https://www.nature.com/articles/d41586-025-01743-9).
The internet thrives on open access, but scraping is strangling it. Websites implement bot protections, paywalls, or IP bans, locking out legitimate users. The X post admits to “bot protections” and “IP bans” as challenges, but Scrapys workarounds escalate this arms race, turning the web into a walled garden.
Scrapers plunder content without consent, stealing intellectual property, leaving creators - writers, artists, publishers - with no compensation. The X posts “clean datasets” fantasy ignores the dirty truth: this data is pilfered. A February 2025 EPIC report calls this the “great scrape,” noting that companies like OpenAI allegedly scraped New York Times articles, copyrighted books, and YouTube videos without permission, violating privacy and IP rights [EPIC, 2025](https://epic.org/scraping-for-me-not-for-thee-large-language-models-web-data-and-privacy-problematic-paradigms/).
Scraping collects sensitive personal information without consent, raising privacy concerns. A 2024 OECD report highlights how scraping violates privacy laws and the OECD AI Principles, risking identity fraud and cyberattacks [OECD, 2024](https://oecd.ai/en/data-scraping-challenge). A May 2025 Simplilearn article notes that scraping exacerbates AIs privacy violations, advocating for GDPR and HIPAA compliance [Simplilearn, 2025](https://www.simplilearn.com/challenges-of-artificial-intelligence-article).
Scraping collects sensitive personal information without consent, raising privacy concerns. A 2024 OECD report highlights how scraping violates privacy laws and the OECD AI Principles, risking identity fraud and cyberattacks. A May 2025 Simplilearn article notes that scraping exacerbates AIs privacy violations, advocating for GDPR and HIPAA compliance [Simplilearn, 2025](https://www.simplilearn.com/challenges-of-artificial-intelligence-article).
Millions of scrapers clog networks, slowing access, while data centers strain, driving up energy costs and carbon emissions. Websites lose faith, shutting down or going offline, shrinking the internets diversity. Scrapys defenders claim its “essential” for LLMs, but thats a lie. This data hunger is a choice. By glorifying server-killing tools, were murdering the internets soul. Deploying LLMs built on this stolen foundation isnt reckless - its immoral.
@ -60,7 +60,7 @@ Scraped datasets are a toxic stew of biases, errors, and garbage. Models inherit
Transformers attention mechanisms “hallucinate” nonexistent connections, sparking errors that could mean lawsuits or worse in production. Models overfit to scraped data quirks, brittle in real-world scenarios - a context shift, and theyre lost. LRMs burn obscene resources for negligible gains. Apple showed their “reasoning” doesnt scale, yet we torch energy grids to keep the farce alive.
A 2024 *ScienceDirect* article lists LLM vulnerabilities: prompt injection manipulates outputs, training data poisoning introduces biases, PII breaches occur during training, insecure output handling generates harmful content, and denial-of-service attacks disrupt availability [ScienceDirect, 2024](https://www.sciencedirect.com/science/article/pii/S2666659024000130). A 2021 arXiv paper outlines six risk areas for LLMs, including discrimination, toxicity, misinformation, and environmental harms, all amplified by flawed data [arXiv, 2021](https://arxiv.org/abs/2112.04359). This isnt a system - its a house of horrors, and deploying it on stolen, server-killing data is lunacy.
A 2021 arXiv paper outlines six risk areas for LLMs, including discrimination, toxicity, misinformation, and environmental harms, all amplified by flawed data [arXiv, 2021](https://arxiv.org/abs/2112.04359). This isnt a system - its a house of horrors, and deploying it on stolen, server-killing data is lunacy.
## Why Deployment Is a Betrayal
@ -80,7 +80,7 @@ Governments are asleep, with no framework to govern AI or scrapings risks, no
Autonomous AI in critical systems, powered by flawed LRMs, is a death sentence - Apples research shows failures go unchecked without human oversight, amplifying harm exponentially. Scrapings data theft, glorified by the X post, steals from creators, undermining the webs creative ecosystem. Deploying LLMs built on this is endorsing piracy at scale. Scrapings server attacks are killing the open web, forcing websites behind paywalls or offline, shrinking the internets diversity. LLMs are complicit in this murder.
Scraped data fuels LLMs that churn soulless text, drowning human creativity and turning culture into algorithmic sludge, disconnecting us from authenticity. A 2024 MIT Press article on ChatGPT warns of its potential for malicious misuse, privacy violations, and bias propagation, exacerbated by unregulated data practices [MIT Press, 2024](https://direct.mit.edu/dint/article/6/1/150/119771/The-Limitations-and-Ethical-Considerations-of). A 2020 Harvard Gazette report notes that AIs lack of oversight risks societal harm, with regulators ill-equipped to keep pace [Harvard Gazette, 2020](https://news.harvard.edu/gazette/story/2020/10/ethical-concerns-mount-as-ai-takes-bigger-decision-making-role/).
Scraped data fuels LLMs that churn soulless text, drowning human creativity and turning culture into algorithmic sludge, disconnecting us from authenticity A 2020 Harvard Gazette report notes that AIs lack of oversight risks societal harm, with regulators ill-equipped to keep pace [Harvard Gazette, 2020](https://news.harvard.edu/gazette/story/2020/10/ethical-concerns-mount-as-ai-takes-bigger-decision-making-role/).
## Toxic Incentives: Profit Over Existence
@ -116,7 +116,7 @@ These steps arent optional - theyre the only way to save ourselves from th
The AI industrys peddling a fairy tale, and were the suckers buying it. LLMs and LRMs arent saviors - theyre ticking bombs wrapped in buzzwords, built on a dying internets ashes. Apples *The Illusion of Thinking* and Amodeis confession are klaxons blaring in our faces. Scrapys server-killing rampage, glorified on X, is the final straw - were not just risking failure; were murdering the digital world that sustains us.
From high-profile deployment failures - Samsung, Google, Zillow, IBM - to the ethical quagmire of web scraping, from AIs environmental toll to its persistent opacity, the evidence is overwhelming. Over 80% of organizations see no tangible AI impact, yet the rush continues [McKinsey, 2025](https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai). IBM warns of escalating risks like data leakage [IBM, 2025](https://www.ibm.com/think/insights/ai-agents-2025-expectations-vs-reality). Lakera documents privacy violations from scraping, amplifying harm [Lakera, 2024](https://www.lakera.ai/blog/risks-of-ai). This isnt a mistake - its a betrayal of humanitys trust.
From high-profile deployment failures - Samsung, Google, Zillow, IBM - to the ethical quagmire of web scraping, from AIs environmental toll to its persistent opacity, the evidence is overwhelming. IBM warns of escalating risks like data leakage [IBM, 2025](https://www.ibm.com/think/insights/ai-agents-2025-expectations-vs-reality). Lakera documents privacy violations from scraping, amplifying harm [Lakera, 2024](https://www.lakera.ai/blog/risks-of-ai). This isnt a mistake - its a betrayal of humanitys trust.
Deploying LLMs and LRMs, fueled by scrapings destruction, isnt just dumb - its a crime against our survival. Lock them in the lab, crack the code, and stop the internets slaughter, or brace for the apocalypse. The clocks ticking, and were out of excuses.
@ -125,18 +125,12 @@ Deploying LLMs and LRMs, fueled by scrapings destruction, isnt just dumb -
- Shojaee, Parshin, et al. “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity.” Apple Machine Learning Research, June 2025, https://machinelearning.apple.com/research/illusion-of-thinking.
- Amodei, Dario. “Essay on AI Interpretability.” Personal website, 2025, quoted in Futurism, https://futurism.com/anthropic-ceo-admits-ai-ignorance.
- Anonymous. “The web scraping tool Scrapy.” X post, 2025, https://x.com/birgenbilge_mk/status/1930558228590428457?s=46
- McKinsey & Company. “The state of AI: How organizations are rewiring to capture value.” March 2025, https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai.
- Lakera. “AI Risks: Exploring the Critical Challenges of Artificial Intelligence.” 2024, https://www.lakera.ai/blog/risks-of-ai.
- McKinsey & Company. “AI in the workplace: A report for 2025.” January 2025, https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work.
- IBM. “AI Agents in 2025: Expectations vs. Reality.” March 2025, https://www.ibm.com/think/insights/ai-agents-2025-expectations-vs-reality.
- Simplilearn. “Top 15 Challenges of Artificial Intelligence in 2025.” May 2025, https://www.simplilearn.com/challenges-of-artificial-intelligence-article.
- Nature. “Web-scraping AI bots cause disruption for scientific databases and journals.” June 2025, https://www.nature.com/articles/d41586-025-01743-9.
- EPIC. “Scraping for Me, Not for Thee: Large Language Models, Web Data, and Privacy-Problematic Paradigms.” February 2025, https://epic.org/scraping-for-me-not-for-thee-large-language-models-web-data-and-privacy-problematic-paradigms/.
- OECD. “The AI data scraping challenge: How can we proceed responsibly?” 2024, https://oecd.ai/en/data-scraping-challenge.
- ScienceDirect. “A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly.” 2024, https://www.sciencedirect.com/science/article/pii/S2666659024000130.
- arXiv. “Ethical and social risks of harm from Language Models.” 2021, https://arxiv.org/abs/2112.04359.
- Frontiers. “Specific challenges posed by artificial intelligence in research ethics.” 2023, https://www.frontiersin.org/articles/10.3389/frai.2023.1148154/full.
- MIT Press. “The Limitations and Ethical Considerations of ChatGPT.” 2024, https://direct.mit.edu/dint/article/6/1/150/119771/The-Limitations-and-Ethical-Considerations-of.
- Harvard Gazette. “Ethical concerns mount as AI takes bigger decision-making role.” 2020, https://news.harvard.edu/gazette/story/2020/10/ethical-concerns-mount-as-ai-takes-bigger-decision-making-role/.
- TechTarget. “Generative AI Ethics: 11 Biggest Concerns and Risks.” March 2025, https://www.techtarget.com/searchenterpriseai/feature/Generative-AI-Ethics-11-Biggest-Concerns-and-Risks.
- WIRED. “The Dark Risk of Large Language Models.” 2022, https://www.wired.com/story/dark-risk-large-language-models/.