free software resistance
the cost of computing freedom is eternal vigilance
### we-are-already-hastening-the-demise-of-the-web
*originally posted:* sep 2025
> even the anubis dev always said that anubis was a stopgap solution at best and has many drawbacks
- a friend
for a few days ive been processing about 5500 links, across almost as many websites- on probably hundreds of domains.
thats nothing- it doesnt even begin to think about approaching what an ai scraper or search engine does.
and i probably couldve done this all on day one, if the web was as functional today as it was just a few years ago.
we are destroying the web, the same way that free software has been eroded and put under mostly corporate control. its reasonable to assume that 99% of "free" software today depends on software thats developed on microsoft github, and ALL OF THAT is subject to fuckery involving microsofts so-called ai.
meanwhile, users on nekoweb are fighting against scraping.
please dont misunderstand where im coming from here- when i was against creeping corporate fascism and mass surveillance in the early 2000s, it wasnt really a "pro-terrorist" argument- just like it isnt "pro-terrorism" to want palestine to be free, although some people i STRONGLY disagree with may not see that the way i do.
rather, fighting terrorism has been used as the impetus for a LOT of really terrible ideas, and concentration of power in the executive branch. ive been complaining about that for more than 20 years, and you can see just how much it was abused in 2016 and today in 2025 (although it was also abused in 2008, 2012 and 2020, just not nearly as blatantly or extensively as it is today).
when it comes to the web, the war on ai reminds me of the war on terror, because:
### 1. it really doesnt quite understand how the problem works.
some people do understand of course, and we agree on the problem- one really obvious example is that ai scrapers make websites prohibitively expensive to run that were not prohibitive before. thats a real-life concrete problem largely created by ai scraping, and you wont find me pretending it doesnt exist.
### 2. it wont actually solve the problem- at the very least, it stands to fail in a few ways.
what we are doing to stop scraping is starting to hurt our ability to archive the web. when websites we care about drop offline, theyre either gone forever or theyre mirrored or archived somewhere else.
i know people making personal websites might hate the idea of someone mirroring them- on the other hand, one of the reasons we even have a history of the web is that we were able to archive so much of geocities when yahoo let us all down. when it comes to YOUR personal website, everyones against it. when it comes to EVERYONES personal websites, we are generally for it.
but if we make it POSSIBLE to archive websites, we largely make it possible for other people to GET our STUFF. and while you might be against that in the short run, in the long run we are talking about an idea like-
lets say we actually made it impossible to bootleg and share commercial music. no no, lets say we ACTUALLY made it IMPOSSIBLE FOR REAL this time. that will probably never happen- but IF it did, the next time theres a vault fire like the one at universal studios, we could lose decades of music.
what makes it safe? the fact that so many COPIES exist that arent in a single location. the more redundancy there is, the longer the music lasts.
the point of the web was to remove single points of failure- granted, that was regarding communication and transmission, rather than storage. but as the web became accessible to billions, redundancy of storage also became an issue.
we either have the redundancy, and ai can scrape (granted, from other locations that can theoretically manage the bandwidth better than we can) or ai cant scrape it anywhere and we dont have redundancy.
we also make it so that it becomes a bigger than ever pain in the arse to mirror anything, which i really hate right now. the worse we make this for legitimate uses, the less those legitimate uses will ever happen.
obviously we can debate for years what is and isnt legitimate use, like we do with patents and copyright. to cut around that ill make up my own definition:
legitimate uses are the ones that given years of HINDSIGHT, we are GLAD someone was able to accomplish doing- whether we made it easy or nearly impossible to do.
### 3. it makes things worse for people than it does for bots.
how many innocent people have died or been tortured for the "lofty" goals of the war on terrorism? could we have found a better way? because many alternatives have been proposed.
i CANT STAND what the web is turning into. the people who created gemini are heroes (ive used gopher before, even created software for it) because if the web becomes too stupid to bother with, at least we are building a MODEST alternative.
i dont even like creating websites anymore. they stopped making it fun (in my opinion) a long time ago. yes its incredibly cute when people make new websites- i certainly wouldnt be on THIS host if i didnt think so. and its not like ive never used features that arent available on gemini, you know? i made a programming language based on javascript:
=> quasi/quasi41.html
but the whole point of the web is to SHARE INFORMATION, not HOARD it. the web is TERRIBLE at hoarding data, and here we are in 2025 trying to rebuild the web to do the PRECISE FUCKING OPPOSITE of what it was created to do!
of those more than 5500 urls, only between 200 and 300 were impossible to access using automation (NOTE i am mirroring, i am neither using an ai scraper nor do i plan to use this data with any sort of machine learning software- im looking for very concrete things like LICENSE data).
if i had to access each one WITHOUT automation, this collection of publicly-available data (IM making it publicly available, it was also already publicly available, it is also centred around / documenting freely licensed information) wouldnt even be available in the form i make it available in now. it has already taken years of work- of course i dont work on it all the time.
but one of the most difficult websites to get RELIABLY with gnu wget was the gnu website itself? thats hilarious. im sure you can find a way to prove its reliable if you do it another way- thats nice for you. i have my own requirements, and across a cross-section of thousands of urls, the gnu.org urls (website front pages mind you- or the equivalent for projects like- wget) are the ones that require the MOST babysitting and rehashing and repeated attempts while i try to get the data.
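to give a concrete idea of what that babysitting looks like, heres a minimal sketch of the sort of retry loop i mean- this assumes a plain text file of urls and gnu wget on the path, and the filenames and wait times are just placeholders, not my actual tooling:
```
# a sketch of the "babysitting" loop described above- not the actual tooling.
# assumptions: a file called urls.txt with one url per line, gnu wget on the path.
import subprocess
import time

MAX_TRIES = 5   # attempts before a url gets set aside for manual attention
WAIT = 30       # seconds to wait between attempts on the same url

def fetch(url, dest):
    # run wget once with a short timeout; True means it exited cleanly
    result = subprocess.run(
        ["wget", "--timeout=30", "--tries=1", "-O", dest, url],
        capture_output=True,
    )
    return result.returncode == 0

with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

stubborn = []
for n, url in enumerate(urls):
    for attempt in range(MAX_TRIES):
        if fetch(url, "page-%05d.html" % n):
            break
        time.sleep(WAIT)  # back off before rehashing the same url
    else:
        stubborn.append(url)  # still failing after MAX_TRIES

print("%d of %d urls still need babysitting" % (len(stubborn), len(urls)))
```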
this is a VERY modest amount of data, too. ONLY 5000 urls? what if i wanted to run (or create) a search engine?
or put another way, will mirroring in the future REQUIRE the effort of building your own search engine?
theres a lot of ways i see nekoweb having more potential than neocities. but neocities doesnt care about ai scrapers. i happen to flit about the web A LOT, and i have so much of my time wasted by anubis every day, which bots are already bypassing:
=> https://www.theregister.com/2025/08/15/codeberg_beset_by_ai_bots/
my point isnt "give up and let bots win", my point is we are doing all of this wrong. it largely doesnt work, its hurting users, its even hurting the long term viability of the web (unless you actually like MOST websites to really be completely gone when they go offline).
being passionate about ai bots scraping your personal website is one way everybody at home can join the war, and every war likes to give people the impression that support at home is just as important as how things are at the front. heck, theres even some truth to that- consider what ukraine is going through right now.
there are websites on nekoweb which even block the internet archive to prevent scraping- and if you made it this far, the project im working on this week (and for years now) isnt about nekoweb, its about free software. but SOMEDAY nekoweb will probably go the way of geocities (it would be cool if it doesnt, sure), and while you may not want to be mirrored, it would be just as nice if there were a comprehensive-as-possible history of nekoweb somewhere.
the internet archive is actually getting harder to use, because more and more page archives (including of nekoweb incidentally) are actually archives of challenge pages similar to anubis.
youre actually doing it- youre making it so the web cant be accessed long term. and this isnt just personal websites, its websites about freely-licensed software.
so you click "back" a few times on the archive to view earlier versions, but it just jumps back to a more recent one until you fiddle with the url to go back a year OR TWO and find a page that was saved before all these challenge pages from clownflare, anubis and several others started getting saved instead.
its certainly an exaggeration to say nothing but challenge pages are getting archived- which isnt what i said anyway. i doubt 5000 pages is a good statistical sample but it certainly gives me an impression of what the archive is gradually turning into.
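one workaround for that back-button dance: the wayback machine has a cdx endpoint that lists every capture it has for a url, so you can jump straight to an older timestamp instead of clicking back over and over. a rough sketch- the example url and cutoff year are placeholders:
```
# list wayback machine captures for a page, so you can jump straight to an
# older snapshot instead of clicking "back" through challenge pages.
# the example url and cutoff year below are placeholders.
import json
import urllib.parse
import urllib.request

CDX = "https://web.archive.org/cdx/search/cdx?"

def list_captures(page_url, to_year="2023", limit="50"):
    query = urllib.parse.urlencode({
        "url": page_url,
        "output": "json",             # json array, first row is the header
        "to": to_year,                # only captures up to this year
        "fl": "timestamp,statuscode", # just the fields we need
        "limit": limit,
    })
    with urllib.request.urlopen(CDX + query) as resp:
        rows = json.load(resp)
    return rows[1:] if rows else []

page = "https://example.neocities.org/"
for timestamp, status in list_captures(page):
    print("https://web.archive.org/web/%s/%s" % (timestamp, page), status)
```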
tasks that used to be instantaneous (when done manually in a browser) are becoming a long, drawn-out affair, just to look up a single piece of data like- what license does this package use? and im well aware, thank you, that i can just get that data for an actual package from the repo database. yes. only, this database includes information from webpages- its not based on package repos, which arent primary sources from the authors of the software.
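and to be clear about how mundane that single piece of data is, the lookup itself amounts to something like scanning a saved page for license names. a rough sketch- the patterns here are just examples, not the actual database or tooling:
```
# scan a saved page for license names- the single piece of data the whole
# exercise is about. the patterns are examples, not the actual tooling.
import re

LICENSE_PATTERNS = {
    "0-clause bsd": r"\b0BSD\b|zero.clause bsd",
    "gpl 2 or 3":   r"\bGPL[- ]?v?[23]\b",
    "mit":          r"\bMIT license\b",
    "apache 2.0":   r"\bApache[- ]2\.0\b",
}

def guess_licenses(html_path):
    # return the names of any license patterns found in the saved page
    with open(html_path, errors="replace") as f:
        text = f.read()
    return [name for name, pattern in LICENSE_PATTERNS.items()
            if re.search(pattern, text, re.IGNORECASE)]

print(guess_licenses("page-00001.html"))
```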
its going to take a lot more to find a cleverer way to solve the problems we want to solve. doing this in a way that actually makes people happy in the long run and doesnt cut off our nose just to keep someone from taking a picture of it requires balance, and at first glance challenge pages LOOK like a reasonable compromise.
but the overall effect created when everyone fights scraping the way theyre doing it now means that if you want to get the most out of what the web has always offered, youll ultimately have to recreate that using another protocol, using different hosts, and building a new internet archive.
we need to be a little more sceptical about the solutions we are hacking together- the web has never been this broken, and its not improving. 2024 and 2025 are imo the worst years the web ever had. ive been on this thing since the 90s. its getting ridiculous.
i totally appreciate that you want to preserve the AESTHETIC of the 90s internet- and maybe give it your own personal modern twist.
what i really loved about the internet 25 years ago- and even 10 or 15 years ago, is the WHOLE THING worked A LOT better than THIS.
im sure there are people trying very hard to make it BETTER too- but overall, it isnt. we have so much to do if we want to keep the internet working as well as it once did.
im not optimistic. but i didnt write this to tell you theres no point- even if it might already be too late. i mean, if we consider global warming the biggest threat to the internet- by extension of the threat it poses to humanity- then we have bigger problems there than the ones that face the web. and i hope we manage to fix that!
i hate challenge pages. i despise them. and theyre already failing to do the one thing we put up with them for.
in the long run, if this keeps up theres still gemini. but i think we can save the web too- we might find a way! but the sooner we stop breaking it, the better.
finally, i waited to make every other point before saying that- yes, this is largely the fault of the arseholes running ai scrapers.
yep. i got that part. i know some people are doing their best to get by. you want to make a web that boycotts shit created with llms? i would be supportive of that effort. not that i think boycotting alone will do it of course, but i think it would help a lot.
again, i direct your attention to the war on terror. so much of it is non-workable, so much of it is unacceptable. i dont mind waiting extra time at the terminal, but treating everyone like theyre a bot isnt fun- just like its no fun (nor is it usually necessary) when everyone is treated like a criminal.
please, even if you dont know any other way- try to support people when they try to think of a better idea. support the idea of the web having a future.
license: 0-clause bsd
```
# 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
```
=> https://freesoftwareresistance.neocities.org