The more you can filter and label at the source, the less you have to work out in VL.
I use Alloy (which is kinda heavy) to extract and prepare only the data I want, and it works great so far.
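Roughly, the pipeline looks like this (a minimal sketch; the log path, labels, drop regex and VictoriaLogs address are placeholder examples, not my exact config):

```
// Tail a local access log (placeholder path and label).
loki.source.file "access" {
  targets = [
    { __path__ = "/var/log/nginx/access.log", job = "nginx" },
  ]
  forward_to = [loki.process.filter.receiver]
}

// Drop the boring lines at the source (here, anything with a
// 2xx status code) so only the interesting traffic reaches VL.
loki.process "filter" {
  stage.drop {
    expression = "\" 2[0-9]{2} \""
  }
  forward_to = [loki.write.vl.receiver]
}

// VictoriaLogs accepts the Loki push protocol on this endpoint.
loki.write "vl" {
  endpoint {
    url = "http://victorialogs:9428/insert/loki/api/v1/push"
  }
}
```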


I agree with you on some points here. The problem is that these crawlers are hostile to the point of DDoSing sites.
So the problem is not that someone archives your publicly accessible data; the problem is that in doing so they either break your site or make you pay for the excess traffic.
I think the web is now broken beyond repair. Commercialisation killed it, and the tech monopolies are all that’s left.
So I think small, invite-only, fully encrypted enclaves are all that is left, until someone comes up with a “new Internet” that can resist the “Techbros”, but for now I don’t see that happening.
Also, I don’t see the Fediverse as a solution; it’s just under the radar for now, but if it gets bigger it will be coöpted and sunk.
What are you using to ship the logs to VL?
If you want to exclude “normal” logs, you should start excluding them before they reach VL, so the only logs you keep are the interesting ones.
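For example, with Alloy a drop stage can discard the routine lines before they’re shipped. A sketch, assuming JSON logs with a “level” field (the field name, the “info” value and the endpoint are assumptions about your setup):

```
// Point a loki.source.* component's forward_to at
// loki.process.keep_interesting.receiver to feed this pipeline.
loki.process "keep_interesting" {
  // Extract the level from the JSON log body.
  stage.json {
    expressions = { level = "level" }
  }

  // Drop entries whose level is exactly "info";
  // warnings and errors continue downstream.
  stage.drop {
    source = "level"
    value  = "info"
  }

  forward_to = [loki.write.vl.receiver]
}

loki.write "vl" {
  endpoint {
    url = "http://victorialogs:9428/insert/loki/api/v1/push"
  }
}
```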