Yeah, I wasn't really planning for this to blow up like it did today. It's currently sitting at about 35% of the index size I usually aim for, so besides the stuff I can't index because it's behind CDNs, there are a lot of pages it just hasn't gotten to yet. playwright.dev is pretty low on the priority list because it has a metric crap-ton of javascript on its front page. The crawler has visited it, looked at it, and put it very far down the priority queue.



Even though some sites have a metric crap-ton of js, they sometimes render very minimally for certain screen sizes or mobile devices, without any of the js crap. Does your crawler pay attention to any of that?


It doesn't look at what the javascript does, just how much there is.
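For illustration, here is a minimal sketch of a size-only heuristic like the one described, written as a hypothetical Python crawler. This is not the author's implementation. The class names, the fixed 50,000-byte cost assumed per external script, and the scoring are all made up. The sketch never executes or analyzes the JavaScript. It only measures how much of it a page ships and uses that number as a priority key, so script-heavy pages sink toward the back of the queue.

```python
import heapq
from html.parser import HTMLParser


class ScriptMeter(HTMLParser):
    """Counts inline <script> characters and external <script src=...> tags.

    Hypothetical heuristic: it never runs or inspects what the JavaScript
    does, it only measures how much of it the page contains.
    """

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.inline_chars = 0
        self.external_scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            # An external script's size is unknown without fetching it.
            if any(name == "src" for name, _ in attrs):
                self.external_scripts += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Only text inside <script> counts toward the penalty.
        if self.in_script:
            self.inline_chars += len(data)


def js_penalty(html: str, bytes_per_external: int = 50_000) -> int:
    """Estimate JS weight. Each external script gets an assumed fixed cost."""
    meter = ScriptMeter()
    meter.feed(html)
    meter.close()
    return meter.inline_chars + meter.external_scripts * bytes_per_external


class CrawlQueue:
    """Min-heap keyed on JS penalty: lighter pages are crawled first."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._counter = 0  # tie-breaker keeps insertion order for equal scores

    def push(self, url: str, html: str) -> None:
        heapq.heappush(self._heap, (js_penalty(html), self._counter, url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]
```

The appeal of this design is cost. Counting script bytes only takes one parse of HTML that has already been fetched. Rendering the page in a headless browser would be needed to see what the script actually does, for example whether a lighter mobile layout exists, and that is far more expensive.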



