Wikipedia returns 403 when using Sphinx linkcheck to check for broken links


We use Sphinx linkcheck to make sure that our docs do not contain broken links, but Wikipedia has started returning a 403, presumably because it assumes our CI is a robot. Fair enough, it is a robot. Has anyone come across this problem before, and what is the best solution?

Full error message:

403 Client Error: Too many requests. Please respect our robot policy https://w.wiki/4wJS.
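
For what it's worth, linkcheck can send custom request headers per URL prefix via linkcheck_request_headers, though I don't know whether a more descriptive User-Agent would actually satisfy the policy; the value below is just a placeholder:

```python
# conf.py -- send an identifying User-Agent to Wikipedia only.
# The string is a placeholder; Wikipedia's robot policy asks for
# something identifiable with contact details.
linkcheck_request_headers = {
    "https://en.wikipedia.org/": {
        "User-Agent": "our-docs-linkcheck/1.0 (docs-team@example.org)",
    },
}
```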

I know we can use linkcheck_ignore for anything on this domain, but that defeats the point of checking for broken links.
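
For reference, that workaround is just a regex in conf.py (the pattern below is an example):

```python
# conf.py -- the blunt workaround: skip all Wikipedia links entirely.
# This silences the 403s, but linkcheck then won't notice if one of
# these articles is renamed or deleted.
linkcheck_ignore = [
    r"https://en\.wikipedia\.org/wiki/.*",
]
```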

Is it possible to use some kind of proxy/cache that keeps a copy of each web page? Our docs/CI could check the cache first, which would reduce the number of hits going to the external site. Is there an automated way to do this, or would we have to save the pages of interest manually and set up a cache of our own? We have already done this for some pages that have disappeared from the internet (thanks to the Wayback Machine for keeping a copy).
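
To make the manual version concrete, this is roughly the kind of script I have in mind (cache_pages.py, CACHE_DIR and PAGES are made-up names); the question is whether something like this already exists:

```python
# cache_pages.py -- hypothetical helper, not part of our docs yet.
# Pre-fetches the external pages we link to and stores them locally,
# so CI could check the cached copies instead of hitting Wikipedia on
# every build.
import pathlib
import urllib.parse

import requests

CACHE_DIR = pathlib.Path("linkcheck_cache")
PAGES = [
    "https://en.wikipedia.org/wiki/Sphinx_(documentation_generator)",
]

# An identifiable User-Agent with contact details, per the robot policy.
HEADERS = {"User-Agent": "our-docs-linkcheck/1.0 (docs-team@example.org)"}


def cache_page(url: str) -> pathlib.Path:
    """Download url (if not already cached) and return the local path."""
    CACHE_DIR.mkdir(exist_ok=True)
    name = urllib.parse.quote(url, safe="") + ".html"
    target = CACHE_DIR / name
    if not target.exists():
        response = requests.get(url, headers=HEADERS, timeout=30)
        response.raise_for_status()
        target.write_text(response.text, encoding="utf-8")
    return target


if __name__ == "__main__":
    for page in PAGES:
        print(cache_page(page))
```

CI would then need a separate step that checks the cached copies instead of the live URLs, which is the part I would rather not reinvent.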
