Wikipedia returns 403 when using Sphinx linkcheck to check for broken links


We use Sphinx linkcheck to make sure that our docs do not contain broken links, but Wikipedia has started returning a 403, presumably because it assumes our CI is a robot. Fair enough, it is a robot. Has anyone come across this problem before, and what is the best solution?

Full error message:

403 Client Error: Too many requests. Please respect our robot policy https://w.wiki/4wJS.
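
For what it's worth, linkcheck can send custom request headers per URL prefix via linkcheck_request_headers, though I don't know whether a more descriptive User-Agent would actually satisfy the policy; the value below is just a placeholder:

```python
# conf.py -- send an identifying User-Agent to Wikipedia only.
# The string is a placeholder; Wikipedia's robot policy asks for
# something identifiable with contact details.
linkcheck_request_headers = {
    "https://en.wikipedia.org/": {
        "User-Agent": "our-docs-linkcheck/1.0 (docs-team@example.org)",
    },
}
```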

I know we can use linkcheck_ignore for anything on this domain, but that defeats the point of checking for broken links.
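
For reference, that workaround is just a regex in conf.py (the pattern below is an example):

```python
# conf.py -- the blunt workaround: skip all Wikipedia links entirely.
# This silences the 403s, but linkcheck then won't notice if one of
# these articles is renamed or deleted.
linkcheck_ignore = [
    r"https://en\.wikipedia\.org/wiki/.*",
]
```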

Is it possible to use some kind of proxy/cache that keeps a copy of each web page? Our docs/CI could check the cache first, which would reduce the number of hits going to the external site. Is there an automated way to do this, or would we have to save the pages of interest manually and set up a cache of our own? We have already done this for some pages that have disappeared from the internet (thanks to the Wayback Machine for keeping a copy).
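
To make the manual version concrete, this is roughly the kind of script I have in mind (cache_pages.py, CACHE_DIR and PAGES are made-up names); the question is whether something like this already exists:

```python
# cache_pages.py -- hypothetical helper, not part of our docs yet.
# Pre-fetches the external pages we link to and stores them locally,
# so CI could check the cached copies instead of hitting Wikipedia on
# every build.
import pathlib
import urllib.parse

import requests

CACHE_DIR = pathlib.Path("linkcheck_cache")
PAGES = [
    "https://en.wikipedia.org/wiki/Sphinx_(documentation_generator)",
]

# An identifiable User-Agent with contact details, per the robot policy.
HEADERS = {"User-Agent": "our-docs-linkcheck/1.0 (docs-team@example.org)"}


def cache_page(url: str) -> pathlib.Path:
    """Download url (if not already cached) and return the local path."""
    CACHE_DIR.mkdir(exist_ok=True)
    name = urllib.parse.quote(url, safe="") + ".html"
    target = CACHE_DIR / name
    if not target.exists():
        response = requests.get(url, headers=HEADERS, timeout=30)
        response.raise_for_status()
        target.write_text(response.text, encoding="utf-8")
    return target


if __name__ == "__main__":
    for page in PAGES:
        print(cache_page(page))
```

CI would then need a separate step that checks the cached copies instead of the live URLs, which is the part I would rather not reinvent.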
