Bug report
Bug description:
I was using robotparser to check some websites and I noticed some strange behavior. I gave it some urls that I knew could be crawled, and it returned False. After some debugging I realized that the url was fine, but I think there is a problem in the function that reads the url.
Line 62 of https://github.com/python/cpython/blob/main/Lib/urllib/robotparser.py
f = urllib.request.urlopen(self.url)
returned an error code 404, but the url/robots.txt file was actually present and I could see it online.
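For context, this is a minimal sketch of how the failure shows up for me; example.com is a hypothetical stand-in for the sites I tested, where the server rejects requests carrying urllib's default "Python-urllib/3.x" User-Agent:
from urllib.robotparser import RobotFileParser

# Hypothetical url; the server refuses the default Python-urllib User-Agent,
# so read() never receives the real robots.txt.
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()
print(rp.can_fetch("mybot", "https://example.com/page"))  # returned False in my tests, although the live robots.txt allows it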
What I did was add these two lines of code; the error disappeared and the function told me the url could be crawled, as I expected:
header = {'User-Agent': '*'}  # send an explicit User-Agent instead of urllib's default
req = urllib.request.Request(url=self.url, headers=header)
f = urllib.request.urlopen(req)
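In case it is useful, the same change can also be tried as a small subclass, so nothing in the stdlib has to be patched. This is just a sketch (the class name is mine); the error handling copies the current read() from robotparser.py:
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

class PatchedRobotFileParser(RobotFileParser):
    # Same logic as the stdlib read(), but the request carries a User-Agent.
    def read(self):
        try:
            header = {'User-Agent': '*'}
            req = urllib.request.Request(url=self.url, headers=header)
            f = urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code in (401, 403):
                self.disallow_all = True
            elif 400 <= err.code < 500:
                self.allow_all = True
        else:
            raw = f.read()
            self.parse(raw.decode("utf-8").splitlines())
A PatchedRobotFileParser can then be used exactly like a RobotFileParser.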
Could you give me feedback on this? Do you think it is correct and could be fixed in the main version?
CPython versions tested on:
3.8, 3.9
Operating systems tested on:
Windows