The description of the XML query API needs to be updated.
You are allowed to crawl the dblp website. We even provide a very simple XML query API, which is documented here:
- Michael Ley: DBLP XML Requests. Appendix to a paper from VLDB 2009.
However, the number of requests from automated crawlers sometimes becomes so high that it threatens the operation of the website, so please understand that we have to time out or temporarily block excessive bulk queries. If your number of requests exceeds a certain limit, you will receive an empty document with an HTTP status "429 Too Many Requests" response. In such a case, the response will also carry a "Retry-After" header field specifying the number of seconds to wait until the time-out is lifted. It is wise to prepare your scripts for such an answer.
To avoid any problems, please tone down your scripts to a reasonable number of requests per minute. You should always be fine when waiting at least one or two seconds between two consecutive requests. Thanks for understanding.
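As an illustration of the advice above, here is a minimal Python sketch of a script that pauses between requests and reacts to a "429 Too Many Requests" answer by waiting for the number of seconds given in "Retry-After". The function names, the two-second delay, the 60-second fallback, and the retry count are assumptions chosen for this example, not values prescribed by dblp.

```python
import time
import urllib.error
import urllib.request

# Assumption: use the upper end of the "one or two seconds" recommendation.
MIN_DELAY_SECONDS = 2.0


def retry_after_seconds(header_value, default=60.0):
    """Read a Retry-After value given as a number of seconds.

    Falls back to `default` (an arbitrary choice for this sketch) when the
    header is missing or is not a plain number.
    """
    try:
        return max(0.0, float(header_value))
    except (TypeError, ValueError):
        return default


def polite_get(url, max_attempts=3, opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, pausing after each request and honoring 429 responses."""
    for _ in range(max_attempts):
        try:
            with opener(url) as response:
                body = response.read()
            # Pause so that the caller's next request is spaced out.
            sleep(MIN_DELAY_SECONDS)
            return body
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # other errors are not handled in this sketch
            # Wait as long as the server asks before trying again.
            sleep(retry_after_seconds(err.headers.get("Retry-After")))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

The `opener` and `sleep` parameters exist only so the function can be exercised without network access; in a real script you would simply call `polite_get(url)` for each URL in turn.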
If you know that you need to run a lot of queries against the dblp data stock, we encourage you to download the whole dblp dataset as a single XML file instead and run your queries locally. The big XML file is in sync with the data on the website.
Sometimes, we do have a robots.txt in place to limit crawling on some sections of the site. This is never done to hide any of our information from you or your crawlers, but rather to avoid confusing crawlers with duplicate or testing branches of the website. Hence, please respect our robots.txt.
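One way for a crawler to respect robots.txt is Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the helper name, the user-agent string, and the sample rules are hypothetical and do not reflect the actual contents of the dblp robots.txt.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt_text, user_agent="example-crawler"):
    """Return a function telling whether `user_agent` may fetch a URL.

    `robots_txt_text` is the text of a robots.txt file; the default
    user-agent name is a placeholder for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt_text.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical rules, used here only to show the behavior.
SAMPLE_RULES = "User-agent: *\nDisallow: /test/\n"

allowed = build_robots_checker(SAMPLE_RULES)
print(allowed("https://example.org/test/page.html"))  # disallowed by the sample rules
print(allowed("https://example.org/pub/page.html"))   # not covered by any Disallow rule
```

Against a live site, you would instead call `set_url()` with the location of the site's robots.txt followed by `read()` on the parser, and then check each URL with `can_fetch()` before requesting it.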