URL Checking

The URL Checking tool performs the following functions:

  • Identifies URLs in the document (regardless of whether they are already Word hyperlinks)
  • Converts them to clickable hyperlinks, if they are not already clickable, by turning them into Word Hyperlink fields
  • Automatically corrects common problems with special characters or spaces found in URLs
  • Checks on the Web to determine whether the link is valid
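The first step, identifying URLs in running text, can be illustrated with a short Python sketch. This is not the eXtyles implementation; the pattern `URL_PATTERN` and the function `find_urls` are hypothetical names, and the sketch assumes that only http and https URLs are of interest and that trailing sentence punctuation is not part of a URL.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r'https?://[^\s<>"“”]+')

def find_urls(text: str) -> list[str]:
    """Return candidate URLs found in plain text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Assumption: punctuation that trails a URL belongs to the sentence, not the URL.
        urls.append(match.group().rstrip(".,;:!?)"))
    return urls
```

A real document will contain cases this pattern misses (for example, URLs without a scheme), so the sketch only shows the general idea.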

Valid URLs (URLs that can be followed to the indicated website without redirection) are not flagged. URLs that return any HTTP status code other than 200 (the code that indicates success) will have a Word comment inserted that draws attention to the invalid URL and indicates the error encountered in attempting to follow the link.

For example, a link that points to a nonexistent page will have a Word comment inserted indicating that the server returned an HTTP 404 (“Not Found”) status code:

http://www.inera.com/~kdkdkd

Comment: URL Validation failed because the page http://www.inera.com/~kdkdkd does not exist (HTTP error 404).
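The rule described above (no comment for status 200, a comment for anything else) might be sketched in Python as follows. The function name `review_comment` is hypothetical, and only the 404 wording is taken from the example above; the wording for other status codes is an assumption. Redirection is not handled in this sketch.

```python
from typing import Optional

def review_comment(url: str, status: int) -> Optional[str]:
    """Return the comment text for a checked URL, or None if no comment is needed."""
    if status == 200:
        # Success: the URL is not flagged.
        return None
    if status == 404:
        # Wording taken from the example comment in this section.
        return f"URL Validation failed because the page {url} does not exist (HTTP error 404)."
    # Hypothetical wording for any other status code.
    return f"URL Validation failed for {url} (HTTP error {status})."
```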

URL Validation will also try to fix some cases of URLs that contain invalid characters or spaces, as sometimes happens during the editing process. For example, spaces sometimes appear mid-URL, which would cause a URL to fail in a browser. When possible, URL Validation will remove the incorrect spaces and add a comment such as:

Comment: The URL “http://dtd.nlm.nih.gov/ faq.html” has been corrected. One or more spaces were removed.

Other cleanup cases include replacing en dashes with hyphens and replacing tilde-like characters with a simple tilde.
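The cleanup cases above can be sketched in Python. This is an illustration under stated assumptions, not the eXtyles logic: the function `clean_url` and the set `TILDE_LIKE` are hypothetical, the choice of which characters count as tilde look-alikes is a guess, and only the note about removed spaces uses wording from this page.

```python
# Assumption: these Unicode characters are the tilde look-alikes worth replacing.
TILDE_LIKE = ("\u02dc", "\u223c", "\uff5e")  # small tilde, tilde operator, fullwidth tilde
EN_DASH = "\u2013"

def clean_url(url: str) -> tuple[str, list[str]]:
    """Return the cleaned URL and a list of notes describing each fix applied."""
    notes = []
    cleaned = url
    if " " in cleaned:
        cleaned = cleaned.replace(" ", "")
        notes.append("One or more spaces were removed.")
    if EN_DASH in cleaned:
        cleaned = cleaned.replace(EN_DASH, "-")
        notes.append("En dashes were replaced with hyphens.")  # hypothetical wording
    if any(ch in cleaned for ch in TILDE_LIKE):
        for ch in TILDE_LIKE:
            cleaned = cleaned.replace(ch, "~")
        notes.append("Tilde-like characters were replaced with a tilde.")  # hypothetical wording
    return cleaned, notes
```

For the example above, `clean_url("http://dtd.nlm.nih.gov/ faq.html")` returns the URL without the space together with the note that spaces were removed.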

eXtyles will also silently correct URLs where the only difference is http vs. https. In those cases, a message will appear after URL Validation has successfully completed.


How to use

To use URL Checking:

  • Select Advanced Processing from the eXtyles menu
  • Select URL Checking

URL Checking is best run after Bibliographic References processing. If URLs in reference lists have already been converted to clickable hyperlinks, the conversion interferes with eXtyles recognition of electronic references during Bibliographic References processing.



Reviewing comments

Redirected URLs

Experience has shown that the most common warning from URL Checking is that a URL has been redirected to a different final URL. There are many reasons for this, including:

  • A website has been reorganized and the (wise) webmaster has automatically redirected old URLs to the new pages. In cases such as this, you may want to update the URL in the document to reflect the most current URL.
  • A “vanity” URL has been used. For example, some drug manufacturers will register a domain for a drug name, but when you follow the URL to the website of the drug, you actually visit the manufacturer’s landing page. In such cases, you might want to keep the vanity URL for the drug (despite the redirection warning) because the drug manufacturer could be acquired, and the ultimate landing page would change. In other words, when a URL is redirected, we recommend keeping in the document the URL that will likely have the longest life.
  • Some URLs are always redirected. For example, URLs to articles on the New York Times website have a new and unique final URL on every single visit, even though the URL you click is identical. Redirection is performed in this case for the purpose of tracking web visitors. In such cases, the redirect warning can be ignored.
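When reviewing many redirect warnings, a small triage helper can separate the ones worth a look from the ones that can be ignored. The Python sketch below is purely illustrative: the function `redirect_needs_review` and the host list `ALWAYS_REDIRECTING_HOSTS` are hypothetical, and the list would need to be maintained by hand for the sites your documents cite.

```python
from urllib.parse import urlsplit

# Hypothetical list of hosts known to redirect on every visit (e.g., for visitor tracking).
ALWAYS_REDIRECTING_HOSTS = {"www.nytimes.com"}

def redirect_needs_review(requested_url: str, final_url: str) -> bool:
    """Return True if a redirect from requested_url to final_url deserves a manual look."""
    if requested_url == final_url:
        # No redirection took place.
        return False
    host = (urlsplit(requested_url).hostname or "").lower()
    # Redirects from always-redirecting hosts can be ignored.
    return host not in ALWAYS_REDIRECTING_HOSTS
```

Whether to update the document to the final URL or to keep the original (as with vanity URLs) remains an editorial decision that a helper like this cannot make.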

Excluded URLs

URL Validation does not check URLs that link to PubMed and CrossRef in reference lists because these links have likely been added by eXtyles.
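The exclusion rule can be pictured as a simple host check. The following Python sketch is an assumption-laden illustration: the function `should_skip` is hypothetical, and the hosts in `EXCLUDED_HOSTS` are guesses at what PubMed and CrossRef links look like, not a list taken from eXtyles.

```python
from urllib.parse import urlsplit

# Hypothetical host list; the hosts eXtyles actually excludes are not documented here.
EXCLUDED_HOSTS = {"pubmed.ncbi.nlm.nih.gov", "doi.org", "dx.doi.org"}

def should_skip(url: str, in_reference_list: bool) -> bool:
    """Return True if the URL is an excluded link inside a reference list."""
    host = (urlsplit(url).hostname or "").lower()
    # The exclusion applies only to links found in reference lists.
    return in_reference_list and host in EXCLUDED_HOSTS
```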

Limitations of URL Checking

URL Checking has the following limitations:

  • If a hyperlink has no comment after URL Checking, there’s no guarantee that it points to a working page. Some websites redirect links for now-dead pages to a standard page that indicates a page has moved. For critical cases, there is still no substitute for a manual check of each page.
  • A Word hyperlink with an underlying “mailto” property, indicating that the hyperlink is an email address, will not be validated.
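The second limitation amounts to checking the scheme of the hyperlink target. A minimal Python sketch, using the hypothetical function name `is_email_link` and the standard-library `urlsplit` function:

```python
from urllib.parse import urlsplit

def is_email_link(target: str) -> bool:
    """Return True if the hyperlink target is an email address (mailto scheme)."""
    return urlsplit(target).scheme.lower() == "mailto"
```

Links for which this returns True would be skipped rather than validated.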

Connection Error Codes

When URL Validation encounters a connection error, it inserts a warning with a corresponding error code:


Connection Failures

Error message returned                               Error code
Authoritative: Host not found                        ERROR_INTERNET_NAME_NOT_RESOLVED
Network subsystem failed                             ENETDOWN
No file handles available                            EMFILE
No buffer space available                            ENOBUFS
Handle is not a socket                               ENOTSOCK
Not connected                                        ENOTCONN
Address not available                                EADDRNOTAVAIL
Connection aborted                                   ECONNABORTED
Connection reset                                     ERROR_INTERNET_CONNECTION_RESET
Connection timed out                                 ETIMEDOUT
Connection refused                                   ECONNREFUSED
Failed to read from connection                       EREAD
Failed to write to connection                        EWRITE
Host down                                            EHOSTDOWN
Host unreachable                                     EHOSTUNREACH
Non-authoritative: host not found or server failure  TRY_AGAIN
Non-recoverable: refused or not implemented          NO_RECOVERY
Internal error                                       EINTERNAL
Valid name, no data record for type                  NO_DATA

Connection Status Errors

Error message returned  Error code
URL is incorrect        HTTP error 400
Restricted source       HTTP error 403
Does not exist          HTTP error 404
Does not exist          HTTP error 410
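
If you post-process the comments in a script, the Connection Status Errors table can be represented as a simple lookup. In this Python sketch the dictionary contents come from the table above, while the names `STATUS_MESSAGES` and `describe_status` and the fallback text for unlisted codes are hypothetical.

```python
# Messages from the Connection Status Errors table, keyed by HTTP status code.
STATUS_MESSAGES = {
    400: "URL is incorrect",
    403: "Restricted source",
    404: "Does not exist",
    410: "Does not exist",
}

def describe_status(code: int) -> str:
    """Return the table's message for an HTTP status code, with the code appended."""
    # The fallback text for codes not in the table is an assumption.
    message = STATUS_MESSAGES.get(code, "Unlisted status")
    return f"{message} (HTTP error {code})"
```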