URL Checking and Validation

[Screenshot: the Advanced Processing menu with 'URL Check' highlighted. Bibliographic References and Duplicate Reference Check have already been run.]

URL Checking performs the following functions:

  • Identifies URLs in the document (regardless of whether they are already Word hyperlinks)

  • Converts them to clickable hyperlinks, if they are not already clickable, by turning them into Word Hyperlink fields

  • Automatically corrects common problems with special characters or spaces found in URLs

  • Checks on the Web to determine whether the link is valid
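As a rough illustration of the first function in the list above, the following Python sketch finds URL-like strings in plain text. The pattern, the `find_urls` name, and the trailing-punctuation rule are assumptions made for this example; they do not describe how eXtyles itself identifies URLs.

```python
import re

# Assumed pattern: anything starting with http://, https://, or www.
# and running up to the next whitespace, angle bracket, or quote mark.
URL_PATTERN = re.compile(r'\b(?:https?://|www\.)[^\s<>"“”]+', re.IGNORECASE)

# Punctuation that usually belongs to the sentence, not to the URL.
TRAILING_PUNCTUATION = ".,;:!?)\"'”"


def find_urls(text):
    """Return URL-like substrings of text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        urls.append(match.group(0).rstrip(TRAILING_PUNCTUATION))
    return urls
```

A pattern like this finds URLs whether or not they are already hyperlinks, but it will miss unusual forms and does not handle URLs that are broken by spaces.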

Valid URLs are not flagged.

URLs that return any HTTP status code other than 200 (that is, any response other than a successful one) will have a Word comment inserted that indicates the error encountered in attempting to follow the link.

Example:

A link that points to a nonexistent page will have a Word comment inserted indicating that the server returned an HTTP 404 (“Not Found”) status code:

http://www.inera.com/~kdkdkd

Comment: URL Validation failed because the page http://www.inera.com/~kdkdkd does not exist (HTTP error 404).
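The check described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: `fetch_status` and `comment_for` are hypothetical names, the wording of the generic (non-404) comment is invented for the example, and the standard-library `urlopen` follows redirects on its own, so redirect reporting is not modeled here.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url (hypothetical helper)."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses;
        # the status code is carried on the exception.
        return err.code


def comment_for(url, status):
    """Return None for a valid URL, otherwise a comment string."""
    if status == 200:
        return None  # valid URLs are not flagged
    if status == 404:
        # Wording modeled on the example comment above.
        return (f"URL Validation failed because the page {url} "
                f"does not exist (HTTP error 404).")
    # Assumed wording for other status codes.
    return f"URL Validation failed for {url} (HTTP error {status})."
```

Connection failures (for example, a host that cannot be resolved) raise other exceptions and would need separate handling.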

URL Validation will also try to fix some cases of URLs that contain invalid characters or spaces, as sometimes happens during the editing process.

Example:

Spaces sometimes appear mid-URL, which would cause a URL to fail in a browser. When possible, URL Validation will remove the incorrect spaces and add a comment such as:

Comment: The URL “http://nlm.nih.gov/ faq.html” has been corrected. One or more spaces were removed.
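The space correction can be pictured with a small Python sketch. It assumes the full extent of the broken URL is already known (deciding where a URL with embedded spaces ends is the hard part and is not covered), and `remove_spaces` is a hypothetical helper name, not an eXtyles function.

```python
import re


def remove_spaces(url_candidate):
    """Return (cleaned_url, comment); comment is None if nothing changed."""
    cleaned = re.sub(r"\s+", "", url_candidate)
    if cleaned == url_candidate:
        return cleaned, None
    # Comment wording follows the example shown above.
    comment = (f"The URL “{url_candidate}” has been corrected. "
               f"One or more spaces were removed.")
    return cleaned, comment
```

For the example above, `remove_spaces("http://nlm.nih.gov/ faq.html")` yields the cleaned URL `http://nlm.nih.gov/faq.html` together with the comment text.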

eXtyles will also silently correct URLs where the only difference is http vs. https. In those cases, after URL Validation has successfully completed, a message such as the following will appear:

Valid URLs: URLs that may be followed without redirection to the indicated website

How to use

To use URL Checking:

  1. Run eXtyles up through (at least) Bibliographic Reference Processing.

  2. Select eXtyles > Advanced Processing > URL Checking.

Reviewing comments

Redirected URLs

Experience has shown that the most common warning from URL Checking is that a URL has been redirected to a different final URL. There are many reasons for this, including:

  • A website has been reorganized and the (wise) webmaster has automatically redirected old URLs to the new pages.
    In cases such as this, you may want to update the URL in the document to reflect the most current URL.

  • A “vanity” URL has been used.

In other words, when a URL is redirected, we recommend keeping in the document the URL that will likely have the longest life.

  • Some URLs are always redirected.

In cases like the above example, the redirect warning can be ignored.
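When reviewing redirect warnings, it can help to separate redirects that change only http to https (which, as noted earlier, eXtyles corrects silently) from redirects to a genuinely different address. The Python sketch below shows one way to make that distinction; the `classify_redirect` name and its three labels are assumptions for this example only.

```python
from urllib.parse import urlsplit


def classify_redirect(original, final):
    """Label a redirect as 'unchanged', 'scheme-only', or 'redirected'."""
    if original == final:
        return "unchanged"
    a, b = urlsplit(original), urlsplit(final)
    # Compare everything except the scheme; host names are case-insensitive.
    same_rest = (
        (a.netloc.lower(), a.path, a.query, a.fragment)
        == (b.netloc.lower(), b.path, b.query, b.fragment)
    )
    if same_rest and {a.scheme, b.scheme} == {"http", "https"}:
        return "scheme-only"
    return "redirected"
```

A "redirected" result is the case that calls for editorial judgment about which URL is likely to have the longest life.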

 

 

Limitations of URL Checking

URL Checking has the following limitations:

  • If a hyperlink has no comment after URL Checking, there’s no guarantee that it points to a working page.
    Some websites redirect links for now-dead pages to a standard page that indicates a page has moved. For critical cases, there is still no substitute for a manual check of each page.

  • A Word hyperlink with an underlying “mailto” property, indicating that the hyperlink is an email address, will not be validated.
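The mailto exclusion in the list above amounts to a scheme check on the hyperlink target. A minimal Python sketch, in which `should_validate` is a hypothetical name and only the mailto case from the list is modeled:

```python
from urllib.parse import urlsplit


def should_validate(target):
    """Return False for mailto targets, which are skipped; True otherwise."""
    # urlsplit lowercases the scheme, so "MAILTO:" is handled as well.
    return urlsplit(target).scheme != "mailto"
```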

 

Connection Error Codes

When URL Validation encounters a connection error, it inserts a warning with a corresponding error code:

Connection Failures

Error message returned                               Error code
Authoritative: Host not found                        ERROR_INTERNET_NAME_NOT_RESOLVED
Network subsystem failed                             ENETDOWN
No file handles available                            EMFILE
No buffer space available                            ENOBUFS
Handle is not a socket                               ENOTSOCK
Not connected                                        ENOTCONN
Address not available                                EADDRNOTAVAIL
Connection aborted                                   ECONNABORTED
Connection reset                                     ERROR_INTERNET_CONNECTION_RESET
Connection timed out                                 ETIMEDOUT
Connection refused                                   ECONNREFUSED
Failed to read from connection                       EREAD
Failed to write to connection                        EWRITE
Host down                                            EHOSDOWN
Host unreachable                                     EHOSTUNREACH
Non-authoritative: host not found or server failure  TRY_AGAIN
Non-recoverable: refused or not implemented          NO_RECOVERY
Internal error                                       EINTERNAL
Valid name, no data record for type                  NO_DATA

Connection Status Errors

Error message returned  Error code
URL is incorrect        HTTP error 400
Restricted source       HTTP error 403
Does not exist          HTTP error 404
Does not exist          HTTP error 410
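For scripts that post-process the inserted comments, the connection status errors listed above can be held in a simple lookup. This Python sketch uses only the four codes and messages from the table; the `STATUS_MESSAGES` and `describe_status` names are assumptions, and codes not in the table deliberately return None rather than a guessed message.

```python
# HTTP status codes and messages taken from the Connection Status Errors table.
STATUS_MESSAGES = {
    400: "URL is incorrect",
    403: "Restricted source",
    404: "Does not exist",
    410: "Does not exist",
}


def describe_status(code):
    """Return the documented message for an HTTP status code, or None."""
    return STATUS_MESSAGES.get(code)
```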

Copyright © 2022 Atypon Systems, LLC. All Rights Reserved.