URL Checking and Validation
URL Checking performs the following functions:
Identifies URLs in the document (regardless of whether they are already Word hyperlinks)
Converts them to clickable hyperlinks, if they are not already clickable, by turning them into Word Hyperlink fields
Automatically corrects common problems with special characters or spaces found in URLs
Checks on the Web to determine whether the link is valid
Valid URLs are not flagged.
URLs that return any HTTP status code other than 200 (the code that indicates no HTTP error) will have a Word comment inserted that indicates the error encountered in attempting to follow the link.
Example:
A link that points to a nonexistent page will have a Word comment inserted indicating that the server returned an HTTP 404 (“Not Found”) status code:
Comment: URL Validation failed because the page http://www.inera.com/~kdkdkd does not exist (HTTP error 404).
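The flagging rule described above can be sketched in Python. This is an illustration only, not eXtyles code: the function name `comment_for_status` and the wording of the generated comment are assumptions. Only the rule itself (status 200 passes, any other status is flagged) comes from the text.

```python
from http import HTTPStatus
from typing import Optional


def comment_for_status(url: str, status: int) -> Optional[str]:
    """Return a comment for a failing URL, or None when the status is 200.

    Hypothetical helper: the comment wording is an assumption, not the
    exact text that eXtyles inserts.
    """
    if status == HTTPStatus.OK:
        return None  # valid URLs are not flagged
    try:
        phrase = HTTPStatus(status).phrase  # e.g. "Not Found" for 404
    except ValueError:
        phrase = "Unknown status"  # code is not in the standard registry
    return f"URL Validation failed for {url} (HTTP error {status}: {phrase})."
```

Calling `comment_for_status("http://www.inera.com/~kdkdkd", 404)` would produce a comment that names the 404 status and its standard reason phrase, while a 200 status produces no comment at all.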
URL Validation will also try to fix some cases of URLs that contain invalid characters or spaces, as sometimes happens during the editing process.
Example:
Spaces sometimes appear mid-URL, which would cause a URL to fail in a browser. When possible, URL Validation will remove the incorrect spaces and add a comment such as:
Comment: The URL “http://nlm.nih.gov/ faq.html” has been corrected. One or more spaces were removed.
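A space correction of this kind might look like the following Python sketch. The function name `remove_url_spaces` is hypothetical, and the sketch assumes that every whitespace character inside the URL is unwanted. How eXtyles decides when a correction is possible is not documented here.

```python
from typing import Tuple


def remove_url_spaces(url: str) -> Tuple[str, bool]:
    """Strip all whitespace from a URL and report whether anything changed.

    Hypothetical helper: assumes any whitespace inside the URL is an
    editing artifact that is safe to remove.
    """
    cleaned = "".join(url.split())  # split() drops every run of whitespace
    return cleaned, cleaned != url
```

For the URL in the example comment, `remove_url_spaces("http://nlm.nih.gov/ faq.html")` yields the cleaned URL together with a flag showing that a correction was made, which is the point at which a comment would be added.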
eXtyles will also silently correct URLs where the only difference is http vs. https. In those cases, after URL Validation has successfully completed, a message such as the following will appear:
Valid URLs: URLs that may be followed without redirection to the indicated website
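The "only difference is http vs. https" test can be expressed as a short comparison. The sketch below uses Python's standard `urllib.parse` module. The function name `differs_only_by_scheme` is an assumption, and the comparison is deliberately strict (host, path, query, and fragment must match exactly), which may not mirror the rules eXtyles applies.

```python
from urllib.parse import urlsplit


def differs_only_by_scheme(original: str, final: str) -> bool:
    """True when two URLs are identical except for http versus https."""
    a, b = urlsplit(original), urlsplit(final)
    schemes = {a.scheme.lower(), b.scheme.lower()}
    # Index 0 is the scheme; the remaining fields are netloc, path,
    # query, and fragment, which must all be equal.
    return schemes == {"http", "https"} and a[1:] == b[1:]
```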
How to use
To use URL Checking:
Run eXtyles up through (at least) Bibliographic Reference Processing.
Select eXtyles > Advanced Processing > URL Checking.
Reviewing comments
Redirected URLs
Experience has shown that the most common warning from URL Checking is that a URL has been redirected to a different final URL. There are many reasons for this, including:
A website has been reorganized and the (wise) webmaster has automatically redirected old URLs to the new pages.
In cases such as this, you may want to update the URL in the document to reflect the most current URL.
A “vanity” URL has been used.
In other words, when a URL is redirected, we recommend keeping in the document the URL that will likely have the longest life.
Some URLs are always redirected.
In cases like the above example, the redirect warning can be ignored.
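When reviewing a redirect warning by hand, it can help to see where a URL finally lands. The Python sketch below is one way to do that with the standard library; it is not part of eXtyles. The names `resolve_final_url` and `redirect_warning` and the warning wording are assumptions, and `resolve_final_url` needs a network connection.

```python
import urllib.request
from typing import Optional


def resolve_final_url(url: str, timeout: float = 10.0) -> str:
    """Follow any redirects and return the URL that was finally served."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def redirect_warning(requested: str, final: str) -> Optional[str]:
    """Return a warning when the final URL differs from the requested one."""
    if requested == final:
        return None  # no redirect took place
    return f"The URL {requested} was redirected to {final}."
```

Comparing the requested and final URLs side by side makes it easier to decide which of the two is likely to have the longer life.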
Limitations of URL Checking
URL Checking has the following limitations:
If a hyperlink has no comment after URL Checking, there’s no guarantee that it points to a working page.
Some websites redirect links for now-dead pages to a standard page that indicates a page has moved. For critical cases, there is still no substitute for a manual check of each page.
A Word hyperlink with an underlying “mailto” property, indicating that the hyperlink is an email address, will not be validated.
Connection Error Codes
When URL Validation encounters a connection error, it inserts a warning with a corresponding error code:
Connection Failures
Error message returned | Error code |
---|---|
Authoritative: Host not found | ERROR_INTERNET_NAME_NOT_RESOLVED |
Network subsystem failed | ENETDOWN |
No file handles available | EMFILE |
No buffer space available | ENOBUFS |
Handle is not a socket | ENOTSOCK |
Not connected | ENOTCONN |
Address not available | EADDRNOTAVAIL |
Connection aborted | ECONNABORTED |
Connection reset | ERROR_INTERNET_CONNECTION_RESET |
Connection timed out | ETIMEDOUT |
Connection refused | ECONNREFUSED |
Failed to read from connection | EREAD |
Failed to write to connection | EWRITE |
Host down | EHOSTDOWN |
Host unreachable | EHOSTUNREACH |
Non-authoritative: host not found or server failure | TRY_AGAIN |
Non-recoverable: refused or not implemented | NO_RECOVERY |
Internal error | EINTERNAL |
Valid name, no data record for type | NO_DATA |
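Several of the codes in this table share their names with standard socket error numbers, so a similar lookup can be sketched with Python's `errno` module. The sketch covers only a subset of the table, the function name `describe_connection_error` and the fallback text are assumptions, and it does not reflect how eXtyles itself maps errors to messages.

```python
import errno

# Messages copied from the Connection Failures table (subset only).
_MESSAGES = {
    "ENETDOWN": "Network subsystem failed",
    "ENOTCONN": "Not connected",
    "ECONNABORTED": "Connection aborted",
    "ETIMEDOUT": "Connection timed out",
    "ECONNREFUSED": "Connection refused",
    "EHOSTUNREACH": "Host unreachable",
}


def describe_connection_error(exc: OSError) -> str:
    """Translate an OSError into a 'message (code)' string."""
    # errno.errorcode maps numeric error values to their symbolic names.
    name = errno.errorcode.get(exc.errno, "UNKNOWN")
    message = _MESSAGES.get(name, "Unlisted connection error")
    return f"{message} ({name})"
```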
Connection Status Errors
Error message returned | Error code |
---|---|
URL is incorrect | HTTP error 400 |
Restricted source | HTTP error 403 |
Does not exist | HTTP error 404 |
Does not exist | HTTP error 410 |
Copyright © 2022 Atypon Systems, LLC. All Rights Reserved.