Cleanup

Use the Cleanup function to automatically remove unnecessary white space and Word formatting attributes that are not needed for eXtyles processing.

Select Cleanup from the eXtyles menu, and the Cleanup dialog (shown below) will open.

When the Cleanup dialog appears, you will see that default cleanup options have already been checked. You can always check the boxes of additional operations you want to include in the process, or uncheck those steps that you do not want to run on the document. Click OK to run Cleanup.

The eXtyles Auto-Style Regular Body Paragraphs option in the Cleanup dialog can be set to automatically apply your organization’s custom body paragraph style to all main body paragraphs in the Word file, dramatically reducing the time spent manually applying paragraph styles to the manuscript.

The following Normalize and Auto-Style Cleanup options are performed on specific elements of your content.

Normalize Document

Operation | Description
Style Whole Document with Default Style | All paragraph styles are converted to the Default Style (the Word style “Normal,” unless a specific override is part of your configuration).

Auto-Style

Operation | Description
Regular Body Paragraphs with Style | All regular body paragraphs are converted to the style selected in the drop-down box. Local italic and boldface markup is retained.
Tables | Tables created with the Word Table Editor are automatically styled with Table Head, Table Body, and Table Footnote paragraph styles. Table titles are automatically styled even if they are outside of the table.
References | Numbered references, or unnumbered references that follow a heading such as “References,” are automatically styled as Reference paragraphs.

Auto-Style Regular Body Paragraphs Criteria

When selected, the Auto-Style Regular Body Paragraphs with Style option applies your organization’s custom body paragraph style to the regular body paragraphs in the Word file, according to the criteria below.

The style is applied to left-aligned paragraphs that start with an uppercase letter unless one of the following criteria is met:

  • The paragraph is styled as MTDisplayEquation (i.e., a MathType Display or Right-numbered equation), Footnote Text, or Endnote Text. To configure this criterion, please contact your eXtyles representative.
  • The paragraph is not a Body Text level paragraph in Word’s Outline view.
  • The paragraph begins with a left indent.
  • The entire paragraph is in the selected font for Preformatted text (e.g., Courier—this font can be preset in your configuration or can be selected from the Cleanup dialog) or in the font of Word’s HTML Preformatted style. To configure this criterion, please contact your eXtyles representative.
  • The entire paragraph is styled with an ALL CAPS or small caps font or character style.
  • The paragraph is shorter than the default 60 characters and ends with a period.
  • The paragraph is manually entered in all capital letters and is shorter than the default 60 characters. To configure this criterion, please contact your eXtyles representative.
  • The paragraph starts with an inline shape (e.g., an embedded graphic).
  • The paragraph starts with a word that suggests a table or figure caption (e.g., “Table,” “Figure,” “Fig.,” “Plate”), followed by punctuation (period, colon, em or en dash, or hyphen), a table or figure number followed by punctuation, or the end of the paragraph. The lists of caption words include words for “figure” and “table” in a number of European languages; to configure these lists, please contact your eXtyles representative.
  • The first character of the paragraph is bold, italic, underline, superscript, raised, or lowered.
  • The first character of the paragraph is a digit.
  • The first character of the paragraph is a special character in a special character font (e.g., Symbol). Special characters include but are not limited to: opening parentheses, square brackets, angle brackets and braces, asterisk, daggers, raised dot, dashes, and bullets.
  • The first character of the paragraph is an opening parenthesis, square bracket, angle bracket or brace, asterisk, raised dot, or dash, and is not in a special character font.
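As a rough illustration only, the criteria above can be read as a series of checks that each disqualify a paragraph. The Python sketch below models a few of them: alignment, initial uppercase letter, excluded styles, left indent, a formatted first character, short paragraphs, and caption words. The Para class, its field names, and the is_regular_body function are hypothetical and are not part of eXtyles or Word; the actual eXtyles logic may differ, and several criteria (fonts, outline level, inline shapes) are not modeled.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified paragraph model; not an eXtyles or Word API.
@dataclass
class Para:
    text: str
    style: str = "Normal"
    left_aligned: bool = True
    left_indent: float = 0.0
    first_char_formatted: bool = False  # bold, italic, underline, superscript, etc.

EXCLUDED_STYLES = {"MTDisplayEquation", "Footnote Text", "Endnote Text"}
SHORT_LIMIT = 60  # the documented default length threshold

# Caption word, then punctuation, a number plus punctuation, or end of paragraph.
# \u2014 and \u2013 are the em dash and en dash.
CAPTION = re.compile(
    r"^(Table|Figure|Fig\.|Plate)\s*([.:\u2014\u2013-]|\d+\s*[.:\u2014\u2013-]|$)"
)

def is_regular_body(p: Para) -> bool:
    """Return True if this sketch would apply the body style to the paragraph."""
    t = p.text
    # Only left-aligned paragraphs starting with an uppercase letter qualify;
    # this also rules out a leading digit, bracket, asterisk, or dash.
    if not t or not p.left_aligned or not t[0].isupper():
        return False
    if p.style in EXCLUDED_STYLES or p.left_indent > 0:
        return False
    if p.first_char_formatted:
        return False
    # Short paragraphs that end with a period or are typed in all capitals.
    if len(t) < SHORT_LIMIT and (t.endswith(".") or t.isupper()):
        return False
    if CAPTION.match(t):
        return False
    return True
```

In this sketch, a paragraph such as "Table 1. Sampling sites" or "INTRODUCTION" is skipped, while an ordinary sentence-length paragraph that begins with a capital letter is treated as regular body text.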

White Space (Whole Document) Settings

The following White Space Cleanup options are performed on the entire document.

Operation | Description
Remove Section Breaks | Word’s section breaks are removed.
Remove Page Breaks | Word’s page breaks are removed.
Remove Column Breaks | Word’s column breaks are removed.
Remove Space Between a Number and % | Spaces between a number and the % sign are removed.
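To make the last operation concrete, the following Python sketch shows one way such a rule could be expressed as a regular expression. It is an illustration under assumptions, not the eXtyles implementation: the function name is hypothetical, and which kinds of white space eXtyles treats as a "space" here is not stated in this documentation.

```python
import re

def remove_space_before_percent(text: str) -> str:
    """Delete white space between a digit and a following % sign."""
    # Assumption: any run of white space after a digit counts as removable.
    return re.sub(r"(\d)\s+%", r"\1%", text)
```

For example, "about 50 % of cases" would become "about 50% of cases", while a % sign that does not follow a number is left alone.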

White Space (Font-Sensitive) Settings

The following White Space Cleanup options may be set to exclude text in a specified font. Please see the description of this feature following the table for more information.

Operation | Description
Convert Tabs to Spaces* | Each tab character is converted to a single space.
Remove Multiple Spaces | Multiple spaces are converted to a single space. Note that this step is performed after converting tabs to spaces, so all extra white space is removed.
Remove Start and End Paragraph Spaces | Leading and trailing spaces in paragraphs are removed. Note that this step is performed after converting tabs to spaces, so all extra white space is removed. Non-breaking spaces and soft returns are also removed.
Remove Blank Paragraphs | Blank paragraphs are removed.

*Tabs are required for the proper XML export of some elements (e.g., between a term and definition in an abbreviations list). Because of this, the USGS default cleanup setting is to retain all tabs in the document. However, if during your visual scan of the document you notice that there is no term and definition content in the document (e.g., Abbreviations, DMU), you may choose to have eXtyles Cleanup remove tabs by selecting this option.
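The order of these four operations matters, because tabs are converted before multiple spaces are collapsed. The Python sketch below illustrates that ordering on plain strings. It is only a model of the behavior described in the table, not eXtyles code: the function name is hypothetical, and it assumes that a soft return is represented by character 11 (the vertical tab), as Word does for manual line breaks.

```python
import re

def clean_white_space(paragraphs):
    """Apply the four font-sensitive steps in the documented order."""
    cleaned = []
    for text in paragraphs:
        text = text.replace("\t", " ")       # Convert Tabs to Spaces
        text = re.sub(r" {2,}", " ", text)   # Remove Multiple Spaces
        # Remove Start and End Paragraph Spaces: ordinary spaces,
        # non-breaking spaces (U+00A0), and soft returns (assumed chr(11)).
        text = text.strip(" \u00a0\x0b")
        if text:                             # Remove Blank Paragraphs
            cleaned.append(text)
    return cleaned
```

With all four options selected, a paragraph such as a tab, "A", two tabs, "B", and two trailing spaces is reduced to "A B", and a paragraph containing only spaces is dropped.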

White Space Cleanup: Excluding Paragraphs in a Specified Font

In some documents, you might wish to run white space–related Cleanup features but exclude certain paragraphs from this type of cleanup. For instance, programming code often has intentional blank lines and intentional leading spaces that should not be removed during Cleanup, although the rest of the document should have no blank lines. The same is true for verse and poetry, in which white space is often intentional and should be retained.

So that users don’t have to forgo use of the white space Cleanup features entirely, eXtyles allows users to specify that paragraphs in a particular font should not be affected by some of the white space–related Cleanup options. That is, you can set computer code in Courier and specify that paragraphs in Courier should not be affected by white space cleanup operations; you can set verse in a font such as Garamond and specify that paragraphs in Garamond should not be affected by white space cleanup operations.

You can protect additional text by selecting a specific font to exclude from white-space cleanup.

The ability to exclude paragraphs from white space–related Cleanup features is optional. If you prefer, you may still run all Cleanup features on all paragraphs in your documents by leaving the Exclude Text in Font checkbox unchecked.

The following subset of white space Cleanup options may (optionally) be set to not run on paragraphs in a specified font:

  • Convert Tabs to Spaces
  • Remove Multiple Spaces
  • Remove Start and End Paragraph Spaces
  • Remove Blank Paragraphs

To use this feature, check your document for elements that eXtyles should skip when running these options. After Activation, eXtyles will automatically protect paragraphs that are in a constant-width font or that are correctly tagged with a paragraph style that is set in a constant-width font. You should check any verse/poetry paragraphs to make sure they’re in a font that isn’t used elsewhere in the document and apply such a font if necessary.

In addition, the paragraphs in question must be entirely in the excluded font; paragraphs in a mix of fonts will not be excluded from these Cleanup operations. For instance, if you have programming code paragraphs that contain both Courier and Courier New, select the entire code block and change the font to either Courier or Courier New. Then specify that font as the excluded font in the Cleanup dialog. Similarly, set the font of any verse or poetry sections that should not be affected by these options to a font that doesn’t appear elsewhere in the document, and then specify the chosen font as the excluded font in the Cleanup dialog.

eXtyles will then not apply the white space–related Cleanup functions to paragraphs in the specified font, which means that tabs, extra spaces, leading and trailing spaces, and blank paragraphs will be retained for paragraphs that are in the specified font.
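The "entirely in the excluded font" rule can be pictured as a check over every run of text in a paragraph. The Python sketch below is a hypothetical illustration of that rule only; the function name and the idea of passing a list of font names per paragraph are assumptions, not an eXtyles interface.

```python
def is_protected(run_fonts, excluded_font):
    """True only if every run in the paragraph uses the excluded font."""
    # A paragraph with no runs, or with any other font mixed in, is not protected.
    return bool(run_fonts) and all(font == excluded_font for font in run_fonts)
```

Under this reading, a code paragraph whose runs are all Courier is protected when Courier is the excluded font, but a paragraph that mixes Courier and Courier New is not, which is why the fonts should be made uniform before running Cleanup.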

Authors sometimes apply a code (monospace) font instead of using a monospaced paragraph style. eXtyles protects these paragraphs from white space Cleanup as well.

During Activation, eXtyles checks the document for paragraphs that are in a monospace font but not a monospace-font paragraph style. eXtyles then applies a monospace-font paragraph style (HTML Preformatted) to exclude these paragraphs from the subset of white space operations during Cleanup. The HTML Preformatted paragraphs can then be styled as usual during the Style Paragraphs stage.

By applying the HTML Preformatted paragraph style during Activation, eXtyles protects paragraphs that are in a monospace font from white space cleanup.

When you use this feature and exclude paragraphs in the specified font, the Cleanup features Normalize Document and Auto-Style will also automatically skip paragraphs in the specified font, so that code or verse or otherwise protected paragraphs will not be tagged with your base paragraph style.

Typographic Settings

Operation | Description
Remove Optional Hyphens | Discretionary hyphens are removed.
Soft Returns | Soft returns (that is, line breaks inserted with Shift + Enter) are converted to spaces or hard returns.

Character Style to Face Markup Conversion

Operation | Description
Built-in Styles | Converts Word’s built-in character styles, such as Emphasis and Strong, to plain face markup. This conversion may exclude the Word character styles for “Hyperlink,” “Footnote Reference,” “Endnote Reference,” and “Comment Reference,” depending on your configuration.
User-Defined Styles | Converts cases of user-defined character styles to plain face markup. This conversion is most important in preparing the document for advanced processing functions.

Auto Text to Plain Text

Operation | Description
Word Fields | Word fields are converted to text. In particular, fields generated by add-in products such as EndNote are automatically converted to plain text.
Auto-Numbered Headings and Lists | Numbers generated with Word’s auto-numbering feature in headings and in numbered lists are converted to plain text.

Comments, Bookmarks, and Hidden Text

Operation | Description
Remove Word Comments | Removes Word comments from the file. The default is to remove all comments. You can also remove comments from a specific reviewer by selecting the reviewer from the drop-down list.
Remove Word Bookmarks | Removes Word bookmarks from the file. We recommend selecting this option to avoid conflicts with eXtyles Advanced Processes.
Hidden Text | Warns when hidden text is present in the document and handles it in one of three ways: (1) Flag with Comments (default), (2) Make Not Hidden, or (3) Leave As-Is.

Graphics

Operation | Description
Leave Graphics/Remove Graphics/Export and Remove Graphics | eXtyles Cleanup can remove all graphics from your manuscript, allowing eXtyles functions to work faster. The default setting is to leave graphics as they are (“Leave Graphics”); alternatively, you can select “Remove Graphics” or “Export and Remove Graphics.” If you choose to export graphics, they will be placed in a subfolder of the document and named according to the document. Equations and Excel worksheets are not removed when this option is selected. Advanced configuration capabilities for this option are available.
Graphics replacement text box | When you choose to remove graphics, they are replaced with a text string that identifies where they were located and which exported file they correspond to. The default text is “[INSERT FIGURE %Z]” where “%Z” becomes “001,” “002,” and so on, as sequential graphics are removed. This text may be altered to meet your specific needs. Note: This option is disabled unless you select “Remove Graphics” or “Export and Remove Graphics.”
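The way “%Z” expands can be illustrated with a short Python sketch. This is an assumption-based model of the documented behavior, not eXtyles code: the function name is hypothetical, and the three-digit zero padding is inferred from the “001,” “002” examples above (how numbers above 999 are handled is not documented here).

```python
def graphic_placeholder(template: str, index: int) -> str:
    """Expand %Z in the replacement text to a zero-padded sequence number."""
    # Assumption: %Z is always padded to at least three digits.
    return template.replace("%Z", f"{index:03d}")
```

With the default text, the first removed graphic would be marked “[INSERT FIGURE 001]” and the second “[INSERT FIGURE 002]”; a customized string such as “[FIG %Z HERE]” would expand the same way.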

Tables

Operation | Description
Remove Borders | Removes borders on all cells and on the entire table. Unchecked by default.
Center Columns | Centers text in all table columns. Unchecked by default.
Remove Shading | Removes all shading in tables. Unchecked by default.
Add Top/Bottom Border | Adds an x-pt. rule to top and bottom of all tables. Unchecked by default.
Left Justify First Column | Left-justifies text in first column of all tables. Unchecked by default.
Insert Space in Empty Cell | Inserts a single non-breaking space in empty table cells. Checked by default.
Add Header Border | Removes any specific width settings and applies auto-fit to all table cells. Unchecked by default.
Remove 1st-Line Indent | Unindents the first line of all table cells. Unchecked by default.
Auto-Fit Contents | Applies auto-fit to all table cells. Unchecked by default.

Extract Equation Numbers

Starting in May 2020, eXtyles can locate equation numbers that have been added inside Word Equation Builder objects and move them outside of the math object. This is necessary for Citation Matching to work correctly (i.e., the correct application of the cite_eq character style to citations of equations), because eXtyles will not detect the equation number if it is inside the Equation Builder math. Only equation numbers that are placed after the equation and are the last content in the paragraph will be moved by this functionality.

Detable Non-Table Content

Occasionally, content in your document may be incorrectly formatted in a Word table. For example, the author may have used table cells to achieve white space between an equation and its number.

This content should not remain in table cells, and eXtyles Cleanup can automatically “detable” such content.

For all of the following functions, content will not be detabled if:

  1. The table is preceded by a paragraph that eXtyles identifies as a table title
  2. The table is followed by a table title that is not itself followed by another table
  3. The first row of the table contains a table title

Single-Row Tables

This Cleanup option finds all tables that have just one row and converts them to tab-separated text if the contents of the table look like an equation, or to paragraph-separated text otherwise.

Equations

Finds all tables that have content in just two columns (discounting empty cells), with an equation (i.e., an Equation 3.0 object, a MathType object, or a Word Equation Builder object) in one cell and an equation number in the other cell, regardless of the number of rows. These tables are then converted to tab-separated text.

Figures

Finds all tables that contain only (1) images, (2) figure captions, or (3) panel captions or panel labels. If the table contains more than one figure caption, the table is converted to paragraph-separated text; otherwise, it is converted to tab-separated text.

References

Finds all tables that are preceded by a reference list title and contain no more than two columns of content, and converts them to tab-separated text. This option is selected by default on the Cleanup dialog.

Additional Cleanup Dialog Features

The Cleanup dialog contains three buttons, located along the right-hand side of the dialog box, to allow rapid setting selections:

  • Set All: Select all options in the dialog.
  • Clear All: Clear all options in the dialog. Click this button first if you want to run just a single operation, such as removing comments from a specific reviewer only.
  • Reset: Return all check boxes in the dialog to their default settings.