Importing HTML into InDesign

What to Import?

We have recently been looking for an easy way to import HTML files into InDesign. This is not an easy task, because HTML files are created specifically for online use while InDesign is intended primarily for print output. For example, images referenced in <img> tags are often prepared at a screen resolution of 72 dpi and are not suitable for print. It is primarily the text-based elements in the HTML file (including tables) that will be most useful for import.

Some of the Issues

One of the issues with attempting to convert from CSS is that, more often than not, it does not specify values explicitly but relies on browser defaults. InDesign styles, by contrast, specify a value for every style attribute in the InDesign interface. In HTML, the styles for tags can be set either directly or through classes. Any plug-in or script attempting to reproduce the styles would need to reproduce the defaults and then detect where they are modified in the CSS code. A much better solution might be to create a set of styles in an InDesign template, with a matching system that maps InDesign styles to HTML tags and CSS classes. However, it could also be useful to have some functionality to generate the Paragraph Styles in InDesign, specifically to ensure that there is always a match between the tags and classes in the HTML and the styles in the InDesign document.

CSS Defaults for Text

This is an example of CSS and InDesign default matching for CSS text-based attributes:

CSS Property | Default Value | InDesign Property | InDesign Default Value

Typically the CSS properties take the following values, which vary in how easily they can be recreated in InDesign:

font-family: Any font name (e.g. "Arial", "Helvetica", "Times New Roman") or a comma-separated list of font names (e.g. "Arial, sans-serif"). The browser uses the first available font in the list. The default depends on the browser and is usually a serif font such as Times New Roman.
font-size: Any valid CSS length or percentage (e.g. 10px, 12pt, 2em), or a keyword such as medium. The default value is medium, which is 16px in most browsers.
color: Any valid CSS color value (e.g. red, #ff0000, rgb(255, 0, 0)). The default is set by the browser and is normally black.
line-height: A unitless multiplier, a CSS length, or a percentage of the font size (e.g. 1.5, 18px, 150%). The default value is normal, which most browsers render at roughly 1.2 times the font size, depending on the font.
text-align: left, right, center, justify. The default value is left for left-to-right text.
text-transform: none, uppercase, lowercase, capitalize. The default value is none.
font-weight: normal, bold, bolder, lighter, or a number, traditionally between 100 and 900 in increments of 100 (e.g. 400, 700). The default value is normal, which is equivalent to 400.
font-style: normal, italic, oblique. The default value is normal.
text-decoration: none, underline, overline, line-through. The default value is none.
letter-spacing: Any valid CSS length value (e.g. 1px, 2pt, 0.1em). The default value is normal, which adds no extra letter spacing.
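To give a feel for how these values might be translated into the point-based measurements used for InDesign styles, here is a minimal Python sketch. The function names and the 16px base size are assumptions made for illustration. The conversion relies on the CSS definition of 1in = 96px = 72pt, and treating a normal line-height as 1.2 is only an approximation.

```python
import re

# CSS defines 1in = 96px = 72pt, so 1px = 0.75pt.
PX_TO_PT = 72 / 96
# Assumption: the browser's "medium" font size is 16px, as is typical.
DEFAULT_FONT_SIZE_PT = 16 * PX_TO_PT


def css_length_to_points(value, parent_size_pt=DEFAULT_FONT_SIZE_PT):
    """Convert a CSS length such as '12pt', '16px', '2em' or '150%' to points."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(px|pt|em|%)\s*", value)
    if match is None:
        raise ValueError(f"Unsupported CSS length: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    if unit == "pt":
        return number
    if unit == "px":
        return number * PX_TO_PT
    if unit == "em":
        return number * parent_size_pt
    # Percentages are relative to the parent size.
    return number / 100 * parent_size_pt


def line_height_to_leading(line_height, font_size_pt):
    """Convert a CSS line-height into a leading value in points."""
    if line_height == "normal":
        # Approximation only: the real value depends on the font.
        return 1.2 * font_size_pt
    try:
        # A unitless value is a multiplier of the font size.
        return float(line_height) * font_size_pt
    except ValueError:
        return css_length_to_points(line_height, font_size_pt)


print(css_length_to_points("2em"))            # relative to the assumed 16px base
print(line_height_to_leading("150%", 12.0))   # percentage of a 12pt font
```

Only px, pt, em and percentage units are handled here; other units (rem, cm, keywords such as large) would need additional rules.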

However, another issue with HTML is that the tags themselves effectively have default styles.

For example, this is roughly what we would expect as the defaults for the <h1> tag:

<h1 style="display: block; font-size: 2em; font-weight: bold; margin-top: 0.67em; margin-bottom: 0.67em; text-align: left;">This is an h1 element</h1>

These defaults could then be overridden (potentially more than once) in the CSS, and overridden again by one or more classes applied in the HTML. For example:

<h1 class="heading1">
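The layering described above (tag defaults, then classes, then inline styles) could be sketched in Python roughly as follows. This is a deliberate simplification: the real CSS cascade also takes selector specificity and source order into account, and the TAG_DEFAULTS table, the class rules and the function names are hypothetical values chosen for illustration.

```python
# Hypothetical tag defaults, loosely modelled on typical browser style sheets.
TAG_DEFAULTS = {
    "h1": {"font-size": "2em", "font-weight": "bold", "text-align": "left"},
    "p": {"font-size": "1em", "font-weight": "normal", "text-align": "left"},
}


def parse_declarations(style_text):
    """Parse 'name: value; name: value' into a dictionary."""
    declarations = {}
    for part in style_text.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            declarations[name.strip().lower()] = value.strip()
    return declarations


def resolve_style(tag, classes=(), class_rules=None, inline_style=""):
    """Layer tag defaults, class rules and inline styles, in that order."""
    resolved = dict(TAG_DEFAULTS.get(tag, {}))
    for class_name in classes:
        # Later classes win here; real CSS uses specificity and source order.
        resolved.update((class_rules or {}).get(class_name, {}))
    resolved.update(parse_declarations(inline_style))
    return resolved


# Hypothetical rule for the "heading1" class used in the example above.
rules = {"heading1": {"font-size": "24pt", "color": "#333333"}}
print(resolve_style("h1", ["heading1"], rules, "text-align: center"))
```

The resolved dictionary is the kind of information a script would need before it could generate a matching Paragraph Style.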

The quality of the conversion to InDesign likely depends on how well the original HTML design was implemented.

The Best Solution

The easiest solution is probably just to provide a substitution table for tags and classes and not worry about pulling the attributes out of the HTML at all. It would, however, be useful to have some kind of report showing which tags are present in the HTML to be imported.
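As a rough illustration of the substitution table and the tag report, the following Python sketch uses the standard library's html.parser to count every tag and class combination and look each one up in a mapping. The STYLE_MAP entries and the InDesign style names in it are hypothetical; in practice they would come from the InDesign template.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical substitution table: (tag, class) -> InDesign paragraph style name.
STYLE_MAP = {
    ("h1", None): "Heading 1",
    ("h1", "heading1"): "Heading 1",
    ("p", None): "Body Text",
}


class TagCounter(HTMLParser):
    """Count each (tag, class) combination found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        class_value = dict(attrs).get("class") or ""
        # A tag without a class is recorded with None as its class.
        for class_name in class_value.split() or [None]:
            self.counts[(tag, class_name)] += 1


def tag_report(html_text):
    """Return {(tag, class): (count, mapped style or None)}."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return {key: (count, STYLE_MAP.get(key)) for key, count in parser.counts.items()}


sample = '<h1 class="heading1">Title</h1><p>One</p><p class="note">Two</p>'
for key, (count, style) in tag_report(sample).items():
    print(key, count, style or "NO MATCHING STYLE")
```

Any combination reported without a matching style is a candidate for a new entry in the table or a new style in the template.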

An Existing Solution

Without developing a plug-in, there is currently a solution that at least gets the text in from an HTML page: the Import XML feature of InDesign. However, for this to work, the <head> section needs to be removed from the HTML file.
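Removing the <head> section by hand is tedious for more than a few files, so a small script could do it instead. The Python sketch below is one possible approach, not part of InDesign itself. It uses a regular expression, which is fragile for unusual markup, and the function names are made up for illustration. It also includes a well-formedness check, on the assumption that Import XML expects well-formed XML and that markup such as an unclosed <br> may therefore still need fixing.

```python
import re
import xml.etree.ElementTree as ET

# Matches the first <head>...</head> block; \b prevents matching <header>.
HEAD_PATTERN = re.compile(r"<head\b[^>]*>.*?</head\s*>", re.IGNORECASE | re.DOTALL)


def strip_head(html_text):
    """Return the HTML with its <head> section removed."""
    return HEAD_PATTERN.sub("", html_text, count=1)


def is_well_formed_xml(text):
    """Check whether the text parses as XML (assumed to be needed for import)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


page = "<html><head><title>T</title></head><body><p>Hi</p></body></html>"
cleaned = strip_head(page)
print(cleaned)
print(is_well_formed_xml(cleaned))
```

If the check fails, the HTML likely contains constructs that are valid in HTML but not in XML, such as unclosed tags or named entities like &nbsp;.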

