
Friday, 23 September 2022

Why Plugins Matter?

Plugging Plug-ins – Why Third-Party Software Matters 

 Any professional racing driver will tell you that there’s no such thing as too much power.  Give them a new, 1000-horsepower engine and after 5 laps, they’ll pull into the pits and say:  “Great, but can you give me 1100bhp?”   It’s just the same with software – especially software that’s as versatile as Adobe Acrobat and CC products such as Adobe InDesign, Adobe Illustrator and Adobe Photoshop.  

No matter how powerful, flexible or easy-to-use the application, as soon as users get to grips with it, they’ll find it doesn’t quite do exactly what they want it to.  Or they’ll want it to be just that bit easier to do a certain function or perhaps be able to batch functions together.   This isn’t greed, or customers being niggly – on the contrary, it’s actually a compliment that the original application is proving useful.  It simply underlines that there’s no such thing as the perfect program.   

Users will rarely tell you exactly what they want at the outset. What they will tell you is which improvements they want on top of what they already have. This is where third-party developers and their plug-in products enter the picture. These developers usually start by being close to the user community for a given application, typically in chat forums and the like. Every so often, users will ask: “How can I do this?” or “Is there something that can help me do that?” If the question is asked more than once, perhaps there is a gap in the market. So if a handful of people ask for the ability to mask off an area of a PDF document, to suppress a logo or some kind of sensitive information that they don’t want others to see, there’s an opportunity to fulfil an emerging need.

It’s this kind of situation that saw the founding of Mapsoft. By staying close to the Adobe user community, the company has steadily expanded its range of plug-in products to meet the functional needs and niches of user groups. For example, its family of solutions now includes Impress Pro, a plug-in that enables users to add multiple text stamps to a document, either as a watermark below existing text or above existing text as headers and footers. Other solutions include MaskIt, which lets users cover up content in a document that may be confidential or commercially sensitive, and DogEars, a useful publishing tool which lets users mark pages of interest in a document so they can quickly refer back to them, just like a physical bookmark. TOCBuilder allows the creation of a printable and linked table of contents at the start of the PDF document.

So what should you look for in a third-party developer? 

Firstly, does the developer have the endorsement of the main vendor’s partner programme? This should be considered essential. Mapsoft, for example, is an Adobe Business Partner and has been developing plug-ins for Adobe products for over 30 years.

Second, how closely tied to the main vendor’s products are the third-party developer’s products? Ask the question. Developers should license and use the main vendor’s core technology in developing plug-ins, to ensure reliability and glitch-free use. Mapsoft licenses and uses Adobe’s own core technology in developing its plug-ins and customised products.

Third, can the product be evaluated before buying to ensure it does what users want? If this isn’t possible, it can undermine the whole reason for buying the plug-in. Any developer that has confidence in its solutions should offer evaluation versions as a matter of course. (Free evaluation versions of all Mapsoft’s plug-ins are available from the Mapsoft web site.)

Finally, can the developer offer user references and ongoing support for the plug-in? This is the acid test to prove that the third-party developer is in it for the long haul. Look for experience and long-term commitment to the sector as evidence of the developer’s credentials. With over 30 years’ experience in the sector, working with a number of high-profile companies such as Network Rail, Xerox and Hallmark Cards, Mapsoft has proven expertise and commitment to deliver high-quality products. And all of Mapsoft’s software solutions come with one year’s free support as standard.

By bearing these points in mind, you can be sure of getting the most effective and reliable plug-ins: plug-ins which will enhance the experience of the main application, speed up routine tasks and add valuable extra features and functionality.

https://www.mapsoft.com

 

Friday, 12 March 2021

Is PDF accessible?

Overview

The Portable Document Format (PDF) is a file format developed by Adobe Systems. PDF makes it possible to distribute documents with original formatting intact. PDF files are created by scanning an original print document or by using a variety of popular software applications. 

Accessibility

The popularity of PDF has created concerns about accessibility, particularly for users of screen readers and for those who have low vision. While Adobe has taken steps to permit access to those who use screen readers, it is essential that documents be correctly marked up (commonly referred to as “tagged”) so that screen readers have the information they need to identify items such as headings and alt text for images. Tables must also be marked up so that screen reader users can navigate them and clearly understand the association of data with appropriate column and row names.

Tagged PDF

Few authors are currently creating tagged PDF files, either because this requires additional effort or because of lack of awareness. Authors are also limited by the capabilities of their word processing or desktop publishing tools, many of which have PDF export capabilities that do not currently support tagged PDF. Microsoft Office, particularly with its most recent versions, does provide good PDF exporting, assuming that appropriate styles are used when first creating a document in Word.

Available Documentation

Adobe provides accessibility documentation at adobe.com/accessibility. Among other resources available from this site, Adobe has developed a variety of Acrobat accessibility training resources that describe in detail the process of creating accessible PDF documents using Word, InDesign, and Acrobat. 

Support In Operating Systems

PDF accessibility also requires support from operating system and assistive technology developers. In Microsoft Windows, both JAWS and NVDA support tagged PDF. However, there is currently no support for tagged PDF in other operating systems.

Is PDF the Correct Choice of Format?

Despite advances in accessibility, many users and advocacy groups continue to recommend that PDF documents be accompanied, or replaced, by alternative format documents that are more universally accessible, such as HTML. PDF unfortunately is still not indexed as well as HTML and so if content is to be used for SEO then it is often converted to HTML. 


Adobe PDF Base-14 Fonts

A number of fonts are included with Adobe Acrobat and therefore don't need to be embedded in PDF files. In our products Impress, Impress Pro and TOCBuilder these fonts are marked in the font lists in Red:

  • 4 font sets in the Helvetica family: normal, bold, italic (oblique), and bold italic, with any size. XSL-FO "sans-serif" font family is normally mapped to "Helvetica".
  • 4 font sets in the Times family: normal, bold, italic, and bold italic, with any size. XSL-FO "serif" font family is normally mapped to "Times".
  • 4 font sets in the Courier family: normal, bold, italic (oblique), and bold italic, with any size. XSL-FO "monospace" font family is normally mapped to "Courier".
  • 1 font set in the Symbol family: normal, with any size. "Symbol" is normally used for Greek letters and some symbols like: Ω, φ, ≠, ©.
  • 1 font set in the ZapfDingbats family: normal, with any size. "ZapfDingbats" is normally used for Zapf dingbats like: ✌, ✍, ❀, ☺.
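
The list above can be written out as the fourteen PostScript font names that appear in PDF files. The following is a minimal Python sketch; the names `BASE14`, `GENERIC_TO_FAMILY` and `base_font_for` are illustrative only and are not part of Impress, TOCBuilder or any XSL-FO processor.

```python
# The 14 standard PDF fonts, grouped by family (PostScript font names).
# Order within each family: normal, bold, italic/oblique, bold italic/oblique.
BASE14 = {
    "Helvetica": ["Helvetica", "Helvetica-Bold",
                  "Helvetica-Oblique", "Helvetica-BoldOblique"],
    "Times": ["Times-Roman", "Times-Bold",
              "Times-Italic", "Times-BoldItalic"],
    "Courier": ["Courier", "Courier-Bold",
                "Courier-Oblique", "Courier-BoldOblique"],
    "Symbol": ["Symbol"],
    "ZapfDingbats": ["ZapfDingbats"],
}

# Usual mapping of XSL-FO generic families, as described in the list above.
GENERIC_TO_FAMILY = {
    "sans-serif": "Helvetica",
    "serif": "Times",
    "monospace": "Courier",
}


def base_font_for(generic_family, bold=False, italic=False):
    """Pick the base-14 font name for a generic family and style (hypothetical helper)."""
    fonts = BASE14[GENERIC_TO_FAMILY[generic_family]]
    return fonts[(1 if bold else 0) + (2 if italic else 0)]
```

For example, `base_font_for("serif", italic=True)` selects `Times-Italic`.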

Camelot Project - the Precursor to PDF and Acrobat

 The Camelot Project

 J. Warnock

This document describes the base technology and ideas behind the project named “Camelot.” This project’s goal is to solve a fundamental problem that confronts today’s companies. The problem is concerned with our ability to communicate visual material between different computer applications and systems. The specific problem is that most programs print to a wide range of printers, but there is no universal way to communicate and view this printed information electronically. The popularity of FAX machines has given us a way to send images around to produce remote paper, but the lack of quality, the high communication bandwidth and the device specific nature of FAX has made the solution less than desirable. What industries badly need is a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks. These documents should be viewable on any display and should be printable on any modern printers. If this problem can be solved, then the fundamental way people work will change. 

The invention of the PostScript language has gone a long way to solving this problem. PostScript is a device independent page description language. Adobe’s PostScript interpreter has been implemented on over 100 commercially available printer products. These printer products include color machines, high resolution machines, high speed machines and low-cost machines. Over 4000 applications output their printed material to PostScript machines. This support for PostScript as a standard makes the PostScript solution a candidate for this electronic document interchange. 

Within the PostScript and Display PostScript context the “view and print anywhere” problem has been implemented and solved. Since most applications have PostScript print drivers, documents from a wide variety of applications can be viewed from operating systems that use Display PostScript. PostScript files can be shipped around communication networks and printed remotely. “Encapsulated PostScript” is a type of PostScript file that can be used by many applications to include a PostScript image as part of a page the application builds. 

The reason the Display PostScript and PostScript solutions are not a total solution in today’s world is that this solution requires powerful desktop machines and PostScript printers. The Display PostScript and PostScript solutions are the correct long-term solution as the power of machines increases over time, but this solution offers little help for the vast majority of today’s users with today’s machines. 

The Camelot Project is an attempt to define technologies and products that will give the value that Display PostScript and PostScript delivers to the vast number of installed machines that exists today. For the purposes of this discussion these machines include 640K Intel 286/386/486 machines (PC compatibles), Apple Macintosh machines, mainframes, and workstations. The displays must include CGA, EGA, VGA and any other higher resolution or color displays supported by the above machines.

Our vision for Camelot is to provide a collection of utilities, applications, and system software so that a corporation can effectively capture documents from any application, send electronic versions of these documents anywhere, and view and print these documents on any machines. 

There are at least two technical approaches to the Camelot project. Both solutions depend on the PostScript technology. One approach is to try to make Display PostScript and PostScript implementations smaller and faster so that they can run on the vast majority of today’s machines. This approach has been tried and is extremely difficult. 

A second approach is to divide the problem into smaller problems. This approach would allow each piece to run independently on the smaller machines while achieving acceptable performance and a solution for the complete problem. This latter approach requires that the problem be divided in a way that is natural for users, and provides a solution for every user. An approach to the Camelot project will now be described that will divide the problem into smaller pieces. This solution depends on a unique property of the PostScript language. 

PostScript, as an interpretive language, has some properties that other interpretive languages do not have. In particular, the semantics of operators is not fixed. Operators can be redefined to have any desired behavior. This property of PostScript allows the execution of a PostScript file to have side effects that are very different from the normal printing of a page. An example might be instructive. Suppose a PostScript file draws a 10-sided polygon with the following PostScript procedure: 

/poly
    {1 0 moveto
        /ang 36 def
        10 {ang cos ang sin lineto
            /ang ang 36 add def
        }repeat
    }def

This procedure will build a path that is a ten sided polygon. In this procedure the verbs: “moveto” and “lineto” have the standard semantics of building a PostScript path within the PostScript Language. 

By redefining “moveto” and “lineto” very different things can happen. For example, if these operators are defined as follows: 

/moveto
    {exch writenumber writenumber (moveto) writestring}def
/lineto
    {exch writenumber writenumber (lineto) writestring}def

then when the “poly” procedure is executed a file is written that has the following contents: 

1.0 0.0 moveto
0.809 0.588 lineto
0.309 0.951 lineto
-0.309 0.951 lineto
-0.809 0.588 lineto
-1.0 0.0 lineto
-0.809 -0.588 lineto
-0.309 -0.951 lineto
0.309 -0.951 lineto
0.809 -0.588 lineto
1.0 0.0 lineto

In this example the redefined “moveto” and “lineto” definitions don’t build a path. Instead they write out the coordinates they have been given and then write out the names of their own operations. The resulting file that is written by these new definitions draws the same polygon as the original file but only uses the “moveto” and “lineto” operators. Here, the execution of the PostScript file has allowed a derivative file to be generated. In some sense this derivative file is simpler and uses fewer operators than the original PostScript file but has the same net effect. We will call this operation of processing one PostScript file into another form of PostScript file “rebinding.”
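
The same idea can be tried outside PostScript. The following Python sketch is an analogy only, not part of the Camelot memo: `poly` receives its `moveto` and `lineto` operators as arguments, and `make_rebinder` (a hypothetical name) supplies versions that write text lines instead of building a path.

```python
import math


def poly(moveto, lineto):
    """Python port of the 'poly' procedure: a ten-sided polygon on the unit circle."""
    moveto(1.0, 0.0)
    ang = 36
    for _ in range(10):
        lineto(math.cos(math.radians(ang)), math.sin(math.radians(ang)))
        ang += 36


def make_rebinder(out):
    """Return moveto/lineto replacements that append text lines to the list 'out'."""
    def fmt(value):
        # Round to 3 decimals; adding 0.0 turns a rounded -0.0 into 0.0.
        return str(round(value, 3) + 0.0)

    def moveto(x, y):
        out.append(f"{fmt(x)} {fmt(y)} moveto")

    def lineto(x, y):
        out.append(f"{fmt(x)} {fmt(y)} lineto")

    return moveto, lineto


def rebind_poly():
    """Run 'poly' with the rebound operators and return the derivative file as lines."""
    lines = []
    poly(*make_rebinder(lines))
    return lines
```

Calling `rebind_poly()` returns one `moveto` line followed by ten `lineto` lines, mirroring the derivative file shown above.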

The above example illustrates a capability of the PostScript language that is not frequently used. This “rebinding” of the language, however, is extremely valuable. The Camelot project depends on variations on this idea. 

The approach we will take with Camelot is to define a new language of operators and conventions. For the purposes of this discussion we will call this language “Interchange PostScript” or IPS. IPS will primarily contain the graphics and imaging operators of PostScript. The language will be defined so that any IPS file is a valid PostScript file. The file will have the appropriate baggage so that it is a valid EPS file. IPS files will print on PostScript printers and will be able to be used by applications that accept EPS files. IPS will also be structured so that the complete PostScript parser is not necessary to read any file written in IPS. IPS will have an adequate set of operators so that any practical document expressed in PostScript can be represented in IPS. There will be situations in IPS where the IPS file cannot represent visual situations that can be theoretically generated in PostScript. However, we believe these situations are extremely rare, and all practical application documents can be represented efficiently in IPS. The right way to think about IPS is as it relates to English. No person in the world knows every English word, but a small subset of the English words, and certain usage patterns, enable people to consistently communicate. 

Once we have defined IPS, we will build a version of the PostScript interpreter (IPS binder) that will read any PostScript file and rebind that file into an IPS file. The IPS binder can be quite small in that it does not need the graphics, font or device machinery contained in a full PostScript interpreter. Another function of the IPS binder will be to include reconstituted fonts into the IPS file. The idea here is to include just the characters of a font that are actually used in the document. A result of including the necessary characters from the fonts used is that an IPS file will be completely self-contained. In other words, when I send a file around the country, I don’t have to worry about whether the receiving location has all the fonts required by the document. The current situation is that complex font substitution schemes are used to deal with locations not having the appropriate fonts. 
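
The first step of "include just the characters of a font that are actually used" is working out which characters each font needs. The sketch below shows only that bookkeeping, under the assumption that the document is available as (font name, text) pairs; `used_characters` is a hypothetical helper and says nothing about how the IPS binder or any real tool builds the reduced font.

```python
from collections import defaultdict


def used_characters(text_runs):
    """Map each font name to the sorted list of distinct characters drawn with it.

    text_runs is an iterable of (font_name, text) pairs, an assumed input format.
    """
    used = defaultdict(set)
    for font_name, text in text_runs:
        used[font_name].update(text)
    return {font: sorted(chars) for font, chars in used.items()}
```

A subsetting tool would then keep only the glyph descriptions for those characters when writing the font into the file.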

Once IPS is defined and the IPS binder implemented, then users can capture any PostScript file emitted by a PostScript driver, and convert that file to a self contained IPS file. This file can be shipped anywhere around the network and printed on any PostScript machine (management utilities will be written to ease this printing process.) 

In addition to the IPS binder, a viewer and browser will be written that will read IPS files, and render those files on displays or to dumb raster printers. It is believed that IPS interpreters can be substantially simpler, and smaller than full PostScript interpreters. It is also believed that an IPS interpreter can have acceptable performance on small machines. The real hope is to make the IPS viewer and browser small enough so that it can co-exist with other applications. It is interesting to think about what those applications can be. 

One obvious application for the IPS viewer is in its use in electronic mail systems. Imagine being able to send full text and graphics documents (newspapers, magazine articles, technical manuals etc.) over electronic mail distribution networks. These documents could be viewed on any machine and any selected document could be printed locally. This capability would truly change the way information is managed. Large centrally maintained databases of documents could be accessed remotely and selectively printed remotely. This would save millions of dollars in document inventory costs. 

Specific large visual data bases like the value-line stock charts, encyclopedias, atlases, Military maps, Service Manuals, Time-Life Books etc. could be shipped on CD-ROM’s with a viewer. This would allow full publication (text, graphics, images and all) to be viewed and printed across a very large base of machines. 

Imagine if the IPS viewer is also equipped with text searching capabilities. In this case the user could find all documents that contain a certain word or phrase, and then view that word or phrase in context within the document. Entire libraries could be archived in electronic form, and since IPS files are self-contained, would be printable at any location. 

One of the central requirements of the Camelot Project is that the IPS file format is device independent. This is essential because it is necessary to be able to print the documents on color or black and white machines — on low or high resolution machines. This requirement is also essential in order to visualize the documents at various magnifications on the screen. For example, it is imperative that the user be able to magnify portions of complex maps, so that subportions of the image are easy to read even on low resolution displays. 

To accomplish the above requirement it is necessary that consistent font rendering machinery be available to the viewer. For this reason the viewers will need to contain the full ATM implementations as part of each system. 

In considering all the requirements of corporations regarding documents, it is important to structure Camelot components so that they can be sold in ways that are useful to the corporations. Several ideas have come to mind. 

Components of Camelot are generally not interesting to single users. The exception to this is in the distribution of large generally useful databases. If someone produced a CD-ROM with “maps of the world” on it, then one can imagine selling a retail package with one viewer and the CD-ROM. 

In most other applications, the distribution of information is to many people. In these latter cases a corporation would like a copy of the viewer for every PC. One can imagine viewers integrated into mail systems, or as general stand-alone browsing systems. In any event corporations should be interested in site-licensing arrangements. (more to come)

History of PDF


A Short History of PDF

Adobe Systems made the PDF specification available free of charge in 1993. In the early years PDF was popular mainly in desktop publishing workflows and the first PDF Export was created for PageMaker 5 by Mapsoft. PDF competed with a variety of formats such as DjVu, Envoy, Common Ground Digital Paper, Farallon Replica and even Adobe's own PostScript format.

Released as an ISO standard

PDF was a proprietary format controlled by Adobe until it was released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008,[5][6] at which time control of the specification passed to an ISO Committee of volunteer industry experts. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting royalty-free rights for all patents owned by Adobe that are necessary to make, use, sell, and distribute PDF-compliant implementations.[7]

PDF 1.7, the sixth edition of the PDF specification and the version accompanying Acrobat 8, became ISO 32000-1. It includes some proprietary technologies defined only by Adobe, such as Adobe XML Forms Architecture (XFA) and the JavaScript extension for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for the full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized, their specifications are published only on Adobe's website, and many of them are not supported by popular third-party implementations of PDF.

In December, 2020, the second edition of PDF 2.0, ISO 32000-2:2020, was published, including clarifications, corrections and critical updates to normative references.[13] ISO 32000-2 does not include any proprietary technologies as normative references.[14]

Information taken in part from Wikipedia



Thursday, 21 May 2020

Summary of the Structure of PDF files

PDF can be looked upon as a combination of different file types presented in a single container. A PDF file contains text, vector art, images and fonts, and other file formats can be embedded as well, even the native files that were used to create the PDF in the first place.

An object orientated file format in which items can be connected directly or indirectly to each other. 

PDF is an object orientated file format with dictionaries, images, vector drawings, text and resources


The objects within a PDF file can be divided into the following types:

Dictionaries

A group containing direct objects or references to indirect objects. Dictionaries can be seen as the glue holding together the elements in a PDF file. The example below shows the structure of a typical page dictionary:

pdf page dictionary


The Contents stream has an attributes dictionary that contains a filter name and the length of the stream
The CropBox array contains the coordinates of the rectangle that defines the area that is visible on the page.
The MediaBox array contains the coordinates of the rectangle that defines the media size. This will typically match a standard media size such as Letter or A4 and will allow the PDF page to be reliably printed on a device that contains these standard media sizes.
The Resources dictionary contains references and information for elements that are needed to reliably output the visual elements of the page such as colors, fonts and Images.
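
To make the entries described above concrete, here is a simplified page dictionary modelled as a plain Python dict. This is a sketch with made-up values: in a real PDF the resources hold references to font and image objects rather than plain strings, and `box_size_inches` is a hypothetical helper. Box coordinates are in points, with 72 points to the inch.

```python
# Hypothetical page dictionary with illustrative values only.
page = {
    "Type": "Page",
    "MediaBox": [0, 0, 612, 792],    # US Letter: 8.5 x 11 inches in points
    "CropBox": [36, 36, 576, 756],   # visible area, inset by half an inch
    "Resources": {
        "Font": {"F1": "Helvetica"},  # simplified; normally a reference to a font dictionary
        "ColorSpace": {},
        "XObject": {},
    },
    "Contents": {"Filter": "FlateDecode", "Length": 1234},  # stream attributes
}


def box_size_inches(box):
    """Return (width, height) in inches for a [llx, lly, urx, ury] box in points."""
    llx, lly, urx, ury = box
    return (abs(urx - llx) / 72, abs(ury - lly) / 72)
```

With these values, `box_size_inches(page["MediaBox"])` gives the Letter page size and `box_size_inches(page["CropBox"])` the smaller visible area.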
 
Streams

The collection of operators outputting information onto the page. Normally the stream will also require elements of the page resources dictionary such as colors and fonts. Streams are either stored as a single element or in an array.

q
567.48 61.011 -540 720 re
W* n
q
/GS0 gs
0 720 -541.1399536 0 567.4799194 61.0105438 cm
/Im0 Do
Q
Q
/CS0 cs 0.302 0.302 0.302  scn
1 i 
/GS1 gs
56.7 286.911 m
56.7 295.191 56.7 303.471 56.7 311.751 c
59.1 311.751 61.5 311.751 63.9 311.751 c
63.9 306.831 63.9 301.911 63.9 296.991 c
65.88 296.991 67.8 296.991 69.72 296.991 c
69.72 301.191 69.72 305.391 69.72 309.591 c
72 309.591 74.22 309.591 76.5 309.591 c
76.5 305.391 76.5 301.191 76.5 296.991 c
81.06 296.991 85.62 296.991 90.18 296.991 c
90.18 293.631 90.18 290.271 90.18 286.911 c
79.02 286.911 67.86 286.911 56.7 286.911 c
f*

You can see that there are several references to items in the page resources dictionary:
GS0 is a reference to a graphics state and gs is the operator that sets it.
Im0 is an XObject image and the Do operator draws the image.
CS0 is a reference to a color space; the cs operator selects it for non-stroking (fill) operations and the scn operator sets the fill color within it.

You can also see usage of several path operators: re - rectangle, m - moveto, c - curve, f* - fill (using the even-odd rule).
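
Content streams use postfix notation: operands come first and the operator follows. A rough way to explore a stream like the one above is to group each operator with the operands that precede it. The Python sketch below is deliberately simplified and handles only numbers, names and operators; real streams also contain strings, arrays, dictionaries, comments and inline images, which `parse_operations` (a hypothetical helper) does not handle.

```python
def parse_operations(stream_text):
    """Group whitespace-separated tokens into (operator, operands) pairs."""
    operations = []
    operands = []
    for token in stream_text.split():
        if token.startswith("/"):
            operands.append(token)           # a name such as /GS0
            continue
        try:
            operands.append(float(token))    # a numeric operand
        except ValueError:
            # Anything that is not a name or a number is treated as an operator.
            operations.append((token, operands))
            operands = []
    return operations


sample = "q 567.48 61.011 -540 720 re W* n /GS0 gs /Im0 Do Q /CS0 cs 0.302 0.302 0.302 scn"
operations = parse_operations(sample)
```

Each entry in `operations` pairs an operator such as `re` or `scn` with the values it consumes.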

Text strings

These can either be ANSI (single byte characters) or Unicode (multi-byte). The example here is the representation of the last date modified in the catalog dictionary.

Unicode text string
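
In the PDF specification, a text string that starts with the two bytes FE FF is encoded as UTF-16BE; otherwise it uses the single-byte PDFDocEncoding. A minimal decoding sketch in Python follows. Note the assumption: Latin-1 stands in for PDFDocEncoding here, which is only an approximation because the two differ for some byte values, and `decode_pdf_text_string` is a hypothetical helper.

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Decode the bytes of a PDF text string into a Python str (simplified)."""
    if raw.startswith(b"\xfe\xff"):
        # Byte order mark present: the rest is UTF-16, big-endian.
        return raw[2:].decode("utf-16-be")
    # Assumption: Latin-1 approximates PDFDocEncoding for common characters.
    return raw.decode("latin-1")
```

A date such as `D:20200521` would decode to the same text whether it was stored in single-byte form or as UTF-16BE with the leading byte order mark.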




Images

Images are normally held within the page resources, and the image stream has an associated attributes dictionary that describes the data within the stream. BitsPerComponent is the number of bits used to represent each color component of a pixel (dot) within the image. The ColorSpace entry describes the colour model that is used to define the colors within the image.

XObject image and attributes
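
Together with the image's Width and Height entries, BitsPerComponent and ColorSpace determine how large the uncompressed image data is. The sketch below covers only the three device color spaces and assumes no filter has been applied; `raw_image_bytes` is a hypothetical helper.

```python
# Number of color components per pixel for the device color spaces.
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def raw_image_bytes(width, height, bits_per_component, color_space):
    """Size in bytes of the uncompressed sample data for an image."""
    bits_per_row = width * COMPONENTS[color_space] * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8   # each row is padded to a whole byte
    return bytes_per_row * height
```

A 100 x 50 pixel DeviceRGB image at 8 bits per component therefore needs 100 x 3 bytes per row, times 50 rows.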


Names

Normally used to provide a name that can be used to refer to a dictionary or dictionary item. For example, the pages dictionary has a name "Type" with the value "Pages" and a single page has a name of "Type" with a value of "Page".

pdf name entry in a dictionary



Arrays

Fixed-length collections holding values and/or references to other elements. For an example see the Real Numbers example below.

Real numbers

Decimal numbers. In this example they are being used to define the rectangle of the page media box:

Real numbers

Integers

Whole numbers. For example, to show the total number of pages in the PDF file.

integers


For further details see the PDF Specification at https://www.adobe.com/devnet/pdf/pdf_reference.html

Contact:

Michael Peters

Wednesday, 20 May 2020

Understanding of Colour and Colour models

There are a number of color models, but I am only going to cover 2 here as they are the most often used. 

RGB

This color model is primarily used to describe light. It is used mainly in cameras and scanners. It has 3 color elements, Red, Green and Blue, which when added together at 100% represent white or pure light. The range of the color model is very wide, which is fine in itself until printing is required and that printing is done through the CMYK color model. The model uses 3 values, each either in the range 0 to 255, as in Windows and applications such as Photoshop, or as a decimal number between 0 and 1, as in PDF for example. 

RGB is an additive color model. Adding all of the colors in equal amounts will result in white.

RGB Colour merge and intersections
In the web world RGB colours are represented by hex number combinations (the numbering system is hexadecimal, or base 16). So for example Red would be #FF0000, Green would be #00FF00 and Blue would be #0000FF. Black is #000000 and White is #FFFFFF. 
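
The three notations mentioned so far (0 to 255, 0 to 1 and hex) are easy to convert between. A short Python sketch, with hypothetical helper names:

```python
def rgb_to_hex(r, g, b):
    """Convert 0-255 RGB values to a web hex string such as #FF0000."""
    return f"#{r:02X}{g:02X}{b:02X}"


def hex_to_rgb(hex_color):
    """Convert a web hex string such as #FF0000 back to 0-255 RGB values."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_pdf(r, g, b):
    """Convert 0-255 RGB values to the 0-1 decimals used in PDF, rounded to 3 places."""
    return tuple(round(v / 255, 3) for v in (r, g, b))
```

For instance, the mid-grey 77, 77, 77 is #4D4D4D on the web and 0.302 0.302 0.302 in PDF notation.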

CMYK

Cyan/Magenta/Yellow/Black used primarily in printing.

The colors are created by printing the colors on top of each other to achieve the required shades. Overlaps may be required on the edges (trapping) to ensure that gaps are not seen, as different paper types can expand and shrink when the ink/toner is applied. The color model is much more limited in its range than RGB and therefore care needs to be taken when converting from RGB to CMYK. This can be handled through color management systems, by adding additional colors to the printing run (such as Hexachrome) or by using Spot colors, which are usually premixed colors such as Pantone. Printing is affected by the resolution of the input and output, by the paper stock being printed onto (both the surface quality and the base color of the media) and by the attributes of the inks being used. Additionally, output effects and colors can be modified and enhanced through varnishes such as UV, and through foils that provide metallic effects.

The model uses 4 values each as a percentage of the 4 colors of cyan, magenta, yellow and black.

CMYK is a subtractive color model. In theory, adding cyan, magenta and yellow in equal amounts results in black. In practice this will more than likely produce a dirty color, so the K in CMYK gives printers a real black ink with which to print a true black.
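
The relationship between the two models can be illustrated with the simple textbook formula below. This is a naive sketch and not a color-managed conversion: it ignores inks, paper and profiles, so it should not be used for real print work. `rgb_to_cmyk_naive` is a hypothetical helper, and the results are fractions from 0 to 1 rather than percentages.

```python
def rgb_to_cmyk_naive(r, g, b):
    """Naive conversion of 0-255 RGB values to CMYK fractions (0-1)."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)          # black replaces the part common to C, M and Y
    if k == 1:
        return (0.0, 0.0, 0.0, 1.0)   # pure black: avoid dividing by zero
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return tuple(round(v, 3) for v in (c, m, y, k))
```

Pure red, for example, comes out as full magenta plus full yellow with no cyan or black.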

CMYK Colour merge and intersections

This is a simple look at color and I will expand on this in a future blog.

Contact info:

Michael Peters