Thursday, 21 May 2020

Summary of the Structure of PDF files

A PDF file can be looked upon as a combination of different file types presented in a single container. This is because a PDF file contains text, vector art, images and fonts, and other file formats can be embedded in it, even the native files that were used to create the PDF in the first place.

It is an object-oriented file format in which items can be connected to each other directly or indirectly.



The objects within a PDF file can be divided into the following types:

Dictionaries

A group containing direct objects or references to indirect objects. Dictionaries can be seen as the glue holding together the elements in a PDF file. A typical page dictionary contains the following entries:



The Contents stream has an attributes dictionary that contains a filter name and the length of the stream.
The CropBox array contains the coordinates of the rectangle that defines the area that is visible on the page.
The MediaBox array contains the coordinates of the rectangle that defines the media size. This will typically match a standard media size such as Letter or A4, which allows the PDF page to be reliably printed on a device that supports these standard media sizes.
The Resources dictionary contains references and information for elements that are needed to reliably output the visual elements of the page, such as colors, fonts and images.
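To make the entries above more concrete, here is a minimal Python sketch that models a page dictionary and writes it out in PDF syntax. The `Name` and `Ref` classes, the `to_pdf` helper and all of the object numbers are hypothetical and chosen for illustration only; the box values assume a Letter page of 612 x 792 points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A PDF name object, written with a leading slash."""
    value: str


@dataclass(frozen=True)
class Ref:
    """A reference to an indirect object: object number and generation."""
    obj_num: int
    gen_num: int = 0


def to_pdf(obj) -> str:
    """Serialize a small subset of PDF object types to PDF syntax."""
    if isinstance(obj, Name):
        return "/" + obj.value
    if isinstance(obj, Ref):
        return f"{obj.obj_num} {obj.gen_num} R"
    if isinstance(obj, dict):
        inner = " ".join(f"/{key} {to_pdf(value)}" for key, value in obj.items())
        return f"<< {inner} >>"
    if isinstance(obj, (list, tuple)):
        return "[" + " ".join(to_pdf(item) for item in obj) + "]"
    if isinstance(obj, float):
        return f"{obj:g}"
    return str(obj)  # integers


# Hypothetical page: every object number below is made up for this sketch.
page = {
    "Type": Name("Page"),
    "Parent": Ref(2),
    "MediaBox": [0, 0, 612, 792],
    "CropBox": [0, 0, 612, 792],
    "Contents": Ref(5),
    "Resources": {"Font": {"F1": Ref(6)}, "XObject": {"Im0": Ref(7)}},
}

print(to_pdf(page))
```

The nested dictionaries and the `R` references show how a page pulls its content stream, fonts and images together without containing them directly.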
 
Streams

A stream is the sequence of operators that outputs information onto the page. Normally the stream will also require elements of the page resources dictionary, such as colors and fonts. The page contents are stored either as a single stream or as an array of streams.

q
567.48 61.011 -540 720 re
W* n
q
/GS0 gs
0 720 -541.1399536 0 567.4799194 61.0105438 cm
/Im0 Do
Q
Q
/CS0 cs 0.302 0.302 0.302  scn
1 i 
/GS1 gs
56.7 286.911 m
56.7 295.191 56.7 303.471 56.7 311.751 c
59.1 311.751 61.5 311.751 63.9 311.751 c
63.9 306.831 63.9 301.911 63.9 296.991 c
65.88 296.991 67.8 296.991 69.72 296.991 c
69.72 301.191 69.72 305.391 69.72 309.591 c
72 309.591 74.22 309.591 76.5 309.591 c
76.5 305.391 76.5 301.191 76.5 296.991 c
81.06 296.991 85.62 296.991 90.18 296.991 c
90.18 293.631 90.18 290.271 90.18 286.911 c
79.02 286.911 67.86 286.911 56.7 286.911 c
f*

You can see that there are several references to items in the page resources dictionary:
GS0 is a reference to a graphics state and gs is the operator that sets it.
Im0 is an XObject image and the Do operator draws the image.
CS0 is a reference to a color space. The cs operator selects it for fills (non-stroking operations) and the scn operator sets the fill color within it.

You can also see several path operators in use: re (rectangle), m (moveto), c (curve) and f* (fill, using the even-odd rule).
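The operand-then-operator pattern of the stream can be sketched in a few lines of Python. This is a deliberately simplified reader, not a full content stream parser: it assumes tokens are separated by whitespace and it ignores strings, arrays, comments and inline images. The function name `parse_operations` is made up for this example.

```python
def parse_operations(stream: str):
    """Group whitespace-separated tokens into (operator, operands) pairs.

    Numbers and /Names are collected as operands; any other token is
    treated as the operator that consumes the operands gathered so far.
    """
    operations = []
    operands = []
    for token in stream.split():
        if token.startswith("/"):
            operands.append(token)           # name operand such as /GS0
            continue
        try:
            operands.append(float(token))    # numeric operand
        except ValueError:
            operations.append((token, operands))
            operands = []
    return operations


# A shortened version of the stream shown above.
sample = """q
567.48 61.011 -540 720 re
W* n
/GS0 gs
/Im0 Do
Q
/CS0 cs 0.302 0.302 0.302 scn
56.7 286.911 m
f*
"""

for operator, operands in parse_operations(sample):
    print(operator, operands)
```

Each printed line pairs one operator with the values that preceded it, which is how the stream is read: operands first, then the operator that uses them.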

Text strings

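These can be either single-byte text (PDFDocEncoding, which is broadly similar to ANSI) or Unicode (multi-byte). A typical example is the date on which the document was last modified, which is held as a text string in the ModDate entry of the document information dictionary.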
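PDF date strings follow the pattern D:YYYYMMDDHHmmSS followed by an optional time zone offset, where everything after the year is optional. The Python sketch below parses such a string. The function name and the sample value are hypothetical, and treating a missing time zone as UTC is an assumption made here for simplicity.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional Z, +HH'mm' or -HH'mm'
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(text: str) -> datetime:
    """Convert a PDF date string into a timezone-aware datetime."""
    match = _PDF_DATE.match(text)
    if match is None:
        raise ValueError(f"not a PDF date string: {text!r}")
    year, month, day, hour, minute, second, sign, tz_hour, tz_minute = match.groups()
    if sign in (None, "Z", "z"):
        offset = timedelta(0)  # assumption: no offset given means UTC
    else:
        offset = timedelta(hours=int(tz_hour or 0), minutes=int(tz_minute or 0))
        if sign == "-":
            offset = -offset
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=timezone(offset),
    )


# Hypothetical value, not taken from a real file.
print(parse_pdf_date("D:20200521103000+01'00'").isoformat())
```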





Images

Images are normally held within the page resources, and the image stream has an associated attributes dictionary that describes the data within the stream. BitsPerComponent is the number of bits used for each color component of a single pixel (dot) within the image. The ColorSpace entry describes the color model that is used to define the colors within the image.
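As a rough illustration of how these attributes fit together, the Python sketch below estimates the size of the uncompressed image data from the width, height, BitsPerComponent and color space. It only covers the three device color spaces, and the function name is made up for this example. It relies on each row of image data starting on a byte boundary.

```python
# Number of color components per pixel for the simple device color spaces.
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def raw_image_size(width: int, height: int,
                   bits_per_component: int, color_space: str) -> int:
    """Return the size in bytes of the image data before any filter is applied."""
    components = COMPONENTS[color_space]
    bits_per_row = width * components * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8  # rows are padded to whole bytes
    return bytes_per_row * height


print(raw_image_size(100, 50, 8, "DeviceRGB"))  # 100 * 3 bytes per row, 50 rows
print(raw_image_size(10, 2, 1, "DeviceGray"))   # 10 bits per row padded to 2 bytes
```

Comparing this figure with the Length entry of the stream gives a feel for how much the stream's filter has compressed the image.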



Names

Normally used to provide a name that can be used to refer to a dictionary or dictionary item. For example, the pages dictionary has a name "Type" with the value "Pages" and a single page has a name "Type" with the value "Page".




Arrays

Fixed-length collections that hold direct values and/or references to other elements. For an example see the Real Numbers example below.

Real numbers

Decimal numbers. In this example they are being used to define the rectangle of the page media box:
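As a small worked example, the Python sketch below turns the four real numbers of a media box into a physical page size. It assumes the default PDF unit of 1/72 inch and uses the A4 values 595.276 x 841.89 points; the function name is made up for this example.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def box_size_mm(box):
    """Return (width, height) in millimetres for a [llx lly urx ury] box."""
    llx, lly, urx, ury = box
    width_pt = abs(urx - llx)
    height_pt = abs(ury - lly)
    return (width_pt / POINTS_PER_INCH * MM_PER_INCH,
            height_pt / POINTS_PER_INCH * MM_PER_INCH)


a4_media_box = [0, 0, 595.276, 841.89]
width_mm, height_mm = box_size_mm(a4_media_box)
print(round(width_mm), round(height_mm))
```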


Integers

Whole numbers. For example, the total number of pages in the PDF file.



For further details see the PDF Specification at https://www.adobe.com/devnet/pdf/pdf_reference.html

Contact:

Michael Peters

Wednesday, 20 May 2020

Understanding of Colour and Colour models

There are a number of color models but I am only going to cover two here, as they are the most often used.

RGB

This color model is primarily used to describe light, and it is used mainly in cameras and scanners. It has three color elements (red, green and blue) that, when added together at 100%, represent white or pure light. The model covers a very wide range of colors, which is fine in itself until printing is required and that printing is done through the CMYK color model. Each of the three values is expressed either as a whole number between 0 and 255, as in Windows and applications such as Photoshop, or as a decimal number between 0 and 1, as in PDF.

RGB is an additive color model. Adding all three colors together at full strength results in white.

RGB Colour merge
In the web world RGB colors are represented by combinations of hex numbers (hexadecimal is a base-16 numbering system). So for example Red would be #FF0000, Green would be #00FF00 and Blue would be #0000FF. Black is #000000 and White is #FFFFFF.
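The three notations (hex, 0 to 255 and 0 to 1) describe the same values, and a short Python sketch shows how to move between them. The function names are made up for this example, and rounding to three decimal places is simply a choice made here.

```python
def hex_to_rgb255(code: str):
    """Convert a web color such as '#FF0000' to a (red, green, blue) tuple of 0-255 values."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected six hex digits, got {code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb255_to_unit(rgb):
    """Scale 0-255 values to the 0-1 decimal range used in PDF."""
    return tuple(round(component / 255, 3) for component in rgb)


def rgb255_to_hex(rgb):
    """Convert 0-255 values back to a web color string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


print(hex_to_rgb255("#FF0000"))
print(rgb255_to_unit((77, 77, 77)))
print(rgb255_to_hex((0, 0, 255)))
```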

CMYK

Cyan/Magenta/Yellow/Key (black), used primarily in printing.

The colors are created by printing the inks on top of each other to achieve the required shades. Overlaps may be required at the edges (trapping) to ensure that gaps are not seen, as different paper types can expand and shrink when the ink or toner is applied. The color model is much more limited in its range than RGB and therefore care needs to be taken when converting from RGB to CMYK. This can be achieved through color management systems, by adding additional colors to the print run (such as Hexachrome) or by using spot colors, which are usually pre-mixed colors such as Pantone. Printing is affected by the resolution of the input and output, by the paper stock being printed onto (both its surface quality and its base color) and by the attributes of the inks being used. Additionally, output effects and colors can be modified and enhanced through varnishes such as UV, and through foils that provide metallic effects.

The model uses 4 values each as a percentage of the 

CMYK is a subtractive color model. In theory, adding cyan, magenta and yellow together in equal amounts at full strength results in black. In practice this will more than likely result in a dirty color, so the K in CMYK gives printers a real black ink with which to print a true black.
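The relationship between the two models can be sketched with the simple textbook formula below, written in Python. This is only an arithmetic illustration: it is not color managed, it ignores inks, paper and profiles, and real conversions should go through a color management system as described above. The function name is made up for this example.

```python
def rgb_to_cmyk(red: float, green: float, blue: float):
    """Naive RGB to CMYK conversion; all values are in the range 0 to 1."""
    black = 1 - max(red, green, blue)
    if black == 1:
        # Pure black: use the K channel alone rather than mixing C, M and Y.
        return (0.0, 0.0, 0.0, 1.0)
    cyan = (1 - red - black) / (1 - black)
    magenta = (1 - green - black) / (1 - black)
    yellow = (1 - blue - black) / (1 - black)
    return (cyan, magenta, yellow, black)


print(rgb_to_cmyk(1, 0, 0))  # red
print(rgb_to_cmyk(0, 0, 0))  # black
print(rgb_to_cmyk(1, 1, 1))  # white (no ink)
```

The black case shows the role of the K channel: instead of stacking cyan, magenta and yellow, the formula moves the shared darkness into black ink.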

CMYK Colour merge

This is a simple look at color and I will expand on this in a future blog.

Contact info:

Michael Peters

Tuesday, 19 May 2020

What is an Acrobat Plug-in?


A plug-in is a way for software developers to add functionality to Acrobat or to modify its existing functionality.

Why are plug-ins required?

Adobe provides a product that is intended to be used across multiple industries and organisations. Supporting every vertical market would bloat the application by providing features that would only be used by relatively few people when compared with the whole Acrobat market.

Can Acrobat plug-ins be used in the Adobe Reader?

Yes, but special support needs to be added to the plug-in so that it can run under Adobe Reader. In addition, a Reader plug-in requires a special license and needs to go through an approval process with Adobe Systems Inc. - https://www.adobe.com/devnet/reader/ikla.html.

Are plug-ins specific to a particular version of Adobe Acrobat?

Not necessarily. We have plug-ins that we developed for Acrobat 6 that still run without modification in Acrobat DC. However, if a plug-in uses new features that are specific to a later version then it won't work under earlier versions. Also, older plug-ins that used the Adobe Dialog Manager (ADM) will no longer work in current versions of Acrobat.

Examples of Plug-ins
  • New security handlers that might be specific to a particular organisation. For example, we have developed security handlers that do not allow PDF files to be viewed outside a particular organisation's offices.
  • New annotations. For example, we created a plug-in that supported all of the British Standard Markups.
  • Flattening annotations and form fields into the main document. This ensured that they could not be changed or modified and that they would print as part of the document even if the printing of annotations was switched off.
  • Adding text and images to PDF files.
  • Creating a table of contents for PDF files
  • Adding fields for variable data printing
  • Hardware integration of Adobe Acrobat into whiteboards and interactive tables

Contact Info:

Michael Peters

Tuesday, 11 February 2020

PDF Software Development Beyond Acrobat

The Adobe PDF Library

The Adobe PDF library can be seen as a software developer's version of Adobe Acrobat without the user interface. However, it is far more than that.




Platform availability

Acrobat is a 32-bit application on Windows and is also available for the Mac. The PDF library, however, is available on far more platforms and also as a 64-bit offering. The library is available for the following platforms:
  • Windows 32-bit
  • Windows 64-bit
  • Mac 32-bit
  • Mac 64-bit
  • Linux 32-bit
  • Linux 64-bit
  • Solaris Sparc 32-bit
  • Solaris Sparc 64-bit
  • Solaris Intel 32-bit
  • Solaris Intel 64-bit
  • AIX 32-bit
  • AIX 64-bit
  • HP/UX PA-RISC 32-bit
  • HP/UX PA-RISC 64-bit
  • HP/UX Itanium 32-bit
  • HP/UX Itanium 64-bit

Where is it used?

The library is built into Adobe's CC (Creative Cloud) applications such as Adobe InDesign, Adobe Photoshop and Adobe Illustrator, and is available to third-party companies as an OEM offering.

Our licensing and experience

Mapsoft has extensive experience with working with the Adobe PDF library and the Adobe Acrobat SDK (software developers kit).

Mapsoft licences the library for use in our own applications, and we have extensive experience in using it for other organisations' offerings. The software developers kit shares the same code base across Adobe Acrobat, Adobe Reader and the PDF library, so in some cases a product that is available as a plug-in for Acrobat can have most of its code reused for an Adobe PDF library offering. The advantage is that library offerings are not constrained by the same licensing as Adobe Acrobat, in particular in being able to run in a server environment. Datalogics, who license the PDF library, have also created their own interfaces that can be used through .NET and through Java.

In general the PDF library is kept in step with both Acrobat and changes in PDF, most recently with the jump from PDF version 1.7 to PDF version 2.0. There is also extensive support for PDF/X and PDF/A, and the ability to convert from generic PDF to these specific versions.

The PDF library is not an end user product. It is for use by developers and is capable of interacting with PDF files at a very low level. Although PDF is an ISO standard, there is considerable confidence to be gained from choosing a software developers kit that was created by the originators of the PDF standard and is used extensively in Adobe's own products, rather than choosing a third-party offering.

Datalogics, who are responsible for licensing the PDF library to third-party software developers, also provide support and maintenance. They can also license the binaries on their own where development is not required because that service has been provided by another organisation such as Mapsoft.

For more information please contact Michael Peters who is the Technical Director at Mapsoft.

Michael Peters
mpeters@mapsoft.com
www.mapsoft.com

For more information on the Adobe PDF library please see the Datalogics website at: https://www.datalogics.com/products/pdf/pdflibrary/