
Thursday, 21 May 2020

Summary of the Structure of PDF files

PDF can be looked upon as a combination of different file types presented in a single container. This is because a PDF file can contain text, vector art, images and fonts, and other file formats can be embedded in it, even the native files that were used to create the PDF in the first place.

An object orientated file format in which items can be connected to each other directly or indirectly.

PDF is an object orientated file format with dictionaries, images, vector drawings, text and resources

The objects within a PDF file can be divided into the following types:


Dictionaries

A collection of key/value pairs: each key is a name, and each value is either a direct object or a reference to an indirect object. Dictionaries can be seen as the glue holding together the elements in a PDF file. The example below shows the structure of a typical page dictionary:

pdf page dictionary

  • Contents: the content stream has an attributes dictionary (the stream dictionary) that contains the name of the Filter used to encode the stream and the Length of the stream.
  • CropBox: an array containing the coordinates of the rectangle that defines the area that is visible on the page.
  • MediaBox: an array containing the coordinates of the rectangle that defines the media size. This will typically match a standard media size such as Letter or A4, which allows the PDF page to be reliably printed on a device that holds that standard media size.
  • Resources: a dictionary containing references and information for the elements that are needed to reliably output the visual elements of the page, such as colors, fonts and images.
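
To make the structure concrete, here is a minimal Python sketch that writes Python values out in PDF object syntax, producing a page dictionary like the one described above. The helper names (Name, Ref, to_pdf) and the object numbers in the example are hypothetical, chosen only for illustration; a real PDF writer would also need to handle strings, streams and the cross-reference table.

```python
class Name(str):
    """A PDF name object, written with a leading slash (e.g. /Page)."""


class Ref:
    """An indirect reference to another object, written as 'num gen R'."""

    def __init__(self, num: int, gen: int = 0):
        self.num = num
        self.gen = gen


def to_pdf(value) -> str:
    """Serialize a small subset of PDF objects into PDF syntax."""
    if isinstance(value, Name):
        return "/" + value
    if isinstance(value, Ref):
        return f"{value.num} {value.gen} R"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # PDF reals are plain decimals; trim trailing zeros (595.2760 -> 595.276)
        return f"{value:.4f}".rstrip("0").rstrip(".")
    if isinstance(value, list):
        return "[" + " ".join(to_pdf(item) for item in value) + "]"
    if isinstance(value, dict):
        entries = " ".join(f"/{key} {to_pdf(item)}" for key, item in value.items())
        return "<< " + entries + " >>"
    raise TypeError(f"unsupported value: {value!r}")


# Hypothetical page: object numbers 2, 4, 5 and 6 are made up for the example.
page = {
    "Type": Name("Page"),
    "Parent": Ref(2),
    "MediaBox": [0, 0, 595.276, 841.89],
    "CropBox": [0, 0, 595.276, 841.89],
    "Resources": {"Font": {"F1": Ref(5)}, "XObject": {"Im0": Ref(6)}},
    "Contents": Ref(4),
}

print(to_pdf(page))
```

Running it prints a single line beginning << /Type /Page /Parent 2 0 R /MediaBox [0 0 595.276 841.89], with the nested Resources dictionary written inline.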

Streams

A content stream is the sequence of operators that output information onto the page. The stream will normally also use entries from the page Resources dictionary, such as colors and fonts. A page's Contents entry holds either a single stream or an array of streams. The example below is taken from a page content stream:

567.48 61.011 -540 720 re
W* n
/GS0 gs
0 720 -541.1399536 0 567.4799194 61.0105438 cm
/Im0 Do
/CS0 cs 0.302 0.302 0.302  scn
1 i 
/GS1 gs
56.7 286.911 m
56.7 295.191 56.7 303.471 56.7 311.751 c
59.1 311.751 61.5 311.751 63.9 311.751 c
63.9 306.831 63.9 301.911 63.9 296.991 c
65.88 296.991 67.8 296.991 69.72 296.991 c
69.72 301.191 69.72 305.391 69.72 309.591 c
72 309.591 74.22 309.591 76.5 309.591 c
76.5 305.391 76.5 301.191 76.5 296.991 c
81.06 296.991 85.62 296.991 90.18 296.991 c
90.18 293.631 90.18 290.271 90.18 286.911 c
79.02 286.911 67.86 286.911 56.7 286.911 c

You can see that there are several references to items in the page resources dictionary:
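  • GS0 is a reference to a graphics state parameter dictionary (ExtGState), and gs is the operator that applies it.
  • Im0 is an image XObject, and the Do operator draws the image.
  • CS0 is a reference to a color space; the cs operator selects it for non-stroking (fill) operations, and scn sets the color values (0.302 0.302 0.302) used within it. The stroking equivalents are CS and SCN.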

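You can also see several path and clipping operators: re (rectangle), m (moveto), c (curveto), W* (set the clipping path using the even-odd rule) and n (end the path without filling or stroking it). The cm operator modifies the current transformation matrix (here positioning the image) and i sets the flatness tolerance. The f* operator fills a path using the even-odd rule.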
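
PDF content streams use postfix notation: the operands come first and the operator follows them. The Python sketch below groups operands with their operators and lists the resource names a stream refers to. It assumes a simplified stream containing only numbers, names and operators separated by whitespace; real content streams can also contain strings, arrays, dictionaries and inline images, which a proper tokenizer must handle. The function names are hypothetical.

```python
import re

# A few lines taken from the content stream shown above.
SAMPLE = """
/GS0 gs
0 720 -541.1399536 0 567.4799194 61.0105438 cm
/Im0 Do
/CS0 cs 0.302 0.302 0.302 scn
56.7 286.911 m
"""

# Numeric tokens: 720, -541.1399536, 0.302, .5 and so on.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def parse_operations(stream: str):
    """Pair each operator with the operands that precede it (postfix syntax).

    Assumes only numbers, names and operators separated by whitespace:
    no strings, arrays, dictionaries or inline images.
    """
    operands, operations = [], []
    for token in stream.split():
        if token.startswith("/"):  # a name, e.g. /GS0
            operands.append(token)
        elif NUMBER.fullmatch(token):
            operands.append(float(token))
        else:  # anything else is treated as an operator
            operations.append((token, operands))
            operands = []
    return operations


def resource_names(operations):
    """Names used as operands; these are looked up in the page Resources."""
    return sorted({arg for _, args in operations for arg in args if isinstance(arg, str)})


ops = parse_operations(SAMPLE)
for operator, args in ops:
    print(operator, args)
print(resource_names(ops))  # ['/CS0', '/GS0', '/Im0']
```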

Text strings

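These can either be single-byte strings in PDFDocEncoding (often loosely called ANSI) or Unicode strings encoded as UTF-16BE, which start with the byte-order mark FE FF. The example here is the representation of the date the document was last modified (the ModDate entry), which is stored in the document information dictionary.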
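
As a rough illustration, the Python sketch below decodes the bytes of a PDF text string by checking for the FE FF byte-order mark, and writes a string out as a UTF-16BE hexadecimal string. For single-byte strings it approximates PDFDocEncoding with Latin-1; that is an assumption to keep the sketch short, since the two encodings agree for printable ASCII but differ for some other codes. The function names are hypothetical.

```python
import codecs


def decode_text_string(raw: bytes) -> str:
    """Decode the bytes of a PDF text string."""
    if raw.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return raw[2:].decode("utf-16-be")
    # Assumption: Latin-1 stands in for PDFDocEncoding here.
    return raw.decode("latin-1")


def encode_unicode_hex(text: str) -> str:
    """Write text as a UTF-16BE hexadecimal string, e.g. <FEFF0044...>."""
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


hex_form = encode_unicode_hex("D:2020")
print(hex_form)  # <FEFF0044003A0032003000320030>
print(decode_text_string(bytes.fromhex(hex_form[1:-1])))  # D:2020
```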

Unicode text string


Images

Images are normally held within the page resources as image XObjects, and each image stream has an associated attributes dictionary that describes the data within the stream. Width and Height give the size of the image in pixels, and BitsPerComponent gives the number of bits used to represent each color component of a single pixel (dot) within the image. The ColorSpace entry describes the colour model that is used to define the colors within the image.
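
The image attributes determine how much raw sample data an image needs. The Python sketch below works this out for uncompressed data, using the rule that each row of samples starts on a byte boundary. It covers only the three device color spaces (an assumption for brevity), and the stream's own Length is usually smaller because the data is normally compressed by a filter. The function name is hypothetical.

```python
# Number of color components for the device color spaces.
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def image_data_size(width: int, height: int, bits_per_component: int, color_space: str) -> int:
    """Bytes of uncompressed sample data for an image XObject.

    Each row begins on a byte boundary, so the row length is rounded up
    to a whole number of bytes before multiplying by the number of rows.
    """
    bits_per_row = width * COMPONENTS[color_space] * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8
    return bytes_per_row * height


print(image_data_size(100, 50, 8, "DeviceRGB"))  # 15000
print(image_data_size(10, 2, 1, "DeviceGray"))   # 4 (10 bits padded to 2 bytes per row)
```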

XObject image and attributes


Names

Names are written with a leading slash and are normally used as dictionary keys and as values that refer to a dictionary or dictionary item. For example, the Pages dictionary has the name /Type with the value /Pages, and a single page has the name /Type with the value /Page.
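
Names can contain characters that would otherwise break the syntax, such as spaces. From PDF 1.2 these are written as a # followed by two hexadecimal digits. The Python sketch below applies that rule; it assumes the name is encoded as UTF-8 bytes first, and the function name is hypothetical.

```python
DELIMITERS = set("()<>[]{}/%")


def encode_name(name: str) -> str:
    """Write a PDF name object with #xx escapes (PDF 1.2 and later).

    Bytes outside the printable range 0x21-0x7E, delimiter characters and
    '#' itself are escaped. Assumption: the name is encoded as UTF-8 first.
    """
    out = []
    for byte in name.encode("utf-8"):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7E or char in DELIMITERS or char == "#":
            out.append(f"#{byte:02X}")
        else:
            out.append(char)
    return "/" + "".join(out)


print(encode_name("Type"))                 # /Type
print(encode_name("Adobe Green"))          # /Adobe#20Green
print(encode_name("paired()parentheses"))  # /paired#28#29parentheses
```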

pdf name entry in a dictionary


Arrays

An ordered sequence of objects written between square brackets. An array can mix types and can include references to other objects. For an example, see the media box in the Real numbers example below.

Real numbers

Decimal numbers. In this example they are being used to define the rectangle of the page media box:

Real numbers
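
The media box values are in default user space units of 1/72 inch, so the physical page size can be worked out directly. The Python sketch below assumes the page has no UserUnit entry (which, from PDF 1.6, can change the size of a unit); the helper names are hypothetical.

```python
POINTS_PER_INCH = 72  # default user space unit is 1/72 inch
MM_PER_INCH = 25.4


def media_box_size(box):
    """Return (width, height) in inches for a rectangle [llx lly urx ury].

    Any two diagonally opposite corners may be given, hence abs().
    """
    llx, lly, urx, ury = box
    return abs(urx - llx) / POINTS_PER_INCH, abs(ury - lly) / POINTS_PER_INCH


def media_box_size_mm(box):
    """Same as media_box_size, converted to millimetres."""
    width, height = media_box_size(box)
    return width * MM_PER_INCH, height * MM_PER_INCH


print(media_box_size([0, 0, 612, 792]))  # (8.5, 11.0), US Letter
width_mm, height_mm = media_box_size_mm([0, 0, 595.276, 841.89])
print(round(width_mm), round(height_mm))  # 210 297, A4
```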


Integers

Whole numbers. For example, the Count entry of the root Pages dictionary holds the total number of pages in the PDF file.
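
In the file itself, integers and reals are told apart by their syntax: a real contains a decimal point, and neither may use exponent notation (so 1e3 is not a valid PDF number). The Python sketch below classifies numeric tokens on that basis; the function name is hypothetical.

```python
import re

INTEGER = re.compile(r"[+-]?\d+")            # 612, +17, -98
REAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")  # 34.5, 4., -.002


def parse_number(token: str):
    """Return an int for a PDF integer or a float for a PDF real."""
    if INTEGER.fullmatch(token):
        return int(token)
    if REAL.fullmatch(token):
        return float(token)
    raise ValueError(f"not a PDF number: {token!r}")


for token in ["3", "+17", "595.28", "4.", "-.002"]:
    print(token, "->", repr(parse_number(token)))
```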


For further details see the PDF Specification at


Michael Peters

Wednesday, 20 May 2020

Understanding of Colour and Colour models

There are a number of color models, but I am only going to cover two here as they are the most often used.


RGB

This color model is primarily used to describe light and is used mainly in cameras and scanners. It has three color elements, red, green and blue, which when added together at 100% represent white or pure light. RGB can represent a wider range (gamut) of colors than CMYK, which is fine until printing is required and that printing is done through the CMYK color model. The model uses three values, each either in the range 0 to 255, as in Windows and applications such as Photoshop, or a decimal number from 0 to 1, as in PDF.

RGB is an additive color model. Adding all of the colors in equal amounts will result in white.

RGB Colour merge and intersections
In the web world, RGB colours are represented by hexadecimal (base-16) number combinations, where each pair of hex digits is one of the three 0 to 255 values (FF is 255). For example, red is #FF0000, green is #00FF00 and blue is #0000FF. Black is #000000 and white is #FFFFFF.
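
Converting between the web notation and the 0 to 1 values used in PDF is simple arithmetic, as the Python sketch below shows. The function names are hypothetical.

```python
def hex_to_rgb(code: str):
    """'#FF0000' -> (255, 0, 0): each pair of hex digits is one 0-255 value."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb) -> str:
    """(255, 0, 0) -> '#FF0000'."""
    return "#" + "".join(f"{value:02X}" for value in rgb)


def rgb_to_pdf(rgb):
    """Scale 0-255 values to the 0-1 range used in PDF, to 3 decimal places."""
    return tuple(round(value / 255, 3) for value in rgb)


print(hex_to_rgb("#FF0000"))     # (255, 0, 0)
print(rgb_to_hex((0, 255, 0)))   # #00FF00
print(rgb_to_pdf((77, 77, 77)))  # (0.302, 0.302, 0.302)
```

In a PDF content stream a DeviceRGB fill color is set with the rg operator, so red would be written as 1 0 0 rg.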


CMYK

Cyan/Magenta/Yellow/Black, used primarily in printing.

The colors are created by printing the inks on top of each other to achieve the required shades. Slight overlaps may be required at the edges (trapping) so that gaps do not show, because different paper types can expand and shrink when the ink or toner is applied. The color model is much more limited in its range than RGB, so care needs to be taken when converting from RGB to CMYK. This can be achieved through color management systems, by adding extra colors to the print run (as in Hexachrome) or by using spot colors, which are usually premixed inks such as Pantone colors.

Printing is affected by the resolution of the input and output, by the paper stock being printed onto (both its surface quality and the base color of the media) and by the attributes of the inks being used. Output can also be modified and enhanced with varnishes, such as UV varnish, and with foils that provide metallic effects.

The model uses four values, each a percentage of one of the four colors: cyan, magenta, yellow and black.

CMYK is a subtractive color model. In theory, printing all of the colors at full strength results in black. In practice, combining cyan, magenta and yellow inks will more than likely produce a dirty, muddy color, so the K (black) ink gives the printer a real black with which to print a true black.
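
The relationship between the two models can be illustrated with a commonly quoted naive formula for converting RGB to CMYK, sketched in Python below. It moves as much as possible into the black (K) channel and does no color management, so it is only an approximation: a real conversion would use ICC profiles for the particular press, paper and inks. The function name is hypothetical.

```python
def rgb_to_cmyk(r: float, g: float, b: float):
    """Naive conversion of 0-1 RGB values to 0-100% CMYK.

    Takes as much black (K) as possible and applies no ICC profile,
    so it does not predict how the inks will actually print.
    """
    k = 1.0 - max(r, g, b)
    if k == 1.0:  # pure black: avoid dividing by zero below
        return (0.0, 0.0, 0.0, 100.0)
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return tuple(round(value * 100, 1) for value in (c, m, y, k))


print(rgb_to_cmyk(1, 0, 0))        # red
print(rgb_to_cmyk(0.5, 0.5, 0.5))  # mid grey: black ink only
```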

CMYK Colour merge and intersections

This is a simple look at color and I will expand on this in a future blog.

Contact info:

Michael Peters

Tuesday, 19 May 2020

What is an Acrobat Plug-in?

Adobe Acrobat plugin

A way for software developers to add additional functionality to Acrobat or to modify current functionality.

Why are plug-ins required?

Adobe provides a product that is intended to be used across multiple industries and organisations. Supporting every vertical market directly would bloat the application with features that would only be used by relatively few people compared with the whole Acrobat market.

Can Acrobat plug-ins be used in the Adobe Reader?

Special support needs to be added to the plug-in so that it can run under Adobe Reader. However, a Reader plug-in requires a special license and needs to go through an approval process with Adobe Systems Inc.

Are plug-ins specific to a particular version of Adobe Acrobat?

We have plug-ins that we developed for Acrobat 6 that still run without modification in Acrobat DC. However, if a plug-in uses features that are specific to a later version, it won't work under earlier versions. Plug-ins that used the Adobe Dialog Manager (ADM) won't work in current versions of Acrobat, as ADM is no longer supported.

Examples of Plug-ins
  • New security handlers that might be specific to a particular organisation. For example, we have developed security handlers that do not allow PDF files to be viewed outside a particular organisation's offices.
  • New annotations. For example, we created a plug-in that supported all of the British Standard Markups.
  • Flattening annotations and form fields into the main document. This ensured that they could not be changed or modified and that they would print as part of the document even if the printing of annotations was switched off.
  • Adding text and images to PDF files.
  • Creating a table of contents for PDF files
  • Adding fields for variable data printing
  • Hardware integration of Adobe Acrobat into whiteboards and interactive tables


Michael Peters