Wednesday, June 27, 2007

PDFForms - dataFile attribute is now xmlData

The dataFile attribute of cfpdfform has been renamed to xmlData.

xmlData accepts the following kinds of sources:

For form population (action="populate"):

Source = XML object, XML string, file name, or URL

For action="read":

Source = a file or a variable name.

This change should be reflected in the next build.
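
As a rough sketch of how the renamed attribute might be used with both actions (the file names timesheet.pdf, timesheet_data.xml, and timesheet_filled.pdf and the variable name formXml are placeholders, not taken from the build notes):

```cfml
<!--- action="read": xmldata names a variable (or a file) that receives the form's XML data. --->
<cfpdfform action="read" source="#ExpandPath('timesheet.pdf')#" xmldata="formXml"/>

<!--- action="populate": xmldata here points at an XML file on disk.
      Per the note above, an XML object, XML string, or URL should also be accepted. --->
<cfpdfform action="populate"
    source="#ExpandPath('timesheet.pdf')#"
    destination="#ExpandPath('timesheet_filled.pdf')#"
    xmldata="#ExpandPath('timesheet_data.xml')#"
    overwrite="yes"/>
```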



Thumbnail images

The following example generates thumbnail images from pages in a PDF document and links the thumbnail images to the pages in the PDF document:
<h3>PDF Thumbnail Demo</h3>
<!--- Create a variable for the name of the PDF document. --->
<cfset mypdf="myBook">
<cfset thisPath=ExpandPath(".")>
<!--- Use the getInfo action to retrieve the total page count for the PDF document. --->
<cfpdf action="getInfo" source="#mypdf#.pdf" name="PDFInfo">
<cfset pageCount="#PDFInfo.TotalPages#">
<!--- Generate a thumbnail image for each page in the PDF source document,
create a directory (if it doesn't already exist) in the web root that is a concatenation of the
PDF source name and the word "thumbnails", and save the thumbnail images in that directory. --->
<cfpdf action="thumbnail" source="#mypdf#.pdf" overwrite="yes" destination="#mypdf#_thumbnails" scale=60>
<!--- Loop through the images in the thumbnail directory and generate a link from
each image to the corresponding page in the PDF document. --->
<cfloop index="LoopCount" from="1" to="#pageCount#" step="1">
  <cfoutput>
  <!--- Click the thumbnail image to navigate to the page in the PDF document. --->
  <a href="#mypdf#.pdf##page=#LoopCount#" target="_blank"><img src="#mypdf#_thumbnails/#mypdf#_page_#LoopCount#.jpg"></a>
  </cfoutput>
</cfloop>

cfpdf - action = merge

I thought I would post some usage examples for the cfpdf tag.
Suppose I have a PDF file of 100 pages and I want only selected pages in a separate PDF.
If I want only pages 7, 10, 15, 20, 25, and 40, here is how to achieve this:
<cfset sourcefile = ExpandPath('inputfiles\merge_test1.pdf')>
<cfset destinationfile= ExpandPath('results\merge_result1.pdf')>
<cfpdf action=merge source="#sourcefile#" Pages="7, 10, 15, 20, 25, 40" destination="#destinationfile#" overwrite="true">
Now suppose I want just the first 3 chapters of the PDF file. I can specify the page numbers as a range.
Assuming that the first 3 chapters are in the first 30 pages, here is how to achieve that:
<cfpdf action=merge source="#sourcefile#" Pages="1-30" destination="#destinationfile#" overwrite="true">
Suppose I want just the first chapter (pages 1-10) and the fifth chapter (pages 50-60):
<cfpdf action=merge source="#sourcefile#" Pages="1-10, 50-60" destination="#destinationfile#" overwrite="true">
In case you want some specific pages merged along with ranges of pages:
<cfpdf action=merge source="#sourcefile#" Pages="1-10, 21, 31, 41, 50-60" destination="#destinationfile#" overwrite="true">
Now, if your source file is password protected, you can provide the password in the cfpdf tag.
<cfpdf action=merge source="#sourcefile#" Pages="5, 9, 16, 18" password="mypwd" destination="#destinationfile#" overwrite="true">
If we have a set of PDF files in a directory and want to merge all of them into a single file:
<cfset sourcedir = ExpandPath('inputfiles\mergetest-dir')>
<cfset destinationfile = ExpandPath('results\merge_result_dir.pdf')>
<cfpdf action=merge directory="#sourcedir#" destination="#destinationfile#" overwrite="true">
In the above action, the files might be merged in any order. Suppose I want them merged by file name:
<cfpdf action=merge directory="#sourcedir#" destination="#destinationfile#" order="name" overwrite="true">
Another attribute, ascending, controls the sort direction (ascending or descending):
<cfpdf action=merge directory="#sourcedir#" destination="#destinationfile#" order="name" ascending="false" overwrite="true">
<cfpdf action=merge directory="#sourcedir#" destination="#destinationfile#" order="name" ascending="true" overwrite="true">
We can also store the merged PDF in a variable:
<cfpdf action=merge source="#sourcefile#" Pages="7, 10, 11" name="pagedata">
We can then perform actions on the variable 'pagedata':
<cfpdf action=getinfo source="pagedata" name="myVar">
<cfoutput>The total number of pages = #myVar.TotalPages#</cfoutput><br><br>
If you are merging a set of PDF files or pages and you want to keep the bookmarks:
<cfpdf action=merge source="#sourcefile#" Pages="7, 10, 11" destination="#destinationfile#" overwrite="true" keepbookmark="true">
Merging different pages from different PDFs
Now suppose you need to merge a set of pages from different PDF files:
<cfset sourcefile1 = ExpandPath('inputfiles\test1.pdf')>
<cfset sourcefile2 = ExpandPath('inputfiles\test2.pdf')>
<cfset sourcefile3 = ExpandPath('inputfiles\test3.pdf')>
<cfset destinationfile = ExpandPath('results\result1.pdf')>
<cfpdf action="merge" destination="#destinationfile#" overwrite="true">
         <cfpdfparam source="#sourcefile1#" pages="1-3, 5">
         <cfpdfparam source="#sourcefile2#"><!--- entire pdf here --->
         <cfpdfparam source="#sourcefile3#" pages="2" password="cfpdfparam_test3">
</cfpdf>

Monday, June 18, 2007

Data extraction from signed PDFForm

CFPDFForm used to throw an application exception when processing LiveCycle Designer (LCD) PDFs that had been signed and submitted using the default 'Sign Data and Submit Settings' in LCD.
This has now been fixed, and data can be extracted from a signed PDF form.
In a signed PDF, an extra node named <signatures> is introduced in the <xfa:data> element. The earlier assumption was that <xfa:data> contains only form data. Signatures are now ignored before the form data DOM is constructed.

Friday, June 15, 2007

isPDFFile() Currently ALWAYS returns NO

I have been testing the isPDFFile() function. In the current beta it always returns false.
This is a bug that has already been fixed, and the fix should be available in the final build.
If you are unaware of isPDFFile(), here is a brief description:

Verifies whether a PDF file is valid.

Returns True if the path points to a valid PDF file; False otherwise.



Path - Pathname to a PDF file. The pathname can be absolute or relative to the CFM page and must be enclosed in quotation marks.


<!--- The following code shows the action page for a form where a user chooses a PDF document to print. --->

<cfif IsPDFFile("#form.printMe#")>
  <!--- Print the file the user selected. --->
  <cfprint type="PDF" source="#form.printMe#">
<cfelse>
  <p>This is not a valid PDF file or the PDF document you have chosen is not available.</p>
</cfif>





Monday, June 11, 2007

Access document attributes?

Recently someone posted a query on the ColdFusion Forums asking about cfpdf and accessing document attributes.
"I was wondering if there was a way you could use CFPDF to access document attributes and extract embedded images and text from a PDF? Would there be a way for us to access text blocks created by the user (along with the x:y coordinates)?
For example, A LOT of sites convert the PDF to a JPEG and create area maps to simulate a zoom effect. Just looking for a way to deconstruct the PDF using the new CFPDF tag..."
Most of what was requested is possible using the cfpdf tag.
You can extract document attributes such as metadata using cfpdf action="getinfo".
You can extract text using cfpdf action="processddx" (the code for extracting text from a PDF follows below).
You can't extract images from a PDF. You can, however, create JPEG images from PDF pages using action="thumbnail", and you can specify the output image format in this tag.
DDX File:
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns=""
   xsi:schemaLocation=" coldfusion_ddx.xsd">
   <DocumentText result="Out1">
      <PDF source="Doc1"/>
   </DocumentText>
</DDX>

CFM File:
<cfset ddxfile = "<Webroot>\ddx-textExtract\doc_text.ddx">
<cfset sourcefile1 = "<Webroot>\ddx-textExtract\<Any pdf having text>">
<cfset destinationfile = "<Webroot>\ddx-textExtract\ddx_result_doc_text.xml">
<cfset inputStruct=StructNew()>
<cfset inputStruct.Doc1="#sourcefile1#">
<cfset outputStruct=StructNew()>
<cfset outputStruct.Out1="#destinationfile#">
<cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="ddxVar">

Saturday, June 9, 2007

The new cfpdf tag

This new tag, introduced in ColdFusion 8 (code-named Scorpio), is useful for manipulating existing PDF documents.

The following list describes some of the tasks you can perform with the cfpdf tag:
■ Merge several PDF documents into one PDF document.
■ Delete pages from a PDF document.
■ Merge pages from one or more PDF documents and generate a new PDF document.
■ Linearize PDF documents for faster web display.
■ Remove interactivity from forms created in Acrobat® to generate flat PDF documents.
■ Encrypt and add password protection to PDF documents.
■ Generate thumbnail images from PDF documents or pages.
■ Add or remove watermarks from PDF documents or pages.
■ Retrieve information associated with a PDF document, such as the software used to generate the file or the author, and set information for a PDF document, such as the title, author and keywords.
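
Two of the tasks in this list might look like the following minimal sketch. The file names are placeholders, and the snippet assumes the inputfiles and results directories already exist next to the CFM page:

```cfml
<!--- File names below are placeholders. --->
<cfset sourcefile = ExpandPath('inputfiles\report.pdf')>

<!--- Delete pages 2 through 4 and write the result to a new file. --->
<cfpdf action="deletepages" source="#sourcefile#" pages="2-4"
    destination="#ExpandPath('results\report_trimmed.pdf')#" overwrite="yes">

<!--- Save a linearized copy for faster web display. --->
<cfpdf action="write" source="#sourcefile#" saveoption="linear"
    destination="#ExpandPath('results\report_linear.pdf')#" overwrite="yes">
```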

Populating a PDF form with XML data

ColdFusion 8 allows you to populate a PDF form with XML data read from an XML file.

Example: Suppose a PDF form, "payslipTemplate.pdf", is to be filled with employee data. The data is in an XML file, formdata.xml.

<cfset sourcefile = ExpandPath('payslipTemplate.pdf')>
<cfset destinationfile = ExpandPath('employeeid123.pdf')>
<cfset datafile = ExpandPath('formdata.xml')>

<cfpdfform source="#sourcefile#" destination="#destinationfile#" action="populate" xmldata="#datafile#"/>


Monday, June 4, 2007

PDF Forms Tags...

ColdFusion 8 introduces several tags for manipulating PDF forms.

The following list describes what each tag does:

cfpdfform - Reads data from a form and writes it to a file or populates a form with data from a data source.

cfpdfformparam  - A child tag of the cfpdfform tag or the cfpdfsubform tag; populates individual fields in PDF forms.

cfpdfsubform - A child tag of the cfpdfform tag; creates the hierarchy of the PDF form so that form fields are filled properly.

The cfpdfsubform tag contains one or more cfpdfformparam tags.
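
A minimal sketch of how the three tags nest when populating individual fields is shown here. The subform name form1 and the field names employeeName and employeeId are hypothetical; they would have to match the structure of the actual form, and the file names are placeholders:

```cfml
<!--- Populate individual fields instead of supplying a whole XML data file. --->
<cfpdfform action="populate"
    source="#ExpandPath('payslipTemplate.pdf')#"
    destination="#ExpandPath('employeeid123.pdf')#"
    overwrite="yes">
    <!--- The subform name must mirror the hierarchy of the form (hypothetical here). --->
    <cfpdfsubform name="form1">
        <cfpdfformparam name="employeeName" value="Jane Doe">
        <cfpdfformparam name="employeeId" value="123">
    </cfpdfsubform>
</cfpdfform>
```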


Sunday, June 3, 2007

About PDF forms

Scorpio lets you incorporate interactive PDF forms in your application.
You can extract data submitted from the PDF forms, populate form fields from an XML data file or a database, and embed PDF forms in PDF documents created in ColdFusion.

ColdFusion supports interactive forms created with Adobe Acrobat forms and with LiveCycle.
In Adobe Acrobat 6.0 or earlier, you can create interactive Acroforms. Using Adobe LiveCycle
Designer, which is provided with Adobe Acrobat Professional 7.0 and later, you can generate
interactive forms.

The type of form is significant because it affects how you manipulate the data in ColdFusion.
For example, you cannot use an XML data file generated from a form created in Acrobat to
populate a form created in LiveCycle, and vice versa, because the XML file formats differ
between the two types of forms.

Forms created in Acrobat use the XML Forms Data Format (XFDF) file format. Forms
created in LiveCycle use the XML Forms Architecture (XFA) format introduced in Acrobat
and Reader 6.

The file format also affects how you prefill fields in a form from a data source, because you must
map the data structure as well as the field names.

The use of JavaScript also differs based on the context. The JavaScript Object Model in a PDF
file differs from the HTML JavaScript Object Model. Consequently, scripts written in
HTML JavaScript do not apply to PDF files. Also, JavaScript differs between forms created in
Acrobat and those created in LiveCycle: scripts written in one format do not work with the other.