Monday, June 11, 2007

Access document attributes?

Recently someone posted a query on the ColdFusion Forums asking about cfpdf and accessing document attributes:
I was wondering if there was a way you could use CFPDF to access document attributes and extract embedded images and text from a PDF? Would there be a way for us to access text blocks created by the user (along with the x:y coordinates)?
For example, A LOT of sites convert the PDF to a JPEG and create area maps to simulate a zoom effect. Just looking for a way to deconstruct the PDF using the new CFPDF tag...
Most of what was requested is possible with the cfpdf tag.
You can extract document attributes such as metadata using cfpdf action="getinfo".
You can extract text using cfpdf action="processddx" (the code for extracting text from a PDF follows below).
You can't extract images from a PDF. You can, however, create JPEG images from PDF pages using action="thumbnail", and you can also specify the output image format in this tag.
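As a rough sketch of the getinfo and thumbnail actions mentioned above: the file name sample.pdf, the thumbs folder, and the variable names are hypothetical placeholders, and the folder is assumed to exist already.

```cfml
<!--- Hypothetical paths; point these at a real PDF and an existing folder. --->
<cfset sourcePdf = ExpandPath("./sample.pdf")>
<cfset thumbDir = ExpandPath("./thumbs")>

<!--- getinfo returns the document attributes as a structure. --->
<cfpdf action="getinfo" source="#sourcePdf#" name="pdfInfo">
<cfdump var="#pdfInfo#" label="PDF document attributes">

<!--- thumbnail writes one image per requested page into the destination folder. --->
<cfpdf action="thumbnail"
       source="#sourcePdf#"
       destination="#thumbDir#"
       pages="1"
       format="jpeg"
       scale="100"
       overwrite="yes">
```

The format attribute is where the output image type is chosen, and scale="100" asks for a full-size image rather than a reduced thumbnail.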
DDX File:
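<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
   <DocumentText result="Out1">
      <PDF source="Doc1"/>
   </DocumentText>
</DDX>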

CFM File:
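<!--- Replace <Webroot> and the PDF name with real paths on your server. --->
<cfset ddxfile = "<Webroot>\ddx-textExtract\doc_text.ddx">
<cfset sourcefile1 = "<Webroot>\ddx-textExtract\<Any pdf having text>">
<cfset destinationfile = "<Webroot>\ddx-textExtract\ddx_result_doc_text.xml">
<!--- Map the DDX source name (Doc1) to the input PDF. --->
<cfset inputStruct=StructNew()>
<cfset inputStruct.Doc1=sourcefile1>
<!--- Map the DDX result name (Out1) to the output XML file. --->
<cfset outputStruct=StructNew()>
<cfset outputStruct.Out1=destinationfile>
<!--- Run the DDX; ddxVar receives the processing status for each output. --->
<cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="ddxVar">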
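
Once the tag has run, the extracted text sits in the XML file named by destinationfile. A minimal sketch of checking the result and loading that file follows; it assumes the ddxVar and destinationfile variables from the CFM file above, and extractedXml and extractedDoc are hypothetical names.

```cfml
<!--- Assumes ddxVar and destinationfile are set as in the CFM file above. --->
<cfdump var="#ddxVar#" label="processddx status">

<cfif FileExists(destinationfile)>
    <!--- Read the XML written by the DocumentText element and parse it. --->
    <cffile action="read" file="#destinationfile#" variable="extractedXml" charset="utf-8">
    <cfset extractedDoc = XmlParse(extractedXml)>
    <cfdump var="#extractedDoc#" label="Extracted text">
<cfelse>
    <cfoutput>No output file was written; check the status dump above.</cfoutput>
</cfif>
```

Dumping the parsed document is the quickest way to see how the text is organized before picking out specific nodes.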


Alexander said...

This is great stuff you have been doing here! It works like a charm. One request... is it possible that you show an example using DDX, to split a multipage pdf document into single pdf files?
Thanks so much,

Ahamad said...

Hi Alexander,

Yes, you can split a PDF file into multiple PDF docs. Here is how you can achieve this.

ooph... blogspot is not allowing me to post code in the contents.

I will do a new post and let you know.

Ahamad said...

I have posted the code here...

ahamad said...

The above example to extract text is also posted here...