CFPDF: Access document attributes?

Recently someone posted a query on the ColdFusion 8 Forums asking about cfpdf and accessing document attributes.

=========================

I was wondering if there was a way you could use CFPDF to access document attributes and extract embedded images and text from a PDF? Would there be a way for us to access text blocks created by the user (along with the x:y coordinates)?

For example, A LOT of sites convert the PDF to a JPEG and create area maps to simulate a zoom effect. Just looking for a way to deconstruct the PDF using the new CFPDF tag...
=========================

Most of what was requested is possible with the cfpdf tag.

You can extract document attributes such as metadata using cfpdf action="getinfo".
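Here is a minimal sketch of the getinfo call. The file name sample.pdf and the variable name pdfInfo are placeholders of mine, not something from the forum post, so point them at a real PDF before running it.

```cfml
<!--- Hypothetical source file: replace with the path to a real PDF --->
<cfset sourcefile = ExpandPath("./sample.pdf")>

<!--- Read the document information into a structure named pdfInfo --->
<cfpdf action="getinfo" source="#sourcefile#" name="pdfInfo">

<!--- Dump everything that came back, then print a few individual keys --->
<cfdump var="#pdfInfo#">
<cfoutput>
    Title: #pdfInfo.Title#<br>
    Author: #pdfInfo.Author#<br>
    Pages: #pdfInfo.TotalPages#
</cfoutput>
```

The cfdump is the quickest way to see which keys are available for a given document before relying on any one of them.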

You can extract text using the cfpdf tag with action="processddx" (the code for extracting text from a PDF follows below).

You can't extract images from a PDF. You can, however, create JPEG images from PDF pages using action="thumbnail", and you can also specify the output image format in this tag.
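A rough sketch of the thumbnail action, assuming a source file named sample.pdf and an output folder named thumbs (both names are made up for illustration). It renders only the first page and asks for PNG instead of JPEG to show the format attribute.

```cfml
<!--- Hypothetical paths: adjust to a real PDF and a writable folder --->
<cfset sourcefile = ExpandPath("./sample.pdf")>
<cfset thumbdir = ExpandPath("./thumbs")>

<!--- Create the destination directory if it is missing --->
<cfif NOT DirectoryExists(thumbdir)>
    <cfdirectory action="create" directory="#thumbdir#">
</cfif>

<!--- Render page 1 at full scale as a PNG image in the destination folder --->
<cfpdf action="thumbnail"
       source="#sourcefile#"
       destination="#thumbdir#"
       pages="1"
       format="png"
       scale="100"
       overwrite="yes">
```

Leave out the pages attribute to generate one image per page, which is the usual starting point for the PDF-to-image zoom effect mentioned in the question.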

===========================
DDX File:

<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
   <DocumentText result="Out1">
      <PDF source="Doc1"/>
   </DocumentText>
</DDX>

CFM File:

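<cfset ddxfile = "<Webroot>\ddx-textExtract\doc_text.ddx">
<cfset sourcefile1 = "<Webroot>\ddx-textExtract\<Any pdf having text>">
<cfset destinationfile = "<Webroot>\ddx-textExtract\ddx_result_doc_text.xml">
<!--- Map the DDX names Doc1 and Out1 to the actual files, then run the DDX --->
<cfset inputStruct = StructNew()>
<cfset inputStruct.Doc1 = sourcefile1>
<cfset outputStruct = StructNew()>
<cfset outputStruct.Out1 = destinationfile>
<cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="ddxVar">
<cfoutput>#ddxVar.Out1#</cfoutput>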

==========================

4 comments:

Unknown said...: This is great stuff you have been doing here! It works like a charm. One request... is it possible that you show an example using DDX, to split a multipage pdf document into single pdf files?
Thanks so much,
Alexander; Wednesday, July 04, 2007
Ahamad said...: Hi Alexander,
Thanks...

Yes, you can split a pdf file into multiple pdf docs. Here is how you can achieve this.

ooph... blogspot is not allowing me to post code in the contents.

I will do a new post and let you know.; Thursday, July 05, 2007
Ahamad said...: Hi,
I have posted the code here...

http://cfpdf.blogspot.com/2007/07/split-pdf-file-into-multiple-pdf-docs.html; Thursday, July 05, 2007
Anonymous said...: The above example to extract text is also posted here...

http://cf-examples.net/index.cfm/2008/6/18/Extract-Text-From-PDF; Tuesday, July 15, 2008

CFPDF

Monday, June 11, 2007
