Creating PDFs

Trying to shove tables
into a sheet of paper since '88

Scott Steinbeck

  • Software Developer
  • 20+ years of experience
  • Father
  • Hardware Tinkerer

Hobbies

 
  • Coding in my free time
  • Overcommitting myself
  • Automating everything
  • IOT Development
  • Teaching Coding/Electronics
  • Contributing to open source

Creating Excel

 

@cfsimplicity Julian Halliwell

Creating Excel

Back to the show

State of the cfdocument

If only they had more experience with PDFs...

Reality...

Modern CSS support

Limited Header/Footer Customization

JS Support

Page Break Issues

Table Border Styles

Custom Font Support

Context aware chapter titles

Page numbering

Bookmarks

Table of Contents

Internal Links

Image Paths

Repeating Table Headers

Orientation change within the PDF

Missing, limited, or undocumented support for...

alternative solutions that provide additional features

WKHTMLTOPDF

executable

alternative solutions that provide additional features

PhantomJS (headless browser)

PhantomJS development is suspended until further notice

executable

alternative solutions that provide additional features

executable -> Puppeteer CLI

alternative solutions that provide additional features

Written for Node, wrapped for Java

Playwright

alternative solutions that provide additional features

https://html2canvas.hertzen.com/

JavaScript library that renders HTML into a canvas element on the fly and converts it to an image

alternative solutions that provide additional features

Notable mentions

Apache PDF Box
(not from Ortus)

Prince PDF

($500 / server licence)

Introducing Flying Saucer

Lucee 5.3+

Bryce (1999)

Searching for docs....

HTML/CSS Support

The Adobe CFML cfdocument tag can render HTML that supports the following standards:

  • HTML 4.01
  • XML 1.0
  • DOM Level 1 and 2
  • CSS1 and CSS2 (for more information, see the "Supported CSS styles" section)

The Lucee CFML cfdocument tag can render HTML that supports the following standards:

  • HTML 4.01
  • XML 1.0
  • DOM Level 1 and 2
  • CSS 2.1 (CSS3 support in development)

The magic is found in the CSS support

Orientation change within the PDF

    /* Use A4 paper */
    @page { size: A4 }

    /* Use A4 paper in landscape orientation */
    @page { size: A4 landscape }

    /* These two custom sizes are equivalent */
    @page { size: 30cm 40cm }
    @page { size: 40cm 30cm landscape }

    /* Use square paper, this sets width and height */
    @page { size: 30cm }
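
A sketch of how the orientation could change partway through a document, using a named page. The page name `wide` and the class `.wide-section` are made up for illustration; named pages are standard CSS paged media, and the same `page:` mechanism appears in the page numbering example later in the deck.

```css
/* Default pages: A4 portrait */
@page { size: A4; }

/* Named page; "wide" is a hypothetical name */
@page wide { size: A4 landscape; }

/* Blocks with this (hypothetical) class are laid out on "wide" pages.
   A page break is forced wherever the page name changes, so the
   content before and after stays on portrait pages. */
.wide-section { page: wide; }
```

Wrapping a wide table in an element with that class should put just that table on landscape pages.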

Header/Footer Customization

Name        text-align   vertical-align   In figure
            (default)    (default)
@top        center       middle           yellow
@bottom     center       middle           yellow
@left       center       middle           red
@right      center       middle           red
@top-left   left         middle           green

@page {
    @bottom {
        content: counter(page)
    }
}

Page Break (inside, after, before)

page-break-before: 
	auto | always | avoid 

page-break-after: 
	auto | always | avoid 

page-break-inside: 
	auto | avoid 
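
A short sketch of how the three properties are typically combined. The selectors are only examples, and `.keep-together` is a hypothetical class name.

```css
/* Start every chapter on a new page */
h1 { page-break-before: always; }

/* Keep a heading on the same page as the content that follows it */
h2, h3 { page-break-after: avoid; }

/* Try not to split these blocks across two pages */
tr, pre, .keep-together { page-break-inside: avoid; }
```

`avoid` is a request rather than a guarantee: a block taller than a page still has to break somewhere.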

Table Styles

  table {
    border-collapse: collapse;
    table-layout: fixed;
  }

border-collapse: collapse merges adjacent cell borders into single lines; table-layout: fixed sizes columns from declared widths and the first row instead of from cell content

Custom Font Support

<!--- fontDirectory is the folder that contains Calibri.ttf, not the file itself (placeholder path) --->
<cfdocument fontDirectory="path/to/fonts">
  <style type="text/css">
      body {
          font-family: Calibri, sans-serif;
      }
  </style>
</cfdocument>

Note: The modern (Flying Saucer) engine matches on the font family name stored inside the .ttf file, and the match is case-sensitive.

Note 2: Possible problems with foreign-language fonts

Context aware chapter titles


    @page {
        @top {
            content: string(doctitle)
        }
    }

    h1 { string-set: doctitle content() }

Copying content from the document
Generated content in page regions may contain text copied from the document using the string-set property, as in the h1 rule above.

 

Page numbering

@page docpage{
    size:8.5in 11in;
    margin: .4in .5in .75in .5in;
    @bottom-left {
        font-family: Arial, Helvetica, sans-serif;
        font-size:11px;
        content: "<cfoutput>#prc.invoiceTitle#</cfoutput>";
    }
    @bottom-right {
        font-family: Arial, Helvetica, sans-serif;
        font-size:11px;
        content: "Page " counter(page) " of " counter(pages);
    }
}
.normalpage {
    page: docpage;
}

Bookmarks

document bookmark=true {}
<a href="#anchor-id">What is ColdFusion (CFML)</a>
<h2 id="anchor-id" class="h1 page">
	What is ColdFusion (CFML)
</h2>

Table of Contents

ul.toc a::after {
  content: leader(".") target-counter(attr(href), page);
}
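
A slightly fuller sketch of the same idea, with the markup it assumes shown in a comment. The ids and link text are placeholders; each `href` has to point at the `id` of an element in the same document so that `target-counter` can look up the page it lands on.

```css
/* Assumed markup (placeholder ids and titles):
     <ul class="toc">
       <li><a href="#chapter-1">Chapter 1</a></li>
     </ul>
     ...
     <h1 id="chapter-1">Chapter 1</h1>
*/

/* Plain list, no bullets */
ul.toc { list-style: none; padding: 0; }

/* Make entries look like text rather than links */
ul.toc a { text-decoration: none; color: inherit; }

/* Dot leader, then the page number of the link target */
ul.toc a::after {
    content: leader(".") target-counter(attr(href), page);
}
```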

Image Paths

document localurl=true
<img  
     align="center" 
     style="max-width:100%" 
     src="/path/to/local_file.png" 
/>
<img  
     align="center" 
     style="max-width:100%" 
     src="http(s)://yourimageurl.png" 
/>

With localUrl=true, ColdFusion retrieves image files directly from
the local drive rather than over HTTP, HTTPS, or a proxy.

Repeating Table Headers

  table {
    -fs-table-paginate: paginate;
  }

modifies the table layout algorithm to repeat table headers and footers on subsequent pages
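
A minimal sketch of a paginated table, assuming the header rows live in a `<thead>` (and any footer rows in a `<tfoot>`), since those are the parts that get repeated. The class name `report` is hypothetical.

```css
/* Assumed markup:
     <table class="report">
       <thead><tr><th>...</th></tr></thead>
       <tbody>...</tbody>
     </table>
*/
table.report {
    /* Flying Saucer extension: repeat thead/tfoot on every page */
    -fs-table-paginate: paginate;
    width: 100%;
}

/* Ask the engine not to split a single row across pages */
table.report tr {
    page-break-inside: avoid;
}
```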

More custom Flying Saucer CSS properties

-fs-keep-with-inline: tries to avoid breaking a block in such a way that only padding and borders appear on a page.
-fs-page-sequence: limits the scope of the page and pages counters to a portion of the document.
-fs-font-metric-src: use inside a @font-face rule when the font you want to embed in the PDF has a custom font metrics file.
-fs-pdf-font-embed: use with the value embed inside a @font-face rule to have Flying Saucer embed a font file within the PDF document, avoiding the need to call the addFont() method of the FontResolver class.
-fs-pdf-font-encoding: use inside a @font-face rule to specify the encoding for a custom font you are embedding inside a PDF; takes the name of the encoding as its value.
-fs-table-cell-colspan: whole number. Replaces the legacy colspan attribute on table cells.
-fs-table-cell-rowspan: whole number. Replaces the legacy rowspan attribute on table cells.
-fs-table-paginate: when used with the value paginate, modifies the table layout algorithm to repeat table headers and footers on subsequent pages and improves the appearance of cells that break across pages (for example, by closing and reopening borders). That is all it does: if a table's minimum width is wider than the page, the table is still cut off.
-fs-text-decoration-extent: either line (the default) or block. Controls how text decorations are drawn on a block-level element. With line, the spec-compliant behavior is used and the decoration is drawn across each line box. With block, the decoration is drawn across the entire content area of the block.
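
As an illustration of the font-related properties, here is a sketch of a @font-face rule that embeds a font in the PDF. The file path is a placeholder, and `Identity-H` is an assumption: it is a commonly used encoding value for Unicode text and may need to change for a given font.

```css
@font-face {
    /* Should match the family name stored in the .ttf, same case */
    font-family: "Calibri";
    /* Placeholder path to the font file */
    src: url("fonts/Calibri.ttf");
    /* Embed the font file in the generated PDF */
    -fs-pdf-font-embed: embed;
    /* Assumed encoding; adjust for the font in use */
    -fs-pdf-font-encoding: Identity-H;
}

body {
    font-family: "Calibri", sans-serif;
}
```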

https://flyingsaucerproject.github.io/flyingsaucer/r8/guide/users-guide-R8.html

BONUS CODE HIGHLIGHTING

Putting it all together

New Challenge 

Original Export

New Export

how do we convert it?

AST

Abstract Syntax Tree

CommandBox jq (JMESPath)

AST Explorer

https://astexplorer.net/

what do we do with Markdown?

Search for a Java library to convert it

moduleSettings = {
	cbmarkdown = {
		// Looks for www or emails and converts them to links
		autoLinkUrls             : true,
		// Creates anchor links for headings
		anchorLinks              : true,
		// Set the anchor id
		anchorSetId              : true,
		// Set the anchor id but also the name
		anchorSetName            : true,
		// Do we create the anchor for the full header or just before it. True is wrap, false is just create anchor tag
		anchorWrapText           : false,
		// The class(es) to apply to the anchor
		anchorClass              : "anchor",
		// raw html prefix. Added before heading text, wrapped or unwrapped
		anchorPrefix             : "",
		// raw html suffix. Added after heading text, wrapped or unwrapped
		anchorSuffix             : "",
		// Enable youtube embedded link transformer
		enableYouTubeTransformer : false,
		// override HTML to use for wrapping style.
		codeStyleHTMLOpen		 : '<code class="code inline">',
		// override HTML to use for wrapping style.
		codeStyleHTMLClose		 : '</code>',
		// add a class prefix to the "fenced" code blocks, i.e. ```js. Useful for supporting various syntax highlighters.
		fencedCodeLanguageClassPrefix : "brush",
		// Table options
		tableOptions             : {
			// Treat consecutive pipes at the end of a column as defining spanning column.
			columnSpans                 : true,
			// Whether table body columns should be at least the number of header columns.
			appendMissingColumns        : true,
			// Whether to discard body columns that are beyond what is defined in the header
			discardExtraColumns         : true,
			// Class name to use on tables
			className                   : "table",
			// When true only tables whose header lines contain the same number of columns as the separator line will be recognized
			headerSeparationColumnMatch : true
		}
	} // end markdown settings
};

is all parsing done with ast?

vscode-cfml

{
  "name": "CFML",
  "scopeName": "embedding.cfml",
  "patterns": [
    {
      "begin": "(?i)(?=^\\s*(/\\*|//|import\\b|(component|abstract\\s*component|final\\s*component|interface)(\\s+|{)))",
      "contentName": "source.cfml.script",
      "end": "(?=not)possible",
      "patterns": [
        {
          "include": "#source-cfml-script"
        }
      ]
    },
    {
      "include": "#source-cfml-tag-comments"
    },
    {
      "begin": "(?i)(?=<cf(component|interface)\\b)",
      "contentName": "source.cfml",
      "end": "(?=not)possible",
      "patterns": [
        {
          "include": "#cfcomponent"
        },
        {
          "include": "#cfinterface"
        }
      ]
    },
    {
      "begin": "(?=\\S)",
      "contentName": "text.html.cfml",

cfml-parser

var braceOpen = 0;
var semi = 0;
var quotePos = 0;
var eqPos = 0;
var lineEnd = 0;
var inString = false;
var stringOpenChar = "";
var currentStatement = "";
var currentStatementStart = 1;
var commentStatement = "";
var sb = createObject("java", "java.lang.StringBuilder");

//parsing a cfscript tag uses startPosition and endPosition
if (arguments.startPosition != 0 && arguments.endPosition != 0) {
    pos = arguments.startPosition;
    contentLength = arguments.endPosition;
}

while(pos<=contentLength) {
    c = mid(content, pos, 1);
    
    if (c == "'" || c == """") {
        if (inString && stringOpenChar == c) {
            if (mid(content, pos, 2) != c&c) {
                inString = false; //end string
            } else {
                //escaped string open char
                sb.append(c);
                sb.append(c);
                pos = pos+2;
                continue;
            }
            
        } else if (!inString) {
            inString = true;
            stringOpenChar = c;
        }
        sb.append(c);
    } else if (!inString) {
        if (c == "/" && mid(content, pos, 2) == "/*") {
            //currentState = this.STATE.COMMENT;
            commentStatement = new Comment(name="/*", startPosition=pos, parent=parent, file=arguments.file);
            if (!isSimpleValue(parent)) {
                parent.addChild(commentStatement);
            }
            endPos = find("*/", content, pos+3);
            if (endPos == 0) {
                //end of doc
                endPos = contentLength;
            }
            commentStatement.setEndPosition(endPos);

Copy of Advanced PDF Generation + Building a gitbook Markdown conversion process

By Eric Peterson

Take a deep dive into the undocumented features of Flying Saucer (the engine behind cfdocument) in pursuit of creating a PDF with page numbering, context-aware chapter headings, table page breaks, and more. Then we will spend some time digging into Markdown conversion, using the Java library Flexmark to handle many aspects of the conversion, including tables, auto-linking, embedded YouTube, etc.
