HTML To PDF Conversion

HTML_ToPDF 3.5

Printing webpages is great, but every browser renders a page differently. This can cause problems if you need to print a page that looks the same no matter what platform or browser is being used. Additionally, PDF files can be searched and browsed as a sort of notebook, which makes PDF a useful format for large text documents.

HTML_ToPDF is a PHP class that makes it easy to convert HTML documents to PDF files on the fly. It grew out of the need to convert HTML files (which are easy to create) to PDF files (which are not so easy to create) quickly and easily. Features include:

  • The ability to encrypt and set permissions on the PDF file on the fly.
  • The ability to convert pages that are generated dynamically (e.g. from a database via PHP).
  • The ability to set the header and footer text, including the color.
  • The ability to set the page size and margins.
  • The ability to convert images in the webpage to images embedded in the PDF. The script also tries to convert relative image paths into absolute ones.
  • The ability to use the CSS in the HTML file in the creation of the PDF. This includes remote CSS files as well.
  • The ability to convert remote files.
  • The ability to convert links into embedded clickable links in the PDF file.
  • The ability to scale the HTML page.
  • Easy setting of any of these options through the methods of the class.
  • Tries to fix quirks in HTML pages which break html2ps.
  • Works on both Unix/Linux and Windows.
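
The relative-to-absolute image path rewriting mentioned in the list above can be illustrated with a minimal sketch. This is not the class's actual implementation: the function name absolutizeImagePaths(), its parameters, and the choice of the http:// scheme are all assumptions made for illustration, and the closure syntax needs PHP 5.3 or later.

```php
<?php
/**
 * Hypothetical helper (NOT part of HTML_ToPDF): rewrite relative <img src>
 * values so they point at an absolute URL on $defaultDomain.
 * Assumes plain http:// and a simple regex-based scan of the markup.
 */
function absolutizeImagePaths($html, $defaultDomain, $defaultPath = '/')
{
    $base = 'http://' . $defaultDomain;
    // Normalize the default path so it starts and ends with one slash
    $defaultPath = '/' . trim($defaultPath, '/') . '/';
    if ($defaultPath === '//') {
        $defaultPath = '/';
    }

    return preg_replace_callback(
        '/(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2/i',
        function ($m) use ($base, $defaultPath) {
            $src = $m[3];
            if (preg_match('#^(?:[a-z][a-z0-9+.-]*:|//)#i', $src)) {
                // Already absolute (has a scheme or is protocol-relative)
                return $m[0];
            }
            if (substr($src, 0, 1) === '/') {
                // Root-relative: only the domain is missing
                $src = $base . $src;
            } else {
                // Document-relative: prepend the domain and the default path
                $src = $base . $defaultPath . $src;
            }
            return $m[1] . $m[2] . $src . $m[2];
        },
        $html
    );
}

echo absolutizeImagePaths('<img src="tuckered.jpg" />', 'www.example.com', '/examples/');
```

In the real class the equivalent inputs would presumably come from the default domain passed to the constructor and from setDefaultPath(); the sketch only shows the general idea of the rewrite.
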
Price:

All of my scripts are open source, and therefore free! Of course, contributions in the form of documentation, patches or wishlist items are always welcome. In addition, if you need help, you can hire me to install, modify, or improve the scripts. Head over to my consulting website to contact me.

Downloads:
  • 11-18-2006: HTML_ToPDF 3.5 (.tar.gz or .zip). More than one release per year! What's new: raw HTML can be passed straight into the class, making it easier to convert dynamic pages; PEAR is no longer required; CSS via <link ...> tags now works; better handling of <input ...> tags; and better instructions for Windows installs.
  • 02-11-2006: HTML_ToPDF 3.4 (.tar.gz or .zip). Several people have contributed either code or money, and with that come some great new features: much better support for Windows (see README for details), relative paths for images and CSS by using setDefaultPath(), images can be rendered in grayscale, font-size can be set, and all the temp files are cleaned up.
  • 11-24-2004: HTML_ToPDF 3.3. Several improvements, mostly contributed by others. https pages are supported, empty headers and footers are now possible, and more. See the CHANGES file for full details.
  • 11-13-2003: HTML_ToPDF 3.2. The quality of converted images is much higher. Errors in the debug output were fixed.
  • 08-06-2003: HTML_ToPDF 3.1. Added debug mode, support for landscape page orientation (using the @page block), and a4 paper.
  • 07-17-2003: HTML_ToPDF 3.0. Several improvements, including a PDF encryptor. See CHANGES for more info.
  • 08-22-2002: HTML_ToPDF 2.0.1. Small bug fix in method setGetUrlPath() which is used to set an alternative path to the image grabber program (i.e. curl or links).
  • 08-10-2002: HTML_ToPDF 2.0. The first release after the rewrite.
Help: Please don't contact me directly for help with the script (unless you're looking for paid consulting). Rather, use the group below.
Google Groups Announcements, Help, and Discussion for HTML_ToPDF:
Browse Archives at groups.google.com
Requires: See the README.

Known Limitations:
  • This script inherits every limitation of html2ps. In particular, CSS attributes within a tag, such as class or style, are not noticed. Best results are obtained by coding the colors and styles directly into the HTML using attributes such as bgcolor and color.
  • Along the same lines, CSS-based layouts (e.g. floats, absolute positioning, etc.) will not render correctly. You will, unfortunately, need to use a table-based layout if you want a more complex layout.
  • Does not render form elements very well.
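
As a rough illustration of the workaround described in the list above, the sketch below builds a two-column layout from a table and presentational attributes rather than CSS classes or floats. The helper name renderTwoColumnRow() and its parameters are hypothetical, and how faithfully html2ps renders each attribute is not guaranteed here.

```php
<?php
/**
 * Hypothetical helper (NOT part of HTML_ToPDF): build a two-column,
 * table-based layout styled with HTML attributes (bgcolor, color)
 * instead of CSS classes, inline style attributes, or floats.
 * $left and $right are inserted as-is, so they may contain markup.
 */
function renderTwoColumnRow($left, $right, $bgcolor = '#eeeeee', $textColor = 'blue')
{
    // %% is a literal percent sign inside a sprintf() format string
    $cell = '<td bgcolor="%s" width="50%%" valign="top"><font color="%s">%s</font></td>';

    return '<table width="100%" cellpadding="4"><tr>'
        . sprintf($cell, htmlspecialchars($bgcolor), htmlspecialchars($textColor), $left)
        . sprintf($cell, htmlspecialchars($bgcolor), htmlspecialchars($textColor), $right)
        . '</tr></table>';
}

echo renderTwoColumnRow('Left column text', 'Right column text');
```
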
Docs:

License:

The script is released under the PHP license, and thus is free for you to use and modify as you wish. Just let me know if you make any improvements.

Examples:

The simplest example. We convert this HTML file into this PDF file. We also customize the headers and footers.

<?php
/** $Id: example1.php 2426 2006-11-18 19:59:26Z jrust $ */
/**
 * The simplest example. We convert an HTML file into a PDF file.
 * We also add a few custom headers/footers to the PDF.
 */
?>
<html>
<head>
  <title>Testing HTML_ToPDF</title>
</head>
<body>
  Creating the PDF from local HTML file....  Note that we customize the headers and footers!<br />
<?php
// Require the class
require_once dirname(__FILE__) . '/../HTML_ToPDF.php';

// Full path to the file to be converted
$htmlFile = dirname(__FILE__) . '/test.html';
// The default domain for images that use a relative path
// (you'll need to change the paths in the test.html page
// to an image on your server)
$defaultDomain = 'www.rustyparts.com';
// Full path to the PDF we are creating
$pdfFile = dirname(__FILE__) . '/timecard.pdf';
// Remove old one, just to make sure we are making it afresh
@unlink($pdfFile);

// Instantiate the class with our variables
$pdf =& new HTML_ToPDF($htmlFile, $defaultDomain, $pdfFile);
// Set headers/footers
$pdf->setHeader('color', 'blue');
$pdf->setFooter('left', 'Generated by HTML_ToPDF');
$pdf->setFooter('right', '$D');
$result = $pdf->convert();

// Check if the result was an error
if (is_a($result, 'HTML_ToPDFException')) {
    die($result->getMessage());
}
else {
    echo "PDF file created successfully: $result";
    echo '<br />Click <a href="' . basename($result) . '">here</a> to view the PDF file.';
}
?>
</body>
</html> 

A more complex example. It could be used to convert this remote HTML file into this PDF file. Additionally, we set several options to customize the look.

<?php
/** $Id: example2.php 2426 2006-11-18 19:59:26Z jrust $ */
/**
 * A more complex example. We convert a remote HTML file 
 * into a PDF file. Additionally, we set several options to 
 * customize the look.
 */
?>
<html>
<head>
  <title>Testing HTML_ToPDF</title>
</head>
<body>
  Creating the PDF from remote web page...<br />
<?php
// Require the class
require_once dirname(__FILE__) . '/../HTML_ToPDF.php';

// Full path to the file to be converted (this time a webpage)
// change this to your own domain
$htmlFile = 'http://www.example.com/index.html';
$defaultDomain = 'www.example.com';
$pdfFile = dirname(__FILE__) . '/test2.pdf';
// Remove old one, just to make sure we are making it afresh
@unlink($pdfFile);

$pdf =& new HTML_ToPDF($htmlFile, $defaultDomain, $pdfFile);
// Set that we do not want to use the page's css
$pdf->setUseCSS(false);
// Give it our own css, in this case it will make it so
// the lines are double spaced
$pdf->setAdditionalCSS('
p {
  line-height: 1.8em;
  font-size: 12pt;
}'
);
// We want to underline links
$pdf->setUnderlineLinks(true);
// Scale the page down slightly
$pdf->setScaleFactor('.9');
// Make the page black and white
$pdf->setUseColor(false);
// Convert the file
$result = $pdf->convert();

// Check if the result was an error
if (is_a($result, 'HTML_ToPDFException')) {
    die($result->getMessage());
}
else {
    echo "PDF file created successfully: $result";
    echo '<br />Click <a href="' . basename($result) . '">here</a> to view the PDF file';
}
?>
</body>
</html> 

Finally, we create a PDF file based on a dynamically generated page. We buffer the content of the page and then create the PDF at the end. We set a number of CSS properties, including page size and margins. After creating the PDF we encrypt it (password is "foobar"). Example output can be seen here.

<?php
/** $Id: example3.php 2426 2006-11-18 19:59:26Z jrust $ */
/**
 * Here we create an encrypted PDF file based on a dynamically generated page. 
 * We buffer the content of the page and then create the PDF at the end.
 * Then we load up PDFEncryptor and set meta-data, password, and permissions.
 * Finally, we send a header and the file so it opens straight into the
 * browser.
 */

// Require the class
require_once dirname(__FILE__) . '/../HTML_ToPDF.php';
require_once dirname(__FILE__) . '/../PDFEncryptor.php';
// Create a unique filename for the resulting PDF
$linkToPDFFull = $linkToPDF = tempnam(dirname(__FILE__), 'PDF-');
// Remove the temporary file it creates
unlink($linkToPDFFull);
// Give it an extension
$linkToPDFFull .= '.pdf';
$linkToPDF .= '.pdf';
// Make it web accessible
$linkToPDF = basename($linkToPDF);
$defaultDomain = 'www.rustyparts.com';

// Buffer the current html page so we can write it to file later
ob_start();
?>
<html>
<head>
  <title>Testing HTML_ToPDF</title>
  <style type="text/css">
  div.noprint {
    display: none;
  }
  h6 {
    font-style: italic;
    font-weight: bold;
    font-size: 14pt;
    font-family: Courier;
    color: blue;
  }
  /** Change the paper size, orientation, and margins */
  @page {
    size: 8.5in 14in;
    orientation: landscape;
  }
  /** This is a bit redundant, but it works ;) */
  /** odd pages */
  @page:right {
    margin-right: 1.0cm;
    margin-left: 1.0cm;
    margin-top: 1.0cm;
    margin-bottom: 1.0cm;
  }
  /** even pages */
  @page:left {
    margin-right: 1.0cm;
    margin-left: 1.0cm;
    margin-top: 1.0cm;
    margin-bottom: 1.0cm;
  }
  </style>
</head>
<body>
  An example dynamic page that is converted to PDF on 8.5x14 paper, 
  in landscape mode, with 1.0cm margins!<br /> 
  And what about <sub>subscript</sub> or <sup>superscript</sup>?<br />
  Hmmm...one last test, special characters: &alpha; &copy; &#187;<br /><br />
  This document has been encrypted with the helper PDFEncryptor class so you will need to
  enter "foobar" for the password<br />
  This should open straight into your PDF reader, 
  but, if not, click <a href="<?php echo $linkToPDF?>">here</a> to view the PDF file.<br />
  <div class="noprint">This should not show up.</div>
  <h6>
  This demonstrates the use of CSS classes for an element.<br />
  What CSS properties and blocks can be used can be found at 
  <a href="http://www.tdb.uu.se/~jan/html2psug.html">http://www.tdb.uu.se/~jan/html2psug.html</a>
  </h6>
  Inserting a page break..<br /><br />
  <!--NewPage-->
  Now on to page 2!
  A linked image with a relative path:<br />
  <a href="http://rustyparts.com/pb"><img src="tuckered.jpg" /></a>
</body>
</html>
<?php
// Send the class our HTML and the defaultDomain for images, css, etc.
$pdf =& new HTML_ToPDF(ob_get_contents(), $defaultDomain);
// We won't be sending out the HTML to the user
ob_end_clean();
$pdf->setDefaultPath('/scripts/HTML_ToPDF/examples/');
// Could turn on debugging to see what exactly is happening
// (commands being run, images being grabbed, etc.)
// $pdf->setDebug(true);
// Convert the file
$result = $pdf->convert();

// Check if the result was an error
if (is_a($result, 'HTML_ToPDFException')) {
    die($result->getMessage());
}
else {
    // Move the generated PDF to the web accessible file
    copy($result, $linkToPDFFull);
    unlink($result);

    // Set up encryption
    $encryptor =& new PDFEncryptor($linkToPDFFull);
    // Set paths
    $encryptor->setJavaPath('/usr/lib/j2se/1.4/bin/java');
    $encryptor->setITextPath(dirname(__FILE__) . '/../lib/itext-1.3.jar');
    // Set meta-data
    $encryptor->setAuthor('Paul Bunyan');
    $encryptor->setKeywords('HTML_ToPDF, php, encryption of PDF');
    $encryptor->setSubject('Example of HTML_ToPDF with Encryption');
    $encryptor->setTitle('Showing its stuff');
    // Set permissions
    $encryptor->setAllowPrinting(false);
    $encryptor->setAllowModifyContents(false);
    $encryptor->setAllowDegradedPrinting(true);
    $encryptor->setAllowCopy(true);
    // Set password
    $encryptor->setUserPassword('foobar');
    $encryptor->setOwnerPassword('barfoo');
    $result = $encryptor->encrypt();
    if (is_a($result, 'PDFEncryptorException')) {
        die($result->getMessage());
    }
}

header('Pragma: public');
header('Expires: 0');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Content-Type: application/pdf');
header('Content-Disposition: attachment; filename="example.pdf"');
readfile($linkToPDFFull);
unlink($linkToPDFFull);
?>
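
The buffering step used in the last example can also be tried in isolation, without the class installed. The sketch below is only an illustration of PHP's output buffering; captureOutput() and renderPage() are hypothetical names that do not appear in HTML_ToPDF.

```php
<?php
/**
 * Hypothetical helper (NOT part of HTML_ToPDF): run a callback that echoes
 * a page and return what it printed as a string instead of sending it out.
 */
function captureOutput($render)
{
    // Start buffering so nothing reaches the browser
    ob_start();
    call_user_func($render);
    // ob_get_clean() returns the buffer contents and discards the buffer
    return ob_get_clean();
}

// Stand-in for a dynamically generated page
function renderPage()
{
    echo '<html><body>Generated on ' . date('Y-m-d') . '</body></html>';
}

$html = captureOutput('renderPage');
// $html now holds the complete page markup as a string
```

The captured string plays the same role as the ob_get_contents() result in the example above, which is passed as the first constructor argument.
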