Filling out PDF forms in PHP with UTF-8 encoding

by Edi Modrić

You might think that working with PDF documents in your PHP application would be a breeze. And you would be right, as long as you are generating the documents from scratch. There are a couple of PHP libraries that are really solid and give you lots of features when creating PDF documents. One of them is TCPDF, a free library (a single class, to be exact) written in pure PHP. However, when it comes to filling out PDF forms, you're out of luck. Not only is there no PHP solution for filling out PDF forms seamlessly (at least none that I'm aware of), but the solutions that do exist often do not work reliably.

On a recent project, we had to generate PDF documents from data entered into the application. The application itself is really simple: it has a pure CRUD workflow, and the data to enter is just a couple of text fields and dropdowns. However, the PDF documents that needed to be generated are a couple of pages long with lots of text. Since 95% of the text was static, there was really no sense in generating it from scratch each and every time. Thus, we decided to use a PDF form, which is nothing more than a PDF template with placeholders for form fields to be filled.

As I already said, there's no quality PHP solution for filling out PDF forms, but there is a command line application called PDFtk. We were happily using PDFtk to fill out our PDF forms for 10 days or so, and everything was working beautifully, up until the moment some of us remembered that we really should test filling the PDF with data containing diacritics (čćžšđ / ČĆŽŠĐ). All hell broke loose. No matter how we created the PDF template (with embedded fonts or without them), we could not get a PDF file that displayed UTF-8 correctly. Our agony lasted for a whole week, consisting of lots of googling, upgrading, downgrading, compiling and recompiling PDFtk, searching for other solutions, and considering switching to a PHP/Java bridge so we could use the current version of iText (PDFtk uses iText, but a much older version). Stack Overflow is, as you might guess, filled with questions about the same problem, but without a solution. We can't count how many times we went over the same Stack Overflow answers trying to make it work.

At last, after litres and litres of coffee and hours of lost sleep, we stumbled upon a simple project, a Java class bearing a symbolic name: PDFFormFillerUTF-8. It is nothing more than a wrapper class around the already mentioned iText, just as PDFtk is, but as opposed to PDFtk, it just works. I don't know if it's something specific to how the author used iText, or just the fact that it was working with the latest iText available. Either way, it worked.

Our application is written in the Symfony2 framework, so after compiling the class, using it was really simple with the ProcessBuilder class from the Symfony2 Process component. PDFFormFiller receives a couple of parameters: the path to the PDF template and the path to a text file with the names and values of the PDF form fields, one field per line, in the following format:

FieldName1 Field value 1
FieldName2 Field value 2

The -flatten argument is optional and tells the class to make the output PDF read-only. The final parameter is the name of the file where the output PDF will be written. You can also have the form fields read from STDIN and the output PDF written to STDOUT, so the code to fill the PDF form can look something like this:

$processBuilder = new ProcessBuilder(
    array(
        "java",
        "-jar",
        "/path/to/pdfformfiller/pdfformfiller.jar",
        "/path/to/pdf_template.pdf",
        "-flatten"
    )
);

$process = $processBuilder->getProcess();

// May not be needed, but in our case it was required for PDFFormFiller
// to properly recognize STDIN as UTF-8
$process->setEnv( array( "LANG" => "en_US.UTF-8" ) );

$process->setStdin( $this->createFieldsInput( $fieldValues ) );
$process->run();

if ( !$process->isSuccessful() )
{
    throw new \Exception( $process->getErrorOutput() );
}

$pdfOutput = $process->getOutput();
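
Setting LANG only affects how the input is interpreted on the Java side; the bytes sent from PHP still have to be valid UTF-8. As a minimal sketch (the helper name findInvalidUtf8Fields is hypothetical and not part of PDFFormFiller or Symfony), the field values could be checked before they are piped to the process:

```php
<?php
// Hypothetical helper: returns the names of the fields whose values
// are not valid UTF-8, so they can be fixed or rejected up front.
function findInvalidUtf8Fields(array $fieldValues)
{
    $invalid = array();
    foreach ($fieldValues as $name => $value) {
        // With the /u modifier, preg_match() fails on subjects that are
        // not valid UTF-8, so an empty pattern works as a validity check.
        if (preg_match('//u', (string) $value) !== 1) {
            $invalid[] = $name;
        }
    }

    return $invalid;
}
```

If the returned array is not empty, the data most likely arrived in a legacy encoding (Windows-1250 is a common culprit for Croatian text) and should be converted before filling the form.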

The createFieldsInput method does nothing but iterate over the associative array ($fieldValues), creating the input for PDFFormFiller:

function createFieldsInput( array $fieldValues )
{
    $data = "";
    foreach ( $fieldValues as $name => $value )
    {
        $data .= $name . " " . $value . PHP_EOL;
    }

    return $data;
}
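
Because the format shown above puts one field on each line and separates the name from the value with a space, a stray line break in a value or a space in a field name would corrupt the input. The following is a slightly more defensive sketch of the same idea. The function name buildFieldsInput is hypothetical, and collapsing line breaks into a space is our assumption about what is acceptable for the data, not something PDFFormFiller prescribes:

```php
<?php
// Hypothetical, more defensive variant of createFieldsInput().
function buildFieldsInput(array $fieldValues)
{
    $data = "";
    foreach ($fieldValues as $name => $value) {
        $name = (string) $name;

        // The name ends at the first space, so it must not contain whitespace.
        if ($name === "" || strpbrk($name, " \t\r\n") !== false) {
            throw new InvalidArgumentException("Invalid field name: '" . $name . "'");
        }

        // Assumption: one field per line, so line breaks inside a value
        // are replaced with a single space.
        $value = str_replace(array("\r\n", "\r", "\n"), " ", (string) $value);

        $data .= $name . " " . $value . "\n";
    }

    return $data;
}
```

This version always ends lines with "\n" instead of PHP_EOL so that the output is the same on every platform.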

After executing the code, the $pdfOutput variable will hold the generated PDF document, which you can save to disk or send to the user for downloading.
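
As a rough sketch of those two options (both function names are hypothetical), the output can be sanity-checked before it is written to disk, and the headers for a download response can be prepared in one place:

```php
<?php
// Hypothetical helper: writes the generated document to disk, refusing
// anything that does not look like a PDF (for example, an error message).
function savePdf($pdfOutput, $path)
{
    // A PDF file normally begins with the "%PDF-" marker.
    if (strncmp($pdfOutput, "%PDF-", 5) !== 0) {
        throw new RuntimeException("Output does not look like a PDF document");
    }

    if (file_put_contents($path, $pdfOutput) === false) {
        throw new RuntimeException("Could not write " . $path);
    }

    return $path;
}

// Hypothetical helper: headers that ask the browser to download the PDF.
function pdfDownloadHeaders($pdfOutput, $fileName)
{
    return array(
        "Content-Type" => "application/pdf",
        "Content-Disposition" => 'attachment; filename="' . $fileName . '"',
        "Content-Length" => strlen($pdfOutput),
    );
}
```

In a Symfony2 controller, the headers array could be passed as the third argument of a Response object, with $pdfOutput as the content.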

Finally, we would like to say thanks to Mr. Nikolay Kitsul who wrote PDFFormFillerUTF-8. If you ever find yourself in Croatia, we owe you a couple of beers.
