From Web to Print: Master HTML to PDF Conversion in Python

Have you ever screenshotted web pages by hand just to save them as documents? It gets tiresome quickly. Developers often face the related problem of generating dynamic reports, invoices, or receipts for users. Luckily, Python makes HTML to PDF conversion straightforward. In this article, I will walk you through practical ways to convert your web designs into printable documents.

Why Automate PDF Generation?

Consider that you are running an online shop. When a customer buys something, they expect an invoice right away. You can’t type out an invoice by hand for every sale you make. Instead, you create an HTML template for it.

A Python program then fills the template with the customer’s information and converts the result into a downloadable PDF document. The same approach works well for monthly bank statements, event tickets, or report cards.

Setting Up Your Environment

We will focus on a popular method using pdfkit. This library acts as a bridge between Python and a command-line tool called wkhtmltopdf. Think of wkhtmltopdf as a web browser that renders a page but saves it as a PDF file instead of showing it on your screen.

First, you must install the Python wrapper. Open your terminal and run the following command:

pip install pdfkit

You also need to install the wkhtmltopdf tool on your operating system (Windows, Mac, or Linux) separately. Python needs this engine to perform the rendering work.
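Before going further, it can help to confirm that Python can actually see the tool. The snippet below is a minimal sketch, assuming wkhtmltopdf is expected to be on your PATH; the helper name wkhtmltopdf_version is my own and not part of pdfkit.

```python
import shutil
import subprocess


def wkhtmltopdf_version():
    """Return the version text reported by wkhtmltopdf, or None if it is not on PATH."""
    executable = shutil.which("wkhtmltopdf")
    if executable is None:
        return None
    # Ask the tool for its version; check=False avoids raising on a non-zero exit code.
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = wkhtmltopdf_version()
    if version is None:
        print("wkhtmltopdf was not found on PATH")
    else:
        print(f"Found {version}")
```

If this reports that the tool is missing even though you installed it, the executable is probably outside your PATH, which is the situation covered in the configuration section later on.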

Simple Python wkhtmltopdf HTML to PDF Example

Let’s start with a basic example. We will take a live website URL and save it as a PDF file on your computer. This is useful for archiving web pages or creating offline copies of documentation.

Here is the code to get the job done:

import pdfkit

pdfkit.from_url('https://codewolfy.com/', 'codewolfy_homepage.pdf')

This script connects to the Codewolfy website, renders the page, and saves it as codewolfy_homepage.pdf in your current directory. It is that simple.
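In practice you will often want to control the page layout and handle failures such as a missing executable or an unreachable page. Here is a hedged sketch of how that could look. The helper names page_options and archive_page are hypothetical; the options dictionary uses keys that mirror wkhtmltopdf command-line flags, and the default values are only examples.

```python
def page_options(page_size="A4", margin="15mm"):
    """Build an options dict whose keys mirror wkhtmltopdf command-line flags."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
    }


def archive_page(url, output_path):
    """Convert a URL to PDF and return True on success, False on failure."""
    import pdfkit  # imported here so page_options can be used without pdfkit installed

    try:
        pdfkit.from_url(url, output_path, options=page_options())
    except OSError as error:
        # pdfkit reports a missing executable or a failed render as an OSError.
        print(f"Could not convert {url}: {error}")
        return False
    return True
```

Calling archive_page('https://codewolfy.com/', 'codewolfy_homepage.pdf') would then behave like the one-liner above, but with explicit margins and a clear message when something goes wrong.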

Generate PDF from an HTML String in Python with PDFKit

Sometimes you do not have a live URL. You might have raw HTML code stored in a variable or a database. You can generate a PDF directly from this string.

With this method, you can create dynamic documents where you change specific text, like names or dates, before conversion.

import pdfkit

html_content = """
    <h1>Invoice #101</h1>
    <p>Customer: John Doe</p>
    <p>Total: $50.00</p>
"""

pdfkit.from_string(html_content, 'invoice.pdf')

The code above takes the HTML string and writes it to invoice.pdf. You can now build complex strings using Python’s formatting features and print them out instantly.
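To make the idea of building strings concrete, here is a small sketch that fills the same invoice markup from variables. The function names build_invoice_html and write_invoice_pdf are my own, and the escaping step is an extra precaution I am assuming you want when the customer name comes from user input.

```python
from html import escape


def build_invoice_html(invoice_number, customer, total):
    """Return invoice markup with the given values filled in."""
    return (
        f"<h1>Invoice #{int(invoice_number)}</h1>\n"
        # escape() keeps characters like < and & in a name from breaking the markup.
        f"<p>Customer: {escape(customer)}</p>\n"
        f"<p>Total: ${total:.2f}</p>\n"
    )


def write_invoice_pdf(invoice_number, customer, total, output_path):
    """Build the invoice HTML and convert it to a PDF file."""
    import pdfkit  # imported here so build_invoice_html works without pdfkit installed

    html = build_invoice_html(invoice_number, customer, total)
    pdfkit.from_string(html, output_path)
```

For example, write_invoice_pdf(101, 'John Doe', 50, 'invoice.pdf') should produce the same document as the hard-coded string above, and changing the arguments gives you a different invoice without touching the markup.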

Handling Configuration Paths

Sometimes, Python cannot find the wkhtmltopdf installation on your computer. If you see an error stating the executable is missing, you need to tell pdfkit exactly where it lives.

You create a configuration object and pass it to the function. This ensures your code runs smoothly regardless of where the software is installed.

import pdfkit

# Full path to the wkhtmltopdf executable; adjust it to match your installation
path_to_wkhtmltopdf = r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe'
config = pdfkit.configuration(wkhtmltopdf=path_to_wkhtmltopdf)

pdfkit.from_url('https://www.example.com', 'example.pdf', configuration=config)

The same goes for other operating systems such as Ubuntu or macOS: point the configuration at wherever wkhtmltopdf is installed on that system.
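If your script has to run on several machines, you may not want to hard-code a single path. The sketch below shows one possible approach: check PATH first, then fall back to a list of candidate locations. The helper names are hypothetical, and the Linux and macOS paths are assumptions about common install locations, so verify them on your own system.

```python
import os
import shutil

# Candidate locations to try when wkhtmltopdf is not on PATH.
CANDIDATE_PATHS = [
    r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",  # Windows path used earlier
    "/usr/bin/wkhtmltopdf",        # assumed location on Ubuntu
    "/usr/local/bin/wkhtmltopdf",  # assumed location on macOS
]


def resolve_wkhtmltopdf(candidates):
    """Return the first usable wkhtmltopdf path, or None if nothing is found."""
    on_path = shutil.which("wkhtmltopdf")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def make_config():
    """Create a pdfkit configuration object for the resolved executable."""
    import pdfkit  # imported here so resolve_wkhtmltopdf works without pdfkit installed

    path = resolve_wkhtmltopdf(CANDIDATE_PATHS)
    if path is None:
        raise FileNotFoundError("wkhtmltopdf executable not found")
    return pdfkit.configuration(wkhtmltopdf=path)
```

You would then pass configuration=make_config() to from_url, from_string, or from_file, exactly as in the Windows example.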

Create PDF from HTML in Python Using Local Files

Keeping your design separate from your code creates a cleaner project structure. Web designers often prefer working on a standalone HTML file rather than editing a messy string variable inside a Python script. This method shines when you have complex templates like a quarterly business report or a detailed product catalog.

You simply save your markup as a standard HTML file. The library then reads this file from your hard drive and processes the conversion. Let’s create a new HTML file called monthly_report.html and store the code below in it.

<!DOCTYPE html>
<html>

<head>
    <meta charset="UTF-8">
    <style>
        body { font-family: sans-serif; padding: 20px; }
        table { width: 100%; border-collapse: collapse; margin-top: 20px; }
        th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
        th { background-color: #f2f2f2; }
        h1 { color: #333; }
    </style>
</head>

<body>
    <h1>Monthly Sales Report</h1>
    <p>This document summarizes the sales performance for October. The table below outlines the products sold, quantities, and revenue generated during this period.</p>
    <table>
        <tr>
            <th>Product Name</th>
            <th>Quantity</th>
            <th>Unit Price</th>
            <th>Total Revenue</th>
        </tr>
        <tr>
            <th>Wireless Mouse</th>
            <td>150</td>
            <td>$25.00</td>
            <td>$3,750.00</td>
        </tr>
        <tr>
            <th>Mechanical Keyboard</th>
            <td>80</td>
            <td>$85.00</td>
            <td>$6,800.00</td>
        </tr>
        <tr>
            <th>USB-C Hub</th>
            <td>200</td>
            <td>$40.00</td>
            <td>$8,000.00</td>
        </tr>
    </table>
</body>

</html>

Once the HTML file is in place, our Python script is minimal.

import pdfkit

pdfkit.from_file('monthly_report.html', 'report_output.pdf')

The script looks for monthly_report.html relative to the directory you run it from, which is usually your project folder, and writes report_output.pdf to the same place. It keeps the CSS styles and formatting defined in your original HTML file.
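In a real report the table rows usually come from data rather than being typed by hand. The following is a rough sketch of one way to do that, assuming you add a placeholder comment to your own template; the marker text, the function names, and the SALES list are all hypothetical and simply mirror the figures in the sample table.

```python
from html import escape

# Sample rows matching the table in monthly_report.html: (name, quantity, unit price)
SALES = [
    ("Wireless Mouse", 150, 25.00),
    ("Mechanical Keyboard", 80, 85.00),
    ("USB-C Hub", 200, 40.00),
]


def render_rows(sales):
    """Return one <tr> per sale, with revenue computed as quantity * unit price."""
    rows = []
    for name, quantity, unit_price in sales:
        revenue = quantity * unit_price
        rows.append(
            "<tr>"
            f"<td>{escape(name)}</td>"
            f"<td>{quantity}</td>"
            f"<td>${unit_price:,.2f}</td>"
            f"<td>${revenue:,.2f}</td>"
            "</tr>"
        )
    return "\n".join(rows)


def write_report(template_path, html_path, pdf_path, sales):
    """Fill the placeholder in the template, save the HTML, then convert it."""
    import pdfkit  # imported here so render_rows works without pdfkit installed

    with open(template_path, encoding="utf-8") as handle:
        template = handle.read()
    # "<!-- ROWS -->" is a hypothetical marker you would add to your own template.
    html = template.replace("<!-- ROWS -->", render_rows(sales))
    with open(html_path, "w", encoding="utf-8") as handle:
        handle.write(html)
    pdfkit.from_file(html_path, pdf_path)
```

Computing the revenue column in code also removes the risk of a typed total drifting out of sync with the quantity and price next to it.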

Conclusion

Generating documents automatically is a skill every developer benefits from. We have walked through HTML to PDF conversion in Python from a live URL, from a string, and from a local file. This is an efficient way to handle a whole range of tasks, from website backups to automated invoices. Knowing your way around Python’s HTML to PDF libraries will help you build robust applications that serve your customers better. Stop doing this work by hand and automate your document processing instead.

Codewolfy