How to convert a webpage to a PDF file or images in Linux

Want to know how to capture a web page and save it as a PDF document or image using the terminal? Fortunately, Linux has a plethora of utilities that you can use to automate the task of converting HTML documents to PDF files and images.

This article introduces wkhtmltopdf and wkhtmltoimage, two utilities that make this job easier.

How to convert HTML to PDF

If you want to capture web pages and convert them to PDF files, the wkhtmltopdf utility will help you. Wkhtmltopdf is an open source command line tool that converts web pages to PDF documents.

Since the tool runs headless in the Linux terminal, you won't need any web drivers or a browser automation framework like Selenium.

Install wkhtmltopdf on Linux

Wkhtmltopdf is not part of the standard packages preinstalled on Linux. You will need to install it manually using your system's package manager.

To install wkhtmltopdf on Ubuntu and Debian based distributions:

sudo apt install wkhtmltopdf

On Arch based distributions like Manjaro Linux:

sudo pacman -S wkhtmltopdf

Installing wkhtmltopdf on Fedora and RHEL-based distributions like CentOS is also easy:

sudo dnf install wkhtmltopdf
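Whichever package manager you use, it can help to confirm that both tools ended up on your PATH. The following is a minimal sketch; have_tool is a helper name made up for this example and is not part of the wkhtmltopdf package.

```shell
#!/bin/sh
# have_tool is a hypothetical helper name used only in this sketch.
# It succeeds when the given command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of both utilities covered in this article.
for tool in wkhtmltopdf wkhtmltoimage; do
    if have_tool "$tool"; then
        echo "$tool: installed"
    else
        echo "$tool: not found"
    fi
done
```

If either tool is reported as not found, rerun the install command for your distribution.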

Basic syntax

The basic command syntax is:

wkhtmltopdf webpage filename

...where webpage is the URL of the web page you want to convert and filename is the name of the output PDF file.

To convert the Google home page to a PDF document:

wkhtmltopdf https://google.com google.pdf
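The same syntax extends naturally to several pages at once. Below is a rough sketch of a batch script, assuming you pass the URLs as arguments; url_to_pdf_name is a hypothetical helper invented for this example, and its naming scheme is just one possible choice.

```shell
#!/bin/sh
# url_to_pdf_name is a hypothetical helper: it derives a PDF file name from
# a URL by dropping the scheme and replacing every run of other
# non-alphanumeric characters with a single underscore.
url_to_pdf_name() {
    printf '%s\n' "$1" | sed \
        -e 's|^[a-zA-Z]*://||' \
        -e 's|[^A-Za-z0-9]\{1,\}|_|g' \
        -e 's|_$||' \
        -e 's|$|.pdf|'
}

# Usage: sh batch.sh URL [URL...]
# Converts each URL given on the command line to its own PDF file.
if [ "$#" -gt 0 ]; then
    for url in "$@"; do
        out=$(url_to_pdf_name "$url")
        wkhtmltopdf "$url" "$out" || echo "conversion failed: $url" >&2
    done
fi
```

For example, running the script with https://google.com as its argument would write the result to google_com.pdf.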

Upon opening the PDF file, you will notice that wkhtmltopdf has precisely rendered the web page into a document.

The --copies flag is useful if you want your output file to contain multiple copies of the web page. Note that when printing multiple copies, wkhtmltopdf does not generate multiple PDF files. Instead, it adds the additional pages to a single document.

To create three copies of the Google home page:

wkhtmltopdf --copies 3 https://google.com google.pdf

The output PDF file will contain three pages as specified in the aforementioned command.

Add grayscale filter to the output

To add a grayscale filter to the PDF file, use the -g or --grayscale flag with the command:

wkhtmltopdf -g https://google.com google.pdf
wkhtmltopdf --grayscale https://google.com google.pdf

Change PDF orientation

By default, wkhtmltopdf generates the PDF file in portrait (vertical) layout. To change this default behavior and capture web pages in landscape mode, use the --orientation flag with the command:

wkhtmltopdf --orientation landscape https://google.com google.pdf

Note that the landscape version of the document has a larger area of white space compared to the portrait version.

Do not include images when converting

If you don't want wkhtmltopdf to render the images present in a web page when generating the output, use the --no-images flag:

wkhtmltopdf --no-images https://google.com google.pdf
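The grayscale, orientation, and image flags can also be combined in a single call. Here is a minimal sketch of one way to assemble them; build_pdf_flags and its option keywords (gray, landscape, noimages) are hypothetical names chosen for this example, and the URL is assumed.

```shell
#!/bin/sh
# build_pdf_flags is a hypothetical helper. It maps short keywords to the
# wkhtmltopdf flags discussed in this article and prints them on one line.
build_pdf_flags() {
    flags=""
    for opt in "$@"; do
        case "$opt" in
            gray)      flags="$flags --grayscale" ;;
            landscape) flags="$flags --orientation landscape" ;;
            noimages)  flags="$flags --no-images" ;;
            *) echo "unknown option: $opt" >&2; return 1 ;;
        esac
    done
    # Strip the single leading space before printing.
    printf '%s\n' "${flags# }"
}

# Print the full command instead of running it, so it can be inspected first.
echo "wkhtmltopdf $(build_pdf_flags gray landscape noimages) https://google.com google.pdf"
```

Copying the printed line into the terminal runs the conversion with all three options applied.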

How to convert a web page to images

The wkhtmltoimage utility is part of the wkhtmltopdf package. If you are working on a report and want to include screenshots of a website, this tool will come in handy. It not only makes it easier to capture web pages as images from the Linux terminal, but also gives you a range of options to customize the output.

Basic syntax

Wkhtmltoimage has a syntax similar to wkhtmltopdf:

wkhtmltoimage webpage filename

...where webpage is the URL of a website and filename is the name of the output image.

Convert a web page to an image

Continuing with the example above, let’s convert the Google homepage to images.

wkhtmltoimage https://google.com google.png

You can also specify a different file format for the output image, such as PNG or JPG. For example, if you want to generate a JPG image, just change the file extension to JPG in the command:

wkhtmltoimage https://google.com google.jpg
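If you need the same page in more than one format, the extension swap can be scripted. This is a rough sketch under the assumption that the script receives a URL, an output file name, and any extra extensions as arguments; with_ext is a hypothetical helper invented for this example.

```shell
#!/bin/sh
# with_ext is a hypothetical helper: it swaps the extension of a file name,
# for example "google.png" with "jpg" becomes "google.jpg".
with_ext() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

# Usage: sh capture.sh URL OUTPUT [EXT...]
# Captures URL once as OUTPUT, then once more for each extra extension.
if [ "$#" -ge 2 ]; then
    url=$1
    out=$2
    shift 2
    wkhtmltoimage "$url" "$out"
    for ext in "$@"; do
        wkhtmltoimage "$url" "$(with_ext "$out" "$ext")"
    done
fi
```

Called with https://google.com, google.png, and jpg as arguments, the script would produce both google.png and google.jpg.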

Capturing Web Pages Using the Linux Terminal

You need a PDF viewer installed on your Linux system to open the PDF files generated by wkhtmltopdf. Most Linux distributions come with a PDF viewer preinstalled, but you can also choose and install a PDF editor that suits your needs.
