How to convert a webpage to a PDF file or images in Linux
Want to know how to capture a web page and save it as a PDF document or image using the terminal? Fortunately, Linux has a plethora of utilities that you can use to automate the task of converting HTML documents to PDF files and images.
This article introduces wkhtmltopdf and wkhtmltoimage, two command-line utilities that make this job easier.
How to convert HTML to PDF
If you want to capture web pages and convert them to PDF files, the wkhtmltopdf utility will help you. Wkhtmltopdf is an open source command line tool used to convert web pages to PDF documents.
Since the tool runs headless in the Linux terminal, you won't need any web drivers or a browser automation framework like Selenium.
Install wkhtmltopdf on Linux
Wkhtmltopdf is not part of the standard packages preinstalled on Linux. You will need to install it manually using your system's package manager.
To install wkhtmltopdf on Ubuntu and other Debian-based distributions:
sudo apt install wkhtmltopdf
On Arch-based distributions like Manjaro Linux:
sudo pacman -S wkhtmltopdf
On Fedora and RHEL-based distributions like CentOS:
sudo dnf install wkhtmltopdf
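If you set up several machines, a small helper can pick the right install command for you. The following is a minimal sketch that assumes only the three package managers mentioned above; the function names wk_install_cmd and wk_detect_pm are hypothetical and not part of wkhtmltopdf.

```shell
# Print the install command for a given package manager name.
# wk_install_cmd is a hypothetical helper for this example.
wk_install_cmd() {
    case "$1" in
        apt)    echo "sudo apt install wkhtmltopdf" ;;
        pacman) echo "sudo pacman -S wkhtmltopdf" ;;
        dnf)    echo "sudo dnf install wkhtmltopdf" ;;
        *)      echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Print the name of the first supported package manager found on PATH.
wk_detect_pm() {
    local pm
    for pm in apt pacman dnf; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}
```

Running pm=$(wk_detect_pm) && wk_install_cmd "$pm" prints the command to run on the current system, so you can review it before executing it.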
Basic syntax
The basic command syntax is:
wkhtmltopdf webpage filename
...where webpage is the URL of the web page you want to convert and filename is the name of the output PDF file.
To convert the Google home page to a PDF document:
wkhtmltopdf https://google.com google.pdf
Output:
Upon opening the PDF file, you will notice that wkhtmltopdf has precisely rendered the web page into a document.
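The same basic syntax can be wrapped in a loop when you have more than one page to capture. The sketch below assumes you are happy to name each PDF after the host part of its URL; url_to_pdf_name and convert_pages are hypothetical helper names introduced for this example.

```shell
# Derive an output filename such as "google.com.pdf" from a URL.
# url_to_pdf_name is a hypothetical helper for this example.
url_to_pdf_name() {
    local host="${1#*://}"   # strip the scheme, e.g. "https://"
    host="${host%%/*}"       # strip any path after the host
    echo "${host}.pdf"
}

# Convert each URL passed as an argument to its own PDF file.
# Stops at the first conversion that fails.
convert_pages() {
    local url
    for url in "$@"; do
        wkhtmltopdf "$url" "$(url_to_pdf_name "$url")" || return 1
    done
}
```

For example, convert_pages https://google.com would call wkhtmltopdf once and write google.com.pdf to the current directory. Note that two URLs on the same host would overwrite each other's output with this naming scheme.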
Print multiple copies of the web page
The --copies flag is useful if you want your output file to contain multiple copies of the web page. Note that when printing multiple copies, wkhtmltopdf will not generate multiple PDF files, but will instead add the extra pages to a single document.
To create three copies of the Google home page:
wkhtmltopdf --copies 3 https://google.com google.pdf
The output PDF file will contain three pages as specified in the aforementioned command.
Add grayscale filter to the output
To add a grayscale filter to the PDF file, use the -g or --grayscale flag with the command:
wkhtmltopdf -g https://google.com google.pdf
wkhtmltopdf --grayscale https://google.com google.pdf
Change PDF orientation
By default, wkhtmltopdf generates the PDF file in portrait (vertical) orientation. To change this default behavior and capture web pages in landscape mode, use the --orientation flag with the command:
wkhtmltopdf --orientation landscape https://google.com google.pdf
Output:
Note that the landscape version of the document has a larger area of white space compared to the portrait version.
Do not include images when converting
If you don't want wkhtmltopdf to render the images present in a web page when generating the output, use the --no-images flag:
wkhtmltopdf --no-images https://google.com google.pdf
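The flags covered so far can also be passed together in a single call. Here is a minimal sketch that wraps such a call in a function; make_handout is a hypothetical name, and the choice of flags is only an illustration.

```shell
# Produce a grayscale, landscape, image-free PDF of a page.
# make_handout is a hypothetical wrapper; $1 is the URL, $2 the output file.
make_handout() {
    local url="$1"
    local out="$2"
    wkhtmltopdf --grayscale --orientation landscape --no-images "$url" "$out"
}
```

Calling make_handout https://google.com google.pdf runs wkhtmltopdf once with all three options applied to the same document.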
How to convert a web page to images
The wkhtmltoimage utility is part of the wkhtmltopdf package. If you are working on a report and want to include images of a website, this tool can help. It not only makes capturing images from the terminal easier, but also gives you a range of options to customize the output.
Basic syntax
Wkhtmltoimage has a syntax similar to wkhtmltopdf:
wkhtmltoimage webpage filename
...where webpage is the URL of a website and filename is the name of the output image.
Convert a web page to an image
Continuing with the example above, let's convert the Google homepage to an image.
wkhtmltoimage https://google.com google.png
You can also specify the file format that you want the output image to have by changing the file extension. For example, to generate a JPG image, use a .jpg extension in the command:
wkhtmltoimage https://google.com google.jpg
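If you need the same capture in more than one format, you can loop over the extensions. The following sketch assumes PNG and JPG, the two formats used in the examples above; capture_formats is a hypothetical helper name.

```shell
# Capture one page as both PNG and JPG.
# capture_formats is a hypothetical helper; $1 is the URL and
# $2 is the output name without an extension.
capture_formats() {
    local url="$1"
    local base="$2"
    local ext
    for ext in png jpg; do
        wkhtmltoimage "$url" "${base}.${ext}" || return 1
    done
}
```

For example, capture_formats https://google.com google would call wkhtmltoimage twice, writing google.png and then google.jpg.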
Capturing Web Pages Using the Linux Terminal
You need a PDF viewer installed on your Linux system to view the PDF files generated by wkhtmltopdf. While most Linux distributions come with a PDF viewer preinstalled, you can choose and install a different one that suits your needs.