Thanks to the posts below, I was able to print the webpage's link address and the current time on every page of the generated PDF, no matter how many pages it has.

Convert HTML/webpage to PDF. Many websites do not let you download their content as a PDF; they ask you to download their premium version instead. How to convert HTML to PDF using Python: the conversion from a webpage or HTML to PDF is completed in three steps.
The Pdfcrowd HTML to PDF API lets you convert web pages and HTML files to PDF in your Python applications. There are many other approaches for generating PDFs in Python. pdfkit is one of them, and it can generate a PDF file directly from a URL.
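As a rough illustration of the pdfkit route, here is a minimal sketch that converts a URL to a PDF and stamps the page address and the current date and time in the footer of every page. It assumes the third-party pdfkit package and the wkhtmltopdf binary are installed; the helper names footer_options and url_to_pdf, the font size, and the example URL are hypothetical choices for this sketch, not something taken from the original script.

```python
def footer_options():
    """Build wkhtmltopdf options that repeat a footer on every page.

    [webpage], [date] and [time] are wkhtmltopdf footer placeholders that
    are substituted with the page URL and the print date/time.
    """
    return {
        "footer-left": "[webpage]",
        "footer-right": "[date] [time]",
        "footer-font-size": "8",  # assumed size, adjust to taste
    }


def url_to_pdf(url, output_path):
    """Convert the page at `url` to a PDF file at `output_path`."""
    # Imported lazily so the helpers above can be used without pdfkit installed.
    import pdfkit  # third-party: pip install pdfkit (needs wkhtmltopdf)

    pdfkit.from_url(url, output_path, options=footer_options())
```

Calling url_to_pdf("https://example.com", "example.pdf") would then write example.pdf, provided wkhtmltopdf is available on the system.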
Wait until an element with the tag name table is found. Wait until an element with the tag name table or with the id main-content is found.

width: Set the viewport width in pixels. The viewport is the user's visible area of the page. The value must be within the allowed range.
Set the viewport height in pixels. The value must be a positive integer.

Rendering mode. Allowed values:
This mode is based on the standard browser print functionality.
The viewport width affects the media min-width and max-width CSS properties. This mode can be used to choose a particular version (mobile, desktop, ...) of the page.

Smart scaling mode. Specifies the scaling mode used for fitting the HTML contents to the print area. Allowed values:
No smart scaling is performed.
The viewport width fits the print area width.
The HTML contents width fits the print area width.
The whole HTML contents fits the print area of a single page.
Set the quality of embedded JPEG images. A lower quality results in a smaller PDF file but can lead to compression artifacts.

Specify which image types will be converted to JPEG. Allowed values:
No image conversion is done.
Only opaque images are converted to JPEG images.
All images are converted to JPEG images.

Use 0 to leave the images unaltered. No change of the source image is done.

Protect the PDF with a user password. When a PDF has a user password, the password must be supplied in order to view the document and to perform operations allowed by the access permissions.
Protect the PDF with an owner password. Supplying an owner password grants unlimited access to the PDF, including changing the passwords and access permissions.

title: The title.
subject: The subject.
author: The author.
keywords: The string with the keywords.
Display the pages in one column. Display the pages in two columns, with odd-numbered pages on the left. Display the pages in two columns, with odd-numbered pages on the right.

Thumbnail images are visible. Document outline is visible.

The page content is magnified just enough to fit the entire width of the page within the window. The page content is magnified just enough to fit the entire height of the page within the window. The page content is magnified just enough to fit the entire page within the window both horizontally and vertically. If the required horizontal and vertical magnification factors are different, use the smaller of the two, centering the page within the window in the other dimension.

Specify whether to hide the viewer application's tool bars when the document is active.
Specify whether to hide the viewer application's menu bar when the document is active.
Specify whether to hide user interface elements in the document's window (such as scroll bars and navigation controls), leaving only the document's contents displayed.
Specify whether to resize the document's window to fit the size of the first displayed page.
Specify whether to position the document's window in the center of the screen.
Specify whether the window's title bar should display the document title. If false, the title bar should instead display the name of the PDF file containing the document.

Set the predominant reading order for text to right-to-left. This option has no direct effect on the document's contents or page numbering, but it can be used to determine the relative positioning of pages when displayed side by side or printed n-up.
Turn on debug logging. Details about the conversion are stored in the debug log.
Returns string: the link to the debug log.

Get the number of conversion credits available in your account. This method can only be called after a call to one of the convertXYZ methods. The returned value can differ from the actual count if you run parallel conversions. A special value is returned if the information is not available.
Returns int: the number of credits.

Returns string: the unique job identifier.
Returns int: the page count.
Returns int: the count of bytes.

Tag the conversion with a custom value. The tag is used in conversion statistics. A value longer than 32 characters is cut off.
tag: A string with the custom tag.
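Because the credit count and the other getters are only meaningful after a conversion has run, the order of calls matters. The sketch below shows one possible ordering. It assumes the Pdfcrowd Python client exposes HtmlToPdfClient with setDebugLog, setTag, convertUrlToFile, getDebugLogUrl, getRemainingCreditCount, getJobId, getPageCount and getOutputSize; check these names against the client version you have installed. The function names, the tag value, the credentials and the URL are placeholders invented for the example.

```python
def convert_and_report(client, url, output_path):
    """Convert `url` to a PDF, then collect the post-conversion details."""
    client.setDebugLog(True)        # must be enabled before converting
    client.setTag("docs-example")   # hypothetical tag; over 32 chars is cut off
    client.convertUrlToFile(url, output_path)

    # These getters are read only after the convert call above has finished.
    return {
        "debug_log_url": client.getDebugLogUrl(),
        "remaining_credits": client.getRemainingCreditCount(),
        "job_id": client.getJobId(),
        "page_count": client.getPageCount(),
        "output_size": client.getOutputSize(),
    }


def run_example():
    """Wire the helper to a real client (needs an account and network)."""
    import pdfcrowd  # third-party: pip install pdfcrowd

    # Placeholder credentials, replace with your own username and API key.
    client = pdfcrowd.HtmlToPdfClient("your_username", "your_api_key")
    return convert_and_report(client, "https://example.com", "example.pdf")
```

Keeping the client as a parameter means the ordering logic can be exercised with a stand-in object before any credits are spent.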
This option can help to circumvent regional restrictions or provide limited access to your intranet.
A client certificate to authenticate the Pdfcrowd converter on your web server.
Set a custom user agent HTTP header. It can be useful if you are behind a proxy or firewall.
host: The proxy hostname.
Specifies the number of retries when a particular HTTP status code is received. That status code indicates a temporary network issue. This feature can be disabled by setting the value to 0.

Your username at Pdfcrowd.
Your API key.
The address of the web page to convert. The output file path. The path to a local file to convert; the file name must have a valid extension. The string content to convert.

Set to True to disable margins. Set the output page top margin. Set the output page right margin. Set the output page bottom margin. Set the output page left margin. A comma-separated list of page numbers or ranges. The file path to a local watermark PDF file. The file path to a local background PDF file.

Set to True to disable loading remote fonts. Set to True to block ads in web pages. The text encoding of the HTML content. Set the HTTP authentication user name. Set the HTTP authentication password. Set to True to use the print version of the page. The cookie string. Set to True to enable SSL certificate verification. Set to True to abort the conversion.

One or more CSS selectors separated by commas. Set the viewport width in pixels. The rendering mode. The smart scaling mode. The percentage value. Set to True to disable the intelligent shrinking strategy. The image category. The DPI value. Set to True to create linearized PDF.

Set to True to enable PDF encryption. The user password. The owner password. Set to True to set the no-print flag in the output PDF.
Set to True to set the read-only flag in the output PDF. Set to True to set the no-copy flag in the output PDF. The title. The subject. The author.

Then, we specify the chunk size, in bytes, that we want to download at a time. We iterate through each chunk and write it to the file until no chunks are left. While the chunks are downloading, the Python shell shows only plain output, which is not pretty; we will show a progress bar for the download later. Finally, open the file path specified in the URL and write the content of the page.

Now, we can call this function for each URL separately, and we can also call this function for all the URLs at the same time.

To install the clint module, type the following command: pip install clint. The code imports requests and the progress module from clint. The only difference is in the for loop: we use the bar method of the progress module while writing the content into the file.
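The chunked download described above can be sketched as follows. This is a minimal version, assuming the third-party requests package is installed; the 1024-byte chunk size, the 30-second timeout, and the helper names save_chunks and download are assumptions made for the example, since the original code and its chunk size are not shown. The progress bar is left out to keep the sketch small.

```python
def save_chunks(chunks, path):
    """Write an iterable of byte chunks to `path`; return bytes written."""
    written = 0
    with open(path, "wb") as handle:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                handle.write(chunk)
                written += len(chunk)
    return written


def download(url, path, chunk_size=1024):
    """Stream `url` to `path` without loading the whole body into memory."""
    import requests  # third-party: pip install requests

    # stream=True defers the body so it can be read chunk by chunk.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        return save_chunks(response.iter_content(chunk_size=chunk_size), path)
```

Splitting the file-writing loop from the HTTP call keeps the loop usable with any source of byte chunks, which also makes it easy to wrap the iterable in a progress bar later.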
Download a Webpage Using urllib
In this section, we download a webpage using urllib. urllib is part of Python's standard library, so you do not need to install it. A single line of urllib code can download a webpage. In the code, we import urllib, make the request to retrieve the page, and then retrieve the file.

Using urllib3
urllib3 is a separate, third-party HTTP library that offers more features than the urllib module.
You can install it using pip: pip install urllib3. We will fetch a web page and store it in a text file by using urllib3. The code imports urllib3 and shutil; the shutil module is used when working with files.

Then, we have the unzip parameter. If it is True, the downloaded file is unzipped into the same destination folder. In this example, we download a zip archive, and the archive is then unzipped.
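The text does not show which function takes the unzip parameter, so the following sketch swaps in the standard library to illustrate the same behaviour: urllib.request.urlretrieve downloads the archive, and zipfile extracts it into the same destination folder when unzip is True. The function name download_and_unzip and the fixed archive file name are hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_unzip(url, dest_dir, unzip=True):
    """Download a zip archive into `dest_dir` and optionally extract it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    archive = dest / "download.zip"  # hypothetical fixed name for the sketch
    urllib.request.urlretrieve(url, str(archive))

    if unzip:
        # Extract next to the archive, in the same destination folder.
        with zipfile.ZipFile(archive) as bundle:
            bundle.extractall(dest)
    return archive
```

Only extract archives from sources you trust, and pass unzip=False when you just want the archive saved.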