Sunday, 15 January 2012
Converting PDF Files To Text Or HTML From Linux Terminal
Earlier, we saw how we can merge or combine PDF files from the terminal. Now, I am sharing two command-line tools for converting PDF files to text or HTML files.
Poppler Utils (poppler-utils) is a great package of PDF rendering and conversion tools, and it must be installed before we can convert PDF files to text or HTML. On Debian-based distros, you can install poppler-utils by issuing the following command. On other distros, install it with the corresponding package manager.
sudo apt-get install poppler-utils
Now that poppler-utils is installed, we will be able to convert PDF files to text and HTML using pdftotext and pdftohtml command-line tools.
PDF to Text
To convert a PDF file to text, use the pdftotext command. The simplest form of the command for converting a PDF file to a text file is:
pdftotext file.pdf file.txt
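To convert a whole folder of PDFs, the same command can be wrapped in a loop. Below is a minimal bash sketch, not part of poppler-utils: the function names txt_name and convert_dir are hypothetical, and it assumes pdftotext is on your PATH. It skips any PDF whose .txt file is already newer than the PDF itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper script: convert every PDF in a directory to text.
# Assumes pdftotext (from poppler-utils) is installed and on PATH.
set -euo pipefail

# Map "dir/name.pdf" to "dir/name.txt".
txt_name() {
  printf '%s\n' "${1%.pdf}.txt"
}

# Convert each *.pdf in the given directory (default: current directory),
# skipping PDFs whose text output is already newer than the PDF.
convert_dir() {
  local dir="${1:-.}" pdf txt
  shopt -s nullglob                  # no PDFs -> the loop body never runs
  for pdf in "$dir"/*.pdf; do
    txt="$(txt_name "$pdf")"
    if [[ "$txt" -nt "$pdf" ]]; then
      echo "up to date: $txt"
      continue
    fi
    pdftotext "$pdf" "$txt"
    echo "converted: $pdf -> $txt"
  done
}

# Run only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]:-$0}" == "$0" ]]; then
  convert_dir "$@"
fi
```

For example, saving it as pdf2txt-all.sh (an arbitrary name) and running bash pdf2txt-all.sh ~/Documents converts every PDF in that folder.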
pdftotext can also preserve the original physical layout of the PDF file with the -layout switch:
pdftotext -layout file.pdf file.txt
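pdftotext writes to standard output when the output file name is "-", so the -layout output can be piped straight into other tools. Here is a minimal bash sketch, assuming pdftotext is on PATH; pdf_grep is a hypothetical helper name, not a poppler-utils command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search a PDF's text while keeping its column layout.
# Assumes pdftotext (from poppler-utils) is on PATH.
set -euo pipefail

# Usage: pdf_grep PATTERN FILE.pdf
# "-" as the output file makes pdftotext print to stdout;
# grep -n prefixes each match with its line number, and "--" lets
# patterns that start with "-" be searched for.
pdf_grep() {
  local pattern="$1" pdf="$2"
  pdftotext -layout "$pdf" - | grep -n -- "$pattern"
}
```

After sourcing the file, a call such as pdf_grep 'Total' invoice.pdf (a hypothetical file) prints the matching lines, and the function returns non-zero when nothing matches.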
Similarly, if you wish to convert a specific range of pages, use the -f and -l switches to specify the first and last pages to convert. In the example below, I've chosen to convert pages 4 to 8 into text.
pdftotext -f 4 -l 8 file.pdf file.txt
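The -f and -l switches can also be combined with pdfinfo, another tool from poppler-utils, to write each page to its own text file. A minimal bash sketch follows; the function names and the file-page-N.txt naming scheme are my own assumptions, not something poppler provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a PDF into one text file per page.
# Assumes pdfinfo and pdftotext (both from poppler-utils) are on PATH.
set -euo pipefail

# Read the page count from the "Pages:" line printed by pdfinfo.
page_count() {
  pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Write page N of FILE.pdf to FILE-page-N.txt, using -f N -l N
# so that each run converts exactly one page.
split_pages() {
  local pdf="$1" base="${1%.pdf}" pages n
  pages="$(page_count "$pdf")"
  for (( n = 1; n <= pages; n++ )); do
    pdftotext -f "$n" -l "$n" "$pdf" "$base-page-$n.txt"
  done
}
```

For example, split_pages file.pdf on a 3-page file would produce file-page-1.txt through file-page-3.txt.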
Check the man page of pdftotext (man pdftotext) and its built-in help (pdftotext -h) to explore the other options.
PDF to HTML
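To convert a PDF file to an HTML file, you can use the pdftohtml tool from the poppler package. Before that, I will show how pdftotext itself can produce a simple HTML file with the -htmlmeta switch, which wraps the extracted text in <pre> tags and adds the PDF's metadata to the HTML header. Here it is combined with the page range from the previous example: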
pdftotext -f 4 -l 8 -htmlmeta file.pdf file.html
Using the pdftohtml tool is not much different from using pdftotext. The simplest form is as follows:
pdftohtml file.pdf file.html
pdftohtml accepts the same -f and -l switches as pdftotext for specifying a page range. However, -htmlmeta and -layout are only available in pdftotext. I'll leave the rest of pdftohtml's options for you to explore.
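Since both tools accept -f and -l, a small wrapper can validate a page range once and pass it to either one. This is a minimal bash sketch under my own assumptions: the names valid_range and pdf_range are hypothetical, and the first argument selects the tool (text or html).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: convert a page range to text or HTML.
# Assumes pdftotext and pdftohtml (from poppler-utils) are on PATH.
set -euo pipefail

# Succeed only if FIRST and LAST are positive integers with FIRST <= LAST.
# The regex rejects leading zeros, which bash arithmetic would read as octal.
valid_range() {
  [[ "$1" =~ ^[1-9][0-9]*$ && "$2" =~ ^[1-9][0-9]*$ ]] && (( $1 <= $2 ))
}

# Usage: pdf_range text|html FIRST LAST FILE.pdf OUTPUT
pdf_range() {
  local mode="$1" first="$2" last="$3" pdf="$4" out="$5"
  if ! valid_range "$first" "$last"; then
    echo "invalid page range: $first-$last" >&2
    return 1
  fi
  case "$mode" in
    text) pdftotext -f "$first" -l "$last" "$pdf" "$out" ;;
    html) pdftohtml -f "$first" -l "$last" "$pdf" "$out" ;;
    *)    echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, pdf_range html 4 8 file.pdf file.html runs the same page-range conversion as the pdftotext example above, but through pdftohtml.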
I hope this information is useful for you. :)
Labels: linux, pdf to html, pdf to text, pdf tool, tricks and tips
Posted by Cool Samar on 2012-01-15 at 21:23 (+05:45)