Sunday, 15 January 2012
Converting PDF Files To Text Or HTML From Linux Terminal
Earlier, we saw how to merge or combine PDF files from the terminal. Now, I am sharing two command-line tools for converting PDF files to text or HTML.
Poppler Utils (poppler-utils) is a package of PDF rendering and conversion tools, and it must be installed before we can convert PDF files to text or HTML. On Debian-based distros, install it with the following command; on other distros, use the corresponding package manager.
sudo apt-get install poppler-utils
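If you are not on a Debian-based distro, here is a minimal sketch of picking the install command for a few common package managers. The package names (poppler-utils on Debian/Ubuntu and Fedora, poppler on Arch, poppler-tools on openSUSE) are my assumptions about current packaging, so confirm them with your distro's package search. The function names are hypothetical, and the script only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) a command that installs the poppler tools.
# ASSUMPTION: the package names below reflect common packaging; verify them.

# Map a package manager name to its install command; fail on unknown names.
poppler_install_cmd() {
  case "$1" in
    apt-get) echo "sudo apt-get install poppler-utils" ;;
    dnf)     echo "sudo dnf install poppler-utils" ;;
    pacman)  echo "sudo pacman -S poppler" ;;
    zypper)  echo "sudo zypper install poppler-tools" ;;
    *)       return 1 ;;
  esac
}

# Print the first supported package manager found on PATH, if any.
detect_pkg_manager() {
  local pm
  for pm in apt-get dnf pacman zypper; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}
```

For example, poppler_install_cmd "$(detect_pkg_manager)" prints the command to run on the current machine.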
Now that poppler-utils is installed, we will be able to convert PDF files to text and HTML using pdftotext and pdftohtml command-line tools.
PDF to Text
To convert a PDF file to text, use the pdftotext command. Its simplest form takes the input PDF and the output text file (if the output name is omitted, pdftotext converts file.pdf to file.txt):
pdftotext file.pdf file.txt
By default, pdftotext outputs the text in reading order. To keep the original physical layout of the PDF as closely as possible, add the -layout switch:
pdftotext -layout file.pdf file.txt
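Since pdftotext treats an output file name of - as standard output, the layout-preserving mode can also be piped straight into other tools. Below is a minimal sketch that searches a PDF this way; pdf_grep is a hypothetical helper name, and whether a table row stays on one line depends on how well -layout keeps the original layout for that particular PDF.

```shell
#!/usr/bin/env bash
# Sketch: search the layout-preserved text of a PDF without keeping a .txt.
# pdf_grep is a hypothetical helper; "-" as the output file sends
# pdftotext's text to standard output.
pdf_grep() {
  local pdf="$1" pattern="$2"
  # -n prefixes each match with its line number; -- protects patterns starting with "-".
  pdftotext -layout "$pdf" - | grep -n -- "$pattern"
}
```

For example, pdf_grep file.pdf Total prints every line containing "Total" together with its line number.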
Similarly, to convert only a specific range of pages, use the -f and -l switches to give the first and last page to convert. For example, to convert pages 4 to 8 into text:
pdftotext -f 4 -l 8 file.pdf file.txt
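The -f and -l switches can also be driven from a loop, for example to get one text file per page. This is a sketch under two assumptions: pdfinfo (another tool in poppler-utils) reports the page count on a line starting with "Pages:", and the hypothetical helper pdf_pages_to_text may write files named like file-page1.txt next to the input.

```shell
#!/usr/bin/env bash
# Sketch: write each page of a PDF to its own text file with -f and -l.
# ASSUMPTION: pdfinfo prints a "Pages:" line; pdf_pages_to_text is hypothetical.
pdf_pages_to_text() {
  local pdf="$1" base="${1%.pdf}" pages page
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
  if [ -z "$pages" ]; then
    echo "could not read the page count of $pdf" >&2
    return 1
  fi
  page=1
  while [ "$page" -le "$pages" ]; do
    # First and last page are equal, so exactly one page is converted.
    pdftotext -f "$page" -l "$page" "$pdf" "${base}-page${page}.txt"
    page=$((page + 1))
  done
}
```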
See the pdftotext man page (man pdftotext) or run pdftotext -h to explore its other options.
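To convert a whole folder at once, the simplest pdftotext form can be wrapped in a loop. This is a minimal sketch; pdf_dir_to_text is a hypothetical helper name, and the *.pdf glob is case-sensitive, so files ending in .PDF are skipped.

```shell
#!/usr/bin/env bash
# Sketch: convert every *.pdf in a directory into a .txt file beside it.
# pdf_dir_to_text is a hypothetical helper, not a poppler-utils command.
pdf_dir_to_text() {
  local dir="${1:-.}" pdf
  for pdf in "$dir"/*.pdf; do
    # With no matches the glob stays literal, so skip names that do not exist.
    [ -e "$pdf" ] || continue
    # Same form as before: input PDF first, output text file second.
    pdftotext "$pdf" "${pdf%.pdf}.txt" || echo "failed: $pdf" >&2
  done
}
```

For example, pdf_dir_to_text ~/Documents converts every PDF in that folder; with no argument it works on the current directory.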
PDF to HTML
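To convert a PDF file to an HTML file, you can use the pdftohtml tool from the same poppler-utils package. Before that, note that pdftotext itself can produce a simple HTML file with the -htmlmeta switch: it wraps the extracted text in <pre> tags and prepends the PDF's meta information as HTML headers. Here it is combined with the same page range: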
pdftotext -f 4 -l 8 -htmlmeta file.pdf file.html
Using the pdftohtml tool is not much different from using pdftotext. The simplest form is:
pdftohtml file.pdf file.html
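As I understand it, pdftohtml by default writes a frame-based set of HTML files rather than a single page. If you prefer one HTML document, the -noframes switch asks for that, and -q suppresses messages. A minimal sketch, with pdf_to_single_html as a hypothetical wrapper name; check man pdftohtml on your system for how your version names the output files.

```shell
#!/usr/bin/env bash
# Sketch: request a single HTML document instead of the default frameset.
# pdf_to_single_html is hypothetical; -q and -noframes are pdftohtml switches.
pdf_to_single_html() {
  # Default output name: file.pdf -> file.html
  local pdf="$1" out="${2:-${1%.pdf}.html}"
  pdftohtml -q -noframes "$pdf" "$out"
}
```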
pdftohtml accepts the same -f and -l switches as pdftotext for selecting a page range. However, -htmlmeta and -layout are only available in pdftotext. I'll leave the rest of pdftohtml for you to explore; man pdftohtml lists its options.
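As an example of reusing the range switches, here is a sketch of a wrapper that converts a page range to HTML after a basic sanity check of the page numbers. The function names are hypothetical, and the check only ensures both values are positive integers with the first not after the last; what happens when the range exceeds the document is left to pdftohtml.

```shell
#!/usr/bin/env bash
# Sketch: convert a page range to HTML with -f/-l after validating it.
# is_page_number and pdf_range_to_html are hypothetical helper names.
is_page_number() {
  case "$1" in
    ''|*[!0-9]*|0) return 1 ;;   # empty, non-digit, or zero
    *) return 0 ;;
  esac
}

pdf_range_to_html() {
  local pdf="$1" first="$2" last="$3" out="${4:-${1%.pdf}.html}"
  if ! is_page_number "$first" || ! is_page_number "$last"; then
    echo "usage: pdf_range_to_html file.pdf FIRST LAST [out.html]" >&2
    return 2
  fi
  if [ "$first" -gt "$last" ]; then
    echo "first page ($first) is after last page ($last)" >&2
    return 2
  fi
  pdftohtml -f "$first" -l "$last" "$pdf" "$out"
}
```

For example, pdf_range_to_html file.pdf 4 8 runs pdftohtml -f 4 -l 8 file.pdf file.html, mirroring the pdftotext range example.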
I hope this information is useful for you. :)
Posted by Cool Samar at 2012-01-15T21:23:00+05:45
Labels: linux, pdf to html, pdf to text, pdf tool, tricks and tips