What is the best PHP PDF to text class? #pdf to text
by Menaka Kariyawasam - 9 years ago (2015-11-03)
PDF to text format in PHP
How can I convert a .pdf file to .txt format in PHP?
- 1 Clarification request
1. by Asheenah - 7 years ago (2017-09-01)
PDF to text parsing works, but spaces between words and some characters such as '-' are missing; for example, '-' is replaced by 'T'. Is there another way to fix this?
3 Recommendations
PHP PDF to HTML: Convert PDF to HTML using Poppler
This class can convert PDF to HTML using the Poppler programs.
It takes the path of the Poppler tools and runs them to perform several operations that extract information from PDF documents.
Currently the class can convert whole PDF documents or individual pages to HTML, get the document information, return the page count, etc.
Several parameters can be configured, such as the preferred format of the pictures inside the document, the zoom scale, and whether images and CSS are inlined in the HTML or written as external files.
by Anton N Nikolaev, package author (215) - 8 years ago (2016-12-02). Comment: I like it!
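The class above drives Poppler's command-line tools. If plain text is all you need, Poppler also ships `pdftotext`, which can be called straight from PHP. Below is a minimal sketch, assuming the poppler-utils package is installed and `pdftotext` is on the PATH. The function names are hypothetical, and this is not the API of the class above.

```php
<?php
// Sketch: call Poppler's pdftotext CLI directly from PHP.
// Assumes poppler-utils is installed and "pdftotext" is on the PATH.
// Hypothetical helper names; NOT the API of the "PHP PDF to HTML" class.

function buildPdfToTextCommand(string $pdfPath, bool $keepLayout = true): string
{
    // -layout    : keep the original physical layout (columns, spacing)
    // -enc UTF-8 : encoding of the extracted text
    // "-" as the output file sends the text to stdout instead of a .txt file
    $options = $keepLayout ? '-layout ' : '';
    return 'pdftotext ' . $options . '-enc UTF-8 ' . escapeshellarg($pdfPath) . ' -';
}

function pdfToText(string $pdfPath, bool $keepLayout = true): string
{
    if (!is_readable($pdfPath)) {
        throw new InvalidArgumentException("Cannot read $pdfPath");
    }
    // exec() collects stdout line by line, without trailing whitespace.
    exec(buildPdfToTextCommand($pdfPath, $keepLayout), $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("pdftotext failed with exit code $status");
    }
    return implode("\n", $lines);
}

// Usage:
// file_put_contents('document.txt', pdfToText('/path/to/document.pdf'));
```

`escapeshellarg()` keeps file names with spaces or quotes from breaking the shell command. Because `pdftotext` places text using glyph positions, it may cope better with the missing-spaces problem from the clarification request than a naive parser, though results still depend on how the PDF was produced.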
PHP PDF to Text: Extract text contents from PDF files
This package can extract the text contents from a PDF file using pure PHP code (no external tools are needed).
It provides the following features:
- Text is extracted from PDF files as a single text property. Individual page contents are also available separately
- Text strings can be searched over the whole file contents, or through individual pages
- Support for multiple character sets: parsed text is returned in UTF-8
- Embedded images can be extracted if desired
- Several option flags are available to adjust PDF contents processing
- RTL language processing
- Basic page layout rendering
- PDF Form data extraction
- Ability to extract areas of text as well as line and column contents, using XML-based capture definitions
by Christian Vigh, package author (435) - 8 years ago (2016-11-11). Comment: This class should answer your needs; it is constantly evolving and provides advanced features. Just give it a try, it is easy to use.
- 1 Comment
1. by Rodrigo Garcia - 7 years ago (2017-07-19)
Hi Christian.
Your class is EXCELLENT.
Thank you very much for your help.
However,
this class generates an "Error 500" on Apache + CentOS; I tested it on more than two Linux servers and the result is the same: "Error 500".
For example:
khnoc.com/www/ERROR_500.pdf
As you can see, it is a 100% valid PDF document.
I tried to resolve it, but I really can't.
I wrote to {christian.vigh@wuthering-bytes.com} but never received an answer.
Thanks for any help, and congratulations on your development.
:)
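The "PHP PDF to Text" package above works in pure PHP. To show what such a parser does, and one reason spaces can go missing as described in the clarification request, here is a deliberately naive sketch. In a PDF content stream, text is drawn by operators such as `Tj` (show one string) and `TJ` (show an array of strings interleaved with numeric position adjustments). Word gaps are often encoded as those adjustments rather than as space characters, so an extractor has to guess where spaces belong; the -200 threshold below is an assumed heuristic. Fonts with custom encodings and no ToUnicode map can also make characters come out as the wrong letters, which this sketch does not handle. The function names are hypothetical, and this is not the PdfToText class's code or API.

```php
<?php
// Deliberately naive sketch of pure-PHP PDF text extraction.
// Hypothetical helper names; NOT the API of the PdfToText class.
// Handles only literal (strings) shown with Tj/TJ, in FlateDecode or
// uncompressed streams. Ignores hex strings, font encodings, ToUnicode
// maps, object streams, and most escape sequences.

// A PDF literal string without nested parentheses, e.g. (Hello) or (a\(b\)).
const PDF_LITERAL = '\((?:\\\\.|[^\\\\()])*\)';

function unescapePdfString(string $s): string
{
    // Only the \( \) and \\ escapes are decoded here.
    return strtr($s, ['\\(' => '(', '\\)' => ')', '\\\\' => '\\']);
}

function textFromContentStream(string $content, float $spaceThreshold = 200.0): string
{
    // Match "[...] TJ", "(...) Tj", or the ET (end of text object) operator.
    $pattern = '/\[((?:' . PDF_LITERAL . '|[^\]])*)\]\s*TJ|(' . PDF_LITERAL . ')\s*Tj|\bET\b/s';
    preg_match_all($pattern, $content, $ops, PREG_SET_ORDER);

    $out = '';
    foreach ($ops as $op) {
        switch (substr($op[0], -2)) {
            case 'Tj': // a single string: strip the outer parentheses
                $out .= unescapePdfString(substr($op[2], 1, -1));
                break;
            case 'TJ': // an array of strings and numeric offsets
                preg_match_all('/(' . PDF_LITERAL . ')|(-?(?:\d+\.?\d*|\.\d+))/s', $op[1], $parts, PREG_SET_ORDER);
                foreach ($parts as $part) {
                    if ($part[1] !== '') {
                        $out .= unescapePdfString(substr($part[1], 1, -1));
                    } elseif ((float) $part[2] < -$spaceThreshold) {
                        // Offsets are in thousandths of a text space unit; a large
                        // negative one usually means a word gap (heuristic).
                        $out .= ' ';
                    }
                }
                break;
            case 'ET': // end of a text object: start a new line
                $out .= "\n";
                break;
        }
    }
    return $out;
}

function extractPdfTextNaive(string $pdf): string
{
    $text = '';
    // Raw bytes between each "stream" keyword and its "endstream".
    // Every stream is scanned, including images and fonts (naive).
    preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $pdf, $m);
    foreach ($m[1] as $raw) {
        // FlateDecode data is zlib-wrapped; if decompression fails,
        // treat the stream as uncompressed.
        $decoded = @gzuncompress($raw);
        $text .= textFromContentStream($decoded === false ? $raw : $decoded);
    }
    return $text;
}
```

A production parser also has to follow the cross-reference table, decode font encodings, and track text positions, which is why a maintained package is usually the better choice.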
This class can convert DOCX, DOC, PDF files to plain text.
It can read Microsoft Word files in DOCX or DOC format, as well as PDF files, and parse them to extract the text they contain.
The text extracted from the documents is returned as a plain text string.
by Dave Smith (7620) - 9 years ago (2015-11-03). Comment: This is one of the newer Innovation Award nominees, and its primary use is to extract plain text from DOC or PDF files.
Dave
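For the DOCX part of what this class does, the format itself makes extraction fairly simple: a .docx file is a ZIP archive whose main body is stored in `word/document.xml`. Below is a minimal sketch, assuming PHP's zip extension (`ZipArchive`) is available. The function names are hypothetical and this is not the class's own code. Legacy binary .doc files need a different parser and are not covered.

```php
<?php
// Sketch: plain-text extraction from a .docx file.
// Assumes the zip extension (ZipArchive) is installed.
// Hypothetical helper names; NOT the API of the class above.

function documentXmlToText(string $xml): string
{
    // Map paragraph ends, tabs and line breaks to plain-text equivalents.
    $xml = str_replace(['</w:p>', '<w:tab/>', '<w:br/>'], ["\n", "\t", "\n"], $xml);
    // Drop all remaining markup, then decode XML entities such as &amp;.
    return html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
}

function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("word/document.xml not found in $path");
    }
    return documentXmlToText($xml);
}

// Usage:
// echo docxToText('/path/to/report.docx');
```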