PHP Classes

What is the best PHP search string in pdf class?: Search string in PDF and return page number

by srizoophari - 8 years ago (2016-05-03)

Search string in PDF and return page number

I need a library or class that searches for a string in a PDF and returns the page number of each match.

  • 1 Clarification request
  • 1. Manuel Lemos - 8 years ago (2016-05-06)

    There are classes that extract text from PDF files, but I am not sure whether any of the existing ones can also return the page each piece of text came from.

    2 Recommendations

    PHP PDF to HTML: Convert PDF to HTML using Poppler

    This class can convert PDF to HTML using the Poppler program.

    It can take the path of the Poppler program tools and execute several operations to extract information from PDF documents.

    Currently the class can convert whole PDF documents or individual pages to HTML, get the document information, return the page count, etc.

    Several parameters can be configured, such as the preferred format of the pictures inside the document, the zoom scale, and whether to put images and CSS inline within the HTML or in external files.
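
    For the original request (finding the page a string appears on), one route that relies only on Poppler's command-line tools, rather than on this class's methods (which are not shown here), is to work from `pdftotext` output, where pages are separated by a form feed character. A minimal sketch, assuming the text has already been extracted; the function name `findPagesContaining` and the sample text are hypothetical:

```php
<?php
// Hypothetical helper: returns the 1-based page numbers on which $needle
// appears (case-insensitively), given text in which pages are separated by
// form feeds ("\f"), as in the output of Poppler's pdftotext tool.
function findPagesContaining(string $text, string $needle): array
{
    if ($needle === '') {
        return [];
    }
    $pages = explode("\f", $text);
    $matches = [];
    foreach ($pages as $index => $pageText) {
        if (stripos($pageText, $needle) !== false) {
            $matches[] = $index + 1; // explode() indexes from 0, pages from 1
        }
    }
    return $matches;
}

// Sample standing in for real pdftotext output (three pages).
$sample = "Annual report\fRevenue grew in 2015.\fRevenue outlook and risks.\f";
print_r(findPagesContaining($sample, 'revenue'));
```

    In practice `$text` could come from something like `shell_exec('pdftotext ' . escapeshellarg($path) . ' -')`, which requires Poppler to be installed and `pdftotext` to be on the path.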

    by Anton N Nikolaev (package author, reputation 215) - 8 years ago (2016-12-02)

    I like it.


    PHP PDF to Text: Extract text contents from PDF files

    This package can extract the text contents from a PDF file using pure PHP code (no external tools are needed).

    It provides the following features:

    - Text is extracted from PDF files as a single text property. Individual page contents are also available separately
    - Text strings can be searched over the whole file contents, or through individual pages
    - Support for multiple character sets: parsed text is returned in UTF-8
    - Embedded images can be extracted if desired
    - Several option flags are available to adjust PDF contents processing
    - RTL language processing
    - Basic page layout rendering
    - PDF Form data extraction
    - Ability to extract areas of text as well as line and column contents, using XML-based capture definitions

    by Christian Vigh (package author, reputation 435) - 8 years ago (2016-05-06)

    I have made a class to extract text contents from PDF files; however, it does not keep track of page numbers. Maybe it could be a first step?

    • 7 Comments
    • 1. Manuel Lemos - 8 years ago (2016-05-09)

      It would be better if you could count pages to also give the page number of each text block. Is that difficult?

    • 2. Christian Vigh (package author) - 8 years ago (2016-05-16) in reply to comment 1 by Manuel Lemos

      Well, it could range anywhere from tedious to a nightmare... :-) I'm kidding; in fact, I already put that on my to-do list when posting my initial answer because, although my original concern was only extracting text, I thought it was a good idea to be able to locate text in the whole document.

      I will add a "Pages" array property that will contain the text of individual pages. I will also add a GetPageOf ( $offset ) that will return the page number given a byte offset in the Text property. And maybe, some methods to simply find the page number(s) of some text.

      I think everything should be ready by the end of this week.
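
      The plan above can be sketched as follows. This is not the package's code: it assumes the full text is the plain concatenation of the per-page texts, with pages keyed by page number, and `pageFromOffset` is a hypothetical name.

```php
<?php
// Hypothetical helper, not part of PdfToText: maps a byte offset in the
// concatenated text back to the key (page number) of the page containing it.
// Assumes the full text is exactly the page texts joined in order.
function pageFromOffset(array $pages, int $offset): ?int
{
    if ($offset < 0) {
        return null;
    }
    $start = 0; // byte offset at which the current page begins
    foreach ($pages as $number => $text) {
        $end = $start + strlen($text);
        if ($offset < $end) {
            return $number;
        }
        $start = $end;
    }
    return null; // offset lies past the end of the document
}

$pages = [1 => "Introduction\n", 2 => "Methods and data\n", 3 => "Results\n"];
$text = implode('', $pages);

$offset = strpos($text, 'data'); // locate a string in the whole document
var_dump(pageFromOffset($pages, $offset));
```

      Walking the page lengths keeps the lookup independent of how the text was found, so the same helper serves `strpos`, `stripos`, or regular-expression offsets.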

    • 3. Manuel Lemos - 8 years ago (2016-05-17) in reply to comment 2 by Christian Vigh

      Great. That would make your package innovative. There are already classes to extract text from PDF but none would get the pages of the text objects.

    • 4. Christian Vigh (package author) - 8 years ago (2016-05-20) in reply to comment 3 by Manuel Lemos

      Hi everybody,

      I'm glad to announce that the PdfToText class is now able to retrieve the page number of any text located in a PDF document.

      Seven new methods are available to retrieve this information: GetPageFromOffset, text_strpos/text_stripos, document_strpos/document_stripos, and text_match/document_match (see README.md).

      There is also a Pages array property that holds the text contents of the individual pages in the document.
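
      The README documents the actual method signatures. The sketch below does not use the class; it only illustrates the general idea of per-page pattern matching on a plain array shaped like the Pages property described above (assumed here to be keyed by page number). The function name `matchPages` and the sample data are hypothetical.

```php
<?php
// Hypothetical helper, not PdfToText's API: runs a regular expression over
// each page and returns [page number => list of [matched text, byte offset]].
function matchPages(array $pages, string $pattern): array
{
    $results = [];
    foreach ($pages as $number => $text) {
        // PREG_OFFSET_CAPTURE makes each match a [text, offset] pair.
        if (preg_match_all($pattern, $text, $found, PREG_OFFSET_CAPTURE) > 0) {
            $results[$number] = $found[0];
        }
    }
    return $results;
}

$pages = [
    1 => "Invoice 2016-001 issued.",
    2 => "No references here.",
    3 => "See invoice 2016-002 and invoice 2016-017.",
];

$hits = matchPages($pages, '/invoice \d{4}-\d{3}/i');
foreach ($hits as $page => $matches) {
    foreach ($matches as [$match, $offset]) {
        echo "page $page, offset $offset: $match\n";
    }
}
```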

    • 5. Manuel Lemos - 8 years ago (2016-05-20) in reply to comment 4 by Christian Vigh

      That is great. I have not seen a package, in PHP or any other language, that could do that.

    • 6. Marcelo - 6 years ago (2018-07-24) in reply to comment 5 by Manuel Lemos

      Hello Christian, I have very large PDFs (200 MB) and I cannot extract all the text from them. Would you have any solution for this? These PDFs also contain images, which is why they are so large. I just need the text. Your function can read the file but cannot process it. I await your suggestion.

