Pdfminer incorrectly raises PDFNotImplementedError: Unsupported filter error #174

marcelhekking · 2017-03-08T10:36:17Z

I am trying to get the text of this PDF:
factuur___0098559514___20170303.pdf.
It results in an error (pdfminer.pdftypes.PDFNotImplementedError: Unsupported filter: PDFObjRef:18). However, inspecting object PDFObjRef:18 showed that calling its resolve method returns a one-item list containing a valid filter. When that resolved filter is used instead of the reference object, pdfminer runs without error.

How to reproduce:
pdf2txt.py -o text.txt factuur___0098559514___20170303.pdf results in an error.

Change made to module pdftypes in order to run it without error:

    def decode(self):
        assert self.data is None and self.rawdata is not None
        data = self.rawdata
        if self.decipher:
            # Handle encryption
            data = self.decipher(self.objid, self.genno, data)
        filters = self.get_filters()
        if not filters:
            self.data = data
            self.rawdata = None
            return
        for f in filters:
            params = self.get_any(('DP', 'DecodeParms', 'FDecodeParms'), {})

            # ----difference with original `decode` method starts here--------
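            try:
                # f is an indirect reference that resolves to a one-item list
                f = f.resolve()[0]
            except AttributeError:
                # f is already a direct filter name; use it unchanged
                pass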
            # ----and ends here-----------------------------------------------

            if f in LITERALS_FLATE_DECODE:
                # will get errors if the document is encrypted.
                try:
                    data = zlib.decompress(data)
                except zlib.error as e:
                    if STRICT:
                        raise PDFException('Invalid zlib bytes: %r, %r' % (e, data))
                    data = ''
            elif f in LITERALS_LZW_DECODE:
                data = lzwdecode(data)
            elif f in LITERALS_ASCII85_DECODE:
                data = ascii85decode(data)
            elif f in LITERALS_ASCIIHEX_DECODE:
                data = asciihexdecode(data)
            elif f in LITERALS_RUNLENGTH_DECODE:
                data = rldecode(data)
            elif f in LITERALS_CCITTFAX_DECODE:
                data = ccittfaxdecode(data, params)
            elif f == LITERAL_CRYPT:
                # not yet..
                raise PDFNotImplementedError('/Crypt filter is unsupported')
            else:
                raise PDFNotImplementedError('Unsupported filter: %r' % f)
            # apply predictors
            if 'Predictor' in params:
                pred = int_value(params['Predictor'])
                if pred == 1:
                    # no predictor
                    pass
                elif 10 <= pred:
                    # PNG predictor
                    colors = int_value(params.get('Colors', 1))
                    columns = int_value(params.get('Columns', 1))
                    bitspercomponent = int_value(params.get('BitsPerComponent', 8))
                    data = apply_png_predictor(pred, colors, columns, bitspercomponent, data)
                else:
                    raise PDFNotImplementedError('Unsupported predictor: %r' % pred)
        self.data = data
        self.rawdata = None
        return
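
The workaround above assumes the reference always resolves to a one-item list. A slightly more general approach, sketched below, is to normalise the filter entries before the loop so that each one is a direct filter name. This is only an illustration, not pdfminer's actual code: `IndirectRef`, `resolve_all` and `normalize_filters` are hypothetical names, and `IndirectRef` merely stands in for pdfminer's `PDFObjRef` so the example runs on its own. pdfminer's `pdftypes` module has a `resolve1` helper that appears to play a role similar to `resolve_all`.

```python
class IndirectRef:
    """Hypothetical stand-in for pdfminer's PDFObjRef: a value behind resolve()."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        return self._target


def resolve_all(obj):
    """Follow indirect references until a direct object is reached."""
    while hasattr(obj, "resolve"):
        obj = obj.resolve()
    return obj


def normalize_filters(raw):
    """Return a flat list of direct filter names.

    `raw` may be None, a single filter, a list of filters, or an indirect
    reference to any of those; list items may themselves be indirect.
    """
    raw = resolve_all(raw)
    if raw is None:
        return []
    if not isinstance(raw, (list, tuple)):
        raw = [raw]
    flat = []
    for item in raw:
        item = resolve_all(item)
        if isinstance(item, (list, tuple)):
            # An entry that resolved to a list, as in the PDF reported here.
            flat.extend(resolve_all(sub) for sub in item)
        else:
            flat.append(item)
    return flat


# The case from this issue: one entry that is a reference to a one-item list.
filters = normalize_filters([IndirectRef(["FlateDecode"])])
```

With the filters normalised this way, the `for f in filters:` loop in `decode` would need no try/except around `resolve`, and a reference resolving to several filters would not silently lose all but the first.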

Pique7 mentioned this issue Nov 19, 2024

PDFNotImplementedError: Unsupported filter: [/'FlateDecode'] pdfminer/pdfminer.six#1062

Open

dhdaines mentioned this issue Nov 27, 2024

Resolve filters before checking if it isn't a list dhdaines/playa#22

Merged
