This repository has been archived by the owner on Apr 15, 2024. It is now read-only.
I am trying to get the text of this PDF: factuur___0098559514___20170303.pdf.
It fails with `pdfminer.pdftypes.PDFNotImplementedError: Unsupported filter: PDFObjRef:18`. However, on inspecting the object `PDFObjRef:18`, I found that calling its `resolve` method returns a one-item list containing a valid filter. When this resolved filter is used instead of the reference object, pdfminer runs without error.
How to reproduce: running `pdf2txt.py -o text.txt factuur___0098559514___20170303.pdf` results in the error above.
Change made to the `decode` method in module `pdftypes` so that it runs without error:
```python
def decode(self):
    assert self.data is None and self.rawdata is not None
    data = self.rawdata
    if self.decipher:
        # Handle encryption
        data = self.decipher(self.objid, self.genno, data)
    filters = self.get_filters()
    if not filters:
        self.data = data
        self.rawdata = None
        return
    for f in filters:
        params = self.get_any(('DP', 'DecodeParms', 'FDecodeParms'), {})
        # ---- difference with original `decode` method starts here ----
        try:
            # f may be an indirect reference whose target is a one-item list
            f = f.resolve()[0]
        except AttributeError:
            # f is already a plain filter literal
            pass
        # ---- and ends here --------------------------------------------
        if f in LITERALS_FLATE_DECODE:
            # will get errors if the document is encrypted.
            try:
                data = zlib.decompress(data)
            except zlib.error as e:
                if STRICT:
                    raise PDFException('Invalid zlib bytes: %r, %r' % (e, data))
                data = ''
        elif f in LITERALS_LZW_DECODE:
            data = lzwdecode(data)
        elif f in LITERALS_ASCII85_DECODE:
            data = ascii85decode(data)
        elif f in LITERALS_ASCIIHEX_DECODE:
            data = asciihexdecode(data)
        elif f in LITERALS_RUNLENGTH_DECODE:
            data = rldecode(data)
        elif f in LITERALS_CCITTFAX_DECODE:
            data = ccittfaxdecode(data, params)
        elif f == LITERAL_CRYPT:
            # not yet..
            raise PDFNotImplementedError('/Crypt filter is unsupported')
        else:
            raise PDFNotImplementedError('Unsupported filter: %r' % f)
        # apply predictors
        if 'Predictor' in params:
            pred = int_value(params['Predictor'])
            if pred == 1:
                # no predictor
                pass
            elif 10 <= pred:
                # PNG predictor
                colors = int_value(params.get('Colors', 1))
                columns = int_value(params.get('Columns', 1))
                bitspercomponent = int_value(params.get('BitsPerComponent', 8))
                data = apply_png_predictor(pred, colors, columns, bitspercomponent, data)
            else:
                raise PDFNotImplementedError('Unsupported predictor: %r' % pred)
    self.data = data
    self.rawdata = None
    return
```
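The change above takes only the first element of the resolved list, which is enough for this PDF. A slightly more general idea, sketched below, would be to flatten the filter list before the loop so that a reference resolving to several filters is also handled. This is only an illustration and not pdfminer code: `StubObjRef` and `flatten_filters` are hypothetical names, and the stub merely assumes that a reference object exposes a `resolve()` method, as `PDFObjRef:18` does here.

```python
class StubObjRef(object):
    """Hypothetical stand-in for an indirect reference (not pdfminer's class)."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # Return whatever object the reference points at.
        return self._target


def flatten_filters(filters):
    """Return a flat list of filter names.

    Follows anything that has a resolve() method and expands nested
    lists or tuples, so every returned item is a direct filter object.
    """
    flat = []
    for f in filters:
        # Follow indirect references until a direct object is reached.
        while hasattr(f, 'resolve'):
            f = f.resolve()
        if isinstance(f, (list, tuple)):
            flat.extend(flatten_filters(f))
        else:
            flat.append(f)
    return flat


# The situation described in this issue: one reference to a one-item list.
print(flatten_filters([StubObjRef(['FlateDecode'])]))
```

With such a helper, the `for f in filters:` loop could iterate over the flattened list and the `try`/`except AttributeError` block would no longer be needed. Whether this belongs in `get_filters` or in `decode` is a design choice left open here.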