Fix getTextContent evaluation to only apply TJ horizontal offsets using numeric items/args #7714
Can you provide some form of test (e.g. ref text)?
You mean a sample PDF that contains an out-of-spec TJ array? Not at the moment, the example I have is from a customer. FWIW, the change is just mirroring what getOperatorList does already.
Hmm, we might have an 'eq' test for it; it's probably a matter of adding a 'text' one.
Updated with a minimized test case. Hopefully I got the manifest entry right. FWIW, here's an easy expression to evaluate once the test case PDF is loaded in the viewer: `PDFViewerApplication.pdfDocument.getPage(1).then((p) => p.getTextContent()).then((tc) => console.log(tc.items[0]));` You'll see the sole textItem; its width will be
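To check for the symptom without inspecting items one at a time, the `items` array returned by `getTextContent()` can be filtered for widths that are not finite. This is a minimal sketch: the helper name `findNonFiniteWidths` and the sample values are hypothetical, and only the `str` and `width` fields of text items are assumed.

```javascript
// Hypothetical helper (not part of pdf.js): returns the text items whose
// `width` is not a finite number (NaN, Infinity, or missing). Entries
// without a string `str` field are skipped so only text items are checked.
function findNonFiniteWidths(items) {
  return items.filter(
    (item) => typeof item.str === "string" && !Number.isFinite(item.width)
  );
}

// Made-up items standing in for `tc.items` from getTextContent().
const sampleItems = [
  { str: "Grandes", width: 35 },
  { str: "Clienteles,", width: NaN },
];

const bad = findNonFiniteWidths(sampleItems);
console.log(bad.map((item) => item.str)); // an array containing only "Clienteles,"
```

In the viewer, the same helper could be pasted into the console and applied with `PDFViewerApplication.pdfDocument.getPage(1).then((p) => p.getTextContent()).then((tc) => console.log(findNonFiniteWidths(tc.items)));`.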
@@ -258,3 +258,5 @@
 !annotation-text-widget.pdf
 !annotation-choice-widget.pdf
 !zero_descent.pdf
+!operator-in-TJ-array.pdf
+
Nit: please remove this newline
Done.
…ng numeric items/args

While the array argument to TJ should only contain strings and numbers, other unfortunate items are found in PDFs in the wild, e.g.:

`[(Grandes) 0.0 Tc -250.0 (Client\350les,) 0.0 Tc -250.0 (Financements) 0.0 Tc -250.0 (et) 0.0 Tc -250.0 (March\351s) ] TJ`

getOperatorList already properly ignores any non-string, non-numeric values in TJ arrays; without this patch to getTextContent, returned text items can have NaN widths due to calculations being applied to those non-numeric values.
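The failure mode can be reproduced in isolation. The functions below are a hypothetical simplification, not the pdf.js code: they assume every glyph is `glyphWidth` thousandths of a text space unit wide, no character or word spacing, and 100% horizontal scaling, and they use a stand-in `StrayOperator` class for the `Tc` token left inside the array. Following the TJ convention, a numeric item is divided by 1000, scaled by the font size, and subtracted.

```javascript
// Hypothetical stand-in for an operator token (such as `Tc`) that ended up
// inside a TJ array. It is neither a string nor a number.
class StrayOperator {
  constructor(name) {
    this.name = name;
  }
}

// Simplified width model (assumption): fixed-width glyphs, no spacing.
function stringWidth(str, glyphWidth, fontSize) {
  return ((str.length * glyphWidth) / 1000) * fontSize;
}

// Naive version: treats every non-string item as a numeric offset.
function naiveTJWidth(items, glyphWidth, fontSize) {
  let width = 0;
  for (const item of items) {
    if (typeof item === "string") {
      width += stringWidth(item, glyphWidth, fontSize);
    } else {
      // An object coerces to NaN here, and NaN poisons the whole sum.
      width -= (item / 1000) * fontSize;
    }
  }
  return width;
}

// Guarded version: only numbers adjust the position; anything that is
// neither a string nor a number is skipped.
function guardedTJWidth(items, glyphWidth, fontSize) {
  let width = 0;
  for (const item of items) {
    if (typeof item === "string") {
      width += stringWidth(item, glyphWidth, fontSize);
    } else if (typeof item === "number") {
      // Positive TJ numbers move the next glyph left, so subtract.
      width -= (item / 1000) * fontSize;
    }
  }
  return width;
}

// First group of the TJ array quoted above, with \350 written as plain "e".
const sample = ["Grandes", 0.0, new StrayOperator("Tc"), -250.0, "Clienteles,"];
console.log(naiveTJWidth(sample, 500, 10)); // NaN
console.log(guardedTJWidth(sample, 500, 10)); // 92.5
```

Skipping the stray item, rather than throwing, matches what the commit message says getOperatorList already does with such arrays.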
/botio-linux preview

From: Bot.io (Linux)
Received
Command cmd_preview from @timvandermeij received. Current queue size: 0
Live output at: http://107.21.233.14:8877/4aca01e041e53d7/output.txt

From: Bot.io (Linux)
Success
Full output at http://107.21.233.14:8877/4aca01e041e53d7/output.txt
Total script time: 1.08 mins
Published
/botio test

From: Bot.io (Linux)
Received
Command cmd_test from @timvandermeij received. Current queue size: 0
Live output at: http://107.21.233.14:8877/c795d0674b05ad0/output.txt

From: Bot.io (Windows)
Received
Command cmd_test from @timvandermeij received. Current queue size: 0
Live output at: http://107.22.172.223:8877/a39b4967195cc48/output.txt

From: Bot.io (Windows)
Failed
Full output at http://107.22.172.223:8877/a39b4967195cc48/output.txt
Total script time: 25.27 mins
Image differences available at: http://107.22.172.223:8877/a39b4967195cc48/reftest-analyzer.html#web=eq.log

From: Bot.io (Linux)
Failed
Full output at http://107.21.233.14:8877/c795d0674b05ad0/output.txt
Total script time: 29.45 mins
Image differences available at: http://107.21.233.14:8877/c795d0674b05ad0/reftest-analyzer.html#web=eq.log
/botio makeref

From: Bot.io (Windows)
Received
Command cmd_makeref from @timvandermeij received. Current queue size: 0
Live output at: http://107.22.172.223:8877/cd84092ce8d9cfa/output.txt

From: Bot.io (Linux)
Received
Command cmd_makeref from @timvandermeij received. Current queue size: 0
Live output at: http://107.21.233.14:8877/5810f38e5ab44de/output.txt

From: Bot.io (Windows)
Success
Full output at http://107.22.172.223:8877/cd84092ce8d9cfa/output.txt
Total script time: 25.20 mins

From: Bot.io (Linux)
Success
Full output at http://107.21.233.14:8877/5810f38e5ab44de/output.txt
Total script time: 28.34 mins
Thank you for your contribution!