Linux Segfaults #4504

Closed
BEFH opened this issue Mar 27, 2018 · 17 comments

@BEFH

BEFH commented Mar 27, 2018

We are getting segfaults on some nodes of our cluster, but not on others, when running several pandoc versions.

Our failing nodes all have one of the following processors:

Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz,
Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz

However, it consistently segfaults on some of these nodes but not on others.

It does not segfault on our AMD Opteron(TM) Processor 6276s.

Kernel versions are as follows:
2.6.32-358.23.2.el6.x86_64,
2.6.32-696.18.7.el6.x86_64
2.6.32-358.23.2.el6.x86_64
2.6.32-358.6.2.el6.x86_64

All nodes are on CentOS 6.2

This occurs with the system pandoc (1.19.2.1), a modulecmd version (2.0), and the statically linked, pre-compiled version 2.1.3.

↪ pandoc --version
fish: 'pandoc --version' terminated by signal SIGSEGV (Address boundary error)

It also segfaults in bash:

[fultob01@interactive5 ~]$ pandoc --version
Segmentation fault

With my statically linked pandoc, I get a really nice strace. It looks like pandoc is trying to write to 0x4200000000, which is out of bounds, and bode allows the write but shouldn't, so pandoc segfaults when it attempts to read. I have no idea what the solution is for this, but for now, I'll use mothra or manda. Do you have any idea why bode is allowing pandoc to write to that address?

Here's the strace output from a failing node:

execve("/hpc/users/fultob01/local/bin/pandoc", ["pandoc"], [/* 54 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x49f0420)      = 0
set_tid_address(0x49f0458)              = 132178
brk(0)                                  = 0x6501000
brk(0x6502000)                          = 0x6502000
brk(0x6505000)                          = 0x6505000
brk(0x6508000)                          = 0x6508000
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
sysinfo({uptime=1493582, loads=[17088, 8768, 6176] totalram=67442647040, freeram=27952783360, sharedram=0, bufferram=101777408} t
brk(0x6519000)                          = 0x6519000
mmap(0x4200000000, 1099512676352, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x4200000000
mmap(0x4200000000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

On one of our working nodes, it looks like this:

execve("/hpc/users/fultob01/local/bin/pandoc", ["pandoc"], [/* 79 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x49f0420)      = 0
set_tid_address(0x49f0458)              = 113282
brk(0)                                  = 0x5e74000
brk(0x5e75000)                          = 0x5e75000
brk(0x5e78000)                          = 0x5e78000
brk(0x5e7b000)                          = 0x5e7b000
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 1999}, ...}) = 0
sysinfo({uptime=5121149, loads=[10304, 5024, 768] totalram=67440967680, freeram=43350913024, sharedram=0, bufferram=224522240} to
brk(0x5e8c000)                          = 0x5e8c000
mmap(0x4200000000, 1099512676352, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(0x4200001000, 549756862464, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(0x4200002000, 274878955520, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(0x4200003000, 137440002048, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(0x4200004000, 68720525312, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(0x4200005000, 34360786944, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x4200005000
munmap(0x4200005000, 1028096)           = 0
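
The numbers in the working-node trace fit a simple pattern: each failed reservation is retried one page higher at half the previous power of two plus 1 MiB, and the final munmap length equals the distance from the returned address to the next 1 MiB boundary. The sketch below only reproduces that arithmetic from the trace; it is not taken from the GHC runtime source, and the function names are hypothetical.

```python
MIB = 1 << 20   # 1048576, also the size of the failed commit in the failing trace
PAGE = 0x1000   # address step between retries in the working trace


def reservation_attempts(base=0x4200000000, start_log2=40, count=6):
    """Return (address, size) pairs matching the retry pattern in the trace.

    Each attempt moves one page up and asks for 2**k + 1 MiB, with k
    decreasing by one per attempt (assumption inferred from the trace).
    """
    return [(base + i * PAGE, (1 << (start_log2 - i)) + MIB) for i in range(count)]


def bytes_to_alignment(addr, alignment=MIB):
    """Number of bytes between addr and the next multiple of alignment."""
    return (-addr) % alignment


if __name__ == "__main__":
    for addr, size in reservation_attempts():
        print(hex(addr), size)
    print("trim:", bytes_to_alignment(0x4200005000))
```

On the failing node the first 1 TiB reservation succeeds, so no halving takes place, and the error appears only when the first 1 MiB of that region is committed.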
@mb21
Collaborator

mb21 commented Mar 27, 2018

What OS are you on? Can you try to compile from source on the exact OS you're trying to run it on?

If you're on Fedora, this is probably a duplicate of #4461

@BEFH
Author

BEFH commented Mar 27, 2018 via email

@jgm
Owner

jgm commented Mar 27, 2018

I agree, it would be helpful to know if the problem persists with pandoc compiled on the target system.
On the systems where you get the segfault, what causes it? Does every pandoc command cause it? You mention pandoc --version. Do you also get the segfault with, say, a simple conversion? Does it matter whether you use -s? People have had similar problems on Windows 7; see #4283.
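
A minimal sketch of such a check, assuming pandoc is on the PATH: it runs a few small commands and reports whether each exited normally or was killed by a signal. The helper names and the sample input are hypothetical; `subprocess.run` reports death by signal as a negative return code.

```python
import signal
import subprocess

# Small commands to try: version query, plain conversion, standalone conversion.
CHECKS = [
    ["pandoc", "--version"],
    ["pandoc", "-f", "markdown", "-t", "html"],
    ["pandoc", "-s", "-f", "markdown", "-t", "html"],
]


def run_and_describe(cmd, stdin_text=""):
    """Run cmd and describe how it ended (normal exit or signal)."""
    proc = subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)
    if proc.returncode < 0:
        # A negative return code means the child was terminated by that signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exit status %d" % proc.returncode


def check_pandoc():
    """Run every command in CHECKS on a one-line markdown input."""
    for cmd in CHECKS:
        try:
            print(" ".join(cmd), "->", run_and_describe(cmd, "Hello *world*\n"))
        except FileNotFoundError:
            print(" ".join(cmd), "-> pandoc not found on PATH")
```

Calling `check_pandoc()` on a failing node and on a working node gives a quick side-by-side comparison.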

@BEFH
Author

BEFH commented Mar 27, 2018

Could you please suggest some simple commands with the files to use? It segfaults even if I run pandoc with no arguments, and it segfaults from rmarkdown.

On a related note, I'm getting an "error 139" on the working nodes for some files. The command run is as follows:

/sc/orga/projects/LOAD/Brian/anaconda3/bin/pandoc +RTS -K512m -RTS Post_imputation.utf8.md --to html --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash --output /sc/orga/projects/LOAD/Brian/projects/ADSP/data/pre_impute_merge/impute_stats/stats/CHARGE_CHS_impStats.html --smart --email-obfuscation none --self-contained --standalone --section-divs --template /hpc/packages/minerva-common/rpackages/3.4.3/site-library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable 'theme:bootstrap' --include-in-header /tmp/101010540.tmpdir/RtmpomNOqg/rmarkdown-str799f60520417.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'

I couldn't find error 139 anywhere in the source, but I have read that it might be related to segfaults.

@jgm
Owner

jgm commented Mar 27, 2018 via email

@BEFH
Author

BEFH commented Mar 27, 2018

So I (with the help of a dedicated sysadmin) have confirmed that when pandoc segfaults immediately, it does so regardless of the simplicity of the command.

I have also confirmed that error 139 is added by GHC and means there is a segfault. Error 139 occurs with some files on the Intel processors that don't immediately segfault. It does not occur on our ancient, slow AMD processors.
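
For reference, 139 also matches the common shell convention of reporting 128 plus the signal number when a child is killed by a signal, and signal 11 on Linux is SIGSEGV. A small sketch of that decoding, with a hypothetical function name:

```python
import signal


def decode_exit_status(status):
    """Map a shell-style exit status above 128 to a signal name, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


if __name__ == "__main__":
    print(decode_exit_status(139))
```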

Our AMD servers have 256 GB of memory, and the Intel servers have 64 GB.

We are installing Haskell Platform 8.2.2 and compiling the latest pandoc from source, as you requested. We will let you know the results once the build succeeds.

@jgm
Owner

jgm commented Mar 28, 2018 via email

@BEFH
Author

BEFH commented Mar 28, 2018

After native compilation, it no longer segfaults. Instead, this:

pandoc: internal error: Unable to commit 1048576 bytes of memory
    (GHC version 8.2.2 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

@jgm
Owner

jgm commented Mar 28, 2018 via email

@nh2

nh2 commented Jun 14, 2018

Try running it with gdb --args and see if there's a backtrace from some C code (if you're unlucky, there is none, and it happens straight from Haskell, but if you're lucky, there's some C code in between).
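
One way to script this, as a sketch that assumes gdb is installed: build a non-interactive gdb command line that runs the program and prints a backtrace when it stops. The helper names are hypothetical; `-batch`, `-ex`, and `--args` are standard gdb options.

```python
import subprocess


def gdb_backtrace_command(program_argv):
    """Build a gdb command that runs the program, then prints a backtrace."""
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args"] + list(program_argv)


def run_under_gdb(program_argv):
    """Run the program under gdb and return gdb's combined output as text."""
    proc = subprocess.run(
        gdb_backtrace_command(program_argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.stdout
```

For example, `print(run_under_gdb(["pandoc", "--version"]))` would show where the crash occurs, if gdb can produce a backtrace at all.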

@jgm
Owner

jgm commented Oct 12, 2018

Relevant ghc ticket:
https://ghc.haskell.org/trac/ghc/ticket/15054
Looks like a bug still not fixed in ghc 8.6.1.

@billglick

Are there any known workarounds to prevent the issue?

I'm running into it on several VMs running RHEL 6 with various memory footprints (24GB, 48GB, 52GB, 64GB, etc.) with 50% or more free memory. I can't figure out why it reports "Cannot allocate memory" on some, but works fine on others.
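
One thing that may be worth comparing between working and failing machines is the per-process address-space limit and the kernel overcommit mode, since either can make mmap return ENOMEM while free memory remains. This is an assumption about where to look, not a cause established in this thread. A minimal sketch, with a hypothetical function name:

```python
import resource


def memory_limits():
    """Return the address-space limits and the kernel overcommit mode."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    try:
        # 0 = heuristic, 1 = always overcommit, 2 = strict accounting.
        with open("/proc/sys/vm/overcommit_memory") as fh:
            overcommit = int(fh.read().strip())
    except (OSError, ValueError):
        overcommit = None  # not readable on this system

    return {"as_soft": fmt(soft), "as_hard": fmt(hard), "overcommit_memory": overcommit}


if __name__ == "__main__":
    print(memory_limits())
```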

@mahermassoud

I'm having the same issue, even when I just run the pandoc CLI command on its own:

$ pandoc
pandoc: internal error: Unable to commit 1048576 bytes of memory
    (GHC version 8.10.1 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
Aborted (core dumped)

@jgm
Owner

jgm commented Feb 18, 2021

@mahermassoud please give more information: how exactly you installed pandoc, which version it is, and what architecture and OS you're using.

I note that the ghc issue linked above is still open.

@mahermassoud

@jgm I believe I'm on Red Hat because yum is installed

$ uname -a
Linux polaris.pbtech 2.6.32-642.11.1.el6.x86_64 #1 SMP Fri Nov 18 19:25:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I'm in a totally new conda environment where I installed R and RStudio using
conda install rstudio

No idea which version it is.

Let me know if you need more info

I got a similar issue installing with
pip install pandoc

@jgm
Owner

jgm commented Feb 19, 2021

pip install pandoc installs a python library, not the pandoc executable.
Sorry, I can't help more without knowing more details. I have no idea what rstudio is doing to install pandoc, how their pandoc is compiled, etc.

@jgm
Owner

jgm commented Oct 3, 2022

Closing; the upstream ghc issue has been fixed for a long time.

@jgm jgm closed this as completed Oct 3, 2022