fix pgbinary window bug #954

Open
Debraheem wants to merge 2 commits into main from EbF/fix_pgbinary_window_bug

@Debraheem
Member

This branch attempts to address some pgbinary bugs raised by @mathren. @mathren, can you document here how to reproduce your initial issue?

Debraheem and others added 2 commits March 31, 2026 18:32
Copying the template and setting `pgbinary_flag=.true.` in binary_job
will segfault if there isn't a pgbinary namelist in inlist_project

@VincentVanlaer
Member

Either this is a placebo and not actually fixing anything, or I am seeing a very different issue (although I have the exact same stack trace as was shown on Slack). The problem I am seeing is:

  • Initial period is set to 1e99.
  • History therefore contains 1e99.
  • pgbinary reads the history but uses single-precision floats: 1e99 is cast to INF.
  • Plot y-limit calculations break, resulting in NaN y limits.
  • Deep down in the pgplot code (the NaN just propagates), it casts the limits to integers.
  • On my system (or any other Intel-based system), the result of the cast is a large negative number.
  • This is used to index an array of pixels, resulting in the segfault.
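
The first steps of the chain above can be sketched in C (not the Fortran used by pgbinary). This is a minimal illustration assuming IEEE 754 arithmetic; `to_single` and `naive_range` are hypothetical names, not MESA or pgplot routines. The final NaN-to-integer conversion is left out on purpose, since its result is undefined in C and platform dependent.

```c
#include <math.h>
#include <stddef.h>

/* Narrow a double to single precision at run time.
   Assumes IEEE 754 arithmetic, where an out-of-range value becomes +/-inf. */
float to_single(double x) {
    volatile double d = x;   /* volatile keeps the compiler from folding the cast */
    return (float)d;
}

/* Unguarded plot range: max(y) - min(y) over n >= 1 points. */
float naive_range(const float *y, size_t n) {
    float lo = y[0], hi = y[0];
    for (size_t i = 1; i < n; ++i) {
        if (y[i] < lo) lo = y[i];
        if (y[i] > hi) hi = y[i];
    }
    return hi - lo;   /* inf - inf is NaN when every point overflowed */
}
```

With a history column that is 1e99 at every step, each point narrows to +inf, so the range is inf - inf, which is NaN, and any limit derived from it is NaN as well.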

This patch does not fix that for me (it seems unlikely that it could). The following does, however (it just guards the limit calculations against bad numbers):

diff --git a/star/private/pgstar_support.f90 b/star/private/pgstar_support.f90
index 232dada2b..48544ec06 100644
--- a/star/private/pgstar_support.f90
+++ b/star/private/pgstar_support.f90
@@ -854,17 +854,23 @@ contains
       if (use_given_ymin) then
          ymin = given_ymin
       else
-         ymin = minval(yvec(1:npts))
+         ymin = minval(yvec(1:npts), mask=.not. is_bad(yvec(1:npts)))
       end if

       use_given_ymax = abs(given_ymax + 101.0) > 1e-6
       if (use_given_ymax) then
          ymax = given_ymax
       else
-         ymax = maxval(yvec(1:npts))
+         ymax = maxval(yvec(1:npts), mask=.not. is_bad(yvec(1:npts)))
       end if
       dy = ymax - ymin

+      if (is_bad(dy)) then
+         ymax = given_ymax
+         ymin = given_ymin
+         dy = ymax - ymin
+      end if
+
       if (.not. use_given_ymin) ymin = ymin - ymargin * dy
       if (.not. use_given_ymax) ymax = ymax + ymargin * dy
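
The idea behind the masked `minval`/`maxval` calls in the diff above can be sketched in C. This is an illustration only: `finite_limits` is a hypothetical helper, and `isfinite` stands in for MESA's `is_bad`, whose exact definition is not shown in this thread.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Find the min and max of the finite entries of y[0..n-1].
   Returns false (leaving *lo and *hi untouched) if no entry is finite,
   so the caller can fall back to explicitly given limits. */
bool finite_limits(const float *y, size_t n, float *lo, float *hi) {
    bool found = false;
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(y[i])) continue;   /* skip inf and NaN */
        if (!found) {
            *lo = *hi = y[i];
            found = true;
        } else {
            if (y[i] < *lo) *lo = y[i];
            if (y[i] > *hi) *hi = y[i];
        }
    }
    return found;
}
```

A caller would use the returned limits when the function succeeds and otherwise keep whatever fallback limits it already has, which mirrors the `is_bad(dy)` branch in the diff.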

@mathren
Contributor

mathren commented Apr 2, 2026

Oh, good catch @VincentVanlaer! I tried it with the default $MESA_DIR/binary/work without changing the period, as I didn't think the value was important! Maybe in my original report I was hitting two problems at once? (Although I was also running on AMD processors, not Intel, if that matters.)

@VincentVanlaer
Member

Do you mean you only tested Eb's fix with the default inlists, or did you hit the same problem with the default inlists?

@mathren
Contributor

mathren commented Apr 2, 2026

Original issue (found in MESA 24.08.1):

I'm encountering a weird pgbinary behavior: I make a binary with period P=1d99 (this was to get single stars in a quick and dirty way with a specific setup), and with Grid1_win_flag = .true. in inlist_pgbinary everything is fine. If I turn that flag to .false. I get a segfault from pgbinary. It appears that having pgbinary_flag = .true. in binary_job without any pgbinary window and with a large period causes the segfault.

Setting pgbinary_flag to .false. causes no problems.

I originally thought it was just some interplay between the flags, unrelated to the choice of period -- this now seems incorrect.

@mathren
Contributor

mathren commented Apr 2, 2026

> Do you mean you only tested Eb's fix with the default inlists, or did you hit the same problem with the default inlists?

I only tested Eb's fix with the default inlists and found no issues once a pgbinary namelist is provided in inlist_project. With no pgbinary namelist, I also hit a segfault once I turned on pgbinary_flag in binary_job.

@Debraheem
Member Author

My fix is just a patch/guard for bad numbers, not the real solution. I believe Vincent has identified the actual source of the bad numbers. I can't test it, though, as I can't reproduce on ARM.

@VincentVanlaer
Member

@Debraheem Did your patch fix this issue for you? Cause that's what I am still somewhat confused about, since it doesn't seem that it can fix this issue.

@VincentVanlaer
Member

> > Do you mean you only tested Eb's fix with the default inlists, or did you hit the same problem with the default inlists?
>
> I only tested Eb's fix with the default inlists and found no issues once a pgbinary namelist is provided in inlist_project. With no pgbinary namelist, I also hit a segfault once I turned on pgbinary_flag in binary_job.

Do you have a workdir that reproduces this? I'm trying to get this to happen, but I don't know what exactly your setup is.

@Debraheem
Member Author

Debraheem commented Apr 3, 2026

Mathieu's original backtrace and files attached:
[home.zip](https://github.com/user-attachments/files/26450666/home.zip)

Backtrace for this error:

#0  0x7f279f0237a2 in ???
#1  0x7f279f022935 in ???
#2  0x7f279ee5a04f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x7f279ef714bb in __memset_avx2_unaligned_erms
	at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:367
#4  0x7f279f94d6f2 in ???
#5  0x7f279f94de3b in ???
#6  0x7f279f904629 in ???
#7  0x7f279f9074c3 in ???
#8  0x7f279f9076cc in ???
#9  0x7f279f9071af in ???
#10  0x7f279f907742 in ???
#11  0x7f279f93973a in ???
#12  0x442a37 in __pgbinary_support_MOD_show_box_pgbinary
	at ../private/pgbinary_support.f90:406
#13  0x488f49 in __pgbinary_history_panels_MOD_do_history_panels_plot
	at ../private/pgbinary_history_panels.f90:839
#14  0x48cafe in __pgbinary_history_panels_MOD_do_history_panels1_plot
	at ../private/pgbinary_history_panels.f90:94
#15  0x4918e9 in __pgbinary_grid_MOD_grid_plot
	at ../private/pgbinary_grid.f90:421
#16  0x494415 in __pgbinary_grid_MOD_grid1_plot
	at ../private/pgbinary_grid.f90:61
#17  0x42c177 in __pgbinary_MOD_onscreen_plots
	at /home/mrenzo/Documents/Research/codes/mesa/mesa-24.08.1/binary/make/pgbinary.f90:878
#18  0x42c37d in __pgbinary_MOD_do_pgbinary_plots
	at /home/mrenzo/Documents/Research/codes/mesa/mesa-24.08.1/binary/make/pgbinary.f90:764
#19  0x42efb6 in __pgbinary_MOD_update_pgbinary_plots
	at /home/mrenzo/Documents/Research/codes/mesa/mesa-24.08.1/binary/make/pgbinary.f90:85
#20  0x42a90a in __run_binary_support_MOD_do_run1_binary
	at ../private/run_binary_support.f90:712
#21  0x40ac4c in __binary_lib_MOD_run1_binary
	at ../public/binary_lib.f90:72
#22  0x40a59c in __run_binary_MOD_do_run_binary
	at /home/mrenzo/Documents/Research/codes/mesa/mesa-24.08.1/binary/job/run_binary.f90:7
#23  0x40a5b8 in binary_run
	at ../src/binary_run.f90:4
#24  0x40a5ef in main
	at ../src/binary_run.f90:2
./rn: line 8: 1522597 Segmentation fault      ./binary
DATE: 2026-03-25
TIME: 18:53:24

@VincentVanlaer
Member

Hashed it out with @mathren over Slack. The reproducer for the missing pgbinary namelist issue is just to add pgbinary_flag = .true. to inlist_project in the default workdir (I was still looking at the modified one, hence my confusion). The reproducer from Eb above is for a second issue, which I described in #954 (comment).

@mathren
Contributor

mathren commented Apr 3, 2026

Just for reference, to get the segfault caused by the lack of a pgbinary namelist, take $MESA_DIR/binary/work, add the line pgbinary_flag=.true. in binary_job, and erase the empty pgbinary namelist that is now present. No other change is needed. After Eb's fix, any content of the pgbinary namelist works provided the namelist is there (empty, or with a *_win_flag set to .true. or .false.; all cases with the default finite values of the binary properties worked).
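
As a rough sketch of the setup just described, the relevant part of inlist_project would look something like the fragment below. All other settings and namelists are omitted, and the trailing comments are illustrative. Deleting the `&pgbinary` block is the change that triggers the failure.

```
&binary_job
   pgbinary_flag = .true.
/ ! end of binary_job namelist

&pgbinary
/ ! end of pgbinary namelist
```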

So it appears there may be two problems at once:

  • lack of pgbinary namelist when pgbinary_flag = .true. in binary_job
  • 1d99 turning into NaN and propagating when doing crazy things with the period.

@VincentVanlaer
Member

I think the missing pgbinary namelist does not cause a segfault; it shows a backtrace nonetheless, as that is the default when mesa_error is called. I get:


Failed while trying to read pgbinary namelist file: inlist_project
Perhaps the following runtime error message will help you find the problem.

At line 1416 of file private/pgbinary_ctrls_io.f90
Fortran runtime error: End of file

Error termination. Backtrace:
#0  0x7f801f02b655 in ???
#1  0x7f801f02c219 in ???
#2  0x7f801f02cd8f in ???
#3  0x7f801f2e8a80 in ???
#4  0x7f801f2e9d5b in ???
#5  0x7f801f2ec9d4 in ???
#6  0x7f801f2ecc73 in ???
#7  0x557cbfae5aad in __pgbinary_ctrls_io_MOD_read_pgbinary_file
	at private/pgbinary_ctrls_io.f90:1416
#8  0x557cbfad680a in __pgbinary_ctrls_io_MOD_read_pgbinary_file
	at private/pgbinary_ctrls_io.f90:1432
#9  0x557cbfa9590e in __pgbinary_MOD_do_read_pgbinary_controls
	at private/pgbinary_full.f90:159
#10  0x557cbfa8fcdc in __run_binary_support_MOD_do_run1_binary
	at private/run_binary_support.f90:714
#11  0x557cbfa6f3e4 in __run_binary_MOD_do_run_binary
	at /home/vincentva/software/mesa/dev/binary/job/run_binary.f90:25
#12  0x557cbfa6ed35 in binary_run
	at src/binary_run.f90:7
#13  0x557cbfa6ed35 in main
	at src/binary_run.f90:3
make: *** [Makefile:18: run] Error 2
