
MWE with multiple window arrays in one-sided comm #807

Closed
elbert5770 opened this issue Dec 17, 2023 · 7 comments

elbert5770 commented Dec 17, 2023

I suppose this isn't unexpected, but here is an MWE of a case where two types of window arrays are created and the program hangs if line 40 (MPI.free(win[1])) is commented out. It also doesn't hang if an MPI.free(win[1]) is placed before the MPI.free(win_owner_match[i]) loop. The docs are clear that there is no guarantee that MPI.Finalize() will close windows, so that's fine. But it would be interesting if someone with better debugging tools, such as TotalView, could make sure this isn't more serious than it looks.

using MPI

function read_window(target_name,window,window_number,rank,comm)
    if rank == 0
        window[window_number] = MPI.Win_create(target_name, comm)
    else
        null_array = zeros(Int32,1)
        window[window_number] = MPI.Win_create(null_array, comm)
    end

    MPI.Barrier(comm)

    if rank != 0
        MPI.Win_lock(window[window_number];rank=0,type=:shared)
        MPI.Get!(target_name, window[window_number];rank=0)
        MPI.Win_unlock(window[window_number];rank=0)
    end
end

function main()
    MPI.Init()
    comm = MPI.COMM_WORLD
    rank = MPI.Comm_rank(comm)
    world_size = MPI.Comm_size(comm)
    root = 0
    num_cells = 5

    owner_neighbor_ptr = Array{Int32}(undef,num_cells)

    if rank == root
        owner_neighbor_ptr = Int32[2 2 0 1 1]
    end

    win = Vector{MPI.Win}(undef,1)
    window_number = 1
    read_window(owner_neighbor_ptr,win,window_number,rank,comm)
    @show rank,owner_neighbor_ptr

    # Uncomment next line for the program to run successfully
    # MPI.free(win[1])

    win_owner_match = Vector{MPI.Win}(undef,num_cells)
    null_array = zeros(Int32,1)
    
    if rank == root
        match_collection = Vector{Array{Float32}}(undef,num_cells)
        for i in 1:num_cells
            match_collection[i] = rand(i*3)
        end
        @show match_collection
    end
    
    for i in 1:num_cells
        if rank == 0
            if owner_neighbor_ptr[i] > 0
                @show rank,i,match_collection[i]
                win_owner_match[i] = MPI.Win_create(match_collection[i], comm)
            else
                win_owner_match[i] = MPI.Win_create(null_array, comm)
            end
        else
            win_owner_match[i] = MPI.Win_create(null_array, comm)
        end
        MPI.Barrier(comm)
    end
    @show rank,size(win_owner_match)
    count = 0
    if rank == 1
        for owner_case in 1:num_cells
            if owner_neighbor_ptr[owner_case] > 0
                neighbor_matches = Vector{Float32}(undef,owner_case*3)
                MPI.Win_lock(win_owner_match[owner_case];rank=0,type=:shared)
                MPI.Get!(neighbor_matches, win_owner_match[owner_case];rank=0)
                MPI.Win_unlock(win_owner_match[owner_case];rank=0)
                @show owner_case,neighbor_matches
            end
        end
    else
        @show rank
    end

    MPI.Barrier(comm)
    # Or, uncomment next line for the program to run successfully
    # MPI.free(win[1])
    for i in 1:num_cells
        MPI.free(win_owner_match[i])
    end

    MPI.Finalize()
end

main()
@elbert5770 (Author)

Furthermore, calling
size(win_owner_match[i])
immediately after
win_owner_match[i] = MPI.Win_create(match_collection[i], comm)
causes the program to hang.

@simonbyrne (Member)

I suppose this isn't unexpected, but here is an MWE of a case where two types of window arrays are created and the program hangs if line 40 (MPI.free(win[1])) is commented out

I think the issue is that the spec states that MPI_Win_free must be called collectively: if freeing is left to the GC, then different ranks may run garbage collection at different times (in this case, probably before or after the MPI.Barrier), which can lead to a deadlock.

This should probably be documented.
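To illustrate the point, here is a minimal sketch of the safe pattern (a hypothetical two-rank program, not taken from the MWE above): every rank calls MPI.free at the same program point, rather than leaving the window to a finalizer.

```julia
# Sketch only; run with e.g. `mpiexec -n 2 julia collective_free.jl`.
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

buf = rank == 0 ? Int32[1, 2, 3] : zeros(Int32, 3)
win = MPI.Win_create(buf, comm)   # collective: every rank participates

if rank != 0
    out = zeros(Int32, 3)
    MPI.Win_lock(win; rank=0, type=:shared)
    MPI.Get!(out, win; rank=0)
    MPI.Win_unlock(win; rank=0)
end

# Collective free on EVERY rank, at the same point. If this is left to
# the GC, ranks may reach it at different times relative to the Barrier
# below, and the program can deadlock.
MPI.free(win)

MPI.Barrier(comm)
MPI.Finalize()
```

The key design point is that the explicit MPI.free sits before the next collective operation, so no rank can enter the Barrier while another is still waiting inside the free.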

@simonbyrne (Member)

What does

size(win_owner_match[i])

do? I don't think we define a size method for MPI.Win objects.

@elbert5770 (Author)

Yeah, the documentation for Win_create says, "MPI.free should be called on the Win object once operations have been completed." I think MUST is a better word than should. The challenge is that the garbage collector will eventually pick it up, so it looks like MPI.Finalize is doing the cleanup when it is not. These are minor points that cost me time, almost entirely because the program hangs instead of crashing.

@simonbyrne (Member)

Would you mind making a change to the documentation?

We could make it so that if the GC is invoked and the object hasn't been cleaned, then we print a warning?
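As a rough illustration of what that warning could look like (a hypothetical wrapper type, not MPI.jl's actual Win implementation): attach a finalizer that only complains when the handle was never explicitly freed.

```julia
# Hypothetical sketch of warn-on-GC behaviour; MPI.jl's real Win type
# is more involved and this is not its implementation.
mutable struct TrackedWin
    freed::Bool
    function TrackedWin()
        w = new(false)
        # The finalizer runs when the GC collects an unreachable handle.
        finalizer(w) do obj
            if !obj.freed
                # Plain Core.println: finalizers must not yield, so we
                # avoid the task-switching @warn machinery here.
                Core.println(Core.stderr,
                    "warning: window collected by GC without an explicit " *
                    "free; free must be called collectively on all ranks")
            end
        end
        return w
    end
end

# Stands in for MPI.free: marks the handle so the finalizer stays quiet.
function explicit_free!(w::TrackedWin)
    w.freed = true
    return nothing
end
```

The cost on the happy path is a single boolean check in the finalizer, so the overhead concern mentioned below should be small.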

@elbert5770 (Author)

Change pushed. Perhaps it's worth checking in the GC, but I'd hate to give up any speed.

@simonbyrne (Member)

Change pushed. Perhaps worth checking in the GC, but I'd hate to give up any speed.

Where did you push it? Can you open a pull request?
