-
Notifications
You must be signed in to change notification settings - Fork 56
examples: Add a ping-pong example supporting multiple connections #783
Conversation
1c2d6ce
to
b6d4021
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 6 files at r1.
Reviewable status: 3 of 6 files reviewed, 12 unresolved discussions (waiting on @ultra-nexus)
examples/10-messages-ping-pong-with-multiple-connections/client.c, line 80 at r1 (raw file):
if (send_ptr == NULL || recv_ptr == NULL) { ret = -1; goto err_peer_delete;
If either send_ptr or recv_ptr is not NULL you have to free it. I know we are closing the app but otherwise, it is hard to track memory leaks e.g. using Valgrind. A good practice.
You can see example-08 for what I would recommend.
examples/10-messages-ping-pong-with-multiple-connections/client.c, line 109 at r1 (raw file):
if (rpma_conn_completion_get(conn, &cmpl)) goto err_conn_disconnect; if (cmpl.op != RPMA_OP_SEND && cmpl.op_status != IBV_WC_SUCCESS) {
Sadly you can not assume the completion for RPMA_OP_SEND will come before RPMA_OP_RECV.
You can either send the message with RPMA_F_COMPLETION_ON_ERROR or you have to make here a more versatile loop for both completions in whatever order they might have come.
examples/10-messages-ping-pong-with-multiple-connections/client.c, line 147 at r1 (raw file):
printf("Fail to send name to server: op=%d status=%d\n", cmpl.op, cmpl.op_status); goto err_conn_disconnect;
Not needed. The label is just after the if-statement.
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 75 at r1 (raw file):
/* allocate a memory */ size_t dst_size = MAX_MSG_SIZE * CLIENT_MAX; svr->src_ptr = "What's your name?";
You can not do this. Such a memory is potentially not aligned to the page.
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 84 at r1 (raw file):
/* register the memory */ ret = rpma_mr_reg(peer, svr->src_ptr, MAX_MSG_SIZE, RPMA_MR_USAGE_SEND, &svr->src_mr);
I would like to have at least a common naming convention between the client and the server of the same example.
If this memory is used for sending, the client names such a memory send_ptr
and send_mr
.
Please unify the names.
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 111 at r1 (raw file):
/* deregister the memory region */ int ret = rpma_mr_dereg(&svr->src_mr); ret |= rpma_mr_dereg(&svr->dst_mr);
When you do such a thing the return value makes no sense so there is no point in writing an exact errno value later on.
Either do the same: ret |= close(svr->epoll);
or make sure the return value makes any sense from the begining.
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 211 at r1 (raw file):
/* no completion is ready - continue */ if (ret == RPMA_E_NO_COMPLETION) goto cont;
Why not just return;
?
FYI we accept goto x
only for plain error handling not for actual execution logic.
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 241 at r1 (raw file):
goto err; /* we need to wait for client's name */ goto cont;
return?
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 248 at r1 (raw file):
(char *)svr->dst_ptr + clnt->offset); /* this client is done */ goto done;
I'm not yet sure but probably you can mark somehow the client when it is done or has to be disconnected because of an error and handle the cleanup outside of this handler.
Because having this err/done label makes this function messy and it serves two functions so... it probably should be splitted.
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 252 at r1 (raw file):
} else if (cmpl.op == RPMA_OP_SEND && cmpl.op_status == IBV_WC_SUCCESS) { goto cont;
return?
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 264 at r1 (raw file):
done: err: (void) rpma_conn_disconnect(clnt->conn);
I think you probably have to do more than that. Anything from client_delete()
?
examples/10-messages-ping-pong-with-multiple-connections/server.c, line 292 at r1 (raw file):
case RPMA_CONN_ESTABLISHED: printf("Client-[%lu] connected\n", clnt->client_id); client_connected_cnt++;
I liked more the handler-based architecture the base implementation has. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 14 unresolved discussions (waiting on @ultra-nexus)
a discussion (no related file):
Since this example is aimed to be a SRQ example in the future, I think the name should reflect what it is all about e.g. 10-shared-receive-queue.
a discussion (no related file):
Do you expect this example to be merged before the actual SRQ functionality? Or you will be adjusting this example along with developing SRQ and you want both of them to be merged at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 14 unresolved discussions (waiting on @janekmi and @ultra-nexus)
a discussion (no related file):
Previously, janekmi (Jan Michalski) wrote…
Since this example is aimed to be a SRQ example in the future, I think the name should reflect what it is all about e.g. 10-shared-receive-queue.
I once thought I need to create two examples, one for non-SRQ and the other for SRQ. It should be OK to use one example for these two purpose as the code in these two example is almost identical.
a discussion (no related file):
Previously, janekmi (Jan Michalski) wrote…
Do you expect this example to be merged before the actual SRQ functionality? Or you will be adjusting this example along with developing SRQ and you want both of them to be merged at the same time?
I think it would be better merge them at the same time. I'm not sure whether I will update this example during SRQ development. Thanks.
examples/10-messages-ping-pong-with-multiple-connections/client.c, line 109 at r1 (raw file):
Previously, janekmi (Jan Michalski) wrote…
Sadly you can not assume the completion for RPMA_OP_SEND will come before RPMA_OP_RECV.
You can either send the message with RPMA_F_COMPLETION_ON_ERROR or you have to make here a more versatile loop for both completions in whatever order they might have come.
Thanks! This is extremely helpful! I'm currently struggling to fix a strange problem in this example with SRQ enabled when running on real InfiniBand hardware, but fail to figure out what's wrong. After change my code according to this comment, the problem is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 6 files reviewed, 13 unresolved discussions (waiting on @ultra-nexus)
a discussion (no related file):
Previously, ultra-nexus wrote…
I once thought I need to create two examples, one for non-SRQ and the other for SRQ. It should be OK to use one example for these two purpose as the code in these two example is almost identical.
I think examples are to show customers how to use the available features. So, an example showing how to use SRQ is fine. :-) Please rename.
a discussion (no related file):
Previously, ultra-nexus wrote…
I think it would be better merge them at the same time. I'm not sure whether I will update this example during SRQ development. Thanks.
Ok. So I will take a look here from time to time and mark this PR as WIP.
examples/10-messages-ping-pong-with-multiple-connections/client.c, line 109 at r1 (raw file):
Previously, ultra-nexus wrote…
Thanks! This is extremely helpful! I'm currently struggling to fix a strange problem in this example with SRQ enabled when running on real InfiniBand hardware, but fail to figure out what's wrong. After change my code according to this comment, the problem is fixed.
np. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r1.
Reviewable status: 5 of 6 files reviewed, 13 unresolved discussions (waiting on @ultra-nexus)
Codecov Report
@@ Coverage Diff @@
## master #783 +/- ##
==========================================
- Coverage 99.83% 99.82% -0.01%
==========================================
Files 15 15
Lines 1248 1176 -72
==========================================
- Hits 1246 1174 -72
Misses 2 2 |
- This example is based on the example-06 Signed-off-by: Hao Li <[email protected]>
@ultra-nexus, I hope you are doing well. We miss you. I will be pleased to reopen this ancient PR for you when you will be ready. Cheers! |
Signed-off-by: Hao Li [email protected]
This change is