Implement GET protocol for dependencies #420
Have you looked at performance at all? I think I've discussed the implications for communication prioritization with @bosilca ad nauseam. I might pull a version of this into my branch at some point so I can test.
There could very well be a bug in the GET implementation for the MPI comm engine that has gone undiagnosed, since it has not been well tested.
I implemented this to have a better baseline in the comparison with TTG (which does GET instead of PUT). As far as I remember, there was little to no benefit in terms of performance (it didn't get worse, though).
That's relevant! How much did you scale and were you using [...] George's hypotheses regarding this are, if I recall correctly, along the lines that a sender can regulate how much data it pushes onto the network more easily than a receiver can. With a GET protocol the sender has less ability to prioritize communications, so multiple receivers can end up competing for a sender's bandwidth. On the other hand, a PUT protocol can overwhelm a receiver with many incoming messages, but that shouldn't be the case for most PaRSEC applications since the receiver also regulates which data it requests to be sent.
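To make the prioritization argument concrete, here is a toy model, not PaRSEC code. It assumes a sender whose link serves one transfer at a time. The "GET-like" case serves requests in arrival order, because the sender cannot reorder what receivers pull. The "PUT-like" case lets the sender serve the highest priority first. `xfer_t`, `finish_fifo`, and `finish_priority` are hypothetical names, and real networks share bandwidth rather than serializing transfers, so treat this only as an illustration of the hypothesis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transfer descriptor: higher priority means more urgent;
 * cost is the time the transfer occupies the sender's link. */
typedef struct {
    int priority;
    int cost;
} xfer_t;

/* GET-like toy: transfers are served in arrival (array) order.
 * Returns the time at which x[target] finishes. */
static int finish_fifo(const xfer_t *x, size_t n, size_t target)
{
    assert(target < n);
    int t = 0;
    for (size_t i = 0; i <= target; i++)
        t += x[i].cost;
    return t;
}

/* PUT-like toy: the sender serves the highest priority first and
 * breaks ties by arrival order. Returns the finish time of x[target]. */
static int finish_priority(const xfer_t *x, size_t n, size_t target)
{
    assert(target < n);
    int t = x[target].cost;
    for (size_t i = 0; i < n; i++) {
        if (i == target)
            continue;
        /* Everything that is served before the target delays it. */
        if (x[i].priority > x[target].priority ||
            (x[i].priority == x[target].priority && i < target))
            t += x[i].cost;
    }
    return t;
}
```

With two low-priority transfers of cost 4 queued ahead of an urgent transfer of cost 2, the urgent one finishes at time 10 in the FIFO model and at time 2 when the sender can reorder.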
83623c1 to 1b856ca
Sweet, it seems to have been a problem with the termination detection @therault :) All checks pass now.
Adds runtime_comm_get MCA parameter to enable use of the GET protocol. Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
1b856ca to edbd8fe
Argh, of course not: CI doesn't test the GET protocol. Tests fail if run with
I think I might have found the bug. When a process gets too many internal GET AMs (or is asked to do too many PUTs), it defers starting the [...] Like I said, no one has really tested GET.
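For readers unfamiliar with the deferral pattern mentioned above, here is a generic sketch: cap the number of in-flight transfers, queue the rest, and restart one queued transfer whenever a slot frees up. This is not the MPI comm engine's code. `throttle_t`, its fields, and `MAX_DEFERRED` are hypothetical names chosen for illustration. The step that is easy to miss in such schemes is the restart in the completion path: if any completion path skips it, deferred transfers never start.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFERRED 64 /* hypothetical capacity of the deferral queue */

/* Hypothetical throttle state; names and layout are illustrative only. */
typedef struct {
    int in_flight;              /* transfers currently started */
    int max_in_flight;          /* cap on concurrent transfers */
    int deferred[MAX_DEFERRED]; /* FIFO ring of deferred transfer ids */
    size_t head;                /* index of the oldest deferred id */
    size_t count;               /* number of deferred ids */
} throttle_t;

static void throttle_init(throttle_t *t, int max_in_flight)
{
    t->in_flight = 0;
    t->max_in_flight = max_in_flight;
    t->head = 0;
    t->count = 0;
}

/* Returns 1 if the transfer starts now, 0 if it was deferred,
 * and -1 if the deferral queue is full. */
static int throttle_submit(throttle_t *t, int id)
{
    if (t->in_flight < t->max_in_flight) {
        t->in_flight++;
        return 1;
    }
    if (t->count == MAX_DEFERRED)
        return -1;
    t->deferred[(t->head + t->count) % MAX_DEFERRED] = id;
    t->count++;
    return 0;
}

/* Call when a transfer completes. Returns the id of the deferred
 * transfer started in the freed slot, or -1 if none was waiting. */
static int throttle_complete(throttle_t *t)
{
    assert(t->in_flight > 0);
    t->in_flight--;
    if (t->count == 0)
        return -1;
    int id = t->deferred[t->head];
    t->head = (t->head + 1) % MAX_DEFERRED;
    t->count--;
    t->in_flight++; /* the restarted transfer takes the freed slot */
    return id;
}
```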
Sigh, thanks for checking @omor1. I'm fairly sure I had it working at some point before the big merge. The MPI backend is still student research quality, at best. I hate the fact that we're putting data into random fields; it makes the code unmaintainable. I guess it's good to have a pointer though, will have to take a closer look at it again...
"research quality" The LCI backend is better, in my humble opinion, and is certainly better-documented. It still has a decent amount of jankiness from various things I tried and haven't fully ripped out, but is more maintainable.
parsec_type_size(dtt, &dtt_size);
parsec_ce.mem_register(dataptr, PARSEC_MEM_TYPE_CONTIGUOUS,
                       -1, NULL,
                       dtt_size,
Aren't we missing the count here? parsec_type_size returns the size of a single element of the dtt type, but it does not account for nbdtt, so we need to scale it up for the mem_register in the contiguous case.
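A minimal sketch of the scaling being asked for, assuming the registered region is `nbdtt` contiguous elements of `dtt_size` bytes each. `contiguous_reg_size` is a hypothetical helper, not a PaRSEC function, and this is not the fix from any particular branch. It only shows the arithmetic, with an overflow guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PaRSEC): number of bytes to register
 * for `count` contiguous elements whose single-element size is
 * `elem_size` bytes. Returns 0 for a negative count or on overflow. */
static size_t contiguous_reg_size(size_t elem_size, int64_t count)
{
    if (count < 0)
        return 0;
    if (elem_size != 0 && (uint64_t)count > SIZE_MAX / elem_size)
        return 0; /* elem_size * count would not fit in size_t */
    return elem_size * (size_t)count;
}
```

In the snippet under review, the value passed as the size argument would then be something like `contiguous_reg_size(dtt_size, nbdtt)` instead of `dtt_size` alone.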
Yes—this has long been fixed on my branch, see 948aa58.
@omor1 if you have a fix, would you mind upstreaming it to this branch?
@omor1 any fixes that you have are more than welcome.
/* Retrieve deps from callback_data */
remote_dep_cb_data_t *cb_data = (remote_dep_cb_data_t *)msg;
parsec_remote_deps_t* deps = cb_data->deps;
parsec_execution_stream_t* es = &parsec_comm_es;
ce->mem_unregister(cb_data->memory_handle);
parsec_thread_mempool_free(parsec_remote_dep_cb_data_mempool->thread_mempools, cb_data);

parsec_comm_puts--;
I don't think that any of parsec_comm_gets_max, parsec_comm_gets, parsec_comm_puts_max, and parsec_comm_puts are actually being used in any way, so the point is moot—the number of concurrent communications is being managed by each communication engine, not at the upper layer.
parsec_type_size(dtt, &dtt_size);
parsec_ce.mem_register(PARSEC_DATA_COPY_GET_PTR(deps->output[k].data.data), PARSEC_MEM_TYPE_CONTIGUOUS,
                       -1, NULL,
                       dtt_size,
Same comment as above: we need to account for nbdtt in the contiguous case.
                       receiver_memory_handle,
                       receiver_memory_handle_size );

// TODO: fix the profiling!
There are still TODOs left in the code.
Avoids a round-trip by directly fetching data when a dependency release arrives.
Adds runtime_comm_get MCA parameter to enable use of the GET protocol. It is currently enabled so that CI exercises it. I'm not sure I am using the right datatypes, since the reshape and redistribute tests are failing...
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>