Before, it waited until it was really connected. That caused problems,
because the send side starts sending immediately once it is connected.
Especially at high rates (>100k), this was a problem.
As discussed in issue #186 and on IM, this function now checks every
2048th cycle whether the thread should be canceled.
This also removes the need for 'kill -9' in the integration test.
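
A minimal sketch of such a periodic cancellation check (illustrative
names, not the actual VILLASnode code):

    #include <pthread.h>
    #include <stdint.h>

    static void *poll_thread(void *ctx)
    {
        for (uint64_t cycle = 0; ; cycle++) {
            /* ... poll the Completion Queue here ... */

            /* Testing on every cycle would cost too much at high
             * rates, so only every 2048th cycle acts as a
             * cancellation point for a pending pthread_cancel(). */
            if (cycle % 2048 == 0)
                pthread_testcancel();
        }
    }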
Before this commit, the InfiniBand node recreated the address handle for
the remote node during every cycle. Now, it creates the handle only once,
directly after it receives ah_attr.
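
A minimal sketch of this one-time creation, assuming pd, ah_attr, and the
cached handle are provided by the node:

    #include <infiniband/verbs.h>

    /* Create the address handle exactly once, as soon as ah_attr is
     * known; afterwards, every cycle reuses the cached handle. */
    static struct ibv_ah *get_ah(struct ibv_pd *pd,
                                 struct ibv_ah_attr *ah_attr,
                                 struct ibv_ah **cached)
    {
        if (!*cached)
            *cached = ibv_create_ah(pd, ah_attr);

        return *cached;
    }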
* Meta data was not included in the calculation which determines whether
  a sample should be sent inline. This caused errors (see the sketch
  below).
* Meta data was not subtracted from sample->length on the receive side.
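
A minimal sketch of the corrected size check; meta_size and
max_inline_data are illustrative parameters, the latter standing for the
Queue Pair's reported inline capacity:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* A sample may only be sent inline if meta data AND payload fit
     * into the Queue Pair's inline buffer. Leaving the meta data out
     * of this sum lets too-large samples slip through. */
    static bool send_inline(size_t meta_size, size_t payload_size,
                            uint32_t max_inline_data)
    {
        return meta_size + payload_size <= max_inline_data;
    }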
ib_write() and ib_read() now point to the sequence, ts_origin, and format
members of struct sample, each in a separate scatter/gather element.
ib_read() measures the time with time_now() (from villas/timing.h) and
sets all flags on the receive side.
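
Roughly, the scatter/gather layout looks like the following sketch; the
struct sample stand-in is simplified for illustration, and lkey is the
key of the memory region the samples live in:

    #include <stddef.h>
    #include <stdint.h>
    #include <time.h>
    #include <infiniband/verbs.h>

    /* Illustrative stand-in; the real definition is in villas/sample.h. */
    struct sample {
        uint64_t sequence;
        struct timespec ts_origin;
        uint64_t format;
        size_t length;    /* number of payload bytes (simplified) */
        double data[];
    };

    /* One scatter/gather element per meta data member, plus one for
     * the payload. */
    static void fill_sges(struct ibv_sge sge[4], struct sample *smp,
                          uint32_t lkey)
    {
        sge[0] = (struct ibv_sge) { .addr = (uintptr_t) &smp->sequence,
                                    .length = sizeof(smp->sequence),
                                    .lkey = lkey };
        sge[1] = (struct ibv_sge) { .addr = (uintptr_t) &smp->ts_origin,
                                    .length = sizeof(smp->ts_origin),
                                    .lkey = lkey };
        sge[2] = (struct ibv_sge) { .addr = (uintptr_t) &smp->format,
                                    .length = sizeof(smp->format),
                                    .lkey = lkey };
        sge[3] = (struct ibv_sge) { .addr = (uintptr_t) smp->data,
                                    .length = smp->length,
                                    .lkey = lkey };
    }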
Closes #152. As described in #182, we will not rearrange the Queue Pairs
for connected mode. As soon as we test many-to-one connections for the
unreliable connection, we will look at this issue again.
Prior to this commit, we called rdma_disconnect() and waited for a fixed
amount of time. This timeout was rather arbitrary. Now, we keep polling
the receive Completion Queue until ib->conn.available_recv_wrs is zero
and all receive samples have thus been given back to the framework.
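
A minimal sketch of that drain loop, with the receive CQ and the counter
passed in for illustration:

    #include <infiniband/verbs.h>

    /* Poll until every outstanding receive Work Request has completed
     * (after rdma_disconnect(), they come back flushed). */
    static void drain_recv_cq(struct ibv_cq *recv_cq,
                              int *available_recv_wrs)
    {
        struct ibv_wc wc[32];

        while (*available_recv_wrs > 0) {
            int n = ibv_poll_cq(recv_cq, 32, wc);
            if (n < 0)
                break;    /* polling error */

            /* Each completion's sample is handed back to the
             * framework here. */
            *available_recv_wrs -= n;
        }
    }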
The QP type depends on the port space of the RDMA CM ID. If the RDMA CM
ID is set to TCP, the QP has to be of type RC; if it is set to UDP, it
has to be of type UD.
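
The mapping boils down to a sketch like this:

    #include <infiniband/verbs.h>
    #include <rdma/rdma_cma.h>

    static enum ibv_qp_type qp_type_for(enum rdma_port_space ps)
    {
        switch (ps) {
        case RDMA_PS_TCP:
            return IBV_QPT_RC;    /* reliable, connection-oriented */
        case RDMA_PS_UDP:
            return IBV_QPT_UD;    /* unreliable datagram */
        default:
            return IBV_QPT_RC;    /* other port spaces: unsupported */
        }
    }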
The node is now able to send data in RDMA_PS_UDP mode. Right now, it
creates a new rdma_cm_id for every connection request. We could/should
handle this differently.
The node reserves a certain number of samples for use in its queues.
Before this commit, the only moment to release them back to the framework
was during ib_read()/ib_write().
But there were a couple of problems. In the following, I will take
ib_read() as an example; ib_write() is analogous.
The first problem was:
1. If a QP disconnects, all Work Requests are invalidated and get
"flushed" to a Completion Queue.
A possible solution would be to save them in an intermediate buffer. We
could then "exchange" these samples with the framework as soon as the
node connects again and ib_read() is called again. So, we would get valid
samples from the framework, post them, and give the "invalidated" samples
back.
But there is a second problem:
2. We cannot assume that ib_read() is ever called again after
ib_disconnect(). This is, for example, the case if the disconnect is
triggered by ib_stop() and not by an external node that disconnects.
This would result in a memory leak, since the samples would never be
returned to the framework, although the node is stopped.
Because of this second problem, I decided to return all samples with
sample_put() in the disconnect function. An additional benefit is that
this is more convenient than another buffer to temporarily store the
invalidated samples.
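
A minimal sketch of that return path, assuming for illustration that each
receive Work Request stored its sample pointer in wr_id when it was
posted:

    #include <stdint.h>
    #include <infiniband/verbs.h>
    #include <villas/sample.h>

    /* Assumption: wr_id carries the sample pointer. Recover the
     * sample behind a flushed completion and release the node's
     * reference, so the framework gets it back even if ib_read()
     * is never called again. */
    static void return_flushed_sample(struct ibv_wc *wc)
    {
        struct sample *smp = (struct sample *) (uintptr_t) wc->wr_id;

        sample_put(smp);
    }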