mirror of https://git.rwth-aachen.de/acs/public/villas/node/ synced 2025-03-16 00:00:02 +01:00
Commit graph

74 commits

Author SHA1 Message Date
Sonja Kolen
eabd3dbb32 node infiniband: fixed a typo 2018-08-08 09:50:05 +02:00
Dennis Potter
0cd4e07173 Fixed another small bug in Infiniband node.
Both RC and UC are connected modes, so the check I changed should test
for NOT UD (UDP) instead of for RC (TCP).
2018-08-07 17:52:45 +02:00
Dennis Potter
33d59938eb Added one more comment line on custom librdmacm 2018-08-07 17:36:29 +02:00
Dennis Potter
ec60f1d2c2 Added support for unreliable connections 2018-08-07 16:45:05 +02:00
Dennis Potter
71134a4c81 Node now posts receive Work Requests as soon as it accepts a connection.
Before, it waited until it was fully connected. That caused problems
because the send side starts sending as soon as it is connected.
This was especially a problem at high rates (>100k).
2018-08-04 17:34:52 +02:00
Dennis Potter
027555c34d Added flag in config to enable/disable connection fallback. Closes #188 2018-08-02 10:41:37 +02:00
Dennis Potter
aefe40dc35 Removed IMM mode. Replaced hard coded integers by constants 2018-08-01 18:26:42 +02:00
Dennis Potter
b96a55237b Added pthread_testcancel() in ib_read()
As discussed in issue #186 and on IM. The function now checks on every
2048th cycle whether the thread should be canceled.

This also removed the need for 'kill -9' in the integration test.
2018-07-26 15:46:41 +02:00
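A minimal sketch of the idea in b96a55237b, assuming a busy-polling loop that offers an explicit cancellation point only on every 2048th iteration. The names poll_loop, run_and_cancel, and CANCEL_CHECK_INTERVAL are hypothetical and are not taken from the VILLASnode source; only pthread_testcancel() and the interval of 2048 come from the commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical constant: offer a cancellation point every 2048th cycle. */
#define CANCEL_CHECK_INTERVAL 2048

/* Counts poll cycles so the caller can tell that the loop is running. */
static atomic_ulong cycles;

/* Stand-in for the busy-polling loop in ib_read(). */
static void *poll_loop(void *arg)
{
    (void) arg;

    for (uint64_t i = 1; ; i++) {
        /* ... poll the completion queue here ... */
        atomic_fetch_add(&cycles, 1);

        /* Without this call the loop contains no cancellation point,
         * so a deferred pthread_cancel() would never take effect. */
        if (i % CANCEL_CHECK_INTERVAL == 0)
            pthread_testcancel();
    }

    return NULL;
}

/* Starts the loop, cancels it, and returns 1 if the thread was canceled. */
static int run_and_cancel(void)
{
    pthread_t thread;
    void *result;

    if (pthread_create(&thread, NULL, poll_loop, NULL) != 0)
        return -1;

    while (atomic_load(&cycles) == 0)
        ; /* wait until the loop has started */

    pthread_cancel(thread);
    pthread_join(thread, &result);

    return result == PTHREAD_CANCELED;
}
```

Checking only every few thousand cycles keeps the cost of the check out of the hot path while still letting pthread_join() return promptly, which is presumably why 'kill -9' was no longer needed.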
Dennis Potter
45ddebf5d1 Performance improvement for UDP
Before this commit, the Infiniband node recreated the address handle for
the remote node during every cycle. Now, it creates the handle only
directly after it has obtained ah_attr.
2018-07-25 18:51:28 +02:00
Dennis Potter
dfd694bdc2 Some small changes in default settings and warning 2018-07-25 16:22:34 +02:00
Dennis Potter
427d715279 Fixed two minor bugs in Infiniband node 2018-07-23 16:52:38 +02:00
Dennis Potter
6296d4217e Added a few corrections for sample size
* Meta data was not included in the calculation which determines if a
  sample should be sent inline. This caused errors.
* Meta data was not subtracted from sample->length on the receive side.
2018-07-21 13:11:46 +02:00
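The two corrections in 6296d4217e can be sketched as follows, assuming the metadata consists of the sequence, ts_origin, and format members mentioned in 591f9f73bd. The field sizes, META_BYTES, fits_inline, and payload_length are illustrative assumptions, not the node's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumed metadata size: sequence + ts_origin + format.
 * The real member types in struct sample may differ. */
#define META_BYTES (sizeof(uint64_t) + sizeof(struct timespec) + sizeof(uint64_t))

/* Send side: the inline decision has to count the metadata as well. */
static bool fits_inline(size_t payload_bytes, size_t max_inline)
{
    return payload_bytes + META_BYTES <= max_inline;
}

/* Receive side: the received byte count includes the metadata, so it is
 * subtracted before the payload length is derived. */
static size_t payload_length(size_t received_bytes)
{
    return received_bytes >= META_BYTES ? received_bytes - META_BYTES : 0;
}
```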
Dennis Potter
591f9f73bd Added meta data in transfer
ib_write() and ib_read() now point to the sequence, ts_origin, and format
members of struct sample in a separate scatter/gather element each.

ib_read() measures the time with time_now() (from villas/timing.h) and
sets all flags at receive side.
2018-07-21 12:52:25 +02:00
Dennis Potter
2c3ddfd0c2 Merge branch 'ib-rearrange-qp' into develop
Closes #152. As described in #182, we will not rearrange the Queue Pairs
for connected mode. As soon as we test many-to-one connections for the
unreliable connection, we will look again at this issue.
2018-07-21 12:14:25 +02:00
Dennis Potter
8704683bf2 Replaced send WC stack by queue 2018-07-21 12:07:43 +02:00
Dennis Potter
0e6d962c1a Merge branch 'ib-rearrange-qp' of git.rwth-aachen.de:acs/public/villas/VILLASnode into ib-rearrange-qp 2018-07-20 23:42:34 +02:00
Dennis Potter
a5068e28ea Replace sleep by a better check
Prior to this commit, we called rdma_disconnect() and waited for a fixed
amount of time. This check was kind of arbitrary. Now, we keep polling
the receive Completion Queue until ib->conn.available_recv_wrs is zero
and all receive samples are thus given back to the framework.
2018-07-20 23:40:43 +02:00
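A rough sketch of the drain loop described in a5068e28ea, using a mock in place of the receive Completion Queue. Only the counter name available_recv_wrs comes from the commit message; struct conn, mock_poll_recv_cq, drain_recv_cq, and the batch size of 4 are hypothetical.

```c
#include <assert.h>

struct conn {
    int available_recv_wrs; /* receive Work Requests still posted */
    int flushed;            /* completions waiting in the mock CQ */
};

/* Mock of polling the receive CQ: hands out at most `max` completions. */
static int mock_poll_recv_cq(struct conn *c, int max)
{
    int n = c->flushed < max ? c->flushed : max;

    c->flushed -= n;
    return n;
}

/* Keeps polling until every posted receive WR has been flushed.
 * In this mock the loop only terminates if `flushed` covers all posted
 * WRs; with real hardware the flushes arrive after the disconnect. */
static int drain_recv_cq(struct conn *c)
{
    int returned = 0;

    while (c->available_recv_wrs > 0) {
        int n = mock_poll_recv_cq(c, 4);

        /* Real code would give each completed sample back to the
         * framework at this point. */
        c->available_recv_wrs -= n;
        returned += n;
    }

    return returned;
}
```

Compared with a fixed sleep, the loop ends exactly when the last sample has been returned, neither earlier nor later.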
Dennis Potter
8029c47113 Fixed bug which arose in afb8b57156 2018-07-20 23:34:52 +02:00
Dennis Potter
be87846a5a Fixed way of iterating scatter/gather list 2018-07-20 22:55:33 +02:00
Dennis Potter
afb8b57156 Removed option to manually add QP type
The QP type is dependent on the port space of the RDMA CM ID. If the
RDMA CM ID is set to TCP, the QP has to be set to RC. If it is set to
UDP, it has to be set to UD.
2018-07-19 20:42:20 +02:00
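The mapping stated in afb8b57156 can be sketched as a small lookup. To keep the example compilable without the RDMA headers, it uses local stand-in enums; in real code the values would be RDMA_PS_TCP / RDMA_PS_UDP from <rdma/rdma_cma.h> and IBV_QPT_RC / IBV_QPT_UD from <infiniband/verbs.h>. The function name is hypothetical.

```c
#include <assert.h>

/* Local stand-ins for enum rdma_port_space and enum ibv_qp_type. */
enum port_space { PS_TCP, PS_UDP };
enum qp_type { QPT_RC, QPT_UD };

/* Derives the QP type from the port space instead of reading it from
 * the configuration. Returns 0 on success, -1 for an unknown value. */
static int qp_type_for_port_space(enum port_space ps, enum qp_type *out)
{
    switch (ps) {
    case PS_TCP:
        *out = QPT_RC; /* reliable connection */
        return 0;
    case PS_UDP:
        *out = QPT_UD; /* unreliable datagram */
        return 0;
    }

    return -1;
}
```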
Dennis Potter
3acc3df7c4 ib_read() now works for UDP
Node is now able to send data in RDMA_PS_UDP mode. Right now it creates
a new rdma_cm_id for every connection request. We could/should do this
differently
2018-07-19 20:33:41 +02:00
Dennis Potter
2b323c3781 Fixed a bug at the send side of UDP 2018-07-19 18:47:27 +02:00
Dennis Potter
cfa93292b0 Added support for RDMA_PS_UDP at send side 2018-07-19 18:32:06 +02:00
Dennis Potter
2bee7d24dd Added rdma_event_str()
This replaces the manual translation of enumerations in the switch
statements.
2018-07-17 11:10:05 +02:00
Dennis Potter
3df5d37b15 Added warning if not all samples are returned 2018-07-16 17:10:52 +02:00
Dennis Potter
0230389593 Node gives back samples to framework at disconnect
The node blocks a certain amount of samples to use in its queues.
Before this commit, the only moment to release them to the framework was
during ib_read()/ib_write().

But, there were a couple of problems. In the following I will take
ib_read() as example, but ib_write() will be analogous.

The first problem was:
1. If a QP disconnects, all Work Requests get invalidated and will be
   "flushed" to a Completion Queue.

A possible solution would be to save them in an intermediate buffer.
We could then "exchange" these samples with the framework as soon as the node
connects again and ib_read() is called again. So, we would get valid
samples from the framework, post them, and give the "invalidated" samples back.

But, there is a second problem:
2. We cannot assume that ib_read() is ever called again after
   ib_disconnect(). This is for example the case if the disconnect is
   triggered by ib_stop() and not by an external node that disconnects.

   This would result in a memory leak, since the samples would never be
   returned to the framework, although the node is stopped.

Because of this second problem, I decided to return all samples with
sample_put() in the disconnect function. An additional benefit is that
this is more convenient than another buffer to temporarily save the
invalidated samples.
2018-07-16 13:41:53 +02:00
0406c46bb4 fix indentation of infiniband node 2018-07-16 11:00:15 +02:00
60f55ec178 improve naming of struct node_type function pointers (closes #150) 2018-07-16 11:00:15 +02:00
Dennis Potter
d9080fa1db Cleaned up some obsolete code. Closes #176 2018-07-16 10:54:15 +02:00
Dennis Potter
d64d1e6f37 Implemented ib_reverse()
The only values that get reversed are the source and destination addresses. Fixes #175.
2018-07-16 10:35:48 +02:00
Dennis Potter
eb55dee920 Added fallback which sets mode to listening mode
Before, the node would throw an error as soon as it cannot connect to
the remote host. Now, it will throw a warning and switch to listening
mode (in which it will wait for another node to connect).
2018-07-15 16:37:52 +02:00
Dennis Potter
6444a9e337 Split configuration in an in and out part 2018-07-15 13:51:18 +02:00
Dennis Potter
6975854376 Solved problem with blocking thread
In the case that a node was already disconnected but not stopped,
rdma_cm_get_event always blocked and we couldn't join the threads. This
is solved in this commit by registering SIGUSR1 to the CM event thread.

This bug originated in issue #152
2018-07-14 16:46:23 +02:00
Dennis Potter
95e2a7c20b Added extra check for inline mode. Closes #164
The actual maximum size for inline mode is now returned to the user and
there is a check that inline_mode is either 0 or 1. Furthermore, this
commit includes a minor improvement in ib_write()
2018-07-14 15:20:24 +02:00
Dennis Potter
1d6ee5aec8 Node can determine if data should be sent inline
The user can set the maximum size of the inline data and the node checks
if a sample can be sent inline. This commit doesn't contain an info
message to the user about what the final max inline size will be. (The
HCA will probably change the value set by the user.)
2018-07-13 13:50:30 +02:00
Dennis Potter
b1b778f542 Added buffer to ib_write()
Now, ib_write() reads every cycle cnt values from the Completion Queue.
If it is not able to return them to the framework immediately, it
temporarily saves them on a stack.

ib_write() checks every cycle if the stack is non-empty and if it is
possible to return values from the stack to the framework.
2018-07-13 12:21:59 +02:00
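A minimal sketch of the buffering described in b1b778f542: completed samples that cannot be returned immediately are pushed onto a small stack, and each cycle returns as many as there is room for. The names sample_stack, stack_push, stack_flush, and STACK_CAP are hypothetical, and struct sample is a stand-in for the framework's type.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAP 8 /* hypothetical capacity */

struct sample { int id; }; /* stand-in for the framework's struct sample */

struct sample_stack {
    struct sample *items[STACK_CAP];
    unsigned top; /* number of buffered samples */
};

/* Buffers a sample that cannot be returned yet. Returns -1 if full. */
static int stack_push(struct sample_stack *s, struct sample *smp)
{
    if (s->top == STACK_CAP)
        return -1;

    s->items[s->top++] = smp;
    return 0;
}

/* Moves up to `room` buffered samples into out[] (last in, first out)
 * and returns how many were moved. */
static unsigned stack_flush(struct sample_stack *s, struct sample *out[], unsigned room)
{
    unsigned n = 0;

    while (n < room && s->top > 0)
        out[n++] = s->items[--s->top];

    return n;
}
```

A later commit in this log (8704683bf2) replaced the stack by a queue, so the last-in, first-out order shown here applies only to this intermediate state.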
Dennis Potter
4cd8fc7150 ib_write and ib_read handle memory in a way that the pool doesn't underrun now. Sending data inline is broken in this commit 2018-07-12 17:49:17 +02:00
Dennis Potter
72e627b327 Fixes #166, all node interfaces are modified
The functions now look like this

int node_read(struct node *n, struct sample *smps[], unsigned cnt, unsigned *release);
int node_write(struct node *n, struct sample *smps[], unsigned cnt, unsigned *release);

This commit enables nodes to control how many samples will
be released by the framework through *release
2018-07-11 18:14:29 +02:00
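How a node type might use the new *release parameter can be sketched as below. This is an assumption about the intended use, not code from the repository: struct node and struct sample are reduced stand-ins, and example_read with its `held` field is hypothetical. Only the signature follows the one given in 72e627b327.

```c
#include <assert.h>
#include <stddef.h>

struct sample { int id; };      /* stand-in for the framework's type */
struct node { unsigned held; }; /* hypothetical: samples the node keeps */

/* Same shape as node_read(): the node reports through *release how many
 * of the cnt samples the framework may release afterwards. */
static int example_read(struct node *n, struct sample *smps[], unsigned cnt, unsigned *release)
{
    /* The node keeps up to `held` samples for its own queues, for
     * example because they are still posted as receive Work Requests. */
    unsigned keep = n->held < cnt ? n->held : cnt;

    (void) smps; /* the sample contents are not needed for this sketch */

    *release = cnt - keep;

    return (int) cnt;
}
```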
Dennis Potter
29ff75fad3 Removed some obsolete code for completion queues 2018-07-08 15:32:28 +02:00
Dennis Potter
746fd2f694 Refactored ib_write in the same way as ib_read (described in #153). Merged separate completion queue polls to ib_write. Closes #167 2018-07-08 15:00:47 +02:00
Dennis Potter
ebb5446305 Refactored the way ib_read() handles the reference counts for the samples it uses. This is based on the algorithm described in issue #153 2018-07-08 14:05:48 +02:00
Dennis Potter
6150a36411 Changed all node_write() functions 2018-07-07 17:48:07 +02:00
Dennis Potter
4663f55e4b Changed all node_read() functions to support a *cnt instead of cnt 2018-07-07 17:07:45 +02:00
Dennis Potter
06e7434d6c Solved some state problems. This commit also solves #154, which was caused by a non-terminated thread. (This thread will be removed in a later commit anyway.) 2018-07-07 15:34:07 +02:00
Dennis Potter
836adee4d6 Node is able to clean everything up and reconnect. Node can abort if it is in STARTED and in CONNECTED state 2018-07-07 14:36:23 +02:00
Dennis Potter
80da4801e1 Source and target successfully connect and node changes status from STATE_STARTED to STATE_CONNECTED in this commit. Next step will be to fix ib_stop and ib_disconnect to make the target able to accept new connections. 2018-07-07 13:08:08 +02:00
Dennis Potter
e2061e58fc Events are now monitored in a separate thread. The segmentation faults we saw earlier were caused because we exited ib_start before we created a protection domain, which is used by memory_allocation 2018-07-07 12:49:22 +02:00
Dennis Potter
2bf122991c Started to convert the RDMA_CM_EVENT loop to a separate thread and added a new state to the node. This commit is still broken 2018-07-05 18:26:32 +02:00
Dennis Potter
43dc305fde Placed sanity checks to a separate function ib_check. Closes #151 2018-07-05 15:30:33 +02:00
Dennis Potter
f976ce5418 Added debug messages with different verbosity levels 2018-07-05 13:57:25 +02:00