The only part of commit d378220c (src/nf-queue.c: cleanup and improve
performance of test program for NF_QUEUE) that actually makes sense is
the increase in receive buffer size. Issuing verdicts for IDs not delivered
to userspace is a wasted effort since the kernel drops packets itself when
netlink message delivery fails. This would actually have been noticed
by a return value of -ENOENT if the result of nfnl_queue_msg_send_verdict()
would have been checked.
Signed-off-by: Patrick McHardy <kaber@trash.net>
- Fixes a bunch of bugs related to ematches
- Adds support for the nbyte ematch
- Adds a bison/flex parser for ematch expressions, expressions
may look like this:
ip.length > 256 && pattern(ip6.src = 3ffe::/16)
documenation on syntax follows
- adds ematch support to the basic classifier (--ematch EXPR)
The netlink message buffer is preallocated to a page and later
expanded as needed. Everything was properly paded and zeroed
out except for the unused part at the end. Use calloc() to
allocate the buffer.
This patch includes various bugfixes in the packet location parser.
Namely it removes two memory leaks if parsing fails. The parser is
correctly quit if an allocation error occurs and it is no longer
possible to add duplicates.
It removes the possibility to differ between net and host byteorder.
This is better done in the actual classifiers as it makes more sense
to specify this together with the value to compare against.
The patch also extends the API to add new packet locations via
rtnl_pktloc_add().
It introduces reference counting, therefore you now have to give
back packet locations with rtnl_pktloc_put() after looking them up
with rtnl_pktloc_lookup(). But you are allowed to keep using them
if the packet location file has been reread.
The packet location file now also understands "eth", "ip", and
"tcp" for "link", "net", and "transport".
A --list option has been added to nl-pktloc-lookup to list all
packet location definitions
A --u32=VALUE option has been added to let nl-pktloc-lookup print
the definition in iproute2's u32 selector style.
A manual page has been written for nl-pktloc-lookup.
Finally, nl-pktloc-lookup has been made installable.
So far all common tc atttributes were accessed via specific functions, i.e.
rtnl_class_set_parent(), rtnl_qdisc_set_parent(), rtnl_cls_set_parent()
which implied a lot of code duplication. Since all tc objects are derived
from struct rtnl_tc and these common attributes are already stored in there
this patch removes all type specific functions and makes rtnl_tc_* attribute
functions public.
rtnl_qdisc_set_parent(qdisc, 10);
becomes:
rtnl_tc_set_parent((struct rtnl_tc *) qdisc, 10);
This patch also adds the following new attributes to tc objects therefore
removing them as tc specific attributes:
- mtu
- mpu
- overhead
This allows for the rate table calculations to be unified as well taking into
account the new kernel behavior to take care of overhead automatically.
Dumping objects as environment variables has never been implemented
completely and only increases the size of the library for no real
purpose. Integration into scripts is better achieved by implementing
a python module anyway.
Adds a cli based tool to add/update traffic classes. This tool requires
each class to be supported via the respetive qdisc module in
pkglibdir/cli/qdisc/$name.so.
Syntax:
nl-class-add --dev eth2 --parent 1: --id 1:1 htb --rate 100mbit
nl-class-add --update --dev eth2 --id 1:1 htb --rate 200mbit
A database to resolve qdisc/class names to classid values and vice versa.
The function rtnl_tc_handle2str() and rtnl_tc_str2handle() will resolve
names automatically.
A CLI based tool nl-classid-lookup is provided to integrate the database
into existing iproute2 scripts.
Adds a cli based tool to add/update/replace qdiscs. This tool requires
each qdisc to be supported via a dynamic loadable module in
pkglibdir/cli/qdisc/$name.so.
So far HTB and blackhole have been implemented.
Syntax:
nl-qdisc-add --dev eth2 --parent root --id 1: htb --r2q=5
nl-qdisc-add --update-only --dev eth2 --id 1: htb --r2q=10
I have a patch against commit d378220c96
extending libnl with a facility to receive generic netlink messages sent
to multicast groups.
Essentially it add one new function genl_ctrl_resolve_grp which
prototype looks like this
int genl_ctrl_resolve_grp(struct nl_sock *sk, const char *family_name,
const char *grp_name)
It resolves the family name and the group name to group id. Then
the returned id can be used in nl_socket_add_membership to subscribe
to multicast messages.
Besides that it adds two more functions
uint32_t nl_socket_get_peer_groups(struct nl_sock *sk)
void nl_socket_set_peer_groups(struct nl_sock *sk, uint32_t groups)
allowing to modify the socket peer groups field. So it's possible to
multicast messages from the user space using the legacy interface.
Looks like there is no way (or I was not able to find one?) to modify
the netlink socket destination group from the user space, when the
group id is greater then 32.
don't try to give the kernel an empty RTA_DST attribute. this would
previously happening on trying to delete the default route as returned
from the kernel. the kernel doesn't add a RTA_DST atttribute, so libnl
does nl_addr_alloc(0) and inserts a zero-length RTA_DST attribute into
the deletion request, which the kernel then refuses with ERANGE.
Signed-off-by: David Lamparter <equinox@diac24.net>
This patch enables out-of-source builds like this
$ cd builddir && src_dir/configure && make
Before this patch there was an error about missing netlink/version.h which
is built by automake in top_builddir rather than top_srcdir which is already
in include search path.
Signed-off-by: Andreas Bießmann <biessmann@corscience.de>
When an alternate kernel header include directory is added in
CPPFLAGS, the libnl build fails. This is because the local copy of
kernel headers is added in AM_CFLAGS, which gets included after
CPPFLAGS in the automake-generated makefile. Switching to AM_CPPFLAGS
fixes the problems.
the patch below adds the possibility to
pass user data to callbacks of type
change_func_t when using the nl_cache_mngr_*
family of functions.
If there is any better way to do this,
without duplicating the code in
cache_mngr.c please let me know.
Without this patch, running alloc / free cache loop will lead to huge memory
leaks on machine with 3000 interfaces with tbf qdiscs.
Here was valgrind output:
==5580== 18,070,728 bytes in 347,514 blocks are definitely lost in loss record
32 of 32
==5580== at 0x4025485: calloc (in /lib/valgrind/vgpreload_memcheck-x86-
linux.so)
==5580== by 0x405F410: tbf_msg_parser (tbf.c:46)
==5580== by 0x405302B: qdisc_msg_parser (qdisc.c:119)
==5580== by 0x4033DC9: nl_cache_parse (cache.c:643)
==5580== by 0x4033E7C: update_msg_parser (cache.c:460)
==5580== by 0x4038A11: nl_recvmsgs (netlink-local.h:112)
==5580== by 0x4034175: __cache_pickup (cache.c:483)
==5580== by 0x40343FF: nl_cache_pickup (cache.c:516)
==5580== by 0x403447D: nl_cache_refill (cache.c:698)
==5580== by 0x4034AB7: nl_cache_alloc_and_fill (cache.c:198)
==5580== by 0x4053216: rtnl_qdisc_alloc_cache (qdisc.c:388)
==5580== by 0x80489DB: main (in /home/root/nltest)
Patch complied and tested for same test case, no more leaks anymore.
compile and link time can reduced, most non-developers don't need these cli tools.
./configure --disable-cli
time make
real 0m40.485s
user 0m33.784s
sys 0m2.793s
./configure
time make
real 0m53.097s
user 0m42.077s
sys 0m4.396s
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
* Fix filename in file header
* If the kernel or netlink socket becomes over loaded,
the kernel starts printing error messages like:
nf_queue: full at 1024 entries, dropping packets(s). Dropped: 1
nf_queue: full at 1024 entries, dropping packets(s). Dropped: 2
nf_queue: full at 1024 entries, dropping packets(s). Dropped: 3
So detect out of order packet ID's and set the NF_ACCEPT verdictÂ,
so they will be removed from the kernel queue.
* increase socket buffer to improve performance
without these changes sending more than 100 KB/s over tcp HTTP lo(localhost)
was difficult on my core2 duo machine, due to so many dropped packets.
After these changes over 150 MB/s was easy.
* improve help text
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Currently two attributes are regarded as different if they are absent in
both objects to be compared. This is obviously incorrect, change to
regard objects as different if an attribute is only present on one of
them or if the attribute data differs.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Rules don't have unique identifiers, so all attributes are compared
by initializing the ID mask to ~0. This doesn't work however since
nl_object_identical verifies whether the ID attributes are actually
present before comparing the objects, which is never the case.
Work around by using the intersection of present attributes when
comparing two rule objects.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Neighbour entries for the same destination may exist on multiple
interfaces. Include the interface in the ID attributes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
When resyncing a cache, there are no delete messages, so they need to
be synthesized for deleted objects.
Signed-off-by: Patrick McHardy <kaber@trash.net>
I've noticed a wrong behavior when setting up some delays in a netem
qdisc. I will try to make the things easier for the reader describing
the calls path.
To set up a delay (or jitter...) I use 'rtnl_netem_set_delay' which
requires an int parameter that tells the delay in micro seconds. Inside
this func, the delay is set up with the help of 'nl_us2ticks', which is
just an arithmetic operation (us * ticks_per_usec), where us is the
input parameter and ticks_per_usec is a global variable initialized in
'get_psched_settings'. And here is the problem:
If this variable is going to be calculated using '/proc/net/psched', I
think the file scan is not done properly.
I don't understand what the meaning of the asterisk is here:
int r = fscanf(fd, "%08x%08x%08x%*08x", &tick, &us, &nom);
if (4 == r && nom == 1000000 && !got_tick)
ticks_per_usec = (double)tick/(double)us;
The execution path never gets in the if statement, because r is always
3, and if the fourth parameter is read (avoiding the asterisk), there is
no variable to store it in, so it comes a segv. In my opinion we can get
rid of the if statement, because I think the proc psched file has always
a fixed format of 4 parameters, and 'nom' is always 1000000
(http://lxr.linux.no/#linux+v2.6.32/net/sched/sch_api.c#L1678).
Find attached a patch I did, if I am correct.
nfnl_queue_msg_send_verdict_payload() will to send the verdict, mark,
and possibly changed payload through the netlink socket.
Add a few docbook comments in other funcs.
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Create new function nl_send_iovec() to be used to send multiple 'struct iovec'
through the netlink socket. This will be used for NF_QUEUE, to send
packet payload of a modified packet.
Refactor nl_send() to use nl_send_iovec() sending a single struct iovec.
Create new function nl_auto_complete() by refactoring nl_send_auto_complete(),
so other functions that call nl_send may also use nl_auto_complete()
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>