| NETMAP(4) | Device Drivers Manual | NETMAP(4) | 
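NAME
netmap — a framework for fast packet I/O
VALE — a fast VirtuAl Local Ethernet using the netmap API
netmap pipes — a shared memory packet transport channel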
SYNOPSIS
device netmap
DESCRIPTION
netmap is a framework for extremely fast and efficient
  packet I/O for both userspace and kernel clients. It runs on
  FreeBSD and Linux, and includes
  VALE, a very fast and modular in-kernel software
  switch/dataplane, and netmap pipes, a shared memory
  packet transport channel. All these are accessed interchangeably with the same
  API.
netmap, VALE and
    netmap pipes are at least one order of magnitude
    faster than standard OS mechanisms (sockets, bpf, tun/tap interfaces, native
    switches, pipes), reaching 14.88 million packets per second (Mpps) with much
    less than one core on a 10 Gbit NIC, about 20 Mpps per core for VALE ports,
    and over 100 Mpps for netmap pipes.
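The 14.88 Mpps figure is the theoretical line rate of 10 Gbit/s Ethernet with minimum-size (64-byte) frames. The sketch below reproduces that arithmetic. It assumes 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start delimiter, 12-byte minimum inter-frame gap). The helper line_rate_pps() and the macro names are illustrative and are not part of netmap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-frame overhead on the wire (IEEE 802.3):
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte
 * minimum inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u
#define ETH_MIN_FRAME     64u   /* minimum frame size, including the 4-byte FCS */

/* Maximum frames per second on a link of link_bps bit/s
 * for frames of frame_len bytes (FCS included). */
static uint64_t
line_rate_pps(uint64_t link_bps, uint32_t frame_len)
{
    uint64_t bits_per_frame;

    assert(frame_len >= ETH_MIN_FRAME);
    bits_per_frame = (uint64_t)(frame_len + ETH_WIRE_OVERHEAD) * 8;
    return link_bps / bits_per_frame;
}
```

For 64-byte frames on a 10 Gbit/s link, this gives 10^10 / 672, or about 14,880,952 frames per second. For 1518-byte frames it gives about 812,743. This is why small packets are the demanding case.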
Userspace clients can dynamically switch NICs into
    netmap mode and send and receive raw packets through
    memory mapped buffers. Similarly, VALE switch
    instances and ports, and netmap pipes can be created
    dynamically, providing high speed packet I/O between processes, virtual
    machines, NICs and the host stack.
netmap supports both non-blocking I/O
    through ioctl(2), synchronization and blocking I/O through
    a file descriptor and standard OS mechanisms such as
    select(2), poll(2),
    epoll(2), and kqueue(2).
    VALE and netmap pipes are
    implemented by a single kernel module, which also emulates the
    netmap API over standard drivers for devices without
    native netmap support. For best performance,
    netmap requires explicit support in device
  drivers.
In the rest of this (long) manual page we document various aspects
    of the netmap and VALE
    architecture, features and usage.
ARCHITECTURE
netmap supports raw packet I/O through a
  port, which can be connected to a physical interface
  (NIC), to the host stack, or to a
  VALE switch. Ports use preallocated circular queues
  of buffers (rings) residing in an mmapped region. There is
  one ring for each transmit/receive queue of a NIC or virtual port. An
  additional ring pair connects to the host stack.
After binding a file descriptor to a port, a
    netmap client can send or receive packets in batches
    through the rings, and possibly implement zero-copy forwarding between
    ports.
All NICs operating in netmap mode use the
    same memory region, accessible to all processes that own
    /dev/netmap file descriptors bound to NICs.
    Independent VALE and netmap
    pipe ports by default use separate memory regions, but can be
    independently configured to share memory.
ENTERING AND EXITING NETMAP MODE
The following section describes the system calls to create and control netmap ports (including VALE
  and netmap pipe ports). Simpler, higher-level
  functions are described in section LIBRARIES.
Ports and rings are created and controlled through a file descriptor, created by opening a special device
    fd = open("/dev/netmap", O_RDWR);
and then bound to a specific port with an
    ioctl(fd, NIOCREGIF, &arg);   /* arg is a struct nmreq */
netmap has multiple modes of operation
    controlled by the struct nmreq argument.
    arg.nr_name specifies the port name, as follows:
- OS network interface name (e.g. 'em0', 'eth1', ...)
 - the data path of the NIC is disconnected from the host stack, and the file descriptor is bound to the NIC (one or all queues), or to the host stack;
 - valeXXX:YYY (arbitrary XXX and YYY)
 - the file descriptor is bound to port YYY of a VALE switch called XXX, both dynamically created if necessary. The string cannot exceed IFNAMSIZ characters, and YYY cannot be the name of any existing OS network interface.

On return, arg indicates the size of the
    shared memory region, and the number, size and location of all the
    netmap data structures, which can be accessed by
    mmapping the memory
    char *mem = mmap(0, arg.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
Non-blocking I/O is done with special ioctl(2) calls, while
    select(2) and poll(2) on the file
    descriptor permit blocking I/O. Depending on the platform, epoll(2) and
    kqueue(2) are also supported on
    netmap file descriptors.
While a NIC is in netmap mode, the OS will
    still believe the interface is up and running. OS-generated packets for that
    NIC end up in a netmap ring, and another ring is
    used to send packets into the OS network stack. A close(2)
    on the file descriptor removes the binding, and returns the NIC to normal
    mode (reconnecting the data path to the host stack), or destroys the virtual
    port.
DATA STRUCTURES
The data structures in the mmapped memory region are detailed in <sys/net/netmap.h>, which is
  the ultimate reference for the netmap API. The main
  structures and fields are indicated below:
- struct netmap_if (one per interface)
 - Indicates the number of available rings (struct netmap_ring) and their position in the mmapped region. The number of tx and rx rings (ni_tx_rings, ni_rx_rings) normally depends on the hardware. NICs also have an extra tx/rx ring pair connected to the host stack. NIOCREGIF can also request additional unbound buffers in the same memory space, to be used as temporary storage for packets. ni_bufs_head contains the index of the first of these free buffers, which are connected in a list (the first uint32_t of each buffer being the index of the next buffer in the list). A 0 indicates the end of the list.

    struct netmap_if {
        ...
        const uint32_t ni_flags;      /* properties */
        ...
        const uint32_t ni_tx_rings;   /* NIC tx rings */
        const uint32_t ni_rx_rings;   /* NIC rx rings */
        uint32_t       ni_bufs_head;  /* head of extra bufs list */
        ...
    };

 - struct netmap_ring (one per ring)
 - Implements transmit and receive rings, with read/write pointers, metadata and an array of slots describing the buffers.

    struct netmap_ring {
        ...
        const uint32_t num_slots;     /* slots in each ring */
        const uint32_t nr_buf_size;   /* size of each buffer */
        ...
        uint32_t       head;          /* (u) first buf owned by user */
        uint32_t       cur;           /* (u) wakeup position */
        const uint32_t tail;          /* (k) first buf owned by kernel */
        ...
        uint32_t       flags;
        struct timeval ts;            /* (k) time of last rxsync() */
        ...
        struct netmap_slot slot[0];   /* array of slots */
    };

 - struct netmap_slot (one per buffer)
 - Describes a packet buffer, which normally is identified by an index and resides in the mmapped region.

    struct netmap_slot {
        uint32_t buf_idx;   /* buffer index */
        uint16_t len;       /* packet length */
        uint16_t flags;     /* buf changed, etc. */
        uint64_t ptr;       /* address for indirect buffers */
    };

 - packet buffers
 - Fixed size (normally 2 KB) packet buffers allocated by the kernel.
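The extra-buffer list that starts at ni_bufs_head is an intrusive singly linked list stored inside the buffers themselves. The following self-contained model walks such a list and pops from it. It makes two assumptions. First, buffer i sits at base + i * buf_size, which is roughly how NETMAP_BUF() turns an index into an address. Second, index 0 is never a list member, so it can serve as the terminator. model_mem, buf_addr() and the other helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a buffer area; sizes are illustrative. */
#define MODEL_BUF_SIZE 2048u
#define MODEL_NUM_BUFS 8u

static char model_mem[MODEL_NUM_BUFS * MODEL_BUF_SIZE];

/* Address of buffer idx: base plus idx fixed-size buffers. */
static char *
buf_addr(char *base, uint32_t buf_size, uint32_t idx)
{
    return base + (size_t)idx * buf_size;
}

/* The "next" link lives in the first uint32_t of each buffer. */
static uint32_t
buf_next(char *base, uint32_t buf_size, uint32_t idx)
{
    uint32_t next;

    memcpy(&next, buf_addr(base, buf_size, idx), sizeof(next));
    return next;
}

static void
buf_set_next(char *base, uint32_t buf_size, uint32_t idx, uint32_t next)
{
    memcpy(buf_addr(base, buf_size, idx), &next, sizeof(next));
}

/* Count the buffers in the list starting at head (0 ends the list). */
static unsigned
count_extra_bufs(char *base, uint32_t buf_size, uint32_t head)
{
    unsigned n = 0;

    for (uint32_t i = head; i != 0; i = buf_next(base, buf_size, i))
        n++;
    return n;
}

/* Detach the first buffer of the list; returns 0 if the list is empty. */
static uint32_t
pop_extra_buf(char *base, uint32_t buf_size, uint32_t *head)
{
    uint32_t idx = *head;

    if (idx != 0)
        *head = buf_next(base, buf_size, idx);
    return idx;
}
```

In a real program, the base address and buffer size would come from the mmapped region and nr_buf_size rather than from a local array.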
 
The offset of the struct netmap_if in the
    mmapped region is indicated by the nr_offset field
    in the structure returned by NIOCREGIF. From there,
    all other objects are reachable through relative references (offsets or
    indexes). Macros and functions in
    <net/netmap_user.h> help
    convert them into actual pointers:
    struct netmap_if   *nifp = NETMAP_IF(mem, arg.nr_offset);
    struct netmap_ring *txr  = NETMAP_TXRING(nifp, ring_index);
    struct netmap_ring *rxr  = NETMAP_RXRING(nifp, ring_index);
    char               *buf  = NETMAP_BUF(ring, buffer_index);
RINGS, BUFFERS AND DATA I/O
Rings are circular queues of packets with three indexes/pointers (head, cur, tail); one slot is always kept empty. The ring size (num_slots) should not be assumed to be a power of two. (NOTE: older versions of netmap used head/count format to indicate the content of a ring.)
- head is the first slot available to userspace;
- cur is the wakeup point: select/poll will unblock when tail passes cur;
- tail is the first slot reserved to the kernel.
Slot indexes must only move forward; for convenience, the function
    nm_ring_next(ring, index)
returns the next index modulo the ring size.
head and cur are only modified by the user program; tail is only modified by the kernel. The kernel only reads/writes the struct netmap_ring slots and buffers during the execution of a netmap-related system call. The only exceptions are slots (and buffers) in the range tail ... head-1, which are explicitly assigned to the kernel.
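These index rules can be captured in two small helpers. The sketch below uses a hypothetical struct model_ring that holds only the four index fields. model_ring_next() serves the same purpose as nm_ring_next() in <net/netmap_user.h>. model_ring_space() counts the slots between cur and tail. Both use an explicit wraparound test rather than a bit mask, because num_slots is not necessarily a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the index fields of struct netmap_ring. */
struct model_ring {
    uint32_t num_slots;
    uint32_t head;   /* (u) first slot owned by the user */
    uint32_t cur;    /* (u) wakeup position */
    uint32_t tail;   /* (k) first slot owned by the kernel */
};

/* Next index, wrapping at num_slots (no power-of-two assumption). */
static uint32_t
model_ring_next(const struct model_ring *r, uint32_t i)
{
    assert(i < r->num_slots);
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Slots still usable by the user: cur ... tail-1, with wraparound. */
static uint32_t
model_ring_space(const struct model_ring *r)
{
    int64_t n = (int64_t)r->tail - (int64_t)r->cur;

    if (n < 0)
        n += r->num_slots;
    return (uint32_t)n;
}
```

For example, with 10 slots, cur = 7 and tail = 2, the usable slots are 7, 8, 9, 0 and 1, so model_ring_space() returns 5.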
TRANSMIT RINGS
On transmit rings, after a netmap system call, slots in
  the range head ... tail-1
  are available for transmission. User code should fill the slots sequentially
  and advance head and cur past
  slots ready to transmit. cur may be moved further ahead
  if the user code needs more slots before further transmissions (see
  SCATTER GATHER I/O).
At the next NIOCTXSYNC/select()/poll(), slots up to head-1 are pushed to the port, and tail may advance if further slots have become available. Below is an example of the evolution of a TX ring:
    after the syscall, slots between cur and tail are (a)vailable
              head=cur   tail
               |          |
               v          v
     TX  [.....aaaaaaaaaaa.............]
    user creates new packets to (T)ransmit
                head=cur tail
                    |     |
                    v     v
     TX  [.....TTTTTaaaaaa.............]
    NIOCTXSYNC/poll()/select() sends packets and reports new slots
                head=cur      tail
                    |          |
                    v          v
     TX  [..........aaaaaaaaaaa........]
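A transmit loop follows directly from the picture above. The user fills slots from cur up to tail-1, then moves head and cur past them. The sketch below models this with hypothetical struct model_txring and struct model_slot types and a private buffer pool. In real code, the slots and buffers live in the mmapped region and the buffer address comes from NETMAP_BUF(). The packets leave only at the next NIOCTXSYNC, poll() or select().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NSLOTS 8u   /* illustrative ring size */
#define MODEL_BUFSZ  64u  /* illustrative buffer size */

/* Hypothetical stand-ins for struct netmap_slot and struct netmap_ring. */
struct model_slot {
    uint32_t buf_idx;
    uint16_t len;
    uint16_t flags;
};

struct model_txring {
    uint32_t num_slots, head, cur, tail;
    struct model_slot slot[MODEL_NSLOTS];
    char bufs[MODEL_NSLOTS][MODEL_BUFSZ];  /* buffer pool indexed by buf_idx */
};

static uint32_t
tx_next(const struct model_txring *r, uint32_t i)
{
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Copy up to n packets into the available slots cur ... tail-1, then
 * advance head and cur past them, handing the slots to the kernel.
 * Returns the number of packets queued. */
static unsigned
tx_fill(struct model_txring *r, const void *pkt, uint16_t len, unsigned n)
{
    unsigned queued = 0;
    uint32_t i = r->cur;

    assert(len <= MODEL_BUFSZ);
    while (queued < n && i != r->tail) {
        struct model_slot *s = &r->slot[i];

        memcpy(r->bufs[s->buf_idx], pkt, len);
        s->len = len;
        i = tx_next(r, i);
        queued++;
    }
    r->head = r->cur = i;
    return queued;
}
```

When cur equals tail, the loop queues nothing. This is the "no space" condition under which poll() and select() block.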
select() and poll() will block if there is no space in the ring, i.e. when
    ring->cur == ring->tail
High-speed applications may want to amortize the cost of system calls by preparing as many packets as possible before issuing them.
A transmit ring with pending transmissions has
    ring->head != ring->tail + 1
(modulo the ring size).
RECEIVE RINGS
On receive rings, after a netmap system call, the slots
  in the range head ... tail-1
  contain received packets. User code should process them and advance
  head and cur past slots it wants
  to return to the kernel. cur may be moved further ahead
  if the user code wants to wait for more packets without returning all the
  previous slots to the kernel.
At the next NIOCRXSYNC/select()/poll(), slots up to
    head-1 are returned to the kernel for further
    receives, and tail may advance to report new incoming
    packets.
  
  Below is an example of the evolution of an RX ring:
    after the syscall, there are some (h)eld and some (R)eceived slots
           head  cur     tail
            |     |       |
            v     v       v
     RX  [..hhhhhhRRRRRRRR..........]
    user advances head and cur, releasing some slots and holding others
               head cur  tail
                 |  |     |
                 v  v     v
     RX  [..*****hhhRRRRRR...........]
    NIOCRXSYNC/poll()/select() recovers slots and reports new packets
               head cur        tail
                 |  |           |
                 v  v           v
     RX  [.......hhhRRRRRRRRRRRR....]
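The three RX pictures can be replayed with a small model of the index fields. The model makes the two roles explicit: head is the release point and cur is the wakeup point. struct model_ring and the rx_* helpers are hypothetical. The ring is assumed to have 26 slots, as in the drawings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the index fields of struct netmap_ring. */
struct model_ring {
    uint32_t num_slots;
    uint32_t head;  /* first slot owned by the user */
    uint32_t cur;   /* wakeup point */
    uint32_t tail;  /* first slot owned by the kernel */
};

/* Slots from `from` up to (not including) `to`, with wraparound. */
static uint32_t
ring_dist(const struct model_ring *r, uint32_t from, uint32_t to)
{
    return to >= from ? to - from : to + r->num_slots - from;
}

/* (h)eld slots: head ... cur-1. */
static uint32_t
rx_held(const struct model_ring *r)
{
    return ring_dist(r, r->head, r->cur);
}

/* (R)eceived slots past the wakeup point: cur ... tail-1. */
static uint32_t
rx_unseen(const struct model_ring *r)
{
    return ring_dist(r, r->cur, r->tail);
}

/* Return the n oldest held slots to the kernel at the next sync. */
static void
rx_release(struct model_ring *r, uint32_t n)
{
    assert(n <= rx_held(r));
    r->head = (r->head + n) % r->num_slots;
}

/* Start holding n more received slots; a later poll()/select()
 * waits until tail moves past the new cur. */
static void
rx_take(struct model_ring *r, uint32_t n)
{
    assert(n <= rx_unseen(r));
    r->cur = (r->cur + n) % r->num_slots;
}
```

Releasing slots moves only head. Moving cur without moving head keeps buffers held while still letting poll() wait for new traffic.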
SLOTS AND PACKET BUFFERS
Normally, packets should be stored in the netmap-allocated buffers assigned to slots when ports are bound to a file descriptor. One packet is fully contained in a single buffer.
The following flags affect slot and buffer processing:
- NS_BUF_CHANGED
 - must be used when the buf_idx in the slot is changed. This can be used to implement zero-copy forwarding, see ZERO-COPY FORWARDING.
 - NS_REPORT
 - reports when this buffer has been transmitted. Normally, netmap notifies transmit completions in batches, hence signals can be delayed indefinitely. This flag helps detect when packets have been sent and a file descriptor can be closed.
 - NS_FORWARD
 - When a ring is in 'transparent' mode (see TRANSPARENT MODE), packets marked with this flag are forwarded to the other endpoint at the next system call, thus restoring (in a selective way) the connection between a NIC and the host stack.
 - NS_NO_LEARN
 - tells the forwarding code that the source MAC address for this packet must not be used in the learning bridge code.
 - NS_INDIRECT
 - indicates that the packet's payload is in a user-supplied buffer whose user virtual address is in the 'ptr' field of the slot. The size can reach 65535 bytes. This is only supported on the transmit ring of VALE ports, and it helps reduce data copies in the interconnection of virtual machines.
 - NS_MOREFRAG
 - indicates that the packet continues with subsequent buffers; the last buffer in a packet must have the flag clear.

SCATTER GATHER I/O
Packets can span multiple slots if the NS_MOREFRAG flag is set in all but the last slot. The maximum length of a chain is 64 buffers. This is normally used with VALE ports when connecting
  virtual machines, as they generate large TSO segments that are not split
  unless they reach a physical device.
NOTE: The length field always refers to the individual fragment; there is no place with the total length of a packet.
On receive rings the macro NS_RFRAGS(slot) indicates the remaining number of slots for this packet, including the current one. Slots with a value greater than 1 also have NS_MOREFRAG set.
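Each slot's len covers only its own fragment, so a reader has to walk the chain to learn the total packet length. The sketch below assumes a hypothetical struct model_slot and an illustrative MODEL_NS_MOREFRAG value; real code would use NS_MOREFRAG from <net/netmap.h>. The walk stops at tail or after max_frags slots (64, per the limit above), and it reports an incomplete chain as -1.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag value; real code uses NS_MOREFRAG from <net/netmap.h>. */
#define MODEL_NS_MOREFRAG 0x0020u

/* Hypothetical stand-in for struct netmap_slot. */
struct model_slot {
    uint32_t buf_idx;
    uint16_t len;    /* length of this fragment only */
    uint16_t flags;
};

/* Sum fragment lengths starting at slot `first` until a slot without
 * the more-fragments flag. On success, returns the total length and
 * stores the number of slots used in *nslots. Returns -1 if the chain
 * reaches `tail` or exceeds max_frags before it ends. */
static long
packet_total_len(const struct model_slot *slot, uint32_t num_slots,
                 uint32_t first, uint32_t tail, unsigned max_frags,
                 unsigned *nslots)
{
    long total = 0;
    unsigned n = 0;
    uint32_t i = first;

    while (i != tail && n < max_frags) {
        total += slot[i].len;
        n++;
        if (!(slot[i].flags & MODEL_NS_MOREFRAG)) {
            *nslots = n;
            return total;
        }
        i = (i + 1 == num_slots) ? 0 : i + 1;
    }
    return -1;
}
```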
IOCTLS
netmap uses two ioctls (NIOCTXSYNC, NIOCRXSYNC) for
  non-blocking I/O. They take no argument. Two more ioctls (NIOCGINFO,
  NIOCREGIF) are used to query and configure ports, with the following argument:
struct nmreq {
    char      nr_name[IFNAMSIZ]; /* (i) port name                  */
    uint32_t  nr_version;        /* (i) API version                */
    uint32_t  nr_offset;         /* (o) nifp offset in mmap region */
    uint32_t  nr_memsize;        /* (o) size of the mmap region    */
    uint32_t  nr_tx_slots;       /* (i/o) slots in tx rings        */
    uint32_t  nr_rx_slots;       /* (i/o) slots in rx rings        */
    uint16_t  nr_tx_rings;       /* (i/o) number of tx rings       */
    uint16_t  nr_rx_rings;       /* (i/o) number of rx rings       */
    uint16_t  nr_ringid;         /* (i/o) ring(s) we care about    */
    uint16_t  nr_cmd;            /* (i) special command            */
    uint16_t  nr_arg1;           /* (i/o) extra arguments          */
    uint16_t  nr_arg2;           /* (i/o) extra arguments          */
    uint32_t  nr_arg3;           /* (i/o) extra arguments          */
    uint32_t  nr_flags;          /* (i/o) open mode                */
    ...
};
A file descriptor obtained through /dev/netmap also supports the ioctls supported by network devices, see netintro(4).
NIOCGINFO - returns EINVAL if the named port does not support netmap. Otherwise, it
      returns 0 and (advisory) information about the port. Note that all the
      information below can change before the interface is actually put in
      netmap mode.

- nr_memsize
 - indicates the size of the netmap memory region. NICs in netmap mode all share the same memory region, whereas VALE ports have independent regions for each port.
 - nr_tx_slots, nr_rx_slots
 - indicate the size of transmit and receive rings.
 - nr_tx_rings, nr_rx_rings
 - indicate the number of transmit and receive rings. Both ring number and sizes may be configured at runtime using interface-specific functions (e.g. ethtool).

 NIOCREGIF - binds the port named in nr_name to the file
      descriptor. For a physical device this also switches it into
      netmap mode, disconnecting it from the host stack. Multiple file descriptors can be bound to the same port, with proper synchronization left to the user.
NIOCREGIF can also bind a file descriptor to one endpoint of a netmap pipe, consisting of two netmap ports with a crossover connection. A netmap pipe shares the same memory space as the parent port, and is meant to enable configurations where a master process acts as a dispatcher towards slave processes.
To enable this function, the nr_arg1 field of the structure can be used as a hint to the kernel to indicate how many pipes we expect to use, and reserve extra space in the memory region.
On return, it gives the same info as NIOCGINFO, with nr_ringid and nr_flags indicating the identity of the rings controlled through the file descriptor.
The nr_flags and nr_ringid fields select which rings are controlled through this file descriptor. Possible values of nr_flags are indicated below, together with the naming schemes that application libraries (such as the nm_open function indicated below) can use to indicate the specific set of rings. In the example below, "netmap:foo" is any valid netmap port name.
- NR_REG_ALL_NIC netmap:foo
 - (default) all hardware ring pairs
 - NR_REG_SW netmap:foo^
 - the "host rings", connecting to the host stack.
 - NR_REG_NIC_SW netmap:foo+
 - all hardware rings and the host rings
 - NR_REG_ONE_NIC netmap:foo-i
 - only the i-th hardware ring pair, where the number is in nr_ringid;
 - NR_REG_PIPE_MASTER netmap:foo{i
 - the master side of the netmap pipe whose identifier (i) is in nr_ringid;
 - NR_REG_PIPE_SLAVE netmap:foo}i
 - the slave side of the netmap pipe whose identifier (i) is in nr_ringid.
The identifier of a pipe must be thought of as part of the pipe name, and does not need to be sequential. On return the pipe will only have a single ring pair with index 0, irrespective of the value of i.
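The naming scheme above can be decoded mechanically. The following is a hypothetical sketch, not the parser used by nm_open(). enum model_reg mirrors the NR_REG_* modes, and the code assumes that the base port name contains none of the characters ^ + - { }.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the NR_REG_* modes listed above. */
enum model_reg {
    MODEL_REG_ERR = -1,
    MODEL_REG_ALL_NIC,      /* netmap:foo   */
    MODEL_REG_SW,           /* netmap:foo^  */
    MODEL_REG_NIC_SW,       /* netmap:foo+  */
    MODEL_REG_ONE_NIC,      /* netmap:foo-i */
    MODEL_REG_PIPE_MASTER,  /* netmap:foo{i */
    MODEL_REG_PIPE_SLAVE    /* netmap:foo}i */
};

/* Classify the ring-selection suffix of a port name.
 * For the numbered forms, *id receives i. */
static enum model_reg
parse_port_suffix(const char *name, long *id)
{
    const char *p = strpbrk(name, "^+-{}");
    char *end;

    *id = 0;
    if (p == NULL)
        return MODEL_REG_ALL_NIC;
    if (*p == '^' || *p == '+') {
        if (p[1] != '\0')
            return MODEL_REG_ERR;     /* nothing may follow ^ or + */
        return *p == '^' ? MODEL_REG_SW : MODEL_REG_NIC_SW;
    }
    if (!isdigit((unsigned char)p[1]))
        return MODEL_REG_ERR;         /* -, { and } need a number */
    *id = strtol(p + 1, &end, 10);
    if (*end != '\0')
        return MODEL_REG_ERR;
    if (*p == '-')
        return MODEL_REG_ONE_NIC;
    return *p == '{' ? MODEL_REG_PIPE_MASTER : MODEL_REG_PIPE_SLAVE;
}
```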
 
By default, a poll(2) or select(2) call pushes out any pending packets on the transmit ring, even if no write events are specified. The feature can be disabled by or-ing NETMAP_NO_TX_POLL to the value written to nr_ringid. When this feature is used, packets are transmitted only when ioctl(NIOCTXSYNC) is called, or when select()/poll() is called with a write event (POLLOUT/wfdset) or on a full ring.
When registering a virtual interface that is dynamically created on a vale(4) switch, the desired number of rings (1 by default, and currently up to 16) can be specified with the nr_tx_rings and nr_rx_rings fields.
 NIOCTXSYNC - tells the hardware of new packets to transmit, and updates the number of slots available for transmission.
 NIOCRXSYNC - tells the hardware of consumed packets, and asks for newly available packets.
 
SELECT, POLL, EPOLL, KQUEUE.
select(2) and poll(2) on a netmap file descriptor process rings as indicated in
  TRANSMIT RINGS and
  RECEIVE RINGS, respectively when write
  (POLLOUT) and read (POLLIN) events are requested. Both block if no slots are
  available in the ring (ring->cur == ring->tail).
  Depending on the platform, epoll(2) and
  kqueue(2) are supported too.
Packets in transmit rings are normally pushed out (and buffers
    reclaimed) even without requesting write events. Passing the
    NETMAP_NO_TX_POLL flag to
    NIOCREGIF disables this feature. By default, receive rings
    are processed only if read events are requested. Passing the
    NETMAP_DO_RX_POLL flag to NIOCREGIF
    updates receive rings even without read events. Note that on epoll and
    kqueue, NETMAP_NO_TX_POLL and
    NETMAP_DO_RX_POLL only have an effect when some
    event is posted for the file descriptor.
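The interaction between the requested events and the two NIOCREGIF flags reduces to a small truth table, modeled below. The sketch uses placeholder values MODEL_NO_TX_POLL and MODEL_DO_RX_POLL, not the real constants. It ignores the full-ring case that also triggers transmission. On epoll and kqueue, this decision would only be taken once some event is posted, as noted above.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Placeholder stand-ins for NETMAP_NO_TX_POLL / NETMAP_DO_RX_POLL;
 * the real values come from <net/netmap.h>. */
#define MODEL_NO_TX_POLL 0x1u
#define MODEL_DO_RX_POLL 0x2u

struct poll_work {
    bool sync_tx;   /* push out pending transmissions, reclaim buffers */
    bool sync_rx;   /* update receive rings */
};

/* Which ring sets a poll() call would process, per the rules above. */
static struct poll_work
rings_to_process(short events, unsigned reg_flags)
{
    struct poll_work w;

    w.sync_tx = (events & POLLOUT) || !(reg_flags & MODEL_NO_TX_POLL);
    w.sync_rx = (events & POLLIN) || (reg_flags & MODEL_DO_RX_POLL);
    return w;
}
```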
LIBRARIES
The netmap API is supposed to be used directly, both
  because of its simplicity and for efficient integration with applications.
For convenience, the
    <net/netmap_user.h> header
    provides a few macros and functions to ease creating a file descriptor and
    doing I/O with a netmap port. These are loosely
    modeled after the pcap(3) API, to ease porting of
    libpcap-based applications to netmap. To use these
    extra functions, programs should define NETMAP_WITH_LIBS before including the header:
    #define NETMAP_WITH_LIBS
    #include <net/netmap_user.h>
The following functions are available:
- struct nm_desc * nm_open(const char *ifname, const struct nmreq *req, uint64_t flags, const struct nm_desc *arg)
 - similar to pcap_open, binds a file descriptor to a port.
    
- ifname
 - is a port name, in the form "netmap:XXX" for a NIC and
          "valeXXX:YYY" for a VALE port.
 - req
 - provides the initial values for the argument to the NIOCREGIF ioctl. The nr_flags and nr_ringid values are overwritten by parsing ifname and flags, and other fields can be overridden through the other two arguments.
 - arg
 - points to a struct nm_desc containing arguments (e.g. from a previously opened file descriptor) that should override the defaults. The fields are used as described below.
 - flags
 - can be set to a combination of the following flags: NETMAP_NO_TX_POLL, NETMAP_DO_RX_POLL (copied into nr_ringid); NM_OPEN_NO_MMAP (if arg points to the same memory region, avoids the mmap and uses the values from it); NM_OPEN_IFNAME (ignores ifname and uses the values in arg); NM_OPEN_ARG1, NM_OPEN_ARG2, NM_OPEN_ARG3 (uses the fields from arg); NM_OPEN_RING_CFG (uses the ring number and sizes from arg).
 
 - int nm_close(struct nm_desc *d)
 - closes the file descriptor, unmaps memory, frees resources.
 - int nm_inject(struct nm_desc *d, const void *buf, size_t size)
 - similar to pcap_inject(), pushes a packet to a ring, returns the size of the packet if successful, or 0 on error;
 - int nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg)
 - similar to pcap_dispatch(), applies a callback to incoming packets
 - u_char * nm_nextpkt(struct nm_desc *d, struct nm_pkthdr *hdr)
 - similar to pcap_next(), fetches the next packet
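nm_dispatch() and nm_nextpkt() hide the RX ring walk behind a pcap-like interface. The model below shows the general shape of such a dispatch loop on a hypothetical ring type. model_dispatch(), struct model_pkthdr and sum_lengths() are illustrative. Treating cnt <= 0 as "no limit" is an assumption borrowed from pcap_dispatch(), not a statement about nm_dispatch().

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NSLOTS 8u
#define MODEL_BUFSZ  64u

/* Hypothetical stand-ins for the netmap slot and ring structures. */
struct model_slot { uint32_t buf_idx; uint16_t len; uint16_t flags; };

struct model_rxring {
    uint32_t num_slots, head, cur, tail;
    struct model_slot slot[MODEL_NSLOTS];
    unsigned char bufs[MODEL_NSLOTS][MODEL_BUFSZ];
};

/* Hypothetical header and callback types, loosely following pcap. */
struct model_pkthdr { uint32_t len; };
typedef void (*model_cb_t)(unsigned char *arg, const struct model_pkthdr *h,
                           const unsigned char *buf);

/* Call cb on up to cnt received packets (all available ones if
 * cnt <= 0), then hand the processed slots back by moving head to cur.
 * Returns the number of packets processed. */
static int
model_dispatch(struct model_rxring *r, int cnt, model_cb_t cb,
               unsigned char *arg)
{
    int n = 0;

    while ((cnt <= 0 || n < cnt) && r->cur != r->tail) {
        const struct model_slot *s = &r->slot[r->cur];
        struct model_pkthdr h = { s->len };

        cb(arg, &h, r->bufs[s->buf_idx]);
        r->cur = (r->cur + 1 == r->num_slots) ? 0 : r->cur + 1;
        n++;
    }
    r->head = r->cur;
    return n;
}

/* Example callback: adds each packet length to the uint64_t at arg. */
static void
sum_lengths(unsigned char *arg, const struct model_pkthdr *h,
            const unsigned char *buf)
{
    uint64_t *total = (uint64_t *)(void *)arg;

    (void)buf;
    *total += h->len;
}
```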
 
SUPPORTED DEVICES
netmap natively supports the following devices:
On FreeBSD: em(4), igb(4), ixgbe(4), lem(4), re(4).
On Linux: e1000(4), e1000e(4), igb(4), ixgbe(4), mlx4(4), forcedeth(4), r8169(4).
NICs without native support can still be used in
    netmap mode through emulation. Performance is
    inferior to native netmap mode but still significantly higher than sockets,
    and approaching that of in-kernel solutions such as Linux's
    pktgen.
Emulation is also available for devices with native netmap support, which can be used for testing or performance comparison. The sysctl variable dev.netmap.admode globally controls how netmap mode is implemented.
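The dev.netmap.admode values documented under SYSCTL VARIABLES AND MODULE PARAMETERS are 0 (best available), 1 (native only) and 2 (emulated only). They amount to the decision sketched below. choose_adapter() is a hypothetical illustration of that policy, not kernel code. Rejecting unknown values is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum model_adapter { MODEL_AD_FAIL = -1, MODEL_AD_NATIVE, MODEL_AD_EMULATED };

/* Pick native or emulated netmap mode for a device, following the
 * documented meaning of dev.netmap.admode. */
static enum model_adapter
choose_adapter(int admode, bool has_native)
{
    switch (admode) {
    case 0:  /* best available option */
        return has_native ? MODEL_AD_NATIVE : MODEL_AD_EMULATED;
    case 1:  /* force native, fail if not available */
        return has_native ? MODEL_AD_NATIVE : MODEL_AD_FAIL;
    case 2:  /* force emulated, never fails */
        return MODEL_AD_EMULATED;
    default: /* assumption: unknown values are rejected */
        return MODEL_AD_FAIL;
    }
}
```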
SYSCTL VARIABLES AND MODULE PARAMETERS
Some aspects of the operation of netmap are controlled
  through sysctl variables on FreeBSD (dev.netmap.*) and
  module parameters on Linux
  (/sys/module/netmap_lin/parameters/*):
- dev.netmap.admode: 0
 - Controls the use of native or emulated adapter mode. 0 uses the best available option, 1 forces native and fails if not available, 2 forces emulated hence never fails.
 - dev.netmap.generic_ringsize: 1024
 - Ring size used for emulated netmap mode
 - dev.netmap.generic_mit: 100000
 - Controls interrupt moderation for emulated mode
 - dev.netmap.mmap_unreg: 0
 - dev.netmap.fwd: 0
 - Forces NS_FORWARD mode
 - dev.netmap.flags: 0
 - dev.netmap.txsync_retry: 2
 - dev.netmap.no_pendintr: 1
 - Forces recovery of transmit buffers on system calls
 - dev.netmap.mitigate: 1
 - Propagates interrupt mitigation to user processes
 - dev.netmap.no_timestamp: 0
 - Disables the update of the timestamp in the netmap ring
 - dev.netmap.verbose: 0
 - Verbose kernel messages
 - dev.netmap.buf_num: 163840
 - dev.netmap.buf_size: 2048
 - dev.netmap.ring_num: 200
 - dev.netmap.ring_size: 36864
 - dev.netmap.if_num: 100
 - dev.netmap.if_size: 1024
 - Sizes and number of objects (netmap_if, netmap_ring, buffers) for the global memory region. The only parameter worth modifying is dev.netmap.buf_num as it impacts the total amount of memory used by netmap.
 - dev.netmap.buf_curr_num: 0
 - dev.netmap.buf_curr_size: 0
 - dev.netmap.ring_curr_num: 0
 - dev.netmap.ring_curr_size: 0
 - dev.netmap.if_curr_num: 0
 - dev.netmap.if_curr_size: 0
 - Actual values in use.
 - dev.netmap.bridge_batch: 1024
 - Batch size used when moving packets across a VALE switch. Values above 64 generally guarantee good performance.
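As a rough check on memory cost, the defaults above can be multiplied out. The sketch assumes that every *_size value is in bytes, and it ignores any alignment or bookkeeping overhead. region_bytes() and bufs_for_budget() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Default counts and sizes from the list above. */
#define DEF_BUF_NUM   163840u
#define DEF_BUF_SIZE    2048u
#define DEF_RING_NUM     200u
#define DEF_RING_SIZE  36864u
#define DEF_IF_NUM       100u
#define DEF_IF_SIZE     1024u

/* Rough size of the global region: count * size for each object pool. */
static uint64_t
region_bytes(uint64_t buf_num, uint64_t buf_size, uint64_t ring_num,
             uint64_t ring_size, uint64_t if_num, uint64_t if_size)
{
    return buf_num * buf_size + ring_num * ring_size + if_num * if_size;
}

/* How many buffers of buf_size bytes fit in a budget of mib MiB. */
static uint64_t
bufs_for_budget(uint64_t mib, uint64_t buf_size)
{
    assert(buf_size > 0);
    return (mib * 1024 * 1024) / buf_size;
}
```

With the defaults, buffers take 163840 × 2048 bytes = 320 MiB out of roughly 327 MiB in total, which is why dev.netmap.buf_num is the parameter that matters. A 512 MiB budget at 2048 bytes per buffer would allow 262144 buffers.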
SYSTEM CALLS
netmap uses select(2),
  poll(2), epoll and
  kqueue to wake up processes when significant events occur,
  and mmap(2) to map memory. ioctl(2) is
  used to configure ports and VALE switches.
Applications may need to create threads and bind them to specific cores to improve performance, using standard OS primitives, see pthread(3). In particular, pthread_setaffinity_np(3) may be of use.
EXAMPLES
TEST PROGRAMS
netmap comes with a few programs that can be used for
  testing or simple applications. See the examples/
  directory in netmap distributions, or
  tools/tools/netmap/ directory in
  FreeBSD distributions.
pkt-gen is a general purpose traffic source/sink.
As an example
    pkt-gen -i ix0 -f tx -l 60
can generate a continuous stream of minimum-size packets, and
    pkt-gen -i ix0 -f rx
is a traffic sink.
pkt-gen has many options that can be used to set packet sizes, addresses, rates, and use multiple send/receive threads and cores.
bridge is another test program which
    interconnects two netmap ports. It can be used for
    transparent forwarding between interfaces, as in
    bridge -i ix0 -i ix1
or even to connect the NIC to the host stack using netmap
    bridge -i ix0 -i ix0
USING THE NATIVE API
The following code implements a traffic generator:
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <net/netmap_user.h>
...
void sender(void)
{
    struct netmap_if *nifp;
    struct netmap_ring *ring;
    struct nmreq nmr;
    struct pollfd fds;
    char *p, *buf;
    uint32_t i;
    int fd;

    /* error checking omitted for brevity */
    fd = open("/dev/netmap", O_RDWR);
    memset(&nmr, 0, sizeof(nmr));
    strncpy(nmr.nr_name, "ix0", sizeof(nmr.nr_name) - 1);
    nmr.nr_version = NETMAP_API;
    ioctl(fd, NIOCREGIF, &nmr);
    p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    nifp = NETMAP_IF(p, nmr.nr_offset);
    ring = NETMAP_TXRING(nifp, 0);
    fds.fd = fd;
    fds.events = POLLOUT;
    for (;;) {
	poll(&fds, 1, -1);
	/* fill every available slot between cur and tail */
	while (!nm_ring_empty(ring)) {
	    i = ring->cur;
	    buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
	    ... prepare packet in buf ...
	    ring->slot[i].len = ... packet length ...
	    ring->head = ring->cur = nm_ring_next(ring, i);
	}
    }
}
HELPER FUNCTIONS
A simple receiver can be implemented using the helper functions:
#define NETMAP_WITH_LIBS
#include <poll.h>
#include <net/netmap_user.h>
...
void receiver(void)
{
    struct nm_desc *d;
    struct pollfd fds;
    u_char *buf;
    struct nm_pkthdr h;
    ...
    d = nm_open("netmap:ix0", NULL, 0, NULL);
    if (d == NULL)
        return;
    fds.fd = NETMAP_FD(d);
    fds.events = POLLIN;
    for (;;) {
        poll(&fds, 1, -1);
        while ( (buf = nm_nextpkt(d, &h)) )
            consume_pkt(buf, h.len);   /* consume_pkt() is application code */
    }
    nm_close(d);
}
ZERO-COPY FORWARDING
Since physical interfaces share the same memory region, it is possible to do packet forwarding between ports by swapping buffers. The buffer from the transmit ring is used to replenish the receive ring:
    uint32_t tmp;
    struct netmap_slot *src, *dst;
    ...
    /* assumes both rings have a slot available:
     * !nm_ring_empty(rxr) && !nm_ring_empty(txr) */
    src = &rxr->slot[rxr->cur];
    dst = &txr->slot[txr->cur];
    tmp = dst->buf_idx;
    dst->buf_idx = src->buf_idx;
    dst->len = src->len;
    dst->flags = NS_BUF_CHANGED;
    src->buf_idx = tmp;
    src->flags = NS_BUF_CHANGED;
    rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
    txr->head = txr->cur = nm_ring_next(txr, txr->cur);
    ...
ACCESSING THE HOST STACK
The host stack is for all practical purposes just a regular ring pair, which you can access with the netmap API, e.g. with
    nm_open("netmap:eth0^", ...);
All packets that the host would send to an interface in netmap mode end up in the RX ring, whereas all
  packets queued to the TX ring are sent up to the host stack.
VALE SWITCH
A simple way to test the performance of a VALE switch is
  to attach a sender and a receiver to it, e.g. running the following in two
  different terminals:
    pkt-gen -i vale1:a -f rx # receiver
    pkt-gen -i vale1:b -f tx # sender
The same example can be used to test netmap pipes, by simply changing port names, e.g.
    pkt-gen -i vale:x{3 -f rx # receiver on the master side
    pkt-gen -i vale:x}3 -f tx # sender on the slave side
The following command attaches an interface and the host stack to a switch:
    vale-ctl -h vale2:em0
Other netmap clients attached to the same switch can now
  communicate with the network card or the host.
SEE ALSO
http://info.iet.unipi.it/~luigi/netmap/
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework, Communications of the ACM, 55 (3), pp. 45-51, March 2012
Luigi Rizzo, netmap: a novel framework for fast packet I/O, Usenix ATC'12, June 2012, Boston
Luigi Rizzo, Giuseppe Lettieri, VALE, a switched ethernet for virtual machines, ACM CoNEXT'12, December 2012, Nice
Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione, Speeding up packet I/O in virtual machines, ACM/IEEE ANCS'13, October 2013, San Jose
AUTHORS
The netmap framework was originally designed and
  implemented at the Università di Pisa in 2011 by Luigi
  Rizzo, and further extended with help from Matteo
  Landi, Gaetano Catalli,
  Giuseppe Lettieri, and Vincenzo
  Maffione.
netmap and VALE
    have been funded by the European Commission within FP7 Projects CHANGE
    (257422) and OPENLAB (287581).
CAVEATS
No matter how fast the CPU and OS are, achieving line rate on 10G and faster interfaces requires hardware with sufficient performance. Several NICs are unable to sustain line rate with small packet sizes. Insufficient PCIe or memory bandwidth can also cause reduced performance.
Another frequent reason for low performance is the use of flow control on the link: a slow receiver can limit the transmit speed. Be sure to disable flow control when running high-speed experiments.
SPECIAL NIC FEATURES
netmap is orthogonal to some NIC features such as
  multiqueue, schedulers, packet filters.
Multiple transmit and receive rings are supported natively and can be configured with ordinary OS tools, such as ethtool or device-specific sysctl variables. The same goes for Receive Packet Steering (RPS) and filtering of incoming traffic.
netmap does not use
    features such as checksum offloading, TCP
    segmentation offloading, encryption,
    VLAN encapsulation/decapsulation, etc. When using netmap
    to exchange packets with the host stack, make sure to disable these
    features.
| December 14, 2015 | Linux 4.9.0-9-amd64 |