Stable release: Linux 3.1 (2011-10-24)
License: GNU General Public License
Netfilter is a set of hooks inside the Linux kernel that allows kernel modules to register callback functions with the network stack. A registered callback function is then invoked for every packet that traverses the respective hook within the network stack.
History
Rusty Russell started the netfilter/iptables project in 1998; he had also authored the project's predecessor, ipchains. As the project grew, he founded the Netfilter Core Team (or simply coreteam) in 1999. The software they produce (called netfilter hereafter) is licensed under the GNU General Public License (GPL), and was merged into Linux kernel 2.3.x in March 2000. In August 2003 Harald Welte was made chairman of the coreteam. In April 2004, following a crack-down by the project on those distributing the project's software embedded in routers without complying with the GPL, a German court granted Welte a historic injunction against Sitecom Germany, which had refused to follow the GPL's terms (see GPL-related disputes). In September 2007, Patrick McHardy, who had led development in the preceding years, was elected the new chairman of the coreteam.
Prior to iptables, the predominant software packages for creating Linux firewalls were ipchains in Linux kernel 2.2.x and ipfwadm in Linux kernel 2.0.x, which in turn was based on BSD's ipfw. Both ipchains and ipfwadm altered the networking code so that they could manipulate packets, as there was no general packet-control framework until Netfilter.
Whereas ipchains and ipfwadm combine packet filtering and NAT (particularly three specific kinds of NAT, called masquerading, port forwarding and redirection), Netfilter separates packet operations into multiple parts, described below. Each connects to the Netfilter hooks at different points to access packets. The connection-tracking and NAT subsystems are more general and more powerful than the rudimentary versions within ipchains and ipfwadm.
Userspace utility programs
The kernel modules named ip_tables, ip6_tables, arp_tables (the underscore is part of the name) and ebtables are some of the significant users of the Netfilter hook system. They provide a table-based system for defining firewall rules that can filter or transform packets. The tables can be administered through the user-space tools iptables, ip6tables, arptables and ebtables, respectively.
Each table is actually its own hook, and each table was introduced to serve a specific purpose. As far as Netfilter is concerned, this mostly means that a given table is run in a specific order with respect to the other tables. Other than that, all tables call the same table-processing function to iterate over and execute rules.
Chains, in this regard, correspond to the point from which the Netfilter stack was invoked: packet reception (PREROUTING), local delivery (INPUT), forwarding (FORWARD), local output (OUTPUT) and packet transmission (POSTROUTING). Netfilter modules that do not provide tables (see below) may also check the origin to select their mode of operation.
- the iptable_raw module will, when loaded, register a hook that will be called before any other Netfilter hook. It provides a table called raw that can be used to filter packets before they reach more memory-demanding operations such as Connection Tracking.
- the iptable_mangle module registers a hook and mangle table to run after Connection Tracking (but still before any other table), so that modifications can be made to the packet that may influence further rules such as NAT or filtering.
- the iptable_nat module registers two hooks: DNAT-based transformations are applied before the filter hook, SNAT-based transformations are applied afterwards. The nat table that is made available to iptables is merely a “configuration database” for NAT mappings only, and not intended for filtering of any kind.
- the iptable_filter module registers the filter table, used for general-purpose filtering (firewalling).
- the iptable_security module registers the security table, which is used for Mandatory Access Control (MAC) networking rules, such as those enabled by the SECMARK and CONNSECMARK targets. Mandatory Access Control is implemented by Linux Security Modules such as SELinux. The security table is called after the filter table, allowing any Discretionary Access Control (DAC) rules in the filter table to take effect before MAC rules. This table provides the following built-in chains: INPUT (for packets coming into the box itself), OUTPUT (for altering locally-generated packets before routing), and FORWARD (for altering packets being routed through the box).
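The ordering described in the list above can be pictured as callbacks sorted by a numeric priority. The following Python sketch is only an illustration: the class names, the priority numbers and the decision to place every table on one hook point are assumptions made for brevity, not the kernel's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

ACCEPT, DROP = "ACCEPT", "DROP"


@dataclass(order=True)
class Hook:
    priority: int                                    # lower values run earlier
    name: str = field(compare=False)
    fn: Callable[[dict], str] = field(compare=False)


class HookPoint:
    """A single, simplified hook point holding callbacks in priority order."""

    def __init__(self) -> None:
        self._hooks: List[Hook] = []

    def register(self, priority: int, name: str, fn: Callable[[dict], str]) -> None:
        self._hooks.append(Hook(priority, name, fn))
        self._hooks.sort()                           # keep the list ordered by priority

    def traverse(self, packet: dict) -> Tuple[str, List[str]]:
        visited = []
        for hook in self._hooks:
            visited.append(hook.name)
            if hook.fn(packet) == DROP:              # a DROP verdict ends the traversal
                return DROP, visited
        return ACCEPT, visited


def accept(packet: dict) -> str:
    return ACCEPT


def drop_telnet(packet: dict) -> str:
    # Hypothetical filter rule: drop anything addressed to port 23.
    return DROP if packet.get("dport") == 23 else ACCEPT


point = HookPoint()
# Registration order is deliberately shuffled; the priorities are illustrative.
point.register(0, "filter", drop_telnet)
point.register(-300, "raw", accept)
point.register(100, "nat (SNAT)", accept)
point.register(-150, "mangle", accept)
point.register(50, "security", accept)
point.register(-200, "conntrack", accept)
point.register(-100, "nat (DNAT)", accept)

print(point.traverse({"dport": 80}))
print(point.traverse({"dport": 23}))
```

In this model a packet dropped by the filter table never reaches the security table or the SNAT step, which matches the order given in the list.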
nftables is the userspace part of a new general-purpose in-kernel packet classification engine, which is intended to replace iptables.
The nftables kernel engine adds a simple virtual machine to the Linux kernel, which is able to execute bytecode to inspect a network packet and make decisions on how that packet should be handled. The operations implemented by this virtual machine are intentionally basic. It can get data from the packet itself, look at the associated metadata (the inbound interface, for example), and manage connection tracking data. Arithmetic, bitwise and comparison operators can be used for making decisions based on that data. The virtual machine is also capable of manipulating sets of data (typically IP addresses), allowing multiple comparison operations to be replaced with a single set lookup.
This contrasts with the currently used firewalling code, in which protocol awareness is built in so deeply that the code has had to be replicated four times (for IPv4, IPv6, ARP, and Ethernet bridging), because the firewall engines are too protocol-specific to be used in a generic manner.
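To make the idea of a small, protocol-agnostic virtual machine concrete, here is a minimal Python sketch. The instruction names (`meta`, `payload`, `bitwise_and`, `cmp_eq`, `lookup`, `verdict`), the register model and the packet representation are all assumptions chosen for illustration; they are not the actual nftables bytecode.

```python
import ipaddress


def run_rule(program, packet, sets):
    """Execute one rule; return its verdict, or None if the rule does not match."""
    regs = {}
    for op, *args in program:
        if op == "meta":                  # load packet metadata into a register
            reg, key = args
            regs[reg] = packet["meta"][key]
        elif op == "payload":             # load raw bytes from the packet
            reg, offset, length = args
            regs[reg] = packet["data"][offset:offset + length]
        elif op == "bitwise_and":         # mask a register in place
            reg, mask = args
            regs[reg] = bytes(a & b for a, b in zip(regs[reg], mask))
        elif op == "cmp_eq":              # stop evaluating the rule on mismatch
            reg, value = args
            if regs[reg] != value:
                return None
        elif op == "lookup":              # one set lookup replaces many compares
            reg, set_name = args
            if regs[reg] not in sets[set_name]:
                return None
        elif op == "verdict":
            return args[0]
        else:
            raise ValueError(f"unknown operation: {op}")
    return None


def ipv4_header(src, dst):
    """Build a minimal 20-byte IPv4 header with only the addresses filled in."""
    return (bytes(12)
            + ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed)


sets = {"blocklist": {ipaddress.IPv4Address(a).packed
                      for a in ("192.0.2.7", "192.0.2.8")}}

# "Drop packets arriving on eth0 whose source address is in the blocklist."
program = [
    ("meta", "r1", "iif"),
    ("cmp_eq", "r1", "eth0"),
    ("payload", "r2", 12, 4),             # IPv4 source address: offset 12, 4 bytes
    ("lookup", "r2", "blocklist"),
    ("verdict", "drop"),
]

bad = {"meta": {"iif": "eth0"}, "data": ipv4_header("192.0.2.7", "198.51.100.1")}
good = {"meta": {"iif": "eth0"}, "data": ipv4_header("203.0.113.9", "198.51.100.1")}
print(run_rule(program, bad, sets), run_rule(program, good, sets))
```

The interpreter itself knows nothing about IPv4; the protocol knowledge lives entirely in the offsets and values of the program, which is the point of the design.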
The main advantages over iptables are:
- simplification of the Linux kernel ABI
- reduction of code duplication
- improved error reporting
- more efficient execution, storage, and incremental changes of filtering rules.
Packet Defragmentation
The nf_defrag_ipv4 module defragments IPv4 packets before Connection Tracking (the nf_conntrack_ipv4 module) sees them. This is necessary for the in-kernel Connection Tracking and NAT helper modules (which are a form of “mini-ALGs”), which only work reliably on entire packets, not necessarily on fragments.
The IPv6 defragmenter is not a module in its own right, but is integrated into the nf_conntrack_ipv6 module.
Connection Tracking
One of the important features built on top of the Netfilter framework is connection tracking. Connection tracking allows the kernel to keep track of all logical network connections or sessions, and thereby relate all of the packets which may make up that connection. NAT relies on this information to translate all related packets in the same way, and iptables can use this information to act as a stateful firewall.
The connection state, however, is completely independent of any upper-level state, such as TCP's or SCTP's state. Part of the reason for this is that when merely forwarding packets, i.e. with no local delivery, the TCP engine may not be invoked at all. Even connectionless-mode transmissions such as UDP, IPsec (AH/ESP), GRE and other tunneling protocols have at least a pseudo connection state. The heuristic for such protocols is often based upon a preset inactivity timeout, after whose expiration a Netfilter connection is dropped.
Each Netfilter connection is uniquely identified by a (layer-3 protocol, source address, destination address, layer-4 protocol, layer-4 key) tuple. The layer-4 key depends on the transport protocol; for TCP/UDP it is the port numbers, for tunnels it can be their tunnel ID, but otherwise is just zero, as if it were not part of the tuple. To be able to inspect the TCP port in all cases, packets will be mandatorily defragmented.
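The tuple described above can be sketched as a small immutable record used as a lookup key. The field names and the dictionary-based packet representation below are assumptions for illustration and do not mirror the kernel's structures.

```python
from typing import NamedTuple, Tuple


class ConnTuple(NamedTuple):
    l3proto: str
    src: str
    dst: str
    l4proto: str
    l4key: Tuple[int, ...]


def tuple_from_packet(pkt: dict) -> ConnTuple:
    """Derive the identifying tuple from an (already defragmented) packet."""
    l4proto = pkt["l4proto"]
    if l4proto in ("tcp", "udp"):
        l4key = (pkt["sport"], pkt["dport"])       # port numbers
    elif "tunnel_id" in pkt:
        l4key = (pkt["tunnel_id"],)                # e.g. a tunnel identifier
    else:
        l4key = (0,)                               # no usable layer-4 key
    return ConnTuple(pkt["l3proto"], pkt["src"], pkt["dst"], l4proto, l4key)


connections = {}                                   # tuple -> packet count

packets = [
    {"l3proto": "ipv4", "src": "10.0.0.2", "dst": "192.0.2.1",
     "l4proto": "tcp", "sport": 40000, "dport": 80},
    {"l3proto": "ipv4", "src": "10.0.0.2", "dst": "192.0.2.1",
     "l4proto": "tcp", "sport": 40000, "dport": 80},
    {"l3proto": "ipv4", "src": "10.0.0.2", "dst": "192.0.2.1",
     "l4proto": "tcp", "sport": 40001, "dport": 80},
    {"l3proto": "ipv4", "src": "10.0.0.2", "dst": "192.0.2.1",
     "l4proto": "esp"},
]
for pkt in packets:
    key = tuple_from_packet(pkt)
    connections[key] = connections.get(key, 0) + 1

print(len(connections))
```

Because the tuple is hashable, two packets of the same flow map to the same table entry, while a different source port or transport protocol yields a separate connection.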
Netfilter connections can be manipulated with the user-space tool conntrack.
iptables can check a connection's information, such as its state and status, to make packet filtering rules more powerful and easier to manage. The most common states are:
- “NEW”: trying to create a new connection
- “ESTABLISHED”: part of an already-existing connection
- “RELATED”: assigned to a packet that is initiating a new connection and which has been “expected”. The aforementioned mini-ALGs set up these expectations, for example, when the nf_conntrack_ftp module sees an FTP “PASV” command.
- “INVALID”: the packet was found to be invalid, e.g. it would not adhere to the TCP state diagram.
- “UNTRACKED”: a special state that can be assigned by the administrator to bypass connection tracking for a particular packet (see raw table, above).
In a typical case, the first packet the conntrack subsystem sees is classified “new”, the reply is classified “established” and an ICMP error is “related”. An ICMP error packet that does not match any known connection is “invalid”.
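The typical sequence just described can be modeled with a very small state tracker. This Python sketch is a simplification under stated assumptions: the flow representation, the `Tracker` class and the `icmp_error_for` parameter are hypothetical, and timeouts, TCP state validation and helper expectations are left out.

```python
def reverse(flow):
    """Return the same flow as seen in the reply direction."""
    proto, src, sport, dst, dport = flow
    return (proto, dst, dport, src, sport)


class Tracker:
    def __init__(self):
        self.seen_reply = {}        # original-direction flow -> reply seen yet?
        self.untracked = set()      # flows the administrator exempted

    def classify(self, flow, icmp_error_for=None):
        if flow in self.untracked:
            return "UNTRACKED"
        if icmp_error_for is not None:
            # An ICMP error quotes the packet that triggered it.
            known = (icmp_error_for in self.seen_reply
                     or reverse(icmp_error_for) in self.seen_reply)
            return "RELATED" if known else "INVALID"
        if flow in self.seen_reply:
            return "ESTABLISHED" if self.seen_reply[flow] else "NEW"
        if reverse(flow) in self.seen_reply:
            self.seen_reply[reverse(flow)] = True
            return "ESTABLISHED"
        self.seen_reply[flow] = False
        return "NEW"


tracker = Tracker()
request = ("udp", "10.0.0.2", 5000, "192.0.2.1", 53)
icmp = ("icmp", "192.0.2.254", 0, "10.0.0.2", 0)
unknown = ("udp", "10.0.0.9", 1, "192.0.2.1", 2)

states = [
    tracker.classify(request),                          # first packet
    tracker.classify(reverse(request)),                 # the reply
    tracker.classify(icmp, icmp_error_for=request),     # error about a known flow
    tracker.classify(icmp, icmp_error_for=unknown),     # error about nothing known
]
print(states)
```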
Connection tracking helpers
Through the use of plugin modules, connection tracking can be given knowledge of application-layer protocols and thus understand that two or more distinct connections are “related”. For example, consider the FTP protocol. A control connection is established, but whenever data is transferred, a separate connection is established to transfer it. When the nf_conntrack_ftp module is loaded, the first packet of an FTP data connection will be classified as “related” instead of “new”, as it is logically part of an existing connection.
The helpers only inspect one packet at a time, so if vital information for connection tracking is split across two packets, either due to IP fragmentation or TCP segmentation, the helper will not necessarily recognize patterns and will therefore not perform its operation. IP fragmentation is dealt with by the connection tracking subsystem, which requires defragmentation, but TCP segmentation is not handled. In the case of FTP, segmentation is deemed not to happen “near” a command like PASV with standard segment sizes, so it is not dealt with in Netfilter either.
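The following Python sketch illustrates both points: how a helper might derive an expectation from the server's reply to a PASV command, and why a reply split across two segments goes unnoticed when packets are inspected one at a time. The `227` reply format and the port arithmetic follow the FTP specification; the function names and the `expectations` set are hypothetical and unrelated to the nf_conntrack_ftp source.

```python
import re

# "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)"
PASV_REPLY = re.compile(rb"^227 .*?\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")

expectations = set()     # (client, server address, server port) expected soon


def parse_pasv_reply(payload: bytes):
    """Return (address, port) announced in a complete 227 reply, else None."""
    match = PASV_REPLY.match(payload)
    if match is None:
        return None
    numbers = [int(group) for group in match.groups()]
    if any(number > 255 for number in numbers):
        return None
    address = ".".join(str(number) for number in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return address, port


def helper_inspect(client: str, payload: bytes):
    """Inspect a single packet payload and record an expectation if possible."""
    endpoint = parse_pasv_reply(payload)
    if endpoint is not None:
        expectations.add((client,) + endpoint)
    return endpoint


whole = b"227 Entering Passive Mode (192,0,2,10,195,80)\r\n"
print(helper_inspect("10.0.0.2", whole))

# The same reply split across two TCP segments: neither half is recognized.
first, second = whole[:30], whole[30:]
print(helper_inspect("10.0.0.3", first), helper_inspect("10.0.0.3", second))
```

A data connection from 10.0.0.2 to the recorded endpoint could then be classified as “related”, whereas the client whose reply was segmented gets no expectation at all.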
Network Address Translation
Each connection has a set of original addresses and reply addresses, which initially start out the same. NAT in Netfilter is implemented by simply changing the reply address, and where desired, port. When packets are received, their connection tuple will also be compared against the reply address pair (and ports). Being fragment-free is also a requirement for NAT. (If need be, IPv4 packets may be refragmented by the normal, non-Netfilter, IPv4 stack.)
Similar to connection tracking helpers, NAT helpers inspect packets and substitute original addresses with reply addresses in the payload.
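The idea of implementing NAT by altering only the reply side of a connection can be sketched as follows. The `Endpoints` and `Connection` classes and their method names are assumptions made for illustration; payload rewriting by NAT helpers is not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Endpoints:
    src: str
    sport: int
    dst: str
    dport: int

    def inverted(self) -> "Endpoints":
        return Endpoints(self.dst, self.dport, self.src, self.sport)


class Connection:
    def __init__(self, original: Endpoints) -> None:
        self.original = original
        self.reply = original.inverted()     # starts out as the plain mirror image

    def dnat(self, new_dst: str, new_dport: int) -> None:
        # Replies will now come from the real server instead of the public address.
        self.reply = replace(self.reply, src=new_dst, sport=new_dport)

    def snat(self, new_src: str, new_sport: int) -> None:
        # Replies will now be addressed to the translated source.
        self.reply = replace(self.reply, dst=new_src, dport=new_sport)

    def translate(self, pkt: Endpoints):
        """Rewrite a packet header according to the direction it matches."""
        if pkt == self.original:
            return self.reply.inverted()
        if pkt == self.reply:
            return self.original.inverted()
        return None                          # not part of this connection


conn = Connection(Endpoints("198.51.100.7", 40000, "203.0.113.1", 80))
conn.dnat("10.0.0.5", 8080)

to_server = conn.translate(Endpoints("198.51.100.7", 40000, "203.0.113.1", 80))
to_client = conn.translate(Endpoints("10.0.0.5", 8080, "198.51.100.7", 40000))
print(to_server)
print(to_client)
```

Packets in the original direction are rewritten to the inverse of the reply tuple, and packets matching the reply tuple are rewritten to the inverse of the original tuple, so the client only ever sees the public address.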
Further Netfilter projects
Although they are not kernel modules that make direct use of Netfilter code, the Netfilter project hosts several other noteworthy pieces of software.
conntrack-tools is a set of user-space tools for Linux that allow system administrators to interact with the Connection Tracking entries and tables. The package includes the conntrackd daemon and the command line interface conntrack. The userspace daemon conntrackd can be used to enable high availability cluster-based stateful firewalls and collect statistics of the stateful firewall use. The command line interface conntrack provides a more flexible interface to the connection tracking system than the obsolete /proc/net/nf_conntrack.
Unlike other extensions such as Connection Tracking, ipset is more related to iptables than it is to the core Netfilter code. For instance, ipset does not make use of Netfilter hooks, but provides an iptables module to match against IP sets and make minimal modifications (set/clear) to them.
The user-space tool called ipset is used to set up, maintain and inspect so-called “IP sets” in the Linux kernel. An IP set usually contains a set of IP addresses, but can also contain sets of other network numbers, depending on its “type”. These sets are much more lookup-efficient than bare iptables rules, but may come with a greater memory footprint. Different storage algorithms (for the data structures in memory) are provided in ipset for the user to select an optimum solution.
Any entry in one set can be bound to another set, allowing for sophisticated matching operations. A set can only be removed (destroyed) if there are no iptables rules or other sets referring to it.
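The efficiency argument can be illustrated by comparing a linear scan over one rule per address with a single hashed lookup. This Python sketch is only an analogy: `linear_match` and `IpHashSet` are hypothetical names, and the real ipset storage types are implemented differently inside the kernel.

```python
import ipaddress


def linear_match(rule_addrs, addr):
    """Mimic one rule per address: compare until a rule matches."""
    comparisons = 0
    for candidate in rule_addrs:
        comparisons += 1
        if candidate == addr:
            return True, comparisons
    return False, comparisons


class IpHashSet:
    """A hash-based set of IP addresses; membership is a single lookup."""

    def __init__(self) -> None:
        self._members = set()

    def add(self, addr: str) -> None:
        self._members.add(int(ipaddress.ip_address(addr)))

    def test(self, addr: str) -> bool:
        return int(ipaddress.ip_address(addr)) in self._members


base = int(ipaddress.IPv4Address("10.0.0.0"))
addresses = [str(ipaddress.IPv4Address(base + i)) for i in range(1000)]

ipset = IpHashSet()
for address in addresses:
    ipset.add(address)

print(linear_match(addresses, addresses[-1]))    # walks the whole list
print(ipset.test(addresses[-1]))                 # one hashed lookup
```

The price of the faster lookup is that the set keeps every member in a hash table, which is the memory trade-off mentioned above.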
The SYNPROXY target makes it possible to handle large SYN floods without the large performance penalties that connection tracking imposes in such cases. By redirecting initial SYN requests to the SYNPROXY target, connections are not registered within connection tracking until they reach a validated final ACK state, which frees connection tracking from accounting for large numbers of potentially invalid connections. This way, huge SYN floods can be handled effectively.
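One way to picture how a connection can be validated without storing state for every SYN is a cookie that is recomputed when the final ACK arrives. The Python sketch below is an assumption-laden illustration of that general technique: the secret, the cookie construction and the `SynProxy` class are hypothetical and are not taken from the kernel's SYNPROXY code.

```python
import hashlib
import hmac

SECRET = b"illustrative-secret"      # hypothetical key known only to the proxy


def cookie(src, sport, dst, dport, counter):
    """Derive a 32-bit value from the flow and a coarse time counter."""
    message = f"{src}:{sport}>{dst}:{dport}#{counter}".encode()
    digest = hmac.new(SECRET, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


class SynProxy:
    def __init__(self) -> None:
        self.conntrack = set()       # only validated flows end up here

    def on_syn(self, flow, counter):
        # Answer with a SYN/ACK whose sequence number is the cookie.
        # Nothing is stored for this flow.
        return cookie(*flow, counter)

    def on_ack(self, flow, ack_number, counter):
        # A genuine client acknowledges cookie + 1; accept the current
        # and the previous counter value to tolerate some delay.
        for candidate in (counter, counter - 1):
            if (cookie(*flow, candidate) + 1) % 2**32 == ack_number:
                self.conntrack.add(flow)
                return True
        return False


proxy = SynProxy()

# A flood of SYNs from spoofed sources leaves no trace in conntrack.
for i in range(1000):
    proxy.on_syn((f"198.51.100.{i % 250}", 1024 + i, "192.0.2.1", 80), counter=7)
print(len(proxy.conntrack))

# A real client completes the handshake and is registered.
client = ("203.0.113.9", 40000, "192.0.2.1", 80)
seq = proxy.on_syn(client, counter=7)
print(proxy.on_ack(client, (seq + 1) % 2**32, counter=8))
```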
ulogd is a user-space daemon to receive and log packets and event notifications from the Netfilter subsystems. ip_tables can deliver packets via the userspace queueing mechanism to it, and connection tracking can interact with ulogd to exchange further information about packets or events (such as connection teardown, NAT setup).
There are plans to implement the functionality of the Network scheduler in user space as a part of Netfilter.
The Netfilter project also provides a set of libraries, whose names are prefixed with libnetfilter, that can be used to perform different tasks from user space. These libraries are released under the GNU GPL version 2. Specifically, they are:
- libnetfilter_queue, which allows user-space packet queueing in conjunction with iptables. Based on libnfnetlink.
- libnetfilter_conntrack, which allows manipulation of Connection Tracking entries from user space. Based on libnfnetlink.
- libnetfilter_log, which allows collection of log messages generated by iptables. Based on libnfnetlink.
- libnl-3-netfilter, which allows operations on queues, connection tracking and logs. It is part of the libnl project.
- libiptc, which allows changing the iptables firewall ruleset. It is not based on any netlink library, and its API is used internally by the iptables utilities. It is not intended for third-party projects.
- libipset, which allows operations on IP sets. Based on libmnl.
Netfilter Workshops
The Netfilter project organizes an annual meeting for developers, which is used to discuss ongoing research and development efforts. The last Netfilter Workshop took place in Copenhagen, Denmark, in March 2013.
See also
- ipchains, the predecessor to iptables
- NPF (firewall)
- PF (firewall)
- Netlink, an API used by Netfilter extensions
- Linux Virtual Server (LVS)
- IP Virtual Server (IPVS, part of LVS)