Rules on the host after Docker creates a container

This post looks at some of the rules that end up configured on the host once Docker has created a container.

Before the docker daemon is started, the only bridge on the system is the default one created by libvirt:

[root@dev ~]# systemctl status  docker.service
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled)
   Active: inactive (dead)
     Docs: http://docs.docker.com

[root@dev ~]# brctl show
bridge name bridge id       STP enabled interfaces
virbr0      8000.52540027e8bc   yes     virbr0-nic

The iptables rules at this point are clean as well:

[root@dev ~]# iptables-save 
# Generated by iptables-save v1.4.21 on Thu Aug  6 13:40:20 2015
*nat
:PREROUTING ACCEPT [13:997]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Completed on Thu Aug  6 13:40:20 2015
# Generated by iptables-save v1.4.21 on Thu Aug  6 13:40:20 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [44:5440]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Thu Aug  6 13:40:20 2015

Now start the docker daemon:

[root@dev ~]# service docker start
Redirecting to /bin/systemctl start  docker.service
[root@dev ~]# brctl show
bridge name bridge id       STP enabled interfaces
docker0     8000.56847afe9799   no      
virbr0      8000.52540027e8bc   yes     virbr0-nic
[root@dev ~]# ip l show dev docker0
8: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT 
    link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff

A bridge named docker0 has appeared. Now look at the iptables rules:

[root@dev ~]# iptables-save 
# Generated by iptables-save v1.4.21 on Thu Aug  6 13:41:52 2015
*nat
:PREROUTING ACCEPT [22:1779]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
COMMIT
# Completed on Thu Aug  6 13:41:52 2015
# Generated by iptables-save v1.4.21 on Thu Aug  6 13:41:52 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [18:2944]
:DOCKER - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Thu Aug  6 13:41:52 2015

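Even though no container has been started yet, rules have been added to both the nat and the filter table. Before going through them, a quick review of some iptables basics. When a NIC receives a packet, the kernel decides whether it is meant for a local process or has to be forwarded to another host. A packet addressed to the local host goes through mangle-PREROUTING -> nat-PREROUTING -> mangle-INPUT -> filter-INPUT and is then delivered to the receiving application. A packet sent out by the local host goes through mangle-OUTPUT -> nat-OUTPUT -> filter-OUTPUT -> mangle-POSTROUTING -> nat-POSTROUTING. A packet that is received and then forwarded goes through mangle-PREROUTING -> nat-PREROUTING -> mangle-FORWARD -> filter-FORWARD -> mangle-POSTROUTING -> nat-POSTROUTING.

With that in mind, let's look at the iptables rules docker added above. Again there are three cases: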

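1. Packets addressed to the local host
Following the order above, the packet goes through mangle-PREROUTING -> nat-PREROUTING -> mangle-INPUT -> filter-INPUT. The nat table contains -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER, which uses the addrtype module to match packets: if the destination is one of the host's own addresses, the packet jumps to the DOCKER chain. For now our DOCKER chain is empty.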

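2. Packets sent out by the local host
Following the order above, the packet goes through mangle-OUTPUT -> nat-OUTPUT -> filter-OUTPUT -> mangle-POSTROUTING -> nat-POSTROUTING. Two rules apply here: -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER and -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE. The first says that a packet whose destination is a LOCAL address outside 127.0.0.0/8 jumps to the DOCKER chain. The second is a source NAT: a packet with a source address in 172.17.0.0/16 that leaves through any interface other than docker0 is masqueraded on its way out. Container traffic reaches nat-POSTROUTING through the forwarding path as well, so this is the rule that lets our docker containers reach the outside network.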

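3. Forwarded packets
Following the order above, the packet goes through mangle-PREROUTING -> nat-PREROUTING -> mangle-FORWARD -> filter-FORWARD -> mangle-POSTROUTING -> nat-POSTROUTING. Four of the rules in filter-FORWARD are docker-related. The most important one says that a packet whose output device is docker0 jumps to the DOCKER chain.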
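
To keep the three cases apart, here is a small sketch in Python that lists which of the docker rules a packet can meet on each path. It is only a lookup table built from the dump above, not a model of how the kernel evaluates rules. The names PATHS, DOCKER_RULES and docker_hooks are made up for this illustration.

```python
# Simplified traversal order, limited to the mangle, nat and filter tables
# mentioned in the text.
PATHS = {
    "to_local": [("mangle", "PREROUTING"), ("nat", "PREROUTING"),
                 ("mangle", "INPUT"), ("filter", "INPUT")],
    "from_local": [("mangle", "OUTPUT"), ("nat", "OUTPUT"), ("filter", "OUTPUT"),
                   ("mangle", "POSTROUTING"), ("nat", "POSTROUTING")],
    "forwarded": [("mangle", "PREROUTING"), ("nat", "PREROUTING"),
                  ("mangle", "FORWARD"), ("filter", "FORWARD"),
                  ("mangle", "POSTROUTING"), ("nat", "POSTROUTING")],
}

# Rules added by the docker daemon, keyed by (table, chain), copied from the dump.
DOCKER_RULES = {
    ("nat", "PREROUTING"): ["-m addrtype --dst-type LOCAL -j DOCKER"],
    ("nat", "OUTPUT"): ["! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER"],
    ("nat", "POSTROUTING"): ["-s 172.17.0.0/16 ! -o docker0 -j MASQUERADE"],
    ("filter", "FORWARD"): [
        "-o docker0 -j DOCKER",
        "-o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT",
        "-i docker0 ! -o docker0 -j ACCEPT",
        "-i docker0 -o docker0 -j ACCEPT",
    ],
}


def docker_hooks(path_name):
    """Return (table, chain, rule) for every docker rule on the given path."""
    hooks = []
    for table, chain in PATHS[path_name]:
        for rule in DOCKER_RULES.get((table, chain), []):
            hooks.append((table, chain, rule))
    return hooks


if __name__ == "__main__":
    for name in PATHS:
        print(name)
        for table, chain, rule in docker_hooks(name):
            print(f"  {table}-{chain}: {rule}")
```

The forwarded path meets the most docker rules, which fits the fact that all traffic between containers and the outside world is forwarded traffic from the host's point of view.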

Now we start a container without specifying any network options:

[root@dev ~]# docker run -dit --name test-os docker.io/centos /bin/bash

A veth has been plugged into the docker0 bridge:

[root@dev ~]# brctl show
bridge name bridge id       STP enabled interfaces
docker0     8000.56847afe9799   no      vethe7cd2e3
virbr0      8000.52540027e8bc   yes     virbr0-nic
[root@dev ~]# ip l show dev vethe7cd2e3
16: vethe7cd2e3: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT 
    link/ether f6:b9:36:fb:5b:ac brd ff:ff:ff:ff:ff:ff
[root@dev ~]# ip a show dev vethe7cd2e3
16: vethe7cd2e3: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP 
    link/ether f6:b9:36:fb:5b:ac brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f4b9:36ff:fefb:5bac/64 scope link 
       valid_lft forever preferred_lft forever

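The other end of the veth sits inside the namespace of our test-os container. Note that docker manipulates namespaces by talking to the kernel directly through sockets, rather than by calling the ip command the way Neutron does. As covered in an earlier post, the ip command only shows namespaces that are registered under /var/run/netns, which is where ip puts the ones it creates itself, so the container's namespace cannot be inspected directly with ip here.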
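
A common workaround is to link the container's namespace into the directory that ip netns reads, /var/run/netns. The sketch below does this in Python. The function name expose_netns and its parameters are made up for this example, the container PID is assumed to come from docker inspect -f '{{.State.Pid}}' test-os, and running it against the real /var/run/netns requires root.

```python
import os


def expose_netns(pid, name, netns_dir="/var/run/netns", proc_root="/proc"):
    """Symlink the network namespace of `pid` to <netns_dir>/<name>.

    Afterwards `ip netns` lists `name`, and `ip netns exec <name> ...`
    runs commands inside that namespace.
    """
    os.makedirs(netns_dir, exist_ok=True)
    target = os.path.join(proc_root, str(pid), "ns", "net")
    link = os.path.join(netns_dir, name)
    if os.path.islink(link):
        os.remove(link)  # replace a stale link left by an earlier container
    os.symlink(target, link)
    return link
```

With the link in place, something like ip netns exec test-os ip a should show the container end of the veth pair. The link is not removed automatically when the container exits, so clean it up afterwards.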

Here is what the iptables rules look like now:

[root@dev ~]# iptables-save
# Generated by iptables-save v1.4.21 on Thu Aug  6 14:12:58 2015
*nat
:PREROUTING ACCEPT [5481:502017]
:INPUT ACCEPT [4:256]
:OUTPUT ACCEPT [57:4519]
:POSTROUTING ACCEPT [57:4519]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
COMMIT
# Completed on Thu Aug  6 14:12:58 2015
# Generated by iptables-save v1.4.21 on Thu Aug  6 14:12:58 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1375:163760]
:DOCKER - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Thu Aug  6 14:12:58 2015

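The iptables rules have not changed. If the container now sends a request to the public network, the request travels over the veth pair and arrives at vethe7cd2e3 in the host's namespace. Since vethe7cd2e3 is plugged into docker0, the packet goes through the bridge logic. Our kernel is configured to allow ip forward, so the packet comes out of docker0 and is handed to the kernel. From the kernel's point of view this is a forwarding request, so the packet takes the forwarding path: mangle-PREROUTING -> nat-PREROUTING -> mangle-FORWARD -> filter-FORWARD -> mangle-POSTROUTING -> nat-POSTROUTING. If we ping 1.2.4.8 from inside the container, the iptables counters show: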

[root@dev ~]# iptables -t nat -L -xvn
Chain PREROUTING (policy ACCEPT 6518 packets, 593756 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
       5      316 DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 4 packets, 256 bytes)
    pkts      bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 58 packets, 4574 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
       0        0 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 58 packets, 4574 bytes)
    pkts      bytes target     prot opt in     out     source               destination         
       2      168 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0           

Chain DOCKER (2 references)
    pkts      bytes target     prot opt in     out     source               destination  

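The pkts counter of the POSTROUTING rule has gone up. It does not match the number of ping packets actually sent because the nat table is only consulted for the first packet of a connection: conntrack keeps the session entry for a default timeout, and later packets of the same session reuse that entry without hitting the rule again.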
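
The counter behaviour can be illustrated with a toy model. This is a deliberately simplified sketch, not how conntrack is implemented: the class name NatCounter, the flow tuple and the 30 second timeout are all assumptions chosen for the example.

```python
class NatCounter:
    """Toy model: a nat rule is counted only for the first packet of a flow."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds an idle flow entry is kept
        self.flows = {}         # flow key -> time of the last packet seen
        self.pkts = 0           # the pkts column of the nat rule

    def packet(self, flow, now):
        last = self.flows.get(flow)
        if last is None or now - last > self.timeout:
            self.pkts += 1      # new flow: the nat rule is evaluated and counted
        self.flows[flow] = now  # every packet refreshes the flow entry


if __name__ == "__main__":
    rule = NatCounter(timeout=30)  # assumed timeout, for illustration only
    flow = ("172.17.0.5", "1.2.4.8", "icmp", 1)  # hypothetical flow key
    for t in range(10):            # ten pings, one second apart
        rule.packet(flow, now=t)
    print(rule.pkts)
```

Ten pings in a row bump the counter once. Only when the flow has been idle for longer than the timeout does the next ping count as a new session and hit the rule again.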

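Now consider the path taken when the host wants to reach a container. When we access the container's address 172.17.0.5, the routing table sends the packet to docker0: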

[root@dev ~]# ip r
default via 172.16.1.1 dev enp0s3  proto static  metric 100 
default via 10.0.2.1 dev enp0s8  proto static  metric 101 
10.0.2.0/24 dev enp0s8  proto kernel  scope link  src 10.0.2.6 
10.0.2.0/24 dev enp0s8  proto kernel  scope link  src 10.0.2.6  metric 100 
172.16.1.0/24 dev enp0s3  proto kernel  scope link  src 172.16.1.75  metric 100 
172.17.0.0/16 dev docker0  proto kernel  scope link  src 172.17.42.1 
192.168.56.0/24 dev enp0s9  proto kernel  scope link  src 192.168.56.200  metric 100 
192.168.100.0/24 dev enp0s10  proto kernel  scope link  src 192.168.100.101  metric 100 
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 

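Once the packet reaches docker0 it follows the normal bridge logic to vethe7cd2e3, then arrives at the peer veth inside the namespace and is received by our container. Traffic between containers works in much the same way.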
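
The route selection can be reproduced with a longest-prefix match over the table above. This is a minimal sketch using Python's ipaddress module. It ignores metrics and simply assumes the default route with the lowest metric (via enp0s3) wins; ROUTES and lookup are names chosen for this example.

```python
import ipaddress

# (prefix, device) pairs taken from the `ip r` output above.
ROUTES = [
    ("10.0.2.0/24", "enp0s8"),
    ("172.16.1.0/24", "enp0s3"),
    ("172.17.0.0/16", "docker0"),
    ("192.168.56.0/24", "enp0s9"),
    ("192.168.100.0/24", "enp0s10"),
    ("192.168.122.0/24", "virbr0"),
    ("0.0.0.0/0", "enp0s3"),  # default route; assumes the metric 100 entry wins
]


def lookup(dst):
    """Return the output device for `dst` using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # The most specific (longest) matching prefix decides the device.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    print(lookup("172.17.0.5"))  # container address
    print(lookup("1.2.4.8"))     # public address
```

The container address falls inside 172.17.0.0/16 and therefore leaves through docker0, while a public address only matches the default route.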

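Now for an example that does use network options. Say we want to map port 8888 on the host to port 80 in the container. Let's see how the rules change in that case: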

[root@dev ~]# docker run -dit -p 8888:80 --name test-os2 docker.io/centos /bin/bash

Now look at iptables:

[root@dev ~]# iptables-save 
# Generated by iptables-save v1.4.21 on Thu Aug  6 14:29:18 2015
*mangle
:PREROUTING ACCEPT [2644:222950]
:INPUT ACCEPT [2588:218246]
:FORWARD ACCEPT [56:4704]
:OUTPUT ACCEPT [393:75016]
:POSTROUTING ACCEPT [449:79720]
COMMIT
# Completed on Thu Aug  6 14:29:18 2015
# Generated by iptables-save v1.4.21 on Thu Aug  6 14:29:18 2015
*nat
:PREROUTING ACCEPT [38:2854]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [1:55]
:POSTROUTING ACCEPT [1:55]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.6/32 -d 172.17.0.6/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8888 -j DNAT --to-destination 172.17.0.6:80
COMMIT
# Completed on Thu Aug  6 14:29:18 2015
# Generated by iptables-save v1.4.21 on Thu Aug  6 14:29:18 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [29:2822]
:DOCKER - [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A DOCKER -d 172.17.0.6/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
COMMIT
# Completed on Thu Aug  6 14:29:18 2015

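Several rules related to ports 80 and 8888 have been added, so let's analyse them. First take a packet sent from the container to the public network. It takes the forwarding path described above: mangle-PREROUTING -> nat-PREROUTING -> mangle-FORWARD -> filter-FORWARD -> mangle-POSTROUTING -> nat-POSTROUTING. By this point the packet has already been through the bridge logic and entered the kernel from docker0. In nat-PREROUTING the jump to the DOCKER chain only applies to packets with a LOCAL destination, which a public address is not, and the ! -i docker0 condition on the DNAT rule in the DOCKER chain would rule out a packet coming from docker0 in any case, so the packet continues unchanged. In filter-FORWARD it is accepted by -A FORWARD -i docker0 ! -o docker0 -j ACCEPT, and finally in nat-POSTROUTING it goes out through SNAT.

Now the flow for a request from the public network to port 8888 on the host. When the packet enters the kernel it goes through nat-PREROUTING and matches -A DOCKER ! -i docker0 -p tcp -m tcp --dport 8888 -j DNAT --to-destination 172.17.0.6:80. The kernel rewrites the destination to 172.17.0.6:80, and routing then sends the packet towards docker0. In filter-FORWARD it jumps to the DOCKER chain, where -A DOCKER -d 172.17.0.6/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT accepts it, and it is delivered to our container.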
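
Both flows can be traced with a small Python model of the rules in the dump above. It is a sketch under simplifying assumptions: all packets are TCP, only the first packet of a connection is considered (so the RELATED,ESTABLISHED rule is left out), and the Packet class and function names are invented for this illustration. The client address 203.0.113.9 is a placeholder, and 172.16.1.75 is the host's enp0s3 address from the routing table shown earlier.

```python
from dataclasses import dataclass, replace
import ipaddress

DOCKER_NET = ipaddress.ip_network("172.17.0.0/16")


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    in_if: str
    dst_is_local: bool = False  # what `-m addrtype --dst-type LOCAL` tests


def nat_prerouting(pkt):
    """PREROUTING -> DOCKER: DNAT host port 8888 to 172.17.0.6:80."""
    if pkt.dst_is_local and pkt.in_if != "docker0" and pkt.dport == 8888:
        return replace(pkt, dst="172.17.0.6", dport=80, dst_is_local=False)
    return pkt


def filter_forward(pkt, out_if):
    """Verdict of filter-FORWARD for the first packet of a connection."""
    if out_if == "docker0":
        # -A FORWARD -o docker0 -j DOCKER, then the per-container ACCEPT rule
        if pkt.dst == "172.17.0.6" and pkt.in_if != "docker0" and pkt.dport == 80:
            return "ACCEPT"
    if pkt.in_if == "docker0":
        return "ACCEPT"  # the two `-i docker0` rules cover every output interface
    return "REJECT"      # -A FORWARD -j REJECT --reject-with icmp-host-prohibited


def nat_postrouting(pkt, out_if, host_ip):
    """MASQUERADE container traffic that leaves through anything but docker0."""
    if ipaddress.ip_address(pkt.src) in DOCKER_NET and out_if != "docker0":
        return replace(pkt, src=host_ip)
    return pkt


if __name__ == "__main__":
    # Public client -> host:8888
    inbound = nat_prerouting(
        Packet("203.0.113.9", "172.16.1.75", 8888, "enp0s3", dst_is_local=True))
    print(inbound.dst, inbound.dport, filter_forward(inbound, "docker0"))

    # Container -> public network
    outbound = nat_prerouting(Packet("172.17.0.6", "1.2.4.8", 80, "docker0"))
    print(filter_forward(outbound, "enp0s3"),
          nat_postrouting(outbound, "enp0s3", "172.16.1.75").src)
```

A packet from outside sent straight to 172.17.0.6 on any port other than 80 is rejected in this model, which matches the dump: only the mapped port gets an ACCEPT rule in the DOCKER chain.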
