Notes on a production WeChat Pay incident: troubleshooting a DNS resolution problem

Author: 站长

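Project environment

  • Language: Go
  • Framework: go-zero
  • Deployment: Alibaba Cloud Kubernetes cluster

Symptoms

When calling the WeChat Pay unified-order API for app payments, testers reported an intermittent error: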

Post "https://api.mch.weixin.qq.com/pay/unifiedorder": dial tcp [240e:e1:a900:50::4a]:443: connect: network is unreachable

The connection went to the IPv6 address [240e:e1:a900:50::4a], but the server's current configuration does not support IPv6. So I checked how the domain resolves:

[root@localhost xwj-services]# nslookup api.mch.weixin.qq.com
Server:         10.225.136.20
Address:        10.225.136.20#53

Non-authoritative answer:
api.mch.weixin.qq.com   canonical name = forward.weixin.qq.com.
forward.weixin.qq.com   canonical name = forwardtmp.weixin.qq.com.
Name:   forwardtmp.weixin.qq.com
Address: 101.91.0.140
Name:   forwardtmp.weixin.qq.com
Address: 101.226.137.13
Name:   forwardtmp.weixin.qq.com
Address: 240e:e1:a900:50::4a
Name:   forwardtmp.weixin.qq.com
Address: 240e:e1:a900:50::49

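The domain has both IPv4 and IPv6 records. So why does the client only occasionally end up on an IPv6 address?

The WeChat Pay documentation says: pay.weixin.qq.com/wiki/doc/ap…

2. IPv6

If your server has IPv6 support enabled, DNS resolution will often run into timeouts because IPv6 support on the Internet is still incomplete;

We recommend explicitly forcing IPv4 resolution when calling the payment API.

Reference code for a PHP program calling the API with curl: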
if(defined('CURLOPT_IPRESOLVE') && defined('CURL_IPRESOLVE_V4'))
{
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
} 

Check the server's current IPv6 configuration:

sysctl -a |grep ipv6|grep disable

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

ifconfig

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.136.65.38  netmask 255.255.255.0  broadcast 10.136.65.255
        ether 00:16:3e:0e:46:ca  txqueuelen 1000  (Ethernet)
        RX packets 348024466  bytes 124403304860 (115.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 276384477  bytes 87874832467 (81.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

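There is no inet6 entry, so IPv6 is not enabled. I then looked for a way to make the Go client use IPv4 only. Searching online turned up nothing, so I read the source that Go's net/http relies on.

net/lookup.go, line 288 (Go 1.18)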
// lookupIPAddr looks up host using the local resolver and particular network.
// It returns a slice of that host's IPv4 and IPv6 addresses.
func (r *Resolver) lookupIPAddr(ctx context.Context, network, host string) ([]IPAddr, error) {
	// Make sure that no matter what we do later, host=="" is rejected.
	// parseIP, for example, does accept empty strings.
	if host == "" {
		return nil, &DNSError{Err: errNoSuchHost.Error(), Name: host, IsNotFound: true}
	}
	if ip, zone := parseIPZone(host); ip != nil {
		return []IPAddr{{IP: ip, Zone: zone}}, nil
	}
	// Debug print added to the standard library source for this investigation.
	fmt.Println("return ip??")
	trace, _ := ctx.Value(nettrace.TraceKey{}).(*nettrace.Trace)
	if trace != nil && trace.DNSStart != nil {
		trace.DNSStart(host)
	}
	// The underlying resolver func is lookupIP by default but it
	// can be overridden by tests. This is needed by net/http, so it
	// uses a context key instead of unexported variables.
	resolverFunc := r.lookupIP
	if alt, _ := ctx.Value(nettrace.LookupIPAltResolverKey{}).(func(context.Context, string, string) ([]IPAddr, error)); alt != nil {
		resolverFunc = alt
	}
...

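As the code shows, the DNS lookup resolves IPv4 and IPv6 together and returns both. Next, here is how an address is chosen once resolution is done:

net/ipsock.go, line 249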

// internetAddrList resolves addr, which may be a literal IP
// address or a DNS name, and returns a list of internet protocol
// family addresses. The result contains at least one address when
// error is nil.
func (r *Resolver) internetAddrList(ctx context.Context, net, addr string) (addrList, error) {
	var (
		err        error
		host, port string
		portnum    int
	)
	switch net {
	case "tcp", "tcp4", "tcp6", "udp", "udp4", "udp6":
		if addr != "" {
			if host, port, err = SplitHostPort(addr); err != nil {
				return nil, err
			}
			if portnum, err = r.LookupPort(ctx, net, port); err != nil {
				return nil, err
			}
		}
	case "ip", "ip4", "ip6":
		if addr != "" {
			host = addr
		}
	default:
		return nil, UnknownNetworkError(net)
	}
	inetaddr := func(ip IPAddr) Addr {
		switch net {
		case "tcp", "tcp4", "tcp6":
			return &TCPAddr{IP: ip.IP, Port: portnum, Zone: ip.Zone}
		case "udp", "udp4", "udp6":
			return &UDPAddr{IP: ip.IP, Port: portnum, Zone: ip.Zone}
		case "ip", "ip4", "ip6":
			return &IPAddr{IP: ip.IP, Zone: ip.Zone}
		default:
			panic("unexpected network: " + net)
		}
	}
	if host == "" {
		return addrList{inetaddr(IPAddr{})}, nil
	}

	// Try as a literal IP address, then as a DNS name.
	ips, err := r.lookupIPAddr(ctx, net, host)
	// Debug print added for this investigation: network, resolved IPs, last byte of the network name.
	fmt.Println("net",net,ips,net[len(net)-1])
	if err != nil {
		return nil, err
	}
	// Issue 18806: if the machine has halfway configured
	// IPv6 such that it can bind on "::" (IPv6unspecified)
	// but not connect back to that same address, fall
	// back to dialing 0.0.0.0.
	if len(ips) == 1 && ips[0].IP.Equal(IPv6unspecified) {
		ips = append(ips, IPAddr{IP: IPv4zero})
	}

	var filter func(IPAddr) bool
	if net != "" && net[len(net)-1] == '4' {
		filter = ipv4only
	}
	if net != "" && net[len(net)-1] == '6' {
		filter = ipv6only
	}
	return filterAddrList(filter, ips, inetaddr, host)
}

// filterAddrList applies a filter to a list of IP addresses,
// yielding a list of Addr objects. Known filters are nil, ipv4only,
// and ipv6only. It returns every address when the filter is nil.
// The result contains at least one address when error is nil.
func filterAddrList(filter func(IPAddr) bool, ips []IPAddr, inetaddr func(IPAddr) Addr, originalAddr string) (addrList, error) {
	var addrs addrList
	for _, ip := range ips {
		if filter == nil || filter(ip) {
			addrs = append(addrs, inetaddr(ip))
		}
	}
	if len(addrs) == 0 {
		return nil, &AddrError{Err: errNoSuitableAddress.Error(), Addr: originalAddr}
	}
	return addrs, nil
}
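
The suffix check in internetAddrList is what decides whether the resolved addresses are filtered by family. Below is a minimal sketch that reproduces the rule outside the standard library. The helper names `familyOf` and `filterByNetwork` are hypothetical and not part of package net. It also explains the `112` that the debug print emits later in this article: 112 is the byte value of 'p', the last character of "tcp", so no filter is selected.

```go
package main

import (
	"fmt"
	"net"
)

// familyOf mirrors the suffix check in internetAddrList: a network name
// ending in '4' or '6' pins the address family, anything else does not.
// (Hypothetical helper, not part of package net.)
func familyOf(network string) string {
	if network == "" {
		return "any"
	}
	switch network[len(network)-1] {
	case '4':
		return "ipv4only"
	case '6':
		return "ipv6only"
	}
	return "any"
}

// filterByNetwork keeps only the textual IP addresses that the given
// network name allows. Unparsable entries are dropped.
func filterByNetwork(network string, addrs []string) []string {
	family := familyOf(network)
	var out []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil {
			continue
		}
		isV4 := ip.To4() != nil
		if family == "any" || (family == "ipv4only" && isV4) || (family == "ipv6only" && !isV4) {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	addrs := []string{"101.226.137.13", "240e:e1:a900:50::4a"}
	for _, network := range []string{"tcp", "tcp4", "tcp6"} {
		fmt.Println(network, familyOf(network), filterByNetwork(network, addrs))
	}
}
```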

net/dial.go, line 359

// DialContext connects to the address on the named network using
// the provided context.
//
// The provided Context must be non-nil. If the context expires before
// the connection is complete, an error is returned. Once successfully
// connected, any expiration of the context will not affect the
// connection.
//
// When using TCP, and the host in the address parameter resolves to multiple
// network addresses, any dial timeout (from d.Timeout or ctx) is spread
// over each consecutive dial, such that each is given an appropriate
// fraction of the time to connect.
// For example, if a host has 4 IP addresses and the timeout is 1 minute,
// the connect to each single address will be given 15 seconds to complete
// before trying the next one.
//
// See func Dial for a description of the network and address
// parameters.
func (d *Dialer) DialContext(ctx context.Context, network, address string) (Conn, error) {
	if ctx == nil {
		panic("nil context")
	}
	deadline := d.deadline(ctx, time.Now())

	if !deadline.IsZero() {
		if d, ok := ctx.Deadline(); !ok || deadline.Before(d) {
			subCtx, cancel := context.WithDeadline(ctx, deadline)
			defer cancel()
			ctx = subCtx
		}
	}
	if oldCancel := d.Cancel; oldCancel != nil {
		subCtx, cancel := context.WithCancel(ctx)
		defer cancel()
		go func() {
			select {
			case <-oldCancel:
				cancel()
			case <-subCtx.Done():
			}
		}()
		ctx = subCtx
	}

	// Shadow the nettrace (if any) during resolve so Connect events don't fire for DNS lookups.
	resolveCtx := ctx
	if trace, _ := ctx.Value(nettrace.TraceKey{}).(*nettrace.Trace); trace != nil {
		shadow := *trace
		shadow.ConnectStart = nil
		shadow.ConnectDone = nil
		resolveCtx = context.WithValue(resolveCtx, nettrace.TraceKey{}, &shadow)
	}
	// All resolved addresses are obtained here.
	addrs, err := d.resolver().resolveAddrList(resolveCtx, "dial", network, address, d.LocalAddr)
	if err != nil {
		return nil, &OpError{Op: "dial", Net: network, Source: nil, Addr: nil, Err: err}
	}

	sd := &sysDialer{
		Dialer:  *d,
		network: network,
		address: address,
	}

	var primaries, fallbacks addrList
	// FallbackDelay specifies the length of time to wait before
	// spawning a RFC 6555 Fast Fallback connection. That is, this
	// is the amount of time to wait for IPv6 to succeed before
	// assuming that IPv6 is misconfigured and falling back to
	// IPv4.
	//
	// If zero, a default delay of 300ms is used.
	// A negative value disables Fast Fallback support.
	// FallbackDelay time.Duration
	// (Note: dualStack() reports d.FallbackDelay >= 0.)
	if d.dualStack() && network == "tcp" {
		primaries, fallbacks = addrs.partition(isIPv4)
	} else {
		primaries = addrs
	}

	var c Conn
	if len(fallbacks) > 0 {
		c, err = sd.dialParallel(ctx, primaries, fallbacks)
	} else {
		c, err = sd.dialSerial(ctx, primaries)
	}
	if err != nil {
		return nil, err
	}

	if tc, ok := c.(*TCPConn); ok && d.KeepAlive >= 0 {
		setKeepAlive(tc.fd, true)
		ka := d.KeepAlive
		if d.KeepAlive == 0 {
			ka = defaultTCPKeepAlive
		}
		setKeepAlivePeriod(tc.fd, ka)
		testHookSetKeepAlive(ka)
	}
	return c, nil
}

// dialSerial connects to a list of addresses in sequence, returning
// either the first successful connection, or the first error.
func (sd *sysDialer) dialSerial(ctx context.Context, ras addrList) (Conn, error) {
	var firstErr error // The error from the first address is most relevant.

	for i, ra := range ras {
		select {
		case <-ctx.Done():
			return nil, &OpError{Op: "dial", Net: sd.network, Source: sd.LocalAddr, Addr: ra, Err: mapErr(ctx.Err())}
		default:
		}

		dialCtx := ctx
		if deadline, hasDeadline := ctx.Deadline(); hasDeadline {
			partialDeadline, err := partialDeadline(time.Now(), deadline, len(ras)-i)
			if err != nil {
				// Ran out of time.
				if firstErr == nil {
					firstErr = &OpError{Op: "dial", Net: sd.network, Source: sd.LocalAddr, Addr: ra, Err: err}
				}
				break
			}
			if partialDeadline.Before(deadline) {
				var cancel context.CancelFunc
				dialCtx, cancel = context.WithDeadline(ctx, partialDeadline)
				defer cancel()
			}
		}
		// Try each address in turn and return as soon as one connection succeeds.
		c, err := sd.dialSingle(dialCtx, ra)
		if err == nil {
			return c, nil
		}
		if firstErr == nil {
			firstErr = err
		}
	}

	if firstErr == nil {
		firstErr = &OpError{Op: "dial", Net: sd.network, Source: nil, Addr: nil, Err: errMissingAddress}
	}
	return nil, firstErr
}

// dialSingle attempts to establish and returns a single connection to
// the destination address.
func (sd *sysDialer) dialSingle(ctx context.Context, ra Addr) (c Conn, err error) {
	trace, _ := ctx.Value(nettrace.TraceKey{}).(*nettrace.Trace)
	if trace != nil {
		raStr := ra.String()
		if trace.ConnectStart != nil {
			trace.ConnectStart(sd.network, raStr)
		}
		if trace.ConnectDone != nil {
			defer func() { trace.ConnectDone(sd.network, raStr, err) }()
		}
	}
	la := sd.LocalAddr
	switch ra := ra.(type) {
	case *TCPAddr:
		la, _ := la.(*TCPAddr)
		c, err = sd.dialTCP(ctx, la, ra)
	case *UDPAddr:
		la, _ := la.(*UDPAddr)
		c, err = sd.dialUDP(ctx, la, ra)
	case *IPAddr:
		la, _ := la.(*IPAddr)
		c, err = sd.dialIP(ctx, la, ra)
	case *UnixAddr:
		la, _ := la.(*UnixAddr)
		c, err = sd.dialUnix(ctx, la, ra)
	default:
		return nil, &OpError{Op: "dial", Net: sd.network, Source: la, Addr: ra, Err: &AddrError{Err: "unexpected address type", Addr: sd.address}}
	}
	if err != nil {
		return nil, &OpError{Op: "dial", Net: sd.network, Source: la, Addr: ra, Err: err} // c is non-nil interface containing nil pointer
	}
	return c, nil
}
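
In DialContext, `addrs.partition(isIPv4)` splits the resolved list into primaries and fallbacks. The sketch below imitates that split on plain strings to show why an IPv6-only answer leaves nothing to fall back to. The function `partition` here is a simplified stand-in written for this article, not the unexported method itself, and the inputs are the two address lists from the logs in this article.

```go
package main

import (
	"fmt"
	"net"
)

// partition imitates addrList.partition(isIPv4): the first address and every
// address of the same family become primaries, the others become fallbacks.
func partition(addrs []string) (primaries, fallbacks []string) {
	var primaryIsV4 bool
	for i, a := range addrs {
		ip := net.ParseIP(a)
		isV4 := ip != nil && ip.To4() != nil
		if i == 0 {
			primaryIsV4 = isV4
		}
		if isV4 == primaryIsV4 {
			primaries = append(primaries, a)
		} else {
			fallbacks = append(fallbacks, a)
		}
	}
	return primaries, fallbacks
}

func main() {
	healthy := []string{"101.226.137.13", "101.91.0.140", "240e:e1:a900:50::4a", "240e:e1:a900:50::49"}
	failing := []string{"240e:e1:a900:50::4a", "240e:e1:a900:50::49"}

	p, f := partition(healthy)
	fmt.Println("healthy: primaries", p, "fallbacks", f)

	p, f = partition(failing)
	fmt.Println("failing: primaries", p, "fallbacks", f)
}
```

With the IPv6-only list, every address lands in primaries and the fallback list is empty, so the dialer goes through dialSerial and can only report the IPv6 connection error.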

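So unless the network is pinned to IPv4 or IPv6 (network = tcp4/tcp6 or udp4/udp6), no address-family filter is applied. In my tests an ordinary HTTP request always uses network "tcp", and with no special settings (FallbackDelay >= 0, so dual-stack dialing is on) the client tries the IPv4 addresses first and only then IPv6. The test program: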

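package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"
)

func main() {
	u := "https://api.mch.weixin.qq.com/pay/unifiedorder"
	b := strings.NewReader("test")
	c := NewClient()
	resp, err := c.Post(u, "text/json;charset=utf-8", b)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println(err)
		return
	}
	ur, _ := url.Parse(u)
	fmt.Println(ur.Host, string(body))
}

// NewClient builds the test client; no DialContext is set, so the network is "tcp".
func NewClient() *http.Client {
	return &http.Client{
		Timeout: 60 * time.Second,
		Transport: &http.Transport{
			// Test only: certificate verification is disabled.
			TLSClientConfig:   &tls.Config{InsecureSkipVerify: true},
			DisableKeepAlives: true,
			Proxy:             http.ProxyFromEnvironment,
		},
	}
}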
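
For comparison with the PHP `CURL_IPRESOLVE_V4` snippet from the WeChat Pay documentation, here is a minimal sketch of one way to pin a Go HTTP client to IPv4: wrap the dialer and rewrite the network name from "tcp" to "tcp4", which triggers the ipv4only filter shown in internetAddrList. The names `forceIPv4` and `NewIPv4Client` and the timeout values are assumptions made for illustration. This is not something the original investigation deployed.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// forceIPv4 rewrites dual-stack network names to their IPv4-only variant.
// Names that already pin a family are returned unchanged.
func forceIPv4(network string) string {
	switch network {
	case "tcp":
		return "tcp4"
	case "udp":
		return "udp4"
	}
	return network
}

// NewIPv4Client returns an HTTP client whose transport dials over IPv4 only.
// (Hypothetical constructor; timeouts are illustrative.)
func NewIPv4Client() *http.Client {
	dialer := &net.Dialer{Timeout: 10 * time.Second, KeepAlive: 30 * time.Second}
	return &http.Client{
		Timeout: 60 * time.Second,
		Transport: &http.Transport{
			Proxy: http.ProxyFromEnvironment,
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				return dialer.DialContext(ctx, forceIPv4(network), addr)
			},
		},
	}
}

func main() {
	c := NewIPv4Client()
	// No request is sent here; this only shows the client configuration.
	fmt.Println("client timeout:", c.Timeout, "network used for tcp:", forceIPv4("tcp"))
}
```

One caveat follows from filterAddrList: if the resolver returns only IPv6 addresses, the ipv4only filter leaves an empty list and the dial fails with a "no suitable address" error instead of "network is unreachable". Pinning the family changes the failure mode but does not repair a lookup that is missing its A records.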

Print DNS logs and test in production

The logging code, taken from the listing above:
// Try as a literal IP address, then as a DNS name.
ips, err := r.lookupIPAddr(ctx, net, host)
fmt.Println("net",net,ips,net[len(net)-1])
	
Log when the request succeeds and connects over IPv4:
net tcp [{101.226.137.13 } {101.91.0.140 } {240e:e1:a900:50::4a } {240e:e1:a900:50::49 }] 112 api.mch.weixin.qq.com:443

Log when the request fails and connects over IPv6:
net tcp [{240e:e1:a900:50::4a } {240e:e1:a900:50::49 }] 112 api.mch.weixin.qq.com:443

In the failing case the address list contains only IPv6 addresses. No IPv4 address was resolved, so the client has no choice but to dial IPv6.
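
The logs above were produced by adding print statements to the standard library source. A less invasive way to observe the same information is net/http/httptrace, whose `DNSDone` and `ConnectStart` hooks fire per request. The following is a minimal sketch. The helpers `familyCounts` and `newTracedRequest` are hypothetical names, and the log format is my own.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptrace"
)

// familyCounts reports how many textual IP addresses are IPv4 and how many are IPv6.
func familyCounts(addrs []string) (v4, v6 int) {
	for _, a := range addrs {
		ip := net.ParseIP(a)
		switch {
		case ip == nil:
			// Ignore input that is not an IP address.
		case ip.To4() != nil:
			v4++
		default:
			v6++
		}
	}
	return v4, v6
}

// newTracedRequest builds a POST request whose context logs the DNS result
// and every connection attempt made for it.
func newTracedRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	trace := &httptrace.ClientTrace{
		DNSDone: func(info httptrace.DNSDoneInfo) {
			addrs := make([]string, 0, len(info.Addrs))
			for _, a := range info.Addrs {
				addrs = append(addrs, a.IP.String())
			}
			v4, v6 := familyCounts(addrs)
			fmt.Printf("dns done: addrs=%v ipv4=%d ipv6=%d err=%v\n", addrs, v4, v6, info.Err)
		},
		ConnectStart: func(network, addr string) {
			fmt.Printf("connect start: %s %s\n", network, addr)
		},
	}
	return req.WithContext(httptrace.WithClientTrace(req.Context(), trace)), nil
}

func main() {
	req, err := newTracedRequest("https://api.mch.weixin.qq.com/pay/unifiedorder")
	if err != nil {
		fmt.Println(err)
		return
	}
	// The request is only built here; pass it to an http.Client's Do method
	// to see the callbacks fire.
	fmt.Println("traced request ready for", req.URL.Host)
}
```

A result with ipv4=0 and ipv6>0 in the `dns done` line corresponds to the failing log shown above.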

DNS packet capture

# Normal case
17:05:29.473836 IP shop-rpc-7c84bb44f6-6vj7p.36570 > kube-dns.kube-system.svc.cluster.local.53: 46005+ AAAA? api.mch.weixin.qq.com. (39)
17:05:29.473887 IP shop-rpc-7c84bb44f6-6vj7p.57888 > kube-dns.kube-system.svc.cluster.local.53: 31830+ A? api.mch.weixin.qq.com. (39)
17:05:29.474309 IP kube-dns.kube-system.svc.cluster.local.53 > shop-rpc-7c84bb44f6-6vj7p.36570: 46005 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., AAAA 240e:e1:a900:50::4a, AAAA 240e:e1:a900:50::49 (258)
17:05:29.474404 IP kube-dns.kube-system.svc.cluster.local.53 > shop-rpc-7c84bb44f6-6vj7p.57888: 31830 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., A 101.226.137.13, A 101.91.0.140 (234)

# Abnormal case
17:02:33.045324 IP shop-rpc-7c84bb44f6-6vj7p.48458 > kube-dns.kube-system.svc.cluster.local.53: 41571+ AAAA? api.mch.weixin.qq.com.xwj.svc.cluster.local. (61)
17:02:33.045325 IP shop-rpc-7c84bb44f6-6vj7p.46782 > kube-dns.kube-system.svc.cluster.local.53: 22281+ A? api.mch.weixin.qq.com.xwj.svc.cluster.local. (61)
17:02:33.045846 IP kube-dns.kube-system.svc.cluster.local.53 > shop-rpc-7c84bb44f6-6vj7p.46782: 22281 NXDomain*- 0/1/0 (154)
17:02:33.046166 IP kube-dns.kube-system.svc.cluster.local.53 > shop-rpc-7c84bb44f6-6vj7p.48458: 41571 5/0/0 CNAME api.mch.weixin.qq.com., CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., AAAA 240e:e1:a900:50::4a, AAAA 240e:e1:a900:50::49 (358)
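
The abnormal capture queries `api.mch.weixin.qq.com.xwj.svc.cluster.local.`, which is the original name with a search domain appended. The sketch below shows how a stub resolver derives such candidates from resolv.conf-style `search` and `ndots` settings. The search list and `ndots` value of 5 are assumptions based on common Kubernetes pod defaults, since the article does not show the pod's /etc/resolv.conf. `candidateNames` is a hypothetical helper. The sketch only illustrates where the suffixed name comes from and does not by itself settle the root cause.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames lists, in order, the fully qualified names a stub resolver
// tries for name, given search domains and the ndots threshold.
func candidateNames(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		// Already fully qualified: the search list is not applied.
		return []string{name}
	}
	hasNdots := strings.Count(name, ".") >= ndots
	fqdn := name + "."
	var names []string
	if hasNdots {
		// Enough dots: try the name as given first.
		names = append(names, fqdn)
	}
	for _, s := range search {
		names = append(names, fqdn+s+".")
	}
	if !hasNdots {
		// Too few dots: the name as given is tried last.
		names = append(names, fqdn)
	}
	return names
}

func main() {
	// Assumed pod settings, not taken from the article.
	search := []string{"xwj.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, n := range candidateNames("api.mch.weixin.qq.com", search, 5) {
		fmt.Println(n)
	}
}
```

Under these assumptions the name has four dots, fewer than `ndots`, so the search-suffixed variants are tried before the bare name. A name written with a trailing dot skips the expansion entirely.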

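DNS record types:

A records and CNAME records

An A record maps a domain name to an IP address, while a CNAME record maps a domain name to another domain name, which ultimately points to an A record. In terms of reaching the server, the two produce the same result.

A CNAME record makes IP address changes easier than an A record does. CNAME records let several names map to the same machine: when multiple domains need to point to the same server IP, you can give one domain an A record pointing to the server IP and make the other domains aliases (CNAMEs) of that domain. When the server IP changes, you only update the A record of that one domain, and the aliased domains follow automatically without any per-domain change.

A records and AAAA records

Both point to an IP address, but for different IP versions.

An A record points to an IPv4 address and an AAAA record points to an IPv6 address. The AAAA record is the IPv6 counterpart of the A record.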

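The NXDomain error

The capture shows that the A query failed and returned NXDomain, which is why the application received only the IPv6 addresses.

Some situations that can lead to NXDomain:

  • Misspelling: the requested domain name may be misspelled, or the domain really does not exist.
  • DNS records not yet in effect: if the domain was registered or modified recently, its records may not have propagated to all DNS servers yet. In that case you have to wait for the records to take effect, usually for the time given by the TTL (Time To Live) value.
  • Domain suspended or deleted: the domain may have been suspended or deleted, so no matching record can be found in DNS.
  • DNS server problems: the DNS server itself may be unable to answer correctly because of a server failure, a configuration error, or a network problem.
  • DNS cache problems: the local DNS cache may hold stale or incorrect data and cause resolution errors. Try flushing the local DNS cache and resolving again.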
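
On the Go side, a "no such host" result surfaces as a `*net.DNSError` with `IsNotFound` set. The sketch below shows one way to classify lookup errors with `errors.As`. `classifyLookupError` and `demoClassify` are hypothetical helpers, and the errors are constructed by hand so that the example runs without network access.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// classifyLookupError maps an error from a DNS lookup or dial to a coarse label.
func classifyLookupError(err error) string {
	if err == nil {
		return "ok"
	}
	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		switch {
		case dnsErr.IsNotFound:
			return "not-found"
		case dnsErr.IsTimeout:
			return "timeout"
		case dnsErr.IsTemporary:
			return "temporary"
		}
		return "dns-error"
	}
	return "other"
}

// demoClassify runs the classifier over hand-built sample errors.
func demoClassify() []string {
	notFound := &net.DNSError{Err: "no such host", Name: "api.mch.weixin.qq.com", IsNotFound: true}
	timeout := &net.DNSError{Err: "i/o timeout", Name: "api.mch.weixin.qq.com", IsTimeout: true}
	wrapped := fmt.Errorf("lookup failed: %w", notFound)
	return []string{
		classifyLookupError(nil),
		classifyLookupError(notFound),
		classifyLookupError(timeout),
		classifyLookupError(wrapped),
		classifyLookupError(errors.New("boom")),
	}
}

func main() {
	fmt.Println(demoClassify())
}
```

In the incident described here the lookup as a whole did not fail, because the AAAA query returned addresses. A check like this would only fire when no query yields an address.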

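Another experiment: resolving api.mch.weixin.qq.com with nslookup and curl

In the same environment, curl and nslookup never hit the resolution failure. So far the problem only shows up with the Go net/http client.

DNS capture results: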

13:44:11.618584 IP localhost.localdomain.11115 > sgs-dc-01.dobest.corp.domain: 9108+ A? api.mch.weixin.qq.com. (39)
13:44:11.618658 IP localhost.localdomain.11115 > sgs-dc-01.dobest.corp.domain: 11739+ AAAA? api.mch.weixin.qq.com. (39)
13:44:11.619076 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.11115: 9108 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., A 101.226.137.13, A 101.91.0.140 (118)
13:44:11.619339 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.11115: 11739 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., AAAA 240e:e1:a900:50::4a, AAAA 240e:e1:a900:50::49 (142)
13:44:12.538100 IP localhost.localdomain.43789 > sgs-dc-01.dobest.corp.domain: 9052+ A? api.mch.weixin.qq.com. (39)
13:44:12.538159 IP localhost.localdomain.43789 > sgs-dc-01.dobest.corp.domain: 31643+ AAAA? api.mch.weixin.qq.com. (39)
13:44:12.538692 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.43789: 9052 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., A 101.91.0.140, A 101.226.137.13 (118)
13:44:12.538765 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.43789: 31643 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., AAAA 240e:e1:a900:50::49, AAAA 240e:e1:a900:50::4a (142)
13:44:13.312403 IP localhost.localdomain.35964 > sgs-dc-01.dobest.corp.domain: 4603+ A? api.mch.weixin.qq.com. (39)
13:44:13.312459 IP localhost.localdomain.35964 > sgs-dc-01.dobest.corp.domain: 25914+ AAAA? api.mch.weixin.qq.com. (39)
13:44:13.313349 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.35964: 4603 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., A 101.226.137.13, A 101.91.0.140 (118)
13:44:13.313448 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.35964: 25914 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., AAAA 240e:e1:a900:50::4a, AAAA 240e:e1:a900:50::49 (142)
13:44:14.002346 IP localhost.localdomain.25122 > sgs-dc-01.dobest.corp.domain: 52577+ A? api.mch.weixin.qq.com. (39)
13:44:14.002455 IP localhost.localdomain.25122 > sgs-dc-01.dobest.corp.domain: 22434+ AAAA? api.mch.weixin.qq.com. (39)
13:44:14.002983 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.25122: 22434 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., AAAA 240e:e1:a900:50::49, AAAA 240e:e1:a900:50::4a (142)
13:44:14.003062 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.25122: 52577 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., A 101.91.0.140, A 101.226.137.13 (118)

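In the capture above, the A and AAAA queries are in most cases sent and answered over the same connection (same source port). The Go client, by contrast, uses two different connections: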

13:45:36.754562 IP localhost.localdomain.60407 > sgs-dc-01.dobest.corp.domain: 20327+ A? api.mch.weixin.qq.com. (39)
13:45:36.754562 IP localhost.localdomain.44534 > sgs-dc-01.dobest.corp.domain: 20065+ AAAA? api.mch.weixin.qq.com. (39)
13:45:36.755395 IP localhost.localdomain.44588 > sgs-dc-01.dobest.corp.domain: 14900+ PTR? 20.136.225.10.in-addr.arpa. (44)
13:45:36.755486 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.60407: 20327 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., A 101.226.137.13, A 101.91.0.140 (118)
13:45:36.756127 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.44588: 14900* 1/0/0 PTR sgs-dc-01.dobest.corp. (79)
13:45:36.756294 IP localhost.localdomain.59552 > sgs-dc-01.dobest.corp.domain: 3799+ PTR? 192.136.225.10.in-addr.arpa. (45)
13:45:36.756886 IP sgs-dc-01.dobest.corp.domain > localhost.localdomain.44534: 20065 4/0/0 CNAME forward.weixin.qq.com., CNAME forwardtmp.weixin.qq.com., AAAA 240e:e1:a900:50::49, AAAA 240e:e1:a900:50::4a (142)

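Suspected cause

Alibaba Cloud's DNS servers have a protection mechanism against attacks. The Go client opens two parallel connections for every lookup, and my guess is that the resulting request rate is high enough to be intercepted, so NXDomain is returned.

Options for handling unstable DNS resolution

  1. Switch to a different DNS (the cluster currently relies on CoreDNS for resolution, so switching is difficult)
  2. Add a DNS cache

Final solution

Add a DNS cache. For details see: help.aliyun.com/document_de…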
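
The fix adopted here follows the linked Alibaba Cloud document. As a separate, application-level illustration of the same caching idea, below is a minimal sketch of an in-process TTL cache in front of a lookup function. The type `Cache`, the constructor `NewCache`, the fixed TTL, and the stubbed upstream lookup are all assumptions made for the example, not what was deployed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lookupFunc resolves a host name to textual IP addresses.
type lookupFunc func(host string) ([]string, error)

type entry struct {
	addrs   []string
	expires time.Time
}

// Cache memoizes successful lookups for a fixed TTL. Failures are not cached.
type Cache struct {
	mu     sync.Mutex
	ttl    time.Duration
	lookup lookupFunc
	items  map[string]entry
}

// NewCache wraps lookup with a cache whose entries live for ttl.
func NewCache(ttl time.Duration, lookup lookupFunc) *Cache {
	return &Cache{ttl: ttl, lookup: lookup, items: make(map[string]entry)}
}

// Lookup returns the cached addresses for host, or queries upstream when the
// entry is missing or expired. The lock is held during the upstream call,
// which keeps the sketch simple at the cost of serializing lookups.
func (c *Cache) Lookup(host string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[host]; ok && time.Now().Before(e.expires) {
		return e.addrs, nil
	}
	addrs, err := c.lookup(host)
	if err != nil {
		return nil, err
	}
	c.items[host] = entry{addrs: addrs, expires: time.Now().Add(c.ttl)}
	return addrs, nil
}

// demoUpstreamCalls performs two lookups of the same host and reports how
// many of them reached the (stubbed) upstream resolver.
func demoUpstreamCalls() int {
	calls := 0
	c := NewCache(time.Minute, func(host string) ([]string, error) {
		calls++
		return []string{"101.226.137.13", "101.91.0.140"}, nil // stand-in for a real resolver
	})
	c.Lookup("api.mch.weixin.qq.com")
	c.Lookup("api.mch.weixin.qq.com")
	return calls
}

func main() {
	fmt.Println("upstream lookups:", demoUpstreamCalls())
}
```

In a real client the stub would be replaced by a resolver call, and the cached addresses would be handed to a custom DialContext. A cache like this lowers the query rate toward the DNS server but can also keep serving an address after the record has changed, so the TTL needs care.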

Reposted from: https://juejin.cn/post/7303342257792679955