Spring Boot Graceful Shutdown

Author: 站长

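Ways to shut down Spring Boot gracefully

  1. When Kubernetes stops a Pod, it first sends SIGTERM so that the application process can shut down gracefully. If the process has not exited within the grace period, terminationGracePeriodSeconds (30 seconds by default), Kubernetes sends SIGKILL to kill it forcibly.
  2. Manual shutdown: send a request to the Spring Boot Actuator shutdown endpoint, /actuator/shutdown. Spring Boot closes the web ApplicationContext and then exits, which gives a graceful shutdown.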

The kill -TERM approach

During a graceful shutdown, Spring Boot calls the methods annotated with @PreDestroy.

@PreDestroy
public void cleanup() {
    // Perform cleanup work
    log.info("Received shutdown event. Performing cleanup and shutting down gracefully.");
}

After sending SIGTERM to the Spring Boot process, the log line printed by cleanup() shows the name of the thread that runs the shutdown work: SpringApplicationShutdownHook.

[2023-09-21 08:29:34.232] INFO  [SpringApplicationShutdownHook] - Received shutdown event. Performing cleanup and shutting down gracefully.

A global search for that thread name shows that Spring Boot calls Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook")) to register a shutdown hook with the JVM. A shutdown hook lets the JVM run cleanup or wrap-up tasks just before it exits.

class SpringApplicationShutdownHook implements Runnable {

    void addRuntimeShutdownHook() {
        try {
           Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook"));
        }
        catch (AccessControlException ex) {
           // Not allowed in some environments
        }
    }
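
For readers who have not used JVM shutdown hooks directly, here is a minimal JDK-only sketch of the mechanism. The class name ShutdownHookDemo, the thread name DemoShutdownHook, and the EVENTS list are illustrative assumptions, not part of Spring Boot.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// JDK-only sketch of a JVM shutdown hook; all names here are illustrative.
class ShutdownHookDemo {

    // Records what the hook did so the behavior can be inspected.
    static final List<String> EVENTS = new CopyOnWriteArrayList<>();

    // Builds (but does not start) the hook thread, mirroring new Thread(runnable, name).
    static Thread newHook() {
        return new Thread(() -> {
            String message = "cleanup on " + Thread.currentThread().getName();
            EVENTS.add(message);
            System.out.println(message);
        }, "DemoShutdownHook");
    }

    public static void main(String[] args) {
        // The JVM starts this thread when it begins shutting down,
        // for example after main() returns or on SIGTERM.
        Runtime.getRuntime().addShutdownHook(newHook());
        System.out.println("main() returning; the JVM now runs registered hooks");
    }
}
```

Hooks do not run when the process is killed with SIGKILL, which is why the SIGTERM grace period matters.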

In SpringApplication#run(), the shutdown hook is registered with the JVM before applicationContext.refresh() is executed.

An AtomicBoolean, shutdownHookAdded, ensures that when several threads run concurrently only one of them actually adds the SpringApplicationShutdownHook.

The ConfigurableApplicationContext is added to the Set<ConfigurableApplicationContext> contexts collection, and its close() method is called later to close it. For a Spring Boot servlet web application, the implementation is AnnotationConfigServletWebServerApplicationContext.

class SpringApplicationShutdownHook implements Runnable {

    private final Set<ConfigurableApplicationContext> contexts = new LinkedHashSet<>();

    private final AtomicBoolean shutdownHookAdded = new AtomicBoolean();

    SpringApplicationShutdownHandlers getHandlers() {
       return this.handlers;
    }

    void registerApplicationContext(ConfigurableApplicationContext context) {
       addRuntimeShutdownHookIfNecessary();
       synchronized (SpringApplicationShutdownHook.class) {
          assertNotInProgress();
          context.addApplicationListener(this.contextCloseListener);
          this.contexts.add(context);
       }
    }

    private void addRuntimeShutdownHookIfNecessary() {
       if (this.shutdownHookAdded.compareAndSet(false, true)) {
          addRuntimeShutdownHook();
       }
    }

    void addRuntimeShutdownHook() {
       try {
          Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook"));
       }
       catch (AccessControlException ex) {
          // Not allowed in some environments
       }
    }
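
The compareAndSet(false, true) guard can be tried in isolation. The sketch below is a JDK-only illustration under the assumption that a counter stands in for the real addRuntimeShutdownHook() call; RegisterOnceSketch and raceAndCount are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: many threads race to register, only the compareAndSet winner does the work.
class RegisterOnceSketch {

    private final AtomicBoolean hookAdded = new AtomicBoolean();
    private final AtomicInteger registrations = new AtomicInteger();

    void registerIfNecessary() {
        // Exactly one caller flips false -> true; everyone else sees false returned.
        if (hookAdded.compareAndSet(false, true)) {
            registrations.incrementAndGet(); // stands in for addRuntimeShutdownHook()
        }
    }

    // Starts threadCount threads at the same moment and returns how many registrations happened.
    static int raceAndCount(int threadCount) {
        RegisterOnceSketch sketch = new RegisterOnceSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    return;
                }
                sketch.registerIfNecessary();
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
        return sketch.registrations.get();
    }

    public static void main(String[] args) {
        System.out.println("registrations = " + raceAndCount(8)); // prints "registrations = 1"
    }
}
```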

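SpringApplicationShutdownHook: A Runnable to be used as a shutdown hook to perform graceful shutdown of Spring Boot applications. Its run() method does two important things:

  1. contexts.forEach(this::closeAndWait): closes each ConfigurableApplicationContext and waits for it to become inactive. The timeout is 10 minutes (TIMEOUT = TimeUnit.MINUTES.toMillis(10)), and it only bounds the polling loop that runs after context.close() returns. If context.close() itself contains a slow synchronous operation, the timeout does not apply and the hook blocks inside context.close().
  2. actions.forEach(Runnable::run): user-defined shutdown actions can be added to this.handlers, and SpringApplicationShutdownHook calls them back while it runs. Logback's graceful shutdown uses this mechanism, as described later.

class SpringApplicationShutdownHook implements Runnable {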

    private static final int SLEEP = 50;
    
    private static final long TIMEOUT = TimeUnit.MINUTES.toMillis(10);
    
    @Override
    public void run() {
        Set<ConfigurableApplicationContext> contexts;
        Set<ConfigurableApplicationContext> closedContexts;
        Set<Runnable> actions;
        synchronized (SpringApplicationShutdownHook.class) {
           this.inProgress = true;
           contexts = new LinkedHashSet<>(this.contexts);
           closedContexts = new LinkedHashSet<>(this.closedContexts);
           actions = new LinkedHashSet<>(this.handlers.getActions());
        }
        contexts.forEach(this::closeAndWait);
        closedContexts.forEach(this::closeAndWait);
        actions.forEach(Runnable::run);
    }
    
    // Call ConfigurableApplicationContext.close() and wait until the context becomes inactive. 
    // We can't assume that just because the close method returns that the context is actually inactive. 
    // It could be that another thread is still in the process of disposing beans.
    // Close the ConfigurableApplicationContext and wait for it to become inactive; the timeout is 10 minutes
    private void closeAndWait(ConfigurableApplicationContext context) {
        if (!context.isActive()) {
           return;
        }
        context.close();
        try {
           int waited = 0;
           while (context.isActive()) {
              if (waited > TIMEOUT) {
                 throw new TimeoutException();
              }
              Thread.sleep(SLEEP);
              waited += SLEEP;
           }
        }
        catch (InterruptedException ex) {
           Thread.currentThread().interrupt();
           logger.warn("Interrupted waiting for application context " + context + " to become inactive");
        }
        catch (TimeoutException ex) {
           logger.warn("Timed out waiting for application context " + context + " to become inactive", ex);
        }
    }
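
To make the scope of the timeout concrete, here is a JDK-only sketch of the same close-then-poll pattern. The Context interface and the asyncClosing helper are hypothetical stand-ins for ConfigurableApplicationContext, and the timeout values are chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the close-then-poll pattern; the timeout covers only the polling loop.
class CloseAndWaitSketch {

    // Hypothetical stand-in for ConfigurableApplicationContext.
    interface Context {
        void close();
        boolean isActive();
    }

    // Returns true if the context became inactive within timeoutMillis after close() returned.
    static boolean closeAndWait(Context context, long timeoutMillis, long sleepMillis) {
        if (!context.isActive()) {
            return true;
        }
        context.close(); // a slow close() blocks here and is NOT bounded by the timeout
        long waited = 0;
        try {
            while (context.isActive()) {
                if (waited > timeoutMillis) {
                    return false;
                }
                Thread.sleep(sleepMillis);
                waited += sleepMillis;
            }
            return true;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // A context whose close() returns at once while another thread finishes the teardown later.
    static Context asyncClosing(long teardownMillis) {
        AtomicBoolean active = new AtomicBoolean(true);
        return new Context() {
            @Override
            public void close() {
                Thread teardown = new Thread(() -> {
                    try {
                        Thread.sleep(teardownMillis);
                    } catch (InterruptedException ex) {
                        Thread.currentThread().interrupt();
                    }
                    active.set(false);
                });
                teardown.setDaemon(true);
                teardown.start();
            }

            @Override
            public boolean isActive() {
                return active.get();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(closeAndWait(asyncClosing(100), 1_000, 50));   // teardown finishes in time
        System.out.println(closeAndWait(asyncClosing(5_000), 200, 50));   // polling loop times out
    }
}
```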

Notes on the ConfigurableApplicationContext#close() method:

Close this application context, releasing all resources and locks that the implementation might hold. This includes destroying all cached singleton beans.

Note: Does not invoke close on a parent context; parent contexts have their own, independent lifecycle.

This method can be called multiple times without side effects: Subsequent close calls on an already closed context will be ignored.

The ShutdownEndpoint approach

Add the following to the yml configuration to expose the Spring Actuator shutdown endpoint, /actuator/shutdown.

management:
  endpoint:
    shutdown:
      enabled: true
  endpoints:
    web:
      exposure:
        include: '*'
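
With the endpoint exposed, shutdown is triggered by an HTTP POST, because the endpoint is a write operation. The sketch below uses the JDK HTTP client; the base URL http://localhost:8080 and the absence of any authentication in front of Actuator are assumptions, and ShutdownEndpointClient is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a client that asks a running application to shut itself down.
class ShutdownEndpointClient {

    // Builds the POST request for the shutdown endpoint under the given base URL.
    static HttpRequest shutdownRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/shutdown"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the application listens on localhost:8080 without security on Actuator.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(shutdownRequest("http://localhost:8080"), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Because anyone who can reach this endpoint can stop the application, it is worth restricting access to it in any shared environment.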

How ShutdownEndpoint works: on receiving the request, it starts a new thread that calls this.context.close().

@Endpoint(id = "shutdown", enableByDefault = false)
public class ShutdownEndpoint implements ApplicationContextAware {

    private static final Map<String, String> NO_CONTEXT_MESSAGE = Collections
          .unmodifiableMap(Collections.singletonMap("message", "No context to shutdown."));

    private static final Map<String, String> SHUTDOWN_MESSAGE = Collections
          .unmodifiableMap(Collections.singletonMap("message", "Shutting down, bye..."));

    private ConfigurableApplicationContext context;

    @WriteOperation
    public Map<String, String> shutdown() {
       if (this.context == null) {
          return NO_CONTEXT_MESSAGE;
       }
       try {
          return SHUTDOWN_MESSAGE;
       }
       finally {
          Thread thread = new Thread(this::performShutdown);
          thread.setContextClassLoader(getClass().getClassLoader());
          thread.start();
       }
    }

    private void performShutdown() {
       try {
          Thread.sleep(500L);
       }
       catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
       }
       this.context.close();
    }

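Note: after this.context.close() runs, SpringApplicationShutdownHook#run() is also executed asynchronously. I did not trace exactly how it is triggered. The most likely explanation is that closing the context stops the last non-daemon threads, so the JVM begins its normal exit sequence and runs every registered shutdown hook.

Unlike the SIGTERM case, the context is already closing by the time the hook runs. SpringApplicationShutdownHook uses ApplicationContextClosedListener to move such contexts into closedContexts, which keeps the hook from closing the same context a second time.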

class SpringApplicationShutdownHook implements Runnable {

    private final Set<ConfigurableApplicationContext> contexts = new LinkedHashSet<>();

    private final Set<ConfigurableApplicationContext> closedContexts = Collections.newSetFromMap(new WeakHashMap<>());

    private final ApplicationContextClosedListener contextCloseListener = new ApplicationContextClosedListener();

    // ApplicationListener to track closed contexts.
    private class ApplicationContextClosedListener implements ApplicationListener<ContextClosedEvent> {

        @Override
        public void onApplicationEvent(ContextClosedEvent event) {
           // The ContextClosedEvent is fired at the start of a call to {@code close()}
           // and if that happens in a different thread then the context may still be
           // active. Rather than just removing the context, we add it to a {@code
           // closedContexts} set. This is weak set so that the context can be GC'd once
           // the {@code close()} method returns.
           synchronized (SpringApplicationShutdownHook.class) {
              ApplicationContext applicationContext = event.getApplicationContext();
              SpringApplicationShutdownHook.this.contexts.remove(applicationContext);
              SpringApplicationShutdownHook.this.closedContexts
                    .add((ConfigurableApplicationContext) applicationContext);
           }
        }

    }

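Graceful shutdown of Tomcat in Spring Boot

In the version examined here, Spring Boot terminates Tomcat immediately by default when it receives a stop signal and does not wait for in-flight requests to finish. Adding server.shutdown=GRACEFUL to the configuration makes Tomcat wait for current requests to complete, which gives a graceful shutdown.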

server:
  shutdown: GRACEFUL

With server.shutdown=IMMEDIATE: send an HTTP request, then stop the Spring Boot application while the request is still in flight, and the console fills with errors.

[2023-09-22 09:11:08.533] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:11:10.842] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Stopping ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:11:10.863] ERROR [http-nio-8080-exec-3] [bcb01a34-9721-461f-ad3d-2f71c386ff10] [TID: N/A] - controller system exception, java.nio.channels.ClosedChannelException
org.apache.catalina.connector.ClientAbortException: java.nio.channels.ClosedChannelException
	at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353)
	at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:784)
	at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:299)
[2023-09-22 09:11:10.915] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Destroying ProtocolHandler ["http-nio-8080"]

With server.shutdown=GRACEFUL: send an HTTP request, then stop the Spring Boot application. The console prints "Commencing graceful shutdown. Waiting for active requests to complete", and the Spring Boot process waits for active requests to finish before it exits.

[2023-09-22 09:18:12.507] INFO  [Thread-5] org.springframework.boot.web.embedded.tomcat.GracefulShutdown 53 [] [TID: N/A] - Commencing graceful shutdown. Waiting for active requests to complete
[2023-09-22 09:18:12.507] INFO  [tomcat-shutdown] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:18:17.633] INFO  [tomcat-shutdown] org.springframework.boot.web.embedded.tomcat.GracefulShutdown 78 [] [TID: N/A] - Graceful shutdown complete
[2023-09-22 09:18:17.637] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:18:17.645] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Stopping ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:18:17.657] INFO  [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Destroying ProtocolHandler ["http-nio-8080"]

Following the trail into the TomcatWebServer source leads to the graceful shutdown code. When shutdown == Shutdown.GRACEFUL, a GracefulShutdown instance is created and handles the graceful shutdown through this.gracefulShutdown.shutDownGracefully(callback). Otherwise the server stops as soon as the stop signal arrives: callback.shutdownComplete(GracefulShutdownResult.IMMEDIATE).

TomcatWebServer#shutDownGracefully() is called from the WebServerGracefulShutdownLifecycle#stop() lifecycle method.

public class TomcatWebServer implements WebServer {

    private final Tomcat tomcat;

    private final boolean autoStart;

    private final GracefulShutdown gracefulShutdown;
        
    public TomcatWebServer(Tomcat tomcat, boolean autoStart, Shutdown shutdown) {
        Assert.notNull(tomcat, "Tomcat Server must not be null");
        this.tomcat = tomcat;
        this.autoStart = autoStart;
        this.gracefulShutdown = (shutdown == Shutdown.GRACEFUL) ? new GracefulShutdown(tomcat) : null;
        initialize();
    }
    
    @Override
    public void shutDownGracefully(GracefulShutdownCallback callback) {
        if (this.gracefulShutdown == null) {
           callback.shutdownComplete(GracefulShutdownResult.IMMEDIATE);
           return;
        }
        this.gracefulShutdown.shutDownGracefully(callback);
    }
    

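GracefulShutdown#shutDownGracefully() starts a new thread that runs doShutdown() asynchronously. doShutdown() fetches all Connectors, pauses each one, and calls connector.getProtocolHandler().closeServerSocketGraceful() to close the ServerSocket gracefully so that no new connections are accepted. It then waits in a while loop until every TomcatEmbeddedContext becomes inactive, and finally invokes the callback with GracefulShutdownResult.IDLE.

If the graceful shutdown does not finish within the allowed time, this.aborted is set to true, and doShutdown() invokes the callback with GracefulShutdownResult.REQUESTS_ACTIVE and returns.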

// Handles Tomcat graceful shutdown.
final class GracefulShutdown {

    private final Tomcat tomcat;

    private volatile boolean aborted = false;

    GracefulShutdown(Tomcat tomcat) {
       this.tomcat = tomcat;
    }

    void shutDownGracefully(GracefulShutdownCallback callback) {
       logger.info("Commencing graceful shutdown. Waiting for active requests to complete");
       new Thread(() -> doShutdown(callback), "tomcat-shutdown").start();
    }

    private void doShutdown(GracefulShutdownCallback callback) {
       List<Connector> connectors = getConnectors();
       connectors.forEach(this::close);
       try {
          for (Container host : this.tomcat.getEngine().findChildren()) {
             for (Container context : host.findChildren()) {
                while (isActive(context)) {
                   if (this.aborted) {
                      logger.info("Graceful shutdown aborted with one or more requests still active");
                      callback.shutdownComplete(GracefulShutdownResult.REQUESTS_ACTIVE);
                      return;
                   }
                   Thread.sleep(50);
                }
             }
          }

       }
       catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
       }
       logger.info("Graceful shutdown complete");
       callback.shutdownComplete(GracefulShutdownResult.IDLE);
    }
    
    private void close(Connector connector) {
        connector.pause();
        connector.getProtocolHandler().closeServerSocketGraceful();
    }
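
The wait-or-abort loop can be modeled without Tomcat. The following JDK-only sketch assumes an in-flight request counter in place of Tomcat's per-context activity check and does not model pausing the connectors; GracefulShutdownSketch, Result, and simulate are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Sketch: wait on a background thread until requests drain, unless abort() is called first.
class GracefulShutdownSketch {

    enum Result { IDLE, REQUESTS_ACTIVE }

    private final AtomicInteger activeRequests = new AtomicInteger();
    private volatile boolean aborted = false;

    void requestStarted() { activeRequests.incrementAndGet(); }

    void requestFinished() { activeRequests.decrementAndGet(); }

    // Called by the forced stop path once the waiting time has run out.
    void abort() { this.aborted = true; }

    // Returns immediately; the callback receives the outcome from a separate thread.
    void shutDownGracefully(Consumer<Result> callback) {
        new Thread(() -> callback.accept(awaitIdle()), "sketch-shutdown").start();
    }

    private Result awaitIdle() {
        try {
            while (activeRequests.get() > 0) {
                if (this.aborted) {
                    return Result.REQUESTS_ACTIVE;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return Result.IDLE;
    }

    // Runs one scenario with a single in-flight request and returns the reported result.
    static Result simulate(boolean abortBeforeRequestEnds) {
        GracefulShutdownSketch server = new GracefulShutdownSketch();
        server.requestStarted();
        CompletableFuture<Result> result = new CompletableFuture<>();
        server.shutDownGracefully(result::complete);
        if (abortBeforeRequestEnds) {
            server.abort();
        } else {
            server.requestFinished();
        }
        try {
            return result.get(5, TimeUnit.SECONDS);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // the request finishes first: IDLE
        System.out.println(simulate(true));  // abort() arrives first: REQUESTS_ACTIVE
    }
}
```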

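The heart of graceful shutdown is the GracefulShutdown#close(Connector) step, which ends up in AbstractEndpoint#closeServerSocketGraceful(). That code is too low-level for me to work through fully.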

public abstract class AbstractEndpoint<S,U> {

    // Close the server socket (to prevent further connections) if the server socket was originally bound on start() (rather than on init()).
    public final void closeServerSocketGraceful() {
        if (bindState == BindState.BOUND_ON_START) {
            // Stop accepting new connections
            acceptor.stop(-1);
            // Release locks that may be preventing the acceptor from stopping
            releaseConnectionLatch();
            unlockAccept();
            // Signal to any multiplexed protocols (HTTP/2) that they may wish
            // to stop accepting new streams
            getHandler().pause();
            // Update the bindState. This has the side-effect of disabling
            // keep-alive for any in-progress connections
            bindState = BindState.SOCKET_CLOSED_ON_STOP;
            try {
                doCloseServerSocket();
            } catch (IOException ioe) {
                getLog().warn(sm.getString("endpoint.serverSocket.closeFailed", getName()), ioe);
            }
        }
    }

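TomcatWebServer#stop() runs afterwards. If GracefulShutdown has not finished within the allowed time (the lifecycle phase timeout, spring.lifecycle.timeout-per-shutdown-phase, 30 seconds by default), TomcatWebServer#stop() aborts it and stops Tomcat forcibly.

TomcatWebServer#stop() is called from the WebServerStartStopLifecycle#stop() lifecycle method.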

public class TomcatWebServer implements WebServer {

    @Override
    public void stop() throws WebServerException {
        synchronized (this.monitor) {
           boolean wasStarted = this.started;
           try {
              this.started = false;
              try {
                 if (this.gracefulShutdown != null) {
                    this.gracefulShutdown.abort();
                 }
                 stopTomcat();
                 this.tomcat.destroy();
              }
              catch (LifecycleException ex) {
                 // swallow and continue
              }
           }
           catch (Exception ex) {
              throw new WebServerException("Unable to stop embedded Tomcat", ex);
           }
           finally {
              if (wasStarted) {
                 containerCounter.decrementAndGet();
              }
           }
        }
    }

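Logback graceful shutdown: making sure no logs are lost

There are two common ways to improve logging performance:

  1. Set OutputStreamAppender#immediateFlush = false. immediateFlush defaults to true, which forces a flush on every log event. With immediateFlush set to false, not every log event triggers a flush, which reduces the number of I/O flushes. The cost is that log content still sitting in the buffer can be lost when the Pod restarts or stops, so a shutdown hook is needed to stop logback gracefully.
  2. Use AsyncAppender. logback writes logs synchronously by default: within one process, each thread must acquire a lock before it can write to the outputStream, so lock contention can become a performance problem when many threads log at once. Wrapping the original Appender in an AsyncAppender makes logging asynchronous, with a single worker thread doing all the writes to the outputStream. The same problem applies: log events still in the BlockingQueue can be lost when the Pod restarts or stops, so a shutdown hook is again needed to stop logback gracefully.

The good news is that Spring Boot has already built this wheel for us, and it is enabled by default. We do not need to write any code; we only need to make sure that Spring Boot actually receives the SIGTERM signal. Honestly, I could cry with gratitude.

How the logback shutdown callback is registered: when LoggingApplicationListener#onApplicationEvent() receives an ApplicationEnvironmentPreparedEvent, it calls SpringApplication.getShutdownHandlers().add(shutdownHandler) to register the logback shutdownHandler. SpringApplicationShutdownHook#run() later calls this shutdownHandler back.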

public class LoggingApplicationListener implements GenericApplicationListener {

    @Override
    public void onApplicationEvent(ApplicationEvent event) {
        // ...
        else if (event instanceof ApplicationEnvironmentPreparedEvent) {
           onApplicationEnvironmentPreparedEvent((ApplicationEnvironmentPreparedEvent) event);
        }
        // ...
    }

    private void registerShutdownHookIfNecessary(Environment environment, LoggingSystem loggingSystem) {
        if (environment.getProperty(REGISTER_SHUTDOWN_HOOK_PROPERTY, Boolean.class, true)) {
           Runnable shutdownHandler = loggingSystem.getShutdownHandler();
           if (shutdownHandler != null && shutdownHookRegistered.compareAndSet(false, true)) {
              registerShutdownHook(shutdownHandler);
           }
        }
    }
    void registerShutdownHook(Runnable shutdownHandler) {
        SpringApplication.getShutdownHandlers().add(shutdownHandler);
    }

The Logback shutdown handler added by the code above is defined in the LogbackLoggingSystem class:

public class LogbackLoggingSystem extends Slf4JLoggingSystem {

    public Runnable getShutdownHandler() {
        return () -> getLoggerContext().stop();
    }
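
One consequence of registering the logging handler as a shutdown action is ordering: in the run() excerpt shown earlier, contexts are closed before the actions run, so log statements in @PreDestroy methods are still written. The JDK-only sketch below models only that ordering; ShutdownOrderSketch and its methods are hypothetical names, not Spring Boot API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the hook's ordering: close contexts first, then run registered handlers.
class ShutdownOrderSketch implements Runnable {

    private final Set<Runnable> contextClosers = new LinkedHashSet<>();
    private final Set<Runnable> handlers = new LinkedHashSet<>();

    void registerContext(Runnable closer) { this.contextClosers.add(closer); }

    void addHandler(Runnable handler) { this.handlers.add(handler); }

    @Override
    public void run() {
        this.contextClosers.forEach(Runnable::run); // beans are destroyed here
        this.handlers.forEach(Runnable::run);       // the logging system stops afterwards
    }

    // Registers the handler before the context to show that registration order does not matter.
    static List<String> demo() {
        List<String> trace = new ArrayList<>();
        ShutdownOrderSketch hook = new ShutdownOrderSketch();
        hook.addHandler(() -> trace.add("logging system stopped"));
        hook.registerContext(() -> trace.add("context closed"));
        hook.run();
        return trace;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```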

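LifeCycle is the lifecycle contract for logback components, and stop() is the method that tears a component down. The Appender interface extends LifeCycle, so calling Appender#stop() shuts an Appender instance down gracefully.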

public interface Appender<E> extends LifeCycle, ContextAware, FilterAttachable<E> {

getLoggerContext().stop() --> reset() calls root.recursiveReset(). Here root is a ch.qos.logback.classic.Logger object that corresponds to the logback <root> element.

<root level="INFO">
    <appender-ref ref="STDOUT"/>
    <appender-ref ref="FILE"/>
</root>

The root logger aggregates two appenders, the ConsoleAppender and the RollingFileAppender from the configuration. AppenderAttachableImpl#detachAndStopAllAppenders() iterates over the appenders and calls stop() on each one to tear it down.

public final class Logger implements org.slf4j.Logger, LocationAwareLogger, AppenderAttachable<ILoggingEvent>, Serializable {

    transient private AppenderAttachableImpl<ILoggingEvent> aai;

    public void detachAndStopAllAppenders() {
        if (aai != null) {
            aai.detachAndStopAllAppenders();
        }
    }
    
public class AppenderAttachableImpl<E> implements AppenderAttachable<E> {

    final private COWArrayList<Appender<E>> appenderList = new COWArrayList<Appender<E>>(new Appender[0]);

    public void detachAndStopAllAppenders() {
        for (Appender<E> a : appenderList) {
            a.stop();
        }
        appenderList.clear();
    }

When OutputStreamAppender stops, it closes the output stream. Closing forces any log content that has not yet been flushed out through this.outputStream and then closes the stream.

public class OutputStreamAppender<E> extends UnsynchronizedAppenderBase<E> {

    public void stop() {
        lock.lock();
        try {
            closeOutputStream();
            super.stop();
        } finally {
            lock.unlock();
        }
    }
    
    protected void closeOutputStream() {
        if (this.outputStream != null) {
            try {
                // before closing we have to output out layout's footer
                encoderClose();
                this.outputStream.close();
                this.outputStream = null;
            } catch (IOException e) {
                addStatus(new ErrorStatus("Could not close output stream for OutputStreamAppender.", this, e));
            }
        }
    }
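
The flush-on-close behavior is easy to observe with plain JDK streams. In this sketch a BufferedOutputStream over a ByteArrayOutputStream stands in for the appender's buffered file stream; that pairing and the name FlushOnCloseSketch are assumptions for illustration, not logback's actual stream classes.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: buffered bytes reach the sink only on flush() or close().
class FlushOnCloseSketch {

    // Returns {bytes visible in the sink before close, bytes visible after close}.
    static int[] writeThenClose(String line) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the log file
        OutputStream buffered = new BufferedOutputStream(sink, 8192);
        try {
            // Written without an explicit flush, similar in spirit to immediateFlush=false.
            buffered.write(line.getBytes(StandardCharsets.UTF_8));
            int before = sink.size();
            buffered.close(); // close() flushes the buffer before closing the sink
            return new int[] { before, sink.size() };
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        int[] sizes = writeThenClose("last log line before shutdown\n");
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

If the process is killed before close() runs, the bytes counted as "after close" here are the ones that would be lost.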

When AsyncAppenderBase stops, it waits for the worker thread with worker.join(maxFlushTime); the default is 1000 ms (1 s).

public class AsyncAppenderBase<E> extends UnsynchronizedAppenderBase<E> implements AppenderAttachable<E> {

    // The default maximum queue flush time allowed during appender stop. 
    // If the worker takes longer than this time it will exit, discarding any remaining items in the queue
    public static final int DEFAULT_MAX_FLUSH_TIME = 1000;
    int maxFlushTime = DEFAULT_MAX_FLUSH_TIME;

    @Override
    public void stop() {
        if (!isStarted())
            return;

        // mark this appender as stopped so that Worker can also processPriorToRemoval if it is invoking
        // aii.appendLoopOnAppenders
        // and sub-appenders consume the interruption
        super.stop();

        // interrupt the worker thread so that it can terminate. Note that the interruption can be consumed
        // by sub-appenders
        worker.interrupt();

        InterruptUtil interruptUtil = new InterruptUtil(context);

        try {
            interruptUtil.maskInterruptFlag();

            worker.join(maxFlushTime);

            // check to see if the thread ended and if not add a warning message
            if (worker.isAlive()) {
                addWarn("Max queue flush timeout (" + maxFlushTime + " ms) exceeded. Approximately " + blockingQueue.size()
                                + " queued events were possibly discarded.");
            } else {
                addInfo("Queue flush finished successfully within timeout.");
            }

        } catch (InterruptedException e) {
            int remaining = blockingQueue.size();
            addError("Failed to join worker thread. " + remaining + " queued events may be discarded.", e);
        } finally {
            interruptUtil.unmaskInterruptFlag();
        }
    }
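
The interrupt-then-join pattern can be sketched with a plain BlockingQueue. This is a simplified JDK-only model, not logback's implementation: AsyncDrainSketch, its queue capacity of 256, and the in-memory written list are assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: a worker drains a queue; stop() interrupts it and waits a bounded time for the drain.
class AsyncDrainSketch {

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(256);
    private final List<String> written = new CopyOnWriteArrayList<>();
    private volatile boolean started = true;
    private final Thread worker = new Thread(this::workerLoop, "sketch-async-worker");

    AsyncDrainSketch() {
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Enqueues an event; events offered after stop() or to a full queue are dropped.
    void append(String event) {
        if (this.started) {
            this.queue.offer(event);
        }
    }

    private void workerLoop() {
        while (this.started) {
            try {
                this.written.add(this.queue.take());
            } catch (InterruptedException ex) {
                break; // stop() interrupted the blocking take()
            }
        }
        // Drain whatever is still queued before the thread exits.
        String event;
        while ((event = this.queue.poll()) != null) {
            this.written.add(event);
        }
    }

    // Returns true if the worker finished draining within maxFlushTimeMillis.
    boolean stop(long maxFlushTimeMillis) {
        this.started = false;
        this.worker.interrupt();
        try {
            this.worker.join(maxFlushTimeMillis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return !this.worker.isAlive();
    }

    List<String> written() { return this.written; }

    public static void main(String[] args) {
        AsyncDrainSketch appender = new AsyncDrainSketch();
        for (int i = 0; i < 100; i++) {
            appender.append("event-" + i);
        }
        System.out.println("drained in time: " + appender.stop(1000) + ", written: " + appender.written().size());
    }
}
```

If the downstream appender is slow, raising maxFlushTime on the AsyncAppender gives the worker more time to drain, at the cost of a longer shutdown.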