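Problem description
We built a new project whose core functionality consists of a public API and a set of internal scheduled tasks. The API service is deployed on 4 servers, and the scheduled tasks run on 2 other servers. The project relies on some configuration values that change over time, and these are managed in the Nacos configuration center.
When we changed a value in Nacos, the API gateway service picked up the change dynamically, but the scheduled-task service did not. Tracking this down nearly made me question my sanity. My troubleshooting steps are listed below.
Nacos configuration
The Nacos settings live in bootstrap.yml. This file is loaded during the bootstrap phase, before the regular application configuration, so it is the place for settings that must be read early. Think of it as system-level parameters that rarely change. By default, values loaded from bootstrap.yml are not overridden by later local configuration.
The configuration is as follows: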
spring:
  application:
    name: order-web
## Per-environment sections follow; each profile loads its own set of files
---
# Test environment
spring:
  profiles: beta
  # nacos
  cloud:
    nacos:
      discovery:
        server-addr: 172.0.0.1:8848
        namespace: 21406c22-abef-4472-953e-tyea2aeb167a
        username: nacos
        password: nacos
      config:
        server-addr: 172.0.0.1:8848
        username: nacos
        password: nacos
        namespace: 21406c22-abef-4472-953e-tyea2aeb167a
        group: DEFAULT_GROUP
        shared-configs:
          - data-id: common-kafka.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-xxl-job.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-redis-order.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-mysql-order.yaml
            group: DEFAULT_GROUP
            refresh: true
        extension-configs:
          - data-id: order-config.yaml
            group: DEFAULT_GROUP
            refresh: true
---
# Local environment
spring:
  profiles: local
  # nacos
  cloud:
    nacos:
      discovery:
        server-addr: 172.0.0.1:8848
        namespace: 21406c22-abef-4472-953e-tyea2aeb167b
        username: nacos
        password: nacos
      config:
        server-addr: 172.0.0.1:8848
        username: nacos
        password: nacos
        namespace: 21406c22-abef-4472-953e-tyea2aeb167b
        group: DEFAULT_GROUP
        shared-configs:
          - data-id: common-kafka.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-xxl-job.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-redis-order.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-mysql-order.yaml
            group: DEFAULT_GROUP
            refresh: true
        extension-configs:
          - data-id: order-config.yaml
            group: DEFAULT_GROUP
            refresh: true
---
# Production environment
spring:
  profiles: prod
  # nacos
  cloud:
    nacos:
      discovery:
        server-addr: 172.0.0.1:8848
        namespace: 21406c22-abef-4472-953e-tyea2aeb167c
        username: nacos
        password: nacos
      config:
        server-addr: 172.0.0.1:8848
        username: nacos
        password: nacos
        namespace: 21406c22-abef-4472-953e-tyea2aeb167c
        group: DEFAULT_GROUP
        shared-configs:
          - data-id: common-kafka.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-xxl-job.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-redis-order.yaml
            group: DEFAULT_GROUP
            refresh: true
          - data-id: common-mysql-order.yaml
            group: DEFAULT_GROUP
            refresh: true
        extension-configs:
          - data-id: order-config.yaml
            group: DEFAULT_GROUP
            refresh: true
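Troubleshooting steps
1. spring.application.name must be set in the bootstrap configuration file
Our spring.application.name is defined in bootstrap.yml, so this condition is met.
2. refresh must be set to true
The refreshEnabled field of NacosConfigProperties defaults to true, so it needs no explicit setting. Every entry under shared-configs and extension-configs must also have refresh set to true, and ours do. The global switch, shown with its default value: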
refresh-enabled: true
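3. Troubleshoot by enabling debug logging
We enabled debug logging for Nacos with the configuration below and searched the output for "Refresh Nacos config group". The search came back empty.
logging:
  level:
    com:
      alibaba:
        nacos: DEBUG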
That message is logged by NacosContextRefresher, which is defined as follows (decompiled):
public class NacosContextRefresher implements ApplicationListener<ApplicationReadyEvent>, ApplicationContextAware {

    // Nacos listeners are registered only once the application reports ready
    public void onApplicationEvent(ApplicationReadyEvent event) {
        if (this.ready.compareAndSet(false, true)) {
            this.registerNacosListenersForApplications();
        }
    }

    private void registerNacosListenersForApplications() {
        if (this.isRefreshEnabled()) {
            Iterator var1 = NacosPropertySourceRepository.getAll().iterator();
            while (var1.hasNext()) {
                NacosPropertySource propertySource = (NacosPropertySource) var1.next();
                if (propertySource.isRefreshable()) {
                    String dataId = propertySource.getDataId();
                    this.registerNacosListener(propertySource.getGroup(), dataId);
                }
            }
        }
    }

    private void registerNacosListener(final String groupKey, final String dataKey) {
        String key = NacosPropertySourceRepository.getMapKey(dataKey, groupKey);
        Listener listener = (Listener) this.listenerMap.computeIfAbsent(key, (lst) -> {
            return new AbstractSharedListener() {
                public void innerReceive(String dataId, String group, String configInfo) {
                    NacosContextRefresher.refreshCountIncrement();
                    NacosContextRefresher.this.nacosRefreshHistory.addRefreshRecord(dataId, group, configInfo);
                    NacosContextRefresher.this.applicationContext.publishEvent(new RefreshEvent(this, (Object) null, "Refresh Nacos config"));
                    if (NacosContextRefresher.log.isDebugEnabled()) {
                        NacosContextRefresher.log.debug(String.format("Refresh Nacos config group=%s,dataId=%s,configInfo=%s", group, dataId, configInfo));
                    }
                }
            };
        });
        try {
            this.configService.addListener(dataKey, groupKey, listener);
        } catch (NacosException var6) {
            log.warn(String.format("register fail for nacos listener ,dataId=[%s],group=[%s]", dataKey, groupKey), var6);
        }
    }
}
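The listeners are only registered when onApplicationEvent receives an ApplicationReadyEvent. So what kept that event from reaching NacosContextRefresher?
4. Walk through the Spring Boot startup flow
The core class of Spring Boot is SpringApplication: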
public class SpringApplication {

    public ConfigurableApplicationContext run(String... args) {
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        ConfigurableApplicationContext context = null;
        Collection exceptionReporters = new ArrayList();
        this.configureHeadlessProperty();
        SpringApplicationRunListeners listeners = this.getRunListeners(args);
        listeners.starting();
        try {
            ApplicationArguments applicationArguments = new DefaultApplicationArguments(args);
            ConfigurableEnvironment environment = this.prepareEnvironment(listeners, applicationArguments);
            this.configureIgnoreBeanInfo(environment);
            Banner printedBanner = this.printBanner(environment);
            context = this.createApplicationContext();
            exceptionReporters = this.getSpringFactoriesInstances(SpringBootExceptionReporter.class, new Class[]{ConfigurableApplicationContext.class}, context);
            this.prepareContext(context, environment, listeners, applicationArguments, printedBanner);
            this.refreshContext(context);
            this.afterRefresh(context, applicationArguments);
            stopWatch.stop();
            if (this.logStartupInfo) {
                (new StartupInfoLogger(this.mainApplicationClass)).logStarted(this.getApplicationLog(), stopWatch);
            }
            listeners.started(context);
            // Runs every ApplicationRunner / CommandLineRunner bean on this thread
            this.callRunners(context, applicationArguments);
        } catch (Throwable var10) {
            this.handleRunFailure(context, var10, exceptionReporters, listeners);
            throw new IllegalStateException(var10);
        }
        try {
            // Publishes ApplicationReadyEvent; only reached after all runners return
            listeners.running(context);
            return context;
        } catch (Throwable var9) {
            this.handleRunFailure(context, var9, exceptionReporters, (SpringApplicationRunListeners) null);
            throw new IllegalStateException(var9);
        }
    }
}
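Note the line callRunners(context, applicationArguments) in SpringApplication.run(). It executes every bean that implements ApplicationRunner or CommandLineRunner on the main thread. If one of those runners blocks, run() never gets past callRunners, so listeners.running(context), which publishes ApplicationReadyEvent, is never reached and NacosContextRefresher never registers its listeners.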
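The effect can be reproduced without Spring. The following is a minimal sketch, not Spring's actual code: StartupSimulation, start, readyWithin and pollForever are hypothetical names, and the onReady callback merely stands in for the ApplicationReadyEvent publication. It assumes only what the listing above shows, namely that runners are called sequentially before the ready notification.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, Spring-free model of the tail of SpringApplication.run():
// runners execute one by one on the calling thread, and only afterwards is
// the "ready" callback (standing in for ApplicationReadyEvent) invoked.
final class StartupSimulation {

    /** Runs every runner sequentially, then fires the ready callback. */
    static void start(List<Runnable> runners, Runnable onReady) {
        for (Runnable runner : runners) {
            runner.run();      // a runner that never returns stalls startup here
        }
        onReady.run();         // not reached while a runner is still looping
    }

    /**
     * Starts the simulated application on a helper thread and reports whether
     * the ready callback fired within the timeout.
     */
    static boolean readyWithin(List<Runnable> runners, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(1);
        Thread main = new Thread(() -> start(runners, ready::countDown), "simulated-main");
        main.setDaemon(true);  // lets the JVM exit even if a runner loops forever
        main.start();
        return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** An endless polling loop, similar in shape to a while (true) worker. */
    static void pollForever() {
        while (true) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;        // allow the demo to stop the loop
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Case 1: the loop runs directly inside the runner.
        Runnable blocking = StartupSimulation::pollForever;
        boolean readyWhenBlocking = readyWithin(List.of(blocking), 500);

        // Case 2: the runner only hands the loop to another thread.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "poller");
            t.setDaemon(true);
            return t;
        });
        Runnable offloading = () -> pool.execute(StartupSimulation::pollForever);
        boolean readyWhenOffloading = readyWithin(List.of(offloading), 500);
        pool.shutdownNow();

        System.out.println("ready with blocking runner:   " + readyWhenBlocking);   // false
        System.out.println("ready with offloading runner: " + readyWhenOffloading); // true
    }
}
```

In the first case the ready callback never fires because the loop occupies the thread that would invoke it. In the second case the runner returns at once and startup completes.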
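That pins down the problem: some class in the project implements ApplicationRunner and contains a while (true) loop, which blocks the main thread at that point. A look through our code confirmed the prediction.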
@Slf4j
@Component
public class PullIncomeSubscriber implements ApplicationRunner {

    @Override
    public void run(ApplicationArguments args) throws Exception {
        // Called on the main thread by SpringApplication.callRunners()
        doBusiness();
    }

    private void doBusiness() {
        // Never returns, so run() never returns either
        while (true) {
            try {
                this.execute();
            } catch (Exception ex) {
                log.error("PullIncomeSubscriber.execute", ex);
                AlterFunction.sendMsg(AlterCodeEnum.AD_TRACK, "Income pull task failed: " + ex.getMessage());
            }
        }
    }

    public void execute() throws Exception {
        PullIncomTask pull = pullIncomeTaskCache.pull();
        if (Objects.isNull(pull)) {
            // Nothing queued: wait 30 seconds before polling again
            Thread.sleep(30000);
            return;
        }
        log.info("PullIncomeSubscriber.execute#adPlatfrom={}", pull.getAdPlatform());
        // business logic
    }
}
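A thread dump (for example with the JDK's jstack tool) would point at the same culprit more directly, because the main thread's stack would show doBusiness() sitting underneath callRunners(). As a rough in-process illustration of the same idea, the sketch below looks up a thread by name and checks which method it is executing. ThreadStackProbe and its methods are hypothetical helpers, and loopForever only imitates a runner that never returns.

```java
import java.util.Map;

// Hypothetical diagnostic helper, not part of the article's project.
final class ThreadStackProbe {

    /** Returns the current stack of the first live thread with this name, or null. */
    static StackTraceElement[] stackOf(String threadName) {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            if (entry.getKey().getName().equals(threadName)) {
                return entry.getValue();
            }
        }
        return null;
    }

    /** True if the named thread currently has a frame for className.methodName. */
    static boolean isExecuting(String threadName, String className, String methodName) {
        StackTraceElement[] stack = stackOf(threadName);
        if (stack == null) {
            return false;
        }
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /** Stand-in for a runner that never returns. */
    static void loopForever() {
        while (true) {
            try {
                Thread.sleep(1_000);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread stuck = new Thread(ThreadStackProbe::loopForever, "fake-main");
        stuck.setDaemon(true);
        stuck.start();
        // Wait until the thread is actually sleeping inside the loop.
        while (stuck.getState() != Thread.State.TIMED_WAITING) {
            Thread.sleep(10);
        }
        for (StackTraceElement frame : stackOf("fake-main")) {
            System.out.println("  at " + frame);
        }
        System.out.println("stuck in loopForever: "
                + isExecuting("fake-main", ThreadStackProbe.class.getName(), "loopForever"));
    }
}
```

In a real application the thread to inspect would be the one named main, and the frame of interest would be the runner's own method.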
Moving the loop onto a separate thread resolved the problem completely.
@Slf4j
@Component
public class PullIncomeSubscriber implements ApplicationRunner {

    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    @Override
    public void run(ApplicationArguments args) throws Exception {
        // Returns immediately; the loop now runs on the pool thread
        pool.execute(this::doBusiness);
    }
}
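One thing the change leaves open is how the loop ever stops: the worker thread is not a daemon, and the original loop catches Exception, which would also swallow the InterruptedException raised by Thread.sleep. The following is a minimal sketch of one way to make the worker stoppable, under stated assumptions: BackgroundPoller, its thread name and its idleMillis parameter are hypothetical, and for simplicity it sleeps after every poll rather than only when no task is available.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not from the article's project: runs a polling task on
// its own named daemon thread and can be stopped cleanly.
final class BackgroundPoller {

    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "income-puller"); // a name makes thread dumps readable
        t.setDaemon(true);                         // never keeps the JVM alive on its own
        return t;
    });
    private final Runnable task;
    private final long idleMillis;

    BackgroundPoller(Runnable task, long idleMillis) {
        this.task = task;
        this.idleMillis = idleMillis;
    }

    /** Returns immediately; the loop runs on the pool thread. */
    void start() {
        pool.execute(this::loop);
    }

    private void loop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                task.run();
                Thread.sleep(idleMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag so the loop exits
            } catch (RuntimeException e) {
                // A failed poll is reported and the loop carries on
                System.err.println("poll failed: " + e.getMessage());
            }
        }
    }

    /** Interrupts the loop and waits for the worker to finish. */
    boolean stop(long timeoutMillis) throws InterruptedException {
        pool.shutdownNow();
        return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger polls = new AtomicInteger();
        BackgroundPoller poller = new BackgroundPoller(polls::incrementAndGet, 20);
        poller.start();
        Thread.sleep(200);
        boolean stopped = poller.stop(1_000);
        System.out.println("polled at least once: " + (polls.get() > 0));
        System.out.println("stopped cleanly: " + stopped);
    }
}
```

In a Spring bean, start() could be called from run(ApplicationArguments) and stop() from a shutdown hook such as a @PreDestroy method, though whether that fits depends on how the task should behave during shutdown.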