0%

20220919服务进程停止原因

问题

原来服务是稳定运行的,加了一个子进程功能后,服务器经常无法访问。

具体情况

  • 除代码直接返回数据的接口,其他涉及到数据库 IO 的接口都无法访问
  • 内存占用正常
  • CPU 占用正常(待定)
  • 抽风时间不固定
  • 同一套数据库,其他接口服务正常

可能性猜测

  • 数据库连接池有问题
  • 子进程调用方式有问题
  • 定时任务有问题

排查问题

排查日志记录,确定抽风是不是有固定的模式,业务。

  • 最近一次 2022-09-19 07:00:00.0
    1
    2
    3
    4
    5
    6
    85 [info] 2022-09-19 02:14:02.6 [ApiInterceptor] 126ms 61.165.44.200: 1184582: GET: /api/video: {"current":"1","pageSize":"20","orderBy":"id","order":"DESC","isProfessional":"true","isFake":"false"}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (K    HTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
    86 [info] 2022-09-19 07:00:00.0 [DayReportCron] 数据日报开始发送,是否是生产环境:true
    87 [info] 2022-09-19 07:47:13.10 [ApiInterceptor] 30ms 180.162.130.187: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
    88 [info] 2022-09-19 07:47:19.3 [ApiInterceptor] 24ms 180.162.130.187: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
    89 [info] 2022-09-19 07:47:25.6 [ApiInterceptor] 22ms 180.162.130.187: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
    90 [info] 2022-09-19 09:47:21.8 [NestFactory] Starting Nest application...
  • 倒第二次 2022-09-18 04:00:00.0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    328 [info] 2022-09-18 03:30:02.0 [KeywordMonitorCommunityCron] 收到close事件,子进程收到信号 256120
    329 [info] 2022-09-18 04:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    330 [info] 2022-09-18 04:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    331 [info] 2022-09-18 05:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    332 [info] 2022-09-18 05:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    333 [info] 2022-09-18 06:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    334 [info] 2022-09-18 06:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    335 [info] 2022-09-18 07:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    336 [info] 2022-09-18 07:00:00.0 [DayReportCron] 数据日报开始发送,是否是生产环境:true
    337 [info] 2022-09-18 07:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    338 [info] 2022-09-18 08:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    339 [info] 2022-09-18 08:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    340 [info] 2022-09-18 09:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    341 [info] 2022-09-18 09:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    342 [info] 2022-09-18 10:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    343 [warn] 2022-09-18 10:04:53.5 [exceptionLogger] ::ffff:127.0.0.1: -: GET: /: {} A0002: 用户未登录
    344 [warn] 2022-09-18 10:05:36.3 [exceptionLogger] ::ffff:127.0.0.1: -: GET: /: {} A0002: 用户未登录
    345 [info] 2022-09-18 10:05:43.10 [ApiInterceptor] 27ms 117.143.104.164: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
    346 [info] 2022-09-18 10:07:57.10 [ApiInterceptor] 30ms 117.143.104.164: -: GET: /api/sms: {}: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36
    347 [info] 2022-09-18 10:10:14.10 [ApiInterceptor] 31ms 117.143.104.164: -: GET: /api/sms: {}: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36
    348 [info] 2022-09-18 10:11:53.2 [ApiInterceptor] 0ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
    349 [warn] 2022-09-18 10:12:28.2 [exceptionLogger] ::1: 1288925: PATCH: /api/configuration/fake-user: {} A0001: 请求参数错误:fakeUsers must be an array
    350 [info] 2022-09-18 10:14:21.3 [ApiInterceptor] 0ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
    351 [info] 2022-09-18 10:14:22.7 [ApiInterceptor] 0ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
    352 [info] 2022-09-18 10:14:23.6 [ApiInterceptor] 0ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
    353 [info] 2022-09-18 10:14:24.4 [ApiInterceptor] 1ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
    354 [info] 2022-09-18 10:16:33.2 [ApiInterceptor] 26ms 117.143.104.164: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
    355 [info] 2022-09-18 10:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
    356 [info] 2022-09-18 10:37:32.3 [NestFactory] Starting Nest application...
  • 倒第三次 2022-09-17 03:30:02.1
    1
    2
    3
    94 [info] 2022-09-17 03:30:02.1 [KeywordMonitorCommunityCron] 收到close事件,子进程收到信号 145279
    95 [info] 2022-09-17 03:35:45.7 [ApiInterceptor] 160ms 61.173.30.240: 1221394: GET: /api/oss/sts: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
    96 [info] 2022-09-17 03:45:34.8 [ApiInterceptor] 141ms 61.173.30.240: 1221394: GET: /api/oss/sts: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36

日志没有特征,确认是否是定时任务 bug

没有发现问题

暂时定为是数据库连接池问题

确认连接池问题

确认是否是连接池问题

有一段代码循环使用 sql,导致连接池被用尽。

尝试复现 BUG

1
2
3
4
5
6
7
const value = await PromiseTools.queue(list, async (item) => {
const count = await this.labelCategoryDao.findV2LabelCount(item?.id);
return {
...item,
count,
};
});

如果 list 过多,会导致连接池全部使用完,无法释放。
PromiseTools.queue 代码逻辑有问题

1
2
3
4
5
6
7
8
9
10
11
12
13
14
static async queue<T, K>(dataList: T[], callBack: (item: T, index: number) => Promise<K> | K, spliceLength = 5): Promise<K[]> {
if (!dataList?.length) {
return [];
}
const promiseList: Array<Promise<K> | K> = [];
const list: K[] = [];
for (let i = 0; i < dataList.length; i++) {
promiseList.push(callBack(dataList[i], i));
}
while (promiseList.length) {
list.push(...(await Promise.all(promiseList.splice(0, spliceLength))));
}
return list;
}

尝试优化代码

1
2
3
4
5
6
7
8
9
10
11
12
13
static async queue<T, K>(dataList: T[], callBack: (item: T, index: number) => Promise<K> | K, spliceLength = 5): Promise<K[]> {
if (!dataList?.length) {
return [];
}
const list: K[] = [];
for (let i = 0; i < dataList.length; i += spliceLength) {
const step = i + spliceLength < dataList.length ? spliceLength : dataList.length - i;
const promiseList = new Array(step).fill(0).map((_, index) => callBack(dataList[i + index], i + index));
const result = await Promise.all(promiseList);
list.push(...result);
}
return list;
}