问题
原来服务是稳定运行的,加了一个子进程功能后,服务器经常无法访问。
具体情况
- 除代码直接返回数据的接口,其他涉及到数据库 IO 的接口都无法访问
- 内存占用正常
- CPU 占用正常(待定)
- 抽风时间不固定
- 同一套数据库,其他接口服务正常
可能性猜测
- 数据库连接池有问题
- 子进程调用方式有问题
- 定时任务有问题
排查问题
排查日志记录,确定抽风是不是有固定的模式,业务。
- 最近一次 2022-09-19 07:00:00.0
1
2
3
4
5
685 [info] 2022-09-19 02:14:02.6 [ApiInterceptor] 126ms 61.165.44.200: 1184582: GET: /api/video: {"current":"1","pageSize":"20","orderBy":"id","order":"DESC","isProfessional":"true","isFake":"false"}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (K HTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
86 [info] 2022-09-19 07:00:00.0 [DayReportCron] 数据日报开始发送,是否是生产环境:true
87 [info] 2022-09-19 07:47:13.10 [ApiInterceptor] 30ms 180.162.130.187: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
88 [info] 2022-09-19 07:47:19.3 [ApiInterceptor] 24ms 180.162.130.187: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
89 [info] 2022-09-19 07:47:25.6 [ApiInterceptor] 22ms 180.162.130.187: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
90 [info] 2022-09-19 09:47:21.8 [NestFactory] Starting Nest application... - 倒第二次 2022-09-18 04:00:00.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29328 [info] 2022-09-18 03:30:02.0 [KeywordMonitorCommunityCron] 收到close事件,子进程收到信号 256120
329 [info] 2022-09-18 04:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
330 [info] 2022-09-18 04:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
331 [info] 2022-09-18 05:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
332 [info] 2022-09-18 05:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
333 [info] 2022-09-18 06:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
334 [info] 2022-09-18 06:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
335 [info] 2022-09-18 07:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
336 [info] 2022-09-18 07:00:00.0 [DayReportCron] 数据日报开始发送,是否是生产环境:true
337 [info] 2022-09-18 07:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
338 [info] 2022-09-18 08:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
339 [info] 2022-09-18 08:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
340 [info] 2022-09-18 09:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
341 [info] 2022-09-18 09:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
342 [info] 2022-09-18 10:00:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
343 [warn] 2022-09-18 10:04:53.5 [exceptionLogger] ::ffff:127.0.0.1: -: GET: /: {} A0002: 用户未登录
344 [warn] 2022-09-18 10:05:36.3 [exceptionLogger] ::ffff:127.0.0.1: -: GET: /: {} A0002: 用户未登录
345 [info] 2022-09-18 10:05:43.10 [ApiInterceptor] 27ms 117.143.104.164: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
346 [info] 2022-09-18 10:07:57.10 [ApiInterceptor] 30ms 117.143.104.164: -: GET: /api/sms: {}: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36
347 [info] 2022-09-18 10:10:14.10 [ApiInterceptor] 31ms 117.143.104.164: -: GET: /api/sms: {}: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Mobile Safari/537.36
348 [info] 2022-09-18 10:11:53.2 [ApiInterceptor] 0ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
349 [warn] 2022-09-18 10:12:28.2 [exceptionLogger] ::1: 1288925: PATCH: /api/configuration/fake-user: {} A0001: 请求参数错误:fakeUsers must be an array
350 [info] 2022-09-18 10:14:21.3 [ApiInterceptor] 0ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
351 [info] 2022-09-18 10:14:22.7 [ApiInterceptor] 0ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
352 [info] 2022-09-18 10:14:23.6 [ApiInterceptor] 0ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
353 [info] 2022-09-18 10:14:24.4 [ApiInterceptor] 1ms 117.143.104.164: -: GET: /api: {}: apifox/1.0.0 (https://www.apifox.cn)
354 [info] 2022-09-18 10:16:33.2 [ApiInterceptor] 26ms 117.143.104.164: -: GET: /api/sms: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
355 [info] 2022-09-18 10:30:00.0 [KeywordMonitorCommunityCron] 社群关键字开始运行
356 [info] 2022-09-18 10:37:32.3 [NestFactory] Starting Nest application... - 倒第三次 2022-09-17 03:30:02.1
1
2
394 [info] 2022-09-17 03:30:02.1 [KeywordMonitorCommunityCron] 收到close事件,子进程收到信号 145279
95 [info] 2022-09-17 03:35:45.7 [ApiInterceptor] 160ms 61.173.30.240: 1221394: GET: /api/oss/sts: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
96 [info] 2022-09-17 03:45:34.8 [ApiInterceptor] 141ms 61.173.30.240: 1221394: GET: /api/oss/sts: {}: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
日志没有特征,确认是否是定时任务 bug
没有发现问题
暂时定为是数据库连接池问题
确认连接池问题
确认是否是连接池问题
有一段代码循环使用 sql,导致连接池被用尽。
尝试复现 BUG
1 | const value = await PromiseTools.queue(list, async (item) => { |
如果 list 过多,会导致连接池全部使用完,无法释放。
PromiseTools.queue 代码逻辑有问题
1 | static async queue<T, K>(dataList: T[], callBack: (item: T, index: number) => Promise<K> | K, spliceLength = 5): Promise<K[]> { |
尝试优化代码
1 | static async queue<T, K>(dataList: T[], callBack: (item: T, index: number) => Promise<K> | K, spliceLength = 5): Promise<K[]> { |