[00:17:18] RECOVERY - puppet on ORES-web02.Experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:23:34] ORES, Scoring-platform-team, Growth-Team, WMF-JobQueue, Wikimedia-production-error: Fatal error during RecentChange::notifyEdit (deferred update) from ORES/RecentChangeSaveHookHandler - https://phabricator.wikimedia.org/T225199 (Krinkle) The immediate impact is that a Job that the ORES extens...
[06:36:04] PROBLEM - check load on ORES-web02.Experimental is CRITICAL: connect to address 172.16.6.234 port 5666: Connection refusedconnect to host ores-web-02.ores.eqiad.wmflabs port 5666: Connection refused
[06:36:43] PROBLEM - check users on ORES-web02.Experimental is CRITICAL: connect to address 172.16.6.234 port 5666: Connection refusedconnect to host ores-web-02.ores.eqiad.wmflabs port 5666: Connection refused
[06:37:08] PROBLEM - check disk on ORES-web02.Experimental is CRITICAL: connect to address 172.16.6.234 port 5666: Connection refusedconnect to host ores-web-02.ores.eqiad.wmflabs port 5666: Connection refused
[06:37:39] PROBLEM - puppet on ORES-web02.Experimental is CRITICAL: connect to address 172.16.6.234 port 5666: Connection refusedconnect to host ores-web-02.ores.eqiad.wmflabs port 5666: Connection refused
[06:39:54] PROBLEM - check disk on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refusedconnect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:39:55] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: Server answer
[06:40:35] PROBLEM - check load on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refusedconnect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:41:05] PROBLEM - check users on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refusedconnect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:41:55] RECOVERY - ssh on ORES-web01.Experimental is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u6 (protocol 2.0)
[06:42:23] PROBLEM - puppet on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refusedconnect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:53:05] RECOVERY - check users on ORES-web01.Experimental is OK: USERS OK - 1 users currently logged in
[06:53:54] RECOVERY - check disk on ORES-web01.Experimental is OK: DISK OK
[06:54:35] RECOVERY - check load on ORES-web01.Experimental is OK: OK - load average: 0.11, 0.37, 0.82
[06:57:02] RECOVERY - puppet on ORES-web01.Experimental is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:16:43] RECOVERY - check users on ORES-web02.Experimental is OK: USERS OK - 1 users currently logged in
[07:16:54] RECOVERY - puppet on ORES-web02.Experimental is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[07:17:08] RECOVERY - check disk on ORES-web02.Experimental is OK: DISK OK
[07:18:04] RECOVERY - check load on ORES-web02.Experimental is OK: OK - load average: 0.07, 0.12, 0.17
[08:12:34] ORES, Scoring-platform-team: Implement sentinel for ORES production Redis - https://phabricator.wikimedia.org/T122676 (akosiaris) >>! In T122676#5266970, @Halfak wrote: > @akosiaris, I'm checking in on this task. Have we come to any conclusions about investing in Sentinel or not from the ops side of thi...
[13:43:59] what the heck was all that?
[13:45:10] ORES, Scoring-platform-team: Implement sentinel for ORES production Redis - https://phabricator.wikimedia.org/T122676 (Halfak) Sounds reasonable. Thank you for the update!
[13:46:50] halfak: hmm?
[13:47:21] icinga2 alerted a ton last night because "connect to address 172.16.3.131 port 5666: Connection refusedconnect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused"
[13:47:31] 🤷‍♂️
[13:47:48] Probably just a hiccup
[13:48:28] ORES, Scoring-platform-team, Growth-Team, WMF-JobQueue, Wikimedia-production-error: Fatal error during RecentChange::notifyEdit (deferred update) from ORES/RecentChangeSaveHookHandler - https://phabricator.wikimedia.org/T225199 (Halfak) Thank you for the additional information, @Krinkle! Tha...
[14:02:51] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @amir1 - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:52:36] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @CFisch_WMDE & @amir1 - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[17:28:32] ORES, Scoring-platform-team, Wikidata, editquality-modeling, artificial-intelligence: ORES is too slow for ORC tool - https://phabricator.wikimedia.org/T226120 (Halfak)
[18:01:24] o/ accraze
[18:01:41] Sorry for the last minute invite. I invited you to our "BS meeting" for talking about non-work things :)
[18:01:56] cool! be right there
[18:02:01] I awesome!
[22:52:43] ORES, Scoring-platform-team, RESTBase, RESTBase-API, and 2 others: Use RESTBase for ORES precaching - https://phabricator.wikimedia.org/T166161 (awight) This will reduce the load on the ORES cluster significantly, by relieving of httpd duty when serving cached results—and more results will be cac...