[06:31:18] PROBLEM - check users on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:31:25] PROBLEM - check load on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:32:11] PROBLEM - check disk on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:34:09] PROBLEM - puppet on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[07:23:10] RECOVERY - puppet on ORES-web01.Experimental is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[07:23:17] RECOVERY - check users on ORES-web01.Experimental is OK: USERS OK - 1 users currently logged in
[07:23:25] RECOVERY - check load on ORES-web01.Experimental is OK: OK - load average: 0.15, 0.13, 0.24
[07:24:12] RECOVERY - check disk on ORES-web01.Experimental is OK: DISK OK
[14:04:01] Scoring-platform-team (Current), Wikilabels, editquality-modeling, artificial-intelligence: Re-label huwiki damaging and badfaith edits - https://phabricator.wikimedia.org/T223882 (Halfak) We should have the updated model deployed early this week. Sorry for the delay! We had a couple minor hicc...
[14:15:43] PROBLEM - ssh on ORES-worker02.experimental is CRITICAL: connect to address ores-worker-02.ores.eqiad.wmflabs and port 22: No route to host
[14:15:43] PROBLEM - check load on ORES-worker02.experimental is CRITICAL: connect to address 172.16.3.125 port 5666: No route to host; connect to host ores-worker-02.ores.eqiad.wmflabs port 5666: No route to host
[14:15:50] PROBLEM - check disk on ORES-worker02.experimental is CRITICAL: connect to address 172.16.3.125 port 5666: No route to host; connect to host ores-worker-02.ores.eqiad.wmflabs port 5666: No route to host
[14:15:50] PROBLEM - check users on ORES-worker02.experimental is CRITICAL: connect to address 172.16.3.125 port 5666: No route to host; connect to host ores-worker-02.ores.eqiad.wmflabs port 5666: No route to host
[14:16:09] PROBLEM - puppet on ORES-worker02.experimental is CRITICAL: connect to address 172.16.3.125 port 5666: No route to host; connect to host ores-worker-02.ores.eqiad.wmflabs port 5666: No route to host
[14:16:52] PROBLEM - Host ORES-worker02.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-02.ores.eqiad.wmflabs)
[14:17:29] Hmm. Looks like we might actually be down.
[14:18:51] RECOVERY - Host ORES-worker02.experimental is UP: PING OK - Packet loss = 0%, RTA = 1.09 ms
[14:19:15] PROBLEM - ping4 on ORES-redis02.experimental is CRITICAL: CRITICAL - Host Unreachable (ores-redis-02.ores.eqiad.wmflabs)
[14:19:45] PROBLEM - Host ORES-redis02.experimental is DOWN: CRITICAL - Host Unreachable (ores-redis-02.ores.eqiad.wmflabs)
[14:20:42] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 2103 bytes in 0.041 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:21:00] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 2103 bytes in 1.729 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:21:00] RECOVERY - puppet on ORES-worker02.experimental is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:21:38] PROBLEM - ORES web node labs ores-web-01 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 2103 bytes in 0.028 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:21:45] RECOVERY - Host ORES-redis02.experimental is UP: PING OK - Packet loss = 0%, RTA = 21.63 ms
[14:22:42] PROBLEM - check load on ORES-worker01.experimental is CRITICAL: connect to address 172.16.3.127 port 5666: No route to host; connect to host ores-worker-01.ores.eqiad.wmflabs port 5666: No route to host
[14:22:42] PROBLEM - ssh on ORES-worker01.experimental is CRITICAL: connect to address ores-worker-01.ores.eqiad.wmflabs and port 22: No route to host
[14:23:06] PROBLEM - Host ORES-worker01.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-01.ores.eqiad.wmflabs)
[14:23:38] Looks like this is planned downtime.
[14:23:46] But holy notification, Batman.
[14:24:00] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 6.041 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:27:04] RECOVERY - Host ORES-worker01.experimental is UP: PING OK - Packet loss = 0%, RTA = 179.00 ms
[14:28:56] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[14:29:46] RECOVERY - ORES web node labs ores-web-01 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 4.788 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:30:24] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.974 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:30:40] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 979 bytes in 1.372 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:15:51] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[15:17:21] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 979 bytes in 4.254 second response time https://wikitech.wikimedia.org/wiki/ORES
[16:16:52] PROBLEM - check users on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: No route to host; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: No route to host
[16:16:52] PROBLEM - check disk on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: No route to host; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: No route to host
[16:16:55] PROBLEM - ssh on ORES-web01.Experimental is CRITICAL: connect to address ores-web-01.ores.eqiad.wmflabs and port 22: No route to host
[16:17:10] PROBLEM - Host ORES-web01.Experimental is DOWN: CRITICAL -
Host Unreachable (ores-web-01.ores.eqiad.wmflabs)
[16:18:05] PROBLEM - ORES web node labs ores-web-01 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[16:19:10] RECOVERY - Host ORES-web01.Experimental is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms
[16:19:25] RECOVERY - ping4 on ORES-web01.Experimental is OK: PING OK - Packet loss = 0%, RTA = 0.41 ms
[16:19:33] RECOVERY - ORES web node labs ores-web-01 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.099 second response time https://wikitech.wikimedia.org/wiki/ORES
[16:24:08] ORES, Scoring-platform-team (Current): ORES deployment, Early August 2019 - https://phabricator.wikimedia.org/T229848 (Halfak)
[16:25:52] ORES, Scoring-platform-team (Current): ORES deployment, Early August 2019 - https://phabricator.wikimedia.org/T229848 (Halfak)
[16:25:54] Scoring-platform-team (Current), articlequality-modeling, draftquality-modeling, drafttopic-modeling, and 3 others: Retrain models with revscoring 2.5.1 - https://phabricator.wikimedia.org/T229351 (Halfak)
[16:25:56] Scoring-platform-team (Current), revscoring, artificial-intelligence: Wikibase references is a count of ref claims, should be reference statements - https://phabricator.wikimedia.org/T229029 (Halfak)
[16:25:58] ORES, Scoring-platform-team (Current), Analytics-EventLogging, Analytics-Kanban, and 4 others: Fix "Must provide the 'topic' parameter" in ORES /precache endpoint - https://phabricator.wikimedia.org/T228689 (Halfak)
[16:26:05] Scoring-platform-team (Current), editquality-modeling, User-Tgr, artificial-intelligence: Retrain damaging/goodfaith models for huwiki - https://phabricator.wikimedia.org/T228078 (Halfak)
[16:26:07] Scoring-platform-team (Current), revscoring, artificial-intelligence: On en.wikipedia, ref tags inserted by the shortened footnote template, {{sfn}}, are not counted in ORES features -
https://phabricator.wikimedia.org/T227153 (Halfak)
[16:26:07] [1] https://meta.wikimedia.org/wiki/Template:sfn
[16:26:36] PROBLEM - check disk on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[16:26:38] PROBLEM - check users on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[16:27:15] PROBLEM - check load on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[16:30:40] PROBLEM - puppet on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[16:30:47] Scoring-platform-team (Current): Develop automated release strategy from travis CI - https://phabricator.wikimedia.org/T229850 (Halfak)
[16:54:37] RECOVERY - check disk on ORES-web01.Experimental is OK: DISK OK
[16:54:38] RECOVERY - check users on ORES-web01.Experimental is OK: USERS OK - 1 users currently logged in
[16:55:16] RECOVERY - check load on ORES-web01.Experimental is OK: OK - load average: 0.68, 0.85, 1.52
[16:55:17] RECOVERY - puppet on ORES-web01.Experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:14:18] hey halfak is there a different proxy env var I need to set on beta? It keeps hanging when I do a git `pull`
[17:14:41] hmm. I don't think so. Let me check what I have.
[17:16:01] No proxy needed on beta as far as I can tell
[17:16:09] hmmm
[17:17:23] yeah when I do a git pull I get: Failed to connect to webproxy.eqiad.wmnet port 8080: Connection timed out
[17:21:59] Oh. No proxy.
Remove that
[17:22:24] eqiad.wmnet and eqiad.wmflabs are firewalled apart
[17:22:30] ahhh
[17:25:15] I forgot I had added that to my bashrc
[17:25:19] all's good
[17:25:21] thanks!
[17:33:55] :) no problem
[18:06:00] got another question halfak
[18:06:34] in the deploy docs it says to backport commits to gerrit from the wmf deploy repo...
[18:06:55] does this just mean cherry-pick the commits over to the gerrit repo?
[18:08:32] Hmm. I'm not sure.
[18:08:33] * halfak reads.
[18:09:29] Oh! No. This means that the gerrit (prod) and github (wmflabs) repos for ORES have totally separate histories.
[18:09:41] And thus changes need to be made in parallel or allowed to diverge on purpose.
[18:10:08] ahh ok so I need to add the changes to gerrit
[18:10:17] ?
[18:13:53] right. The prod deploy repo is the primary place we update for beta and prod.
[18:14:07] The wmflabs deploy is intended to be used to experiment, but it largely matches our prod config.
[18:14:22] Every now and then we have a model that we don't want to send to prod. E.g. the translatewiki model.
[18:16:51] * halfak --> lunch
[19:12:51] hey halfak, it looks like the gerrit mirrors of articlequality et al. have not been updated
[19:13:05] is there a manual way to do that?
[19:13:25] Oh yes! We have a work-around for that. I'll do it quick. Damn.
[19:17:16] * halfak pushes things to articlequality
[19:21:17] accraze, https://github.com/wikimedia/editquality/pull/210
[19:21:28] I just saw that I forgot to push a bunch of changes I made last week :|
[19:21:31] Ha!
[19:22:56] cool looks good, waiting on travis to merge
[19:23:36] Blocked on gerrit nonsense.
[19:23:40] See #wikimedia-releng
[19:55:35] accraze, should be good to go
[20:00:11] cool seems like we're back in business
[20:02:11] ahh actually the ores mirror doesn't seem to be updated still
[20:02:26] Oh that's crazy.
[20:06:28] accraze, try again
[20:06:39] it worked!
[20:18:56] (PS1) Accraze: Release revscoring v2.5.1 [services/ores/deploy] - https://gerrit.wikimedia.org/r/528258 (https://phabricator.wikimedia.org/T229848)
[20:48:50] * accraze needs sustenance
[21:31:24] (CR) Halfak: [V: +2 C: +2] Release revscoring v2.5.1 [services/ores/deploy] - https://gerrit.wikimedia.org/r/528258 (https://phabricator.wikimedia.org/T229848) (owner: Accraze)
[21:31:41] merged!
[21:31:48] Sorry for the delay. Just saw it.
[22:02:18] thanks halfak!
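
Note on the proxy exchange (17:14 to 17:25): the hang came from a proxy setting left in a bashrc, and the fix was to remove it. A minimal Python sketch of how one might spot such leftovers is below. The log does not say which variable was set, so the names in `PROXY_VARS` are an assumption (the ones commonly honored by git's HTTP transport), and the helper names `find_proxy_vars` and `without_proxy` are hypothetical.

```python
import os

# Assumption: these are the proxy variables worth checking. The log does not
# name the variable that was actually set in the bashrc.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy")


def find_proxy_vars(env):
    """Return the proxy-related entries of env (names matched case-insensitively)."""
    return {name: value for name, value in env.items() if name.lower() in PROXY_VARS}


def without_proxy(env):
    """Return a copy of env with the proxy-related entries removed."""
    return {name: value for name, value in env.items() if name.lower() not in PROXY_VARS}


def report(env=None):
    """Print any proxy variables found in env (defaults to the current process environment)."""
    found = find_proxy_vars(os.environ if env is None else env)
    for name in sorted(found):
        print(f"{name}={found[name]}")
    return found
```

Calling `report()` lists whatever proxy variables the current shell exported. To retry a command without them, one option is `subprocess.run(["git", "pull"], env=without_proxy(os.environ))`. This sketch does not inspect git's own `http.proxy` config setting, which can also route traffic through a proxy.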