[06:32:36] PROBLEM - check disk on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:32:38] PROBLEM - check users on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:33:16] PROBLEM - check load on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:35:40] PROBLEM - puppet on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[07:23:16] RECOVERY - check load on ORES-web01.Experimental is OK: OK - load average: 0.41, 0.11, 0.23
[07:24:37] RECOVERY - check disk on ORES-web01.Experimental is OK: DISK OK
[07:24:38] RECOVERY - check users on ORES-web01.Experimental is OK: USERS OK - 1 users currently logged in
[07:24:53] RECOVERY - puppet on ORES-web01.Experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:53:17] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[07:55:27] PROBLEM - ORES web node labs ores-web-01 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[07:56:17] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 1.356 second response time https://wikitech.wikimedia.org/wiki/ORES
[07:56:49] RECOVERY - ORES web node labs ores-web-01 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1007 bytes in 0.155 second response time https://wikitech.wikimedia.org/wiki/ORES
[12:06:59] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,3,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (elukey)
[12:07:46] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (elukey)
[13:54:12] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Halfak) It looks to me like all of this log output is actually from celery starting back up. I wo...
[13:57:15] ORES, Scoring-platform-team: Monitor ORES celery worker status - https://phabricator.wikimedia.org/T230931 (Halfak)
[14:00:03] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (elukey)
[14:01:01] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (elukey) >>! In T230917#5428548, @Halfak wrote: > It looks to me like all of this log output is actua...
[14:01:49] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (elukey)
[14:02:51] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @amir1 & @duesen - all questions welcome, more info: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:03:26] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Halfak) On ores1002, I see the following in app.log: ` 2019-08-21 11:31:10,673 ERROR celery.worker....
[14:05:21] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Halfak) I see the same error on ores1006. But celery is clearly still running there.
[14:31:35] ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Halfak) But on ores1006, the top-level error is: ` redis.exceptions.TimeoutError: Timeout reading f...
[14:52:46] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @amir1 & @duesen - all questions welcome, more info: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[16:02:53] accraze is AFK today. So I'm doing async standup.
[16:04:17] Y: I mostly worked on session-orientation and a paper review. Today I'm working on session-orientation and another paper review.
[16:04:31] I'm also reviewing task submissions from our Assoc Engineer candidates.
[17:01:24] Aha! Also, the third paper work is discussing ORES impact measurements with groceryheist. :)
[17:01:40] o/ groceryheist Are we meeting today?
[17:02:35] This paper cycle is the cycle of ORES. ORES system paper, ORES values study, and ORES impact study. Mwahahaha!
[17:03:01] I foresee that this time next year, it'll be Jade
[17:03:59] groceryheist, I'm dropping out of the call and going back to paper reviewing. Ping me if you want to talk after all.
[17:08:13] I'm going to run a quick errand
[17:08:14] brb
[17:53:39] Scoring-platform-team, editquality-modeling, Growth-Team (Current Sprint), User-Tgr, artificial-intelligence: Update ORES filter thresholds for huwiki - https://phabricator.wikimedia.org/T230031 (JTannerWMF) a:Tgr
[18:49:32] Woo! That was a great paper and I've learned a bunch of things about Communication Privacy Management theory :)
[18:55:01] Seems relevant to ORES and Wikipedia generally.
[18:55:12] I just pinged Margeigh (head of design strategy) to dig into that more ^_^
[18:55:24] * halfak talks to the darkness
[19:21:20] FYI, your team asked us (Growth team) to deploy the editquality models for jawiki, but they turn out to be very bad: T225563
[19:21:21] T225563: Deploy ORES filters for jawiki - https://phabricator.wikimedia.org/T225563
[19:24:32] Scoring-platform-team, editquality-modeling, artificial-intelligence: Why is jawiki's goodfaith model so bad? - https://phabricator.wikimedia.org/T230953 (Halfak)
[19:24:50] Scoring-platform-team, Edit-Review-Improvements-Integrated-Filters, editquality-modeling, Growth-Team (Current Sprint), artificial-intelligence: Deploy ORES filters for jawiki - https://phabricator.wikimedia.org/T225563 (Halfak) Interesting! We struggled to get good performance out of the re...
[19:25:10] T230953
[19:25:11] T230953: Why is jawiki's goodfaith model so bad? - https://phabricator.wikimedia.org/T230953
[19:25:14] RoanKattouw, ^
[19:25:19] Thanks for the ping. We'll look into it.
[19:25:57] I was hoping based on the global statistics that both models would be *useful*.
[19:39:39] Scoring-platform-team, editquality-modeling, artificial-intelligence: Why is jawiki's goodfaith model so bad? - https://phabricator.wikimedia.org/T230953 (Catrope) (While you're at it, the damaging model isn't great either: it's not able to identify damaging edits with 60% precision or higher.)
[19:39:42] Thanks!
[20:04:09] Scoring-platform-team, Edit-Review-Improvements-Integrated-Filters, editquality-modeling, Growth-Team (Current Sprint), artificial-intelligence: Deploy ORES filters for jawiki - https://phabricator.wikimedia.org/T225563 (SBisson) a:SBisson→None Unassigning myself and moving back to in...
[20:17:15] Scoring-platform-team, Edit-Review-Improvements-Integrated-Filters, editquality-modeling, Growth-Team (Current Sprint), artificial-intelligence: Deploy ORES filters for jawiki - https://phabricator.wikimedia.org/T225563 (Halfak) I think we could call this "done" since the damaging filters --...
[20:36:20] I like the "list_of" meta datasource. It does a good job of managing DRY and minimizing complication.
[20:36:30] But applying it to a whole tree is going to be weird.
[20:38:37] There are some corner cases that I'm trying to think my way through.
[20:38:50] E.g., for revision-orientation, we associate a user with a revision.
[20:39:19] But for session-orientation, all of the revisions in a session are from the same user, so we don't need to associate a user with each revision.
[22:17:59] halfak: I'm at opensym today
[22:18:17] sorry, I didn't make it clear that I can't meet today
[22:18:25] will definitely meet next week and I'll send some updates
[22:18:54] I did rerun the analysis I showed you last week with changes to the revert window and definition of newcomer
[22:18:59] and nothing much changed