[06:33:15] <icinga2-wm>	 PROBLEM - check load on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:34:37] <icinga2-wm>	 PROBLEM - check disk on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:34:38] <icinga2-wm>	 PROBLEM - check users on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:35:40] <icinga2-wm>	 PROBLEM - puppet on ORES-web01.Experimental is CRITICAL: connect to address 172.16.3.131 port 5666: Connection refused; connect to host ores-web-01.ores.eqiad.wmflabs port 5666: Connection refused
[06:53:16] <icinga2-wm>	 RECOVERY - check load on ORES-web01.Experimental is OK: OK - load average: 0.11, 0.13, 0.26
[06:54:37] <icinga2-wm>	 RECOVERY - check disk on ORES-web01.Experimental is OK: DISK OK
[06:54:38] <icinga2-wm>	 RECOVERY - check users on ORES-web01.Experimental is OK: USERS OK - 1 users currently logged in
[06:55:22] <icinga2-wm>	 RECOVERY - puppet on ORES-web01.Experimental is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:20:19] <icinga-wm>	 PROBLEM - ORES web node labs ores-web-01 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[08:20:21] <icinga-wm>	 PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.020 second response time https://wikitech.wikimedia.org/wiki/ORES
[08:21:13] <icinga-wm>	 RECOVERY - ORES web node labs ores-web-01 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.074 second response time https://wikitech.wikimedia.org/wiki/ORES
[08:22:31] <icinga-wm>	 RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.093 second response time https://wikitech.wikimedia.org/wiki/ORES
[10:11:34] <wikibugs>	 ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Joe) I think this is a reasonable explanation, but how would you suggest we should fix our monitoring?
[10:11:58] <wikibugs>	 ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Joe) p:Triage→Normal a:Joe
[13:16:18] <halfak>	 o/
[13:20:55] <wikibugs>	 ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Halfak) I'm looking into what it would take to monitor a celery worker pool on a specific machine....
[14:50:30] <wikibugs>	 ORES, Scoring-platform-team, Operations, serviceops: celery-ores-worker service failed on ores100[2,4,5] without any apparent reason or significant log - https://phabricator.wikimedia.org/T230917 (Halfak) So, I've been trying to explore the behaviors of `celery -A ores_celery inspect ping` to see if...
[18:12:49] <accraze>	 halfak: i'm back, still want to go over the ORES session orientation stuff?
[18:13:28] <halfak>	 Hey!  Yes.  Let's do it.  
[19:05:04] <halfak>	 https://github.com/wikimedia/revscoring/pull/450
[19:05:06] <halfak>	 accraze, ^ 
[19:05:30] <halfak>	 My TODO is to try to implement the tree generation function that applies "list_of" to every leaf. 
[19:05:57] <travis-ci>	 wikimedia/revscoring#1691 (session_orientation - 2a68f8c : halfak): The build failed. https://travis-ci.org/wikimedia/revscoring/builds/575488406
[19:06:09] <accraze>	 awesome i'll take a look
[19:07:12] <halfak>	 brb changing locations
[19:07:38] <halfak>	 accraze, not much code so it should be straightforward.  You can see me implementing some of the meta-datasources that I've been thinking about like "first" and "last"
[19:07:46] * halfak runs away
[20:28:32] <halfak>	 Thinking out loud in the chat.  Feel free to ignore: 
[20:28:50] <halfak>	 So I'm converting all of our Dependents to list_of(Dependent)
[20:29:05] <halfak>	 All dependents are members of a hierarchical DependentSet
[20:29:52] <halfak>	 We can do cool things like ask "is feature a in set X" or even "what features from feature list A are in set X" 
[20:30:09] <halfak>	 We do this by treating the tree as a flat set when asking these questions. 
[20:31:14] <halfak>	 So, when I'm working to recursively process a DependentSet, I could just ask for the flattened set and work from that. 
[20:31:31] <halfak>	 The tree structure of the DependentSet doesn't tell me anything about actual dependencies. 
[20:31:38] <halfak>	 OK I'm convinced.  
[20:31:49] * halfak goes back to writing code. 
[20:59:32] <halfak>	 Bah!  I need to put it back in the same tree though.  Arg!
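The trade-off halfak describes above can be sketched roughly as follows. This is only an illustration and not the revscoring implementation: `Leaf`, `Tree`, `flatten`, `map_leaves` and `wrap_in_list` are hypothetical stand-ins for Dependent, DependentSet and "list_of". Flattening is enough to answer membership questions, but rewriting every leaf while keeping it in the same tree needs a recursive map over the structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for a single dependent (e.g. a feature)."""
    name: str


@dataclass
class Tree:
    """Hypothetical stand-in for a hierarchical set of dependents."""
    name: str
    children: list = field(default_factory=list)


def flatten(node):
    """Return every leaf under `node` as a flat set, ignoring nesting."""
    if isinstance(node, Leaf):
        return {node}
    leaves = set()
    for child in node.children:
        leaves |= flatten(child)
    return leaves


def map_leaves(node, func):
    """Build a new tree of the same shape with `func` applied to each leaf."""
    if isinstance(node, Leaf):
        return func(node)
    return Tree(node.name, [map_leaves(child, func) for child in node.children])


def wrap_in_list(leaf):
    """Hypothetical analogue of applying "list_of" to one leaf."""
    return Leaf("list_of({0})".format(leaf.name))


tree = Tree("revision", [Leaf("chars"), Tree("parent", [Leaf("words")])])

# Membership questions only need the flattened view.
in_set = Leaf("words") in flatten(tree)

# Rewriting leaves in place needs the recursive map so the shape survives.
rewritten = map_leaves(tree, wrap_in_list)
```

Here `flatten(tree)` loses the "parent" grouping, while `map_leaves` returns a tree with the same nesting and each leaf replaced, which is the "put it back in the same tree" requirement.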
[21:07:43] <halfak>	 accraze, what's the name of the "maintainer" I should add for revscoring in readthedocs?
[21:08:06] <halfak>	 Aha!  I added you but I imagine I need to add scoring-internal somehow. 
[21:10:34] <halfak>	 I ran into an annoying wall with dependency rewrites so I'm hoping to get that done before I disappear. 
[21:24:04] <halfak>	 I hope you can work with the maintainer status I gave you because I'm outta here. 
[21:24:07] <halfak>	 o/
[21:29:56] <accraze>	 that should work fine, thanks halAFK!