[07:04:43] PROBLEM - check users on ORES-web02.Experimental is CRITICAL: connect to address 172.16.6.234 port 5666: Connection refused connect to host ores-web-02.ores.eqiad.wmflabs port 5666: Connection refused
[07:05:08] PROBLEM - check disk on ORES-web02.Experimental is CRITICAL: connect to address 172.16.6.234 port 5666: Connection refused connect to host ores-web-02.ores.eqiad.wmflabs port 5666: Connection refused
[07:06:04] PROBLEM - check load on ORES-web02.Experimental is CRITICAL: connect to address 172.16.6.234 port 5666: Connection refused connect to host ores-web-02.ores.eqiad.wmflabs port 5666: Connection refused
[07:07:39] PROBLEM - puppet on ORES-web02.Experimental is CRITICAL: connect to address 172.16.6.234 port 5666: Connection refused connect to host ores-web-02.ores.eqiad.wmflabs port 5666: Connection refused
[08:16:04] RECOVERY - check load on ORES-web02.Experimental is OK: OK - load average: 0.05, 0.01, 0.00
[08:16:20] RECOVERY - puppet on ORES-web02.Experimental is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[08:16:43] RECOVERY - check users on ORES-web02.Experimental is OK: USERS OK - 1 users currently logged in
[08:17:08] RECOVERY - check disk on ORES-web02.Experimental is OK: DISK OK
[14:02:51] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @chiborg & @milimetric - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:05:17] PROBLEM - ores-extension grafana alert on icinga1001 is CRITICAL: CRITICAL: ORES extension ( https://grafana.wikimedia.org/d/000000263/ores-extension ) is alerting: Service hits for obtaining thresholds alert. https://wikitech.wikimedia.org/wiki/ORES
[14:06:43] RECOVERY - ores-extension grafana alert on icinga1001 is OK: OK: ORES extension ( https://grafana.wikimedia.org/d/000000263/ores-extension ) is not alerting. https://wikitech.wikimedia.org/wiki/ORES
[14:07:02] I don't see a very big spike here.
[14:07:19] I'm going to increase the moving average window a bit for this alert.
[14:08:21] PROBLEM - ores grafana alert on icinga1001 is CRITICAL: CRITICAL: ORES advanced metrics ( https://grafana.wikimedia.org/d/vAN_bQemz/ores-advanced-metrics ) is alerting: Overload errors alert. https://wikitech.wikimedia.org/wiki/ORES
[14:08:28] I'm struggling now because it looks like grafana is not responding.
[14:08:32] Overload... hmm.
[14:08:50] https://grafana.wikimedia.org/d/vAN_bQemz/ores-advanced-metrics?refresh=1m&panelId=9&fullscreen&orgId=1
[14:08:54] This graph won't load for me.
[14:09:07] Maybe the alert is because graphite is having issues.
[14:09:22] [10:01:41] !log rebooting graphite1004 for kernel security update
[14:09:23] Aha!
[14:10:53] PROBLEM - ores-extension grafana alert on icinga1001 is CRITICAL: CRITICAL: ORES extension ( https://grafana.wikimedia.org/d/000000263/ores-extension ) is alerting: Service hits for obtaining thresholds alert. https://wikitech.wikimedia.org/wiki/ORES
[14:15:13] RECOVERY - ores grafana alert on icinga1001 is OK: OK: ORES advanced metrics ( https://grafana.wikimedia.org/d/vAN_bQemz/ores-advanced-metrics ) is not alerting. https://wikitech.wikimedia.org/wiki/ORES
[14:16:03] RECOVERY - ores-extension grafana alert on icinga1001 is OK: OK: ORES extension ( https://grafana.wikimedia.org/d/000000263/ores-extension ) is not alerting. https://wikitech.wikimedia.org/wiki/ORES
[14:17:12] OK I just changed our alerting to not go crazy every time that graphite is down :)
[14:52:35] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @chiborg & @milimetric - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:27:17] o/ accraze & groceryheist
[15:27:48] I'm not going to make it to stand-up because I'm doing meetings at Berkman today.
[15:28:13] I'll check the notes for updates. Ping me here.
I should have an hour or two to work on things today, so I could help if you get stuck.
[15:30:40] Finished groceryheist's new NDA. Talking with people about misinformation campaigns today.
[15:30:54] Also talked to people at Wikidata about their data quality efforts and how ORES can help.
[15:34:47] Sounds good halfak! I might have some revert model questions a little later today
[15:44:24] no updates from me. got sidetracked into papers most of the day yesterday
[15:45:36] accraze, cool. I should be around. Will pop out shortly for lunch. Drop an email if you don't see me here.
[15:45:47] groceryheist, read anything fun?
[15:45:58] no, other papers I'm writing
[15:46:21] CSCW collab and the opensym paper with Tilman
[15:46:30] so less fun, more proofreading and fine tuning plots
[15:47:29] goal for next week is to read something fun
[16:14:59] Gotcha.
[16:54:26] wikimedia/editquality#591 (glwiki - 0ddd7de : Andy Craze): The build passed. https://travis-ci.org/wikimedia/editquality/builds/550908959
[19:11:15] Scoring-platform-team: Add Andy Craze to icinga notification for ORES related monitoring - https://phabricator.wikimedia.org/T226417 (Halfak) Looks like this is the right spot: https://github.com/wikimedia/puppet/blob/47b2c5b18a8a880750baeb48b40bf275d0478b9e/modules/nagios_common/files/contactgroups.cfg
[19:13:44] o/ accraze
[19:13:47] saw your email.
[19:13:59] I responded suggesting you model config after bnwiki
[19:55:56] accraze, I just saw your PR and everything looks good except for the Makefile. It looks like there are some human_labeled dataset references in there.
[19:56:13] That's weird because I don't see those referenced in the glwiki.yaml file.
[19:56:50] So either those shouldn't be there (added manually by you or from an old call to "generate_make") or we have a bug in our Makefile template.
[21:14:04] halfak yeah i added the human_labeled reference manually, but I think it was due to an error I was getting.
I'll retrace my steps and see if it's a bug or not.