[03:52:17] Scoring-platform-team, ORES, Operations, Release-Engineering-Team: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3824715 (mmodell)
[10:51:08] Scoring-platform-team, MediaWiki-extensions-ORES: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3824930 (Ladsgroup) p:Triage>Unbreak!
[10:51:48] Scoring-platform-team, MediaWiki-extensions-ORES, Wikimedia-log-errors: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3824933 (Legoktm) @Ladsgroup is saying that people can't check or change their preferences on fawiki because of this.
[10:54:47] Scoring-platform-team, MediaWiki-extensions-ORES, Wikimedia-log-errors: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3824937 (Legoktm) ``` 2017-12-09 10:52:14 [WivAXgpAEKsAABFZlbQAAABQ] mw1216 fawiki 1.31.0-wmf.11 exception ERROR: [WivAXgpAEKsA...
[11:06:39] (CR) Ladsgroup: "That caused a huge regression: https://phabricator.wikimedia.org/T182354" [extensions/ORES] - https://gerrit.wikimedia.org/r/392452 (https://phabricator.wikimedia.org/T180866) (owner: Petar.petkovic)
[11:21:10] Scoring-platform-team, MediaWiki-extensions-ORES, Patch-For-Review, Wikimedia-log-errors: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3824960 (Ladsgroup) Offending commit: https://gerrit.wikimedia.org/r/#/c/392452 CC @jmatazzoni @Catrope...
[11:29:35] Scoring-platform-team, MediaWiki-extensions-ORES, Patch-For-Review, Wikimedia-log-errors: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3824989 (Legoktm) p:Unbreak!>High Disabled ORES on fawiki for now. This only affected fawiki beca...
[11:29:47] Scoring-platform-team, MediaWiki-extensions-ORES, Patch-For-Review, Regression, Wikimedia-log-errors: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3824991 (Legoktm)
[11:40:03] Scoring-platform-team, MediaWiki-extensions-ORES, Patch-For-Review, Regression, and 3 others: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3825002 (Ladsgroup)
[11:40:33] Scoring-platform-team, MediaWiki-extensions-ORES, Patch-For-Review, Regression, and 3 others: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3821279 (Ladsgroup) Made a comment about disabling ORES in [[https://fa.wikipedia.org/w/index.php?old...
[12:06:36] Scoring-platform-team, MediaWiki-extensions-ORES, Patch-For-Review, Regression, and 3 others: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3821279 (Arash.pt) Do you check changes you implement in different configs and systems? Aren't there...
[12:40:50] Scoring-platform-team, MediaWiki-extensions-ORES, Patch-For-Review, Regression, and 3 others: OresDamagingPref back-compatibility is logging exceptions - https://phabricator.wikimedia.org/T182354#3825082 (awight) @Arash.pt Taking your questions at face value: Yes, yes and yes :). The issue here i...
[15:34:43] Scoring-platform-team, ORES, Operations: Update logrotate config for scb* boxes, to deal with ORES verbose logging - https://phabricator.wikimedia.org/T182497#3825269 (awight) p:Triage>High
[15:35:01] awight: good morning
[15:45:05] (PS1) Awight: Less verbose Celery logging [services/ores/deploy] - https://gerrit.wikimedia.org/r/396586 (https://phabricator.wikimedia.org/T182497)
[15:45:16] Amir1: ^ if you’re around
[15:48:08] (CR) Awight: [V: 2 C: 2] "Self-merging due to urgency." [services/ores/deploy] - https://gerrit.wikimedia.org/r/396586 (https://phabricator.wikimedia.org/T182497) (owner: Awight)
[15:48:09] Scoring-platform-team, ORES, Operations, Patch-For-Review: Update logrotate config for scb* boxes, to deal with ORES verbose logging - https://phabricator.wikimedia.org/T182497#3825284 (ArielGlenn) Needs to happen: logging to ores logs instead of daemon.log, make sure logrot conf file for ores l...
[15:51:35] (PS1) Awight: Less verbose Celery logging [services/ores/deploy] (STABLE) - https://gerrit.wikimedia.org/r/396587 (https://phabricator.wikimedia.org/T182497)
[15:52:11] (CR) Awight: [V: 2 C: 2] "Self-merging" [services/ores/deploy] (STABLE) - https://gerrit.wikimedia.org/r/396587 (https://phabricator.wikimedia.org/T182497) (owner: Awight)
[15:59:42] Scoring-platform-team, Release-Engineering-Team: Scap is unhappy about deploying from a branch other than master - https://phabricator.wikimedia.org/T182498#3825297 (awight)
[16:07:31] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3825309 (awight) This is affecting me in production, now: ``` Timeout, server scb2004.codfw.wmnet not responding. 16:01:39 conn...
[16:24:10] o/ halfak
[16:24:23] Just did an emergency deployment to reduce logging back to ERROR.
[16:24:36] note that production is on a “STABLE” branch, now.
[16:24:50] Hey! What?
[16:24:53] I wanted to cherry-pick the logging change over without getting the bump to editquality and ores.
[16:25:00] T182497
[16:25:00] T182497: Update logrotate config for scb* boxes, to deal with ORES verbose logging - https://phabricator.wikimedia.org/T182497
[16:25:46] "morning"
[16:25:52] (it's my evening)
[16:30:03] o/ apergos
[16:30:06] Looking
[16:30:19] apergos, how did you notice the emergency?
[16:30:33] it's done for now, no need to do anything til it's a workday again
[16:30:42] scb1001 very low on space on root partition
[16:30:46] 300M left
[16:30:54] Oh right on. I'm usually here on Saturday "mornings" though :)
[16:31:02] and then shortly after, the other three chimed in as low
[16:31:17] Thanks. Good to know.
[16:31:26] I won't stop you from looking :-D
[16:31:33] but the pressure is off
[16:31:43] We'll probably want to review logging for celery. There's config for it and we can probably review some of the levels.
[16:31:55] awight, what are your thoughts on what was making it into the log?
[16:32:06] the level isn't the main factor here, but
[16:32:09] the routing
[16:32:22] it was going straight to syslog (daemon.log)
[16:32:27] instead of to y'all's logs
[16:33:13] yeah. That’s the really creepy part.
[16:33:27] apergos, oh! Wow.
[16:33:31] yeah
[16:33:32] Our logging_config.yaml is explicit, and says “put all that crap into app.log, nowhere else”
[16:33:45] You can see on ores-beta that it’s going into daemon.log
[16:33:56] And! … with a different log format string than we use.
[16:34:03] wtf
[16:35:06] Check the task for examples. I’ll add a line from app.log
[16:38:57] Scoring-platform-team, ORES, Operations, Patch-For-Review: Update log config for scb* boxes, to deal with ORES verbose logging - https://phabricator.wikimedia.org/T182497#3825360 (awight)
[16:40:17] gtg, have fun!
[16:40:27] apergos: Thanks again for all the help notifying and fixing.
[16:42:02] o/ have a good one awight!
[16:42:12] Hey Amir1, you hacking on order stuff today?
[16:49:00] halfak: I was asleep on couch in the office
[16:49:24] Yesterday was WMDE Xmas party, I was super hungover
[16:49:51] Lol
[16:50:10] Wmde knows how to party
[16:51:16] halfak: my plan for today is: 1- taking out one method out of Cache.php 2- deploying eswikiquote wikilabels campaign 3- set alarm for failure ratio of ores extension
[16:51:19] Feeling better, Amir1?
[16:51:26] Zppix: It definitely does
[16:51:29] \o/ sounds great :)
[16:52:10] halfak: yeah, I'm almost done with 1 but the error handling is a little bit awkward, need to polish that
[16:55:34] Amir1, I got the multilabel stuff implemented in revscoring if you want to take a look. I think ultimately, we should let codezee reflect on it before merging, so there's no rush.
[16:55:53] halfak: cool, sure
[16:55:56] https://github.com/wiki-ai/revscoring/pull/376
[16:56:15] test pass. Just some flake8 stuff to clean up for the build.
[16:56:21] I might get some docs in there today.
[16:56:38] I found that the "label normalizer" is a really useful abstraction for working around sklearn.
[17:56:50] Scoring-platform-team (Current), ORES, monitoring, User-Ladsgroup, Wikimedia-Incident: Clean up failure ratio monitoring and set up an alarm when it goes more than a certain threshold - https://phabricator.wikimedia.org/T154175#3825402 (Ladsgroup) Added a group for scoring platform team: http...
[18:28:03] I'm off for the day, will work a little bit more tomorrow
[18:28:04] o/
[21:33:46] topic
[21:33:48] meh
[22:19:35] Scoring-platform-team, ORES, Operations, Patch-For-Review: Update log config for scb* boxes, to deal with ORES verbose logging - https://phabricator.wikimedia.org/T182497#3825924 (awight) p:High>Normal Urgent fix is deployed, lowering the priority.