[06:33:44] PROBLEM - check load on ORES-web03.experimental is WARNING: WARNING - load average: 15.56, 5.12, 1.80 [06:34:41] RECOVERY - check load on ORES-web03.experimental is OK: OK - load average: 6.55, 4.37, 1.73 [07:15:10] 10Scoring-platform-team, 10Wikilabels, 10I18n: Wikilabel interface in Hungarian has uninformative action buttons, due to translations not updating - https://phabricator.wikimedia.org/T183068#3842574 (10Ladsgroup) I deployed changes and it should be okay now. [07:15:27] 10Scoring-platform-team (Current), 10Wikilabels, 10I18n, 10User-Ladsgroup: Wikilabel interface in Hungarian has uninformative action buttons, due to translations not updating - https://phabricator.wikimedia.org/T183068#3843687 (10Ladsgroup) a:03Ladsgroup [10:16:26] 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, and 2 others: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3844074 (10akosiaris) [12:16:23] 10Scoring-platform-team (Current), 10Wikilabels, 10User-Ladsgroup: Develop a backup strategy for campaigns/tasks/labels - https://phabricator.wikimedia.org/T155116#3844387 (10Ladsgroup) Okay I made wikilabels-dumps.wmflabs.org/psql_dumps/ and documented the steps in https://wikitech.wikimedia.org/wiki/Nova_R... 
[13:00:56] (03PS1) 10Ladsgroup: Follow up for PSR-4 work [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398829 (https://phabricator.wikimedia.org/T182943) [13:02:09] (03CR) 10jerkins-bot: [V: 04-1] Follow up for PSR-4 work [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398829 (https://phabricator.wikimedia.org/T182943) (owner: 10Ladsgroup) [13:05:19] (03PS1) 10Ladsgroup: Use ExtensionRegistry to check if BetaFeatures is loaded [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398831 (https://phabricator.wikimedia.org/T183096) [13:06:03] Amir1: what did you deploy on wikilabels last night out of curiosity [13:06:32] Zppix: The i18n updates and the flag for inactive campaigns [13:06:45] Hungarian? [13:07:05] some languages including Hungarian [13:07:06] (03CR) 10jerkins-bot: [V: 04-1] Use ExtensionRegistry to check if BetaFeatures is loaded [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398831 (https://phabricator.wikimedia.org/T183096) (owner: 10Ladsgroup) [13:07:15] Amir1: sweet thank you! 
[13:08:55] (03PS2) 10Ladsgroup: Follow up for PSR-4 work [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398829 (https://phabricator.wikimedia.org/T182943) [13:09:12] yw [13:10:24] (03CR) 10Ladsgroup: "recheck" [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398831 (https://phabricator.wikimedia.org/T183096) (owner: 10Ladsgroup) [14:07:16] (03PS1) 10Petar.petkovic: Make Contributions ORES preference sticky [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398841 (https://phabricator.wikimedia.org/T182911) [14:08:29] (03CR) 10jerkins-bot: [V: 04-1] Make Contributions ORES preference sticky [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398841 (https://phabricator.wikimedia.org/T182911) (owner: 10Petar.petkovic) [14:10:04] (03PS2) 10Petar.petkovic: Make Contributions ORES preference sticky [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398841 (https://phabricator.wikimedia.org/T182911) [14:11:49] (03CR) 10jerkins-bot: [V: 04-1] Make Contributions ORES preference sticky [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398841 (https://phabricator.wikimedia.org/T182911) (owner: 10Petar.petkovic) [14:25:11] 10Scoring-platform-team, 10Wikilabels, 10Google-Code-in-2017: Provide a pytest for database of wikilabels - https://phabricator.wikimedia.org/T179014#3844726 (10Eisenhaus335) I am found a bug and i am already fix it. I am still wondering what i just do and the true meaning of it. Take a look on this test htt... 
[14:27:35] (03PS3) 10Petar.petkovic: Make Contributions ORES preference sticky [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398841 (https://phabricator.wikimedia.org/T182911) [15:40:05] 10Scoring-platform-team (Current), 10ORES, 10Patch-For-Review: Deploy ORES mid December (2017) - https://phabricator.wikimedia.org/T182719#3845144 (10Halfak) 05Open>03Resolved a:03awight [15:40:19] 10Scoring-platform-team (Current), 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Train/test reverted model for eswikiquote - https://phabricator.wikimedia.org/T182218#3816871 (10Halfak) 05Open>03Resolved [15:40:21] 10Scoring-platform-team (Current), 10ORES, 10Patch-For-Review: Deploy ORES mid December (2017) - https://phabricator.wikimedia.org/T182719#3832581 (10Halfak) [15:40:25] 10Scoring-platform-team (Current), 10editquality-modeling, 10Patch-For-Review, 10User-Ladsgroup, 10artificial-intelligence: Train/test reverted model for Icelandic - https://phabricator.wikimedia.org/T181099#3779529 (10Halfak) 05Open>03Resolved [15:40:28] 10Scoring-platform-team (Current), 10ORES, 10Patch-For-Review: Deploy ORES mid December (2017) - https://phabricator.wikimedia.org/T182719#3832581 (10Halfak) [15:42:57] 10Scoring-platform-team, 10ORES, 10Operations, 10Performance: Diagnose and fix 4.5k req/min ceiling for ores* requests - https://phabricator.wikimedia.org/T182249#3845155 (10awight) [15:51:14] 10Scoring-platform-team (Current), 10Operations, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External), 10Wikimedia-Incident: Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3845197 (10awight) @akosiaris Just a nudge, I'm waiting for your feedback... [15:52:08] 10Scoring-platform-team, 10Wikimedia-Incident: How can we test all the wiki/page combinations that can be affected by ORES? 
- https://phabricator.wikimedia.org/T181830#3845202 (10Halfak) [15:52:55] 10Scoring-platform-team, 10Global-Collaboration, 10Wikimedia-Incident: How can we test all the wiki/page combinations that can be affected by ORES? - https://phabricator.wikimedia.org/T181830#3845213 (10awight) [15:53:13] 10Scoring-platform-team, 10Global-Collaboration, 10Wikimedia-Incident: How can we test all the wiki/page combinations that can be affected by ORES? - https://phabricator.wikimedia.org/T181830#3845214 (10Halfak) We talked to @jmatazzoni and @Catrope. They agreed that Collaboration should look into implementi... [15:57:07] 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10User-Ladsgroup: Redesign onRecentChange_save hook handler for ORES - https://phabricator.wikimedia.org/T181335#3845223 (10Halfak) [15:59:46] 10Scoring-platform-team, 10cloud-services-team: Reload monthly article quality dataset into wikireplica "datasets_p" - https://phabricator.wikimedia.org/T179187#3845231 (10Halfak) [16:00:06] 10Scoring-platform-team (Current), 10ORES: Design JADE data storage schema - https://phabricator.wikimedia.org/T153152#3845232 (10Halfak) a:05awight>03None [16:00:40] 10Scoring-platform-team (Current), 10Wikilabels, 10editquality-modeling, 10Bengali-Sites, 10artificial-intelligence: Edit quality campaign for Bengali Wikipedia - https://phabricator.wikimedia.org/T174878#3845234 (10Halfak) a:05awight>03None [16:04:57] SPAM [16:04:57] ! [16:08:36] (03CR) 10Catrope: [C: 04-1] "I don't think making this checkbox sticky is the right way to solve this problem. The fundamental problem is that the checkboxes on the co" (031 comment) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398841 (https://phabricator.wikimedia.org/T182911) (owner: 10Petar.petkovic) [16:21:48] (03CR) 10Awight: "Looks right!" 
[extensions/ORES] - 10https://gerrit.wikimedia.org/r/398829 (https://phabricator.wikimedia.org/T182943) (owner: 10Ladsgroup) [16:21:49] (03CR) 10Awight: [C: 032] Follow up for PSR-4 work [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398829 (https://phabricator.wikimedia.org/T182943) (owner: 10Ladsgroup) [16:22:51] (03CR) 10Awight: [C: 032] "Matches the manual." [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398831 (https://phabricator.wikimedia.org/T183096) (owner: 10Ladsgroup) [16:24:04] * Zppix bans halfak for spam xD [16:24:33] hey Zppix [16:24:38] Heyo [16:24:44] Saw your blog post. Thanks for sharing. It is very flattering. [16:25:19] I'm still feeling sick today (somehow sicker than yesterday) so I'm gonna be mostly AFK [16:25:46] (03Merged) 10jenkins-bot: Follow up for PSR-4 work [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398829 (https://phabricator.wikimedia.org/T182943) (owner: 10Ladsgroup) [16:26:47] (03Merged) 10jenkins-bot: Use ExtensionRegistry to check if BetaFeatures is loaded [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398831 (https://phabricator.wikimedia.org/T183096) (owner: 10Ladsgroup) [16:31:29] halfak: :( [16:31:42] halfak: hope its not the flu thats been goin round [16:32:17] * TheresNoTime sprays lysol around the channel [16:32:35] * Zppix gaga [16:32:40] Gags* [16:35:03] (03CR) 10Awight: [C: 04-1] "Minor docstring fix requested..." (033 comments) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/398651 (https://phabricator.wikimedia.org/T181334) (owner: 10Ladsgroup) [16:46:16] halfak: Random thought: have you profiled PyPy over CPython? [16:47:02] Not in a while. I did it a while ago for mwxml and cpython was generally faster for anything that involved a regex [16:53:23] interesting! [16:54:02] lol http://scikit-learn.org/stable/faq.html#do-you-support-pypy [16:54:05] nvm. [17:04:39] wiki-ai/wikilabels#243 (master - 3819f49 : translatewiki.net): The build has errored. 
https://travis-ci.org/wiki-ai/wikilabels/builds/318187253 [17:05:18] Hmmm [17:06:08] awight: pip install requirements is failing on wikilabels [17:08:12] 10Scoring-platform-team, 10ORES, 10Operations, 10Release-Engineering-Team, and 2 others: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3845447 (10akosiaris) [17:08:23] Zppix: oh yeah? lemme see... [17:08:39] wiki-ai/wikilabels#243 (master - 3819f49 : translatewiki.net): The build has errored. https://travis-ci.org/wiki-ai/wikilabels/builds/318187253 [17:09:02] awight: i reran it just to be sure it wasnt an one off thing and it still errored [17:10:02] That error log doesn’t make any sense to me. [17:10:21] Do you see a line that actually errored [17:10:22] ? [17:11:17] 493 [17:11:34] I… don’t have lines beyond 433 [17:11:38] What’s the link? [17:12:07] I’m looking at https://travis-ci.org/wiki-ai/wikilabels/builds/318187253 [17:12:12] https://travis-ci.org/wiki-ai/wikilabels/builds/318187253#L493 [17:14:11] Thanks. I must have been getting a cached page. [17:15:08] Zppix: https://github.com/psycopg/psycopg2/issues/594 [17:15:24] Looks like we need psycopg2 >=2.7 [17:15:38] I can do a patch now [17:15:40] Amir1: ^ new build thing [17:16:44] shoot, I will take a look soon [17:16:55] I need to lie down a little [17:17:02] awight: i can write up a patch should be hard to fix [17:17:12] Shoudnt* [17:17:19] Zppix: Awesome! [17:22:17] wiki-ai/wikilabels#244 (Fix-builds - 1d87036 : Devin/Zppix): The build passed. https://travis-ci.org/wiki-ai/wikilabels/builds/318194583 [17:22:18] It passed [17:23:19] awight: lgtm, feel free to review [17:30:41] \p/ [17:30:58] 10Scoring-platform-team (Current), 10ORES, 10monitoring, 10User-Ladsgroup, 10Wikimedia-Incident: Clean up failure ratio monitoring and set up an alarm when it goes more than a certain threshold - https://phabricator.wikimedia.org/T154175#3845488 (10Krinkle) @Ladsgroup Grafana does not have outgoing E-mai... 
[17:34:40] RoanKattouw: o/ [17:34:52] RoanKattouw: One of our “future” agenda items from https://etherpad.wikimedia.org/p/scoring_collaboration just came up today. [17:35:17] We have new models for Icelanding and eswikiquote, they’re deployed to production and just waiting for the wiki configuration. [17:35:24] Would you like to own that side? [17:36:05] e.g. https://ores.wikimedia.org/v3/scores/iswiki/123456 [17:36:34] Icelandic even? [17:37:14] lol that's just off the coast of scotland [17:37:17] Icelanding does sound cool though [17:39:28] awight, those are only the "reverted" models. [17:39:38] We don't usually expose them via rc filters [17:39:47] But I'd be interested in discussing that more. [17:40:18] * halfak makes himself an electrolyte drink and prepares for a nap [17:42:04] halAFK: oho good point. [17:42:07] back in ~4 hours if everything goes as planned. [17:42:28] * Zppix didnt know we actually planned stuff [17:42:37] RoanKattouw: ^ I was wrong about the models, they won’t be exposed in RCFilters cos: just “reverted" [17:46:46] 10Scoring-platform-team (Current), 10MediaWiki-extensions-ORES, 10Collaboration-Team-Triage (Collab-Team-This-Quarter), 10User-Ladsgroup: Tests should have covered regression in T182936 - https://phabricator.wikimedia.org/T182942#3845600 (10Catrope) Just a few hours before this breakage, we discussed the i... [17:58:01] Thanks @halfak, should we add the official connection limit to https://ores.wikimedia.org? I think there were somewhere says it's suggested that at most 4 connections in parallel be used for batch fetch [17:59:42] xinbenlv: That would be a polite limit. +1 that we should document and even enforce if possible. [18:00:10] xinbenlv: Randomly, if you’re running a bot, do you have your email set in the User-Agent? [18:06:38] `it's recommended to batch 50 revisions in each request as described below. It's acceptable to use up to 4 parallel requests. 
` From https://www.mediawiki.org/wiki/ORES (API usage) section [18:07:05] :) [18:07:25] We are not running a bot. and yes, we have set our request header. Let me double check with the email in the User-Agent [18:17:21] Very cool, thank you. [18:33:38] is there a "how many requests per minute in serial" or not so much? [18:34:47] awight: ^ [18:36:45] apergos: I think serial requests will be fine cos it takes a little over 1sec to generate a response, so the natural limit is < 60/min [18:37:26] Unless they’re cached scores, in which case the response is virtually free. [18:37:50] We have about 1,000 web workers, so keeping parallelism down to 4 should be fine no matter what the workload. [18:37:50] so no limit until we learn otherwise [18:37:53] :D [18:37:56] gotcha, thanks [18:37:59] I see you’ve done this before ;-) [18:38:18] I try to keep track of api/bot/other limits generally [18:38:39] it helps us to determine whether a peak is something we can/should cut off at the user side or not [18:39:43] apergos: I odn’t know if this would be interesting to you, T169246 [18:39:44] T169246: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246 [18:40:01] I just did it to look at the pretty graphs [18:40:01] There’s something really weird going on that we never figured out, actually. [18:40:38] Here’s a good one, https://grafana-admin.wikimedia.org/dashboard/db/ores?orgId=1&from=1513256100000&to=1513257600000&refresh=1m [18:40:53] Check out the sine waves in “scores processed”, across all ores* machines that were part of the test. [18:41:02] Something centralized and evil. [18:43:01] doesn't it look like it correlates with memory usage? 
[18:45:35] apergos: yes, hold on I have one where we’re not close to OOM though [18:46:19] apergos: https://grafana-admin.wikimedia.org/dashboard/db/ores?orgId=1&from=1513283400000&to=1513286400000&refresh=1m [18:47:20] I can’t give a convincing explanation for the fluctuations in memory use, so that’s an interesting lead. [18:47:30] those look the same to me [18:48:00] what's your stack look like? any gc anyplace in there? [18:48:26] Good call—yes, Python is doing nondeterministic GC [18:48:34] maybe worth a look [18:50:04] GC wouldn’t be in sync across the cluster though [18:50:07] PROBLEM - puppet on ORES-worker05.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 8 minutes ago with 0 failures [18:50:54] I guess the machines do wander away from one another, eventually. [18:51:37] if the usage patterns are similar, started at around the same time for the stress test [18:51:43] with about same resource usage and load [18:52:00] wouldn't you expect to see your boxes behave similarly? [18:52:38] Yeah this is a neat direction to look in. Seems that the Python GC has some debugging stuff available. 
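A minimal sketch of the GC debugging hooks mentioned above, using only the standard-library `gc` module. It registers a callback in `gc.callbacks` to time each collection pass, so that pauses could be lined up against the "scores processed" graph. The class name `GCTimer` and its methods are hypothetical and nothing here is taken from the ORES codebase; whether GC actually explains the sine waves is still only the hypothesis raised in the conversation.

```python
import gc
import time


class GCTimer:
    """Record how long each garbage collection pass takes (illustrative only)."""

    def __init__(self):
        self.events = []       # list of (generation, seconds, collected)
        self._started = None

    def __call__(self, phase, info):
        # The interpreter calls this with phase "start" or "stop" and an
        # info dict holding "generation", "collected" and "uncollectable".
        if phase == "start":
            self._started = time.perf_counter()
        elif phase == "stop" and self._started is not None:
            elapsed = time.perf_counter() - self._started
            self.events.append((info["generation"], elapsed, info["collected"]))
            self._started = None

    def install(self):
        gc.callbacks.append(self)

    def uninstall(self):
        gc.callbacks.remove(self)


if __name__ == "__main__":
    timer = GCTimer()
    timer.install()
    # Build some reference cycles so the collector has work to do.
    for _ in range(1000):
        cycle = []
        cycle.append(cycle)
    gc.collect()
    timer.uninstall()
    for generation, seconds, collected in timer.events:
        print(f"gen={generation} took={seconds * 1000:.3f}ms collected={collected}")
```

Logging the recorded events with a timestamp on one worker would show whether collection pauses recur at the same period as the waves; `gc.set_debug(gc.DEBUG_STATS)` is a coarser alternative that prints statistics to stderr.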
[18:52:53] anyways, I would look at a given one and see if you can find out what's happening there [18:53:07] PROBLEM - puppet on ORES-worker10.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 18 minutes ago with 0 failures [18:53:11] PROBLEM - puppet on ORES-worker06.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 2 minutes ago with 0 failures [18:53:14] PROBLEM - puppet on ORES-worker08.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 16 minutes ago with 0 failures [18:53:27] PROBLEM - puppet on ORES-lb02.Experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 7 minutes ago with 0 failures [18:53:28] PROBLEM - puppet on ORES-redis01.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 22 minutes ago with 0 failures [18:53:32] PROBLEM - puppet on ORES-web03.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 29 minutes ago with 0 failures [18:53:38] PROBLEM - puppet on ORES-web05.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 21 minutes ago with 0 failures [18:53:43] Woah [18:53:45] 2 minutes ago and you get a warning? 
uh [18:53:45] Hello [18:53:46] PROBLEM - puppet on ORES-worker09.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 31 minutes ago with 0 failures [18:53:50] PROBLEM - puppet on ORES-redis02.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 11 minutes ago with 0 failures [18:53:55] PROBLEM - puppet on ORES-worker07.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 9 minutes ago with 0 failures [18:54:05] 10Scoring-platform-team, 10ORES, 10Operations, 10Performance: Diagnose and fix 4.5k req/min ceiling for ores* requests - https://phabricator.wikimedia.org/T182249#3845797 (10awight) Interesting hypothesis from IRC conversation: the sine waves could be a garbage collection artifact. Python includes some to... [18:54:08] Fixing [18:54:47] ACKNOWLEDGEMENT - puppet on ORES-worker07.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 9 minutes ago with 0 failures zppix . [18:55:05] ACKNOWLEDGEMENT - puppet on ORES-redis02.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 12 minutes ago with 0 failures zppix . [18:56:26] ACKNOWLEDGEMENT - puppet on ORES-worker09.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 33 minutes ago with 0 failures zppix / [18:56:38] ACKNOWLEDGEMENT - puppet on ORES-web05.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 24 minutes ago with 0 failures zppix . [18:56:47] ACKNOWLEDGEMENT - puppet on ORES-web03.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 32 minutes ago with 0 failures zppix . 
[18:56:56] ACKNOWLEDGEMENT - puppet on ORES-redis01.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 25 minutes ago with 0 failures zppix .. [18:57:08] ACKNOWLEDGEMENT - puppet on ORES-lb02.Experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 10 minutes ago with 0 failures zppix . [18:57:17] ACKNOWLEDGEMENT - puppet on ORES-worker08.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 20 minutes ago with 0 failures zppix / [18:57:22] apergos it uses the default time set in icinga2 as we didnt specify a time for puppet [18:57:29] ACKNOWLEDGEMENT - puppet on ORES-worker06.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 6 minutes ago with 0 failures zppix . [18:57:38] ACKNOWLEDGEMENT - puppet on ORES-worker05.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 15 minutes ago with 0 failures zppix . [18:57:45] ACKNOWLEDGEMENT - puppet on ORES-worker10.experimental is WARNING: WARNING: Puppet is currently disabled, message: puppet upgrades - andrew, last run 22 minutes ago with 0 failures zppix . [18:58:01] ic [18:58:08] Fixed [18:58:12] Your welcome [18:58:36] it's only spamming you folks so I guess... your business what the settings are [18:58:54] apergos: it will be fixed [18:58:59] paladox: make it less spammy [18:59:09] Zppix lol how? [18:59:14] It does it for each service [18:59:18] Hmm [18:59:37] I can up the time for it to check but other then that it will send it out for each service [18:59:39] don't worry about me, make it do what you folks need [18:59:43] Is there notification settings paladox [18:59:43] * apergos checks back out [19:00:01] apergos: no, spamming 200 messages at once isnt really helpful [19:00:12] Zppix there's a script but that wont do anything. 
[19:00:18] Hmm [19:00:31] Can we rate limit icinga2-wm messages? [19:00:52] (Wishes i had access to ssh into cloud vps rn) [19:00:54] Not sure as we use ircecho :) [19:01:22] Be nice if we had a heads up :/ [19:02:32] * Zppix wonders how icinga2-wm didnt get excess flood [19:03:08] I need an ack everything script pls paladox [19:03:14] lol [19:03:18] i was thinking the same [19:03:33] i think wmf have one, but it will need to be updated to be compatible with icinga2 [19:03:45] Not auto ack [19:03:54] Just a button to ack everything [19:03:59] oh lol [19:04:16] So the world can melt in peace [19:04:39] lol [19:05:10] * Zppix wonders if i could convience cloud services to add my mobile ssh key so i can ssh in w/o changing a config which isnt possible on mobile [19:05:27] paladox: Having the monitors is awesome, I’m happy getting some occasional spam in exchange for not having to probe the servers manually. Don’t feel like you need to rush any fixes... [19:05:28] Zppix you add your ssh key through wikitech [19:05:41] awight heh thanks :) [19:05:45] paladox: but i cannot get into our instances [19:05:57] Zppix adding your key will allow you too [19:06:13] you can add mutiple ssh keys [19:06:16] My ssh key is in there im able to toolforge just not vps instances [19:06:22] Unless im doing it wrong [19:06:30] Zppix where did you add it [19:06:35] did you add it now? 
[19:08:03] zppix [19:08:06] Ive had it on wikitech [19:08:33] ok then it should be the right one :) [19:08:47] remeber you have to ssh in through bastion [19:08:59] bastion.wmflabs.org [19:09:00] I do it says public key issue [19:09:10] hmm [19:09:37] zppix do you do [19:09:38] Host gerrit-test [19:09:39] ProxyCommand ssh -a -W %h:%p paladox@primary.bastion.wmflabs.org [19:09:39] UseRoaming no [19:09:39] User paladox [19:10:07] No i cant do that to my ssh on mobile [19:10:20] ok [19:10:27] ssh paladox@primary.bastion.wmflabs.org [19:10:50] Im on bastion [19:11:49] ok [19:11:56] so ssh into bastion works [19:12:05] im not sure how to get into the instances from there [19:12:39] Ill ask [19:15:53] RECOVERY - puppet on ORES-worker08.experimental is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [19:15:58] RECOVERY - puppet on ORES-redis01.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:16:02] RECOVERY - puppet on ORES-web03.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:16:14] RECOVERY - puppet on ORES-lb02.Experimental is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [19:16:16] RECOVERY - puppet on ORES-worker09.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:16:17] RECOVERY - puppet on ORES-web05.experimental is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [19:16:20] RECOVERY - puppet on ORES-redis02.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:16:25] RECOVERY - puppet on ORES-worker07.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:16:37] RECOVERY - puppet on ORES-worker10.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:16:37] RECOVERY - puppet on ORES-worker05.experimental is OK: OK: Puppet 
is currently enabled, last run 1 minute ago with 0 failures [19:16:41] RECOVERY - puppet on ORES-worker06.experimental is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:40] Zppix: it begs the question - can't you get on a PC of some sort to SSH? [19:28:25] TheresNoTime: sure if i had access to one atm [19:29:27] good point.. [19:33:24] 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review: Investigate why ORES logs are being written to syslog despite explicit logging config. Fix. - https://phabricator.wikimedia.org/T182614#3845874 (10awight) Here's a fun debugging tool, https://pypi.python.org/pypi/logging_tree [20:19:40] wiki-ai/ores#889 (regular_logging - 325ab21 : Adam Roses Wight): The build failed. https://travis-ci.org/wiki-ai/ores/builds/318267783 [20:23:17] 10Scoring-platform-team (Current), 10ORES, 10Operations, 10Patch-For-Review: Investigate why ORES logs are being written to syslog despite explicit logging config. Fix. - https://phabricator.wikimedia.org/T182614#3846052 (10awight) https://github.com/wiki-ai/ores/pull/241 [20:25:49] wiki-ai/ores#891 (regular_logging - 065f530 : Adam Roses Wight): The build was fixed. https://travis-ci.org/wiki-ai/ores/builds/318270111 [20:38:56] halAFK: Turns out, there was a second issue masking the log glitch, and it was caused by us :) [20:39:22] The upstream celery hack and complex workarounds aren’t necessary for us, since we don’t care much about the root logger. [20:41:17] Hey awight. What was the issue? [20:41:48] halfak: We were doing some fallback logging configuration prodecurally… [20:41:58] inside the run() method? [20:42:01] yep [20:42:17] IMO we don’t want that, see https://github.com/wiki-ai/ores/pull/241 if you want to check it out [20:42:27] We can still re-add if it’s actually desirable [20:42:48] but it’ll be conditional on ! --logging-config [20:42:51] awight, I don't think that run() is executed. [20:42:54] Not on the server. 
[20:42:57] hahaha [20:42:58] kk [20:43:00] I think only when running it directly. [20:43:16] The main() function is only executed when ores applications.celery is run. [20:43:18] well the primary thing to fix was just the hijack root logger config for celery [20:43:34] yeah. Trying to figure out what change you did to fix it. [20:43:51] harrgh I see, so maybe we do want the fallback stuff since it’s console-only [20:44:02] right. It's debugging oriented. [20:44:07] Anyone have some internet i can borrow my isp decided to have an outage :/ [20:44:28] Zppix: sure, here’s some internet metered at $1/kB [20:44:37] lol [20:44:49] Hmm can i charge that to wikimedia? [20:45:00] no, we have a preferential deal with your ISP [20:45:10] (then it got too close to the truth) [20:45:26] Maaan [20:47:57] halfak: .main is called from the vagrant “service” scripts, I believe [20:48:02] lemme check production [20:48:23] Oh... That could make sense. I thought we were using the celery worker manager. [20:48:49] awight, https://github.com/wiki-ai/ores-wmflabs-deploy/blob/master/ores_celery.py [20:48:53] This is what gets executed [20:49:18] Except, of course, of the if __name__ == "__main__": ... [20:49:21] +1 [20:49:48] In vagrant it’s ExecStart=<%= @venv_dir %>/bin/ores applications.celery --logging-config <%= @logging_config %> [20:49:55] which is different. [20:50:12] Aha! [20:50:15] It is [20:50:32] All this celery talk is making me hungry [20:51:02] Well, don't eat celery because it won't help. [20:51:09] Zppix: That’s rich because ^ haha halfak beat me to it [20:51:26] I’ve heard it takes more energy to chew than you get from digesting. [20:53:28] halfak: Looking at the stuff I’ve removed… I still think we’re better off without it. The only line I might keep is the logging.basicConfig [20:54:10] awight, I was wondering if there would be a nice way to make it configurable. Regardless, it seems like vagrant is starting ORES wrongly. [20:54:26] Configurable how? 
[20:54:39] +1 I’ll fix the vagrant start scripts to match prod better [20:54:56] awight, configurable via a CLI argument or something like that. [20:55:08] ok [20:55:12] E.g. --debug [20:55:53] Without any logging_config.yaml, I think you’re only going to get celery logs on the commandline, right? [20:56:27] s/commandline/console [20:57:03] Not sure what "celery logs" you are referring to. [20:57:09] But we'd get our own internal logging. [20:57:22] *AND* we'd get sudo graphite messages (via logging) [20:57:53] halfak: What is the command line you have in mind where a person would add --debug, btw? ores-{prod,wmflabs}-deploy/ores_{celery,wsgi}.py? [20:58:08] ores applications.celery --debug [20:58:26] Nothing to do with ores_celery.py [20:58:30] hunm [21:00:37] Interesting that I’ve never done that directly. It does happen to be what vagrant runs, though. [21:01:42] OK calorie session complete. Gonna go lay back down [21:01:46] back in 1.5 hours [21:01:47] o/ [21:03:23] I’ll put that bit back in place and fix the vagrant systemd... [21:36:33] 10Scoring-platform-team, 10MediaWiki-Vagrant, 10Patch-For-Review: Clean up ORES vagrant role - https://phabricator.wikimedia.org/T181850#3846342 (10awight) [21:36:44] 10Scoring-platform-team, 10MediaWiki-Vagrant, 10Patch-For-Review: Clean up ORES vagrant role - https://phabricator.wikimedia.org/T181850#3804319 (10awight) p:05Triage>03Normal [21:36:58] 10Scoring-platform-team (Current), 10MediaWiki-Vagrant, 10Patch-For-Review: Clean up ORES vagrant role - https://phabricator.wikimedia.org/T181850#3804319 (10awight) [21:45:02] (03PS1) 10Awight: Clean up logging config. [services/ores/deploy] - 10https://gerrit.wikimedia.org/r/398942 (https://phabricator.wikimedia.org/T182614) [21:55:49] wiki-ai/wikilabels#248 (master - 1841c25 : Devin/Zppix): The build passed. https://travis-ci.org/wiki-ai/wikilabels/builds/318308713 [21:56:53] What i do? 
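A sketch of the logging fallback discussed above: console logging is set up with `logging.basicConfig` only when no `--logging-config` file is given, and `--debug` raises the verbosity. `--logging-config` appears in the vagrant unit quoted earlier, while `--debug` was only proposed in the conversation. The function names `build_parser`, `fallback_level` and `configure_logging`, the `INFO` default, and the log format are assumptions for illustration, not code from ORES. Loading the config file itself is left out because the log does not describe its format.

```python
import argparse
import logging


def build_parser():
    # Hypothetical parser; only the two flags discussed in the log are modelled.
    parser = argparse.ArgumentParser(prog="ores-sketch")
    parser.add_argument("--logging-config", default=None,
                        help="path to a logging configuration file")
    parser.add_argument("--debug", action="store_true",
                        help="print debug-level logs to the console")
    return parser


def fallback_level(args):
    """Return the console log level to use, or None if a config file wins."""
    if args.logging_config is not None:
        # An explicit config file takes precedence; leave the root logger alone.
        return None
    return logging.DEBUG if args.debug else logging.INFO


def configure_logging(args):
    level = fallback_level(args)
    if level is not None:
        logging.basicConfig(
            level=level,
            format="%(asctime)s %(levelname)s:%(name)s -- %(message)s",
        )
    return level


if __name__ == "__main__":
    args = build_parser().parse_args(["--debug"])
    configure_logging(args)
    logging.getLogger("sketch").debug("fallback console logging is active")
```

Keeping the decision in a small pure function makes the "conditional on no --logging-config" rule easy to test without touching the root logger.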
[21:56:57] Oh its travis [22:00:57] 10Scoring-platform-team (Current), 10Wikilabels, 10User-Ladsgroup: Deploy wikilabels mid-December - https://phabricator.wikimedia.org/T183196#3846457 (10Ladsgroup) [22:01:42] 10Scoring-platform-team (Current), 10Wikilabels, 10I18n, 10User-Ladsgroup: Wikilabel interface in Hungarian has uninformative action buttons, due to translations not updating - https://phabricator.wikimedia.org/T183068#3846474 (10Ladsgroup) [22:01:44] 10Scoring-platform-team (Current), 10Wikilabels, 10Easy, 10User-Ladsgroup: Allow Wiki Labels API to list inactive campaigns - https://phabricator.wikimedia.org/T171768#3846475 (10Ladsgroup) [22:01:46] 10Scoring-platform-team (Current), 10Wikilabels, 10User-Ladsgroup: Deploy wikilabels mid-December - https://phabricator.wikimedia.org/T183196#3846473 (10Ladsgroup) [22:01:55] 10Scoring-platform-team (Current), 10Wikilabels, 10User-Ladsgroup: Deploy wikilabels mid-December - https://phabricator.wikimedia.org/T183196#3846457 (10Ladsgroup) I deployed one in morning, going to deploy again now. [22:12:15] 10Scoring-platform-team: Scoring Platform FY18 Q3 - https://phabricator.wikimedia.org/T183198#3846557 (10Halfak) [22:12:37] 10Scoring-platform-team: Scoring Platform FY18 Q3 - https://phabricator.wikimedia.org/T183198#3846570 (10Halfak) [22:19:57] 10Scoring-platform-team: ORES Extension refactoring - https://phabricator.wikimedia.org/T183199#3846597 (10Halfak) [22:20:22] Amir1, if you're still around, could you extend or merge https://phabricator.wikimedia.org/T183199 with a task you have already created? 
[22:20:39] Supposed to be an over-arching ask for the Ext work you are doing
[22:20:44] yeah, sure, just woke and can't sleep
[22:21:41] I think these should be more structured, let me make ten more tickets :D
[22:21:42] Scoring-platform-team (Current), ORES, artificial-intelligence: Deploy topic model to ORES - https://phabricator.wikimedia.org/T176336#3846614 (Halfak)
[22:21:44] Scoring-platform-team: Scoring Platform FY18 Q3 - https://phabricator.wikimedia.org/T183198#3846613 (Halfak)
[22:22:37] Scoring-platform-team (Current), Wikilabels, User-Ladsgroup: Deploy wikilabels mid-December - https://phabricator.wikimedia.org/T183196#3846620 (Tgr)
[22:22:40] Scoring-platform-team (Current), Wikilabels, I18n, User-Ladsgroup: Wikilabel interface in Hungarian has uninformative action buttons, due to translations not updating - https://phabricator.wikimedia.org/T183068#3846618 (Tgr) Open>Resolved Confirmed, thanks!
[22:23:14] Scoring-platform-team, JADE, WMF-Communications: Blog about JADE - https://phabricator.wikimedia.org/T183200#3846621 (Halfak)
[22:23:38] Scoring-platform-team (Current), Wikilabels, User-Ladsgroup: Deploy wikilabels mid-December - https://phabricator.wikimedia.org/T183196#3846636 (Ladsgroup)
[22:23:40] Scoring-platform-team (Current), Wikilabels, Easy, Google-Code-in-2017: Error messages should not contain relative paths or error codes - https://phabricator.wikimedia.org/T175726#3601563 (Ladsgroup)
[22:25:50] Scoring-platform-team (Current), MediaWiki-extensions-ORES, MW-1.31-release-notes (WMF-deploy-2017-12-12 (1.31.0-wmf.12)), Patch-For-Review, User-Ladsgroup: Rewrite Stats.php - https://phabricator.wikimedia.org/T181892#3846663 (Ladsgroup)
[22:25:52] Scoring-platform-team (Current), MediaWiki-extensions-ORES, MW-1.31-release-notes (WMF-deploy-2017-12-12 (1.31.0-wmf.12)), Patch-For-Review, User-Ladsgroup: Split Cache.php to different services -
https://phabricator.wikimedia.org/T181334#3846664 (Ladsgroup)
[22:25:54] Scoring-platform-team: ORES Extension refactoring - https://phabricator.wikimedia.org/T183199#3846662 (Ladsgroup)
[22:26:48] Scoring-platform-team: Scoring Platform FY18 Q3 - https://phabricator.wikimedia.org/T183198#3846676 (Ladsgroup) p:Triage>High
[22:26:58] Scoring-platform-team: ORES Extension refactoring - https://phabricator.wikimedia.org/T183199#3846597 (Ladsgroup) p:Triage>High
[22:27:17] Scoring-platform-team (Current), MediaWiki-extensions-ORES, MW-1.31-release-notes (WMF-deploy-2017-12-12 (1.31.0-wmf.12)), Patch-For-Review, User-Ladsgroup: Split Cache.php to different services - https://phabricator.wikimedia.org/T181334#3786618 (Ladsgroup) p:Triage>High
[22:28:00] Scoring-platform-team (Current), MediaWiki-extensions-ORES, MW-1.31-release-notes (WMF-deploy-2017-12-12 (1.31.0-wmf.12)), Patch-For-Review, User-Ladsgroup: Rewrite Stats.php - https://phabricator.wikimedia.org/T181892#3846681 (Ladsgroup) p:Triage>High
[22:28:43] Scoring-platform-team (Current), MediaWiki-extensions-ORES, MW-1.31-release-notes (WMF-deploy-2017-12-12 (1.31.0-wmf.12)), Patch-For-Review, User-Ladsgroup: Rewrite Stats.php - https://phabricator.wikimedia.org/T181892#3805825 (Ladsgroup) The Api.php is a PITA here, I need to find a way to di...
[22:29:09] lo
[22:29:15] l
[22:29:18] Scoring-platform-team: Write annual plan proposal for Scoring Platform FY19 - https://phabricator.wikimedia.org/T183203#3846684 (Halfak) p:Triage>High
[22:33:12] \o/ thanks Amir1
[22:33:41] Hope that's enough. I can be more verbose and make more patches if you want me to
[22:34:24] I think it's great.
For right now, it just needs to be representative of the work in a general way
[22:35:01] Scoring-platform-team, JADE: Build consumable dumps of JADE - https://phabricator.wikimedia.org/T183204#3846704 (Ladsgroup)
[22:47:24] Scoring-platform-team (Current), ORES, monitoring, User-Ladsgroup, Wikimedia-Incident: Clean up failure ratio monitoring and set up an alarm when it goes more than a certain threshold - https://phabricator.wikimedia.org/T154175#2903147 (Ladsgroup) Thanks for the help. I will make the patches...
[22:55:23] OK. Goal meeting done.
[22:55:27] back under the blankets
[23:02:55] out
[23:09:27] Scoring-platform-team (Current), ORES, monitoring, Patch-For-Review, and 2 others: Clean up failure ratio monitoring and set up an alarm when it goes more than a certain threshold - https://phabricator.wikimedia.org/T154175#3846755 (Ladsgroup) Everything seems fine now, I wish we could build simi...
[23:23:47] Scoring-platform-team, Wikilabels, Google-Code-in-2017: Provide a pytest for database of wikilabels - https://phabricator.wikimedia.org/T179014#3846764 (Ladsgroup) Even though performance is not the case in here because the module is used ~once a week in prod, that idea seems like a very bad practice...