[05:40:26] Scoring-platform-team, Wikilabels, Google-Code-in-2017: Provide a pytest for database of wikilabels - https://phabricator.wikimedia.org/T179014#3710279 (Eisenhaus335) https://travis-ci.org/eisenhaus335/wikilabels/builds/315706960#L1751 i was sure i am already checking the form to create a new item. T...
[10:11:29] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3833644 (mmodell) 15d5283b7422919d85203b5ba907027f9356e421 doesn't exist in the editquality repo. Somehow the submodule pointer...
[10:42:52] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3833707 (akosiaris) But it does exist on tin ``` akosiaris@tin:/srv/deployment/ores/deploy/.git/modules/submodules/editquality$...
[10:45:21] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3833708 (mmodell) Another thing: I'm having difficulty just cloning the editquality submodule. It's so large that git pack-obje...
[10:47:48] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3833717 (akosiaris) @mmodell is on to something though with the comment about that commit not being in the repo ``` akosiaris@t...
[10:49:15] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3833719 (mmodell) Hmm, indeed, if the object does not exist on any branch or tag then it likely won't be fetched by the "dumb" g...
[10:52:25] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3833721 (mmodell) Just fetching this one repo (editquality) from phabricator is causing inordinate load on the server. It's noth...
[11:00:38] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3833730 (akosiaris) Behavior is erratic as well ``` akosiaris@bast1001:~$ git clone https://phabricator.wikimedia.org/source/ed...
[11:02:42] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3833733 (mmodell) Yeah that repo is 334M in the current workdir but the .git is 2.1 gigs. That doesn't seem too unreasonable bu...
[15:07:07] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834330 (akosiaris) A fresh clone of `http://tin.eqiad.wmnet/ores/deploy/.git/modules/submodules/editquality` on bast1001 does n...
[15:20:02] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834452 (mmodell) @akosiaris: scap //should// be getting the hash from the submodule pointers contained at `HEAD` of `tin.eqiad...
[15:20:53] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834455 (akosiaris) the ores submodule btw is in the exact same state and also fails to checkout ``` akosiaris@tin:/srv/deploym...
[15:26:22] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834476 (mmodell) ah ha! I figured _something_ out at least! The 15d5283b commit is in origin/master it just hasn't been merged...
[15:26:31] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834478 (akosiaris) >>! In T181661#3834452, @mmodell wrote: > @akosiaris: scap //should// be getting the hash from the submodule...
[15:30:06] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834498 (mmodell) >>! In T181661#3834478, @akosiaris wrote: >>>! In T181661#3834452, @mmodell wrote: >> @akosiaris: scap //shoul...
[15:35:43] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834505 (mmodell) so @awight, can you enlighten me about your scap.cfg? Is git_rev: origin/master intentional? If then I think...
[15:37:54] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834510 (akosiaris) Aha! nice find. It looks like it's been there since the very beginning. See fd1067ff4da. It has undergone a...
[15:40:59] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834521 (mmodell) I think I should add a NOTICE to scap that says something along the lines of "Deploying from non-default origi...
[15:46:10] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834531 (akosiaris) > I 've crafted a commit on tin removing that line and retrying a scap deploy from tin just for ores1004. O...
[15:56:02] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834565 (akosiaris) >>! In T181661#3834531, @akosiaris wrote: >> I 've crafted a commit on tin removing that line and retrying a...
[16:00:07] o/
[16:00:27] halfak: FYI, I sent Joe an invite for 3.5 hours from now
[16:00:41] The element of surprise ;-)
[16:00:46] awight, looks like I'm still working on the PR for wmflabs, but there's a patchset for prod
[16:00:50] Cool
[16:00:57] Oh backwards
[16:01:12] I’ll review and put it on beta
[16:01:45] (PS1) Alexandros Kosiaris: scap: Remove git_rev setting [services/ores/deploy] - https://gerrit.wikimedia.org/r/398067 (https://phabricator.wikimedia.org/T181661)
[16:02:58] (CR) Awight: [C: 2] "Great! That might explain T182498..." [services/ores/deploy] - https://gerrit.wikimedia.org/r/398067 (https://phabricator.wikimedia.org/T181661) (owner: Alexandros Kosiaris)
[16:03:01] (CR) Awight: [V: 2 C: 2] scap: Remove git_rev setting [services/ores/deploy] - https://gerrit.wikimedia.org/r/398067 (https://phabricator.wikimedia.org/T181661) (owner: Alexandros Kosiaris)
[16:03:34] awight: i see your point on cumulative times, but the deepest one causing trouble would be some entry inside the report right?
[16:03:48] if not predict_proba
[16:03:59] I think so—can you share the profiler output somewhere?
[16:04:01] awight: thanks!
[16:04:10] I sure hope we finally solved that
[16:04:11] akosiaris: Awesome to see progress on that blocker!
[16:05:21] (CR) Awight: [V: 2 C: 2] "Looks right." [services/ores/deploy] - https://gerrit.wikimedia.org/r/397962 (https://phabricator.wikimedia.org/T182719) (owner: Halfak)
[16:05:54] akosiaris: I’m about to do some deployment, so I can try it out, if you’re all done playing with ores*?
[16:06:10] yes I am done
[16:08:16] awight, https://github.com/wiki-ai/ores-wmflabs-deploy/pull/94
[16:08:31] I'm AFK for a doctor apt. Back in a little while
[16:08:50] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, and 2 others: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834600 (awight) @mmodell Tangential note, I've been happy using `git clone --depth 1` on personal projects. Would that mak...
[16:09:19] akosiaris: Great, I’ll keep you posted about whatever happens.
[16:09:23] awight: its on the etherpad
[16:09:42] codezee: ah sorry I was too lazy to look :D
[16:10:21] codezee: What was the specific profiling tool to print the output, btw?
[16:15:53] It’s hard to tell what the call stack looks like.
[16:21:53] codezee: What version of scikit-learn is this?
[16:23:29] Seems like 0.17.1
[16:26:26] I left a few comments in your profiler output on github
[16:28:41] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, and 2 others: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834679 (mmodell) @awight: from what I understand, git has to do a lot of extra work on the server side in order to build to...
[16:53:49] Scoring-platform-team, Release-Engineering-Team, Scap: Scap is unhappy about deploying from a branch other than master - https://phabricator.wikimedia.org/T182498#3834771 (mmodell) Open>Invalid As we found out in T181661, `git_rev=origin/master` was set in the scap.cfg for ores. This was prob...
[16:58:48] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, and 2 others: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834785 (mmodell)
[16:59:39] Scoring-platform-team, ORES: Make sure ORES is compatible with stretch - https://phabricator.wikimedia.org/T182799#3834792 (awight)
[17:06:54] * awight faints
[17:06:57] Another scap issue
[17:09:30] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: New, mysterious scap failure - https://phabricator.wikimedia.org/T182801#3834842 (awight) p:Triage>High
[17:09:40] halAFK: We can’t deploy to beta. Unknown why not…. Meanwhile, I smoke-tested the new ores patch locally and it looks fine.
[17:10:51] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, and 2 others: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834855 (mmodell) >>! In T181661#3834679, @mmodell wrote: > @awight: from what I understand, git has to do a lot of extra wo...
[17:12:47] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, and 2 others: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834860 (awight)
[17:13:11] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: New, mysterious scap failure - https://phabricator.wikimedia.org/T182801#3834858 (awight) Open>Invalid /srv is full. Strange that there was no error message during deployment, though...
[17:13:38] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: New, mysterious scap failure - https://phabricator.wikimedia.org/T182801#3834862 (mmodell) strange indeed. Full disk can case all sorts of weird behaviors though.
[17:13:42] (PS2) Awight: Limit to no more than 3 cached revisions [services/ores/deploy] - https://gerrit.wikimedia.org/r/395048 (https://phabricator.wikimedia.org/T182013)
[17:14:54] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, Scap: New, mysterious scap failure - https://phabricator.wikimedia.org/T182801#3834866 (awight) >>! In T182801#3834862, @mmodell wrote: > strange indeed. Full disk can case all sorts of weird behaviors though. +1 This might n...
[17:15:24] halAFK: It was a full disk… Merge if you get this message, https://gerrit.wikimedia.org/r/#/c/395048/
[17:15:36] (CR) 20after4: [C: 2] Limit to no more than 3 cached revisions [services/ores/deploy] - https://gerrit.wikimedia.org/r/395048 (https://phabricator.wikimedia.org/T182013) (owner: Awight)
[17:18:26] halAFK: We need the icelandic dictionary.
[17:18:43] aspell-is works locally. Do we have a preferred variant?
[17:22:55] Scoring-platform-team (Current), editquality-modeling, Patch-For-Review, User-Ladsgroup, artificial-intelligence: Train/test reverted model for Icelandic - https://phabricator.wikimedia.org/T181099#3779529 (awight) Temporarily stalled on https://gerrit.wikimedia.org/r/#/c/398078/
[17:26:50] Just got back
[17:26:52] awight: thanks! i used cProfile
[17:27:08] awight, I think aspell-is passes tests
[17:27:28] * awight rubs eyes
[17:27:35] I accidentally deployes aspell-is on tin-beta
[17:27:37] whew
[17:28:03] Yup. Aspell-is is what Amir1 used.
[17:28:09] excellent.
[17:28:16] Just confirmed in revscoring docs :)
[17:28:28] It's in the README
[17:28:29] :D
[17:28:35] Looks like that patchset was merged.
[17:28:44] argh in another repo
[17:29:13] Is that a historical accident or intentional, btw?
[17:29:19] Do we use dicts from anything but editquality?
[17:29:35] http://ores-beta.wmflabs.org/v3/scores/iswiki/123456
[17:29:52] We can let that fester for a few hours now.
[17:30:18] Scoring-platform-team (Current), editquality-modeling, Patch-For-Review, User-Ladsgroup, artificial-intelligence: Train/test reverted model for Icelandic - https://phabricator.wikimedia.org/T181099#3834917 (awight) This is deployed to the beta cluster and ready for testing: http://ores-beta.w...
[17:30:30] BTW we’ll need an ops merge for https://gerrit.wikimedia.org/r/#/c/398078/
[17:33:30] halfak: I noticed you noticing > caching is just to help patrollers and researchers with speed, only reduces our server load by c. 7%
[17:33:38] Was that wrong?
[17:33:47] Oh! Was wondering how your math worked out.
[17:35:21] Scoring-platform-team, ORES, Operations, Release-Engineering-Team, and 2 others: Connection timeout from tin to new ores servers - https://phabricator.wikimedia.org/T181661#3834939 (awight) Looks like I'm getting the same error. > commit b67bba77acb7c0ffc678201c9f3f54f198da6650 > > scap deploy -...
[17:38:54] halfak: Ah it was on a used napkin. I eyeballed the proportion of cached responses…. good point though that one doesn’t say “circa” and then not round to a power ot 10 :)
[17:40:50] awight heh https://news.sky.com/story/disney-reportedly-closes-in-on-60bn-fox-deal-11169018 :)
[17:41:07] If they buy it then they can buy sky in europe :)
[17:41:31] Gross. The vultures are circling
[17:42:55] lol
[17:46:11] awight, no worries if you don't have it handy. :)
[17:46:25] halfak: No that’s exactly what I did...
[17:46:57] Oh no. I hear you. I expect you probably filed that napkin somewhere deep in a compost bin. :D
[17:47:10] No need to dig it out or re-derive. :)
[17:47:24] I looked at the average of precached requests, count of all requests and divided
[17:47:40] It roughly matched your “10%” napkin so I went with it
[17:47:54] I'm a little bit light on blood so I'm going to head out to lunch soon.
[17:47:57] It assumes that a cached request takes zero server resources
[17:48:00] Need anything before I walk away?
[17:48:10] hehe no I’m deep in celery’s logging internals
[17:48:13] awight, which is a pretty good estimate.
[17:48:22] I suggest grain halfak
[17:48:26] I think we use *a lot* of resources to precache things that people might never use.
[17:48:36] halfak: I’ll make a note to note how the math works, though
[17:48:52] I'ma get a JJ Gargantuan (lotsa meet sandwich)
[17:48:58] *meat
[17:49:06] Yum
[17:49:13] I’m so jealous
[17:49:28] food here is surprisingly bland
[17:50:06] yuppie places have wifi and bad food, regular people restaurants are pretty tasty, but never have wifi and threaten bowel trouble
[17:50:40] I think the assumption is that tourists subsist on bad coffee
[17:51:11] Also: face meat sandwich
[17:52:48] {{bread}} :) {{bread}}
[17:52:49] [5] https://meta.wikimedia.org/wiki/Template:bread
[17:53:01] OK lunch time.
[18:24:25] Scoring-platform-team (Current), ORES, Operations, Patch-For-Review: Investigate why ORES logs are being written to syslog despite explicit logging config. Fix. - https://phabricator.wikimedia.org/T182614#3835192 (awight) The explanation is that Celery follows an archaic pattern of hijacking the...
[19:02:09] awight, did you try adding the app.log.setup() to ores/applications/celery.py?
[19:02:15] for sure.
[19:02:17] everything.
[19:02:18] damn
[19:02:39] In fact, I have everything listed, plus a monkey patch to subclass the celery custom logger
[19:02:46] awight, could it be our version of celery?
[19:02:46] and it’s *still* jacking the logs
[19:02:58] I doubt it. There was no motion on any of those GH issues
[19:05:47] damn
[19:06:57] On the bright side, the bug reporters are clowning Celery severely for acting like early 2000’s log4j apps
[19:07:35] It honestly might be easier to patch Celery than to monkey-patch, but I donno how excited they are about outside PRs
[19:07:50] They've been good to us in the past.
[19:08:01] Currently, my thinking is to savagely work around from our code, then work on something to upstream.
[19:08:07] oh? That’s great news.
[19:08:12] awight, I've been considering talking to them about us being a high profile celery user. :)
[19:08:18] +1
[19:08:29] Don’t make us blog the pain.
[19:09:52] :)
[19:10:09] I'm considering doing a stress test while you work on that.
[19:10:11] What do you think?
[19:10:43] (CR) Chad: [V: 2 C: 2] "Harmless" [services/ores/deploy] - https://gerrit.wikimedia.org/r/395048 (https://phabricator.wikimedia.org/T182013) (owner: Awight)
[19:14:04] halfak: perfect! but we can’t deploy to ores*
[19:15:31] OK no worries there.
[19:24:45] halfak: You could run a short test just to see if all 9 celery nodes are online…
[19:26:22] OK just about to start. Hopefully it will be no harm to run during our meeting ^_^
[19:26:47] very nice
[19:27:13] haha that’s the last time I make optional invites. Looks like it’s going to be a full house.
[19:27:20] arg. Looks like docopt is no longer one of our wheels.
[19:27:57] oh wait... it is. Hmmm
[19:28:56] Yeah...looks like maybe we have a failed deployment in ores1001
[19:29:23] https://phabricator.wikimedia.org/P6462
[19:30:19] awight, ^ what do you think. Could we have a failed deploy sitting there? Would that make sense?
[19:30:26] yes
[19:30:38] cos Alex was beating on one machine at a time.
[19:30:44] especially ores1004, IIRC
[19:37:03] halfak: more than 1/3 done in labeling!
[20:11:31] relocating, back in 15.
[20:11:55] \o/ Adotchar
[20:11:58] That's amazing!
[20:12:15] Hi
[20:13:48] halfak: i probably been told already, where would the download for the labelling campaign results be for simplewiki?
[20:16:36] Zppix, http://labels.wmflabs.org/campaigns/simplewiki/63?tasks
[20:16:49] Most items have no labels. Items with labels are "done"
[20:17:58] Ok
[20:18:02] Thanks
[20:31:27] halfak: hey, so. Tomorrow is the last deployment we’re allowed until Jan 2.
[20:37:05] I thought there was just no swat?
[20:38:50] awight, roger that.
[20:39:09] Seems to me we should make sure to get iswiki and eswikiquote, but nothing else is close enough.
[20:39:22] BTW, we on for deployment in 20 mins?
[20:39:52] 2:39 PM In 0 hour(s) and 20 minute(s): Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171213T2100)
[20:45:18] halfak: sure, I can deploy
[20:45:28] cool :)
[20:45:38] There’s one other patch sitting there, lemme see what that is.
[20:46:28] halfak: > fixes precache endpoint
[20:46:49] > Differentiate the task_id if features are requested
[20:46:59] some random cleanup that should be safe
[20:47:30] +1
[20:47:45] The editquality stuff all seems to be iswiki and eswikiquote related
[20:47:50] git log ac3042ee681242c64c078ca67f2dc4a4699b6bbd..15d5283b7422919d85203b5ba907027f9356e421
[20:48:26] Since nothing else is ready, looks like we can play it safe and leave tomorrow for any rollbacks we might need.
[20:48:32] +1
[20:49:19] Scoring-platform-team (Current): Talk to reporter from OZY - https://phabricator.wikimedia.org/T182823#3835714 (Halfak)
[20:56:26] halfak: fyi we might be blocked on the deployment, talking to mutante in gerrit...
[20:56:36] :(
[20:56:39] https://gerrit.wikimedia.org/r/#/c/398078/1
[20:57:06] Yeah aspell-id isn’t in jessie, https://packages.debian.org/search?keywords=aspell-id&searchon=names&suite=stable&section=all
[20:57:50] is this a new problem?
[20:59:09] halfak: This is something I’d forgotten about
[20:59:21] We need the aspell-is package, or the ORES master won’t run.
[21:01:00] aspell-is is in Jessie
[21:01:46] +1
[21:02:01] but we need that patch above merged by an op ^
[21:02:07] gotcha
[21:02:17] and a manual push, otherwise it could take the entire hour to roll out.
[21:04:27] Roger. This might be a blocker.
[21:04:30] Damn.
[21:04:34] mutante: If you get around to the aspell-is patch in the next hour, ^ just cos it’s our deployment window. No worries if it doesn’t happen today though, we also have tomorrow.
[21:04:35] Did beta work?
[21:04:40] halfak: yep
[21:04:49] Did someone manually install aspell-is?
[21:04:51] Tomorrow will work
[21:04:52] I did
[21:04:58] we have sudo on labs boxes.
[21:04:59] Oh! I get it.
[21:05:20] Hey. This is something to add to the “new model” checklist.
[21:06:40] oops. I’m on parenting, gtg
[21:06:44] https://wikitech.wikimedia.org/wiki/ORES/New_model_checklist?veaction=edit
[21:06:45] ...
[21:07:20] will add
[21:07:31] awight, you gonna be back soon or gone for forseeable future?
[21:08:26] I just got handed my baby
[21:08:34] sitter surprised me :)
[21:08:45] Looks like there’s no deployment today anyway
[21:08:47] but tomorrow...
[21:09:14] cool that works for me.
[21:11:04] halfak: i am around too if you need something done quickly
[21:11:27] I've got a task for you
[21:11:29] Will PM
[22:11:31] o/
[22:36:39] OK heading out to bike home and then doing an interview. See y'all tomorrow
[23:26:56] Scoring-platform-team, JADE, Design: Design conceptual prototype of JADE integration with MediaWiki - https://phabricator.wikimedia.org/T182829#3836184 (Halfak)