[11:46:39] Scoring-platform-team, ORES, User-Ladsgroup: ORES UI doesn't handle API errors - https://phabricator.wikimedia.org/T149118#3400594 (Ladsgroup) https://github.com/wiki-ai/ores/pull/213
[13:50:45] PROBLEM - puppet on ores-worker-07 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:54:39] PROBLEM - puppet on ores-lb-02 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:55:24] see -operations. I think that's related.
[13:55:59] PROBLEM - puppet on ores-web-05 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:21:05] RECOVERY - puppet on ores-worker-07 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[14:24:58] RECOVERY - puppet on ores-lb-02 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[14:25:20] RECOVERY - puppet on ores-web-05 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[14:55:56] o/
[15:19:47] Scoring-platform-team, Project-Admins: Create a phabricator project for meta-ORES - https://phabricator.wikimedia.org/T169229#3401228 (Halfak) See https://phabricator.wikimedia.org/tag/meta-ores/
[15:20:00] Scoring-platform-team, Project-Admins: Create a phabricator project for meta-ORES - https://phabricator.wikimedia.org/T169229#3401230 (Halfak) Open>Resolved
[15:52:39] Scoring-platform-team-Backlog, Repository-Admins, User-Zppix: Update diffusion repo names for Wiki-AI repos - https://phabricator.wikimedia.org/T167612#3401336 (Halfak)
[15:53:05] Scoring-platform-team-Backlog, Repository-Admins, User-Zppix: Update diffusion repo names for Wiki-AI repos - https://phabricator.wikimedia.org/T167612#3338702 (Halfak) I made the updates and fixed links in the description of this task.
[15:53:15] Scoring-platform-team, Repository-Admins, User-Zppix: Update diffusion repo names for Wiki-AI repos - https://phabricator.wikimedia.org/T167612#3401339 (Halfak)
[15:53:49] Scoring-platform-team, Repository-Admins, User-Zppix: Make names for Wiki-AI diffusion repos consistent - https://phabricator.wikimedia.org/T167612#3338702 (Halfak)
[15:53:56] Scoring-platform-team, Repository-Admins, User-Zppix: Make names for Wiki-AI diffusion repos consistent - https://phabricator.wikimedia.org/T167612#3338702 (Halfak) a:Halfak
[15:54:03] Scoring-platform-team, Repository-Admins, User-Zppix: Make names for Wiki-AI diffusion repos consistent - https://phabricator.wikimedia.org/T167612#3338702 (Halfak) Open>Resolved
[15:54:22] Scoring-platform-team, ORES, revscoring, User-Ladsgroup, artificial-intelligence: Add flake8 to travis checks - https://phabricator.wikimedia.org/T169473#3399331 (Halfak) Plz link to PRs. I saw a bunch go by :)
[16:00:17] Scoring-platform-team: On ORES, some revisions frequently return TaskRevokedError - https://phabricator.wikimedia.org/T169367#3396074 (Halfak) I think that we probably have the error sitting in celery's backend. See https://stackoverflow.com/questions/41501362/celery-how-to-remove-results-of-tasks-in-redis-...
[16:00:26] Scoring-platform-team: On ORES, some revisions frequently return TaskRevokedError - https://phabricator.wikimedia.org/T169367#3401397 (Halfak) p:Triage>High
[16:49:38] halfak: im back from leave
[16:51:14] Scoring-platform-team: revscoring train_model dies without --observations - https://phabricator.wikimedia.org/T169157#3388946 (Halfak) Open>Resolved a:awight
[16:52:19] Scoring-platform-team, ORES, Easy: ORES 500's on integers that can't be processed - https://phabricator.wikimedia.org/T168920#3401705 (Halfak) a:Halfak
[16:57:49] halfak: Two really easy spots of CR, https://github.com/wiki-ai/editquality/pull/79 https://github.com/wiki-ai/draftquality/pull/8
[16:57:59] awight: i can merge it
[16:58:16] Zppix: o/ Thanks!
[16:58:26] no problemmo
[16:58:56] Too late :D
[16:59:05] halfak: i just logged in to xD
[16:59:39] * Zppix launches webarchive to remerge it :P xd jk
[17:00:00] * awight considers unleashing a bevy of one-line patches for sheer entertainment value
[17:00:17] I dont think halfak would approve
[17:00:31] Or if he did, it would be almost instantaneous :p
[17:00:54] It would tick me off i hate 1 line patches
[17:09:16] Scoring-platform-team: ORES puppet error on labs boxes, unable to set user to "deploy-service" - https://phabricator.wikimedia.org/T169164#3401777 (awight)
[17:09:52] tried deploy_service?
[17:10:20] Zppix: lemme take a look...
[17:11:24] The web-04 box doesn't have that user either. I think the user may have been left behind on other boxes due to puppet rules which have since changed.
[17:11:48] On boxes where the user exists, it does seem to be "deploy-service".
[17:14:00] are you on wmflabs?
[17:14:13] I have an account there
[17:14:14] err durr read the task title you noob *facepalms*
[17:14:28] hehe, no worries I feel the same way.
[17:16:07] im trying to get the docs again i forgot the link :P
[17:16:58] awight: are you attempting to changeprop?
[17:18:13] Zppix: Is this an open task or something?
[17:18:22] ?
[17:18:35] what about changeprop?
[17:18:43] im trying to find the right docs
[17:18:51] so i need to know what exactly your attempting
[17:19:26] I'm currently attempting two things, fixing a celery bug https://phabricator.wikimedia.org/T169367 and running statistics for https://phabricator.wikimedia.org/T167305
[17:19:35] aha
[17:19:36] https://github.com/wikimedia/puppet/blob/production/modules/scap/manifests/target.pp#L70
[17:19:39] awight ^^
[17:19:44] Zppix: Sorry if I'm being dense, is this about deploy-service?
[17:19:45] we need to call the scap class
[17:19:50] to get the user created
[17:20:07] oh nevermind i see
[17:20:08] paladox: That makes sense
[17:20:09] however
[17:20:13] yep
[17:20:19] other classes do it
[17:20:30] I'm wondering whether we need it
[17:20:51] hmm
[17:20:54] awight: well we want changeprop from labs-prod iirc (halfak correct me)
[17:21:00] paladox: I was thinking that the scap class was only used on deployment hosts, rather than on the receiving box.
[17:21:11] uh nah
[17:21:16] i found this out
[17:21:27] that the user that deploys has to exist on the reciving end too
[17:21:45] Zppix: ah ha, now the context makes sense. My goal was near-sighted, just to address the puppet error in that task.
[17:22:12] paladox: any idea why we aren't failing to scap to the web-04 box without that user, then?
[17:22:24] awight: i see ok
[17:22:26] for example on phab-tin i deploy as paladox since that's what i have in scap/scap.cfg and on the phabricator hosts it connects as paladox.
[17:22:40] awight what is in scap/scap.cfg
[17:22:41] ?
[17:23:04] paladox: good call. let me get that repo
[17:23:10] :)
[17:24:12] https://github.com/wikimedia/mediawiki-services-ores-deploy/blob/master/scap/scap.cfg
[17:24:26] oh.
[17:24:40] Maybe we don't use scap on labs?
[17:24:53] https://phabricator.wikimedia.org/source/ores-deploy/browse/master/scap/scap.cfg
[17:24:58] lol
[17:25:04] hmm
[17:25:29] Although there is a labs section in that scap.cfg
[17:26:23] yep
[17:26:33] though im not sure if that will work that section.
[17:26:53] bbl
[17:27:04] * paladox is going to the toby carvery heh
[17:28:41] halfak: How do we deploy to labs? Is that scapp'ed as well?
[17:29:16] uses fabric
[17:29:27] I think it's documented on the deploy page in wikitech
[17:29:33] ORES/Deployment or something like that
[17:29:34] kk
[17:32:03] ouch. I accidentally discovered how large the with_cache feature extracts are, uncompressed:
[17:32:06] -rw-r--r-- 1 awight wikidev 8599562503 Jul 3 16:51 enwiki.draft_quality.201508-201608.shuffled.with_cache.json
[17:37:27] awight: yeah, they are huge
[17:43:11] Scoring-platform-team, Operations, Ops-Access-Requests, Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3364601 (Dzahn) You have been added to the ores-admins group. This was approved in today's ops meeting. This gives you access to...
[17:43:43] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3401884 (Dzahn)
[17:43:46] Scoring-platform-team, Operations, Ops-Access-Requests, Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3401883 (Dzahn) Open>Resolved
[17:45:20] "get adam all the rights" gg on the naming of that task lol
[17:46:17] Scoring-platform-team: On ORES, some revisions frequently return TaskRevokedError - https://phabricator.wikimedia.org/T169367#3401903 (awight) a:awight
[17:47:55] Scoring-platform-team, Operations, Ops-Access-Requests, Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3401908 (awight) Thanks! Confirmed working.
[17:50:37] Scoring-platform-team, Edit-Review-Improvements-RC-Page, Edit-Review-Improvements-ReviewStream, editquality-modeling, and 4 others: Automatically adjust ORES threshold settings when ORES models are updated - https://phabricator.wikimedia.org/T152161#3401935 (awight)
[17:50:39] Scoring-platform-team, Diffusion, ORES, Repository-Admins, and 2 others: Diffusion repository can't be cloned: 500 errors (research-ores-editquality) - https://phabricator.wikimedia.org/T157141#3401937 (awight)
[17:50:45] halfak: im closing T168917 is this ok?
[17:50:46] T168917: Get Adam all the rights - https://phabricator.wikimedia.org/T168917
[17:51:05] Scoring-platform-team, Edit-Review-Improvements, editquality-modeling, Collaboration-Team-Triage (Collab-Team-Q2-Oct-Dec-2016), artificial-intelligence: Research how to present ORES scores to users in a way that is understandable and meets their... - https://phabricator.wikimedia.org/T146333#3401945
[17:51:47] Scoring-platform-team, articlequality-modeling, Epic, artificial-intelligence: [Epic] Article quality models (wp10) - https://phabricator.wikimedia.org/T130259#3401955 (awight)
[17:51:53] Scoring-platform-team, Research-and-Data-Backlog, editquality-modeling, Epic, and 3 others: [Epic] Explore disparate impacts of damage detection and goodfaith prediction on anons and newcomers. - https://phabricator.wikimedia.org/T120138#3401957 (awight)
[17:54:01] Scoring-platform-team, User-Ladsgroup: Grant AWight CR+2 on scoring platform repos - https://phabricator.wikimedia.org/T168443#3401983 (awight)
[17:54:03] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3401982 (awight)
[17:54:14] Zppix: sure, thanks
[17:54:49] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3401986 (Zppix) Open>Resolved
[17:54:58] awight: done
[18:03:28] halfak: can you close https://github.com/wiki-ai/draftquality/pull/3 I've messed something up with it so can't rebase and will address the new issues in a new PR...
[18:05:07] codezee: i will
[18:05:30] done
[18:06:20] Zppix: thanks....
[18:06:45] codezee: no problem
[18:12:16] I'm so confused. Why does scb1001:/srv/deployment/ores/deploy/config/00-main.yaml give the redis host as BROKER_URL: redis://ores-redis-01:6379
[18:12:27] halfak: around?
[18:12:27] yet ores-redis-01 doesn't resolve?
[18:12:58] nor does ores-redis-01.eqiad.wmnet
[18:13:51] awight: the real config is in /etc/ores
[18:14:01] haha ok I was hoping it was something like that.
[18:14:10] there are several config files which get overwritten several times :D
[18:14:11] got it, ty
[18:14:17] harr
[18:14:25] :)
[18:20:53] Scoring-platform-team, ORES, revscoring, User-Ladsgroup, artificial-intelligence: Add flake8 to travis checks - https://phabricator.wikimedia.org/T169473#3402110 (Ladsgroup) Completely forgot: - https://github.com/wiki-ai/ores/pull/211 - https://github.com/wiki-ai/ores/pull/212 - https://g...
[18:23:26] awight, i think to fix this problem in labs as it uses the prod class is to add scap::target (even if we doint use it, it will make sure prod user account is created too) Unless the user exists in ldap?
[18:23:55] awight: do you think you can review the flake8 pr I have in revscoring? I want to enable it in travis
[18:24:00] paladox: That might be a good solution, sorry I've been fighting it.
[18:24:02] Amir1: i can
[18:24:05] Amir1: sure thing!
[18:24:08] Amir1: link me
[18:24:12] That'd be great!
[18:24:24] https://github.com/wiki-ai/revscoring/pull/332
[18:24:32] awight oh nothing to be sorry about :).
[18:25:42] Amir1: done :)
[18:25:51] Thanks!
[18:25:54] np
[18:28:51] * paladox wonders where ores gets the scap class from?
[18:29:06] i mean where to put it in the ores class
[18:29:07] awight: im pretty sure that typo can just be directly commited no need to pr that fix its a simple typo
[18:29:53] awight: thanks for finding the typo
[18:29:58] I will get it fixed ASAP
[18:30:12] hehe I imagine it's just test data so probably harmless
[18:30:21] ah
[18:30:23] found it
[18:30:24] https://github.com/wikimedia/puppet/blob/f21d9bf9efb9282523df33f3f32dec49ac8a1376/modules/ores/manifests/web.pp#L112
[18:30:28] although "howeve" is actually proper colloquial English
[18:31:29] awight how is ores repo in deploymenet?
[18:31:32] awight: if so then gfhdfh is too :P
[18:31:33] ie for servermon it's
[18:31:34] scap::target { 'servermon/servermon':
[18:31:34] deploy_user => 'deploy-librenms',
[18:31:34] }
[18:31:48] so would it be ores/ores or ores-deploy/ores-deploy?
[18:33:16] ah
[18:33:18] found it
[18:33:18] ores/deploy
[18:33:24] im not sure for prod deploy i know our wmflabs deplyo is on github the list is at github.com/wiki-ai/
[18:34:37] Scoring-platform-team: On ORES, some revisions frequently return TaskRevokedError - https://phabricator.wikimedia.org/T169367#3402135 (awight) I'm poking at the Redis db and found some interesting things... ``` oresrdb.svc.eqiad.wmnet:6379> ttl "celery-task-meta-enwiki:wp10:0.5.0:641962088"...
[18:35:23] https://gerrit.wikimedia.org/r/#/c/363042/
[18:35:28] awight halfak amir1 ^^
[18:36:03] paladox: On production, our code files are owned by deploy-service/deploy-service, if that's what you wanted?
[18:36:12] yep
[18:36:19] that's what it should do
[18:36:23] but fix it for labs too
[18:36:32] nice! ty
[18:36:32] follows other classes which do that too :)
[18:36:37] your welcome :)
[18:37:05] wiki-ai/revscoring#1084 (travis_flake8 - 65ee9e3 : Amir Sarabadani): The build failed. https://travis-ci.org/wiki-ai/revscoring/builds/249735793
[18:37:21] Mukunda should review this
[18:37:41] as I said before, these kind of changes are extremely fragile as scap is not super stable
[18:37:59] also this needs to be tested in beta cluster
[18:40:57] or we could just add User and Group syntax for now as Amir1 is right. This will need testing otherwise it will break prod.
[18:50:12] wiki-ai/revscoring#1086 (travis_flake8 - d79da88 : Amir Sarabadani): The build was fixed. https://travis-ci.org/wiki-ai/revscoring/builds/249739630
[18:50:57] I've fixed the change awight to be less dangerous :)
[18:51:58] Scoring-platform-team, ORES, revscoring, User-Ladsgroup, artificial-intelligence: Add flake8 to travis checks - https://phabricator.wikimedia.org/T169473#3402176 (Ladsgroup) A new one now: https://github.com/wiki-ai/revscoring/pull/333
[18:52:01] https://github.com/wiki-ai/revscoring/pull/333
[19:25:54] Thanks!
[19:26:31] Amir1: Likewise, thanks for doing that bookkeeping. It'll pay off in the long run...
[19:26:58] Thanks :)
[19:47:55] awight: ive been gone for a bit anyhing major happen since like june 15th
[19:49:18] Zppix: There was this excitement, https://wikitech.wikimedia.org/wiki/Incident_documentation/20170623-ORES
[19:49:46] I knew that
[19:50:15] That's all that comes to mind, then!
[19:50:18] I am/have working on finding ways yo prevent that/adjust our icinga stuff
[19:50:32] We also decided to rewrite revscoring in nodejs
[19:50:57] Thats lot of conversions
[19:51:00] j/k ;-)
[19:51:07] I was gonna say
[19:52:28] Fyi u prob figured it out but im pix1234 on github
[19:59:16] Scoring-platform-team: On ORES, some revisions frequently return TaskRevokedError - https://phabricator.wikimedia.org/T169367#3402328 (awight) Here's the decoded data, performed using `pickle.loads(b"LITERAL...")`. There was no traceback. ``` { "children": [], "result": { "features": {...
[20:03:08] awight: I was thinking if we can define a styling standard for javascript codes and stick to it
[20:03:10] Zppix: a person of many mysteries ;-)
[20:03:19] what do you think?
[20:03:30] Amir1: Let's do it. We can start with the Mediawiki style guide?
[20:03:44] yeah, I take it from core
[20:03:50] Amir1: we should maybe use mw style for js? keep it uniform and easy to remember
[20:03:58] eslint
[20:04:14] mediawiki is migrating from jslint to eslint
[20:04:18] halfak: what you think?
[20:04:24] fwiw https://www.mediawiki.org/wiki/Manual:Coding_conventions/JavaScript
[20:04:53] Amir1: we should just use eslint so we dont have to worry bout changing immediately
[20:05:07] halfak is afk AFAIK
[20:05:23] lol
[20:05:28] halfafk
[20:06:10] Amir1: one issue is how long/how large of a commit will result in us converting to eslint standards?
[20:06:22] Scoring-platform-team, User-Ladsgroup: Apply mediawiki core styling convention on javascript files - https://phabricator.wikimedia.org/T169576#3402334 (Ladsgroup)
[20:06:25] :))))
[20:06:38] Zppix: for wikilabels it will be rather big
[20:06:41] but for ores it won't
[20:06:50] we don't have much javascript
[20:07:31] I think we should start with ORES then go to wikilabels, reason being is that wikilabels is mainly used by the public whereas ores' ui is rarely used afaik
[20:07:58] Scoring-platform-team, ORES, User-Ladsgroup: Apply mediawiki core styling convention on javascript files of ores - https://phabricator.wikimedia.org/T169577#3402352 (Ladsgroup)
[20:08:21] Scoring-platform-team, Wikilabels, User-Ladsgroup: Apply mediawiki core styling convention on javascript files of wikilabels - https://phabricator.wikimedia.org/T169578#3402366 (Ladsgroup)
[20:08:25] yeah
[20:56:25] hey you guys are aware of this on puppet for ores-web-04 right: "CRITICAL: Puppet has 2 failures. Last run 12 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]"
[20:56:50] Zppix: yes, thanks for noticing. we're tracking in that in bug...
[20:56:52] number...
[20:57:05] T169164
[20:57:05] T169164: ORES puppet error on labs boxes, unable to set user to "deploy-service" - https://phabricator.wikimedia.org/T169164
[20:57:28] awight: no problem i just logged into the icinga dashboard to look at some things for myself and noticed that and i was like ummm that is not intented :P
[20:57:50] We like to leave one or two broken as incentive for the others to work harder.
[20:58:46] any idea on when it will be fixed (so i can schedule it to remove the ack so no-one forgets to unack after its fixed and well stuff breaks and no one knows if not i can just make the ack auto-remove when the puppet check no longer errors out
[20:59:45] There are two patches out for review, but I expect it to take a few days to merge.
[21:00:15] okay ill just leave the ack and change its settings to auto-remove on recovery
[21:00:16] Zppix the ack will auto remove once the service is fixed :)
[21:04:15] ik
[21:24:04] Anyone happen to know where our celery logs are pointed?
[21:25:13] is it prod?
[21:25:15] or wmflabs
[21:25:27] In this case, production
[21:25:33] i think logstash
[21:25:57] i may be wrong though
[21:55:44] Scoring-platform-team: On ORES, some revisions frequently return TaskRevokedError - https://phabricator.wikimedia.org/T169367#3402676 (awight) Celery supports [[ http://docs.celeryproject.org/en/latest/userguide/monitoring.html | event logging ]], which might help us monitor and debug. It doesn't look like...
[21:55:44] [1] https://meta.wikimedia.org/wiki/http://docs.celeryproject.org/en/latest/userguide/monitoring.html
[21:59:10] O_o
[21:59:20] * awight aims a kick...
[22:14:33] Zppix mcdonalds begins deliverying to the uk in summer 2017.
[22:15:10] ?
[22:15:23] just saying for fun :)
[22:15:35] i meant awight :P
[22:15:50] we wont need to rely on just eat it to deliver. though i've never used it
[22:23:04] Random junk food: FPGAs for ORES... https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/
[22:23:51] Also, we can mint *coin on days off :p
[22:24:05] mint coin?
[22:24:12] we already do that heh with the pounds
[22:24:14] * awight smacks chops
[22:25:43] i found the wikimedia-uk channel
[22:25:51] guess what there's no uk channel heh
[22:25:57] #uk is empty.
[22:29:30] That's the UK chapter and not ukwiki?
[22:29:46] aha yes I see
[22:30:43] yep
[22:32:02] Scoring-platform-team-Backlog: Send celery logs and events to logstash - https://phabricator.wikimedia.org/T169586#3402704 (awight)
[22:53:29] Scoring-platform-team-Backlog: Send error logs to logstash - https://phabricator.wikimedia.org/T168921#3402740 (awight) When returning error responses, ores.wsgi.util.format_error summarizes as type=error class name, message=str cast. This response error handling code might be a good place to log the comple...
[23:09:18] did you know you can create refs/heads/sandbox// branches
[23:09:28] without needing to be an admin or owner of the project
[23:09:39] though i doint recommend doing it
[23:09:54] yes! They're neat, the only drawback IIRC is that you can't collaborate with other devs on those
[23:10:06] yes you can.
[23:10:13] oh!
[23:10:26] you can do refs/for/sandbox//
[23:10:29] So I can create a refs/heads/sandbox/paladox/surprise branch?
[23:10:39] well you carn't create that
[23:10:42] without being the user
[23:10:48] so it will be
[23:10:52] refs/heads/sandbox/awight/surprise
[23:10:55] tryed it on mw
[23:51:58] Scoring-platform-team: On ORES, some revisions frequently return TaskRevokedError - https://phabricator.wikimedia.org/T169367#3402904 (awight) I'm either grossly misunderstanding something here, or (unlikely) have accidentally found the bug. The `"message": "revoked"` property is surprising, this value is t...