[00:38:59] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 26 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[00:51:44] 10Scoring-platform-team, 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Flagged revs approve model to fiwiki - https://phabricator.wikimedia.org/T166235#3389347 (10awight) Some links, https://www.mediawiki.org/wiki/Extension:FlaggedRevs https://www.mediawiki.org/wiki/API:Review https:...
[01:02:55] 10Scoring-platform-team, 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Flagged revs approve model to fiwiki - https://phabricator.wikimedia.org/T166235#3389356 (10awight) Log record contents are, Approval: ``` log_id: 9445215 log_type: review log_action: approve log...
[01:05:59] halfak: Can you take a look at this? https://phabricator.wikimedia.org/P5648
[01:06:04] I don't think I have the workflow right
[01:07:39] The tuning report seems to say that our reference model has mean 84% accuracy, but the run I just did, with baseline feature code, claims 97% accuracy.
[01:19:04] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:59:09] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 15 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[02:39:14] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 25 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[03:19:19] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[03:59:24] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 15 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[04:39:29] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 25 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[04:51:16] 10Scoring-platform-team, 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Flagged revs approve model to fiwiki - https://phabricator.wikimedia.org/T166235#3389498 (10Zache) Logging..log_action * approve, approve2 = reviewing the pending changes * approve-i, approve2-i = articles firs...
[05:14:32] PROBLEM - puppet on ores-worker-08 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:19:28] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[05:40:51] awight, this looks fine. When you train with very few revisions and rare classes, you get strange problems.
[05:41:21] E.g. Vandalism and attack just aren't even learned at all.
[05:43:54] RECOVERY - puppet on ores-worker-08 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[05:59:33] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 16 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[06:39:38] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 25 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[07:19:43] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[07:59:48] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 16 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[08:39:53] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 26 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[09:19:58] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[10:00:02] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 16 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[10:40:07] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 26 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[11:20:07] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[11:21:26] These unneeded icinga alarms make me numb towards them, it's dangerous
[12:00:12] PROBLEM - puppet on ores-web-04 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 16 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ores/99-main.yaml]
[12:13:29] 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10MW-1.30-release-notes (WMF-deploy-2017-06-27_(1.30.0-wmf.7)), 10Patch-For-Review, and 2 others: [Discuss] Make ORES Review Tool preferences more prominent - https://phabricator.wikimedia.org/T167910#3349272 (10Johan) When (and where? All wikis?) will...
[12:15:29] 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10MW-1.30-release-notes (WMF-deploy-2017-06-27_(1.30.0-wmf.7)), 10Patch-For-Review, and 2 others: [Discuss] Make ORES Review Tool preferences more prominent - https://phabricator.wikimedia.org/T167910#3390506 (10Ladsgroup) The tag was for a small back-e...
[12:23:48] ACKNOWLEDGEMENT - puppet on ores-web-04 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues paladox Ack
[12:25:35] Amir1 you can ack it (enable the sticky thingy) so that it will only alarm here again when there is a recovery, at which point the ack is removed :)
[14:07:35] paladox, I think we should turn it off.
[14:11:42] halfak: Don't you think we need to explain the usage of the argument "[--label-type=]" in revscoring tune? I mean, briefly explain its description below the text "Options:". Personally, when I see "label type", it's a bit ambiguous as to what you mean by 'label' in this context. See: https://github.com/wiki-ai/revscoring/blob/master/revscoring/utilities/tune.py
[14:12:30] I know you told me what you meant by 'label' yesterday. But what about other people who aren't familiar with it?
[14:12:51] halfak the puppet check for ores-web-04?
[14:19:24] 10Scoring-platform-team, 10MediaWiki-JobQueue, 10ORES, 10Performance-Team, and 5 others: Job queue corruption after codfw switch over (Queue growth, duplicate runs) - https://phabricator.wikimedia.org/T163337#3390892 (10elukey) >>! In T163337#3388935, @Whatamidoing-WMF wrote: > Side question: Will stoppin...
[14:40:56] 10Scoring-platform-team-Backlog, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3390974 (10awight) @akosiaris Hi! It looks like we're ready to rock... I'd be happy to patch or deploy anything here, when the time is right. Let me know!
[14:45:06] 10Scoring-platform-team-Backlog, 10Privacy: Use filesystem group permissions to protect deleted article content on ores wmflabs boxes - https://phabricator.wikimedia.org/T169123#3390995 (10Halfak) p:05Triage>03Normal
[14:46:46] 10Scoring-platform-team, 10Collaboration-Team-Triage, 10Edit-Review-Improvements-RC-Page, 10ORES, 10Wikidata: ORES: Don't highlight changes propagated from Wikidata - https://phabricator.wikimedia.org/T168487#3391019 (10Halfak)
[14:47:51] 10Scoring-platform-team, 10Collaboration-Team-Triage, 10Edit-Review-Improvements-RC-Page, 10ORES, 10Wikidata: ORES: Don't highlight changes propagated from Wikidata - https://phabricator.wikimedia.org/T168487#3366057 (10Halfak) Is this something that the ORES Review Tool is doing? I'm confused.
[14:48:36] 10Scoring-platform-team-Backlog, 10Design: Discuss and create a UI mockup for the Meta-ORES editor interface - https://phabricator.wikimedia.org/T168993#3391041 (10Halfak)
[14:48:39] 10Scoring-platform-team-Backlog, 10ORES: Meta ORES: UI - https://phabricator.wikimedia.org/T153148#3391042 (10Halfak)
[14:49:25] 10Scoring-platform-team-Backlog: Create a phabricator project for meta-ORES - https://phabricator.wikimedia.org/T169229#3391045 (10Halfak)
[14:49:31] 10Scoring-platform-team: Create a phabricator project for meta-ORES - https://phabricator.wikimedia.org/T169229#3391058 (10Halfak)
[14:50:14] 10Scoring-platform-team-Backlog: Send error logs to logstash - https://phabricator.wikimedia.org/T168921#3380887 (10Halfak) p:05Triage>03Low
[14:50:28] 10Scoring-platform-team-Backlog: Send error logs to logstash - https://phabricator.wikimedia.org/T168921#3380887 (10Halfak) p:05Low>03High
[14:50:41] 10Scoring-platform-team-Backlog, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3391075 (10akosiaris) Yeah I am still finishing some other tasks and will circle back to this next week. There's a good question I've been asking myself. Should we fully switch from `scb*` nod...
[14:51:03] 10Scoring-platform-team-Backlog: badid_rvstartid error during autolabel - https://phabricator.wikimedia.org/T168592#3391076 (10Halfak) p:05Triage>03Low
[14:52:44] 10Scoring-platform-team: Remove custom apt repo from ores labs boxes - https://phabricator.wikimedia.org/T169129#3391080 (10Halfak)
[14:54:43] 10Scoring-platform-team-Backlog: Investigate parallelizing the model makefile - https://phabricator.wikimedia.org/T168913#3380574 (10Halfak) Try running `revscoring extract` and checking top. It should parallelize. However there isn't parallelization for everything, so this task may still be valid.
[14:55:04] 10Scoring-platform-team-Backlog, 10articlequality-modeling, 10draftquality-modeling, 10editquality-modeling, and 2 others: Investigate parallelizing the model makefile - https://phabricator.wikimedia.org/T168913#3391086 (10Halfak)
[14:55:15] 10Scoring-platform-team-Backlog, 10articlequality-modeling, 10draftquality-modeling, 10editquality-modeling, and 2 others: Investigate parallelizing the model makefile - https://phabricator.wikimedia.org/T168913#3391088 (10Halfak) p:05Triage>03Low
[14:56:13] 10Scoring-platform-team-Backlog, 10articlequality-modeling, 10editquality-modeling, 10Documentation, 10artificial-intelligence: Document nuances of training data - https://phabricator.wikimedia.org/T168912#3391090 (10Halfak)
[14:56:27] 10Scoring-platform-team-Backlog, 10articlequality-modeling, 10editquality-modeling, 10Documentation, 10artificial-intelligence: Document nuances of training data - https://phabricator.wikimedia.org/T168912#3380543 (10Halfak) p:05Triage>03Low
[14:56:54] 10Scoring-platform-team-Backlog, 10editquality-modeling, 10Spanish-Sites, 10artificial-intelligence: Complete Spanish Wikibooks edit quality campaign - https://phabricator.wikimedia.org/T145408#3391099 (10Halfak) p:05Normal>03Lowest
[14:57:09] 10Scoring-platform-team-Backlog, 10ORES, 10revscoring, 10artificial-intelligence: Build a sketch of basic ORES model building patterns - https://phabricator.wikimedia.org/T148692#3391103 (10Halfak)
[14:57:29] 10Scoring-platform-team-Backlog, 10ORES, 10Documentation, 10I18n: Make ORES meta documentation translatable - https://phabricator.wikimedia.org/T163786#3391107 (10Halfak) p:05Triage>03Normal
[14:58:44] 10Scoring-platform-team-Backlog, 10ORES, 10Documentation, 10I18n: Make ORES documentation translatable - https://phabricator.wikimedia.org/T163786#3209937 (10Halfak)
[14:59:45] 10Scoring-platform-team-Backlog, 10ORES, 10Documentation, 10I18n: Make ORES documentation translatable - https://phabricator.wikimedia.org/T163786#3391113 (10Halfak) https://www.mediawiki.org/wiki/ORES I'm OK with translating this page as-is.
[15:00:13] 10Scoring-platform-team-Backlog: Grant AWight admin access to ORES pypi repos - https://phabricator.wikimedia.org/T168672#3371593 (10Halfak) p:05Triage>03High
[15:01:08] 10Scoring-platform-team: Bury horrors of the editquality makefile - https://phabricator.wikimedia.org/T168455#3391117 (10awight)
[15:02:12] 10Scoring-platform-team-Backlog, 10Labs, 10Labs-Infrastructure: Keep wmflabs scoring boxes up-to-date - https://phabricator.wikimedia.org/T168478#3365588 (10Halfak) Weird. We have been getting restarts when there's a labs-wide kernel update. What makes you think we aren't getting updates and what should we...
[15:02:48] 10Scoring-platform-team, 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3391121 (10Halfak)
[15:04:09] 10Scoring-platform-team-Backlog, 10draftquality-modeling, 10artificial-intelligence: [Discuss] draftquality on a sample, humongous everything, or something else? - https://phabricator.wikimedia.org/T168909#3391124 (10Halfak)
[15:04:25] 10Scoring-platform-team-Backlog, 10draftquality-modeling, 10artificial-intelligence: [Discuss] draftquality on a sample, humongous everything, or something else? - https://phabricator.wikimedia.org/T168909#3380334 (10Halfak) @halfak, make notes about what options there are.
[15:09:07] 10Scoring-platform-team-Backlog, 10Labs, 10Labs-Infrastructure: Keep wmflabs scoring boxes up-to-date - https://phabricator.wikimedia.org/T168478#3391158 (10Paladox) @halfak they update the labs machine that hosts the vm. So the machines got updated but not the vms.
[15:10:24] 10Scoring-platform-team, 10ORES, 10Easy, 10User-Ladsgroup: Switch ores precache to use new EventStreams - https://phabricator.wikimedia.org/T166046#3391160 (10Nuria)
[15:22:51] PROBLEM - puppet on ores-worker-09 is UNKNOWN:
[15:23:00] PROBLEM - puppet on ores-web-05 is UNKNOWN:
[15:23:06] WTF
[15:23:16] PROBLEM - puppet on ores-web-03 is UNKNOWN:
[15:23:18] PROBLEM - puppet on ores-compute-01 is UNKNOWN:
[15:23:22] PROBLEM - puppet on ores-worker-10 is UNKNOWN:
[15:23:23] PROBLEM - puppet on ores-worker-05 is UNKNOWN:
[15:23:27] Amir1 that's not me
[15:23:37] i am getting it for my other instances too
[15:23:37] PROBLEM - puppet on ores-redis-01 is UNKNOWN:
[15:23:38] PROBLEM - puppet on ores-worker-06 is UNKNOWN:
[15:23:38] PROBLEM - puppet on ores-lb-02 is UNKNOWN:
[15:23:45] PROBLEM - puppet on ores-worker-07 is UNKNOWN:
[15:23:46] PROBLEM - puppet on ores-redis-02 is UNKNOWN:
[15:23:49] PROBLEM - puppet on ores-worker-08 is UNKNOWN:
[15:23:56] are we supposed to get a shit ton of alarms every f. day?
[15:23:59] what's the point
[15:24:03] no
[15:24:14] there won't be any more for 40 mins
[15:24:34] it's because instances are freezing.
[15:24:41] due to some kind of labs maintenance
[15:24:57] !log set downtimes for labstore1004/1005 failover see https://etherpad.wikimedia.org/p/labstore_reboots
[15:28:17] Amir1 also it will only notify if something is wrong; the other notification for ores-web-04 is because puppet is failing for a different reason, which I'm going to switch off temporarily until it's fixed.
[15:35:17] Amir1 i've reported the problem in -cloud. Sudo su is freezing, which is causing the timeout problems with icinga.
[15:44:45] I won't be able to stop the next round of notifications; ldap is having problems.
[15:45:43] RECOVERY - puppet on ores-web-05 is OK: OK: Puppet is currently enabled, last run 20 minutes ago with 0 failures
[15:45:46] RECOVERY - puppet on ores-worker-09 is OK: OK: Puppet is currently enabled, last run 8 minutes ago with 0 failures
[15:45:52] RECOVERY - puppet on ores-worker-05 is OK: OK: Puppet is currently enabled, last run 12 minutes ago with 0 failures
[15:45:58] RECOVERY - puppet on ores-worker-10 is OK: OK: Puppet is currently enabled, last run 11 minutes ago with 0 failures
[15:46:00] RECOVERY - puppet on ores-web-03 is OK: OK: Puppet is currently enabled, last run 28 minutes ago with 0 failures
[15:46:08] RECOVERY - puppet on ores-worker-06 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[15:46:10] RECOVERY - puppet on ores-lb-02 is OK: OK: Puppet is currently enabled, last run 21 minutes ago with 0 failures
[15:46:11] RECOVERY - puppet on ores-compute-01 is OK: OK: Puppet is currently enabled, last run 13 minutes ago with 0 failures
[15:46:12] RECOVERY - puppet on ores-worker-08 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:46:19] RECOVERY - puppet on ores-redis-01 is OK: OK: Puppet is currently enabled, last run 12 minutes ago with 0 failures
[15:46:26] RECOVERY - puppet on ores-worker-07 is OK: OK: Puppet is currently enabled, last run 26 minutes ago with 0 failures
[15:46:35] RECOVERY - puppet on ores-redis-02 is OK: OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures
[15:59:08] wiki-ai/revscoring#1073 (BernoulliNB_Fix - cc05299 : Adam Roses Wight): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/248428098
[16:00:10] it was an ldap outage wikimedia-wide.
[16:00:16] should be recovering now :)
[16:00:49] wiki-ai/revscoring#1074 (BernoulliNB_Fix - c43c62d : GlorianY): The build passed. https://travis-ci.org/wiki-ai/revscoring/builds/248428340
[16:18:04] hmm, I'm considering grabbing quality predictions for the entire English Wikipedia, how easy is it to get https://analytics.wikimedia.org/datasets/archive/public-datasets/enwiki/article_quality/wp10-scores-enwiki-20160820.tsv.bz2 updated to a newer version?
[16:18:36] Nettrom, I think I might have that for you already
[16:18:38] or is there a way around this that involves downloading a dump and the enwiki wp10 model?
[16:18:38] * halfak digs
[16:20:01] Oh wait. Damn. I was confused.
[16:20:08] So we can do an update for that.
[16:20:16] And it is faster than just re-processing everything.
[16:20:27] Essentially, the script knows how to update old data.
[16:20:31] * halfak digs more.
[16:21:15] it's not very important, so feel free to ignore it
[16:21:43] in other words, if it can be done by launching an update job or a one-liner, then it would be awesome :)
[16:22:37] Nettrom, see https://github.com/wiki-ai/wikiclass/blob/master/wikiclass/utilities/extract_scores.py
[16:22:50] It should work if you provide a decompressed file to `--extend`
[16:23:47] Set --score-at to "monthly"
[16:24:04] I'd run it on stat1003
[16:26:52] 10Scoring-platform-team, 10ORES, 10Easy, 10User-Ladsgroup: Switch ores precache to use new EventStreams - https://phabricator.wikimedia.org/T166046#3391525 (10Halfak)
[16:27:55] 10Scoring-platform-team-Backlog, 10Labs, 10Labs-Infrastructure: Keep wmflabs scoring boxes up-to-date - https://phabricator.wikimedia.org/T168478#3391527 (10Halfak) I'm surprised to find out that we aren't getting regular updates on these vms. Why has that happened and how do we change it?
[16:29:53] awesome, thanks halfak ! :)
[16:32:35] 10Scoring-platform-team-Backlog, 10Labs, 10Labs-Infrastructure: Keep wmflabs scoring boxes up-to-date - https://phabricator.wikimedia.org/T168478#3391538 (10Paladox) The only way to change that is to run a cron script that does apt-get update and then apt-get upgrade -y
[16:34:40] 10Scoring-platform-team-Backlog, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3391551 (10Halfak) +1 to @akosiaris' notes. I'm not sure how services will feel about us keeping the web workers on scb* nodes, but personally, I don't see an issue. They require a fraction o...
[16:38:24] 10Scoring-platform-team-Backlog, 10ORES: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3391572 (10Halfak)
[16:38:39] 10Scoring-platform-team-Backlog, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3391585 (10Halfak)
[16:38:40] 10Scoring-platform-team-Backlog, 10ORES: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3391584 (10Halfak)
[16:46:30] 10Scoring-platform-team, 10cloud-services-team: Document recommended process for installing OS upgrades in Wikimedia VPS - https://phabricator.wikimedia.org/T169247#3391616 (10Halfak)
[16:46:56] 10Scoring-platform-team, 10cloud-services-team: Document recommended process for installing OS upgrades in Wikimedia VPS - https://phabricator.wikimedia.org/T169247#3391605 (10Halfak)
[16:47:33] 10Scoring-platform-team-Backlog, 10Labs, 10Labs-Infrastructure: Keep wmflabs scoring boxes up-to-date - https://phabricator.wikimedia.org/T168478#3365588 (10Halfak) I've talked to @bd808 in #wikimedia-cloud and he doesn't think there's a best practice for this yet so I created {T169247} and set that as a blo...
[16:48:19] 10Scoring-platform-team, 10Labs, 10cloud-services-team: Document recommended process for installing OS upgrades in Wikimedia VPS - https://phabricator.wikimedia.org/T169247#3391628 (10Paladox)
[16:48:37] 10Scoring-platform-team-Backlog, 10Labs, 10Labs-Infrastructure, 10ORES: Keep wmflabs scoring boxes up-to-date - https://phabricator.wikimedia.org/T168478#3391629 (10Halfak)
[16:48:57] 10Scoring-platform-team, 10Labs, 10Labs-Infrastructure, 10cloud-services-team, 10Documentation: Document recommended process for installing OS upgrades in Wikimedia VPS - https://phabricator.wikimedia.org/T169247#3391630 (10Halfak)
[16:49:13] 10Scoring-platform-team, 10Labs, 10Labs-Infrastructure, 10cloud-services-team, 10Documentation: Document recommended process for installing OS upgrades in Wikimedia VPS - https://phabricator.wikimedia.org/T169247#3391605 (10Halfak) I'd like to see a puppet class that is enabled by default that sets up a...
[16:52:26] 10Scoring-platform-team, 10Labs, 10Labs-Infrastructure, 10cloud-services-team, 10Documentation: Document recommended process for installing OS upgrades in Wikimedia VPS - https://phabricator.wikimedia.org/T169247#3391637 (10Halfak) Looks like the cron strategy is recommended practice. https://help.ubuntu...
[16:53:04] 10Scoring-platform-team, 10Project-Admins: Create a phabricator project for meta-ORES - https://phabricator.wikimedia.org/T169229#3391639 (10Halfak)
[16:54:52] 10Scoring-platform-team-Backlog, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3391656 (10akosiaris) >>! In T168073#3391551, @Halfak wrote: > +1 to @akosiaris' notes. I'm not sure how services will feel about us keeping the web workers on scb* nodes, but personally, I do...
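[Editor's note] The cron strategy discussed in T168478/T169247 above could be sketched as a crontab entry like the one below. This is an illustrative fragment, not taken from the tasks: the schedule, file path, and flags are assumptions, and Debian's unattended-upgrades package would be the more robust route than a raw apt cron job.

```shell
# Hypothetical /etc/cron.d/auto-upgrade entry -- a minimal version of
# "a cron script that does apt-get update and then apt-get upgrade -y".
# Runs nightly at 04:17 as root; -qq keeps cron mail quiet, and
# DEBIAN_FRONTEND=noninteractive avoids dpkg prompts hanging the job.
17 4 * * * root apt-get update -qq && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y -qq
```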
[16:55:53] 10Scoring-platform-team, 10Labs, 10Labs-Infrastructure, 10cloud-services-team, 10Documentation: Document recommended process for installing OS upgrades in Wikimedia VPS - https://phabricator.wikimedia.org/T169247#3391605 (10faidon) Labs used to have unattended-upgrades install fleet-wide, not sure what h...
[16:57:33] 10Scoring-platform-team-Backlog, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3391667 (10Halfak) I'd like to stress the production-like setup so I'd want to send requests directly to the load balancer that would send it to the web nodes. That might be crazy, in which case, I...
[16:58:36] 10Scoring-platform-team-Backlog, 10ORES: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3391572 (10Halfak) In T168073#3391656, @akosiaris asked: > How do you plan to do that stress test ? Would you require the hosts to be fully installed with the ORES software ? Both roles (...
[17:03:23] * halfak finally gets to the bottom of a deep phab hole
[17:04:05] Phab hole: (n) a sequence of actions on phab that started with something simple but suddenly required the creation of many parent/sub-tasks, and the conversations that spring up from them
[17:05:50] lol
[17:19:05] 10Scoring-platform-team-Backlog, 10ORES: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3391758 (10akosiaris) >>! In T169246#3391668, @Halfak wrote: > In T168073#3391656, @akosiaris asked: >> How do you plan to do that stress test ? Would you require the hosts to be fully in...
[17:20:09] thinking out loud... the way to randomly but disjointly split the draftquality observations is to shuf the entire set, then head 4/5 and 1/5 into train and test
[17:20:39] I see there's a train_test_split utility in scikit-learn, but it doesn't look like revscoring provides that directly.
[17:21:19] awight, right. We used to use that but it doesn't provide the ability to write out and reuse files like that.
[17:21:27] So I prefer the shuf-head pattern you described.
[17:23:44] 10Scoring-platform-team-Backlog, 10ORES: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3391767 (10Halfak) +1. I'm worried about some of the decisions we made about workers/CPU too. E.g. does the number of web workers roughly match the capacity we have for celery workers +...
[17:25:42] kk
[17:28:51] 10Scoring-platform-team, 10Collaboration-Team-Triage, 10Edit-Review-Improvements-RC-Page, 10ORES, 10Wikidata: ORES: Don't highlight changes propagated from Wikidata - https://phabricator.wikimedia.org/T168487#3366057 (10Catrope) >>! In T168487#3391019, @Halfak wrote: > Is this something that the ORES Rev...
[17:35:40] * halfak reads glorian_wd's report
[17:37:14] Looks good. Let's get this merged.
[17:37:29] I'm going to submit a followup commit on top of your squashed commits and then call it good. :)
[17:37:39] Will likely do that during lunch today.
[17:43:56] It would be great to get some quick CR of this: https://phabricator.wikimedia.org/P5651
[17:44:04] cos I don't want to find out I was wrong in 24h
[17:53:05] * halfak looks
[17:53:50] err to be clear, by quick I meant "it is a small amount of CR" rather than "need have ASAP"
[17:53:51] awight, I think your tail +n isn't going to work.
[17:54:01] * halfak thinks harder
[17:54:13] line 7 "bzcat $< | tail -n +100000 | head -n 700000 | bzip2 -c > $@ "
[17:54:28] that one skips 100k, then samples 700k observations
[17:54:30] in theory
[17:54:39] Oh! wait. yeah. I guess that will work.
[17:54:46] Why not just skip 100k and take the rest?
[17:54:56] I did it backwards like that as a micro-optimization, so the 100k one can exit right away
[17:55:01] oh
[17:55:05] cos I wanted a known number
[17:55:11] if that's not a thing, I'll go ahead and get 733k
[17:55:33] Hmm... That's OK I guess. I'd just go for the 733k though :)
[17:55:51] What's the rate of the rarest case that you expect to see in 100k observations?
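[Editor's note] The shuf-head split awight and halfak settle on above can be sketched as follows. The file names and the 1000-line toy input are illustrative; the pattern itself (shuffle once, then cut 4/5 into train and 1/5 into test) is the one described in the log.

```shell
set -e
seq 1 1000 > observations.txt        # toy stand-in for the labeled observations
total=$(wc -l < observations.txt)
train_n=$(( total * 4 / 5 ))
# Shuffle once, then cut: train and test are random but disjoint,
# and both can be written out and reused as files (the thing
# sklearn's train_test_split didn't give them here).
shuf observations.txt > shuffled.txt
head -n "$train_n" shuffled.txt > train.txt
tail -n +"$(( train_n + 1 ))" shuffled.txt > test.txt
wc -l train.txt test.txt
```

Because the cut happens after a single shuffle, every observation lands in exactly one of the two files, which is the disjointness property being discussed.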
[17:56:31] What will " $@" be on line 36? [17:57:30] The least likely case is attack, at 0.14% for my 10k sample [17:57:35] Also FWIW, I think this process will take 5 hours or so -- not 24 :) It's the 5 fold CV + training on the full data that takes so long in the case of cv_train [17:57:42] oh good [17:58:08] 140 obs should be OK in the 100k dataset [17:58:28] I wish there was a convenient way to run both the baseline and sentiment jobs at the same time, I guess I could rsync the entire directory [17:59:13] Oh yeah. You can set up two feature lists and do an extract for both [17:59:34] Note that when we run extract, we'll often include the feature lists for all the models we eventually want to build. [17:59:46] Does that make sense? [17:59:49] * halfak gets an example [17:59:55] whoa [18:00:01] :DDDD [18:00:08] * halfak <3's dependency injection [18:00:10] it does make a little sense, but how do I create a new feature list? [18:00:24] See the "feature_lists" directory [18:01:01] In this case, we extract the necessary cache for both "damaging" and "goodfaith" models: https://github.com/wiki-ai/editquality/blob/master/Makefile#L229 [18:01:21] :DD makes me think of, http://homepages.uc.edu/~pairanrt/Images/Album%20Covers/relics.jpg [18:01:30] Then here, we tell the model to train on just the "damaging" features: https://github.com/wiki-ai/editquality/blob/master/Makefile#L284 [18:01:38] GOodfaith features here: https://github.com/wiki-ai/editquality/blob/master/Makefile#L310 [18:01:48] lemme look at codezee's patch to see if it lends itself to this monkey business [18:01:51] lol [18:01:53] ::::DDDD [18:02:42] "sentiment_based" [18:02:46] Perfect! [18:02:59] https://github.com/wiki-ai/draftquality/pull/3/files#diff-2880a70da137a52eb10e2a40a2c8514eR194 [18:03:52] you'll need to create a variable that is just called "draft_quality" and remove the sentiment_based features. 
Then add a new variable "draft_quality_and_sentiment = draft_quality + sentiment_based" [18:04:46] & you're saying I can somehow take a w_cache file and add features to it? [18:05:16] Note the first line in the makefile I linked to above. [18:05:32] https://github.com/wiki-ai/editquality/blob/master/Makefile#L229 [18:06:20] kk [18:06:31] now can I do it without recomputing all the baseline features? :) [18:06:58] lol. shuf: write error: No space left on device [18:07:24] I'll move to /srv [18:07:25] lol [18:07:27] damn [18:07:36] awight, I suggest putting your whole working directory on srv [18:07:47] And now, it looks like you probably extracted all of the features already. [18:07:54] hey, what's this about? awight@ores-compute-01:/srv$ sudo su - [18:07:54] Cannot execute zsh: No such file or directory [18:07:57] I'd just try using that cache file with the two feature sets. [18:08:13] ? [18:08:14] I dunno [18:08:15] hold up. That sounds like magic and I don't know how to get from point A to point B [18:08:42] so, I did extract the features. But I thought that sumit's patch added features. [18:08:45] looking now [18:08:59] yeah those are new features [18:09:15] "diff_polarity" [18:09:56] Ideally, I'd like to compute *just* that feature for my entire w_cache. Is that possible? [18:10:10] +1 so if you extracted sumit's features into the cache, you can just use that to do a subset of sumit's features (e.g. the old ones) [18:10:39] I didn't extract his features yet, my w_cache is the baseline feature list. [18:10:43] If you didn't extract sumit's features, you can re-use the cache, but I don't think it'll save you anything because it'll need to get the raw text to operate on again :( [18:10:56] but but. [18:10:58] but [18:11:03] I'd try though. Just pass the with_cache file into the extract utility [18:11:07] It'll use old cache if it can. [18:12:24] metrics... biab [18:12:34] Does anyone know where i can find the wikimedia desgn channel? 
Need to ask them what i should do desgn wise for polygerrit :) (for wikimedia needs off course) :) [18:12:34] #notalaptop [18:12:52] paladox: Sorry, I don't off-hand, but https://meta.wikimedia.org/wiki/IRC/Channels [18:12:59] thanks [18:17:52] #wikimedia-design [18:18:29] ah [18:18:31] thankyou [19:57:12] halfak: ok. Please walk me through this, very slowly ;-) I currently have a w_cache file with the baseline draftquality.feature_lists.enwiki.draft_quality feature list. [19:57:16] However [19:57:25] codezee's patch changes that same feature list [19:57:42] so should I tweak the code slightly, and rename that variable? [19:57:50] https://github.com/wiki-ai/draftquality/pull/3/files#diff-2880a70da137a52eb10e2a40a2c8514eR194 [19:58:02] s/draft_quality/draft_quality_w_sentiment/ [19:59:00] +1 [19:59:24] then I run wikiclass extract_from_txt with the old cache file as an argument, and I give both the old feature list variable name and the new one [19:59:34] +1 [19:59:53] * ack, which means I should have edited enwiki.py to have *both the old and new feature lists [20:00:36] After that, the workflow looks normal, and I just swap the feature list name when I want to build a sentiment-feely model [20:01:03] +1 [20:01:07] :) Looks like you got it [20:01:13] :) okay wish me luck [20:01:34] * halfak tries to manually squash and fix glorian's crazy long commit history [20:01:40] I don't want to just merge into master. [20:01:43] There's some things for me to fix. [20:02:26] you should be talking. https://github.com/wiki-ai/revscoring/pull/307 [20:03:07] Ahh but those commits make sense as atomic units :P [20:03:24] You don't see 6 commits in a row of "deleting pycache" :) [20:03:27] baahhahaha [20:03:34] We've all done that though [20:05:58] OMG IT WORKED [20:06:10] And it still attributed the commit to glorian [20:06:13] lolol [20:06:27] author: glorian_WD survivor: halfak [20:27:57] question. 
draftquality recommends installing it like "pip install draftquality" but I don't want to do that for development. [20:28:24] I would pip install -r requirements.txt, but I ran into glitches doing that last time. [20:28:38] Perhaps I should python setup.py --develop ? [20:30:05] Feature request: a built-in mechanism for doing all the steps to compare feature_list performance against feature_list_prime [20:33:05] awight, you shouldn't need to install for dev [20:33:13] Can you tell me more about the errors you got? [20:33:54] I... documented them somewhere, let me dig [20:34:55] well, I'm willing to go through it again. [20:35:03] but tell me which of the installation methods is recommended? [20:35:14] For extra points, I'd also like to use a git version of revscoring [20:35:37] awight, you'll need to setup.py install revscoring then :\ [20:35:57] Otherwise, pip install -r requirements.txt in draftquality is recommended. [20:36:22] I just did a test run in my local directory and it worked for me :( [20:36:24] ok, I'll do the latter [20:36:32] but what is "setup.py install revscoring"? [20:36:37] like, git clone revscoring [20:36:44] python setup.py install --develop there? [20:36:52] sharing a virtualenv with draftquality? [20:37:29] Apologies for the basic python stuff, but I want to understand the normal workflow [20:40:30] awight, woops. meant "python setup.py install" in the revscoring repo dir [20:41:01] I guess you could also symlink to the revscoring/revscoring directory in the base of draftquality too. [20:41:19] We do that in the deploy repos instead of installing revscoring, draftquality, etc. [20:41:57] neat, ok [20:43:49] too bad--pip insisted on installing revscoring from pypi. I'll delete that in lib/ [20:44:09] One thing I really don't like is, compiling numpy and scipy. [20:44:33] You need to either install revscoring locally first or do the symlink trick [20:44:42] The symlink will take precedence over local installs. 
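The symlink trick works because Python resolves imports in sys.path order, so a directory ahead of site-packages shadows an installed copy of the same package. A self-contained demonstration with a throwaway module; nothing here touches revscoring itself:

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway "local checkout" of a module in a temp directory and
# put that directory at the front of sys.path, much as a symlink in the
# repo root puts the checked-out revscoring/ ahead of any installed copy.
with tempfile.TemporaryDirectory() as checkout:
    with open(os.path.join(checkout, "shadowme.py"), "w") as f:
        f.write("WHERE = 'local checkout'\n")
    sys.path.insert(0, checkout)
    shadowme = importlib.import_module("shadowme")
    sys.path.remove(checkout)

print(shadowme.WHERE)  # 'local checkout' -- the local copy wins
```

The same ordering explains why the earlier pip run was a problem: a revscoring pulled into the virtualenv's lib/ from PyPI sits on sys.path too, so it has to be deleted (or shadowed) before the git checkout is used.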
[20:44:49] Either way you'll need scipy numpy :( [20:45:00] If you are using python 3.5+ it's way faster to install [20:45:09] yeah, I just wish the wheels were all self-contained, like the .deb [20:45:20] oh. I'm using 3.4 [20:45:55] even debs need dependencies [20:45:58] ores-compute doesn't have anything newer [20:46:14] Ahh yeah. That's a pain indeed. [20:46:32] awight|brb, when you get back, try checking out our wheels repo and just installing all of them :) [20:46:36] done in 30 seconds. [21:01:37] halfak: I see no wheels repo [21:01:53] ores-wmflabs-deploy? [21:02:02] It's a submodule in there [21:02:07] kthx [21:02:10] It's not in github because scary binaries. [21:02:14] * halfak looks for repo [21:02:28] https://phabricator.wikimedia.org/diffusion/1915/ [21:05:35] Double requirement given: itsdangerous==0.24 from file:///srv/awight/ores-wmflabs-deploy/submodules/wheels/itsdangerous-0.24-py3-none-any.whl (already in itsdangerous>=0.21 (from Flask==0.10.1), name='itsdangerous') [21:05:46] I tried "pip install *.whl" [21:06:49] Don't drop work btw, I'm just leaving running commentary on my progress. I'll squawk extra loud if I get blocked. [21:08:47] I'm good. I used the wheels for scipy and numpy, the rest I'll do however I need to. [21:11:26] well, here I am compiling numpy again >.< [21:11:46] AH no, scikit-learn [21:14:17] happy me. [21:20:03] https://www.youtube.com/watch?v=AMTAQ-AJS4Y [21:23:44] woops. Accidentally merged to master. Apparently "git push" will happily directly push to master even if you specify a different branch name. [21:23:55] I'm going to leave this and fix it. [21:37:14] My python deps are a steaming heap. [21:37:26] trying again... [21:37:40] halfak: what exact command did you use to magically install all the wheels? [21:38:20] pip install *.whl --nodeps [21:38:28] might be --no-deps [21:38:37] ah that helps ty [21:38:52] q [21:39:02] that was the one!
[21:45:53] my wikiclass is trying to use a module that was removed from revscoring in 2015. [21:46:00] woah. [21:46:04] which one? [21:46:32] revscoring.datasources.revision [21:47:10] ah, I was trying to use wikiclass extract_features -h [21:47:18] turns out I should have been looking at extract_from_text [21:47:21] perhaps the former is stale [21:47:26] woops. We should probably crush that [21:47:51] can do [21:49:17] awight, if you've got a minute. Check out https://github.com/wiki-ai/wikiclass/pull/44 [21:49:25] erp. nvm, the feature *was* removed, apparently my entire wikiclass is stale [21:49:28] k [21:55:29] halfak: I don't see how to provide an existing cache file to extract_from_text [21:55:44] Oh! good point. [21:55:45] ooh--w_cache also includes the text [21:55:48] ? [21:56:02] It probably doesn't. :( [21:56:08] I think there's an arg to keep the text? [21:56:24] Not one that I see [21:56:48] No worries, I'll do this from scratch [21:56:57] The side project that would not end ;-) [21:57:16] Quite a good trial by fire tho [21:57:24] Usually I just accept that it will take additional time and re-extract from the beginning :\ [21:57:31] ++ [21:57:34] Machine time is cheaper than human time [21:57:55] When the machines figure out what we're talking about in this channel... [21:58:44] ha lol [22:03:07] lol Isla knows it's time for me to stop working [22:03:11] Like clockwork [22:04:13] d'oh. [22:04:14] revscoring.errors.CaughtDependencyError: ValueError: Failed to process feature.positive_polarity: Expected <class 'float'>, but got <class 'int'> instead.
[22:04:18] Traceback (most recent call last): [22:04:21] File "/home/awight/.env/lib/python3.4/site-packages/revscoring/dependencies/functions.py", line 244, in _solve [22:04:24] value = dependent(*args) [22:04:26] File "/home/awight/.env/lib/python3.4/site-packages/revscoring/features/feature.py", line 43, in __call__ [22:04:29] return self.validate(value) [22:04:32] File "/home/awight/.env/lib/python3.4/site-packages/revscoring/features/feature.py", line 100, in validate [22:04:35] .format(self.returns, type(value))) [22:04:37] ValueError: Expected <class 'float'>, but got <class 'int'> instead. [22:04:40] debugging... [22:04:43] OMG [22:05:27] So, features are supposed to know the type that they return. If a polarity value is an int, it should be defined to return an int. [22:05:40] k that helps. [22:05:47] I'm not sure how this ever ran, though [22:07:40] I'm not sure it did, did it? [22:08:18] Probably needs some tests ;) [22:08:26] I think codezee ran it on a sample set. [22:08:34] Looking at the docs, that *is* supposed to be a float. [22:08:46] now it's getting interesting. [22:11:06] lol: starting value needed to be 0.0 rather than 0. [22:11:20] :) [22:13:49] I like those hi-tek solutions [22:15:44] Alright. I'm heading out. Have a good evening folks! [22:15:45] o/ [22:16:01] see ya! [22:40:48] 10Scoring-platform-team-Backlog, 10articlequality-modeling, 10draftquality-modeling, 10editquality-modeling, and 2 others: Investigate parallelizing the model makefile - https://phabricator.wikimedia.org/T168913#3393253 (10awight) I gave the --extractors argument to extraction and it's working nicely. We... [22:58:56] * Nettrom thanks the ORES team for their large prediction cache [22:59:24] * awight checks grafana for signs of API abuse :p [23:01:42] I’m single-threaded :) [23:02:11] yes, but dedicated! [23:02:37] we have to watch out for these mathy types [23:02:50] hi, by the way!
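The bug chased above comes from revscoring features declaring the type they return and validating values against it, so an accumulator seeded with the int 0 instead of the float 0.0 fails a float-typed feature. A stripped-down sketch of that mechanism — the real class is revscoring.features.Feature, and this is not its actual code:

```python
class Feature:
    """Toy stand-in for revscoring's Feature: it knows its return type."""
    def __init__(self, name, returns):
        self.name = name
        self.returns = returns

    def validate(self, value):
        # Strict type check, mirroring the "Expected ..., but got ..." error.
        if type(value) is not self.returns:
            raise ValueError("Expected {0}, but got {1} instead."
                             .format(self.returns, type(value)))
        return value

positive_polarity = Feature("positive_polarity", returns=float)
scores = []  # e.g. a revision with no sentiment-bearing tokens

error = None
try:
    positive_polarity.validate(sum(scores, 0))    # int seed: sum is the int 0
except ValueError as exc:
    error = str(exc)
print(error)  # Expected <class 'float'>, but got <class 'int'> instead.

value = positive_polarity.validate(sum(scores, 0.0))  # float seed: passes
```

This is why the one-character fix (0 → 0.0) resolved the failure: with no sentiment tokens, the sum is just the seed, and only the float seed satisfies the declared return type.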
Thanks for the algorithm ;-) [23:03:30] beware of the researcher, they might want more data than other people B) [23:04:09] hi, and you’re most welcome! great fun to have done some research that is useful to other people [23:04:27] Don't worry, we'll charge you a percentage of the profit you're making off of this data [23:05:01] (division by zero error...) [23:05:50] lol [23:09:24] not sure the WMF is making any profits yet, but I’ll let you know if they do :D [23:09:27] maybe it's actually negative, and thus they are happy to oblige [23:09:30] http://phdcomics.com/comics/archive.php?comicid=1941 [23:42:20] Nettrom: I think halfak mentioned that you had put some wiki entities into a graph db at some point? Is that still a thing? Do you have any documentation about that? [23:42:37] I'm very curious what might have jumped out of the data... [23:44:05] awight: yeah, I did work with Neo4J in 2009, but abandoned the project some time the same year. [23:44:40] ah hehe [23:44:47] I was looking into newcomer socialization in Wikipedia, similar to how other research had looked at that in FLOSS projects. [23:44:48] Can I ask why? [23:46:28] If I remember correctly it was partly due to lack of research skills and funding. [23:46:47] The funding part was probably the major one, I had to focus elsewhere. [23:47:03] But my poor research skills were probably a contributing factor. [23:47:25] But aha, I’ve built some graphs for the current project, using Python and networkx [23:47:36] but that’s not really a graph DB though :) [23:51:44] Interesting! [23:52:01] Anything written about that yet? [23:52:32] I've been playing with Neo4J for a side project, but we're actually tied into all the wikidata stuff [23:53:03] So naturally, I was imagining I'd dump at least editor behavior into a graph to see if I could whip up any useful queries [23:55:55] I’ve not written up anything specifically, but I can make a note to do so on a subpage of the project page. 
In this project I decided to use Wikidata for determining if articles should be side-chained (meaning they have a predefined importance rating), so I wanted a Wikidata relational graph stored that I could explore in Gephi. [23:56:28] if you’re interested in editor behaviour and network graphs, Brian Keegan’s been doing some work on that [23:56:52] e.g. mining behaviour around articles etc [23:57:30] * Nettrom made a note to document the Wikidata network stuff [23:58:36] So, I guess the top thing on my mind was that we could potentially decouple ORES feature extraction to make it easy to pull from external data sources rather than just the MediaWiki API, then a graph might be used to provide a few data points [23:59:26] Thanks, Brian Keegan looks like a fun read