[00:03:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.39, 4.13, 4.89
[00:10:01] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.92, 5.90, 5.26
[00:40:05] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.98, 7.98, 7.61
[01:10:10] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.55, 7.32, 6.85
[01:40:15] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 4.90, 6.08, 7.06
[02:10:19] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.86, 6.65, 6.62
[02:25:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 1.42, 3.61, 4.88
[02:30:01] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.71, 5.31, 5.17
[02:32:52] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.92, 4.51, 4.92
[02:39:13] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 6.68, 5.51, 5.11
[02:57:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 1.28, 3.71, 4.95
[03:02:01] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.82, 5.36, 5.22
[03:06:46] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.23, 4.28, 4.90
[03:14:05] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.35, 5.79, 5.15
[03:15:04] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 4.11, 5.15, 4.96
[03:22:20] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.51, 5.91, 5.18
[03:52:24] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 4.35, 3.75, 5.12
[03:59:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 1.64, 3.69, 4.85
[04:05:00] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 8.13, 5.73, 5.24
[04:08:48] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.51, 4.57, 4.94
[04:13:16] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.24, 5.59, 5.20
[04:43:19] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 4.71, 6.81, 7.00
[05:13:24] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.85, 7.88, 7.50
[05:43:29] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.14, 5.72, 5.70
[06:13:34] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.14, 5.31, 5.72
[06:43:39] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 1.17, 3.53, 5.38
[07:13:44] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.44, 7.33, 6.84
[07:43:49] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 8.00, 7.57, 6.96
[08:00:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.15, 2.88, 4.86
[08:13:00] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.17, 5.98, 5.20
[08:43:03] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 1.16, 3.88, 5.43
[08:44:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 1.00, 2.91, 4.88
[08:46:30] Scoring-platform-team, MediaWiki-extensions-ORES, MW-1.30-release-notes (WMF-deploy-2017-06-27_(1.30.0-wmf.7)), Patch-For-Review, and 2 others: [Discuss] Make ORES Review Tool preferences more prominent - https://phabricator.wikimedia.org/T167910#3377876 (Trizek-WMF) Adding user-notice: create a...
[08:52:01] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.97, 5.85, 5.30
[09:15:36] Scoring-platform-team-Backlog, Labs, Labs-Infrastructure, Operations: Keep wmflabs scoring boxes up-to-date - https://phabricator.wikimedia.org/T168478#3377954 (ArielGlenn) p:Triage>Normal
[09:22:03] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 8.00, 8.02, 7.69
[09:32:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.25, 2.81, 4.93
[10:05:00] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.57, 6.95, 5.15
[10:35:03] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 6.90, 5.06, 5.50
[11:05:08] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 6.33, 4.66, 5.42
[11:35:13] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 3.22, 5.78, 6.29
[11:45:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 1.69, 3.68, 4.91
[11:50:01] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.87, 5.38, 5.20
[11:52:52] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.63, 4.32, 4.85
[11:58:16] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.06, 5.51, 5.13
[12:28:17] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 8.06, 7.49, 6.54
[12:36:21] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.07, 3.25, 4.85
[12:45:00] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.15, 5.68, 5.16
[13:15:02] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.08, 7.19, 7.00
[13:45:07] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 2.09, 4.41, 5.81
[13:48:22] RECOVERY - check load on Ores-Compute-01 is OK: OK - load average: 2.00, 3.07, 4.93
[13:53:00] PROBLEM - check load on Ores-Compute-01 is WARNING: WARNING - load average: 7.29, 5.05, 5.21
[14:02:55] ^ this needs to be turned off
[14:10:49] halfak you can ack it on http://gerrit-icinga.wmflabs.org/dashboard :)
[14:10:57] it will remove the ack when ever it recovers
[14:11:07] ACKNOWLEDGEMENT - check load on Ores-Compute-01 is WARNING: WARNING - load average: 3.44, 5.88, 6.23 paladox ack
[14:12:51] Looks like my old password doesn't work.
[14:13:03] zppix gave me a password to log in with
[14:13:57] halfak, i can create an account for you if you doint want to use ldap.
[14:14:16] I had to fix it over the weekend after doing jessie to stretch upgrade on the host.
[14:14:19] Oh yeah. I don't want to type my prod ldap for sure.
[14:14:25] ok
[14:28:09] paladox, did you turn off the load warning?
[14:28:17] halfak i acked it.
[14:28:25] So it should come back when ever it recovers
[14:28:30] Great.
[14:28:39] We should not have the load warning fire on the compute node. That's what it is for :)
[14:28:57] ok
[14:29:12] So you want it removed? The load check?
[14:30:18] Yup
[14:30:30] Thanks!
[14:30:39] I'd do it if the UI worked for me :/
[14:31:03] halfak that's done through the puppet repo
[14:31:12] ui wont allow you to remove it unless done through director
[14:31:31] https://gerrit.wikimedia.org/r/#/admin/projects/labs/icinga2
[14:31:34] halfak ^^
[14:32:13] gotcha.
[14:33:17] and the file is service
[14:37:41] halfak https://gerrit.wikimedia.org/r/#/c/361462/
[14:38:26] (y)
[14:38:58] deployed
[14:52:12] great.
[14:52:24] FYI, Amir1, I don't think we're going to be able to deploy today.
[14:52:37] halfak: why :(
[14:52:40] the model building process takes more than 24 hours these days
[14:52:46] Working on ptwiki now
[14:52:48] I see
[14:52:50] okay
[14:52:53] Alphabetically.
[14:53:06] The problem is re-extracting all the features.
[14:53:12] We've done lots of optimizations but still...
[14:56:17] Yeah, I know :(
[15:06:26] halfak: o/
[15:06:42] halfak: I have re-submitted the PR. I guess this should be the last one
[15:20:45] Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3379263 (RobH) I'm removing the #operations and #ops-access-requests tags, so this doesn't show in clinic duty triage, since its no longer an active request. I'd have declined it, but since its on tw...
[16:01:34] halfak: ping
[16:02:57] sorry bio
[16:02:59] here now
[16:18:03] Scoring-platform-team, User-Zppix: Graphite access for Zppix - https://phabricator.wikimedia.org/T168014#3379558 (Zppix) >>! In T168014#3379263, @RobH wrote: > I'm removing the #operations and #ops-access-requests tags, so this doesn't show in clinic duty triage, since its no longer an active request. I...
[16:24:00] Scoring-platform-team, editquality-modeling, revscoring, artificial-intelligence: Build damaging/goodfaith models for Romanian Wikipedia - https://phabricator.wikimedia.org/T156503#3379601 (Halfak) a:Sumit
[16:32:00] Scoring-platform-team, Bad-Words-Detection-System, revscoring, artificial-intelligence: Add language support for Albanian - https://phabricator.wikimedia.org/T168369#3379637 (Halfak) Looks like the generated list is there. Instructions: https://www.mediawiki.org/wiki/ORES/BWDS_review Ping: @M...
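The load check that halfak asked to have removed behaves like a Nagios-style `check_load`: it reports WARNING while any of the 1-, 5- or 15-minute load averages exceeds its threshold, and RECOVERY once none do. The thresholds configured for Ores-Compute-01 are not shown in this log, but every WARNING above has a 15-minute average above 5.0 and every RECOVERY has one below it. A minimal Python sketch of that classification, assuming `check_load`-style semantics; only the 15-minute warning value is inferred from the log, and the other thresholds are hypothetical placeholders:

```python
import os
from typing import Sequence, Tuple

# Hypothetical thresholds for the (1-, 5-, 15-minute) load averages. Only the
# 15-minute warning value (5.0) is inferred from the log; the rest are placeholders.
WARNING: Tuple[float, float, float] = (10.0, 8.0, 5.0)
CRITICAL: Tuple[float, float, float] = (16.0, 12.0, 8.0)


def classify_load(loads: Sequence[float],
                  warning: Sequence[float] = WARNING,
                  critical: Sequence[float] = CRITICAL) -> str:
    """Return "OK", "WARNING" or "CRITICAL" for a (1m, 5m, 15m) triple.

    A state is reached when ANY of the three averages exceeds its threshold.
    """
    if any(load > limit for load, limit in zip(loads, critical)):
        return "CRITICAL"
    if any(load > limit for load, limit in zip(loads, warning)):
        return "WARNING"
    return "OK"


def check_local_load() -> str:
    """Format a status line from this machine's load averages (Unix only)."""
    loads = os.getloadavg()
    state = classify_load(loads)
    return f"{state} - load average: " + ", ".join(f"{x:.2f}" for x in loads)


if __name__ == "__main__":
    # Two samples from the log: one that fired WARNING, one that RECOVERED.
    print(classify_load((7.92, 5.90, 5.26)))  # WARNING
    print(classify_load((4.11, 5.15, 4.96)))  # OK
```

On a machine whose whole job is training models, sustained high load is the expected state ("That's what it is for"), which is why the check was removed rather than retuned.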
[16:35:27] Scoring-platform-team, ORES, Services (watching), User-Ladsgroup: ORES POST precaching always fails with 500 - https://phabricator.wikimedia.org/T168674#3371682 (Halfak)
[16:35:29] Scoring-platform-team, ORES: Mid June 2017 ORES deployment - https://phabricator.wikimedia.org/T168099#3379655 (Halfak)
[16:37:25] Scoring-platform-team, revscoring, artificial-intelligence: Fix degenerate regular expressions for matching "hahaha" and "jajaja" - https://phabricator.wikimedia.org/T168888#3379659 (Halfak)
[16:37:32] Scoring-platform-team, revscoring, artificial-intelligence: Fix degenerate regular expressions for matching "hahaha" and "jajaja" - https://phabricator.wikimedia.org/T168888#3379674 (Halfak) Open>Resolved
[16:37:53] Scoring-platform-team, revscoring, artificial-intelligence: Fix degenerate regular expressions for matching "hahaha" and "jajaja" - https://phabricator.wikimedia.org/T168888#3379659 (Halfak)
[16:37:54] Scoring-platform-team, ORES: Mid June 2017 ORES deployment - https://phabricator.wikimedia.org/T168099#3379676 (Halfak)
[16:38:29] Scoring-platform-team, ORES, articlequality-modeling, editquality-modeling, artificial-intelligence: Rebuild all of the models for ORES (new regexes) - https://phabricator.wikimedia.org/T168889#3379678 (Halfak)
[16:39:01] Scoring-platform-team, revscoring, artificial-intelligence: Fix degenerate regular expressions for matching "hahaha" and "jajaja" - https://phabricator.wikimedia.org/T168888#3379710 (ssastry)
[16:40:50] Scoring-platform-team, Bad-Words-Detection-System, revscoring, artificial-intelligence: Add language support for Albanian - https://phabricator.wikimedia.org/T168369#3379718 (Halfak) a:Sumit
[16:41:00] Scoring-platform-team, ORES: Mid June 2017 ORES deployment - https://phabricator.wikimedia.org/T168099#3379720 (Halfak) a:Halfak
[16:43:56] Scoring-platform-team, editquality-modeling, User-Ladsgroup, artificial-intelligence: Flagged revs approve model to fiwiki - https://phabricator.wikimedia.org/T166235#3379730 (Halfak) a:Ladsgroup>awight
[16:47:21] awight: I cannot download and train the draftquality model because of permissions, let me know if you need to know anything on the PR
[16:50:01] awight, I needed to request access to deleted text via my staff account in order to run the extractor.
[16:54:37] halfak: while i was playing aroung with bwds on my laptop, it was taking a lot of time due to limited bandwitdth, same would probably happend with model training, is it possible for me to access one of the labs accounts used for training models?
[16:54:47] *would happen
[17:09:53] Scoring-platform-team, MediaWiki-JobQueue, ORES, Performance-Team, and 5 others: Job queue corruption after codfw switch over (Queue growth, duplicate runs) - https://phabricator.wikimedia.org/T163337#3379884 (elukey) The idea about the experiment is to remove rdb2004 as slave of rdb2003, to see...
[17:15:58] codezee: Okay thanks, I'm sure I'll have some questions shortly!
[17:18:11] halfak: ping :D https://github.com/wiki-ai/ores/pull/209
[17:50:48] Scoring-platform-team, MediaWiki-Vagrant, ORES, Wikilabels, Patch-For-Review: ORES services should have vagrant roles - https://phabricator.wikimedia.org/T159105#3380009 (awight)
[17:53:17] Amir1, on it
[17:53:40] thanks
[17:58:09] Weird. I merged it but it didn't auto close. I can see the commit in master
[17:58:51] Thanks!
[18:06:52] Scoring-platform-team, draftquality-modeling, artificial-intelligence: Experiment with Sentiment score feature for draftquality - https://phabricator.wikimedia.org/T167305#3323948 (awight) >>! In T167305#3340316, @Sumit wrote: > So I could setup a test with the library - https://github.com/kevincobai...
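T168888 concerns "degenerate" regular expressions for laughter such as "hahaha" and "jajaja"; the actual revscoring patterns, and exactly what made them degenerate, are not shown in this log. One common failure mode for patterns like these is catastrophic backtracking: nested, overlapping quantifiers such as `(a+)+` or `(a|aa)+` can split a long near-miss run in exponentially many ways. A pattern in which each repetition consumes a fixed two-character unit can split a run in only one way. The sketch below uses a hypothetical pattern, not the one adopted in revscoring:

```python
import re
from typing import List

# Hypothetical pattern, not the one used in revscoring. Each repetition of
# (?:ha) must consume exactly "h" then "a", so a run like "hahaha" can be
# split in only one way and the engine has nothing to backtrack over.
LAUGHTER = re.compile(r"\b(?:(?:ha){2,}h?|(?:ja){2,}j?)\b", re.IGNORECASE)


def find_laughter(text: str) -> List[str]:
    """Return every laughter token (at least two "ha"/"ja" units) in text."""
    return [m.group(0) for m in LAUGHTER.finditer(text)]


if __name__ == "__main__":
    print(find_laughter("hahaha that's funny, jajaja! ha"))  # ['hahaha', 'jajaja']
```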
[18:13:32] Scoring-platform-team, draftquality-modeling, artificial-intelligence: Experiment with Sentiment score feature for draftquality - https://phabricator.wikimedia.org/T167305#3380074 (Sumit) >>! In T167305#3380066, @awight wrote: >>>! In T167305#3340316, @Sumit wrote: >> So I could setup a test with the...
[18:19:11] Scoring-platform-team, MediaWiki-Vagrant, ORES, Wikilabels, Patch-For-Review: ORES services should have vagrant roles - https://phabricator.wikimedia.org/T159105#3380076 (awight) I think I mentioned this in code review, but with a fresh vagrant checkout and VM, I get: > ==> default: Execution...
[18:24:51] awight, where did that work you were doing on the basic Meta ORES docs end up getting saved?
[18:27:07] halfak: This is the entry point, https://www.mediawiki.org/wiki/Meta-ORES
[18:27:26] Thanks!
[18:28:55] ooh--I had a better diagram, I'll try to iterate on that shortly.
[18:37:47] Nice. I saved a couple edits too :)
[18:38:02] keegan is hoping to use that as a case study in outreach around a new software thingie
[18:41:50] codezee: It looks like I have sufficient permissions to run the draftquality makefile...
[18:42:05] Currently, I'm pulling datasets/enwiki.draft_quality.201508.tsv.bz2
[18:42:15] Do you think that will be enough to run your stuff?
[18:44:11] Also--how do I run your stuff? You mentioned something about running the utilities in the Makefile, it looks like I would have to pull all data sets and train the model?
[18:45:13] awight: yes model training would be necessary, since I've added features to the original draftquality model, its like retraining the model with added feature set
[18:46:03] awight: if we could see a tuning report like - https://github.com/wiki-ai/draftquality/blob/master/tuning_reports/enwiki.draft_quality.md with some improvement it would mean the features work
[18:46:14] o/ Keegan
[18:46:20] \o
[18:46:37] halfak: Amir1: Any thoughts about how to train models that require privileged data? We shouldn't be exporting that stuff to wmflabs...
[18:46:50] Was just talking to awight about that meta-ORES page. It's pretty sparse at the moment. I think we need a nice coherent statement and maybe even a wireframed ui before reachin out.
[18:47:26] awight: any way you could extract individual feature values of the three sentiment features with associated labels of spam, attack, vandalism or normal? that'd be very helpful for feature study
[18:47:34] awight, fair point. I don't think it's crazy to have a dataset with a random sample of deleted pages on a labs VM, but it's not the best./
[18:47:39] like the one I showed on task
[18:47:42] we need a secure place to train model files.
[18:48:11] halfak: Better yet, we should be able to completely decouple feature value caching.
[18:48:31] Then, we can import the feature values from external sources.
[18:48:41] awight, not sure how to decouple any more than we currently are.
[18:48:53] Oh! You mean "don't store the text -- just extract the features"
[18:48:56] yes
[18:48:57] We can do that! No problem.
[18:49:06] Just a bit slower for re-training.
[18:49:12] But not too crazy.
[18:49:31] Just skip the w_text step and go right to the w_cache step
[18:50:01] I think you could even pipe the output.
[18:50:57] * awight looks for the howto :)
[18:51:19] Ha. I think the makefile should give strong indications of where stdout and stdin can be matched.
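halfak's suggestion is to skip the step that caches revision text (`w_text`) and compute feature values directly (`w_cache`), ideally as one stage of a shell pipe. A minimal Python sketch of such a streaming stage, assuming an upstream stage emits JSON-lines observations with the revision text in a `"text"` field, so the text only ever passes through the pipe. The field names and toy feature functions are hypothetical stand-ins, not revscoring's API:

```python
import json
import sys
from typing import Callable, Dict, Iterable, Iterator, TextIO

# Toy stand-ins for real feature extractors (hypothetical, not revscoring features).
FEATURES: Dict[str, Callable[[str], float]] = {
    "chars": lambda text: float(len(text)),
    "exclamations": lambda text: float(text.count("!")),
}


def add_features(observations: Iterable[dict],
                 features: Dict[str, Callable[[str], float]] = FEATURES) -> Iterator[dict]:
    """Replace each observation's "text" with computed feature values.

    The text is removed from the observation (which is mutated in place) as
    soon as the features are computed, so only labels and feature values
    flow downstream and nothing containing the text is written to disk.
    """
    for obs in observations:
        text = obs.pop("text")
        obs["feature_values"] = [fn(text) for fn in features.values()]
        yield obs


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read one JSON observation per line and write one per line, so this
    stage can sit between extraction and training in a shell pipe."""
    observations = (json.loads(line) for line in stdin if line.strip())
    for obs in add_features(observations):
        stdout.write(json.dumps(obs) + "\n")
```

Passing a smaller `features` dict is one way to compute only a chosen subset of features without processing the full feature set.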
[18:51:28] lol @ hotwo
[18:51:32] *howto
[18:51:39] hot potato
[18:53:28] Another example of what I'm imagining for decoupling, I want to pull just the sentiment features codezee is adding, but AFAICT I currently have to go also process the entire feature set for the data.
[18:53:47] halfak: do you have a time to look at my PR today?
[18:53:57] awight, that'd take a little coding but you could do it.
[18:54:13] It'd be a somewhat happy interpreter dance.
[18:54:20] See https://github.com/wiki-ai/revscoring/blob/master/ipython/feature_engineering.ipynb
[18:54:28] for how to extract arbitrary sets of features
[18:55:06] cool, this does sound like a fun adventure
[18:55:34] I love that notebook. It comes in super handy for a lot of ORES stuff :)
[18:56:04] yes, its been helpful for understanding revscoring feature datasources thingy
[18:57:06] Keegan, anything obvious you want to deal with in that mw page for Meta-ORES?
[18:59:29] halfak: Overall the content looks good
[18:59:55] Seems it still lacks a clear statement and maybe a UI example explaining WTF we are talking about.
[19:01:49] halfak: It could probably use a nutshell in simple English of what the tool will be for at the top
[19:02:02] I'm not sure everyone is going to be able to understand on first pass
[19:03:35] halfak: I also added that suggestion as a talk page topic simple to turn the discussion tab blue :)
[19:03:41] *simply to
[19:03:45] Nice
[19:03:47] :D
[19:03:50] Legitimizing
[19:04:10] halfak: in case you missed earlier, any possibility to get labs access for training models?
[19:05:59] "labs access"?
[19:06:07] As in access to our instances?
[19:06:20] I'm surprised we didn't already get to that yet!
[19:06:27] Do you have an labs shell name?
[19:06:31] codezee, ^
[19:07:25] halfak: yes, same as this nick i think you'll just need to add me to the required groups afaik
[19:07:40] Sure. Can do
[19:19:37] awight, don't forget to make tasks for your nice little PRs :)
[19:19:41] e.g. https://github.com/wiki-ai/draftquality/pull/4
[19:19:44] harr
[19:19:45] k
[19:19:52] They are great things to report in our status updates :D
[19:21:33] halfak: whats with the no_review labels in the makefile of editquality? I see not all languages have that no_review set...
[19:21:54] codezee, yeah. This had to do with the way we used to sample revisions for review.
[19:22:12] There was a short period of time and subset of projects that used a different partern.
[19:22:18] You don't need that for albanian
[19:22:26] Scoring-platform-team: Minor cleanup in Makefiles - https://phabricator.wikimedia.org/T168904#3380217 (awight)
[19:22:34] or romanian i guess?
[19:23:34] Scoring-platform-team: Minor cleanup in Makefiles - https://phabricator.wikimedia.org/T168904#3380233 (awight) https://github.com/wiki-ai/draftquality/pull/4 https://github.com/wiki-ai/editquality/pull/75
[19:23:38] i might disappear in a while so let me know through mail or a task the instance(s) name i'm granted access to
[19:23:56] codezee, gotcha.
[19:25:00] codezee, https://wikitech.wikimedia.org/wiki/User:Codezee
[19:25:14] codezee: I could use a few more hints about how to extract the features you want...
[19:25:28] I'm currently pulling all the draft extracts listed in the draftquality makefile
[19:25:46] Then, I'll figure out how to evaluate your new features
[19:26:10] But what's most useful for you--is a list of the new feature values for a small sample of deleted drafts all you need?
[19:26:25] halfak: https://wikitech.wikimedia.org/wiki/User:Sumit
[19:26:35] i though you were asking the shell name earlier :/
[19:26:41] *thought
[19:27:09] Oh yeah. FOrgot they can be different
[19:27:20] And forgot I needed the wiki name to do it through wikitech
[19:27:50] yeah they get confusing...
[19:28:08] codezee, are you set up to ssh to other labs instances?
[19:28:16] ssh ores-compute-01.eqiad.wmflabs
[19:28:18] awight: yes if i get the feature values of all three sentiment features for a decent number of drafts of each category that'd be nice... :)
[19:28:38] codezee: cool--I'll split by category
[19:28:39] halfak: i can ssh to eqiad let me try this one
[19:28:58] If you just want to run some tests with data, you could work from https://figshare.com/articles/Deleted_Wikipedia_articles_spam_vandalism_attack_/4245035
[19:29:03] codezee: A decent number, is like 10,000? or 100?
[19:29:08] sorry I'm new here ;-)
[19:30:04] halfak: Seems problematic to only use censored data which is... not censored?
[19:30:09] awight: 10k is always better than 100 when it comes to AI ;)
[19:30:17] hehe
[19:30:23] i think we should get a good enough idea with 10k
[19:30:31] oh, "sentiment features"
[19:30:37] not "sentient features" :P
[19:30:52] lol stay tuned for the sentiment sentience
[19:31:17] halfak: I'm able to ssh, i suppose i'll not need to install any ubuntu specific deps ? just create a virtualenv right ?
[19:31:30] right.
[19:31:42] thanks that should be it :)
[19:31:47] Platonides, this is AI, right?
[19:32:37] that's why it mostly made sense :)
[19:34:22] codezee: lmk if halfak's figshare lets you do the processing you need? It might be nicer for you to own the data, than wait for me each time you change something...
[19:34:31] halfak awight: the above link is already the same as the github sample and its results are on the phab task
[19:34:38] lol
[19:35:09] \o/
[19:35:23] awight: halfak on that dataset the hypothesis holds, imo the litmus test is if the scoring report shows a rise in accuracy
[19:35:44] which has to be on the full dataset which I can't touch :/ :/
[19:36:25] aah--so rather than extract specific feature values, I should be training a model, and post the results?
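codezee asks for feature values for a decent number of drafts in each category (spam, attack, vandalism or normal), around 10k, and awight plans to split by category. One way to draw such a per-category sample in a single pass is reservoir sampling (Algorithm R) with one reservoir per label, which keeps at most `k` items per label in memory however large the input stream is. A sketch, assuming the input is an iterable of `(label, observation)` pairs; `sample_per_label` is a hypothetical helper, not part of the draftquality tooling:

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def sample_per_label(rows: Iterable[Tuple[Hashable, T]], k: int,
                     rng: Optional[random.Random] = None) -> Dict[Hashable, List[T]]:
    """Draw a uniform random sample of up to k items for each label in one pass.

    Uses reservoir sampling (Algorithm R) per label, so memory is bounded by
    k items per label rather than by the size of the input stream.
    """
    rng = rng or random.Random()
    reservoirs: Dict[Hashable, List[T]] = defaultdict(list)
    seen: Dict[Hashable, int] = defaultdict(int)
    for label, item in rows:
        seen[label] += 1
        reservoir = reservoirs[label]
        if len(reservoir) < k:
            # Fill the reservoir with the first k items of this label.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / seen[label],
            # overwriting a uniformly chosen slot.
            j = rng.randrange(seen[label])
            if j < k:
                reservoir[j] = item
    return dict(reservoirs)


if __name__ == "__main__":
    # Toy stream: 1,000 "spam" rows and only 3 "attack" rows.
    rows = [("spam", i) for i in range(1000)] + [("attack", i) for i in range(3)]
    sample = sample_per_label(rows, k=10, rng=random.Random(0))
    print({label: len(items) for label, items in sample.items()})  # {'spam': 10, 'attack': 3}
```

Labels with fewer than `k` observations are returned whole, so rare categories such as attack pages are not padded or dropped.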
[19:37:36] awight: model report should be fine, i meant if you're able to get feature specific values it was even better :)
[19:37:57] awight: and i think for the immediate case retraining the model is far easier
[19:38:09] +1 I'll start with that
[19:39:00] maybe we can have AI someday to censor text for us automatically, :D
[19:39:37] that's the previous step to the AI realizing nothing we want to write about is worthy
[19:39:47] at least, that would make its work easier!
[19:41:44] Scoring-platform-team-Backlog: Design how we'll train models which depend on private data - https://phabricator.wikimedia.org/T168908#3380311 (awight)
[19:44:38] BTW, what's the theory behind .gitignoring the datasets dirs?
[19:46:37] awight, mostly we don't want to check anything in from there.
[19:46:50] I like git add -f to add a dataset when I really mean it.
[19:47:16] haha fair enough
[19:48:33] omg that .tsv.bz2 is useless
[19:54:16] Scoring-platform-team: draftquality should be trained on a sample, rather than humongous everything - https://phabricator.wikimedia.org/T168909#3380334 (awight)
[19:54:26] Scoring-platform-team-Backlog: draftquality should be trained on a sample, rather than humongous everything - https://phabricator.wikimedia.org/T168909#3380346 (awight)
[19:55:20] halfak: Hey that reminds me of something I noticed last week. Please chat w/ me some time about the theory behind sampling / randomizing
[19:55:43] E_OUTOFBRAINERROR
[19:55:51] :D
[19:55:56] There was no more brain to allocate at the moment
[19:55:57] kill -9
[19:56:02] * halfak dies
[19:56:05] * halfak is reborn
[19:56:08] systemctl restart
[19:56:13] lol
[19:56:13] nohup
[19:57:06] we need to throw more halfaks to the problem
[19:57:06] gotta set that zombie process loose
[19:57:14] -j6
[19:57:42] oh no! we've catalyzed geekpocalypse
[20:47:53] Scoring-platform-team, MediaWiki-Vagrant, ORES, Wikilabels, Patch-For-Review: ORES services should have vagrant roles - https://phabricator.wikimedia.org/T159105#3380445 (awight) I think the ORES service might not be running. ``` vagrant roles list -e Enabled roles: ores...
[20:55:39] Scoring-platform-team: Minor cleanup in Makefiles - https://phabricator.wikimedia.org/T168904#3380454 (awight) Some nastiness I just discovered: many commands will overwrite their output on failure. E.g., ``` datasets/enwiki.draft_quality.201601.tsv.bz2: \ sql/draft_quality.variables.sql echo '...
[21:36:37] https://gerrit.wikimedia.org/r/361576
[21:37:12] Scoring-platform-team-Backlog: Document nuances of training data - https://phabricator.wikimedia.org/T168912#3380543 (awight)
[21:38:24] Scoring-platform-team, MediaWiki-Vagrant, ORES, Wikilabels, Patch-For-Review: ORES services should have vagrant roles - https://phabricator.wikimedia.org/T159105#3380555 (Tgr) >>! In T159105#3380076, @awight wrote: >> ==> default: Execution of '/bin/systemctl start ores-wsgi' returned 6: Fail...
[21:41:27] Scoring-platform-team, MediaWiki-Vagrant, ORES, Wikilabels, Patch-For-Review: ORES services should have vagrant roles - https://phabricator.wikimedia.org/T159105#3380564 (awight) Yep, the process was roughly, * vagrant role enable ... * vagrant destroy * vagrant box update * vagrant up
[21:44:50] Scoring-platform-team-Backlog: Investigate parallelizing the model makefile - https://phabricator.wikimedia.org/T168913#3380574 (awight)
[21:45:15] * halfak submits enormous PR for editquality
[21:45:47] halfak: need +2 on that, or are you self-merging?
[21:46:13] Scoring-platform-team, MediaWiki-Vagrant, ORES, Wikilabels, Patch-For-Review: ORES services should have vagrant roles - https://phabricator.wikimedia.org/T159105#3380589 (Tgr) >>! In T159105#3380445, @awight wrote: > I think the ORES service might not be running. After a successful provision...
[21:46:47] awight, https://github.com/wiki-ai/editquality/pull/77
[21:46:48] <3
[21:47:47] Scoring-platform-team, MediaWiki-Vagrant, ORES, Wikilabels, Patch-For-Review: ORES services should have vagrant roles - https://phabricator.wikimedia.org/T159105#3380593 (awight) ``` vagrant provision ==> default: Running provisioner: lsb_check... ==> default: Running provisioner: shell......
[21:50:39] Amir1, I think we might not make it with the draft quality model
[21:51:20] halfak: we can delay it a little
[21:51:26] what can I do?
[21:51:30] Scoring-platform-team, MediaWiki-Vagrant, ORES, Wikilabels, Patch-For-Review: ORES services should have vagrant roles - https://phabricator.wikimedia.org/T159105#3380604 (awight) Looks pretty simple, actually! Missing Python dependency? Here are snippets from syslog, ``` Jun 26 21:47:21 me...
[21:52:22] Amir1, not sure. I'm waiting on ores-compute-01 to finish training the draft quality model and it is mega slow.
[21:52:27] 500k observations
[21:52:40] We can only run two processes in parallel because of memory issues.
[21:53:14] halfak how much storage does ores-compute-01 one have?
[21:53:24] Is it an xtra large one which has alot of ram?
[21:53:43] It has 16GB of ram
[21:53:45] 8 "cores"
[21:53:53] oh lol
[21:53:54] But I don't think we're the biggest size.
[21:53:56] We could upgrade.
[21:54:08] it takes up 16gb?
[21:54:16] paladox: If you're curious, https://tools.wmflabs.org/openstack-browser/project/ores
[21:54:41] thanks
[21:55:04] according to the ui
[21:55:06] it has 8
[21:55:10] 8gb of ram.
[21:55:16] for ores-computer-01
[21:55:29] paladox: I think the columns are confusing
[21:55:48] i.e. off by one
[21:55:51] oh
[21:55:55] yeh
[21:55:59] We seem to be using open source :p
[21:56:09] o
[21:56:14] oh i see
[21:56:14] 16384M
[21:56:19] j/k, proprietary software would be whitescreening inexplicably at this point.
[21:56:41] paladox, 16GB
[21:56:45] :P
[21:56:47] yep
[21:56:54] it was out of place :)
[21:57:52] Here's an interesting one:
[21:57:52] 21066 halfak 20 0 1055076 470508 24028 R 98.7 2.9 16:09.88 revscoring
[21:57:55] 21065 halfak 20 0 5505852 5.247g 1748 S 0.7 33.5 1:12.35 shuf
[21:58:04] It's actually "shuf" doing the ram killing
[21:58:10] awight, right
[21:58:12] I saw that
[21:58:16] It'll be revscoring soon
[21:58:38] If the order doesn't matter for training and we're not sampling, we can take shuf out of the chain...
[21:58:59] head -n 500000 wouldn't have the same guarantees
[21:59:42] I think that shuf will exit as soon as one revscoring process has all of the data. Then revscoring will spawn 2 more processes to do the cv pattern.
[22:00:24] OK here we are. I say we try to move forward without draftquality
[22:00:28] gotcha
[22:00:33] Amir1, ^
[22:00:47] I don't see anything
[22:00:49] :(
[22:00:54] ?
[22:00:55] fwiw I see that other people are annoyed at shuf. There are algorithms which can do this without 5GB of memory.
[22:01:09] sorry, I thought it seems I need to merge something
[22:01:12] Amir1: you looking for the PR? I just merged
[22:01:16] okay
[22:01:54] halfak: I'm okay with moving without draftquality
[22:02:00] kk
[22:02:03] I'd like to shoulder-surf this deploy, lmk
[22:02:06] Will have a prod config pr soon
[22:02:10] we can deploy twice, I can do it tomorrow later
[22:02:20] awight, call when ready
[22:03:56] should we jump into a call?
[22:36:54] Scoring-platform-team-Backlog, Operations, Ops-Access-Requests: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3380731 (awight) Resolved>Open Looks like I'll need shell access to scb1002.eqiad.wmnet, in order to do canary tests while deploying....
[22:49:20] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3380745 (Halfak)
[22:49:49] Scoring-platform-team-Backlog, Operations, Ops-Access-Requests: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3380759 (awight) Sounds like I'll need shell on scb[1-2]* and also the ores-admin group, so I can do terrible things on production boxes.
[22:49:51] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3380760 (Halfak)
[22:50:20] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3380745 (Halfak)
[22:50:23] Scoring-platform-team-Backlog, Operations, Ops-Access-Requests: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3380762 (Halfak)
[22:50:33] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3380745 (Halfak)
[22:57:03] Scoring-platform-team-Backlog, Operations, Ops-Access-Requests, Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3380797 (Ladsgroup) This is the only thing that needs to be done
[22:58:44] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3380803 (RobH)
[22:58:48] Scoring-platform-team-Backlog, Operations, Ops-Access-Requests, Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3380801 (RobH) Open>Resolved Addition to the ores-admins is a sudo group, and thus will require review during the...
[22:59:41] Scoring-platform-team: Get Adam all the rights - https://phabricator.wikimedia.org/T168917#3380745 (RobH)
[22:59:52] Scoring-platform-team-Backlog, Operations, Ops-Access-Requests, Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3380805 (RobH) Resolved>Open Also no one reopened this when requesting more rights be added, opening it back up now.
[23:01:03] Scoring-platform-team-Backlog, Operations, Ops-Access-Requests, Patch-For-Review: Grant AWight accounts on ores production clusters - https://phabricator.wikimedia.org/T168442#3380813 (awight) My fault, I've been flapping this task like crazy... T168442#3380731 Thanks for taking a look!
[23:03:52] Scoring-platform-team-Backlog, ORES, Easy: ORES 500's on integers that can't be processed - https://phabricator.wikimedia.org/T168920#3380833 (Halfak)
[23:07:27] So, is this only wmflabs?
[23:07:27] https://logstash.wikimedia.org/app/kibana#/dashboard/ORES?_g=()&_a=(filters:!(),options:(darkTheme:!f),panels:!((col:1,id:Dashboards,panelIndex:1,row:1,size_x:12,size_y:2,type:visualization),(col:1,id:Events-Over-Time,panelIndex:2,row:3,size_x:9,size_y:2,type:visualization),(col:1,id:Event-Types,panelIndex:3,row:5,size_x:5,size_y:3,type:visualization),(col:6,id:Event-Level,panelIndex:4,row:5,size_x:3,
[23:07:33] size_y:3,type:visualization),(col:1,columns:!(type,level,wiki,host,message),id:Default-Events-List,panelIndex:5,row:8,size_x:12,size_y:25,sort:!(%27@timestamp%27,desc),type:search),(col:10,id:Top-20-Hosts,panelIndex:6,row:3,size_x:3,size_y:2,type:visualization),(col:9,id:Events-Over-Time-By-Channel,panelIndex:7,row:5,size_x:4,size_y:3,type:visualization)),query:(query_string:(analyze_wildcard:!t,query
[23:07:39] :%27type:ores%27)),title:ORES,uiState:(P-2:(vis:(legendOpen:!f)),P-3:(vis:(legendOpen:!f)),P-4:(vis:(legendOpen:!f)),P-6:(vis:(params:(sort:(columnIndex:!n,direction:!n))))))
[23:07:42] barf.
[23:07:44] https://logstash.wikimedia.org/app/kibana#/dashboard/ORES
[23:12:48] I ask because the server errors we just triggered are nowhere to be seen.
[23:16:09] * halfak looks
[23:17:12] Scoring-platform-team, ORES, Easy: ORES 500's on integers that can't be processed - https://phabricator.wikimedia.org/T168920#3380873 (Halfak)
[23:17:38] https://github.com/wiki-ai/ores/pull/210
[23:17:43] {{done}}
[23:17:43] How efficient, halfak!
[23:17:48] Damn right AsimovBot
[23:18:22] wat
[23:18:29] AsimovBot: help
[23:18:29] Asimov v. 2, By jem (IRC) / -jem- (Wikimedia), 2010-17 - IRC bot supporting the Wikimedia projects and movement, written in PHP - Commands must be preceded by one of the accepted prefixes (@!-=) - List of commands: -ord - Problems or suggestions: -sug - Help: -? command / -ic - http://wikimedia.es/asimov?uselang=en
[23:19:57] awight, I'm not very familiar with logstash
[23:20:14] How do you search for type:ores and level:ERROR
[23:20:28] why does that bot speak which i presume spanish?
[23:20:48] jem speaks spanish, I guess
[23:21:07] jem used to hang out around here, but I don't see them much anymore. But AsimovBot remains.
[23:21:22] [[Foobar]]
[23:21:22] [1] https://meta.wikimedia.org/wiki/Foobar
[23:21:25] halfak: me neither--my uneducated guess is that the server error is falling through all the safety nets and might not be using python's logging settings.
[23:21:43] Hmm... Certainly possible
[23:21:46] [[es:AsimovBot]]
[23:21:46] [2] https://es.wikipedia.org/wiki/AsimovBot
[23:22:45] oh
[23:23:08] Scoring-platform-team, ORES, articlequality-modeling, editquality-modeling, artificial-intelligence: Rebuild all of the models for ORES (new regexes) - https://phabricator.wikimedia.org/T168889#3379678 (Halfak) https://github.com/wiki-ai/editquality/pull/77
[23:23:41] Scoring-platform-team, ORES, Easy: ORES 500's on integers that can't be processed - https://phabricator.wikimedia.org/T168920#3380833 (Halfak) https://github.com/wiki-ai/ores/pull/210
[23:24:45] From T149010: > I am wondering however how we could change the logformat to support more level (like ERROR, WARN) etc.
[23:24:45] T149010: Send ORES logs to logstash - https://phabricator.wikimedia.org/T149010
[23:26:50] Scoring-platform-team-Backlog: Send error logs to logstash - https://phabricator.wikimedia.org/T168921#3380887 (awight)
[23:34:19] I wonder why our redis client count is so high? https://grafana.wikimedia.org/dashboard/db/ores?orgId=1&panelId=22&fullscreen&from=1498515346973&to=1498518706973
[23:35:15] Ah, maybe we have 100 workers per box?...
[23:41:48] awight, that sounds about right
[23:42:01] both celery and uwsgi workers get a connection
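awight's guess is that the server error bypasses Python's logging configuration, and T168921 proposes sending error logs to logstash. ORES's actual request-handling stack and logging setup are not shown in this log. As a sketch of one generic way to make sure unhandled exceptions reach whatever handler ships the logs, here is a WSGI middleware that logs them at ERROR level with a traceback before returning a 500. The logger name `ores.errors` and the helper `log_unhandled_errors` are hypothetical:

```python
import logging
import sys
from typing import Callable, Iterable

logger = logging.getLogger("ores.errors")  # hypothetical logger name


def log_unhandled_errors(app: Callable) -> Callable:
    """Wrap a WSGI app so exceptions raised while calling it are logged at
    ERROR level (with traceback) before a plain 500 response is returned.

    Exceptions raised later, while the server iterates over the response
    body, are not caught by this wrapper.
    """
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        try:
            return app(environ, start_response)
        except Exception:
            # logger.exception logs at ERROR and attaches the current traceback.
            logger.exception("Unhandled error for %s %s",
                             environ.get("REQUEST_METHOD"), environ.get("PATH_INFO"))
            # Passing exc_info lets start_response replace headers the app may
            # already have set (PEP 3333); it re-raises if they were already sent.
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")], sys.exc_info())
            return [b"Internal Server Error\n"]
    return wrapped
```

Wrapping is a one-liner (`application = log_unhandled_errors(application)`); whether the resulting records show up in logstash under `level:ERROR` depends on how the log shipper maps Python level names, which this log does not show.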