[04:25:18] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Improve cleaning of article quality assessment datasets - https://phabricator.wikimedia.org/T170434#3560995 (Nettrom) I'm a bit pressed for time at the moment, so to prevent this from stalling I'd like to propose that a first pr...
[11:48:17] Scoring-platform-team, DBA, Operations, cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3561691 (Marostegui) How do you guys want to proceed with this in the end? Is it worth the risk?
[15:27:45] Scoring-platform-team, articlequality-modeling, artificial-intelligence: Improve cleaning of article quality assessment datasets - https://phabricator.wikimedia.org/T170434#3562292 (Halfak) If you can get me clean labeled data, I can get the model updated. No problem.
[16:12:37] Scoring-platform-team, DBA, Operations, cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3562459 (chasemp) a:madhuvishy
[16:15:44] o/ awight
[16:15:48] good morning
[16:16:01] holla!
[16:29:49] Scoring-platform-team-Backlog: [Investigate] ORES worker threads shouldn't use Redis connection pool - https://phabricator.wikimedia.org/T174403#3562516 (Halfak)
[16:29:55] I'm going to head to lunch
[16:30:03] k
[16:30:14] Thanks for the summary ^
[16:34:59] Scoring-platform-team, DBA, Operations, cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3562523 (madhuvishy) @Marostegui We talked about this today in our meeting, and think that since we don't have significant user traffic moved over from 1001...
[16:35:03] I’ve been tinkering with the meta-ores schema etherpad.
[16:35:12] Trying to each one serving of candy a day now.
[16:35:14] *eat
[16:38:07] Scoring-platform-team, DBA, Operations, cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3562543 (bd808) I vote we close this as "resolved" with a note that 1001/3 have not been rebooted because of the fear of catastrophic hardware failure and t...
[16:54:11] Scoring-platform-team, DBA, Operations, cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3562639 (Marostegui) >>! In T168584#3562523, @madhuvishy wrote: > @Marostegui We talked about this today in our meeting, and think that since we don't have...
[17:23:18] halfak: I’m confused. scb1002 doesn’t report that it has ORES puppet roles.
[17:23:50] Did we migrate to our own cluster already?
[17:24:12] * awight reads operations/puppet
[17:24:47] oof, okay I see all the uwsgi processes.
[17:26:47] When you’re back from lunch—I’m also wondering how celery is distributed over the cluster. Is there a drawing?
[17:29:29] Scoring-platform-team, DBA, Operations, cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3562860 (madhuvishy) Open>Resolved 1001/3 have not been rebooted because of the fear of catastrophic hardware failure and their impending decomm.
[17:34:05] awight, currently no drawing for the SCB cluster
[17:34:16] both celery and uwsgi workers are running on all of the SCB nodes.
[17:34:19] (or should be)
[17:34:30] okay, ty
[17:34:56] I must be looking at the wrong thing btw. I see c. 50 uwsgi processes on scb1002, but only one celery worker
[17:35:40] I see 26 celery workers
[17:35:49] ps aux | grep celery | wc
[17:35:55] *46
[17:36:04] yep nvm me. Donno why my grep was bad
[17:40:41] There are no workers on ores1001 cos no requests?
[17:40:48] or am I having another blind spot moment...
[17:42:05] awight, I think because the celery process crashed maybe
[17:42:15] kk
[17:42:20] you should be able to: sudo service ores-celery-worker restart
[17:42:24] ty
[17:42:37] I cannot
[17:42:48] checking what puppet lets me sudo...
[17:43:12] 'ALL=(root) NOPASSWD: /usr/sbin/service uwsgi-ores *',
[17:43:12] 'ALL=(root) NOPASSWD: /usr/sbin/service celery-ores-worker *',
[17:43:16] lol
[17:43:38] restarted.
[17:43:52] and our processes are there.
[17:44:14] I’ll prepare a patch to sudo lsof
[17:47:44] Make sure to run some requests because I think a few things might spin up under load.
[17:47:56] It took 10 minutes of sustained requests before to cause a problem.
[17:49:19] perfect, I can watch the leak in action, then
[17:49:33] Do you have your stressful command-line sitting around?
[17:53:09] Check out the paste I had in the stress test task
[17:53:15] It includes the full command I run.
[17:53:31] You'll need to clone ores master somewhere.
[17:55:43] the deployment repo? I have prod + labs checked out
[17:55:49] perfect ty
[17:57:14] Scoring-platform-team-Backlog, ORES, Operations, Patch-For-Review, User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3563049 (awight) #operations I would like lsof permissions on the ORES boxes, https://gerrit.wiki...
[17:59:53] awight, sorry, no the ORES repo
[18:00:02] gotcha
[18:00:06] wiki-ai/ores :)
[18:14:03] halfak: what is the format of ~/enwiki.200k.sample.json ?
[18:14:49] I dunno. Why is it in your home directory?
[18:14:51] :P
[18:14:59] What file is it a part of?
[18:15:02] hehe I found it in your homedir
[18:15:05] Oh
[18:15:10] it’s something you feed to the stress tester
[18:15:13] Which home dir?
[18:15:14] np I have it now
[18:15:15] Oh!
[18:15:34] It's a random sample query of enwiki revisions :D
[18:15:46] {"rev_id": ..., ...}
[18:16:16] {"rev_id": 690651773}
[18:16:21] right
[18:16:26] great
[18:16:37] * awight levels wand at ores1001
[18:16:45] Fun story. I just found out that we can use our draft quality model to filter out 95% of new pages while still catching 90% of spam/attack/vandalism
[18:16:47] \o/
[18:16:57] Reducing workload by 20x
[18:17:21] filter “out”? I get catching spam, but what do you mean by filter?
[18:17:39] same as catching vandalism
[18:17:40] good precision?
[18:17:48] We filter out the known good stuff.
[18:17:54] k makes sense
[18:17:57] Precision is around 50% which is honestly phenominal IMO
[18:18:28] that’s qualitatively fantastic! :-) I don’t know enough to be pumped abt the number yet
[18:19:19] * halfak loves the new threshold querying strategy. I'm here geeking out at a bunch of optimization thresholds.
[18:26:53] Scoring-platform-team, draftquality-modeling, artificial-intelligence: [Discuss] draftquality on a sample, humongous everything, or something else? - https://phabricator.wikimedia.org/T168909#3563232 (Halfak) So, the latest version of the draftquality model implements a balanced training set. See ht...
[18:30:54] Scoring-platform-team, draftquality-modeling, artificial-intelligence: [Discuss] draftquality on a sample, humongous everything, or something else? - https://phabricator.wikimedia.org/T168909#3563265 (Halfak) OK I'm trying again with ~100k observations. I just increased the "OK" sample to 75k and le...
[18:37:33] I’m rsync’ing for now, but what’s your trick for doing git things on ores1001?
[18:38:51] great, stressing nicely
[18:39:19] Still waiting on that lsof patch, though.
[18:39:55] I guess I must have used rsync/scp
[19:54:37] halfak: I’m finding filehandle leaks.
[19:54:52] Wanted to check with you, do you still think I should make the fixes in revscoring 1.x?
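The draftquality figures quoted between 18:16 and 18:17 can be checked with simple arithmetic. What follows is a minimal sketch, not anything from the ORES codebase. It assumes that "filter out 95%" means 95% of all new pages are set aside as likely good, and it uses a hypothetical batch of 1000 new pages. The function names are illustrative only.

```python
def reduction_factor(filtered_out_rate):
    """How many times smaller the review queue gets when this share is filtered out."""
    return 1.0 / (1.0 - filtered_out_rate)


def pages_to_review(total_pages, filtered_out_rate):
    """Pages left for human review after the model sets aside the likely-good ones."""
    return total_pages * (1.0 - filtered_out_rate)


# Rates quoted in the log; the batch size of 1000 is a hypothetical figure.
FILTERED_OUT = 0.95
PRECISION = 0.50  # "around 50%" in the log, so treat results as approximate
BATCH = 1000

queue = pages_to_review(BATCH, FILTERED_OUT)
factor = reduction_factor(FILTERED_OUT)
expected_bad = queue * PRECISION  # flagged pages expected to be real problems

print(round(factor), round(queue), round(expected_bad))
```

Under these assumptions, reviewers would look at about 50 of the 1000 pages, which is the 20x reduction mentioned in the log, and roughly half of those 50 would be genuine spam, attack, or vandalism pages.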
[19:55:03] It’s pretty minor jiggling so not a big deal
[19:55:29] What are the chances that 2.x will be deployment-ready any time soon?
[19:57:10] awight, we're working on the social size of 2.x now. Tech is ready.
[19:57:16] But that social work is going to take a month
[19:57:23] I think 1.x makes the most sense.
[19:57:40] We'll probably need to back-port a branch of wiki-ai/ores
[19:57:58] Also maybe a back-port of wheels. :/
[19:58:06] OK cool. I think I’ll still make the changes on master and back-port to 1.x, unless that sound wrong to you
[19:58:15] Cool
[19:58:15] that way we have the latest tests, etc.
[19:58:30] +1
[19:59:20] So… in the background, I’m dd’ing two SSD chips to recover a couple weeks of un-backed-up notes.
[19:59:37] Hairy as a unshorn fairy.
[19:59:54] I had them in RAID 0 (striping) and using FDE
[20:00:19] If this works, OIT will be drinking beer on my tab for the week ;-)
[20:00:31] or unsugared soft drinks.
[20:00:33] <_<
[20:00:34] >_>
[20:02:23] godspeed!
[20:02:31] I'm switching locations. back in ~40 minutes.
[20:04:05] halfak: Do we ever run utilities from celery? If not, I won’t bother with leaks there.
[20:47:09] o/
[22:00:22] Scoring-platform-team, draftquality-modeling, artificial-intelligence: [Discuss] draftquality on a sample, humongous everything, or something else? - https://phabricator.wikimedia.org/T168909#3564310 (Halfak) ``` roc_auc (micro=0.979, macro=0.962): attack vandalism spa...
[22:02:44] O/
[22:04:02] awight: i wasnt expecting you to agree with me, i tottally thought i was going to be wrong lol
[22:04:16] (Talking about T164796)
[22:04:17] T164796: Very long search times on RC Page for "Very likely good faith" + "Likely have problems" (on en.wiki only?)
- https://phabricator.wikimedia.org/T164796
[22:05:36] Zppix: win some, lose some…
[22:06:25] Thank you for framing your guess as a guess, btw :D
[22:06:25] awight: well im glad i may be right :)
[22:06:30] Np
[22:06:47] It’s perfectly productive to make random guesses, they often help the discussion IMO
[22:07:20] It was atleast a reasonable guess based upon queries and how the dbs tend to be at wikimedia
[22:07:44] Totally. Rare changes and older changes would indeed not be in the cache, but that particular filter only uses stuff already in the db, and has no capability to query ORES for addional stuff
[22:08:50] Maybe we should ensure rare combos are always stored in cache? But how to determine what is considered a rare combo is beyond me
[22:19:31] In this case, we’re good cos the problem was not related to ORES tables
[22:19:54] \o/
[22:20:03] For other stuff though, the “rare” revisions are ones which haven’t been in recent changes… recently
[22:20:04] SOMEONE ELSES PROBLEM PARTY!
[22:20:07] Ilol
[22:20:31] http://hitchhikers.wikia.com/wiki/Somebody_Else%27s_Problem_field
[22:20:36] It’s weird that we have our own cache tables actually. We only do it to be able to join efficiently against the MediaWiki revision table
[22:20:39] halfak: ill bring the beer
[22:21:04] I keep my SEP in a room of requirement
[22:21:13] Sep?
[22:21:24] c.f. amazing link above
[22:27:14] ah
[22:34:08] I like my files handleless
[22:34:18] xD
[22:35:29] wiki-ai/revscoring#1199 (filehandles - 5c54b00 : Adam Roses Wight): The build passed.
https://travis-ci.org/wiki-ai/revscoring/builds/269789948
[22:38:30] I’m watching my file handles
[22:39:03] Lol
[22:41:15] https://phabricator.wikimedia.org/phame/post/view/68/more_better_model_information_and_threshold_optimizations/
[22:41:29] Scoring-platform-team-Backlog, ORES, Documentation: Demonstrate collab threshold detection needs in new 'thresholds' system - https://phabricator.wikimedia.org/T173019#3516228 (Halfak) https://phabricator.wikimedia.org/phame/post/view/68/more_better_model_information_and_threshold_optimizations/
[22:41:49] Scoring-platform-team-Backlog, ORES, Documentation: Demonstrate collab threshold detection needs in new 'thresholds' system - https://phabricator.wikimedia.org/T173019#3564401 (Halfak) I promise we'll actually have some proper documentation at some point :D
[22:45:21] halfak: fun! Is that posted, or do you want feedback?
[22:56:02] halfak: want me to send an announcement out about that phame post to ai-l!
[22:56:03] ?
[23:12:25] awight, it's posted. Can be edited :)
[23:12:34] Zppix, let's wait until tomorrow.
[23:12:43] Alright
[23:13:15] halfak: mostly thinking, can I make a few graphs to illustrate the discussion?
[23:14:49] btw, the filehandle glitch may have been this simple: https://github.com/wiki-ai/revscoring/compare/filehandles?expand=1
[23:15:38] awight, I was hoping it would be that easy :)
[23:15:55] +1 for graphs if you want to take a hack at it. Otherwise, I'll see what I can get together tomorrow AM
[23:15:58] heading out now.
[23:15:58] o/
[23:16:06] ok great
[23:25:43] Scoring-platform-team, Edit-Review-Improvements-RC-Page, MediaWiki-extensions-ORES, Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017): Very long search times on RC Page for "Very likely good faith" + "Likely have problems" (on en.wiki only?) - https://phabricator.wikimedia.org/T164796#3564532 (...
[23:45:22] Scoring-platform-team, Edit-Review-Improvements-RC-Page, MediaWiki-extensions-ORES, Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017): Very long search times on RC Page for "Very likely good faith" + "Likely have problems" (on en.wiki only?) - https://phabricator.wikimedia.org/T164796#3564570 (...
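
The log does not show what the filehandle fix in revscoring actually changed, so the following is only a generic sketch of the usual Python pattern for avoiding leaked file handles: open files inside a `with` block so they are closed deterministically. It assumes the stress-test sample holds one JSON object per line in the `{"rev_id": ...}` shape quoted at 18:16. The function name `read_rev_ids` and the second revision ID are hypothetical.

```python
import json
import os
import tempfile


def read_rev_ids(path):
    """Return the rev_id of every JSON object in a one-object-per-line file."""
    # The with-block closes the handle even if json.loads raises.
    with open(path, encoding="utf-8") as sample:
        return [json.loads(line)["rev_id"] for line in sample if line.strip()]


# Hypothetical two-line sample in the {"rev_id": ...} shape quoted in the log.
fd, demo_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as demo:
    demo.write('{"rev_id": 690651773}\n{"rev_id": 690651774}\n')

rev_ids = read_rev_ids(demo_path)
os.remove(demo_path)  # clean up the temporary sample file
print(rev_ids)
```

By contrast, a bare `json.load(open(path))` leaves closing to the garbage collector, which in a long-running worker under sustained load is the kind of pattern that can accumulate open handles.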