[00:30:06] Scoring-platform-team (Current), Wikilabels: labels.wmflabs.org still shows srwiki in available wikis - https://phabricator.wikimedia.org/T232958 (Zoranzoki21) Open→Resolved All is ok on https://labels.wmflabs.org/ui/ now. Closing task as resolved!
[14:24:38] o/ kevinbazira
[14:25:10] How's hacking? Was my email yesterday helpful?
[14:40:55] o/ halfak
[14:42:16] halfak: we're running a few minutes behind so 850 is probably a better time to join
[14:42:49] Hey isaacj! I'm out at a shop getting my car worked on so I'm a bit out of sorts too. Will definitely join in but might be a bit intermittent.
[14:44:12] halfak hacking is going well. Yes, your email was helpful to an extent. Will it be possible to discuss this in our rescheduled meeting?
[14:44:21] Totally.
[14:44:42] Great.
[14:44:43] halfak: ahhh sorry to hear. no worries then -- join as much as makes sense
[14:47:46] Scoring-platform-team: Configure ORES to publish new drafttopic scores to Kafka - https://phabricator.wikimedia.org/T240549 (marcella) Thanks so much @Halfak! @EBernhardson and @Ottomata is there any other support or information you need from the Growth team or Scoring to move forward on this?
[14:59:34] Scoring-platform-team, Discovery-Search: Consume ORES drafttopic data from Kafka and store it in HDFS - https://phabricator.wikimedia.org/T240553 (marcella) > Do we need a mechanism to get rid of data for deleted pages? @Tgr and @Halfak , did this question get resolved? If not, I can open a separate ti...
[15:07:08] Scoring-platform-team: Configure ORES to publish new drafttopic scores to Kafka - https://phabricator.wikimedia.org/T240549 (Ottomata) Just checked, and drafttopic scores are now in the event.mediawiki_revision_score table: `lang=sql select scores["drafttopic"].prediction, count(*) as cnt from event.mediawi...
[15:07:34] Scoring-platform-team, Discovery-Search: Consume ORES drafttopic data from Kafka and store it in HDFS - https://phabricator.wikimedia.org/T240553 (Ottomata) https://phabricator.wikimedia.org/T240549#5754339 Looks good!
[15:08:03] Scoring-platform-team: Configure ORES to publish new drafttopic scores to Kafka - https://phabricator.wikimedia.org/T240549 (Ottomata) Open→Resolved a:Ottomata
[15:08:07] ORES, Scoring-platform-team (Current): ORES deployment mid-Dec. 2019 - https://phabricator.wikimedia.org/T240725 (Ottomata)
[15:08:09] Scoring-platform-team, Discovery-Search: Consume ORES drafttopic data from Kafka and store it in HDFS - https://phabricator.wikimedia.org/T240553 (Ottomata)
[15:28:23] Changing locations. Back in ~30.
[16:19:51] * halfak waits on model training jobs.
[16:20:00] I might actually be able to experiment with vectors today.
[16:20:13] kevinbazira, were you able to generate the 200 cell vectors with fasttext?
[16:54:40] hey halfak & kevinbazira o/
[16:54:49] should we async today due to the monthly staff meeting?
[16:55:00] Oh yes. Good call.
[17:00:04] Y: ORES deploy is out! Finished the drafttopic model PR. Updated the Jade API i18n. I dug into expanding drafttopic into "articletopic" (training and testing against the full/recent version of pages). Answered some questions about jawiki audit needs. I also met with Research and Growth to talk topic modeling. I rebuilt the euwiki articlequality model without category counts (per request). I also fixed some minor bugs in revscoring.
[17:00:04] T: Continue with "articletopic" explorations. If we can get it merged, I'll start working on a new deployment that includes the new topic model so that maybe we could deploy that soon. I'd like to start experimenting with the different length vectors soon.
[17:00:04] Soft blocked on being able to use the fasttext vectors. Plenty I can do in the meantime.
[17:06:45] Y: Completed generating skipgram 100 cell vector model.
Working on implementing a strategy for loading them in revscoring and generating features [blocked: still wrapping my head around the key requirements in regards to expected input and output]
[17:07:11] T: Started process to generate skipgram 200 cell vector model. Reviewed and merged Aaron's PR on formatting for rates model_info in revscoring: https://github.com/wikimedia/revscoring/pull/463
[17:08:59] Y: Worked on the various forms & menus for the Jade UI. Knocked out ~9.5/10 of them. They don't do anything yet, but all the components render and forms are accessible via the menus.
[17:09:16] T: Focusing on interactivity today. Wiring up all the buttons/forms/inputs/etc and handling all api responses. Hopefully will also get to the post-api-call page reload stuff too, but that might be a stretch today.
[19:06:23] accraze, do I smell a demo for tomorrow?
[19:06:50] * halfak questions his choice to use the word "smell"
[19:14:00] halfak: yeah i should have *something* to demo tomorrow for sync
[19:22:27] :D looking forward to it. I'll be able to demo the new topic model too.
[19:31:54] Oooh. It looks like fasttext can handle line breaks.
[19:32:03] I think I might start experimenting with breaking text into sentences.
[19:33:03] It seems like that would be very beneficial. As it stands right now, we train on a pure stream of words. So one article bleeds right into the other.
[19:33:18] If we break on articles, that's a good start. If we break on sentences, that's additionally helpful.
[19:34:09] Yup. No line breaks in the text file we train on.
[20:54:22] Scoring-platform-team (Current), NewcomerTasks 1.1, drafttopic-modeling: Re-train English Wikipedia topic model using new WikiProject Taxonomy - https://phabricator.wikimedia.org/T240286 (Halfak) Using our old vectors, it looks like we're getting decent fitness. I've trained models on article text (...
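Editor's sketch for the 19:31 to 19:34 messages about breaking the training text on articles or sentences. This is a minimal illustration only, not the team's preprocessing: the function name `to_training_lines`, the sample articles, and the naive regex sentence splitter are all assumptions made for the example. It relies on the observation in the chat that fasttext respects line breaks, so writing one unit per line should keep one article from bleeding into the next.

```python
import re

# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
# An assumption for illustration; real article text needs a proper tokenizer.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def to_training_lines(articles, by_sentence=True):
    """Yield one training line per sentence, or one per article."""
    for text in articles:
        # Collapse internal whitespace/newlines so each unit fits on one line.
        flat = " ".join(text.split())
        if not flat:
            continue
        if by_sentence:
            for sentence in _SENTENCE_END.split(flat):
                if sentence:
                    yield sentence
        else:
            yield flat


# Hypothetical sample input standing in for article text.
articles = [
    "Cats are mammals. They purr!",
    "Kafka is a log.\nIt stores events.",
]

# One sentence per line, ready to be written to a training text file.
corpus = "\n".join(to_training_lines(articles))
print(corpus)
```

Passing `by_sentence=False` gives the coarser "break on articles" variant mentioned in the chat, with one article per line.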
[20:55:34] Scoring-platform-team (Current), drafttopic-modeling: Missing words should not have zero vectors - https://phabricator.wikimedia.org/T241175 (Halfak)
[21:16:38] accraze, when you want a break, I could run a fun test if you merge https://github.com/wikimedia/revscoring/pull/464
[21:17:37] Essentially, we do topic modeling by averaging word vectors. I found that we emit a zero vector ([0,0,0,0...0]) when we find a word that doesn't show up in our vectors. This is common for technical jargon.
[21:17:45] And those zeros mess up our vectors.
[21:17:55] So I made us stop doing that.
[21:18:17] Our averages won't be pulled toward zero when there is jargon with this.
[21:26:54] Hmm. Maybe I could test without interrupting you. Let me try that :)
[22:31:08] Scoring-platform-team: Configure ORES to publish new drafttopic scores to Kafka - https://phabricator.wikimedia.org/T240549 (Tgr) ` hive (default)> select day, hour, count(*) from event.mediawiki_revision_score where scores["drafttopic"] IS NOT NULL and year=2019 and month=12 and ( day=18 or day=19) group by...
[23:01:43] Scoring-platform-team: Configure ORES to publish new drafttopic scores to Kafka - https://phabricator.wikimedia.org/T240549 (Halfak) Woops! You're right. I just manually sent an event and it looks like it isn't getting picked up. ` $ python Python 3.5.1+ (default, Mar 30 2016, 22:46:26) [GCC 5.3.1 2016...
[23:22:27] (PS1) Halfak: Replace 'content_edit' event with 'main_edit' [services/ores/deploy] - https://gerrit.wikimedia.org/r/559624 (https://phabricator.wikimedia.org/T240549)
[23:23:18] Scoring-platform-team, Patch-For-Review: Configure ORES to publish new drafttopic scores to Kafka - https://phabricator.wikimedia.org/T240549 (Halfak) Got it! I had a typo in the config for the event. I named the event "main_edit" not "content_edit" and mixed that up in the configuration.
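Editor's sketch for the 21:17 messages about zero vectors for missing words. This is a toy illustration of the effect described there, not the revscoring code or the contents of PR 464: `mean_vector`, the `skip_missing` flag, and the two-dimensional vectors are hypothetical names and data chosen for the example. It compares averaging with a zero vector substituted for each unknown word against averaging only the words that have vectors.

```python
def mean_vector(words, vectors, dim, skip_missing=True):
    """Average the vectors of `words`.

    With skip_missing=False, an unknown word contributes a zero vector
    (the behaviour described as problematic in the chat). With
    skip_missing=True, unknown words are left out of the average entirely.
    """
    total = [0.0] * dim
    count = 0
    for word in words:
        vec = vectors.get(word)
        if vec is None:
            if skip_missing:
                continue
            vec = [0.0] * dim  # zero vector stands in for the unknown word
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    if count == 0:
        # No usable words at all; fall back to zeros (an assumption here).
        return [0.0] * dim
    return [t / count for t in total]


# Hypothetical 2-cell vectors; real models in the chat use 100 or 200 cells.
vectors = {"model": [1.0, 3.0], "score": [3.0, 1.0]}
words = ["model", "score", "drafttopic", "revscoring"]  # last two are unknown

with_zeros = mean_vector(words, vectors, dim=2, skip_missing=False)
without_zeros = mean_vector(words, vectors, dim=2)
print(with_zeros)     # [1.0, 1.0]
print(without_zeros)  # [2.0, 2.0]
```

With two of the four words missing, the zero-filled average is halved toward the origin, which is the "pulled toward zero when there is jargon" effect; skipping the missing words leaves the average of the known words unchanged.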
[23:24:54] Scoring-platform-team, Patch-For-Review: Configure ORES to publish new drafttopic scores to Kafka - https://phabricator.wikimedia.org/T240549 (Halfak) It looks like it is too late to deploy this before the holiday. Bummer. From our discussions with @MMiller_WMF, it looks like we'll be deploying an upd...
[23:27:22] OK I'm done for the day. See y'all tomorrow :)
[23:27:22] o/
[23:29:37] wikimedia/ores#1388 (build_event_set - b05ab16 : halfak): The build failed. https://travis-ci.org/wikimedia/ores/builds/627479262