[06:26:22] 10Analytics, 10Trash: ---------------- Discussed above -------------------- - https://phabricator.wikimedia.org/T169900#3419371 (10Poyekhali) 05Open>03stalled p:05Triage>03Lowest [08:58:32] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3419825 (10Tbayer) Just a quick note that I tried to reproduce this myself using the given steps, and haven't got... [08:59:44] 10Analytics, 10DBA: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3419829 (10elukey) [09:08:42] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3419868 (10phuedx) >>! In T170018#3419825, @Tbayer wrote: > But in any case, this is undeniably a breakthrough re... [09:09:17] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3419869 (10phuedx) > Bucket yourself: `mw.storage.session.set('mwuser-sessionId', 1059 ) ` (you can get a token u... [09:30:29] 10Analytics-Kanban, 10Patch-For-Review: Troubleshoot issues with sqoop of data not working for big tables - https://phabricator.wikimedia.org/T169782#3419914 (10JAllemandou) @Milimetric : This is awesome :) Something to keep in mind(and possibly an option to add to the job) : the "Number of mappers" you give t... [10:24:09] 10Analytics, 10DBA, 10User-Elukey: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3420110 (10elukey) [11:42:15] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3420286 (10phuedx) >>! In T170018#3417458, @Krinkle wrote: >>>! In T170018#3417220, @pmiazga wrote: >> @Jdlrobson... [11:43:08] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3420291 (10phuedx) @Tbayer: AFAICT this bug shouldn't impact the ReadingDepth instrumentation as we add an unload... [11:46:32] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3420304 (10phuedx) > 8. As soon as url changes to new page press keyboard shortcut for back e.g. cmd + left When... [12:04:58] heloooo team [12:12:46] o/ [12:52:47] hey all [12:57:46] elukey: wanna deploy the sqoop [hopefully] fix? [12:58:23] milimetric: sure, do you want me to deploy or was it just a "shall we?" [13:00:01] ok, so you need to merge the puppet and I can merge the python because it's got a +2 [13:00:06] lemme do that [13:00:36] (03CR) 10Milimetric: [V: 032] Implement sqooping with mappers > 1 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/363735 (https://phabricator.wikimedia.org/T169782) (owner: 10Milimetric) [13:00:47] oh wait... 
i guess for that I'd have to deploy refinery [13:00:54] but not refinery source, so it'll be fast, one sec [13:03:01] 10Analytics-Kanban, 10DBA, 10User-Elukey: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3420477 (10elukey) [13:04:05] milimetric: ah is there a puppet change to do ? [13:04:17] * elukey checks gerrit [13:04:21] elukey: https://gerrit.wikimedia.org/r/#/c/363846/ [13:04:27] just to update the cron [13:04:42] but the cron only runs once a month, so it won't actually do anything [13:04:46] I wasn't aware of it sorry :) [13:04:49] checking now! [13:05:06] but I might wait for it to take effect, and then copy the command to make sure that part's good [13:06:11] the patch looks good, I can merge whenever you prefer [13:08:19] 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, 10DBA, and 5 others: Drop tables with bad data: mediawiki_page_create_1 mediawiki_revision_create_1 - https://phabricator.wikimedia.org/T169781#3420500 (10elukey) >>! In T169781#3412167, @Marostegui wrote: > @elukey I assume you meant d... [13:10:27] ok elukey, just deploying refinery (kind of unnecessary since the python script doesn't run on hadoop but just being cautious) [13:13:05] ok elukey, deploy done, merge when you're ready [13:19:43] 10Analytics-Kanban, 10Documentation, 10Services (watching): Document revision-create event for EventStreams - https://phabricator.wikimedia.org/T169245#3420566 (10Ottomata) FYI, it is auto documented (sorta): https://stream.wikimedia.org/?doc [13:21:15] ottomata: hhhhhhhhhhhhhhhhhiiiiiiiiiiiiiiiiiii o/ [13:21:39] HIIiII! [13:21:44] :) [13:21:47] I am SO CLOSE to blasting through all these emails [13:21:58] pretty goood for 2 weeks out! [13:22:07] hellooo [13:22:12] oh yeah, I'm still working on emails from a couple years ago [13:22:13] although i do have like 20 gerrit/phab tabs open i need to read... [13:22:17] xD [13:22:24] i'm a 100% read kinda guy [13:22:33] but that means I mark A LOT as read [13:22:55] *ui [13:23:02] in gmail works great:) [13:23:04] it is wise and advisable to do so [13:23:23] i skim subjects, read ones I want, then mark the rest as read [13:24:26] milimetric: done! [13:24:42] thanks elukey [13:24:59] I'll run it once puppet updates the cron [13:25:32] mforns: wanna do dashiki this morning before standup? [13:25:38] uhhhh wikistats [13:25:43] milimetric, yes :] [13:26:04] I just finished a couple small things [13:26:24] milimetric: already ran puppet :) [13:26:56] elukey, I added the whitelist to your patch [13:28:03] mforns: super! I created the (hopefully) last task to the kanban, namely adding the config for a new user for eventlogging_cleaner [13:28:36] elukey, I see [13:28:38] awesome [13:29:00] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Managing size of page-create and revison-create tables in storage. Agreggation? - https://phabricator.wikimedia.org/T169898#3420583 (10Ottomata) Hm, I had thought that the revision-create data would be about t... [13:29:52] mforns: ok, a couple minutes [13:30:05] milimetric, ok no rush, reviewing code [13:31:04] fdans: did you push everything? 
[13:31:48] milimetric: not yet, I started a bit late today [13:32:00] * elukey waves to Francisco [13:32:30] 10Analytics, 10Wikimedia-Stream: Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3420594 (10Ottomata) [13:32:37] elukey: o/ [13:32:55] 10Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 10Services (watching), 10User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#3420607 (10Ottomata) Hi @nirmos, I created T170145 to talk about this more. [13:33:00] ok, mforns going to the cave (fdans I'm just gonna show him around the latest) [13:33:37] fdans: have we discussed about season 5 of Prison Break in Prague? I was so disappointed after watching it [13:33:48] I need to rant with somebody [13:33:51] :D [13:34:10] haha waaat I've never watched prison break [13:34:57] then it wasn't you [13:35:05] I can't express my disappointment [13:55:05] 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, 10DBA, and 5 others: Drop tables with bad data: mediawiki_page_create_1 mediawiki_revision_create_1 - https://phabricator.wikimedia.org/T169781#3420713 (10Ottomata) Hey ya fine with me! The data is brand new, no one is looking for it e... [13:58:11] (03CR) 10Ottomata: [C: 031] Add scap3 config [analytics/statsv] - 10https://gerrit.wikimedia.org/r/363578 (https://phabricator.wikimedia.org/T129139) (owner: 10Filippo Giunchedi) [14:00:40] 10Analytics-Kanban, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3420724 (10Ottomata) [14:16:40] (03CR) 10Ottomata: [C: 031] "Only did a quick review, didn't check logic." (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/361459 (https://phabricator.wikimedia.org/T158972) (owner: 10Joal) [14:17:37] 10Analytics-Cluster, 10Analytics-Kanban: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3420768 (10Ottomata) [14:20:09] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3420786 (10Ottomata) > We don't emit revision-create events in EventStreams at all right now We do now! T167670... [14:40:17] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Managing size of page-create and revision-create tables in storage. Agreggation? - https://phabricator.wikimedia.org/T169898#3420864 (10Aklapper) [14:52:46] 10Analytics-Kanban, 10Wikimedia-Stream: Disable RCStream - https://phabricator.wikimedia.org/T170157#3420970 (10Ottomata) [14:55:54] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Managing size of page-create and revision-create tables in storage. Agreggation? - https://phabricator.wikimedia.org/T169898#3421009 (10Nuria) @kaldari: can you confirm that you only care about page-create data? [14:56:43] elukey or ottomata: I'm fighting with bash in the cave, can either of you help? [15:00:20] ping milimetric [15:00:41] sorry milimetric! [15:17:53] (03PS1) 10Milimetric: Fix sqoop python mistake [analytics/refinery] - 10https://gerrit.wikimedia.org/r/364222 [15:22:28] (03CR) 10Mforns: [V: 032 C: 032] "LGTM!" 
[analytics/refinery] - 10https://gerrit.wikimedia.org/r/364222 (owner: 10Milimetric) [15:27:03] 10Analytics: hdfs password file for mysql should be re-generated when the password file is changed by puppet - https://phabricator.wikimedia.org/T170162#3421167 (10Nuria) [15:29:34] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Make non-nullable columns in EL database nullable - https://phabricator.wikimedia.org/T167162#3421191 (10Nuria) 05Open>03Resolved [15:29:46] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Monitor more HDFS health metrics - https://phabricator.wikimedia.org/T163908#3421192 (10Nuria) 05Open>03Resolved [15:29:57] 10Analytics-Kanban, 10Patch-For-Review: Add a job that regularly deletes druid webrequest deep-stored data - https://phabricator.wikimedia.org/T168614#3421193 (10Nuria) 05Open>03Resolved [15:30:08] 10Analytics-Kanban: Rename unique_devices_project_wide to unique_devices_per_project_family - https://phabricator.wikimedia.org/T168402#3421195 (10Nuria) 05Open>03Resolved [15:30:27] 10Analytics-Kanban, 10Patch-For-Review: Load webrequest raw data into druid so ops can use it for troubleshooting - https://phabricator.wikimedia.org/T166967#3421196 (10Nuria) 05Open>03Resolved [15:31:57] 10Analytics-Tech-community-metrics: Merge detached Phab and MediaWiki accounts with the same username in DB - https://phabricator.wikimedia.org/T169754#3421210 (10Aklapper) [15:31:59] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jul-Sep 2017): Check detached accounts in DB with same username for "mediawiki" and "phab" sources but different uuid's (and merge if connected) - https://phabricator.wikimedia.org/T170091#3421212 (10Aklapper) [15:34:29] 10Analytics-EventLogging, 10Analytics-Kanban: whitelist multimedia and upload wizard tables - https://phabricator.wikimedia.org/T166821#3421225 (10mforns) Will move this task to done, because the editing of the white-list is finished and will be merged in a Gerrit patch belonging to another task: T156933. [15:34:34] 10Analytics-Kanban: Implement purging settings for Schema:ReadingDepth - https://phabricator.wikimedia.org/T167439#3421228 (10mforns) Will move this task to done, because the editing of the white-list is finished and will be merged in a Gerrit patch belonging to another task: T156933. [15:34:41] 10Analytics-Kanban, 10Page-Previews, 10Reading-Web-Backlog (Tracking): Update purging settings for Schema:Popups - https://phabricator.wikimedia.org/T167449#3421231 (10mforns) Will move this task to done, because the editing of the white-list is finished and will be merged in a Gerrit patch belonging to anot... [15:34:43] 10Analytics-Kanban: Preserve userAgent field in apps schemas - https://phabricator.wikimedia.org/T164125#3421233 (10mforns) Will move this task to done, because the editing of the white-list is finished and will be merged in a Gerrit patch belonging to another task: T156933. 
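The sqoop patches being merged above ("Implement sqooping with mappers > 1" and the follow-up "Fix sqoop python mistake") concern the refinery's python wrapper around the sqoop CLI. In general, importing with more than one mapper means also telling sqoop how to partition the source table via --split-by. A minimal sketch of that command construction, with hypothetical function and parameter names (not the actual refinery script):

    def build_sqoop_command(jdbc_url, user, password_file, table, target_dir,
                            num_mappers=1, split_by=None):
        """Assemble a sqoop import command line; with num_mappers > 1, sqoop
        needs a --split-by column to divide the table between mappers."""
        cmd = [
            "sqoop", "import",
            "--connect", jdbc_url,
            "--username", user,
            "--password-file", password_file,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", str(num_mappers),
        ]
        if num_mappers > 1:
            if split_by is None:
                raise ValueError("a --split-by column is required when num_mappers > 1")
            cmd += ["--split-by", split_by]
        return cmd

    # e.g. build_sqoop_command("jdbc:mysql://db-host/enwiki", "research",
    #                          "/user/hdfs/mysql-password.txt", "revision",
    #                          "/wmf/data/raw/mediawiki/revision",
    #                          num_mappers=4, split_by="rev_id")

As JAllemandou notes on the task, the mapper count is worth exposing as a job option, since each mapper opens its own connection against the source database.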
[15:42:23] 10Analytics: hdfs password file for mysql should be re-generated when the password file is changed by puppet - https://phabricator.wikimedia.org/T170162#3421256 (10Nuria) [15:43:41] 10Analytics: hdfs password file for mysql should be re-generated when the password file is changed by puppet - https://phabricator.wikimedia.org/T170162#3421167 (10Nuria) p:05Triage>03Normal [15:43:45] 10Analytics, 10Analytics-Cluster: hdfs password file for mysql should be re-generated when the password file is changed by puppet - https://phabricator.wikimedia.org/T170162#3421259 (10Ottomata) p:05Normal>03Triage [15:45:09] 10Analytics, 10Wikimedia-Stream: Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3421267 (10Nuria) p:05Triage>03Low [15:55:59] 10Analytics, 10Analytics-Cluster: Productionize Tranquility (or shut it off) - https://phabricator.wikimedia.org/T168550#3421326 (10Nuria) p:05Triage>03Low [15:56:24] 10Analytics: Measure portal pageviews - https://phabricator.wikimedia.org/T162618#3421329 (10Nuria) p:05Normal>03Low [16:00:54] 10Analytics, 10Analytics-Cluster: Produce webrequests from varnishkafka to Kafka with Kafka message timestamp set to configurable content field - https://phabricator.wikimedia.org/T166833#3309125 (10Nuria) To set the kafka metadata timestamp to the data timestamp so you can use time based consumption, producer... [16:09:26] mforns: merged eventlogging change, about to deploy [16:09:37] nuria_, OK [16:09:49] will look at stats [16:14:23] !log deploying eventlogging 5e16da16e3f5ce287829390a76b9f5b0c7715ee5 [16:14:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:16:53] !Log restarting eventlogging [16:19:55] mforns: tailing logs [16:22:49] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3421450 (10Nuria) Deployed eventlogging with fix after adding unit tests plus testing in beta. [16:23:27] 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, 10DBA, and 5 others: Drop tables with bad data: mediawiki_page_create_1 mediawiki_revision_create_1 - https://phabricator.wikimedia.org/T169781#3421455 (10Nuria) Deployed eventlogging with fix after adding unit tests plus testing in bet... [16:27:15] 10Analytics, 10Wikimedia-Stream: Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3421465 (10Nirmos) > I'm not familiar with parsedcomment `comment` is the pseudo-wikitext that comes from user input. `parsedcomment` is the HTML that this comment produces. For example, in htt... [16:33:40] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3416919 (10Nuria) I second @Krinkle 's concerns. Has anyone been able to repro this with a clean reinstal of FF... [16:37:26] 10Analytics, 10Wikimedia-Stream: Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3421497 (10Ottomata) @mobrovac, @Pchelolo, any thoughts on this? We don't included parsed wikitext anywhere else in events (AFAIK), so I'm not so sure we should include it here. 
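For context on the parsedcomment thread above (T170145): `comment` is the raw edit summary wikitext and `parsedcomment` its rendered HTML, and the latter is already obtainable from the Action API's recentchanges list, which is what the later "they could just get it from RecentChanges API" remark refers to. A minimal sketch using the standard api.php parameters (the requests library and User-Agent string are my own choices):

    import requests

    # any wiki's api.php works the same way; English Wikipedia as an example
    API = "https://en.wikipedia.org/w/api.php"

    resp = requests.get(API, params={
        "action": "query",
        "list": "recentchanges",
        "rcprop": "ids|title|comment|parsedcomment",
        "rclimit": 5,
        "format": "json",
    }, headers={"User-Agent": "parsedcomment-sketch/0.1"}, timeout=30)
    resp.raise_for_status()

    for rc in resp.json()["query"]["recentchanges"]:
        # 'comment' is the raw summary; 'parsedcomment' is the rendered HTML
        print(rc["rcid"], repr(rc.get("comment")), "->", repr(rc.get("parsedcomment")))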
[16:37:49] 10Analytics, 10Wikimedia-Stream: Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3421502 (10Ottomata) [16:38:18] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3421504 (10pmiazga) @Nuria I'm able to reproduce this on **Firefox 54.0.1 (64-bit)** build for archlinux - 1.0 [16:43:28] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3421537 (10phuedx) >>! In T170018#3421469, @Nuria wrote: > I second @Krinkle 's concerns. Has anyone been able t... [16:47:53] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3421561 (10Nuria) @pmiazga @phuedx so, to be clear, you se a 2nd load event with no realoading of resources? as i... [16:48:21] 10Analytics, 10Wikimedia-Stream: Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3421563 (10mobrovac) We don't include any HTML-formatted content in our events, so I'm with you: I don't think we should include `parsedcomment`. [16:49:03] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3421566 (10mobrovac) [16:57:39] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3421581 (10Pchelolo) >>! In T170145#3421563, @mobrovac wrote: > We don't include any HTML-formatted content in our events, so I'm with you: I don't think... [17:03:16] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3421601 (10Jdlrobson) [17:09:10] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3421625 (10Jdlrobson) [17:09:53] elukey, Neil replied and said it's OK to remove _Edit_123234_old [17:10:06] mforns: niceee [17:10:45] will drop it tomorrow morning if nobody says anything more :) [17:10:52] going afk team! ttl! [17:10:53] elukey, whenever we are thinking of running the script, I think we could run it for like a 10 minute time range in like 2015, and see if it went all well there [17:11:13] bye elukey ! cya [17:11:28] ah ok, so you'd prefer to leave it there? Because there is few space left and I'd really need to purge some data :( [17:11:41] if you are ok I'd drop it [17:12:06] (will read later on) [17:12:10] elukey, sure! 
[17:12:23] I was talking of all tables, when first running the script in analytics-store [17:14:56] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3421664 (10MBinder_WMF) [17:15:56] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3416919 (10MBinder_WMF) @Jdlrobson Is this #unplanned-sprint-work ? I can't tell from the history, I'm just going... [17:47:47] 10Quarry, 10Cloud-Services: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3421858 (10Milimetric) @Halfak is it a deal-breaker if we couldn't migrate the history of Quarry to Redash? I'm wondering if you care as much about the history as the features themsel... [17:54:14] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Managing size of page-create and revision-create tables in storage. Agreggation? - https://phabricator.wikimedia.org/T169898#3421900 (10kaldari) Yep, only care about page-create data. I would be fine with not... [18:00:06] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Managing size of page-create and revision-create tables in storage. Agreggation? - https://phabricator.wikimedia.org/T169898#3421925 (10Ottomata) Ok, will disable this when I also truncate the table in a bit,... [18:05:17] 10Analytics-Kanban, 10Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3421933 (10Ottomata) [18:07:20] 10Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 10Services (watching), 10User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#3421944 (10Ottomata) FYI, I just turned off RCStream (routes)! [18:08:55] 10Analytics-Kanban, 10Wikimedia-Stream, 10Patch-For-Review: Decommission RCStream - https://phabricator.wikimedia.org/T170157#3421961 (10Ottomata) [18:09:38] bye bye RCStream! :) [18:10:18] nuria_: your eventlogging change is deployed, right? i can truncate/drop tables and stop inserting revision-create events to mysql, ya? [18:10:34] ottomata: one sec in meeting [18:10:55] k [18:13:04] ottomata: yes, change is deployed [18:13:25] ottomata: we can stop eventlogging to truncate tables but i thought you said that was not needed? [18:13:52] nuria_: i'm going to restart just the el mysql consumer anyway, to take in my puppet config change to stop writing revision-create events [18:14:00] ottomata: k [18:14:02] so that will pick up your change too (unless..you already did that?) [18:14:04] but anyway [18:14:10] ottomata: i did restart yes [18:14:11] yea, TRUNCATE on pagecreate table should be fine [18:14:13] i'll drop the revision-create one [18:14:27] ok cool [18:15:17] k, will login to help test [18:15:40] ottomata: you will truncate on master and slaves, right? [18:15:52] sure [18:15:52] ya [18:17:38] ottomata: ok, i am going to log into slave and wait for things to happen [18:25:10] ok done nuria_, waiting for a batch of page creates to be inserted [18:25:22] ottomata: k [18:25:51] nuria_: while i'm here, should I go ahead and drop that huge old _Edit table?
[18:26:15] ottomata: elukey was doing that tomorrow so no, i do not think so [18:26:27] cc but let's ask mforns [18:26:37] ok [18:26:38] np [18:26:39] mforns: we just did the cleanup for ACTRIAL [18:26:41] i'll let elukey do it [18:26:50] mforns: should we go ahead and deploy that edit table? [18:26:55] ottomata: ok [18:28:00] ottomata: wait, did you delete all records from the table? [18:28:12] hey [18:28:56] nuria_, what do you mean with deploying the edit table? [18:29:03] nuria_: yup [18:29:06] TRUNCATE [18:29:29] mforns: sorry deleting taht edit table [18:29:31] *that [18:29:34] ottomata: see [18:29:49] ottomata: i see old records [18:29:53] https://www.irccloud.com/pastebin/O7LPc7jZ/ [18:30:17] nuria_, ottomata, elukey said he would do it, but I don't see any reason why it could not be done, it has been confirmed to be deletable [18:30:31] nuria_: try again, i saw that too, i think the replication script sneakily inserted some [18:30:38] i truncated a second time [18:33:25] ottomata: k, empty table now [18:39:07] ottomata: we might need to create some pages.. nothing is coming [18:39:43] nuria_: they are on master [18:39:58] select count(*) from mediawiki_page_create_1; [18:40:01] 979 [18:40:06] we are waiting for replication [18:40:10] ottomata: k, replication then [18:40:16] yssir [18:42:30] ottomata: thus far it looks good [18:44:26] gr8 [18:45:06] ottomata: yess [18:45:14] ottomata: what now? [18:45:22] hmm, nothin? [18:45:23] we good? [18:45:55] 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, 10DBA, and 5 others: Drop tables with bad data: mediawiki_page_create_1 mediawiki_revision_create_1 - https://phabricator.wikimedia.org/T169781#3422177 (10Ottomata) Done. [18:46:45] ottomata: thus far data looks good [18:46:49] ottomata: let me triple check [18:47:11] ottomata: and old revision table has not been created [18:47:25] ottomata: let me tail consumer logs [19:03:05] 10Analytics, 10Wikipedia-iOS-App-Backlog, 10Reading Epics (Analytics), 10Spike, 10iOS-app-v5.6.0-Goat-On-A-Train: Research and define initial technical requirements for app analytics - https://phabricator.wikimedia.org/T164801#3245811 (10JMinor) 05Open>03Resolved The work for this ticket was mostly d... [19:17:16] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3422329 (10phuedx) OK. I couldn't create a screencast in any format that was small enough for Phab to accept. [[... [19:51:43] halfak: yt? [19:52:01] yeah. What's up? [19:52:46] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3422516 (10Nuria) mmm just to understand this better: is your instrumenting code (the one from which sendbeacon r... [19:53:24] so apparently stat1005 (new box with GPU) needs to be installed as debian stretch [19:53:27] in order to use GPU [19:53:43] it was the one that was going to replace stat1002, and as such it was going to have access to Hadoop [19:53:48] stat1006 was going to be the stat1003 replacement [19:53:52] with no access to hadoop [19:53:52] but. [19:53:59] we don't have Debian Stretch hadoop packages [19:54:13] would it be ok with you if we swapped those? [19:54:22] stat1005 would replace stat1003, have the GPU, but NO access to hadoop? [19:54:32] do you need the box with the GPU to have access to Hadoop?
[19:56:05] halfak: ^ [19:56:08] Hmm... I guess we'd like that. What would it take to upgrade to stretch at some point when hadoop packages are available? [19:56:38] apparently the GPU is not usable unless it is stretch, moritzm might be able to say more about that [19:56:40] From my point of view, hadoop access isn't that key right now, but I expect that will change. [19:56:46] ok [19:56:51] once cloudera builds stretch packages [19:56:59] adding access is easy [19:56:59] but [19:57:14] we have generally restricted access to hadoop by restricting access to stat1002/stat1003 [19:57:15] so [19:57:21] if we make stat1005 (GPU box), the NON hadoop box [19:57:28] i don't think we'll want to change that later [19:57:47] Is there anyone with access to stat1003 but not 1002? [19:57:51] yes [20:04:07] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422551 (10mobrovac) I don't have a firm stand on this, but it seems like verbose information that is already present in the `comment` field. Should we th... [20:04:52] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422552 (10Ottomata) > Should we then get rid of comment? Don't think we shoudl get rid of comment. Is there an easy way for someone to parse the comme... [20:05:00] OK so thinking about it. [20:05:06] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422553 (10Ottomata) Hm, I guess they could just get it from RecentChanges API, but mehhhhhh :/ [20:05:26] The only think I'm worried about is having the GPU in the wrong box once hadoop has stretch support. [20:05:29] *thing [20:05:34] ottomata, ^ [20:05:47] yeahhh [20:05:52] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422554 (10Pchelolo) >>! In T170145#3422551, @mobrovac wrote: > I don't have a firm stand on this, but it seems like verbose information that is already p... [20:06:04] So once we *can* provide access to hadoop from a GPU box, then this might become a problem. [20:06:30] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422555 (10Ottomata) I'm not opposed. Feels a little redundant to me, but I also don't have a firm stand. [20:07:30] ottomata, I think that, short term, what you want to do sounds reasonable. [20:07:49] If you are down for some struggling or followup work in the medium/long term, I think this is a good solution. [20:08:38] yeah, hm, ok. i will talk with luca and see if we can merge some groups, maybe both boxes can have hadoop access... [20:09:28] :) Cool. Let me know if I can help. [20:09:58] Also before I head out, I want to say "Wooo" about getting the revision-create schema in EventStreams. That reduced a lot of complexity for ORES [20:10:11] So now we can work with ChangeProp events and public EventStream events in the same way :D [20:10:24] :) [20:10:25] great! [20:10:28] i've wanted that for a while too [20:10:38] would love to get the revision score stuff in too [20:10:45] we need to decide if that stream has all events [20:10:48] i thought it was going to.
[20:10:53] including unscored ones [20:12:25] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422560 (10Pchelolo) So, in my understanding, @Ottomata is -0.5, @mobrovac is also -0.5, I'm +0.5 and I guess @Nirmos who requested this is +1.0. Overall... [20:12:37] Hmm... the data-designer in me thinks: probably not. [20:13:06] But then again, I was also generally against including all of the revision data in the score event. [20:13:17] So I'm not well aligned with the general trajectory of decisions we are making. [20:13:28] Maybe it would be more consistent if revision-create was replicated. [20:13:54] halfak: why do you think it's bad to make score event just a superset of revision-create event? [20:14:40] so, i had thought we would expose a stream that was revision creates, and where possible, scored [20:14:42] scores [20:14:53] so that if a tool designer built something using revision-create, and now they also want to include the scores, they shouldn't worry that some properties/events from the rev-create stream would disappear [20:15:01] otherwise, say someone wants to build a nice revision review tool [20:15:14] they want to review revisions that aren't scored, but for ones that are, they want to use that information in the tool [20:15:21] haha, exactly :) [20:15:34] if they are two distinct streams, then they have to build logic to join the streams [20:15:40] +1 ottomata, it would be pretty hard to merge them client-side [20:15:52] yeah, cause you don't know if a rev is going to be scored or not [20:15:58] how long do you wait? [20:16:18] ottomata, I'd never join the streams. I'd be querying the API. [20:16:25] Merging streams sounds terrible [20:16:31] querying the API when? [20:16:36] When a score is high [20:16:39] or low [20:16:40] or whatever [20:16:42] hm? [20:16:43] no i mena [20:16:44] mean [20:16:47] if you consume revision-create [20:16:49] you get a new rev event [20:16:55] then, you want a score [20:16:55] Well, I'm talking about revision-scored [20:16:57] oh [20:17:10] Why would I consume all of the revisions if I am targeting a subset? [20:17:14] you aren't [20:17:18] you are building a review tool for all revisions [20:17:24] Similarly, I'd just hit the API to get the score if I was consuming revision-create. [20:17:27] but, you'd like to augment the tool with scores [20:17:30] but when? [20:17:42] you get the revision-create almost immediately [20:17:45] For each event I guess. Or at least each relevant one. [20:17:49] Right [20:17:54] ORES handles that no problem :) [20:17:55] when will the ORES API have the score? [20:17:58] deduping and all that [20:18:28] i'd have to wait some amount of time after i receive each revision-create event, and then query the ORES API to get a score, if there is one [20:18:33] but how do I know I've waited long enough? [20:18:39] ottomata, you wouldn't have to wait [20:18:42] just query [20:18:43] no? [20:18:44] on demand [20:18:59] doesn't scoring take a little while? [20:19:05] Sure does [20:19:07] will the request just block? [20:19:10] Yup [20:19:24] Same as for revision-scored [20:19:25] does it end even if ORES doesn't score? [20:19:32] "doesn't score"??? [20:19:40] only some revs are scored, right? [20:19:47] Oh. All can be scored. [20:19:53] Some are not scored by ChangeProp [20:19:59] e.g. bot edits in Wikidata [20:20:07] But you can always ask for them to be scored.
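The "consume revision-create, then decide" pattern being debated here can be sketched against the public EventStreams service mentioned earlier (https://stream.wikimedia.org/?doc). A rough sketch, assuming the third-party sseclient package, a /v2/stream/revision-create URL, and field names recalled from the mediawiki/revision/create schema, any of which may differ:

    import json
    from sseclient import SSEClient  # assumed third-party package

    STREAM = "https://stream.wikimedia.org/v2/stream/revision-create"  # assumed stream name

    for event in SSEClient(STREAM):
        if event.event != "message" or not event.data.strip():
            continue
        rev = json.loads(event.data)
        # a review tool would handle every revision here, and could ask ORES
        # for scores on demand (see the next sketch below)
        print(rev.get("database"), rev.get("rev_id"), rev.get("page_title"))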
[20:20:16] halfak: but wouldn't that be an issue if 10 requests come for non-precached score right after the revision was created? [20:20:28] Pchelolo, we dedupe [20:20:40] They'll all block on the single scoring job for that revision [20:20:48] * halfak flexes [20:20:55] * halfak tells ORES to flex too [20:20:58] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3422567 (10Jdlrobson) This appears to be an upstream issue and I'm filing an upstream bug. MTC: {F8695090} [20:21:04] hm. ok, so is a revision-score stream useful at all then? you say that if someone wants scores, they should consume rev-create, and then for each rev, query ORES [20:21:04] ? [20:21:27] ottomata, if I only want to respond to a subset of score ranges, yeah, I'd use that stream. [20:21:53] subset being whatever WMF wants scored? [20:22:06] e.g. we don't care about wikidata bot edits [20:22:13] so we don't ask to score them ourselves? [20:22:27] No more like "edits that are likely to be vandalism" or "edits that are damaging, but still saved in goodfaith" [20:23:14] how do we choose which events go in revision-score? [20:23:21] halfak: so you propose several streams with some logic that would decide where each event should go? [20:23:22] aka which events changeprop asks ORES to score? [20:24:06] FWIW, I think I've already lost the battle for stream normalization. Also, I'm not sure that I'm right about the Best(TM) way. Still interested in discussing though :) [20:24:16] :) [20:24:17] ottomata, I would suggest just letting ORES decide. [20:24:33] hm, you mean, emit the stream from ORES? [20:24:41] So the way it works is that ChangeProp sends an event to ORES. Then ORES either responds 204 or 200 [20:24:45] i think i am lacking some knowledge here [20:24:46] :) [20:24:50] If 200, there's some stuff that got scored in response to that event. [20:25:06] If 204, ORES didn't do anything (for whatever reason). [20:25:17] 204 means that it chose not to score? [20:25:24] 204 usually means "we don't support that wiki", or "we support that wiki, but we're not scoring this edit" [20:25:27] ok [20:25:52] buy you said ORES will score anything if it is asked? is that a different method than what changeprop is using? [20:25:58] but* [20:26:06] ottomata, yeah. Different method. [20:26:08] ko [20:26:09] ok [20:26:28] so we'd want to emit a revision-score event from change prop if ORES responds 200 [20:26:29] User interface is /enwiki/?revids=56789&models=damaging|goodfaith [20:26:50] ChangeProp interface is /precache/ [20:27:20] ottomata, yeah. Assuming that we'll define the event as "a score with revision stuff attached" [20:27:20] and that will be all revisions with scores that ORES can generate useful scores for [20:27:33] as opposed to "revision stuff with maybe a score attached" [20:27:54] Useful is complex to think about. [20:28:07] ya, halfak i argued with marko about this when we were doing the schema for a while. i like the idea of revision with score attached, because from my POV the score belongs to a revision. [20:28:21] ORES scores are useful for a lot of edits we don't score, but I guess less directly useful (or simply too high in capacity) [20:28:28] the only reason it doesn't have a score in revision-create, is because it takes too long to score, [20:28:36] so for technical reasons we don't include it [20:28:41] ottomata, that's not too crazy in my view.
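The on-demand scoring halfak describes (query ORES per revision rather than joining streams; concurrent requests for the same revision are deduplicated server-side and block until the score is ready) could look roughly like this. A minimal sketch: the /v3/scores root is an assumption, while the wiki/revids/models shape follows the /enwiki/?revids=56789&models=damaging|goodfaith example above:

    import requests

    ORES = "https://ores.wikimedia.org/v3/scores"  # assumed public scores root

    def score_revision(wiki, rev_id, models=("damaging", "goodfaith")):
        """Ask ORES to score one revision; the request blocks until ORES has a score."""
        resp = requests.get(
            "{}/{}/".format(ORES, wiki),
            params={"revids": rev_id, "models": "|".join(models)},
            headers={"User-Agent": "ores-on-demand-sketch/0.1"},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()  # nested per-wiki, per-revision, per-model scores

    # e.g. score_revision("enwiki", 56789) inside the stream loop sketched earlier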
[20:28:50] semantically it'd be nice if a revision just had scores [20:28:57] So -- essentially, don't add the "score" field to revision create if you get a 204. [20:29:03] Otherwise, just dump the "score" field in there [20:29:06] aye [20:29:18] i mean, the alternative is to make a totally different schema focused on the scores [20:29:21] or rewrite it to ottomata-weight-schema-0.0.1 :P [20:29:22] (which sounds like what you wanted in the first place?) [20:29:32] *weird [20:29:39] heheh [20:30:06] i guess having the schema as a revision schema with scores is useful, then someone consuming the stream gets lots of nice metadata. [20:30:09] Just so long as we rename the event to revision-with-scores-maybe [20:30:46] well, we can't replace revision-create with the union of revision-create+scores, because uhhhh you need it for ORES? [20:30:47] :) [20:30:48] right? [20:31:04] lol yup. [20:31:06] so yeah, if we did union, we'd have to spend 3 weeks bikeshedding a name [20:31:10] Don't take my revision-create away! [20:31:13] revision-create-with-scores [20:31:15] or [20:31:27] revision-score (which looks like revision-create, but only contains events that got scored) [20:31:36] revision-create-with-scores-or-maybe-an-error-while-generating-the-score [20:31:39] hah [20:31:40] yeah [20:31:45] Pchelolo: what do you think? [20:31:54] should we just make revision-score stream, that only has scored events? [20:32:04] Because, BTW, you could score a single revision-create and get two good scores and one that errors. [20:32:07] ottomata: oh... Why did I even open the tab with this channel... [20:32:10] hahaha [20:32:14] lol [20:32:43] feel free to delay and respond on https://phabricator.wikimedia.org/T167180 [20:32:44] :) [20:33:34] thanks halfak you've at least convinced me that it might not be necessary to have a revision-create-with-scores, because someone can just query ORES in response to revision-create [20:33:35] :/ [20:33:38] It's all about the use-cases. I'm not sure whether the primary use-case for that would be 'tool previously using revision-creates wants to include scores' or 'some new tool that's all about the scores and it doesn't want unscored revision-create' [20:34:01] hmmm, maybe we should ask those ERI folks again...or just re-read some of those phab tickets [20:34:37] the first use-case is easily solvable by an API call, while the second use-case will just use API call and filtering [20:34:48] https://phabricator.wikimedia.org/T145164#2698884 [20:34:51] FWIW, I think both potential approaches aren't crazy [20:35:05] So I won't be trolling y'all whichever way you go [20:35:27] https://phabricator.wikimedia.org/T143743 [20:35:32] ok thanks halfak :) [20:37:41] ottomata: that comment doesn't really answer any of our questions.. [20:38:54] nope! [20:39:01] just providing the references i'm reading [20:40:44] Pchelolo: this one is more relevant: https://phabricator.wikimedia.org/T143743#2966929 "Proposal (as Otto understands it): [20:40:44] " [20:41:01] so, we could just make a pared-down 'revision-score' stream, that has much less info than revision-create [20:41:03] and expose that [20:41:05] on its on [20:41:06] own [20:41:06] AND [20:41:16] create a stream endpoint that unions the revision-create and revision-score topics [20:41:25] that's kinda what we decided back at mw dev summit [20:42:41] time erases all memory, are we having the same discussion again?!
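A tiny illustration of the "revision stuff with maybe a score attached" shape being weighed here, using made-up field names rather than the real schemas:

    def attach_scores(revision_event, ores_scores):
        """Sketch of the union idea halfak describes: copy the revision-create
        event and, when ORES answered 200 with scores, add a 'scores' field;
        on a 204 (nothing scored) the event passes through unchanged."""
        event = dict(revision_event)       # don't mutate the original event
        if ores_scores is not None:        # None stands in for a 204 response
            event["scores"] = ores_scores  # hypothetical field name
        return event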
[20:42:49] maybe we should just make revision-score all about scores, [20:42:59] and then make eventstreams expose a union stream of multiple topics [20:44:36] not sure about the union thing, but it seems having a stream all about scores is more useful than just extending the rev-create one. [20:44:44] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422716 (10Ottomata) Hahahaha, yeah I think so. Can easily get this from MW when the event is emitted? [20:45:30] 'more'? maybe. in that it will be cleaner and less redundant, but maybe less useful on its own, because it has less info [20:45:38] oooook, i'm going to summarize this in that ticket, see what marko thinks [20:45:45] maybe we'll just make revision-score all about scores :) [20:46:29] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422718 (10Pchelolo) It seems so from [[ https://github.com/wikimedia/mediawiki/blob/b95f7a6b07a3e3c0102693659c0d3cdb400e8e87/includes/api/ApiQueryProtect... [20:49:49] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3422727 (10Ottomata) Just had a little chat with @halfak and @Pchelolo in IRC. We've actually discussed this be... [20:51:30] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3416919 (10Jdlrobson) The upstream bug is here: https://bugzilla.mozilla.org/show_bug.cgi?id=1379762 and has been... [21:03:44] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3422762 (10mobrovac) Yes, that's what I had in mind should happen here. At some point in the discussion I think... [21:07:29] bye team! [21:10:59] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422800 (10Pchelolo) Hm, not as trivial as I thought. In #eventbus we rely on the core `RecentChange` behavior, using the default formatter and sending it... [21:16:54] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Add parsedcomment to recentchange stream - https://phabricator.wikimedia.org/T170145#3422811 (10Ottomata) Especially now that revision-create is available in EventStreams (we should do an announcement about this, maybe after we settle the... [21:20:38] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3422819 (10Ottomata) Yeah, time erases my memory. Good thing we have phabricator! Ok, I'll take a pass at the... [21:23:56] Hello! I have a question about XML revision parsing code; I'd like to turn the revisions XML dump into JSON that I can load into bigquery... each file is about 11GB of XML, which is a bit much for standard XML parsers...
does anyone know of existing tools that might help me (beyond core SAX parsing frameworks) [23:16:11] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3423261 (10Krinkle) >>! In T170018#3417479, @Jdlrobson wrote: > @krinkle this is happening in latest Firefox. It'... [23:48:12] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 2 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3423364 (10Jdlrobson) [23:54:57] nuria_: are you around? have some questions about tagging functionality
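On the XML question near the end of the log: for an 11GB dump the usual approach is incremental parsing, either with a purpose-built reader (the mwxml python package, if memory serves) or with xml.etree.ElementTree.iterparse, clearing finished elements so memory stays flat. A rough sketch that emits newline-delimited JSON (one object per <revision>, with made-up output field names) that BigQuery can load:

    import json
    import sys
    from xml.etree import ElementTree as ET

    def localname(tag):
        # e.g. '{http://www.mediawiki.org/xml/export-0.10/}revision' -> 'revision'
        return tag.rsplit("}", 1)[-1]

    def dump_to_ndjson(xml_path, out):
        """Stream a pages dump and write one JSON object per revision."""
        context = ET.iterparse(xml_path, events=("start", "end"))
        _, root = next(context)              # keep the root so finished pages can be freed
        page_title = None
        for event, elem in context:
            if event != "end":
                continue
            tag = localname(elem.tag)
            if tag == "title":
                page_title = elem.text
            elif tag == "revision":
                record = {"page_title": page_title}
                for child in elem:
                    name = localname(child.tag)
                    if name in ("id", "timestamp", "comment", "text"):
                        record[name] = child.text
                out.write(json.dumps(record) + "\n")
            elif tag == "page":
                root.clear()                 # drop the finished <page> subtree

    if __name__ == "__main__":
        dump_to_ndjson(sys.argv[1], sys.stdout)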