[03:29:57] (CR) Nuria: "Thanks for fixing this" [analytics/refinery] - https://gerrit.wikimedia.org/r/362614 (owner: Joal)
[06:59:31] (PS2) Amire80: [WIP] Interlanguage links SQL [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/362877 (https://phabricator.wikimedia.org/T158835)
[08:42:50] joal: goooood morning
[08:42:59] ready to merge your puppet changes?
[08:50:35] Hi elukey :)
[08:50:37] REady !
[08:50:43] oops, no actually :)
[08:50:50] we need to deploy cluster first
[08:51:10] I'm finishing some admin stuff (beginning of month ...) then I'll deploy
[08:52:28] sure sure no rush, we can do it later :)
[10:39:57] Analytics: wikipedia.org doesn't work, www.wikipedia.org does - https://phabricator.wikimedia.org/T169513#3400291 (Milimetric)
[10:48:50] Analytics: wikipedia.org doesn't work, www.wikipedia.org does - https://phabricator.wikimedia.org/T169513#3400291 (elukey) Do you have a timeframe? Moreover, is the issue still occurring? :)
[10:55:59] * elukey lunch!
[11:53:02] Analytics: Add page title/path to Daily Pageviews data on Druid/Pivot - https://phabricator.wikimedia.org/T169524#3400618 (Gilles)
[12:25:53] Analytics, Analytics-Cluster, Operations, Patch-For-Review: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3400700 (ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['stat1005....
[12:50:00] hellooooo
[12:55:17] Analytics, Analytics-Cluster, Operations, Patch-For-Review: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3400764 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['stat1005.eqiad.wmnet'] ``` Of which those **FAILED**: ```...
[13:11:19] elukey@stat1005:~$ cat /etc/debian_version
[13:11:20] 9.0
[13:11:22] \o/
[13:11:28] first stretch host in analytics :)
[13:48:15] mforns: on db1047 we are 142/216 done
[13:48:19] (alter tables)
[13:48:29] elukey, cool
[13:48:31] so I guess that in a couple of days we should be ok
[13:48:37] awesome
[13:48:38] leaving the other big tables aside
[13:48:42] aha
[13:56:14] * elukey brb (ice cream :)
[14:46:05] Analytics: wikipedia.org doesn't work, www.wikipedia.org does - https://phabricator.wikimedia.org/T169513#3400291 (Nuria) This seems to be working fine so we can probably close this.
[14:46:12] Analytics: wikipedia.org doesn't work, www.wikipedia.org does - https://phabricator.wikimedia.org/T169513#3401071 (Nuria) Open>Resolved
[14:53:34] wow alter tables on dbstore1002 are way faster
[14:53:39] poor db1047
[15:09:57] Analytics: Add page title/path to Daily Pageviews data on Druid/Pivot - https://phabricator.wikimedia.org/T169524#3400618 (JAllemandou) The use case is indeed interesting, but the page_title dimension (or path) in pageview is too heavy of a dimension to fit in Druid with the cluster we have. To be precise, t...
[15:43:06] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3401289 (mobrovac) Note that `revision-create` and `page-create` use the same schema, so all of the eve...
[16:04:56] Analytics: Add page title/path to Daily Pageviews data on Druid/Pivot - https://phabricator.wikimedia.org/T169524#3400618 (Nuria) >I believe we currently only have those top lists per wiki. And even then, the bot traffic heavily distorts the real data Declining, as this is something relatively easy to do in...
[16:05:03] Analytics: Add page title/path to Daily Pageviews data on Druid/Pivot - https://phabricator.wikimedia.org/T169524#3401405 (Nuria) Open>declined
[16:06:05] Analytics, Analytics-Dashiki: Site for Wikimedia Analytics lacks clear license - https://phabricator.wikimedia.org/T169270#3393022 (Nuria) Do you mean on the source? or pages themselves? Would it be sufficient to add a license in the footer?
[16:06:19] Analytics, Easy: Site for Wikimedia Analytics lacks clear license - https://phabricator.wikimedia.org/T169270#3401410 (Nuria)
[16:06:52] Analytics-Kanban, Documentation, Services (watching): Document revision-create event for EventStreams - https://phabricator.wikimedia.org/T169245#3401414 (Nuria) a:Ottomata
[16:11:29] Analytics-Kanban: Final Vetting of Family Wide unique devices data - https://phabricator.wikimedia.org/T169550#3401448 (Nuria)
[16:14:26] Analytics-Kanban, Patch-For-Review: Modify EL purging script to not use limit/offset - https://phabricator.wikimedia.org/T168071#3401472 (Nuria) Open>Resolved
[16:14:41] Analytics-Kanban: Modify EventLogging so that all table fields are nullable - https://phabricator.wikimedia.org/T167161#3401473 (Nuria) Open>Resolved
[16:14:52] Analytics-Kanban: Extraneous whitelist items for WikimediaBlogVisit schema - https://phabricator.wikimedia.org/T168475#3401476 (Nuria) Open>Resolved
[16:15:26] mforns: let me know your thoughts on the tagging
[16:15:43] nuria_, sure, will review the changes
[16:17:07] elukey: Got an answer on sqoop failure
[16:18:15] elukey: 2 wikis had their revision imports fail because of network errors, and sqoop-rerun fails because of a classical no-rewrite hadoop thing
[16:18:34] elukey: There is also me messing up my deployment on 2 other tables, but that's something else
[16:20:54] elukey: I will launch manual import for those tables
[16:22:21] joal: ok!
[16:24:58] elukey: Did we experience network issues yesterday?
[16:25:11] not that I know of
[16:40:29] !log Manually launch sqoop imports for enwiki revision, and wikidatawiki revision and logging tables, snapshot=2017-06
[16:40:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:43:25] elukey: I'm planning the deploy for tomorrow morning if ok for you
[16:43:52] elukey: I'll have a patch for sqoop
[16:44:04] going for dinner, will be back after a-team
[16:45:18] Analytics-Dashiki, Analytics-Kanban, Patch-For-Review: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233#3401672 (Nuria)
[16:46:54] Analytics-Dashiki, Analytics-Kanban, Patch-For-Review: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233#3061266 (Nuria) I leave this up to @Milimetric but see my comments above. Eventlogging is not well suited (at all) to capture error logging or free text. I understand...
[16:47:36] joal: sure, let's also change the yarn queue tomorrow?
[16:51:51] Analytics: Provide top domain and data to truly test superset - https://phabricator.wikimedia.org/T166689#3304286 (Nuria) See : https://phabricator.wikimedia.org/T166414 maybe we can use navigationTiming data also to test UI?
[16:52:30] Analytics: Pull data for edit reconstruction from labs and push it back after reconstruction - https://phabricator.wikimedia.org/T152788#3401707 (Nuria) I think this task can be closed now right @JAllemandou ?
[16:54:19] Analytics, Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension. {pika} - https://phabricator.wikimedia.org/T89397#3401715 (Nuria) This is likely to change with RFC about redirects currently in place. Moving to q2
[16:55:12] Analytics: Making tests environment for pageview API deployments - https://phabricator.wikimedia.org/T131773#3401724 (Nuria) Open>Resolved
[16:57:25] Analytics, Analytics-EventLogging, MediaWiki-extensions-General, Technical-Debt: JsonData and EventLogging have multiple classes with the same name - https://phabricator.wikimedia.org/T159079#3401737 (Nuria) Is this still an issue with recent work done in JsonExtension?
[17:00:06] Analytics: Ceate tag "Analytics-Data-Quality" on Phabricator - https://phabricator.wikimedia.org/T169560#3401740 (Nuria)
[17:00:50] Analytics-Kanban: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#3401753 (Nuria)
[17:03:32] nuria_, tagging changes look very good to me
[17:03:41] if you are OK I will merge
[17:03:51] Analytics: Add page title/path to Daily Pageviews data on Druid/Pivot - https://phabricator.wikimedia.org/T169524#3401761 (Gilles) Probably, I haven't used it yet
[17:13:35] all right going offline people!
[17:13:39] ttl!
[17:13:41] * elukey afk!
[17:43:06] mforns_brb: sounds good, no other todo's you can think of?
[17:47:49] Analytics: Pull data for edit reconstruction from labs and push it back after reconstruction - https://phabricator.wikimedia.org/T152788#3401907 (JAllemandou) @Nuria : The pull from labs is done, the push is not. So if we want to close, let's scope-down this one and create a new one about pushing to labs.
[18:15:37] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3402098 (kaldari) @mobravac: There are 2 different definitions of "event" here. An "on-wiki event" and...
[18:15:45] @nuria_ : You around?
[18:15:52] yessir
[18:20:38] nuria_, no I think all is good in the code, after that we'll need to add some docs to wikitech regarding tagging, but apart from that, no
[18:20:42] so, will merge
[18:20:48] (CR) Mforns: [C: 2] UDF to tag requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/353287 (https://phabricator.wikimedia.org/T164021) (owner: Nuria)
[18:20:53] mforns: k
[18:21:22] (CR) Mforns: [V: 2 C: 2] UDF to tag requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/353287 (https://phabricator.wikimedia.org/T164021) (owner: Nuria)
[18:24:33] nuria_: I had a double thought on the banner-spark-streaming job
[18:25:12] nuria_: Given we have not merged the code, I wonder about making a script
[18:42:55] ok, in addition to my mess with sqoop, something is wrong with the DBs. Need to talk to DBAs
[18:43:06] gone for tonight a-team
[18:43:41] see you tomorrow joal!
[18:47:25] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3402155 (Nuria) Notes for @ottomata and @Nuria . * Since we are consuming directly from the kafka topi...
[18:51:04] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3402173 (Nuria) >kafkacat -b kafka1012.eqiad.wmnet:9092 -t eqiad.mediawiki.page-create Lists all cor...
[18:55:51] Analytics: Pull data for edit reconstruction from labs - https://phabricator.wikimedia.org/T152788#3402181 (Nuria) Open>Resolved
[18:56:29] Analytics: Pull mediawiki edit data into labs. Public edit data lake - https://phabricator.wikimedia.org/T169572#3402184 (Nuria)
[19:02:48] mforns: did you deploy eventlogging last with andrew?
[19:38:40] nuria_, no, andrew deployed it
[19:38:43] I was not there
[19:38:47] why?
[19:39:03] sorry, missed the ping
[19:39:24] mforns: cause there is a bug with our current code and for a moment i was wondering if we were running latest but i think we are
[19:39:50] nuria_, what's the problem?
[19:39:57] is there a task?
[19:40:45] mforns: you can see the last couple of comments here: https://phabricator.wikimedia.org/T150369
[19:41:12] mforns: but basically eventbus events are inserted (or should be inserted) according to "topic" not schema
[19:41:20] mforns: but that is not happening
[19:41:31] mforns: reproducible in beta too so we should be able to fix it
[19:41:54] nuria_, reading
[19:50:05] nuria_, understand
[19:50:16] will look at the code
[19:50:54] mforns: ok, thank you, i could use another pair of eyes cause i looked at code for a while and i do not see it
[19:51:02] k
[19:51:10] which is the patch you were looking at?
[19:51:19] mforns: https://gerrit.wikimedia.org/r/#/c/360698/2/eventlogging/jrm.py
[19:51:27] mforns: but there are a couple more
[19:52:33] https://gerrit.wikimedia.org/r/#/q/status:merged+project:eventlogging+branch:master
[20:12:49] nuria_, I think I might know what's happening
[20:12:56] mforns: oohhhh
[20:12:59] mforns: yes
[20:13:09] https://gerrit.wikimedia.org/r/#/c/360698/2/eventlogging/jrm.py line 297
[20:13:36] store_sql_events assumes that all events received as parameter belong to the same schema
[20:13:44] and uses the first one in the batch
[20:13:54] to determine which table to insert them into
[20:14:19] my guess is that those events come mixed in the batch
[20:14:24] because they share schema
[20:14:47] ah, i see, cause they
[20:14:51] and the one that happens to be the first in the batch dictates which table all of them are going to be inserted into
[20:14:53] are not mixed in kafka
[20:15:16] but maybe after we read them they all get "bundled" up
[20:15:23] I think the mysql consumer groups them by scid
[20:15:43] maybe, as they share schema, they are grouped together
[20:15:48] will continue digging
[20:17:11] mforns: that makes total sense
[20:17:35] mforns: if we add a page-create schema that is identical to revision-create schema in event bus it shall fix the issue
[20:17:55] mforns: and configure it here: https://github.com/wikimedia/mediawiki-event-schemas/blob/master/config/eventbus-topics.yaml
[20:18:12] nuria_, I guess so, but isn't that defeating the purpose of sharing schemas? you mean as a quick fix?
[20:18:26] mforns: yes, as a quick fix
[20:18:29] I see
[20:19:26] is this super-urgent? I mean, can I look at the handlers code, see if there's a quick patch, to avoid bad grouping?
[20:22:02] mforns: this is teh issue: https://github.com/wikimedia/eventlogging/blob/master/eventlogging/handlers.py#L481
[20:22:05] *the
[20:22:14] as it groups events by schema and revision
[20:22:20] nuria_, yes
[20:22:50] nuria_, but one question, how are eventBus events assigned a scid?
[20:24:19] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3402427 (Nuria) Thanks to @mforns for his insight, issue is with : https://github.com/wikimedia/eventlo...
[20:24:31] maybe we can add a new method in event.py called extended_scid, that not only returns the schemaName + revisionId
[20:24:47] but schemaName + revisionId + originTopic
[20:24:52] if available
[20:25:00] only for mysqlconsumer grouping
[20:25:32] mforns: let me look where that is assigned
[20:27:58] or better, we can get event.topic() and also group by that
[20:28:53] mforns: see: https://github.com/wikimedia/eventlogging/blob/master/eventlogging/event.py#L200
[20:29:16] nuria_, yes that seems OK
[20:29:22] mforns: ya
[20:29:35] it seems that eventBus events do have an scid
[20:29:39] as well
[20:29:58] mforns: i think it is ok for everything other than grouping to enter into DB
[20:30:37] yes
[20:30:49] mforns: ya, events are pretty similar w/o capsule
[20:30:52] https://www.irccloud.com/pastebin/JuRZkVwt/
[20:31:13] aha
[20:31:58] nuria_, do you understand the event.topic() method? https://github.com/wikimedia/eventlogging/blob/master/eventlogging/event.py#L110
[20:32:35] mforns: yes, it is for publishing to kafka
[20:32:56] ko
[20:32:58] ok
[20:33:02] mforns: eventlogging events are published with a different set of conventions than eventbus events
[20:33:18] mforns: this code is run as event is ingested
[20:33:23] I see
[20:33:54] mforns: see topic on event i just pasted: "topic": "mediawiki.page-create",
[20:34:10] such an identifier is not present in eventlogging events
[20:34:20] makes sense
[20:37:06] mforns: I think you are right, adding the topic to this grouping might be the way to go: https://github.com/wikimedia/eventlogging/blob/master/eventlogging/handlers.py#L481
[20:37:32] aha I'm trying to see if possible
[20:47:45] mforns: thank you, will check in later and can test in beta as needed
[20:47:58] k!
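Editor's note: the fix discussed above (group batches by topic as well as by schema and revision, so that no batch mixes events bound for different tables) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the eventlogging code: events are modelled as plain dicts with hypothetical "schema", "revision" and optional "topic" keys, `group_events` is a made-up helper name, and the sample schema names, revision numbers and the "mediawiki.revision-create" topic are invented for the example.

```python
from collections import defaultdict


def group_events(events, by_topic=True):
    """Batch events so that each batch maps to a single target table.

    Assumption: every event is a plain dict with hypothetical "schema",
    "revision" and optional "topic" keys. This is not the eventlogging API.
    """
    batches = defaultdict(list)
    for event in events:
        scid = (event["schema"], event["revision"])
        # Events without a topic (EventLogging-style) fall back to None.
        topic = event.get("topic") if by_topic else None
        batches[(scid, topic)].append(event)
    return dict(batches)


# Illustrative sample: two topics that share one schema, plus one event
# that carries no topic at all.
events = [
    {"schema": "revision-create", "revision": 1,
     "topic": "mediawiki.revision-create"},
    {"schema": "revision-create", "revision": 1,
     "topic": "mediawiki.page-create"},
    {"schema": "NavigationTiming", "revision": 7},
]

by_scid = group_events(events, by_topic=False)
by_scid_and_topic = group_events(events, by_topic=True)
```

Grouping by scid alone puts both topics into one batch, which is the situation where the first event in the batch would decide the table for all of them. Adding the topic to the key keeps them apart, and events without a topic still group by scid as before.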
[21:05:44] Analytics-Kanban, Reading-analysis: Final Vetting of Family Wide unique devices data - https://phabricator.wikimedia.org/T169550#3402589 (Tbayer)
[23:14:34] Analytics, Analytics-EventLogging, MediaWiki-extensions-General, Technical-Debt: JsonData and EventLogging have multiple classes with the same name - https://phabricator.wikimedia.org/T159079#3402787 (Krinkle) Open>declined >>! In T159079#3401737, @Nuria wrote: > Is this still an issue wi...
[23:14:44] Analytics, Analytics-EventLogging, Performance-Team, Technical-Debt: JsonData and EventLogging have multiple classes with the same name - https://phabricator.wikimedia.org/T159079#3402789 (Krinkle)