[02:55:47] (PS1) TerraCodes: git.wikimedia.org -> phab [analytics/refinery/source] - https://gerrit.wikimedia.org/r/389906 (https://phabricator.wikimedia.org/T139089)
[03:14:25] (PS1) TerraCodes: git.wikimedia.org -> phab [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089)
[04:02:50] (CR) Chad: [C: 2] git.wikimedia.org -> phab [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[04:02:55] (CR) Chad: [C: 2] git.wikimedia.org -> phab [analytics/refinery/source] - https://gerrit.wikimedia.org/r/389906 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[04:03:04] (CR) jerkins-bot: [V: -1] git.wikimedia.org -> phab [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[04:07:14] (Merged) jenkins-bot: git.wikimedia.org -> phab [analytics/refinery/source] - https://gerrit.wikimedia.org/r/389906 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[04:39:23] (CR) TerraCodes: "recheck" [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[08:43:33] morning people!
[08:43:58] I'd need to reboot all the aqs nodes for jvm+kernel updates
[09:23:53] (CR) DCausse: [C: 1] "I think this should be ready to go, this code is being actively used since months to generate click logs. Any objections to merge?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T162054) (owner: EBernhardson)
[09:36:23] Hi elukey - I'm around if you need me
[09:37:07] (CR) Gehel: [C: 1] "No objection to merging from me..." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T162054) (owner: EBernhardson)
[09:39:33] joal: o/ - is there any cassandra loading that might happen soon? (didn't check)
[09:40:03] elukey: At fix hour normally, so we should be safe
[09:42:33] I am still working on another thing (restbase dying on some hosts) so not sure when I'll start
[09:45:55] elukey: we load hourly on cassandra - there's only one job - you can follow it here: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0013448-170621131133576-oozie-oozi-C/
[09:47:43] sure sure, I was about to say that I'd disable it for precaution
[09:47:47] will do it later on
[10:04:43] !log suspended cassandra-coord-pageview-per-project-hourly as prep step to reboot aqs nodes - T179943
[10:04:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:04:48] T179943: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943
[10:23:28] all right so aqs1004 rebooted, cassandra instances booting u
[10:23:30] *up
[10:33:57] proceeding with 1005
[10:44:23] now 1006
[10:54:13] now 1007 :)
[11:05:47] Analytics, DBA, Operations, User-Elukey: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3743903 (elukey) Just sent a summary of what's happening to engineering@ and analytics@, new deadlines: ``` - November 13th: the analytics-slave CNAME move...
[11:07:25] now 1008
[11:10:14] in the meantime I've sent the email about the databases to engineering@ and analytics@
[11:12:03] Analytics, EventBus, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3743921 (Pchelolo)
[11:13:14] Analytics, EventBus, Easy, Services (done): EventBus logs don't show up in logstash - https://phabricator.wikimedia.org/T153029#3743935 (Pchelolo) Open>Resolved The EventBus logs were fixed and now can be seen in log stash. Resolving.
[11:22:17] Analytics, EventBus, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3743921 (mobrovac) We have 16 CPU cores with HyperThreading enabled on each node, but only 8 EventBus proxy service workers, so one immediate thing that we need to do is to increas...
[11:27:54] aaaaan finally aqs restarts done
[11:28:33] !log resumed cassandra-coord-pageview-per-project-hourly after maintenance to aqs hosts
[11:28:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:29:08] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3744020 (elukey)
[12:21:17] * elukey lunch!
[14:48:01] Analytics, EventBus, Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3744585 (Ottomata) +1 ^ There's also this old unmerged patch: https://gerrit.wikimedia.org/r/#/c/302372/ However, when I tested at the time, it didn't show...
[15:16:55] !log deploying eventlogging analytics change for eventcapsule schema fixes, will be no-op until we deploy puppet changes too
[15:16:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:41:26] Analytics, Analytics-Cluster, Analytics-Wikistats: Design map visualization on UI - https://phabricator.wikimedia.org/T175422#3592991 (fdans) https://wikitech.wikimedia.org/wiki/Analytics/Wikistats2.0/Map_Component
[15:52:56] (PS1) Chad: Swap git.wikimedia.org -> phabricator.wikimedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/390003 (https://phabricator.wikimedia.org/T139089)
[15:52:59] (CR) Chad: [C: 2] Swap git.wikimedia.org -> phabricator.wikimedia.org [analytics/refinery/source] - https://gerrit.wikimedia.org/r/390003 (https://phabricator.wikimedia.org/T139089) (owner: Chad)
[15:57:52] (CR) Nuria: [C: 2] UDF for extracting primary full text search request [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T162054) (owner: EBernhardson)
[15:59:33] PROBLEM - EventLogging overall insertion rate from MySQL consumer on graphite1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [10.0]
[16:00:00] (PS1) Chad: Fix git.wikimedia.org -> phabricator.wikimedia.org [analytics/geowiki] - https://gerrit.wikimedia.org/r/390006 (https://phabricator.wikimedia.org/T139089)
[16:00:02] (CR) Chad: [C: 2] Fix git.wikimedia.org -> phabricator.wikimedia.org [analytics/geowiki] - https://gerrit.wikimedia.org/r/390006 (https://phabricator.wikimedia.org/T139089) (owner: Chad)
[16:00:13] (CR) jerkins-bot: [V: -1] Fix git.wikimedia.org -> phabricator.wikimedia.org [analytics/geowiki] - https://gerrit.wikimedia.org/r/390006 (https://phabricator.wikimedia.org/T139089) (owner: Chad)
[16:00:41] (CR) jerkins-bot: [V: -1] Fix git.wikimedia.org -> phabricator.wikimedia.org [analytics/geowiki] - https://gerrit.wikimedia.org/r/390006 (https://phabricator.wikimedia.org/T139089) (owner: Chad)
[16:27:48] RECOVERY - EventLogging overall insertion rate from MySQL consumer on graphite1001 is OK: OK: Less than 20.00% under the threshold [50.0]
[16:37:42] Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#3744994 (Nuria) @JMinor I find yours quite an unfair representation of events. Please be so kind to invite us to the meeting you have with legal on this regard. > Plea...
[16:39:30] (Abandoned) Chad: Fix git.wikimedia.org -> phabricator.wikimedia.org [analytics/geowiki] - https://gerrit.wikimedia.org/r/390006 (https://phabricator.wikimedia.org/T139089) (owner: Chad)
[16:42:40] nuria 'change has been reverted'? send me patch i will merge! :)
[16:52:57] ottomata: sorry, here it is: https://gerrit.wikimedia.org/r/#/c/390019/
[16:53:20] ottomata: looking at https://gerrit.wikimedia.org/r/#/c/389861/
[16:53:26] ottomata: issue was empty objects?
[16:53:50] ottomata: or you are just being triple safe
[16:55:11] nuria, i'm not 100% sure what the issue was, the logging wasn't good enough. this patch improves some logging, but also keeps the process from dying
[16:56:48] nuria_: fdans, qq
[16:56:49] https://github.com/wikimedia/eventlogging/blob/master/eventlogging/utils.py#L328-L329
[16:57:01] i'm not sure what's better to do here
[16:57:12] so I modified the EventCapsule schema to specify the parsed ua object schema
[16:57:16] https://meta.wikimedia.org/w/index.php?title=Schema:EventCapsule
[16:57:19] use userAgent here
[16:57:28] however, since some values can be null
[16:57:30] like os_minor
[16:57:34] this will fail validation
[16:57:36] so, either:
[16:57:36] ottomata: yes
[16:57:41] parse_ua() should set these to "-"
[16:57:46] or
[16:57:50] i should just remove the userAgent object schema from the capsule
[16:57:52] and let it be free form
[16:57:55] that might be easier
[16:58:20] whatcha think?
[16:58:28] ottomata: Os-minor:null will fail validation?
[16:58:41] ottomata: even when present?
[16:58:59] ottomata: cause the key "os_minor" always appears
[16:59:17] yes
[16:59:23] because the type of os_minor
[16:59:26] is "string"
[16:59:27] not
[16:59:30] ["string", "null"]
[16:59:31] and
[16:59:49] since EventLogging extension code keeps us from using union types in the wiki schema editor
[16:59:53] if we wanted to support null
[17:00:01] we'd have to change the type of the fields to "any"
[17:00:38] ottomata: i see, we did it that way to match what we do on hive
[17:00:45] aye
[17:00:54] hm, i think we should change capsule to not specify properties then
[17:00:57] fdans: batcave for map?
[17:01:05] nuria_: agree?
[17:01:05] we're there!
[17:01:06] (PS1) TerraCodes: git.wikimedia.org -> phab [analytics/wikistats] - https://gerrit.wikimedia.org/r/390035 (https://phabricator.wikimedia.org/T139089)
[17:01:17] ottomata: as in userAgent any?
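The validation problem discussed above (a parsed user agent whose `os_minor` is null failing a property declared as `"string"`) can be illustrated with a minimal, stdlib-only Python sketch. It hand-rolls only the JSON Schema `type` keyword instead of using a real validator. The schema fragments, the sample user-agent dict (apart from `os_minor`), and the helpers `matches_type`, `invalid_fields` and `nulls_to_dash` are all hypothetical; this is not EventLogging's actual `parse_ui()`-style code or the real EventCapsule schema.

```python
from typing import Any, Dict, List, Union

# Subset of JSON Schema primitive type names mapped to Python types.
# Only the names used in this sketch are covered.
JSON_TYPES = {"string": str, "null": type(None)}


def matches_type(value: Any, schema_type: Union[str, List[str]]) -> bool:
    """Mimic the JSON Schema "type" keyword: a single name or a list (union)."""
    names = [schema_type] if isinstance(schema_type, str) else schema_type
    return any(isinstance(value, JSON_TYPES[name]) for name in names)


def invalid_fields(obj: Dict[str, Any], properties: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names of declared properties whose values fail their type."""
    return [
        key
        for key, spec in properties.items()
        if key in obj and not matches_type(obj[key], spec["type"])
    ]


def nulls_to_dash(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical normalization: replace every null value with "-"."""
    return {key: "-" if value is None else value for key, value in obj.items()}


# Hypothetical parsed user agent in which os_minor could not be determined.
parsed_ua = {"os_family": "Android", "os_major": "7", "os_minor": None}

strict = {key: {"type": "string"} for key in parsed_ua}           # plain "string" typing
union = {key: {"type": ["string", "null"]} for key in parsed_ua}  # union type
free_form: Dict[str, Dict[str, Any]] = {}                         # no declared properties

print(invalid_fields(parsed_ua, strict))                 # ['os_minor']
print(invalid_fields(parsed_ua, union))                  # []
print(invalid_fields(nulls_to_dash(parsed_ua), strict))  # []
print(invalid_fields(parsed_ua, free_form))              # []
```

The four checks line up with the options raised in the channel: a plain `"string"` type rejects the null, replacing nulls with `"-"` lets the record pass, and leaving `userAgent` free form (no declared properties) leaves nothing to reject. The `["string", "null"]` union is shown only for comparison, since the wiki schema editor reportedly does not allow union types.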
[17:01:22] yup
[17:01:45] ottomata: one sec map meeting
[17:03:37] (CR) Chad: [V: 2 C: 2] git.wikimedia.org -> phab [analytics/wikistats] - https://gerrit.wikimedia.org/r/390035 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[17:03:53] (CR) jenkins-bot: git.wikimedia.org -> phab [analytics/wikistats] - https://gerrit.wikimedia.org/r/390035 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[17:31:56] nuria_: if ok with you i'm going to do ^^ to capsule
[17:32:25] ottomata: i need to think a bit about pros/cons
[17:34:15] ok, lemme know if you want to discuss in bc or something, and trying to make moves on this :-
[17:34:17] :)
[17:55:14] Analytics-EventLogging, Analytics-Kanban: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3728047 (Nuria) Per conversation in person: let's make sure that when/if we remove event capsule from meta (so it does not exists i...
[18:00:14] nuria_: did you mean to post that on T179836?
[18:00:15] T179836: Remove EL capsule from meta and add it to codebase - https://phabricator.wikimedia.org/T179836
[18:06:23] Analytics-EventLogging, Analytics-Kanban: Remove EL capsule from meta and add it to codebase - https://phabricator.wikimedia.org/T179836#3745241 (Nuria) Per conversation in person: let's make sure that when/if we remove event capsule from meta (so it does not exists in schema form), when we do we documen...
[18:13:18] * elukey off!
[18:50:40] Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#3745575 (JMinor) Thank you for holding this. > We would really like to clarify with your team that the privacy policy is driven by legal, abided by WMF and enforced...
[19:25:49] team, EventLoggingToDruid has loaded 1 hour of Popups data to Druid/Pivot and looks OK so far: https://tinyurl.com/y9de3maw
[19:27:39] the command I used: https://pastebin.com/uyeLY2NC
[19:45:39] * mforns off!
[19:56:10] HaeB: this is just a prototype ^ but looking great (tinyurl is a link to pivot)
[20:02:40] ottomata: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Eventlogging_Capsule
[20:02:53] cc HaeB https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Eventlogging_Capsule
[20:04:47] HaeB: reading ticket and seeing this: https://meta.wikimedia.org/w/index.php?title=Schema:EventCapsule&diff=prev&oldid=16479585
[20:05:03] HaeB: i think points to the misunderstanding about capsule
[20:05:30] HaeB: it does not describe data as it is on mysql (it never has) it describes data as it is digested by system
[20:05:47] HaeB: on intake
[20:06:08] nuria_: i already commented on this at https://phabricator.wikimedia.org/T179540#3734357
[20:07:26] HaeB: my point is that i think your original commit tries to change capsule according to what you see on Mysql Db
[20:07:52] nuria_: cool, edited a bit: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Eventlogging_Capsule
[20:08:00] HaeB: but capsule does not represent the data as it is present on db, i think that is where the confusion might be
[20:08:25] TLDR: the page was wrong or misleading anyway (no milliseconds even in the internal format), and i don't think we ever specified whether the schema pages on meta should document the mysql version or something else, simply because this kind of discrepancy normally doesn't happen
[20:09:16] also, the bug is mainly about the discovery of that discrepancy in the first place ;)
[20:09:24] thanks for adding the documentation on wikitech
[20:10:29] what's more, for regular schema pages it's quite normal for the page on meta to be ahead of the code (that's why the code refers to revision numbers)
[20:11:31] milimetric: (popups/druid) looks great! this could become very useful
[20:11:53] it's all Marcel, HaeB, I was just pinging you in case you didn't see his message
[20:12:49] HaeB: again, the capsule works a bit differently but it shouldn't matter, for the purpose of documenting what is on database tables wikitech will suffice
[21:37:47] Analytics-EventLogging, Analytics-Kanban: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3745960 (Ottomata) Talked with Nuria and Dan a bit today, and I feel like I should say a little more about this. The JSON Refineme...
[21:46:48] Analytics, DBA, Operations, User-Elukey: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2987618 (Tgr) Does that mean it's not going to be possible to JOIN EventLogging tables with MediaWiki tables in the future? I'm not working with user-facing...
[21:52:36] Analytics, DBA, Operations, User-Elukey: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3746008 (Nuria) @Tgr What types of selects were you doing? We think the best place to do this type of joining is hadoop, we are working into refining EL d...
[22:01:36] milimetric: I think we are unbundling semantic plain (not minimized) , does that ring a bell at all?
[22:02:14] in dev, yes, in prod it’s all minimized
[22:04:31] milimetric: npm run runs prod build right?
[22:05:01] milimetric: sorry, npm run build
[22:07:58] nuria yes
[22:08:16] and puts it in /dist
[22:08:57] so if you found unminimized css in the prod build it’s a mistake
[22:09:22] milimetric: no i did not
[22:09:37] milimetric: i thought it was the same build but i see there are two configs for webpack
[22:10:06] yes, they both use the base config and override/add
[22:25:31] elukey: are you around? :)
[22:52:21] lzia: he is gone to bed hours ago
[22:52:27] lzia: do you need help?
[22:53:10] got you, nuria_. yes. I want to understand something better. so when log is dropped and all future EL events will be stored and processed in hadoop, what will exactly happen to old tables? do they get transferred?
[22:53:25] nuria_: sorry if this is super obvious and/or discussed somewhere already.
[22:53:38] lzia: no, no changes are happening to mysql backend as of now
[22:53:58] lzia: luca's e-mail is about replacing mysql host with newer hardware
[22:53:59] I see: so no tables or data will be dropped, nuria_. Correct?
[22:54:04] lzia: right
[22:54:08] ooo. got you. perfect. thanks.
[22:54:17] lzia: there is an ongoing project to refine el data in hadoop
[22:54:24] lzia: which is happening in parallel
[22:54:57] lzia: the main difference is that EL db and mediawiki db will not be located in the same box any more
[22:56:29] yup. which brings its own challenges, but at least I can be sure that I don't lose data. ;) thanks nuria_
[22:56:49] lzia: right, if anything accessing EL data will be a lot faster