[00:01:58] Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T256356 (Groceryheist) I got reviews back today. They pretty positive, but one reviewer asked for additional summary statistics that weren't originally reported. I'm also thinking of adding a small analysis to the appendix...
[00:02:46] PROBLEM - Check the last execution of eventlogging_to_druid_netflow_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_netflow_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:03:02] PROBLEM - Check the last execution of eventlogging_to_druid_editattemptstep_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_editattemptstep_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:03:26] PROBLEM - Check the last execution of eventlogging_to_druid_prefupdate_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_prefupdate_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:06:48] PROBLEM - Check the last execution of eventlogging_to_druid_netflow-sanitization_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_netflow-sanitization_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:08:30] Analytics-Radar, Better Use Of Data, Product-Infrastructure-Data, Wikimedia-Logstash, and 4 others: Documentation of client side error logging capabilities on mediawiki - https://phabricator.wikimedia.org/T248884 (Jdlrobson) Thanks @nuria we're aware of this and actively using it in our error mon...
[00:10:46] PROBLEM - Check the last execution of eventlogging_to_druid_navigationtiming_daily on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_navigationtiming_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:31:04] good morning
[06:31:18] perfect, all timers that I changed yesterday broken
[06:33:11] ahhh no these are the daily ones, I have probably missed them
[06:54:56] RECOVERY - Check the last execution of eventlogging_to_druid_editattemptstep_daily on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_editattemptstep_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:00:15] !log suspend all druid-related coordinators in Hue as prep step for upgrade
[07:00:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:01:43] Analytics: Check home/HDFS leftovers of demon - https://phabricator.wikimedia.org/T259585 (MoritzMuehlenhoff)
[07:02:02] RECOVERY - Check the last execution of eventlogging_to_druid_navigationtiming_daily on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_navigationtiming_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:05:08] RECOVERY - Check the last execution of eventlogging_to_druid_netflow_daily on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_netflow_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:05:46] RECOVERY - Check the last execution of eventlogging_to_druid_prefupdate_daily on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_prefupdate_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:08:54] RECOVERY - Check the last execution of eventlogging_to_druid_netflow-sanitization_daily on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_netflow-sanitization_daily https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:29:42] !log stop kafka supervisor for netflow on Druid Analytics (prep step for druid upgrade)
[07:29:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:33:04] !log stop systemd timers related to druid on an-launcher1002
[07:33:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:33:10] fdans: o/
[07:33:37] hellooo
[07:34:04] hola
[07:35:28] fdans: so I think I have disabled all the indexation periodic jobs (timers and oozie coordinators)
[07:37:42] and from the overlord console I don't see indexation running
[07:37:55] ssh -L 8090:druid1001.eqiad.wmnet:8090 druid1001.eqiad.wmnet
[07:38:04] so I am going to start from historicals, one at a time
[07:38:09] fdans: ok to proceed?
[07:38:35] elukey: yesss lets go
[07:39:59] all right starting with 1001
[07:40:09] if you have questions or doubts please ask
[07:41:46] the historicals are the slowest ones to upgrade since we have to do one by one, and they all need to reload all segments from disk cache
[07:41:54] and announce them to the coordinator etc..
[07:42:03] ok so 1001 bootstrapped correctly, good sign
[07:47:50] upgrading 1002
[07:51:14] * fdans rooting for 1002
[07:51:23] doing 1002
[07:51:26] err 1003
[07:52:28] 6k segments to load for each historical, wow
[07:52:37] mostly netflow related
[07:54:44] now an-druid1001
[08:00:49] and finally an-druid1002
[08:01:19] once done, I'll do the overlords and middle managers (should be way quicker)
[08:01:25] then brokers and finally coordinators
[08:03:40] ok all historicals done
[08:06:49] overlords and middlemanagers done
[08:07:12] brokers now
[08:09:16] PROBLEM - Number of Netflow realtime events received by Druid over a 30 minutes period on icinga1001 is CRITICAL: 0 le 0 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&var-cluster=druid_analytics&panelId=41&fullscreen&orgId=1
[08:09:26] yes yes this was expected
[08:09:34] (I killed the job a while ago)
[08:10:42] brokers done
[08:10:51] doing coordinators now
[08:13:27] fdans: cluster upgraded
[08:13:48] now, in theory, superset/turnilo should work fine
[08:13:50] elukey: HELL YEA
[08:16:05] so now indexations
[08:16:48] I am trying eventlogging_to_druid_navigationtiming_hourly.service
[08:18:10] fdans: new console is at ssh -L 8081:druid1001.eqiad.wmnet:8081 druid1001.eqiad.wmnet
[08:18:47] ack
[08:18:59] everything looks good
[08:19:07] (let me know if you spot anomalies)
[08:19:36] !log restore systemd timers for druid jobs on an-launcher1002 (after druid upgrade)
[08:19:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:21:47] everything looking good to me
[08:22:05] this new console is so fancy
[08:24:31] yes I like it a lot!
[08:24:40] we cannot query data via sql console from it sadly
[08:24:58] since it is not the full power one, we need a new daemon called "router" for it
[08:25:04] but for our use case should be ok
[08:28:21] !log started netflow kafka supervisor on Druid Analytics (after upgrade)
[08:28:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:29:08] supervisor running on 3 middlemanagers as expected
[08:30:00] !log resume druid-related oozie coordinator jobs via Hue (after druid upgrade)
[08:30:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:30:15] fdans: so upgrade completed! Now we need to be vigilant for issues etc..
[08:30:22] hopefully none, but we'll see :)
[08:30:41] elukey: niiiiice
[08:30:49] elukey: I did nothing but will take all the credit
[08:31:42] the hue people could take a note or two from this druid console
[08:32:19] ahahha
[08:32:34] I am testing hue 4.7.1, that is a little bit nicer
[08:39:34] * elukey coffee
[08:46:49] is hue 4.7 working so far?
[09:01:01] moritzm: only on my vm, still need to find time to package it :(
[09:01:36] cool!
[09:13:21] fdans: ah found a little problem, no realtime metrics
[09:13:30] buuuuu
[09:13:53] elukey: like in grafana?
[09:14:06] yep, I think they changed the naming
[09:14:19] https://grafana.wikimedia.org/d/000000538/druid?panelId=41&edit&fullscreen&orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_analytics&var-druid_datasource=All&refresh=1m
[09:17:16] ahhhhh I got why
[09:17:44] Analytics, Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (akosiaris)
[09:17:56] the metrics are still emitted via druid/peon
[09:18:16] meanwhile some time ago a pull request from a user for druid-exporter mentioned "middlemanager"
[09:18:28] but it must be for the new middlemanager, the one with multi-threading
[09:18:31] that we don't use
[09:18:38] ok need to change the metric list
[09:19:11] Analytics, Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (akosiaris) @DVrandecic The patchset is ready to be merged, but we can't proceed without your signature on the L3 document. Could you...
[09:38:15] RECOVERY - Number of Netflow realtime events received by Druid over a 30 minutes period on icinga1001 is OK: (C)0 le (W)10 le 3.652e+05 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/dashboard/db/druid?refresh=1m&var-cluster=druid_analytics&panelId=41&fullscreen&orgId=1
[09:44:18] goood
[09:49:59] Analytics: Check home/HDFS leftovers of demon - https://phabricator.wikimedia.org/T259585 (elukey) ` ====== stat1004 ====== total 4 -r-------- 1 1145 root 6 Sep 6 2018 sensitive-datas ls: cannot access '/var/userarchive/demon.tar.bz2': No such file or directory ====== stat1005 ====== total 0 ls: cannot ac...
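Editor's note on the upgrade above: the order elukey followed (pause indexation, upgrade historicals one at a time, then overlords and middle managers, then brokers, then coordinators, then re-enable indexation) can be written down as a small plan builder. This is only a sketch of the sequence as described in the log. The step strings and the function name `upgrade_plan` are illustrative assumptions, the host labels are the short names used in the chat, and nothing here talks to Druid, systemd, or Hue.

```python
from typing import List

# Prep steps, in the order they were !logged in the channel.
PREP_STEPS = [
    "suspend druid-related oozie coordinators in Hue",
    "stop netflow kafka supervisor",
    "stop druid systemd timers on an-launcher1002",
]

# Historicals are upgraded one host at a time (labels as used in the chat).
HISTORICAL_HOSTS = ["1001", "1002", "1003", "an-druid1001", "an-druid1002"]

# Remaining daemon groups, in the order mentioned in the chat.
LATER_DAEMONS = ["overlord and middlemanager", "broker", "coordinator"]

# Restore steps, again in the order they were !logged.
RESTORE_STEPS = [
    "restore druid systemd timers on an-launcher1002",
    "start netflow kafka supervisor",
    "resume druid-related oozie coordinators in Hue",
]


def upgrade_plan(historical_hosts: List[str]) -> List[str]:
    """Return the ordered list of steps (hypothetical helper, descriptive only)."""
    plan = list(PREP_STEPS)
    for host in historical_hosts:
        # Each historical reloads its segments from disk cache before the next one starts.
        plan.append(f"upgrade historical on {host} and wait for segments to reload")
    for daemon in LATER_DAEMONS:
        plan.append(f"upgrade {daemon} daemons")
    plan.extend(RESTORE_STEPS)
    return plan


if __name__ == "__main__":
    for number, step in enumerate(upgrade_plan(HISTORICAL_HOSTS), start=1):
        print(f"{number:2d}. {step}")
```

Keeping the plan as plain data makes the ordering constraint easy to check: every historical appears before any other daemon group, and indexation is re-enabled only at the end.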
[09:51:50] * elukey bbiab
[09:51:56] * elukey bbiab
[10:44:09] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: analytics1050 host + mgmt down - https://phabricator.wikimedia.org/T258370 (elukey) @Cmjohnson yep do anything that you need!
[10:45:52] * elukey lunch!
[10:47:53] Analytics-Radar, Technical-blog-posts: Story idea for Blog: The best dataset to measure content and contributors to wikipedia - https://phabricator.wikimedia.org/T259559 (Aklapper) Heja @Nuria, thanks for this post! @srodlund is currently out, so I'm kindly asking for a bit more patience with this one. :)
[12:45:54] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 3 others: EventLogging MEP Upgrade Phase 2 (Sampling) - https://phabricator.wikimedia.org/T234594 (mpopov) Open→Resolved
[12:45:58] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Data, and 4 others: EventLogging MEP Upgrade - https://phabricator.wikimedia.org/T238544 (mpopov)
[13:07:17] hi teamm
[13:19:04] Analytics, Better Use Of Data, Event-Platform, Product-Analytics, and 2 others: EventLogging MEP Upgrade Phase 3 (Stream cc-ing) - https://phabricator.wikimedia.org/T256165 (mpopov) We've decided to table this feature for now until we truly need it, but I uploaded what I had so far. Namely the PH...
[13:19:21] helllooo
[13:19:26] elukey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/618106
[13:19:45] also
[13:19:46] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Anaconda
[13:21:09] nice!
[13:21:20] something is weird on apt
[13:21:20] anaconda-wmf | 2020.07~wmf0 | stretch-wikimedia | main | amd64
[13:21:20] anaconda-wmf | 2020.02~wmf0 | stretch-wikimedia | main | source
[13:21:20] anaconda-wmf | 2020.07~wmf0 | buster-wikimedia | main | amd64
[13:21:21] anaconda-wmf | 2020.02~wmf0 | buster-wikimedia | main | source
[13:21:32] source seems not updated?
[13:21:41] not really super important, but looks strange
[13:21:57] +1ed
[13:22:00] will read the docs!
[13:27:37] oh yeah there is no source
[13:27:39] will try to remove it
[13:30:32] i dunno....
[13:30:45] i guess i have to remove the whole package and then re-add?
[13:30:47] Analytics-Radar, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (elukey) @Jclark-ctr would it be possible to move some CX allocations to BX? I am asking since workers on ROW C will be a little bit more unbalanced, bu...
[13:31:13] elukey: if you can merge this, I will drop the log files and move the task to done :]
[13:31:19] https://gerrit.wikimedia.org/r/c/operations/puppet/+/618311
[13:33:48] ok elukey i think i fixed
[13:34:15] super thanks!
[13:37:50] ottomata: druid analytics upgraded btw, nothing on fire!
[13:38:01] very nice!
[13:39:53] thanks elukey :]
[13:39:59] deployed mforns !
[13:40:32] elukey: will delete logs now
[13:45:00] elukey: did you delete the logs?
[13:46:34] Analytics, Analytics-Kanban, Patch-For-Review: RU reportupdater-ee-beta-features keeps logging a lot of daily errors to its logs - https://phabricator.wikimedia.org/T256195 (mforns)
[13:47:11] elukey: I was going to, and then they just disappeared
[13:48:28] nope I didn't
[13:48:31] ah yes puppet did :)
[13:52:12] Analytics, Better Use Of Data, Event-Platform, Patch-For-Review: Stream cc map should not be generated on every pageload - https://phabricator.wikimedia.org/T256169 (mpopov) The WIP patch I uploaded is just so I don't lose the work I did on this, especially if we decide to keep the feature and us...
[13:59:57] Analytics, Analytics-Kanban, Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (elukey) > we will need to make sure that each stat node has all the necessary R & Python packages that any Oozie jobs we schedule depend on @mpopov this b...
[14:08:19] (CR) Mforns: [C: +1] "Haven't executed it, but concept and code look great!" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/614850 (owner: Milimetric)
[14:08:37] Analytics, Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (elukey)
[14:14:01] Analytics-Radar, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (elukey) To summarize: if I start from https://phabricator.wikimedia.org/T243521#6005828, more precisely from the view of the distribution of hadoop wor...
[14:14:27] ottomata: --^
[14:14:30] does it make sense?
[14:15:25] in theory we could even have row C with 6 more workers, not a big deal
[14:15:44] after this, we'll also have to add 14 new nodes for expansion
[14:15:59] so there will be the possibility to balance out even more
[14:16:31] yeah
[14:16:45] i think we are reaching the point where balanced is going to mean within some threshold
[14:16:49] makes sense elukey
[14:17:20] ok so I'll specify in the task that it is not the end of the world if we don't do it now
[14:20:22] Analytics-Clusters: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (elukey) @mpopov thanks a lot for collecting thoughts, really appreciated. How about something like: * stat1004 reimaged during this week or the next * stat100[6,7] after sept 10th
[14:23:49] Analytics-Clusters: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (Isaac) > stat1004 reimaged during this week or the next @elukey just a heads up that I'm running some long-running SWAP notebooks via stat1004 but it's okay to kill those processes as part of the rei...
[14:33:41] Analytics-Clusters: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (elukey) Getting back to this: memory usage on an-coord1001 seems high, plus for the sake of availability (not having all daemons concentrated in one place), I'd do the following: * move oozie on a...
[14:45:06] https://cloud.google.com/blog/products/ai-machine-learning/google-breaks-ai-performance-records-in-mlperf-with-worlds-fastest-training-supercomputer
[14:46:49] Analytics-Radar, Better Use Of Data, Product-Infrastructure-Data, Wikimedia-Logstash, and 4 others: Documentation of client side error logging capabilities on mediawiki - https://phabricator.wikimedia.org/T248884 (jlinehan) >>! In T248884#6358150, @Jdlrobson wrote: > @jlinehan would you or anyone...
[14:47:50] they have dedicated ASICs for tensorflow that do matrix multiplications
[14:47:55] for TF
[14:48:03] basically what we have XD
[14:48:50] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: analytics1050 host + mgmt down - https://phabricator.wikimedia.org/T258370 (Cmjohnson) Open→Resolved @elukey the power reset cleared the issue, I was able to login to the idrac and reach console com2. Resolving the task
[14:52:33] o/ nuria lemme know when you are there and avail for more java-ing
[14:52:48] ottomata: we have standup in 5 mins!
[14:53:00] oh it is in 1h on my cal?
[14:53:13] ottomata: ah wait, no, managers meeting that is
[14:53:16] ahhh ok
[14:53:16] ottomata: sorry
[14:53:38] nuria FYI, the reason a pojo won't work is because the stream config settings are not predetermined
[14:53:57] ottomata: and it cannot be a superset of all?
[14:54:05] of all?
[14:54:09] all of what?
[14:54:24] they can put whatever they want in there :)
[14:55:10] want to put up my code in gerrit, need a name.
[14:55:22] currently have
[14:55:26] wikimedia-eventutilities
[14:55:29] ottomata: isn't at the end a key value pair?
[14:55:30] don't love it but don't have a better idea
[14:55:32] no
[14:55:36] it's a json object
[14:56:02] e.g.
[14:56:35] mystream:
[14:56:35]   topics: [a, b]
[14:56:35]   sampling:
[14:56:35]     rate: 0.5
[14:56:36]   schema_title: my/schema
[14:57:46] ottomata: we can talk later but that seems to me like once-read is an immutable object
[14:58:27] sure but it is an immutable ObjectNode
[14:58:33] jackson
[14:58:45] i don't know from code what is in there
[14:58:46] i can do
[14:58:58] eventStreamConfig.getSetting(streamName, settingName)
[14:59:01] that will return me a JsonNode
[14:59:04] for which I still don't know the type
[14:59:06] i also have
[14:59:14] eventStreamConfig.getSettingAsString
[14:59:17] for convenience
[15:02:06] Analytics, Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (Ottomata) > 14 x Hadoop workers - expansion for extra retention FYI, These are part of the FY 2020-2021 orders, not 2019-2020/
[15:04:36] Analytics-Clusters: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (mpopov) > * stat1004 reimaged during this week or the next > * stat100[6,7] after sept 10th Sounds good!
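Editor's note on the pojo discussion above: ottomata's point is that stream settings are free-form, so a fixed class with one field per setting cannot describe them, and the accessor can only hand back an untyped node. The sketch below mirrors that idea in Python rather than Java. The class and method names are Python stand-ins for the `eventStreamConfig.getSetting(streamName, settingName)` and `getSettingAsString` calls mentioned in the chat, not the actual library API, and the sample data is the `mystream` example pasted in the channel.

```python
from typing import Any, Dict, Optional


class EventStreamConfig:
    """Hypothetical wrapper around free-form, per-stream settings."""

    def __init__(self, configs: Dict[str, Dict[str, Any]]) -> None:
        self._configs = configs

    def get_setting(self, stream_name: str, setting_name: str) -> Any:
        # The caller cannot know the type in advance: it may be a list,
        # a nested object, a string, a number, or missing (None).
        return self._configs.get(stream_name, {}).get(setting_name)

    def get_setting_as_str(self, stream_name: str, setting_name: str) -> Optional[str]:
        # Convenience accessor for settings the caller expects to be strings.
        value = self.get_setting(stream_name, setting_name)
        return None if value is None else str(value)


# The example stream from the chat, as a nested structure.
config = EventStreamConfig(
    {
        "mystream": {
            "topics": ["a", "b"],
            "sampling": {"rate": 0.5},
            "schema_title": "my/schema",
        }
    }
)

if __name__ == "__main__":
    print(config.get_setting("mystream", "topics"))
    print(config.get_setting("mystream", "sampling"))
    print(config.get_setting_as_str("mystream", "schema_title"))
```

The three settings come back as a list, a nested mapping, and a string respectively, which is why a single predetermined field layout does not fit.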
[15:05:15] Analytics, Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 / FY2020-2021 - https://phabricator.wikimedia.org/T243521 (elukey)
[15:05:38] :p
[15:05:50] Analytics: Analytics Hardware for Fiscal Year 2019/2020 - https://phabricator.wikimedia.org/T244211 (elukey)
[15:06:05] Analytics, Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 / FY2020-2021 - https://phabricator.wikimedia.org/T243521 (Ottomata)
[15:06:08] ottomata: are the 16 workers to refresh part of last fiscal?
[15:06:14] yes
[15:06:19] they were ordered last fiscal
[15:06:35] ah okok, makes sense, I got confused
[15:06:45] I can open a new task for the 2020/2021 hardware
[15:06:52] so we can track it
[15:06:55] there is also another 10 nodes that are new for regular expansion
[15:07:09] we have those separate from the storage retention ones for justification reasons
[15:07:21] but rob is going to put them all into the same order, since the spec is the same
[15:08:12] and those are tracked in T258727 ?
[15:08:25] i think so...
[15:08:49] yes
[15:08:49] https://phabricator.wikimedia.org/T259171
[15:08:50] was merged in
[15:08:53] that was the expansion
[15:09:07] ah okok
[15:09:32] so i think that will be 22 new nodes
[15:09:32] so we are adding 20 nodes during the next months basically
[15:09:34] in that order
[15:09:47] no sorry 30, 10 + 14 + 6 (gpus)
[15:09:53] ya
[15:09:56] ahahhaha
[15:09:58] + refresh
[15:10:02] do we have to run a shuttle?
[15:10:06] haha
[15:10:14] we have to store things because reasons sometimes
[15:10:15] i guess
[15:10:37] ok I am going to open new tasks etc.. to have a clear view
[15:11:04] Analytics, Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (elukey)
[15:11:34] actually about 24
[15:11:47] Analytics, Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 / FY2020-2021 - https://phabricator.wikimedia.org/T243521 (Ottomata)
[15:11:54] oh i just edited your todo item in ^ to reflect
[15:11:56] hope that makes sense
[15:12:43] I created https://phabricator.wikimedia.org/T255146 a while ago (and forgot) let's add it in there
[15:13:06] Analytics, Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (elukey)
[15:13:20] Analytics-Clusters: Put 24 Hadoop worker nodes in service (cluster expansion) - https://phabricator.wikimedia.org/T255146 (elukey)
[15:13:33] past Luca already did the work
[15:13:37] ok, so those 24 are the 10 expansion + 14 retention nodes
[15:13:38] present Luca is not great
[15:13:45] yep
[15:13:46] maybe make that a parent of the procurement task?
[15:14:06] sure
[15:14:18] Analytics-Clusters: Put 24 Hadoop worker nodes in service (cluster expansion) - https://phabricator.wikimedia.org/T255146 (elukey)
[15:14:21] ottomata: does this https://phabricator.wikimedia.org/T254606#6359902 mean it's getting deployed next Tuesday? Were you saying yesterday you'd have a way to deploy it faster?
[15:14:42] since that one is a code change yeah
[15:14:47] but we could backport it if we wanted to
[15:15:00] https://wikitech.wikimedia.org/wiki/Backport_windows
[15:15:03] (AKA SWAT)
[15:15:49] https://wikitech.wikimedia.org/wiki/Backport_windows#How_to_submit_a_patch_for_backport
[15:16:05] Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (elukey)
[15:16:35] Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (elukey)
[15:20:18] Analytics-EventLogging, Analytics-Kanban, Analytics-Radar, Event-Platform, and 5 others: Refactor EventBus mediawiki configuration - https://phabricator.wikimedia.org/T229863 (Ottomata)
[15:20:35] Analytics-Radar, Datasets-General-or-Unknown, Product-Analytics, Structured-Data-Backlog: Set up generation of JSON dumps for Wikimedia Commons - https://phabricator.wikimedia.org/T259067 (Cparle) @ArielGlenn is this in your purview?
[15:20:45] Analytics-EventLogging, Analytics-Kanban, Analytics-Radar, Event-Platform, and 5 others: Refactor EventBus mediawiki configuration - https://phabricator.wikimedia.org/T229863 (Ottomata) a:Ottomata
[15:51:57] Analytics-Radar, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (Cmjohnson) @elukey I am sorry, 10G rackspace is already very limited and with these being 2U servers it's even tighter. The racking configuration will...
[16:02:24] ping milimetric
[16:02:27] standduppp
[16:05:21] Analytics-Radar, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (elukey) @Cmjohnson yep no problem, thanks :)
[16:15:54] Analytics-Radar, Better Use Of Data, Desktop Improvements, Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Jdlrobson)
[16:21:40] Analytics, Product-Analytics: Streamline Superset signup and authentication - https://phabricator.wikimedia.org/T203132 (nshahquinn-wmf)
[16:28:58] Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T256356 (leila) @Groceryheist I'm working on this.
[16:29:10] Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T256356 (leila) a:leila
[16:43:02] Analytics-Clusters: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (Iflorez) Thank you, @elukey !
[17:00:40] Analytics, Analytics-EventLogging, Better Use Of Data, MediaWiki-extensions-WikimediaEvents, and 2 others: Limit the number of mw-client-errors from a single client (suggested: maximum 5 errors should be recorded for a single IP on a single page) - https://phabricator.wikimedia.org/T259371 (Jdlrob...
[17:08:16] Analytics-Radar, Product-Analytics: Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (jwang)
[17:09:51] a-team: storm knocked me out
[17:10:58] phone reception seems very weak, probably won't join
[17:11:46] mforns: do you remember when a truck hit Dan's connection and we were in a meeting? :D
[17:11:59] hehe yes elukey
[17:12:12] ahahhaha what a moment
[17:12:34] xD we thought there was an earthquake or sth...
[17:12:55] so weird, I got it back. Wait where is everyone
[17:14:02] milimetric: we left because end of meeting, will continue discussions later
[17:14:09] oh ok
[17:18:38] milimetric: sent e-mail with thoughts of today's meeting
[17:18:50] milimetric: you can tag along and we shall continue discussing
[17:20:19] Analytics-Radar, Product-Analytics, Product-Infrastructure-Team-Backlog, Epic: Re-define what constitutes a mobile pageview - https://phabricator.wikimedia.org/T257277 (kzimmerman)
[17:20:23] Analytics-Radar, Product-Analytics, Product-Infrastructure-Team-Backlog, Epic: Re-define what constitutes a mobile pageview - https://phabricator.wikimedia.org/T257277 (LGoto) p:Triage→Medium
[18:15:45] ottomata: if a stream is configured under metawiki (rather than 'default') and the client uses metawiki's stream config (fetched through meta's MW API), does the client need to specify meta.domain: 'meta.wikimedia.org'? asking for mobile apps, trying to establish how to deploy apps-specific streams and decide which mw api to fetch them from
[18:18:55] ottomata: nothing wrong with configuring an apps-specific stream under 'default', since those streams wouldn't be registered for eventlogging so it's okay if the streams are deployed to all wikis and we let apps download the stream configs from any wikipedia's mw api
[18:22:11] bearloga: since there isn't a 'default' wiki
[18:22:21] i've been using meta.wikipedia.org for api requests like that
[18:22:38] that's what eventgate will use too when we centralize its stream config there
[18:22:44] (stream config is not just for EventLogging)
[18:23:06] but, no i don't think the meta.domain field has to correspond with where you get the stream config from
[18:23:16] that is the 'domain the event pertains to', whatever that means
[18:23:39] does it even really have to mean a DNS 'domain name' ? i dunno maybe not! )
[18:23:39] :)
[18:23:48] the client gets to decide what to put there
[18:25:32] ottomata: cool! yeah, the current draft of the ios client fetches from metawiki. okay so that's good :D
[18:28:08] ottomata: follow-up question, if a stream is configured exclusively for, say, enwiki, and another exclusively for, say, cawiki, then that's fine for eventgate because the stream config it uses for reference has all the streams combined. default, cawiki, enwiki all together
[18:28:47] ottomata: also thanks for clarifying!
[18:33:02] bearloga: it doesn't actually, stuff in cawiki does not go to default or metawiki
[18:33:05] it needs to be in both places
[18:33:34] the stream must exist in default/metawiki with a schema_title
[18:33:47] for other stream configs specific to specific wikis, like sampling
[18:34:04] you'd copy/paste the stream config into that wiki's config and then add extra specific settings
[18:38:24] ottomata: okay so just to clarify -- if i have an instrument deployed, producing events to a not-yet-existing stream and i want to activate it on, say, cawiki as a start, I would have to configure that stream in default/metawiki with sampling.rate = 0.0 (so it's out-of-sample everywhere) and then override that stream on cawiki (the only place where I want it to be in-sample)
[18:40:10] *override that config
[18:41:40] aye, you'd have to copy all of the config from default to cawiki, and then add extra things you need to cawiki
[18:42:00] the configs aren't merged...i'm pretty sure
[18:42:05] not recursively merged anyway
[18:42:34] i think there is a way to merge top level entries, like the full list of stream configs could be merged together
[18:42:44] but the actual settings would only come from one of the wiki configs
[18:42:51] so all the settings need to be copied
[18:44:13] alternatively, what if I have the stream configured in default/metawiki but then register it with eventlogging under cawiki only via $wgEventLoggingStreamNames? so when the instrument produces an event, that stream's config is only available in eventlogging on cawiki and nowhere else
[18:46:11] that should work for the purpose of remotely activating/deactivating deployed instruments to/from specific wikis, right?
[18:48:01] (this only pertains to mediawiki-based instrumentation)
[18:49:03] that would work
[18:49:07] ya
[18:49:34] sweet
[18:50:38] milimetric: if you are testing eventlogging things in mw vagrant....could you test this?!
[18:50:39] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventLogging/+/617770
[18:51:02] ottomata: i'd be happy to document the findings from our convo on that instrumentation how-to page, if you're okay with that?
[18:51:13] sure!
[18:51:44] milimetric: heads-up that patch that andrew asked you to test: it's very cool
[18:53:53] milimetric: also we ran into a problem with npm version in vagrant https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventLogging/+/617770/4#message-e00e18d24890dad030c832159c1f264044d9f338
[19:11:09] Analytics, Analytics-Kanban, Product-Analytics: Set up environment for Product Analytics system user - https://phabricator.wikimedia.org/T258970 (nshahquinn-wmf)
[19:11:12] Analytics-Radar, Product-Analytics, Release-Engineering-Team, Repository-Admins: Create a repository and user for Product Analytics Oozie jobs - https://phabricator.wikimedia.org/T230743 (nshahquinn-wmf) Resolved→Open >>! In T230743#6124838, @nshahquinn-wmf wrote: > @mpopov should we not...
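Editor's note on the per-wiki stream config exchange above: ottomata's recollection ("the configs aren't merged...i'm pretty sure", "not recursively merged anyway") implies that a wiki-level entry for a stream replaces the default entry wholesale, so every setting has to be copied into the override. The sketch below models that behaviour as he described it. It is an assumption drawn from the chat, not confirmed MediaWiki behaviour, and the function name `effective_config` plus the sample dictionaries are made up for illustration.

```python
from typing import Any, Dict

StreamConfigs = Dict[str, Dict[str, Any]]


def effective_config(default: StreamConfigs, wiki: StreamConfigs) -> StreamConfigs:
    """Merge top-level stream entries only; settings inside a stream are not merged.

    A stream defined in the wiki config replaces the default entry entirely
    (hypothetical model of the behaviour described in the chat).
    """
    merged = dict(default)
    merged.update(wiki)
    return merged


# Default: the stream exists everywhere with a schema_title, but is out of sample.
default_streams: StreamConfigs = {
    "mystream": {"schema_title": "my/schema", "sampling": {"rate": 0.0}},
}

# Incomplete override: only the sampling setting was written for cawiki.
cawiki_incomplete: StreamConfigs = {
    "mystream": {"sampling": {"rate": 0.5}},
}

# Full override: the default settings copied over, then the wiki-specific change.
cawiki_full: StreamConfigs = {
    "mystream": {"schema_title": "my/schema", "sampling": {"rate": 0.5}},
}

if __name__ == "__main__":
    print(effective_config(default_streams, cawiki_incomplete)["mystream"])
    print(effective_config(default_streams, cawiki_full)["mystream"])
```

With the incomplete override the resulting entry has no `schema_title` at all, which is the failure mode the copy-everything advice is meant to avoid.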
[19:21:38] (PS8) Mforns: Fix project field of geoeditors public monthly and semantics [analytics/refinery] - https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: Lex Nasser)
[19:25:02] (CR) Ottomata: Add classes to use EventStreamConfig with EventSchemaLoader to aide in event ingestion tasks (13 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[19:25:39] oook nuria
[19:25:40] https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/618378
[19:25:41] (PS9) Mforns: Fix project field of geoeditors public monthly and semantics [analytics/refinery] - https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: Lex Nasser)
[19:25:43] what next? :)
[19:25:47] joseph has reviewed most of it
[19:25:56] i did not make EventStream a pojo, we can talk more if you want
[19:27:46] (CR) Mforns: "I added a CTE to get the active wikis (wikis with 3 or more active editors [5+ edits in that month])." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: Lex Nasser)
[19:46:00] (CR) Mforns: [V: +2] "Tested this, should be fine to merge." [analytics/refinery] - https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: Lex Nasser)
[19:50:20] ottomata: out of meetings, will look after munchies
[19:51:19] k ty
[19:56:28] a-team: power out in the whole town, cell phone reception is mostly out, I walked a mile or so to send this. Will be out for a few days maybe (that's what the neighbors say)
[19:57:27] sorry ottomata can't test now, no easy way to get connection
[20:04:36] wooow crazy!
[20:37:20] uou...
[20:56:57] Analytics-Radar, Product-Analytics (Kanban): Identify next steps for dealing with missing mobile app pageview counts - https://phabricator.wikimedia.org/T256804 (SNowick_WMF) Pageview analysis used to estimate expected pageview counts [[ https://docs.google.com/spreadsheets/d/15we93X3OIBewap3PSUHgrT0o2HR...
[21:02:23] Analytics-Radar, Product-Analytics (Kanban): Identify next steps for dealing with missing mobile app pageview counts - https://phabricator.wikimedia.org/T256804 (kzimmerman) @SNowick_WMF to update this task: what next steps were decided on (or proposed)? If work has been done on those next steps, link to...
[21:48:29] ottomata: done :) https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Per-wiki_configurations
[22:08:52] Analytics-Radar, Product-Analytics (Kanban): Identify next steps for dealing with missing mobile app pageview counts - https://phabricator.wikimedia.org/T256804 (Nuria) @kzimmerman given that data , after rewrite, had significant issues until most recently when those were fixed I think it makes little se...
[23:18:30] bearloga: cool
[23:18:51] btw i don't see any reason why we couldn't have 'enabled: false' type setting?
[23:19:00] or is_active: false
[23:19:01] ?
[23:19:04] but this works for now too
[23:19:05] :)
[23:21:54] (CR) Nuria: [C: +2] Fix project field of geoeditors public monthly and semantics (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: Lex Nasser)