[04:52:49] Analytics: MEP: canary events so we know events are flowing through pipeline - https://phabricator.wikimedia.org/T250844 (Nuria)
[05:10:49] Analytics: MEP: canary events so we know events are flowing through pipeline - https://phabricator.wikimedia.org/T250844 (Nuria) Note to self: do we need different alarms for these events, thus far we can catch hours not refined with: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589000/1/modules/p...
[06:18:51] !log roll restart zookeeper on druid100[1-3] for openjdk upgrades
[06:18:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:29:46] !log roll restart zookeeper on druid100[4-6] for openjdk upgrades
[06:29:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:51:19] Good morning
[06:59:41] bonjour!
[07:00:02] the druid coord interface looks better today, ~11% of usage for the new druid nodes
[07:00:23] so druid is doing the right thing, conservatively shuffling segments around
[07:00:26] I like it
[07:01:41] elukey: I read docs yesterday, and druid shuffles segments, but very slowly - At every coordinator check, if the data is too unbalanced across the historicals, it shuffles a small number of segments
[07:02:01] elukey: we could configure it to shuffle stronger, but it feels good doing it slowly :)
[07:04:14] yep I agree
[07:17:34] elukey: is something broken with the cook books for druid? I still see old Java procs on 1004-1006
[07:18:21] moritzm: morning! I haven't restarted druid public yet, only analytics.. I need to do things with care due to the move from /var/lib/druid to /srv/druid
[07:18:27] this time I can't use the cookbook
[07:18:44] the two that I ran this morning were for zookeeper on Druid
[07:21:00] ah, gotcha. I was just confused since I saw the !log above for 1004-1006
[07:21:24] yepyep sorry!
[07:21:44] and in fact this mentioned ZK, my bad :-) make some tea now
[07:40:25] (CR) Fdans: Change "Active Editors" to registered user editors only (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/594194 (https://phabricator.wikimedia.org/T213800) (owner: Fdans)
[08:17:41] (CR) Fdans: [C: +2] Fix language dropdown for ios devices [analytics/wikistats2] - https://gerrit.wikimedia.org/r/589606 (https://phabricator.wikimedia.org/T246971) (owner: Fdans)
[08:17:48] (CR) jerkins-bot: [V: -1] Fix language dropdown for ios devices [analytics/wikistats2] - https://gerrit.wikimedia.org/r/589606 (https://phabricator.wikimedia.org/T246971) (owner: Fdans)
[08:23:26] * elukey afk for a bit!
[10:33:33] Analytics, Performance-Team, Readers-Web-Backlog (Tracking): Review referer configuration of origin/origin-when-crossorigin/origin-when-cross-origin - https://phabricator.wikimedia.org/T248526 (TheDJ) @Krinkle for the origin-when-crossorigin fallback.. The spec compliant origin-when-cross-origin has...
[11:16:19] * elukey lunch!
[11:37:13] joal: how's https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/594719/ look?
[11:37:25] I'd like to get it on the train for the week
[11:38:04] Reading
[11:39:43] (CR) Joal: [C: +2] "LGTM! Thanks for the good doc :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/594719 (https://phabricator.wikimedia.org/T249773) (owner: Milimetric)
[11:39:52] milimetric: feel free to merge when you want :)
[11:40:58] joal: ok, then also the related change: https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/594428/
[11:41:08] (sorry I didn't link properly)
[11:42:43] milimetric: forgot to mention for oozie patch: works for me as long as tested )
[11:42:49] milimetric: same for the scala patch :)
[11:43:22] milimetric: actually in scala, only thing missing is some comments about why we join - please :)
[11:43:49] joal: yes, I built the jar and tested them together. I didn't run a full test after the offset/dataset changes, but I did test the oozie part with --dryrun and made sure it does what we think.
[11:44:18] milimetric: as long as you are confident - I trust you :)
[12:16:41] (PS43) Fdans: Add pageview daily dump oozie job to replace Pagecounts-EZ [analytics/refinery] - https://gerrit.wikimedia.org/r/595152 (https://phabricator.wikimedia.org/T251777)
[12:25:33] Analytics, Operations, serviceops, vm-requests: Create a VM for matomo1002 (eqiad) - https://phabricator.wikimedia.org/T252742 (elukey) ` elukey@ganeti1003:~$ sudo gnt-group list Group Nodes Instances AllocPolicy NDParams row_A 4 44 preferred ovs=False, ssh_port=22, ovs_link=, spin...
[12:31:38] (PS44) Fdans: Add pageview daily dump oozie job to replace Pagecounts-EZ [analytics/refinery] - https://gerrit.wikimedia.org/r/595152 (https://phabricator.wikimedia.org/T251777)
[12:32:29] Analytics, Product-Analytics: [Spike] Should EventLogging support DNT? - https://phabricator.wikimedia.org/T252438 (Milimetric) I think this choice belongs with the security team, our privacy expert @JFishback_WMF, and I'm ok with whatever they decide. I will, of course, add my reasoning: First, I inte...
[12:35:05] (PS45) Fdans: Add pageview daily dump oozie job to replace Pagecounts-EZ [analytics/refinery] - https://gerrit.wikimedia.org/r/595152 (https://phabricator.wikimedia.org/T251777)
[12:56:25] Analytics: MEP: canary events so we know events are flowing through pipeline - https://phabricator.wikimedia.org/T250844 (Ottomata) We could, but how to know if that hour is absent because of a lack of data, or due to loss? If we emit canary events into all topics then we know that all hours should have at...
[13:03:09] Analytics: SQL query failed on superset SQL lab - https://phabricator.wikimedia.org/T252225 (Milimetric) Jennifer, here's another little tidbit that shows looking over more data: ` -- this will count all periods of time with active blocks, even if blocks didn't change -- so it's definitely not the same thin...
[13:04:25] (CR) Milimetric: [C: +2] Use page move events to improve joining to entity [analytics/refinery/source] - https://gerrit.wikimedia.org/r/594428 (https://phabricator.wikimedia.org/T249773) (owner: Milimetric)
[13:04:31] (CR) Milimetric: [V: +2] Use new page move incremental updates [analytics/refinery] - https://gerrit.wikimedia.org/r/594719 (https://phabricator.wikimedia.org/T249773) (owner: Milimetric)
[13:44:30] Gone til standup
[13:49:31] Analytics, Operations, serviceops, vm-requests: Create a VM for matomo1002 (eqiad) - https://phabricator.wikimedia.org/T252742 (elukey) Open→Stalled This is currently blocked due to resource constraints in row_c eqiad for Ganeti, see https://wikitech.wikimedia.org/wiki/Ganeti#Verify_clust...
[13:49:34] Analytics, Analytics-Kanban: Move Matomo to Debian Buster - https://phabricator.wikimedia.org/T252740 (elukey)
[13:50:15] Analytics, Operations, observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (elukey)
[13:50:35] elukey: sorry i got puppet nerd sniped by your java patch and i just left a big idea
[13:50:57] it became a little much and if you think that it is too fancy your patch is good
[13:59:20] ottomata: feel free to take over and update the patch, I wanted to start the conversation to end up in something shared by everybody, no strong feelings about the code :)
[14:00:20] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog (Kanban): EventLogging Server Side client should POST to EventGate - https://phabricator.wikimedia.org/T253121 (Ottomata)
[14:00:30] after multiple people chiming in, it seems that we are going in a good single direction
[14:00:45] oh elukey i don't mean to take over, just to give an idea...and then describing my idea ended up in some semi complete pseudo code :p
[14:01:28] ottomata: no no I didn't mean take over as a bad thing, it makes more sense that you upload your version since you have the complete picture in mind
[14:01:45] then we can review, it looks very good as well
[14:01:52] (CR) Fdans: "job re-tested successfully" [analytics/refinery] - https://gerrit.wikimedia.org/r/595152 (https://phabricator.wikimedia.org/T251777) (owner: Fdans)
[14:01:54] I just want a single place in which we define things :)
[14:01:56] aye
[14:02:13] i dunno though, is mine too fancy? :)
[14:03:35] ahahah what do you mean with too fancy?
[14:09:43] well, for loops and structs, etc. i think my version would even let you install java 8 jre-headless + java 8 jre + java 8 jdk, etc.
[14:09:49] which maybe is fine?
[14:10:45] a-team, who is most familiar with all of the EventLogging on wiki documentation and practices
[14:11:07] things are going to be really confusing as I change things, since the EventLogging extension still does support things like on-wiki schemas
[14:11:15] mostly i'm starting with https://www.mediawiki.org/wiki/Extension:EventLogging/Guide
[14:11:27] and I'm not sure if I should edit that, or just rewrite it MEP focused on wikitech
[14:11:57] I think we should stop supporting EventLogging for third parties
[14:12:00] and move everything onto wikitech
[14:12:01] but i dunno
[14:12:26] EventLogging Guide has both WMF specific and non WMF stuff
[14:13:09] Analytics, Cassandra, User-Elukey: Cassandra3 migration plan proposal - https://phabricator.wikimedia.org/T249756 (elukey) @Eevans thanks a lot for all the answers. If you have time I have another doubt :) Say that we upgrade a node to 3.11, and upgrade a small schema's sstables to the new format. A...
[14:14:20] ottomata: ah yes I believe it is fine, I chose to allow only one just to simplify, but I don't see problems with multiple packages if one wants to
[14:18:43] aye k
[14:18:53] sigh...i still wish we had written a new extension instead of re-using eventlogging
[14:23:28] ottomata: as far as i know we have never really supported eventlogging for 3rd parties
[14:23:43] ottomata: but mediawiki extension docs have always been on mediawiki.org
[14:23:52] ottomata: that is why those are there
[14:24:25] yeahh, but there is def a lot of effort describing how people can and should use things that sounds as if it is worded for non wmf stuff
[14:24:54] nuria: i'm considering moving everything new to wikitech, and just poaching the good stuff from mw.org, and then deprecating all the docs on mw.org
[14:25:11] and also dropping all references and support for things like on wiki schemas, etc.
[14:25:34] EventLogging extension will eventually just become an event POSTing client
[14:25:39] ottomata: i think the original intent might have been making it easy to use for third parties but i do not think 3rd parties used it
[14:25:44] yeah
[14:25:59] there is so much documentation!
[14:26:03] and it is really good!
[14:26:07] just a bit out of date now!
[14:26:31] ottomata: some generic parts still apply
[14:26:45] ottomata: cause the intent of the client has not changed
[14:26:56] yes
[14:27:00] some of it is great, and i want to keep that stuff
[14:27:11] lots of the stuff about how to design schemas, not to send too much, etc.
[14:27:12] is good
[14:27:23] but also relevant not just for EventLogging
[14:27:34] so I'm trying to figure out how to fit it into overall Event Platform docs
[14:28:57] ottomata: a possible start might be trimming the parts blatantly outdated and see what's left
[14:29:55] hm i guess i can do that
[14:29:59] would probably be easier
[14:31:19] ok nuria if no one will get mad, i will do that, and start editing the mw.org pages to make them all about new stuff
[14:36:10] Analytics, Cassandra, User-Elukey: Cassandra3 migration plan proposal - https://phabricator.wikimedia.org/T249756 (Gehel) Within my limited understanding of Cassandra, the plan looks good to me. A few additional notes for the maps cluster: * all data in there can be re-generated. It's time consuming...
[14:45:35] heh nuria someone added in wiki translation support to some of the el docs on mw.org
[14:45:39] makes it really hard to edit...
[14:45:59] dunno how in wiki translation works, but everything is at least duplicated?
[14:46:15] ottomata: seems unwise
[14:46:47] going to remove it...hope no one gets mad!
[15:01:01] a-team...standup?
[15:01:10] oh i have 2 events on my cal
[15:01:34] yea today's at 6 😑
[15:01:46] i mean in one hour
[15:02:27] yes
[15:02:34] ottomata: in 1 hour
[15:03:56] weird
[15:11:15] Analytics, Cassandra, User-Elukey: Cassandra3 migration plan proposal - https://phabricator.wikimedia.org/T249756 (elukey) @Gehel yes we are in the same position, namely we'll refresh our cluster during next fiscal, but basically a year from now (IIRC the same for maps) so it would make sense to star...
[15:20:05] FYI I am about to roll restart the Hadoop masters for openjdk upgrades
[15:20:54] nuria: it's in 10min according to the cal
[15:21:12] fdans: yes
[15:21:55] fdans: today manager's meeting was shortened and i thought earlier rather than later would be better cc a-team
[15:22:18] ok in 8 mins then?
[15:22:39] Analytics, Performance-Team, Readers-Web-Backlog (Tracking): Review referer configuration of origin/origin-when-crossorigin/origin-when-cross-origin - https://phabricator.wikimedia.org/T248526 (Nuria) Sorry, I totally missed this ping. We get about around 1 million pageviews of safari (browser major...
[15:22:40] yesh
[15:24:59] Analytics, Product-Analytics: [Spike] Should EventLogging support DNT? - https://phabricator.wikimedia.org/T252438 (jlinehan) **tl;dr: I think @kaldari gave one sound argument and two unsound arguments for not handling DNT, but you only need one sound argument. I agree that DNT handling should be disable...
[15:28:06] !log restart hadoop master daemons on an-master100[1,2] for openjdk upgrades
[15:28:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:30:54] a-team: standup!
[15:40:25] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog (Kanban): EventLogging Server Side client should POST to EventGate - https://phabricator.wikimedia.org/T253121 (LGoto) p:Triage→Medium
[15:40:53] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 3 others: Set up an instance of EventStreams in beta that will allow for consuming any stream - https://phabricator.wikimedia.org/T253069 (LGoto) p:Triage→Medium
[15:56:44] Analytics, Operations, observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (herron)
[16:01:48] Analytics, Operations, observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (herron) Thanks for creating the task @elukey, working together on this SGTM I added a step to the description to help avoid introducing duplicate metrics while both prod and buster...
[16:05:57] Analytics, Operations, observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (elukey) Thanks @herron! I had a chat with Cole on IRC to establish a clear ownership for these hosts, I am wondering if Observability could be a better owner than Analytics nowadays?
[16:49:22] Analytics, Analytics-SWAP: Jupyter Notebooks TLC 2018-2019 - https://phabricator.wikimedia.org/T188275 (Nuria)
[16:54:10] hehe wrong task nuria ^ !
[16:54:27] ottomata: for the design document?
[16:54:32] ya you want
[16:54:33] https://phabricator.wikimedia.org/T224658
[16:54:40] 2018 is a long time ago!
[16:55:22] ottomata: ah this one: https://phabricator.wikimedia.org/T224658
[16:55:29] yup
[16:55:59] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Nuria)
[16:56:00] ottomata: corrected now
[16:56:06] ty!
[17:05:49] one thing that I am wondering is if an-master100[1,2] may need to get 64G of ram each
[17:05:58] I just asked dcops and it seems possible
[17:06:31] I am seeing that the hdfs namenode heap (20G) is becoming a little bit full, probably we should jump to something like 32G
[17:06:46] that is still possible now, but it would reduce the page cache a lot
[17:07:28] since the number of files will keep growing probably, having 128G of ram each should be good in the long term
[17:08:18] especially due to the fact that a lot of people say that after the 32G mark jvm obj pointers/references switch to 64bits, and they take more memory
[17:08:58] based on some calculations that I've read, it is advised to jump from 32G to 40-something, to be worth it
[17:09:20] I am not sure if we'll need to, but in that case 64G of ram will be dew
[17:09:23] *few
[17:10:30] if nobody disagrees I'll file a task to see if we can bump the ram
[17:11:04] elukey: sounds good, adding RAM shouldn't be too expensive and easy enough to do
[17:12:45] ack :)
[17:13:23] going off, ttl! o/
[17:25:21] Analytics: MEP: canary events so we know events are flowing through pipeline - https://phabricator.wikimedia.org/T250844 (Nuria) Agreed, agreed, my note to self was rather whether we needed * a different* method for alarming entirely but * i think* that with a strategy like the one we used for not refined h...
[17:38:16] Analytics, Operations, observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (RLazarus) p:Triage→Medium a:herron
[17:44:18] (CR) Nuria: [C: +1] "tested and looks good, I leave @mforns to confirm whether his comments have been addressed" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/594194 (https://phabricator.wikimedia.org/T213800) (owner: Fdans)
[17:46:09] (CR) Mforns: [C: +1] "Hey, sorry, I missed Fran's reply, LGTM!" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/594194 (https://phabricator.wikimedia.org/T213800) (owner: Fdans)
[17:55:42] (CR) Nuria: [C: +2] Change "Active Editors" to registered user editors only [analytics/wikistats2] - https://gerrit.wikimedia.org/r/594194 (https://phabricator.wikimedia.org/T213800) (owner: Fdans)
[17:56:46] Analytics, Analytics-Kanban: Language selector is not pressable in mobile site - https://phabricator.wikimedia.org/T246971 (Nuria) @fdans I think you need to resolve conflicts before it can be CR-ed and tested
[18:02:35] Analytics: MEP: canary events so we know events are flowing through pipeline - https://phabricator.wikimedia.org/T250844 (Ottomata) Thoughts: Could we re-use some of the EventGate kubernetes readinessProbe logic for this? EventGate in k8s is configured with a readinessProbe, which is just a command that k8...
[18:04:54] milimetric: joal dunno what you guys are up to but maybe you wanna brain bounce ingestion woes with me?
[19:33:56] Analytics, MediaWiki-extensions-WikimediaEvents: PrefUpdate schema no longer receiving (most) events - https://phabricator.wikimedia.org/T253151 (Catrope)
[19:34:09] Analytics, MediaWiki-extensions-WikimediaEvents: PrefUpdate schema no longer receiving (most) events - https://phabricator.wikimedia.org/T253151 (Catrope)
[19:40:09] heya ottomata - sorry I missed the ping - Would tomorrow 3h before standup work for you?
[19:40:29] phew ha no too early for me
[19:40:41] s'ok joal let's talk tomorrow as team and we'll schedule some time
[19:40:54] ack ottomata
[19:41:21] * joal sends a polo ball and a bike to ottomata to change his ideas
[19:41:35] :)
[20:06:05] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform: eventgate-wikimedia should expose runtime stream configuration - https://phabricator.wikimedia.org/T253157 (Ottomata)
[20:24:54] Analytics: MEP: canary events so we know events are flowing through pipeline - https://phabricator.wikimedia.org/T250844 (Nuria) >Basically if there is not a specific stream name defined in wgEventStreams or in our static list, we cannot generate a canary event for it. that seems fair, also, the means by wh...
[20:49:59] Analytics, Product-Analytics, Core Platform Team Workboards (Clinic Duty Team): Update mediawiki_user_blocks_change to log partial block parameters - https://phabricator.wikimedia.org/T252455 (BPirkle)
[20:59:31] airflow refine vvvorks!!! \\o//
[20:59:33] * mforns cries
[21:45:30] mforns: WHAA WOW!
[21:45:54] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Patch-For-Review: Automate ingestion and refinement into Hive of event data from Kafka - https://phabricator.wikimedia.org/T251609 (Ottomata) Ok, new idea. If we do {T253157}, I think we can use that to solve both this and...
[21:46:37] congrats mforns excited to hear more about that!
[21:47:48] hey ottomata :] I will push the code to Gerrit, for you guys to see. I think there's still a couple functionalities that are not covered, like the delay of a couple hours before refine starts to work
[21:51:53] Analytics, Performance-Team, Readers-Web-Backlog (Tracking): Review referer configuration of origin/origin-when-crossorigin/origin-when-cross-origin - https://phabricator.wikimedia.org/T248526 (Krinkle) @thedj Does that mean we need one for new and one of old Safari? If so, there's presumably one we...
[21:53:09] Analytics, Product-Analytics (Kanban): Create Druid tables for Druid datasources in Superset - https://phabricator.wikimedia.org/T251857 (cchen) @fdans this `eventCount` is a column that should have been ingested as part of the job. This column is in the Druid datasource, but when mapping the datasource...
[22:03:21] Analytics, MediaWiki-extensions-WikimediaEvents, Product-Analytics: PrefUpdate schema no longer receiving (most) events - https://phabricator.wikimedia.org/T253151 (kzimmerman)
[23:31:53] Analytics, MediaWiki-extensions-WikimediaEvents, Product-Analytics: PrefUpdate schema no longer receiving (most) events - https://phabricator.wikimedia.org/T253151 (nettrom_WMF) Can confirm that only a limited amount of data starts flowing in on 2019-05-11. Per the two queries below, I find the drop...