[00:45:48] RECOVERY - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [07:14:04] good morning [07:35:19] 10Analytics, 10SRE: Consider Julie for managing Kafka settings, perhaps even integrating with Event Stream Config - https://phabricator.wikimedia.org/T276088 (10elukey) WOW [08:07:51] 10Analytics, 10Machine-Learning-Team, 10SRE: Kubeflow on stat machines - https://phabricator.wikimedia.org/T275551 (10elukey) p:05Triage→03Medium [08:22:16] 10Analytics-Clusters: Decommisison the Hadoop backup cluster and add the worker nodes to the main Hadoop cluster - https://phabricator.wikimedia.org/T274795 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1132.eqiad.wmnet', 'an-worker1135.eqi... [08:40:49] * elukey bbiab [08:51:43] 10Analytics-Clusters: Decommisison the Hadoop backup cluster and add the worker nodes to the main Hadoop cluster - https://phabricator.wikimedia.org/T274795 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1132.eqiad.wmnet', 'an-worker1135.eqiad.wmnet', 'an-worker1136.eqiad.wmnet'] ` and wer... [09:11:57] 10Analytics-Clusters: Decommisison the Hadoop backup cluster and add the worker nodes to the main Hadoop cluster - https://phabricator.wikimedia.org/T274795 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1137.eqiad.wmnet', 'an-worker1138.eqi... 
[09:15:37] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), 10Patch-For-Review, and 3 others: Compensate for sampling - https://phabricator.wikimedia.org/T273454 (10Lena_WMDE) [09:29:46] 10Analytics-Radar, 10Book-Referencing, 10Page-Previews, 10Reference Previews, 10WMDE-TechWish: Review Popups metrics deprecation and coordinate according to WMDE Tech Wishes needs - https://phabricator.wikimedia.org/T276304 (10awight) [09:30:59] 10Analytics-Radar, 10Book-Referencing, 10Page-Previews, 10Reference Previews, 10WMDE-TechWish-Sprint-2021-03-03: Review Popups metrics deprecation and coordinate according to WMDE Tech Wishes needs - https://phabricator.wikimedia.org/T276304 (10Lena_WMDE) [09:36:54] 10Analytics-Clusters: Decommisison the Hadoop backup cluster and add the worker nodes to the main Hadoop cluster - https://phabricator.wikimedia.org/T274795 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1137.eqiad.wmnet', 'an-worker1138.eqiad.wmnet'] ` and were **ALL** successful.
[09:51:11] mediawiki_history for 2021-02 now available [09:51:13] niceeeee [09:58:20] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10elukey) @razzi some suggestions for the host rename plan: - instead of merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/661529, let's first use `... [10:01:03] \o/ [10:07:59] joal: bonjour, ready to add 5 new workers :) [10:08:06] \o/ [10:08:10] all good for me [10:09:09] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) We could also do the transfer with the insetup role most likey (unless we have some specific FW rules that are not in `insetup` and are on its fin... [10:16:52] !log add an-worker113[2,5-8] to the Analytics Hadoop cluster [10:16:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:20:34] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Clean up issues with jobs after Hadoop Upgrade - https://phabricator.wikimedia.org/T274322 (10elukey) [10:22:19] 10Analytics-Radar, 10Book-Referencing, 10Page-Previews, 10Reference Previews, 10WMDE-TechWish-Sprint-2021-03-03: Review Popups metrics deprecation and coordinate according to WMDE Tech Wishes needs - https://phabricator.wikimedia.org/T276304 (10phuedx) Thanks for writing this up, @awight. Do note that t... [10:26:12] 10Analytics-Radar, 10WMDE-TechWish: Filter out bot traffic for all of our metrics - https://phabricator.wikimedia.org/T276308 (10awight) [10:32:33] joal: new workers added! [10:32:40] Yeah!! 
[10:47:12] (03PS1) 10Awight: Filter bot traffic out of metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/668032 (https://phabricator.wikimedia.org/T276308) [10:47:48] 10Analytics-Radar, 10Patch-For-Review, 10Unplanned-Sprint-Work, 10WMDE-TechWish-Sprint-2021-03-03: Filter out bot traffic for all of our metrics - https://phabricator.wikimedia.org/T276308 (10awight) [10:49:42] (03PS2) 10Awight: Filter bot traffic out of metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/668032 (https://phabricator.wikimedia.org/T276308) [11:06:50] 10Analytics-Radar, 10Book-Referencing, 10Page-Previews, 10Reference Previews, 10WMDE-TechWish-Sprint-2021-03-03: Review Popups metrics deprecation and coordinate according to WMDE Tech Wishes needs - https://phabricator.wikimedia.org/T276304 (10awight) 05Open→03Invalid @phuedx After squinting for a w... [11:20:55] (03PS1) 10Awight: Fix wiring to metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/668039 (https://phabricator.wikimedia.org/T271902) [11:21:24] 10Analytics-Clusters, 10Patch-For-Review: Decommisison the Hadoop backup cluster and add the worker nodes to the main Hadoop cluster - https://phabricator.wikimedia.org/T274795 (10elukey) 05Open→03Stalled Last 6 nodes to rack, stalled until T276239 is solved :) [11:22:05] * elukey lunch! 
afk [11:32:52] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Filter bot traffic out of metrics [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/668032 (https://phabricator.wikimedia.org/T276308) (owner: 10Awight) [12:02:47] 10Analytics-EventLogging, 10Analytics-Radar, 10Front-end-Standards-Group, 10MediaWiki-extensions-WikimediaEvents, and 2 others: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (10phuedx) a:05phuedx→03None [12:35:25] hello teammm :] [12:36:44] looking at alarms [13:13:12] joal, I can not find mediawiki/wikitext jobs in hue, should I start them? [13:22:19] *without killing any old ones? [13:24:48] Hi mforns [13:24:55] let me check [13:25:51] mforns: there is the mediawiki-wikitext-current one running - I must have killed the mediawiki-wikitext-history one [13:28:07] mforns: mediawiki-wikitext-current-coord should be restarted with to-run date (2021-02-01), and mediawiki-wikitext-history-coord should be restarted with date 2020-12-01 - please :) [13:32:55] wow joal, the current one is invisible to me in hue-next [13:33:03] meh? [13:33:13] not cool :( [13:33:19] mforns: pagination maybe? [13:33:26] or else, I'm missing something [13:33:36] I've paginated to the end, and not found [13:33:50] the filter box doesn't work for me [13:34:15] yeah the filter box fails with our new version [13:34:30] mforns: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0021815-201202074829419-oozie-oozi-C [13:36:44] joal: I can see the job in hue, but not in hue-next [13:40:57] mforns: I found it in the list in wikitext next, but the "search in page" function was not pointing to it. SO WEIRD!! 
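The "Filter bot traffic out of metrics" patch above (gerrit 668032, T276308) excludes bot traffic from reportupdater metrics. The actual filtering criteria are not shown in the log; the sketch below is a generic illustration of user-agent based bot filtering under an assumed simple regex convention, not the patch's real logic (which lives in SQL queries).

```python
# Hypothetical illustration of bot-traffic filtering, in the spirit of the
# reportupdater-queries patch above. The regex heuristic and row layout here
# are assumptions for demonstration only; the real patch filters in SQL.
import re

# Common heuristic: UAs mentioning bot/crawler/spider or containing a URL
# are treated as automated traffic.
BOT_UA = re.compile(r"bot|crawler|spider|http", re.IGNORECASE)

def is_bot(user_agent):
    """Return True if the user agent string looks automated."""
    return bool(BOT_UA.search(user_agent or ""))

def filter_human(rows):
    """Keep only rows whose user_agent does not look like a bot."""
    return [r for r in rows if not is_bot(r["user_agent"])]

rows = [
    {"user_agent": "Mozilla/5.0 (X11; Linux x86_64)", "views": 10},
    {"user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)", "views": 99},
]
print(filter_human(rows))  # → only the first row survives
```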
[13:41:35] Actually it has been, after a reload - very weird indeed :( [13:45:33] 10Analytics, 10Event-Platform, 10Product-Analytics, 10Product-Data-Infrastructure, 10MW-1.36-notes (1.36.0-wmf.32; 2021-02-23): [MEP] [BUG] Timestamp format changed in migrated server-side EventLogging schemas - https://phabricator.wikimedia.org/T276235 (10mpopov) 05Open→03Resolved a:03mpopov Thank... [13:47:20] 10Analytics, 10Analytics-Kanban: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (10elukey) I finally figured out where the archive data comes from, namely stat1006: ` elukey@stat1006:/srv/published/datasets/archive/public-datasets$ du -... [14:01:05] mforns: o/ I am going to stop and reimage some hadoop workers, ok for you? It might lead to some failures in alerts@ [14:10:58] elukey: sure no problemo [14:16:15] elukey: heya - I'm ready to help with the backup stuff, but I need some more explanation please :) [14:17:10] joal: sure! So my understanding is that the files are on stat1006, and get rsynced to thorium since they are under /srv/published [14:17:35] and then there are hardlinks to make the directories uniform, not per host [14:17:47] this is why we ended up only removing hard links [14:17:57] ah - I think I get it [14:18:19] So what you expect me to do is to check data on stat1006, to make sure it is coherent with what we had on thorium [14:18:24] and hdfs, obviously [14:18:45] joal: if you have time yes, just to double check my assumptions - a quick sanity check I mean, not a super in-depth one [14:20:16] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1099.eqiad.wmnet', 'an-worker1100.eqiad.wmnet', 'an-worker1101.eqiad.wmnet...
[14:20:49] !log reimage an-worker1099,1100,1101 (GPU worker nodes) to Debian Buster [14:20:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:21:49] ack elukey - doing that [14:22:52] <3 [14:35:16] PROBLEM - HDFS missing blocks on an-master1001 is CRITICAL: 445 ge 5 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_missing_blocks https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=40&fullscreen [14:36:05] * elukey sigh [14:36:35] I am so lucky that there are blocks being only on the three workers that I am reimaging, of course [14:37:03] the blocks will be up again in a few, the reimages are in progress [14:37:15] really sorry, new use case to add to the checklist [14:41:02] I'm getting flashbacks to repairing Colossus cells after power outages, Storage SRE going white in the face because of missing stripes. [14:42:38] klausman: no why did you give me these memories back [14:42:46] ahahahah [14:43:49] Shared pain is Schadenfreude. Or something. [14:44:48] ottomata: do you know when you'll have time to move T273901 forward? This is going to be on the critical path for deploying our new updater to production. [14:44:50] T273901: Automate event stream ingestion into HDFS for streams that don't use EventGate - https://phabricator.wikimedia.org/T273901 [14:48:16] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (10nshahquinn-wmf) >>! In T224658#6876213, @Ottomata wrote: >> This was in the notebook terminal. I just tried in a regular SSH terminal, and I don't get these errors there. > > A... 
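The "HDFS missing blocks" alert above fired because all replicas of some blocks happened to live on exactly the three workers being reimaged at once. A hypothetical pre-reimage safety check is sketched below: given a block-to-replica-host mapping (which in practice could be pulled from `hdfs fsck` output) and the set of hosts going down, find blocks that would lose every live replica. Function and variable names are illustrative assumptions.

```python
# Hypothetical pre-reimage check for the incident above: a block goes
# "missing" when every host holding one of its replicas is down at the
# same time. Placement data here is invented for illustration.

def blocks_at_risk(replica_map, hosts_down):
    """Return block IDs whose every replica lives on a host in hosts_down."""
    down = set(hosts_down)
    return [block for block, hosts in replica_map.items()
            if set(hosts) <= down]

if __name__ == "__main__":
    placement = {
        "blk_1": ["an-worker1099", "an-worker1100", "an-worker1101"],
        "blk_2": ["an-worker1099", "an-worker1117", "an-worker1120"],
    }
    # Reimaging all three hosts holding blk_1's replicas makes it missing;
    # blk_2 keeps two live replicas elsewhere.
    print(blocks_at_risk(placement,
                         ["an-worker1099", "an-worker1100", "an-worker1101"]))
    # → ['blk_1']
```

With proper rack-aware placement (as joal notes below), replicas span racks, so draining one rack's hosts would not empty any block's replica set.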
[14:48:23] elukey: I guess that choosing hosts in the same rack would help :) [14:49:18] ah gehel hm [14:49:24] it probably isn't that hard to do quickly [14:49:43] if a priority we can make that happen by just scheduling a new camus job that matches those streams [14:49:55] ottomata: that would be great! [14:50:46] 10Analytics, 10SRE: Consider Julie for managing Kafka settings, perhaps even integrating with Event Stream Config - https://phabricator.wikimedia.org/T276088 (10Ottomata) I know, "only NINETY NINE CENTS...WOW" right?! [14:50:52] joal: yes I know, I didn't think about it :) [14:50:55] elukey: data in stat1006:/srv/published/datasets/archive/public-datasets is actually rsynced to the internetz (see https://analytics.wikimedia.org/published/datasets/archive/) [14:51:24] joal: yes [14:51:42] elukey: and we wish to delete it from its source [14:52:14] elukey: shouldn't we only drop the links on thorium? [14:52:25] or maybe there is something I don't understand [14:52:32] ottomata: added a comment on the task... [14:52:56] joal: if we don't drop on stat1006, they'll get rsynced again.. we can move them to a different directory in case, but not sure if it is worth it [14:52:58] 10Analytics, 10Event-Platform, 10Wikidata, 10Wikidata-Query-Service: Automate event stream ingestion into HDFS for streams that don't use EventGate - https://phabricator.wikimedia.org/T273901 (10Gehel) @Ottomata this is going to block the deployment of the WDQS Flink based Streaming Updater. Any chance you... [14:53:52] elukey: shouldn't we prevent the rsync from happening if this is what we wish?
[14:53:56] * joal don't get it :( [14:54:53] 10Analytics, 10Event-Platform, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): Automate event stream ingestion into HDFS for streams that don't use EventGate - https://phabricator.wikimedia.org/T273901 (10Gehel) [14:55:32] joal: the stat100x hosts are configured to rsync anything under /srv/published to thorium, and then there is a script in there to unify all the dirs "hiding" the fact that they come from different nodes [14:55:33] 10Analytics, 10Event-Platform, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): Automate event stream ingestion into HDFS for streams that don't use EventGate - https://phabricator.wikimedia.org/T273901 (10MPhamWMF) For additional context, we are planning on deploying the new Stre... [14:55:56] and thorium is the host presenting to the internetz [14:55:58] elukey: -^ [14:56:08] exactly yes [14:56:26] elukey: so by removing the data, we'll not be presenting it anymore [14:56:33] correct [14:56:35] (I'm trying to backprocess the info) [14:56:52] And this is a known fact- ok - I had it backward [14:57:15] The point is to remove 'old' data we served [14:57:38] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (10Ottomata) > Essentially—I was thinking about removing the base environment option from the UI altogether. Hm, I suppose we could do that. I left it in because it is much fast... 
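The publishing pipeline described above works in two steps: each stat100x host rsyncs everything under /srv/published to thorium, and a script there unifies the per-host directories into a single tree (via hard links) that the web server presents. The hard-link unification step could look roughly like the sketch below; the actual script, paths, and names are assumptions, not the production code.

```python
# Hypothetical sketch of the "unify per-host published dirs via hard links"
# step described above: walk each per-host tree and hard-link every file
# into one merged tree, hiding which stat host produced it. All names and
# the layout are assumptions for illustration.
import os

def unify(per_host_roots, merged_root):
    """Hard-link files from each per-host root into a single merged tree."""
    for root in per_host_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            rel = os.path.relpath(dirpath, root)
            target_dir = os.path.join(merged_root, rel)
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                dst = os.path.join(target_dir, name)
                if not os.path.exists(dst):
                    # Hard link: same inode, no extra disk usage.
                    os.link(os.path.join(dirpath, name), dst)
```

Because the merged entries are hard links, deleting them only removes one link to the inode; as long as the rsynced source copy remains, the next unification pass recreates the entry. That is why dropping data from thorium alone is not enough and the stat1006 source copies also need cleanup.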
[14:58:07] that in turn will allow us to order a new thorium with less disk space available, since the current one is capable of hosting TBs of data and we use only 1TB (and most of the data is not needed IIUC) [14:58:22] right [14:58:26] joal: I'll pause now, let's discuss with the team [14:58:44] elukey: sure no prob - I'm sorry I'm getting the actual point late :S [14:59:28] joal: nono sorry to you, it is a complicated thing that I re-discovered again today, should have been more clear - and more questions == more controls before dropping data [14:59:39] there is no rush :) [14:59:48] once we are all 100% onboard we pull the trigger [15:00:07] elukey: we have this data backed up so no big deal, but I wanted to be sure I had a correct understanding (I thought we were only backing up stuff not used) [15:00:35] and actually, 'not used' in my mind was not equivalent to 'served on the internetz' [15:00:49] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1099.eqiad.wmnet', 'an-worker1100.eqiad.wmnet', 'an-worker1101.eqiad.wmnet'] ` and were **ALL** successful. [15:01:41] RECOVERY - HDFS missing blocks on an-master1001 is OK: (C)5 ge (W)2 ge 0 https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23HDFS_missing_blocks https://grafana.wikimedia.org/dashboard/db/hadoop?var-hadoop_cluster=analytics-hadoop&orgId=1&panelId=40&fullscreen [15:02:02] 10Analytics, 10Analytics-Kanban: Upgrade UA Parser to 1.5.1+ - https://phabricator.wikimedia.org/T272926 (10Milimetric) verified the new version of UA Parser is processing data as of the restart this morning, 2021-03-03T13:00 has the new data. [15:11:14] 10Analytics, 10Analytics-Kanban: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (10Milimetric) Sorry to be so late looking at this.
The newest of these files are 2.5 years old and besides us absolutely zero people know they exist (they... [15:16:20] 10Analytics, 10Analytics-Kanban: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (10JAllemandou) Ok let's be bold: let's remove it from thorium, and if someone misses it, we'll know about it. It is backed up on HDFS in any case. [15:17:06] 10Analytics, 10Analytics-Kanban: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (10elukey) @JAllemandou @Milimetric ok to also drop it from stat1006? [15:23:53] 10Analytics, 10Analytics-Kanban: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (10Milimetric) Ok to drop from everywhere in my opinion :P but definitely from everywhere not hdfs, including stat1006 [15:23:59] joal: yt? [15:25:24] or milimetric ? [15:25:32] here ottomata [15:25:33] what's up [15:25:42] quick brain bounce about https://phabricator.wikimedia.org/T273901 [15:25:48] asked you because you are thinking about gobblin [15:28:26] ottomata: I'm about to leave for kids - will be back after standup [15:35:34] (03PS1) 10Mforns: Add backticks to reserved word date in geoeditors monthly job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/668111 (https://phabricator.wikimedia.org/T274322) [15:36:28] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Clean up issues with jobs after Hadoop Upgrade - https://phabricator.wikimedia.org/T274322 (10mforns) [15:38:18] (03CR) 10Mforns: [V: 03+2] "Tested that the query works like this." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/668111 (https://phabricator.wikimedia.org/T274322) (owner: 10Mforns) [15:38:49] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (10nshahquinn-wmf) >>!
In T224658#6878793, @Ottomata wrote: >> It really simplifies things when the terminals behave identically. > In this case, the terminals do! You have to set... [15:42:46] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Clean up issues with jobs after Hadoop Upgrade - https://phabricator.wikimedia.org/T274322 (10Milimetric) [15:47:19] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (10Ottomata) A good question, I don't think SRE would want us to? But we do set it automatically in Juypter...so maybe? For me, I wouldn't want that. I wouldn't want my work to... [15:53:20] 10Analytics, 10SRE, 10Wikidata, 10Wikidata-Query-Service, 10wdwb-tech-focus: Deployment strategy and hardware requirement for new Flink based WDQS updater - https://phabricator.wikimedia.org/T247058 (10Milimetric) This is a bit of a drive-by, but have we considered https://min.io/? I went a bit deeper t... [16:16:21] milimetric: can you review https://gerrit.wikimedia.org/r/c/analytics/refinery/+/668111, I'll try to unbreak the prod job by deploying this. [16:27:08] hey mforns - how have ou deployed manually to HDFS? [16:34:40] * elukey bbiab! [16:51:17] a-team we probably want to attend the monthly tech meeting so let's cancel today's standup [16:51:23] +1 fdans [16:51:30] k [16:51:58] 10Analytics, 10Event-Platform, 10Wikidata, 10Wikidata-Query-Service, and 2 others: Automate event stream ingestion into HDFS for streams that don't use EventGate - https://phabricator.wikimedia.org/T273901 (10Ottomata) Ok here's the idea: I add a new EventStreamConfig settings block called `consumers`, wh... [16:53:44] mforns: Just wondering if "always run as funnel" was deployed to production? If so, I'll remove the field from queries. 
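The gerrit 668111 patch above ("Add backticks to reserved word date") fixes a Hive query that broke after the Hadoop upgrade because `date` became a reserved keyword and must be backtick-quoted as an identifier. The real fix is in HiveQL; the snippet below only illustrates the same quoting idea using sqlite3 (which happens to accept MySQL/Hive-style backticks), with an invented table.

```python
# Illustration of quoting a reserved word as a column identifier, as in the
# Hive patch above. SQLite accepts MySQL-style backticks, so it demonstrates
# the idea; in Hive, unquoted `date` fails as a reserved keyword. The table
# and data here are made up for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geoeditors (`date` TEXT, editors INTEGER)")
conn.execute("INSERT INTO geoeditors (`date`, editors) VALUES ('2021-02', 42)")
row = conn.execute("SELECT `date`, editors FROM geoeditors").fetchone()
print(row)  # → ('2021-02', 42)
```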
[16:58:18] 10Analytics, 10Better Use Of Data, 10Product-Analytics, 10Product-Data-Infrastructure: Define event stream configuration syntax - https://phabricator.wikimedia.org/T273235 (10Ottomata) Perhaps it would be better to add top level settings specific to different producers and consumers, as described in https:... [17:05:52] joal: ok if we run the cookbook now for aqs? Will need somebody to test aqs1004 :) [17:05:55] otherwise later [17:05:59] !log rebalance kafka partitions for webrequest_upload partition 8 [17:06:05] I'm ready elukey [17:06:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:06:13] thanks elukey :) [17:06:36] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:09:00] joal: aqs1004 ready [17:09:06] ack, checking [17:09:07] 10Analytics-Clusters, 10Analytics-Kanban: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (10razzi) [17:09:12] really interested to see the historical cache metrics [17:09:43] elukey: ready! [17:10:03] joal: green light to proceed with the rest? [17:10:16] yessir - sorry, I should have been more explicit elukey :) [17:10:21] super :) [17:10:46] !log update druid datasource on aqs (roll restart of aqs on aqs100*) [17:10:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:13:13] done! [17:14:04] awesome - thanks elukey [17:15:09] * elukey watches druid metrics [17:15:14] * joal too [17:15:21] that are good for the moment, some latency increases but expected [17:15:25] yup [17:15:42] elukey: do you see my cache warmup queries? 
[17:16:18] nice :D [17:16:44] elukey: they're definitely not spread enough, but at least there are some :) [17:17:08] joal: I am super eager to see the datasource drop on the 15th [17:17:24] so am I elukey :) [17:18:43] 10Analytics, 10Analytics-Kanban: Check data currently stored on thorium and drop what it is not needed anymore - https://phabricator.wikimedia.org/T265971 (10JAllemandou) +1, let's drop from machines (we have HDFS backup). [17:20:22] mforns: elukey: We're a bit blocked ("medium" priority, not UBN) by the reportupdater logging issue. I have two proposed solutions, whenever you have time to take a look: https://gerrit.wikimedia.org/r/c/analytics/reportupdater/+/667192 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/666948 [17:23:48] awight: I'll let Marcel comment, side note about a hive bug found today: https://phabricator.wikimedia.org/T276121 (for your info, not related to the code reviews) [17:23:53] (might be interesting) [17:33:36] elukey: +1 that looks like the same issue as T275757 [17:33:37] T275757: Reportupdater output can be corrupted by hive logging - https://phabricator.wikimedia.org/T275757 [17:33:58] The puppet change will in theory fix it... [17:39:18] awight: ah snap! In theory we could apply that patch globally so we don't need workarounds, would you be open for it? ETA say a couple of days [17:43:56] also mforns --^ [17:56:24] elukey: I haven't tested the patch but am certain that in my case at least, this is the issue. Random hadoop.parquet logging to stdout. [17:57:16] I'm not totally sure how to trigger that logging on purpose, but my queries fail 100% of the time so I suppose they would make a decent smoke test. [17:58:48] (03PS7) 10Milimetric: WIP: Add daily referrers Hive table and Oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/655804 (https://phabricator.wikimedia.org/T270140) (owner: 10Bmansurov) [17:59:01] awight: ack ack!
Then proceed, I'll leave Marcel to review :) [18:06:19] heya ottomata - do you still wish to talk gobblin? [18:06:54] joal: s'ok talked to milimetric and came up with https://phabricator.wikimedia.org/T273901#6879350 [18:07:18] but since realized that i don't have to do that now to get that the rdf_streaming_updater streams ingested, as search needs that now [18:07:42] so i think we can have more discussion about it at https://phabricator.wikimedia.org/T273235#6879386 [18:08:09] joal: do you think something like that would work with gobblin to discover topics to ingest? [18:09:13] ottomata: We can make it work - I think we're pushing job that would normally belong on data-gov-tech onto something else [18:10:12] ottomata: for Gobblin, scaling would be an interesting discussion - which topic to be used as single jobs (if any), how many mappers etc [18:12:48] joal: so, we do that for camus already, we have individual camus jobs for different 'volumes' sort of [18:12:52] ottomata: Having a way to tell gobblin to copy a stream from kafka to HDFS, with some settings (path, etc) is what we're after [18:12:55] is that how gobblin would work too? 
[18:13:23] ottomata: there are multiple ways to solve the scalability problem: multiple jobs, bigger jobs [18:13:40] hmm, the reason we have multiple jobs is to avoid smaller topic starvation [18:13:48] ottomata: I like the multiple jobs approach as it allows for easier rerun of dedicated portions [18:14:02] indeed, and we'll probably need multiple jobs anyway, for various reasons [18:14:20] yup, topic starvation is a good one (even if gobblin should be better than camus in that regard) [18:14:29] so, that proposal is to have a stream config that indicates which job a stream should be ingested by [18:14:57] we do that now with camus, but the config setting we use is not specific for our purposes, i'm reusing the destination_event_service one [18:15:11] which is hacky and doesn't work for streams that don't happen to use eventgate [18:15:19] so for automation, we'd need an ingestion-manager, getting data from 'something', and launching jobs as needed, whose config would be specific [18:15:30] aye [18:16:11] anyhow - short term solution is to add enough info on the 'consumers' block [18:16:49] We'll be able to get gobblin to get its sources from code we write [18:16:54] ottomata: -^ [18:18:32] 10Analytics-EventLogging, 10Analytics-Radar, 10Front-end-Standards-Group, 10MediaWiki-extensions-WikimediaEvents, and 2 others: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (10Jdlrobson) @krinkle is this something you feel strongly about...
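The idea discussed above (T273901 / T273235) is to add a `consumers` settings block to each stream's configuration so ingestion jobs (Camus, or later Gobblin) select their Kafka topics from stream config instead of the hacky reuse of `destination_event_service`. A sketch of that selection logic follows; the `consumers` field names and job names are assumptions, not the final EventStreamConfig syntax.

```python
# Hypothetical sketch of per-job topic selection driven by a `consumers`
# block, as proposed above: each ingestion job picks up only streams whose
# config names it. Field names and example streams are assumptions.

def topics_for_job(stream_config, job_name):
    """Return the sorted Kafka topics that the named ingestion job should consume."""
    topics = []
    for stream, cfg in stream_config.items():
        consumer = cfg.get("consumers", {}).get("analytics_hadoop_ingestion", {})
        if consumer.get("job_name") == job_name:
            # Streams may fan out to several topics (e.g. per datacenter);
            # fall back to the stream name itself.
            topics.extend(cfg.get("topics", [stream]))
    return sorted(topics)

streams = {
    "rdf-streaming-updater.mutation": {
        "topics": ["eqiad.rdf-streaming-updater.mutation",
                   "codfw.rdf-streaming-updater.mutation"],
        "consumers": {"analytics_hadoop_ingestion": {"job_name": "eventlogging"}},
    },
    "mediawiki.page-create": {
        "consumers": {"analytics_hadoop_ingestion": {"job_name": "mediawiki_events"}},
    },
}
print(topics_for_job(streams, "eventlogging"))
```

Keeping separate jobs per group of streams preserves the Camus-era mitigation for small-topic starvation while making the stream-to-job mapping an explicit, EventGate-independent config setting.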
[18:21:44] Gone for diner [18:26:28] joal: great, i think so too [18:26:28] ty [18:36:21] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [18:41:23] (03PS8) 10Milimetric: Add daily referrers Hive table and Oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/655804 (https://phabricator.wikimedia.org/T270140) (owner: 10Bmansurov) [18:42:09] (03CR) 10Milimetric: [C: 04-1] "pausing for just a few hours on this, will test soon" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/655804 (https://phabricator.wikimedia.org/T270140) (owner: 10Bmansurov) [18:49:50] (03PS7) 10Mforns: Add oozie job for session length computation [analytics/refinery] - 10https://gerrit.wikimedia.org/r/664885 (https://phabricator.wikimedia.org/T273116) [18:49:59] (03CR) 10Mforns: Add oozie job for session length computation (036 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/664885 (https://phabricator.wikimedia.org/T273116) (owner: 10Mforns) [18:51:29] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Analytics, 10Product-Data-Infrastructure: MEP: Should stream configurations be written in YAML? - https://phabricator.wikimedia.org/T269774 (10jlinehan) [18:52:29] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Analytics, 10Product-Data-Infrastructure: [Metrics Platform] Specify stream configuration syntax relevant to Metrics Platform - https://phabricator.wikimedia.org/T269774 (10jlinehan) [18:53:14] joal: I modified the session length job as per your comments, but some of them have no proper solution now... LMK whatcha think! 
[18:53:46] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Analytics, 10Product-Data-Infrastructure: [MEP] Determine how stream configuration is authored and deployed - https://phabricator.wikimedia.org/T269774 (10jlinehan) [18:54:37] 10Analytics, 10Better Use Of Data, 10Product-Analytics, 10Product-Data-Infrastructure: [Metrics Platform] Define stream configuration syntax relevant to v1 release - https://phabricator.wikimedia.org/T273235 (10jlinehan) [19:01:18] * elukey afk! [19:33:37] * razzi afk for lunch and an errand [19:34:04] (03PS9) 10Milimetric: Add daily referrers Hive table and Oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/655804 (https://phabricator.wikimedia.org/T270140) (owner: 10Bmansurov) [19:54:29] 10Analytics, 10Analytics-Kanban, 10Growth-Team, 10Product-Analytics, and 2 others: Migrate Growth EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T267333 (10Ottomata) [19:54:59] 10Analytics, 10Analytics-EventLogging, 10Community-Tech, 10Event-Platform, and 4 others: CodeMirrorUsage Event Platform Migration - https://phabricator.wikimedia.org/T275005 (10Ottomata) [19:55:04] 10Analytics, 10Event-Platform, 10WMDE-TechWish, 10Patch-For-Review: ReferencePreviewsBaseline Event Platform Migration - https://phabricator.wikimedia.org/T275007 (10Ottomata) [19:55:07] 10Analytics, 10Event-Platform, 10WMDE-TechWish, 10MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), 10Patch-For-Review: ReferencePreviewsCite Event Platform Migration - https://phabricator.wikimedia.org/T275008
(10Ottomata) [19:55:15] 10Analytics, 10Event-Platform, 10WMDE-TechWish, 10MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), 10Patch-For-Review: ReferencePreviewsPopups Event Platform Migration - https://phabricator.wikimedia.org/T275009 (10Ottomata) [19:55:24] 10Analytics, 10Event-Platform, 10WMDE-TechWish, 10Patch-For-Review: TemplateDataApi Event Platform Migration - https://phabricator.wikimedia.org/T275011 (10Ottomata) [19:55:28] 10Analytics, 10Event-Platform, 10WMDE-TechWish, 10MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), 10Patch-For-Review: TemplateDataEditor Event Platform Migration - https://phabricator.wikimedia.org/T275012 (10Ottomata) [19:55:34] 10Analytics, 10Event-Platform, 10WMDE-TechWish, 10Patch-For-Review: TwoColConflictConflict Event Platform Migration - https://phabricator.wikimedia.org/T275013 (10Ottomata) [19:55:40] 10Analytics, 10Event-Platform, 10WMDE-TechWish, 10MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), 10Patch-For-Review: TwoColConflictExit Event Platform Migration - https://phabricator.wikimedia.org/T275014 (10Ottomata) [19:55:47] 10Analytics, 10Event-Platform, 10WMDE-TechWish, 10MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), 10Patch-For-Review: VisualEditorTemplateDialogUse Event Platform Migration - https://phabricator.wikimedia.org/T275015 (10Ottomata) [21:14:29] hey ottomata do you have a few minutes? [21:36:01] cdanis: kinda async but what's up!? [21:36:26] ottomata: re: https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/667948/1/config.yaml#102 -- How are you imagining including HTTP response headers? Are you talking about the response headers from eventgate itself? 
[21:36:35] I feel like there's kind of a chicken/egg problem there if so [21:41:47] (03CR) 10Nettrom: [C: 03+2] Update schema to handle quickview playback events [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/663703 (https://phabricator.wikimedia.org/T263154) (owner: 10Eric Gardner) [21:45:04] qq: During AQS testing, I discovered a minor error in my hive query in this CR: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/654924 . Right now, that CR is merged but not deployed. How should I go about fixing this? Pull the CR changes, make the modification, and make a new CR? or just wait until deployment, then make the correction? [21:46:16] cdanis: hm good q ya maybe you are right, we have the field to support storing response_headers, so I'd prefer if we just made it explicit what was being done with the config [21:46:18] maybe [21:46:22] just calling the config [21:46:31] happy to call it http_request_headers_to_fields or something [21:46:41] http_request_headers_to_fields is ok [21:46:42] yeah [21:46:46] that sounds fine [21:50:35] ok! [21:58:49] 10Analytics-Radar, 10MobileFrontend, 10XAnalytics, 10Readers-Web-Backlog (Tracking): MobileFrontend should use XAnalytics extension - https://phabricator.wikimedia.org/T217859 (10Jdlrobson) [22:18:54] lexnasser: no need to wait for deployment; if you make a new CR and it's merged, that one will be deployed [22:24:33] razzi: thanks for the response! the file in question was created in the merged CR that I linked, but that file addition hasn't shown up in the AQS repo yet.
And so I'm thinking that, if I pull the changes from the CR itself and make the correction, then that new change will duplicate the changes from the original CR [22:25:42] so I could make a new CR on top of this merged CR, but the changes from this merged CR have not yet manifested for me to modify on top of it yet [22:26:01] *manifested in the repo [22:34:32] lexnasser: hmm, I don't understand how a file from refinery would end up in the AQS repo (is that https://gerrit.wikimedia.org/r/admin/repos/analytics/aqs?) [22:35:00] oops, I meant refinery repo. all else the same [22:35:41] oops wait I missed that [22:36:02] yeah actually was just me being stupid, thanks for your help though! [22:36:20] no worries, glad you were able to sort it out :)