[00:38:12] 10Analytics: Many new metrics in Graphite for WDQS-Streaming-Updater-POC - https://phabricator.wikimedia.org/T255044 (10colewhite) From discussion in -analytics, @dcausse indicated that they are safe to remove. [00:53:27] PROBLEM - Check the last execution of monitor_refine_eventlogging_analytics_failure_flags on an-launcher1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_analytics_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [05:54:58] RECOVERY - Check the last execution of monitor_refine_eventlogging_analytics_failure_flags on an-launcher1001 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [06:19:02] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Product-Analytics, 10Epic: Spark sessions can provision kerberos tickets in a more predictable manner - https://phabricator.wikimedia.org/T246132 (10elukey) 05Open→03Resolved Let's discuss the killing notebooks part in Newpyter (as it is already... [06:19:04] 10Analytics-Radar, 10wmfdata-python, 10Epic, 10Product-Analytics (Kanban): Analysts cannot reliably use wmfdata to run SQL queries against Hive databases - https://phabricator.wikimedia.org/T245891 (10elukey) [06:19:10] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (10elukey) [06:21:55] 10Analytics, 10Analytics-Cluster: Hadoop jobs that generate large temporary files can take down nodes - https://phabricator.wikimedia.org/T187139 (10elukey) 05Open→03Resolved a:03elukey We do have now containers running in Yarn, not sure if it would be possible to cause another issue like this one but ne... 
[06:22:40] 10Analytics, 10Analytics-Cluster: Prevent notebooks on spark to launch 2 pyspark instances instead of 1 - https://phabricator.wikimedia.org/T152522 (10elukey) Is it something still valid or shall we close? :) [06:23:42] 10Quarry, 10DBA, 10Data-Services: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (10Marostegui) 05Open→03Resolved a:03Marostegui Analytics role is now split between two hosts: replication lag and InnoDB purge lag are under control now. This should have improved Quarry'... [06:25:14] 10Analytics-Radar: Add CI job Oozie XML stylesheet validation for the analytics/refinery repository - https://phabricator.wikimedia.org/T147072 (10elukey) [06:25:40] 10Quarry, 10Data-Services, 10cloud-services-team (Kanban): Quarry or the Analytics wikireplicas role creates lots of InnoDB Purge Lag - https://phabricator.wikimedia.org/T251719 (10Marostegui) So far, having placed 2 hosts (1010 and 1011) on Analytics role seems to be keeping the InnoDB purge lag (as well as... [07:12:11] 10Analytics, 10DBA: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['dbstore1003.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto... [07:30:44] 10Analytics, 10DBA: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbstore1003.eqiad.wmnet'] ` and were **ALL** successful. [07:42:49] 10Quarry, 10Data-Services: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (10Marostegui) 05Open→03Resolved a:03Marostegui We have split the analytics role between two hosts, which looks like it is helping with the load and having the replication lag, as w... 
[07:44:49] 10Analytics, 10DBA: Upgrade analytics dbstore databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254870 (10Marostegui) [07:45:15] 10Quarry, 10Data-Services: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (10Mike_Peel) I am still getting the same error at https://quarry.wmflabs.org/query/40539 . Trying on toolforge now... [08:35:39] 10Quarry, 10Data-Services: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (10Marostegui) >>! In T246970#6213872, @Mike_Peel wrote: > I am still getting the same error at https://quarry.wmflabs.org/query/40539 . Trying on toolforge now... I ran that query on an... [08:47:20] 10Quarry, 10Data-Services: Quarry: Lost connection to MySQL server during query - https://phabricator.wikimedia.org/T246970 (10Mike_Peel) >>! In T246970#6213955, @Marostegui wrote: > I ran that query on an **idle** host and it took 25 minutes, which is right on the edge of the 30 minutes killer Quarry has, so... [09:27:16] I am about to move matomo to the new buster backend [09:49:05] migrated! [10:21:05] 10Analytics, 10Analytics-Cluster: Upgrade Kafka Brokers to Debian Buster - https://phabricator.wikimedia.org/T255123 (10elukey) [10:22:16] 10Analytics, 10Analytics-Cluster, 10Operations, 10ops-eqiad, 10User-Elukey: replace onboard NIC in kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T236327 (10elukey) Thanks a lot! We don't have a lot of time during these days, would it be ok to schedule something early next Q? (In July I mean) [10:33:20] 10Analytics: Analytics Hardware for Fiscal Year 2019/2020 - https://phabricator.wikimedia.org/T244211 (10elukey) [10:37:50] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: Move Matomo to Debian Buster - https://phabricator.wikimedia.org/T252740 (10elukey) piwik.wikimedia.org points to matomo1002 now, everything looks good. 
I'll leave the two VMs running for a couple of days in case a quick rollback is needed, and then I'll... [10:38:18] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: Move Matomo to Debian Buster - https://phabricator.wikimedia.org/T252740 (10elukey) p:05Triage→03Medium a:03elukey [11:32:39] * elukey lunch! [12:33:05] 10Analytics, 10Analytics-Cluster: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (10elukey) This needs to be coordinated with our users since it will be invasive, things like jupyter venvs will need to be re-created etc.. [12:46:23] elukey: i see notifications are disabled for analytics1028 [12:46:37] can i acknowledge the current alert? if so, what message? [12:46:56] current alert is Yarn Nodemanagers in unhealthy status, result NaN [12:56:28] jbond42: yes you can definitely ack those, it is a test cluster but NaN is a little bit weird [12:57:54] thanks, i didn't look at the alert as i figured it was probably known [13:01:31] 10Quarry, 10DBA, 10Data-Services: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (10MBH) @Marostegui it isn't the original query. Recently I optimized it by replacing views "revision" => "revision_userindex" and "actor" => "actor_revision". Before that it was killed after 30... [13:04:06] 10Quarry, 10DBA, 10Data-Services: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (10Marostegui) >>! In T247978#6215006, @MBH wrote: > @Marostegui it isn't the original query. Recently I optimized it by replacing views "revision" => "revision_userindex" and "actor" => "actor_... 
[13:05:09] 10Analytics, 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Product-Analytics (Kanban): Creation of a new POSIX group and system user for the Product Analytics team - https://phabricator.wikimedia.org/T255039 (10jbond) p:05Triage→03Medium [13:09:13] 10Quarry, 10DBA, 10Data-Services: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (10MBH) @Marostegui But query https://quarry.wmflabs.org/query/12570 now contains another text, that it was contain on March 18, date of creation of this ticket. How you know what text was conta... [13:13:06] 10Quarry, 10DBA, 10Data-Services: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (10Marostegui) >>! In T247978#6215032, @MBH wrote: > @Marostegui But query https://quarry.wmflabs.org/query/12570 now contains another text, that it was contain on March 18, date of creation of... [13:31:07] 10Analytics, 10Analytics-Cluster: Put 6 GPU-based Hadoop worker in service - https://phabricator.wikimedia.org/T255138 (10elukey) [13:32:08] 10Analytics, 10Analytics-Kanban: Create the new Hadoop test cluster - https://phabricator.wikimedia.org/T255139 (10elukey) [13:33:04] PROBLEM - Check the last execution of archiva-gitfat-link on archiva1002 is CRITICAL: CRITICAL: Status of the systemd unit archiva-gitfat-link https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [13:33:14] 10Analytics, 10Analytics-Cluster: Refresh 16 nodes in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T255140 (10elukey) [13:33:56] 10Analytics, 10Analytics-Cluster: Create the new Hadoop test cluster - https://phabricator.wikimedia.org/T255139 (10elukey) [13:34:45] 10Analytics: Resetting Kerberos access for sguebo - https://phabricator.wikimedia.org/T254035 (10sguebo_WMF) >>! 
In T254035#6209002, @elukey wrote: > ` > elukey@krb1001:~$ sudo manage_principals.py delete sguebo > elukey@krb1001:~$ sudo manage_principals.py create sguebo --email_address=sguebo@wikimedia.org > Pr... [13:35:25] 10Analytics, 10Analytics-Cluster: Upgrade the Cassandra AQS cluster to Cassandra 3.11 - https://phabricator.wikimedia.org/T255141 (10elukey) [13:35:43] 10Analytics, 10Analytics-Cluster: Upgrade the Cassandra AQS cluster to Cassandra 3.11 - https://phabricator.wikimedia.org/T255141 (10elukey) [13:37:39] 10Analytics, 10Analytics-Cluster: Upgrade the Hadoop Analytics cluster to BigTop - https://phabricator.wikimedia.org/T255142 (10elukey) [13:38:55] ottomata: o/ I am adding more tasks to https://phabricator.wikimedia.org/tag/analytics-cluster/, I think it gives a better and clearer view of how packed we are during the next Q (still a lot of things missing) [13:40:44] 10Quarry, 10DBA, 10Data-Services: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (10MBH) I re-runned this query in its old form. Successfully executed in 278 sec. [13:42:15] for example, we have to refresh 16 hosts, add 6 for GPUs (related to this FY) and next FY add 24 [13:42:51] yeahhhhh [13:43:12] a ton of hosts and ops, if we don't automate we'll die :D [13:43:14] indeed! [13:43:38] then there is also bigtop [13:43:43] ahahah [13:47:35] 10Analytics: Analytics Hardware for Fiscal Year 2019/2020 - https://phabricator.wikimedia.org/T255145 (10elukey) [13:47:44] 10Analytics: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (10elukey) [13:47:48] elukey: o/ - Do you wish me trying to move to more opsy? I know I'll be slower than you or ottomata, but maybe it could help? 
[13:48:21] joal: nono no need, I think that we just need to be better in scheduling tasks, the new board is meant to do it [13:48:45] I also hope that we'll hire an SRE soon, plenty of tasks to have fun on while I watch :D [13:49:56] 10Analytics, 10Analytics-Cluster: Put 24 Hadoop worker nodes in service (cluster expansion) - https://phabricator.wikimedia.org/T255146 (10elukey) [13:51:45] 10Analytics, 10Analytics-Cluster: Refresh Druid nodes (druid100[1-3]) - https://phabricator.wikimedia.org/T255148 (10elukey) [13:57:30] !log removed accidentally added page_restrictions column(s) on Hive table event.mediawiki_user_blocks_change after a incorrect schema change was merged (no data was ever set in this column) [13:57:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:01:15] 10Quarry, 10DBA, 10Data-Services: Quarry query became work much slower - https://phabricator.wikimedia.org/T247978 (10Marostegui) Sweet! [14:04:15] (03PS2) 10Milimetric: Show languages in dropdown [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/604065 (https://phabricator.wikimedia.org/T246971) [14:04:28] RECOVERY - Check the last execution of archiva-gitfat-link on archiva1002 is OK: OK: Status of the systemd unit archiva-gitfat-link https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [14:04:29] mforns: if you wouldn't mind, that's ready for review ^ [14:04:41] I spent most of my time trying not to change everything [14:04:45] :) [14:09:44] milimetric: looking [14:18:59] * elukey errand for a bit! 
[14:29:26] (03PS1) 10Mforns: Update changelog.md for v0.0.126 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/604725 [14:30:05] (03CR) 10Mforns: [V: 03+2 C: 03+2] "self-merging for deployment" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/604725 (owner: 10Mforns) [14:31:12] Starting build #44 for job analytics-refinery-maven-release-docker [14:42:46] Project analytics-refinery-maven-release-docker build #44: 09SUCCESS in 11 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/44/ [14:45:43] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/604065 (https://phabricator.wikimedia.org/T246971) (owner: 10Milimetric) [14:46:29] Starting build #16 for job analytics-refinery-update-jars-docker [14:46:50] (03PS1) 10Maven-release-user: Add refinery-source jars for v0.0.126 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/604733 [14:46:51] Project analytics-refinery-update-jars-docker build #16: 09SUCCESS in 22 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/16/ [14:48:36] (03CR) 10Mforns: [V: 03+2 C: 03+2] Add refinery-source jars for v0.0.126 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/604733 (owner: 10Maven-release-user) [14:57:01] (03PS1) 10Mforns: Bump up refinery-source jar versions for v0.0.126 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/604745 [14:57:53] (03CR) 10Mforns: [V: 03+2 C: 03+2] Bump up refinery-source jar versions for v0.0.126 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/604745 (owner: 10Mforns) [14:58:16] !log deployed refinery-source v0.0.126 [14:58:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:01:12] !log started refinery deploy for v0.0.126 [15:01:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:02:19] ping ottomata [15:04:33] mforns: what node? 
[15:04:41] hehe :) [15:04:45] multichannel talk [15:04:47] 1007 [15:04:52] elukey: ^ [15:04:57] weird! [15:05:48] not for space issues, is there an error msg from scap deploy? [15:06:01] No such file or directory (2)\nrsync: link_stat "/git-fat/b508f1fb4f058d3ed975446bdb961682c8c19360" [15:06:14] maybe I was too fast after merging the jars [15:07:50] mforns: possible - could also be related to jars having been deleted from archiva [15:07:52] OH NO [15:10:57] (03PS1) 10Milimetric: Release 2.7.6 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/604752 [15:32:13] (03PS2) 10Milimetric: Release 2.7.6 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/604752 [15:36:20] (03CR) 10Milimetric: [C: 03+2] "tested on canary wikistats-canary.wmflabs.org/test202006" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/604752 (owner: 10Milimetric) [15:47:21] (03PS2) 10Milimetric: [WIP] Clean up data flow as pertains to state [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/604387 [15:47:43] (03Abandoned) 10Milimetric: Revert "Fix language dropdown for ios devices" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/603541 (owner: 10Milimetric) [16:13:00] 10Analytics, 10Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (10Milimetric) [16:13:17] 10Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (10Milimetric) [16:36:48] nuria: stats.wikimedia.org is updated, working time range and language selector, tested on mobile and desktop [16:38:06] 10Analytics, 10Fundraising-Backlog: Dashboard for CentralNotice impression rates using Druid, centralnotice_analytics and CN events - https://phabricator.wikimedia.org/T254792 (10DStrine) This is a good conversation but looking at the fr-tech roadmap in the next few months, this is out of scope. We're pushing... 
[16:42:28] * nshahquinn a-team: the dev version of the KaiOS app is sending event to the new KaiOSAppFirstRun stream, but I can't find any trace of them in Hive or Kafka. What could be going on? (FYI, this is urgent because the app will be released in the next few days, but we'll have to hold that if we can't get this working.) [16:43:22] Here's a sample of an event that was sent: [16:43:27] `https://en.wikipedia.org/beacon/event?{”schema”:“KaiOSAppFirstRun”,“revision”:20158341,“event”:{“app_version”:“1.0.0”,“app_id”:“df6df592e4aef9a8be65”},“webHost”:“en.wikipedia.org”,“wiki”:“enwiki”}` [16:44:51] They've been sending these events since about two days ago, so it can't be processing delays [16:46:25] When I run `kafkacat -C -b kafka-jumbo1001.eqiad.wmnet:9092 -t eventlogging_KaiOSAppFirstRun` I don't get any output at all [16:46:39] yeah, me too [16:46:40] hm [16:47:07] There was nothing in the eventerror stream either [16:47:10] nshahquinn: I confirm that - even when using -o beginnign [16:47:44] * joal will be back after diner [16:48:38] I'm looking at kafkacat -C -b kafka-jumbo1001.eqiad.wmnet:9092 -t eventlogging-client-side | grep KaiOSFirstRun [16:50:47] just thinking out loud, but yeah, the steps are: [16:52:04] send event -> get 204 -> eventlogging-client-side -> processor -> validate -> eventlogging_KaiOSAppFirstRun (eventlogging_valid_mixed) but I'm fuzzy on the exact details every time [16:53:27] nshahquinn: is that the exact thing? the url needs to be url encoded, right? [16:55:57] e.g. 
the url query that is sent by eventlogging looks like [16:56:08] ?%7B%22event%22%3A%7B%22source_page_id%22%3A251410%2C%22source_namespace%22%3A0%2C%22source_title%22%3A%22%E3%83%AF%E3%82%B9%E3%83%97_(CV-7)%22%2C%22source_url%22%3A%22https%3A%2F%2Fja.wikipedia.org%2Fwiki%2F%25E3%2583%25AF%25E3%2582%25B9%25E3%2583%2597_(CV-7)%22%2C%22page_title%22%3A%22%E3%83%9B%E3%83%BC%E3%83%8D%E3%83%83%E3%83%88_(CV-8)%22%2C%22page_id%22%3A252359%2C%22page_namespace%22%3A0%7D%2C%22schem [16:56:08] a%22%3A%22VirtualPageView%22%2C%22webHost%22%3A%22ja.wikipedia.org%22%2C%22wiki%22%3A%22jawiki%22%2C%22revision%22%3A17780078%7D; [16:56:36] * nshahquinn ottomata: I don't think that's the exactly thing. That's what Stephane (the engineer) sent me over Slack. [16:56:41] ok [16:57:20] i think we need to repro this, is there a way you can emit an event from the app and we can see if it comes anywhere? [16:57:30] we can grep webrequest and eventlogging-client-side for it [16:57:43] if we don't see it in webrequest, then the client isn't sending it right [16:57:56] ottomata: the same code has been sending events to other streams okay so I think it's not the client...but yeah, let's try [16:58:15] I just used my browser to send a test event: `https://en.wikipedia.org/beacon/event?{%22schema%22:%22KaiOSAppFirstRun%22,%22revision%22:20158341,%22event%22:{%22app_version%22:%221.0.0%22,%22app_id%22:%22test%22},%22webHost%22:%22en.wikipedia.org%22,%22wiki%22:%22enwiki%22}` [16:58:49] It got a 204 response [17:00:02] nshahquinn: ok am watching now [17:00:04] do it again [17:00:29] cool, got it in client side [17:00:47] nshahquinn: again please [17:01:08] ottomata: I just did it a second time...want it a third? 
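For reference, the gap between the sample pasted from Slack (curly quotes, no encoding) and the kind of query string shown in the VirtualPageView example can be sketched in a few lines. This is only an illustration: `build_beacon_url` is a hypothetical helper, not part of any EventLogging client, and the choice of unescaped characters is an assumption made to mimic JavaScript's `encodeURIComponent`.

```python
import json
from urllib.parse import quote, unquote

BEACON = "https://en.wikipedia.org/beacon/event"

def build_beacon_url(event: dict) -> str:
    """Hypothetical helper: serialize the event and percent-encode it as the query string."""
    # Compact JSON with plain ASCII double quotes (never typographic quotes).
    payload = json.dumps(event, separators=(",", ":"))
    # Assumption: leave the same punctuation unescaped as encodeURIComponent does.
    return BEACON + "?" + quote(payload, safe="!*'()")

event = {
    "schema": "KaiOSAppFirstRun",
    "revision": 20158341,
    "event": {"app_version": "1.0.0", "app_id": "test"},
    "webHost": "en.wikipedia.org",
    "wiki": "enwiki",
}

url = build_beacon_url(event)
# Round trip: decoding the query string should give back the same event.
decoded = json.loads(unquote(url.split("?", 1)[1]))
```

A request built this way should be easy to grep for in webrequest and eventlogging-client-side, since the encoded schema name appears verbatim in the query string.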
[17:01:25] (using "test_2" as the app_id) [17:01:26] hmm no indication of problem in process logs hang on [17:02:12] hmm i haven't seen your second one [17:02:50] again please nshahquinn [17:03:19] ottomata: done ("test_3") [17:04:03] i see test 3 in eventlogging-client-side [17:04:04] hm [17:04:14] but no indication of error [17:04:15] hmmm [17:04:57] again nshahquinn :) [17:05:39] ottomata: done ("test 4") [17:06:30] are these working properly? if so, I can ask an engineer to send one from the actual app [17:07:49] i think they are [17:08:01] i actually just manually processed one of the events that made it [17:08:04] it looks like it processed just right [17:08:05] and validated [17:09:00] ottomata: I still don't see anything in Kafka, though...should there be a delay? [17:09:06] no [17:09:14] i processed it locally, didn't send to kafka [17:09:19] not sure what is happening, am investigating [17:09:59] OHHHHHHH [17:10:01] i know. [17:10:02] i think. [17:10:06] i think. [17:10:09] ottomata: oooooooh [17:10:13] do tell :) [17:10:19] there are new kafka brokers, and your topic is on one of them [17:10:22] not sure why that would matter yet [17:10:26] but it does somehow... [17:10:34] maybe a firewall rule? 
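One way to test the firewall-rule hypothesis from a stat box is a plain TCP connect to each broker on port 9092, with no Kafka client involved. The sketch below is a minimal version of that check; `can_connect` and `check_brokers` are hypothetical names, and the 5 second timeout is an arbitrary choice.

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False

def check_brokers(brokers, port: int = 9092, timeout: float = 5.0) -> dict:
    """Map each broker hostname to whether its port is reachable from this host."""
    return {broker: can_connect(broker, port, timeout) for broker in brokers}
```

Calling `check_brokers` with the kafka-jumbo1001 through kafka-jumbo1009 hostnames would show whether only the newer brokers are unreachable, which would point at network ACLs rather than at the topic or the producer.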
[17:10:56] yeah, I just noticed that if I leave `kafkacat -C -b kafka-jumbo1001.eqiad.wmnet:9092 -t eventlogging_KaiOSAppFirstRun` running for 10 seconds or so, I get timeouts from the brokers [17:11:08] % ERROR: Local: Broker transport failure: kafka-jumbo1007.eqiad.wmnet:9092/1007: Connect to ipv4#10.64.32.106:9092 failed: Connection timed out (after 31595ms in state CONNECT) [17:11:08] % ERROR: Local: Broker transport failure: kafka-jumbo1008.eqiad.wmnet:9092/1008: Connect to ipv4#10.64.48.121:9092 failed: Connection timed out (after 31593ms in state CONNECT) [17:11:08] % ERROR: Local: Broker transport failure: kafka-jumbo1009.eqiad.wmnet:9092/1009: Connect to ipv4#10.64.48.140:9092 failed: Connection timed out (after 31594ms in state CONNECT) [17:12:44] hmmm elukey yt? [17:13:01] do we need new network acl rules for new kafka jumbo brokers? [17:13:54] hmm, so your events are actually making it to your topic [17:14:05] we just can't consume them from a stat box [17:14:07] hm [17:14:30] which likely means they are failing consumption by camus for refine into hive too [17:14:59] ahhhhhh, interesting [17:19:16] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: Add new kafka brokers kafka-jumbo100[789] to the jumbo-eqiad Kafka cluster - https://phabricator.wikimedia.org/T252675 (10Ottomata) @elukey @akosiaris @ayounsi I think the Analytics VLAN ACLs need to be adjusted to allow connections to these new hosts. T... [17:19:22] nshahquinn: ^ [17:19:28] good news is, you are doing everything right :) [17:21:21] ottomata: thank you for digging into it! I'll let the team know they don't need to worry. 
Do subscribe me to the Phab ticket when you file it :) [17:32:14] ottomata: sorry I am here [17:34:20] ottomata: yeah we need the ips in there, adding them now [17:37:30] ottomata: we are also missing the new kafka-main hosts [17:37:44] archiva1002 [17:37:46] :D [17:38:11] jumbo first [17:38:51] lovely jumbo 1007-9 don't have AAAA records -.0 [17:41:21] whaa doh [17:45:53] ottomata: is it super urgent or shall we wait for Arzhel's +1? [17:45:57] (likely tomorrow my morning) [17:48:28] 10Analytics, 10Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (10elukey) 05Resolved→03Open [17:48:31] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Move Archiva to Debian Buster - https://phabricator.wikimedia.org/T252767 (10elukey) [17:49:04] elukey: kinda urgent, camus is failing to import events rn [17:50:58] but it probably has been for a bit, and this is the first complaint [17:51:03] sooo if you prefer to wait i think it will be ok [17:51:07] nshahquinn: ^ yes/no? [17:53:04] ottomata: it should be fine to wait a day (especially if current events will be backfilled) [17:55:35] 10Analytics, 10Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (10elukey) Of course I made a mistake, namely not checking the artifacts that we explicitly reference in refinery. I assumed that keeping only from 0.0.115 onward artifacts would be enough... [17:58:05] ottomata: I'll do it tomorrow morning first thing [17:58:21] also, let me know what you think about --^ [17:58:27] (the archiva issue) [17:58:45] the problem with scap is of course due to my clean up, I completely forgot refinery [18:01:48] ottomata : we will need to re re-run refine since those hosts were racked and put on service no? 
[18:04:58] 10Analytics, 10Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (10Nuria) There is a 3rd option, right? we get all the older jars we are missing from HDFS and upload them to archiva. In that scenario: 1) no job re-starts are needed 2) no code changes a... [18:05:09] ping ottomata 1on 1? [18:05:35] elukey: just added my thoughts to ticket [18:05:57] elukey: i think there is a (maybe easier) 3rd option [18:06:24] elukey: but it might be more involved that waht i anticipate so do let me know [18:07:23] nuria: no idea how to re-upload them, but if feasible ok :) [18:07:42] elukey: reupload to archiva you mean, right? [18:08:02] yes correct [18:08:13] but it is also kinda weird that we reference refinery-hive-0.0.46.jar [18:08:17] or similar [18:08:52] elukey: yes, but let's separate those two projects, one restarting jobs with old jars and one archiva maintenance [18:09:40] nuria: in theory the can be done separately, no job restart is needed now if we drop the old jars from refinery [18:10:16] it can be done at any pace later on [18:10:27] elukey: right cause they are in cluster [18:10:30] elukey: https://archiva.wikimedia.org/#upload [18:11:33] elukey: now, ideally we want to do changes in archiva w/o doing them elsewhere, now, do let me know if you disagree [18:13:13] nuria: at some point if we want to do cleanups we'll have to drop from refinery too due to git fat (not sure if I got the question sorry) [18:14:56] anyway, I am fine with all the scenarios, I'll work tomorrow morning on fixing this whatever the decision is :) [18:15:13] the next time this will be a good reminder of checking better :) [18:18:40] I -1 uploading to archiva - There are a bunch of jars to upload and it needs to be done manually (conf, names etc) - This is both error-prone and will take long time [18:19:11] I'd rather triple check jar usage and minimize upload-need first, and then fianlly if needed upload the minimal set 
there [18:29:41] (03PS7) 10Ottomata: Add classes to use EventStreamConfig with EventSchemaLoader to aide in event ingestion tasks [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) [18:30:01] I am heaadiung out, joal if you have a moment tomorrow let's sync ok? [18:30:11] hey elukey - here I am [18:30:34] I'll have time tomorrow, but not before early afternoon [18:30:39] elukey: --^ [18:31:27] joal: sure np! I'll prep a code review for the artfacts not needed in case we want to proceed in that way [18:32:15] heading out folks, sorry for the mess! [18:32:17] ack elukey - I'm pretty sure we'll need standup agreement before moving, which means nothing before Monday (silent tomorrow) [18:32:20] Bye elukey [18:32:28] yeah makes sense [18:32:30] No big deal elukey - have a good evening [18:36:07] (03PS8) 10Ottomata: Add classes to use EventStreamConfig with EventSchemaLoader to aide in event ingestion tasks [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) [18:36:43] ok nshahquinn will make sure they are backfilled, i think they will be automatically as soon as camus can start pulling them [18:36:45] not sure but I will make sure. [18:36:59] elukey: its ok to wait til tomorrow [18:37:30] joal: mforns, I think my event patch is bascially ready. All the needed functionality is there, will still need to try it with real stream config api, etc. and don't have any jobs using it yet [18:37:47] but it is at the point we should review and discuss before I move on I think [18:37:51] ok [18:37:59] ack ottomata - will rad tomorrow [18:38:01] +e [18:38:07] :] [18:38:30] ottomata: sounds good! thank you. If it's fixed tomorrow, there's no need to backfill since all we have so far are dev events, but it could go live early next week and it's crucial that we don't lose any real user events. 
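On the Archiva cleanup, the "triple check jar usage" step could start from something like the sketch below, which scans text (for example, the contents of job definition files) for refinery jar names and reports those older than the oldest version kept. Everything here is an assumption for illustration: the `refinery-<module>-<x.y.z>.jar` naming pattern is inferred from the `refinery-hive-0.0.46.jar` mention, the 0.0.115 cutoff is the one quoted in the ticket comment, and `missing_after_purge` is a hypothetical helper.

```python
import re

# Assumed naming pattern: refinery-<module>-<major>.<minor>.<patch>.jar
JAR_RE = re.compile(r"(refinery-[a-z]+)-(\d+)\.(\d+)\.(\d+)\.jar")

def referenced_versions(text: str) -> set:
    """Return the set of (artifact, (major, minor, patch)) pairs named in the text."""
    return {
        (m.group(1), (int(m.group(2)), int(m.group(3)), int(m.group(4))))
        for m in JAR_RE.finditer(text)
    }

def missing_after_purge(text: str, oldest_kept=(0, 0, 115)) -> list:
    """List referenced jars whose version is older than the oldest one kept."""
    return sorted(ref for ref in referenced_versions(text) if ref[1] < oldest_kept)
```

Versions are compared as integer tuples, so 0.0.46 sorts before 0.0.115, which a plain string comparison would get wrong.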
[18:38:41] ok great [18:42:34] (03PS9) 10Ottomata: Add classes to use EventStreamConfig with EventSchemaLoader to aide in event ingestion tasks [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) [18:42:55] (03PS10) 10Ottomata: Add classes to use EventStreamConfig with EventSchemaLoader to aide in event ingestion tasks [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) [18:50:35] (03PS11) 10Ottomata: Add classes to use EventStreamConfig with EventSchemaLoader to aide in event ingestion tasks [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) [18:53:32] 10Analytics, 10Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (10JAllemandou) I wouldn't go for option 2, as the rollback is really making us going backward (less available space, no cleanup etc). About options 1 and 3 I prefer 1 for two reasons - E... [19:02:10] (03PS12) 10Ottomata: Add classes to use EventStreamConfig with EventSchemaLoader to aide in event ingestion tasks [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/603582 (https://phabricator.wikimedia.org/T251609) [19:05:23] 10Analytics, 10Fundraising-Backlog: Dashboard for CentralNotice impression rates using Druid, centralnotice_analytics and CN events - https://phabricator.wikimedia.org/T254792 (10AndyRussG) >>! In T254792#6215864, @Milimetric wrote: > We're ready to help here, just help us coordinate. Thanks so much! Hugely a... [19:33:46] (03PS1) 10Neil P. Quinn-WMF: Whitelist fields in the KaiOSAppFirstRun data stream [analytics/refinery] - 10https://gerrit.wikimedia.org/r/604846 [19:58:09] (03CR) 10Nuria: "I think we need to comment that we are storing the app_id unhashed to be able to stablish consent" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/604846 (owner: 10Neil P. 
Quinn-WMF) [20:08:45] 10Analytics, 10Analytics-Kanban: Language selector is not working anywhere now - https://phabricator.wikimedia.org/T246971 (10Nuria) a:05fdans→03Milimetric [20:10:16] milimetric, mforns : i can deploy wikistats to take things off your plate , let me know if it sounds good [20:13:33] milimetric, mforns : nvm i see you already deployed [20:14:20] thx though, I tested and think it looks generally ok [20:42:24] 10Analytics-Radar, 10Performance-Team: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (10Nuria) You can do: >hdfs dfs -text /wmf/data/raw/eventlogging/eventlogging_NavigationTiming/hourly/2020/06/04/18/eventlogging_NavigationTiming.1003.0.8930.777500533.1591293600000 > out2.txt... [20:54:25] 10Analytics-Data-Quality, 10QuickSurveys, 10Research, 10WMDE-Technical-Wishes-Team, and 4 others: Remove Do Not Track support for QuickSurveys - https://phabricator.wikimedia.org/T254224 (10Jdlrobson) [21:08:37] 10Analytics-Kanban, 10Analytics-Radar, 10Privacy Engineering, 10Privacy, and 3 others: Identify pending analyses needing access to data older than 90 days - https://phabricator.wikimedia.org/T250857 (10MNeisler) [21:12:51] 10Analytics-Kanban, 10Analytics-Radar, 10Privacy Engineering, 10Privacy, and 3 others: Identify pending analyses needing access to data older than 90 days - https://phabricator.wikimedia.org/T250857 (10MNeisler) [21:39:12] 10Analytics, 10Product-Analytics: Request admin access to Superset - https://phabricator.wikimedia.org/T255207 (10cchen) [23:28:20] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 3 others: Eventlogging Client Side can use the stream config module to dynamically adjust sampling rates - https://phabricator.wikimedia.org/T234594 (10jlinehan) [23:28:25] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10MW-1.35-notes (1.35.0-wmf.27; 2020-04-07), and 2 others: EventLogging MEP Upgrade - https://phabricator.wikimedia.org/T238544 (10jlinehan) 
[23:28:38] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 3 others: Eventlogging Client Side can use the stream config module to dynamically adjust sampling rates - https://phabricator.wikimedia.org/T234594 (10jlinehan) [23:28:47] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 9 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10jlinehan) [23:30:48] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10Event-Platform, and 3 others: EventLogging MEP Upgrade Phase 3 - https://phabricator.wikimedia.org/T234594 (10jlinehan) [23:58:04] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (10jwang) @Nuria, @Bmueller Please find the bot edits by namespaces in [[ https://docs.google.com/spreadsheets/d/1GzyDzCuOAjEU6sF3Gs0fiP...