[00:15:54] RECOVERY - Check unit status of monitor_refine_eventlogging_analytics on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:29:36] PROBLEM - Check unit status of monitor_refine_eventlogging_analytics on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_analytics https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:43:40] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:55:02] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:36:46] !log rebalance kafka partitions for webrequest_text partitions 21,22
[01:36:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[01:54:44] Analytics, Product-Infrastructure-Team-Backlog, Wikimedia Taiwan, Chinese-Sites, Pageviews-Anomaly: Top read is showing one page that had fake traffic in zhwiki - https://phabricator.wikimedia.org/T274605 (Shizhao) The task has been solved?
[03:10:30] Analytics, Product-Infrastructure-Team-Backlog, Wikimedia Taiwan, Chinese-Sites, Pageviews-Anomaly: Top read is showing one page that had fake traffic in zhwiki - https://phabricator.wikimedia.org/T274605 (cooltey) Open→Resolved a:cooltey Thanks @Shizhao, this is resolved.
[03:41:59] Analytics-Clusters, Analytics-Kanban, observability, Patch-For-Review, User-fgiunchedi: Setup Analytics team in VO/splunk oncall - https://phabricator.wikimedia.org/T273064 (razzi) Alright! @fgiunchedi I added the alert to superset, and when it alerted on Icinga, @ottomata and I got an alert...
[04:21:30] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[04:32:48] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:07:55] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade furud/flerovium to Debian Buster - https://phabricator.wikimedia.org/T278421 (razzi) @elukey I figured the easiest way to see if I was on the right track was to make a wip patch - see https://gerrit.wikimedia.org/r/c/operations/puppet/+/...
[05:20:20] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:31:36] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:11:24] Analytics-Clusters: Disk filling up on `/` on an-coord1001 - https://phabricator.wikimedia.org/T279304 (elukey) I had to run `sudo find -mtime +15 -exec rm -f {} \;` for /var/log/hive today, we probably want to have a gzip rolling appender like we did for T276906.
@razzi: > The hive-log4j.properties file...
[06:32:28] !log move hue.wikimedia.org to an-tool1009 (from analytics-tool1001)
[06:32:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:34:28] we are about to complete the py2->py3 migration :D
[06:57:38] so move completed, now hue seems not allowing me in
[06:59:06] maybe it is something cas/httpd related, on the logs I come up as anon
[07:03:51] ahh yes
[07:03:53] CASAuthNHeader CAS-User
[07:03:58] not X-CAS-UID
[07:09:44] moritzm: o/ do I need to restart something on the idp nodes after changing the settings from hue-next -> hue ?
[07:16:41] I am missing the X-CAS-uid
[07:16:42] mmmm
[07:23:13] now I see it via tcpdump
[07:23:14] X-CAS-uid: elukey
[07:23:17] but can't enter anyway
[08:23:15] elukey: just making sure the incident is over then will come back to you
[08:24:03] jbond42: yes yes of course!
[08:32:54] ok elukey seems over now let me know what the issue is
[08:37:29] jbond42: still no idea, it may be a hue bug/trap that I have never encountered. To recap: we used to have hue-next.wikimedia.org with CAS, targeting an-tool1009 (httpd + hue)
[08:37:56] I merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/678861 earlier on to move hue.wikimedia.org to an-tool1009
[08:38:01] deprecating hue-next basically
[08:38:11] and the old backend (analytics-tool1001)
[08:39:04] the main problem now is that Hue seems to see every attempt to login (after cas, that works fine) as an 'anon' one
[08:39:14] even if it is instructed to check the X-CAS-uid header
[08:39:27] it worked up to this morning, no idea why now it does this
[08:40:06] I tried to tcpdump the Hue port to see the HTTP headers sent by httpd, and I see X-CAS-uid: elukey
[08:41:03] and it all worked fine on hue-next?
[08:43:03] yep for months
[08:43:11] of course when I do the migration it breaks :D
[08:46:12] jbond42: can you try to login to hue.wikimedia.org ?
[08:46:16] if you have a minute
[08:46:21] elukey: how do i login ... yes :)
[08:46:21] just to see if I am crazy or not
[08:46:36] in theory it is sufficient to do the cas login
[08:46:58] i just land at the login page
[08:47:50] i had a quick look at the config and i cant see anything obviously wrong
[08:47:57] yeah I see 'anon' even for you
[08:48:09] will try to investigate further, thanks a lot!
[08:50:29] Analytics: Produce a list of wiki projects ranked by number of eligible voters in Board elections - https://phabricator.wikimedia.org/T278815 (Qgil) No worries, let me check first what the Elections Committee and other potential volunteers interested in elections tech can and are interested in providing.
[08:57:17] elukey: fyi the following is a useful cmd for monitoring headers between apache and the backend
[08:57:20] https://wikitech.wikimedia.org/wiki/User:Jbond/debuging#Show_all_request_and_response_headeres_on_loopback
[08:57:24] (just update the port)
[08:57:35] as far as i can see everything is getting set correctly
[09:00:08] jbond42: same thing, thanks for checking!
[09:46:46] Analytics-Clusters, Analytics-Kanban, observability, Patch-For-Review, User-fgiunchedi: Setup Analytics team in VO/splunk oncall - https://phabricator.wikimedia.org/T273064 (fgiunchedi) >>! In T273064#7001382, @razzi wrote: > Alright! @fgiunchedi I added the alert to superset, and when it ale...
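For reference, the loopback capture elukey describes above (checking which headers httpd actually forwards to the Hue backend) can be done with plain tcpdump. This is only a sketch; the backend port 8888 is an assumption (Hue's default), not necessarily what the process on an-tool1009 listens on:

  # Dump the HTTP headers that httpd proxies to the local backend on loopback;
  # adjust the port to whatever the backend really listens on.
  sudo tcpdump -l -i lo -A -s 0 'tcp dst port 8888' | grep -iE 'x-cas|^(GET|POST)'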
[09:47:03] Analytics-Clusters, Analytics-Kanban, observability, Patch-For-Review, User-fgiunchedi: Setup Analytics team in VO/splunk oncall - https://phabricator.wikimedia.org/T273064 (fgiunchedi)
[09:50:38] !log rollback hue on an-tool1009 to 4.8, it seems that 4.9 still has issues
[09:50:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:07:41] jbond42: version 4.8 works fine, will follow up with upstream
[10:08:06] Analytics-Radar, observability, Graphite, Patch-For-Review, and 2 others: Broken reportupdater queries: edit count bucket label contains illegal characters - https://phabricator.wikimedia.org/T279046 (awight) We want the raw aggregated data as well: {F34399950}
[10:11:30] elukey: ack let me know if i can help, also as hue is django you may want to explore using django-cas-ng. chaom.odus added/is adding it to netbox so would be a good person to chat with, that said netbox had more complex use cases around group mapping so ymmv
[10:12:11] jbond42: I don't trust Hue to be honest, I'd feel way safer in having httpd in front dealing with CAS..
[10:12:52] elukey: also good, that's the most standard implementation so definitely good with me :)
[10:27:23] how do oozie jobs get loaded? I see files like https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra/coord_editors_bycountry_monthly.properties but there's reference to running the jobs by hand in the comment
[10:29:01] hnowlan: so oozie job workflows etc.. need to be on hdfs first (we do it as part of the refinery deployment, after scap we have a custom command to execute on an-launcher1002 to upload to hdfs)
[10:29:11] Just wondering whether I could override using -Dcassandra_host=aqs1010.eqiad.wmnet in a command or if duplicating these jobs to ${jobname}_aqs_next.properties is the best way
[10:29:23] elukey: ahhh okay
[10:29:35] then there is a CLI command to use (with the right user, usually analytics) to start the job
[10:29:50] it needs start/end time, etc.. and the .properties file contains all the info for oozie to start the job
[10:29:58] like where to find stuff on hdfs, etc..
[10:30:28] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie contains some info
[10:31:16] hnowlan: --^
[10:31:30] so yes the -D can be used with the oozie command
[10:31:44] but we need to make sure also that all the other details are good
[10:32:13] most of the time we generate intermediate datasets before loading something, and if multiple jobs are doing the same and execute at the same time we'll get some trouble
[10:32:26] I don't recall how the cassandra jobs are working but we can look into them if you want
[10:33:15] elukey: ahh cool, I'll read the docs. ideally we'd just load the intermediate datasets twice I guess rather than regenerate, if that's possible
[10:33:59] yes yes, so you'd need to check the workflow/coordinator xml files related to the properties
[10:34:07] to verify what they do
[10:42:26] hnowlan: going to lunch but feel free to send a meeting invite if you want to review those files :)
[10:49:04] thanks! will do :)
[12:07:04] Analytics, Patch-For-Review: Fix the remaining bugs open on for Hue next - https://phabricator.wikimedia.org/T264896 (elukey) Had to rollback v4.9 from an-tool1009, discussing what to do in https://github.com/cloudera/hue/pull/2004/commits/c30ade1e146946bf4388df023611a2573f99960d with upstream
[12:20:01] * klausman late lunch
[12:35:08] hi team!
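To make the oozie exchange above (10:27-10:31) concrete: a coordinator is started from the CLI against its .properties file, and individual properties can be overridden with -D. The sketch below is an approximation only; the kerberos-run-command wrapper, the refinery path and the start_time/stop_time property names are assumptions from memory, so check the coordinator's .properties file and the wikitech Oozie page before running anything like it:

  # On an-launcher1002, as the analytics user; all values are illustrative.
  sudo -u analytics kerberos-run-command analytics oozie job \
    -Duser=analytics \
    -Dcassandra_host=aqs1010.eqiad.wmnet \
    -Dstart_time=2021-04-01T00:00Z \
    -Dstop_time=2021-05-01T00:00Z \
    -config /srv/deployment/analytics/refinery/oozie/cassandra/coord_editors_bycountry_monthly.properties \
    -run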
[12:35:25] opened https://issues.apache.org/jira/browse/HIVE-25020
[12:35:28] hola mforns
[12:37:02] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Upgrade the Hadoop coordinators to Debian Buster - https://phabricator.wikimedia.org/T278424 (elukey) I opened https://issues.apache.org/jira/browse/HIVE-25020 to Hive upstream, it seems that the buster mariadb jdbc driver doesn't play well with...
[12:50:34] (CR) Mforns: [C: +1] "LGTM" [analytics/refinery] - https://gerrit.wikimedia.org/r/679387 (https://phabricator.wikimedia.org/T278424) (owner: Elukey)
[12:56:25] (CR) Elukey: [C: +2] Move sqoop-mediawiki-tables back to the com.mysql.jdbc.Driver [analytics/refinery] - https://gerrit.wikimedia.org/r/679387 (https://phabricator.wikimedia.org/T278424) (owner: Elukey)
[12:56:43] (CR) Elukey: [V: +2 C: +2] Move sqoop-mediawiki-tables back to the com.mysql.jdbc.Driver [analytics/refinery] - https://gerrit.wikimedia.org/r/679387 (https://phabricator.wikimedia.org/T278424) (owner: Elukey)
[13:01:44] Analytics, Analytics-Kanban: Data drifts between superset_production on an-coord1001 and db1108 - https://phabricator.wikimedia.org/T279440 (elukey)
[13:13:19] Analytics, Analytics-EventLogging, QuickSurveys, WMDE-TechWish, Readers-Web-Backlog (Tracking): QuickSurveys should show an error when response is blocked - https://phabricator.wikimedia.org/T256463 (awight) I had another thought, documented in {T280033}: that we return survey results through...
[13:32:07] Analytics: Produce a list of wiki projects ranked by number of eligible voters in Board elections - https://phabricator.wikimedia.org/T278815 (Niharika) @kzimmerman I would believe so. @Qgil you and I should sync-up about how to handle these requests in the future. :)
[13:33:54] (CR) Ottomata: [C: +2] ProduceCanaryEvents - include httpRequest body in failure message [analytics/refinery/source] - https://gerrit.wikimedia.org/r/678919 (https://phabricator.wikimedia.org/T274951) (owner: Ottomata)
[13:34:14] elukey: i'm going to merge https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/679372
[13:34:17] ok with you?
[13:34:22] and then deploy refinery source and refinery?
[13:35:32] i hope so cause i'm doing it ! :)
[13:35:35] (CR) Ottomata: [C: +2] Fix bug in Refine where table regexes were not matching properly [analytics/refinery/source] - https://gerrit.wikimedia.org/r/679372 (owner: Ottomata)
[13:35:43] ottomata: o/ sure!
[13:35:54] do you want me to build source? It is my ops week
[13:36:01] (if you are busy with other things)
[13:37:12] elukey: sure! i'll make a changelog commit and you can take it from there?
[13:38:03] (PS1) Ottomata: Changelog entry for 0.1.5 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/679802
[13:38:30] elukey: ^ fire away, thank you!
[13:43:44] milimetric: yt?
[13:43:50] or joal: ?
[13:43:52] yeah ottomata
[13:43:56] jo is off this week
[13:43:57] quick bc?
[13:44:01] omw
[13:46:37] ottomata: sure!
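The "Move sqoop-mediawiki-tables back to the com.mysql.jdbc.Driver" change reviewed above is about which JDBC driver class sqoop is told to load. As a purely illustrative sketch (host, database, table and paths below are placeholders, not the real job parameters), the driver is selected like this:

  # Hypothetical sqoop import showing where the driver class is picked;
  # everything except the --driver value is a made-up placeholder.
  sqoop import \
    --driver com.mysql.jdbc.Driver \
    --connect jdbc:mysql://db-replica.example.wmnet:3306/enwiki \
    --username research --password-file /user/analytics/mysql-password.txt \
    --table revision \
    --target-dir /wmf/data/raw/example/revision \
    --num-mappers 4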
[13:49:26] (CR) Elukey: [C: +2] Changelog entry for 0.1.5 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/679802 (owner: Ottomata)
[13:52:02] Analytics-Radar, AHT-Backlog, Anti-Harassment, CheckUser, and 3 others: Deal with Google Chrome User-Agent deprecation - https://phabricator.wikimedia.org/T242825 (Niharika)
[13:56:04] (Merged) jenkins-bot: Changelog entry for 0.1.5 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/679802 (owner: Ottomata)
[13:57:27] refinery source is building
[13:57:28] https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/lastBuild/console
[13:57:40] I have some meetings but I should be able to manage also the refinery symlink update
[14:09:59] elukey: ok let me know i can take over anytime
[14:29:08] started the second step to update refinery's jars: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/lastBuild/console
[14:29:20] after this we should review the change and then we'll be ready to deploy refinery
[14:29:47] (PS1) Maven-release-user: Add refinery-source jars for v0.1.5 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/679818
[14:30:01] ah!
[14:30:25] ottomata: --^
[14:45:10] (CR) Ottomata: [C: +1] Add refinery-source jars for v0.1.5 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/679818 (owner: Maven-release-user)
[14:45:12] +1 elukey
[14:54:58] (CR) Elukey: [V: +2 C: +2] Add refinery-source jars for v0.1.5 to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/679818 (owner: Maven-release-user)
[14:55:36] ottomata: deploying refinery!
[14:56:57] !log deploy refinery via scap - weekly train
[14:56:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:00:50] ottomata: you can prep the puppet change in the meantime
[15:01:07] to force refine to use 0.1.5
[15:09:29] oh already started yesterday elukey didn't add you as reviewer, one sec i have to add that too
[15:10:14] Analytics: Decommission analytics-tool1001 and all the CDH leftovers - https://phabricator.wikimedia.org/T280262 (elukey)
[15:10:30] ottomata: refinery deploy done!
[15:10:42] ok great!
[15:19:32] elukey: enabling puppet on an-coord1001
[15:19:34] sorry
[15:19:37] an-test-coord1001*
[15:20:16] elukey: that did
[15:20:18] - org.mariadb.jdbc.Driver
[15:20:18] + com.mysql.jdbc.Driver
[15:21:22] oh elukey did you deploy to test?
[15:21:51] deploying to test now
[15:27:43] ottomata: ah snap I didn't yes!
[15:28:18] the diff for an-test-coord1001 is ok
[15:28:33] I deployed today the libmysql-dev package
[15:32:13] k coo
[16:02:51] yo razzi
[16:23:18] Analytics: Use inclusive language - https://phabricator.wikimedia.org/T280268 (Milimetric)
[16:23:56] fdans: Can I assign this task to you for the Wikistats addition of the pageviews-per-country data: https://phabricator.wikimedia.org/T207171 ? And should I create a child task specifically for Wikistats, like the analog of https://phabricator.wikimedia.org/T263697?
[16:24:48] lexnasser: yes sounds good! don't worry about the subtask :)
[16:28:34] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (lexnasser) a:lexnasser→fdans Passing this task over to Francisco to carry out the implementation of this data int...
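The weekly train steps above (build refinery-source via the Jenkins release job, run the update-jars job, review the symlink change, then deploy refinery) end with a scap deploy. A rough sketch of that last step; the working directory and the deploy message are assumptions, not the exact commands used here:

  # From the deployment server, after the jar symlink change is merged and pulled:
  cd /srv/deployment/analytics/refinery
  git pull
  git log --oneline -3   # sanity-check that the v0.1.5 jars commit is present
  scap deploy "Analytics weekly train - refinery with refinery-source v0.1.5"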
[16:57:18] Analytics, SRE, netops: Audit analytics firewall filters - https://phabricator.wikimedia.org/T279429 (fdans) a:razzi
[16:58:02] Analytics, Better Use Of Data, Event-Platform, Gerrit-Privilege-Requests, and 2 others: Create or identify an appropriate Gerrit group for +2 rights on schemas/event/secondary - https://phabricator.wikimedia.org/T279089 (fdans)
[17:04:33] Analytics, Analytics-Data-Quality, Analytics-Kanban: Import of MediaWiki tables into the Data Lakes mangles usernames - https://phabricator.wikimedia.org/T230915 (fdans) p:Medium→High
[17:05:30] Analytics-Radar, Data-Services, Developer-Advocacy (Apr-Jun 2021), cloud-services-team (Kanban): Mitigate breaking changes from the new Wiki Replicas architecture - https://phabricator.wikimedia.org/T280152 (fdans)
[17:10:12] Analytics, Analytics-Kanban, Dumps-Generation: Mention QRank in “Analytics Datasets” - https://phabricator.wikimedia.org/T278416 (fdans) a:Ottomata
[17:13:05] Analytics, Product-Analytics, Research: Use Hive/Spark timestamps in Refined event data - https://phabricator.wikimedia.org/T278467 (fdans) p:Triage→Low
[17:16:14] Analytics-Radar, WMDE-Analytics-Engineering: wmde-toolkit-analyzer-build.service fails on stat1007 - https://phabricator.wikimedia.org/T278665 (fdans)
[17:17:07] Analytics, Analytics-Kanban, Cassandra: Store AQS schema and grants in git - https://phabricator.wikimedia.org/T278701 (fdans)
[17:20:57] Analytics, Analytics-EventLogging, Event-Platform, Product-Data-Infrastructure, Technical-Debt: Replace usages of Linker::link() and Linker::linkKnown() in extension EventLogging - https://phabricator.wikimedia.org/T279328 (fdans) cc @Mholloway
[17:23:36] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog, Readers-Web-Backlog: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (fdans) a:mforns
[17:23:38] Analytics, Better Use Of Data, Event-Platform, Product-Infrastructure-Team-Backlog, Readers-Web-Backlog: VirtualPageView should use EventLogging api to send virtual page view events - https://phabricator.wikimedia.org/T279382 (fdans) p:Triage→High
[17:26:30] Analytics, Analytics-Kanban: Data drifts between superset_production on an-coord1001 and db1108 - https://phabricator.wikimedia.org/T279440 (fdans) p:Triage→High
[17:27:52] Analytics, Analytics-Wikistats: New Wikivoyages are only partially included in Stats - https://phabricator.wikimedia.org/T279564 (fdans) p:Triage→High
[17:28:59] ottomata, razzi - I'd like to nuke analytics-tool1001 tomorrow morning, I opened a code review about it, lemme know if you have anything against it (old hue host)
[17:30:49] +1
[17:41:12] sgtm
[17:43:04] RECOVERY - Check unit status of monitor_refine_event on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[17:43:19] Like I mentioned in standup, I'm hoping to deploy the motd fix: https://gerrit.wikimedia.org/r/c/operations/puppet/+/670013
[17:43:20] It's a small change and should be safe, but since all hosts use motd, I'm wondering if I should alert some channel before deploying?
[17:47:13] (PS1) Lex Nasser: Implement delay for pageviews/per-article and mediarequests/per-file endpoints to reduce Cassandra load [analytics/aqs] - https://gerrit.wikimedia.org/r/679899 (https://phabricator.wikimedia.org/T261681)
[17:48:17] (Abandoned) Lex Nasser: Add time interval limits to pageviews per-articles and mediarequests per-file AQS endpoints [analytics/aqs] - https://gerrit.wikimedia.org/r/661827 (https://phabricator.wikimedia.org/T261681) (owner: Lex Nasser)
[17:54:18] razzi: I +1ed it, but my 2c: add a note about your intention in #wikimedia-sre with a TL;DR and wait say 30 mins before merging
[17:54:25] so people will have the chance to review it
[17:54:39] elukey: sounds good
[17:57:47] ottomata: are you going to take care of the refine reruns etc.. or should I? If so I'll do it tomorrow morning
[17:58:07] my plan is to run all the refine checkers with a 3/4 day time span
[17:58:15] and see what refine failures etc.. are left
[17:59:11] ah razzi, I almost forgot
[17:59:27] there is a nice alert for a hadoop worker waiting for you :)
[17:59:32] I haven't touched it
[17:59:48] actually there are two
[17:59:56] one for an-worker1100 and one for an-master1001
[18:00:00] they are both related
[18:00:06] elukey: yup i will do it! i'm doing it for eventlogging_analytics now
[18:00:08] it takes a while!
[18:00:13] to look back so far i guess
[18:00:20] i'm also watching refine_eventlogging_legacy
[18:00:29] a refine_event has happened
[18:00:34] i think i'll launch that now looking back too
[18:00:53] ottomata: ack perfect! If anything is left to do please ping me in here and I'll keep going tomorrow morning!
[18:01:06] i did run RefineMonitor for refine_event and it looks good now!
[18:01:11] \o/
[18:01:13] ok cool, I'll look at those alerts elukey
[18:01:51] razzi: we can discuss them tomorrow morning in case you have doubts (in case open a task!)
[18:03:41] going afk for dinner, ttl!
[18:17:41] (CR) jerkins-bot: [V: -1] Implement delay for pageviews/per-article and mediarequests/per-file endpoints to reduce Cassandra load [analytics/aqs] - https://gerrit.wikimedia.org/r/679899 (https://phabricator.wikimedia.org/T261681) (owner: Lex Nasser)
[18:26:59] Here's the first one, which fits the #Covid crisis well.
[18:26:59] #ThisIsFine
[18:27:12] wow :)
[18:28:21] joal: I don't understand, which one is the first?
[18:28:54] this was a copy-paste mistake razzi - From here: https://twitter.com/AllanBARTE/status/1382702351027437568
[18:29:18] razzi: I shouldn't leave my IRC term open when not working :)
[18:29:23] :P
[19:45:56] * razzi lunchtime
[20:03:14] hiya AndyRussG
[20:03:16] yt?
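For context on the refine rerun chatter above: those jobs are normally launched by systemd timers on an-launcher1002, and a manual backfill is just the same wrapper run with a longer lookback window. The sketch below is only an approximation; the wrapper path and the flag spellings are assumptions from memory, so check the real unit definition on an-launcher1002 (e.g. systemctl cat refine_eventlogging_analytics) before copying it:

  # Hypothetical manual rerun looking back roughly four days; flags are assumptions.
  sudo -u analytics kerberos-run-command analytics \
    /usr/local/bin/refine_eventlogging_analytics \
    --since='2021-04-11T00:00:00Z' --until='2021-04-15T00:00:00Z' \
    --ignore_failure_flag=true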
[20:45:22] Analytics: Delete UpperCased eventlogging legacy directories in /wmf/data/event 90 days from 2021-04-15 (after 2021-07-14) - https://phabricator.wikimedia.org/T280293 (Ottomata)
[20:47:23] Analytics: Delete UpperCased eventlogging legacy directories in /wmf/data/event 90 days from 2021-04-15 (after 2021-07-14) - https://phabricator.wikimedia.org/T280293 (Ottomata)
[20:53:38] PROBLEM - eventgate-analytics-external validation error rate too high on alert1001 is CRITICAL: 372.4 gt 2 https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate https://grafana.wikimedia.org/d/ePFPOkqiz/eventgate?orgId=1&refresh=1m&var-service=eventgate-analytics-external&var-stream=All&var-kafka_broker=All&var-kafka_producer_type=All&var-dc=thanos
[21:01:22] ^ filed https://phabricator.wikimedia.org/T280294
[21:13:19] !log rebalance kafka partitions for webrequest_text partition 23
[21:13:19] last one!!!!!!!
[21:13:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:57:01] Analytics, Analytics-EventLogging, Event-Platform, Product-Data-Infrastructure, and 2 others: Replace usages of Linker::link() and Linker::linkKnown() in extension EventLogging - https://phabricator.wikimedia.org/T279328 (Mholloway) a:Mholloway
[21:57:22] Analytics, Analytics-EventLogging, Event-Platform, Product-Data-Infrastructure, and 2 others: Replace usages of Linker::link() and Linker::linkKnown() in extension EventLogging - https://phabricator.wikimedia.org/T279328 (Mholloway) p:Triage→Medium
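The webrequest_text rebalances logged through the day are Kafka partition reassignments. A minimal sketch with stock Kafka tooling follows; the broker ids, the zookeeper address and the single-partition plan file are all illustrative (the real reassignments were driven by a prepared plan, not this exact file):

  # Illustrative single-partition reassignment; broker ids and ZK address are made up.
  cat > /tmp/webrequest_text_p23.json <<'EOF'
  {"version":1,"partitions":[
    {"topic":"webrequest_text","partition":23,"replicas":[1001,1002,1003]}
  ]}
  EOF
  kafka-reassign-partitions.sh --zookeeper zk.example.wmnet:2181/kafka \
    --reassignment-json-file /tmp/webrequest_text_p23.json --execute
  # ...then poll until the reassignment reports as completed:
  kafka-reassign-partitions.sh --zookeeper zk.example.wmnet:2181/kafka \
    --reassignment-json-file /tmp/webrequest_text_p23.json --verify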