[04:22:57] PROBLEM - Check unit status of eventlogging_to_druid_navigationtiming_hourly on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:08:59] RECOVERY - Check unit status of eventlogging_to_druid_navigationtiming_hourly on an-launcher1002 is OK: OK: Status of the systemd unit eventlogging_to_druid_navigationtiming_hourly https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:22:32] PROBLEM - Check unit status of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:45:59] RECOVERY - Check unit status of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:18:45] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (Majavah)
[12:40:36] Analytics: Cleanup cassandra keyspaces and host - https://phabricator.wikimedia.org/T278231 (JAllemandou)
[12:45:04] (PS4) Ottomata: Add support for finding RefineTarget inputs from Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673604 (https://phabricator.wikimedia.org/T212451)
[12:47:29] Quarry: 500 server error - https://phabricator.wikimedia.org/T278233 (LClightcat)
[12:47:58] Quarry: 500 server error - https://phabricator.wikimedia.org/T278233 (Majavah)
[12:48:01] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (Majavah)
[12:50:15] (PS1) Ottomata: Improve Refine failure report email [analytics/refinery/source] - https://gerrit.wikimedia.org/r/674304
[12:58:26] Analytics: Cleanup cassandra keyspaces and host - https://phabricator.wikimedia.org/T278231 (elukey) This is interesting for @hnowlan for sure :)
[12:59:04] (PS2) Ottomata: Improve Refine failure report email [analytics/refinery/source] - https://gerrit.wikimedia.org/r/674304
[13:05:56] Analytics: AQS Cassandra storage: revisit capacity based on newer patterns - https://phabricator.wikimedia.org/T278234 (JAllemandou)
[13:45:47] Analytics-Clusters, Analytics-Kanban: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (ema) >>! In T255973#6840918, @Ottomata wrote: > Although, it is a go library, which I'm not sure we have much tooling around dealing with....
[14:01:18] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (Urbanecm_WMF) p:Triage→Unbreak! Boldly triaging this as an UBN. And, issue confirmed: {F34182244}
[14:22:29] (PS3) Ottomata: Rename whitelist to allowlist for Refine sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670269 (https://phabricator.wikimedia.org/T273789)
[14:22:31] (PS6) Ottomata: [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789)
[14:22:33] (PS5) Ottomata: Add support for finding RefineTarget inputs from Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673604 (https://phabricator.wikimedia.org/T212451)
[14:22:35] (PS3) Ottomata: Improve Refine failure report email [analytics/refinery/source] - https://gerrit.wikimedia.org/r/674304
[14:25:38] what's up with kafka, how come I can't consume any events and camus is complaining?
[14:25:51] (I'm slow and reading the backscroll doesn't tell me too much)
[14:26:04] which topic, milimetric?
[14:26:25] eqiad k8s is offline, so all mw and EL events are flowing through codfw, which means data is all in the codfw topics
[14:26:31] which is why you should use stream config to get the topics to consume
[14:26:32] :)
[14:27:09] curl 'https://meta.wikimedia.org/w/api.php?action=streamconfigs&streams=mediawiki.revision-create&all_settings=true' | jq .
[14:34:38] oh I see, I was just listing them on the broker, cool. Thanks, let me know if I can help with the alarms
[14:51:07] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (dcaro) a:dcaro
[14:51:10] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (dcaro) Quarry web is failing to connect to mysql, looking: ` Mar 23 14:47:02 quarry-web-01 uwsgi-quarry-web[441]: pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'quarry-db-01.qu...
[14:55:32] oh weird, in codfw there's revision_create and revision-create, the latter has data
[14:56:06] stream config is great, but it would be nice to have a better interface into it that knew about the brokers too
[14:56:42] * joal is watching milimetric planning for a new UI in his head
[14:57:08] oh I have the UI instantly, the hard part is getting rid of it to make room for gobblin
[14:57:15] how's that going joal, can we sync for a bit?
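Editor's note: the stream config lookup suggested at 14:26:31 (the curl at 14:27:09) can also be scripted. The following is a minimal Python sketch, not the team's tooling. It assumes the API response nests a `topics` list under `streams` and then the stream name, which is not shown in this log. The `format=json` parameter is added here and is not part of the curl above. The helper names (`stream_config_url`, `topics_for_stream`, `fetch_topics`) and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl command in the log.
API_URL = "https://meta.wikimedia.org/w/api.php"


def stream_config_url(stream):
    """Build the streamconfigs query URL for one stream name."""
    params = {
        "action": "streamconfigs",
        "streams": stream,
        "all_settings": "true",
        "format": "json",  # added here; not in the original curl
    }
    return API_URL + "?" + urlencode(params)


def topics_for_stream(config, stream):
    """Return the topic list for `stream` from a parsed response.

    Assumes the shape {"streams": {<stream>: {"topics": [...]}}};
    returns an empty list if any level is missing.
    """
    return list(config.get("streams", {}).get(stream, {}).get("topics", []))


def fetch_topics(stream):
    """Fetch the stream config over HTTP and return its topics."""
    # Placeholder User-Agent; replace with one that identifies your client.
    request = Request(stream_config_url(stream),
                      headers={"User-Agent": "stream-config-sketch/0.1"})
    with urlopen(request) as response:
        return topics_for_stream(json.load(response), stream)
```

Calling `fetch_topics("mediawiki.revision-create")` would then return whatever topics the stream config lists, rather than relying on a listing from a single broker.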
[14:57:20] :D
[14:57:25] sure milimetric - to the cave
[14:57:30] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (dcaro) Not sure if it's expected, but mariadb is running only on localhost ip on quarry-db-01: ` tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 18433/mysqld `
[15:04:58] a-team: I have a 2 hour meeting on annual planning so I won't be able to make it to our meetings, sorry
[15:13:57] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (LClightcat) It is worth mentioning that, at least for me, the service has returned to normal operation.
[15:37:01] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (dcaro) Yes, this is fixed for now
[15:37:14] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (dcaro) Open→Resolved
[16:01:35] milimetric: indeed broker discovery would be nicer, but it's a bit difficult...some clusters have all topics
[16:02:07] or...do you mean brokers in a cluster...or kafka cluster broker discovery?
[16:02:36] mforns: pinggggg
[16:29:30] Analytics, Product-Analytics (Kanban): Hive table neilpquinn.toledo_pageviews missing almost all data - https://phabricator.wikimedia.org/T277781 (nshahquinn-wmf) p:High→Medium
[16:41:01] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure, and 2 others: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (nray) @Edtadros There was a patch on this ticket that was reverted, m...
[16:41:25] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure, and 2 others: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (nray) a:nray→Edtadros
[16:42:00] Analytics-Radar, Better Use Of Data, Product-Analytics, Product-Data-Infrastructure, and 2 others: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (Edtadros) >>! In T218835#6938556, @nray wrote: > @Edtadros There was...
[16:47:01] Analytics-Radar, observability, serviceops, Patch-For-Review, and 2 others: Create a separate 'mwdebug' cluster - https://phabricator.wikimedia.org/T262202 (thcipriani)
[16:47:21] Hi Analytics! is it all ready on your side for me to deploy per-country endpoints to RESTBase? https://github.com/wikimedia/restbase/pull/1289
[16:48:24] Analytics-Radar, observability, serviceops, Patch-For-Review, and 2 others: Create a separate 'mwdebug' cluster - https://phabricator.wikimedia.org/T262202 (thcipriani) Is this still in progress or is this work superseded by #mw-on-k8s work?
[16:50:38] Analytics-Radar, Growth-Scaling, Product-Analytics, Growth-Team (Current Sprint): Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (Tgr) This is done - the next time the script runs (on April 1st) it will delete all old records. As before, records ar...
[16:54:13] Analytics: Cleanup cassandra keyspaces and host - https://phabricator.wikimedia.org/T278231 (NiteshKumar123) I want to complete this task . Please help me how to solve it. I'm new on that platform.
[17:05:07] milimetric: anything else to do before restbase is deployed?
[17:07:01] * elukey bbiab
[17:10:16] Analytics, Better Use Of Data, Event-Platform, Product-Data-Infrastructure, Product-Analytics (Kanban): [MEP] [BUG] Timestamp format changed in migrated client-side EventLogging schemas - https://phabricator.wikimedia.org/T277253 (kzimmerman)
[17:17:20] Analytics, Product-Analytics: Default table creation settings results in warnings when querying - https://phabricator.wikimedia.org/T277822 (nshahquinn-wmf) This has also caused T275233.
[17:43:29] * elukey afk!
[17:58:46] (CR) Bstorm: [C: +2] multiinstance: Attempt to make quarry work with multiinstance replicas [analytics/quarry/web] - https://gerrit.wikimedia.org/r/632804 (https://phabricator.wikimedia.org/T264254) (owner: Bstorm)
[17:59:21] (Merged) jenkins-bot: multiinstance: Attempt to make quarry work with multiinstance replicas [analytics/quarry/web] - https://gerrit.wikimedia.org/r/632804 (https://phabricator.wikimedia.org/T264254) (owner: Bstorm)
[18:19:21] (PS4) Ottomata: Improve Refine failure report email [analytics/refinery/source] - https://gerrit.wikimedia.org/r/674304
[18:19:51] Does anybody know what to make of this Icinga/DPKG alert: CHECK_NRPE STATE UNKNOWN (https://alerts.wikimedia.org/?q=alertname%3DIcinga%2FDPKG&q=%40receiver%3Dirc-spam)?
[18:19:56] Happening on an-launcher1002
[18:20:33] (PS5) Ottomata: Improve Refine failure report email [analytics/refinery/source] - https://gerrit.wikimedia.org/r/674304
[18:24:34] huh no
[18:24:45] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=an-launcher1002&service=DPKG
[18:24:52] lexnasser: github is down so... I can't check the status, but we don't operate restbase, basically the platform team will deploy that as part of their regular deploy and it'll just start working
[18:25:13] Pchelolo: lexnasser milimetric i think you are missing each other's pings :)
[18:25:23] milimetric: Pchelolo asked about deploying it earlier
[18:25:35] ah :)
[18:25:42] backscroll is a little nuts today
[18:25:51] yeah. I'm ready to deploy just wondering if it's all good on your side
[18:26:06] thank you for cross-pinging ottomata
[18:28:47] Pchelolo: yeah, I think it's all set on our side. thanks for checking!
[18:29:26] cool. will ping when deployed
[18:33:15] (PS7) Ottomata: Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789)
[18:33:26] (PS6) Ottomata: Add support for finding RefineTarget inputs from Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/673604 (https://phabricator.wikimedia.org/T212451)
[18:33:31] (PS6) Ottomata: Improve Refine failure report email [analytics/refinery/source] - https://gerrit.wikimedia.org/r/674304
[18:37:43] * razzi lunch
[18:41:04] (CR) Bstorm: "I see this is currently checked out live on the web instance. In order to deploy https://gerrit.wikimedia.org/r/c/analytics/quarry/web/+/6" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/665472 (https://phabricator.wikimedia.org/T275277) (owner: Framawiki)
[18:45:01] Analytics, Event-Platform, Product-Data-Infrastructure, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review: PrefUpdate Event Platform Migration - https://phabricator.wikimedia.org/T267348 (Ottomata)
[18:45:07] Analytics, Analytics-Kanban, Event-Platform, Structured-Data-Backlog, and 2 others: SuggestedTagsAction Event Platform Migration - https://phabricator.wikimedia.org/T267351 (Ottomata)
[18:45:12] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: MobileWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T267347 (Ottomata)
[18:45:32] Analytics, Analytics-Kanban, Event-Platform, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review: DesktopWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T271164 (Ottomata)
[18:45:59] Analytics, Editing-team, Event-Platform, Patch-For-Review: EditAttemptStep Event Platform Migration - https://phabricator.wikimedia.org/T267343 (Ottomata)
[18:46:02] Analytics, Editing-team, Event-Platform, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review: VisualEditorFeatureUse Event Platform Migration - https://phabricator.wikimedia.org/T267353 (Ottomata)
[18:48:39] Analytics, Event-Platform, Product-Data-Infrastructure, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review: PrefUpdate Event Platform Migration - https://phabricator.wikimedia.org/T267348 (Ottomata) Open→Resolved
[18:48:43] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[18:48:48] Analytics, Analytics-Kanban, Event-Platform, Structured-Data-Backlog, and 2 others: SuggestedTagsAction Event Platform Migration - https://phabricator.wikimedia.org/T267351 (Ottomata) Open→Resolved
[18:48:52] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[18:48:54] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: MobileWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T267347 (Ottomata) Open→Resolved
[18:48:56] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[18:48:59] Analytics, Analytics-Kanban, Event-Platform, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review: DesktopWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T271164 (Ottomata) Open→Resolved
[18:49:01] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[18:49:06] Analytics, Editing-team, Event-Platform, Patch-For-Review: EditAttemptStep Event Platform Migration - https://phabricator.wikimedia.org/T267343 (Ottomata) Open→Resolved
[18:49:08] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[18:49:12] Analytics, Editing-team, Event-Platform, MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), Patch-For-Review: VisualEditorFeatureUse Event Platform Migration - https://phabricator.wikimedia.org/T267353 (Ottomata) Open→Resolved
[18:49:14] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[18:49:38] Analytics, Analytics-EventLogging, Analytics-Kanban, Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (Ottomata)
[19:06:06] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (Ottomata) @JAllemandou @mforns The patches are ready for review! Sorry there are so many, but I think the outcome is...
[19:06:53] (CR) Jdlrobson: [C: +1] "Does this one look backwards compatible to you Ottomata?" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/672740 (https://phabricator.wikimedia.org/T275794) (owner: Phuedx)
[19:11:40] (CR) Ottomata: "Yup, looks good, you are just adding a new non required field." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/672740 (https://phabricator.wikimedia.org/T275794) (owner: Phuedx)
[19:33:24] Quarry, cloud-services-team (Kanban): Prepare Quarry for multiinstance wiki replicas - https://phabricator.wikimedia.org/T264254 (Bstorm) @bd808 noticed that querying `meta_p` is currently broken by the handling of that URL in the backend. That needs patching.
[19:34:29] Analytics, Event-Platform, Inuka-Team: InukaPageView Event Platform Migration - https://phabricator.wikimedia.org/T267344 (AMuigai)
[19:35:47] Analytics, Event-Platform, Inuka-Team: KaiOSAppFeedback Event Platform Migration - https://phabricator.wikimedia.org/T267345 (AMuigai)
[20:04:35] (CR) Razzi: Update mysql resolver to work with cloud replicas (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/666209 (https://phabricator.wikimedia.org/T274690) (owner: Milimetric)
[20:23:40] (PS1) Bstorm: multiinstance support: fix the meta_p query logic [analytics/quarry/web] - https://gerrit.wikimedia.org/r/674427 (https://phabricator.wikimedia.org/T264254)
[20:26:51] (PS2) Bstorm: multiinstance support: fix the meta_p and centralauth query logic [analytics/quarry/web] - https://gerrit.wikimedia.org/r/674427 (https://phabricator.wikimedia.org/T264254)
[20:33:22] (CR) BryanDavis: [C: +1] multiinstance support: fix the meta_p and centralauth query logic [analytics/quarry/web] - https://gerrit.wikimedia.org/r/674427 (https://phabricator.wikimedia.org/T264254) (owner: Bstorm)
[20:46:32] Analytics, LDAP-Access-Requests: Add razzi to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T278262 (Ottomata)
[20:46:41] Analytics, LDAP-Access-Requests: Add razzi to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T278262 (Ottomata)
[20:50:24] Analytics, LDAP-Access-Requests: Add razzi to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T278262 (Ottomata) Open→Resolved Done.
[21:25:05] lexnasser: deployed. but it's telling me the data's not there yet: https://wikimedia.org/api/rest_v1/metrics/pageviews/top-by-country/enwiki/all-access/2021/12
[21:38:27] (CR) Bstorm: [C: +2] "I'm going to try to quick-deploy this in order to prevent confusion on those two databases." [analytics/quarry/web] - https://gerrit.wikimedia.org/r/674427 (https://phabricator.wikimedia.org/T264254) (owner: Bstorm)
[21:38:57] (Merged) jenkins-bot: multiinstance support: fix the meta_p and centralauth query logic [analytics/quarry/web] - https://gerrit.wikimedia.org/r/674427 (https://phabricator.wikimedia.org/T264254) (owner: Bstorm)
[21:50:57] Quarry, Patch-For-Review, cloud-services-team (Kanban): Prepare Quarry for multiinstance wiki replicas - https://phabricator.wikimedia.org/T264254 (Bstorm) Ok, it's looking pretty ok now except that I wish I had a nice big popup if someone forgets the new "database" field
[22:07:57] Pchelolo: Thanks so much for deploying it! That's actually the 'top-by-country' endpoint that you're referring to, and you queried data for that for December of this year, so it wouldn't have any data for that
[22:08:16] but I confirm that top-per-country works: https://wikimedia.org/api/rest_v1/metrics/pageviews/top-per-country/AT/all-access/2021/01/01
[22:08:40] again, thanks for making this process so smooth and quick!
[22:09:14] no prob
[22:09:30] woooh! nice!
[22:37:33] Analytics-Radar, Growth-Scaling, Product-Analytics, Growth-Team (Current Sprint): Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (Etonkovidova) Checked in betalabs - works as expected.
[23:16:31] (CR) Jdlrobson: [C: +2] universalLanguageSelector: Add timeToChangeLanguage property [schemas/event/secondary] - https://gerrit.wikimedia.org/r/672740 (https://phabricator.wikimedia.org/T275794) (owner: Phuedx)
[23:17:01] (Merged) jenkins-bot: universalLanguageSelector: Add timeToChangeLanguage property [schemas/event/secondary] - https://gerrit.wikimedia.org/r/672740 (https://phabricator.wikimedia.org/T275794) (owner: Phuedx)
[23:40:27] Quarry: Quarry: HTTP 500 errors for all authenticated requests - https://phabricator.wikimedia.org/T278230 (Urbanecm_WMF) Thanks, appreciated!