[05:25:02] DBA, Data-Persistence: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui)
[05:34:05] DBA, Data-Persistence: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui)
[05:41:03] DBA, Data-Persistence, Growth-Structured-Tasks, Growth-Team (Current Sprint): Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (Tgr) a:kostajh I think this is done, per the last two comments (please correct me...
[05:42:36] DBA, Data-Persistence, Growth-Structured-Tasks, Growth-Team (Current Sprint): Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (Marostegui) >>! In T261411#6573592, @Tgr wrote: > I think this is done, per the last t...
[06:11:43] DBA, Data-Persistence: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui)
[06:13:14] DBA, Data-Persistence, Growth-Structured-Tasks, Growth-Team (Current Sprint): Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (Tgr) >>! In T261411#6573596, @Marostegui wrote: > Thanks @Tgr - keep in mind that the...
[06:17:06] DBA, Data-Persistence, Growth-Structured-Tasks, Growth-Team (Current Sprint): Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (Tgr) That reminds me: there is a chance this table would have to be in a separate exte...
[06:28:32] DBA, Data-Persistence, Growth-Structured-Tasks, Growth-Team (Current Sprint): Add a link engineering: Determine format for accessing and storing link recommendations - https://phabricator.wikimedia.org/T261411 (Marostegui) If that needs to happen, we'd need to get some sleep between iterations to...
[06:56:25] DBA, Data-Persistence: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui)
[07:56:10] I am doing some tests with the mw api, any suggestion for a cool user agent for a bot reading files/images?
[07:58:49] https://developers.whatismybrowser.com/useragents/explore/hardware_type_specific/car/
[07:59:24] Or maybe https://developers.whatismybrowser.com/useragents/explore/hardware_type_specific/glasses/
[07:59:43] ha ha
[07:59:49] I don't want to impersonate a Tesla
[08:00:08] I settled for now on WMFBackups/0.4
[08:00:12] :(
[08:25:48] jynus: was the check also run on the 10.4 backup sources?
[08:25:54] on all
[08:25:56] nice
[08:26:01] and they were clean
[08:26:03] that's good news
[08:26:06] I didn't exclude any dbstore for backups
[08:26:18] only found issues on that common on db2099
[08:26:23] *commons
[08:26:43] however, while the advice you got was ok
[08:26:57] I don't think, based on what we know is the key to the issue
[08:27:10] no, of course not
[08:27:12] remember labsdb10XX was loaded from a dump
[08:27:15] it was just a comment from them
[08:27:18] yeah
[08:27:19] yeah, they are aware
[08:27:26] we run it, we exclude that
[08:27:48] no problem, just mentioning I doubt it is the end of it :-D
[08:28:42] we should run it also next time we do a recovery
[08:28:52] to exclude mariabackup issues
[08:29:36] yep, good idea
[08:29:41] BTW, should I upgrade dbprov hosts (where prepare happens) to the latest package versions?
[08:29:54] in 10.4 you mean?
[08:30:08] I am guessing that is the only one that was recently upgraded?
[08:30:19] yes, but it only includes the fix for galera, don't bother
[08:30:20] we only have 1 host with x1 on 10.4/buster
[08:30:28] there will be another 10.4 release in 2 weeks I think
[08:30:33] ok
[08:30:34] the scheduled 10.4.16
[08:30:40] so we can skip 10.4.15
[08:30:42] what about 10.1?
[08:31:02] EOL already as you said?
[08:31:17] we are on 10.1.44, I think the latest is 10.1.47, but I haven't put effort on compiling those as we are getting rid of them and lack of time :(
[08:31:34] EOL yeah, the latest is 10.1.47 I think, but I'd need to double check
[08:31:37] that's good for me, as long as there is no higher version
[08:31:45] live than on dbprovs
[08:31:48] that is my only concern
[08:31:53] no, there are not
[08:31:56] cool then
[08:32:48] heads up, for production tests, I will use a new dir on dbprov2003 called "/srv/backups/media"
[08:32:56] production tests of media backups
[08:33:13] (it won't be the final location)
[09:08:43] if everything goes according to plan, in 2 hours we will have the first super-super-naive backup of testwiki images
[09:16:27] so we can drop testwiki without worrying?
[09:18:31] I don't think developers would be happy about that :-D
[09:20:53] plus I said I could backup, not that I could recover :-D
[09:30:01] DBA, Data-Persistence: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui)
[10:05:21] DBA, Operations, Patch-For-Review, User-Kormat: orchestrator: Get packages into WMF apt - https://phabricator.wikimedia.org/T266023 (Kormat) v3.2.3 has been uploaded: ` # apt policy orchestrator orchestrator: Installed: (none) Candidate: 1:3.2.3 Version table: 1:3.2.3 1001 1001...
[11:05:15] Check this bug: https://bugs.mysql.com/bug.php?id=53588 the fix is just changing a constant definition, but it makes blackhole unusable with InnoDB
[11:06:45] I remember running into that one on 5.7 :(
[11:07:25] I understand if there was no contributed patch or if the fix was complex
[11:07:39] but it has been there ongoing for 10 years with a reasonable patch
[11:08:05] it was applied on percona
[11:08:08] let me check mariadb
[11:08:36] I remember I actually ran on that on 5.7 when 5.7 was still super new and I was like: must be 5.7 and then I saw it was there since 5.5
[11:08:36] do we have any available test host I can use to create a test table?
[11:08:42] db1077
[11:08:45] thanks
[11:08:52] why do you need a blackhole one?
[11:09:24] I don't need it now, but one of the options for binlog backups was blackhole
[11:09:32] as a sort of cheap binlog server
[11:09:43] * marostegui looks at jynus "binlog server"
[11:09:52] no no, don't think badly
[11:10:04] it also more likely won't use the server
[11:10:11] I know what you mean ;-)
[11:10:22] but I check if we can definitely not use it
[11:12:18] we don't have blackhole enabled
[11:12:47] "In MariaDB 10.1 and later, the storage engine's plugin will have to be installed."
[11:12:59] is it a plugin on mysql too? I cannot remember
[11:13:01] testing it temporarily on db1077
[11:13:09] but it comes preinstalled
[11:13:13] ah ok
[11:13:15] by default on percona/mysql
[11:13:57] it is not fixed on 10.4
[11:13:58] root@db1077:/opt/wmf-mariadb104# find . | grep black
[11:13:58] ./lib/plugin/ha_blackhole.so
[11:14:02] pheeew
[11:14:05] so only percona
[11:14:07] I was like: do we have to compile it..
[11:14:18] yeah, it is compiled
[11:14:19] jynus: maybe worth checking mariadb's jira and report it if it is not there
[11:14:36] I was noting that it wasn't enabled by default and that was different than mysql only
[11:14:53] I may do
[11:15:38] and they may be even more open to apply it if it is disabled by default
[11:16:08] yeah, and if reported early, maybe it would still be on 10.4
[11:16:14] as we are only on .15
[11:17:13] I see no duplicate, so I will file the feature request
[11:18:05] +1
[11:31:07] https://jira.mariadb.org/browse/MDEV-24017
[11:32:14] DBA, Data-Persistence: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui)
[11:32:37] Thanks, voted and subscribed
[12:07:38] DBA, Data-Persistence: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui)
[12:08:31] DBA, Data-Persistence: Enable replication eqiad -> codfw and other checks - https://phabricator.wikimedia.org/T261914 (Marostegui) All tables came clean. All those that reported differences, were confirmed as false positives by second runs. The false positives were found at: `enwiki.page` `itwiki.page`...
[12:12:11] oof. orchestrator's .deb installs to `/usr/local/orchestrator`
[12:14:51] that's easy fix, it should have a configurable prefix
[12:15:08] if done properly upstream, of course
[12:15:24] what language does orchestrator use, kormat?
[12:15:29] right now we're using upstream's debs
[12:15:32] it's written in go
[12:15:52] ah, then it makes sense being there
[12:20:52] marostegui: hi
[12:21:15] mszabo: hello! welcome :)
[12:21:51] mszabo: this is Amir1's torture channel :)
[12:22:09] :D
[12:52:00] mszabo: o/
[13:01:32] cron from cumin: "Alias wikireplicas-analytics matched 0 hosts"
[13:01:39] I will check config
[13:01:56] jynus: I was told there are new hosts with a new role that would have been set up this week
[13:02:06] ah, ok
[13:02:09] so I didn't bother to remove it to just be re-added
[13:02:31] not sure if they are late or there is a bug in the alias query
[13:02:48] jynus: yes, those are expected to match the new clouddb hosts, once they are installed
[13:02:53] "O:wmcs::db::wikireplica::analytics"
[13:03:00] so what is the current alias then?
[13:03:49] uff there is the normal and the dedicated ones
[13:03:54] confusing
[13:04:07] what do you mean normal and dedicated?
[13:04:11] I am checking out of curiosity
[13:04:50] ah, the dedicated may be for analytics
[13:05:39] this is the current alias: https://phabricator.wikimedia.org/P13058
[13:07:25] but the multiinstance are new, the others should have stuff?
[13:11:11] volans: I think I caught a bug
[13:11:22] a typo, the email is right
[13:11:46] what's wrong?
[13:12:07] where is aliases.yaml I will send a patch, I can take care of it
[13:12:13] puppet
[13:16:44] easy fix: check-cumin-aliases
[13:16:53] easy fix: https://gerrit.wikimedia.org/r/c/operations/puppet/+/636016
[13:30:24] alias works now and hopefully I haven't broken anything in the way (99% guaranteed!)
[13:42:22] DBA, Operations, User-Kormat: orchestrator: Add service monitoring - https://phabricator.wikimedia.org/T266338 (Kormat)
[22:32:53] PROBLEM - MariaDB sustained replica lag on db1081 is CRITICAL: 4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[22:36:15] RECOVERY - MariaDB sustained replica lag on db1081 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[23:40:23] PROBLEM - MariaDB sustained replica lag on db1081 is CRITICAL: 14.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[23:56:53] RECOVERY - MariaDB sustained replica lag on db1081 is OK: (C)2 ge (W)1 ge 0.6 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104