[01:40:35] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[01:43:01] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[06:26:00] DBA, decommission-hardware, Patch-For-Review: decommission db1084.eqiad.wmnet - https://phabricator.wikimedia.org/T276302 (Marostegui)
[06:31:06] DBA, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install db11[76-84] - https://phabricator.wikimedia.org/T273566 (Marostegui) Looking good: ` [06:29:42] marostegui@cumin1001:~$ sudo cumin 'db11[76-84].eqiad.wmnet' 'free -g ; echo ; df -hT /srv; echo ; pvs ; echo ; megacli -LdPdInfo -a0 | eg...
[06:33:34] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) db1161 is now on dbctl but depooled. Won't pool till Monday
[06:48:34] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui)
[07:01:34] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1156.eqiad.wmnet'] ` The log ca...
[07:05:04] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) For db1165 which will replace db1085 in s6, what I will do: - Do not reimage db1165 to Stretch, instead will leave it as Buster and...
[07:05:20] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:18:52] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui)
[07:21:46] PROBLEM - MariaDB sustained replica lag on pc2008 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[07:22:54] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1156.eqiad.wmnet'] ` and were **ALL** successful.
[07:23:20] RECOVERY - MariaDB sustained replica lag on pc2008 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2008&var-port=9104
[07:36:51] DBA, Patch-For-Review: Switchover s7 from db1086 to db1136 - https://phabricator.wikimedia.org/T274336 (Marostegui)
[07:37:22] DBA, Patch-For-Review: Switchover s7 from db1086 to db1136 - https://phabricator.wikimedia.org/T274336 (Marostegui)
[07:40:07] DBA, Patch-For-Review: Switchover s7 from db1086 to db1136 - https://phabricator.wikimedia.org/T274336 (Marostegui)
[07:41:05] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:42:12] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[07:43:52] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui)
[07:44:12] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[09:23:12] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[09:23:25] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui)
[09:37:50] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat) s1 eqiad progress: [] db1083 master [] db1084 api [] db1099:3311 recentchanges et al [] db1105:3311 recentchanges et al [] db1106:3311 sanitarium master [] db1118 [] db1119 api [...
[09:38:06] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui) [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1004 [] db1160 [] db1155 [] db1150 [] db1149 [] db1148 [] db1147 [] db1146 [] db1145 [] db1144 [] db1143...
[09:38:08] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui) [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1004 [] db1160 [] db1155 [] db1150 [] db1149 [] db1148 [] db1147 [] db1146 [] db1145 [] db1144 [] db1143 [] db1142 [] db1141 [] db1138 [...
[09:38:10] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[09:39:05] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[09:43:55] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[09:44:09] Blocked-on-schema-change, DBA: Drop default of rc_timestamp - https://phabricator.wikimedia.org/T276156 (Marostegui)
[09:44:45] Blocked-on-schema-change, DBA: Schema change to make rc_id unsigned and rc_timestamp BINARY - https://phabricator.wikimedia.org/T276150 (Marostegui)
[09:57:21] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui) @Ladsgroup this can be deployed? As always with indexes renames, I would deploy this change to a host in s6 eqiad, and leave it a...
[10:05:26] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Ladsgroup) Yeah it should be safe now.
The fix for the change got deployed now (https://gerrit.wikimedia.org/g/mediawiki/core/+/7addd74ec3fbc...
[10:05:54] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Marostegui) Thank you!
[10:09:53] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat)
[10:13:27] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (jcrespo) @Marostegui @Kormat This is a follow up to our conversation on our last meeting: It doesn't have to happen now, but it would help me to have a sorted list (even i...
[10:14:29] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (Marostegui) The first one will be s6 for sure, and the second one either s5 or s2. I am more inclined for s5, but subject to change.
[10:15:15] marostegui: Thank you. I just create more work for you :(
[10:16:04] Amir1: But it is something we will benefit from in the long run! So happy to assign it to kormat!
[10:16:07] Amir1: that's why we're frenemies :)
[10:16:22] marostegui: 🤬
[10:16:37] lol
[10:16:41] * Amir1 hides
[10:17:44] btw. I stopped creating new tickets for drifts to let the current ones gets reduced
[10:18:06] best way to reduce drifts is to stop checking for them. 🤔
[10:18:12] in mid-term I want to make it automated
[10:18:29] Amir1: T276292 says "After 1.34 is deployed everywhere", but we're on MW 1.36 now?
[10:18:29] T276292: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292
[10:18:29] Amir1: 👍
[10:18:47] kormat: let me introduce you to my good friend tmp1 index
[10:19:01] Majavah: sorry, I meant wmf.34
[10:19:26] Amir1: sounds like a most important index
[10:19:49] Blocked-on-schema-change, DBA: Schema change for renaming new_name_timestamp to rc_new_name_timestamp in recentchanges - https://phabricator.wikimedia.org/T276292 (Ladsgroup)
[10:20:15] marostegui: care to elaborate for kormat the one and only tmp1?
[10:21:09] kormat: btw this is also fun: https://phabricator.wikimedia.org/T132416
[10:21:14] oh good old days
[10:21:58] I remember users in enwiki saying checking history of some pages is sometimes times out sometimes is blazing fast, that was why
[10:22:00] more than a year to complete that one :)
[10:22:37] o wow https://i.imgur.com/plBnAt5.png
[10:23:03] LOOOOL
[10:23:16] :D
[10:23:56] Amir1: that task looks like a nightmare
[10:24:38] and that's the most important table we basically have in production
[10:25:01] kormat, were you in the times where tables had different partitions- on purpose?
[10:26:17] T239453 and T233625
[10:26:17] T239453: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453
[10:26:18] T233625: Change PK and remove partitions from the logging table - https://phabricator.wikimedia.org/T233625
[10:27:33] jynus: looks like i missed that, thankfully :)
[10:28:00] not that partitioning was a bad idea or we shouldn't use them
[10:28:22] the problems was the differeces between servers even within the same section
[10:28:40] and how difficult was to do a schema change or data fix
[10:30:16] I still don't know why schema of Russian Wikiquote was so different from the rest of wikis
[11:40:55] applying tendril changes, won't guarantee I will put tendril down!
[13:16:31] ohia jynus!
[13:16:43] hi
[13:16:55] so the wdwb-tech-focus board is current my tech related wikidata wikibase inbox
[13:16:59] so that would be the place right now
[13:17:05] "my tech related wikidata wikibase inbox"
[13:17:12] I read words there :-D
[13:17:24] I know some of the words :-DDD
[13:17:25] my / the teams, but it is not quite official yet / still being formed
[13:17:30] ok
[13:17:51] there is a bunch of automation though, so if you tag something as DBA & wikidata it will land on that workboard anyway
[13:17:52] so, whenever you have an official stuff, just end an email saying "do this", point to a wiki page
[13:18:00] and we will do as asked
[13:18:06] yes will do! :)
[13:18:12] I am just intimidated with the amount of tags
[13:18:22] e.g. we do the same, don't get me wrong
[13:18:37] but for "outsiders" we habe a single point of entry (DBA)
[13:18:40] *have
[13:18:44] I just had a simmiarl chat with the platform team figuring out which board was their inbox :)
[13:18:45] and then we take care of the rest
[13:18:49] he he
[13:19:10] but of course wikidata is a tad larger than databases
[13:19:13] in scope
[13:19:14] yeah, so our one true initial inbox would be the wikidata board, but that large
[13:19:24] *that is large
[13:19:26] (frontend, backend, query service, content, etc.)
[13:19:58] so not complaining, more like asking "is this ok (what I did)?"
[13:20:48] feel free to basically tell us "do it like this"
[13:20:59] as I hate CCing random people
[13:21:15] which sometimes doesn't have anything to do at its current position
[13:32:36] Blocked-on-schema-change, DBA: Drop default of revactor_timestamp - https://phabricator.wikimedia.org/T267767 (Kormat) Open→Resolved Status: Complete!
[13:32:45] 🎉 🎉
[13:33:32] \o/
[13:34:15] ~3 months since i started on it. that's pretty good really.
[13:38:57] kormat: good work
[13:39:28] but also requires a meme: https://i.imgflip.com/5279dv.jpg
[13:40:40] ah haha
[13:40:43] i was expecting MCR :)
[13:40:54] so many to choose!
[13:44:25] haha
[14:27:21] DBA, Analytics, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > How would it handle if the replica goes down for a long time or even forever (ie...
[15:09:45] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Andrew) * @LSobanski can we discuss getting this into your quarterly goals for q4? Or failing that, into annual goals for next year?
[15:29:05] DBA, DC-Ops, SRE, ops-eqiad: (Need By: TBD) rack/setup/install db11[76-84] - https://phabricator.wikimedia.org/T273566 (RobH) Open→Resolved
[15:33:58] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) @jcrespo you can take db1176 for the media backups with its pair in codfw being db2151
[16:04:29] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (jcrespo) Thank you. I am busy now with other setups, but I will hopefully take over both before the end of the quarter. Obviously I will reuse an existing profile for dedicated dbs, but do you t...
[16:05:10] PROBLEM - MariaDB sustained replica lag on pc2009 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[16:05:44] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) >>! In T275633#6925542, @jcrespo wrote: > Thank you.
I am busy now with other setups, but I will hopefully take over both before the end of the quarter. > No rush, take your time....
[16:06:46] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (jcrespo) > I would try to re-use misc one as much as you can m6? Seems wrong, as this is temporary.
[16:08:02] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (Marostegui) Maybe misctest?
[16:09:01] DBA, Patch-For-Review: Productionize db21[45-52] and db11[76-84] - https://phabricator.wikimedia.org/T275633 (jcrespo) Thanks.
[16:10:06] RECOVERY - MariaDB sustained replica lag on pc2009 is OK: (C)2 ge (W)1 ge 0.4 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=pc2009&var-port=9104
[17:51:13] Data-Persistence-Backup, SRE, SRE-swift-storage, Goal: Research storage solutions for media backups - https://phabricator.wikimedia.org/T264190 (jcrespo) Open→Resolved Research (analysis) and Design finished for now, we are now in implementation phase: T276442 and T276445. Documentation...
[17:51:18] Data-Persistence-Backup, SRE, SRE-swift-storage, Epic, Goal: WMF media storage must be adequately backed up in a remote location - https://phabricator.wikimedia.org/T262668 (jcrespo)
[18:38:23] DBA, Add-Link, Growth-Structured-Tasks, Growth-Team (Current Sprint), and 2 others: Add Link engineering: Provide a mechanism for storing data about which link recommendations were rejected by the user - https://phabricator.wikimedia.org/T266446 (Etonkovidova) Checked in betalabs. The table has...
[19:41:37] DBA, Add-Link, Growth-Structured-Tasks, Growth-Team (Current Sprint), and 2 others: Add Link engineering: Provide a mechanism for storing data about which link recommendations were rejected by the user - https://phabricator.wikimedia.org/T266446 (Tgr) Submitting reviews of recommendations is not...
[21:42:51] DBA, Add-Link, Growth-Structured-Tasks, Growth-Team (Current Sprint), and 2 others: Add Link engineering: Provide a mechanism for storing data about which link recommendations were rejected by the user - https://phabricator.wikimedia.org/T266446 (Etonkovidova) >>! In T266446#6926500, @Tgr wrote:...
[21:55:43] DBA, Add-Link, Growth-Structured-Tasks, Growth-Team (Current Sprint), and 2 others: Add Link engineering: Provide a mechanism for storing data about which link recommendations were rejected by the user - https://phabricator.wikimedia.org/T266446 (Tgr) The backend part that this part is about is i...