[00:36:29] DBA, MediaWiki-General, TechCom-RFC: RFC: Discourage use of MySQL's ENUM type - https://phabricator.wikimedia.org/T119173 (Krinkle)
[00:36:46] DBA, MediaWiki-General, TechCom-RFC: RFC: Discourage use of MySQL's ENUM type - https://phabricator.wikimedia.org/T119173 (Krinkle)
[00:37:53] DBA, MediaWiki-General, TechCom-RFC: RFC: Discourage use of MySQL's ENUM type - https://phabricator.wikimedia.org/T119173 (Krinkle)
[00:38:11] DBA, MediaWiki-General, TechCom-RFC: RFC: Discourage use of MySQL's ENUM type - https://phabricator.wikimedia.org/T119173 (Krinkle) a:Krinkle→None
[00:38:25] DBA, MediaWiki-General, TechCom-RFC, Performance-Team (Radar): RFC: Discourage use of MySQL's ENUM type - https://phabricator.wikimedia.org/T119173 (Krinkle)
[01:27:08] DBA, MediaWiki-General, TechCom-RFC, Performance-Team (Radar): RFC: Discourage use of MySQL's ENUM type - https://phabricator.wikimedia.org/T119173 (DannyS712) While it doesn't necessarily address the overall use of enum, discussion above regarding encouraging normalization tables led me to a pro...
[01:29:00] DBA, Operations, Patch-For-Review, Performance-Team (Radar), and 2 others: [RFC] improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (Krinkle)
[01:31:40] DBA, Operations, Patch-For-Review, Performance-Team (Radar), and 2 others: Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (Krinkle)
[01:35:46] DBA, MediaWiki-General, TechCom-RFC, Performance-Team (Radar): RFC: Discourage use of MySQL's ENUM type - https://phabricator.wikimedia.org/T119173 (Ladsgroup) I honestly support banning using enum altogether. @jcrespo's point of view is more on the DBA side. Let me share my developer point of vi...
[03:11:37] Blocked-on-schema-change, DBA, Anti-Harassment (The Letter Song), Patch-For-Review: ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in production) - https://phabricator.wikimedia.org/T251188 (ARamirez_WMF)
[03:12:01] Blocked-on-schema-change, DBA, Anti-Harassment (The Letter Song), Patch-For-Review: ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in production) - https://phabricator.wikimedia.org/T251188 (ARamirez_WMF)
[04:43:51] DBA, DC-Ops, Operations, ops-eqiad: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (Marostegui) a:Jclark-ctr→jcrespo Per my IRC chat with John, assigning this back to Jaime as the on-site part is done Thank you John!
[04:48:20] DBA, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) labsdb1011 keeps catching up nicely: ` root@labsdb1011:~# mysql -e "show all slaves status\G" | grep Seconds Seconds_Behind_Master: 115303 Seconds_Be...
[05:07:00] DBA: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May - https://phabricator.wikimedia.org/T251982 (Marostegui) Open→Resolved This was done.
RO starts: 05:00:30 RO stops: 05:03:28
[05:07:02] DBA, Operations, Puppet, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[05:08:49] DBA, Operations, Puppet, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[05:09:05] DBA, Operations, Puppet, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui) Open→Resolved This is all done!
[05:14:47] DBA: Improve output message readabiliy of transfer.py - https://phabricator.wikimedia.org/T252802 (Marostegui) Oh nice work! :-) I will definitely not miss all the verbosity there hehe Thank you @Privacybatm for helping out here!
[05:32:52] there is an increase in deletions on enwiki after read only, strange
[05:33:08] https://grafana.wikimedia.org/d/000000273/mysql?panelId=2&fullscreen&orgId=1&from=1590028382839&to=1590039182839&var-dc=eqiad%20prometheus%2Fops&var-server=db1083&var-port=9104
[07:03:57] jynus: did you do dbctl changes in cumin2001?
[07:04:04] there are pending changes to commit there
[07:04:11] for es1019
[07:05:21] I think that is just the misleading error message
[07:05:37] same thing that happens to you sometimes
[07:05:55] But the difference is that a dbctl config diff now shows changes
[07:06:11] I didn't touch cumin2001
[07:06:19] Ah it is now gone
[07:06:26] I didn't realise you just made the change :)
[07:06:33] I just saw -operations
[07:06:42] so yep, nevermind, sorry for the alarm
[07:07:17] what I meant is that the alert warns sometimes depending on when the check is done
[07:07:39] even if it is only for the 1-2 seconds to commit :-D
[07:08:02] yeah, but I didn't see in operations that a dbctl change was done
[07:08:05] so I was like uuuuh?
[07:08:20] is there a way to lock changes like with scap?
[07:08:34] e.g. if there was an important maintenance
[07:08:44] Not that I know of
[09:08:58] The moment is arriving, labsdb1011 is almost done with catching up with replication
[09:09:03] so we need to decide if we want to:
[09:09:07] 1) pool it directly
[09:09:11] 2) restart and see what happens
[09:27:04] If it hasn't crashed, I am tempted to go for #2, so we can see what happens once it gets load
[09:47:37] DBA, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) labsdb1011 is up-to-date: ` # mysql.py -hlabsdb1011 -e "show all slaves status\G" | grep Seconds Seconds_Behind_Master: 0 Seconds_Behind_Master: 0...
[10:26:59] jynus: any suggestion on the above? ^
[10:27:20] not really
[10:27:31] we should have a copy on db1141
[10:28:34] ok, then I think I will go ahead and pool labsdb1011 after the meeting
[10:28:53] All the alters went thru fine as well
[10:28:55] No errors
[10:29:07] I can maybe issue a select thru some big tables across all wikis
[10:30:17] maybe even categorylinks, which is the one that we've seen on the errors
[10:30:57] the issue based on the error msg and ticket suggests purges+secondary indexes
[10:31:07] yeah
[10:31:25] I will do a quick select count on category links or something
[10:31:27] so maybe start a transaction, leave it open
[10:31:29] replicate
[10:31:36] then select by secondary index
[10:31:47] idk
[10:32:00] I can try that yeah
[10:32:08] but it is hard to replicate exactly what might be happening
[10:32:14] but it definitely involves purging yeah
[10:32:21] or at least the errors always show that
[10:41:38] I would like to see how the other analytics host behaves after purge kicks in
[10:50:18] DBA, Operations, Patch-For-Review, User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (Aklapper) >>!
In T161296#5005686, @jcrespo wrote: > We can think of enabling extra metrics later, but stalling this as the basic work is done (not a blocker anymo...
[10:51:15] DBA, Operations, Patch-For-Review, User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (jcrespo) Stalled→Resolved a:jcrespo
[10:51:19] DBA, Operations, observability, Patch-For-Review: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896 (jcrespo)
[10:52:39] DBA: Degraded performance on parsercache with buster/mariadb upgrade - https://phabricator.wikimedia.org/T252761 (jcrespo) As a followup of T161296 we need to research the changes on buster to understand new metrics making scraping slower, plus if we should enable or disable more metrics for buster.
[10:54:38] DBA: Degraded performance on parsercache with buster/mariadb upgrade - https://phabricator.wikimedia.org/T252761 (jcrespo)
[10:54:43] DBA, Operations, observability, Patch-For-Review: MySQL metrics monitoring - https://phabricator.wikimedia.org/T143896 (jcrespo)
[11:19:48] DBA: Improve output message readabiliy of transfer.py - https://phabricator.wikimedia.org/T252802 (Privacybatm) hehe, Thank you :-)
[12:08:03] jynus: going to repool labsdb1011
[12:08:05] cross your fingers
[12:10:30] DBA: Productionize db114[1-9] - https://phabricator.wikimedia.org/T252512 (Marostegui)
[12:13:18] ok, I see queries arriving
[12:14:37] DBA, Patch-For-Review, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) labsdb1011 is now serving queries.
Quarry seems to be working fine too: https://quarry.wmflabs.org/query/45075
[12:33:18] DBA: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May - https://phabricator.wikimedia.org/T251982 (Agusbou2015)
[12:33:54] DBA, Operations, Puppet, User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (Marostegui)
[13:02:29] DBA, Cloud-Services, CPT Initiatives (API Gateway): Prepare and check storage layer for api.wikimedia.org - https://phabricator.wikimedia.org/T246946 (WDoranWMF)
[13:02:50] DBA, Patch-For-Review: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (Privacybatm) {F31835685} How about writing our document with Sphinx?
[13:03:52] DBA, Patch-For-Review: Add more information to --help option of transfer.py - https://phabricator.wikimedia.org/T253219 (jcrespo) > How about writing our document with Sphinx? Just send a patch :-D
[13:43:31] DBA, DC-Ops, Operations, ops-eqiad: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (jcrespo) > Per my IRC chat with John Could you tell me more, as before a processor error was mentioned, but then a board change?
[13:45:10] DBA, DC-Ops, Operations, ops-eqiad: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (Marostegui) >>! In T250602#6155334, @jcrespo wrote: >> Per my IRC chat with John > > Could you tell me more, as before a processor error was mentioned, but then a board change? My cha...
[13:58:26] https://grafana.wikimedia.org/d/000000377/host-overview?panelId=3&fullscreen&orgId=1&refresh=5m&var-server=labsdb1011&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&from=now-40d&to=now
[13:58:47] if the CPU stays like that, 10.4 confirms what we saw on wikidatawiki, good improvements on CPU usage
[13:58:53] with compression
[13:59:14] cool
[14:00:27] I am going to test 10.4 on a backup source: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/597780
[14:01:54] oh cool
[14:02:30] that way I could run compare- if I find a buster client with the current one
[14:03:00] you mean the same sections?
[14:03:34] yes, it will be set up with s1 and s6, which are already on another host
[14:04:08] then give it some actions to check it doesn't crash again
[14:04:40] nice
[15:46:10] DBA, DC-Ops, Operations, ops-eqiad, Patch-For-Review: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (jcrespo) a:jcrespo→Jclark-ctr Hi, I cannot reinstall the server because the remote ipmi interface doesn't work (and the ssh or the https accesses, that are...
[20:29:14] Blocked-on-schema-change, DBA: Babel schema changes - https://phabricator.wikimedia.org/T253342 (Reedy)
[20:34:51] Blocked-on-schema-change, DBA: Apply Babel schema change expanding babel_lang in Wikimedia production - https://phabricator.wikimedia.org/T253342 (Jdforrester-WMF)
[20:36:08] Blocked-on-schema-change, DBA: Apply Babel schema change expanding babel_lang in Wikimedia production - https://phabricator.wikimedia.org/T253342 (Jdforrester-WMF) Surely this is the #schema-change and {T226546} would have been #blocked-on-schema-change (except it was merged ahead of the change being app...
[20:43:21] Blocked-on-schema-change, DBA: Apply Babel schema change expanding babel_lang in Wikimedia production - https://phabricator.wikimedia.org/T253342 (Reedy) >>!
In T253342#6156559, @Jdforrester-WMF wrote: > Surely this is the #schema-change and {T226546} would have been #blocked-on-schema-change (except it...
[20:50:17] Blocked-on-schema-change, DBA: Apply Babel schema change expanding babel_lang in Wikimedia production - https://phabricator.wikimedia.org/T253342 (Jdforrester-WMF) Yes, I know what the docs say, I think they're confused.
[22:32:05] DBA: Upgrade and restart s1 (enwiki) primary database master: Thu 21th May - https://phabricator.wikimedia.org/T251982 (Agusbou2015)