[05:58:36] <wikibugs>	 DBA, Operations, ops-eqiad: Degraded RAID on es1023 - https://phabricator.wikimedia.org/T268796 (Marostegui) a:wiki_willy @wiki_willy this host is under warranty, can we order a new disk from Dell?
[05:58:53] <wikibugs>	 DBA, Operations, ops-eqiad: Degraded RAID on es1023 - https://phabricator.wikimedia.org/T268796 (Marostegui) p:Triage→Medium
[06:10:37] <wikibugs>	 DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (Marostegui)
[06:19:21] <wikibugs>	 DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Deploy labsdbuser and views to new clouddb hosts - https://phabricator.wikimedia.org/T268312 (Marostegui) @Bstorm have you found any other grant issues or should I go ahead and deploy all those roles/users to the rest of the clou...
[06:20:58] <wikibugs>	 DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (Marostegui) Let's go ahead for the 01/12/2020
[06:52:51] <wikibugs>	 DBA, Operations, CAS-SSO, User-jbond: Request new database for idp.wikimedia.org - https://phabricator.wikimedia.org/T268327 (Marostegui) Added the new DB to the misc doc https://wikitech.wikimedia.org/w/index.php?title=MariaDB%2Fmisc&type=revision&diff=1889656&oldid=1889330
[07:05:17] <wikibugs>	 DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[07:07:32] <wikibugs>	 DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (Marostegui)
[07:09:04] <wikibugs>	 DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (Marostegui) clouddb1016:3315:   - Data copied from db1124:3315 - Host added to tendril and zarcillo - Root password changed - Replication started from:...
[07:13:31] <wikibugs>	 DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (Marostegui)
[07:13:54] <wikibugs>	 DBA, Data-Services, Patch-For-Review, cloud-services-team (Kanban): Productionize clouddb10[13-20] - https://phabricator.wikimedia.org/T267090 (Marostegui)
[08:06:49] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for renaming namespace_title index on watchlist - https://phabricator.wikimedia.org/T268004 (Marostegui)
[08:10:45] <jynus>	 I am checking x1 backups
[08:26:05] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for renaming namespace_title index on watchlist - https://phabricator.wikimedia.org/T268004 (Marostegui)
[08:32:46] <jynus>	 something weird happened with the systemd timer yesterday - it didn't execute
[08:33:19] <jynus>	 cumin2001 systemd[1]: regular_snapshot.service: Current command vanished from the unit file, execution of the command list won't be resumed.
[08:34:24] <jynus>	 So how exactly do I convince systemd to execute it? and why did it fail on codfw but not on eqiad?
[08:35:20] <marostegui>	 Nov 25 16:41:48 cumin2001 puppet-agent[7994]: (/Stage[main]/Profile::Mariadb::Backup::Transfer/Systemd::Timer::Job[regular_snapshot]/Systemd::Unit[regular_snapshot.service]/File[/lib/systemd/system/regular_snapshot.service]/content) content changed '{md5}90e5b0cd94a53b44591913ebe688f247' to '{md5}583d2cd189709b2a570ac3325b8de746'
[08:35:25] <jynus>	 yes
[08:35:27] <marostegui>	 So something changed there?
[08:35:27] <jynus>	 that I know
[08:35:33] <jynus>	 I deployed this:
[08:35:57] <jynus>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/643223/4/modules/profile/manifests/mariadb/backup/transfer.pp
[08:36:22] <jynus>	 but from that I expected it to run the new command, not to fail, and it only failed on 1 out of 2 hosts (it worked on cumin1001)
[08:36:29] <marostegui>	 But from that same puppet run, it looks like it broke?
[08:36:37] <marostegui>	 cause it is at the same time that the refresh was done
[08:36:42] <marostegui>	 at 16:41:48
[08:36:57] <marostegui>	 although later it does a reload that apparently worked: Nov 25 16:41:48 cumin2001 puppet-agent[7994]: (/Stage[main]/Profile::Mariadb::Backup::Transfer/Systemd::Timer::Job[regular_snapshot]/Systemd::Unit[regular_snapshot.service]/Exec[systemd daemon-reload for regular_snapshot.service]) Triggered 'refresh' from 1 event
[08:37:26] <jynus>	 refresh is ok, but backups were 2 hours later?
[08:38:11] <jynus>	 so I am worried that our puppet timer code is unreliable under some conditions
[08:38:55] <jynus>	 maybe there is a race condition on update or something?
[08:39:07] <marostegui>	 there are no logs apart from those on why it failed?
[08:39:19] <jynus>	 on the timer side, no
[08:39:34] <jynus>	 just the "it vanished, bye!" :-)
[08:39:52] <jynus>	 I am going to do an ensure => absent, ensure => present
[08:40:16] <jynus>	 and then report it to the code maintainers to see if they have some idea
[08:40:33] <jynus>	 check if it repeats again
[08:41:00] <jynus>	 we have monitoring for this, so we would have caught it (this is why x1 and soon other backup checks will fail)
[08:41:30] <jynus>	 but if it is a systemd puppet code timer issue it is more worrying because it is used for other stuff too
[08:41:41] <marostegui>	 https://phabricator.wikimedia.org/T255132#6214939
[08:41:57] <marostegui>	 Maybe worth checking with him to see if he found something else about it
[08:42:00] <jynus>	 yeah
[08:42:17] <jynus>	 although if you can see, there was a merge after that
[08:42:27] <jynus>	 supposedly avoiding the issue
[08:42:34] <jynus>	 I will report there that it happened for us
[08:42:38] <marostegui>	 yep
[08:42:40] <jynus>	 maybe as a workaround
[08:42:53] <jynus>	 we should disable and reenable timers when modifying them
[08:43:21] <jynus>	 it is not like cron didn't have its own issues (e.g. when disabling them)
[08:43:28] <jynus>	 (re:puppet)
[08:43:40] <jynus>	 thanks marostegui, you helped me a lot with this
[08:48:38] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for renaming namespace_title index on watchlist - https://phabricator.wikimedia.org/T268004 (Marostegui) s7 eqiad progress  [x] dbstore1003 [] db1136 [] db1127 [x] db1116 [x] db1101 [x] db1098 [] db1094 [x] db1090 [] db1086 [] db1079
[08:53:50] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for renaming namespace_title index on watchlist - https://phabricator.wikimedia.org/T268004 (Marostegui)
[09:09:37] <wikibugs>	 DBA, Orchestrator: Configure mariadb to notice/recover from replication issues quicker - https://phabricator.wikimedia.org/T268320 (Marostegui)
[09:25:28] <wikibugs>	 DBA, Operations, Patch-For-Review, User-Kormat, User-jbond: Refactor mariadb puppet code - https://phabricator.wikimedia.org/T256972 (Kormat)
[09:25:36] <wikibugs>	 DBA, Operations, Patch-For-Review, User-Kormat, User-jbond: Standardize/centralize mapping from section to mariadb port/socket and prom-mysql-exporter port - https://phabricator.wikimedia.org/T257033 (Kormat) Open→Resolved a:Kormat I think it's good enough to resolve at this point...
[09:27:43] <wikibugs>	 DBA, decommission-hardware: decommission es1015.eqiad.wmnet - https://phabricator.wikimedia.org/T268810 (Marostegui)
[09:28:44] <wikibugs>	 DBA, decommission-hardware: decommission es1015.eqiad.wmnet - https://phabricator.wikimedia.org/T268810 (Marostegui)
[09:30:47] <wikibugs>	 DBA, decommission-hardware: decommission es1016.eqiad.wmnet - https://phabricator.wikimedia.org/T268812 (Marostegui)
[09:31:14] <wikibugs>	 DBA, decommission-hardware: decommission es1016.eqiad.wmnet - https://phabricator.wikimedia.org/T268812 (Marostegui)
[09:39:10] <wikibugs>	 DBA, decommission-hardware: decommission es1016.eqiad.wmnet - https://phabricator.wikimedia.org/T268812 (Marostegui)
[09:51:08] <jynus>	 I have added documentation at: https://wikitech.wikimedia.org/wiki/Mysql.py
[09:53:59] <kormat>	 jynus: nice! i just made a small edit to correct one bit
[09:59:19] <wikibugs>	 DBA, Operations, ops-eqiad: Degraded RAID on es1023 - https://phabricator.wikimedia.org/T268796 (wiki_willy) a:wiki_willy→Cmjohnson @Marostegui - for sure.  Moving over to @cmjohnson to start the RMA process with Dell (S/N: DTJT513 for a Dell PowerEdge R740xd).  Thanks, Willy  >>! In T268796#...
[09:59:56] <jynus>	 with a combination of those we could start thinking about creating man pages, but sadly, while synchronizing doc.wikimedia.org and package documentation is trivial, it is not so much for wikitech pages
[10:00:35] <jynus>	 for transfer.py I ended up linking it: https://wikitech.wikimedia.org/wiki/Transfer.py#Usage
[10:04:24] <wikibugs>	 DBA, Operations, Release-Engineering-Team-TODO, Continuous-Integration-Config, and 2 others: Create integration test env for wmfmariadbpy - https://phabricator.wikimedia.org/T265266 (Kormat)
[10:30:10] <jynus>	 sobanski: free offsite hosting? https://aws.amazon.com/opendata/open-data-sponsorship-program/
[10:49:06] <sobanski>	 jynus: sounds like we could also use this for Wiki replicas ;)
[10:49:37] <sobanski>	 It's only for two years though
[11:37:53] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for renaming namespace_title index on watchlist - https://phabricator.wikimedia.org/T268004 (Marostegui)
[11:58:15] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for renaming namespace_title index on watchlist - https://phabricator.wikimedia.org/T268004 (Marostegui)
[12:17:29] <wikibugs>	 DBA, Operations, ops-eqiad: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (LSobanski) @Jclark-ctr based on the DC entry schedule, when do you expect you will be able to take a look at this? Knowing this would allow us to bette...
[12:22:11] <wikibugs>	 DBA, Operations, ops-eqiad: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Jclark-ctr) @lsobanski I will be on site Monday
[12:24:04] <wikibugs>	 DBA, DC-Ops, Operations, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (LSobanski) @Cmjohnson Would it be possible to plan for racking 5 instead of 3 of the new hosts in one go? It would help us prepare for Sanitarium host Buster/10.4...
[12:24:32] <wikibugs>	 DBA, Operations, ops-eqiad: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (LSobanski) Thanks!
[12:40:33] <arturo>	 having an onsite wiki-replica is the wet dream of many companies (google, amazon, etc) RE: amazon free hosting
[12:41:30] <wikibugs>	 Blocked-on-schema-change, DBA: Schema change for renaming namespace_title index on watchlist - https://phabricator.wikimedia.org/T268004 (Marostegui) s8 progress   [x] dbstore1005 [] db1126 [x] db1116 [x] db1114 [] db1111 [] db1109 [] db1104 [x] db1101 [x] db1099 [x] db1092 [] db1087
[13:43:29] <wikibugs>	 DBA, decommission-hardware: decommission es1017.eqiad.wmnet - https://phabricator.wikimedia.org/T268825 (Marostegui)
[13:43:41] <wikibugs>	 DBA, decommission-hardware: decommission es1017.eqiad.wmnet - https://phabricator.wikimedia.org/T268825 (Marostegui) a:LSobanski
[13:43:48] <marostegui>	 \o/
[13:48:51] <kormat>	 :D
[14:03:12] <wikibugs>	 DBA, mariadb-optimizer-bug: Investigate possible optimizer regression on 10.4.17 with DELETE statements - https://phabricator.wikimedia.org/T268457 (Marostegui) a:Marostegui
[14:04:09] <wikibugs>	 DBA: Evaluate the impact of changing innodb_change_buffering to inserts - https://phabricator.wikimedia.org/T263443 (Marostegui) This is running by default on all the clouddb hosts.
[14:06:16] <wikibugs>	 DBA: Add a link engineering: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (Marostegui) @kostajh - reminder we are still waiting on knowing from where this database will be accessed. I could grant 10.64.% or whatever, but if there's something more concrete, that'd be us...
[15:12:30] <wikibugs>	 DBA, Beta-Cluster-Infrastructure, Operations, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (jijiki)
[15:13:19] <wikibugs>	 DBA, Beta-Cluster-Infrastructure, Operations, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (jijiki) Open→Resolved a:jijiki I am marking this as resolved 🎉
[15:58:38] <wikibugs>	 DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried) @Marostegui Fantastic, thank you so much! We'll update you when the release is complete.