[01:30:29] jynus: do you know if "SHOW GLOBAL VARIABLES LIKE 'gtid_slave_pos'" has less locking than "SHOW SLAVE STATUS"? [04:06:52] 10DBA, 10Data-Services, 10Tool-Global-user-contributions, 10Toolforge, 10cloud-services-team (Kanban): Database error: Unable to connect to s7.web.db.svc.eqiad.wmflabs - https://phabricator.wikimedia.org/T182916#3951551 (10Krinkle) 05Open>03Resolved a:03Krinkle Closing per {T186436}. The connection... [06:28:00] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T186533#3951635 (10Marostegui) 05Open>03Resolved Worked fine this time - thanks! ``` root@db2039:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380312089D0) Port... [06:52:24] 10DBA, 10Operations, 10ops-eqiad: db1051 database host BBU issues - https://phabricator.wikimedia.org/T186049#3951640 (10Marostegui) 05Open>03Resolved After a full recharge it looks good now: ``` root@db1051:~# megacli -AdpBbuCmd -a0 BBU status for Adapter: 0 BatteryType: BBU Voltage: 3936 mV Current... [07:42:55] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3951658 (10Marostegui) [07:43:10] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3951668 (10Marostegui) p:05Triage>03Normal [08:20:55] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3951677 (10Marostegui) There are no views associated to these databases. [09:02:08] I am going to failover labsdb1011 to labsdb1010, you ok with it?
[09:02:13] jynus: ^ [09:02:20] https://gerrit.wikimedia.org/r/408762 [09:02:44] it is temporary [09:09:53] done and now switching back [09:15:59] and now 1009 to 1010 [09:19:37] yes [09:20:55] done [09:20:58] switching back [09:21:03] I love how easy it is now to operate labs [09:23:01] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3951658 (10jcrespo) T181925 [09:24:23] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3951790 (10Marostegui) 05Open>03declined >>! In T186685#3951786, @jcrespo wrote: > T181925 Thanks - I knew it was familiar and I recalled talking about this before, but as I couldn't find the ticket... [09:26:38] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3951794 (10jcrespo) why closing this? This is perfectly valid! [09:29:03] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3951795 (10Marostegui) I closed it because I was agreeing with myself from the past :-) : T181925#3825796 [09:32:19] 10DBA, 10Data-Services: labsdb1010 crashed - https://phabricator.wikimedia.org/T186579#3951798 (10Marostegui) We discussed this in private and it was clarified that we actually were on the same page, just that I expressed myself in a non too clear way :-) It is all clarified and yes, we will rebuild this host... [09:37:03] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3951799 (10Marostegui) 05declined>03Open We have chatted about it in private and we believe they should be delete from labs indeed, as they are marked as deleted and the "policy" is: if deleted or pri... 
[09:44:23] 10DBA, 10Data-Services, 10Tool-Global-user-contributions, 10Toolforge, 10cloud-services-team (Kanban): Database error: Unable to connect to s7.web.db.svc.eqiad.wmflabs - https://phabricator.wikimedia.org/T182916#3951807 (10jcrespo) As I said, 10 increased to 15-20 is something I am open to do, and we did... [11:19:56] 10DBA, 10Operations, 10ops-eqiad: Move db1069 to A1 - https://phabricator.wikimedia.org/T186699#3952063 (10Marostegui) p:05Triage>03Normal [11:38:24] s1 and s4 backups are finally running on dbstore2001 [11:38:52] I will now handle m* backups [11:46:31] \o/ [11:46:34] great news [11:50:47] fully manually, there is not integration with cron yet [12:04:14] 10Blocked-on-schema-change, 10DBA, 10Data-Services, 10Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3952200 (10Marostegui) @Anomie can you generate some more writes on testwiki? I have sanitized existing records and now ar_comment_i... [12:10:10] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3952208 (10Marostegui) I will delete these databases from sanitarium with replication tomorrow if we all agree. [12:33:35] 10DBA, 10Data-Services: Remove deleted wikis from wikireplicas - https://phabricator.wikimedia.org/T186685#3952278 (10jcrespo) I do, we can reverse that if we want, but that would take more effort (reimporting all deleted wikis). I would prefer to work towards deleting them from production, too :-). 
[12:43:32] answer me when you can- I will want to put dbstore1001 with one of the roles of es200* (mysql with no monitoring) [12:43:43] to clear icinga [12:43:49] +1 :) [12:43:55] as we think that is not coming back [12:44:05] nope [12:44:14] I also tried innodb force recovery [12:44:20] to see if it could even come back read only [12:44:21] but not even [12:44:55] the dbstore2002 is me with the backups [12:45:03] :) [12:45:49] I guess it is not really worth creating manual backups for m* hosts [12:46:01] I will just fix and move the automated ones directly [12:46:16] the mariadb::backup role or whatever [12:46:29] yeah [12:46:39] I wouldnt bother with m* backups now [12:47:01] well, we can move? some of the existing ones, just in case [12:47:20] and setup the future ones [12:47:47] there should be enough space for that [12:49:17] 1.4T . of /srv/backups [12:52:56] actually, m* backups ran yesterday [12:53:27] ugh [12:53:54] root@dbstore1001:/srv/backups$ ls -lhaDr <--- much fun [12:55:49] I think it is just emacs mode? [12:56:49] oh wtf! [12:56:50] haha [13:17:34] 10DBA, 10Wikimedia-General-or-Unknown: Request to manually rename a gadget in fr.wiki database - https://phabricator.wikimedia.org/T186702#3952422 (10Aklapper) Removing #MediaWiki-database as this is not about the database related code in the MediaWiki core software code repository. Removing #gadgets as this i... [13:29:15] ^that is netops, isn't it, because it requires a "network admin"? [13:29:52] Yeah, but not sure if even a network admin is what they really need [13:30:43] did you deploy the repool? [13:30:52] I wanted to do some stuff on enwiki [13:31:14] 10DBA: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599#3952433 (10Marostegui) [13:33:10] which repool? [13:35:04] db1089 [13:35:48] Yeah, I repooled that one - I wasn't aware you'd need it [13:35:48] sorry! 
[13:35:56] no, I dont need it [13:36:14] I was just asking now if you planned to do more stuff on enwiki soon [13:36:24] yeah, I am working on the text table [13:36:30] but we can coordinate, no problem [13:36:38] for today I am done with enwiki [13:37:02] I am not sure if I will have time today [13:37:13] I think backups have more priority [13:37:24] yeah, agreed [13:37:25] but I want at some point handle [13:37:40] https://phabricator.wikimedia.org/T186266 [13:38:09] which may need a depool and or an upgrade [14:15:21] 10Blocked-on-schema-change, 10DBA, 10Data-Services, 10Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3952478 (10Anomie) I just deleted two revisions on testwiki, ar_id 90260 and 90261. Let me know if you need more. [14:33:04] 10DBA, 10Operations, 10ops-eqiad: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3952557 (10jcrespo) a:03jcrespo claiming it for cleaning up purposes only. [14:38:39] 10DBA: convert dbstore1001 to multi-instance + InnoDB compressed by importing db shards to it - https://phabricator.wikimedia.org/T159430#3952595 (10jcrespo) 05Open>03declined T186596 happened, we should decline this and create a new one setting up the new 2 provisioning hosts. [14:40:33] 10DBA, 10Goal, 10Patch-For-Review: Decommission database hosts <= db2031 (tracking) - https://phabricator.wikimedia.org/T176243#3952607 (10jcrespo) [14:42:16] 10DBA, 10Epic: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423#3952614 (10jcrespo) Only dbstore1002 pending, which we are in talks with #analytics to replace soon CC @Ottomata @elukey (not sure if there is a ticket already). 
[14:47:40] 10DBA, 10Operations: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179#3952629 (10jcrespo) This is important, but not a goal for this quarter- we are still blocked on mediawiki extension maintainers to be compatible with it; however, all databases (misc, x1, parsercache, e... [14:50:58] 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3952677 (10jcrespo) @Marostegui, let me know what you think of the plan: * Deploy the above patch * Move current backup files to d... [14:56:42] 10DBA, 10Reading List Service, 10Reading-Infrastructure-Team-Backlog (Kanban): Update duplicate handling in reading lists API - https://phabricator.wikimedia.org/T184680#3892212 (10jcrespo) We are currently experiencing a bit of a crisis with several emergencies on core systems, we may have some delay on att... [14:57:48] 10DBA, 10Data-Services: Add base36 functions to ToolForge database - https://phabricator.wikimedia.org/T185673#3952711 (10jcrespo) We are currently experiencing a bit of a crisis with several emergencies on core systems, we may have some delay on attending features requests due to that- but site reliability is... [15:02:31] 10DBA, 10Data-Services: Create method for accessing user watchlists in database queries - https://phabricator.wikimedia.org/T182948#3952732 (10jcrespo) 05Open>03declined I am going to decline this based on our own feedback, but that doesn't mean we cannot continue discussing other methods of doing what you... 
[15:02:34] 10DBA, 10Data-Services, 10Goal, 10Patch-For-Review, 10cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#3952734 (10jcrespo) [15:34:28] 10DBA, 10Data-Services: Make Dispenser's principle_links table accessible in new Wiki replica cluster - https://phabricator.wikimedia.org/T180636#3952815 (10jcrespo) I think this could be added to the analytics databases without problem- but it may take some time due to other ongoing issues, plus the work need... [15:37:02] 10DBA, 10Data-Services: "ERROR 2006 (HY000): MySQL server has gone away" failures for a variety of queries against Wiki Replica servers - https://phabricator.wikimedia.org/T180380#3952840 (10jcrespo) 05Open>03Resolved a:03jcrespo Sadly, labsdb IS a hostile environment, and the way to solve this, which is... [15:39:53] 10DBA, 10Phabricator, 10Security: Improve privilege separation for phabricator's config files and mysql credentials - https://phabricator.wikimedia.org/T146055#3952848 (10jcrespo) We need to catch up on a lot of pending DBA-phabricator tasks (this, some pending failovers, upgrades to stretch/mariadb 10.1) and... [15:41:01] 10Blocked-on-schema-change, 10DBA, 10Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#3952855 (10jcrespo) We are currently experiencing a bit of a crisis with several emergencies on core systems, we may have some delay on attending... [15:41:46] 10DBA, 10Community-Tech, 10MediaWiki-extensions-GlobalPreferences, 10Patch-For-Review, 10Schema-change: DBA review for GlobalPreferences schema - https://phabricator.wikimedia.org/T184666#3952856 (10jcrespo) We are currently experiencing a bit of a crisis with several emergencies on core systems, we may...
[15:42:47] 10DBA: Failover existing eqiad database backup system to the new codfw database logical backup system - https://phabricator.wikimedia.org/T184697#3892734 (10jcrespo) a:03jcrespo [15:45:36] 10Blocked-on-schema-change, 10DBA, 10Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#3952876 (10Anomie) This has been waiting since December 2016, a little more time won't hurt anything. [15:47:28] 10Blocked-on-schema-change, 10DBA, 10Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#3952878 (10jcrespo) Well, it was supposed to happen just after comments in our schedule, so sorry about that. [15:52:41] 10Blocked-on-schema-change, 10DBA, 10Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#2871964 (10Marostegui) Yeah, I was planning to take this as soon as the refactored comments schema change is completely done (which it is now, pe... [15:58:13] 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3952904 (10Marostegui) >>! In T186596#3952677, @jcrespo wrote: > @Marostegui, let me know what you think of the plan: > * Deploy t... [16:03:19] the tests confirm my suspicions [16:03:55] separating storage and mysql, and having mysqls on more than one server almost scales linearly [16:04:13] it took 1h30m to backup enwiki [16:13:01] 10Blocked-on-schema-change, 10DBA, 10Data-Services, 10Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3952964 (10Marostegui) >>! In T174569#3952478, @Anomie wrote: > I just deleted two revisions on testwiki, ar_id 90260 and 90261. Let...
[16:15:45] 10Blocked-on-schema-change, 10DBA, 10Data-Services, 10Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3952976 (10Marostegui) [16:40:34] 10Blocked-on-schema-change, 10DBA, 10Data-Services, 10Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3953148 (10Marostegui) The following shards, on all the servers, are looking good: s1, s2, s4,s5,s6,s7,s8. Pending to check s3 which... [17:06:45] jynus: for T186689 I want to test capturing the `max_user_connections` error. Could you temporarily change user `u11106` (my personal account) to have a max quota of 1 connection? [17:06:45] T186689: toolforge: xtools: exceed max database connections - https://phabricator.wikimedia.org/T186689 [17:08:54] sure, please request it on ticket for security reasons [17:09:50] okay thx :) [17:10:17] I can see 11106 is your account, but we have to be sure and check you are in control of that via phab [17:10:44] you do not want random IRC people to close your account with no accountability :-) [17:11:09] hehe, makes sense [17:17:38] jynus: so I should use the security form? https://phabricator.wikimedia.org/maniphest/task/edit/form/2/ [17:18:40] no, just comment on the ticket your request, I will try to process it when I have the time (currently busy with some important stuff) [17:19:16] okay no problem [17:19:45] T186730 [17:19:45] T186730: Change user u11106 to have max 1 open connection - https://phabricator.wikimedia.org/T186730 [17:21:02] 10DBA: Change user u11106 to have max 1 open connection - https://phabricator.wikimedia.org/T186730#3953292 (10jcrespo) a:03jcrespo [18:18:56] 10DBA: Change user u11106 to have max 1 open connection - https://phabricator.wikimedia.org/T186730#3953527 (10jcrespo) 05Open>03Resolved Done on the 3 labsdb backends for wikireplicas (not on toolsdb).
```
SHOW GRANTS FOR 'u11106';
GRANT USAGE ON *.* TO 'u11106'@'%' ... WITH MAX_USER_CONNECTIONS 1;
```
[18:38:14] 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3953599 (10jcrespo) I am currently on step 2, "Moving current backup files to dbstore2001", FYI, `/srv/backups/_mysqldump`, will t... [18:45:52] 10DBA, 10Operations, 10ops-eqiad, 10Patch-For-Review: dbstore1001 crashed: Multibit ECC errors were detected on the RAID controller. - https://phabricator.wikimedia.org/T186596#3953621 (10Cmjohnson) I will not be available this week..Let's circle back to this mid-week next week please. [19:36:55] jynus: what is this mysql migration you mentioned? [19:37:09] possibility, that is