[04:25:53] DBA, Patch-For-Review, Upstream, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) Positions where db1141 was stopped for restart: {P11328}
[04:35:20] DBA, Patch-For-Review, Upstream, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) Good news, db1141 was restarted 10 minutes ago, and so far no crashes and no errors on the log. All clean. I am going to wait...
[04:38:41] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for awawiki - https://phabricator.wikimedia.org/T251410 (Marostegui) #cloud-services-team please create the views also on db1141 (temporary host replacing labsdb1011)
[04:38:46] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for gomwiktionary - https://phabricator.wikimedia.org/T250706 (Marostegui) #cloud-services-team please create the views also on db1141 (temporary host replacing labsdb1011)
[05:42:01] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (Marostegui) Data check between db1138 and db1081 (candidate master) finished successfully.
[06:14:27] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (Marostegui) Blocked a maintenance window on the deployment's calendar for tomorrow.
[06:49:21] DBA: Degraded performance on parsercache with buster/mariadb upgrade - https://phabricator.wikimedia.org/T252761 (Marostegui) I would like to proceed with this task next week and finish moving parsercache to 10.4 (we can create a specific task to follow up and investigate why the exporter is reporting such...
[07:32:54] DBA: Create a wiki page with information on what to do when a master crashes and comes back as read only - https://phabricator.wikimedia.org/T253832 (Marostegui)
[07:35:11] DBA, Wikimedia-Incident: Create a wiki page with information on what to do when a master crashes and comes back as read only - https://phabricator.wikimedia.org/T253832 (Marostegui)
[07:39:50] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (Marostegui) Incident report (draft status) created: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200528-s4_(commonswik...
[07:49:14] DBA, Patch-For-Review, Upstream, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) db1141 check data finished up clean, I have checked grants, roles and query killer, and the host is ready to be pooled.
[07:49:37] jynus kormat I would like to proceed with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/599153/
[07:50:03] looking
[07:52:01] left a comment
[07:52:40] not sure what you mean
[07:53:26] the commit message only talks about adding db1141 to the analytics role, but the change itself also replaces labsdb1011 with labsdb1010
[07:53:40] ah I see what you mean
[07:54:31] DBA, Patch-For-Review: Package transferpy framework under wmfmariadbpy - https://phabricator.wikimedia.org/T253736 (Privacybatm) I have uploaded a new patch set with a working deb file inside dist folder (https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/598984/2/dist/transferpy_1.0-1_a...
[07:54:40] amended it
[07:58:50] +1'd
[08:13:57] DBA, MediaWiki-General, TechCom-RFC, Performance-Team (Radar): RFC: Discourage use of MySQL's ENUM type - https://phabricator.wikimedia.org/T119173 (Ladsgroup) >>! In T119173#6170721, @Krinkle wrote: >>>! In T119173#6154249, @Ladsgroup wrote: >> - Not all DBMSes support ENUM, for example sqlite t...
[08:16:33] DBA, Patch-For-Review, Upstream, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) db1141 is now serving the analytics role. I can query it fine and I can see other connections arriving from the proxy: ` marosteg...
[08:17:08] db1141 pooled, so far so good, queries flowing and no errors on the mariadb log
[08:50:09] DBA, Documentation, Wikimedia-Incident: Create a wiki page with information on what to do when a master crashes and comes back as read only - https://phabricator.wikimedia.org/T253832 (Aklapper)
[08:51:20] marostegui: hi, i am trying to recreate what you did on https://phabricator.wikimedia.org/T243800#5875456 but dbproxy1007 is gone
[08:52:43] not sure I can identify the "ro" password in the repo you mention. just seeing regular $gerrit_db_pass so far
[08:52:53] you can always check the active proxy by checking dns, for instance:
[08:52:53] host m2-master
[08:52:53] m2-master.eqiad.wmnet is an alias for dbproxy1015.eqiad.wmnet.
[08:53:46] I assume you want reviewdb (gerrit)
[08:53:48] thanks, that works. it shows me the gerrit_db_pass is not the one i want though
[08:54:27] yes, reviewdb. but the one for the readonly user
[08:54:29] I don't remember the details (I would need to re-read the task) but I believe there was a mismatch between passwords in the DB and the one on the gerrit config
[08:54:40] https://phabricator.wikimedia.org/T243800#5875456
[08:55:11] yes, that is exactly what i am trying to fix. "The gerritro user password is on the pw repo, in the gerritfile."
[08:55:33] right
[08:56:09] We can change the gerritro password in the DB if you like
[08:56:14] i don't see a gerritfile though.
[08:56:46] there is /srv/private/modules/passwords/manifests/init.pp with passwords::gerrit
[08:56:52] that has a gerrit_db_pass
[08:56:58] but that is not the ro one
[08:57:17] but somewhere you found one that worked.. from your comment
[08:57:21] Yes
[08:57:25] the one on the PW repo
[08:57:36] what do you mean by PW repo?
[08:57:50] root@cumin1001:/home/marostegui# mysql --skip-ssl -hdbproxy1015 -p -ugerritro
[08:57:50] Enter password:
[08:57:50] Welcome to the MariaDB monitor. Commands end with ; or \g.
[08:57:50] Your MariaDB connection id is 38158050
[08:57:53] mutante: pwstore repo
[08:58:13] marostegui: ooooh, pwstore.. !
[08:58:29] see, i was thinking it means private repo, module passwords
[08:58:36] ah! sorry hehe!
[08:58:38] then i can fix it :)
[08:58:44] I will edit my comment to make it clearer
[08:58:54] done!
[08:59:00] hah, thanks!
[08:59:20] i don't know how it ever got into pwstore :) it existed before
[08:59:36] but i think i got it now then.. will test
[09:01:32] ahaha, git log tells me how "Importing passwords from iron:/srv/passwords" in 2015 (!sic) :)
[09:02:15] wooow
[09:02:16] XD
[09:04:57] yea, that pass works. adding it into private/hieradata and will tell puppet to use it if on the test server. thanks again
[09:05:31] excellent!
[09:05:35] thank you
[09:07:31] DBA, Patch-For-Review: Package transferpy framework under wmfmariadbpy - https://phabricator.wikimedia.org/T253736 (Privacybatm)
[09:32:05] DBA, Documentation, Sustainability (Incident Prevention): Create a wiki page with information on what to do when a master crashes and comes back as read only - https://phabricator.wikimedia.org/T253832 (Peachey88)
[09:42:00] DBA, Patch-For-Review, Upstream, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) Removed the binary backup from labsdb1011 from `/srv/production/labsdb1011` which was around 6.6T On the other hand, I am copying t...
[09:59:36] DBA, Documentation, Patch-For-Review, Sustainability (Incident Prevention): Create a wiki page with information on what to do when a master crashes and comes back as read only - https://phabricator.wikimedia.org/T253832 (Marostegui) Open→Resolved Done: https://wikitech.wikimedia.org/wiki/...
[09:59:41] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (Marostegui)
[10:53:36] DBA, Patch-For-Review: Package transferpy framework under wmfmariadbpy - https://phabricator.wikimedia.org/T253736 (jcrespo) Let me think about it. Things are getting more and more complex, maintaining a lot of (mostly unrelated) stuff in the same repo. How would you see about splitting out transferpy to...
[11:38:43] I have updated too https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Data_loss
[11:38:58] ^May be interesting as a summary for kormat
[11:50:56] DBA, Patch-For-Review, Upstream, cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (Marostegui) db1141 keeps serving traffic without any errors or issues. That is good news. I am going to leave it running like this till Mond...
[11:51:14] DBA, Patch-For-Review: Package transferpy framework under wmfmariadbpy - https://phabricator.wikimedia.org/T253736 (Privacybatm) Yeah, I will think about it, Thank you.
[12:03:25] jynus: looks good, thanks :)
[12:03:45] DBA: Productionize db213[6-9] and db2140 - https://phabricator.wikimedia.org/T252985 (Kormat)
[14:18:05] DBA, DC-Ops, Operations, ops-eqiad: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (Jclark-ctr) corrected user name. Jcrespo confirmed able to log in
[14:21:00] DBA, DC-Ops, Operations, ops-eqiad: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (jcrespo) a:Jclark-ctr→jcrespo
[14:21:29] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (Jclark-ctr) fedex tracking says parts to arrive friday 5/29 @Marostegui would you want to do this tomorrow 3-4pm est. I would prefer...
[15:19:54] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (Johan) @Jclark-ctr @Marostegui Does this mean you're not doing the 05:00 UTC window tomorrow? We've been informing the communities abo...
[15:21:27] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (jcrespo) @Johan plan continues as usual- @Jclark-ctr information is unrelated to the user impacting maintenance.
[15:22:22] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (Johan) OK, great, thanks.
[15:22:55] DBA, Operations, ops-eqiad, Patch-For-Review, Wikimedia-Incident: db1138 (s4 master) crashed due to memory issues - https://phabricator.wikimedia.org/T253808 (jcrespo) I talked to @Jclark-ctr on IRC, hw replacement will likely happen on Tuesday next week. Sw emergency maintenance (read only)...