[00:46:09] DBA, Data-Services, Datasets-General-or-Unknown: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Reedy) Thanks! Can someone put them somewhere I can actually get them? As I don't believe I actually have access to labsdb1010 (via ssh anyway)...
[05:25:17] DBA, Data-Services, Datasets-General-or-Unknown: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Marostegui) The file is at: `deploy1001:/home/reedy/ep_dumps.tar`
[06:16:00] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui)
[06:37:28] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui) >>! In T203709#4610685, @Marostegui wrote: > I am going to deploy this change on s5...
[06:43:49] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[06:48:02] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[06:50:11] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[06:55:56] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[06:59:23] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[07:00:27] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[07:20:38] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[07:28:39] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[07:31:38] dbstore2002 is lag too behind (228148) it won't catch-up until the backups run, we shall disable backup jobs
[07:33:17] banyek: it is fine if it is lagging, the backup will just be old
[07:33:43] the problem with the alter tables was the metadata locking, which would probably make mydumper not to finish, but if it is lagging and no alters are running, I am fine with that
[07:33:58] ok then
[07:43:37] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui) All eqiad is done now. Once we have failed over back to eqiad the 10th October, I will start doing codfw and get this over with :-)
[07:55:05] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (Marostegui)
[08:01:25] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) a:jcrespo
[08:24:11] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) I am creating this database, but we need to know which server or servers this will be accessed from.
[08:24:37] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo)
[08:33:43] dbproxy1004 reboot finished going forward to dbproxy1009
[08:33:57] \o/
[08:39:46] !log upgrade & reboot dbproxy1009
[08:39:46] banyek: Not expecting to hear !log here
[08:39:53] hehe
[08:39:59] you need to !log on wikimedia-operations
[08:40:27] It happens to me a lot here, not the first time and will not be the last! :)
[08:49:05] dbproxy1009 finished btw
[08:49:11] going forward
[08:58:07] marostegui: check the patch now
[08:58:16] yeah, just +1ed )
[08:58:18] :)
[08:58:22] thanks
[08:58:32] you are welcome!
[08:59:41] dbproxy1005 is back
[09:04:52] DBA, Research, Patch-For-Review: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) This is ready, but without knowing the source of the connections, it may not work without firewall changes: ``` root@neodymium:~$ mysql --skip-ssl -h m...
[09:05:08] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo)
[09:27:35] I did not mentioned here just in op chanel, but dbproxy1005,1007,1009 is upgraded
[09:27:41] 1008
[09:28:28] You are !loging those, so it is good :-)
[09:29:24] fsck I have to redo it on dbproxy1007
[09:29:54] https://www.irccloud.com/pastebin/HajvslEe/
[09:31:07] Yeah, looks like the full-upgrade wasn't run before the reboot?
[09:32:16] actually on none of the hosts - shame on me
[09:32:22] uh? :(
[09:32:49] the good news is that the hosts are still in downtime, and we know they can reboot - no hw issues, so I redo them quick
[09:32:58] hehe
[09:33:04] (and needless to make double checks as well the hosts are already checked)
[09:33:15] my bad
[09:34:09] dbproxy1008 is done for sure
[09:38:38] banyek: remember I gave you a recipe to reboot mariadb hosts?
[09:38:58] https://phabricator.wikimedia.org/P7496
[09:40:15] yes
[09:40:27] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui) So far so good having s5 fully done with the schema change and not eqiad. I will wa...
[09:48:02] fixed those
[09:48:19] now the dbproxy1006 is the last not active proxy, I do the upgrade
[09:52:03] DBA, Patch-For-Review: Create Icinga alerts on backup generation failure - https://phabricator.wikimedia.org/T203969 (jcrespo) ``` root@db1115:~$ for section in s1 s2 s3 s4 s5 s6 s7 s8 x1 m1 m2 m3 m5; do sudo -u nagios python3 check_mariadb_backups.py -s $section -d eqiad; done Backups for s1 at eqiad ar...
[09:53:04] ^ marostegui: I am relatively happy with the current state, I would like to deploy even with limited functionality, add more later
[09:53:25] I made a small comment
[09:54:18] sure, that is an easy change
[09:54:23] :)
[09:54:38] although note that I didn't add some of those because the name of the check will include it
[09:54:58] Yeah, I guessed that, but if for some reason you want to run it manually, it could be useful
[09:55:16] the name would be something like "Backup of {section} in {datacenter}"
[09:56:34] yeah, that is good
[09:57:05] aaaand done. dbproxy100[4|5|6|7|8|9] upgraded & rebooted
[10:00:44] marostegui: see latest amend: https://gerrit.wikimedia.org/r/461665
[10:05:13] will work on setting up a proper alert now https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/461679/
[10:05:53] Exciting times! :)
[10:07:21] on dbproxy1008 the nagios is red for systemd and ferm
[10:07:33] journalctl -u ferm
[10:07:37] restart it
[10:07:38] ```-- Logs begin at Tue 2018-09-25 09:33:15 UTC, end at Tue 2018-09-25 10:06:35 UTC. --
[10:07:38] Sep 25 09:33:16 dbproxy1008 systemd[1]: Starting ferm firewall configuration...
[10:07:39] Sep 25 09:33:25 dbproxy1008 ferm[527]: Starting Firewall: fermError in /etc/ferm/conf.d/10_prometheus-node-exporter line 4:
[10:07:39] Sep 25 09:33:25 dbproxy1008 ferm[527]: prometheus1003.eqiad.wmnet prometheus1004.eqiad.wmnet
[10:07:39] Sep 25 09:33:25 dbproxy1008 ferm[527]: )
[10:07:39] Sep 25 09:33:25 dbproxy1008 ferm[527]:
[10:07:39] Sep 25 09:33:25 dbproxy1008 ferm[527]: )
[10:07:40] Sep 25 09:33:25 dbproxy1008 ferm[527]: <--
[10:07:40] Sep 25 09:33:25 dbproxy1008 ferm[527]: DNS query for 'prometheus1003.eqiad.wmnet' failed: NXDOMAIN
[10:07:41] Sep 25 09:33:25 dbproxy1008 ferm[527]: (warning).
[10:07:41] Sep 25 09:33:25 dbproxy1008 systemd[1]: ferm.service: Main process exited, code=exited, status=255/n/a
[10:07:42] Sep 25 09:33:25 dbproxy1008 systemd[1]: Failed to start ferm firewall configuration.
[10:07:42] Sep 25 09:33:25 dbproxy1008 systemd[1]: ferm.service: Unit entered failed state.
[10:07:43] banyek: restart it
[10:07:43] Sep 25 09:33:25 dbproxy1008 systemd[1]: ferm.service: Failed with result 'exit-code'.```
[10:07:53] banyek: use a paste on phabricator, don't paste so many lines here
[10:08:12] there is a an utility called phaste that will automate that for you from the servers
[10:08:45] journalctl -u ferm | tail -n 20 | phaste
[10:09:01] I didn't know phaste!! Nice :)
[10:09:10] ohwow
[10:09:11] https://phabricator.wikimedia.org/P7588
[10:09:29] banyek: restart ferm
[10:11:05] ok, and that also fixed it
[10:16:06] I think now I take a break for lunch and a walk
[11:59:08] DBA, User-Banyek: dbstore2002 tables compression status check - https://phabricator.wikimedia.org/T204930 (Banyek) there are still tables to compress from s2@dbstore2002, as it was mentioned in T204593. ``` +--------------+------------------+ | table_schema | table_name | +--------------+----------...
[12:00:33] DBA, Patch-For-Review, User-Banyek: dbstore2002 s2 crashed - https://phabricator.wikimedia.org/T204593 (Banyek) Open>Resolved The original problem solved, the compression part will continue in T204930
[12:19:30] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (bmansurov) @Pchelolo what do you think about T205294#4613658? @jcrespo thank you! > Password in the puppet private repo, send a puppet patch with a place to handle it and I will fil...
[12:20:20] FYI, I checked the rebooted dbproxy servers, unfortunately the CPUs in those systems are too old to be covered by the Intel microcode updates. This means that they have the latest kernels and Spectre v1/v2 is fixed (as those are fixed in the Linux kernel source code), but spectre v3/v4 and l1tf need additional microcode changes which Intel doesn't provide for older CPUs (those servers are OOW since 2014)
[12:23:34] so until the hosts not get decommissioned there's no fix?
[12:23:44] moritzm: those are pending to be replaced
[12:24:01] https://phabricator.wikimedia.org/T202367
[12:27:07] DBA, Data-Services, Datasets-General-or-Unknown: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Reedy)
[12:28:56] sure,I'm aware of the replacements, just an FYI
[12:29:01] banyek: yep, unfortunately
[12:35:23] DBA, Data-Services, Datasets-General-or-Unknown: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Reedy) Dumps LGTM My only question would be about how we package them... One big tar solves the problem, but chances are they won't want every...
[12:42:24] ping
[12:42:42] irccloud was broke
[12:42:59] FYI I am researching on eventlogging due to m4 maintenance
[12:49:04] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) > Any similar patch I can get an inspiration from? Here it is an example of how I get a password into a variable and then I write into a template that writes a configuration...
[12:51:39] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) One last thing- accounts are "free" to creat -we can create as many as you need- we created just an account but it has all rights. If you are going to expose in anyway this t...
[12:52:23] cool
[12:52:56] DBA, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Reedy) Hey @Bstorm the dumps that @jcrespo did LGTM, but I have repackaged them slightly to make them easier for end users...
[12:59:18] DBA, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Bstorm) @ArielGlenn, there's a lot going on with the rsyncing on dumps servers. I imagine that there's some kind of notion...
[13:13:21] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui)
[13:25:42] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui)
[13:33:57] DBA, Patch-For-Review, User-Banyek: Reclone db1114 (s1 api) from another API host - https://phabricator.wikimedia.org/T203565 (Marostegui) @banyek why do we have this still open if we have T204926? The scope of the ticket (reclone the host) is done.
[13:35:08] DBA, Patch-For-Review, User-Banyek: Reclone db1114 (s1 api) from another API host - https://phabricator.wikimedia.org/T203565 (Banyek) @maristegui I am ok with closing this
[13:35:33] DBA, Patch-For-Review, User-Banyek: Reclone db1114 (s1 api) from another API host - https://phabricator.wikimedia.org/T203565 (Banyek) Open>Resolved
[13:35:59] DBA, Patch-For-Review, User-Banyek: Reclone db1114 (s1 api) from another API host - https://phabricator.wikimedia.org/T203565 (Marostegui) I am fine too - just asking if there was some reason behind keeping it open that I might have missed :-)
[13:43:08] DBA, User-Banyek: Maintenance M4 cluster - https://phabricator.wikimedia.org/T205288 (Banyek) refined the maintenance operation with @elukey; Tomorrow 10:00am CEST we do the maintenance of db1107. The steps will be: - @elukey stops the eventlogging service on eventlog1002 - @elukey disables the syncer a...
[13:44:34] DBA, User-Banyek: Maintenance M4 cluster - https://phabricator.wikimedia.org/T205288 (Marostegui) Is there any specific reason to start with the master first?
[13:45:08] DBA, User-Banyek: Maintenance M4 cluster - https://phabricator.wikimedia.org/T205288 (Banyek) Yes, the replica is the one which is actively used
[13:45:42] DBA, User-Banyek: Maintenance M4 cluster - https://phabricator.wikimedia.org/T205288 (Marostegui) Got it! Thanks
[13:47:47] Blocked-on-schema-change, MediaWiki-Change-tagging, MediaWiki-Database, Wikidata-Campsite, User-Ladsgroup: Schema change for adding indexes of ct_tag_id - https://phabricator.wikimedia.org/T203709 (Marostegui)
[14:21:31] DBA, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Bstorm) The files are in place now on both servers.
[14:40:57] DBA, Patch-For-Review: Create Icinga alerts on backup generation failure - https://phabricator.wikimedia.org/T203969 (jcrespo) With naive size checks: ``` root@db1115:~$ for section in s1 s2 s3 s4 s5 s6 s7 s8 x1 m1 m2 m3 m5; do sudo -u nagios python3 check_mariadb_backups.py -s $section -d codfw -f1000...
[14:45:12] DBA, Research: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (bmansurov) @jcrespo could you please create an account with username 'recommendationapiservice' with the 'SELECT' right only? As for the services to connect from, I think for now I'l...
[14:48:51] DBA, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Bstorm) Look right to everyone? https://dumps.wikimedia.org/other/educationprogram/
[14:54:33] DBA, Data-Services, Datasets-General-or-Unknown, Patch-For-Review: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Reedy) Looks right to me! Thanks! :)
[15:02:35] DBA: BBU problems dbstore2002 - https://phabricator.wikimedia.org/T205257 (Papaul) Both dbstore2002 and db2064 have HP Smart Array P420i Controller
[15:28:05] DBA, Research, Patch-For-Review: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) Read only account deployed: ``` root@neodymium:~$ mysql --skip-ssl -h m2-master.eqiad.wmnet -urecommendationapiservice -p$pass2 Welcome to the MariaDB...
[15:38:29] DBA, Research, Patch-For-Review: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (bmansurov) @jcrespo thanks! Looks like I misunderstood you. If DB creation is done, then I'll talk to the Services to team about the productionizing part.
[15:44:08] Hi everybody
[15:44:13] DBA, Operations, Research, Services (designing): Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (bmansurov) Open>Resolved a:bmansurov @Pchelolo the database has been setup (T205294). I think this task is complete as far as storage is concerned. I'll...
[15:44:30] if you have 5 mins I'd like to follow up on the dbstore1002 work to do during the next quarters
[15:44:48] elukey: Can we do that tomorrow? I was almost wrapping up
[15:44:54] sure :)
[15:45:00] I can also send an email
[15:45:01] is that ok?
[15:45:04] sure :)
[15:45:07] yep yep
[15:47:10] DBA, Operations, Research, Services (designing): Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (jcrespo)
[15:47:11] DBA, Research, Patch-For-Review: Request to create database and account for recommendation API - https://phabricator.wikimedia.org/T205294 (jcrespo) Open>Resolved Please handle the productionization on a separate ticket: host to connect: m2-master.eqiad.wmnet TLS: disabled user: recommendati...
[16:05:51] Blocked-on-schema-change, DBA, Anti-Harassment, Patch-For-Review: Execute the schema change for Partial Blocks - https://phabricator.wikimedia.org/T204006 (aezell) Thanks @Marostegui!
[17:02:23] DBA, Data-Services, Datasets-General-or-Unknown: Archive and drop education program (ep_*) tables on all wikis - https://phabricator.wikimedia.org/T174802 (Reedy)
[17:38:27] marostegui: hey, around?
[17:54:50] Amir1: sort of
[17:54:55] Amir1: anything urgent?
[17:58:05] no no, enjoy the day
[17:58:19] marostegui: btw. Thanks for schema change
[18:00:07] Amir1: No problem! I am expecting to get it done in eqiad in the next few days
[18:00:12] Talk to you tomorrow!
[18:08:36] Awesome