[04:41:23] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) I will check if the raid is on sda, because the host is correctly set to be allowed to be re-imaged: ` db1114|db...
[05:14:08] DBA, Operations, ops-eqiad, Patch-For-Review: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (Marostegui)
[05:21:39] DBA, Operations, ops-eqiad, Patch-For-Review: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (Marostegui) @Cmjohnson can we schedule the BBU replacement for Monday 15th? db1078 is no longer a master. The failover was performed successfully: Times in UTC:...
[05:28:34] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) The raid is sdb and we need it to be sda for db.cfg to work: ` Disk /dev/sdb: 3.5 TiB, 3840699359232 bytes, 7501365936 s...
[05:33:07] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (jcrespo) We of course could make sdb work, but that would make this servers special, compared to the rest. Maybe a disk was not adde...
[05:41:48] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) So, I have been checking out the RAID menu on the controller, but unfortunately over `vsp` it doesn't show most of the o...
[07:19:49] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) I have been trying to check if there is something else defined on a storage level but it is impossible to see anything w...
[08:32:28] parsercache still recovering and going up on hits https://grafana.wikimedia.org/d/000000106/parser-cache?orgId=1&from=now-3d&to=now
[08:32:39] almost there
[08:34:55] cool, it will take some time
[08:35:46] space used increased by 500GB
[08:36:16] 500GB??
[08:36:39] https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&panelId=12&fullscreen&orgId=1&var-server=pc1007&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&from=now-7d&to=now
[08:36:52] if the units are right, yes
[08:37:27] it increased 5% by looking at that graph
[08:37:30] that graph is useless to me
[08:37:42] it is now 2.6TB
[08:38:14] so there is a mistake somehow on the parser cache graphs
[08:38:38] this is ok: https://grafana.wikimedia.org/d/000000274/prometheus-machine-stats?panelId=18&fullscreen&orgId=1&var-server=pc1007&var-datasource=eqiad%20prometheus%2Fops&from=now-7d&to=now-1m
[08:39:13] but this is not: https://grafana.wikimedia.org/d/000000106/parser-cache?orgId=1&from=1554712741518&to=1554971941518&panelId=6&fullscreen&var-contentModel=wikitext
[08:39:35] or maybe it is 500GB among all servers, of which there are 4
[08:41:01] yeah, cause from what I can see it grew 5% and it is now 2.6TB
[08:41:10] so that is roughly around 125G per server…
[08:48:30] is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/492321/ okay to merge? it was originally to remove trusty support in the mariadb module, but was extended to install mariadb-backup on stretch/buster and install 10.3 on buster as well
[08:54:05] I think it is ok, some things may error out, but that is ok
[08:54:23] maybe run pcc on some hosts just in case?
[08:55:12] ack, will do that in a bit
[09:18:43] https://puppet-compiler.wmflabs.org/compiler1002/15690/
[09:18:48] seems fine to me
[09:20:51] I think we can remove the old dependencies from the older package
[09:20:58] but that can happen on a separate patch
[09:21:50] e.g. libaio1 and libjemalloc1 are already dependencies of the mariadb packages
[09:22:45] plus the mysqld_safe bits. I will give it a second pass at some point
[09:23:50] I voted +1
[09:25:01] ack
[09:25:28] the second pass I mean on a different patch, from me that can go now
[09:35:00] ack, merging
[09:41:04] reverting, there's no mariadb-backup in stretch
[09:41:19] only in buster
[09:46:08] fixed in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/502959/
[10:05:35] and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/502960/
[10:10:50] DBA, Schema-change: Apply enum changes to (img|oi|fa)_major_mime on production - https://phabricator.wikimedia.org/T72005 (Rillke) Open→Invalid @Marostegui Community interest in this extension does not appear to be overwhelming and yesterday T66548 was declined. The extension can't be deployed, a...
[11:07:30] <_joe_> hi, database friends! I would like your input on the "mysqli" section of https://phabricator.wikimedia.org/T211488#5102911
[11:08:02] <_joe_> specifically if you see any setting of mysqlnd that is harmful or we could improve.
[11:12:40] there is one thing I asked to enable
[11:12:47] let me find the ticket
[11:16:49] _joe_: this one, which for some reason is resolved https://phabricator.wikimedia.org/T214248
[11:17:15] <_joe_> jynus: uhm
[11:17:53] mysqli.allow_local_infile
[11:17:53] <_joe_> well I think they thought it was just for phabricator?
[11:18:05] it should be everywhere, especially on mw
[11:18:19] others could need it for some reason
[11:18:22] <_joe_> cool, but that's not the case even for HHVM
[11:18:34] <_joe_> so, can you please open a new task?
[11:18:38] ok
[11:18:43] <_joe_> I mean I can do it now for php7
[11:18:51] <_joe_> as I'm changing some ini settings already
[11:18:52] I can check the others too
[11:19:11] but that was the one I got from the top of my mind
[11:21:52] I don't think we should set net_read_buffer_size, but let it handle on the db size, unless defaults are ignored; needs more careful testing to say one way or another
[11:22:39] most others are purely administrative or monitoring options that I don't think are too important
[11:23:35] the buffer and the timeout are the ones we should care about
[11:23:40] *buffers
[12:11:05] <_joe_> we already set the timeout
[12:19:27] <_joe_> jynus: we already have mysqli.allow_local_infile = 0
[12:19:33] <_joe_> both on hhvm and php
[12:19:35] <_joe_> by default
[12:19:40] <_joe_> that's the default setting
[12:21:17] oh
[12:21:20] that is good
[12:21:44] <_joe_> yes, I am adding it to my patch to make it explicit nonetheless
[12:21:58] <_joe_> so that if someone decides to change it, they'll know they need feedback
[12:22:01] that is why examining it is not obvious, as I said on my comment
[12:22:11] defaults are complicated to detect
[12:23:06] the other non-concrete thing I commented is, remember the saga with the timeout increase, I think it was connection timeout, and having to patch HHVM?
[12:23:27] I wonder if we will have more issues like that
[12:35:20] <_joe_> it was connection timeout, yes
[12:35:30] <_joe_> and we already have set that to the same value
[12:35:36] cool
[12:35:58] remind me what was migrated already, cronjobs or jobqueue, or neither?
[12:36:13] <_joe_> neither
[12:36:16] cool
[12:36:21] <_joe_> I have to finish a core patch
[12:36:33] as in, give me a heads up when either, as those tend to be more delicate
[12:36:34] <_joe_> and ask for reviews
[12:36:37] <_joe_> sure
[12:36:45] <_joe_> all SRE will be alerted when we do that
[12:36:50] delicate is the wrong word
[12:36:54] <_joe_> probably in a couple weeks
[12:37:00] not in a rush :-D
[12:37:23] for the mysql part, are the ones more likely to have issues
[12:38:00] <_joe_> nope, we need first to go through adding this patch, adding a mw-config patch, testing those resolved all the issues and didn't create new ones
[12:38:09] <_joe_> then raise the percentage of users directed there
[12:38:28] cool!
[13:23:14] DBA, monitoring, Epic: Improve database alerting (tracking) - https://phabricator.wikimedia.org/T172492 (Marostegui)
[13:23:50] DBA, Epic: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562 (Marostegui)
[13:55:43] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Papaul) @jcrespo @Marostegui I disable the SD card and it is working
[14:00:27] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Papaul) Please look if the configuration looks right so i can do the same on the other 5 servers ` root@db2102:~# fdisk -l Disk /d...
[14:26:15] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) So the SD disablement did the trick! :-) The server looks good now: ` root@db2102:~# df -hT Filesystem Type...
[14:28:49] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) codfw dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul)
[14:32:02] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui)
[14:37:08] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Papaul) @Marostegui or @jcrespo you are free to take the task
[14:40:18] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui)
[14:41:04] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) codfw dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul)
[14:41:21] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui) Open→Resolved Thanks @Papaul! This server is ready to be productionized at: {T220572} ` root@db2102:~# lsb_rele...
[14:46:05] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups - https://phabricator.wikimedia.org/T219461 (Marostegui)
[14:53:51] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Marostegui) These new hosts have a HP408i controller and I have noticed this: {P8388} @MoritzMuehlenhoff is kindly taking a look :)
[15:07:04] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (MoritzMuehlenhoff) This is weird, do we have a second server of that model for comparison? I don't even see the controller is lspci (it should identify as "Subsystem: Hewlett-Packard C...
[15:15:43] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) codfw dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul)
[15:15:47] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Marostegui) We will have them today or tomorrow as Papaul is installing them right now.
[15:21:33] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Marostegui) @MoritzMuehlenhoff db2097 is online now and it is one of the new ones, same batch as db2102. You can also check there. Keep in mind that even if the controller doesn't appe...
[15:32:33] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) codfw dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul)
[15:41:39] DBA, Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, MediaWiki-Database, and 2 others: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255 (Umherirrender)
[15:46:52] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) codfw dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul)
[15:59:02] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (MoritzMuehlenhoff) The RAID controller shows up in early device detection by the kernel: ` [ 4.385654] smartpqi 0000:5c:00.0: added scsi 0:1:0:0: Direct-Access HPE LOGICA...
[16:06:24] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Papaul) @MoritzMuehlenhoff I can take a look. Since those are new GEN10 servers there are a lot of changes in the BIOS on where to find stuffs.
[16:07:01] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Marostegui) Thanks @papaul! Feel free to take either db2097 or db2102 down anytime you want to check them. They have no data
[16:27:32] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Papaul) @MoritzMuehlenhoff unfortunately I do not see anything helpful in the BIOS setting on db2097.
[16:29:08] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) codfw dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul)
[16:33:05] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Papaul) Doing some reading on the HP site to see if i can find anything. https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00018944en_us
[16:37:15] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) codfw dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (Papaul) Putting this here for reference debian-installer For some reason, and I heard some rumours that this is a known bug, I had to disable USB support...
[16:53:02] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Marostegui) So to sum up. We can use the storage: ` root@db2102:~# df -hT /srv Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/tank-data xfs 3.5T 3.6G 3.5T...
[17:04:28] DBA: Productionize eqiad and codfw source backup hosts & codfw backup test host - https://phabricator.wikimedia.org/T220572 (Marostegui) @Papaul have you double checked that the RAID controller is not set up to work as HBA mode?