[05:48:42] DBA: Remove ar_comment from sanitarium triggers - https://phabricator.wikimedia.org/T234704 (Marostegui)
[05:53:38] DBA, Patch-For-Review: Decommission db2067.codfw.wmnet - https://phabricator.wikimedia.org/T233185 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db2067.codfw.wmnet` - db2067.codfw.wmnet (**PASS**) - Downtimed host on Icinga - Downtimed management...
[05:56:39] DBA, Operations: Decommission db2043-db2070 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[05:56:50] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[05:57:23] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui) Open→Resolved All these hosts have been sent for decommissioning. Going to close this for now.
[06:21:38] DBA: Decommission db2070.codfw.wmnet - https://phabricator.wikimedia.org/T239684 (Marostegui)
[06:22:17] DBA: Decommission db2070.codfw.wmnet - https://phabricator.wikimedia.org/T239684 (Marostegui) p:Triage→Normal
[06:22:41] DBA, Operations: Decommission db2043-db2070 - https://phabricator.wikimedia.org/T228258 (Marostegui)
[07:38:30] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for mnwwiki - https://phabricator.wikimedia.org/T235743 (Marostegui) >>! In T235743#5707189, @gerritbot wrote: > Change 554159 **merged** by Phamhi: > [operations/puppet@production] wmcs: don't process lines starting wi...
[08:27:41] DBA, Patch-For-Review: Decommission db1062.eqiad.wmnet - https://phabricator.wikimedia.org/T239188 (Marostegui)
[09:05:40] I just saw a minor mistake
[09:05:52] +wmf-mariadb103, wmf-mariadb104, wmf-mysql57 or wmf-mysql80
[09:06:04] should be percona instead of 57
[09:06:29] ah I see
[09:06:34] Fixing
[09:06:39] thanks
[09:07:10] also should vendor for percona be percona or mysql? I cannot remember where it is used
[09:07:29] Yeah, I wondered about that
[09:07:32] ah
[09:07:38] it should be "percona-server"
[09:07:43] vendor?
[09:07:53] yeah, vendor is a bad name
[09:07:59] it should be "service name"
[09:08:11] systemctl start $vendor
[09:08:44] but autostart is only used on very specific hosts, none on production
[09:09:12] systemctl status percona-server
[09:09:13] ● percona-server.service - percona-server database server
[09:11:29] I wanted a shorter name, but it was the official name and the one on the package, so it was just easier to leave it like that
[09:11:46] yeah, it makes sense once you know what it refers to
[09:11:51] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554252
[09:45:41] DBA: decommission db2065.codfw.wmnet - https://phabricator.wikimedia.org/T239046 (Marostegui)
[10:31:28] DBA: Remove partitions from revision table - https://phabricator.wikimedia.org/T239453 (Marostegui)
[12:41:51] I am deploying a fix for the "Stale-full only: 14 (archiva1001, ...), Fresh: 85 jobs" alert
[12:42:02] <3
[12:42:14] weeks and months work in weird ways: https://gerrit.wikimedia.org/r/c/operations/puppet/+/554281
[12:42:31] so between the first Monday of one month and the next there can be 28 days or 35
[12:43:19] wow
[12:43:48] backups are working as expected
[12:44:02] it is just the alerting, isn't it?
[12:44:03] but the check assumed 30 days
[12:44:30] the new backups are scheduled, but will be done later this week
[12:48:23] This should never happen: https://grafana.wikimedia.org/d/413r2vbWk/bacula?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-job=archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva&from=1575366491137&to=1575377291137
[12:48:32] but it did because we set the bar too low
[12:48:45] not because the other didn't go down yet
[12:49:30] what should never happen?
[12:49:48] backup_age > expected_freshness
[12:50:08] sorry, I sent the wrong link
[12:50:26] this:
[12:50:28] this: https://grafana.wikimedia.org/d/413r2vbWk/bacula?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-job=archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva&from=1575366491137&to=1575377291137&fullscreen&panelId=35
[12:50:35] aaaah right
[12:50:36] yeah
[12:51:27] it is ok, that is why I wanted the metrics early
[12:51:37] definitely :)
[12:51:37] so it would help the logic, and vice versa
[12:58:08] we gave ourselves the additional few days we need: https://grafana.wikimedia.org/d/413r2vbWk/bacula?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-job=archiva1001.wikimedia.org-Monthly-1st-Fri-production-var-lib-archiva&from=1575291461284&to=1575464261284&fullscreen&panelId=35
[12:58:46] as well as "Fresh: 99 jobs"
[13:00:04] 99 jobs are the ones we have scheduled as of today?
[13:01:43] 99 is the total number of tasks configured
[13:02:20] in the last 24 hours, 128 jobs were run/attempted
[13:18:27] we get approximately 98 daily tasks + 24 gerrit hourly backups, more or less
[16:51:58] DBA, Performance-Team, conftool: #dbctl: manage 'externalLoads' data - https://phabricator.wikimedia.org/T229686 (CDanis) >>! In T229686#5704377, @Marostegui wrote: > Any rough ETA on when externalLoads will be able to be handled by `dbctl`? Before EoQ.
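Note on the 12:42-12:49 exchange: a job that runs on the first given weekday of each month is spaced either 28 or 35 days apart, so a fixed 30-day freshness threshold can flag a healthy backup as stale. A minimal Python sketch of that arithmetic follows. The function names and the stand-alone `is_stale` helper are illustrative assumptions, not the actual check in operations/puppet; only the `backup_age > expected_freshness` comparison is taken from the chat.

```python
import datetime


def first_weekday_of_month(year, month, weekday):
    """Return the date of the first `weekday` (Mon=0 .. Sun=6) in a month."""
    first = datetime.date(year, month, 1)
    # Days to move forward from the 1st to reach the wanted weekday.
    offset = (weekday - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


def gaps_between_runs(year, weekday):
    """Days between consecutive monthly runs, including into next January."""
    runs = [first_weekday_of_month(year, m, weekday) for m in range(1, 13)]
    runs.append(first_weekday_of_month(year + 1, 1, weekday))
    return [(later - earlier).days for earlier, later in zip(runs, runs[1:])]


def is_stale(backup_age_days, expected_freshness_days):
    """Hypothetical helper mirroring 'backup_age > expected_freshness'."""
    return backup_age_days > expected_freshness_days


FRIDAY = 4
# Every gap is a whole number of weeks: 28 or 35 days, never 30.
print(sorted(set(gaps_between_runs(2019, FRIDAY))))
```

For example, the first Fridays of November and December 2019 are 35 days apart, so a backup aged 32 days would trip a 30-day threshold while still being on schedule; a threshold of at least 35 days plus some slack for the run itself avoids that.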
[18:03:26] jynus: marostegui I don't know if you remember the spikes in reads of s8. It is also affecting PC. I'm really close to cracking the issue open but I need to query PC. Is there an easy way to do it?
[18:03:47] PC == parsercache?
[18:03:52] yup
[18:04:07] not sure if there are wikiadmin accounts there, you can try
[18:04:39] if not, if you can tell me the keys, I can copy them to you on a production host
[18:04:50] okay, let me try with the same passwords
[18:05:41] I think it should work because I think *nmie and other non-roots researched them
[18:05:58] but I could be wrong
[18:06:18] It definitely did at one point
[18:06:43] in any case, please file a ticket if we can get something on your behalf, I was almost out the door (sorry)
[18:09:54] (or to re-allow access)
[18:10:37] I'm in
[18:10:40] thanks.
[18:10:48] cool
[18:17:38] Amir1 right now: https://thumbs.gfycat.com/MasculineFemaleHarvestmen-mobile.mp4
[18:18:05] good luck /me exits the matrix
[18:18:41] lol, please insert a floppy disk teaching me how to debug this
[23:07:33] DBA, TechCom-RFC: MediaWiki database policy and/or guidelines (2019) - https://phabricator.wikimedia.org/T220056 (Krinkle)
[23:14:02] DBA, TechCom-RFC: MediaWiki database policy and/or guidelines (2019) - https://phabricator.wikimedia.org/T220056 (Krinkle)