[00:36:11] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Papaul) @Marostegui Dell wants us to run onboard hardware diagnostics, which can take many hours to complete. ` Papaul, Apolog...
[04:39:24] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10Marostegui) From what I can see this host isn't assigned to a partman recipe, but I am going to leave this to @jcrespo as this host is going to th...
[04:43:48] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Marostegui) Thank you @Papaul - let me know when you want me to have the host ready for you and I will make sure to have MySQL stopped there.
[06:22:09] PROBLEM - MariaDB sustained replica lag on db2074 is CRITICAL: 4.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2074&var-port=9104
[06:23:00] I think I know what that is
[06:24:29] The SpecialMostLinked scripts were running and they are known for being a bit heavy
[06:26:35] RECOVERY - MariaDB sustained replica lag on db2074 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2074&var-port=9104
[06:29:05] PROBLEM - MariaDB sustained replica lag on db2127 is CRITICAL: 4.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2127&var-port=9104
[06:35:53] RECOVERY - MariaDB sustained replica lag on db2127 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2127&var-port=9104
[06:49:26] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10jcrespo) @Papaul, as a general rule, all db* hosts with the same spec, as far as the first install goes, should have the custom/db.cfg recipe. I believ...
[06:52:21] what is the resource that gets constrained, cpu or IO? You know?
[06:54:19] it's IO or memory bandwidth
[06:55:00] and the host summary dashboard is horrible for detecting this
[06:56:10] I think it also snowballs - it creates 1 second of lag, which makes processes force a stall, which creates a process overload
[06:56:39] https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=37&orgId=1&var-server=db2127&var-port=9104&from=1599718550859&to=1599719977642
[07:00:24] jynus: from what I can see the transfer between es hosts is still running, the new hosts arrived and I guess they'll be ready by next week, what should we do for the next transfers to avoid hitting the data transfer issue?
[07:01:09] assuming this completes with no issue, my best bet is the puppet ownership change of sqldata creating a race condition in which data cannot be written
[07:01:28] so the advice would be to transfer to a different dir than sqldata
[07:01:34] ok, I will do that
[07:01:36] then rename
[07:01:56] either that or disable puppet, but I think the first would be easier
[07:02:14] I cannot fix this in code because the issue would be in how puppet executes
[07:03:04] if it completes, the way to close the ticket is to reproduce it with a smaller transfer size
[07:03:31] But this would have hit us in any other transfer, as puppet executes several times during a normal 1TB transfer, no?
[07:04:05] not really, because I guess normally the writes on sqldata are not very frequent
[07:04:23] and the transfer is small enough that the race condition is infrequent
[07:04:24] What do you mean by writes?
[07:04:47] so puppet only changes the sqldata permissions
[07:04:56] and writes can happen before and after puppet runs
[07:05:13] it is only in the milliseconds between the ownership change and the mode change that the issue happens
[07:05:21] what do you mean by writes?
[07:05:40] writes to the sqldata dir
[07:05:46] not subdirs
[07:05:48] but which writes?
[07:05:58] writes from the transfer, from tar
[07:06:16] but that should be the same on an es transfer and on a normal 1 or 2TB transfer, unless I am missing something
[07:06:19] what's the difference?
[07:06:37] so, as far as I can tell
[07:06:51] it is a combination of situations that are very unlikely
[07:07:10] issues happening after the first error I think were just something else (bad state)
[07:09:01] but what is the difference between an sX transfer and an esX transfer (apart from the size)?
[07:09:15] so several things
[07:09:32] most transfers are:
[07:09:46] backups (which are not affected by the puppet change, as they write somewhere else)
[07:10:09] recoveries (which are not affected by the puppet change because they don't write sqldata, only files inside it)
[07:10:57] and the few clones you may have done happened to be small enough that the error wouldn't trigger (puppet has to run at the same time as a top-level write is done)
[07:11:12] for example, enwiki only has like 10 writes at the top level
[07:11:19] most of the time is spent transferring the contents of enwiki
[07:11:32] es is likely to have 900 of those writes
[07:11:39] plus more time for puppet runs
[07:12:01] normal runs take around 50 minutes, in which only 1 or 2 puppet runs will happen
[07:12:34] while for es it is 12 hours (12*3 runs)
[07:12:44] why do the permissions get changed?
[07:13:28] so I didn't set this up, but I guess the idea is to enforce that a running mysql has its directory available for mysql
[07:13:56] jynus: i'm not asking why puppet sets the permissions. i'm asking why they are not already set to what puppet expects
[07:14:00] the problem is puppet applies the changes non-atomically, so it removes write permissions, and then changes ownership (or the other way round)
[07:14:25] kormat: I think untar writes things as root, and then sets the permissions to the original ones
[07:14:30] kormat: Because the directory gets created when the transfer starts, then puppet comes along and realises: oh, those are not the permissions it should have, and changes them
[07:15:04] so I believe it to be a race condition between puppet and untar writing the sqldata dir
[07:15:07] But my question is, why doesn't that happen on every single transfer, regardless of the size?
[07:15:34] as I mentioned before, on other kinds of transfers (xtrabackup and decompress), sqldata is not touched
[07:16:21] also it doesn't always happen, only when certain writes and puppet runs happen at the same time
[07:16:50] I will do the transferring to a different directory as a workaround for the next servers
[07:16:50] the time in which puppet is inconsistent would be the time between the runs of chmod and chown
[07:17:03] does untar produce any useful error message when this happens?
[07:17:30] the transfer finished
[07:17:32] well, not so much untar as the pipe breaking
[07:17:37] 2020-09-10 07:01:51 INFO: 9366738018624 bytes correctly transferred from es2014.codfw.wmnet to es2026.codfw.wmnet
[07:17:37] ----- OUTPUT of '/sbin/iptables -...t 4400 -j ACCEPT' -----
[07:17:37] iptables: Bad rule (does a matching rule exist in that chain?).
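The non-atomic chown-then-chmod window described above can be sketched with a toy model. This is a pure simulation of the timing problem, not the actual puppet manifest; the owner and mode values are illustrative assumptions:

```python
# Toy model of the race: puppet enforces an owner and a mode on the
# datadir, but applies them in two separate operations (chown, then
# chmod), so a write landing between the two sees a half-applied state.
# All names and modes here are illustrative, not the real recipe.
desired = {"owner": "mysql", "mode": 0o755}   # what puppet enforces
state = {"owner": "root", "mode": 0o700}      # as left by tar running as root

def consistent(s):
    """True only once both attributes match the enforced values."""
    return s == desired

state["owner"] = desired["owner"]   # step 1: chown
window = not consistent(state)      # race window: owner changed, mode not yet
state["mode"] = desired["mode"]     # step 2: chmod closes the window

print(window, consistent(state))    # True True: a window existed, then closed
```

The longer the transfer, the more puppet runs land inside it and the more top-level writes there are, which is why a 12-hour es clone hits this while a 50-minute sX clone rarely does.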
[07:17:44] 2020-09-10 07:01:52 WARNING: Firewall's temporary rule could not be deleted
[07:17:51] other than that, it looks like it finished correctly
[07:17:55] but we can reproduce it with shorter transfers to confirm
[07:18:47] marostegui: it should say checksum matches before the "correctly transferred"
[07:19:27] did we leave checksumming enabled?
[07:19:39] the parallel one, which is new in 1.0
[07:19:41] I don't see that sentence so far, I am still checking
[07:19:48] maybe it was not enabled
[07:20:19] 100.0% (1/1) of nodes failed to execute command '/bin/bash -c "[ ...ource.md5sum" ]"': es2014.codfw.wmnet
[07:20:19] 100.0% (1/1) of nodes failed to execute command '/bin/bash -c "[ ...ource.md5sum" ]"': es2014.codfw.wmnet
[07:20:19] 0.0% (0/1) success ratio (< 100.0% threshold) for command: '/bin/bash -c "[ ...ource.md5sum" ]"'. Aborting.
[07:20:19] 0.0% (0/1) success ratio (< 100.0% threshold) for command: '/bin/bash -c "[ ...ource.md5sum" ]"'. Aborting.
[07:20:19] 0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
[07:20:19] 0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
[07:20:21] ----- OUTPUT of '/bin/bash -c "[ ...arget.md5sum" ]"' -----
[07:20:43] checksum = False
[07:20:47] it is disabled in the config
[07:21:03] check parallel_checksum = True
[07:21:24] also ignore the success ratio of cumin, it is based on the exit code
[07:21:41] and sometimes we run a command to check whether it returns something other than 0
[07:22:01] So other than those I cannot see any other message about checksumming
[07:22:13] can I connect to the screen, to double check?
[07:22:19] absolutely
[07:22:20] I thought I had enabled it
[07:22:29] if you're 100% sure you're running a RO command (like grep and similar) you can use -x, --ignore-exit-codes
[07:22:53] 2020-09-10 07:01:50 INFO: Parallel checksum of source on es2014.codfw.wmnet and the transmitted ones on es2026.codfw.wmnet match.
[07:22:59] but then you can distinguish only by the output, no longer by exit code 0/non-zero
[07:23:02] volans: nice one, I will implement that
[07:23:14] ^marostegui sorry about the verbose mode
[07:23:24] it is very verbose :-), but it checked ok
[07:23:35] md5sums coincide
[07:23:54] 7522382c3ba4a4dbe97ae6c5f57d822f - on both
[07:24:02] excellent, thanks
[07:24:11] we were running in verbose mode to debug
[07:24:34] you should be able to not use it for later transfers, things will be much clearer
[07:24:52] so my plan is to reproduce the issue on a test host
[07:25:04] with a short transfer + manual puppet runs
[07:25:10] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: Thur, Sept 10 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8 - https://phabricator.wikimedia.org/T261454 (10wiki_willy) Apologies for the last minute change, the upgrades for these 2x PDUs will be postponed until a later date. Both dc-ops engineers at eqiad are recov...
[07:25:12] to be 100% sure it is that
[07:26:06] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (10Nintendofan885)
[07:26:26] again, this would only affect clonings (not backups and recoveries) to /srv/sqldata (or any other dir whose owner and mode puppet changes to something other than root)
[07:26:57] of course, I have no proof of that, but that is what I want to get
[07:27:08] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: Wed, Sept 9 PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (10wiki_willy) Latest update: Due to another separate injury, the upgrades for these 2x PDUs will be postponed again for a later date. No PDU upgrades for the rest...
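The parallel-checksum verification discussed above (hash the byte stream on the source while sending, hash it again on the target while receiving, then compare digests) can be sketched like this. It illustrates the idea only and is not transferpy's actual implementation:

```python
import hashlib

def md5_of_chunks(chunks):
    """md5 of a byte stream fed chunk by chunk, as a piped transfer would be."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()

# The source hashes what it sends; the target hashes what it receives;
# the transfer is trusted only if the two digests coincide.
payload = [b"ibdata chunk 1", b"ibdata chunk 2"]   # illustrative data
source_sum = md5_of_chunks(payload)
target_sum = md5_of_chunks(payload)                # bytes that arrived
assert source_sum == target_sum
```

Because both sides hash as the data flows, the check adds almost no wall-clock time compared to re-reading terabytes from disk afterwards, which is presumably why it is called a "parallel" checksum.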
[07:27:17] I think that after the first error and a lot of tests there was also some confusion caused by the few later errors
[07:28:03] some of those (cleanup and concurrency) would be best handled in the latest version
[07:29:22] I will paste a summary with my initial suppositions and how I will try to prove them
[07:29:25] on the ticket
[07:30:05] for the record and posterity let's also mention the preferred workaround (transfer to a different directory and then rename)
[07:30:38] I would prefer it if that was not even an issue, but it is a recommendation I can give you now, will add it
[07:30:59] either by changing how transfer works or how puppet works
[07:31:40] yes, that's why it is a workaround
[07:31:48] for example, if we know puppet is changing permissions of sqldata, maybe we can ask transfer to copy only the contents, but not the dir itself
[07:36:31] PROBLEM - MariaDB sustained replica lag on db2127 is CRITICAL: 4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2127&var-port=9104
[07:39:24] 10DBA: transfer.py fails when copying data between es hosts - https://phabricator.wikimedia.org/T262388 (10jcrespo) After having tested the firewall extensively on other tests, I believe it is indeed related to puppet runs, which mid-transfer would alter the permissions of /srv/sqldata both in mode and owner, li...
[07:40:39] one option would be to change puppet's writing (mode changing) of the datadir into a check
[07:41:06] not so much to avoid touching transferpy, but because it could also impact other tooling that writes there
[07:42:03] although to be fair, mode changes only happen because the initial write is done as root, and only later is the original ownership restored
[07:42:11] RECOVERY - MariaDB sustained replica lag on db2127 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2127&var-port=9104
[07:42:36] another option would be for transfer to disable puppet on the target while the transfer happens
[07:44:30] 10DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (10Marostegui) a:03Marostegui es2026 has been cloned, not pooling it yet - will let it run till Monday, as this would be the first production es (RO) host running 10.4 and Buster.
[07:44:47] sorry for the overhead this is causing you
[07:45:00] but this is an issue that was important to detect
[07:48:23] it is ok, no overhead, I will do the workaround for the next hosts and we can track the progress on the task
[07:49:49] it is also possible I was doing that without thinking
[07:50:21] as in, when I did a clone, I always wrote to a different dir because I was worried about puppet getting in the way
[07:51:46] and I didn't tell you because I really didn't have a concrete reason why I was doing it
[07:51:56] it is ok jaime, we've found a workaround, we can work on more long-term solutions once we've fully identified the issue with the work pending at https://phabricator.wikimedia.org/T262388#6449717
[08:31:37] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts: ` ['db2141.codfw.wmnet'] ` The log can be fo...
[08:44:03] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10jcrespo)
[09:04:12] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for jawikivoyage - https://phabricator.wikimedia.org/T260482 (10jhsoby) >>! In T260482#6441747, @jcrespo wrote: > @jhsoby this task only relates to cloud infrastructure- it won't make search (or anything else) work on a w...
[09:12:00] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2141.codfw.wmnet'] ` and were **ALL** successful.
[09:12:46] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for jawikivoyage - https://phabricator.wikimedia.org/T260482 (10jcrespo) >>! In T260482#6449838, @jhsoby wrote: > Thanks, I was confused then. I'll open a new task for this instead. Thank you, that would work too. Sorry...
[09:23:40] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10jcrespo)
[09:25:21] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10jcrespo) @Papaul, this is all completed after my patch. Only leaving it open so you can see it (e.g. in case you need to do something else not on...
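The workaround agreed on in the discussion above (write into a sibling directory that puppet does not manage, then rename into place once the transfer completes) could look roughly like the sketch below. The path, helper name, and payload are hypothetical; the point is that `rename()` is a single atomic step, so the window in which puppet can fight the writer over the datadir disappears:

```python
import os
import tempfile

def receive_into_staging(datadir, write_payload):
    """Hypothetical sketch of the workaround: untar into <datadir>.tmp,
    which puppet does not manage, then rename into place.  rename() is
    atomic on the same filesystem, so puppet never races a live write
    against a half-chowned datadir."""
    staging = datadir + ".tmp"
    os.makedirs(staging)              # puppet leaves this directory alone
    write_payload(staging)            # the transfer/untar step would go here
    os.rename(staging, datadir)       # one atomic step replaces the window

# demo in a scratch location instead of a real /srv/sqldata
with tempfile.TemporaryDirectory() as scratch:
    datadir = os.path.join(scratch, "sqldata")
    receive_into_staging(
        datadir,
        lambda d: open(os.path.join(d, "ibdata1"), "wb").close(),
    )
    print(os.path.exists(os.path.join(datadir, "ibdata1")))  # True
```

Note this only helps when the final path stays on the same filesystem as the staging path; across filesystems `os.rename` raises an error rather than falling back to a copy.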
[10:32:17] PROBLEM - MariaDB sustained replica lag on db1123 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1123&var-port=9104
[10:32:33] ^ that is probably me
[10:39:51] RECOVERY - MariaDB sustained replica lag on db1123 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1123&var-port=9104
[11:59:56] 10DBA, 10SRE-tools, 10conftool, 10serviceops, and 2 others: Alerting spam and wrong state of primary dc source info on databases while switching dc from eqiad -> codfw - https://phabricator.wikimedia.org/T261767 (10Marostegui) @RLazarus what do you want to do with this task? is this something that needs fi...
[12:17:15] 10Blocked-on-schema-change, 10DBA, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat)
[12:19:45] 10Blocked-on-schema-change, 10DBA, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat) s6 eqiad progress: [] db1085.eqiad.wmnet [] db1088.eqiad.wmnet [] db1093.eqiad.wmnet [] db1096.eqiad.wmnet [] db1098.eqiad.wmnet [] db111...
[12:20:18] 10Blocked-on-schema-change, 10DBA, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat)
[13:16:41] 10DBA, 10Data-Services, 10Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (10Marostegui) @Jdforrester-WMF like we've done with some other wikis in the past, can we just truncate the tables and consider this done?
[13:20:20] 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Marostegui) After our IRC chat, this is scheduled for Monday 14th
[13:25:45] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10Papaul) 05Open→03Resolved
[13:31:38] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (10BrownHairedGirl) >>! In T262239#6445454, @Marostegui wrote: > It is a very complex and hard to operate infrastructure, and that's...
[13:33:56] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (10Marostegui) >>! In T262239#6450454, @BrownHairedGirl wrote: >>>! In T262239#6445454, @Marostegui wrote: >> It is a very complex a...
[14:02:40] 10DBA, 10Data-Services, 10Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (10Jdforrester-WMF) That's fine by me.
[14:10:09] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[15:14:43] 10DBA, 10Data-Services, 10Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (10Bugreporter) Compare: (not to do so) {T169928} {T227717} (to do so but first rename them) {T260112} Personally I support the latter - first rename databa...
[15:19:11] 10DBA, 10Data-Services, 10Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (10Marostegui) We cannot rename a database, that's not supported by MySQL unfortunately :-(
[15:50:36] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (10BrownHairedGirl) >>! In T262239#6450462, @Marostegui wrote: > The service is effectively up, but with lag. I just took a quick lo...
[16:04:39] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2027.codfw.wmnet ` The...
[16:23:38] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2027.codfw.wmnet'] ` Of which those **FAILED**: ` ['es2027.codfw.wmne...
[16:25:06] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2027.codfw.wmnet ` The...
[16:45:42] 10DBA, 10Goal: Expand database provisioning/backup service to accomodate for growing capacity and high availability needs - https://phabricator.wikimedia.org/T257551 (10jcrespo) db2141 has been fully set up as a new buster backup source, with s1 and s6, and added to tendril and zarcillo.
[17:10:58] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2027.codfw.wmnet'] ` and were **ALL** successful.
[17:20:59] Hello, is anyone around to help with an issue in #-operations?
[18:07:22] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2028.codfw.wmnet ` The...
[18:09:41] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[18:28:52] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[18:48:31] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2028.codfw.wmnet'] ` and were **ALL** successful.
[19:14:46] 10DBA, 10Operations, 10observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (10jijiki)
[19:16:13] 10DBA, 10Operations, 10observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (10jijiki)
[20:05:42] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2029.codfw.wmnet ` The...
[20:44:30] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2029.codfw.wmnet'] ` and were **ALL** successful.
[20:47:18] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[20:51:30] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2030.codfw.wmnet ` The...
[21:11:40] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[21:31:22] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2030.codfw.wmnet'] ` and were **ALL** successful.
[21:33:21] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2031.codfw.wmnet ` The...
[22:13:10] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2031.codfw.wmnet'] ` and were **ALL** successful.
[22:14:12] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2032.codfw.wmnet ` The...
[22:50:09] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[22:55:06] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2032.codfw.wmnet'] ` and were **ALL** successful.
[22:55:59] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2033.codfw.wmnet ` The...
[23:35:29] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2033.codfw.wmnet'] ` and were **ALL** successful.
[23:44:39] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2034.codfw.wmnet ` The...