[00:36:11] <wikibugs>	 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Papaul) @Marostegui Dell wants for us to run onboard hardware diagnostics which can take many hours before completion .   ` Papaul,     Apolog...
[04:39:24] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10Marostegui) From what I can see this host isn't assigned to a partman recipe, but I am going to leave this to @jcrespo as this host is going to th...
[04:43:48] <wikibugs>	 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Marostegui) Thank you @Papaul - let me know when you want me to have the host ready for you and I will make sure to have MySQL stopped there.
[06:22:09] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on db2074 is CRITICAL: 4.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2074&var-port=9104
[06:23:00] <marostegui>	 I think I know what that is
[06:24:29] <marostegui>	 The SpecialMostLinked scripts were running and they are known for being a bit heavy
[06:26:35] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on db2074 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2074&var-port=9104
[06:29:05] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on db2127 is CRITICAL: 4.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2127&var-port=9104
[06:35:53] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on db2127 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2127&var-port=9104
[06:49:26] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10jcrespo) @Papaul, as a general rule, al db* hosts with the same spec, as far as first install, they should have the custom/db.cfg recipe. I believ...
[06:52:21] <jynus>	 what is the resouce that gets constrained, cpu or Io? You know?
[06:54:19] <jynus>	 its io or memory bandwidth
[06:55:00] <jynus>	 and host summary dashboard is horrible to detect this
[06:56:10] <jynus>	 I think it also snowballs- it creates 1 second of lag, which makes processes force a stall, which creates a processoverload
[06:56:39] <jynus>	 https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=37&orgId=1&var-server=db2127&var-port=9104&from=1599718550859&to=1599719977642
[07:00:24] <marostegui>	 jynus: from what I can see the transfer between es hosts is still running, the new hosts arrived and I guess they'll be ready by next week, what should we do for the next transfers to avoid hitting the data transfer issue?
[07:01:09] <jynus>	 assuming this completes with no issue, my best bet is puppet ownership change of sqldata creating a racecondition in which data cannot be written
[07:01:28] <jynus>	 so the advice would be to transfer to a different dir than sqldata
[07:01:34] <marostegui>	 ok, I will do that
[07:01:36] <jynus>	 then rename
[07:01:56] <jynus>	 either that or disable puppet, but I think the first would be easier
[07:02:14] <jynus>	 I cannot fix this on programming because the issue would be on how puppet executes
[07:03:04] <jynus>	 if it completes, the way to close the ticket is reproduce it with a smaller transfer size
[07:03:31] <marostegui>	 But this would have hit us in any other transfer, as puppet executes several times during a normal 1TB transfer, no?
[07:04:05] <jynus>	 not really because I guess normally the writes on sqldata are not very frequent
[07:04:23] <jynus>	 and the transfer is small that the race condition is unfrequent
[07:04:24] <marostegui>	 What do you mean with writes?
[07:04:47] <jynus>	 so puppet only changes the sqldata permissions
[07:04:56] <jynus>	 and writes can happen before and after puppet runs
[07:05:13] <jynus>	 it is only on the miliseconds between ownership change and mode change that the issue happens
[07:05:21] <marostegui>	 what do you mean with writes?
[07:05:40] <jynus>	 writes to sqldata dir
[07:05:46] <jynus>	 not subdirs
[07:05:48] <marostegui>	 but which writes?
[07:05:58] <jynus>	 writes from transfer, from tar
[07:06:16] <marostegui>	 but that should be the same on an es transfer and on a normal 1 or 2TB transfer, unless I am missing something
[07:06:19] <marostegui>	 what's the difference?
[07:06:37] <jynus>	 so as far as I think
[07:06:51] <jynus>	 it is a combination of situations that are very unlikely
[07:07:10] <jynus>	 issues happening after the first error I think were just something else (bad state)
[07:09:01] <marostegui>	 but what is the difference between a sX transfer and esX transfer (apart from the size)?
[07:09:15] <jynus>	 so several things
[07:09:32] <jynus>	 most transfers are:
[07:09:46] <jynus>	 backups (which are not affected by puppet change, as they write somewhere else)
[07:10:09] <jynus>	 recoveries (which are not affected by puppet change because they don't write sqldata, only files inside)
[07:10:57] <jynus>	 an the few clones you may have done, they happened to be small enough that the error wouldn't happen (puppet has to run at the same time than a top-level write is done)
[07:11:12] <jynus>	 for example, enwiki only has like 10 writes on top level
[07:11:19] <jynus>	 most of the time is transfering contents of enwiki
[07:11:32] <jynus>	 es is likely to have 900 of those writes
[07:11:39] <jynus>	 plus more time for puppet runs
[07:12:01] <jynus>	 normal runs take around 50 minutes, in which only 1 or 2 puppet runs will happen
[07:12:34] <jynus>	 while in es it is 12 hours (12*3 runs)
[07:12:44] <kormat>	 why do the permissions get changed?
[07:13:28] <jynus>	 so I didn't setup this, but I guess the idea is to enforce that a running mysql has its directory available for mysql
[07:13:56] <kormat>	 jynus: i'm not asking why puppet sets the permissions. i'm asking why they are not already set to what puppet expects
[07:14:00] <jynus>	 the problem is puppet applies the changes not atomicly, so it removes write permissions, and then changes ownership (or the other way round)
[07:14:25] <jynus>	 kormat: I think untar writes things as root, and the sets the permisions to the original ones
[07:14:30] <marostegui>	 kormat: Because the directory gets created when the transfer starts, then puppet comes and realises oh: those are not the permissions it should have, then changes them
[07:15:04] <jynus>	 so I belive it to be a race condition between puppet and untar writing sqldata dir
[07:15:07] <marostegui>	 But my question is, why that doesn't happen on every single transfer, despite of the sizes
[07:15:34] <jynus>	 I mentioned before, on other kinds of transfers (xtrabackup and decompress, sqldata is not touched)
[07:16:21] <jynus>	 also it doesn't happen always, only when certain write and puppet runs happen at the same time
[07:16:50] <marostegui>	 I will do the transfering to a different directory as a workaround for the next servers
[07:16:50] <jynus>	 the time in wich puppet is inconsistent would be the time between runs of chmod and chown 
[07:17:03] <kormat>	 does untar produce any useful error message when this happens?
[07:17:30] <marostegui>	 the transfer finished
[07:17:32] <jynus>	 well, not as much untar as the pipe breaking
[07:17:37] <marostegui>	 2020-09-10 07:01:51  INFO: 9366738018624 bytes correctly transferred from es2014.codfw.wmnet to es2026.codfw.wmnet
[07:17:37] <marostegui>	 ----- OUTPUT of '/sbin/iptables -...t 4400 -j ACCEPT' -----
[07:17:37] <marostegui>	 iptables: Bad rule (does a matching rule exist in that chain?).
[07:17:44] <marostegui>	 2020-09-10 07:01:52  WARNING: Firewall's temporary rule could not be deleted
[07:17:51] <marostegui>	 other than that, it looks like it finished correctly
[07:17:55] <jynus>	 but we can reproduce it with shorter transfers to confirm
[07:18:47] <jynus>	 marostegui: it should say checksum matches before the "correctly transferred"
[07:19:27] <marostegui>	 did we leave checksuming enabled?
[07:19:39] <jynus>	 parallel one, which is new on 1.0
[07:19:41] <marostegui>	 I don't see that sentence so far, I am still checking
[07:19:48] <jynus>	 maybe it was not enabled
[07:20:19] <marostegui>	 100.0% (1/1) of nodes failed to execute command '/bin/bash -c "[ ...ource.md5sum" ]"': es2014.codfw.wmnet
[07:20:19] <marostegui>	 100.0% (1/1) of nodes failed to execute command '/bin/bash -c "[ ...ource.md5sum" ]"': es2014.codfw.wmnet
[07:20:19] <marostegui>	 0.0% (0/1) success ratio (< 100.0% threshold) for command: '/bin/bash -c "[ ...ource.md5sum" ]"'. Aborting.
[07:20:19] <marostegui>	 0.0% (0/1) success ratio (< 100.0% threshold) for command: '/bin/bash -c "[ ...ource.md5sum" ]"'. Aborting.
[07:20:19] <marostegui>	 0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
[07:20:19] <marostegui>	 0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
[07:20:21] <marostegui>	 ----- OUTPUT of '/bin/bash -c "[ ...arget.md5sum" ]"' -----
[07:20:43] <marostegui>	 checksum = False
[07:20:47] <marostegui>	 it is disabled on config
[07:21:03] <jynus>	 check parallel_checksum = True
[07:21:24] <jynus>	 also ignore the succes ratio of cumin, it is based on the exit return
[07:21:41] <jynus>	 and sometimes we run command to check it returns something other than 0
[07:22:01] <marostegui>	 So other than those I cannot see any other message about checksumming 
[07:22:13] <jynus>	 can I connect to the screen, to double check?
[07:22:19] <marostegui>	 absolutely
[07:22:20] <jynus>	 I thought I had enabled it
[07:22:29] <volans>	 if you're 100% sure you're running a RO command (like grep and similar) you an use -x, --ignore-exit-codes
[07:22:53] <jynus>	 2020-09-10 07:01:50  INFO: Parallel checksum of source on es2014.codfw.wmnet and the transmitted ones on es2026.codfw.wmnet match.
[07:22:59] <volans>	 but then you can distinguish only by the output, not anymore by exit code 0/non-zero
[07:23:02] <jynus>	 volans: nice one, I will implement that
[07:23:14] <jynus>	 ^marostegui sorry about the verbose mode
[07:23:24] <jynus>	 it is very verbose :-), but it checked ok
[07:23:35] <jynus>	 md5sums coincide
[07:23:54] <jynus>	 7522382c3ba4a4dbe97ae6c5f57d822f  - on both
[07:24:02] <marostegui>	 excellent, thanks
[07:24:11] <jynus>	 we were running in verbose mode to debug
[07:24:34] <jynus>	 you should be able to not use it for later transfers, things will be much clear
[07:24:52] <jynus>	 so my plan is to reproduce the issue on a test host
[07:25:04] <jynus>	 with a short transfer + manual puppet runs
[07:25:10] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: Thur, Sept 10 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8 - https://phabricator.wikimedia.org/T261454 (10wiki_willy) Apologies for the last minute change, the upgrades for these 2x PDUs will be postponed until a later date.  Both dc-ops engineers at eqiad are recov...
[07:25:12] <jynus>	 to be 100% sure it is that
[07:26:06] <wikibugs>	 10DBA, 10Data-Services, 10cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (10Nintendofan885)
[07:26:26] <jynus>	 again, this would only affect clonings (not backups and recoveries) to /srv/sqldata (or any other dir that changes owner and mode on puppet to something other than root)
[07:26:57] <jynus>	 of course, I have not proof of that, but that is what I want to get
[07:27:08] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: Wed, Sept 9 PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (10wiki_willy) Latest update: Due to another separate injury, the upgrades for these 2x PDUs will be postponed again for a later date.  No PDU upgrades for the rest...
[07:27:17] <jynus>	 I think also after the first error and a lot of tests there was some confusion after the few errors
[07:28:03] <jynus>	 some of those (cleanup and concurrency) would be best handled on the latest version
[07:29:22] <jynus>	 I will paste a sumary with my initial suppositions and how I will try to prove them
[07:29:25] <jynus>	 on ticket
[07:30:05] <marostegui>	 for the record and posterity let's also mention what's the preferred workaround (transfer to a different directory and then rename)
[07:30:38] <jynus>	 I would profer if that was not even an issue, but it is a recommendation I give you now, will add it
[07:30:59] <jynus>	 either by changing how transfer works or how puppet works
[07:31:40] <marostegui>	 yes, that's why it is a workaround
[07:31:48] <jynus>	 for example, if we know puppet is changing permissions of sqldata, maybe we can ask transfer to copy only the contents, but not the dir itself
[07:36:31] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on db2127 is CRITICAL: 4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2127&var-port=9104
[07:39:24] <wikibugs>	 10DBA: transfer.py fails when copying data between es hosts - https://phabricator.wikimedia.org/T262388 (10jcrespo) After having tested the firewall extensively on other tests, I believe it is indeed related to puppet runs, which mid-transfer would alter the permissions of /srv/sqldata both in mode and owner, li...
[07:40:39] <jynus>	 one option would be to change puppet writing (mode changing) the datadir into a check
[07:41:06] <jynus>	 not because not touching transferpy, but because it could also impact other tooling that writes there
[07:42:03] <jynus>	 although to be fair, mode changes only happen because the initial write is done as root, and only later it is restored the original ownership
[07:42:11] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on db2127 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2127&var-port=9104
[07:42:36] <jynus>	 other option would be for transfer to disable puppet on target while transfer happens
[07:44:30] <wikibugs>	 10DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (10Marostegui) a:03Marostegui es2026 has been cloned, not pooling it yet - will let it run till Monday, as this would be the first production es (RO) host running 10.4 and Buster.
[07:44:47] <jynus>	 sorry for the overhead this is causing you
[07:45:00] <jynus>	 but this is an issue it was important to detect
[07:48:23] <marostegui>	 it is ok, not overhead, I will do the workaround for the next hosts and we can track the progress on the task
[07:49:49] <jynus>	 it is also possible I was doing that without thinking
[07:50:21] <jynus>	 as in, when I did a clone, I always wrote to a different dir because I was worried about puppet getting in the way
[07:51:46] <jynus>	 and I didn't tell you because I really didn't have a concrete reason why I was doing it
[07:51:56] <marostegui>	 it is ok jaime, we've found a workaround, we can work on more long-term solutions once we've fully identified the issue with the work pending at https://phabricator.wikimedia.org/T262388#6449717
[08:31:37] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts: ` ['db2141.codfw.wmnet'] ` The log can be fo...
[08:44:03] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10jcrespo)
[09:04:12] <wikibugs>	 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for jawikivoyage - https://phabricator.wikimedia.org/T260482 (10jhsoby) >>! In T260482#6441747, @jcrespo wrote: > @jhsoby this task only relates to cloud infrastructure- it won't make search (or anything else) work on a w...
[09:12:00] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2141.codfw.wmnet'] `  and were **ALL** successful.
[09:12:46] <wikibugs>	 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Prepare and check storage layer for jawikivoyage - https://phabricator.wikimedia.org/T260482 (10jcrespo) >>! In T260482#6449838, @jhsoby wrote: > Thanks, I was confused then. I'll open a new task for this instead.  Thank you, that would work too. Sorry...
[09:23:40] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10jcrespo)
[09:25:21] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10jcrespo) @Papaul, this is all completed after my patch. Only leaving it open so you can see it (e.g. in case you need to do something else not on...
[10:32:17] <icinga-wm>	 PROBLEM - MariaDB sustained replica lag on db1123 is CRITICAL: 2.2 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1123&var-port=9104
[10:32:33] <marostegui>	 ^ that is probably me
[10:39:51] <icinga-wm>	 RECOVERY - MariaDB sustained replica lag on db1123 is OK: (C)2 ge (W)1 ge 0.8 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1123&var-port=9104
[11:59:56] <wikibugs>	 10DBA, 10SRE-tools, 10conftool, 10serviceops, and 2 others: Alerting spam and wrong state of primary dc source info on databases while switching dc from eqiad -> codfw - https://phabricator.wikimedia.org/T261767 (10Marostegui) @RLazarus what do you want to do with this task? is this something that needs fi...
[12:17:15] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat)
[12:19:45] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat) s6 eqiad progress: [] db1085.eqiad.wmnet [] db1088.eqiad.wmnet [] db1093.eqiad.wmnet [] db1096.eqiad.wmnet [] db1098.eqiad.wmnet [] db111...
[12:20:18] <wikibugs>	 10Blocked-on-schema-change, 10DBA, 10Operations, 10User-Kormat: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831 (10Kormat)
[13:16:41] <wikibugs>	 10DBA, 10Data-Services, 10Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (10Marostegui) @Jdforrester-WMF like we've done with some other wikis in the past, can we just truncate the tables and consider this done?
[13:20:20] <wikibugs>	 10DBA, 10Operations, 10ops-codfw, 10Patch-For-Review, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Marostegui) After our IRC chat, this is scheduled for Monday 14th
[13:25:45] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (10Papaul) 05Open→03Resolved
[13:31:38] <wikibugs>	 10DBA, 10Data-Services, 10cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (10BrownHairedGirl) >>! In T262239#6445454, @Marostegui wrote: > It is a very complex and hard to operate infrastructure, and that's...
[13:33:56] <wikibugs>	 10DBA, 10Data-Services, 10cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (10Marostegui) >>! In T262239#6450454, @BrownHairedGirl wrote: >>>! In T262239#6445454, @Marostegui wrote: >> It is a very complex a...
[14:02:40] <wikibugs>	 10DBA, 10Data-Services, 10Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (10Jdforrester-WMF) That's fine by me.
[14:10:09] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[15:14:43] <wikibugs>	 10DBA, 10Data-Services, 10Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (10Bugreporter) Compare: (not to do so) {T169928} {T227717} (to do so but first rename them) {T260112}  Personally I support the latter - first rename databa...
[15:19:11] <wikibugs>	 10DBA, 10Data-Services, 10Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (10Marostegui) We cannot rename a database, that's not supported by Mysql unfortunately :-(
[15:50:36] <wikibugs>	 10DBA, 10Data-Services, 10cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (10BrownHairedGirl) >>! In T262239#6450462, @Marostegui wrote: > The service is effectively up, but with lag. I just took a quick lo...
[16:04:39] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2027.codfw.wmnet ` The...
[16:23:38] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2027.codfw.wmnet'] `  Of which those **FAILED**: ` ['es2027.codfw.wmne...
[16:25:06] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2027.codfw.wmnet ` The...
[16:45:42] <wikibugs>	 10DBA, 10Goal: Expand database provisioning/backup service to accomodate for growing capacity and high availability needs - https://phabricator.wikimedia.org/T257551 (10jcrespo) db2141 has been fully setup as a new buster backup source, with s1 an s6, and added to tendril and zarcillo.
[17:10:58] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2027.codfw.wmnet'] `  and were **ALL** successful.
[17:20:59] <longma>	 Hello, is anyone around to help with an issue is #-operations?
[18:07:22] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2028.codfw.wmnet ` The...
[18:09:41] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[18:28:52] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[18:48:31] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2028.codfw.wmnet'] `  and were **ALL** successful.
[19:14:46] <wikibugs>	 10DBA, 10Operations, 10observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (10jijiki)
[19:16:13] <wikibugs>	 10DBA, 10Operations, 10observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (10jijiki)
[20:05:42] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2029.codfw.wmnet ` The...
[20:44:30] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2029.codfw.wmnet'] `  and were **ALL** successful.
[20:47:18] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[20:51:30] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2030.codfw.wmnet ` The...
[21:11:40] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[21:31:22] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2030.codfw.wmnet'] `  and were **ALL** successful.
[21:33:21] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2031.codfw.wmnet ` The...
[22:13:10] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2031.codfw.wmnet'] `  and were **ALL** successful.
[22:14:12] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2032.codfw.wmnet ` The...
[22:50:09] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[22:55:06] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2032.codfw.wmnet'] `  and were **ALL** successful.
[22:55:59] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2033.codfw.wmnet ` The...
[23:35:29] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2033.codfw.wmnet'] `  and were **ALL** successful.
[23:44:39] <wikibugs>	 10DBA, 10DC-Ops, 10Operations, 10ops-codfw, 10Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` es2034.codfw.wmnet ` The...