[01:20:07] DBA, DC-Ops, Operations, ops-codfw: (Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence) - https://phabricator.wikimedia.org/T260819 (Papaul)
[01:24:34] DBA, DC-Ops, Operations, ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Papaul) @Marostegui i can only get you 1 server at A1
[01:27:27] DBA: "Wikimedia\Rdbms\DBQueryError" - https://phabricator.wikimedia.org/T260962 (RoySmith)
[01:28:12] DBA, DC-Ops, Operations, ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Papaul)
[01:29:07] DBA: "Wikimedia\Rdbms\DBQueryError" - https://phabricator.wikimedia.org/T260962 (Reedy)
[06:01:31] DBA, Patch-For-Review: dbtree slowdown 2020-08-20 - https://phabricator.wikimedia.org/T260876 (jcrespo) The above patch "fixed" the alert, or at least it made the web request 10x faster (from 12 seconds to 0.272 s). Now we need to apply an equivalent patch to tendril, as I can see both the tree and host...
[06:04:36] DBA, DC-Ops, Operations, ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (jcrespo) @Papaul (manuel is on vacations until Monday), what about 1 on A1 and 2 on A6? Same row but it looks like it could fit it?
[06:56:51] DBA, Patch-For-Review: dbtree slowdown 2020-08-20 - https://phabricator.wikimedia.org/T260876 (jcrespo) Manually applying this fix to tendril for /host and /tree too seems to get rid of long running queries on db1115: {F32190821} The graphs are slow, but I am not sure if they were always have been like t...
[07:01:54] DBA, Patch-For-Review: dbtree slowdown 2020-08-20 - https://phabricator.wikimedia.org/T260876 (jcrespo) I've just realized that the QPS shows 0 for all hosts, so this is not really a fix as much as we fail the query immediately. Debugging.
[07:27:41] DBA, Patch-For-Review: dbtree slowdown 2020-08-20 - https://phabricator.wikimedia.org/T260876 (jcrespo) I had a few formatting errors, fixed on the latest upload.
[07:38:50] DBA, Patch-For-Review: dbtree slowdown 2020-08-20 - https://phabricator.wikimedia.org/T260876 (jcrespo) @kormat @Marostegui Please check tendril works normally (for some meaning of "normal"), and I will close this once the workaround has been deployed.
[08:36:14] DBA: dbtree slowdown 2020-08-20 - https://phabricator.wikimedia.org/T260876 (Kormat) They both seem normal to me.
[08:38:49] DBA, Operations, ops-codfw, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Kormat) Open→Resolved No further issues seen with db2125, so i'm going to resolve this task.
[08:51:41] 2020-08-20 16:02:27 SYS1001 System is turning off.
[08:51:47] but it doesn't say why
[08:51:56] "System CPU Resetting."
[08:52:30] I wasn't online when that happened
[08:58:43] there are literally no logs in the last 5 days
[09:02:14] it seems it did a normal stop after puppet ran
[09:04:07] but no one was logged in at the time
[09:04:17] maybe a cumin command, but I don't think so
[09:05:28] I found it
[09:05:36] Aug 20 16:00:47 dbprov2003 systemd-logind[1310]: Power key pressed.
[09:05:48] Aug 20 16:00:47 dbprov2003 systemd-logind[1310]: Powering Off..
[09:06:00] maybe it was put down by accident by dc ops
[09:06:04] no big deal
[09:09:46] ohh. ok. nice find!
[09:10:41] I will ask dc ops
[09:10:48] because it could be an installation error
[09:10:59] e.g. a cable making the button half-pressed or something
[09:11:12] * kormat nods
[09:13:15] DBA, DC-Ops, Operations, ops-codfw: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (jcrespo) Resolved→Open Hey, Papaul, just to discard intruders on dc or other major hardware issues, could you maybe accidentally have pressed th...
[09:13:20] thanks for pointing out dbprov being down, I sometimes skip over hosts down
[09:13:27] np :)
[09:13:50] the backups being stale alert would have taken 1 extra day, and that would have led to delays until monday
[09:14:06] did you see the tendril/dbtree fix?
[09:15:40] i did
[09:16:14] I made a mistake because I yolo'd it
[09:16:33] it happens :)
[09:16:46] DBA: dbtree slowdown 2020-08-20 - https://phabricator.wikimedia.org/T260876 (jcrespo) Open→Resolved a:jcrespo
[09:17:01] not sure if the query is back to normal levels, I saw one time it taking 2 seconds to run
[09:17:16] but I don't think I want to spend more time on that
[09:17:33] icinga is at least not complaining any more
[09:17:48] yes, it made the query 10x faster
[09:17:56] more or less
[09:31:33] DBA, User-Kormat: Create testing environment for db automation - https://phabricator.wikimedia.org/T256602 (Kormat)
[09:34:20] DBA, Operations, Patch-For-Review, User-Kormat: DBA python layout - https://phabricator.wikimedia.org/T259516 (Kormat)
[09:34:27] jynus: ty
[09:38:11] DBA, User-Kormat: Create testing environment for db automation - https://phabricator.wikimedia.org/T256602 (Kormat) Status update: - We have a pontoon env running in the `mariadb104-test` cloud VPS project. - We have these nodes: -- puppetmaster -- puppetdb -- cumin -- 4x db nodes managed by puppet -- 1x...
[13:27:58] DBA, MediaWiki-extensions-OAuthRateLimiter, Patch-For-Review, Platform Team Initiatives (API Gateway), and 3 others: Review request for a new database table for OAuthRateLimiter - https://phabricator.wikimedia.org/T258711 (Naike)
[13:30:42] DBA, DC-Ops, Operations, ops-codfw: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (Papaul) dbprov2003 is in D4 and i do not recall working in D4 yesterday when on site. i worked in D2 and C3. the only action taken in D4 yesterday was to...
[17:07:37] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried)
[17:09:28] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried)
[17:46:19] DBA, DC-Ops, Operations, ops-eqiad: (2020-08-15) rack/setup/install dbprov1003.eqiad.wmnet - https://phabricator.wikimedia.org/T258750 (Cmjohnson)
[17:47:28] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried)
[17:47:52] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried)
[17:48:21] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried)