[04:35:03] DBA, Patch-For-Review: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (Marostegui) Stalled→Open
[04:43:25] DBA: pl_namespace index on pagelinks is unique only in s8 - https://phabricator.wikimedia.org/T256685 (Marostegui) Stalled→Open
[04:43:32] DBA, Datasets-General-or-Unknown, Patch-For-Review, Sustainability (Incident Followup), WorkType-NewFunctionality: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (Marostegui)
[05:02:19] DBA, Product-Infrastructure-Team-Backlog, Push-Notification-Service, Patch-For-Review, User-Marostegui: DBA review for Echo push notification subscription tables - https://phabricator.wikimedia.org/T246716 (Marostegui) >>! In T246716#6414387, @Mholloway wrote: > Great! Looks like we're close...
[05:08:10] DBA: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 (Marostegui) Stalled→Open
[05:10:58] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Marostegui) @Papaul do you want me to attempt to get es2026 installed?
[05:15:57] PROBLEM - MariaDB sustained replica lag on db1083 is CRITICAL: 438 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1083&var-port=9104
[05:16:06] ^ expected
[05:17:06] I thought I had downtimed the host, but I didn't, now I have done it for real
[08:02:02] DBA: pl_namespace index on pagelinks is unique only in s8 - https://phabricator.wikimedia.org/T256685 (Marostegui) The master is done: ` # mysql.py -hdb1109 wikidatawiki -e "show create table pagelinks\G" *************************** 1. row *************************** Table: pagelinks Create Table: CRE...
[08:02:25] DBA: pl_namespace index on pagelinks is unique only in s8 - https://phabricator.wikimedia.org/T256685 (Marostegui) Open→Resolved
[08:02:36] DBA, Datasets-General-or-Unknown, Patch-For-Review, Sustainability (Incident Followup), WorkType-NewFunctionality: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (Marostegui)
[08:03:13] DBA: pl_namespace index on pagelinks is unique only in s8 - https://phabricator.wikimedia.org/T256685 (Marostegui)
[08:26:53] PROBLEM - MariaDB sustained replica lag on db1109 is CRITICAL: 305 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1109&var-port=9104
[08:27:02] ^ expected
[08:28:48] basically all DB problems are expected. the only thing that's unexpected is when they work
[08:32:13] As I am not used to downtiming eqiad hosts, I messed up some .eqiad and .codfw files I use to downtime stuff :p
[08:35:57] have IRC notices been compartmentalized? I don't remember them logging here except the lag one
[08:36:02] (not complaining)
[08:36:17] ah, it IS a replica lag one
[08:36:32] This is the new alert we are still "tweaking", so it only alerts here
[08:36:48] yeah, no problem, I thought all mariadb ones were moved
[08:36:59] and wouldn't complain, just wasn't aware
[09:27:59] marostegui: quick one for you, do you recall how the grants for debmonitor are set up?
[09:28:12] volans: mysql grants?
[09:28:15] yes
[09:28:20] I can check it for you
[09:28:29] <3
[09:29:19] volans: excluding the backup grants I guess this is what you want to know:
[09:29:19] GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SELECT, SHOW VIEW, TRIGGER, UPDATE ON `debmonitor`.* TO `debmonitor`@`10.64.32.62`;
[09:30:07] moritzm: ^^^ wondering how it can work :D
[09:30:28] volans: there are obviously more IPs, but the grants are the same
[09:30:41] marostegui: is 10.64.16.72 part of them?
[09:31:11] volans: not from what I can see, but let me dig cause maybe we have 10.64.% or something
[09:31:13] let me check
[09:31:52] volans: https://phabricator.wikimedia.org/P12502
[09:31:57] looking
[09:32:34] the dbproxy are expected I guess
[09:32:39] yep
[09:32:44] but restbase2015?
[09:32:46] restbase2015-a.codfw.wmnet. wut?
[09:32:48] maybe an old host
[09:32:52] that got the IP reused
[09:33:07] yeah, 10.192.32.22 currently resolves to restbase2015-a.codfw.wmnet
[09:33:22] isn't the mysql host-based authentication mechanism the best?
[09:33:26] marostegui: those are grants or connections?
[09:33:32] grants
[09:33:49] ok we need to re-audit them and fix them :D
[09:33:59] but still can't understand how debmonitor2001 works
[09:34:10] volans: Yeah, I am sure it is a leftover from some hosts decommissioning
[09:34:10] *debmonitor1002
[09:34:32] volans: I will remove that grant then
[09:34:45] marostegui: wait a sec there are more
[09:35:54] That IP is only allowed for the debmonitor user, haha
[09:36:19] which one?
[09:36:27] the restbase one
[09:42:41] marostegui: so, I'm on debmonitor1002 now (10.64.16.72) and I can query stuff on the debmonitor db with the debmonitor user and pass set by puppet connecting to m2-master.eqiad.wmnet
[09:42:57] I tried selects only so far
[09:44:03] volans: let me dig
[09:44:28] we're happy it works, but I'm surprised it does :)
[09:45:11] yeah, writes are working as well, I updated a package on mw1284 and the update was properly recorded via the apt hook
[09:45:32] volans: you are connecting with the debmonitor user for sure?
[09:47:09] db=_mysql.connect(host="m2-master.eqiad.wmnet",user="debmonitor",passwd="...",db="debmonitor")
[09:47:21] from python
[09:47:26] using the same package django uses
[09:47:35] I can reconnect if you want to see the connection open
[09:47:49] ahh. m2-master is a dbproxy
[09:47:53] * volans reconnected
[09:48:36] volans: yeah, but you are going thru the proxy
[09:49:06] I thought you were hitting db1107.eqiad.wmnet
[09:49:07] directly
[09:49:15] sure, that's what you gave me to set at the time :)
[09:49:34] yeah, but then that makes sense that it is working
[09:52:58] so shouldn't the grants be just the dbproxies then?
[09:53:05] I don't think we connect directly ever
[09:53:13] then I will get rid of the others
[09:53:25] it should always go thru the proxies yep
[09:53:31] (also that means we're not really limiting anything, but that's another story :) )
[09:54:51] volans: I will prepare a patch for this, as I don't even see those grants on the grants file, so I will get all that sorted
[09:55:00] volans: will send you the patch when it is ready
[09:56:03] marostegui: that's great, thanks a lot
[10:00:13] I joined the PDU maintenance meeting
[10:00:15] XD
[10:00:18] Sorry, going for ours
[10:01:33] jynus: you joining?
[10:15:51] DBA, Cloud-Services, Platform Team Initiatives (API Gateway): Prepare and check storage layer for api.wikimedia.org - https://phabricator.wikimedia.org/T246946 (hnowlan) This wiki is now live, user creation should be enabled.
[10:59:44] DBA, Data-Services, Platform Team Initiatives (API Gateway), cloud-services-team (Kanban): Prepare and check storage layer for api.wikimedia.org - https://phabricator.wikimedia.org/T246946 (Marostegui) Stalled→Open a:Marostegui→None Thanks for the heads up @hnowlan - triggers are...
[11:19:53] volans: thanks for nerd-sniping me into debmonitor grants...this is what I am feeling now https://www.youtube.com/watch?v=AbSehcT19u0
[11:33:10] marostegui: same when I look at puppet :)
[11:33:28] haha
[11:33:36] ahaha
[11:33:39] sorry about that
[11:33:55] I know you aren't
[11:34:40] marostegui: just one dbproxy in codfw?
[11:34:45] yep
[11:39:45] I didn't get PS2 :)
[11:42:41] marostegui: I'm confused, which one is the right proxy? dbproxy2001 or dbproxy2002?
[11:43:11] dbproxy2002, fixing that
[11:43:45] ahhh got it now :D
[11:45:21] thx again :)
[11:50:45] volans: can you confirm if everything keeps working for you?
[11:50:54] I have dropped all the grants but the dbproxy* ones
[11:57:59] let me doublecheck a write change to the debmonitor db
[11:58:05] thanks moritzm
[11:58:06] checking
[11:58:45] so far looks good to me
[11:58:53] but also debmonitor connection would not be killed right?
[11:59:01] want me to force a reconnect?
[11:59:19] yeah, update from a host was properly recorded
[12:00:11] volans: yeah, let's try that just in case (even though I see them thru proxies already)
[12:00:52] ack, restarting uwsgi on debmonitor1002
[12:01:38] {done}
[12:02:06] all keeps looking good here
[12:02:19] great!
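Editorial note on the grants thread above: a minimal sketch of why debmonitor1002 could connect even though its own IP (10.64.16.72) had no grant. It assumes, as the conversation suggests, that the database server only sees the proxy's address as the client host. The proxy address `192.0.2.10` is a placeholder and not taken from the log, and `host_matches` is a simplified model of MySQL-style host patterns (`%` and `_` wildcards only), not the server's actual matching code.

```python
import re

# Placeholder address for a dbproxy (hypothetical, not from the log).
PROXY_IP = "192.0.2.10"
# Address of debmonitor1002 as stated in the log.
CLIENT_IP = "10.64.16.72"

# Simplified grant table: (user, host pattern) pairs, proxy-only.
GRANTS = [("debmonitor", PROXY_IP)]


def host_matches(pattern: str, source_ip: str) -> bool:
    """Match an IP against a MySQL-style host pattern.

    '%' stands for any run of characters and '_' for exactly one.
    Netmasks and hostname resolution are deliberately left out.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, source_ip) is not None


def can_connect(user: str, seen_ip: str, grants) -> bool:
    """Return True if some grant covers this user at the address the server sees."""
    return any(g_user == user and host_matches(g_host, seen_ip)
               for g_user, g_host in grants)


# Direct connection: the server would see the client's own address.
print("direct:", can_connect("debmonitor", CLIENT_IP, GRANTS))    # direct: False
# Through the proxy: the server sees the proxy's address instead.
print("via proxy:", can_connect("debmonitor", PROXY_IP, GRANTS))  # via proxy: True
```

Under these assumptions, only the proxy addresses need grants, which is also why per-client grants limit nothing once all traffic goes through the proxies.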
[13:27:45] DBA, Goal: Expand database provisioning/backup service to accomodate for growing capacity and high availability needs - https://phabricator.wikimedia.org/T257551 (Jclark-ctr)
[14:21:39] DBA, Performance-Team, WikimediaDebug, Patch-For-Review: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (Marostegui) @dpifke all good from your side? Can this be resolved?
[14:43:17] DBA, Operations, ops-eqiad: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (Jclark-ctr) Hello John, Thank you for the AHS. However, as per the AHS we see no hardware errors and the log event page also seems to be empty. Hence, request you to assist us with the sc...
[15:01:31] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Papaul) @Marostegui go for it
[15:02:52] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['es2026.codfw.wmnet...
[15:14:20] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Marostegui) @Papaul can you check the cable/switch/interface? ` PXE-E61: Media test failure, check cable `
[15:25:19] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Papaul) @Marostegui holiday today in the U.S so not at the DC. It is not a cable problem ` papaul@asw-a-codfw# run show interfac...
[15:26:17] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Marostegui) @Papaul sure, no need to get it done today - you shouldn't be checking phab even! :-) Enjoy your day off! :)
[15:52:28] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Marostegui) No luck there @Papaul, things I have noticed: - the mac address on the DHCP file was pointing to the 10G interface....
[16:03:17] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2026.codfw.wmnet'] ` Of which those **FAILED**: ` ['es2026.codfw.wmne...