[05:58:39] DBA, Analytics-EventLogging, Analytics-Kanban, WMF-NDA: Drop tables with no events in last 90 days. - https://phabricator.wikimedia.org/T161855#3145706 (Marostegui) That is fine, thanks for the heads up. Normally I like to run those statements with: ``` set session sql_log_bin=0 ; DROP TABLE IF...
[07:41:50] DBA, Patch-For-Review: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079#3145821 (Marostegui) Will leave this alter table running on db2019 (codfw master) on Monday, it will generate lag on codfw.
[07:51:26] DBA, Analytics-EventLogging, Analytics-Kanban, WMF-NDA: Drop tables with no events in last 90 days. - https://phabricator.wikimedia.org/T161855#3145859 (jcrespo) +1
[08:20:53] DBA, Epic: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3145898 (Marostegui) Things that we need to do while codfw is active: - ALTER eqiad s4 master: T73563 - ALTER eqiad shards: T130067 - ALTER eqiad shards: T147166 - ALTER eqiad masters (if we finish on time the sl...
[08:21:14] DBA, Epic: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3145903 (Marostegui)
[08:35:08] marostegui, do you remember which ticket did I transform the special slaves on?
[08:35:21] it was a bug report more or less handled by anomie
[08:35:30] i think so yes
[08:35:32] I know the key word
[08:35:38] what was it
[08:35:39] (if it is the one I believe)
[08:35:40] give me a sec
[08:35:51] https://phabricator.wikimedia.org/T147747
[08:35:53] is that one?
[08:36:09] yes, thank you
[08:36:12] yw!
[08:36:24] I want to see why the special slave is not special
[08:36:32] if it was moved, maintenance, etc.
[08:38:44] how could I miss an unpartitioned slave? maybe it didn't have the role at the time?
[08:40:12] look at this
[08:40:13] https://gerrit.wikimedia.org/r/#/c/297235/
[08:41:08] but it doesn't say why
[08:41:25] failover maybe db1034 failed?
[08:41:55] db1034 seems to have the right indexes
[08:42:26] I am going to use it back and see how it evolves
[08:42:46] db1062 will need to be depooled for alter anyway
[08:42:59] there is no ticket related to db1034 around that date (or I cannot find it)
[08:50:59] DBA, Epic: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3145999 (jcrespo) > Feel free to edit this comment and add/change stuff as needed. Sadly, phabricator is not a wiki- either put it on the header, a wiki or an etherpad :-)
[08:51:47] ^I cannot edit your comment
[08:51:55] ah! no worries, will do it now :)
[08:52:12] right you can only edit the task description :)
[08:55:49] DBA, Epic: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3146014 (Marostegui) https://etherpad.wikimedia.org/p/dba-dc-tasks
[08:56:03] DBA, Epic: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3146015 (Marostegui)
[09:01:52] I think I am going to move it to a wiki- I am adding the commands and I do not want to not notice changes there
[09:01:58] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, Patch-For-Review, Wikidata-Sprint: [Task] Remove "all" option for Special:EntitiesWithout*" - https://phabricator.wikimedia.org/T161631#3146023 (thiemowmde)
[09:02:28] sure, that is fine too
[09:02:39] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, Patch-For-Review, Wikidata-Sprint: [Task] Remove "all" option for Special:EntitiesWithout*" - https://phabricator.wikimedia.org/T161631#3137680 (thiemowmde) p:High>Low
[09:02:57] imagine someone makes a joke and puts a DROP there
[09:03:10] yeah, that's true XD
[09:11:27] DBA, Epic: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3146044 (jcrespo) I have migrated that to the wiki for security reasons, as I have added details of that and added some extra tasks, and questions about others: https://wikitech.wikimedia.org/wiki/Switch_Datacen...
[09:11:36] https://wikitech.wikimedia.org/wiki/Switch_Datacenter/planned_db_maintenance
[09:11:47] We should put them in order of importance
[09:12:41] try to do as many as possible, but stop at some point if we run out of time
[09:13:28] agreed
[09:13:37] I have done some edits
[09:13:42] me too
[09:13:43] check it when you have time
[09:14:42] the master failover I have not told you
[09:14:52] no, you haven't
[09:14:52] thinking about decommissioning old servers
[09:14:56] ah
[09:15:05] yes, we have some masters that will go away
[09:15:07] we probably want to minimize disruption
[09:15:07] good catch
[09:15:09] if possible
[09:15:24] yes, it is a good idea
[09:15:26] by putting newer masters
[09:15:35] needs work
[09:15:59] yes
[09:28:38] DBA, Epic: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3146087 (jcrespo)
[09:40:17] db1066.eqiad.wmnet has issues with the new code
[09:40:33] errors?
[09:40:34] https://logstash.wikimedia.org/goto/de5e7f665991fc8c771874fac75098ac
[09:40:51] of :(
[09:41:05] not sure why it is executed there
[09:41:15] but I am going to depool it
[09:41:42] mmm it has the correct PK, but not the index
[09:42:41] you want me to add it?
[09:43:12] first:
[09:43:21] https://gerrit.wikimedia.org/r/345811
[09:43:27] then we can discuss :-)
[09:43:45] +1ed
[09:44:09] ah, I see
[09:44:21] unlike the other fix
[09:44:30] this wasn't pinned to a single role
[09:44:38] the other API servers do have the index
[09:44:46] yeah, it has to be only that
[09:44:56] yeah it is weird because it has the correct PK but not the indexes
[09:45:00] because I only see this host with errors in the last hours
[09:45:03] whereas the others have the whole thing "wrong"
[09:45:08] pk, and all that
[09:45:13] once it is depooled I can run the alter
[09:45:15] I have it handy
[09:45:17] we stopped enwiki alter
[09:45:27] that was probably going to happen
[09:45:43] now we need the alters, rather than being blocked :-)
[09:46:45] we need the index to exist so we can ignore it
[09:46:50] XDDD
[09:47:03] ok, i will deploy it once you've depooled it
[09:47:12] probably index hints should be done so that they only work if indexes exist
[09:47:32] but that is difficult to do with performance
[09:47:43] without doubling the roundtrip
[09:47:58] or keeping expensive cache over 40000 dbs
[09:51:01] i see you deployed already
[09:51:07] want me to run the alter then?
[09:51:25] yeah, no more errors now
[09:51:28] ok
[09:52:02] that was probably the only server without that index in the whole cluster
[09:52:18] running now
[09:53:37] isn't T159319 fun?
[09:54:53] if it lacks other indexes, we can add them now "for free", too :-)
[09:56:37] haha
[09:57:11] we have a 2x1 offer in indexes today
[09:58:21] special Friday deal?
[10:00:03] the other "doesn't exist in table" in the last 7 days is 2 requests to arbcom_arwiki, which I fixed previously this week- and those were probably tests, not real traffic
[10:01:04] good good
[10:01:19] let's see how long this one takes
[10:03:56] for the other change, eswiki seems happy
[10:04:02] nice
[10:04:23] (I only noticed that issue when I was about to check the other deploy)
[10:05:12] yeah, it is kind of a constant stream of an average of 80 errors or so in logstash
[10:05:17] so not that much
[10:05:23] but not nice indeed to leave it like that over the weekend
[10:05:45] the problem is the noise
[10:05:53] so many closure errors
[10:05:57] so many long running queries
[10:06:09] that when a real issue happens- we do not notice
[10:09:58] DBA, Epic, codfw-rollout: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3146214 (Aklapper)
[11:15:32] DBA, Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3146307 (jcrespo) There are a lot of differences between the master and dbstore1002 on: ``` itwiki plwiki svwiki ``` One (maybe...
[11:18:49] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3146329 (Marostegui) db1066 is done: ``` root@neodymium:/home/marostegui# mysql --skip-ssl -hdb1066 enwiki -e "show create table revision\G" *************************...
[11:24:31] error count seems to be low again
[11:24:48] :)
[11:25:07] I would leave that ticket open for a week
[11:25:11] yes
[11:25:12] agreed
[11:25:21] and monitor every morning
[11:25:27] to catch it if it happens
[12:35:58] Based on the latest issues, I have slightly modified the kibana DBQuery Dashboard
[12:35:58] hopefully for the better
[12:37:03] feedback is welcome
[13:17:02] modules/mariadb/manifests/packages.pp seems to be entirely precise-based; is this needed?
[13:17:21] same for modules/mysql/manifests/server/package.pp
[13:25:19] paravoid I will have a look at it- we do not use package.pp at all on production
[13:25:28] we use package_wmf.pp
[13:25:35] yeah, it looks like it since it relies on some precise packages :)
[13:25:42] so let's kill the dead code?
[13:25:51] maybe it is time to delete it and rename package_wmf -> package
[13:25:57] nod
[13:26:02] I am worried if someone uses the other
[13:26:09] let me check it properly
[13:26:13] no one should be using precise
[13:26:24] yeah, I understand
[13:26:32] but I prefer to kill it fully
[13:26:36] if possible
[13:26:42] sure, by all means :)
[13:26:45] let me ask if someone uses the non _wmf
[13:26:52] and I will either fix it or delete it
[13:27:04] like who?
[13:27:11] other users of our puppet repository you mean?
[13:27:15] yes
[13:27:24] labs/beta/etc
[13:27:30] labs doesn't have precise anymore either
[13:27:38] this is not about precise
[13:27:46] this is about mariadb-server
[13:27:51] vs wmf-mariadb10
[13:27:59] sort of
[13:28:12] it relies on packages from the precise-wikimedia repository
[13:28:18] which will very soon cease to exist
[13:28:22] I am not going to fix it if nobody uses it
[13:28:28] I prefer to just delete it
[13:28:35] and presumably packages in the precise-wikimedia repository are built against precise libraries etc. as well
[13:28:37] let me ask around
[13:28:42] and I will fix it
[13:28:44] ok
[13:28:44] do not worry
[13:29:25] even if someone uses it, I think we can merge both files so they are properly maintained
[13:29:46] unlike now, where I do not touch the packages.pp
[13:30:12] I intend to kill the precise-wikimedia suite in our apt repository on Monday
[13:30:31] so the packages won't be there anymore
[14:02:15] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3146611 (jcrespo) a:jcrespo I am kickstarting the replication right now- this required different puppetization of the replication "grants". I wi...
[14:02:34] \o/
[14:06:15] paravoid, I think I wasn't clear enough before. My fault.
[14:06:16] I think there could be trusty instances using that class
[14:06:38] (yes, even with the precise stuff)
[14:06:47] ok
[14:06:48] I want to delete directly
[14:07:00] rather than just the first patch I sent
[14:07:03] that can go now
[14:07:32] I will work to delete now the whole class fully- and that will take more (because it is not on service I own)
[14:07:44] services*
[14:26:31] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, Patch-For-Review, Wikidata-Sprint: [Task] Remove "all" option for Special:EntitiesWithout*" - https://phabricator.wikimedia.org/T161631#3146645 (matej_suchanek)
[14:30:20] DBA, Analytics, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3146654 (Marostegui) Hi! How's the process to decommission db1047 going?
[15:26:50] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#3146758 (jcrespo) Open>Resolved a:jcrespo This is done. Some small follow-ups (not related to jessie) at: T157359
[15:54:49] DBA, Epic, codfw-rollout: Meta DBA ticket for the DC switchover - https://phabricator.wikimedia.org/T155099#3146861 (Marostegui)
[16:17:24] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3146925 (jcrespo) Ok, now puppet works, but either puppet needs more work or it fails silently- this needs more research. Replication is not wo...
[17:06:07] DBA, Analytics-EventLogging, Analytics-Kanban, Operations, Patch-For-Review: Improve eventlogging replication procedure - https://phabricator.wikimedia.org/T124307#3147093 (Ottomata) ping @Marostegui, in case you didn't see it: https://gerrit.wikimedia.org/r/345646