[00:03:53] Blocked-on-schema-change, DBA, Community-Tech: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3543452 (kaldari)
[00:04:13] Blocked-on-schema-change, DBA, Community-Tech: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3543466 (kaldari)
[00:12:04] DBA, MediaWiki-extensions-WikibaseClient, Operations, Wikidata, and 5 others: Cache invalidations coming from the JobQueue are causing lag on several wikis - https://phabricator.wikimedia.org/T164173#3543488 (Krinkle)
[00:19:50] Blocked-on-schema-change, DBA, Community-Tech: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3543495 (MusikAnimal)
[05:24:05] DBA, Operations, Wikimedia-Site-requests: Global rename supervision request: Papa1234 → Karl-Heinz Jansen - https://phabricator.wikimedia.org/T173859#3543712 (Marostegui) Can you give more details about this rename? Number of edits? Wikis with the biggest number of edits? Thanks
[05:25:42] DBA, Operations, Wikimedia-Site-requests, User-MarcoAurelio: Global rename: Opdire657 → Sakiv; supervision needed - https://phabricator.wikimedia.org/T173834#3541365 (Marostegui) Most of the edits are on arwiki, which belongs to s7. When would you like to do this rename?
[05:29:44] Blocked-on-schema-change, DBA, Community-Tech: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3543452 (Marostegui) a:jcrespo>None Hi, We will take care of this, but if possible try to avoid assigning it directly to Jaime (unless he spec...
[06:55:13] DBA, Operations, Wikimedia-Site-requests, User-MarcoAurelio: Global rename: Opdire657 → Sakiv; supervision needed - https://phabricator.wikimedia.org/T173834#3543759 (MarcoAurelio) @Marostegui I'll see you in IRC. I'm available today.
[07:49:46] DBA, Operations, Wikimedia-Site-requests, User-MarcoAurelio: Global rename: Opdire657 → Sakiv; supervision needed - https://phabricator.wikimedia.org/T173834#3543838 (MarcoAurelio) Open>Resolved Done.
[07:56:00] DBA, Operations, Wikimedia-Site-requests: Global rename supervision request: Papa1234 → Karl-Heinz Jansen - https://phabricator.wikimedia.org/T173859#3542365 (MarcoAurelio) * `Papa1234` has a total global edit count of 123,428, of which 120,674 are on dewiki (s5) * dewiki has flaggedrevs, according t...
[07:57:07] DBA, Operations, Wikimedia-Site-requests: Global rename supervision request: Papa1234 → Karl-Heinz Jansen - https://phabricator.wikimedia.org/T173859#3543846 (Marostegui) >>! In T173859#3543842, @MarcoAurelio wrote: > * `Papa1234` has a total global edit count of 123,428, of which 120,674 are on dewi...
[07:58:46] DBA, Operations, Wikimedia-Site-requests: Global rename supervision request: Papa1234 → Karl-Heinz Jansen - https://phabricator.wikimedia.org/T173859#3543864 (MarcoAurelio) Open>stalled p:Triage>Low Okay, therefore I am marking this as stalled/blocked pending resolution of the indexin...
[07:59:42] DBA, Operations, Wikimedia-Site-requests: Global rename supervision request: Papa1234 → Karl-Heinz Jansen - https://phabricator.wikimedia.org/T173859#3543870 (MarcoAurelio)
[07:59:48] DBA, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-UserMerge, Patch-For-Review, Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3543869 (MarcoAurelio)
[08:00:53] DBA, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-UserMerge, Patch-For-Review, Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3543874 (Marostegui) Hi, This is now blocking a user rename - T173859. If we can get this merged on mediawik...
[08:03:33] DBA, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-UserMerge, Patch-For-Review, Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3543876 (MarcoAurelio) @Marostegui and @jcrespo The operation this task refers to was a user merge, which is not...
[08:10:14] DBA, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-UserMerge, Patch-For-Review, Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3543879 (Marostegui) >>! In T172207#3543876, @MarcoAurelio wrote: > @Marostegui and @jcrespo The operation this...
[09:10:36] Blocked-on-schema-change, DBA, Community-Tech: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3543963 (Marostegui)
[09:15:55] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3543964 (jcrespo) > We will take care of this Actually, we will not, as you can see on https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change c...
[09:16:18] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3543967 (jcrespo)
[09:17:40] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3543972 (Marostegui) >>! In T173891#3543964, @jcrespo wrote: >> We will take care of this > > Actually, we will not, as you can see on https://wikitech.wikimedia.org/w...
[09:42:23] marostegui: are you with the hosts with shrinking disk space available?
[09:42:37] if not, I will take that as it is becoming more of a problem
[09:42:59] I am with db1041 now
[09:43:23] need help with the other?
[09:43:38] db1028 is on s7 too, so we should probably pool another one to replace that one
[09:43:44] because db1041 was depooled for a long time
[09:43:46] but db1028 is pooled
[09:43:47] ok
[09:43:51] so maybe we need to replace it basically
[09:43:59] and we can checksum its data later
[09:44:05] (i can do the checksumming part)
[09:46:15] we have db1069 pending to be reimaged
[09:46:23] or we can use some of the new ones
[09:46:50] one of those, though, is a partitioned one
[09:47:25] DBA, Epic: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423#3544010 (jcrespo)
[09:47:31] DBA, Patch-For-Review: Migrate dbstore2001 to multi instance - https://phabricator.wikimedia.org/T168409#3544009 (jcrespo) Resolved>Open
[09:49:22] also let me know your opinion on https://gerrit.wikimedia.org/r/373256
[09:50:22] Ah, yes, that makes sense to me. dbstore2001 is basically dead now :(
[09:50:58] so I say this
[09:51:07] db1034 from s7 does not have disk issues yet (81% disk usage)
[09:51:11] I pool db1069 for now, to give it at least some use
[09:51:28] copy it from db1041, which was the old master (most reliable one)
[09:51:34] it was the old old master
[09:51:37] and pool it as vslow, dump, substituting
[09:51:56] did you see: https://phabricator.wikimedia.org/T163190#3540214 ?
[09:52:09] db1028
[09:52:11] basically the end of the comment
[09:53:09] interesting
[09:53:25] tag summary I think could be a derived content
[09:53:30] I would ask the developer
[09:53:43] and maybe it can be rebuilt, or at least that part?
[09:54:00] could be, my main concern is other though
[09:54:28] Like, if you copy it from db1041 we would potentially start serving different data (as you can see, it is a very reduced risk) from one slave than from the others
[09:54:33] As all the others look exactly the same
[09:55:15] I think that specific table loses relevance as time goes by
[09:55:25] meaning older rows on that table
[09:55:34] that is why I am more inclined to option 2
[09:55:35] I would do it nice and slow, but we are a bit on a clock
[09:55:44] (I was already taring the content of db1041)
[09:57:59] if you archive the data before it is fixed, which checking it in the first place?
[09:58:21] what?
[09:58:35] *why
[09:58:40] Ah
[09:58:49] basically to see the content and its relation with the rest of the slaves
[09:58:56] And how critical/old it is
[09:59:10] yes, but the whole point of the check is to fix the hosts that will stay
[09:59:44] Then we need to fix all of them, including codfw
[09:59:47] as I said there
[09:59:53] how?
[10:00:15] only db1041 and db1062 (master) look the same
[10:00:21] the rest look the same among themselves
[10:00:36] and that is why I said that we are serving consistent data while reading, because they all look the same
[10:00:52] the two of them that have different data are db1041 (depooled for a looong time) and db1062 (master, which doesn't get reads)
[10:00:58] ok, that wasn't clear to me with your comment
[10:01:03] ah, sorry
[10:01:21] maybe we should clarify that better on the description
[10:01:56] 1. Assume that as db1041 has been depooled for a long time and db1062 is the master (and hence doesn't get reads), take a full dump of db1041 and assume that all the slaves are correct and once we have to do a switch over, rebuild db1062 from a slave.
[10:01:58] then, most likely the master is wrong?
[10:02:04] That is what I wrote, how can we improve it?
[10:02:22] My theory is: db1062 was cloned from db1041 as it was the old old master
[10:02:26] And that is why
[10:02:31] ?
[10:02:51] what does it mean old old master?
[10:03:01] We have db1033, which was the previous master before db1062, and db1041 which was the old master when db1033 was the master
[10:03:15] So when we decided that db1062 would be the new master, we recloned it from db1041 (the old master at the time)
[10:03:18] It is a bit confusing
[10:03:19] XD
[10:03:44] "db1041 which was the old master when db1033 was the master"
[10:03:46] ?
[10:03:53] yes
[10:04:01] db1033 was the previous master before db1062 became the master
[10:04:02] I think you got it wrong
[10:04:33] master failover was db1033 -> db1041 -> db1062
[10:04:56] Ah
[10:04:59] Then I got it wrong
[10:06:19] if db1041 and db1062 are wrong, I would trust db1033 and the others more
[10:06:29] yeah, because db1033 look like the others actually
[10:07:44] the thing is, if you archive it before fixing, how can you know what to fix?
[10:08:16] I didn't take any decision, I was taring the content because it was already generated
[10:08:29] I suggested those two ideas to see what we could agree on
[10:08:35] no, the export is ok
[10:08:42] I mean the actual host data
[10:09:12] I am not understanding what you mean now :(
[10:09:41] ok, let's start with: what are you going to do next about that ticket? as in, what would you like to do?
[10:10:28] I would like to consider db1041 wrong, save its data just in case and decom it. Why? Because we haven't served those rows for months now, meaning that we are serving consistent data across all the slaves
[10:10:40] So I would assume that we haven't had any complaints during those months
[10:10:56] but db1062 is apparently problematic, too?
[10:11:00] right?
[10:11:03] yes
[10:11:09] but in that regard, it is not serving data
[10:11:17] as: no one is reading from it (i assume)
[10:11:52] which host did you compare it with? and how will you fix the others if you decom it?
[10:14:31] i compared db1041 with db1079, and for the issues that differed, I compared them with the rest of the hosts (as they were old rows)
[10:14:50] if we need to fix the others in the end, we would need to manually run the updates to make the rows the same
[10:15:12] which updates? if the server is no longer there, how can you get those updates?
[10:15:33] db1062 needs to be failed over anyways (T172459), so we can use that one too if needed
[10:15:34] T172459: eqiad row D switch upgrade - https://phabricator.wikimedia.org/T172459
[10:15:35] maybe you got a list or something, I am just asking :-)
[10:15:37] we still have db1062
[10:15:50] which has same content as db1041
[10:15:53] ok
[10:15:59] that is what I wanted to know
[10:16:22] so "I will decom db1041 because we can use db1062 as a reference/comparison"
[10:16:30] is that the summary?
[10:16:36] yes
[10:16:43] we still have db1062 with the same data
[10:17:08] the thing is, that is not what you had in the last comment
[10:17:15] so it was confusing me
[10:17:31] do you see why I was confused?
[10:17:52] yeah, i could see why. I am not, but because I wrote it :p
[10:17:54] Let me amend it
[10:18:09] add it to the summary
[10:18:15] comments get lost
[10:18:29] ok
[10:18:37] "after doing X, I reached conclusion Y so I am going to do Z"
[10:20:40] ok
[10:21:06] I would ask the same for the s3/db1015 task
[10:21:20] yes, i was thinking about that too
[10:21:31] it is not clear to me what is the follow up
[10:21:38] even if it is "let's do nothing"
[10:21:43] :-)
[10:23:36] I assume it will be something like "pending work, fix tables on comment #234324 for all hosts"
[10:27:49] not necessarily "*I* am going to do", I was in fact offering help with this
[10:28:10] DBA, Patch-For-Review: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3544067 (Marostegui)
[10:28:17] Let me know if this is more clear
[10:28:24] so my proposal, which I hope is correct based on your comments
[10:28:32] would be to pool db1069 copied from db1033
[10:29:26] and pool it as a substitute of db1028's role, decomming 41, 33 and 28
[10:29:39] that makes total sense
[10:29:40] but only if no data can be lost with that
[10:29:49] which I am not 100% sure
[10:31:08] later, pool a new host to substitute db1034 role
[10:31:16] not all of this has to happen soon
[10:31:30] only the minimum enough to solve the disk issues
[10:31:32] right now in danger we have db1041 and db1028 only yes
[10:32:06] so basically, I was asking you your opinion on what should be the plan of action, knowing you knew more about the state of those :-)
[10:33:06] I think that is the right plan
[10:33:14] Because we have the same data on db1062, which will remain
[10:33:47] and db1033's data is the same as db1028 as I checked it for the issues that differ between db1041 and db1079
[10:34:00] so that means db1062 will be pending to "fix", assuming the others are the right ones?
[10:34:03] however, db1033's other data has not been checksummed against the rest
[10:34:07] yes
[10:34:27] I would like to have someone evaluate if it could be the other way round
[10:34:40] the tag extension owner or something
[10:34:44] yeah
[10:34:49] that could be the other possibility
[10:34:55] in any case, we do have both versions of the data
[10:35:01] my fear basically is to not lose any data
[10:35:05] yes
[10:35:16] yeah, that is my fear too
[10:35:34] the problem with decommission before fix
[10:35:54] is that as replication continues, it gets not trivial to check for differences
[10:36:00] yep, :(
[10:36:22] the fact that the same tables appear again and again
[10:36:32] probably means that their writes are not statement-safe
[10:36:55] archive, tag* and the others
[10:36:59] yeah, i wonder how that drift was made in the first place
[10:37:01] maybe a crash?
[10:37:13] could be a crash, or what i say
[10:37:23] unsafe statements
[10:37:32] some of those were fixed AFAIK
[10:37:42] the insert select on archive
[10:37:55] but maybe old row problems are still there
[10:38:33] that is why I would like to involve some developers
[10:38:37] to check it is no longer ongoing
[10:38:51] yeah
[10:39:00] makes sense to include others on these issues
[10:41:21] ^the s7 edit is a perfect, not confusing summary, thanks
[10:41:49] :)
[10:46:30] Blocked-on-schema-change, DBA, Patch-For-Review: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661#3544200 (Marostegui) Hi, I have confirmed that the bug is fixed on 10.0.32: First attempt, forcing the INPLACE works: ``` root@PRODUCTION s4 slave[commo...
[11:59:07] DBA, Patch-For-Review: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3544251 (Marostegui)
[12:03:04] DBA, Operations, ops-eqiad: Decommission db1041 - https://phabricator.wikimedia.org/T173915#3544266 (Marostegui)
[12:05:24] DBA, Operations, ops-eqiad: Decommission db1041 - https://phabricator.wikimedia.org/T173915#3544279 (Marostegui)
[12:17:36] DBA, Operations, ops-eqiad, Patch-For-Review: Decommission db1041 - https://phabricator.wikimedia.org/T173915#3544324 (Marostegui)
[12:57:27] nice writeup on the backups guys
[12:57:55] i've left some comments and questions about some things I thought were still somewhat unclear
[13:00:32] thanks
[13:00:41] I will have a look at them
[13:02:15] I am replying and "resolving" some of those now mark, thanks!
[13:02:22] ok
[13:12:58] marostegui: i'm asking those questions since I think the text should be clarified, not because I want the answer in comments :)
[13:13:17] so e.g. "what is an event" -> write "MySQL event" in the text
[13:13:21] Aaah ok .)
[13:13:21] otherwise the context is unclear
[13:13:22] :)
[13:13:25] sure!
[13:13:39] I fixed a few of those on the text directly, but others I wasn't aware that that is what you were after
[13:13:42] roger!
[13:14:26] and for example "hot backups" are defined in the terminology, but "hot snapshot" COULD potentially mean something else in the context where it's used
[13:14:32] so best be a bit more verbose about it there too
[13:32:06] DBA, Operations, ops-eqiad, Patch-For-Review: Decommission db1041 - https://phabricator.wikimedia.org/T173915#3544468 (Marostegui)
[13:59:50] DBA, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-UserMerge, Patch-For-Review, Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3544611 (Reedy) `flaggedrevs.fr_user` is an int. So it's a user id. For a rename, we don't touch this, just any...
[14:00:46] DBA, Operations, Wikimedia-Site-requests: Global rename supervision request: Papa1234 → Karl-Heinz Jansen - https://phabricator.wikimedia.org/T173859#3544614 (Marostegui) @MarcoAurelio See: T172207#3544611 Looks like we can proceed
[14:01:10] DBA, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-UserMerge, Patch-For-Review, Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3544615 (Marostegui) Thanks a lot @Reedy for your explanation! :-)
[14:59:32] DBA, Operations, ops-eqiad: RAID crashed on db1078 - https://phabricator.wikimedia.org/T173365#3545014 (Cmjohnson) @Marostegui The ssd has been replaced. Please resolve after rebuild
[15:09:01] DBA, Operations, ops-eqiad: RAID crashed on db1078 - https://phabricator.wikimedia.org/T173365#3545079 (jcrespo) Thank you, Chris! @RobH @Cmjohnson if you are ok with that, //with less priority//, we would like some disk degradation testing at some point in the future.
[16:10:11] DBA, MediaWiki-Watchlist, Wikimedia-General-or-Unknown, Wikimedia-log-errors: User with 40000 entries in their Watchlist cannot access it on Commons anymore: Database error - https://phabricator.wikimedia.org/T171898#3545390 (Strainu) It was also reported on rowiki for users with as few as 2K fol...
[16:20:27] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3545400 (kaldari) >if possible try to avoid assigning it directly to Jaime I assigned it to Jaime because the documentation said the task should be "owned by #DBA", whi...
[16:43:14] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3545571 (jcrespo) > It only records IPs in cases where a logged out user makes an edit, which should be public data I trust you, I just want explicit #security ok, we...
[17:26:23] DBA, Patch-For-Review, Wiki-Setup (Create): Create CoC committee private wiki - https://phabricator.wikimedia.org/T165977#3545824 (Krinkle)
[17:26:54] DBA, Patch-For-Review, Wiki-Setup (Create): Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#3545838 (Krinkle)
[17:29:28] DBA, Operations, Wikimedia-Site-requests: Global rename supervision request: Papa1234 → Karl-Heinz Jansen - https://phabricator.wikimedia.org/T173859#3545916 (Steinsplitter) >>! In T173859#3544614, @Marostegui wrote: > @MarcoAurelio See: T172207#3544611 > Looks like we can proceed Yepp, Like last ti...
[17:41:34] DBA, Operations, Wikimedia-Site-requests: script & docs to rename wiki databases - https://phabricator.wikimedia.org/T83609#3546125 (Krinkle)
[17:41:53] DBA, Operations, Wikimedia-Site-requests: script & docs to rename wiki databases - https://phabricator.wikimedia.org/T83609#916322 (Krinkle)
[17:54:18] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3546336 (Bawolff) > Should the data be made available on the labs replicas and/or dumps: Yes, nothing in the table is private data The table should not be in dumps - i...
[17:55:41] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3546376 (Bawolff)
[17:56:29] DBA, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-UserMerge, Patch-For-Review, Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3546392 (MarcoAurelio)
[17:56:32] DBA, Operations, Wikimedia-Site-requests: Global rename supervision request: Papa1234 → Karl-Heinz Jansen - https://phabricator.wikimedia.org/T173859#3546393 (MarcoAurelio)
[17:56:47] DBA, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-UserMerge, Patch-For-Review, Schema-change: flaggedrevs.fr_user is unindexed - https://phabricator.wikimedia.org/T172207#3490037 (MarcoAurelio) Yep, thanks @Reedy.
[17:59:16] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3546429 (kaldari) >The table should not be in dumps - it could contain revision deleted IPs. @Bawolff: How are revision deleted IPs currently dealt with in dumps?
[18:04:58] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3546484 (Bawolff) >>! In T173891#3546429, @kaldari wrote: >>The table should not be in dumps - it could contain revision deleted IPs. > @Bawolff: How are revision delet...
[18:13:01] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3546531 (jcrespo) Yes, mediawiki is used for dumping, not sql, as a precaution. @kaldari - this means that you should not close this ticket (as we dbas and cloud have...
[19:29:13] DBA, Community-Tech, Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3546800 (kaldari) Ran `foreachwiki sql.php /srv/mediawiki/php/maintenance/archives/patch-ip_changes.sql`. Could someone verify that the new `ip_changes` table is live i...