[08:25:40] marostegui: do you run it with sudo? with PYTHONPATH? [08:28:34] I guess you are using a different version of the auto_schema repo. I rolled back and I was able to start on codfw on s8 [09:39:15] federico3: good, remember to check the database maintenance map before starting on a new section, or simply ping me. My schema change was just started in s7 [09:39:59] I looked at the map and saw only es* at the time [09:41:17] Yes, and s3 from yesterday [10:06:14] is this expected? https://grafana.wikimedia.org/goto/6HV_JXUHg?orgId=1 [10:06:53] only on those sections? [10:07:04] definitely not from our side, as we haven't touched them since they were set up [10:07:23] 21:04 cjming@deploy1003: Finished scap sync-world: Backport for InstrumentConfigsFetcher: Make updating configs asynchronous (T398422) (duration: 46m 25s) [10:07:23] T398422: MetricsPlatform: InstrumentConfigFetcher: Make fetching asynchronous - https://phabricator.wikimedia.org/T398422 [10:07:33] Matches pretty much [10:09:27] there was an increase on es1/es2 and s3, but only ms* kept high after 2 hours [10:10:17] I have commented [10:10:22] https://phabricator.wikimedia.org/T398422#11023104 [12:33:23] marostegui: I added testing for the bug in auto_schema and wrote https://gitlab.wikimedia.org/repos/data_persistence/dbtools/auto_schema/-/merge_requests/10#note_153617 - the PR should be passing tests, do we want to wait for Amir to be back to progress on it? [15:23:33] jynus: I don't know if you're still working, but: I was trying to follow https://wikitech.wikimedia.org/wiki/Media_storage/Backups#Querying_files and logged into ms-backup{1,2}003 to run sudo -u mediabackup query-media-file and I get the error 'sudo: unknown user mediabackup' \n 'sudo: error initializing audit plugin sudoers_audit'. Is this expected? [15:23:54] that's new [15:24:17] would you like me to open a ticket? [15:24:21] and it doesn't look like a code issue, but a sudo puppet issue [15:24:36] wait, ms-backup{1,2}003?
[15:24:51] I never set up such hosts [15:25:22] yes, I logged into each host - the instructions say to use a ms-backup* host, so I did ssh ms-backup1[TAB] and picked the lowest number (likewise in codfw) [15:25:47] backup2003 is an External Storage database backups host (backup::es) [15:25:49] says the motd [15:25:55] no no [15:26:09] backup2003 != ms-backup[12]00[12] [15:26:12] oh, ms-backup not backup, doh [15:26:20] sorry, brain not working evidently [15:26:36] you got a sudo error because the user doesn't exist there [15:26:49] and neither does the script or config or anything else [15:29:09] the number is always confusing [15:29:13] the key was the motd [15:29:17] "ms-backup1001 is a Media backups worker server (mediabackup::worker)" [15:31:13] can I trouble you with one more stupid question? I'm looking at https://phabricator.wikimedia.org/T399892 and I'd like to see what, if any, backups we have for the original file (i.e. the 2016 version of https://en.wikipedia.org/wiki/File:Der_Schatz_(1923).jpg ) I tried looking for wikipedia-en-local-public.7c 'archive/7/7c/20170515005919!Der_Schatz_(1923).jpg' [i.e. using container name and full path] but get no hits [15:31:19] sure [15:31:44] though swift stat on that path gets me an upload date of 14 Jul 2025, which is not very promising [15:32:02] as I am guessing a few minutes won't change much, let me finish something I have in flight and I will have a look [15:32:15] I have a db down [15:32:17] that's entirely fine, thanks (it can wait 'til tomorrow obviously too) [15:39:18] I see 2 versions of the image [15:40:23] you do? I can only see 1 (the current 365x273 version) [15:40:44] but they point to the same image (the backups generate fresh hashes from the file, so the two were the same when they ran) [15:41:08] let me share the metainfo [15:41:11] on the ticket?
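For context on the `archive/7/7c/20170515005919!Der_Schatz_(1923).jpg` path being queried above: it appears to follow MediaWiki's usual md5-sharded file layout, where the shard directories come from the hash of the file name. A minimal sketch of that convention, assuming the standard layout (`archive_path` is a hypothetical helper, not part of any tool discussed here):

```python
import hashlib

def archive_path(filename: str, archive_timestamp: str) -> str:
    # Sketch of MediaWiki's md5-sharded archive layout (assumed, not
    # taken from the backup tooling): the one-char and two-char shard
    # directories are prefixes of the md5 of the file name.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return f"archive/{digest[0]}/{digest[:2]}/{archive_timestamp}!{filename}"

print(archive_path("Der_Schatz_(1923).jpg", "20170515005919"))
```

The container suffix (`.7c` in `wikipedia-en-local-public.7c`) matches the same two-character shard, which is why the container name and object path must be paired correctly when querying swift.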
[15:41:20] oh, right, yes I see the output on ms-backup1001 looking at the original [15:42:01] if the overwrite happened before the first backup, I won't be able to help [15:42:04] jynus: please do. AFAICT, the upload dates on the ones in backups are in 2017, which corresponds with the overwrite; but I don't think we've backed up the archived version at all [15:42:28] we have, but it may have been post-overwrite [15:42:39] I believe the backups started in 2019 [15:42:46] how do I find the archived version in the backups? [15:42:48] I can still check the backup archive [15:43:00] but it is the same file [15:43:05] see the sha256 hash [15:43:19] that'd be helpful (swift stat wikipedia-en-local-public.7c 'archive/7/7c/20170515005919!Der_Schatz_(1923).jpg' shows me an object modified 2025-07-14 ) [15:43:21] there is a non-publicly exposed file_history [15:43:47] sorry, I think we are not understanding each other, let's sync [15:44:15] when did the overwrite happen, 2025? [15:44:20] (supposedly) [15:45:30] No. Supposedly the "big" version was uploaded in 2016, a bot replaced it with a "small" version in 2017, and then a reversion was attempted in 2025 [15:46:04] yes, but what I mean is a backup happened in 2021 and 2025 [15:46:18] and it was the same file [15:46:53] do you suspect there could be an older backup with the original file? [15:47:08] that substituted the 2025 backup version? [15:47:20] in that case I can check the logs and the file_history [15:48:20] AIUI, when the bot uploaded the second / "small" version, the first / "big" version should have been moved to an archive location. I was wondering if we had a backup of that. [15:48:55] ...but something confusing has obviously gone on with this object [15:48:59] yep, my question is when that do you suspect happen (the wrong overwrite)?
[15:49:10] *that happened [15:49:23] even if it is a theory, that would help in finding it [15:49:42] because if it is in 2016, I am sure we won't have it [15:49:56] One theory would be back in 2017 (when obviously we won't have a backup), but I'm wondering, given the timestamp on the archive object (2025-07-14), whether a botched undo attempt accidentally overwrote it then. [15:50:10] ok, that is what I wasn't understanding [15:50:18] if it is the first case, I won't be able to help [15:50:25] I may be able to for the second [15:50:27] first> understood [15:50:40] as the first backup I think was in 2019 [15:51:20] The WP history says 23:26 13 July 2025 [maybe in my local timezone?] which is a bit before the archive object timestamp of Mon, 14 Jul 2025 08:17:29 (eqiad) or 08:17:30 (codfw) [15:51:33] the codfw backup ran on 2022-01-11 17:48:02 [15:51:38] and by then there was only 1 frile [15:51:50] *file [15:52:23] So at that point in time the object wikipedia-en-local-public.7c 'archive/7/7c/20170515005919!Der_Schatz_(1923).jpg' didn't exist? [15:52:37] at least not on codfw [15:53:17] it is also possible that it exists but I haven't backed it up yet for the 2025 one [15:53:34] I do real time for new uploads, but only every few months for other changes [15:53:51] mmm [15:53:53] interesting [15:54:04] on logs: [2022-01-11 17:03:27,449] INFO:backup enwiki Der_Schatz_(1923).jpg 267b0307902617adc7d10d5122022cd891944c43 [15:54:11] [2022-01-11 17:08:18,948] INFO:backup enwiki Der_Schatz_(1923).jpg 02fecd50d1ce2ba4c11eca41df94605427950163 [15:54:33] ah, that second one is a different sha [15:54:40] so maybe the backup got overwritten (which is ok, because I don't delete backups, just the metadata will not be direct) [15:54:44] this is what I think happened [15:54:59] as the metadata was in an inconsistent state, the backup process did what it thought best [15:55:12] do you think you could fish out that second object from the backups, then, please?
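The two backup-log lines quoted above can be grepped mechanically to see how many distinct contents a title was ever backed up with. A minimal sketch, where the log format is assumed purely from the samples in this conversation and `shas_for` is a hypothetical helper:

```python
import re

# Assumed log shape, inferred from the samples quoted above:
#   [2022-01-11 17:03:27,449] INFO:backup enwiki Title.jpg <sha1>
#   [2024-04-02 ...] INFO:backup File enwiki Title.jpg <sha1> ...
LINE = re.compile(r"INFO:backup (?:File )?(\S+) (\S+) ([0-9a-f]{40})")

def shas_for(title: str, log_lines) -> set:
    """Collect the distinct sha1s a given title was backed up with."""
    return {
        m.group(3)
        for line in log_lines
        if (m := LINE.search(line)) and m.group(2) == title
    }
```

Run against the two 2022 lines above, this yields both sha1s, i.e. the backup run saw two different file contents five minutes apart.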
[15:55:15] the good news is I never delete backups (except the cases you are aware of) [15:55:17] yep [15:55:22] it looks likely at this point [15:55:34] Cool, that'd be great, since it's presumably the 640x480 that is wanted here [15:55:54] [2024-04-02 14:19:48,392] INFO:backup File enwiki Der_Schatz_(1923).jpg 02fecd50d1ce2ba4c11eca41df94605427950163 2016-01-23 17:01:22 was updated correctly and its old metadata moved to history [15:56:01] ^ this would confirm my theory [15:56:17] so it won't appear in the easy interface [15:56:31] but I don't delete metadata either, I just archive it, precisely for these cases [15:56:41] where mediawiki gives garbage [15:56:55] but I don't have an easy process for garbage metadata, I need to do it more manually [15:57:02] Right; can you upload it to the phab ticket, and then someone with suitable commons-fu can upload it as a "new" version, please? That's super helpful (even if we will probably never know how the metadata got garbled) [15:57:09] manual> sure [15:57:17] yep, but let me first make sure I have it [15:57:47] but then the overwrite would have happened in 2024, which is even scarier [15:57:47] :) [15:58:00] because I believe there was no activity then? [15:58:33] there was an edit made in September 2024, but in theory only to the image category [15:58:46] 2024-04-02 [15:59:00] No obvious-to-me changes made then [15:59:01] so it was before April [15:59:16] that's of course the backup time, it could have happened a few months before [16:01:14] it's __possible__ it was in an inconsistent state that swiftrepl didn't notice, and then when we started using rclone instead in 2023 that made it consistent-but-wrong [16:01:42] at some point we could consider backing up swift too [16:01:55] and we still have the design of a strategy for the mis cluster pending btw [16:02:05] or I have, with your help [16:02:40] YM the apus ceph cluster?
Yes, I am happy to answer questions about that (though probably not right this minute :) ) [16:02:47] ofc [16:03:30] could you help me double check the sha1 of the image that's available? [16:04:05] ack, two ticks [16:04:23] I think 267b0307902617adc7d10d5122022cd891944c43 is the one we have [16:04:34] and 02fecd50d1ce2ba4c11eca41df94605427950163 the one missing [16:05:48] I think the currently-available one is 267b0307902617adc7d10d5122022cd891944c43 [16:05:56] so yes, I agree with you [16:06:51] sadly I don't see a backup file entry with the other sha1 [16:06:57] let me try codfw [16:12:25] 🤞 [16:12:53] marostegui: can I start a schema change on s1 and/or s6? [16:15:47] Emperor: let me know if wikipedia-en-local-deleted.0c:0/c/l/0clgbiqubi3cksukl7hzezn37pwm0rn.jpg tells you something [16:17:16] ack [16:17:30] the problem is the database says it was in state "backed up", but there is no record of that [16:18:12] jynus: swift stat wikipedia-en-local-deleted.0c 0/c/l/0clgbiqubi3cksukl7hzezn37pwm0rn.jpg returns me 404 in both DCs [16:18:57] sorry :( [16:22:57] I think I understood [16:23:32] when it was recovered with the same name, it updated the entry and inherited the properties, including the backup status [16:23:56] but as it was technically the same name, it was just archived [16:24:10] So I don't think I have it [16:24:38] I will try to have a look in another way, but without having its sha256, it will be difficult [16:25:24] maybe I can search it by size [16:25:43] OK, thanks. Shall I update the ticket, or do you want to keep looking a bit longer first?
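The failure mode jynus describes at 16:23:32 (an undeleted file inheriting its old metadata, including the backup status, so nothing flags it for re-backup) can be illustrated in a few lines. This is an illustration of the pitfall only, not the actual backup tooling; the field names are invented:

```python
def undelete(old_entry: dict) -> dict:
    # Illustration only (hypothetical fields): restoring a file by
    # reusing its old metadata row carries backup_status along
    # unchanged, so a file whose bytes were never re-backed-up
    # still reads as "backedup".
    restored = dict(old_entry)
    restored["deleted"] = False
    return restored  # BUG: backup_status inherited as-is

def undelete_fixed(old_entry: dict) -> dict:
    # One possible fix: reset the status so a fresh backup pass
    # picks the restored file up again.
    restored = undelete(old_entry)
    restored["backup_status"] = "pending"
    return restored
```

With the buggy version, the database honestly reports "backed up" while no backup of the restored bytes exists anywhere, which matches what was observed here.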
[16:25:52] give me some time [16:26:03] I can leave it for tomorrow if you have to go [16:26:12] but I want to try a bit more just in case [16:26:32] please go ahead (and thank you) [16:27:44] when the original metadata is bad, it is hard to get good data out [16:36:12] so what happened is that when the file was undeleted, the metadata was updated but the state of the backup was kept as backed up [16:36:19] I don't think we ever had a backup of that [16:36:37] and potentially it was hard-deleted: https://web.archive.org/web/20241213125007/https://en.wikipedia.org/wiki/File:Der_Schatz_(1923).jpg [16:36:55] for copyright reasons [16:38:33] I am going to try to do a last-effort search on storage by size, which will take some time [16:38:49] but before that, the logs will probably tell me we haven't backed up such a file [16:39:32] OK, thanks for trying [16:39:40] as the hash was overwritten because the original wasn't available [16:40:05] I just read the metadata from the db [16:41:12] when it says "Calculated (267b0307902617adc7d10d5122022cd891944c43) and queried (02fecd50d1ce2ba4c11eca41df94605427950163) sha1 checksum are not the same for "Der_Schatz_(1923).jpg"" [16:41:22] it can mean that the file has moved [16:41:31] or that the file doesn't exist anymore [16:42:10] when backups find that, they still do the backup anyway [16:42:17] but there are no logs of that [16:43:19] [2021-08-19 16:15:58,139] ERROR:backup Download of "enwiki Der_Schatz_(1923).jpg 02fecd50d1ce2ba4c11eca41df94605427950163" failed [16:43:25] attempted in 2021, failed [16:43:31] which means by then it was gone [16:43:56] so the loss predates the first backup [16:44:25] will do the find anyway [16:44:34] but I think I have all the data [16:46:37] Thanks; can you summarise on the ticket? And sorry to put you to so much work for a file that's long gone [16:46:44] yep [16:46:47] doing [16:48:17] TY!
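The "Calculated (…) and queried (…) sha1 checksum are not the same" message quoted at 16:41:12 amounts to hashing the downloaded bytes and comparing against the sha1 recorded in the database. A minimal sketch of that consistency check, under the assumption that this is all it does (`verify_sha1` and `file_sha1` are hypothetical names):

```python
import hashlib

def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through sha1 without loading it all into memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

def verify_sha1(path: str, queried: str) -> bool:
    """Compare the computed hash to the one recorded in the db,
    mirroring the error message quoted in the log above."""
    calculated = file_sha1(path)
    if calculated != queried:
        print(f'Calculated ({calculated}) and queried ({queried}) '
              f'sha1 checksum are not the same for "{path}"')
        return False
    return True
```

As the conversation notes, a mismatch here is ambiguous on its own: it can mean the file moved, was overwritten, or no longer exists, which is why the tooling backs the current bytes up anyway rather than discarding them.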
[21:41:48] FIRING: PuppetDisabled: Puppet disabled on aqs1012:9100 - https://wikitech.wikimedia.org/wiki/Puppet/Runbooks#Puppet_Disabled - https://grafana.wikimedia.org/d/yOxVDGvWk/puppet?var-cluster=aqs&viewPanel=14 - https://alerts.wikimedia.org/?q=alertname%3DPuppetDisabled