[05:00:13] mutante: fyi Amir will be out until Tuesday :)
[08:38:13] FIRING: SystemdUnitFailed: mariadb.service on pc1017:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:38:43] Emperor: o/
[08:39:09] I am checking something on ms-be2083's BMC Web UI storage page
[08:39:31] the Physical view shows all the disks, but only 12 of them are in JBOD mode
[08:39:45] the rest seem to be in "unconfigured good", as if they were hot spares
[08:42:43] if I force JBOD the task seems to work, but the state doesn't change
[08:45:28] 12
[08:45:30] err :)
[08:46:42] that's rather worrying :(
[08:48:13] RESOLVED: SystemdUnitFailed: mariadb.service on pc1017:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:48:50] Emperor: https://phabricator.wikimedia.org/T371400#10257924
[08:52:27] trying a chassis reset to see if anything changes
[08:56:10] nope
[09:00:38] not sure at what point it's worth going back to the vendor
[09:02:09] I think we can wait for dcops to give us their assessment; maybe there is an explanation for it and/or a quick way to fix it. If nothing comes up I'd be in favor of escalating to the vendor and asking for more info
[09:24:03] 'k
[12:41:58] FIRING: [3x] SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@osd.32.service on moss-be2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:46:58] FIRING: [3x] SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@osd.32.service on moss-be2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:48:13] RESOLVED: [3x] SystemdUnitFailed: ceph-59ea825c-2a67-11ef-9c1c-bc97e1bbace4@osd.32.service on moss-be2003:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[13:39:11] Emperor: re: T377827, it's almost like using a bunch of rsync'd sqlite databases isn't a great idea
[13:39:11] T377827: Disk near-full warnings on ms swift backends for container filesystems due to some bloated sqlite files - https://phabricator.wikimedia.org/T377827
[13:40:41] it's not ideal. Do you have any feelings about "try vacuum on the over-big db" vs "delete and let it rsync"?
[13:42:31] the former being shutdown-vacuum-startup?
[13:43:42] yes
[13:43:50] I guess the concern there being that swift might not gracefully handle having its databases moved out from under it?
[13:44:48] it _shouldn't_ matter, as the database is the same afterwards as before; but I would stop swift while I did it in case it doesn't obey write locks in sqlite or something
[13:44:55] delete-and-rsync seems like the safest, but if we were ever going to operationalize this, then vacuum might be better
[13:44:56] (it's probably overly paranoid of me)
[13:45:13] it doesn't sound overly paranoid to me
[13:45:25] I wouldn't take it for granted
[13:45:31] Mmm.
[13:46:20] shall I try a stop-puppet/stop-swift/vacuum/restart and see how it goes? If anything smells funny we can always delete instead
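A minimal sketch of what the stop-puppet/stop-swift/vacuum/restart cycle discussed above could look like. The DB path, the `swift*` unit glob, and the puppet disable message are illustrative assumptions, not the exact procedure used on the ms backends:

```python
#!/usr/bin/env python3
"""Hedged sketch: offline VACUUM of an oversized swift container DB.

Assumptions (not taken from the log): the DB path, the 'swift*' unit
glob, and the use of `puppet agent --disable` are placeholders only.
"""
import sqlite3
import subprocess

DB_PATH = "/srv/swift-storage/sdX1/containers/.../hash.db"  # hypothetical path


def run(cmd):
    """Run a command, raising if it exits non-zero."""
    subprocess.run(cmd, check=True)


def vacuum_container_db(db_path):
    # Keep puppet from restarting services mid-vacuum.
    run(["puppet", "agent", "--disable", "container DB vacuum"])
    try:
        # Stop the swift daemons so nothing writes to the DB while we work.
        # The glob assumes all swift units on this host should be stopped.
        run(["systemctl", "stop", "swift*"])
        # VACUUM rewrites the database file and reclaims free pages.
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("VACUUM")
        finally:
            conn.close()
    finally:
        run(["systemctl", "start", "swift*"])
        run(["puppet", "agent", "--enable"])


if __name__ == "__main__":
    vacuum_container_db(DB_PATH)
```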
[13:47:03] sure
[13:48:16] OK.
[13:50:30] I was trying to think of how to (safely) test doing a vacuum on a running process
[13:51:48] I think you're right though, it _should_ handle it gracefully; it seems like you're supposed to be able to do this.
[13:52:04] Mmm
[13:55:04] many years ago I sat in a conference room and took exception to the idea that "a bunch of sqlite databases" would be "clever". As much as I like to say "I told you so", I'm not accustomed to waiting this long.
[13:56:03] what is the etiquette for something like this? is it acceptable to reach out to someone you haven't talked to in 10+ years just to rub something in their face?
[13:56:26] the ring structure also assumes all container dbs are ~ the same size, which is how we end up with widely varying disk usage on the SSDs, because only the 256 thumb dbs are huge, and they're not evenly spread across the nodes
[13:57:18] hey, I like sqlite
[13:57:36] whether it is suitable for a use case or not is something else ...
[13:59:12] assuming all container dbs would be similar in size sounds like a pretty faulty assumption
[13:59:55] OK, vacuum done, swift restarted
[14:02:54] Emperor: not to try to pivot away from the topic at hand, but maybe we should (simultaneously) be working to raise the priority of an alternative thumb caching strategy
[14:03:07] urandom: funny you should say that
[14:05:52] I spoke to traffic the other week, and we came up with some options, and sent them to kwaku.ofori for input. The upshot of which is that the next steps are a) look at doing some thumb deletion as a stopgap b) talk to h.nowlan about thumbor capacity (probably also w.r.t. "cache" capacity) c) get traffic to look at costs of a dedicated thumb-cache service (inspired by the figures from the previous)
[14:07:39] that sounds reasonable.
[14:07:55] there's obviously non-zero complexity in a)
[14:08:02] yes... was about to say
[14:08:22] the only one there that will move the needle in the next months+ is (a)
[14:08:56] (b) is fairly long term, and (c) is almost... dubious
[14:09:39] maybe dubious isn't fair, but it's certainly not something to make plans around
[14:10:39] Emperor: so is (a) basically implementing the LRU thing? A sidecar to cull infrequently accessed thumbs?
[14:11:53] that sounds like something fun to work on, even if it's not something we should have to be doing :)
[14:13:03] No, I think more "delete x% of 'old' thumbnails as roughly a one-off"
[14:13:31] where "old" is determined by when they were uploaded?
[14:14:23] (I guess it'd almost have to be...)
[14:15:16] yep
[14:15:18] Which of course is where (b) comes in, because that will be indiscriminate in terms of how hot a thumb is
[14:15:34] Yep
[14:16:21] But if we e.g. delete all thumbs over a year old, then any that do get regenerated won't get hit again.
[14:16:38] Ya
[14:17:01] [of course, once we're done with all that deleting, we'll probably have to do a bunch of VACUUM...]
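A rough sketch of what the "delete all thumbs over a year old" stopgap in (a) could look like against a single thumbnail container, using python-swiftclient. The auth endpoint, container name, one-year cutoff, and dry-run default are assumptions for illustration; a real run would also want batching and rate limiting:

```python
#!/usr/bin/env python3
"""Hedged sketch: cull thumbnails older than a cutoff from one container.

The auth settings and container name below are placeholders, not the
real values for the ms swift clusters.
"""
from datetime import datetime, timedelta, timezone

from swiftclient.client import Connection

CONTAINER = "wikipedia-commons-local-thumb.00"  # hypothetical shard name
CUTOFF = datetime.now(timezone.utc) - timedelta(days=365)


def cull_old_thumbs(conn, container, cutoff, dry_run=True):
    """Delete (or, in dry-run mode, just count) objects older than the cutoff."""
    _headers, objects = conn.get_container(container, full_listing=True)
    affected = 0
    for obj in objects:
        # last_modified looks like '2023-10-15T12:34:56.000000' (UTC).
        modified = datetime.fromisoformat(obj["last_modified"]).replace(
            tzinfo=timezone.utc
        )
        if modified < cutoff:
            if not dry_run:
                conn.delete_object(container, obj["name"])
            affected += 1
    return affected


if __name__ == "__main__":
    conn = Connection(
        authurl="https://swift.example.org/auth/v1.0",  # placeholder endpoint
        user="account:user",  # placeholder credentials
        key="secret",
    )
    count = cull_old_thumbs(conn, CONTAINER, CUTOFF, dry_run=True)
    print(f"{count} objects older than {CUTOFF:%Y-%m-%d} would be deleted")
```

Deleting the objects shrinks the container listings, but reclaiming the space inside the container DB files would still need the VACUUM pass mentioned at [14:17:01].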
[14:17:14] yeah, I was about to say
[14:18:05] All of that sounds reasonable so long as (c) doesn't take "years"
[14:19:23] meaning, "periodic ad-hoc deletes as a reaction to utilization" is a good strategy as long as we're not doing it in perpetuity
[14:20:32] Yeah, so b) is partly to explore the tradeoff between thumbor capacity and cache size (one end of which is "how much thumbor capacity would we need if we didn't cache thumbs beyond the CDN at all, and how much would that cost?")
[14:21:18] ok
[14:22:10] what sort of capacity questions would you have about thumbor? happy to help!
[14:22:25] (I assume this isn't related to the long-running issue relating to changing thumb sizes)
[14:22:55] yeah, we are definitely very far along the spectrum of minimizing thumbor capacity by caching all the things, lots of room to push that slider over
[14:23:16] hnowlan: it's the quantity of thumbs, generally
[14:23:32] specifically, it's being driven by the number of thumbs in a container, etc.
[14:23:53] I mean, that's at least what spawned this particular discussion
[14:24:07] so thumbor currently already uses a fairly significant amount of capacity
[14:24:11] in terms of k8s
[14:24:21] T377827 (as an example of the problems many thumbs create)
[14:24:22] hnowlan: hey. I was/will send you a more formal email, but basically it's a question of how many thumbs/second thumbor can currently do without pain, and how much extra capacity we would need to scale that up
[14:24:23] we'd be looking at rewriting parts or all of it rather than just adding more replicas
[14:24:31] T377827: Disk near-full warnings on ms swift backends for container filesystems due to some bloated sqlite files - https://phabricator.wikimedia.org/T377827
[14:24:34] (which *needs to happen* regardless of this conversation)
[14:25:14] but currently there is no real dev support, and prioritising that has not gone well despite my best efforts
[14:25:38] much as I would love to do a rewrite myself, it's not sustainable
[14:25:54] however, we could use this requirement to try to drive this actually getting taken seriously
[14:25:55] hnowlan: a rewrite of thumbor?
[14:25:58] my fantasy graphic is a curve of "cache size" against "thumbor capacity"
[14:26:34] urandom: either partial or complete. worst case scenario a good bit of work on rearchitecting it
[14:26:45] err worst case as in best effort
[14:27:24] hnowlan: would any of that rearchitecting also provide an opportunity to eliminate the swift middleware?
[14:27:55] urandom: It'd be necessary to improve performance and throughput, which would be mandatory if we're eliminating the middleware
[14:27:55] urandom: one of the secret benefits of a dedicated thumbnail cache is we could stop doing stupid stuff in a custom 404 handler
[14:28:05] i.e. to get to a place where we could send thumbnail requests to thumbor, and have it use swift as a cache.
[14:28:09] by eliminating the middleware you mean still using swift for backing but getting rid of the handler?
[14:28:14] ah
[14:28:26] not to make this a hobby horse, but we need to get fundamental commitments about support for thumbor from the foundation before we do this stuff
[14:28:35] [or our hypothetical thumbnail caching service]
[14:29:09] Emperor: right, I'm still reluctant to hold my breath on that
[14:29:12] :)
[14:29:17] * urandom forgot the smiley face
[14:29:57] hnowlan: by commitment, you mean an owner outside of SRE?
[14:30:04] yep, at least for codebase changes
[14:30:32] currently there are no official maintainers and one unofficial maintainer (and it is this guy)
[14:31:10] hnowlan: awww, don't say that! you're very official to me!
[14:31:21] * urandom forgot the smiley face again
[14:31:23] :)
[14:33:27] :D
[17:15:36] arnaudb: oh man, he is in Oman, right ACK :)