[00:30:30] my question is, is the 320px thumbnail always created automatically irrespective of aspect ratio or other things? (re @bd808: https://wikitech.wikimedia.org/wiki/Robot_policy#Media_API_rules)
[07:32:29] Because when I tried a random image with 320px thumb, it did not trigger cache. Rather the same cache header as with some custom thumbnail
[08:03:33] hey, do you know if it would be feasible to implement anubis for a tool on toolforge since it is getting spammed by bots?
[08:15:33] Guest57: If you mean https://github.com/TecharoHQ/anubis, the license is ok, so should be fine to use as long as it does not send/store any personal info (a quick look at https://anubis.techaro.lol/docs/design/how-anubis-works seems to point that way)
[08:16:05] yeah but I heard that the toolforge proxies requests and doesn't provide an IP, which anubis needs
[08:21:36] if it's needed then it will not work no, if it's optional then it will
[08:23:29] Sorry if it's an obvious one - but my database host is down: 7ot336yvrek.svc.trove.eqiad1.wikimedia.cloud
[08:23:30] How can I bring it back up? Should I install & use openstack or something?
[08:30:41] yochayco: you can use https://horizon.wikimedia.org/
[08:36:44] Yes I'm there, can't see a "restart" button for my db host : https://tools-static.wmflabs.org/bridgebot/e57a1f47/file_71122.jpg
[08:39:56] @yochayco what is your project name?
[08:40:17] glamwikidashboard
[08:41:22] hmm I also cannot see a restart button
[08:41:25] let me check what's going on
[08:43:30] I'm rebooting the underlying VM, let's see if it helps
[08:45:43] FATAL: could not write to file "pg_wal/xlogtemp.13": No space left on device
[08:45:47] I think the disk is full :)
[08:46:19] Sounds great 😅
[08:47:08] Can you temporarily resize it? I'll see what I can do to reduce the size. Probably should clear some logs
[08:48:44] I'm trying to see if I can delete some postgres logs, they're taking 50% of the space
[08:49:09] Exactly my thoughts.
Many thanks
[08:49:25] I think Trove might have a bad default for log retention
[08:50:03] Agree. I will try to configure it to limit them in the future.
[08:52:14] looks like it's only 10 days of logs, I'm surprised they take so much space
[08:52:30] huh
[08:55:34] See if you can leave me the more recent half of the logs, I'll try to understand why it happens
[08:58:44] https://docs.openstack.org/trove/latest/admin/database_management.html
[08:59:08] "if the size is greater than half of the data volume size" < that matches what I'm seeing
[08:59:28] which means that by default your actual db can only grow to 50% of the disk space, because 50% is occupied by the logs
[09:02:41] Good point
[09:05:19] there was a similar issue last year T355138
[09:05:19] T355138: Rescue DBapp trove instance in glamwikidashboard project - https://phabricator.wikimedia.org/T355138
[09:05:33] I'm gonna open a new task to track the current one
[09:08:34] T396724
[09:08:35] T396724: [trove] Disk full for DBapp instance in glamwikidashboard project - https://phabricator.wikimedia.org/T396724
[10:04:34] @yochayco the db seems to work now, can you try if it works for you as well?
[10:14:29] The app is back online :) The db is working. Now I hope that we can prevent it from happening again. See my comment in the conversation in the task.
[10:36:45] !log tools rebooting tools-prometheus-8 due to the VM having load issues (not responding to ssh)
[10:36:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:02:13] !log tools.yapperbot Remove frs runfile per Sohom's request to try to unbreak the bot
[13:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.yapperbot/SAL
[13:23:46] There is a difference between an image being found in the CDN edge cache and existing in the Swift storage engine. I don’t know that the CDN cache state headers will tell you anything about the Swift storage state.
(re @nokibsarkar: Because when I tried a random image with 320px thumb, it did not trigger cache. Rather the same cache header as with some custom...)
[13:47:07] ```
[13:47:08] x-cache: cp5031 miss, cp5031 hit/1
[13:47:09] x-cache-status: hit-front
[13:47:11] server-timing: cache;desc="hit-front", host;desc="cp5031"
[13:47:12] server: envoy```
[13:47:14]
[13:47:15] - What does it mean?
[13:54:43] I think that means it did hit the cache, by the desc, maybe the repeated cp5031 means that it missed one level of the cache or similar, https://wikitech.wikimedia.org/wiki/CDN#Headers does not seem to explain the multiple-value x-cache (did just a quick read)
[13:55:30] it actually does xd
[13:55:42] """A comma-separated list of cache hostnames with information such as hit/miss status for each entry. This header is read right-to-left: The rightmost is the outermost cache and further entries to the left progress deeper towards the application layer. The rightmost cache is the in-memory cache while all others are disk caches. In case of cache hit, the number of times the object has been returned is also specified. Once "hit" is
[13:55:42] encountered while reading right to left, everything to the left of "hit" is part of the cached object that got hit. It's whether the entries to the left missed, passed, or hit when that object was first pulled into the hitting cache."""
[13:57:35] from that I'm guessing that there's two instances of varnish in cp5031? One had the object, the other did not
[15:01:44] The edge cache has two tiers. The first is an in-memory cache managed by Varnish. The second is an on-disk cache managed by Apache Traffic Server. A response like `x-cache: cp5031 miss, cp5031 hit/1` means that the object was not found in the Varnish in-memory cache, but was found in the ATS on-disk cache.
[17:58:10] I read it the opposite way bd808. dcaro docs say read right to left.
[17:59:42] ATS disk miss on some earlier request that got it stored in varnish memory.
then later for the headers we saw above it's varnish memory hit and didn't touch ATS at all.
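
Editor's sketch, not part of the chat: the right-to-left reading rule quoted from wikitech at 13:55:42 can be tried out in a few lines of Python. The helper names `CacheEntry`, `parse_x_cache` and `first_hit` are hypothetical, and the parsing assumes entries look like `host status` or `host hit/N` as in the sample header above. It only illustrates the quoted rule and is not how the CDN itself evaluates the header.

```python
from typing import List, NamedTuple, Optional, Tuple


class CacheEntry(NamedTuple):
    host: str
    status: str          # e.g. "hit", "miss", "pass"
    hits: Optional[int]  # the N in "hit/N", None when absent


def parse_x_cache(value: str) -> List[CacheEntry]:
    """Parse an x-cache value and return entries outermost cache first.

    The header is written innermost-to-outermost, so the list is reversed
    to match the "read right to left" rule from the quoted docs.
    """
    entries = []
    for part in value.split(","):
        host, _, state = part.strip().partition(" ")
        status, _, count = state.strip().partition("/")
        entries.append(
            CacheEntry(host, status, int(count) if count.isdigit() else None)
        )
    entries.reverse()
    return entries


def first_hit(value: str) -> Optional[Tuple[int, CacheEntry]]:
    """Return (position, entry) of the first hit seen reading right to left.

    Position 0 is the outermost cache (the in-memory one per the quoted docs).
    Returns None when no entry reports a hit.
    """
    for position, entry in enumerate(parse_x_cache(value)):
        if entry.status == "hit":
            return position, entry
    return None


if __name__ == "__main__":
    print(first_hit("cp5031 miss, cp5031 hit/1"))
```

For the header pasted at 13:47:08, the hit is found at position 0, the outermost cache, which agrees with `x-cache-status: hit-front` and with the 17:58:10 reading. The `miss` to its left would then describe what happened when the object was first pulled into that cache.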