[01:39:46] serviceops: Evaluate Locust Stress Test Tool - https://phabricator.wikimedia.org/T254530 (wkandek) Open→Resolved
[07:43:19] hello people
[07:43:29] so wtp1025 seems to have its root partition full
[07:43:46] most of the space (34G out of 44G) is for /srv/mediawiki
[07:45:17] the rest of the wtp servers are using around 70/75% of their root partition
[07:47:46] (just depooled it, it was causing some errors)
[07:49:54] this is on /srv/mediawiki for wtp1025
[07:49:55] 1.8G php-1.35.0-wmf.24
[07:49:55] 6.6G php-1.35.0-wmf.40
[07:49:55] 10G php-1.36.0-wmf.1
[07:49:56] 11G php-1.35.0-wmf.41
[07:50:06] meanwhile on 1026
[07:50:07] 6.6G php-1.35.0-wmf.40
[07:50:07] 6.6G php-1.35.0-wmf.41
[07:50:08] 6.6G php-1.36.0-wmf.1
[07:52:01] seems that more /cache/ is used on wtp1025, but I am not sure what can be dropped and what not
[07:52:04] any idea?
[07:58:41] https://grafana.wikimedia.org/d/000000377/host-overview?panelId=28&fullscreen&orgId=1&var-server=wtp1025&var-datasource=eqiad%20prometheus%2Fops&from=1595528111141&to=1595577493645
[07:58:56] there was a big jump at around 7:12 UTC
[08:12:31] serviceops, Prod-Kubernetes: helm2 version string breaks recent helmfile versions - https://phabricator.wikimedia.org/T258773 (JMeybohm) p:Triage→Medium
[08:13:37] all right will open a task
[08:18:00] serviceops, Operations: wtp1025's root partition full - https://phabricator.wikimedia.org/T258775 (elukey)
[08:23:48] elukey, I delete various /srv/mediawiki/php-1.35.0-wmf.41/cache/l10n/upstream/.~tmp~ stuff and it looks ok now
[08:23:52] I deleted*
[08:24:14] ack, didn't know exactly what was ok to drop :)
[08:24:59] I 'll keep it depooled though until scap shows up and does what it does to ensure everything is ok
[08:32:52] <_joe_> as I said elsewhere, that server is badly partitioned
[08:33:05] <_joe_> it has a 890 GB unused lvs volume
[08:38:02] leaving this here, perhaps interesting https://keptn.sh
[08:48:42] _joe_ sure but then it is worth to either reimage it or to extend manually the lvs volume no?
[08:53:51] serviceops, Prod-Kubernetes, Product-Infrastructure-Team-Backlog, Proton, Patch-For-Review: "Failed to fork" errors on kubernetes100[1,3,4] - https://phabricator.wikimedia.org/T257679 (akosiaris) >>! In T257679#6302296, @colewhite wrote: >>>! In T257679#6302180, @akosiaris wrote: >> `/proc/sy...
[08:54:02] serviceops, Prod-Kubernetes, Product-Infrastructure-Team-Backlog, Proton, Patch-For-Review: "Failed to fork" errors on kubernetes100[1,3,4] - https://phabricator.wikimedia.org/T257679 (akosiaris) p:Triage→Medium
[09:22:57] serviceops: Deploy proxy gutter pool fuctionality - https://phabricator.wikimedia.org/T258779 (jijiki)
[09:23:27] serviceops: Deploy proxy gutter pool fuctionality - https://phabricator.wikimedia.org/T258779 (jijiki)
[09:23:30] serviceops, Operations, Patch-For-Review: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (jijiki)
[09:23:53] serviceops: Roll out proxy gutter pool - https://phabricator.wikimedia.org/T258779 (jijiki)
[11:18:16] serviceops, Prod-Kubernetes: helm2 version string breaks recent helmfile versions - https://phabricator.wikimedia.org/T258773 (JMeybohm) a:JMeybohm Looking into this I figured we needed a helm sec update too. So I've imported 2.16.9, fixed the build to use dh_golang and set the correct version numbers.
[11:30:09] serviceops, Prod-Kubernetes, Product-Infrastructure-Team-Backlog, Proton, Patch-For-Review: "Failed to fork" errors on kubernetes100[1,3,4] - https://phabricator.wikimedia.org/T257679 (JMeybohm) > That discrepancy should be chased down. AIUI that limit is calculated by number of CPUs availab...
[15:36:24] serviceops, observability, Developer Productivity: Logstash entries from php7-fatal-error.php use level "ERR" instead of "ERROR" - https://phabricator.wikimedia.org/T248181 (herron) Interesting! It looks like this should already be handled, but is not due to possible bug in our config. From the sou...
[17:31:15] serviceops, Operations: wtp1025's root partition full - https://phabricator.wikimedia.org/T258775 (wkandek) Back to 76%: php-1.36.0-wmf.1 and .41 are now way smaller. ` wkandek@wtp1025:/srv/mediawiki$ df -k Filesystem 1K-blocks Used Available Use% Mounted on udev 32927420 0 32...
[22:46:47] serviceops, Prod-Kubernetes, Product-Infrastructure-Team-Backlog, Proton, Patch-For-Review: "Failed to fork" errors on kubernetes100[1,3,4] - https://phabricator.wikimedia.org/T257679 (Mholloway) I haven't yet managed to reproduce the zombie chromium process issue locally, but the approach in...