[01:16:35] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10MediaWiki-ResourceLoader, 10Performance-Team: Run `maintenance/resources/manageForeignResources.php verify` as a test on MediaWiki core - https://phabricator.wikimedia.org/T203694 (10Krinkle) p:05Triage>03Normal
[01:39:38] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10MediaWiki-ResourceLoader, 10Performance-Team: Run `maintenance/resources/manageForeignResources.php verify` as a test on MediaWiki core - https://phabricator.wikimedia.org/T203694 (10Legoktm) Is there a reason this needs to be something separate an...
[02:16:29] 10Beta-Cluster-Infrastructure, 10Core-Platform-Team, 10fixcopyright.wikimedia.org, 10Patch-For-Review: Deploy EU copyright stuff to beta cluster - https://phabricator.wikimedia.org/T203299 (10Legoktm)
[02:17:35] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10MediaWiki-ResourceLoader, 10Performance-Team: Run `maintenance/resources/manageForeignResources.php verify` as a test on MediaWiki core - https://phabricator.wikimedia.org/T203694 (10Krinkle) >>! In T203694#4564943, @Legoktm wrote: > Is there a rea...
[02:18:24] 10Continuous-Integration-Config, 10RelEng-Archive-FY201718-Q2, 10Composer, 10Upstream, 10Wikimedia-production-error (Shared Build Failure): Error "TransportException 404 Not Found" in Jenkins jobs using composer - https://phabricator.wikimedia.org/T182266 (10Legoktm) 05Open>03Resolved a:03Krinkle I...
[05:33:42] 10Deployments, 10Core-Platform-Team, 10TechCom-RFC, 10I18n: RFC: Reevaluate LocalisationUpdate extension for WMF - https://phabricator.wikimedia.org/T158360 (10Legoktm)
[05:34:29] 10Deployments, 10Release-Engineering-Team, 10Core-Platform-Team, 10MediaWiki-extensions-LocalisationUpdate, and 2 others: Localization Cache Redo - https://phabricator.wikimedia.org/T78802 (10Legoktm)
[05:35:06] 10Deployments, 10Release-Engineering-Team, 10Core-Platform-Team, 10TechCom-RFC, 10I18n: RFC: Reevaluate LocalisationUpdate extension for WMF - https://phabricator.wikimedia.org/T158360 (10Legoktm)
[05:58:39] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Nemo_bis)
[06:05:58] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Nemo_bis) p:05Triage>03High
[07:40:13] maintenance-disconnect-full-disks build 1094 integration-slave-docker-1026 (/: 100%): OFFLINE due to disk space
[07:42:12] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<11.11%)
[07:43:52] PROBLEM - Free space - all mounts on integration-slave-docker-1025 is CRITICAL: CRITICAL: integration.integration-slave-docker-1025.diskspace.root.byte_percentfree (<22.22%)
[07:45:11] maintenance-disconnect-full-disks build 1095 integration-slave-docker-1026: OFFLINE due to disk space
[07:50:12] maintenance-disconnect-full-disks build 1096 integration-slave-docker-1026: OFFLINE due to disk space
[07:55:12] maintenance-disconnect-full-disks build 1097 integration-slave-docker-1026: OFFLINE due to disk space
[07:58:52] RECOVERY - Free space - all mounts on integration-slave-docker-1025 is OK: OK: All targets OK
[08:00:12] maintenance-disconnect-full-disks build 1098 integration-slave-docker-1026: OFFLINE due to disk space
[08:02:11] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK
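[Editor's note] The `byte_percentfree` figures in the alerts above come from per-mount Graphite metrics. As a rough illustration only (this is not the actual collector code), the percentage can be derived from statvfs:

```python
import os

# Illustrative sketch (not the real metric collector): percentage of free
# bytes on a mount, matching the shape of the
# "integration.<host>.diskspace.root.byte_percentfree" metric above.
def byte_percentfree(mount='/'):
    st = os.statvfs(mount)
    # f_bavail and f_blocks are both counted in f_frsize units.
    return 100.0 * st.f_bavail / st.f_blocks

print('/ is %.2f%% free' % byte_percentfree('/'))
```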
[08:05:11] maintenance-disconnect-full-disks build 1099 integration-slave-docker-1026: OFFLINE due to disk space
[08:10:12] maintenance-disconnect-full-disks build 1100 integration-slave-docker-1026: OFFLINE due to disk space
[08:15:11] maintenance-disconnect-full-disks build 1101 integration-slave-docker-1026: OFFLINE due to disk space
[08:16:56] 10Release-Engineering-Team, 10GitHub-Mirrors, 10Wikidata, 10Composer, and 2 others: wikibase/javascript-api composer package is not installable (mainly due to a repo move) - https://phabricator.wikimedia.org/T203162 (10Addshore) 05Open>03Resolved Fixed by doing what is described in https://github.com/c...
[08:20:11] maintenance-disconnect-full-disks build 1102 integration-slave-docker-1026: OFFLINE due to disk space
[08:25:10] maintenance-disconnect-full-disks build 1103 integration-slave-docker-1026: OFFLINE due to disk space
[08:30:13] maintenance-disconnect-full-disks build 1104 integration-slave-docker-1026: OFFLINE due to disk space
[08:34:15] (03CR) 10Hashar: "TLDR: we can drop the noexec flag" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm)
[08:35:11] maintenance-disconnect-full-disks build 1105 integration-slave-docker-1026: OFFLINE due to disk space
[08:40:11] maintenance-disconnect-full-disks build 1106 integration-slave-docker-1026: OFFLINE due to disk space
[08:42:21] (03CR) 10Hashar: "> Patch Set 6:" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm)
[08:45:11] maintenance-disconnect-full-disks build 1107 integration-slave-docker-1026: OFFLINE due to disk space
[08:50:11] maintenance-disconnect-full-disks build 1108 integration-slave-docker-1026: OFFLINE due to disk space
[08:55:12] maintenance-disconnect-full-disks build 1109 integration-slave-docker-1026: OFFLINE due to disk space
[08:56:08] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10Schnark)
[09:00:11] maintenance-disconnect-full-disks build 1110 integration-slave-docker-1026: OFFLINE due to disk space
[09:05:11] maintenance-disconnect-full-disks build 1111 integration-slave-docker-1026: OFFLINE due to disk space
[09:10:09] hashar: we need a deployer for an urgent fix (see -operations); yes, on a Friday. who might be in this tz with the appropriate rights?
[09:10:10] maintenance-disconnect-full-disks build 1112 integration-slave-docker-1026: OFFLINE due to disk space
[09:10:31] apergos: me
[09:13:10] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10hashar)
[09:13:26] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10hashar) T203750 being hotfixed by reverting a patch. Revert is https://gerrit.wikimedia.org/r/c/mediawiki/extensions/UniversalLanguageSelector/+/458743
[09:15:12] maintenance-disconnect-full-disks build 1113 integration-slave-docker-1026: OFFLINE due to disk space
[09:19:02] for people's amusement
[09:19:25] while looking for an easy link which might overflow the language link list on the sidebar I ran into
[09:19:50] https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Portal:Current_events
[09:20:11] maintenance-disconnect-full-disks build 1114 integration-slave-docker-1026: OFFLINE due to disk space
[09:25:11] maintenance-disconnect-full-disks build 1115 integration-slave-docker-1026: OFFLINE due to disk space
[09:30:12] !log integration-slave-docker-1025 lower number of executors from 5 to 4. 8 CPUS can not sustain 5 concurrent Quibble builds | T201972
[09:30:12] maintenance-disconnect-full-disks build 1116 integration-slave-docker-1026: OFFLINE due to disk space
[09:30:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:30:18] T201972: Add some more m4executor docker slaves for Jenkins - https://phabricator.wikimedia.org/T201972
[09:32:10] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10User-Addshore: Add some more m4executor docker slaves for Jenkins - https://phabricator.wikimedia.org/T201972 (10hashar) integration-slave-docker-1025 had 5 Quibble jobs in parallel and that slowed down the...
[09:35:10] maintenance-disconnect-full-disks build 1117 integration-slave-docker-1026: OFFLINE due to disk space
[09:40:13] maintenance-disconnect-full-disks build 1118 integration-slave-docker-1026: OFFLINE due to disk space
[09:45:10] maintenance-disconnect-full-disks build 1119 integration-slave-docker-1026: OFFLINE due to disk space
[09:49:23] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10Amire80)
[09:50:11] maintenance-disconnect-full-disks build 1120 integration-slave-docker-1026: OFFLINE due to disk space
[09:55:17] maintenance-disconnect-full-disks build 1121 integration-slave-docker-1026: OFFLINE due to disk space
[10:00:11] maintenance-disconnect-full-disks build 1122 integration-slave-docker-1026: OFFLINE due to disk space
[10:05:11] maintenance-disconnect-full-disks build 1123 integration-slave-docker-1026: OFFLINE due to disk space
[10:08:28] 10Project-Admins: User-jijiki personal project request - https://phabricator.wikimedia.org/T203773 (10jijiki)
[10:10:11] maintenance-disconnect-full-disks build 1124 integration-slave-docker-1026: OFFLINE due to disk space
[10:15:11] maintenance-disconnect-full-disks build 1125 integration-slave-docker-1026: OFFLINE due to disk space
[10:20:11] maintenance-disconnect-full-disks build 1126 integration-slave-docker-1026: OFFLINE due to disk space
[10:25:11] maintenance-disconnect-full-disks build 1127 integration-slave-docker-1026: OFFLINE due to disk space
[10:27:55] 10Project-Admins: User-jijiki personal project request - https://phabricator.wikimedia.org/T203773 (10Aklapper) 05Open>03Resolved a:03Aklapper Requested public project @user-jijiki has been created: https://phabricator.wikimedia.org/project/view/3572/ [[ https://www.mediawiki.org/wiki/Phabricator/Project_...
[10:30:10] maintenance-disconnect-full-disks build 1128 integration-slave-docker-1026: OFFLINE due to disk space
[10:31:19] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Aklapper) I'd love to have input / an opinion from @harej here. :)
[10:33:23] 10Project-Admins: Create a Component project for ScienceSource - https://phabricator.wikimedia.org/T203667 (10Aklapper) 05Open>03Resolved a:03Aklapper Requested public project #ScienceSource has been created: https://phabricator.wikimedia.org/project/view/3573/ Please encourage interested people to visit...
[10:35:11] maintenance-disconnect-full-disks build 1129 integration-slave-docker-1026: OFFLINE due to disk space
[10:40:11] maintenance-disconnect-full-disks build 1130 integration-slave-docker-1026: OFFLINE due to disk space
[10:45:10] maintenance-disconnect-full-disks build 1131 integration-slave-docker-1026: OFFLINE due to disk space
[10:48:39] 10Release-Engineering-Team: Create production code deployment management process - https://phabricator.wikimedia.org/T203703 (10Aklapper)
[11:03:00] 10Release-Engineering-Team (Kanban): Refresh the Production Deployment Review Process (aka Review Queue) - https://phabricator.wikimedia.org/T203697 (10Aklapper) Also see {T195244} / potential dup?
[11:03:47] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10SBisson) https://github.com/wikimedia/mediawiki-extensions-OAuth/blob/master/backend/MWOAuth.hooks.php#L132 ``` [W5JaD...
[11:05:39] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10SBisson) @Ladsgroup Can this be a consequence of https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/OAuth/+/452063/ ?
[11:06:46] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10Ladsgroup) Probably. Will check in a sec
[11:10:39] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10Ladsgroup) Yup, It's official. I'm stupid :D how could I miss such thing. Fix it in a second.
[11:54:50] 10Release-Engineering-Team (Kanban), 10Analytics-Tech-community-metrics, 10Code-Health: Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites. - https://phabricator.wikimedia.org/T190891 (10Aklapper) > What systems would you see consuming this list? * Supersede m...
[12:20:05] (03PS1) 10Hashar: In Docker default log dir to be under workspace [integration/quibble] - 10https://gerrit.wikimedia.org/r/458786
[12:23:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%)
[12:25:12] maintenance-disconnect-full-disks build 1151 integration-slave-docker-1026 (/: 97%): OFFLINE due to disk space
[12:30:13] maintenance-disconnect-full-disks build 1152 integration-slave-docker-1026: OFFLINE due to disk space
[12:31:44] (03PS2) 10Hashar: In Docker default log dir to be under workspace [integration/quibble] - 10https://gerrit.wikimedia.org/r/458786
[12:31:50] (03PS1) 10Hashar: Explicitly set Quibble --log-dir [integration/config] - 10https://gerrit.wikimedia.org/r/458788
[12:31:56] (03CR) 10jerkins-bot: [V: 04-1] In Docker default log dir to be under workspace [integration/quibble] - 10https://gerrit.wikimedia.org/r/458786 (owner: 10Hashar)
[12:32:31] (03CR) 10Hashar: "Seems easier to have log and src under /workspace. That follows up https://gerrit.wikimedia.org/r/c/integration/quibble/+/451294" [integration/quibble] - 10https://gerrit.wikimedia.org/r/458786 (owner: 10Hashar)
[12:35:11] maintenance-disconnect-full-disks build 1153 integration-slave-docker-1026: OFFLINE due to disk space
[12:38:11] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK
[12:40:11] maintenance-disconnect-full-disks build 1154 integration-slave-docker-1026: OFFLINE due to disk space
[12:45:12] maintenance-disconnect-full-disks build 1155 integration-slave-docker-1026: OFFLINE due to disk space
[12:50:11] maintenance-disconnect-full-disks build 1156 integration-slave-docker-1026: OFFLINE due to disk space
[12:55:13] maintenance-disconnect-full-disks build 1157 integration-slave-docker-1026: OFFLINE due to disk space
[13:00:12] maintenance-disconnect-full-disks build 1158 integration-slave-docker-1026: OFFLINE due to disk space
[13:05:11] maintenance-disconnect-full-disks build 1159 integration-slave-docker-1026: OFFLINE due to disk space
[13:05:36] thcipriani: I highly doubt the lfs is the reason that mirroring is broken. I would try deleting and making these repos again: scoring/ores/ores, scoring/ores/editquality, scoring/ores/draftquality and scoring/ores/wikiclass
[13:05:46] because they mirrored from lfs commit
[13:10:10] maintenance-disconnect-full-disks build 1160 integration-slave-docker-1026: OFFLINE due to disk space
[13:15:10] maintenance-disconnect-full-disks build 1161 integration-slave-docker-1026: OFFLINE due to disk space
[13:16:13] hey, I have a question regarding hosting static html on wmf infrastructure, Reading-web-team has a prototype that we need to put somewhere live (static html)
[13:16:32] what is the best approach, use the people.wmf? create new instance on wmflabs?
[13:19:20] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes, 10Patch-For-Review, 10User-Ladsgroup: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10SBisson) a:03Ladsgroup
[13:20:11] maintenance-disconnect-full-disks build 1162 integration-slave-docker-1026: OFFLINE due to disk space
[13:21:57] greg-g thcipriani ^ any ideas?
[13:22:23] (regarding hosting static html, not maintenance-disconnect)
[13:25:11] maintenance-disconnect-full-disks build 1163 integration-slave-docker-1026: OFFLINE due to disk space
[13:28:04] pmiazga: people.wikimedia is a pretty good choice for a demo/prototype if it's not going to see real users, same with a cloud vps but that's a lot more overhead.
[13:30:14] maintenance-disconnect-full-disks build 1164 integration-slave-docker-1026: OFFLINE due to disk space
[13:32:31] greg-g if we want to show that prototype to some real users (ab testing, checking do they like the new approach), then do not use the people.wikimedia?
[13:35:07] I guess the question is: what is this and why is it not merged code behind a feature flag deployed to production?
[13:35:11] maintenance-disconnect-full-disks build 1165 integration-slave-docker-1026: OFFLINE due to disk space
[13:40:11] maintenance-disconnect-full-disks build 1166 integration-slave-docker-1026: OFFLINE due to disk space
[13:40:46] greg-g - too many code changes, style, going through reviews, merging stuff might take too much time
[13:40:48] https://mobile-contributions.firebaseapp.com/nav3.html
[13:41:18] this is the prototype, we just want to gather user feedback before we start implementing the real thing
[13:42:18] it's a prototype, quick and dirty but has a really nice UI
[13:42:50] the link to that prototype will be available on a MediaWiki page plus we're going to give it to some small group of users and ask for their opinion
[13:45:11] maintenance-disconnect-full-disks build 1167 integration-slave-docker-1026: OFFLINE due to disk space
[13:50:11] maintenance-disconnect-full-disks build 1168 integration-slave-docker-1026: OFFLINE due to disk space
[13:53:12] pmiazga: peopledot is probably fine for just user feedback then
[13:53:33] (03PS1) 10Hashar: Stop hardcoding TMPDIR=/tmp [integration/quibble] - 10https://gerrit.wikimedia.org/r/458817
[13:53:35] (03PS1) 10Hashar: Postgres datadir is automatically cleanup [integration/quibble] - 10https://gerrit.wikimedia.org/r/458818
[13:53:54] ok, thx greg-g
[13:55:11] maintenance-disconnect-full-disks build 1169 integration-slave-docker-1026: OFFLINE due to disk space
[14:00:12] maintenance-disconnect-full-disks build 1170 integration-slave-docker-1026: OFFLINE due to disk space
[14:05:11] maintenance-disconnect-full-disks build 1171 integration-slave-docker-1026: OFFLINE due to disk space
[14:06:25] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) >>! In T203607#4564546, @hashar wrote: > I am definitely a fan of having J...
[14:09:47] (03PS1) 10Hashar: SQLite backend did not call parent __init__ [integration/quibble] - 10https://gerrit.wikimedia.org/r/458820
[14:10:12] maintenance-disconnect-full-disks build 1172 integration-slave-docker-1026: OFFLINE due to disk space
[14:15:12] maintenance-disconnect-full-disks build 1173 integration-slave-docker-1026: OFFLINE due to disk space
[14:16:43] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Aklapper) How is this task different from T158360?
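[Editor's note] For context on the "SQLite backend did not call parent __init__" patch above: this is the usual Python pitfall where a subclass overrides __init__ without chaining up, so base-class attributes are never set. A toy illustration with made-up class names, not Quibble's actual code:

```python
# Toy illustration of the bug class fixed by the patch above; these class
# names are invented for the example.
class BackendBase:
    def __init__(self, base_dir):
        self.base_dir = base_dir  # state every backend relies on

class SQLiteBackend(BackendBase):
    def __init__(self, base_dir):
        # The fix: chain to the parent. Without this call, self.base_dir
        # never exists and later accesses raise AttributeError.
        super().__init__(base_dir)
        self.engine = 'sqlite'

backend = SQLiteBackend('/workspace/db')
print(backend.base_dir, backend.engine)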
[14:20:11] maintenance-disconnect-full-disks build 1174 integration-slave-docker-1026: OFFLINE due to disk space
[14:34:23] 10Gerrit, 10Patch-For-Review: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10Ladsgroup) p:05Triage>03High Triaging this is as high since I couldn't push any change on ores not to beta or prod because of this for weeks now. The weird thing for me is that git lf...
[14:39:33] (03PS1) 10Hashar: Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826
[14:45:14] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10greg) p:05High>03Triage Given the open RFC another task presupposing an outcome should be "Needs Triage".
[15:08:39] Amir1: I can delete and remake those repos today if that's what you need.
[15:09:32] thcipriani: I can't say for sure if it fixes everything but that's the standard procedure in software engineering ("Have you tried turning it off and on again?")
[15:10:14] :)
[15:10:17] fair enough
[15:10:24] I'll give that a shot and ping you when it's done
[15:11:03] 10MediaWiki-Codesniffer: Add sniff to remove space after not operator - https://phabricator.wikimedia.org/T203799 (10Umherirrender)
[15:12:16] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10Umherirrender)
[15:13:18] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10Reedy) I note we purposely seem to choose this "pattern" for javascript (I guess for less characters)
[15:14:40] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10Umherirrender) >>! In T203800#4566644, @Reedy wrote: > I note we purposely seem to choose this "pattern" for javascript (I guess for less characters) The minify use it on the fly to sa...
[15:15:38] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10Reedy) We use it in non minified code too ```lines=25 Targets Occurrences of '!!' in Directory /Users/reedy/PhpstormProjects/mediawiki/core/resources/src Found Occurrences (67 usa...
[15:17:27] (03PS2) 10Hashar: Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826
[15:24:39] [from yesterday] greg-g: Given there's no train next week, can there be a Tuesday "morning" SWAT slot added at 18:00 UTC? (It's currently empty.)
[15:27:58] (03CR) 10Hashar: "Made the database dir relative to the workspace and absolute." [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826 (owner: 10Hashar)
[15:29:40] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Harej) I like option b the best, but I think then we would need a new/adjusted name, since "tool" in common parlance does n...
[15:29:45] James_F: yeah, sorry, saw that late in my day (I'm in Tennessee right now), let's not, just because that'd be right before the last switchover slot (for that day).
[15:30:29] OK.
[15:30:50] sorry :/
[15:33:55] (03PS2) 10Hashar: Stop hardcoding TMPDIR=/tmp [integration/quibble] - 10https://gerrit.wikimedia.org/r/458817
[15:35:10] (03PS2) 10Hashar: Postgres datadir is automatically cleanup [integration/quibble] - 10https://gerrit.wikimedia.org/r/458818
[15:38:29] (03PS2) 10Hashar: SQLite backend did not call parent __init__ [integration/quibble] - 10https://gerrit.wikimedia.org/r/458820
[15:39:05] (03PS3) 10Hashar: Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826
[15:39:45] (03CR) 10jerkins-bot: [V: 04-1] Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826 (owner: 10Hashar)
[15:40:18] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), and 2 others: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10Ladsgroup) The patch is merged but it...
[15:45:09] (03PS4) 10Hashar: Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826
[15:49:26] (03CR) 10Hashar: "Quibble has the database dir created under the hardcoded /tmp. I have made a few patches to clean that up and let us specify a different d" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm)
[15:50:34] thcipriani: legoktm: I have sent a few Quibble patches to let us relocate the database directory from /tmp to wherever we want (quibble --db-dir). That should let us mount a tmpfs which is not /tmp but /workspace/database and then point Quibble at it.
[15:51:07] ref is T203181
[15:51:08] T203181: Quibble MariaDB should use a tmpfs as a datadir - https://phabricator.wikimedia.org/T203181
[16:04:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<11.11%)
[16:05:11] maintenance-disconnect-full-disks build 1195 integration-slave-docker-1026 (/: 96%): OFFLINE due to disk space
[16:06:31] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Aklapper) Thanks! Heh, https://www.mediawiki.org/wiki/Naming_things ... Indeed. `#Technical-solution-request` is too vague...
[16:09:15] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Legoktm) I feel like we already had this discussion before?
[16:10:11] maintenance-disconnect-full-disks build 1196 integration-slave-docker-1026: OFFLINE due to disk space
[16:14:20] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Legoktm) Yep, see T134103#2625089 and T134103#2625093. I don't think anything has really changed since then.
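[Editor's note] A minimal sketch of the `--db-dir` idea hashar describes at [15:50:34] — assumed logic, not the actual Quibble change: if the flag points at a tmpfs mount such as /workspace/database, the ephemeral MariaDB datadir lands in RAM instead of the hardcoded /tmp.

```python
import argparse
import tempfile

# Minimal sketch of the --db-dir flag (assumed logic, not the real patch):
# let callers relocate the ephemeral database datadir.
parser = argparse.ArgumentParser()
parser.add_argument('--db-dir', default=None,
                    help='parent directory for the database datadir '
                         '(e.g. a tmpfs mounted at /workspace/database)')
args = parser.parse_args()

# tempfile honors an explicit dir=; with dir=None it falls back to TMPDIR,
# i.e. the old hardcoded-/tmp behaviour being cleaned up above.
datadir = tempfile.mkdtemp(prefix='quibble-mysql-', dir=args.db_dir)
print('database datadir:', datadir)
```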
[16:14:26] so why is it that 1026 keeps getting offlined
[16:15:15] maintenance-disconnect-full-disks build 1197 integration-slave-docker-1026: OFFLINE due to disk space
[16:18:17] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10hashar) ``` hashar@contint1001:~$ nc 127.0.0.1 25 220 contint1001.wikimedia.org ES...
[16:18:23] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-production-error (Shared Build Failure): mediawiki-quibble docker jobs fails due to disk full - https://phabricator.wikimedia.org/T202457 (10thcipriani) Each running quibble container takes up space (sometimes a lot of...
[16:18:58] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10hashar) a:03hashar
[16:19:10] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK
[16:20:10] maintenance-disconnect-full-disks build 1198 integration-slave-docker-1026: OFFLINE due to disk space
[16:21:33] legoktm: marxarelli and I were just talking about that -- I think it's because it has lots of executors and so each container downloading stuff is adding to /var/lib/docker/[whatever] as it's running. It's recovering because we use --rm to run the containers so when the jobs finish the container layer is deleted.
[16:22:02] hmm
[16:22:21] would cutting down one executor slot on that host help?
[16:25:10] maintenance-disconnect-full-disks build 1199 integration-slave-docker-1026: OFFLINE due to disk space
[16:27:32] we could lower the executors on it for a temporary fix. i think we need to solve the disk space issue in general by moving docker containers off of the / partition
[16:28:14] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Legoktm) >>! In T203737#4566405, @Aklapper wrote: > How is this task different from T158360? That ticket was a vague request to undeploy LU that got retitled into a re-e...
[16:28:20] imho, the best way forward on that is to free up extents on the lvm volume (currently 100% is used for /srv) and configure dockerd to use the device mapper storage driver
[16:29:20] +1
[16:29:28] devicemapper I have no idea how it is to be configured
[16:29:33] seems to be done via puppet
[16:29:38] that would entail changing `profile::labs::lvm::srv` in puppet to specify `size => "50%FREE"` or some such value
[16:30:02] for lvm it would be nice to have /srv/jenkins and /srv/docker on two different partitions
[16:30:10] maintenance-disconnect-full-disks build 1200 integration-slave-docker-1026: OFFLINE due to disk space
[16:30:21] for already configured instances, we'd have to manually resize the volume, since the puppet module for `labs_lvm::volume` doesn't handle changes to size
[16:30:39] we can dish them out and rebuild them from scratch
[16:30:46] hasharAway: reading the docker docs on the dm storage driver, i don't even think we need a mount
[16:30:55] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) Sounds like a plan! Will ping you Monday
[16:30:57] we just need to free up extents and point docker at the lvm device itself
[16:31:08] it will handle creation of volumes, snapshots, and such
[16:31:18] okkk
[16:31:24] what I mean is to have three slots:
[16:31:34] 1) for the os on /
[16:31:42] 2) for Jenkins workspace (on /srv/jenkins maybe ?)
[16:31:50] 3) for Docker (on /srv/docker or devicemapper magic)
[16:32:03] sounds good, but i would just clarify:
[16:32:07] 1) os on /
[16:32:19] 2) leave /srv on lvm but using fewer extents
[16:32:35] 3) docker on raw lvm volume using remaining extents
[16:32:50] +1 :]
[16:33:35] on my side, I will adjust Quibble
[16:34:00] cool
[16:34:01] there are a few low hanging fruit to slightly speed it up
[16:34:10] and gotta deal with that /tmp | mysql on tmpfs thing
[16:34:39] once we have the disk space issue solved, i think we should tune bigmem executors back up a bit, collect/analyze more build duration data
[16:34:58] it was actually really easy to import data from the json api into a google sheet yesterday
[16:35:08] i should have gone that route all along instead of fussing with statsd :)
[16:35:18] just had to sneak a ruby oneliner, haven't you? :]
[16:35:53] oh
[16:36:07] and Quibble could use some metrics by itself similar to what scap is doing
[16:36:18] it probably just need a few copy paste :]
[16:39:46] ok
[16:39:58] I am too tired at this point to keep digesting the few tickets related to train
[16:40:01] going to grab water
[16:40:07] have dinner and be back later this evening
[16:40:30] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) I don't think package-lock.json wor...
[16:49:43] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Nikerabbit) I am very much interested in having working LU (or equivalent) and willing to work on it. But I looked through the RFC and lists maybe one concrete issues and...
[16:52:14] marxarelli: btw, are you collecting time data for pass or pass and fail?
[16:53:39] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-production-error (Shared Build Failure): mediawiki-quibble docker jobs fails due to disk full - https://phabricator.wikimedia.org/T202457 (10dduvall) Just to reiterate what was talked about in IRC (#wikimedia-releng), o...
[16:53:51] Krinkle: good question :) i'm not sure!
[16:54:32] as of now, the data is likely skewed for different reasons
[16:55:01] and i'm not sure we're getting enough data from just that single job/project/branch
[16:57:15] so i looked at some data for that same job pulled from the json api instead, and filtered for only successful builds during a more narrow time range (one where executors weren't being changed around and job configurations weren't being drastically altered), and it did show an interesting difference in performance between node types
[16:57:31] might be best to track pass only, given both infra failures and genuine failures would skew timing in a way that probably isn't useful.
[16:57:46] such as ENOSPC :)
[16:57:50] but i'm not sure i trust that data at the moment either. thcipriani suggested that the disk-space maintenance script might also be skewing results
[16:58:03] Krinkle: totally. makes sense
[16:58:36] Hm.. yeah. It does change the game, although note that it doesn't stop jobs, it just offlines a node which in Jenkins means to not start new jobs.
[16:59:45] right. but the lower concurrency might allow running builds to execute more efficiently than they would without having to contend with newly scheduled builds
[17:00:08] Ah, I see.
[17:00:11] that's interesting.
[17:00:16] i'm not sure that would make a massive difference, honestly. i'd like to revisit the data once we fix the disk space issue
[17:00:28] Towards being depooled, those last few jobs would have more resources than normal.
[17:00:32] still being rewarded somewhat
[17:00:33] interesting
[17:00:44] yeah
[17:00:49] i can share the google sheet with you if you like
[17:00:59] I can't promise I'll have time to look at it, but sure :)
[17:01:02] your input would be helpful
[17:01:06] cool :)
[17:09:54] 10Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499 (10MGChecker) Is there any way to change task subtype manually? I think we should document this new feature properly on MediaWiki.org really soon. I guess most people don't even know that this feature exists and if/how they can...
[17:42:16] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10greg) Just to be clear: what is being asked here is to resume running l10nupdate on the weekend (which was turned off due to issues with l10nupdate breakage). To speak to...
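[Editor's note] Pulling build timing "from the json api" as marxarelli describes above is a one-request job against the standard Jenkins JSON API. A sketch; the job name is one seen in this log, and the pass-only filter mirrors the suggestion at [16:57:31]:

```python
import json
from urllib.request import urlopen

# Sketch of fetching per-build timing via the standard Jenkins
# /api/json?tree=... interface, as discussed above.
url = ('https://integration.wikimedia.org/ci/job/'
       'quibble-vendor-mysql-php70-docker/api/json'
       '?tree=builds[number,result,duration,builtOn]')
with urlopen(url) as resp:
    builds = json.load(resp)['builds']

# Track passing builds only: infra failures (e.g. ENOSPC) and genuine test
# failures would skew the timing data in unhelpful ways.
passed = [b for b in builds if b['result'] == 'SUCCESS']
for b in passed[:5]:
    print(b['number'], b['builtOn'], '%.1fs' % (b['duration'] / 1000.0))
```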
[17:55:43] PROBLEM - Puppet errors on deployment-certcentral03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[17:58:53] PROBLEM - Free space - all mounts on integration-slave-docker-1025 is CRITICAL: CRITICAL: integration.integration-slave-docker-1025.diskspace.root.byte_percentfree (<22.22%)
[18:00:13] maintenance-disconnect-full-disks build 1218 integration-slave-docker-1006 (/: 95%): OFFLINE due to disk space
[18:00:14] maintenance-disconnect-full-disks build 1218 integration-slave-docker-1025 (/: 98%): OFFLINE due to disk space
[18:00:14] maintenance-disconnect-full-disks build 1218 integration-slave-docker-1026 (/: 97%): OFFLINE due to disk space
[18:01:08] 10Gerrit, 10Patch-For-Review: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10thcipriani) p:05High>03Normal @Ladsgroup I've temporarily given the research-ores group push permissions to `ref/heads/*` for projects under `scoring/*` so that you can manually mirro...
[18:01:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%)
[18:05:14] maintenance-disconnect-full-disks build 1219 integration-slave-docker-1006: OFFLINE due to disk space
[18:05:15] maintenance-disconnect-full-disks build 1219 integration-slave-docker-1025: OFFLINE due to disk space
[18:05:15] maintenance-disconnect-full-disks build 1219 integration-slave-docker-1026: OFFLINE due to disk space
[18:05:41] (03PS1) 10Hashar: docker: drop /var/lib/mysql from Quibble containers [integration/config] - 10https://gerrit.wikimedia.org/r/458858
[18:06:15] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10Niedzielski) If newer versions of Node.js cann...
[18:06:46] (03CR) 10Hashar: "untested :] But Quibble initializes and spawns Mariadb using a temporary directory. So surely the Debian initial database is of no use." [integration/config] - 10https://gerrit.wikimedia.org/r/458858 (owner: 10Hashar)
[18:07:00] PROBLEM - Free space - all mounts on integration-slave-docker-1006 is CRITICAL: CRITICAL: integration.integration-slave-docker-1006.diskspace.root.byte_percentfree (<50.00%)
[18:10:11] maintenance-disconnect-full-disks build 1220 integration-slave-docker-1006: OFFLINE due to disk space
[18:10:11] maintenance-disconnect-full-disks build 1220 integration-slave-docker-1025: OFFLINE due to disk space
[18:11:13] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK
[18:12:10] !log Marking integration-slave-docker-1026 offline (ENOSPC)
[18:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:12:48] jenkins seems out of disk space
[18:13:11] or at least, that's what i think https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/9628/console is telling me
[18:13:27] http://sal.releng.team |
[18:14:28] https://phabricator.wikimedia.org/T202457
[18:15:07] 10Project-Admins: Need Phabricator tags for Approved Revs, TinyMCE, VEForAll, MintyDocs extensions - https://phabricator.wikimedia.org/T203833 (10Yaron_Koren)
[18:15:09] marxarelli: If it is just 1026 (which is 100% in my experience, failing jobs for space is always that one), can we take it out of rotation until we figure out how to solve it?
[18:15:16] like, permanently?
[18:15:30] I mean, not just for a few minutes.
[18:15:46] maintenance-disconnect-full-disks build 1221 integration-slave-docker-1006: OFFLINE due to disk space
[18:16:00] I'm spending half my day aborting builds, clearing +2 scores, rebasing, and re+2'ing.
[18:17:00] RECOVERY - Free space - all mounts on integration-slave-docker-1006 is OK: OK: All targets OK
[18:17:02] Krinkle: absolutely. or we can lower the number of executors to something like 4
[18:17:28] Yeah, if the issue is active jobs/workspaces, I don't know.
[18:17:32] But I trust you on that :)
[18:18:00] i'll set it to 4 executors and keep an eye on it
[18:18:31] if it still has problems, i'll just take it out of rotation permanently
[18:18:38] bawolff, that'll be one of the integration slave instances rather than the jenkins master luckily
[18:18:48] 17:54:22 Building remotely on integration-slave-docker-1026 (stats-T201972.bigmem blubber DebianJessieDocker m4executor) in workspace /srv/jenkins-workspace/workspace/quibble-vendor-mysql-php70-docker
[18:18:49] T201972: Add some more m4executor docker slaves for Jenkins - https://phabricator.wikimedia.org/T201972
[18:18:52] RECOVERY - Free space - all mounts on integration-slave-docker-1025 is OK: OK: All targets OK
[18:19:13] !log setting integration-slave-docker-1026 executors to 4 to avoid disk space exhaustion due to concurrent builds
[18:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:19:31] Krenair: Jenkins be like magic to me, I don't know the different parts :)
[18:20:22] maintenance-disconnect-full-disks build 1222 integration-slave-docker-1006: OFFLINE due to disk space
[18:20:28] bawolff, hey it gets much worse than that :)
[18:20:36] I know very little about Jenkins
[18:20:42] RECOVERY - Puppet errors on deployment-certcentral03 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:25:13] maintenance-disconnect-full-disks build 1223 integration-slave-docker-1006: OFFLINE due to disk space
[18:30:10] maintenance-disconnect-full-disks build 1224 integration-slave-docker-1006: OFFLINE due to disk space
[18:30:20] !log started gear_client on contint1001
[18:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:32:31] thcipriani: how hard would it be to modify maintenance-disconnect-full-disks to bring nodes back online?
[18:32:55] integration-slave-docker-1006 seems fine now
[18:34:59] marxarelli: I was thinking about that this morning
[18:35:11] maintenance-disconnect-full-disks build 1225 integration-slave-docker-1006: OFFLINE due to disk space
[18:35:41] it's a little tricky since I can't execute scripts on offline machines
[18:36:10] i guess what's more troubling, is why it even temporarily used up 12G of the 14G free disk space
[18:36:39] during one build ostensibly?
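[Editor's note] The "bring nodes back online" idea thcipriani and marxarelli discuss above could be driven from outside Jenkins rather than by executing scripts on the offline agent. A hedged sketch using the python-jenkins library — an assumption on my part; the real maintenance-disconnect-full-disks job runs Jenkins-side and may work differently:

```python
import jenkins  # python-jenkins; an assumed choice, not what the job uses

# Credentials omitted; enabling nodes requires an authenticated user.
server = jenkins.Jenkins('https://integration.wikimedia.org/ci')
for node in server.get_nodes():
    if node['name'] == 'master':
        continue
    info = server.get_node_info(node['name'])
    reason = info.get('offlineCauseReason') or ''
    # Re-enable agents that were offlined for disk space; a fuller version
    # would re-check free disk on the agent before repooling it.
    if info.get('temporarilyOffline') and 'disk space' in reason:
        server.enable_node(node['name'])
```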
[18:40:11] maintenance-disconnect-full-disks build 1226 integration-slave-docker-1006: OFFLINE due to disk space
[18:45:13] maintenance-disconnect-full-disks build 1227 integration-slave-docker-1006: OFFLINE due to disk space
[18:50:10] maintenance-disconnect-full-disks build 1228 integration-slave-docker-1006: OFFLINE due to disk space
[18:55:11] maintenance-disconnect-full-disks build 1229 integration-slave-docker-1006: OFFLINE due to disk space
[18:56:57] Project beta-scap-eqiad build #221354: 04FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221354/
[19:00:11] maintenance-disconnect-full-disks build 1230 integration-slave-docker-1006: OFFLINE due to disk space
[19:05:10] maintenance-disconnect-full-disks build 1231 integration-slave-docker-1006: OFFLINE due to disk space
[19:09:49] Project beta-scap-eqiad build #221355: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221355/
[19:10:11] maintenance-disconnect-full-disks build 1232 integration-slave-docker-1006: OFFLINE due to disk space
[19:14:52] PROBLEM - Free space - all mounts on integration-slave-docker-1025 is CRITICAL: CRITICAL: integration.integration-slave-docker-1025.diskspace.root.byte_percentfree (<11.11%)
[19:15:11] maintenance-disconnect-full-disks build 1233 integration-slave-docker-1006: OFFLINE due to disk space
[19:15:13] !log marked integration-slave-docker-1025 offline (no space), aborted builds manually
[19:15:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:16:22] for the record, I'm only doing that when I see one of my builds actually fail, not related to shinken alert
[19:22:13] PROBLEM - SSH on integration-slave-docker-1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:22:36] Project beta-scap-eqiad build #221356: 04STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221356/
[19:29:52] PROBLEM - SSH on integration-slave-docker-1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:32:05] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban): Move CI docker storage engine to device mapper - https://phabricator.wikimedia.org/T203841 (10thcipriani) p:05Triage>03Normal
[19:33:49] maintenance-disconnect-full-disks build 1236 integration-slave-docker-1006: OFFLINE due to disk space
[19:35:13] maintenance-disconnect-full-disks build 1237 integration-slave-docker-1006: OFFLINE due to disk space
[19:35:49] Project beta-scap-eqiad build #221357: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221357/
[19:37:08] 19:35:48 sudo: option '--preserve-env' doesn't allow an argument
[19:39:12] thcipriani: ^ looks like latest scap commit causes an issue
[19:39:15] https://github.com/wikimedia/scap/commit/36d2144d2453d352bf09453291aa0683a60fdbf6
[19:40:11] maintenance-disconnect-full-disks build 1238 integration-slave-docker-1006: OFFLINE due to disk space
[19:40:42] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban): Free up LVM extents for Docker devicemapper on new Jenkins Agents - https://phabricator.wikimedia.org/T203842 (10thcipriani) p:05Triage>03Normal
[19:42:01] marxarelli: sorry to nag, ignore if you're on to something, but would like to confirm/clarify what it is we're doing about the disk space issue currently. Is there something we can revert to or change to make it go away like it was before 2-3 weeks ago?
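[Editor's note] On the scap failure at [19:37:08]: `--preserve-env=VAR` (with an argument) is only understood by newer sudo; older sudo accepts only the bare flag or `-E`. The 1.8.21 cutoff below is my reading of the sudo changelog, so treat it as an assumption. A sketch of a version-guarded call:

```python
import re
import subprocess

# The exact version that introduced --preserve-env=VAR is an assumption
# (1.8.21, per my reading of the sudo changelog); older sudo only has the
# bare -E / --preserve-env form, hence the error seen above.
out = subprocess.check_output(['sudo', '-V'], text=True)
version = tuple(map(int, re.search(r'version (\d+)\.(\d+)\.(\d+)', out).groups()))
if version >= (1, 8, 21):
    flags = ['--preserve-env=SSH_AUTH_SOCK']  # keep just one variable
else:
    flags = ['-E']  # coarser fallback: preserve the whole environment
print('sudo', *flags)
```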
[19:42:24] Krinkle: working with thcipriani on that now
[19:42:29] we're taskifying
[19:42:36] it's very counterproductive with almost every commit requiring at least 3 attempts and 2 man hours to merge. Not every build fails, but every commit has at least 1 job that fails, as such making it impossible to do much.
[19:42:39] cool :)
[19:42:53] it's a mess
[19:43:14] but we have a promising path forward
[19:44:00] let me know if you need an ear for brainstorming or an extra hand, but otherwise, no need to hold up to explain, don't want to stand in the way :) Thanks!
[19:44:17] cool, i appreciate it
[19:45:14] maintenance-disconnect-full-disks build 1239 integration-slave-docker-1006: OFFLINE due to disk space
[19:45:29] Krinkle: are you still having issues with integration-slave-docker-1026 offline?
[19:45:41] Reedy: I'll look at scap in a few minutes, current error is my fault.
[19:46:07] marxarelli: not for about 20min. but still quite a few commits pending from last hour. will let you know.
[19:47:05] RECOVERY - SSH on integration-slave-docker-1005 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[19:48:54] Project beta-scap-eqiad build #221358: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221358/
[19:49:11] Krinkle: k
[19:49:56] (03PS1) 10Umherirrender: Enable seccheck for EUCopyrightCampaign [integration/config] - 10https://gerrit.wikimedia.org/r/458882
[19:53:03] so, the short-term tl;dr is that we're ditching 1025/1026 and provisioning more m1.mediums for the weekend. the long-term tl;dr is that we're going to replace m1.mediums with bigmems or something similar, shrink the logical volume for /srv to use 20G of the larger lvm volume group, and configure docker to use the device mapper storage driver and the free volume group extents on /dev/vda4
[19:53:11] Krinkle: ^
[19:53:12] PROBLEM - SSH on integration-slave-docker-1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:53:57] marxarelli: cool, and the m1.med are 1 executor?
[19:54:28] hmm... i think they will be m2executors?
[19:54:40] thcipriani: ^ ?
[19:55:03] or give them a fancy name
[19:55:27] potentially we could reuse the flavor name as the jenkins label
[19:55:30] instance-1023 has label m4 and 1 executor
[19:55:35] so "bigmem"
[19:55:53] the m1.mediums we're provisioning over the short term? are, IIRC, labelled m4executors (for a reason I don't know) but have 1 executor
[19:56:04] anyhow, that sounds good. I gather the ones we're making more of short-term are the 1 executor ones we used to have only.
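[Editor's note] For the long-term plan marxarelli outlines at [19:53:03], dockerd would be pointed at the freed LVM extents through its storage options. A hedged sketch of what the resulting /etc/docker/daemon.json might contain, rendered from Python: `dm.directlvm_device` is a real Docker storage option, but the device path and the overall config are illustrative assumptions, not a tested setup.

```python
import json

# Hedged sketch of the dockerd configuration the plan above implies:
# devicemapper storage driver in direct-lvm mode, backed by the freed
# volume-group extents rather than the / partition.
daemon_config = {
    "storage-driver": "devicemapper",
    "storage-opts": [
        "dm.directlvm_device=/dev/vda4",  # dockerd manages a thin pool here
    ],
}
print(json.dumps(daemon_config, indent=2))
```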
[19:56:25] exactly
[19:56:32] ah, sorry, m4executor is what i meant
[20:00:12] maintenance-disconnect-full-disks build 1242 integration-slave-docker-1006: OFFLINE due to disk space
[20:00:47] i suppose we should repool 1006 as well
[20:01:25] !log bringing integration-slave-docker-1006 online again since disk space has been reclaimed
[20:01:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:02:00] Project beta-scap-eqiad build #221359: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221359/
[20:02:14] ^ scap should recover shortly
[20:08:03] RECOVERY - SSH on integration-slave-docker-1005 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[20:15:33] Project beta-scap-eqiad build #221360: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221360/
[20:17:47] 10Continuous-Integration-Config, 10Wikimedia-General-or-Unknown, 10phan-taint-check-plugin, 10MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), 10Patch-For-Review: Enable phan-taint-check-plugin on all Wikimedia-deployed repositories where it is curr... - https://phabricator.wikimedia.org/T201219
[20:18:54] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Remove composer dump-autoload --optimize from mw-fetch-composer-dev.sh and Quibble - https://phabricator.wikimedia.org/T181940 (10hashar)
[20:19:12] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Remove composer dump-autoload --optimize from mw-fetch-composer-dev.sh and Quibble - https://phabricator.wikimedia.org/T181940 (10hashar)
[20:19:16] 10Project-Admins, 10User-MarcoAurelio: Need Phabricator tags for Approved Revs, TinyMCE, VEForAll, MintyDocs extensions - https://phabricator.wikimedia.org/T203833 (10MarcoAurelio) a:03MarcoAurelio
[20:19:23] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Remove composer dump-autoload --optimize from mw-fetch-composer-dev.sh and Quibble - https://phabricator.wikimedia.org/T181940 (10hashar) p:05Triage>03Low
[20:26:15] 10Gerrit, 10Patch-For-Review: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10MarcoAurelio) I'm not sure, but maybe we could make this work by having Diffusion to observe the github repos, and at the same time have Diffusion push the changes to gerrit using {K18}?...
[20:27:37] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10thiemowmde) I support the idea for PHP. From my experience, I can't think of a reason to ever use `!!` in PHP. So as long as it's an auto-fix sniff, go for it. For JavaScript it's a li...
[20:29:53] Project beta-scap-eqiad build #221361: 04STILL FAILING in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221361/
[20:31:13] next one, for sure
[20:38:35] 10Project-Admins, 10User-MarcoAurelio: Need Phabricator tags for Approved Revs, TinyMCE, VEForAll, MintyDocs extensions - https://phabricator.wikimedia.org/T203833 (10MarcoAurelio) 05Open>03Resolved Created: * #mediawiki-extensions-approved_revs * #mediawiki-extensions-tinymce * #mediawiki-extensions-vefo...
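[Editor's note] The auto-fix thiemowmde endorses at [20:27:37] amounts to rewriting `!!$expr` as an explicit `(bool)$expr` cast. A toy demonstration of the intended transformation; a real MediaWiki-Codesniffer sniff operates on PHP tokens and handles strings and comments correctly, so this regex is illustration only:

```python
import re

# Toy demonstration of the T203800 rewrite: double negation to an explicit
# boolean cast. Illustration only; not how a real token-based sniff works.
php = 'return !!$user->isAllowed( "edit" );'
fixed = re.sub(r'!!\s*', '(bool)', php)
print(fixed)  # return (bool)$user->isAllowed( "edit" );
```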
[20:38:40] PROBLEM - Host integration-slave-docker-1026 is DOWN: CRITICAL - Host Unreachable (10.68.22.190)
[20:42:36] (03PS1) 10Hashar: tox-doc publish job for docker-pkg [integration/config] - 10https://gerrit.wikimedia.org/r/458894
[20:42:54] (03CR) 10Hashar: [C: 032] tox-doc publish job for docker-pkg [integration/config] - 10https://gerrit.wikimedia.org/r/458894 (owner: 10Hashar)
[20:44:45] RECOVERY - SSH on integration-slave-docker-1009 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[20:45:07] (03Merged) 10jenkins-bot: tox-doc publish job for docker-pkg [integration/config] - 10https://gerrit.wikimedia.org/r/458894 (owner: 10Hashar)
[20:45:18] maintenance-disconnect-full-disks build 1251 integration-slave-docker-1009 (/: 96%): OFFLINE due to disk space
[20:48:18] (03CR) 10Hashar: "Doc at https://doc.wikimedia.org/docker-pkg/" [integration/config] - 10https://gerrit.wikimedia.org/r/458894 (owner: 10Hashar)
[20:48:38] thcipriani: https://doc.wikimedia.org/docker-pkg/ :]
[20:49:47] fancy!
[20:50:11] maintenance-disconnect-full-disks build 1252 integration-slave-docker-1009: OFFLINE due to disk space
[20:50:50] PROBLEM - SSH on integration-slave-docker-1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:51:18] PROBLEM - Free space - all mounts on integration-slave-docker-1009 is CRITICAL: CRITICAL: integration.integration-slave-docker-1009.diskspace.root.byte_percentfree (<100.00%)
[20:54:48] hrm, I still can't ssh to that instance :\
[20:55:10] maintenance-disconnect-full-disks build 1253 integration-slave-docker-1009: OFFLINE due to disk space
[20:56:14] RECOVERY - Free space - all mounts on integration-slave-docker-1009 is OK: OK: integration.integration-slave-docker-1009.diskspace._srv.byte_percentfree (More than half of the datapoints are undefined) integration.integration-slave-docker-1009.diskspace.root.byte_percentfree (More than half of the datapoints are undefined)
[20:59:06] No news is good news?
[21:02:40] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar)
[21:03:28] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar)
[21:03:38] I found a new one with Zuul/Gerrit madness https://phabricator.wikimedia.org/T203846
[21:04:02] manually merging a change that is in the gate would cause Gerrit to reject Zuul's request to submit the change
[21:04:12] thus Zuul considers the change has failed and cancels all jobs behind in the queue
[21:04:20] and thus reshuffles all of them :^\
[21:06:00] huh
[21:06:36] that is why some changes have been waiting for 1+ hour
[21:10:04] Hi hashar ^^ I'm not sure if there's an issue but most jobs in the gate-and-submit queue seem to be stuck at +1 hour ?
[21:10:27] Hauskatze he is aware :)
[21:10:39] I've just logged in
[21:10:47] just before you joined he was talking about it :)
[21:10:48] Hauskatze: yeah thank you I noticed that as well
[21:10:50] the reason is https://phabricator.wikimedia.org/T203846
[21:10:56] some changes got manually merged
[21:11:07] * Hauskatze looks
[21:11:17] but the change is still in Zuul
[21:11:26] when the change's jobs are completed, Zuul attempts to submit the change in Gerrit
[21:11:29] but Gerrit rejects it
[21:11:43] aha
[21:11:46] thus Zuul considers the change to have failed to merge and cancels all the jobs behind
[21:11:56] and reenqueues all changes, triggering new jobs
[21:11:58] hashar I believe to set allowPostSubmit you do it in All-Projects
[21:12:03] yea
[21:12:04] or wherever the labels are defined
[21:12:15] I am reading the doc at https://gerrit.wikimedia.org/r/Documentation/config-labels.html#label_allowPostSubmit
[21:12:22] but it is said that it defaults to true :/
[21:19:51] Yippee, build fixed!
[21:19:51] Project beta-scap-eqiad build #221362: 09FIXED in 49 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221362/
[21:21:02] 10Phabricator: AphrontCountQueryException on my Profile Page (T75720) - https://phabricator.wikimedia.org/T203849 (10RheingoldRiver)
[21:21:40] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar) I have tried on https://gerrit.wikimedia.org/r/#/c/test/gerrit-ping/+/458903/ a merged change with Verified+...
[21:21:49] oh my
[21:23:22] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar)
[21:51:47] (03PS1) 10Hashar: wmf: ignore submit error on merged change [integration/zuul] (patch-queue/debian/jessie-wikimedia) - 10https://gerrit.wikimedia.org/r/458914 (https://phabricator.wikimedia.org/T203846)
[21:53:03] (03CR) 10Hashar: "I will have to try, I am not sure about the error content or whether "'change is merged' in err" will actually work :]" [integration/zuul] (patch-queue/debian/jessie-wikimedia) - 10https://gerrit.wikimedia.org/r/458914 (https://phabricator.wikimedia.org/T203846) (owner: 10Hashar)
[21:55:25] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Patch-For-Review, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar) I don't think that can be fixed in #Gerrit short of reintroducing force message. It is...
[22:00:41] RECOVERY - SSH on integration-slave-docker-1009 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[22:07:31] https://gerrit-review.googlesource.com/c/gerrit/+/195431
[22:13:45] 10Phabricator: AphrontCountQueryException on my Profile Page (T75720) - https://phabricator.wikimedia.org/T203849 (10JJMC89)
[22:19:09] happy week-end!
[22:31:18] 10Release-Engineering-Team (Kanban), 10MediaWiki-General-or-Unknown, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Security, 10User-zeljkofilipin: `npm audit` for mediawiki/core found 24 vulnerabilities - https://phabricator.wikimedia.org/T194280 (10Jdforrester-WMF) a:03Jdforrester-...
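[Editor's note] The workaround hashar sketches in https://gerrit.wikimedia.org/r/458914 is to treat "already merged" as success when Zuul submits a change. In outline — assumed logic with a hypothetical Gerrit client call, and per [21:53:03] even the exact error string to match was still an open question:

```python
# Outline of the integration/zuul workaround above (assumed logic;
# `gerrit.submit` stands in for whatever client call Zuul actually makes).
def submit_change(gerrit, change):
    try:
        gerrit.submit(change)
    except Exception as err:
        if 'change is merged' in str(err):
            # Someone merged the change by hand: treat it as submitted
            # instead of failing and reshuffling the whole gate queue.
            return
        raise
```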
[22:39:34] 10Gerrit, 10Patch-For-Review: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10thcipriani) 05Open>03Resolved a:03mmodell @mmodell found some suspect messages in phab. Editing permissions for scoring seems to have done the trick! Now: https://github.com/wikim...
[23:38:12] PROBLEM - Puppet errors on deployment-certcentral-testclient03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]