[01:16:35] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10MediaWiki-ResourceLoader, 10Performance-Team: Run `maintenance/resources/manageForeignResources.php verify` as a test on MediaWiki core - https://phabricator.wikimedia.org/T203694 (10Krinkle) p:05Triage>03Normal
[01:39:38] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10MediaWiki-ResourceLoader, 10Performance-Team: Run `maintenance/resources/manageForeignResources.php verify` as a test on MediaWiki core - https://phabricator.wikimedia.org/T203694 (10Legoktm) Is there a reason this needs to be something separate an...
[02:16:29] 10Beta-Cluster-Infrastructure, 10Core-Platform-Team, 10fixcopyright.wikimedia.org, 10Patch-For-Review: Deploy EU copyright stuff to beta cluster - https://phabricator.wikimedia.org/T203299 (10Legoktm)
[02:17:35] 10Continuous-Integration-Config, 10MediaWiki-Core-Tests, 10MediaWiki-ResourceLoader, 10Performance-Team: Run `maintenance/resources/manageForeignResources.php verify` as a test on MediaWiki core - https://phabricator.wikimedia.org/T203694 (10Krinkle) >>! In T203694#4564943, @Legoktm wrote: > Is there a rea...
[02:18:24] 10Continuous-Integration-Config, 10RelEng-Archive-FY201718-Q2, 10Composer, 10Upstream, 10Wikimedia-production-error (Shared Build Failure): Error "TransportException 404 Not Found" in Jenkins jobs using composer - https://phabricator.wikimedia.org/T182266 (10Legoktm) 05Open>03Resolved a:03Krinkle I...
[05:33:42] 10Deployments, 10Core-Platform-Team, 10TechCom-RFC, 10I18n: RFC: Reevaluate LocalisationUpdate extension for WMF - https://phabricator.wikimedia.org/T158360 (10Legoktm)
[05:34:29] 10Deployments, 10Release-Engineering-Team, 10Core-Platform-Team, 10MediaWiki-extensions-LocalisationUpdate, and 2 others: Localization Cache Redo - https://phabricator.wikimedia.org/T78802 (10Legoktm)
[05:35:06] 10Deployments, 10Release-Engineering-Team, 10Core-Platform-Team, 10TechCom-RFC, 10I18n: RFC: Reevaluate LocalisationUpdate extension for WMF - https://phabricator.wikimedia.org/T158360 (10Legoktm)
[05:58:39] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Nemo_bis)
[06:05:58] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Nemo_bis) p:05Triage>03High
[07:40:13] maintenance-disconnect-full-disks build 1094 integration-slave-docker-1026 (/: 100%): OFFLINE due to disk space
[07:42:12] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<11.11%)
[07:43:52] PROBLEM - Free space - all mounts on integration-slave-docker-1025 is CRITICAL: CRITICAL: integration.integration-slave-docker-1025.diskspace.root.byte_percentfree (<22.22%)
[07:45:11] maintenance-disconnect-full-disks build 1095 integration-slave-docker-1026: OFFLINE due to disk space
[07:50:12] maintenance-disconnect-full-disks build 1096 integration-slave-docker-1026: OFFLINE due to disk space
[07:55:12] maintenance-disconnect-full-disks build 1097 integration-slave-docker-1026: OFFLINE due to disk space
[07:58:52] RECOVERY - Free space - all mounts on integration-slave-docker-1025 is OK: OK: All targets OK
[08:00:12] maintenance-disconnect-full-disks build 1098 integration-slave-docker-1026: OFFLINE due to disk space
[08:02:11] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK
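[Editor's note] The `byte_percentfree` figures in the alerts above come from per-mount Graphite metrics. As a rough illustration only (this is not the actual collector code), the percentage can be derived from statvfs:

```python
import os

# Illustrative sketch (not the real metric collector): percentage of free
# bytes on a mount, matching the shape of the
# "integration.<host>.diskspace.root.byte_percentfree" metric above.
def byte_percentfree(mount='/'):
    st = os.statvfs(mount)
    # f_bavail and f_blocks are both counted in f_frsize units.
    return 100.0 * st.f_bavail / st.f_blocks

print('/ is %.2f%% free' % byte_percentfree('/'))
```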
[08:05:11] maintenance-disconnect-full-disks build 1099 integration-slave-docker-1026: OFFLINE due to disk space
[08:10:12] maintenance-disconnect-full-disks build 1100 integration-slave-docker-1026: OFFLINE due to disk space
[08:15:11] maintenance-disconnect-full-disks build 1101 integration-slave-docker-1026: OFFLINE due to disk space
[08:16:56] 10Release-Engineering-Team, 10GitHub-Mirrors, 10Wikidata, 10Composer, and 2 others: wikibase/javascript-api composer package is not installable (mainly due to a repo move) - https://phabricator.wikimedia.org/T203162 (10Addshore) 05Open>03Resolved Fixed by doing what is described in https://github.com/c...
[08:20:11] maintenance-disconnect-full-disks build 1102 integration-slave-docker-1026: OFFLINE due to disk space
[08:25:10] maintenance-disconnect-full-disks build 1103 integration-slave-docker-1026: OFFLINE due to disk space
[08:30:13] maintenance-disconnect-full-disks build 1104 integration-slave-docker-1026: OFFLINE due to disk space
[08:34:15] (03CR) 10Hashar: "TLDR: we can drop the noexec flag" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm)
[08:35:11] maintenance-disconnect-full-disks build 1105 integration-slave-docker-1026: OFFLINE due to disk space
[08:40:11] maintenance-disconnect-full-disks build 1106 integration-slave-docker-1026: OFFLINE due to disk space
[08:42:21] (03CR) 10Hashar: "> Patch Set 6:" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm)
[08:45:11] maintenance-disconnect-full-disks build 1107 integration-slave-docker-1026: OFFLINE due to disk space
[08:50:11] maintenance-disconnect-full-disks build 1108 integration-slave-docker-1026: OFFLINE due to disk space
[08:55:12] maintenance-disconnect-full-disks build 1109 integration-slave-docker-1026: OFFLINE due to disk space
[08:56:08] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10Schnark)
[09:00:11] maintenance-disconnect-full-disks build 1110 integration-slave-docker-1026: OFFLINE due to disk space
[09:05:11] maintenance-disconnect-full-disks build 1111 integration-slave-docker-1026: OFFLINE due to disk space
[09:10:09] hashar: we need a deployer for an urgent fix (see -operations); yes, on a Friday. who might be in this tz with the appropriate rights?
[09:10:10] maintenance-disconnect-full-disks build 1112 integration-slave-docker-1026: OFFLINE due to disk space
[09:10:31] apergos: me
[09:13:10] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10hashar)
[09:13:26] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10hashar) T203750 being hotfixed by reverting a patch. Revert is https://gerrit.wikimedia.org/r/c/mediawiki/extensions/UniversalLanguageSelector/+/458743
[09:15:12] maintenance-disconnect-full-disks build 1113 integration-slave-docker-1026: OFFLINE due to disk space
[09:19:02] for people's amusement
[09:19:25] while looking for an easy link which might overflow the language link list on the sidebar I ran into
[09:19:50] https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Portal:Current_events
[09:20:11] maintenance-disconnect-full-disks build 1114 integration-slave-docker-1026: OFFLINE due to disk space
[09:25:11] maintenance-disconnect-full-disks build 1115 integration-slave-docker-1026: OFFLINE due to disk space
[09:30:12] !log integration-slave-docker-1025 lower number of executors from 5 to 4. 8 CPUS can not sustain 5 concurrent Quibble builds | T201972
[09:30:12] maintenance-disconnect-full-disks build 1116 integration-slave-docker-1026: OFFLINE due to disk space
[09:30:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:30:18] T201972: Add some more m4executor docker slaves for Jenkins - https://phabricator.wikimedia.org/T201972
[09:32:10] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10User-Addshore: Add some more m4executor docker slaves for Jenkins - https://phabricator.wikimedia.org/T201972 (10hashar) integration-slave-docker-1025 had 5 Quibble jobs in parallel and that slowed down the...
[09:35:10] maintenance-disconnect-full-disks build 1117 integration-slave-docker-1026: OFFLINE due to disk space
[09:40:13] maintenance-disconnect-full-disks build 1118 integration-slave-docker-1026: OFFLINE due to disk space
[09:45:10] maintenance-disconnect-full-disks build 1119 integration-slave-docker-1026: OFFLINE due to disk space
[09:49:23] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.20 deployment blockers - https://phabricator.wikimedia.org/T191066 (10Amire80)
[09:50:11] maintenance-disconnect-full-disks build 1120 integration-slave-docker-1026: OFFLINE due to disk space
[09:55:17] maintenance-disconnect-full-disks build 1121 integration-slave-docker-1026: OFFLINE due to disk space
[10:00:11] maintenance-disconnect-full-disks build 1122 integration-slave-docker-1026: OFFLINE due to disk space
[10:05:11] maintenance-disconnect-full-disks build 1123 integration-slave-docker-1026: OFFLINE due to disk space
[10:08:28] 10Project-Admins: User-jijiki personal project request - https://phabricator.wikimedia.org/T203773 (10jijiki)
[10:10:11] maintenance-disconnect-full-disks build 1124 integration-slave-docker-1026: OFFLINE due to disk space
[10:15:11] maintenance-disconnect-full-disks build 1125 integration-slave-docker-1026: OFFLINE due to disk space
[10:20:11] maintenance-disconnect-full-disks build 1126 integration-slave-docker-1026: OFFLINE due to disk space
[10:25:11] maintenance-disconnect-full-disks build 1127 integration-slave-docker-1026: OFFLINE due to disk space
[10:27:55] 10Project-Admins: User-jijiki personal project request - https://phabricator.wikimedia.org/T203773 (10Aklapper) 05Open>03Resolved a:03Aklapper Requested public project @user-jijiki has been created: https://phabricator.wikimedia.org/project/view/3572/ [[ https://www.mediawiki.org/wiki/Phabricator/Project_...
[10:30:10] maintenance-disconnect-full-disks build 1128 integration-slave-docker-1026: OFFLINE due to disk space
[10:31:19] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Aklapper) I'd love to have input / an opinion from @harej here. :)
[10:33:23] 10Project-Admins: Create a Component project for ScienceSource - https://phabricator.wikimedia.org/T203667 (10Aklapper) 05Open>03Resolved a:03Aklapper Requested public project #ScienceSource has been created: https://phabricator.wikimedia.org/project/view/3573/ Please encourage interested people to visit...
[10:35:11] maintenance-disconnect-full-disks build 1129 integration-slave-docker-1026: OFFLINE due to disk space
[10:40:11] maintenance-disconnect-full-disks build 1130 integration-slave-docker-1026: OFFLINE due to disk space
[10:45:10] maintenance-disconnect-full-disks build 1131 integration-slave-docker-1026: OFFLINE due to disk space
[10:48:39] 10Release-Engineering-Team: Create production code deployment management process - https://phabricator.wikimedia.org/T203703 (10Aklapper)
[11:03:00] 10Release-Engineering-Team (Kanban): Refresh the Production Deployment Review Process (aka Review Queue) - https://phabricator.wikimedia.org/T203697 (10Aklapper) Also see {T195244} / potential dup?
[11:03:47] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10SBisson) https://github.com/wikimedia/mediawiki-extensions-OAuth/blob/master/backend/MWOAuth.hooks.php#L132 ``` [W5JaD...
[11:05:39] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10SBisson) @Ladsgroup Can this be a consequence of https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/OAuth/+/452063/ ?
[11:06:46] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10Ladsgroup) Probably. Will check in a sec
[11:10:39] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10Ladsgroup) Yup, It's official. I'm stupid :D how could I miss such thing. Fix it in a second.
[11:54:50] 10Release-Engineering-Team (Kanban), 10Analytics-Tech-community-metrics, 10Code-Health: Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites. - https://phabricator.wikimedia.org/T190891 (10Aklapper) > What systems would you see consuming this list? * Supersede m...
[12:20:05] (03PS1) 10Hashar: In Docker default log dir to be under workspace [integration/quibble] - 10https://gerrit.wikimedia.org/r/458786
[12:23:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%)
[12:25:12] maintenance-disconnect-full-disks build 1151 integration-slave-docker-1026 (/: 97%): OFFLINE due to disk space
[12:30:13] maintenance-disconnect-full-disks build 1152 integration-slave-docker-1026: OFFLINE due to disk space
[12:31:44] (03PS2) 10Hashar: In Docker default log dir to be under workspace [integration/quibble] - 10https://gerrit.wikimedia.org/r/458786
[12:31:50] (03PS1) 10Hashar: Explicitly set Quibble --log-dir [integration/config] - 10https://gerrit.wikimedia.org/r/458788
[12:31:56] (03CR) 10jerkins-bot: [V: 04-1] In Docker default log dir to be under workspace [integration/quibble] - 10https://gerrit.wikimedia.org/r/458786 (owner: 10Hashar)
[12:32:31] (03CR) 10Hashar: "Seems easier to have log and src under /workspace. That follows up https://gerrit.wikimedia.org/r/c/integration/quibble/+/451294" [integration/quibble] - 10https://gerrit.wikimedia.org/r/458786 (owner: 10Hashar)
[12:35:11] maintenance-disconnect-full-disks build 1153 integration-slave-docker-1026: OFFLINE due to disk space
[12:38:11] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK
[12:40:11] maintenance-disconnect-full-disks build 1154 integration-slave-docker-1026: OFFLINE due to disk space
[12:45:12] maintenance-disconnect-full-disks build 1155 integration-slave-docker-1026: OFFLINE due to disk space
[12:50:11] maintenance-disconnect-full-disks build 1156 integration-slave-docker-1026: OFFLINE due to disk space
[12:55:13] maintenance-disconnect-full-disks build 1157 integration-slave-docker-1026: OFFLINE due to disk space
[13:00:12] maintenance-disconnect-full-disks build 1158 integration-slave-docker-1026: OFFLINE due to disk space
[13:05:11] maintenance-disconnect-full-disks build 1159 integration-slave-docker-1026: OFFLINE due to disk space
[13:05:36] thcipriani: I highly doubt the lfs is the reason that mirroring is broken. I would try deleting and making these repos again: scoring/ores/ores, scoring/ores/editquality, scoring/ores/draftquality and scoring/ores/wikiclass
[13:05:46] because they mirrored from lfs commit
[13:10:10] maintenance-disconnect-full-disks build 1160 integration-slave-docker-1026: OFFLINE due to disk space
[13:15:10] maintenance-disconnect-full-disks build 1161 integration-slave-docker-1026: OFFLINE due to disk space
[13:16:13] hey, I have a question regarding hosting static html on wmf infrastructure, Reading-web-team has a prototype that we need to put somewhere live (static html)
[13:16:32] what is the best approach, use the people.wmf? create new instance on wmflabs?
[13:19:20] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes, 10Patch-For-Review, 10User-Ladsgroup: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10SBisson) a:03Ladsgroup
[13:20:11] maintenance-disconnect-full-disks build 1162 integration-slave-docker-1026: OFFLINE due to disk space
[13:21:57] greg-g thcipriani ^ any ideas?
[13:22:23] (regarding hosting static html, not maintenance-disconnect)
[13:25:11] maintenance-disconnect-full-disks build 1163 integration-slave-docker-1026: OFFLINE due to disk space
[13:28:04] pmiazga: people.wikimedia is a pretty good choice for a demo/prototype if it's not going to see real users, same with a cloud vps but that's a lot more overhead.
[13:30:14] maintenance-disconnect-full-disks build 1164 integration-slave-docker-1026: OFFLINE due to disk space
[13:32:31] greg-g if we want to show that prototype to some real users (ab testing, checking do they like the new approach), then do not use the people.wikimedia?
[13:35:07] I guess the question is: what is this and why is it not merged code behind a feature flag deployed to production?
[13:35:11] maintenance-disconnect-full-disks build 1165 integration-slave-docker-1026: OFFLINE due to disk space
[13:40:11] maintenance-disconnect-full-disks build 1166 integration-slave-docker-1026: OFFLINE due to disk space
[13:40:46] greg-g - too many code changes, style, going through reviews, merging stuff might take too much time
[13:40:48] https://mobile-contributions.firebaseapp.com/nav3.html
[13:41:18] this is the prototype, we just want to gather user feedback before we start implementing the real thing
[13:42:18] it's a prototype, quick and dirty but has a really nice UI
[13:42:50] the link to that prototype will be available on a MediaWiki page plus we're going to give it to some small group of users and ask for their opinion
[13:45:11] maintenance-disconnect-full-disks build 1167 integration-slave-docker-1026: OFFLINE due to disk space
[13:50:11] maintenance-disconnect-full-disks build 1168 integration-slave-docker-1026: OFFLINE due to disk space
[13:53:12] pmiazga: peopledot is probably fine for just user feedback then
[13:53:33] (03PS1) 10Hashar: Stop hardcoding TMPDIR=/tmp [integration/quibble] - 10https://gerrit.wikimedia.org/r/458817
[13:53:35] (03PS1) 10Hashar: Postgres datadir is automatically cleanup [integration/quibble] - 10https://gerrit.wikimedia.org/r/458818
[13:53:54] ok, thx greg-g
[13:55:11] maintenance-disconnect-full-disks build 1169 integration-slave-docker-1026: OFFLINE due to disk space
[14:00:12] maintenance-disconnect-full-disks build 1170 integration-slave-docker-1026: OFFLINE due to disk space
[14:05:11] maintenance-disconnect-full-disks build 1171 integration-slave-docker-1026: OFFLINE due to disk space
[14:06:25] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) >>! In T203607#4564546, @hashar wrote: > I am definitely a fan of having J...
[14:09:47] (03PS1) 10Hashar: SQLite backend did not call parent __init__ [integration/quibble] - 10https://gerrit.wikimedia.org/r/458820
[14:10:12] maintenance-disconnect-full-disks build 1172 integration-slave-docker-1026: OFFLINE due to disk space
[14:15:12] maintenance-disconnect-full-disks build 1173 integration-slave-docker-1026: OFFLINE due to disk space
[14:16:43] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Aklapper) How is this task different from T158360?
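[Editor's note] For context on the "SQLite backend did not call parent __init__" patch above: this is the usual Python pitfall where a subclass overrides __init__ without chaining up, so base-class attributes are never set. A toy illustration with made-up class names, not Quibble's actual code:

```python
# Toy illustration of the bug class fixed by the patch above; these class
# names are invented for the example.
class BackendBase:
    def __init__(self, base_dir):
        self.base_dir = base_dir  # state every backend relies on

class SQLiteBackend(BackendBase):
    def __init__(self, base_dir):
        # The fix: chain to the parent. Without this call, self.base_dir
        # never exists and later accesses raise AttributeError.
        super().__init__(base_dir)
        self.engine = 'sqlite'

backend = SQLiteBackend('/workspace/db')
print(backend.base_dir, backend.engine)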
[14:20:11] maintenance-disconnect-full-disks build 1174 integration-slave-docker-1026: OFFLINE due to disk space
[14:34:23] 10Gerrit, 10Patch-For-Review: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10Ladsgroup) p:05Triage>03High Triaging this is as high since I couldn't push any change on ores not to beta or prod because of this for weeks now. The weird thing for me is that git lf...
[14:39:33] (03PS1) 10Hashar: Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826
[14:45:14] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10greg) p:05High>03Triage Given the open RFC another task presupposing an outcome should be "Needs Triage".
[15:08:39] Amir1: I can delete and remake those repos today if that's what you need.
[15:09:32] thcipriani: I can't say for sure if it fixes everything but that's the standard procedure in software engineering ("Have you tried turning it off and on again?")
[15:10:14] :)
[15:10:17] fair enough
[15:10:24] I'll give that a shot and ping you when it's done
[15:11:03] 10MediaWiki-Codesniffer: Add sniff to remove space after not operator - https://phabricator.wikimedia.org/T203799 (10Umherirrender)
[15:12:16] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10Umherirrender)
[15:13:18] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10Reedy) I note we purposely seem to choose this "pattern" for javascript (I guess for less characters)
[15:14:40] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10Umherirrender) >>! In T203800#4566644, @Reedy wrote: > I note we purposely seem to choose this "pattern" for javascript (I guess for less characters) The minify use it on the fly to sa...
[15:15:38] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10Reedy) We use it in non minified code too ```lines=25 Targets Occurrences of '!!' in Directory /Users/reedy/PhpstormProjects/mediawiki/core/resources/src Found Occurrences (67 usa...
[15:17:27] (03PS2) 10Hashar: Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826
[15:24:39] [from yesterday] greg-g: Given there's no train next week, can there be a Tuesday "morning" SWAT slot added at 18:00 UTC? (It's currently empty.)
[15:27:58] (03CR) 10Hashar: "Made the database dir relative to the workspace and absolute." [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826 (owner: 10Hashar)
[15:29:40] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Harej) I like option b the best, but I think then we would need a new/adjusted name, since "tool" in common parlance does n...
[15:29:45] James_F: yeah, sorry, saw that late in my day (I'm in Tennessee right now), let's not, just because that'd be right before the last switchover slot (for that day).
[15:30:29] OK.
[15:30:50] sorry :/
[15:33:55] (03PS2) 10Hashar: Stop hardcoding TMPDIR=/tmp [integration/quibble] - 10https://gerrit.wikimedia.org/r/458817
[15:35:10] (03PS2) 10Hashar: Postgres datadir is automatically cleanup [integration/quibble] - 10https://gerrit.wikimedia.org/r/458818
[15:38:29] (03PS2) 10Hashar: SQLite backend did not call parent __init__ [integration/quibble] - 10https://gerrit.wikimedia.org/r/458820
[15:39:05] (03PS3) 10Hashar: Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826
[15:39:45] (03CR) 10jerkins-bot: [V: 04-1] Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826 (owner: 10Hashar)
[15:40:18] 10Beta-Cluster-Infrastructure, 10Growth-Team, 10MediaWiki-Recent-changes, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), and 2 others: Recent Changes shows Internal Error on de.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T203759 (10Ladsgroup) The patch is merged but it...
[15:45:09] (03PS4) 10Hashar: Allow specifying database data directory [integration/quibble] - 10https://gerrit.wikimedia.org/r/458826
[15:49:26] (03CR) 10Hashar: "Quibble has the database dir created under the hardcoded /tmp. I have made a few patches to clean that up and let us specify a different d" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm)
[15:50:34] thcipriani: legoktm: I have sent a few Quibble patches to let us relocate the database directory from /tmp to wherever we want (quibble --db-dir). That should let us mount a tmpfs which is not /tmp but /workspace/database and then point Quibble at it.
[15:51:07] ref is T203181
[15:51:08] T203181: Quibble MariaDB should use a tmpfs as a datadir - https://phabricator.wikimedia.org/T203181
[16:04:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<11.11%)
[16:05:11] maintenance-disconnect-full-disks build 1195 integration-slave-docker-1026 (/: 96%): OFFLINE due to disk space
[16:06:31] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Aklapper) Thanks! Heh, https://www.mediawiki.org/wiki/Naming_things ... Indeed. `#Technical-solution-request` is too vague...
[16:09:15] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Legoktm) I feel like we already had this discussion before?
[16:10:11] maintenance-disconnect-full-disks build 1196 integration-slave-docker-1026: OFFLINE due to disk space
[16:14:20] 10Project-Admins, 10Developer-Advocacy (Jul-Sep 2018): Sort out scope between #MediaWiki-extension-requests vs. #Technical-Tool-Request tags - https://phabricator.wikimedia.org/T198102 (10Legoktm) Yep, see T134103#2625089 and T134103#2625093. I don't think anything has really changed since then.
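[Editor's note] A minimal sketch of the `--db-dir` idea hashar describes at [15:50:34] — assumed logic, not the actual Quibble change: if the flag points at a tmpfs mount such as /workspace/database, the ephemeral MariaDB datadir lands in RAM instead of the hardcoded /tmp.

```python
import argparse
import tempfile

# Minimal sketch of the --db-dir flag (assumed logic, not the real patch):
# let callers relocate the ephemeral database datadir.
parser = argparse.ArgumentParser()
parser.add_argument('--db-dir', default=None,
                    help='parent directory for the database datadir '
                         '(e.g. a tmpfs mounted at /workspace/database)')
args = parser.parse_args()

# tempfile honors an explicit dir=; with dir=None it falls back to TMPDIR,
# i.e. the old hardcoded-/tmp behaviour being cleaned up above.
datadir = tempfile.mkdtemp(prefix='quibble-mysql-', dir=args.db_dir)
print('database datadir:', datadir)
```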
[16:14:26] so why is it that 1026 keeps getting offlined
[16:15:15] maintenance-disconnect-full-disks build 1197 integration-slave-docker-1026: OFFLINE due to disk space
[16:18:17] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10hashar) ``` hashar@contint1001:~$ nc 127.0.0.1 25 220 contint1001.wikimedia.org ES...
[16:18:23] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-production-error (Shared Build Failure): mediawiki-quibble docker jobs fails due to disk full - https://phabricator.wikimedia.org/T202457 (10thcipriani) Each running quibble container takes up space (sometimes a lot of...
[16:18:58] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10hashar) a:03hashar
[16:19:10] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK
[16:20:10] maintenance-disconnect-full-disks build 1198 integration-slave-docker-1026: OFFLINE due to disk space
[16:21:33] legoktm: marxarelli and I were just talking about that -- I think it's because it has lots of executors and so each container downloading stuff is adding to /var/lib/docker/[whatever] as it's running. It's recovering because we use --rm to run the containers so when the jobs finish the container layer is deleted.
[16:22:02] hmm
[16:22:21] would cutting down one executor slot on that host help?
[16:25:10] maintenance-disconnect-full-disks build 1199 integration-slave-docker-1026: OFFLINE due to disk space
[16:27:32] we could lower the executors on it for a temporary fix. i think we need to solve the disk space issue in general by moving docker containers off of the / partition
[16:28:14] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Legoktm) >>! In T203737#4566405, @Aklapper wrote: > How is this task different from T158360? That ticket was a vague request to undeploy LU that got retitled into a re-e...
[16:28:20] imho, the best way forward on that is to free up extents on the lvm volume (currently 100% is used for /srv) and configure dockerd to use the device mapper storage driver
[16:29:20] +1
[16:29:28] devicemapper I have no idea how it is to be configured
[16:29:33] seems to be done via puppet
[16:29:38] that would entail changing `profile::labs::lvm::srv` in puppet to specify `size => "50%FREE"` or some such value
[16:30:02] for lvm it would be nice to have /srv/jenkins and /srv/docker on two different partitions
[16:30:10] maintenance-disconnect-full-disks build 1200 integration-slave-docker-1026: OFFLINE due to disk space
[16:30:21] for already configured instances, we'd have to manually resize the volume, since the puppet module for `labs_lvm::volume` doesn't handle changes to size
[16:30:39] we can dish them out and rebuild them from scratch
[16:30:46] hasharAway: reading the docker docs on the dm storage driver, i don't even think we need a mount
[16:30:55] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Mail, 10Operations, and 2 others: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) Sounds like a plan! Will ping you Monday
[16:30:57] we just need to free up extents and point docker at the lvm device itself
[16:31:08] it will handle creation of volumes, snapshots, and such
[16:31:18] okkk
[16:31:24] what I mean is to have three slots:
[16:31:34] 1) for the os on /
[16:31:42] 2) for Jenkins workspace (on /srv/jenkins maybe ?)
[16:31:50] 3) for Docker (on /srv/docker or devicemapper magic)
[16:32:03] sounds good, but i would just clarify:
[16:32:07] 1) os on /
[16:32:19] 2) leave /srv on lvm but using fewer extents
[16:32:35] 3) docker on raw lvm volume using remaining extents
[16:32:50] +1 :]
[16:33:35] on my side, I will adjust Quibble
[16:34:00] cool
[16:34:01] there are a few low hanging fruit to slightly speed it up
[16:34:10] and gotta deal with that /tmp | mysql on tmpfs thing
[16:34:39] once we have the disk space issue solved, i think we should tune bigmem executors back up a bit, collect/analyze more build duration data
[16:34:58] it was actually really easy to import data from the json api into a google sheet yesterday
[16:35:08] i should have gone that route all along instead of fussing with statsd :)
[16:35:18] just had to sneak a ruby oneliner, haven't you? :]
[16:35:53] oh
[16:36:07] and Quibble could use some metrics by itself similar to what scap is doing
[16:36:18] it probably just need a few copy paste :]
[16:39:46] ok
[16:39:58] I am too tired at this point to keep digesting the few tickets related to train
[16:40:01] going to grab water
[16:40:07] have dinner and be back later this evening
[16:40:30] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) I don't think package-lock.json wor...
[16:49:43] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10Nikerabbit) I am very much interested in having working LU (or equivalent) and willing to work on it. But I looked through the RFC and lists maybe one concrete issues and...
[16:52:14] marxarelli: btw, are you collecting time data for pass or pass and fail?
[16:53:39] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-production-error (Shared Build Failure): mediawiki-quibble docker jobs fails due to disk full - https://phabricator.wikimedia.org/T202457 (10dduvall) Just to reiterate what was talked about in IRC (#wikimedia-releng), o...
[16:53:51] Krinkle: good question :) i'm not sure!
[16:54:32] as of now, the data is likely skewed for different reasons
[16:55:01] and i'm not sure we're getting enough data from just that single job/project/branch
[16:57:15] so i looked at some data for that same job pulled from the json api instead, and filtered for only successful builds during a more narrow time range (one where executors weren't being changed around and job configurations weren't being drastically altered), and it did show an interesting difference in performance between node types
[16:57:31] might be best to track pass only, given both infra failures and genuine failures would skew timing in a way that probably isn't useful.
[16:57:46] such as ENOSPC :)
[16:57:50] but i'm not sure i trust that data at the moment either. thcipriani suggested that the disk-space maintenance script might also be skewing results
[16:58:03] Krinkle: totally. makes sense
[16:58:36] Hm.. yeah. It does change the game, although note that it doesn't stop jobs, it just offlines a node which in Jenkins means to not start new jobs.
[16:59:45] right. but the lower concurrency might allow running builds to execute more efficiently than they would without having to contend with newly scheduled builds
[17:00:08] Ah, I see.
[17:00:11] that's interesting.
[17:00:16] i'm not sure that would make a massive difference, honestly. i'd like to revisit the data once we fix the disk space issue
[17:00:28] Towards being depooled, those last few jobs would have more resources than normal.
[17:00:32] still being rewarded somewhat
[17:00:33] interesting
[17:00:44] yeah
[17:00:49] i can share the google sheet with you if you like
[17:00:59] I can't promise I'll have time to look at it, but sure :)
[17:01:02] your input would be helpful
[17:01:06] cool :)
[17:09:54] 10Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499 (10MGChecker) Is there any way to change task subtype manually? I think we should document this new feature properly on MediaWiki.org really soon. I guess most people don't even know that this feature exists and if/how they can...
[17:42:16] 10Deployments, 10I18n, 10Regression: Update MediaWiki localisation in Wikimedia wikis daily - https://phabricator.wikimedia.org/T203737 (10greg) Just to be clear: what is being asked here is to resume running l10nupdate on the weekend (which was turned off due to issues with l10nupdate breakage). To speak to...
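[Editor's note] Pulling build timing "from the json api" as marxarelli describes above is a one-request job against the standard Jenkins JSON API. A sketch; the job name is one seen in this log, and the pass-only filter mirrors the suggestion at [16:57:31]:

```python
import json
from urllib.request import urlopen

# Sketch of fetching per-build timing via the standard Jenkins
# /api/json?tree=... interface, as discussed above.
url = ('https://integration.wikimedia.org/ci/job/'
       'quibble-vendor-mysql-php70-docker/api/json'
       '?tree=builds[number,result,duration,builtOn]')
with urlopen(url) as resp:
    builds = json.load(resp)['builds']

# Track passing builds only: infra failures (e.g. ENOSPC) and genuine test
# failures would skew the timing data in unhelpful ways.
passed = [b for b in builds if b['result'] == 'SUCCESS']
for b in passed[:5]:
    print(b['number'], b['builtOn'], '%.1fs' % (b['duration'] / 1000.0))
```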
[17:55:43] PROBLEM - Puppet errors on deployment-certcentral03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[17:58:53] PROBLEM - Free space - all mounts on integration-slave-docker-1025 is CRITICAL: CRITICAL: integration.integration-slave-docker-1025.diskspace.root.byte_percentfree (<22.22%)
[18:00:13] maintenance-disconnect-full-disks build 1218 integration-slave-docker-1006 (/: 95%): OFFLINE due to disk space
[18:00:14] maintenance-disconnect-full-disks build 1218 integration-slave-docker-1025 (/: 98%): OFFLINE due to disk space
[18:00:14] maintenance-disconnect-full-disks build 1218 integration-slave-docker-1026 (/: 97%): OFFLINE due to disk space
[18:01:08] 10Gerrit, 10Patch-For-Review: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10thcipriani) p:05High>03Normal @Ladsgroup I've temporarily given the research-ores group push permissions to `ref/heads/*` for projects under `scoring/*` so that you can manually mirro...
[18:01:11] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%)
[18:05:14] maintenance-disconnect-full-disks build 1219 integration-slave-docker-1006: OFFLINE due to disk space
[18:05:15] maintenance-disconnect-full-disks build 1219 integration-slave-docker-1025: OFFLINE due to disk space
[18:05:15] maintenance-disconnect-full-disks build 1219 integration-slave-docker-1026: OFFLINE due to disk space
[18:05:41] (03PS1) 10Hashar: docker: drop /var/lib/mysql from Quibble containers [integration/config] - 10https://gerrit.wikimedia.org/r/458858
[18:06:15] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Patch-For-Review, and 2 others: Popups daily jobs currently unusable - https://phabricator.wikimedia.org/T203591 (10Niedzielski) If newer versions of Node.js cann...
[18:06:46] (03CR) 10Hashar: "untested :] But Quibble initializes and spawns Mariadb using a temporary directory. So surely the Debian initial database is of no use." [integration/config] - 10https://gerrit.wikimedia.org/r/458858 (owner: 10Hashar)
[18:07:00] PROBLEM - Free space - all mounts on integration-slave-docker-1006 is CRITICAL: CRITICAL: integration.integration-slave-docker-1006.diskspace.root.byte_percentfree (<50.00%)
[18:10:11] maintenance-disconnect-full-disks build 1220 integration-slave-docker-1006: OFFLINE due to disk space
[18:10:11] maintenance-disconnect-full-disks build 1220 integration-slave-docker-1025: OFFLINE due to disk space
[18:11:13] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK
[18:12:10] !log Marking integration-slave-docker-1026 offline (ENOSPC)
[18:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:12:48] jenkins seems out of disk space
[18:13:11] or at least, that's what i think https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/9628/console is telling me
[18:13:27] http://sal.releng.team |
[18:14:28] https://phabricator.wikimedia.org/T202457
[18:15:07] 10Project-Admins: Need Phabricator tags for Approved Revs, TinyMCE, VEForAll, MintyDocs extensions - https://phabricator.wikimedia.org/T203833 (10Yaron_Koren)
[18:15:09] marxarelli: If it is just 1026 (which is 100% in my experience, failing jobs for space is always that one), can we take it out of rotation until we figure out how to solve it?
[18:15:16] like, permanently?
[18:15:30] I mean, not just for a few minutes.
[18:15:46] maintenance-disconnect-full-disks build 1221 integration-slave-docker-1006: OFFLINE due to disk space
[18:16:00] I'm spending half my day aborting builds, clearing +2 scores, rebasing, and re+2'ing.
[18:17:00] RECOVERY - Free space - all mounts on integration-slave-docker-1006 is OK: OK: All targets OK
[18:17:02] Krinkle: absolutely. or we can lower the number of executors to something like 4
[18:17:28] Yeah, if the issue is active jobs/workspaces, I don't know.
[18:17:32] But I trust you on that :)
[18:18:00] i'll set it to 4 executors and keep an eye on it
[18:18:31] if it still has problems, i'll just take it out of rotation permanently
[18:18:38] bawolff, that'll be one of the integration slave instances rather than the jenkins master luckily
[18:18:48] 17:54:22 Building remotely on integration-slave-docker-1026 (stats-T201972.bigmem blubber DebianJessieDocker m4executor) in workspace /srv/jenkins-workspace/workspace/quibble-vendor-mysql-php70-docker
[18:18:49] T201972: Add some more m4executor docker slaves for Jenkins - https://phabricator.wikimedia.org/T201972
[18:18:52] RECOVERY - Free space - all mounts on integration-slave-docker-1025 is OK: OK: All targets OK
[18:19:13] !log setting integration-slave-docker-1026 executors to 4 to avoid disk space exhaustion due to concurrent builds
[18:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:19:31] Krenair: Jenkins be like magic to me, I don't know the different parts :)
[18:20:22] maintenance-disconnect-full-disks build 1222 integration-slave-docker-1006: OFFLINE due to disk space
[18:20:28] bawolff, hey it gets much worse than that :)
[18:20:36] I know very little about Jenkins
[18:20:42] RECOVERY - Puppet errors on deployment-certcentral03 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:25:13] maintenance-disconnect-full-disks build 1223 integration-slave-docker-1006: OFFLINE due to disk space
[18:30:10] maintenance-disconnect-full-disks build 1224 integration-slave-docker-1006: OFFLINE due to disk space
[18:30:20] !log started gear_client on contint1001
[18:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:32:31] thcipriani: how hard would it be to modify maintenance-disconnect-full-disks to bring nodes back online?
[18:32:55] integration-slave-docker-1006 seems fine now
[18:34:59] marxarelli: I was thinking about that this morning
[18:35:11] maintenance-disconnect-full-disks build 1225 integration-slave-docker-1006: OFFLINE due to disk space
[18:35:41] it's a little tricky since I can't execute scripts on offline machines
[18:36:10] i guess what's more troubling, is why it even temporarily used up 12G of the 14G free disk space
[18:36:39] during one build ostensibly?
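[Editor's note] The "bring nodes back online" idea thcipriani and marxarelli discuss above could be driven from outside Jenkins rather than by executing scripts on the offline agent. A hedged sketch using the python-jenkins library — an assumption on my part; the real maintenance-disconnect-full-disks job runs Jenkins-side and may work differently:

```python
import jenkins  # python-jenkins; an assumed choice, not what the job uses

# Credentials omitted; enabling nodes requires an authenticated user.
server = jenkins.Jenkins('https://integration.wikimedia.org/ci')
for node in server.get_nodes():
    if node['name'] == 'master':
        continue
    info = server.get_node_info(node['name'])
    reason = info.get('offlineCauseReason') or ''
    # Re-enable agents that were offlined for disk space; a fuller version
    # would re-check free disk on the agent before repooling it.
    if info.get('temporarilyOffline') and 'disk space' in reason:
        server.enable_node(node['name'])
```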
[18:40:11] maintenance-disconnect-full-disks build 1226 integration-slave-docker-1006: OFFLINE due to disk space
[18:45:13] maintenance-disconnect-full-disks build 1227 integration-slave-docker-1006: OFFLINE due to disk space
[18:50:10] maintenance-disconnect-full-disks build 1228 integration-slave-docker-1006: OFFLINE due to disk space
[18:55:11] maintenance-disconnect-full-disks build 1229 integration-slave-docker-1006: OFFLINE due to disk space
[18:56:57] Project beta-scap-eqiad build #221354: 04FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221354/
[19:00:11] maintenance-disconnect-full-disks build 1230 integration-slave-docker-1006: OFFLINE due to disk space
[19:05:10] maintenance-disconnect-full-disks build 1231 integration-slave-docker-1006: OFFLINE due to disk space
[19:09:49] Project beta-scap-eqiad build #221355: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221355/
[19:10:11] maintenance-disconnect-full-disks build 1232 integration-slave-docker-1006: OFFLINE due to disk space
[19:14:52] PROBLEM - Free space - all mounts on integration-slave-docker-1025 is CRITICAL: CRITICAL: integration.integration-slave-docker-1025.diskspace.root.byte_percentfree (<11.11%)
[19:15:11] maintenance-disconnect-full-disks build 1233 integration-slave-docker-1006: OFFLINE due to disk space
[19:15:13] !log marked integration-slave-docker-1025 offline (no space), aborted builds manually
[19:15:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:16:22] for the record, I'm only doing that when I see one of my builds actually fail, not related to shinken alert
[19:22:13] PROBLEM - SSH on integration-slave-docker-1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:22:36] Project beta-scap-eqiad build #221356: 04STILL FAILING in 11 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221356/
[19:29:52] PROBLEM - SSH on integration-slave-docker-1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:32:05] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban): Move CI docker storage engine to device mapper - https://phabricator.wikimedia.org/T203841 (10thcipriani) p:05Triage>03Normal
[19:33:49] maintenance-disconnect-full-disks build 1236 integration-slave-docker-1006: OFFLINE due to disk space
[19:35:13] maintenance-disconnect-full-disks build 1237 integration-slave-docker-1006: OFFLINE due to disk space
[19:35:49] Project beta-scap-eqiad build #221357: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221357/
[19:37:08] 19:35:48 sudo: option '--preserve-env' doesn't allow an argument
[19:39:12] thcipriani: ^ looks like latest scap commit causes an issue
[19:39:15] https://github.com/wikimedia/scap/commit/36d2144d2453d352bf09453291aa0683a60fdbf6
[19:40:11] maintenance-disconnect-full-disks build 1238 integration-slave-docker-1006: OFFLINE due to disk space
[19:40:42] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban): Free up LVM extents for Docker devicemapper on new Jenkins Agents - https://phabricator.wikimedia.org/T203842 (10thcipriani) p:05Triage>03Normal
[19:42:01] marxarelli: sorry to nag, ignore if you're on to something, but would like to confirm/clarify what it is we're doing about the disk space issue currently. Is there something we can revert to or change to make it go away like it was before 2-3 weeks ago?
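[Editor's note] On the scap failure at [19:37:08]: `--preserve-env=VAR` (with an argument) is only understood by newer sudo; older sudo accepts only the bare flag or `-E`. The 1.8.21 cutoff below is my reading of the sudo changelog, so treat it as an assumption. A sketch of a version-guarded call:

```python
import re
import subprocess

# The exact version that introduced --preserve-env=VAR is an assumption
# (1.8.21, per my reading of the sudo changelog); older sudo only has the
# bare -E / --preserve-env form, hence the error seen above.
out = subprocess.check_output(['sudo', '-V'], text=True)
version = tuple(map(int, re.search(r'version (\d+)\.(\d+)\.(\d+)', out).groups()))
if version >= (1, 8, 21):
    flags = ['--preserve-env=SSH_AUTH_SOCK']  # keep just one variable
else:
    flags = ['-E']  # coarser fallback: preserve the whole environment
print('sudo', *flags)
```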
[19:42:24] Krinkle: working with thcipriani on that now
[19:42:29] we're taskifying
[19:42:36] it's very counterproductive with almost every commit requiring at least 3 attempts and 2 man hours to merge. Not every build fails, but every commit has at least 1 job that fails, as such making it impossible to do much.
[19:42:39] cool :)
[19:42:53] it's a mess
[19:43:14] but we have a promising path forward
[19:44:00] let me know if you need an ear for brainstorming or an extra hand, but otherwise, no need to hold up to explain, don't want to stand in the way :) Thanks!
[19:44:17] cool, i appreciate it
[19:45:14] maintenance-disconnect-full-disks build 1239 integration-slave-docker-1006: OFFLINE due to disk space
[19:45:29] Krinkle: are you still having issues with integration-slave-docker-1026 offline?
[19:45:41] Reedy: I'll look at scap in a few minutes, current error is my fault.
[19:46:07] marxarelli: not for about 20min. but still quite a few commits pending from last hour. will let you know.
[19:47:05] RECOVERY - SSH on integration-slave-docker-1005 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[19:48:54] Project beta-scap-eqiad build #221358: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221358/
[19:49:11] Krinkle: k
[19:49:56] (03PS1) 10Umherirrender: Enable seccheck for EUCopyrightCampaign [integration/config] - 10https://gerrit.wikimedia.org/r/458882
[19:53:03] so, the short-term tl;dr is that we're ditching 1025/1026 and provisioning more m1.mediums for the weekend. the long-term tl;dr is that we're going to replace m1.mediums with bigmems or something similar, shrink the logical volume for /srv to use 20G of the larger lvm volume group, and configure docker to use the device mapper storage driver and the free volume group extents on /dev/vda4
[19:53:11] Krinkle: ^
[19:53:12] PROBLEM - SSH on integration-slave-docker-1005 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:53:57] marxarelli: cool, and the m1.med are 1 executor?
[19:54:28] hmm... i think they will be m2executors?
[19:54:40] thcipriani: ^ ?
[19:55:03] or give them a fancy name
[19:55:27] potentially we could reuse the flavor name as the jenkins label
[19:55:30] instance-1023 has label m4 and 1 executor
[19:55:35] so "bigmem"
[19:55:53] the m1.mediums we're provisioning over the short term? are, IIRC, labelled m4executors (for a reason I don't know) but have 1 executor
[19:56:04] anyhow, that sounds good. I gather the ones we're making more of short-term are the 1 executor ones we used to have only.
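[Editor's note] For the long-term plan marxarelli outlines at [19:53:03], dockerd would be pointed at the freed LVM extents through its storage options. A hedged sketch of what the resulting /etc/docker/daemon.json might contain, rendered from Python: `dm.directlvm_device` is a real Docker storage option, but the device path and the overall config are illustrative assumptions, not a tested setup.

```python
import json

# Hedged sketch of the dockerd configuration the plan above implies:
# devicemapper storage driver in direct-lvm mode, backed by the freed
# volume-group extents rather than the / partition.
daemon_config = {
    "storage-driver": "devicemapper",
    "storage-opts": [
        "dm.directlvm_device=/dev/vda4",  # dockerd manages a thin pool here
    ],
}
print(json.dumps(daemon_config, indent=2))
```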
[19:56:25] exactly
[19:56:32] ah, sorry, m4executor is what i meant
[20:00:12] maintenance-disconnect-full-disks build 1242 integration-slave-docker-1006: OFFLINE due to disk space
[20:00:47] i suppose we should repool 1006 as well
[20:01:25] !log bringing integration-slave-docker-1006 online again since disk space has been reclaimed
[20:01:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:02:00] Project beta-scap-eqiad build #221359: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221359/
[20:02:14] ^ scap should recover shortly
[20:08:03] RECOVERY - SSH on integration-slave-docker-1005 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[20:15:33] Project beta-scap-eqiad build #221360: 04STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221360/
[20:17:47] 10Continuous-Integration-Config, 10Wikimedia-General-or-Unknown, 10phan-taint-check-plugin, 10MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), 10Patch-For-Review: Enable phan-taint-check-plugin on all Wikimedia-deployed repositories where it is curr... - https://phabricator.wikimedia.org/T201219
[20:18:54] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Remove composer dump-autoload --optimize from mw-fetch-composer-dev.sh and Quibble - https://phabricator.wikimedia.org/T181940 (10hashar)
[20:19:12] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Remove composer dump-autoload --optimize from mw-fetch-composer-dev.sh and Quibble - https://phabricator.wikimedia.org/T181940 (10hashar)
[20:19:16] 10Project-Admins, 10User-MarcoAurelio: Need Phabricator tags for Approved Revs, TinyMCE, VEForAll, MintyDocs extensions - https://phabricator.wikimedia.org/T203833 (10MarcoAurelio) a:03MarcoAurelio
[20:19:23] 10Continuous-Integration-Infrastructure, 10Quibble, 10Patch-For-Review: Remove composer dump-autoload --optimize from mw-fetch-composer-dev.sh and Quibble - https://phabricator.wikimedia.org/T181940 (10hashar) p:05Triage>03Low
[20:26:15] 10Gerrit, 10Patch-For-Review: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10MarcoAurelio) I'm not sure, but maybe we could make this work by having Diffusion to observe the github repos, and at the same time have Diffusion push the changes to gerrit using {K18}?...
[20:27:37] 10MediaWiki-Codesniffer: Add sniff to replace !! by explicit boolean cast (bool) - https://phabricator.wikimedia.org/T203800 (10thiemowmde) I support the idea for PHP. From my experience, I can't think of a reason to ever use `!!` in PHP. So as long as it's an auto-fix sniff, go for it. For JavaScript it's a li...
[20:29:53] Project beta-scap-eqiad build #221361: 04STILL FAILING in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221361/
[20:31:13] next one, for sure
[20:38:35] 10Project-Admins, 10User-MarcoAurelio: Need Phabricator tags for Approved Revs, TinyMCE, VEForAll, MintyDocs extensions - https://phabricator.wikimedia.org/T203833 (10MarcoAurelio) 05Open>03Resolved Created: * #mediawiki-extensions-approved_revs * #mediawiki-extensions-tinymce * #mediawiki-extensions-vefo...
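[Editor's note] The auto-fix thiemowmde endorses at [20:27:37] amounts to rewriting `!!$expr` as an explicit `(bool)$expr` cast. A toy demonstration of the intended transformation; a real MediaWiki-Codesniffer sniff operates on PHP tokens and handles strings and comments correctly, so this regex is illustration only:

```python
import re

# Toy demonstration of the T203800 rewrite: double negation to an explicit
# boolean cast. Illustration only; not how a real token-based sniff works.
php = 'return !!$user->isAllowed( "edit" );'
fixed = re.sub(r'!!\s*', '(bool)', php)
print(fixed)  # return (bool)$user->isAllowed( "edit" );
```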
[20:38:40] PROBLEM - Host integration-slave-docker-1026 is DOWN: CRITICAL - Host Unreachable (10.68.22.190)
[20:42:36] (03PS1) 10Hashar: tox-doc publish job for docker-pkg [integration/config] - 10https://gerrit.wikimedia.org/r/458894
[20:42:54] (03CR) 10Hashar: [C: 032] tox-doc publish job for docker-pkg [integration/config] - 10https://gerrit.wikimedia.org/r/458894 (owner: 10Hashar)
[20:44:45] RECOVERY - SSH on integration-slave-docker-1009 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[20:45:07] (03Merged) 10jenkins-bot: tox-doc publish job for docker-pkg [integration/config] - 10https://gerrit.wikimedia.org/r/458894 (owner: 10Hashar)
[20:45:18] maintenance-disconnect-full-disks build 1251 integration-slave-docker-1009 (/: 96%): OFFLINE due to disk space
[20:48:18] (03CR) 10Hashar: "Doc at https://doc.wikimedia.org/docker-pkg/" [integration/config] - 10https://gerrit.wikimedia.org/r/458894 (owner: 10Hashar)
[20:48:38] thcipriani: https://doc.wikimedia.org/docker-pkg/ :]
[20:49:47] fancy!
[20:50:11] maintenance-disconnect-full-disks build 1252 integration-slave-docker-1009: OFFLINE due to disk space
[20:50:50] PROBLEM - SSH on integration-slave-docker-1009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:51:18] PROBLEM - Free space - all mounts on integration-slave-docker-1009 is CRITICAL: CRITICAL: integration.integration-slave-docker-1009.diskspace.root.byte_percentfree (<100.00%)
[20:54:48] hrm, I still can't ssh to that instance :\
[20:55:10] maintenance-disconnect-full-disks build 1253 integration-slave-docker-1009: OFFLINE due to disk space
[20:56:14] RECOVERY - Free space - all mounts on integration-slave-docker-1009 is OK: OK: integration.integration-slave-docker-1009.diskspace._srv.byte_percentfree (More than half of the datapoints are undefined) integration.integration-slave-docker-1009.diskspace.root.byte_percentfree (More than half of the datapoints are undefined)
[20:59:06] No news is good news?
[21:02:40] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar)
[21:03:28] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar)
[21:03:38] I found a new one with Zuul/Gerrit madness https://phabricator.wikimedia.org/T203846
[21:04:02] manually merging a change that is in the gate would cause Gerrit to reject Zuul's request to submit the change
[21:04:12] thus Zuul considers the change has failed and cancels all jobs behind in the queue
[21:04:20] and thus reshuffles all of them :^\
[21:06:00] huh
[21:06:36] that is why some changes have been waiting for 1+ hour
[21:10:04] Hi hashar ^^ I'm not sure if there's an issue but most jobs in the gate-and-submit queue seem to be stuck at +1 hour ?
[21:10:27] Hauskatze he is aware :)
[21:10:39] I've just logged in
[21:10:47] just before you joined he was talking about it :)
[21:10:48] Hauskatze: yeah thank you I noticed that as well
[21:10:50] the reason is https://phabricator.wikimedia.org/T203846
[21:10:56] some changes got manually merged
[21:11:07] * Hauskatze looks
[21:11:17] but the change is still in Zuul
[21:11:26] when the change's jobs are completed, Zuul attempts to submit the change in Gerrit
[21:11:29] but Gerrit rejects it
[21:11:43] aha
[21:11:46] thus Zuul considers the change to have failed to merge and cancels all the jobs behind
[21:11:56] and reenqueues all changes, triggering new jobs
[21:11:58] hashar I believe to set allowPostSubmit you do it in All-Projects
[21:12:03] yea
[21:12:04] or wherever the labels are defined
[21:12:15] I am reading the doc at https://gerrit.wikimedia.org/r/Documentation/config-labels.html#label_allowPostSubmit
[21:12:22] but it is said that it defaults to true :/
[21:19:51] Yippee, build fixed!
[21:19:51] Project beta-scap-eqiad build #221362: 09FIXED in 49 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221362/
[21:21:02] 10Phabricator: AphrontCountQueryException on my Profile Page (T75720) - https://phabricator.wikimedia.org/T203849 (10RheingoldRiver)
[21:21:40] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar) I have tried on https://gerrit.wikimedia.org/r/#/c/test/gerrit-ping/+/458903/ a merged change with Verified+...
[21:21:49] oh my
[21:23:22] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar)
[21:51:47] (03PS1) 10Hashar: wmf: ignore submit error on merged change [integration/zuul] (patch-queue/debian/jessie-wikimedia) - 10https://gerrit.wikimedia.org/r/458914 (https://phabricator.wikimedia.org/T203846)
[21:53:03] (03CR) 10Hashar: "I will have to try, I am not sure about the error content or whether "'change is merged' in err" will actually work :]" [integration/zuul] (patch-queue/debian/jessie-wikimedia) - 10https://gerrit.wikimedia.org/r/458914 (https://phabricator.wikimedia.org/T203846) (owner: 10Hashar)
[21:55:25] 10Continuous-Integration-Infrastructure, 10Gerrit, 10Release-Engineering-Team, 10Patch-For-Review, 10Zuul: Zuul cancel all changes when a change is manually merged - https://phabricator.wikimedia.org/T203846 (10hashar) I don't think that can be fixed in #Gerrit short of reintroducing force message. It is...
[22:00:41] RECOVERY - SSH on integration-slave-docker-1009 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[22:07:31] https://gerrit-review.googlesource.com/c/gerrit/+/195431
[22:13:45] 10Phabricator: AphrontCountQueryException on my Profile Page (T75720) - https://phabricator.wikimedia.org/T203849 (10JJMC89)
[22:19:09] happy week-end!
[22:31:18] 10Release-Engineering-Team (Kanban), 10MediaWiki-General-or-Unknown, 10MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), 10Security, 10User-zeljkofilipin: `npm audit` for mediawiki/core found 24 vulnerabilities - https://phabricator.wikimedia.org/T194280 (10Jdforrester-WMF) a:03Jdforrester-...
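[Editor's note] The workaround hashar sketches in https://gerrit.wikimedia.org/r/458914 is to treat "already merged" as success when Zuul submits a change. In outline — assumed logic with a hypothetical Gerrit client call, and per [21:53:03] even the exact error string to match was still an open question:

```python
# Outline of the integration/zuul workaround above (assumed logic;
# `gerrit.submit` stands in for whatever client call Zuul actually makes).
def submit_change(gerrit, change):
    try:
        gerrit.submit(change)
    except Exception as err:
        if 'change is merged' in str(err):
            # Someone merged the change by hand: treat it as submitted
            # instead of failing and reshuffling the whole gate queue.
            return
        raise
```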
[22:39:34] 10Gerrit, 10Patch-For-Review: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10thcipriani) 05Open>03Resolved a:03mmodell @mmodell found some suspect messages in phab. Editing permissions for scoring seems to have done the trick! Now: https://github.com/wikim...
[23:38:12] PROBLEM - Puppet errors on deployment-certcentral-testclient03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]