[00:22:57] ejegg|away: ED runs on a one hour cron
[00:30:45] * paladox wonders how to run google-java-format for all files
[00:31:17] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[00:32:41] PROBLEM - Free space - all mounts on integration-slave-jessie-1003 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1003.diskspace._srv.byte_percentfree (<44.44%)
[00:41:19] paladox: We should get some kind of jenkins job running for our ops/software/gerrit/gerrit branch
[00:41:29] yeh
[00:41:38] no_justification we will need to install bazel in ci
[00:41:44] Yeah :\
[00:42:21] I don't think that will be hard
[00:42:30] in docker we could do something like
[00:43:11] https://blog.bazel.build/2016/01/27/continuous-integration.html makes it sound like docker no bueno?
[00:43:25] no_justification something like https://github.com/GerritCodeReview/gerrit-ci-scripts/blob/master/jenkins-docker/slave-bazel/Dockerfile#L3 or we could do
[00:43:26] https://gerrit-review.googlesource.com/c/gerrit-ci-scripts/+/157271
[00:43:37] no_justification nope docker is supported we use it upstream
[00:44:24] Ah ok
[00:44:28] Granted, that post is from 2016
[00:45:15] yep
[00:45:23] they publish the debs through github
[00:45:24] heh
[00:47:38] It would be nice if the tools/ directory in plugins could be a submodule or something
[00:47:40] So much copy+paste
[00:48:05] no_justification hmm, it's because it has customisations for that plugin
[00:48:13] Yeah, 3 lines :\
[00:48:19] yep
[00:48:35] ./tools/bzl has no customizations
[00:48:42] yeh
[00:49:12] ./tools/bazel.rc has no customizations
[00:49:29] yeh
[00:49:35] i think it's mostly the tests
[00:49:43] (the one that adds junit)
[00:49:47] It's ./eclipse/{BUILD,project.sh} and ./workspace-status.sh
[00:49:53] aha
[00:49:56] yeh that one
[00:49:56] Otherwise, identical
[00:50:58] project.sh could be smart about it :p
[00:51:29] yeh
[00:51:29] heh
[00:52:25] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit: Setup CI for operations/software/gerrit/gerrit - https://phabricator.wikimedia.org/T189549#4045298 (Paladox)
[00:52:27] no_justification ^^ :)
[00:53:16] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit: Setup CI for operations/software/gerrit/gerrit - https://phabricator.wikimedia.org/T189549#4045312 (Paladox) @hashar hi, wondering if you could help us setup this image please for bazel?
[00:58:18] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit: Setup CI for operations/software/gerrit/gerrit - https://phabricator.wikimedia.org/T189549#4045322 (Paladox) This uses a lot of ram 2+ gb so will need to be on a docker instance that has a lot of ram available.
[00:58:46] -echo STABLE_BUILD_WEBHOOKS_LABEL $(rev .)
[00:58:46] +echo STABLE_BUILD_$(echo $(basename $(pwd))_LABEL|tr '[a-z]' '[A-Z]' ) $(rev $(pwd))
[00:58:57] paladox: Something like that would work...
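Editorial sketch of the idea in the diff above: derive the stamp key from the plugin directory name instead of hard-coding `WEBHOOKS`. This is only an illustration under stated assumptions. The helper name `label_key`, the body of `rev`, and the `unknown` fallback are hypothetical and do not come from the log or from the upstream script; only the `STABLE_BUILD_<NAME>_LABEL` shape and the use of `basename`, `pwd` and `tr` are taken from the pasted lines.

```shell
#!/bin/sh
# Sketch of a plugin-agnostic workspace-status.sh, following the diff above.
# label_key and this rev body are illustrative assumptions, not upstream code.

# Print the stamp key for a plugin directory,
# e.g. /path/to/webhooks -> STABLE_BUILD_WEBHOOKS_LABEL
label_key() {
  name=$(basename "$1")
  upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
  printf 'STABLE_BUILD_%s_LABEL\n' "$upper"
}

# Print a git description of the checkout in directory $1, or "unknown"
# when that directory is not a git checkout (the fallback is an assumption).
rev() {
  (cd "$1" && git describe --always --dirty 2>/dev/null) || echo unknown
}

# Emit one "KEY VALUE" line for the current directory.
echo "$(label_key "$(pwd)") $(rev "$(pwd)")"
```

The sketch does not sanitize the directory name, so any character other than a lowercase letter passes through unchanged into the key.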
[00:59:16] ah, you should propose that change upstream :)
[00:59:23] (would at least have my +1) :)
[01:00:08] We'd want a new repo
[01:00:38] As a result: all you need to make a plugin standalone build is to include the submodule and add a WORKSPACE file
[01:00:54] ah :)
[01:01:32] i think proposals for repo creations get asked here https://groups.google.com/forum/#!forum/repo-discuss
[01:02:59] Yeah, I'll write something up
[01:04:02] :)
[01:10:23] PROBLEM - Host deployment-maps01 is DOWN: CRITICAL - Host Unreachable (10.68.22.229)
[01:24:39] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<11.11%)
[01:25:26] RECOVERY - Host deployment-maps01 is UP: PING OK - Packet loss = 0%, RTA = 1.58 ms
[01:26:08] Phabricator, Release-Engineering-Team (Kanban), Operations, Patch-For-Review, and 2 others: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#4045341 (mmodell)
[01:26:11] Phabricator, Release-Engineering-Team (Someday), Operations, Patch-For-Review: Add support for stretch in the phabricator puppet class - https://phabricator.wikimedia.org/T187127#4045340 (mmodell)
[01:33:33] PROBLEM - Host deployment-maps02 is DOWN: CRITICAL - Host Unreachable (10.68.21.90)
[01:35:21] PROBLEM - Host deployment-maps01 is DOWN: CRITICAL - Host Unreachable (10.68.22.96)
[01:49:05] Gerrit, WikimediaUI Style Guide: Setup mirror for gerrit to clone from GitHub for repo design/style-guide - https://phabricator.wikimedia.org/T189370#4045350 (Krinkle) Sounds good. @demon It's not a mirror, in so far that the updating is explicitly not automatic. We are not configuring Gerrit or GitHub t...
[01:55:26] RECOVERY - Host deployment-maps01 is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms
[02:40:19] RECOVERY - Puppet errors on deployment-maps01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:40:29] PROBLEM - Puppet errors on deployment-memc06 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[02:47:37] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit: Setup CI for operations/software/gerrit/gerrit - https://phabricator.wikimedia.org/T189549#4045400 (Paladox) This requires nodejs and python so would need to extend the nodejs and python image :)
[03:20:30] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:11:25] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<30.00%)
[04:47:31] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [10.0]
[04:51:24] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<50.00%)
[05:12:40] PROBLEM - Free space - all mounts on integration-slave-jessie-1003 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1003.diskspace._srv.byte_percentfree (<33.33%)
[05:13:49] Project-Admins: Tags for WCAG levels - https://phabricator.wikimedia.org/T189558#4045538 (tstarling)
[06:07:31] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:08:56] Phabricator, Discourse, MediaWiki-extension-requests: Support diagrams on Phabricator and mediawiki.org - https://phabricator.wikimedia.org/T183689#4045571 (Tgr) Discourse integration would also be nice, so all our technical spaces can use the same diagram markup. Discourse has plugins for [[https://...
[06:22:31] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [10.0]
[06:59:40] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[08:09:28] !log integration: cherry picking https://gerrit.wikimedia.org/r/c/412894/ to fix cumin 3.0.1 | T188112
[08:09:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:09:31] T188112: cumin 3.0.1-1 is broken on labs master - https://phabricator.wikimedia.org/T188112
[08:10:29] hashar: ping?
[08:10:45] SMalyshev: pong ! :)
[08:10:59] hashar: if you have a minute, could you take a look at https://phabricator.wikimedia.org/T160943 ?
[08:11:30] and https://gerrit.wikimedia.org/r/c/415769/
[08:11:34] I was told you know how to do it right
[08:12:38] PROBLEM - Puppet errors on integration-cumin is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[08:13:17] sorry let me finish something
[08:13:18] :D
[08:13:38] hashar: not urgent at all,just wanted some help, doesn't have to be right now :)
[08:13:40] !log integration-cumin: ln -s /usr/local/lib/python3.4 /usr/local/lib/python3 (HACK) | T188112
[08:13:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:14:00] thanks!
[08:14:06] SMalyshev: looking :)
[08:15:04] SMalyshev: ah yeah that should be straightforward. I will have a look at it this afternoon
[08:15:13] thank you!
[08:15:17] SMalyshev: we did the same for the wikimedia/portals :]
[08:15:31] yeah I know I copied it from there :)
[08:16:40] Release-Engineering-Team (Kanban), Discovery, Wikidata, Wikidata-Query-Service, and 3 others: Automate WDQS GUI deployment - https://phabricator.wikimedia.org/T160943#4045727 (hashar) We did something very similar for the Wikimedia portal website ( T180777 ). Stas has a definitely promising patch...
[08:17:09] SMalyshev: and I think I have troubles running maven for that repo when in a docker container. But I don't have all the details :D
[08:17:20] expect a task from me at some point this week hehe
[08:17:31] sure
[08:17:37] brb folk ringing at the door
[08:22:37] RECOVERY - Puppet errors on integration-cumin is OK: OK: Less than 1.00% above the threshold [0.0]
[08:55:01] Differential, Security-Reviews: Arcanist security review (before being used in WMF deployments) - https://phabricator.wikimedia.org/T613#10132 (Bawolff) Just following up on old security review tasks. AFAICT, the transition to arc seems pretty stalled in general. Is there still a security review needed...
[09:16:14] (PS1) Thiemo Kreuz (WMDE): Simplify overly specific "if ( $fix === true )" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419123
[09:18:32] (PS1) Thiemo Kreuz (WMDE): Remove overly specific "isset() === true" and "empty() === true" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419124
[09:24:42] (PS1) Thiemo Kreuz (WMDE): Add missing newlines to multi-line if() [tools/codesniffer] - https://gerrit.wikimedia.org/r/419126
[09:35:14] Continuous-Integration-Infrastructure, Technical-Debt: Phaseout CI mediawiki config / extensions_load.txt to load extensions - https://phabricator.wikimedia.org/T189567#4045813 (hashar)
[09:35:49] Continuous-Integration-Infrastructure, Technical-Debt: Phaseout CI mediawiki config / extensions_load.txt to load extensions - https://phabricator.wikimedia.org/T189567#4045823 (hashar)
[09:41:27] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Patch-For-Review, Technical-Debt: Phaseout CI mediawiki config / extensions_load.txt to load extensions - https://phabricator.wikimedia.org/T189567#4045839 (hashar) a:hashar
[09:45:05] Continuous-Integration-Infrastructure, Operations-Software-Development: cumin 3.0.1-1 is broken on labs master - https://phabricator.wikimedia.org/T188112#4045845 (Volans) I've split the WMCS part into a separate CR that can be merged independently of production: https://gerrit.wikimedia.org/r/c/419131
[09:46:54] !log integration: cherry pick PS3 of https://gerrit.wikimedia.org/r/#/c/412894/ | T188112
[09:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:46:57] T188112: cumin 3.0.1-1 is broken on labs master - https://phabricator.wikimedia.org/T188112
[09:50:47] !log integration: cherry pick https://gerrit.wikimedia.org/r/#/c/419131/1 | T188112
[09:50:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:00:36] Continuous-Integration-Infrastructure, Operations-Software-Development, Patch-For-Review: cumin 3.0.1-1 is broken on labs master - https://phabricator.wikimedia.org/T188112#4045887 (hashar)
[10:07:20] Scap, Operations, Packaging, Patch-For-Review: Install git-lfs client (at least on scap targets & masters) - https://phabricator.wikimedia.org/T180628#4045892 (akosiaris) >>! In T180628#4044173, @mmodell wrote: > @akosiaris: I think it's needed on masters, at least to enable deployers to issue gi...
[10:39:49] !log Update cxserver to fd2c4be
[10:39:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:42:48] Scap, Operations, Packaging, Patch-For-Review: Install git-lfs client (at least on scap targets & masters) - https://phabricator.wikimedia.org/T180628#4045978 (mmodell) @akosiaris: **tl;dr** I can't think of any reason that we //must// have `git-lfs` on masters, I've only got vague hand-wavy not...
[10:46:39] Beta-Cluster-Infrastructure, Cloud-Services, cloud-services-team, Privacy, Vuln-Infoleak: logstash-beta: stop exposing IP addresses to the public - https://phabricator.wikimedia.org/T189489#4045994 (MarcoAurelio) As refuted elsewhere, yes it is. But gladly that this is being resolved.
[10:54:08] PROBLEM - Host deployment-videoscaler01 is DOWN: CRITICAL - Host Unreachable (10.68.19.130)
[10:54:52] PROBLEM - Host deployment-tmh01 is DOWN: CRITICAL - Host Unreachable (10.68.16.211)
[11:21:18] Differential, Security-Reviews: Arcanist security review (before being used in WMF deployments) - https://phabricator.wikimedia.org/T613#4046094 (Aklapper) @Bawolff: T119908#4033583 implies that there are no plans to move from Gerrit to Differential (Differential uses Arcanist). I guess this task could b...
[11:22:56] Differential, Security-Reviews: Arcanist security review (before being used in WMF deployments) - https://phabricator.wikimedia.org/T613#4046099 (Bawolff) Open>declined Ok. Please feel free to reopen if the situation changes.
[11:24:07] Beta-Cluster-Infrastructure, Patch-For-Review, Privacy, User-MarcoAurelio: Disable the collection of private information on abusefilter log for Beta Cluster wikis - https://phabricator.wikimedia.org/T188862#4046103 (MarcoAurelio) CheckUser was removed from Beta for the same reasons. Prevent peopl...
[11:25:13] Project-Admins: Standardize time frame forms in project names - https://phabricator.wikimedia.org/T134134#4046106 (Aklapper) Open>Resolved a:Aklapper No replies; hence assuming T134134#2703477 is sufficient
[11:26:16] !log Update cxserver to bd2ccfc
[11:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:27:04] Project-Admins: Tags for WCAG levels - https://phabricator.wikimedia.org/T189558#4046126 (Aklapper) This reminds me of T136173 but difference is that this task (T189558) actually offers a use case. So it's worth a shot I guess
[11:32:04] Release-Engineering-Team (Kanban), User-zeljkofilipin: LanguageScreenshotBot uploads files to Commons without a license - https://phabricator.wikimedia.org/T184732#4046153 (zeljkofilipin) >>! In T184732#4035380, @zeljkofilipin wrote: > There is no way to //change// text of a file via the API?! > From htt...
[11:35:40] (PS1) Thiemo Kreuz (WMDE): Skip closing parentheses in "( )" and "[ ]" instead of rechecking [tools/codesniffer] - https://gerrit.wikimedia.org/r/419149
[11:37:41] PROBLEM - Free space - all mounts on integration-slave-jessie-1003 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1003.diskspace._srv.byte_percentfree (<44.44%)
[11:40:44] Beta-Cluster-Infrastructure, Privacy: Flush private data on Beta Cluster - https://phabricator.wikimedia.org/T189541#4046203 (MarcoAurelio) Wikimedia (the one who runs Beta) should comply with their own rules: https://wikitech.wikimedia.org/w/index.php?title=Wikitech:Labs_Terms_of_use&oldid=1771474#What_...
[11:41:19] Release-Engineering-Team, DNS, Operations, Traffic, and 2 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776#4046204 (jhsoby) >>! In T188776#4021634, @Varnent wrote: >>>! In T188776#4021611, @Bawolff wrote: >> That sa...
[11:42:12] Release-Engineering-Team (Kanban), User-zeljkofilipin: LanguageScreenshotBot uploads files to Commons without a license - https://phabricator.wikimedia.org/T184732#4046208 (zeljkofilipin) Open>Resolved See [[ https://commons.wikimedia.org/wiki/Category:VisualEditor-de | Category:VisualEditor-de ]...
[11:42:18] (PS1) Thiemo Kreuz (WMDE): Check for close parenthesis first and shorten out earlier [tools/codesniffer] - https://gerrit.wikimedia.org/r/419150
[11:55:38] (PS1) Thiemo Kreuz (WMDE): Skip empty () and [], not processing closing token a second time [tools/codesniffer] - https://gerrit.wikimedia.org/r/419152
[12:00:29] (PS2) Hashar: parsoidsvc-hhvm-parsertests to Docker [integration/config] - https://gerrit.wikimedia.org/r/418984
[12:01:28] (PS3) Hashar: Setting portals JJ to commit to portals/deploy [integration/config] - https://gerrit.wikimedia.org/r/393252 (https://phabricator.wikimedia.org/T180777) (owner: Jdrewniak)
[12:07:44] (CR) Thiemo Kreuz (WMDE): "Here is a comparison of the profiler data before the 3 patches I uploaded here, and after: https://phabricator.wikimedia.org/F15262520. No" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419152 (owner: Thiemo Kreuz (WMDE))
[12:13:19] (CR) Hashar: Setting portals JJ to commit to portals/deploy (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/393252 (https://phabricator.wikimedia.org/T180777) (owner: Jdrewniak)
[12:13:46] (PS4) Hashar: Setting portals JJ to commit to portals/deploy [integration/config] - https://gerrit.wikimedia.org/r/393252 (https://phabricator.wikimedia.org/T180777) (owner: Jdrewniak)
[12:17:24] (CR) Hashar: [C: 2] "There was a minor gotcha related to "cd prod". The git hook installation failed since in submodules ".git" is a file :]" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/393252 (https://phabricator.wikimedia.org/T180777) (owner: Jdrewniak)
[12:17:41] RECOVERY - Free space - all mounts on integration-slave-jessie-1003 is OK: OK: All targets OK
[12:19:07] (Merged) jenkins-bot: Setting portals JJ to commit to portals/deploy [integration/config] - https://gerrit.wikimedia.org/r/393252 (https://phabricator.wikimedia.org/T180777) (owner: Jdrewniak)
[12:30:53] PROBLEM - Puppet errors on deployment-ores01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:59:46] "npm WARN install:err-code@1.1.2 ENOSPC: no space left on device, open '/src/node_modules/.staging/err-code-e70e8b58/index.js'" - https://integration.wikimedia.org/ci/job/wikimedia-portals-npm-browser-node-6-docker/38/consoleFull - ooopsy daisy
[13:07:10] Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: Upgrade WebdriverIO to 4.9 - https://phabricator.wikimedia.org/T180144#4046477 (zeljkofilipin) a:zeljkofilipin
[13:08:47] Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: Upgrade WebdriverIO to 4.12.0 - https://phabricator.wikimedia.org/T180144#3748284 (zeljkofilipin)
[13:13:58] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 out of disk space - https://phabricator.wikimedia.org/T189587#4046494 (Gilles)
[13:18:39] hashar ^^
[13:18:49] I think we should rebuild it with a bigger storage
[13:28:45] rm -Rf * :P
[14:05:59] paladox: na we should stop running jobs on permanent slaves :D
[14:08:02] oh
[14:10:04] I am off, going to work outside this afternoon
[14:24:09] out of space: https://integration.wikimedia.org/ci/job/php55lint/56890/console ??
[14:24:55] yes
[14:25:12] integration-slave-jessie-1002
[14:26:00] it's passing now
[14:27:41] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 out of disk space - https://phabricator.wikimedia.org/T189587#4046494 (MarcoAurelio) Apparently too on integration-slave-jessie-1002.
[14:34:08] Phabricator: Build a bot that pushes Phabricator updates to Google Chat - https://phabricator.wikimedia.org/T189313#4046664 (Aklapper) Seeing how quickly proprietary projects can decide to disable team productivity features (like [[ https://get.slack.help/hc/en-us/articles/201727913-Connect-to-Slack-over-IRC...
[14:53:16] PROBLEM - Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76)
[14:55:16] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 out of disk space - https://phabricator.wikimedia.org/T189587#4046742 (Paladox) p:Triage>Unbreak!
[14:57:34] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 out of disk space - https://phabricator.wikimedia.org/T189587#4046494 (Marostegui) Just got it too on integration-slave-jessie-1001
[15:07:34] twentyafterfour we can customize favicons now https://secure.phabricator.com/T13103
[15:07:48] (will need to at least do that anyways when we next update)
[15:12:43] Release-Engineering-Team (Kanban), Discovery, Wikidata, Wikidata-Query-Service, and 3 others: Automate WDQS GUI deployment - https://phabricator.wikimedia.org/T160943#3115759 (phuedx) @hashar: Thanks for taking a look at the change. Readers Web are also interested in this topic in general as we m...
[15:43:14] Phabricator: Setting up phabricator for the first results in schema errors - https://phabricator.wikimedia.org/T189601#4046934 (Paladox)
[15:49:30] Phabricator: Setting up phabricator for the first results in schema errors - https://phabricator.wikimedia.org/T189601#4046971 (Aklapper) How did you "test out" (or "set up" according to the task summary?) on phab-stretch.wmflabs.org ? What are steps to reproduce? Where does the upstream Phabricator code to...
[15:50:32] Phabricator: Setting up phabricator for the first results in schema errors - https://phabricator.wikimedia.org/T189601#4046985 (Paladox) Ah found it, it was removed here https://github.com/wikimedia/phabricator/commit/f177f92217c21c12118347ac83d41d1f28a29080 the steps to reproduce was just run bin/storage u...
[15:55:56] Phabricator: Setting up phabricator for the first results in schema errors - https://phabricator.wikimedia.org/T189601#4047020 (Paladox)
[15:57:15] Phabricator: Setting up phabricator for the first results in schema errors - https://phabricator.wikimedia.org/T189601#4046934 (Paladox)
[15:58:46] greg-g: I’ve got a new extension to load on the beta cluster, should I make a window for that or just be bold?
[15:58:58] It’s passed security review :D
[16:05:09] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Patch-For-Review, Technical-Debt: Phaseout CI mediawiki config / extensions_load.txt to load extensions - https://phabricator.wikimedia.org/T189567#4047054 (Legoktm) > For the context, we used to have Jenkins bui...
[16:05:59] (PS1) MarcoAurelio: [test] grant 'submit' to gerrit managers so they can merge their changes on refs/meta/config branches [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/419215
[16:06:51] (PS2) MarcoAurelio: [test] grant 'submit' to gerrit managers [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/419215
[16:07:03] (CR) Paladox: [C: -1] "This won't work, this will only work if this group was added as a owner for All-Projects" [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/419215 (owner: MarcoAurelio)
[16:07:50] (CR) MarcoAurelio: "> This won't work, this will only work if this group was added as a" [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/419215 (owner: MarcoAurelio)
[16:08:00] (Abandoned) MarcoAurelio: [test] grant 'submit' to gerrit managers [All-Projects] (refs/meta/config) - https://gerrit.wikimedia.org/r/419215 (owner: MarcoAurelio)
[16:13:21] My wikitech login doesn’t work on logstash-beta.wmflabs.org, what am I missing...
[16:18:15] awight: hrm, it didn't used to have a login, but no_justification was looking at doing something about that yesterday. Don't know what the outcome was.
[16:18:39] ooh sounds fun. I’ll chill out, in that case.
[16:18:40] bd808 put a login on
[16:19:11] Are beta cluster logs available on a filesystem we can ssh to?
[16:19:18] read the dialog. it tells you how to get the user and password
[16:19:21] deployment-fluorine
[16:19:23] ty
[16:19:53] * bd808 was forced to put the password protection on by people with more organizational clout
[16:19:55] bd808: fwiw, I have a generic dialog which doesn’t suggest how to get creds
[16:19:57] argh.
[16:19:59] ok
[16:20:15] I’m using Safari 11.0.3, MacOS
[16:20:17] 'https://logstash-beta.wmflabs.org is requesting your username and password. The site says: “Logstash (ssh deployment-tin.eqiad.wmflabs sudo cat /root/secrets.txt)”'
[16:21:04] if your web browser does not show the realm description that sounds like a bug to file upstream
[16:21:25] works for me, and I like the solution a lot :)
[16:21:34] (though I’m on the fence on whether this needed a solution at all)
[16:21:43] bd808: https://snag.gy/Y3Atyn.jpg
[16:22:15] awight: yeah, that's a crappy client :)
[16:22:19] I’ll add password instructions to the beta cluster wiki
[16:24:11] https://www.mediawiki.org/w/index.php?title=Beta_Cluster&diff=2735842&oldid=2711427&diffmode=source
[16:26:21] no_justification: fyi, I don’t have permission to login to deployment-fluorine. Only mentioning in case this is unexpected.
[16:27:02] Project beta-code-update-eqiad build #197646: FAILURE in 4 min 0 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/197646/
[16:27:09] awight: I'm out this week, please ask Chad.
[16:27:19] o/ have fun!
[16:31:18] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:32:44] PROBLEM - Puppet errors on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:33:26] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts]
[16:34:21] Yippee, build fixed!
[16:34:21] Project beta-code-update-eqiad build #197647: FIXED in 1 min 19 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/197647/
[16:34:59] no_justification: I hear you’re deployment gatekeeper today, so wanted to check in: Extension:JADE passed security review and I’m planning to enable on the beta cluster. Should I reserve a window, or is right about now a decent time to do this?
[16:35:28] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:58:26] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[17:02:46] RECOVERY - Puppet errors on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:06:17] RECOVERY - Puppet errors on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:10:25] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:13:26] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 out of disk space - https://phabricator.wikimedia.org/T189587#4046494 (fgiunchedi) Would be helpful if we had a cleanup policy for artifacts so operators don't have to manually chase and delete things to recover disk space.
[17:16:27] RECOVERY - Free space - all mounts on integration-slave-jessie-1001 is OK: OK: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)
[17:21:04] New error while deploying on beta: 17:20:13 scap failed: IOError [Errno 13] Permission denied: u'/srv/mediawiki-staging/php-master/cache/gitinfo/info-extensions-BlueSpiceUEModulePDF.json' (duration: 00m 50s)
[17:24:57] sudo chmod g+w
[18:00:38] (PS1) Awight: Add new extension JADE to the branch config [tools/release] - https://gerrit.wikimedia.org/r/419242 (https://phabricator.wikimedia.org/T176333)
[18:12:56] (CR) Chad: [C: 2] Add new extension JADE to the branch config [tools/release] - https://gerrit.wikimedia.org/r/419242 (https://phabricator.wikimedia.org/T176333) (owner: Awight)
[18:13:37] (Merged) jenkins-bot: Add new extension JADE to the branch config [tools/release] - https://gerrit.wikimedia.org/r/419242 (https://phabricator.wikimedia.org/T176333) (owner: Awight)
[18:15:09] Release-Engineering-Team (Kanban), User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4047509 (zeljkofilipin) This might be useful. http://shawnzhu.blogspot.hr/2014/04/feedback-information-from.html
[18:16:50] no_justification: On that note... I think I've done everything on https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#How_to_add_a_new_extension_on_beta
[18:17:07] Extension:JADE still isn't showing up, do you know if that checklist is correct?
[18:17:28] It is :)
[18:17:49] I just edited it last week or the week before
[18:18:10] * awight looks over glasses nose
[18:19:26] Can I look after the train?
[18:20:11] no_justification: no rush, I just don't want to leave things broken!
[18:22:49] This is all kind of in flux as well :)
[18:22:59] in flux? In motion?
[18:23:05] Idk, I'm kinda brain dead this morning
[18:24:30] Too early! I think everything's stable, I'll stop tinkering until the train goes by.
[18:29:18] awight|lunch: Skimmed wmf-config, I see no immediate issues with JADE
[18:29:54] wonderful, sorry to interrupt your "sanity" time
[18:30:29] After the train, I'll try to debug and will probably knock at your door with eyes full of tears.
[18:31:55] Beta-Cluster-Infrastructure, Privacy: Flush private data on Beta Cluster - https://phabricator.wikimedia.org/T189541#4047535 (Tgr) All of that section is about user obligations ("you must...") not operator obligations. Anyway it does not seem like there is disagreement about the concrete steps to take.
[18:34:35] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 hphp_invoke - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 287 bytes in 0.006 second response time
[18:54:29] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:12:58] is there any way to debug what is happening on CI or recreate the environment? I have a test that runs locally and fails weirdly on CI :(
[19:18:27] kinda
[19:18:30] depends which test
[19:18:43] phpunit one
[19:18:47] link?
[19:19:05] https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm-jessie/38855/console
[19:19:30] maybe hhvm thing though... CI runs on hhvm right?
[19:19:41] it says hhvm in the job name ;)
[19:19:41] I'll start with installing hhvm and trying with it
[19:19:50] ah yes :) who reads thos?
[19:19:55] *those
[19:20:15] if you comment "check php" on the task it'll run the tests with php5/php7
[19:20:19] so you can see if it's a runtime difference
[19:20:28] good idea
[19:20:34] didn't know that
[19:20:41] thanks!
[19:22:02] who needs self documenting things
[19:22:19] otherwise I'd suspect there's some configuration setting affecting behavior or some data not being bootstrapped properly
[19:23:26] yeah that's a definite possibility I am just not sure which one without local reproduction... it /seems/ straightforward but something is clearly not right
[19:48:26] Continuous-Integration-Config: mwgate-npm-node-6-docker failed on a patch with an error saying "No space left on device" - https://phabricator.wikimedia.org/T189616#4047745 (SamanthaNguyen)
[19:51:22] wow the docker workspaces are giant
[20:09:43] Phabricator, Release-Engineering-Team (Kanban), Phlogiston: Phlogiston reports don't have new data since mid-February - https://phabricator.wikimedia.org/T188149#4047813 (JAufrecht) What kind of information would be helpful? I could examine the dump file to give more information over what's above....
[20:17:05] (CR) Legoktm: "Awesome :)" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419152 (owner: Thiemo Kreuz (WMDE))
[20:17:14] (PS2) Legoktm: Optimize ClassMatchesFilename sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/418000 (owner: Thiemo Kreuz (WMDE))
[20:17:32] (CR) Legoktm: [C: 2] "Yeah, you're right." [tools/codesniffer] - https://gerrit.wikimedia.org/r/418000 (owner: Thiemo Kreuz (WMDE))
[20:18:44] (Merged) jenkins-bot: Optimize ClassMatchesFilename sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/418000 (owner: Thiemo Kreuz (WMDE))
[20:19:29] (CR) jenkins-bot: Optimize ClassMatchesFilename sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/418000 (owner: Thiemo Kreuz (WMDE))
[20:20:33] (CR) Legoktm: [C: 2] "As for why, it's because the upstream codesniffer style uses this format." [tools/codesniffer] - https://gerrit.wikimedia.org/r/419123 (owner: Thiemo Kreuz (WMDE))
[20:20:49] (CR) Legoktm: [C: 2] Remove overly specific "isset() === true" and "empty() === true" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419124 (owner: Thiemo Kreuz (WMDE))
[20:21:24] (Merged) jenkins-bot: Simplify overly specific "if ( $fix === true )" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419123 (owner: Thiemo Kreuz (WMDE))
[20:21:28] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 out of disk space - https://phabricator.wikimedia.org/T189587#4047891 (Paladox)
[20:21:30] Continuous-Integration-Config: mwgate-npm-node-6-docker failed on a patch with an error saying "No space left on device" - https://phabricator.wikimedia.org/T189616#4047893 (Paladox)
[20:21:42] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 and integration-slave-jessie-1002 out of disk space - https://phabricator.wikimedia.org/T189587#4046494 (Paladox)
[20:21:49] (CR) jenkins-bot: Simplify overly specific "if ( $fix === true )" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419123 (owner: Thiemo Kreuz (WMDE))
[20:23:53] Phabricator: Build a bot that pushes Phabricator updates to Google Chat - https://phabricator.wikimedia.org/T189313#4047905 (Framawiki) Just a note : according to https://wikitech.wikimedia.org/wiki/Mail, lots of @wikimedia.org mail address are forwarded to Google's servers. So the independence of our moveme...
[20:24:01] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit: Setup CI for operations/software/gerrit/gerrit - https://phabricator.wikimedia.org/T189549#4047906 (Paladox) Or maybe @legoktm knows? :)
[20:32:18] (PS2) Legoktm: Remove overly specific "isset() === true" and "empty() === true" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419124 (owner: Thiemo Kreuz (WMDE))
[20:32:36] (CR) Legoktm: [C: 2] Remove overly specific "isset() === true" and "empty() === true" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419124 (owner: Thiemo Kreuz (WMDE))
[20:33:28] (Merged) jenkins-bot: Remove overly specific "isset() === true" and "empty() === true" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419124 (owner: Thiemo Kreuz (WMDE))
[20:34:14] (CR) jenkins-bot: Remove overly specific "isset() === true" and "empty() === true" [tools/codesniffer] - https://gerrit.wikimedia.org/r/419124 (owner: Thiemo Kreuz (WMDE))
[20:40:26] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit: Setup CI for operations/software/gerrit/gerrit - https://phabricator.wikimedia.org/T189549#4047943 (Legoktm) The docker slaves we use for phan probably have enough memory for this. Probably a custom docker image makes sense...
[20:41:23] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit: Setup CI for operations/software/gerrit/gerrit - https://phabricator.wikimedia.org/T189549#4047947 (Paladox) @legoktm we can either use the google bazel deb repo or download the deb from GitHub. See https://github.com/Gerr...
[20:41:46] Phabricator: Build a bot that pushes Phabricator updates to Google Chat - https://phabricator.wikimedia.org/T189313#4047950 (Legoktm) >>! In T189313#4046664, @Aklapper wrote: > Of course I can't stop anybody from writing whatever code they are interested in in their free time though. :) Right, it just sound...
[20:47:45] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Gerrit: Setup CI for operations/software/gerrit/gerrit - https://phabricator.wikimedia.org/T189549#4047975 (Legoktm) We would need ops to import that to apt.wikimedia.org then.
[21:04:57] Beta-Cluster-Infrastructure, Privacy: Flush private data on Beta Cluster - https://phabricator.wikimedia.org/T189541#4048022 (MarcoAurelio) Except that in this case host and operator are one and the same: the Wikimedia Foundation.
[21:08:22] PROBLEM - Puppet errors on deployment-mx02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:08:33] legoktm: any chances you could rm -Rf what may be needed in T189587 ?
[21:08:34] T189587: integration-slave-jessie-1001 and integration-slave-jessie-1002 out of disk space - https://phabricator.wikimedia.org/T189587
[21:10:40] /srv/jenkins-workspace/workspace$ sudo rm -rf *
[21:10:48] seems that's the only thing needed
[21:25:35] PROBLEM - Free space - all mounts on integration-slave-docker-1005 is CRITICAL: CRITICAL: integration.integration-slave-docker-1005.diskspace.root.byte_percentfree (<20.00%)
[21:30:00] no_justification a simple javascript fix now turns into a whole throw a 404 if the page does not exist lol
[21:30:09] ref https://gerrit-review.googlesource.com/c/gerrit/+/160132/
[21:30:17] ejegg: AndyRussG: Is https://phabricator.wikimedia.org/T176334 waiting for a review from perf/me, or something else?
[21:31:16] Oh, good question, lemme see
[21:31:28] The ref'ed commit(s) are merged.
[21:31:32] So not entirely sure
[21:32:06] maybe it's just the usage in-banner that needs scrutiny?
[21:38:17] ejegg: Krinkle yes, it's the last bit of code suggested in a comment, that I was hoping another pair of eyes would check over :)
[21:39:09] Krinkle: K just saw your comment there! I'll check it out in a bit, sorry I missed that
[21:40:07] AndyRussG: K. I'll assume this ticket "done" from my POV for the moment, but ping me if/when you'd like another look over, no worries.
[21:40:53] Krinkle: sounds great, thanks much :)
[21:44:23] rel-engers: getting npm ERR! tar.unpack untar error on integration-slave-docker-1002 , is that another symptom of low disk space?
[21:44:34] https://integration.wikimedia.org/ci/job/mwgate-npm-node-6-docker/28424/console
[21:45:15] ejegg hi yes
[21:45:19] it's due to out of space
[21:45:34] ejegg see https://phabricator.wikimedia.org/T189587
[21:46:13] thx, anything I can do to help?
[21:46:31] ejegg unless you can ssh in there
[21:46:37] and rm -rf * in the workspace
[21:46:43] let's see...
[21:46:51] DELETE ALL THE THINGS
[21:47:01] !log maurelio@deployment-tin:~$ foreachwiki extensions/AbuseFilter/maintenance/purgeOldLogIPData.php
[21:47:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:47:50] I think it's via restricted-bastion@wmflabs
[21:48:22] heh, looks like the android app test env takes almost 5G
[21:48:36] -> /srv/jenkins-workspace/workspace$ sudo rm -rf
[21:49:02] weird, but why is the disk only allocated 21G total?
[21:49:39] poof!
[21:49:57] ok Hauskatze, 1002 is cleaned up
[21:50:03] is gerrit in prod really running with all config etc. from a repository right now? - no, not all
[21:50:06] partially though
[21:50:17] some account data is stored in the repo All-Users
[21:51:40] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 and integration-slave-jessie-1002 out of disk space - https://phabricator.wikimedia.org/T189587#4046494 (Ejegg) The biggest dir in /srv/jenkins-workspace/workspace took almost 5G (apps-android-wikipedia-test), while the whole /srv disk is o...
[21:53:44] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 and integration-slave-jessie-1002 out of disk space - https://phabricator.wikimedia.org/T189587#4048123 (Ejegg) OK, jessie-1002 and jessie-1001 are cleaned up. Was getting some untar errors on docker-1002 too, so I'll see if I can clean tha...
[21:54:07] ejegg: wow, I didn't know you could do that :)
[21:54:29] can we re-+2 https://gerrit.wikimedia.org/r/#/c/419184/ and see if it works again?
[21:56:35] Hauskatze: ah, I only got the jessie-1001 and -1002, doing the docker-100x ones new
[21:56:38] *now
[21:56:41] Krenair you can continue the discussion we had in the other channel here if you want?
[21:57:22] legoktm: I'd like to add some color codes into the patch coverage output, would you accept a patch for that? E.g. marking the line that's lower in "New" in red, to make it easier to find (and/or maybe adding a third "Diff" column)
[21:57:59] Currently it takes a bit of dense parsing to figure out which one regressed in eg. https://integration.wikimedia.org/ci/job/mediawiki-phpunit-coverage-patch/1066/console
[22:03:23] Krinkle: can you re+2 https://gerrit.wikimedia.org/r/#/c/419184/ please?
[22:03:32] (got stuck on no disk space)
[22:03:57] it was +2'ed already
[22:06:15] Continuous-Integration-Infrastructure: integration-slave-jessie-1001 and integration-slave-jessie-1002 out of disk space - https://phabricator.wikimedia.org/T189587#4048132 (Ejegg) docker-1002 now also cleaned up, but also had the 5G android test dir on a 21G disk. Seems like the out-of-space errors kicked i...
[22:23:34] Beta-Cluster-Infrastructure: deployment-tin: scripts taking a whole lot more of time to complete - https://phabricator.wikimedia.org/T189631#4048186 (MarcoAurelio)
[22:45:49] mutante: if -devtools is deprecated, issues with Phabricator and such should be reported here instead?
[22:46:31] Hauskatze: afaict, yes
[22:58:10] (CR) BearND: [C: -1] "you could also tag this patch with T177896" [integration/config] - https://gerrit.wikimedia.org/r/415886 (owner: Mholloway)
[23:07:21] (PS3) Mholloway: Add mobileapps-diff-test periodic job [integration/config] - https://gerrit.wikimedia.org/r/415886 (https://phabricator.wikimedia.org/T177896)
[23:08:06] Beta-Cluster-Infrastructure: various .beta.wmflabs.org domains use an invalid ssl certificate - https://phabricator.wikimedia.org/T182927#4048323 (Krenair) https://community.letsencrypt.org/t/acme-v2-and-wildcard-certificate-support-is-live/55579 We'll want to have a look at how much needs to change in acme_...
[23:14:27] (CR) BearND: [C: 1] Add mobileapps-diff-test periodic job (2 comments) [integration/config] - https://gerrit.wikimedia.org/r/415886 (https://phabricator.wikimedia.org/T177896) (owner: Mholloway)
[23:30:06] thank you Krinkle
[23:37:29] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[23:47:01] legoktm: Hm.. latest clover-diff shows it already supports colors, darn
[23:47:15] legoktm: I guess it's not enabling it for Jenkins?