[00:08:17] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [00:17:07] Project selenium-Flow » chrome,beta,Linux,BrowserTests build #517: 04FAILURE in 1 min 6 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/517/ [00:17:15] Project selenium-Flow » firefox,beta,Linux,BrowserTests build #517: 04FAILURE in 1 min 14 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/517/ [01:13:17] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [02:09:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [04:44:19] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [04:47:52] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [05:37:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0] [06:10:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:37:02] 10Release-Engineering-Team (Next), 10Release Pipeline: Find CI container build location - https://phabricator.wikimedia.org/T173128#3613811 (10MoritzMuehlenhoff) [07:41:34] 10Deployment-Systems, 10Wikimedia-Incident: App servers get into bad states when coming back online/are newly provisioned due to puppet/salt craziness - https://phabricator.wikimedia.org/T68050#3613821 (10MoritzMuehlenhoff) 05Open>03declined Salt is being removed. 
[07:41:47] 10Deployment-Systems, 10Operations, 10Beta-Cluster-reproducible, 10Patch-For-Review, 10Puppet: grain-ensure erroneous mismatch with (bool)True vs (str)true - https://phabricator.wikimedia.org/T146914#3613824 (10MoritzMuehlenhoff) 05Open>03declined Salt is being removed. [07:45:16] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [07:54:42] (03CR) 10Hashar: [C: 032] "In Gerrit I have updated the description of a few repos based on your links. Thank you!" [integration/config] - 10https://gerrit.wikimedia.org/r/378518 (https://phabricator.wikimedia.org/T154220) (owner: 10Umherirrender) [07:56:03] (03CR) 10Hashar: [C: 032] Archive Extension:XMLContentExtension [integration/config] - 10https://gerrit.wikimedia.org/r/378495 (https://phabricator.wikimedia.org/T148825) (owner: 10MarcoAurelio) [07:56:19] (03Merged) 10jenkins-bot: Remove some archived extensions [integration/config] - 10https://gerrit.wikimedia.org/r/378518 (https://phabricator.wikimedia.org/T154220) (owner: 10Umherirrender) [07:57:04] (03Merged) 10jenkins-bot: Archive Extension:XMLContentExtension [integration/config] - 10https://gerrit.wikimedia.org/r/378495 (https://phabricator.wikimedia.org/T148825) (owner: 10MarcoAurelio) [08:04:47] (03CR) 10Hashar: [C: 04-1] ":-)" (035 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/378530 (owner: 10Addshore) [08:26:23] (03CR) 10Hashar: [C: 04-1] "That will surely ease people life :]" (033 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/375834 (owner: 10Addshore) [08:31:23] (03CR) 10Hashar: [C: 04-1] docker: zuul-cloner image (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/375834 (owner: 10Addshore) [08:32:00] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [08:32:50] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0] 
[08:33:17] (03CR) 10Hashar: [C: 04-1] docker: composer image (034 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/378531 (owner: 10Addshore) [08:34:23] (03CR) 10Hashar: "Is that needed for CI? Seems to me it is easier to just install git whenever we need it." [integration/config] - 10https://gerrit.wikimedia.org/r/378534 (owner: 10Addshore) [08:34:48] (03CR) 10Hashar: [C: 032] dib: jsduck is now in operations/puppet [integration/config] - 10https://gerrit.wikimedia.org/r/378341 (https://phabricator.wikimedia.org/T175764) (owner: 10Hashar) [08:37:50] (03Merged) 10jenkins-bot: dib: jsduck is now in operations/puppet [integration/config] - 10https://gerrit.wikimedia.org/r/378341 (https://phabricator.wikimedia.org/T175764) (owner: 10Hashar) [08:38:26] (03PS8) 10Hashar: Add missing unit test, npm jobs and make tests voting [integration/config] - 10https://gerrit.wikimedia.org/r/376761 (owner: 10Umherirrender) [08:40:49] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [08:41:03] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [08:45:06] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Watching / External), 10Operations, 10Patch-For-Review, 10Zuul: Migrate zuul-server behind systemd service - https://phabricator.wikimedia.org/T167845#3614093 (10hashar) 05Resolved>03Open contint-admins can not interact with the `zu... 
[08:47:00] PROBLEM - Free space - all mounts on integration-slave-jessie-1001 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1001.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1001.diskspace._srv.byte_percentfree (<100.00%) [08:54:03] (03PS1) 10Hashar: fab: reload zuul via systemd [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) [09:05:29] (03CR) 10Hashar: [C: 032] Add missing unit test, npm jobs and make tests voting [integration/config] - 10https://gerrit.wikimedia.org/r/376761 (owner: 10Umherirrender) [09:07:00] (03Merged) 10jenkins-bot: Add missing unit test, npm jobs and make tests voting [integration/config] - 10https://gerrit.wikimedia.org/r/376761 (owner: 10Umherirrender) [09:24:01] (03PS2) 10Hashar: fab: reload zuul via systemd [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) [09:24:04] (03PS1) 10Hashar: fab: git gc zuul repo on the servers [integration/config] - 10https://gerrit.wikimedia.org/r/378667 [09:36:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [09:48:09] hashar: for when you have a minute, can you make mediawiki/extensions/DataTypes ? https://phabricator.wikimedia.org/T127292 [09:48:23] That's one of the last steps to kill the build step [09:54:04] Morning hashar [09:55:01] hashar: thanks for all the comments, I'll read them today at some point! :AD [10:31:39] 10Gerrit, 10Wikidata, 10User-Ladsgroup, 10Wikidata-Sprint-2016-03-01, 10Wikidata-Sprint-2016-04-12: [Task] Move DataTypes repository from Github to gerrit - https://phabricator.wikimedia.org/T127292#2038409 (10hashar) On behalf of @Ladsgroup , I have created https://gerrit.wikimedia.org/r/#/admin/project... 
[10:31:45] Amir1: https://phabricator.wikimedia.org/T127292#3614340 done [10:31:56] Amir1: still have to add it to mediawiki/extensions and ci though :D [10:32:11] hashar: Thank you! [10:32:18] Great like always [10:32:26] I will make the patch for that [10:32:50] :) [10:39:52] (03PS1) 10AnotherLadsgroup: Add extensions/DataTypes [integration/config] - 10https://gerrit.wikimedia.org/r/378672 (https://phabricator.wikimedia.org/T127292) [10:43:09] (03CR) 10Paladox: [C: 031] fab: reload zuul via systemd [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [10:46:18] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [10:50:05] hashar: is this enough? https://gerrit.wikimedia.org/r/#/c/378672/ [10:50:52] (03CR) 10Paladox: [C: 031] Add extensions/DataTypes [integration/config] - 10https://gerrit.wikimedia.org/r/378672 (https://phabricator.wikimedia.org/T127292) (owner: 10AnotherLadsgroup) [10:54:57] hashar sorry about you cannot reload zuul. I presumed all contint members had full sudo. [10:55:05] paladox: no worries :) [10:55:20] ok thanks :). [11:07:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:11:24] 10Release-Engineering-Team, 10Cleanup, 10Repository-Admins, 10Patch-For-Review, 10User-MarcoAurelio: Deprecate unmaintained/inactive XMLContentExtension - https://phabricator.wikimedia.org/T148825#3614416 (10MarcoAurelio) @demon (or anyone with access): Can you handle the GitHub mirror deletion, please:... 
[11:47:18] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [12:22:54] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #529: 04FAILURE in 53 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/529/ [12:41:14] hashar: what's holding https://gerrit.wikimedia.org/r/#/c/375765/ ? [12:47:00] no_justification: got a minute to delete a GitHub mirror? [12:54:36] hashar: any reason why deployment-memc07 can't be reached from tin but deployment-memc06 can? [12:54:54] tabbycat: I dont know [12:55:21] AaronSchulz: I have no clue. Maybe memcached is not enabled or the instance is missing a security rule compared to memc06 ? -can be checked in horizons) [12:55:48] it's running and I can telnet on the server itself [13:20:12] 10Beta-Cluster-Infrastructure: IP Address Lookup Tool Installation - https://phabricator.wikimedia.org/T176074#3614686 (10Sau226) @Aklapper AbuseFilter can't look up IPs and that I didn't mean throttling. I mean a tool which can return the IP address or a tickbox for global blocking. After all if we can block th... 
[13:33:26] 10Release-Engineering-Team (Kanban), 10Reading-Admin, 10Release, 10Train Deployments, 10User-notice: MW-1.30.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T163512#3614712 (10TheDJ) [13:44:14] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #526: 04FAILURE in 13 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/526/ [14:22:28] 10Beta-Cluster-Infrastructure: Requesting Global Rights - https://phabricator.wikimedia.org/T176140#3614785 (10Anooprao) [14:26:59] 10Beta-Cluster-Infrastructure: Requesting Global Rights - https://phabricator.wikimedia.org/T176140#3614818 (10Anooprao) [14:31:27] 10Beta-Cluster-Infrastructure: Requesting Global Rights - https://phabricator.wikimedia.org/T176140#3614839 (10Reedy) [14:33:40] Project selenium-WikiLove » firefox,beta,Linux,BrowserTests build #520: 04FAILURE in 1 min 39 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/520/ [15:08:17] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:19:16] (03CR) 10Hashar: [C: 04-1] Add extensions/DataTypes (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/378672 (https://phabricator.wikimedia.org/T127292) (owner: 10AnotherLadsgroup) [15:23:29] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #561: 04FAILURE in 1 min 28 sec: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/561/ [15:23:56] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #561: 04FAILURE in 1 min 55 sec: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/561/ [15:43:16] 
RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [15:51:26] 10Beta-Cluster-Infrastructure: Requesting Global Rights - https://phabricator.wikimedia.org/T176140#3614785 (10Aklapper) @Anooprao: Please do not remove me as a task subscriber. (It is up to me which tasks I subscribe to.) Thanks. [15:54:26] 10Beta-Cluster-Infrastructure: Requesting Global Rights - https://phabricator.wikimedia.org/T176140#3615115 (10Anooprao) OK thanks for letting me know. Since it's beta related so i taught not to bother you & I taught you are a BOT [16:13:15] mobile apps clogging up the tubs [16:28:13] Zuul/Nodepool backlog increasing steadily over last 2 hours. [16:28:33] wait time starting to exceed 1.5h [16:28:45] Like I say.. loads of mobileapps stuff appeared [16:29:28] wow no kidding https://graphite.wikimedia.org/render/?areaMode=stacked&height=400&width=800&target=alias(color(zuul.geard.queue.running,%27blue%27),%27Running%27)&target=alias(color(zuul.geard.queue.waiting,%27red%27),%27Waiting%27)&title=Gearman%20job%20queue%20&from=-48h [16:29:58] Handful of wikibase appeared too [16:30:25] 10Deployment-Systems, 10Release-Engineering-Team (Kanban): Automate the recurring management of wikitech:Deployments and phab:#train_deployments - https://phabricator.wikimedia.org/T114488#3615249 (10mmodell) Status update: This should finally get some more attention this week. [16:30:43] thcipriani: ^ fyi [16:30:58] Reedy: Yeah, lots of mobileapps waiting, but not executing. [16:31:07] Not sure is actually filling up executes at this point. [16:31:11] It seems to be doing a whole lot of nothing. [16:31:28] Numerous patches are going through though [16:31:32] jobs are running and executing [16:31:33] hm.. okay [16:31:35] Anything I can help to reduce that? I only uploaded a series of 5 patches for mobileapps once today, not that many times I see in Zuul. 
[16:31:35] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Operations, 10Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3615250 (10demon) p:05High>03Low [16:31:45] Concurrency seems about 10 jobs in Jenkins [16:31:52] Max is twice that [16:31:58] ci-jessie is busy https://integration.wikimedia.org/ci/ [16:32:11] 10MediaWiki-Releasing, 10Release-Engineering-Team (Kanban), 10MW-1.29-release-notes, 10Patch-For-Review: Include release extensions/skins/vendor as submodules of core - https://phabricator.wikimedia.org/T137564#3615252 (10demon) p:05Lowest>03High [16:32:20] bearND: Avoid adding any more for the moment might be useful [16:32:25] Reedy: Yea, but /ci/ only shows jessie slaves once Zuul assigns a job to it (they're on-demand) [16:32:28] Just wait for it to catch up [16:32:43] There are 10/10 there, but that's always N/N. Zuul/Nodepool has 25 or so. [16:32:46] It's not using them it seems [16:34:15] Reedy: Understood. But I can't wait forever. Fortunately I reordered the patches so that next time I do an update it would be only to the last patch. [16:34:31] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Tech Debt Sprint 2017-Q2), 10WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#3615255 (10mmodell) Planning to resolve this during the upcoming quarter. [16:34:35] bearND: I didn't say stop working forever :P [16:35:20] npm-node-6-jessie queued for 59 mins [16:35:26] :P [16:35:47] yeah, lots of dependant patch sets going through. Nodepool seems to be cranking along, afaict. [16:36:12] Not ideal... But it is doing stuff at least [16:36:53] 10Release-Engineering-Team (Kanban), 10Phabricator: Custom task form for #WMF-CTO-Team-Backlog - https://phabricator.wikimedia.org/T175869#3615258 (10mmodell) 05Open>03Resolved @ksmith: I think this one is resolved? Please reopen if you need any changes or further clarification. 
[16:36:55] yeah :( [16:38:29] 10Release-Engineering-Team (Kanban), 10Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499#1627966 (10mmodell) 05stalled>03Open [16:41:11] it's chugging along :) [16:44:23] 10Release-Engineering-Team (Kanban), 10Cleanup, 10Repository-Admins, 10Patch-For-Review, 10User-MarcoAurelio: Deprecate unmaintained/inactive XMLContentExtension - https://phabricator.wikimedia.org/T148825#3615291 (10greg) [16:46:40] 10Continuous-Integration-Config, 10MediaWiki-General-or-Unknown: Make sure extensions using composer/npm for development dependencies have the right .gitignore rules - https://phabricator.wikimedia.org/T116434#3615301 (10greg) [16:47:44] 10Deployment-Systems, 10Release-Engineering-Team (Next), 10Scap (Tech Debt Sprint 2017-Q2): Add jobrunners to Scap canary process - https://phabricator.wikimedia.org/T172480#3615304 (10greg) p:05Triage>03Low [16:48:52] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Next), 10Nodepool: Investigate nodepool slow deletion - https://phabricator.wikimedia.org/T172229#3615310 (10greg) 05Open>03declined [16:50:04] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Set scap.cfg's canary_dashboard_url to useful beta logstash url - https://phabricator.wikimedia.org/T168211#3615314 (10greg) a:03thcipriani [16:51:36] 10Release-Engineering-Team (Next), 10Wikimedia-General-or-Unknown: Work out how to (mass) deploy trivial mediawiki-config changes - https://phabricator.wikimedia.org/T168326#3615317 (10greg) ftr, regarding the backlog of mw-config gerrit changes, @demon recently went through it and abandoned really old untouch... 
[16:53:10] !log removed https://gerrit.wikimedia.org/r/#/c/377753/ from the git cherry-picks in operations/puppet on puppetmaster02 [16:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:56:10] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Next), 10Nodepool: Change time at which Nodepool refresh the images - https://phabricator.wikimedia.org/T166889#3615349 (10greg) 05Open>03declined [16:58:08] 10Release-Engineering-Team (Kanban), 10Cleanup, 10Repository-Admins, 10Patch-For-Review, 10User-MarcoAurelio: Deprecate unmaintained/inactive XMLContentExtension - https://phabricator.wikimedia.org/T148825#3615365 (10demon) Done. [17:00:09] 10Release-Engineering-Team (Kanban), 10Operations, 10Release Pipeline, 10Patch-For-Review: Provision Docker >= 17.05 on contint1001 - https://phabricator.wikimedia.org/T175293#3615375 (10akosiaris) >>! In T175293#3611511, @thcipriani wrote: >>>! In T175293#3599497, @hashar wrote: >> Potentially the require... [17:03:54] howdy, is this the right place to ask about linking a github profile to the wikimedia/puppet repo setup on github? [17:10:48] (03PS1) 10Krinkle: zuul: Don't toggle panel when clicking Gerrit patch link [integration/docroot] - 10https://gerrit.wikimedia.org/r/378743 [17:13:07] (03PS2) 10Krinkle: zuul: Don't toggle panel when clicking Gerrit patch link [integration/docroot] - 10https://gerrit.wikimedia.org/r/378743 [17:14:29] herron: Worth trying :) [17:14:34] herron: What would you like to do? [17:15:38] In order to select wikimedia/puppet on your GitHub profile, via "Customize your pinned repositories" you need to have contributed to it in a way GitHub can detect. [17:15:53] E.g. be on https://github.com/wikimedia/puppet/graphs/contributors [17:16:09] This basically just requires that the e-mail address used for your git patches in Gerrit is also linked to your GitHub account [17:16:43] It's okay to be a secondary and not public e-mail address [17:16:50] ah! 
cool, that answers my question! was wondering how the two connect [17:17:02] and you don't even have to actually verify it [17:17:10] At that point https://github.com/wikimedia/puppet/commit/c44361bd83131f5a523999e76c9490efb602a3c3 will also be linked to your account [17:17:16] with avatar etc. [17:17:29] It actually shows a nice (?) bubble there on that page [17:17:36] although that's not where you'd normally look for that information [17:17:48] Reedy: Hehe yeah, assuming nobody else tries to add it unconfirmed. [17:17:53] I did that with the SVN addy [17:17:57] yeah, ditto [17:18:10] ha indeed I have not yet verified and there it is working [17:18:20] sweet thanks! [17:22:18] RECOVERY - Puppet staleness on deployment-kafka-jumbo-1 is OK: OK: Less than 1.00% above the threshold [3600.0] [17:23:52] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:27:11] 10Release-Engineering-Team, 10Operations, 10Epic, 10Services (watching): FY2017/18 Program 6 - Outcome 2 - Objective 2: Set up a continuous integration and deployment pipeline - https://phabricator.wikimedia.org/T170481#3615520 (10thcipriani) [17:27:16] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10Mathoid, 10Release Pipeline: Define functional tests for Mathoid running on the staging Kubernetes cluster for use in future gating decisions - https://phabricator.wikimedia.org/T170482#3615516 (10thcipriani) 05Open>03Invalid Closing this task... [17:28:22] Krinkle: hiya, have you had a chance to catch up with my rambling about jobrunner deployment on: https://phabricator.wikimedia.org/T129148#3582888 ? [17:49:36] (03CR) 10Dzahn: [C: 031] "sudo rule has been added" [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [17:51:53] thcipriani: So `scap deploy -v --service-restart` issues a restart without deploying code, is that right? 
[17:52:06] and `scap deploy -v` will deploy and auto-restart as needed? [17:53:04] Krinkle: yep, that's correct. Both those commands need to be issued from /srv/deployment/jobrunner/jobrunner on the deployment server [17:54:42] thcipriani: OK. So it's ready in a state to try it out with a deployment? [17:55:18] Krinkle: yes, as far as scap and service masks are concerned, you should be able to run a deployment without issue. [17:55:21] anybody knowns why I am getting this when deploying: 17:42:27 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'wdqs/wdqs', '-g', 'canary', 'fetch', '--refresh-config'] on wdqs1003.eqiad.wmnet returned [70]: Could not chdir to home directory /var/lib/deploy-service: No such file or directory [17:55:29] has anything changed in scap recently? [17:56:19] SMalyshev: hrm, I wonder if the home directory changed for the scap user ^ twentyafterfour did that patch merge? [17:56:42] thcipriani: What about https://gerrit.wikimedia.org/r/#/c/360856/ and https://gerrit.wikimedia.org/r/#/c/376159/ [17:57:22] yes, i did [17:57:36] https://gerrit.wikimedia.org/r/#/c/365891/ [17:57:53] Krinkle: https://gerrit.wikimedia.org/r/#/c/376159/2 will need to merge to update the scap config. It contains the other gerrit patch which can be abandoned. [17:58:43] thcipriani: so what needs to be done to make scap deploy work? is there some puppet part missing? [17:58:50] mutante: hrm, but that patch has managehome => true so you would think it would have created /var/lib/deploy-service on wdqs1003 [17:59:51] thcipriani: Aye, so it's not yet ready to go - that patch needs to be pulled in first [17:59:59] SMalyshev: from the looks of the error message wdqs1003 is missing the /var/lib/deploy-service directory which is now set as the deploy-service home directory. I imagine creating that directory and ensuring deploy-service has x access on it should make scap work. [18:00:09] yes, indeed [18:00:11] reading docs [18:00:23] thcipriani: I'm landing it now. 
Will do a simple no-op deploy later today (2-3 hours)
[18:00:40] gehel: could you create that dir? I don't have permissions...
[18:00:41] Krinkle: right, sorry, missed that step. Merge and pull that down to tin and then scap should do the right thing. Thank you!
[18:01:02] SMalyshev: I'm checking the puppet code first, but will do ...
[18:01:11] thcipriani: yes, it clearly says it should create that dir.. hmmmmm
[18:01:31] let's fix it in puppet right away and not even do manual things?
[18:01:42] that would be ideal
[18:02:44] the creation of the user in scap::target is guarded by an "if defined()", checking if that user is also declared somewhere else...
[18:03:24] hrm, possible, but if deploy-service were defined elsewhere I wouldn't think the home directory would be set to /var/lib/deploy-service...
[18:03:32] also grepping
[18:03:53] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[18:04:06] yeah, I have no idea, but since we do have managehome => true, there is probably something more...
[18:04:56] i was going to say, let's just add a file{} in wdqs module init.pp
[18:05:02] that creates the dir
[18:05:10] with a comment.. at least better than manual mkdir
[18:05:31] the problem is probably more generic than that, so let's spend at least a few minutes to see if we can get to the bottom of it...
[18:06:14] if we don't find anything, we can add that dir in puppet or manually (I don't think adding it in puppet without understanding why it's not there is much better than creating manually)
[18:06:52] puppet docs clearly say "will create the home directory when ensure => present" but ..
it does not [18:07:22] 10Release-Engineering-Team, 10Page-Previews, 10Readers-Web-Backlog, 10Performance-Team (Radar), 10User-zeljkofilipin: Provide a reliable test environment that mimics production for running integration tests - https://phabricator.wikimedia.org/T174786#3615676 (10Krinkle) [18:07:28] i think it's still much better in puppet because there will be the next server soon where the same thing happens? [18:07:30] * gehel is going to blame our code before bplaming puppet on that one... [18:09:11] so, the user exists, it is in /etc/passwd WITH that homedir, it even has bash as shell [18:10:07] the user does not exist on any of the wdqs nodes [18:10:15] deploy-service:x:998:997::/var/lib/deploy-service:/bin/bash [18:10:38] oh [18:11:21] it's looking for /var/lib/deploy-service and dir doesnt exist, but user is there [18:12:00] yeah I see it on all servers [18:12:39] none has the var/lib/deploy-service dir though [18:12:45] I kind of wonder if the useradd provider creates the directory if the user already exists and the directory is changed: https://github.com/puppetlabs/puppet/blob/master/lib/puppet/provider/user/useradd.rb#L126-L134 [18:13:34] i was wondering about the "-" in it, but /etc/adduser.conf says #NAME_REGEX="^[a-z][-a-z0-9_]*\$" [18:14:07] probably what you said, it has to do with changing existing user [18:14:23] i would think it works if the user is deleted and puppet creates it again [18:14:37] yeah, for a service user, we should probably manage the dir explicitly in scap::target [18:14:40] patch coming up [18:15:04] :) [18:16:23] so I tested this with puppet apply locally, changing the home of a user that already exists on the system, even with managehome => true, doesn't seem to create the directory... [18:17:24] https://gerrit.wikimedia.org/r/#/c/378757/ [18:17:24] thcipriani: if you get some spare cycle. Can you try to reload the zuul server with https://gerrit.wikimedia.org/r/#/c/378665/ applied please ? 
:)
[18:17:31] and then if you just "deluser" the user and run puppet again?
[18:17:34] then it creates it fine?
[18:17:44] thcipriani: we switched to systemd and had to tweak the sudo rule
[18:18:05] (CR) Hashar: [C: 1] "I will let a non-root verify and CR+2 this change :]" [integration/config] - https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: Hashar)
[18:18:09] mutante: thank you for the sudo fix :)
[18:18:27] hasharAway: oh, of course, logical follow-up, I mentioned it in ops meeting
[18:18:50] :]
[18:18:54] i think we should probably use "systemctl" for all of that, but i will suggest that later
[18:20:32] I merely copy pasted from the other lines
[18:20:45] seems service is supposed to be the canonical one to use but who knows really
[18:21:09] gehel: not sure if one of those file/user should require the other?
[18:21:21] other than that and my comment on the patch: lgtm
[18:21:23] nope, they should be autorequired
[18:21:27] hashar: yes, consistency first :)
[18:21:28] ah, cool
[18:21:47] https://docs.puppet.com/puppet/latest/types/file.html#file-description
[18:23:00] thcipriani: I'm going to run a puppet compiler just in case, but are you OK with me merging that puppet patch? It probably touches a LOT of systems...
[18:23:06] always forget about the autorequires
[18:23:10] it looks trivial though...
[18:23:47] gehel: yes, as long as puppet compiler is ok, it should be ok.
[18:23:51] looks good to me, and it's a fix :)
[18:24:04] thcipriani: do you have a few nodes that I should check besides wdqs?
[18:24:38] you could try cobalt.wikimedia.org - gerrit uses scap now
[18:24:49] ok, will do
[18:25:20] sca nodes probably make the most use of scap for multiple things
[18:26:59] gehel: why is the check for /var/lib/deploy-user under an if?
[18:27:08] interesting, cobalt does not have any change...
[18:27:50] eh, maybe i was mistaken about the conversion to scap deploy for gerrit
[18:27:53] it uses gerrit2
[18:27:55] but it was definitely under way
[18:28:12] SMalyshev: seems more coherent, since the user and its directory should be managed together. It would probably be even better to find a way to not have an "if defined()"... but that's hard
[18:28:16] hrm, does it manage the user separately?
[18:28:30] it is created by puppet and the deb package.
[18:28:46] gehel: but we do have the user defined, so on existing config it won't create the dir?
[18:28:50] paladox: ah right, so it wouldn't have changes because it doesn't use scap::target to manage the user
[18:29:03] It uses scap::target
[18:29:23] no, defined() does not mean that the user already exists, it means that the user is already managed in some other puppet code.
[18:29:30] but https://github.com/wikimedia/puppet/blob/production/modules/gerrit/manifests/jetty.pp#L68
[18:29:33] the package added the user in the past, so puppet thinks there is nothing to do
[18:29:53] https://github.com/wikimedia/puppet/blob/production/modules/gerrit/manifests/jetty.pp#L70
[18:29:56] yeah
[18:30:01] which is the case if we have multiple scap::targets with the same deploy-user (which is probably a use case that exists)
[18:30:25] ok, puppet compiler is happy, I'm merging
[18:30:42] though any new systems that set up gerrit without the package will have the user created with https://github.com/wikimedia/puppet/blob/production/modules/gerrit/manifests/jetty.pp#L17 , otherwise if the package is installed as a deb then the package will do it, if we don't apply puppet before it.
[18:32:36] ^ good time to recreate a labs VM to see that happen :)
[18:32:50] SMalyshev: directory created on all wdqs nodes (via puppet)
[18:33:43] gehel: thank you!
[18:33:59] coolio!
[18:34:01] thcipriani: happy to help!
[18:34:07] mutante: it seems to work, as recreating gerrit-test3 the other day worked. Without failing with users.
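
Editor's note: the fix discussed above amounts to managing the home directory explicitly instead of relying on it being created together with the user. The following is a minimal Python sketch of that idea only, not the Puppet patch that was merged. The function name `ensure_home_dir` and the default mode 0o755 are assumptions made for illustration, and ownership handling is left out because it needs root.

```python
import os


def ensure_home_dir(path, mode=0o755):
    """Create `path` if it is missing and apply `mode` to it.

    Idempotent: running it against an existing directory only re-applies
    the mode. Returns True when the directory had to be created.
    (Hypothetical helper for illustration; chown is deliberately omitted.)
    """
    created = not os.path.isdir(path)
    # exist_ok=True keeps this safe when the directory is already present,
    # which is the "user already exists" case from the discussion.
    os.makedirs(path, exist_ok=True)
    # makedirs is subject to the umask, so set the mode explicitly.
    os.chmod(path, mode)
    return created
```

The point of the sketch is that the directory check runs on every pass, whether or not the account itself was just created.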
[18:34:15] * gehel is back to his holiday. ping me as needed!
[18:34:16] let's see if that helps deploy
[18:35:00] paladox: :)
[18:35:05] thanks gehel
[18:35:11] enjoy holiday
[18:35:24] :)
[18:45:00] !log reloading zuul to test https://gerrit.wikimedia.org/r/#/c/378665/2
[18:45:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:45:08] Gerrit, DBA: Switch to mariadb java connector once we upgrade to gerrit 2.14 - https://phabricator.wikimedia.org/T176164#3615775 (Paladox)
[18:45:45] hashar: > Fatal error: sudo() received nonzero return code 1 while executing!
[18:45:59] didn't seem to reload :\
[18:49:32] deploy still fails :( looks like scap is completely ignoring gitfat
[18:49:59] gitfat itself works fine if I run it manually, but scap for some reason just doesn't do it
[18:50:34] Gerrit, DBA: Switch to mariadb java connector once we upgrade to gerrit 2.14 - https://phabricator.wikimedia.org/T176164#3615798 (demon) p:Triage>Low
[18:50:41] * thcipriani looks
[18:50:56] e.g. /srv/deployment/wdqs/wdqs-cache/revs/ecdbd0db953b0927a705f984f1eb2429a9020ff1 on wdqs1003
[18:51:34] all previous revs are checked out fine from gitfat, this one has all gitfat files broken. Even ones that didn't change in this rev
[18:52:57] thcipriani: yeah that script needs updating
[18:52:59] I'm not far if you need me, but just taking a few minutes' break...
[18:53:03] can't use it to update zuul right now
[18:53:25] recently switched to systemd for zuul apparently
[18:53:46] SMalyshev: I'm digging
[18:53:59] thcipriani: thank you!
[18:54:12] addshore: yes, it switched to systemd, but also yes, we added a follow-up
[18:54:19] addshore: try using "service zuul .."
[18:54:29] we updated the sudo privileges line just a while ago
[18:54:43] mutante: yep, but the script in the integration/config we use wasn't updated!
[18:54:47] * addshore goes to make a patch [18:54:50] addshore: https://gerrit.wikimedia.org/r/#/c/378664/2/modules/admin/data/data.yaml [18:54:57] aha! ok [18:55:27] (03PS1) 10Addshore: fabfile, switch to service zuul reload [integration/config] - 10https://gerrit.wikimedia.org/r/378763 [18:55:28] mutante thcipriani ^^ [18:56:22] [contint1001.wikimedia.org] out: /bin/bash: service: command not found [18:56:23] mhhmhhmm [18:57:04] (03CR) 10Ladsgroup: Add extensions/DataTypes (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/378672 (https://phabricator.wikimedia.org/T127292) (owner: 10AnotherLadsgroup) [18:57:45] greg-g: May I have a deployment window to graduate the RCFilters beta feature on ca/he/frwiki tomorrow 8am-9am Pacific? [18:57:46] (03PS2) 10Ladsgroup: Add extensions/DataTypes [integration/config] - 10https://gerrit.wikimedia.org/r/378672 (https://phabricator.wikimedia.org/T127292) (owner: 10AnotherLadsgroup) [18:58:07] We could abuse a SWAT window, but neither 6am nor 4pm is a great time and we probably need a scap [18:59:11] mutante: sudo: /usr/sbin/service zuul reload >>> out: Failed to reload zuul.service: Access denied [18:59:12] :( [18:59:18] SMalyshev: ah ha! I think I see the issue. [18:59:34] cool! [18:59:45] addshore: checking [18:59:58] mutante: ahh wait, the scripts switches user to zuul, which isnt in contint-admins so wont work! [19:00:03] SMalyshev: could you run a deploy with the -f flag to force a git_fat repull? Scap thinks its done this before since the directory exists and you have to tell it that just because the directory exists doesn't mean it's done working. [19:00:18] addshore: oh, ok [19:00:25] SMalyshev: so scap deploy -v -f "whatever message for SAL" [19:00:34] ok doing [19:00:56] thcipriani: looks like it worked this time! 
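Note on the git-fat failure above: per the 19:00:03 message, scap skipped the git-fat pull because the revision directory already existed, even though the first run had died part-way. The sketch below illustrates that general pitfall and one common remedy (a marker written only after success, plus a force flag like -f). It is not scap's code; fetch_naive, fetch_with_marker and the .done marker file are hypothetical names made up for illustration.

```python
import os
import tempfile


def fetch_naive(rev_dir, fetch):
    """Fragile check: an existing directory is taken as proof of success."""
    if os.path.isdir(rev_dir):
        return "skipped"
    os.makedirs(rev_dir)
    fetch(rev_dir)
    return "fetched"


def fetch_with_marker(rev_dir, fetch, force=False):
    """Skip only when a marker written after a successful fetch exists."""
    marker = os.path.join(rev_dir, ".done")  # hypothetical marker file
    if os.path.exists(marker) and not force:
        return "skipped"
    os.makedirs(rev_dir, exist_ok=True)
    fetch(rev_dir)
    # Only reached when fetch() did not raise.
    with open(marker, "w") as fh:
        fh.write("ok\n")
    return "fetched"


def demo():
    """Run each strategy twice; the first fetch fails part-way."""
    results = {}
    for name, runner in (("naive", fetch_naive), ("marker", fetch_with_marker)):
        attempts = []

        def flaky_fetch(path, attempts=attempts):
            attempts.append(path)
            if len(attempts) == 1:
                raise RuntimeError("first run fails part-way")

        with tempfile.TemporaryDirectory() as tmp:
            rev_dir = os.path.join(tmp, "rev")
            try:
                runner(rev_dir, flaky_fetch)
            except RuntimeError:
                pass  # the directory is left behind, as in the log
            results[name] = runner(rev_dir, flaky_fetch)
    return results


if __name__ == "__main__":
    print(demo())
```

In the naive version the second run is skipped although nothing was ever fetched, which matches the symptom described in the log; with the marker, the second run retries, and force=True plays the role of the -f flag.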
[19:01:02] !log addshore@contint1001:~$ sudo service zuul reload [19:01:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:01:39] (03CR) 10Addshore: [C: 04-1] fabfile, switch to service zuul reload [integration/config] - 10https://gerrit.wikimedia.org/r/378763 (owner: 10Addshore) [19:01:57] SMalyshev: sorry about all the trouble. When it failed the first time due to the missing user it created the checkout, but didn't complete the run, so on subsequent runs it didn't realize that just because the code was checked out doesn't mean that git-fat succeeded. [19:02:21] thcipriani: I see! thanks for help. looks like it's deploying now [19:02:28] cool :) [19:02:35] * thcipriani stops deploy stalking [19:02:38] :) [19:03:51] (03PS2) 10Addshore: fabfile, switch to service zuul reload [integration/config] - 10https://gerrit.wikimedia.org/r/378763 [19:04:36] (03CR) 10Addshore: [V: 031] fabfile, switch to service zuul reload [integration/config] - 10https://gerrit.wikimedia.org/r/378763 (owner: 10Addshore) [19:05:29] (03CR) 10Dzahn: [C: 031] fabfile, switch to service zuul reload [integration/config] - 10https://gerrit.wikimedia.org/r/378763 (owner: 10Addshore) [19:10:18] thcipriani: just wondering, I was going to add a command to the /config fabfile to do a docker pull command on all of the docker slaves, do you know of an easy way to get a list of all of the docker slaves programmatically rather than hardcoding them in the script? 
[19:11:47] 10Release-Engineering-Team, 10Wikidata, 10Epic, 10User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3615917 (10Addshore) [19:12:06] 10Release-Engineering-Team, 10Wikidata, 10Epic, 10User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3540725 (10Addshore) [19:12:10] 10Gerrit, 10Wikidata: [Task] move git repositories that are dependencies of wikidata to gerrit - https://phabricator.wikimedia.org/T74907#3615923 (10Addshore) [19:12:12] 10Gerrit, 10Wikidata, 10Patch-For-Review, 10User-Ladsgroup, and 2 others: [Task] Move DataTypes repository from Github to gerrit - https://phabricator.wikimedia.org/T127292#3615919 (10Addshore) 05Open>03Resolved [19:12:21] 10Release-Engineering-Team, 10Wikidata, 10Epic, 10User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3540725 (10Addshore) [19:12:24] addshore: hrm. there used to be a way to do this in ldap, i.e. query all the machines that had specific roles assigned, but I think that functionality may be gone. [19:12:39] (or at least I haven't figured out how to make it work again) [19:12:50] 10Release-Engineering-Team, 10Wikidata, 10Epic, 10User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3540725 (10Addshore) [19:16:51] (03PS1) 10Umherirrender: [ExtJSBase] Change to composer unittests [integration/config] - 10https://gerrit.wikimedia.org/r/378770 [19:17:36] (03PS1) 10Addshore: Add all wikidata build exts to make-wmf-branch [tools/release] - 10https://gerrit.wikimedia.org/r/378771 (https://phabricator.wikimedia.org/T173940) [19:20:21] thcipriani: ack! 
[19:21:39] addshore: I have been using salt [19:21:47] * addshore knows nothing about salt :D [19:22:13] or you can potentially query wikitech api.php [19:22:24] I think it exposes the list of instances for a project [19:24:42] hashar: could you add me to the 'admins' for the 'integration' project on 'cloudservices' ?:) [19:24:59] addshore: https://wikitech.wikimedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=&list=novainstances&niproject=integration&niregion=eqiad ! :D [19:25:02] also hashar http://tools.wmflabs.org/openstack-browser/api/dsh/project/integration [19:25:31] oh really [19:25:45] bd just gave me that in -cloud :) [19:26:07] and you already an admin of the project apparently [19:26:12] oh, hmmm [19:26:27] integration-saltmaster asks me for a password for sudo [19:27:20] you gotta grant your self sudo I guess ? https://horizon.wikimedia.org/project/sudo/ [19:27:30] oh... :D [19:28:00] got it, thanks! Thanks new to me! [19:28:03] which probably should be cleaned up [19:29:12] and I reviewed a few of your docker related patches [19:29:22] I am not sure what all those containers are going to be used for [19:29:37] one sure thing: there are bits that we keep using again and again [19:31:08] if y'all are trying to replace salt, you may want to look at https://github.com/bd808/wikimedia-cloud-vps-hostgroup-generator [19:31:42] that's a thing I made to make nice host groups for running clush commands [19:32:24] there is stuff hidden in the openstack-browser tool for targeting by puppet role too I think [19:33:08] https://phabricator.wikimedia.org/source/tool-keystone-browser/browse/master/app.py;3a597a95ffc5ad146872e1cc35c17981a56265a2$226 [19:47:42] 10Gerrit, 10Release-Engineering-Team (Backlog), 10Scap, 10Patch-For-Review: Deploy gerrit with scap3 - https://phabricator.wikimedia.org/T157414#3616038 (10Dzahn) Plugins are now deployed via scap and dropped from the deb package in return. 
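Note on the 19:10:18 question about listing docker slaves without hardcoding them: the openstack-browser dsh URL given at 19:25:02 could feed a fabfile host list. The sketch below assumes, without confirmation from the log, that the endpoint returns plain text with one hostname per line, and that docker slaves share the integration-slave-docker- prefix seen in the Puppet alerts. parse_hosts and docker_slaves are hypothetical helper names.

```python
from urllib.request import urlopen

# URL as pasted in the channel; the response format is an assumption.
DSH_URL = "http://tools.wmflabs.org/openstack-browser/api/dsh/project/integration"
# Prefix taken from the alert hostname integration-slave-docker-1001.
PREFIX = "integration-slave-docker-"


def parse_hosts(text, prefix=PREFIX):
    """Return sorted hostnames from a one-host-per-line listing."""
    hosts = []
    for line in text.splitlines():
        host = line.strip()
        # Ignore blank lines and comment lines, keep only matching hosts.
        if host and not host.startswith("#") and host.startswith(prefix):
            hosts.append(host)
    return sorted(hosts)


def docker_slaves(url=DSH_URL):
    """Fetch the project host list and keep only the docker slaves."""
    with urlopen(url, timeout=10) as resp:
        return parse_hosts(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the HTTP call means the filter can be checked against a pasted sample before anything is run against real hosts.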
[19:49:21] 10Release-Engineering-Team (Kanban), 10Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499#3616040 (10MBinder_WMF) @mmodell far out. :) Any idea how this shows up in the database? CC @JAufrecht for reporting possibilities [19:53:56] hashar: oooh, salt is nice [19:54:14] bad for your health [19:54:18] hah [19:55:40] RoanKattouw: sure thing [19:55:57] Thanks greg-g [20:06:33] (03CR) 10Hashar: [C: 032] [ExtJSBase] Change to composer unittests [integration/config] - 10https://gerrit.wikimedia.org/r/378770 (owner: 10Umherirrender) [20:07:21] (03CR) 10Hashar: [C: 032] Add extensions/DataTypes [integration/config] - 10https://gerrit.wikimedia.org/r/378672 (https://phabricator.wikimedia.org/T127292) (owner: 10AnotherLadsgroup) [20:07:23] (03PS1) 10Addshore: fabric: Add command to pull docker image in all hosts [integration/config] - 10https://gerrit.wikimedia.org/r/378783 [20:07:35] (03CR) 10Addshore: [V: 031] fabric: Add command to pull docker image in all hosts [integration/config] - 10https://gerrit.wikimedia.org/r/378783 (owner: 10Addshore) [20:10:01] (03PS2) 10Addshore: fabric: Add command to pull docker image on all hosts [integration/config] - 10https://gerrit.wikimedia.org/r/378783 [20:10:08] (03CR) 10Addshore: [V: 031] fabric: Add command to pull docker image on all hosts [integration/config] - 10https://gerrit.wikimedia.org/r/378783 (owner: 10Addshore) [20:10:28] (03Merged) 10jenkins-bot: [ExtJSBase] Change to composer unittests [integration/config] - 10https://gerrit.wikimedia.org/r/378770 (owner: 10Umherirrender) [20:10:33] (03Merged) 10jenkins-bot: Add extensions/DataTypes [integration/config] - 10https://gerrit.wikimedia.org/r/378672 (https://phabricator.wikimedia.org/T127292) (owner: 10AnotherLadsgroup) [20:12:37] !log deleted unused images that were *months old* on docker slaves [20:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:14:34] (03PS1) 10Addshore: fabric: fix 
file documentation [integration/config] - 10https://gerrit.wikimedia.org/r/378785 [20:15:37] (03CR) 10jerkins-bot: [V: 04-1] fabric: Add command to pull docker image on all hosts [integration/config] - 10https://gerrit.wikimedia.org/r/378783 (owner: 10Addshore) [20:16:16] (03CR) 10Hashar: "random remarks. I havent looked at the Dockerfile." (034 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/371708 (owner: 10Addshore) [20:17:36] The DataTypes extension is not mirrored in github: https://github.com/wikimedia/mediawiki-extensions-DataTypes I want to enable travis and stuff [20:17:44] (03PS3) 10Addshore: fabric: Add command to pull docker image on all hosts [integration/config] - 10https://gerrit.wikimedia.org/r/378783 [20:18:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%) [20:32:10] (03Abandoned) 10Chad: Add basic scap.cfg [tools/release] - 10https://gerrit.wikimedia.org/r/356430 (owner: 10Chad) [20:33:25] Amir1: tasks please :) [20:34:04] greg-g: sorry, https://phabricator.wikimedia.org/T127292. Is this enough or I should make other ones for mirror and stuff? [20:36:21] thcipriani: Doing the jobrunner deploy now [20:36:32] Krinkle: ok, I'll get on tin and watch [20:36:50] Should be a no-op. Only scap patches since last deploy, and maybe https://github.com/wikimedia/mediawiki-services-jobrunner/commit/5f6099ffa63c depending on whether or not that was effectively rolled out or not [20:37:03] I believe we rolled that back. Anyway, doesn't matter :) [20:38:05] ok :) [20:39:48] Krinkle: saw it go out to both canaries, only restarted on eqiad node according to logs [20:40:21] thcipriani: Yeah, so what is 'config_deploy' ? [20:40:27] It says it skipped that on both. 
[20:41:45] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #521: 04FAILURE in 44 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/521/ [20:41:52] config_deploy is for deploying configuration files with scap3. You can specify files that you want to be able to rebuild as part of deployments. None are configured for jobrunner so that's normal. [20:42:32] https://doc.wikimedia.org/mw-tools-scap/scap3/quickstart/setup.html#config-template-variables [20:42:57] some docs I could find quickly for config_deploy ^ [20:44:43] well. Didn't see any errors in the logs at least... [20:48:48] 10Release-Engineering-Team, 10Wikidata, 10Epic, 10User-Addshore: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818#3616197 (10Addshore) [20:49:21] legoktm: ^^ just moved the 'move wikibase composer deps' from after deploying extensions to before... as of course most of the extensions require one of them...
[20:49:29] so I guess we can start with that again [20:50:57] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 3 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3616198 (10Krinkle) I've updated thcipriani: ^ [20:51:49] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 3 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3616204 (10Krinkle) [20:51:50] * thcipriani reviews [21:01:01] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 3 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3616263 (10demon) Docs lgtm [21:07:16] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 2 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3616268 (10thcipriani) >>! 
In T129148#3616198, @Krinkle wrote: > I've updated 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 2 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3616269 (10Krinkle) [21:07:26] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 2 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3272788 (10Krinkle) [21:07:30] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 2 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3616272 (10thcipriani) [21:09:39] Krinkle: alright, I'm going to close out that task unless you have any other concerns? [21:09:47] thcipriani: nope, lgtm [21:10:15] Krinkle: awesome. Thank you for your help! sorry for the ambush right after you get back from traveling :) [21:10:27] 10Release-Engineering-Team (Kanban), 10JobRunner-Service, 10Operations, 10Beta-Cluster-reproducible, 10Patch-For-Review: jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044#3616275 (10thcipriani) [21:10:28] 10Scap (Scap3-Adoption-Phase1), 10releng-201516-q4, 10releng-201718-q1, 10Trebuchet: [keyresult] Migrate remaining trebuchet deployed services - https://phabricator.wikimedia.org/T129290#3616276 (10thcipriani) [21:10:31] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 2 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3616274 (10thcipriani) 05Open>03Resolved [21:10:48] thcipriani: it's alright. I'm glad it's sorted out now. One less thing to worry about re: jobqueue. 
[21:11:13] * thcipriani nods [21:12:33] 10Scap (Scap3-Adoption-Phase1), 10releng-201516-q4, 10releng-201718-q1, 10Trebuchet: [keyresult] Migrate remaining trebuchet deployed services - https://phabricator.wikimedia.org/T129290#3616299 (10thcipriani) 05Open>03Resolved a:03thcipriani All subtasks closed. Closing this one out. [21:14:05] 10Scap (Scap3-Adoption-Phase1), 10releng-201516-q4, 10releng-201718-q1, 10Trebuchet: [keyresult] Migrate remaining trebuchet deployed services - https://phabricator.wikimedia.org/T129290#3616306 (10greg) \o/ [21:21:49] (03PS4) 10Addshore: docker: zuul-cloner image [integration/config] - 10https://gerrit.wikimedia.org/r/375834 [21:21:50] (03CR) 10Addshore: docker: zuul-cloner image (033 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/375834 (owner: 10Addshore) [21:22:07] (03CR) 10Addshore: docker: zuul-cloner image (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/375834 (owner: 10Addshore) [21:22:53] (03CR) 10Addshore: docker: zuul-cloner image (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/375834 (owner: 10Addshore) [21:22:58] (03PS5) 10Addshore: docker: zuul-cloner image [integration/config] - 10https://gerrit.wikimedia.org/r/375834 [22:06:02] (03PS2) 10Addshore: docker: php7 base image [integration/config] - 10https://gerrit.wikimedia.org/r/378530 [22:07:39] (03CR) 10Addshore: "A new tidy version is up." 
[integration/config] - 10https://gerrit.wikimedia.org/r/378530 (owner: 10Addshore) [22:10:23] (03PS1) 10Addshore: Docker: build.sh Remove any temporary cache-buster files [integration/config] - 10https://gerrit.wikimedia.org/r/378807 [22:11:28] (03PS2) 10Addshore: Docker: build.sh Remove any temporary cache-buster files [integration/config] - 10https://gerrit.wikimedia.org/r/378807 [22:16:36] (03PS2) 10Addshore: docker: composer image [integration/config] - 10https://gerrit.wikimedia.org/r/378531 [22:16:47] (03CR) 10Addshore: docker: composer image (033 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/378531 (owner: 10Addshore) [22:18:34] (03PS6) 10Addshore: docker: zuul-cloner image [integration/config] - 10https://gerrit.wikimedia.org/r/375834 [22:20:10] (03PS1) 10Addshore: docker: updated cache-buster cmd for operations-puppet [integration/config] - 10https://gerrit.wikimedia.org/r/378811 [22:24:59] 10Release-Engineering-Team (Backlog), 10Scap, 10Parsoid: Check 'depool' failed while deploying - https://phabricator.wikimedia.org/T176184#3616450 (10Arlolra) [22:26:41] (03CR) 10Addshore: "Yep, I mean, it is perfectly reasonable to have git on the CI machines :)" [integration/config] - 10https://gerrit.wikimedia.org/r/378534 (owner: 10Addshore) [22:27:19] (03PS2) 10Addshore: docker: mediawiki specific php image [integration/config] - 10https://gerrit.wikimedia.org/r/378532 [22:27:23] 10Release-Engineering-Team (Backlog), 10Scap, 10Operations, 10Parsoid: Check 'depool' failed while deploying - https://phabricator.wikimedia.org/T176184#3616468 (10ssastry) p:05Triage>03High [22:27:46] 10Release-Engineering-Team (Backlog), 10Scap, 10Operations, 10Parsoid: Check 'depool' failed while deploying - https://phabricator.wikimedia.org/T176184#3616450 (10ssastry) This is blocking deployments right now. 
[22:27:48] (03CR) 10Addshore: "PS2 creates roughly 60-70MB smaller images that PS1" [integration/config] - 10https://gerrit.wikimedia.org/r/378532 (owner: 10Addshore) [22:28:20] (03CR) 10Addshore: "PS2 creates roughly 60-70MB smaller than PS1" [integration/config] - 10https://gerrit.wikimedia.org/r/378530 (owner: 10Addshore) [22:29:02] (03CR) 10Addshore: "PS2s image is roughly 10MB smaller than PS1 :)" [integration/config] - 10https://gerrit.wikimedia.org/r/378531 (owner: 10Addshore) [22:29:45] no_justification: What is apache used for in scap? RE: https://gerrit.wikimedia.org/r/#/c/344221/7 [22:30:17] (03CR) 10Addshore: [V: 031] docker: mediawiki-phan [integration/config] - 10https://gerrit.wikimedia.org/r/378533 (owner: 10Addshore) [22:30:34] Krinkle: Frontend to git [22:30:35] (03PS2) 10Addshore: docker: mediawiki-phan [integration/config] - 10https://gerrit.wikimedia.org/r/378533 [22:30:38] For fetching stufffffff [22:30:45] (03PS2) 10Addshore: docker: git [integration/config] - 10https://gerrit.wikimedia.org/r/378534 [22:33:02] no_justification: Hm.. you mean to allow anonymous git fetch over http instead of ssh? [22:33:30] interesting. I thought it'd all be ssh. Or is that the case and this patch is a dependency for switching to http? [22:33:43] We already use http(s) mostly [22:33:50] SSH is dumb for read-only operations [22:33:51] :) [22:34:21] no_justification: So what does this patch enable? [22:34:33] Right now, all targets fetch from the master. 
This works for non-MW deployments [22:34:45] But when we move MW to the git-based system of scap3, we'll want to proxy that out [22:34:50] (same reason we don't all rsync from tin) [22:35:07] (03CR) 10Dzahn: [C: 031] "duplicate of https://gerrit.wikimedia.org/r/#/c/378665/" [integration/config] - 10https://gerrit.wikimedia.org/r/378763 (owner: 10Addshore) [22:35:17] This enables a vhost for fetching from on the proxies so they can do that :) [22:35:21] (03CR) 10Dzahn: [C: 031] "duplicate of https://gerrit.wikimedia.org/r/#/c/378763/" [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [22:35:27] no_justification: ah, right. the rsync-proxies step doesn't happen with scap3 right now [22:35:31] because it's git based [22:35:36] And this enables that? [22:35:40] cool [22:35:43] Is there a task for that? [22:35:45] Path towards merging the two behaviors :) [22:35:53] Yeah, makes sense :) [22:35:59] Somewhereeeeeee [22:36:01] Lemme look [22:37:06] T147938, T121276 mostly [22:37:09] T147938: Use git as transport mechanism for MediaWiki scap deploys - https://phabricator.wikimedia.org/T147938 [22:37:09] T121276: Bring co-master / fanout capabilities to scap3 deployments - https://phabricator.wikimedia.org/T121276 [22:37:29] The "install apache vhost on proxies" probably came out of one of our standups. 
[22:37:46] (03Abandoned) 10Krinkle: fabfile, switch to service zuul reload [integration/config] - 10https://gerrit.wikimedia.org/r/378763 (owner: 10Addshore) [22:37:56] (03CR) 10Krinkle: [C: 031] fab: reload zuul via systemd [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [22:38:25] 10Scap (Scap3-MediaWiki-MVP), 10scap2, 10WorkType-NewFunctionality: Remove apache dependency from scap3 deployment host - https://phabricator.wikimedia.org/T116630#3616477 (10demon) I don't think this is really the direction we were planning to go in anymore. [22:39:04] Krinkle: There's also a task somewhere about generalizing the idea of proxies and make them sorta like "deploy master-lite" servers. Figured it would make sense rather than needing $proxies for each group of server. [22:39:09] s/group/type/ [22:39:17] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [22:40:29] no_justification: k, wanna tag one or both of those tasks in the commitmsg? [22:41:39] Yeah one sec [22:43:45] Added T147938 to it [22:43:45] T147938: Use git as transport mechanism for MediaWiki scap deploys - https://phabricator.wikimedia.org/T147938 [22:44:05] Also, I did need to dust that off, thx for the poke [22:44:19] 10Release-Engineering-Team (Backlog), 10Scap, 10Operations, 10Parsoid: Check 'depool' failed while deploying - https://phabricator.wikimedia.org/T176184#3616450 (10thcipriani) > This is probably related to T172333 where keyholder_key: deploy_service is added. I have my doubts about this. The `keyholder_ke... [22:45:03] (03CR) 10Addshore: "This patch is in terrible shape and there is no point in looking at it." [integration/config] - 10https://gerrit.wikimedia.org/r/378535 (owner: 10Addshore) [22:46:13] what is the conftool project in phab? 
autocomplete isn't finding it for me [22:47:15] (03CR) 10Addshore: [V: 04-1 C: 04-1] "[contint1001.wikimedia.org] sudo: /usr/sbin/service zuul reload" [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [22:49:49] (03CR) 10Dzahn: [C: 031] "sudo_user = none ?" [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [22:52:34] greg-g, I don't think there is a conftool project in phab. I think they use https://phabricator.wikimedia.org/tag/operations-software-development/ for that [22:52:49] just guessing by what is tagged on https://phabricator.wikimedia.org/T149213 [22:57:18] 10Gerrit, 10MediaWiki-Vagrant, 10Patch-For-Review: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#3616516 (10Tgr) Found out about [[https://stackoverflow.com/a/45322343/323407|this]] the hard way. Makes shallow clones (even repos that were created as... [23:06:15] (03CR) 10Addshore: [V: 04-1 C: 04-1] "I have no problem with using run & sudo in https://gerrit.wikimedia.org/r/#/c/378763/" [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [23:08:33] (03CR) 10Dzahn: [C: 031] "does it work without the line "env.sudo_user = None" ?" [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [23:10:05] 10Scap (Tech Debt Sprint 2017-Q2), 10scap2: Scap should touch symlinks when originals are touched - https://phabricator.wikimedia.org/T126306#2010701 (10demon) PrivateSettings shouldn't happen anymore. [[ https://gerrit.wikimedia.org/r/#/c/376762/ | CommonSettings just uses PrivateSettings directly now instead...
[23:13:21] (03CR) 10Addshore: [V: 04-1 C: 04-1] "Nope, as then fab will try to sudo as zuul and zuul is not in contint-admins" [integration/config] - 10https://gerrit.wikimedia.org/r/378665 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [23:14:18] RECOVERY - Puppet errors on integration-slave-docker-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [23:32:32] PROBLEM - Puppet errors on deployment-memc05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [23:39:35] (03CR) 10Addshore: [C: 04-1] docker: base image for CI images (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/378033 (owner: 10Hashar) [23:40:30] (03CR) 10Addshore: [C: 031] fab: git gc zuul repo on the servers [integration/config] - 10https://gerrit.wikimedia.org/r/378667 (owner: 10Hashar)
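Note on the sudo_user review thread (22:49:49 to 23:13:21): the disagreement is about which account the reload ends up running as. The toy model below only builds the sudo argument list and is not Fabric's implementation; sudo_argv is a hypothetical helper. It shows why leaving a sudo user of zuul in place turns the call into "sudo -u zuul ...", which then runs as an account that, per the 23:13:21 comment, is not in contint-admins.

```python
import shlex


def sudo_argv(command, sudo_user=None):
    """Build the argv for running `command` through sudo.

    With sudo_user=None the command runs as root (sudo's default target);
    with a user set, sudo is asked to switch to that account instead.
    """
    argv = ["sudo"]
    if sudo_user is not None:
        argv += ["-u", sudo_user]
    return argv + shlex.split(command)


if __name__ == "__main__":
    reload_cmd = "/usr/sbin/service zuul reload"
    # Target user left unset: the form logged by hand at 19:01:02.
    print(" ".join(sudo_argv(reload_cmd)))
    # Target user zuul: the case the 23:13:21 comment says would fail.
    print(" ".join(sudo_argv(reload_cmd, sudo_user="zuul")))
```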