[00:25:18] Project beta-scap-eqiad build #126013: 04FAILURE in 21 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126013/ [00:35:19] Project beta-scap-eqiad build #126014: 04STILL FAILING in 20 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126014/ [00:47:09] Yippee, build fixed! [00:47:10] Project beta-scap-eqiad build #126015: 09FIXED in 2 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126015/ [01:13:16] (03PS1) 10MaxSem: Add ParserFunctions as a dependency for Kartographer [integration/config] - 10https://gerrit.wikimedia.org/r/318032 (https://phabricator.wikimedia.org/T147575) [01:14:17] (03CR) 10jenkins-bot: [V: 04-1] Add ParserFunctions as a dependency for Kartographer [integration/config] - 10https://gerrit.wikimedia.org/r/318032 (https://phabricator.wikimedia.org/T147575) (owner: 10MaxSem) [05:04:31] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2743809 (10Matanya) [05:15:00] 05Gitblit-Deprecate, 10Diffusion, 13Patch-For-Review: Update all on-wiki references to git.wikimedia.org and replace them with the Phabricator equivalent - https://phabricator.wikimedia.org/T137353#2743821 (10Dzahn) cool, thanks for that list @EBernhardson - fixed wikidata:Tools/External tools it was just o... [05:19:18] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2743838 (10Matanya) [05:24:07] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2743843 (10Matanya) [06:43:32] why does https://logstash.wikimedia.org/goto/a3b311474c57d89c8b6ea70296642ab6 show wmf.22 for group0? I see on Special:Version for mediawiki.org it has wmf.23 [06:44:18] maybe ostriches or greg-g can explain [06:50:50] group0 is on wmf.23 for sure. 
[06:50:55] logstash is lying [06:51:20] file a bug it's almost midnight I'm not debugging that heh [07:00:21] thanks ostriches have a good night [07:36:42] 10Gerrit, 13Patch-For-Review: Update site CSS customizations for the new change screen in Gerrit 2.12 - https://phabricator.wikimedia.org/T141286#2744025 (10PleaseStand) 05Open>03Resolved a:03Paladox Marking this resolved because I can't think of any other customizations that would have to be fixed. [09:11:50] 10Continuous-Integration-Config, 10Analytics-Dashiki, 13Patch-For-Review: Add CI job for Dashiki - https://phabricator.wikimedia.org/T148019#2744145 (10hashar) I guess that this more or less depends on bower -> npm migration T147884 [09:51:33] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [11:05:49] (03PS3) 10Hashar: Add mw ext Translate to mediawiki-gate [integration/config] - 10https://gerrit.wikimedia.org/r/315941 (https://phabricator.wikimedia.org/T86930) (owner: 10Paladox) [11:08:26] (03CR) 10Hashar: [C: 032] "Rebased and I have dropped the generic tests which are redundant with the mediawiki-extensions* jobs." 
[integration/config] - 10https://gerrit.wikimedia.org/r/315941 (https://phabricator.wikimedia.org/T86930) (owner: 10Paladox) [11:08:33] (03CR) 10jenkins-bot: [V: 04-1] Add mw ext Translate to mediawiki-gate [integration/config] - 10https://gerrit.wikimedia.org/r/315941 (https://phabricator.wikimedia.org/T86930) (owner: 10Paladox) [11:11:30] (03CR) 10Hashar: [C: 032] WikibaseRepository has been deleted [integration/config] - 10https://gerrit.wikimedia.org/r/318000 (owner: 10Hashar) [11:12:30] (03Merged) 10jenkins-bot: WikibaseRepository has been deleted [integration/config] - 10https://gerrit.wikimedia.org/r/318000 (owner: 10Hashar) [11:12:47] (03PS4) 10Hashar: Add mw ext Translate to mediawiki-gate [integration/config] - 10https://gerrit.wikimedia.org/r/315941 (https://phabricator.wikimedia.org/T86930) (owner: 10Paladox) [11:16:32] (03CR) 10Hashar: [C: 032] Add mw ext Translate to mediawiki-gate [integration/config] - 10https://gerrit.wikimedia.org/r/315941 (https://phabricator.wikimedia.org/T86930) (owner: 10Paladox) [11:17:33] (03Merged) 10jenkins-bot: Add mw ext Translate to mediawiki-gate [integration/config] - 10https://gerrit.wikimedia.org/r/315941 (https://phabricator.wikimedia.org/T86930) (owner: 10Paladox) [11:37:00] 10Deployment-Systems, 06Release-Engineering-Team, 06Operations: Trebuchet targets for test/testrepo are out of date - https://phabricator.wikimedia.org/T149180#2744454 (10hashar) [11:38:23] (03CR) 10Hashar: "Validated on Translate change https://gerrit.wikimedia.org/r/#/c/77284/3 (the composer failures there are expected, the dummy .php file b" [integration/config] - 10https://gerrit.wikimedia.org/r/315941 (https://phabricator.wikimedia.org/T86930) (owner: 10Paladox) [11:41:46] 06Release-Engineering-Team, 06Operations, 07HHVM, 13Patch-For-Review, 06Services (doing): Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2744530 (10MoritzMuehlenhoff) 05Open>03Resolved This is now complete. 
[11:51:55] (03PS5) 10Hashar: [BlueSpiceSMWConnector] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/316297 (owner: 10Paladox) [11:52:04] (03CR) 10Hashar: [C: 032] [BlueSpiceSMWConnector] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/316297 (owner: 10Paladox) [11:53:00] (03Merged) 10jenkins-bot: [BlueSpiceSMWConnector] Add jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/316297 (owner: 10Paladox) [11:55:41] (03PS2) 10Hashar: [XAnalytics] Update jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/298346 (owner: 10Paladox) [11:56:01] (03CR) 10Hashar: [C: 032] [XAnalytics] Update jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/298346 (owner: 10Paladox) [11:56:58] (03Merged) 10jenkins-bot: [XAnalytics] Update jenkins tests [integration/config] - 10https://gerrit.wikimedia.org/r/298346 (owner: 10Paladox) [11:59:05] (03PS1) 10Hashar: Drop integration-zuul-layoutdiff from gate [integration/config] - 10https://gerrit.wikimedia.org/r/318080 [11:59:59] (03CR) 10Hashar: [C: 032] Drop integration-zuul-layoutdiff from gate [integration/config] - 10https://gerrit.wikimedia.org/r/318080 (owner: 10Hashar) [12:00:36] (03Merged) 10jenkins-bot: Drop integration-zuul-layoutdiff from gate [integration/config] - 10https://gerrit.wikimedia.org/r/318080 (owner: 10Hashar) [12:07:54] (03CR) 10Hashar: [C: 04-1] "The wikidata/query/gui-deploy repository uses the 'production' branch which does not have any package.json around!" [integration/config] - 10https://gerrit.wikimedia.org/r/291736 (owner: 10Paladox) [12:16:45] (03CR) 10Hashar: [C: 04-1] "I am not sure there is much point in running tests against Zend 5.6? 
Might want to raise that topic on wikitech-l and see whether it make" [integration/config] - 10https://gerrit.wikimedia.org/r/316012 (owner: 10Paladox) [12:19:39] (03PS2) 10Hashar: [Kartographer] Add ParserFunctions as a dependency [integration/config] - 10https://gerrit.wikimedia.org/r/318032 (https://phabricator.wikimedia.org/T147575) (owner: 10MaxSem) [12:20:47] (03CR) 10Hashar: [C: 032] "The Jenkins job error was transient. I have rebased and tweaked the commit message slightly. Thank you MaxSem :}" [integration/config] - 10https://gerrit.wikimedia.org/r/318032 (https://phabricator.wikimedia.org/T147575) (owner: 10MaxSem) [12:21:22] (03Merged) 10jenkins-bot: [Kartographer] Add ParserFunctions as a dependency [integration/config] - 10https://gerrit.wikimedia.org/r/318032 (https://phabricator.wikimedia.org/T147575) (owner: 10MaxSem) [12:59:22] (03PS6) 10Hashar: mwext-mw-selenium jobs on Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/293096 (https://phabricator.wikimedia.org/T137112) [13:15:59] PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [13:35:54] 10Deployment-Systems, 06Release-Engineering-Team: cannot delete non-empty directory: php-1.25wmf14/cache/l10n - https://phabricator.wikimedia.org/T90798#2744759 (10hashar) 05Invalid>03Open That is happening again with scap `3.3.0-1` on mw2098.codfw.wmnet: ``` hashar@mw2098:~$ scap pull 13:30:13 Copying to... 
[13:50:59] RECOVERY - Puppet run on repository is OK: OK: Less than 1.00% above the threshold [0.0] [13:52:03] (03PS1) 10Tobias Gritschacher: Enable experimental browsertests for ELectronPdfService [integration/config] - 10https://gerrit.wikimedia.org/r/318091 (https://phabricator.wikimedia.org/T149189) [13:53:59] (03CR) 10Addshore: [C: 031] Enable experimental browsertests for ELectronPdfService [integration/config] - 10https://gerrit.wikimedia.org/r/318091 (https://phabricator.wikimedia.org/T149189) (owner: 10Tobias Gritschacher) [13:55:37] 10Browser-Tests-Infrastructure, 13Patch-For-Review: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#2744840 (10hashar) [13:55:39] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2744836 (10hashar) 05stalled>03Open a:03hashar I have rebased the patch, the jobs for Nodepool are ad... [13:57:02] (03CR) 10Hashar: [C: 032] "Almost jobs are in experimental pipeline so it is hardly a raise in consumption of Nodepool instances. 
The only one that is in a test pip" [integration/config] - 10https://gerrit.wikimedia.org/r/293096 (https://phabricator.wikimedia.org/T137112) (owner: 10Hashar) [13:57:49] (03Merged) 10jenkins-bot: mwext-mw-selenium jobs on Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/293096 (https://phabricator.wikimedia.org/T137112) (owner: 10Hashar) [14:04:17] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2744875 (10hashar) [14:14:46] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2744884 (10hashar) [14:15:29] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2357597 (10hashar) I have updated the task description to add a table listing the status of all repos havin... [14:19:40] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2744904 (10hashar) [14:20:12] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2357597 (10hashar) [14:23:31] 05Gitblit-Deprecate, 10Diffusion, 13Patch-For-Review: Update all on-wiki references to git.wikimedia.org and replace them with the Phabricator equivalent - https://phabricator.wikimedia.org/T137353#2744914 (10Aklapper) Erik: Thanks for the list! commons, meta, labswiki (wikitech) are done IMHO, as I won't e... 
[14:25:05] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2744933 (10hashar) [14:46:35] PROBLEM - Puppet run on zuul-dev-jessie is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:50:39] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2744993 (10hashar) [14:53:11] PROBLEM - Host deployment-pdf02 is DOWN: CRITICAL - Host Unreachable (10.68.16.129) [14:54:29] PROBLEM - Host deployment-conftool is DOWN: CRITICAL - Host Unreachable (10.68.20.30) [15:24:28] hashar: what's the comment to make a patch depend on a patch in a different repo? [15:24:44] andrewbogott: Depends-On: xxxx [15:24:50] where xxxx is the Gerrit changeid [15:24:56] ok! Thank you :) [15:25:00] We'll see if openstack supports this [15:25:54] I think it does? [15:27:29] andrewbogott: yeah that comes from them :} [15:28:22] Seems like they're the only ones really using gerrit/jenkins at this point [15:28:26] well, and us [15:29:19] andrewbogott: they have a hundred of third parties testing openstack with more or less the same stack though :D [15:29:34] they have also dropped Jenkins entirely ! [15:29:46] yeah, and it takes 12 hours for their CI to review my patches :) [15:29:55] But I'm impressed at all the things it runs them through [15:30:32] 10Browser-Tests-Infrastructure, 05Continuous-Integration-Scaling, 13Patch-For-Review, 15User-zeljkofilipin: migrate mwext-mw-selenium to Nodepool instances - https://phabricator.wikimedia.org/T137112#2745161 (10hashar) Looks failures are due to `mediawiki/skins/Vector` not being included. It is cloned af... 
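The cross-repository dependency footer mentioned above ("Depends-On: xxxx", where xxxx is the Gerrit Change-Id) can be illustrated with a small sketch. This is not Zuul's own parser; the regular expression, the function name `find_dependencies` and the sample Change-Id are assumptions made for illustration, based only on the usual shape of a Gerrit Change-Id ("I" followed by 40 hex digits).

```python
import re
from typing import List

# Assumption: a Gerrit Change-Id is "I" followed by 40 hexadecimal digits.
DEPENDS_ON_RE = re.compile(
    r"^Depends-On:[ \t]*(I[0-9a-f]{40})[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_dependencies(commit_message: str) -> List[str]:
    """Return the Change-Ids named in Depends-On footers, in order."""
    return DEPENDS_ON_RE.findall(commit_message)


# Hypothetical Change-Id, used only to make the example runnable.
CHANGE_ID = "I" + "ab12" * 10

EXAMPLE_MESSAGE = f"""Add ParserFunctions as a dependency

Some explanation of the change.

Depends-On: {CHANGE_ID}
Change-Id: I{"0" * 40}
"""

if __name__ == "__main__":
    print(find_dependencies(EXAMPLE_MESSAGE))
```

Only the `Depends-On:` line is picked up; the patch's own `Change-Id:` footer is ignored because the pattern is anchored on the footer name.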
[15:39:47] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:40:58] hashar: what is the popular alternative to jenkins/gerrit nowadays ? [15:46:01] matanya: as I understand it, their instances/test runners now exposes functions to run [15:46:06] which under the hood are ansible playbooks [15:48:29] hashar, twentyafterfour, thcipriani, are you seeing any bad effects from my puppetmaster upgrade on Monday? [15:48:57] andrewbogott: apparently it is all fine :D [15:48:59] andrewbogott: I don't think so [15:49:03] :) [15:49:07] will get to migrate it to jessie eventually [15:49:20] great, I'm going to put those packages in reprepro and declare them to be the new standard. [15:49:26] Thank you for being labrats. [15:49:27] \O/ [15:49:36] awesome! [15:49:50] hashar: the new :standalone class that yuvi made uses passenger instead of the puppetmaster service and it seems to be much faster. [15:49:56] So you have that to look forward to when you rebuild. [15:50:12] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2745216 (10mmodell) a:05demon>03mmodell I guess I'm babysitting this one for the rest of the week :) [15:50:24] (03PS2) 10Hashar: Enable experimental browsertests for ELectronPdfService [integration/config] - 10https://gerrit.wikimedia.org/r/318091 (https://phabricator.wikimedia.org/T149189) (owner: 10Tobias Gritschacher) [15:50:31] (03CR) 10Hashar: [C: 032] Enable experimental browsertests for ELectronPdfService [integration/config] - 10https://gerrit.wikimedia.org/r/318091 (https://phabricator.wikimedia.org/T149189) (owner: 10Tobias Gritschacher) [15:50:56] andrewbogott: sweet, so no more puppet::self? 
:D [15:51:04] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:51:04] (03Merged) 10jenkins-bot: Enable experimental browsertests for ELectronPdfService [integration/config] - 10https://gerrit.wikimedia.org/r/318091 (https://phabricator.wikimedia.org/T149189) (owner: 10Tobias Gritschacher) [15:51:16] twentyafterfour: correct, although the new class has its own mysterious issues to sort out [15:51:54] It seems to make the cluster-with-local-puppetmaster case work much better, at the expense of making standalone-instance-with-custom-puppet worse [15:52:03] but hopefully we can hammer out the problems with the latter [15:52:35] so for standalone-instance-with-custom-puppet, why would you use a puppetmaster instead of just running puppet apply? [15:53:01] Mostly if you want it to be actively managed I think [15:53:13] But I'm not sure, maybe we can just live without that use case [15:53:14] I think the cluster-local-puppetmaster is a much more sensible use case to optimize for [15:53:21] yeah, I agree [15:53:55] cool [15:54:13] I'll have to give yuvi a token cookie or something :D [15:54:24] or buy him some beers next time I see him [15:54:25] (03CR) 10Hashar: "Deployed! 
:)" [integration/config] - 10https://gerrit.wikimedia.org/r/318091 (https://phabricator.wikimedia.org/T149189) (owner: 10Tobias Gritschacher) [15:55:25] I am off [15:55:34] be back later this evening for some Jenkins job hacking [15:56:42] 05Gitblit-Deprecate, 10Diffusion, 13Patch-For-Review: Update all on-wiki references to git.wikimedia.org and replace them with the Phabricator equivalent - https://phabricator.wikimedia.org/T137353#2745226 (10Paladox) Thankyou for ding that :) [15:57:43] 10Deployment-Systems, 06Release-Engineering-Team: cannot delete non-empty directory: php-1.25wmf14/cache/l10n - https://phabricator.wikimedia.org/T90798#2745237 (10mmodell) a:05mmodell>03None [15:58:47] 06Release-Engineering-Team, 03Scap3 (Scap3-MediaWiki-MVP): cannot delete non-empty directory: php-1.25wmf14/cache/l10n - https://phabricator.wikimedia.org/T90798#1067769 (10mmodell) [17:19:33] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2745546 (10greg) [17:59:33] greg-g, I've created more errors - looks better now? :P [18:00:06] can't look, in meetings :/ [18:00:27] oh, I see [18:00:41] MaxSem: if it'll cause issues, please consider it UBN [18:05:35] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2745791 (10MaxSem) [18:21:28] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2745912 (10MaxSem) [18:22:08] 05Gitblit-Deprecate, 10Diffusion: Redirect git.wikimedia.org HEAD URLs to Diffusion - https://phabricator.wikimedia.org/T141965#2745914 (10Aklapper) >>! In T141965#2740947, @Dzahn wrote: > This now changed to: > > redirects to: https://phabricator.wikimedia.org/diffusion/GTWN/browse/irc-relay/ > > Unhandled... 
[18:27:15] 05Gitblit-Deprecate, 10Diffusion: Redirect git.wikimedia.org HEAD URLs to Diffusion - https://phabricator.wikimedia.org/T141965#2518118 (10mmodell) This seems like a bit of buggy url routing code in phabricator. The ending slash is allowed if you specify a branch name. |path|works?| |/browse/irc-relay... [18:40:22] niedzielski: hi! When you get time, please subscribe to the QA list if not done already. Low traffic and that is where we publicly talk about CI and other QA related stuff https://lists.wikimedia.org/mailman/listinfo/qa :D [18:41:20] hashar: will do! :] [18:41:45] niedzielski: maybe we should give you access to the contint server as well so you could deploy zuul changes as well [18:41:56] in case you end up needing a modification in zuul/layout.yaml to be pushed [18:43:49] !cireview [18:43:54] !cireview is https://gerrit.wikimedia.org/r/#/projects/integration/config,dashboards/default [18:43:54] Key was added [18:43:57] !cireview [18:43:57] https://gerrit.wikimedia.org/r/#/projects/integration/config,dashboards/default [18:43:59] danke [18:44:17] hashar: that might be nice. i have modified that file in the past and could probably make androidland changes there again. [18:45:00] niedzielski: going to handle the paper work :D [18:48:00] hashar: ah, thanks :] [18:48:40] niedzielski: can you check whether https://phabricator.wikimedia.org/legalpad/signatures/query/ULM.HYuZn4Gg/#R shows you the L3 document signed by you ? 
[18:48:47] that is the server access responsibilities [18:48:53] I guess you signed it since you get shell already [18:49:55] hashar: yes, that one is signed by me showing a date from last year [18:50:06] sounds good :) [18:50:15] I am creating the task and will add you as a subscriber to it [18:50:43] greg-g: i think it is wrong you lowered the proi on that fatal [18:56:00] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2746000 (10Matanya) [18:59:24] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: Requesting access to contint for - https://phabricator.wikimedia.org/T149233#2746004 (10hashar) [19:00:33] hashar: thanks! [19:01:11] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: Requesting access to contint for niedzielski - https://phabricator.wikimedia.org/T149233#2746004 (10hashar) [19:01:30] (03PS3) 10Hashar: REL1_28: adjust DonationInterface [integration/config] - 10https://gerrit.wikimedia.org/r/317999 (https://phabricator.wikimedia.org/T148987) [19:03:32] (03CR) 10Hashar: [C: 032] REL1_28: adjust DonationInterface [integration/config] - 10https://gerrit.wikimedia.org/r/317999 (https://phabricator.wikimedia.org/T148987) (owner: 10Hashar) [19:03:57] (03CR) 10Hashar: "I have created the new job:" [integration/config] - 10https://gerrit.wikimedia.org/r/317999 (https://phabricator.wikimedia.org/T148987) (owner: 10Hashar) [19:04:41] (03Merged) 10jenkins-bot: REL1_28: adjust DonationInterface [integration/config] - 10https://gerrit.wikimedia.org/r/317999 (https://phabricator.wikimedia.org/T148987) (owner: 10Hashar) [19:05:39] (03PS2) 10Hashar: REL1_28: skip mwext-MobileFrontend-npm-run-lint-modules [integration/config] - 10https://gerrit.wikimedia.org/r/318002 (https://phabricator.wikimedia.org/T148987) [19:06:03] (03CR) 10Hashar: [C: 032] REL1_28: skip 
mwext-MobileFrontend-npm-run-lint-modules [integration/config] - 10https://gerrit.wikimedia.org/r/318002 (https://phabricator.wikimedia.org/T148987) (owner: 10Hashar) [19:06:12] (03PS2) 10Hashar: REL1_28: drop references to no more supported releases [integration/config] - 10https://gerrit.wikimedia.org/r/318005 (https://phabricator.wikimedia.org/T148987) [19:06:38] (03Merged) 10jenkins-bot: REL1_28: skip mwext-MobileFrontend-npm-run-lint-modules [integration/config] - 10https://gerrit.wikimedia.org/r/318002 (https://phabricator.wikimedia.org/T148987) (owner: 10Hashar) [19:07:21] (03CR) 10jenkins-bot: [V: 04-1] REL1_28: drop references to no more supported releases [integration/config] - 10https://gerrit.wikimedia.org/r/318005 (https://phabricator.wikimedia.org/T148987) (owner: 10Hashar) [19:08:27] (03PS3) 10Hashar: REL1_28: drop references to no more supported releases [integration/config] - 10https://gerrit.wikimedia.org/r/318005 (https://phabricator.wikimedia.org/T148987) [19:08:45] moaaar branches are now obsoletes [19:09:07] :o [19:18:31] (03CR) 10Hashar: [C: 032] REL1_28: drop references to no more supported releases [integration/config] - 10https://gerrit.wikimedia.org/r/318005 (https://phabricator.wikimedia.org/T148987) (owner: 10Hashar) [19:19:03] (03Merged) 10jenkins-bot: REL1_28: drop references to no more supported releases [integration/config] - 10https://gerrit.wikimedia.org/r/318005 (https://phabricator.wikimedia.org/T148987) (owner: 10Hashar) [19:23:40] who own logstash anyway ? [19:23:52] matanya: Wikimedia :) [19:24:08] really ? amazing! [19:24:19] the software? or our deploy? [19:24:22] bd808: like - what team/person ? [19:24:39] well... me maybe [19:24:53] i.e create dashboards, look at logs etc [19:25:04] I've done most of the work on it, but I've been trying to hand it off to discovery [19:25:24] bd808: so would you be a good owner for https://phabricator.wikimedia.org/T149166 ? 
[19:25:40] discovery (and erik b specifically) did most of the hard work for the kibana4 migration [19:26:25] matanya: hit refresh? [19:26:32] did, many times [19:26:41] you see it correctly ? [19:26:46] that version number is taken from the log messages. It's showing wmf.23 for me [19:28:13] opened a new browser, and incognito in current one, checked on other device - no luck [19:28:22] weird [19:28:43] what's the timestamp of the newest message you see in the event list? [19:28:46] i go to the group0 link on the dashboard [19:28:57] 2016-10-26T19:27:17 [19:29:16] 2016-10-26T19:27:17 hhvm WARNING - mw1263 Warning: Failed connecting to redis server at rdb1007.eqiad.wmnet: Connection timed out in /srv/mediawiki/php-1.28.0-wmf.22/includes/libs/redis/RedisConnectionPool.php on line 235 [19:29:55] url used: https://logstash.wikimedia.org/goto/c4d99e5042dda75f32821b9a87012316 [19:31:58] matanya: ok, I have a possible clue. When I visit that version of the dashboard I see "No results displayed because all values equal 0" [19:32:08] indeed [19:32:22] For me that's because there are no MW logs in the time window [19:32:48] and if you make the window larger (I have messages) [19:32:53] it's all hhvm logs and hhvm logs don't know what mw version they are running [19:33:19] if I zoom out to 1h then it shows wmf.23 [19:34:07] I see, so not a bug, it is intentional [19:34:26] well it's how the software works, yeah [19:34:36] that same gadget is used on many different reports [19:34:41] though it would be useful to know more about the origin of hhvm logs [19:34:46] mostly to help see/filter to a version [19:35:03] we would have to patch hhvm to do that I guess [19:35:20] not something I'm super interested in spending my days on [19:35:27] not worth the hassle [19:36:01] your eyes can tell from the php file paths [19:36:54] but yeah we don't have a good way to filter searches for "hhvm errors from MW version X requests" [19:41:17] thcipriani: mutante: let's sync the time of 
contint1001 here :] [19:41:19] will be easier to manage [19:41:41] in short wednesday morning is quite busy for tyler [19:41:43] i would like to request 08.45 PST [19:41:49] mutante needs 9:00am PST + [19:42:04] I think I can do Wednesday, just have to move a meeting, and I don't think there's a reason not to. [19:42:08] well, or i can do 7-8, then a break and be back at 8.45 [19:42:18] so I guess we can do Thursday 3rd at 9:00am PST / 10:00 Boulder / 17:00 Paris ? [19:42:37] that would work well for me. [19:42:41] yes, same here [19:42:59] this way mutante can do his duties just fine from 7 to 9 :] [19:42:59] the day of the week itself doesnt matter that much to me [19:43:01] fwiw [19:43:09] sounds good [19:43:33] +1 [19:44:02] +1 [19:44:38] alright, let's see if we have more to merge before that :) [19:48:30] added to https://wikitech.wikimedia.org/wiki/Deployments#Thursday.2C.C2.A0November.C2.A003 :) [19:48:45] replied to the mail [19:49:58] 10Continuous-Integration-Infrastructure (phase-out-gallium), 10releng-201617-q1, 07Wikimedia-Incident: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#2746143 (10hashar) The migration to contint1001 is scheduled for Thursday November 3rd at 9:00am PST / 16:00 UTC / 17:00 CET. http... [19:50:20] as well [19:50:20] and I have updated the wiki page + task [19:50:36] you reply is way more efficient than mine :] [19:52:07] thanks bd808 [19:52:43] matanya: yw. I can see how that gadget is confusing if you don't know what data drives it [19:52:54] yeah [19:53:16] I'm supposed to give a tech talk on kibana4 soon. 
seems like something worth mentioning [19:53:50] We at $day_job are planning to migrate to 5 soonish and grafana 4 as well [19:54:07] might be great if you give the talk before we do that :) [19:54:43] T148934 -- 2016-11-16 is the current date [19:55:06] we might make it [19:55:32] I'm going to focus on using our setup, not really on general system maintenance or anything like that [19:55:48] more "how to debug prod and beta cluster with our tools" [19:57:26] mutante: there are a bunch of merges for CI but they are merely clean up tasks [19:58:00] matanya: yeah, max told me, and it should now be resolved [19:58:23] thanks greg-g [19:58:46] bd808: listening to you is always worth it, even if wm specific [19:59:41] * bd808 blushes [20:02:10] hashar: i'll take a look at your outgoing queue [20:06:47] rsync is really the nicest tool ever [20:08:19] I don't know if I'd call it nicest [20:08:36] it's got one of the most confusing cli argument formats ever created [20:09:05] handy, that is the word i use [20:09:15] +1 [20:10:25] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:10:39] if you ever need to rsync between 2 prod hosts, you can make puppet install rsyncd with the right config and ferm with just a few lines. Since rsync over ssh wont just work without key forwarding. [20:11:10] hashar: are we going to rsync gallium to contint1001? [20:11:21] then let me prepare that stuff before [20:13:01] yeah that in the doc [20:13:08] I will do a first rsync to prepopulate the data [20:13:13] then have all commands listed in the doc [20:13:38] Alexandros has opened the firewall port for rsync [20:13:47] on gallium I have added a dummy /etc/rsyncd.conf for /var/lib/jenkins [20:13:58] serves it with: rsync --daemon --no-detach [20:14:15] then on contint1001 would do something like: rsync -av rsync://gallium.wikimedia.org/jenkins/userContent . 
[20:14:23] but it does not do the uid mapping :( [20:15:25] so basically the same thing just it's a live hack instead of puppet :) [20:15:29] i would do -avp [20:15:45] the UID mapping will work if we make sure the same user has the same UID on both servers beforehand [20:16:06] yeah which is not the case :D [20:16:19] we should go to wikitech page called "UID" [20:16:26] and "reserve" one for this user there [20:16:33] I was somehow expecting rsync to send a message saying: that file belongs to user "jenkins" with uid "110" [20:16:36] then we'll change it on contint1001 [20:16:45] next time i would recommend we just do everything in puppet [20:16:47] then on the rsync client side, figure out that "jenkins" has a different uid and adjust [20:16:48] for the rsync part [20:17:11] so the user is jenkins, right [20:17:12] looks [20:17:15] well the users are created by the .deb packages [20:17:34] so the UID ends up pretty much random/ depending on which order puppet installed the packages :( [20:17:53] ok, so we'll just change it [20:18:07] and that would be an extra step to think of after a fresh install of a CI server [20:18:17] this happens all the time, btw, not just CI at all [20:19:07] --numeric-ids don't map uid/gid values by user/group name [20:19:08] bah [20:19:09] the usual fix was to rsync, then notice it, and then run find .. -exec chown :p [20:19:12] to fix it later [20:19:16] but we can just change it now [20:19:19] just easier [20:19:53] we can add jenkins to https://wikitech.wikimedia.org/wiki/UID [20:20:07] but 110 is not a good one [20:20:19] hrmm [20:20:57] 113 as on contint1001 is a bit better [20:21:37] then that depends how puppet installs the packages [20:21:52] i don't see rsyncd installed on contint1001 yet [20:22:10] the daemon is on gallium part [20:22:17] contint1001 acting as the client [20:22:29] gallium# /usr/bin/rsync --daemon --no-detach [20:22:57] ok, normally i'd push to the target [20:23:01] where is the config? 
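The UID problem discussed above (the same user name carrying a different numeric UID on each host) can be made concrete with a minimal sketch. It compares two passwd-style listings by user name and reports which numeric IDs would need remapping. The function names and the sample passwd lines are hypothetical; only the values 110 (gallium) and 113 (contint1001) for "jenkins" come from the conversation.

```python
from typing import Dict


def parse_passwd(text: str) -> Dict[str, int]:
    """Map user name -> uid from passwd-formatted text (name:pw:uid:...)."""
    users: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # not a valid passwd entry, skip it
        users[fields[0]] = int(fields[2])
    return users


def uid_remap(source_passwd: str, target_passwd: str) -> Dict[int, int]:
    """Return {source uid: target uid} for names whose uid differs."""
    source = parse_passwd(source_passwd)
    target = parse_passwd(target_passwd)
    return {
        uid: target[name]
        for name, uid in source.items()
        if name in target and target[name] != uid
    }


# Illustrative sample data, not copied from the real hosts.
GALLIUM_PASSWD = """root:x:0:0:root:/root:/bin/bash
jenkins:x:110:115:Jenkins:/var/lib/jenkins:/bin/bash
"""
CONTINT1001_PASSWD = """root:x:0:0:root:/root:/bin/bash
jenkins:x:113:118:Jenkins:/var/lib/jenkins:/bin/bash
"""

if __name__ == "__main__":
    print(uid_remap(GALLIUM_PASSWD, CONTINT1001_PASSWD))
```

A mapping like this is what a name-based transfer resolves implicitly; with numeric IDs copied verbatim, files owned by 110 would have to be re-owned to 113 on the target after the sync.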
[20:23:08] looked for /etc/rsync.. [20:23:29] gallium /etc/rsyncd.conf [20:23:42] oh, sorry, right [20:24:13] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: Requesting access to contint for niedzielski - https://phabricator.wikimedia.org/T149233#2746281 (10Legoktm) +1 [20:25:56] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team, 06Operations, 10Ops-Access-Requests: Requesting access to contint for niedzielski - https://phabricator.wikimedia.org/T149233#2746004 (10greg) Approved from my side. [20:25:59] running iptables -L on gallium is really slow..odd [20:26:10] dns resolution maybe? [20:27:34] ahhh [20:27:37] use chroot = yes [20:27:44] enabled by default [20:27:46] with -n it's fast [20:28:04] When this parameter is enabled, rsync will not attempt to map users and groups by name (by default), but instead copy IDs as though --numeric-ids had been specified. [20:28:30] so gotta copy a bunch of stuff under the chroot :( [20:28:36] brb need a tea [20:29:19] ok, or we just fix it with find and chown after the sync, not a biggie [20:29:32] did that for every other server migration [20:29:53] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2746323 (10mmodell) [20:31:32] the DNS resolution thing seems true but i don't have that on other machines ..just gallium [20:31:45] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [20:31:45] contint1001 doesn't have that problem [20:35:15] :( [20:35:23] I call that the bowl of spaghetti [20:35:39] one starts looking at something and gets diverted by another unrelated issue :( [20:36:52] ok trick is [20:36:54] use chroot = no [20:45:12] Oh wow I've set up my own znc bouncer on my own pc :) [20:47:59] paladox: :) but what is the difference to just starting the IRC client on your PC [20:48:41] hashar: 
so should we define the "right" UID for jenkins once and for all? [20:48:47] mutante: ah sorry [20:48:49] no not needed :] [20:48:52] I can get ubuntu to start it and then just login through hexchat on windows, theres a bug that causes hexchat to freeze and crash if disconnected, this way i try not to crash it [20:49:10] if the rsync server runs without a chroot, it can lookup the user/group names from /etc/passwd just fine [20:49:38] then: rsync --bwlimit=5m --delete --info=progress2 -az rsync://gallium.wikimedia.org/jenkins . [20:50:25] ok, maybe first run with -n for simulate [20:50:51] ahhhh yeah that would have been a good thing [20:51:02] I tried it by syncing a single small directory to a different place [20:51:25] namely: rsync://gallium/jenkins/userContent --> contint1001:/var/lib/jenkins/fromgallium [20:51:32] played with that until I found the proper params [20:51:46] I have updated the Google doc [20:51:46] gotcha [20:52:00] https://docs.google.com/document/d/1xOcXkQA9gJaLAeyA6pePUJPZmV62RFU3KapGg8LCJ_A/edit# [20:52:32] I guess that was the last troublesome command [20:52:52] from there we will basically shutdown all services/puppet on gallium. rsync again [20:52:59] spawn services on contint1001 [20:53:04] and should be all set [20:53:30] wow, easy enough :) nice [20:53:47] how about firewalling? [20:54:01] do we not need more changes to allow connections from/to contint1001 [20:54:09] and maybe to/from the labs instances [20:54:29] had a task for that and it is all set [20:54:33] great! 
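Note on the UID question discussed above: name-based mapping only matters because the "jenkins" account ended up with different numeric UIDs on the two hosts (110 vs 113 in this conversation). A minimal sketch for checking that before a sync, assuming each host's passwd file has been copied somewhere readable; the helper names `uid_of` and `same_uid` and the snapshot file names are hypothetical and not part of the actual migration:

```shell
#!/bin/sh
# Hypothetical helper: print the numeric UID of a user from a passwd-format file.
uid_of() {
    # $1 = user name, $2 = passwd-format file (defaults to /etc/passwd)
    awk -F: -v user="$1" '$1 == user { print $3; exit }' "${2:-/etc/passwd}"
}

# Hypothetical helper: succeed only if the user exists with the same UID in both files.
same_uid() {
    # $1 = user name, $2 and $3 = passwd-format files taken from the two hosts
    uid_a=$(uid_of "$1" "$2")
    uid_b=$(uid_of "$1" "$3")
    [ -n "$uid_a" ] && [ "$uid_a" = "$uid_b" ]
}

# Example with made-up snapshot files of each host's /etc/passwd:
#   same_uid jenkins gallium.passwd contint1001.passwd || echo "jenkins UID differs"
```

If the UIDs differ, either the daemon needs to resolve names itself (the `use chroot = no` approach found above) or ownership has to be fixed on the target after the copy.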
[20:54:38] the labs instances goes out to the internet [20:54:43] then back to contint1001 public IP [20:55:15] I am not "too" worried [20:55:16] :D [20:55:25] ok, well, then, i can prepare a change to decom gallium [20:55:27] hieradata/role/common/zuul/merger.yaml: gearman_server: '208.80.154.135' # gallium.wikimedia.org [20:55:38] modules/nodepool/templates/nodepool.yaml.erb: - tcp://gallium.wikimedia.org:8888 [20:55:55] modules/role/manifests/cache/misc.pp: 'gallium' => { # CI server [20:56:15] there is a ssh key called jenkins@gallium btw [20:56:38] modules/contint/manifests/master_dir.pp: if $::hostname == 'gallium' { [20:56:49] ouch, if $hostname inside module [20:57:09] yeah that one should not be a problem [20:57:29] 13 # gallium received a SSD drive (T82401) mount it [20:57:35] it is to mount a ssd disk to /srv/ssd [20:57:37] which we no more need [20:57:46] ok [20:58:10] the varnish cache::misc I probably already have a patch to switch the backend from gallium to contint1001 [20:58:16] quite trivial to handle [20:59:02] will review the few patches I have around [20:59:07] and make sure to list them in the google doc [20:59:15] ideally we will just copy paste commands [20:59:15] how about nodepool.yaml.erb [20:59:18] cool! 
ok [20:59:22] and blindly +2/merge run puppet [20:59:24] and get all set [20:59:29] but that is the theory ;] [21:00:33] i'm adding a patch to remove gallium from puppet/installserver and DNS [21:00:48] let's see if we cover all occurrences of string gallium [21:01:04] feel free to add the patches to the Google doc in the last section "Aftermath" [21:02:20] ok [21:03:03] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2746354 (mmodell) [21:07:22] files transferred [21:08:19] I guess that is all for today [21:08:21] :) [21:08:26] mutante: thanks for the support related to rsync :] [21:08:43] quite welcome hashar, see you soon [21:18:45] sleep well .* ! [21:45:27] Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, DBA, MediaWiki-Database, WorkType-NewFunctionality: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255#2746607 (Krinkle) [22:37:44] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2746768 (Matanya) [22:55:05] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2746865 (Matanya) [23:13:12] (CR) 20after4: [C: 1] install contint::arcanist module on contint slave snapshots [integration/config] - https://gerrit.wikimedia.org/r/295976 (owner: 20after4) [23:13:20] (PS3) 20after4: install contint::arcanist module on contint slave snapshots [integration/config] - https://gerrit.wikimedia.org/r/295976 [23:15:12] (Abandoned) 20after4: install contint::arcanist module on contint slave snapshots [integration/config] - https://gerrit.wikimedia.org/r/295976 (owner: 20after4)
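Note on the "cover all occurrences of string gallium" step from the conversation: a minimal sketch of how the remaining references could be listed in a local checkout. The helper name `files_mentioning` and the checkout path in the usage comment are hypothetical; `grep -r` is assumed to be available (GNU and BSD grep both provide it).

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that still mention a host name.
files_mentioning() {
    # $1 = fixed string to search for, $2 = directory to search recursively
    # -r recurse, -l print only file names, -F treat the pattern as a literal string
    grep -rlF -- "$1" "$2" | sort
}

# Example against a made-up checkout location:
#   files_mentioning gallium ./puppet
```

Each file it prints would then need either a patch or a conscious decision to leave the reference in place, as with the master_dir.pp case mentioned above.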