[00:08:56] Continuous-Integration-Infrastructure, Release-Engineering-Team, Language-Engineering, Jenkins, Upstream: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#2497492...
[00:34:27] MediaWiki-Codesniffer: MediaWiki.Commenting.FunctionComment.ParamNameNoMatch does not handle variables being passed by reference - https://phabricator.wikimedia.org/T141410#2497538 (Legoktm)
[00:37:14] MediaWiki-Codesniffer: MediaWiki.Commenting.FunctionComment.MissingReturn should not trigger if the function does not return anything - https://phabricator.wikimedia.org/T141411#2497550 (Legoktm)
[00:39:22] Release-Engineering-Team (Long-Lived-Branches), Performance-Team: Don't trash cache for front-end resources - https://phabricator.wikimedia.org/T102578#2497563 (Krinkle)
[00:43:50] (PS1) Ladsgroup: Add Beta features as dependency for ORES [integration/config] - https://gerrit.wikimedia.org/r/301319
[00:44:14] Release-Engineering-Team (Long-Lived-Branches): Static asset time on disk - https://phabricator.wikimedia.org/T140921#2497571 (Krinkle)
[00:44:26] Release-Engineering-Team (Long-Lived-Branches), Performance-Team: Don't trash cache for front-end resources - https://phabricator.wikimedia.org/T102578#2497565 (Krinkle) Open>Resolved The last blocking task is {T113916}. However closing this as the resolved since the original goals have been achi...
[00:47:13] MediaWiki-Codesniffer: MediaWiki.Commenting.FunctionComment.MissingParamTag should handle "@param $var type" a little better - https://phabricator.wikimedia.org/T141412#2497580 (Legoktm)
[00:47:47] MediaWiki-Codesniffer: MediaWiki.Commenting.FunctionComment.MissingParamTag should handle "@param $var type" a little better - https://phabricator.wikimedia.org/T141412#2497592 (Krinkle)
[00:48:34] MediaWiki-Codesniffer, Patch-For-Review: Add sniff to avoid if/else/while/foreach without curly braces - https://phabricator.wikimedia.org/T113863#2497594 (Krinkle) p:Triage>Normal
[00:48:42] MediaWiki-Codesniffer: phpcs: Enforce clone, require, etc., are not functions - https://phabricator.wikimedia.org/T116779#2497595 (Krinkle) p:Triage>Normal
[00:48:57] MediaWiki-Codesniffer: Add rule for unused variables - https://phabricator.wikimedia.org/T114846#2497596 (Krinkle) p:Triage>Normal
[00:51:15] MediaWiki-Codesniffer, FR-Smashpig, Fundraising-Backlog: Write mutant code style config for SmashPig, or fully adopt MediaWiki style - https://phabricator.wikimedia.org/T133576#2236348 (Krinkle) Note that most style violations can be fixed automatically by running the `phpcbf` utility that comes with...
[00:59:37] MediaWiki-Codesniffer: Add rule for unused variables - https://phabricator.wikimedia.org/T114846#2497607 (aaron) Make sure it handles ScopedCallback and similar things.
[01:28:58] Scap3, Operations, Ops-Access-Requests, Services, Patch-For-Review: Allow Pchelolo to deploy services via Scap3 - https://phabricator.wikimedia.org/T141086#2497635 (Dzahn) Open>Resolved a:Dzahn done
[01:29:24] Scap3, Operations, Ops-Access-Requests, Services: Allow Pchelolo to deploy services via Scap3 - https://phabricator.wikimedia.org/T141086#2497638 (Dzahn)
[01:53:27] Release-Engineering-Team, Gerrit, Operations, Patch-For-Review: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#2497685 (Dzahn) 18:47 < mutante> !log ytterbium - shutdown -h now, over and out 18:54 < grrrit-wm> (PS1) Dzahn: contint: remove...
[02:07:32] Is Gerrit down?
[02:08:18] Oh looks like mutante is moving it to a new server
[02:08:35] And it's back up
[02:10:34] RoanKattouw: not even that, just merged a config change which triggered an automatic service restart to reload config
[02:10:39] Oh OK
[02:10:54] I saw Phab activity about shutting down ytterbium so I thought that was related
[02:10:57] and the config change was to remove the _bugzilla_ password :p
[02:11:04] it is related
[02:11:11] ytterbium is the old gerrit server which can be killed
[02:21:48] Release-Engineering-Team, Gerrit, Operations: decom ytterbium (datacenter) - https://phabricator.wikimedia.org/T141415#2497714 (Dzahn)
[02:24:58] Release-Engineering-Team, Gerrit, Operations, Patch-For-Review: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#1972391 (Dzahn) removed ytterbium from DNS in https://gerrit.wikimedia.org/r/#/c/301324/
[02:25:52] Release-Engineering-Team, Gerrit, Operations, Patch-For-Review: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#2497780 (Dzahn) Open>Resolved no more remnants in puppet or DNS, except mgmt DNS, continued in subtask now
[02:26:14] mutante: can
you also restart grrrit-wm?
[02:26:23] Release-Engineering-Team, Gerrit, Operations: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#2497785 (Dzahn)
[02:26:34] legoktm: i already did
[02:26:39] oh, thank you :D
[02:33:22] MediaWiki-Releasing, Release-Engineering-Team, Operations, Parsoid: debian signing keyid E84AFDD2 has expired - https://phabricator.wikimedia.org/T141400#2497820 (Dzahn) p:Triage>High
[05:43:48] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins, Upstream: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#2497991 (Nikerabbit) Blind se...
[06:20:59] !log created instance deployment-depurate01 for testing of role::html5depurate
[06:21:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[06:58:14] PROBLEM - Puppet run on phab-beta is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:01:36] (PS1) Lethexie: Add detection for calling global functions in target classes. [tools/codesniffer] - https://gerrit.wikimedia.org/r/301335
[08:32:16] (CR) jenkins-bot: [V: -1] Add detection for calling global functions in target classes. [tools/codesniffer] - https://gerrit.wikimedia.org/r/301335 (owner: Lethexie)
[09:02:34] MediaWiki-Releasing, Release-Engineering-Team, Operations, Parsoid: debian signing keyid E84AFDD2 has expired - https://phabricator.wikimedia.org/T141400#2498232 (fgiunchedi) sigh, thanks for letting us know! Looks like a good occasion to switch to 4k pgp key too, I'm going to generate a new one...
[09:03:48] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#2498236 (hashar)
[09:03:51] Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#2498233 (hashar) Open>Resolved a:hashar Imho there is nothing left to do. All services got transitioned :-}
[09:05:57] Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#2498240 (hashar)
[09:06:25] Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#973092 (hashar) ::beta related roles as of July 27th: ``` ./modules/role/manifests/beta/availability_collector.pp ./modules/role/manifests/beta/bastion.pp ....
[09:07:03] Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#2498242 (hashar)
[09:07:21] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#2498243 (hashar)
[09:07:28] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#973295 (hashar) As of July 27th 2016 in puppet.git: ``` $ find . -type f -path '*role*beta*' ./modules/role/manifests/beta/availability_collector.pp ./modu...
[09:08:03] Beta-Cluster-Infrastructure, Puppet, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#2498250 (hashar)
[09:12:57] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins, Upstream: Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#2498254 (hashar) Open>...
[09:30:36] morning hashar !
[09:33:10] addshore: hello :-}
[09:34:03] I have a question about the deployment process etc (asking you as I just spotted you pushed out https://gerrit.wikimedia.org/r/#/c/301336/ an hour ago) ;)
[09:35:00] Say I want to backport https://gerrit.wikimedia.org/r/#/c/298987 and get it out before morning swat, can I just go ahead and do that? Should I still add a tiny entry to the deployment calendar (even though there is nothing for hours)?
[09:42:43] hashar: ^^ ;)
[09:46:43] !log manually triggered debian-glue on all operations/debs repo that had no jenkins-bot vote. Via zuul enqueue on gallium and list fetched from "gerrit query --current-patch-set 'is:open NOT label:verified=2,jenkins-bot project:^operations/debs/.*'|egrep '(ref|project):'"
[09:46:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:47:47] addshore: the policy is more or less:
[09:47:50] * do not deploy on friday
[09:48:10] * for devs deploy in the SWAT deploy window / custom window if that can be scheduled in advance
[09:48:24] * bug fixes can be pushed out of the windows pending approval/notification to ops/releng
[09:48:54] the latter being more of a safeguard to avoid deploying potentially harmful / unreviewed code at odd hours (eg a friday evening hehe)
[09:49:12] hehe, fair. okay, so I should just add it to morning swat? :)
[09:49:38] well https://gerrit.wikimedia.org/r/#/c/298987/ is in wmf.12 already isn't it ?
[09:49:57] yeh, so it would end up everywhere on thursday, I was going to land it on 11 too
[09:50:04] if so it is going to be rolled tonight with the mw train ( https://tools.wmflabs.org/versions/ )
[09:50:14] by thursday evening it will be on all wikis
[09:51:27] or you can add it to tonight's swat slot
[09:51:49] starting week of August 20th we will have a SWAT window during european business hours (2 or 3pm iirc?)
[09:52:01] ooooooooh, that would be cool...!
[09:52:17] then if that patch fixes an issue that is kind of urgent for statsd/graphite data store I guess we can deploy it now
[09:52:35] hashar i thought label:verified=2 is broken in gerrit 2.12.2
[09:52:43] and hi
[09:52:58] label:verified=0 is broken
[09:53:02] oh
[09:53:03] other scores work :}
[09:53:10] yep, i just remembered sorry.
[09:53:11] :)
[09:53:13] hashar, well, backporting it will avoid having 2 days worth of data coming in in 2 different formats, but I wouldn't really call it an urgent issue!
[09:53:16] addshore: europe swat task is https://phabricator.wikimedia.org/T137970 :)
[09:53:26] * paladox has just renewed norton security
[09:53:40] addshore: with candidates at https://phabricator.wikimedia.org/T139544 .
I guess we can enroll you as well
[09:54:02] the aim is to be able to deploy hot fixes during our "normal" working time since 6pm CET is quite late
[09:54:10] (or is it 5pm I can't remember)
[09:54:32] and ultimately have european based folks to be trained on deployment so we get moaaar deployers
[09:54:42] and devs rely less on releng
[09:56:42] hashar it looks like it is trying to do
[09:56:43] 09:55:42 E: could not update with cowdancer, try --no-cowdancer-update option
[09:56:43] 09:55:42 forking: rm -rf /var/cache/pbuilder/build//cow.25895
[09:56:43] 09:55:42 + '[' 1 -eq 0 ']'
[09:56:44] hashar: 5pm central, 6pm eastern europe (except when timezones do not match)
[09:56:47] https://integration.wikimedia.org/ci/job/debian-glue/286/console
[09:56:56] /var/cache/**
[09:58:25] yeah it is really experimental
[09:58:27] Release-Engineering-Team, User-zeljkofilipin: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2435541 (Addshore) @greg I now have deployment access and am also based in Europe! (Just in case you want to add me to come list) Not totally familiar wi...
[09:59:23] hashar: I added a comment poking greg to see if he wants to add me to the list ;)
[09:59:24] Release-Engineering-Team, User-zeljkofilipin: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2498309 (hashar)
[09:59:53] and yes, waiting until the evening for deploys sucks, or staying up to midnight for me for the evening swat.....
[10:00:02] Release-Engineering-Team, User-zeljkofilipin: Identify inaugural SWAT members for the European SWAT window - https://phabricator.wikimedia.org/T139544#2435541 (hashar) @addshore I have enlisted you for SWAT conscription :-}
[10:00:10] addshore: I have formally added you to the list :-}
[10:00:15] awesome!
[10:00:30] it is not rocket science really
[10:00:40] the deploy steps are super detailed
[10:00:42] and straightforward
[10:00:47] oh
[10:00:58] can be summarized as: git fetch && git diff HEAD..HEAD@{upstream}
[10:01:19] then git rebase and one of scap sync-file or scap sync-dir
[10:01:27] then be around for at least an hour
[10:01:33] and monitor logs / graphs / alarms etc
[10:01:40] and be prepared to rollback
[10:01:53] it is the latter (looking at the aftermath) that needs some training
[10:01:53] hashar: indeed, I deployed 2 small things yesterday with thcipriani watching over me ;)
[10:02:00] hurrah
[10:02:24] ultimately releng will just deal with the process / training and everyone else being "certified" will just be able to deploy as needed
[10:02:30] well that is MY view of the thing :}
[10:02:39] hehe :)
[10:03:45] kart_: if you are still around. I have just triggered a Jenkins job on all of operations/debs/contenttranslation repos
[10:04:02] hashar: yes. Thanks for that :)
[10:04:10] kart_: it is most probably failing entirely on each of them but it is non voting. The job is kind of experimental so you can safely ignore the result
[10:04:16] hashar: I found some issues, will fix it.
[10:04:57] if you happen to look at the console feel free to file tasks. Will be happy to look at them when I am back from vacation
[10:05:43] hashar: sure!
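Editor's note: the deploy sequence hashar summarizes between 10:00:58 and 10:01:19 can be sketched as a dry-run helper. This is only an illustration under stated assumptions: `swat_plan` is a hypothetical name, it prints the commands rather than running them, and choosing `sync-dir` versus `sync-file` by checking whether the path is a directory is an assumption, not something stated in the chat. Real scap invocations may need further arguments (for example a log message), so check the scap documentation before running anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the review-then-sync sequence from the
# chat without executing any of it.
swat_plan() {
    target="$1"
    # Review what is about to be pulled in before touching the checkout.
    echo 'git fetch'
    echo 'git diff HEAD..HEAD@{upstream}'
    echo 'git rebase'
    # Assumption: directories are synced with sync-dir, anything else with sync-file.
    if [ -d "$target" ]; then
        echo "scap sync-dir $target"
    else
        echo "scap sync-file $target"
    fi
}
```

Calling `swat_plan some/path` only lists the commands; the monitoring and rollback readiness hashar mentions afterwards are manual steps that a script like this does not cover.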
[10:07:31] kart_: things worth knowing, the job attempts to build the package in the distribution mentioned in debian/changelog
[10:07:47] eg: jessie , jessie-wikimedia (which has backports + our custom package), or unstable
[10:07:59] trusty and trusty-wikimedia should be supported as well
[10:08:19] dpkg-parsechangelog --show-field distribution -lsource/debian/changelog
[10:10:18] MediaWiki-Releasing, Release-Engineering-Team, Operations, Parsoid: debian signing keyid E84AFDD2 has expired - https://phabricator.wikimedia.org/T141400#2498327 (fgiunchedi) the new key is this: ``` pub 4096R/22250DD7 2016-07-27 [expires: 2019-06-12] Key fingerprint = A6FD 76E2 A61C 556...
[10:11:12] (CR) Paladox: [C: 1] Switch to dh_virtualenv as a buildsystem [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/301306 (owner: Hashar)
[10:11:15] (CR) Paladox: "recheck" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/301306 (owner: Hashar)
[10:11:46] hashar: thanks.
[10:12:14] (CR) Paladox: [C: -1] "It fails with" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/301306 (owner: Hashar)
[10:12:25] hashar: I've to check with alex about if he has uploaded to jessie or jessie-wikimedia (for apertium/cg3) that's main cause of almost all failures.
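Editor's note: hashar's point at 10:07:31 is that the target distribution comes from the top entry of debian/changelog, which `dpkg-parsechangelog` reads. For readers without dpkg tooling at hand, a minimal sketch of the same lookup follows. It assumes the standard changelog header layout `package (version) distribution; urgency=...`; `changelog_distribution` is a hypothetical helper, not part of the debian-glue job.

```shell
#!/bin/sh
# Hypothetical helper: print the distribution named on the first line of a
# Debian changelog, e.g. "zuul (2.1.0-1) jessie-wikimedia; urgency=medium".
changelog_distribution() {
    # On line 1 only: drop "package (version) ", drop ";..." and print, then quit.
    sed -n '1{s/^[^)]*) *//;s/;.*$//;p;q;}' "$1"
}
```

Usage would be `changelog_distribution debian/changelog`; where dpkg-dev is installed, the `dpkg-parsechangelog` command quoted in the chat is the more robust choice.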
[10:14:45] kart_: looks like it is going to be a magic fix so :}
[10:15:12] hashar: yeah :)
[10:15:21] kart_: the job reuses Alexandros' hooks that dynamically inject apt.wm.o with proper components whenever WIKIMEDIA=yes or the distribution name has the '-wikimedia' suffix
[10:15:26] that is all super magic and very helpful
[10:15:32] (CR) Paladox: "recheck" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/299869 (owner: Hashar)
[10:17:25] hashar it seems the patch you uploaded yesterday to fix jessie didn't work, see https://integration.wikimedia.org/ci/job/debian-glue/330/console please.
[10:18:11] paladox: yeah that was a one time try :D
[10:18:19] Oh
[10:18:23] I mean
[10:18:27] But why does it work for precise and not for jessie?
[10:18:32] debian-glue is not going to work yet for integration/zuul
[10:18:43] Oh
[10:18:45] cause something is broken and has to be figured out
[10:18:47] Oh
[10:20:18] (CR) Hashar: [C: 2] Add Beta features as dependency for ORES [integration/config] - https://gerrit.wikimedia.org/r/301319 (owner: Ladsgroup)
[10:20:58] I think i may know since in gerrit it doesn't show all the commits only those two files as being edited but in phabricator it shows a lot more
[10:20:59] https://phabricator.wikimedia.org/rCIZUc37b1a98e8a98d5b3aedc7f3db2251d11d21a7c4
[10:21:18] (Merged) jenkins-bot: Add Beta features as dependency for ORES [integration/config] - https://gerrit.wikimedia.org/r/301319 (owner: Ladsgroup)
[10:22:02] (CR) Paladox: "recheck" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/299545 (owner: Hashar)
[10:24:32] kart_: just one lintian error for cg3 : https://integration.wikimedia.org/ci/job/debian-glue/331/testReport/junit/lintian/libcg3-0/symbols_file_contains_current_version_with_debian_revision_on_symbol_ASTType_str_Base_and_697_others/ :)
[10:25:08] hashar: sure. Let me fix.
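Editor's note: the rule hashar states at 10:15:21 (inject the Wikimedia apt components when WIKIMEDIA=yes or when the distribution name ends in '-wikimedia') can be written as a small predicate. This is a sketch of the condition only; `needs_wikimedia_apt` is a hypothetical name, and the actual hooks are not shown in the log and may differ.

```shell
#!/bin/sh
# Hypothetical predicate: succeeds when the Wikimedia apt components should
# be injected for the given distribution name.
needs_wikimedia_apt() {
    distribution="$1"
    # An explicit opt-in through the environment wins regardless of the name.
    if [ "${WIKIMEDIA:-}" = "yes" ]; then
        return 0
    fi
    # Otherwise decide on the '-wikimedia' suffix of the distribution name.
    case "$distribution" in
        *-wikimedia) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `needs_wikimedia_apt jessie-wikimedia` succeeds, while `needs_wikimedia_apt unstable` fails unless WIKIMEDIA=yes is set in the environment.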
[10:25:16] kart_: might not be a trivial one
[10:25:41] hashar: yep. Looks like that's not happening in unstable, so need to take a deeper look.
[10:29:49] (CR) Paladox: "recheck" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/300532 (owner: Hashar)
[10:44:01] (Draft1) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301354
[10:45:34] Release-Engineering-Team, Developer-Relations (Jul-Sep-2016), User-greg: Write blog post highlighting recent Phabricator improvements - https://phabricator.wikimedia.org/T137727#2498387 (mmodell) OK I've published {J9}
[10:45:57] Release-Engineering-Team, Developer-Relations (Jul-Sep-2016), User-greg: Write blog post highlighting recent Phabricator improvements - https://phabricator.wikimedia.org/T137727#2498388 (mmodell) Open>Resolved a:mmodell
[10:49:35] (PS2) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301354
[10:53:32] (PS1) Hashar: debian-glue: lintian now fails instead of UNSTABLE [integration/config] - https://gerrit.wikimedia.org/r/301355
[10:57:43] hashar yay it builds
[10:57:44] https://integration.wikimedia.org/ci/job/debian-glue/335/
[10:57:46] for jessie
[10:58:14] seems whatever was broken was fixed and is in debian/precise-wikimedia branch
[10:58:24] based on patch https://gerrit.wikimedia.org/r/#/c/301354/
[10:59:05] (PS3) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301354
[11:00:11] Ah that is why DEB_BUILD_OPTIONS=nocheck we forgot that <<
[11:00:18] https://gerrit.wikimedia.org/r/#/c/301354/3/debian/rules
[11:01:40] I broke the test report
[11:02:05] (CR) Paladox: [C: 1] debian-glue: lintian now fails instead of UNSTABLE [integration/config] - https://gerrit.wikimedia.org/r/301355 (owner: Hashar)
[11:02:15] Oh
[11:03:17] Well seems to build for debian-jessie now yay
[11:03:21] :)
:)
[11:03:34] but requires us to merge in debian/precise-wikimedia branch
[11:03:40] which i couldn't seem to do
[11:03:45] paladox: eg https://integration.wikimedia.org/ci/job/debian-glue/336/
[11:03:49] it is marked as a SUCCESS
[11:03:53] :)
[11:03:55] but there is one test failure
[11:03:59] Oh
[11:04:15] Yay
[11:05:16] (CR) Paladox: "recheck" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301354 (owner: Paladox)
[11:07:28] https://integration.wikimedia.org/ci/job/debian-glue/339/
[11:07:31] failed but still
[11:07:34] created the deb
[11:07:41] brb, going for lunch :).
[11:23:46] (PS2) Hashar: debian-glue: lintian now fails instead of UNSTABLE [integration/config] - https://gerrit.wikimedia.org/r/301355
[11:24:03] paladox: debian-glue now fails whenever a lintian test fails :-}
[11:24:10] instead of UNSTABLE
[11:24:22] (CR) Hashar: [C: 2] "Validated :-}" [integration/config] - https://gerrit.wikimedia.org/r/301355 (owner: Hashar)
[11:26:05] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:26:09] (Merged) jenkins-bot: debian-glue: lintian now fails instead of UNSTABLE [integration/config] - https://gerrit.wikimedia.org/r/301355 (owner: Hashar)
[11:27:33] RECOVERY - Puppet run on deployment-apertium01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:53:15] Oh
[11:53:18] yay
[11:53:26] im back, but brb again.
[12:12:32] Continuous-Integration-Config, Tool-Labs-tools-stewardbots, Patch-For-Review: Implement jenkins tests on labs/tools/stewardbots - https://phabricator.wikimedia.org/T128503#2498531 (MarcoAurelio) a:MarcoAurelio>None Not working on this right now.
[12:16:04] Hi im back
[12:16:25] and hashar i got jessie building for zuul yay
[12:16:39] Sorry i ping you again
[12:16:44] :-}
[12:16:52] Could you if you have time
[12:16:54] backport
[12:17:01] debian/precise-wikimedia into
[12:17:06] debian/jessie-wikimedia
[12:17:07] please
[12:17:21] Since i only did it as a test and didn't correctly do it how you would merge it.
[12:17:22] :)
[12:17:27] please
[12:19:56] I found another bug in gerrit.
[12:20:10] When you try and copy text in a diff it highlights the second line too
[12:20:15] but doesn't copy that bit.
[12:20:26] It is fixed in a newer codemirror release though
[12:21:47] (PS1) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301361
[12:23:43] ^^ fixed by using DEB_BUILD_OPTIONS=nocheck
[12:23:46] that's why it failed.
[12:24:25] and didn't build the debs
[12:24:29] but now it does with DEB_BUILD_OPTIONS=nocheck
[12:25:41] Deployment-Systems, scap, Analytics-Cluster, Analytics-Kanban, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2498547 (elukey) @thcipriani sorry to bother you again, but we were wondering what would be the best way to migrate the repo. Afaiu merging...
[12:28:26] paladox: yeah that is to skip the tests
[12:28:33] Yep
[12:28:40] it is supposedly injected by Zuul
[12:29:06] Oh
[12:29:16] ah no
[12:29:18] it is not :-}
[12:29:30] ideally the tests should pass
[12:29:40] but then there is something funky happening with dh_virtualenv :(
[12:29:45] Yep
[12:29:56] it is not so trivial
[12:30:10] gotta reproduce and dig in dh_virtualenv code
[12:30:22] Oh
[12:30:35] Would this https://github.com/jenkinsci/ssh-credentials-plugin/pull/5 fix the newer ssh
[12:30:54] Even though it is not bouncy castle, we could rebase it and then build it and deploy it.
[12:30:55] ?
[12:32:08] Oh wait
[12:32:09] never mind
[12:32:22] what it is changing to is really old, and isn't even maintained.
[12:32:27] https://github.com/hudson/ganymed-ssh-2
[12:34:13] paladox: there is a task in jenkins issue tracker that is about getting rid of trilead-ssh2
[12:34:24] the lib is apparently not maintained any more and lacks the recent algos
[12:34:24] Oh
[12:34:28] Yep
[12:34:46] They just updated https://wiki.jenkins-ci.org/display/JENKINS/Bouncy+Castle+API+Plugin
[12:34:54] but the release date is wrong
[12:35:24] plus only compatible with jenkins 2.16 yet that isn't even released and it is showing as needing an update in the update manager.
[12:35:33] lol and i am using jenkins 2.15
[12:35:33] !log hard rebooting integration-slave-trusty-1011 from Horizon. ssh lost, no log in Horizon.
[12:35:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:37:06] PROBLEM - SSH on deployment-logstash2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:39:20] RECOVERY - SSH on integration-slave-trusty-1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[12:40:52] Continuous-Integration-Infrastructure: Jenkins lost ssh to slave integration-slave-trusty-1011 - https://phabricator.wikimedia.org/T141435#2498609 (hashar)
[12:42:44] Continuous-Integration-Infrastructure: Jenkins lost ssh to slave integration-slave-trusty-1011 - https://phabricator.wikimedia.org/T141435#2498622 (hashar) Open>Resolved a:hashar I can ssh to the instance just fine.
[12:43:15] Project beta-code-update-eqiad build #114596: ABORTED in 15 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/114596/
[12:43:40] !log restarted Jenkins for some trivial plugins updates
[12:43:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:44:48] hashar which plugins were updated?
[12:45:01] PROBLEM - Puppet staleness on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[12:48:48] hashar it seems that this https://issues.jenkins-ci.org/browse/JENKINS-31549 may also definitely be related
[12:48:53] to what is happening
[12:49:59] RECOVERY - Puppet staleness on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [3600.0]
[12:50:02] Yep definitely since it leads us to https://github.com/jenkinsci/trilead-ssh2/blob/master/src/com/trilead/ssh2/crypto/digest/MAC.java
[12:50:12] and sorry if i pinged you too much.
[12:57:22] paladox: let me find the task
[12:57:31] Found it
[12:57:39] https://issues.jenkins-ci.org/browse/JENKINS-33021
[12:57:41] hashar ^^
[12:57:42] ours is https://phabricator.wikimedia.org/T103351
[12:57:47] Yep
[12:58:01] and yeah https://issues.jenkins-ci.org/browse/JENKINS-33021 is the upstream one
[12:58:16] Yep
[12:58:17] :)
[12:58:34] Kind of a duplicate is https://issues.jenkins-ci.org/browse/JENKINS-31549
[12:58:40] filed for a different plugin
[12:59:09] but a task needs to be filed for jenkins core to remove support for the old ssh and move forward with another one ie the one you suggested, bouncy castle
[12:59:10] ?
[12:59:14] we have a workaround for it
[12:59:17] Oh
[12:59:19] yeh
[12:59:19] which is to not use the newer ssh algo
[12:59:22] Yep
[12:59:23] on the ci slaves
[12:59:28] so that is really a non issue
[12:59:38] Oh
[12:59:41] yeh
[12:59:51] but then it could make it a security hole
[12:59:55] for any hacker to get it
[12:59:57] it = in
[13:01:16] Or we can use this fork
[13:01:16] https://github.com/connectbot/sshlib
[13:01:22] as a comment on the issue said
[13:01:47] it adds support for newer ssh, but we should only use that until upstream moves along and goes with an updated ssh implementation
[13:01:48] ?
[13:02:13] Does that support the ssh keys we need supporting
[13:02:16] for newer ssh
[13:02:17] ?
[13:03:07] https://github.com/connectbot/sshlib/blob/master/sshlib/src/main/java/com/trilead/ssh2/crypto/digest/MAC.java
[13:03:20] We can use ^^
[13:04:28] talk to upstream about it
[13:04:42] as I said it is a non issue for us
[13:04:47] we have a workaround
[13:04:54] Ok
[13:05:23] you can potentially mention https://github.com/connectbot/sshlib on the upstream task
[13:05:31] but really I have no idea which ssh lib they will want to use
[13:05:49] there are several Java implementations of SSH and they might be using another one already
[13:06:03] Oh
[13:06:36] It is already mentioned upstream
[13:06:44] on the comment before yours
[13:06:51] see :-}
[13:07:01] it is all being sorted out ;-}
[13:08:14] What i mean is the authors didn't do that, but another user used that implementation and it worked for them
[13:08:21] since it is another fork and is being updated
[13:13:28] that's why, the plugin also uses https://github.com/is/jsch
[13:13:42] which well uses the old ssh implementation and not newer ones.
[13:13:51] So that will also need to be migrated off.
[13:19:29] Beta-Cluster-Infrastructure, Release-Engineering-Team: Beta puppetmaster cherry-pick process - https://phabricator.wikimedia.org/T135427#2498703 (hashar) Here is the current state on the beta cluster puppetmaster: `git log --format='| %an | %s' HEAD@{upstream}..` | Andrew Otto | Hieraize eventlogging_ka...
[13:21:30] (PS1) Lethexie: Report warnings when $dbr->query() is used instead of $dbr->select().
[tools/codesniffer] - https://gerrit.wikimedia.org/r/301364
[13:22:26] (Abandoned) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301354 (owner: Paladox)
[13:22:36] (Abandoned) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301361 (owner: Paladox)
[13:32:39] (Restored) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301361 (owner: Paladox)
[13:32:51] (PS2) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301361
[13:32:57] hashar: any idea if there is some cool google calendar for deployments? ;)
[13:33:46] (Abandoned) Paladox: Testing [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/301361 (owner: Paladox)
[13:33:48] (Abandoned) Hashar: Switch to dh_virtualenv as a buildsystem [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/301306 (owner: Hashar)
[13:34:10] (CR) Paladox: [C: 1] 2.1.0-391-gbc58ea3-wmf1precise1 [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/300532 (owner: Hashar)
[13:34:17] (CR) Paladox: [C: 1] 2.1.0-391-gbc58ea3-wmf1precise1 [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/300532 (owner: Hashar)
[13:39:19] paladox: you should really stop spamming gerrit
[13:39:27] https://gerrit.wikimedia.org/r/#/c/300532/ ends up being completely cluttered
[13:39:29] oh sorry
[13:40:04] oh
[13:40:10] that is just the commit message apparently
[13:40:11] bah
[13:40:19] oh
[13:51:23] (PS10) Hashar: 2.1.0-391-gbc58ea3-wmf1precise1 [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/300532
[13:52:14] (CR) Hashar: "The paramiko fix was missing https://gerrit.wikimedia.org/r/#/c/299136/" [integration/zuul] (debian/precise-wikimedia) -
https://gerrit.wikimedia.org/r/300532 (owner: Hashar)
[13:58:02] (CR) Hashar: "Pushed at 13:54 on https://people.wikimedia.org/~hashar/debs/zuul_2.1.0-391-gbc58ea3/" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/300532 (owner: Hashar)
[13:58:40] :)
[14:00:18] thcipriani: joining
[14:00:56] hashar: also joining :)
[14:03:31] !log upgraded zuul on gallium via dpkg -i /root/zuul_2.1.0-391-gbc58ea3-wmf1precise1_amd64.deb (revert is zuul_2.1.0-151-g30a433b-wmf4precise1_amd64.deb )
[14:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:04:50] (CR) Hashar: [C: 2] 2.1.0-391-gbc58ea3-wmf1precise1 [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/300532 (owner: Hashar)
[14:05:03] (CR) Hashar: "Deployed and running" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/300532 (owner: Hashar)
[14:06:16] :)
[14:15:51] (PS1) Mholloway: Make apps-android-wikipedia-lint voting [integration/config] - https://gerrit.wikimedia.org/r/301370 (https://phabricator.wikimedia.org/T141440)
[14:17:31] (CR) Mholloway: [C: -1] "Holding at -1 for team discussion." [integration/config] - https://gerrit.wikimedia.org/r/301370 (https://phabricator.wikimedia.org/T141440) (owner: Mholloway)
[14:22:08] Continuous-Integration-Config, Wikipedia-Android-App-Backlog, Patch-For-Review: [Dev] Fix periodic tests - https://phabricator.wikimedia.org/T139137#2498851 (Mholloway) @hashar, that's great! Should we have the new periodic test job start sending results to #wikimedia-android-ci, or is it temporary?
[14:23:39] mdholloway|brb: I am talking about that job with thcipriani right now :}
[14:23:41] he will take the lead
[14:23:43] to polish it up
[14:34:26] hashar: sounds good :)
[15:01:42] mdholloway: so the summary is.
The job works on Jessie but I have manually fixed it over the weekend with trial and error until it passed :D [15:02:18] mdholloway: and since I am on vacation over the coming weeks, it sounded like a good idea to enroll thcipriani so he can support you / the job in case something screws up :} [15:02:30] :D [15:02:41] normalizing the job should be easy after a discussion we just had [15:02:55] excellent! [15:03:03] just a few changes in the job definition, namely point it to the jessie instance integration-android-jessie (or some name like that) [15:03:13] point jdk to 'Debian - OpenJDK 8' [15:03:18] and it might just do it [15:03:31] a tricky part will be to hack /etc/hosts so that localhost always resolves to 127.0.0.1 [15:23:27] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling, 06Release-Engineering-Team: Identify metric (or metrics) that gives a useful indication of user-perceived (Wikimedia developer) service of CI - https://phabricator.wikimedia.org/T139771#2499034 (10Andrew) [15:23:30] 05Continuous-Integration-Scaling, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2499033 (10Andrew) [15:23:44] 05Continuous-Integration-Scaling, 13Patch-For-Review, 07WorkType-NewFunctionality: Migrate mediawiki-core-phpcs job to Nodepool - https://phabricator.wikimedia.org/T133976#2499038 (10Andrew) [15:23:47] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 07WorkType-NewFunctionality: [keyresult] Migrate php (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2499039 (10Andrew) [15:23:56] 05Continuous-Integration-Scaling, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2248624 (10Andrew) 05Open>03Resolved a:05Andrew>03None [15:25:59] 10Continuous-Integration-Infrastructure, 06Labs,
10Labs-Infrastructure: Nodepool has trouble taking snapshots on OpenStack labs - https://phabricator.wikimedia.org/T138106#2499051 (10Andrew) Is this still failing, or are things resolved now that we have increased labs capacity? [15:32:02] ostriches hi, i found another bug in gerrit, when you try and copy text in a diff it highlights two lines. [15:32:13] I am aware that is because of codemirrors which fixed it [15:32:32] Also could you review https://gerrit.wikimedia.org/r/#/c/301381/ please. [15:32:33] :) [15:39:43] hashar (sorry for ping) i get good coverage in france with three mobiles feel at home [15:39:44] http://www.three.co.uk/Support/Roaming_and_international/Mobile_roaming?content_aid=1214306374696 [15:41:18] hashar: got a "can't connect to mysql" failure again [15:41:50] You get 50gb for 15.99 euroes. [15:42:11] I get 1000gb (1tb) for £25 from three [16:11:47] thcipriani: ah another culprit: mysql dies randomly on CI slaves :( [16:11:52] Glaisher: restarting it [16:12:02] :) [16:12:20] hashar: it didn't happen after a recheck [16:12:30] https://gerrit.wikimedia.org/r/301349 [16:12:59] !log salt -v '*slave-trusty*' cmd.run 'service mysql start' ( was missing on integration-slave-trusty-1011.integration.eqiad.wmflabs ) [16:13:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [16:13:06] which I have rebooted a few hours ago [16:13:15] so that is definitely mysql not coming up on instance boot [16:14:48] Glaisher: fixed thank you [16:24:03] 10Continuous-Integration-Infrastructure: mysql does not start when Trusty instances spawn - https://phabricator.wikimedia.org/T141450#2499220 (10hashar) [16:24:07] Glaisher: filled as https://phabricator.wikimedia.org/T141450 :) [16:25:04] thanks for looking into it :) [16:27:50] 10Continuous-Integration-Infrastructure: mysql does not start when Trusty instances spawn - https://phabricator.wikimedia.org/T141450#2499235 (10hashar) [16:28:34] Yay ssh works [16:28:43] i am using 
ssh on ubuntu on windows [16:57:48] 10Continuous-Integration-Infrastructure, 06Operations, 10Packaging, 10Traffic: piuparts fail with WARN: Broken symlinks: /etc/systemd/system... - https://phabricator.wikimedia.org/T141454#2499359 (10hashar) [16:57:57] 10Continuous-Integration-Infrastructure, 06Operations, 10Packaging, 10Traffic: piuparts fail with WARN: Broken symlinks: /etc/systemd/system... - https://phabricator.wikimedia.org/T141454#2499372 (10hashar) p:05Triage>03Low [17:08:12] 10Deployment-Systems, 10scap, 10Analytics-Cluster, 06Analytics-Kanban, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2499422 (10thcipriani) >>! In T129151#2498547, @elukey wrote: > @thcipriani sorry to bother you again, but we were wondering what would be th... [17:14:55] thcipriani: I think I figured out the mystery of the problem with the ~l10nupdate/.gitconfig file -- https://gerrit.wikimedia.org/r/#/c/301405/ [17:15:37] heh, that makes sense, I guess. That was definitely a frustrating one :) [17:16:05] yeah. User and LDAP react in strange ways [17:19:12] 06Release-Engineering-Team, 06Developer-Relations: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2499441 (10greg) [17:19:44] 06Release-Engineering-Team, 06Developer-Relations (Jul-Sep-2016), 15User-greg: Write blog post highlighting recent Phabricator improvements - https://phabricator.wikimedia.org/T137727#2376793 (10greg) [17:19:46] 06Release-Engineering-Team, 06Developer-Relations: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2499455 (10greg) [17:27:40] 06Release-Engineering-Team (Long-Lived-Branches): Static asset time on disk - https://phabricator.wikimedia.org/T140921#2480982 (10mmodell) I discussed this at some length with @demon and @thcipriani. These are the two choices we came up with, either one should work. #1 would save disk space, however, #2 might... 
[17:28:08] Who would ever do this rm -rf /mnt/c/ [DO NOT DO THIS] [17:28:08] to the windows os. [17:28:09] LOL [17:28:12] in ssh. [17:31:52] 06Release-Engineering-Team, 06Developer-Relations (Jul-Sep-2016), 15User-greg: Write blog post highlighting recent Phabricator improvements - https://phabricator.wikimedia.org/T137727#2499478 (10mmodell) [18:02:06] 06Release-Engineering-Team, 06Developer-Relations, 10Wikimedia-Blog-Content: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2499610 (10Aklapper) [18:09:11] thcipriani, ostriches: this patch chain -- https://gerrit.wikimedia.org/r/#/q/topic:deployment-server -- is on striker-deploy03.striker.eqiad.wmflabs and puppet is running cleanly. :) [18:09:34] * bd808 eats lunch before diving into the next rabbit hole [18:09:48] I +1'd some already :) [18:10:55] \o/ neat. [18:11:53] https://gerrit.wikimedia.org/r/#/c/301404/ is interesting though. php-dbg doesn't provide it? Wouldn't it be best to be explicit as to what provides it? [18:12:06] php5-dbg on its own won't provide that directory. [18:13:49] (tangent: I don't think anything uses mediawiki::packages::legacy anymore) [18:14:07] 10Deployment-Systems, 10scap, 10Analytics-Cluster, 06Analytics-Kanban, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2499623 (10elukey) >>! In T129151#2499422, @thcipriani wrote: > > Sorry for the wall of text, but it is to say: you can get back to your pre... 
[18:14:10] (nvm bad grep, ignore me) [18:20:23] PROBLEM - Host deployment-upload is DOWN: CRITICAL - Host Unreachable (10.68.16.189) [18:30:05] (03PS6) 10Madhuvishy: Add job that allows for updating analytics refinery artifacts with latest source jars [integration/config] - 10https://gerrit.wikimedia.org/r/290640 (https://phabricator.wikimedia.org/T130123) [18:30:07] 10Deployment-Systems, 10scap, 10Analytics-Cluster, 06Analytics-Kanban, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2499656 (10Ottomata) > 1. Run puppet on the targets. The main thing that changes here is that the ownership of /srv/deployment/analyti... [18:30:54] hasharAway: can you CR/merge this when you get a chance? https://gerrit.wikimedia.org/r/#/c/290640/ [18:31:06] it's the only patch pending in all of that [18:53:34] 07Browser-Tests, 10MobileFrontend, 06Reading-Web-Backlog, 03Reading-Web-Sprint-77-Segmentation-fault, and 4 others: Spike [2hrs] Wikidata description browser tests do not run anywhere - https://phabricator.wikimedia.org/T137756#2499723 (10Jdlrobson) No pairing needed. The above patch is sufficient. [18:53:35] for requests like "reset my password" that need shell access, frankly i dont see why that is an operations issue. number of deployers: ~ 60 number of ops on duty: 1 [18:59:09] Why would it be an ops issue? [18:59:14] Wiki password reset? 
[18:59:18] tags on phab [18:59:34] honestly i thought first i'd be OTRS [18:59:46] but it's site requests [19:00:01] yes, but email and password reset [19:00:08] this guy https://phabricator.wikimedia.org/T141401 [19:00:46] It's site requests, it's not operations, it's certainly not release engineering [19:01:01] it might be security-related I guess [19:01:19] it won't fit into a foundation team neatly [19:01:24] yes, sorry, i just used that because it seemed close to "all deployers" [19:01:28] agreed [19:01:39] don't need deployment, can be done by restricted [19:01:51] yea, but didnt we just find out that is just 1 person :p [19:02:03] on that ticket about getting rid of restricted [19:02:38] well, not 1 .. we were checking it though [19:02:39] no? [19:03:09] you're thinking of https://phabricator.wikimedia.org/T104671 ? [19:04:15] yes [19:04:36] site-requests makes sense to me, no? [19:08:33] Krenair: mutante historically folks with access to the the site handled the password requests [19:08:51] I and Reedy definitely handled a bunch of password restore [19:09:15] Indeed [19:09:31] we should drop #operations imho [19:09:50] #Wikimedia-Site-requests should notify the appropriate set of folks [19:10:36] yup, +1 [19:10:45] I didn't drop #operations since I didn't add it ;) [19:11:18] mutante: ftr: that wasn't me dumping it on ops :), just trying to to get close to "right" with #-site-requests [19:13:06] Yay i setup a test user on my ubuntu on windows, and created test@*** and has sudo rights. [19:13:49] greg: yes, i saw that. i was also justtrying to get close to "right". it seems it would be a group that includes all people with any kind of shell.. 
if site-requests works then it's right [19:14:46] Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 3.4.0+ x86_64) [19:19:15] mutante: I have unsubscribed you from the task and dropped #operations [19:19:53] hasharAway: ok, thanks [19:33:27] greg-g I don't see a reason for MF to block the train [19:41:16] I have reached out to 'jem' on #wikipedia-es and updated a few things on the task [19:41:49] namely that identity has to be confirmed. Then looking at the email provider, I am pretty sure the email pass can be reset easily (poke mutante / reedy ) [20:20:33] Project beta-update-databases-eqiad build #10202: 04FAILURE in 32 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10202/ [20:23:41] DB connection error: Can't connect to MySQL server on '10.68.16.193' [20:23:51] and I am not going to fix it :D [20:24:15] I just restarted the job with fingers crossed that it was transient [20:24:57] Yippee, build fixed! [20:24:58] Project beta-update-databases-eqiad build #10203: 09FIXED in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/10203/ [20:25:15] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T139215#2500112 (10Krinkle) [20:28:37] bd808: it was transient! :] [20:29:05] sunspots! [20:38:55] thcipriani: ostriches: twentyafterfour: any of you wanna debug the ongoing CI / Nodepool instances outage going on ? 
:] [20:39:07] https://integration.wikimedia.org/zuul/ shows stalled changes for up to 33 minutes [20:39:23] caught out of pure luck watching the Gearman job queue graph at the bottom [20:39:38] * thcipriani looks [20:39:48] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T139215#2500123 (10Krinkle) [20:40:12] greg-g: sorry :) [20:41:13] thcipriani: do spam questions / what you do here as needed :] [20:41:37] hashar: looks like all the nodepool nodes are in a delete state [20:42:05] logs show: Forbidden: Quota exceeded for instances: Requested 1, but already used 10 of 10 instances (HTTP 403) but I think that's a red herring [20:42:26] trying to manually delete instance 202181 [20:42:40] yeah [20:42:51] wonder why it can't delete really [20:42:55] seems to be looping HTTP requests. [20:42:59] to -labs! [20:43:10] that would be my assumption :D [20:43:17] something is borked / exploded in the OpenStack api hehe [20:43:35] another tip [20:43:42] nodepool list|grep -c ci- [20:43:49] shows we have 9 instances but the quota is 10 [20:44:08] nodepool creates an instance to generate the snapshots, so that counts against the quota [20:44:19] can be seen via nodepool image-list [20:44:36] | 890 | wmflabs-eqiad | ci-trusty-wikimedia | ci-trusty-wikimedia-1469628842 | building | 6.50 | [20:44:54] been building for 6.5 hours so it is definitely broken somehow and can be deleted to reclaim an instance [20:45:14] with: nodepool image-delete [20:45:31] puppet on trusty has been broken for a while. Not sure why though [20:52:15] thcipriani: I am pretty sure nodepool will be the least of your problems [20:52:31] I hope so :P [20:52:34] it is really doing nothing but punching buttons [20:52:53] that's what I do all day [20:52:57] though if it lost access to the db it might get a bit funky but should delete all instances and respawn fresh ones [20:52:59] or [20:53:02] restart!
:] [20:53:07] weird that there was no real impact here: https://grafana.wikimedia.org/dashboard/db/releng-kpis?panelId=5&fullscreen [20:53:25] yeah that is what I was saying in my email to releng list [20:53:36] I am not sure how helpful that graph is going to be [20:53:37] and [20:53:41] that is the one hour moving average [20:54:00] the issue started roughly 40 minutes ago so it should show an effect on the graph [20:54:22] huh. [20:54:25] or it is broken :] [20:55:14] https://grafana.wikimedia.org/dashboard/db/releng-zuul?panelId=10&fullscreen [20:55:20] that is the gearman queue [20:55:23] for the other graph [20:55:38] the changes haven't been assigned an instance yet so they are still waiting and haven't reported their wait time yet [20:56:09] now that instances can be deleted/spawned, jobs get allocated and report how long they have waited [20:56:24] so the graph should show a bump over the next half hour or so [20:57:53] and the trusty image is broken due to a grub upgrade for some reason ... [20:58:08] (found via /var/log/nodepool/image.log ) [20:59:01] what do you see in the nodepool/image.log? [21:02:38] nodepool spawning an instance out of the reference image [21:02:49] then connecting to it with ssh and executing the setup.sh script [21:02:56] which does git pull and runs puppet [21:03:10] all that mess / provisioning is in integration/config.git under /dib/ [21:03:31] every day Nodepool redoes that [21:03:40] to try to keep the snapshot up-to-date [21:20:30] have a happy day!
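[note] The check described above (spot a snapshot image stuck in "building" in the `nodepool image-list` output so it can be deleted to reclaim a quota slot) could be sketched roughly as below. This is only an illustration built from the single row pasted in the channel: the six field names, the `stuck_builds` helper and the one-hour threshold are assumptions, not part of Nodepool, and the real table may carry more columns.

```python
from typing import List, NamedTuple


class ImageRow(NamedTuple):
    # Field names are guesses based on the one row pasted in the channel.
    image_id: str
    provider: str
    image: str
    hostname: str
    state: str
    age_hours: float


def parse_image_rows(text: str) -> List[ImageRow]:
    """Parse pipe-delimited table rows, skipping headers and separators."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 6:
            continue  # separator line or unexpected layout
        try:
            age = float(cells[5])
        except ValueError:
            continue  # header row: last cell is not a number
        rows.append(ImageRow(cells[0], cells[1], cells[2], cells[3], cells[4], age))
    return rows


def stuck_builds(rows: List[ImageRow], max_hours: float = 1.0) -> List[ImageRow]:
    """Return images still in 'building' after max_hours (hypothetical threshold)."""
    return [r for r in rows if r.state == "building" and r.age_hours > max_hours]


if __name__ == "__main__":
    pasted = ("| 890 | wmflabs-eqiad | ci-trusty-wikimedia | "
              "ci-trusty-wikimedia-1469628842 | building | 6.50 |")
    for row in stuck_builds(parse_image_rows(pasted)):
        print(f"image {row.image_id} has been building for {row.age_hours} hours")
```

Anything this flags would still need a human to look at /var/log/nodepool/image.log before running `nodepool image-delete`, as was done in the discussion.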
[21:47:19] (03PS1) 10Cscott: Move Parsoid roundtrip and toolcheck tests into npm scripts in Parsoid repo [integration/config] - 10https://gerrit.wikimedia.org/r/301499 (https://phabricator.wikimedia.org/T141481) [21:58:41] 06Release-Engineering-Team, 10ArchCom-RfC, 06Developer-Relations, 06WMF-Legal, 07RfC: Create formal process for CREDITS files - https://phabricator.wikimedia.org/T139300#2500359 (10ZhouZ) Is there a potential, perhaps corner case issue, that the commit individual is not actually the person contributing t... [22:22:48] 10Continuous-Integration-Config, 10Fundraising-Backlog, 10MediaWiki-extensions-DonationInterface, 03Fundraising Sprint Nitpicking, 07Unplanned-Sprint-Work: Continuous integration: DonationInterface needs composer variant - https://phabricator.wikimedia.org/T141309#2500420 (10awight) [22:29:00] 06Release-Engineering-Team, 06Operations, 15User-greg: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2500477 (10greg) [22:29:06] 06Release-Engineering-Team, 06ArchCom, 06Developer-Relations, 10Phabricator, 15User-greg: Consider alternative processes for Unbreak Now bugs, especially those which cross-cut components - https://phabricator.wikimedia.org/T140207#2500474 (10greg) 05Open>03Resolved a:03greg #wikimedia-incident crea... 
[22:31:12] twentyafterfour hi, i think i have an idea for getting the refs/meta/config links working in gerrit [22:31:20] but need help with it phabricator side [22:31:26] please [22:32:50] 06Release-Engineering-Team, 15User-greg: Add #Wikimedia-Incident to all open "actionables" in past incident reports - https://phabricator.wikimedia.org/T141493#2500484 (10greg) [22:34:34] 10Beta-Cluster-Infrastructure, 13Patch-For-Review, 05Wikimedia-Incident: Setup poolcounter daemon in Beta Cluster - https://phabricator.wikimedia.org/T38891#2500497 (10greg) [22:35:29] paladox: ok [22:35:43] twentyafterfour we can strip out refs/meta/config [22:35:45] to be refs/ [22:35:55] then add the commit at the end of the url [22:36:11] so it will look like https://phabricator.wikimedia.org/diffusion/MW/browse/refs/;475af966c47b75749c21525903e75869b4128bef [22:36:21] 06Release-Engineering-Team, 06ArchCom, 06Developer-Relations, 10Phabricator, and 2 others: Consider alternative processes for Unbreak Now bugs, especially those which cross-cut components - https://phabricator.wikimedia.org/T140207#2500504 (10greg) [22:36:30] That way it wont affect other branches either hopefully. [22:36:37] 06Release-Engineering-Team, 15User-greg, 05Wikimedia-Incident: Identify "first responders" for "all" "components" deployed on Wikimedia servers - https://phabricator.wikimedia.org/T141066#2500505 (10greg) [22:36:53] I think we need to edit this part [22:36:54] private function getBranchNameFromRef($branch) { [22:36:54] // get rid of refs/heads prefix [22:36:54] $branch = str_replace('refs/heads', '', $branch); [22:36:54] $branch = trim($branch, '/'); [22:36:54] $branch = str_replace('HEAD', '', $branch); [22:36:56] // double encode any forward slashes in ref. 
[22:36:58] $branch = str_replace('%2F', '%252F', $branch); [22:37:00] $branch = str_replace('/', '%252F', $branch); [22:37:02] return $branch; [22:37:04] } [22:37:17] to also do it for refs/meta/config replacing it with just refs/ or anything [22:38:02] But if upstream add support for also doing name/name but that branch dosent exist instead relying on the commit [22:38:16] that will be good, since it only supports it for name not name/name [22:39:17] 10Continuous-Integration-Config, 06Release-Engineering-Team, 10DBA, 10Datasets-General-or-Unknown, and 3 others: Automatize the check and fix of object, schema and data drifts between production masters and slaves - https://phabricator.wikimedia.org/T104459#2500509 (10greg) [22:41:36] 03releng-201617-q1, 10Continuous-Integration-Infrastructure (phase-out-gallium), 05Wikimedia-Incident: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#2500520 (10greg) [22:42:47] 10scap, 03Scap3 (Scap3-MediaWiki-MVP), 05Wikimedia-Incident: Implement MediaWiki pre-promote checks - https://phabricator.wikimedia.org/T121597#2500534 (10greg) [22:43:11] 10Deployment-Systems, 10scap, 10MediaWiki-API, 03Scap3 (Scap3-MediaWiki-MVP), and 3 others: Create a script to run test requests for the MediaWiki service - https://phabricator.wikimedia.org/T136839#2500536 (10greg) [22:43:46] 06Release-Engineering-Team (Long-Lived-Branches), 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 07Spike, 05Wikimedia-Incident: Spike: Plan reforms of the CentralNotice deployment branch - https://phabricator.wikimedia.org/T136904#2500538 (10greg) [22:56:26] 06Release-Engineering-Team, 15User-greg: Add #Wikimedia-Incident to all open "actionables" in past incident reports - https://phabricator.wikimedia.org/T141493#2500584 (10greg) Made it back through May 2016. Done: * https://wikitech.wikimedia.org/wiki/Incident_documentation/20160713-ContentTranslation * https... 
[23:25:00] 06Release-Engineering-Team, 10ArchCom-RfC, 06Developer-Relations, 06WMF-Legal, 07RfC: Create formal process for CREDITS files - https://phabricator.wikimedia.org/T139300#2500642 (10Bawolff) >>! In T139300#2500359, @ZhouZ wrote: > Is there a potential, perhaps corner case issue, that the commit individual... [23:50:26] Hi, could someone run maintenance/updateCollation.php on en.wikipedia.beta.wmflabs.org? [23:51:44] you don't have access to do it? [23:51:52] we should fix that [23:54:30] No, I don't have access to the beta cluster. [23:55:56] Dereckson, try logging in now [23:59:04] Krenair: works [23:59:16] thanks