[00:05:45] PROBLEM - App Server bits response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:05:45] PROBLEM - App Server bits response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:10:32] RECOVERY - App Server bits response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 3895 bytes in 0.002 second response time [00:10:34] RECOVERY - App Server bits response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 3895 bytes in 0.003 second response time [00:34:58] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [01:34:43] 10Continuous-Integration, 10MediaWiki-Unit-tests, 7JavaScript: Apache on Jenkins slave can take over 30s to respond - https://phabricator.wikimedia.org/T95971#1205472 (10Krinkle) [01:34:52] 10Continuous-Integration, 10MediaWiki-Unit-tests, 7JavaScript: Apache on Jenkins slave can take over 30s to respond - https://phabricator.wikimedia.org/T95971#1205329 (10Krinkle) p:5Triage>3High [01:40:13] 10Continuous-Integration, 10MediaWiki-extensions-General-or-Unknown: Separate BoilerPlate extension from extension/examples - https://phabricator.wikimedia.org/T94279#1205514 (10Krinkle) p:5Triage>3Low [01:40:25] 10Continuous-Integration, 5Patch-For-Review: Migrate all debian-glue jobs to Jessie slaves - https://phabricator.wikimedia.org/T95545#1205516 (10Krinkle) p:5Triage>3Normal [01:40:32] 10Continuous-Integration: Phase out yamllint jobs (tracking) - https://phabricator.wikimedia.org/T95890#1205518 (10Krinkle) p:5Triage>3Low [01:40:38] 10Continuous-Integration: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#1205520 (10Krinkle) p:5Triage>3Normal [01:41:54] 10Continuous-Integration, 7Jenkins: limit jenkins user git setting pack.windowMemory to 2GB - https://phabricator.wikimedia.org/T58717#598307 (10Krinkle) [01:43:15] 10Continuous-Integration, 6Mobile-Web: Jenkins: 
Set up jsduck test and publish jobs for MobileFrontend - https://phabricator.wikimedia.org/T66374#1205535 (10Krinkle) p:5Normal>3Low [01:47:24] (03PS1) 10Krinkle: mwext-VisualEditor-publish: Replace bin/generateDocs.sh with 'npm run doc' [integration/config] - 10https://gerrit.wikimedia.org/r/204004 [01:49:25] (03CR) 10jenkins-bot: [V: 04-1] mwext-VisualEditor-publish: Replace bin/generateDocs.sh with 'npm run doc' [integration/config] - 10https://gerrit.wikimedia.org/r/204004 (owner: 10Krinkle) [01:49:44] (03PS2) 10Krinkle: mwext-VisualEditor-publish: Replace bin/generateDocs.sh with 'npm run doc' [integration/config] - 10https://gerrit.wikimedia.org/r/204004 [01:50:10] (03CR) 10Krinkle: [C: 032] mwext-VisualEditor-publish: Replace bin/generateDocs.sh with 'npm run doc' [integration/config] - 10https://gerrit.wikimedia.org/r/204004 (owner: 10Krinkle) [01:51:38] (03CR) 10jenkins-bot: [V: 04-1] mwext-VisualEditor-publish: Replace bin/generateDocs.sh with 'npm run doc' [integration/config] - 10https://gerrit.wikimedia.org/r/204004 (owner: 10Krinkle) [01:57:28] (03PS1) 10Krinkle: Remove jobs for mediawiki/extensions/Hanp to unbreak config [integration/config] - 10https://gerrit.wikimedia.org/r/204005 [02:01:14] (03PS2) 10Krinkle: Remove jobs for mediawiki/extensions/Hanp to unbreak config [integration/config] - 10https://gerrit.wikimedia.org/r/204005 [02:01:40] (03CR) 10Krinkle: [C: 032] "Please don't remove repositories from Gerrit without notifying CI." [integration/config] - 10https://gerrit.wikimedia.org/r/204005 (owner: 10Krinkle) [02:04:06] (03Merged) 10jenkins-bot: Remove jobs for mediawiki/extensions/Hanp to unbreak config [integration/config] - 10https://gerrit.wikimedia.org/r/204005 (owner: 10Krinkle) [02:21:39] twentyafterfour btw, remember me ranting about txstatsd stupid behavior repeating previous points if there are no new ones? 
[02:21:39] we’ve switched to statsite and that no longer happens!!1 [02:21:39] see for example http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1428977955.751&target=tools.tools-services-02.WebServiceMonitor.startsuccess [02:21:39] missing points are actually missing!!1 [02:27:07] (03PS3) 10Krinkle: mwext-VisualEditor-publish: Replace bin/generateDocs.sh with 'npm run doc' [integration/config] - 10https://gerrit.wikimedia.org/r/204004 [02:27:23] (03CR) 10Krinkle: [C: 032] mwext-VisualEditor-publish: Replace bin/generateDocs.sh with 'npm run doc' [integration/config] - 10https://gerrit.wikimedia.org/r/204004 (owner: 10Krinkle) [02:29:37] (03Merged) 10jenkins-bot: mwext-VisualEditor-publish: Replace bin/generateDocs.sh with 'npm run doc' [integration/config] - 10https://gerrit.wikimedia.org/r/204004 (owner: 10Krinkle) [02:34:14] 10Continuous-Integration, 6Mobile-Web: Jenkins: Set up jsduck test and publish jobs for MobileFrontend - https://phabricator.wikimedia.org/T66374#1205575 (10Krinkle) Since last year, the infrastructure improved and the process for extending doc.wikimedia.org has been simplified. Example: [mwext-GuidedTour-jsdu... [03:10:07] Yippee, build fixed! [03:10:07] Project beta-update-databases-eqiad build #8897: FIXED in 1 min 43 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/8897/ [03:10:28] Yippee, build fixed! [03:10:29] Project beta-code-update-eqiad build #51674: FIXED in 2 min 5 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/51674/ [03:32:50] Yippee, build fixed! [03:32:51] Project beta-scap-eqiad build #48931: FIXED in 22 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/48931/ [06:09:42] YuviPanda: awesome :) [06:25:41] Yippee, build fixed! 
[06:25:42] Project browsertests-VisualEditor-production-linux-firefox-sauce build #57: FIXED in 1 hr 25 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-production-linux-firefox-sauce/57/ [08:07:47] 6Release-Engineering: Read "Vagrant: Up and Running" book - https://phabricator.wikimedia.org/T95401#1205890 (10zeljkofilipin) 5Open>3Resolved [08:24:51] 10Continuous-Integration, 10MediaWiki-Unit-tests, 7JavaScript: Apache on Jenkins slave can take over 30s to respond - https://phabricator.wikimedia.org/T95971#1205925 (10hashar) We had a labs outage yesterday and it was probably still going on at the time the report has been created. [08:45:00] 10Continuous-Integration, 3Continuous-Integration-Isolation, 10Wikimedia-Labs-Infrastructure: Support dedicating a specific virt node to a specific nova project - https://phabricator.wikimedia.org/T84989#1205967 (10hashar) I once filled a bug (T59833) to get the bastion instances to be on different hosts, th... [08:46:31] !log does qa-morebots works ? 
[08:46:35] Logged the message, Master [09:29:36] (03PS3) 10Hashar: (WIP) debian-glue job for Zuul (WIP) [integration/config] - 10https://gerrit.wikimedia.org/r/203347 [09:31:33] (03PS1) 10Hashar: Add tox-py27 to wikimedia/bots/jouncebot [integration/config] - 10https://gerrit.wikimedia.org/r/204028 (https://phabricator.wikimedia.org/T95894) [09:31:45] (03CR) 10Hashar: [C: 032] Add tox-py27 to wikimedia/bots/jouncebot [integration/config] - 10https://gerrit.wikimedia.org/r/204028 (https://phabricator.wikimedia.org/T95894) (owner: 10Hashar) [09:33:28] (03Merged) 10jenkins-bot: Add tox-py27 to wikimedia/bots/jouncebot [integration/config] - 10https://gerrit.wikimedia.org/r/204028 (https://phabricator.wikimedia.org/T95894) (owner: 10Hashar) [09:35:05] 10Continuous-Integration, 5Patch-For-Review: Status of Jouncebot and dropping the yamllint Jenkins job - https://phabricator.wikimedia.org/T95894#1206049 (10hashar) I have added the Jenkins job `tox-py27` which fails compiling the `lxml` python module because of `libxml/xmlversion.h: No such file or directory`. 
[09:35:27] (03PS1) 10Aude: Add wikidata people [integration/config] - 10https://gerrit.wikimedia.org/r/204029 [09:51:45] PROBLEM - Puppet failure on integration-slave-precise-1011 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [09:59:02] (03PS1) 10Hashar: Remove yamllint from wikimedia/bots/jouncebot [integration/config] - 10https://gerrit.wikimedia.org/r/204033 (https://phabricator.wikimedia.org/T95894) [09:59:21] (03CR) 10Hashar: [C: 032] Remove yamllint from wikimedia/bots/jouncebot [integration/config] - 10https://gerrit.wikimedia.org/r/204033 (https://phabricator.wikimedia.org/T95894) (owner: 10Hashar) [10:01:12] (03Merged) 10jenkins-bot: Remove yamllint from wikimedia/bots/jouncebot [integration/config] - 10https://gerrit.wikimedia.org/r/204033 (https://phabricator.wikimedia.org/T95894) (owner: 10Hashar) [10:01:45] RECOVERY - Puppet failure on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [10:02:25] 10Continuous-Integration, 5Patch-For-Review: Status of Jouncebot and dropping the yamllint Jenkins job - https://phabricator.wikimedia.org/T95894#1206088 (10hashar) a:3bd808 Thanks a ton @bd808 , the patch even provided the foundations to add some more tests as needed. [10:02:42] 10Continuous-Integration: Phase out yamllint jobs (tracking) - https://phabricator.wikimedia.org/T95890#1206091 (10hashar) [10:25:06] (03CR) 10JanZerebecki: [C: 031] "Please merge and deploy." 
[integration/config] - 10https://gerrit.wikimedia.org/r/204029 (owner: 10Aude) [10:25:36] (03CR) 10Ebrahim: [C: 031] "Thank you :-)" [integration/config] - 10https://gerrit.wikimedia.org/r/204029 (owner: 10Aude) [10:28:25] hasharLunch: darn, yer out to lunch [10:28:31] i'll holler at you when you get back [10:28:42] and by that i mean, i'm watching this channel like a hawk [10:28:56] and by that i mean, i'm idling and i'll check back when i remember [10:29:06] and by that i mean, i won't check back, see you later, kthx [10:50:33] 10Continuous-Integration, 6Mobile-Web, 10WikiGrok: Include WikidataBuildResources in the WikiGrok extensions build dependencies - https://phabricator.wikimedia.org/T96012#1206167 (10phuedx) 3NEW [10:53:41] PROBLEM - App Server bits response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:55:18] PROBLEM - App Server bits response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:55:24] 10Continuous-Integration, 6Mobile-Web, 10WikiGrok: Include WikidataBuildResources in the WikiGrok extensions build dependencies - https://phabricator.wikimedia.org/T96012#1206180 (10phuedx) [10:58:32] RECOVERY - App Server bits response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 3895 bytes in 0.014 second response time [10:59:06] RECOVERY - App Server bits response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 3895 bytes in 0.002 second response time [11:02:21] 6Release-Engineering: Read "Vagrant: Up and Running" book - https://phabricator.wikimedia.org/T95401#1206198 (10zeljkofilipin) Deleted the previous two repositores, nothing useful there. Created a new one, testing my vagrant-fu with creating a minimal machine that can run selenium tests in ruby. https://github.c... 
[11:34:17] hasharLunch: I've moved most tags we triaged in the past from Untriaged to Backlog [11:34:29] Looks like we've got a manageable 31 tasks left to triage today [11:34:30] Yay [11:34:30] great thanks ! [11:34:42] lets do some triage so [11:34:58] I reviewed the list of actions from past meeting and there is not much to say [11:36:20] (03Abandoned) 10Hashar: yamllint: actually realize the yaml linting [integration/jenkins] - 10https://gerrit.wikimedia.org/r/144150 (owner: 10Hashar) [11:37:12] 10Continuous-Integration: Remove tools/yamllint.py from integration/jenkins once yamllint jobs disappear - https://phabricator.wikimedia.org/T96014#1206245 (10hashar) 3NEW [11:38:54] 10Continuous-Integration, 3Continuous-Integration-Isolation, 10Wikimedia-Labs-Infrastructure: Support dedicating a specific virt node to a specific nova project - https://phabricator.wikimedia.org/T84989#1206253 (10hashar) Related change by @andrew @chasemp [[ https://gerrit.wikimedia.org/r/#/c/203969/ | Ger... [12:06:08] nap and back in an hour or so [12:33:18] PROBLEM - Puppet staleness on deployment-bastion is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [12:37:35] PROBLEM - SSH on deployment-lucid-salt is CRITICAL: Connection refused [12:38:01] 10Deployment-Systems, 6Release-Engineering, 6Services, 6operations: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1206324 (10mobrovac) [13:29:59] 6Release-Engineering, 10MediaWiki-Debug-Logging, 6Security-Team, 6operations, 5Patch-For-Review: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1206390 (10fgiunchedi) just for reference, sampling is defined in $wgDebugLogGroups in InitialiseSettings.php and currently at 1000... 
[14:00:42] lets start the CI meeting [14:22:18] 10Continuous-Integration, 6Labs, 10Wikimedia-Labs-Infrastructure: Diamond metrics for cpu.system suddenly up 100% after a reboot - https://phabricator.wikimedia.org/T95912#1206480 (10Krinkle) p:5Triage>3Unbreak! [14:22:37] 10Continuous-Integration, 6Labs, 10Wikimedia-Labs-Infrastructure: Diamond metrics for cpu.system suddenly up 100% after a reboot - https://phabricator.wikimedia.org/T95912#1206482 (10Krinkle) p:5Unbreak!>3Normal [14:24:49] 10Continuous-Integration, 6Labs, 10Wikimedia-Labs-Infrastructure: Diamond collected metrics about memory usage inaccurate until third reboot - https://phabricator.wikimedia.org/T91351#1206489 (10Krinkle) p:5Triage>3Low [14:26:43] 10Continuous-Integration, 10MediaWiki-General-or-Unknown, 10MediaWiki-extensions-ContentTranslation, 7JavaScript: Write test to ensure all mw.hooks are documented - https://phabricator.wikimedia.org/T86544#1206497 (10Krinkle) [14:30:11] 10Continuous-Integration, 10MediaWiki-General-or-Unknown, 10MediaWiki-extensions-ContentTranslation, 7JavaScript: Write test to ensure all mw.hooks are documented - https://phabricator.wikimedia.org/T86544#1206506 (10Krinkle) 5Open>3declined a:3Krinkle No resources for it in #contint. However per my... [14:33:18] 10Continuous-Integration, 6Mobile-Web, 10WikiGrok: Include WikidataBuildResources in the WikiGrok extensions build dependencies - https://phabricator.wikimedia.org/T96012#1206514 (10JanZerebecki) This may alternatively be solved for non-wmf branches by T90303 and depending on the correct thing via composer.... [14:38:52] 10Continuous-Integration: Review CI tutorials on mediawiki.org - https://phabricator.wikimedia.org/T96024#1206517 (10hashar) 3NEW [14:39:02] 10Continuous-Integration: Jenkins: Add jobs for SwiftMailer extension - https://phabricator.wikimedia.org/T69722#1206524 (10Krinkle) To enable one or more jobs in Continuous Integration, please propose a patch to the integration/config repository. 
Learn more at: https://www.mediawiki.org/wiki/Continuous_integra... [14:40:25] 10Continuous-Integration, 10MediaWiki-extensions-General-or-Unknown: Add Jenkins jobs for OAuthAuthentication - https://phabricator.wikimedia.org/T93274#1206528 (10Krinkle) To enable one or more jobs in Continuous Integration, please propose a patch to the integration/config repository. Learn more at: https:/... [14:40:41] 10Continuous-Integration, 10MediaWiki-extensions-General-or-Unknown: Add Jenkins jobs for SemanticMetaTags - https://phabricator.wikimedia.org/T93276#1206531 (10Krinkle) To enable one or more jobs in Continuous Integration, please propose a patch to the integration/config repository. Learn more at: https://ww... [14:40:44] 10Continuous-Integration, 10MediaWiki-extensions-General-or-Unknown: Add Jenkins jobs for VirtualKeyboard - https://phabricator.wikimedia.org/T93277#1206533 (10Krinkle) To enable one or more jobs in Continuous Integration, please propose a patch to the integration/config repository. Learn more at: https://www... [14:40:48] 10Continuous-Integration, 10MediaWiki-extensions-OpenID-Connect: Add Jenkins jobs for OpenIDConnect - https://phabricator.wikimedia.org/T93275#1206535 (10Krinkle) To enable one or more jobs in Continuous Integration, please propose a patch to the integration/config repository. Learn more at: https://www.media... [14:41:31] 10Continuous-Integration, 10MediaWiki-extensions-General-or-Unknown: Add Jenkins jobs for OAuthAuthentication extension - https://phabricator.wikimedia.org/T93274#1206536 (10Krinkle) [14:47:58] 10Continuous-Integration, 10MediaWiki-extensions-General-or-Unknown: Add Jenkins jobs for OAuthAuthentication extension - https://phabricator.wikimedia.org/T93274#1206557 (10hashar) 5Open>3Resolved a:3hashar It seems I have added the job just when this task has been created and even made the unit test o... 
[14:48:00] 10Continuous-Integration, 7Monitoring: Alert when Zuul/Gearman queue is stalled - https://phabricator.wikimedia.org/T70113#1206551 (10Krinkle) [14:50:15] 10Continuous-Integration, 7Monitoring: Alert when Zuul/Gearman queue is stalled - https://phabricator.wikimedia.org/T70113#1206562 (10Krinkle) Per @hashar, once we determine a good metric and threshold; we should be able to use Shinken to monitor the Graphite query. [14:54:51] 10Continuous-Integration, 7Jenkins: Launching Jenkins slave agent fails with "java.io.IOException: Unexpected termination of the channel" - https://phabricator.wikimedia.org/T91697#1206566 (10Krinkle) p:5Triage>3Low [14:55:46] 10Continuous-Integration, 7Jenkins, 7Upstream: Launching Jenkins slave agent fails with "java.io.IOException: Unexpected termination of the channel" - https://phabricator.wikimedia.org/T91697#1093554 (10Krinkle) [15:00:25] 10Continuous-Integration: Disable core dumps generation on CI labs slave - https://phabricator.wikimedia.org/T96025#1206573 (10hashar) 3NEW [15:01:05] 10Continuous-Integration: Disable core dumps generation on CI labs slave - https://phabricator.wikimedia.org/T96025#1206582 (10Krinkle) [15:06:29] 10Continuous-Integration: Disable core dumps generation on CI labs slave - https://phabricator.wikimedia.org/T96025#1206649 (10Krinkle) p:5Triage>3Normal [15:06:52] 10Continuous-Integration: Review CI tutorials on mediawiki.org - https://phabricator.wikimedia.org/T96024#1206651 (10Krinkle) p:5Triage>3Normal [15:09:34] 10Continuous-Integration, 10Wikidata, 5Patch-For-Review: Run Wikibase tests on Jenkins with hhvm - https://phabricator.wikimedia.org/T95230#1206667 (10Krinkle) p:5Triage>3Normal a:3Legoktm [15:14:52] 3Continuous-Integration-Isolation, 6operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1206674 (10Cmjohnson) [15:20:29] 10Continuous-Integration: Re-create ci slaves (March 2015) - https://phabricator.wikimedia.org/T91524#1206703 (10Krinkle) [15:20:30] 
10Continuous-Integration, 6Labs, 10Wikimedia-Labs-Infrastructure, 6operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1206701 (10Krinkle) 5declined>3Open https://integration.wikimedia.org/ci/job/npm/2590/console ``` 00... [15:27:19] !log puppetmaster: Re-apply I05c49e5248cb operations/puppet patch to re-fix T91524. Somehow the patch got lost. [15:27:23] Logged the message, Master [15:39:08] hashar: Are you also getting this problem lately, that many times I can't configure a Jenkins job via web UI. It is stuck on "Loading.." [15:39:12] e.g. https://integration.wikimedia.org/ci/job/mwext-Wikidata-testextension-zend/configure [15:39:17] there is a javascript exception [15:39:19] ugh.. [15:39:28] 10Continuous-Integration, 6Labs, 10Wikimedia-Labs-Infrastructure, 6operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1206779 (10hashar) On CI we do DNS requests for pubic DNS entry so we apparently need to remove the `ndo... [15:39:36] it's intermittend [15:40:57] s/d/t/ [15:42:10] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [15:44:13] 10Continuous-Integration: mwext-Wikidata-testextension-zend not queuing properly - https://phabricator.wikimedia.org/T96034#1206804 (10Krinkle) 3NEW [15:45:01] greg-g: good morning!! [15:45:12] greg-g: do you have rights to create new tags in Phabricator? [15:56:45] 10Continuous-Integration, 6Labs, 10Wikimedia-Labs-Infrastructure, 6operations: dnsmasq returns SERVFAIL for (some?) 
names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1206842 (10Krinkle) 5Open>3Resolved [15:56:47] 10Continuous-Integration: Re-create ci slaves (March 2015) - https://phabricator.wikimedia.org/T91524#1206843 (10Krinkle) [15:57:23] 10Continuous-Integration: Fix "Entry point ('console_scripts', 'tox') not found" on new slaves running Ubuntu Precise - https://phabricator.wikimedia.org/T91526#1206846 (10Krinkle) 5Open>3Resolved a:3Krinkle [15:57:24] 10Continuous-Integration: Re-create ci slaves (March 2015) - https://phabricator.wikimedia.org/T91524#1089205 (10Krinkle) [15:57:35] 10Continuous-Integration, 10Wikidata, 5Patch-For-Review: Run Wikibase tests on Jenkins with hhvm - https://phabricator.wikimedia.org/T95230#1206850 (10Legoktm) 5Open>3Resolved [15:58:02] joining joining [16:02:32] hashar: I do! [16:03:52] twentyafterfour: meeting ping :) [16:04:22] 10Continuous-Integration, 3Continuous-Integration-Isolation, 6operations: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1206883 (10chasemp) a:5chasemp>3mark [16:07:08] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0] [16:08:50] 10Continuous-Integration, 7Composer: Come up with non sucky solution for running "composer test" on repos that have vendor/ checked in - https://phabricator.wikimedia.org/T92605#1206905 (10Krinkle) Perhaps we can change those repos to not check in `vendor/`? MediaWiki core and extensions don't do that either.... [16:15:46] 10Continuous-Integration, 7Composer: Come up with non sucky solution for running "composer test" on repos that have vendor/ checked in - https://phabricator.wikimedia.org/T92605#1206927 (10bd808) >>! In T92605#1206905, @Krinkle wrote: > Perhaps we can change those repos to not check in `vendor/`? MediaWiki cor... 
[16:18:54] Continuous-Integration, Composer: Come up with non sucky solution for running "composer test" on repos that have vendor/ checked in - https://phabricator.wikimedia.org/T92605#1206929 (Krinkle) >>! In T92605#1206927, @bd808 wrote: > For mediawiki-config I'm not exactly sure what we would do. We could use a...
[16:19:38] bd808: Hm.. actually, not sure we can/should re-use mediawiki/vendor
[16:19:45] Those runtimes overlap right?
[16:19:51] CommonSettings and mediawiki core
[16:19:53] they do
[16:19:57] So...
[16:20:26] different versions even for mwcore depending
[16:20:35] which means the CDB lib in settings will load rather than the ones in mwcore
[16:20:41] Yeah
[16:20:50] How is that not a problem already
[16:20:58] but the same happened in the past practice of hand synced files
[16:21:13] it just means that we have to keep them in sync
[16:21:24] ugly but manageable
[16:21:34] bd808: Keep what in sync with what
[16:21:48] vendor in wmf branch of core can be two different versions
[16:21:49] * YuviPanda waves
[16:21:52] which should be allowed
[16:22:00] the versions of CDB included in the various branches
[16:22:31] or more properly CDB changes must be backwards compat to all deployed core branches
[16:22:41] Right
[16:22:41] Continuous-Integration, Continuous-Integration-Isolation, operations: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1206932 (chasemp) I talked to @mark about this and how it relates to the labnodepool box placement. mark wants to go over things a bit...
[16:23:02] bd808: so does that mean right now the cdb dependency in mediawiki/core for wmf branches is never used
[16:23:08] because the wmf-config one is already loaded
[16:23:12] yes
[16:23:29] the wmf-config autoloader should win I think
[16:23:39] bd808: Yeah, because multi version loads before we include core
[16:23:42] naturally
[16:23:54] *nod* and it always uses cdb
[16:23:56] bd808: If we use mediawiki/vendor though, it won't just be cdb
[16:24:01] it'll be all dependencies
[16:24:04] yeah we can't do that
[16:24:23] it would break the heck out of oojsui if nothing else
[16:24:34] probably other things too
[16:24:51] Why? We'll just require that package changes must be applied to both branches
[16:24:54] in case of backports
[16:25:08] I'm not a fan, just checking what the impact would be
[16:25:41] It would require coordinating changes in 5 repos (master, 2 active branches, wmf-config)
[16:25:45] yucky
[16:25:48] Right
[16:26:11] We do currently require that already for cdb
[16:26:22] But beyond that would get tricky indeed
[16:26:24] We haven't back-ported anything for OOjs UI in PHP yet.
[16:26:36] yes, but in practice CDB really is not expected to ever change
[16:26:43] Ideally we won't have it occur ever.
[16:26:44] it is what it is
[16:26:52] * James_F nods.
[16:27:05] James_F: Yeah, because php tends to be cached in html so if that'd require a change, backporting wouldn't help
[16:27:27] Krinkle: Well, no. Because development on the PHP side is so slow that bugs are rare. :-)
[16:27:36] and wiki templates/gadgets using it etc. major change that'll take longer than one wmf branch (at least 2)
[16:27:40] Krinkle: (But yes, that too.)
[16:27:53] in which case a backport wouldn't be needed.
[16:56:23] 10Deployment-Systems, 7Puppet: Trebuchet master should be separate from scap - https://phabricator.wikimedia.org/T96042#1207023 (10Tgr) 3NEW [17:39:20] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207219 (10mmodell) a:5mmodell>3GWicke Although ansible looks really cool I think it's going to be a tough sell as long as we are using puppet and salt as our 'offical' configuration management... [17:41:49] 7Blocked-on-RelEng, 6Release-Engineering, 6Multimedia, 6Scrum-of-Scrums, and 3 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1207233 (10Tgr) [17:47:34] 10Continuous-Integration: Build apps/common/android under Jenkins and from Gerrit (instead of GitHub) - https://phabricator.wikimedia.org/T51500#1207256 (10bearND) 5Open>3Invalid a:3bearND The Commons Android app has been sunset. No need to add a build for that. I'd rather have a build for the Wikipedia A... [17:49:33] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207263 (10GWicke) @mmodell, our requirements are described in T93428. If you can find a solution that genuinely addresses those (or at least have a clear plan for something that will & steps for g... [17:53:19] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207295 (10mmodell) @Gwicke: we are still working on the plan, T94620 will track that work and I will try to address your requirements there. I'm also adding T93428 as a blocker for that task. 
[17:53:52] 10Deployment-Systems, 6Release-Engineering, 6Services, 6operations: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1207306 (10mmodell) [17:53:55] 10Deployment-Systems, 7Epic, 3releng-201415-Q4: EPIC: The future of MediaWiki deployment: Tooling - https://phabricator.wikimedia.org/T94620#1168219 (10mmodell) [17:55:38] 10Deployment-Systems, 6Release-Engineering, 7Epic: Rethinking our deployment process - https://phabricator.wikimedia.org/T89945#1207320 (10mmodell) [18:00:30] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207325 (10GWicke) > we are still working on the plan, T94620 will track that work and I will try to address your requirements there. @mmodell, right now I see nothing in T94620 that describes how... [18:02:18] 10Continuous-Integration, 10Wikipedia-Android-App: Android app build: Gradle checkstyle + app build - https://phabricator.wikimedia.org/T88494#1207337 (10bearND) I think the recording of Christopher Orr's Android meetup presentation that was held on March 31 at the WMF office is very helpful and inspiring: htt... [18:14:15] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207373 (10mmodell) @GWicke: the current task is developing the plan. Once we have a workable plan that does address everyone's needs, and seems achievable based on our research, then we can make a... [18:22:12] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207401 (10GWicke) > the current task is developing the plan Okay, looking forward to hearing more. You might also want to correct the wording about trebuchet to reflect that no decision has been... 
[18:27:51] Continuous-Integration, MediaWiki-extensions-Translate: mediawiki-extensions-hhvm: MessageGroupStatesUpdaterJobTest::testHooks is intermittent failing - https://phabricator.wikimedia.org/T88554#1207421 (Nikerabbit) The test was recently marked as @ broken https://gerrit.wikimedia.org/r/#/c/203455/ as it...
[19:24:58] https://integration.wikimedia.org/zuul/ postmerge is in some kind of hell
[19:25:05] Plus the beta-code-update isn't running
[19:30:52] https://integration.wikimedia.org/ci/
[19:30:58] https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/8912/ is stuck
[19:31:36] Yup
[19:31:39] marktraceur: Other postmerge actions will be fine. They don't depend on each other like 'gate' does.
[19:31:52] So it's just beta not updating. Which happens twice a day, nothing special
[19:32:10] marktraceur: Wanna try mitigating it?
[19:32:26] Krinkle: How I do zat
[19:33:04] Firstly, log in to Jenkins and cancel all queued *beta* jobs in the sidebar, then abort the one that's running and stuck
[19:33:05] Bonus points if it involves wanton murder of stalled jobs.
[19:33:09] Woohooo
[19:33:35] basically just rage click [x] everywhere
[19:34:08] I was born to click
[19:34:38] 1) cancel beta related jobs in queue, 2) abort the running job(s) on deployment.bastion, 3) relaunch slave agent on deployment.bastion
[19:34:50] That procedure is first response. Might not mitigate it, but we'll see.
[19:34:56] Murder successful
[19:35:02] Relaunch slave agent?
[19:35:12] click on "deployment-bastion.eqiad"
[19:35:15] Disconnect
[19:35:18] then Relaunch Slave agent
[19:35:38] Shablam
[19:37:23] Looks like it's still waiting for an open executor.
[19:37:26] I'm also relaunching Gearman connection since that appears to be stuck, too
[19:37:28] Aha!
[19:37:28] Yeah [19:37:29] Done [19:37:34] Or at least running [19:37:54] Oh yeah [19:37:55] cool [19:38:27] 10Continuous-Integration, 7Jenkins: Jenkins: "Warning: Some projects have builds whose timestamps are inconsistent." - https://phabricator.wikimedia.org/T94963#1207580 (10Krinkle) These showed up today on https://integration.wikimedia.org/ci/manage: >Some projects have builds whose timestamps are inconsistent.... [19:40:00] Thanks a lot Krinkle :) [19:40:19] yw [19:41:04] marktraceur: We're not out the woods yet though [19:41:08] Gearman won't come back up [19:41:15] So Zuul isn't able to send or receive anything new [19:42:09] !log deployment-bastion jobs were stuck. marktraceur cancelled queue and relaunched slave. Now processing again. [19:42:16] Logged the message, Master [19:42:39] Looks like scap ran twice for config-change but code-update hasn't gone yet, I guess it's twelve minutes now anyway [19:42:44] 6Release-Engineering: Make qunit test failures readable - https://phabricator.wikimedia.org/T96072#1207596 (10EBernhardson) 3NEW [19:42:54] !log Jenkins still unable to obtain Gearman connection. (HTTP 503 error from /configure). Have to force restart Jenkins. [19:42:58] This is the ugly part [19:42:58] Logged the message, Master [19:43:05] I'll do it as I'm half way through already [19:43:09] Wuhoh. [19:43:11] Good luck [19:43:17] ssh galium, sudo kill two jenkins pids, and then restart [19:43:27] Aha, code update happening [19:43:37] That's all I needed, so yay [19:45:16] Oh, damn, the Jenkins murder caused the job to be murdered too. [19:45:28] SO MUCH CARNAGE. AND FOR WHAT. [19:45:49] an eye for an eye just leaves us blind [19:46:09] Restarting.. [19:46:28] !log Jenkins restarted. Relaunching Gearman [19:46:33] Logged the message, Master [19:48:05] !log Jenkins configuration panel won't load ("Loading..." 
stays indefinitely, "Uncaught TypeError: Cannot convert to object at prototype.js:195") [19:48:10] Logged the message, Master [19:49:19] !log All systems go. [19:49:24] Logged the message, Master [19:49:46] greg-g: It's quite literally patches upon patches. Cascading failures and peeling them away layer by layer each time. [19:50:17] We now have an added layer causing the Jenkins config panel to not load properly. Some random JS failure inside Jenkins. [19:50:33] Only when all race conditions align can you restart it [19:50:48] g2g [19:51:20] :( [20:08:51] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #621: FAILURE in 21 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/621/ [20:10:18] 10Beta-Cluster, 6MediaWiki-API-Team, 10SUL-Finalization: Finalize SUL on beta cluster - https://phabricator.wikimedia.org/T96075#1207662 (10Legoktm) 3NEW a:3Legoktm [20:22:39] 10Browser-Tests, 6Mobile-Web, 10Mobile-Web-Sprint-45-Snakes-On-A-Plane: Fix failed MobileFrontend browsertests Jenkins jobs - https://phabricator.wikimedia.org/T94156#1207705 (10KLans_WMF) [20:23:54] 10Continuous-Integration, 6Mobile-Web, 5Patch-For-Review, 7Technical-Debt: Publish MobileFrontend JS Documentation - https://phabricator.wikimedia.org/T74794#1207711 (10JKatzWMF) moving to eng backlog, as hasn't been touched in awhile. [20:35:45] Krinkle|detached: What's the problem *now*? [20:35:57] Oh. "g2g". Crap. [20:40:02] 10Continuous-Integration: Jenkins check for vulnerable libraries in all node.js repos - https://phabricator.wikimedia.org/T96078#1207791 (10csteipp) 3NEW [20:40:31] So I guess I'll try the same thing again?
[20:41:58] !log stopping all beta jobs, aborting running (and stuck) beta DB update, kicking bastion, to try and get beta to update [20:42:02] Logged the message, Master [20:43:44] !log starting SULF on beta cluster [20:43:49] Logged the message, Master [20:45:34] aawiki is really broken [20:46:10] hmm, I guess someone did some spambot cleanup or something [21:04:15] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207902 (10mmodell) @GWicke: I'm curious how you envisioned using ansible. I suspect that one of the main reasons that ansible is appealing is because it would be more straightforward to debug when... [21:13:11] 10Continuous-Integration, 5Continuous-Integration-Isolation, 6operations, 5Patch-For-Review, 7Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1207917 (10hashar) Some upstream requirements are not matched by Jessie: | Upstream | Jessie | python-d... [21:15:29] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207920 (10GWicke) @mmodell, my impression is that Ansible lets us address many of the requirements outlined in T93428 in a fairly simple and straightforward way. I have [played a tiny bit with its... [21:33:59] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1207989 (10mmodell) >>! In T93433#1207920, @GWicke wrote: > Do you have an idea for a good evaluation task? Proper rolling deploys with health checks, pybal orchestration etc would be my default,... [21:53:21] legoktm: What is SULF? [21:53:27] marktraceur: id it work? [21:53:31] marktraceur: Did it work? [21:54:24] It did! 
[21:54:29] Krinkle: At least it worked once [21:54:32] Which is enough for me [21:54:43] cool [21:54:51] Krinkle: SUL finalization [21:54:53] Looks like it's still running [21:55:32] k [22:07:47] 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1208126 (10GWicke) >>! In T93433#1207989, @mmodell wrote: > >>>! In T93433#1207920, @GWicke wrote: >> Do you have an idea for a good evaluation task? Proper rolling deploys with health checks, pyb... [23:17:12] twentyafterfour: just wanted to +1 git subtrees [23:17:48] 6Release-Engineering, 3Team-Practices-This-Week: Test phabricator sprint extension updates - https://phabricator.wikimedia.org/T95469#1208310 (10mmodell) [23:18:21] YuviPanda: :) so far it looks promising [23:18:29] :) yeah [23:18:45] still horribly slow to run make-wmf-branch .. probably even slower than the submodule version [23:18:55] but ideally it doesn't have to run every week [23:19:05] just merge from upstream once it's all set up correctly [23:19:13] yeah [23:19:33] * YuviPanda remembers hanging out in #git, even made a patch to it once [23:19:37] nice people to chat to! [23:23:41] (03CR) 10Krinkle: "Per IRC convo, let's do this upstream with OpenStack. They've started work on it here https://review.openstack.org/164371." [integration/zuul] - 10https://gerrit.wikimedia.org/r/203290 (owner: 10Legoktm) [23:29:57] 10Continuous-Integration: Change force merged cause a deadlock in Zuul gate-and-submit pipeline - https://phabricator.wikimedia.org/T93812#1208334 (10Krinkle) 5Open>3declined a:3Krinkle Upstream wontfix. If people are allowed to self-merge on a project, the project can't use Zuul. The two are mutually exc... [23:30:07] 10Continuous-Integration, 7Upstream: Change force merged cause a deadlock in Zuul gate-and-submit pipeline - https://phabricator.wikimedia.org/T93812#1208337 (10Krinkle)