[00:00:10] greg-g: I don't remember... I'm going to look in the backing data to find out [00:01:40] man, I keep getting the unresponsive script warning on logstash... :( [00:01:45] greg-g: 30d [00:01:50] ty [00:06:10] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<12.50%) [00:36:12] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK [01:02:48] (03PS2) 10Krinkle: jsduck: Remove obsolete --process=0 override for Ruby 1.8 [integration/config] - 10https://gerrit.wikimedia.org/r/194058 (https://phabricator.wikimedia.org/T62138) [01:04:17] ^d: greg-g: Tracked down the cause commit of that 5::duration notice [01:04:24] Was merged recently [01:04:30] added to ticket [01:05:14] <^d> thx [01:39:04] So I've worked out how to get the metrics I need from varnish, so that we can build staging cluster availability metrics based on returned http status codes [01:39:20] what I haven't worked out is where to store them. would logstash be appropriate? [01:41:04] Or should I look into using graphite? [01:49:15] (03CR) 10BryanDavis: [C: 031] "lgtm but should probably be tested against a fairly large sample of real code (mediawiki/core?) to see if there are strange false positive" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/193766 (owner: 10Legoktm) [03:14:35] Project beta-scap-eqiad build #43808: FAILURE in 37 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/43808/ [03:33:55] Yippee, build fixed! [03:33:56] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #516: FIXED in 14 min: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/516/ [03:34:57] Yippee, build fixed! 
[03:34:57] Project beta-scap-eqiad build #43810: FIXED in 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/43810/ [03:43:28] (03CR) 10Krinkle: [C: 032] "Deployed." [integration/config] - 10https://gerrit.wikimedia.org/r/194058 (https://phabricator.wikimedia.org/T62138) (owner: 10Krinkle) [03:49:02] 10Continuous-Integration, 6operations, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1085388 (10Krinkle) Similar: ``` Mar 4 03:35:20 integration-slave1010 puppet-agent[27419]: Could not retrieve catalog from remote server: Error 40... [03:49:54] (03Merged) 10jenkins-bot: jsduck: Remove obsolete --process=0 override for Ruby 1.8 [integration/config] - 10https://gerrit.wikimedia.org/r/194058 (https://phabricator.wikimedia.org/T62138) (owner: 10Krinkle) [03:50:29] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #403: FAILURE in 8 min 26 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/403/ [03:53:06] (03PS1) 10Krinkle: Disable operations-apache-config-lint [integration/config] - 10https://gerrit.wikimedia.org/r/194261 (https://phabricator.wikimedia.org/T72068) [03:54:46] (03CR) 10Krinkle: [C: 032] Disable operations-apache-config-lint [integration/config] - 10https://gerrit.wikimedia.org/r/194261 (https://phabricator.wikimedia.org/T72068) (owner: 10Krinkle) [03:55:51] (03Merged) 10jenkins-bot: Disable operations-apache-config-lint [integration/config] - 10https://gerrit.wikimedia.org/r/194261 (https://phabricator.wikimedia.org/T72068) (owner: 10Krinkle) [03:56:38] 10Continuous-Integration, 6operations, 5Patch-For-Review: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1085401 (10Krinkle) >>! In T72068#1056111, @scfc wrote: > Could the `operations-apache-config-lint` job be completely removed until it is... 
[03:56:46] 10Continuous-Integration, 6operations: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1085402 (10Krinkle) [03:59:13] 10Continuous-Integration, 7HHVM: HHVM Jenkins job throw: Unable to set CoreFileSize to 8589934592: Operation not permitted (1) - https://phabricator.wikimedia.org/T78799#1085405 (10Krinkle) >>! In T78799#1011253, @Nikerabbit wrote: > I was in fact right now looking at the hhvm config for this: https://github.c... [04:08:29] 10Quality-Assurance, 6Release-Engineering, 10Wikimania-Hackathon-2015, 10Wikimedia-Hackathon-2015: Investigate using the sikuli-like Applitools framework for visual testing - https://phabricator.wikimedia.org/T90884#1085413 (10Mattflaschen) Per https://wikimediafoundation.org/wiki/Resolution:Wikimedia_Foun... [04:16:21] 10Quality-Assurance, 7I18n: Add Sikuli to the machines that run browser tests - https://phabricator.wikimedia.org/T56393#1085423 (10Mattflaschen) Did anyone investigate using it headless with Xvfb? People have gotten this working with Linux and Jenkins (see for example https://answers.launchpad.net/sikuli/+qu... [04:18:36] 10Quality-Assurance, 6Release-Engineering, 10Wikimania-Hackathon-2015, 10Wikimedia-Hackathon-2015: Investigate using the sikuli-like Applitools framework for visual testing - https://phabricator.wikimedia.org/T90884#1085425 (10Mattflaschen) Sorry, I thought this was about visual diffing. I see now it's ab... [05:34:19] Yippee, build fixed! [05:34:19] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #503: FIXED in 40 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/503/ [05:39:23] Yippee, build fixed! 
[05:39:24] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #496: FIXED in 13 min: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/496/ [07:19:12] (03PS4) 10Legoktm: Don't use zuul variables to clone for -composer-security [integration/config] - 10https://gerrit.wikimedia.org/r/193526 [07:19:17] (03CR) 10Legoktm: [C: 032] Don't use zuul variables to clone for -composer-security [integration/config] - 10https://gerrit.wikimedia.org/r/193526 (owner: 10Legoktm) [07:25:34] 10Continuous-Integration, 10Wikidata, 10§ Wikidata-Sprint-2015-02-03, 10§ Wikidata-Sprint-2015-02-25: fix the qunit tests for wikidata: mwext-Wikibase-qunit - https://phabricator.wikimedia.org/T74184#1085482 (10adrianheine) Thank you, @Krinkle! [07:25:54] (03Merged) 10jenkins-bot: Don't use zuul variables to clone for -composer-security [integration/config] - 10https://gerrit.wikimedia.org/r/193526 (owner: 10Legoktm) [07:49:40] 10Continuous-Integration, 10Wikidata, 10§ Wikidata-Sprint-2015-02-03, 10§ Wikidata-Sprint-2015-02-25, and 2 others: mw-debug.log missing in Jenkins jobs (Failed to be created "Permission denied") - https://phabricator.wikimedia.org/T85799#1085498 (10adrianheine) p:5Normal>3Triage [08:07:52] 6Release-Engineering, 10Code-Review: Import all gerrit.wikimedia.org repositories with Diffusion - https://phabricator.wikimedia.org/T616#1085558 (10Chad) Most repos are done now. Will tidy up the remaining ones. 
[08:30:07] 6Release-Engineering, 7Browser-Tests: Fix easy problems reported by RuboCop - https://phabricator.wikimedia.org/T91485#1087225 (10zeljkofilipin) 3NEW a:3zeljkofilipin [08:37:25] 6Release-Engineering, 7Browser-Tests: Fix easy problems reported by RuboCop - https://phabricator.wikimedia.org/T91485#1087266 (10zeljkofilipin) [08:37:58] 6Release-Engineering, 7Browser-Tests: Move the list of repositories with Selenium tests from mediawiki-selenium readme file to mediawiki.org - https://phabricator.wikimedia.org/T91486#1087270 (10zeljkofilipin) a:3zeljkofilipin [08:46:59] coffeeee [08:51:15] (03CR) 10Zfilipin: Update Chrome, Firefox and Safari to the latest supported version (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [08:54:31] 6Release-Engineering, 5Patch-For-Review: Release mediawiki_api 0.3.1 - https://phabricator.wikimedia.org/T91388#1087497 (10zeljkofilipin) @Ijon To use the new gem in a repository, `cd` into the repository and run `bundle update`. That command will update `Gemfile.lock` file. Make a commit and push it to Gerrit... [08:55:10] zeljkof: good morning :) [08:55:13] I am in the hangout [08:55:31] hashar: coming in a minute [08:57:06] 10Quality-Assurance, 7I18n: Add Sikuli to the machines that run browser tests - https://phabricator.wikimedia.org/T56393#1087507 (10zeljkofilipin) @Mattflaschen the last time I have checked sikuli web page, it explicitly said running headless is not supported. Maybe something changed in the meantime. [09:02:49] (03CR) 10Zfilipin: "ping" [integration/config] - 10https://gerrit.wikimedia.org/r/193577 (https://phabricator.wikimedia.org/T90423) (owner: 10Zfilipin) [09:06:17] zeljkof: lost net ? 
:D [09:06:24] hashar: no :) [09:06:29] not sure what is wrong [09:07:10] (03PS4) 10Zfilipin: Refactor VisualEditor JJB builder for production status browsertest [integration/config] - 10https://gerrit.wikimedia.org/r/193577 (https://phabricator.wikimedia.org/T90423) [09:11:31] (03CR) 10Hashar: [C: 031] "The VE change has been merged :)" [integration/config] - 10https://gerrit.wikimedia.org/r/193577 (https://phabricator.wikimedia.org/T90423) (owner: 10Zfilipin) [09:21:34] hashar: sorry, looks like my net is having trouble [09:26:13] 10Continuous-Integration, 6Release-Engineering, 7Jenkins, 5Patch-For-Review: Refactor VisualEditor JJB builder for production status browsertest to use Cucumber tag instead of calling a file directly - https://phabricator.wikimedia.org/T90423#1087582 (10hashar) 5Open>3Resolved [09:28:09] (03CR) 10Hashar: [C: 032] "Confirmed with Zeljkof that the job is working properly :)" [integration/config] - 10https://gerrit.wikimedia.org/r/193577 (https://phabricator.wikimedia.org/T90423) (owner: 10Zfilipin) [09:29:17] (03PS3) 10Zfilipin: Update Chrome, Firefox and Safari to the latest supported version [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) [09:34:30] (03Merged) 10jenkins-bot: Refactor VisualEditor JJB builder for production status browsertest [integration/config] - 10https://gerrit.wikimedia.org/r/193577 (https://phabricator.wikimedia.org/T90423) (owner: 10Zfilipin) [09:34:42] Project beta-scap-eqiad build #43846: FAILURE in 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/43846/ [09:34:51] (03PS4) 10Hashar: Update Chrome, Firefox and Safari to the latest supported version [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [09:35:04] (03CR) 10Hashar: [C: 032] Update Chrome, Firefox and Safari to the latest supported version [integration/config] - 10https://gerrit.wikimedia.org/r/194101 
(https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [09:35:53] (03CR) 10jenkins-bot: [V: 04-1] Update Chrome, Firefox and Safari to the latest supported version [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [09:36:33] hashar: looks like my internet does not like me today :) [09:36:37] hehe [09:39:35] (03PS4) 10Zfilipin: Created the first Android CentralNotice Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/193361 (https://phabricator.wikimedia.org/T86092) [09:40:57] (03CR) 10Zfilipin: "Andy, I have deployed the job. Is there a reason not to merge this into master. If there is no reason, feel free to +2 this commit." [integration/config] - 10https://gerrit.wikimedia.org/r/193361 (https://phabricator.wikimedia.org/T86092) (owner: 10Zfilipin) [09:41:16] (03PS5) 10Zfilipin: Created the first Android CentralNotice Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/193361 (https://phabricator.wikimedia.org/T86092) [09:41:18] (03Merged) 10jenkins-bot: Update Chrome, Firefox and Safari to the latest supported version [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [09:41:36] (03PS6) 10Zfilipin: Created the first Android CentralNotice Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/193361 (https://phabricator.wikimedia.org/T86092) [09:42:13] 10Continuous-Integration, 6operations, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1087623 (10hashar) Upstream has migrated their bugtracker, their ticket is now https://tickets.puppetlabs.com/browse/PUP-1031 [09:43:23] 10Continuous-Integration, 6operations, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1087624 (10hashar) @Dzahn any idea why it does not seem to 
happen on the production puppetmaster? hashar@integration-puppetmaster:~$ puppet -... [09:49:16] Project browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce build #1: FAILURE in 3.6 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce/1/ [09:49:40] 6Release-Engineering: Ask Sauce Labs support if there is a way to disable Selenium log temporarily - https://phabricator.wikimedia.org/T89353#1087635 (10hashar) From a 1/1 with @zeljkofilipin: Ideally we would have the mediawiki selenium gem to set log level to error before sending the password, then restore th... [09:49:57] 6Release-Engineering, 7Browser-Tests: Ask Sauce Labs support if there is a way to disable Selenium log temporarily - https://phabricator.wikimedia.org/T89353#1087641 (10hashar) [09:51:44] (03CR) 10Zfilipin: "Looks like there is a problem:" [integration/config] - 10https://gerrit.wikimedia.org/r/193361 (https://phabricator.wikimedia.org/T86092) (owner: 10Zfilipin) [09:54:51] Yippee, build fixed! [09:54:52] Project beta-scap-eqiad build #43848: FIXED in 52 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/43848/ [10:01:06] (03PS7) 10Zfilipin: Created the first Android CentralNotice Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/193361 (https://phabricator.wikimedia.org/T86092) [10:12:50] 10Continuous-Integration: "/usr/local/bin/zuul-cloner" broken on new instances - https://phabricator.wikimedia.org/T90984#1087674 (10hashar) 5Open>3Resolved Zuul requires python-statsd < 0.3 but we bumped the Debian package to 0.3+ hence the failure I just did the install using pip off of pypi on all Precis... 
[10:25:57] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [10:31:16] (03PS4) 10Hashar: Drop logrotation from 90 days to 30 days [integration/config] - 10https://gerrit.wikimedia.org/r/194116 (https://phabricator.wikimedia.org/T91396) [10:31:34] (03CR) 10Hashar: [C: 032] Drop logrotation from 90 days to 30 days [integration/config] - 10https://gerrit.wikimedia.org/r/194116 (https://phabricator.wikimedia.org/T91396) (owner: 10Hashar) [10:32:22] (03CR) 10jenkins-bot: [V: 04-1] Drop logrotation from 90 days to 30 days [integration/config] - 10https://gerrit.wikimedia.org/r/194116 (https://phabricator.wikimedia.org/T91396) (owner: 10Hashar) [10:35:54] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.022 second response time [10:38:04] (03Merged) 10jenkins-bot: Drop logrotation from 90 days to 30 days [integration/config] - 10https://gerrit.wikimedia.org/r/194116 (https://phabricator.wikimedia.org/T91396) (owner: 10Hashar) [10:38:51] 10Continuous-Integration, 5Patch-For-Review: Have jenkins jobs logrotate their build history - https://phabricator.wikimedia.org/T91396#1087774 (10hashar) Still have to check that jobs all have the daysToKeep logrotate parameter. [10:41:58] 10Continuous-Integration, 7HHVM: HHVM Jenkins job throw: Unable to set CoreFileSize to 8589934592: Operation not permitted (1) - https://phabricator.wikimedia.org/T78799#1087778 (10hashar) + @Joe and @ori our hhvm gurus: from my comment above T78799#1011250 the hhvm.resource_limit.core_file_size value is too h... [10:43:32] 10Continuous-Integration, 7Technical-Debt: Add regression tests for slave-script tools - https://phabricator.wikimedia.org/T86158#1087782 (10hashar) I agree, thanks Timo for the cleanup. 
[11:02:24] 10Continuous-Integration, 6operations, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1087823 (10akosiaris) @hashar It has nothing to do with puppet version but ruby version $ ruby -v ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-... [11:35:49] 10Continuous-Integration, 7Jenkins: Provide lint for yaml files in operations repository - https://phabricator.wikimedia.org/T91496#1087915 (10KartikMistry) 3NEW [11:47:21] vikasyaligar, aharoni: did you submit the talk for wikimania? [11:47:36] zeljkof: yes [11:48:18] zeljkof: if I understand correctly, having the page https://wikimania2015.wikimedia.org/wiki/Submissions/Making_Translated_Screenshots_With_No_Effort before February 28 is considered "submitted". [11:48:24] IF NOT, THEN WE ARE DOOMED!!! [11:48:59] February 28? I thought it was today [11:51:22] aharoni: ^ [12:01:54] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [12:02:25] zeljkof: well then it's good that I did it already on Saturday. [12:02:33] in any case :) [12:02:36] aharoni: :) [12:02:42] thanks [12:04:38] 6Release-Engineering, 7Browser-Tests: Fix easy problems reported by RuboCop - https://phabricator.wikimedia.org/T91485#1088109 (10zeljkofilipin) [12:16:56] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.027 second response time [12:39:28] 10Continuous-Integration: Provide lint for yaml files in operations repository - https://phabricator.wikimedia.org/T91496#1088190 (10hashar) [12:43:04] 10Continuous-Integration, 6operations: Provide lint for yaml files in operations repository - https://phabricator.wikimedia.org/T91496#1088194 (10hashar) One would need to write a test suite that is able to test the hiera files are valid and eventually add some integration test on the resulting configuration.... 
[13:00:30] 10Continuous-Integration, 6Release-Engineering, 7Jenkins: JJB installation problem - https://phabricator.wikimedia.org/T90434#1088215 (10zeljkofilipin) My problem was fixed by adding this: ``` export PYTHONPATH="/Library/Python/2.7/site-packages" ``` to .bash_profile, as @hashar has suggested. [13:18:41] zeljkof: aharoni: Can you review this also https://github.com/amire80/screenshot/pull/8 ? [13:19:21] vikasyaligar: amir, we were reviewing that last week, right? [13:19:34] but I do not think we had the time to finish the review [13:19:51] we were in the middle of testing the change when our time ran out [13:22:09] zeljkof: Oh ! OK [14:13:57] zeljkof, vikasyaligar - VE was badly broken in Beta Wikipedia last week, and now it's OK. [14:14:03] So we'll probably review it tomorrow. [14:14:22] aharoni, vikasyaligar: sounds good to me [14:14:35] now I remember VE being broken [14:15:49] aharoni: great ! [14:37:43] (03CR) 10Zfilipin: "The Android job[0] will fail until the feature file[1] has @android tag. Let me know if you need help with that." [integration/config] - 10https://gerrit.wikimedia.org/r/193361 (https://phabricator.wikimedia.org/T86092) (owner: 10Zfilipin) [14:56:12] hashar: hola [14:56:34] what needs to be done for https://phabricator.wikimedia.org/T90943 for ContentTranslation? [14:56:46] I started something naive in https://gerrit.wikimedia.org/r/#/c/194331/ . [14:59:34] hashar: I guess that a change in zuul/layout.yaml is needed, too. [15:02:41] aharoni: hi [15:03:27] aharoni: yeah you will need a composer job to be defined for your repository [15:03:38] hashar: is there anything like for any other extension? [15:03:49] aharoni: that is done in integration/config.git you will need to edit some job under jjb/* and the zuul/layout.yaml file [15:04:01] some extension might have switched to composer already [15:04:03] not sure [15:05:24] hashar: so how do I add the `composer test` step for ContentTranslation? 
[15:05:59] and I guess that I should remove the mwext-ContentTranslation-phpcs-HEAD step. [15:08:40] legoktm: ^ [15:10:07] aharoni: no clue really [15:10:14] I am not sure we had any extension set to use composer [15:10:48] 6Release-Engineering, 7Browser-Tests: Move the list of repositories with Selenium tests from mediawiki-selenium readme file to mediawiki.org - https://phabricator.wikimedia.org/T91486#1088483 (10zeljkofilipin) The page: https://www.mediawiki.org/wiki/Repositories_with_Ruby_code [15:10:57] so, as it frequently happens, Language Engineering will be the brave bold pioneers \o/ [15:11:33] We have a template '{name}-composer-{phpflavor}' [15:12:05] aharoni: so in jjb/mediawiki-extensions.yaml there is a project named 'mwext-ContentTranslation' [15:12:18] which has a list of jobs, you could add the '{name}-composer-{phpflavor}' to the list of jobs [15:12:26] and pass it the two phpflavor: hhvm and zend [15:12:29] something like: [15:12:30] jobs: [15:12:33] - '{name}-composer-{phpflavor}': [15:12:40] - hhvm [15:12:42] - zend [15:13:09] That should create two jobs: 'mwext-ContentTranslation-composer-hhvm' and 'mwext-ContentTranslation-composer-zend' [15:13:33] which you should then add in zuul/layout.yaml to the test and gate-and-submit pipeline [15:13:37] (03CR) 10Tobias Gritschacher: "Not an issue here as far as I can see, @krinkle" [integration/config] - 10https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: 10Adrian Lang) [15:13:55] 10Continuous-Integration, 10MediaWiki-Codesniffer, 5Patch-For-Review: Convert existing legacy phpcs jobs to use composer entry point + versioning - https://phabricator.wikimedia.org/T90943#1088509 (10Amire80) I started something for ContentTranslation here: https://gerrit.wikimedia.org/r/#/c/194331/ . I jus... [15:14:08] hashar: let's see... 
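The template expansion hashar describes above (one job per phpflavor value) can be illustrated with a small Python sketch. This is not Jenkins Job Builder's own code; `expand_job_template` is a hypothetical helper written only to show how the name '{name}-composer-{phpflavor}' would turn into the two job names mentioned in the log.

```python
from itertools import product


def expand_job_template(template, **params):
    """Hypothetical helper: expand a JJB-style name template.

    Each parameter may be a single value or a list of values; one
    name is produced for every combination of parameter values.
    """
    keys = list(params)
    # Treat scalar values as one-element lists so product() can iterate them.
    value_lists = [
        v if isinstance(v, (list, tuple)) else [v] for v in params.values()
    ]
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*value_lists)
    ]


job_names = expand_job_template(
    "{name}-composer-{phpflavor}",
    name="mwext-ContentTranslation",
    phpflavor=["hhvm", "zend"],
)
print(job_names)
```

Under these assumptions the sketch prints the two names from the conversation, 'mwext-ContentTranslation-composer-hhvm' and 'mwext-ContentTranslation-composer-zend', which are the names that would then be referenced in zuul/layout.yaml.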
[15:15:45] (03PS1) 10Zfilipin: Move the list of repositories to mediawiki.org [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) [15:17:56] (03CR) 10Hashar: [C: 031] "I am letting a native English speaker proofread the language :)" [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) (owner: 10Zfilipin) [15:18:26] (03PS1) 10Amire80: WIP: Move phpcs in ContentTranslation to composer [integration/config] - 10https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) [15:19:00] hashar: and here's another something naive: https://gerrit.wikimedia.org/r/#/c/194340/ [15:19:22] But I guess that I have to define the actual mwext-ContentTranslation-composer-(hhvm|zend) jobs [15:19:38] (03CR) 10jenkins-bot: [V: 04-1] WIP: Move phpcs in ContentTranslation to composer [integration/config] - 10https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) (owner: 10Amire80) [15:20:00] does there have to be a line for such job for each extension or can it be a template? [15:22:16] (03CR) 10Krinkle: [C: 031] Move the list of repositories to mediawiki.org [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) (owner: 10Zfilipin) [15:23:47] aharoni: see my explanation above to define the jobs in jjb :D [15:25:29] (03CR) 10Krinkle: "I know they can't be swapped. I was suggesting it may have to be substituted and embedded. It's not about adding/installing a new extensio" [integration/config] - 10https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: 10Adrian Lang) [15:26:37] (03CR) 10Krinkle: "Note that non-voting tests that always fail due to an infrastructure issue are not useful. Wasted resources and time. 
If it doesn't work f" [integration/config] - 10https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: 10Adrian Lang) [15:26:47] hashar: Hey [15:27:38] (03CR) 10Jforrester: Update Chrome, Firefox and Safari to the latest supported version (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [15:32:04] 6Release-Engineering, 6Mobile-Apps, 10Wikimedia-Hackathon-2015: Create end-to-end test for Wikipedia Android app - https://phabricator.wikimedia.org/T90177#1088615 (10Dbrant) [15:32:12] hashar: Oh. So if I understand this correctly, no more changes are needed in https://gerrit.wikimedia.org/r/#/c/194340/ , [15:32:21] but it currently fails with another error. [15:33:19] legoktm, Krinkle ^ [15:34:18] aharoni: The error is a good one [15:34:26] aharoni: composer is *not* allowed in the check pipeline [15:34:45] Don't worry about the check pipeline, just remove it from there [15:35:04] Make the change in the test pipeline instead [15:37:55] oh, got it [15:39:01] (03PS2) 10Amire80: WIP: Move phpcs in ContentTranslation to composer [integration/config] - 10https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) [15:39:41] Krinkle: updated [15:39:54] (and I'm curious why is jslint allowed in check, but not composer) [15:39:58] aharoni: composer in the check pipeline would compromise our servers [15:40:26] aharoni: and not having composer in the test pipeline means your change would not have worked since most commits come from authorised people that get the test pipeline [15:40:42] aharoni: jslint is pre-installed and pinned to a specific version [15:40:46] aharoni: same as phpcs [15:41:08] aharoni: but npm (which downloads from npmjs.org within the build, and executes arbitrary bash commands from package.json) and composer are definitely not allowed. 
[15:41:13] oh, ok [15:41:30] check pipeline must not execute any code from within the repo or that is not already reviewed and installed. [15:41:47] not even package.json or Gruntfile.js, that's just a bash file in disguise. [15:42:08] Krinkle: all green now... is that enough, together with https://gerrit.wikimedia.org/r/#/c/194331/ ? [15:42:10] (03CR) 10Adrian Lang: "I'm fine with not running testextension until we get composer to run there." [integration/config] - 10https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: 10Adrian Lang) [15:44:46] aharoni: Yeah. Let's merge the zuul change first so that we can verify that other commit before merging [15:45:35] aharoni: Also remove -lint from test pipeline. That is redundant with your composer test [15:45:38] which includes lint I see [15:46:12] 10Continuous-Integration, 10OOjs-UI: OOjs UI's PHP docs should be auto-generated - https://phabricator.wikimedia.org/T74454#1088702 (10Jdforrester-WMF) [15:46:16] (03PS3) 10Amire80: WIP: Move phpcs in ContentTranslation to composer [integration/config] - 10https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) [15:46:22] aharoni: Hm.. is that repository's PHP code base standalone? [15:46:26] aharoni: phpunit is not gonna work [15:46:31] (03PS4) 10Amire80: Move phpcs in ContentTranslation to composer [integration/config] - 10https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) [15:46:36] Did you test locally if 'composer test' works in that repository? [15:46:43] (after 'composer install') [15:47:20] I don't know if you meant to add that or copied it. 
[15:48:04] 10Continuous-Integration, 6Release-Engineering: Repositories with Ruby code should be documented and appropriate Jenkins jobs should be running - https://phabricator.wikimedia.org/T1361#1088737 (10zeljkofilipin) [15:48:43] 10Continuous-Integration, 6Release-Engineering: Repositories with Ruby code should be documented and appropriate Jenkins jobs should be running - https://phabricator.wikimedia.org/T1361#23935 (10zeljkofilipin) [15:49:02] Krinkle: it doesn't actually have any phpunit tests [15:52:16] Krinkle: `composer test` fails with an error: http://etherpad.wikimedia.org/p/cx-composer [15:53:00] aharoni: Unless you're using an unsupported version of php or composer, you can expect the same error to happen on Jenkins. [15:53:36] probably, but what does it even mean? [15:53:50] Is there any problem with Autoload? [15:54:31] Project beta-scap-eqiad build #43883: FAILURE in 39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/43883/ [15:56:16] Krinkle: (I'm running PHP 5.6.6 fwiw) [15:56:48] aharoni: Probably it's because no MediaWiki context. [15:56:50] And you can't add an array to null [15:56:54] Remove phpunit :) [15:59:41] Krinkle: that is after I removed phpunit [16:00:34] aharoni: Does the extension work at all if you load it in MediaWiki? [16:01:18] Krinkle: yes, I do it many times a day [16:02:03] aharoni: Ah, you're registering all of ContentTranslation.php in composer.json [16:02:16] that means when phplint and phpcs run they are forced to load that autoloader [16:02:31] I don't know how it should but I don't think composer is supposed to be used like that. 
[16:02:33] 6Release-Engineering: mediawiki-api gem should have integration tests - https://phabricator.wikimedia.org/T1330#1088796 (10greg) 5Resolved>3Open [16:02:41] (03CR) 10Zfilipin: Update Chrome, Firefox and Safari to the latest supported version (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [16:02:43] Maybe ask Bryan or Lego later. [16:02:44] <^d> greg-g: Further followup to your travel e-mail. How can we apply if we don't know where offsite is yet? [16:02:57] * ^d saw deadline is today and had a little panic [16:03:14] hashar: I see that you have +2d this: https://gerrit.wikimedia.org/r/#/c/194101/ [16:03:27] hashar: thanks :) did you deploy the jobs, or should I? [16:03:38] hashar: (looking at a few job config diff...) [16:03:46] aharoni: and ContentTranslation.php includes Autoloader.php [16:08:26] 6Release-Engineering, 7Browser-Tests, 5Patch-For-Review: Updated version of Firefox and Safari to the latest supported one by Sauce Labs - https://phabricator.wikimedia.org/T91389#1088817 (10zeljkofilipin) 5Open>3Resolved [16:09:48] hashar: ok, as far as I can see you did not update all browsertests-* jobs, doing that right now [16:15:04] Yippee, build fixed! [16:15:04] Project beta-scap-eqiad build #43886: FIXED in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/43886/ [16:17:13] 6Release-Engineering, 7Browser-Tests, 5Patch-For-Review: Updated version of Firefox and Safari to the latest supported one by Sauce Labs - https://phabricator.wikimedia.org/T91389#1088876 (10zeljkofilipin) I have [[ https://lists.wikimedia.org/pipermail/qa/2015-March/002182.html | notified ]] QA mailing list... [16:17:30] Krinkle: I don't know what should I do here... what is supposed to be in the "autoload" part in composer.json? [16:18:12] aharoni: I don't know. I maintain the infrastructure to run the tests. 
For how to write tests or set up a repo to use composer, I'd ask Bryan, Lego or someone else in the core team. [16:18:23] Krinkle: k [16:18:25] aharoni: I suspect there might've been a mistake when the repo initially converted to use composer. [16:19:14] (03CR) 10Zfilipin: "James, if you think all/some Mac Jenkins jobs should be updated to Safari 8, please create phab ticket and cc me." [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [16:22:39] zeljkof: James_F|Away: just quick fyi: Safari 8 will use OSX 10.10, where Safari 7 uses OSX 10.9. Mutually exclusive in both directions. Might make tests behave slightly different. [16:26:10] !log integration-slave14xx are now provisioned and being added to the pool. Old trusty slaves will be depooled later and eventually deleted. [16:26:16] Logged the message, Master [16:29:30] Krinkle: James_F|Away: that is the reason I did not want to make the change yet [16:29:53] 10Continuous-Integration, 10MediaWiki-Codesniffer, 5Patch-For-Review: Convert existing legacy phpcs jobs to use composer entry point + versioning - https://phabricator.wikimedia.org/T90943#1088945 (10Amire80) OK, the Jenkins part is probably done, but I need @Legoktm's help with https://gerrit.wikimedia.org/... [16:30:29] ^d: are you (going to be) in office today? [16:31:23] <^d> I had planned on it, I'm just slow getting ready (up late with Phab phun) [16:32:12] :) no worries, just moving our 1:1 to another time, and wanted to get you a room if you needed it [16:32:58] <^d> I had thought we had moved it to avoid the talk already :p [16:33:06] <^d> Also, ping re: what I asked ^^^^ [16:34:47] ^d: just apply for hackathon, say you'll be with RelEng offsite, do you need a location? [16:35:05] <^d> I figured I needed more details but ok [16:36:52] * greg-g looks [16:37:38] ah, the travel form... 
wait on that one, that one's not due today [16:37:41] ^d: ^ [16:38:10] rachel and karen are talking today about offsite stuffs [16:38:16] what time is the due date [16:38:17] <^d> "Just a reminder: if you want to participate in the Wikimedia Hackathon or Wikimania, the deadline for requesting travel approval is TODAY." [16:38:21] about wikitravel [16:38:28] midnight UTC?:p [16:38:48] arrgg, yes, i need to do that [16:39:25] mother... [16:39:37] so, that's not clear [16:40:01] there's 1) the form about requesting hackathon/wikimania and 2) the details of travel [16:40:06] I thought 1 is due today, and 2 isn't? [16:40:33] yeah, this is the only one due today: http://goo.gl/forms/kuX7mofzg0 [16:40:50] "travel approval" is an overloaded phrase [16:54:24] well I am there [16:54:25] :) [16:54:32] I have managed to get hangouts installed! [16:55:23] hashar: I'm pool-dancing now. :) [16:56:04] Krinkle: like that: http://i.imgur.com/pFf2IuC.gif ?? [16:56:11] !log Pooled integration-slave1401 [16:56:15] Logged the message, Master [16:56:21] pool! not pole :P [16:56:24] Krinkle: I have pre configured the new Precise slaves [16:56:28] but kept them offline for now [16:58:49] 10Continuous-Integration, 6operations, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1089059 (10hashar) ``` integration-puppetmaster:~$ ruby -v ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux] ``` The previous instance was... 
[17:06:02] !log Pooled integration-slave1402, integration-slave1403, integration-slave1404, integration-slave1405 [17:06:06] Logged the message, Master [17:08:31] (03PS2) 10Dduvall: Move the list of repositories to mediawiki.org [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) (owner: 10Zfilipin) [17:11:03] !log Pooled integration-slave1201, integration-slave1202, integration-slave1203, integration-slave1204 [17:11:07] Logged the message, Master [17:11:29] (03CR) 10Dduvall: [C: 031] "That tidies up the README quite a bit. I tweaked the language slightly." [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) (owner: 10Zfilipin) [17:11:58] hashar: I removed the obsolete hasPhpUnit and hasPhpcs from the new instances [17:12:15] noticed your ones had it, the new I created for 1401 did not [17:12:50] 17:07:27 ERROR: Reference path does not exist: /srv/ssd/gerrit/operations/puppet.git on https://integration.wikimedia.org/ci/job//13443/console [17:14:14] mutante: Ha, I guess that was not puppetised. [17:14:36] mutante: since last thursday we've been re-creating out slaves from puppet and I just flipped the last switch a few minutes ago [17:14:38] the strange part is that it just happened [17:14:45] but worked a couple minutes ago [17:14:59] Our pool now contains both new and old slaves. [17:15:00] ah, yes, then that [17:15:19] mutante: The path error is not the reason it fails [17:15:20] so my change https://gerrit.wikimedia.org/r/#/c/194353/ got downvoted by jenkins [17:15:22] That's just a fallback [17:15:31] but it liked the previous PS [17:16:15] and the change is the next UTF-8 character fix [17:16:39] "17:07:43 Could not record history. Previous build's commit, 372117d773c0cc786066a8159c1ec8ae30bba40e, does not exist in the current repository." [17:17:08] mutante: That's all unrelated. Just how Jenkins works. It's very verbose. 
[17:17:21] mutante: By default it tries to find the previous build commit hash, ignore that. [17:17:26] the difference between PS1 and PS2 is an edit of the commit message [17:17:31] The error is that the tux script is using a package that is not installed. [17:17:34] but the voting result is different [17:17:38] ah [17:17:41] mutante: Yes, it's running on a different server now [17:17:43] look at the top [17:18:00] Which means whatever it was using before is not puppetised. [17:18:35] gotcha, so missing package [17:18:44] mutante: If you tell me how to globally install that package using pip or something, I can install it manually and add it to the list of hacks. [17:19:18] afaik don't use pip, we want deb packages to be installed [17:19:31] my guess is this was packaged for precise [17:19:34] but not for trusty [17:19:44] because the new instances are trusty [17:20:02] mutante: It's not production, these are unimportant. The new direction we're headed in for testing is to use localised installs instead of global ones (e.g. composer install, npm install). [17:20:10] and the fix would be getting it imported to the trusty repo [17:20:28] People can add and remove arbitrary tools at any point. [17:21:12] (03CR) 10Zfilipin: [C: 031] "+1 also for moving release notes to a separate file" [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) (owner: 10Zfilipin) [17:21:21] the "it's not production" is the reason for having a "list of hacks" though [17:21:34] i thought we want it to be as much as production as possible.. because ..testing [17:21:37] mutante: The reason it's broken is because this test is not ensuring its own dependencies. [17:21:48] mutante: We're not running phpunit in production. [17:22:44] mutante: We're never ever going to create debian packages for this. Nor will anyone else. We should debianize our infrastructure needs (zuul, jenkins, gearman etc.) but anything within the build itself by design must not. 
[17:22:59] The whole point is so that one can test whether something will work. Creating a package first creates a paradox. [17:23:05] ok, but i don't know how to install that package because i don't know which one it is [17:23:21] k [17:24:32] mutante: I didn't create this test. And it seems to be part of the legacy stack of test not maintained in the repo. Probably it was added at some point by someone in ops. And then left unmaintained. [17:24:41] The output is asking for 'console_scripts' [17:24:46] Perhaps that's the package the test needs [17:25:38] Ah, so it is in the repo [17:25:39] https://github.com/wikimedia/operations-puppet/blob/e55ab067ae5660a6fa91195e9dbb4daa89fdacbc/tox.ini#L7 [17:25:48] mutante: Does that work for you locally? [17:28:11] mutante: btw, Bug: should be in the footer. (similar to email, http and git headers) After the line break, it remains part of the body instead of meta data. [17:28:16] eh, i don't think ops added tests [17:28:49] i'm not sure how you mean "works locally" [17:28:58] i merged that UTF-8 fix though [17:29:03] mutante: In your local clone of the puppet repo, does this test work? [17:29:16] so the next bug on integration master should be gone [17:29:22] cool [17:30:24] ok, @ Bug: in the footer, i just wanted to add the link to upstream bug too [17:31:58] (03PS1) 10Krinkle: python-jobs: Use remoteonly-zuul instead of remote-zuul [integration/config] - 10https://gerrit.wikimedia.org/r/194357 [17:32:19] mutante: switch it :) Link in the body. meta-data for our purpose in the footer. [17:32:34] *nod* [17:32:49] mutante: https://gerrit.wikimedia.org/r/194357 fixes the path error. [17:33:35] !log "/usr/local/bin/tox: Permission denied" on integration-slave14xx instances [17:33:39] Logged the message, Master [17:33:45] 10Continuous-Integration, 6operations, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1089194 (10Dzahn) >>! 
In T91453#1085388, @Krinkle wrote: > Similar: > ``` > invalid byte sequence in US-ASCII at /etc/puppet/manifests/role/statist... [17:34:08] !log "ImportError: Entry point ('console_scripts', 'tox') not found" on integration-slave12xx instances for operations-puppet-tox-data_admin_lint [17:34:12] Logged the message, Master [17:34:26] !log Depooling all new integation-slave12xx and integration-slave14xx instances again [17:34:30] Logged the message, Master [17:36:04] (03CR) 10Jforrester: "> James, if you think all/some Mac Jenkins jobs should be updated to Safari 8, please create phab ticket and cc me." [integration/config] - 10https://gerrit.wikimedia.org/r/194101 (https://phabricator.wikimedia.org/T91389) (owner: 10Zfilipin) [17:38:13] (03CR) 10Dzahn: [C: 031] python-jobs: Use remoteonly-zuul instead of remote-zuul [integration/config] - 10https://gerrit.wikimedia.org/r/194357 (owner: 10Krinkle) [17:39:47] 10Continuous-Integration: Re-create integration slaves - https://phabricator.wikimedia.org/T91524#1089205 (10Krinkle) 3NEW a:3Krinkle [17:40:00] 10Continuous-Integration: "/usr/local/bin/zuul-cloner" broken on new instances - https://phabricator.wikimedia.org/T90984#1089214 (10Krinkle) [17:40:01] 10Continuous-Integration: Re-create integration slaves - https://phabricator.wikimedia.org/T91524#1089213 (10Krinkle) [17:40:17] 10Continuous-Integration: On all slaves, /srv/deployment/integration/slave-scripts permissions went crazy - https://phabricator.wikimedia.org/T85969#1089220 (10Krinkle) [17:40:22] 10Continuous-Integration: Re-create integration slaves - https://phabricator.wikimedia.org/T91524#1089205 (10Krinkle) [17:40:42] 10Continuous-Integration, 6operations: Provide lint for yaml files in operations repository - https://phabricator.wikimedia.org/T91496#1089230 (10coren) p:5Triage>3Normal [17:41:34] 10Staging, 6operations: Package trebuchet-trigger for trusty - https://phabricator.wikimedia.org/T91463#1089237 (10coren) p:5Triage>3Normal [17:41:51] 
10Continuous-Integration: "/usr/local/bin/tox: Permission denied" on integration-slave14xx instances - https://phabricator.wikimedia.org/T91525#1089241 (10Krinkle) 3NEW [17:42:22] (03CR) 10Legoktm: [C: 04-1] "Need to also update jjb (but I can do that when the time comes). See my comments on Iabc322c7b92c768c9a6e7d272a5816e32743a501 about why th" [integration/config] - 10https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) (owner: 10Amire80) [17:42:46] 10Continuous-Integration: Fix "ImportError: Entry point ('console_scripts', 'tox') not found" on integration-slave12xx instances - https://phabricator.wikimedia.org/T91526#1089265 (10Krinkle) 3NEW a:3Krinkle [17:42:50] Krinkle: turns out that letting extensions be installed via composer completely breaks any other usage of composer [17:43:32] yippe. [17:44:00] 10Continuous-Integration: Pool new integration-slave12xx and integration-slave14xx instances and delete old ones - https://phabricator.wikimedia.org/T91524#1089280 (10Krinkle) [17:44:20] 10Continuous-Integration: Pool new integration-slave12xx and integration-slave14xx instances and delete old ones - https://phabricator.wikimedia.org/T91524#1089205 (10Krinkle) [17:44:34] hashar: So.. depooled again because suff is still broken in other ways. [17:45:33] 10Continuous-Integration: Pool new integration-slave12xx and integration-slave14xx instances and delete old ones - https://phabricator.wikimedia.org/T91524#1089205 (10Krinkle) [17:47:58] 10Continuous-Integration: Fix "ImportError: Entry point ('console_scripts', 'tox') not found" on integration-slave12xx instances - https://phabricator.wikimedia.org/T91526#1089343 (10Krinkle) a:5Krinkle>3None [17:52:14] hashar: just checking in before SoS, do you know what fundraising needs from us regarding https://phabricator.wikimedia.org/T86103 ? 
[17:52:55] it's currently listed as blocked but there's no explanation as to what they need [17:56:13] Is there a task to stop gerritbot spamming tasks with "now merged", given we have "Catrope mentioned this in rGVED61e1646536da: Resolve URLs in LinkContextItem." now? [17:56:45] Do we have that for all repositories? [18:03:54] thcipriani: https://wikitech.wikimedia.org/wiki/SWAT_deploys <- step 1: I added you to the list o' names [18:04:43] hooray! [18:04:59] btw, tech talk going on re Hack (php from FB ;) ) now [18:05:18] /join #wikimedia-office [18:06:03] http://www.youtube.com/watch?v=jqXqdqUhxy8 <-- if you're (anyone) curious [18:08:14] in there, definitely :) [18:10:29] yeah, this should be a fun one [18:32:54] (03PS1) 10Legoktm: Update to dev-master#05e196893b1225898de280ef8f97d5f2be684e8f [integration/composer] - 10https://gerrit.wikimedia.org/r/194364 (https://phabricator.wikimedia.org/T91176) [18:34:04] thcipriani: so, steps 2-n are: do a test/nop deploy with mukunda at some point (where you ssh in and type the commands). See the horrible https://wikitech.wikimedia.org/wiki/How_to_deploy_code . After that, be online in -operations at 8am pacific and volunteer to help. Brad (anomie) is usually there helping, and Chad of course. [18:35:19] 7Blocked-on-RelEng, 6Multimedia, 7Puppet: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1089461 (10MarkTraceur) [18:35:43] greg-g: gotcha, thanks! [18:50:23] Yippee, build fixed! 
[18:50:23] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce build #514: FIXED in 30 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce/514/ [18:57:47] 7Blocked-on-RelEng, 6Multimedia, 6Scrum-of-Scrums, 7Puppet: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1089526 (10dduvall) [18:59:19] 10Continuous-Integration, 6operations, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1089529 (10Dzahn) now all UTF-8 chars should be gone from all .pp files (but still in .erb files but that should not break things) grep -l --color... [19:03:20] Yippee, build fixed! [19:03:21] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #404: FIXED in 12 min: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/404/ [19:16:08] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #332: FAILURE in 37 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/332/ [19:32:20] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce build #347: FAILURE in 45 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox-monobook-sauce/347/ [19:33:01] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #542: FAILURE in 32 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/542/ [19:37:05] Project browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #1: FAILURE in 41 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/1/ [19:40:23] 
Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #389: FAILURE in 7 min 21 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/389/ [19:46:01] Krenair: All-ish. ^d's awesome work will make it all by the end of today, probably. [19:58:23] marxarelli: sorry about SoS :/ I was having dinner [19:58:31] hashar: no worries [19:58:57] i just told them we'd figure out ownership in our next weekly meeting [19:59:15] by that time it will be hopefully completed :) [19:59:27] it sounds like they're busy with pci certification right now anyway [19:59:38] oh, good! :) [19:59:51] at least they are all members of the labs project and Adam Wight is a project admin [20:00:10] next step is to create an instance, apply the magic puppet class to make it a slave and then add it to Jenkins [20:00:21] then it should be all about crafting a nice job in JJB / Zuul :) [20:01:27] 10Staging: Create staging-terbium - https://phabricator.wikimedia.org/T91543#1089749 (10greg) 3NEW [20:02:01] 10Staging: Create staging-db* (databases) - https://phabricator.wikimedia.org/T91545#1089764 (10greg) 3NEW [20:02:31] 10Staging: Create staging-mc* (memcached) - https://phabricator.wikimedia.org/T91546#1089771 (10greg) 3NEW [20:02:47] 10Staging: Create staging-rdb (redis) - https://phabricator.wikimedia.org/T91547#1089777 (10greg) 3NEW [20:03:04] 10Staging: Create staging-mw-app* (MW App servers) - https://phabricator.wikimedia.org/T91548#1089783 (10greg) 3NEW [20:03:19] 10Staging: Create staging-wtp* (Parsoid runners) - https://phabricator.wikimedia.org/T91549#1089789 (10greg) 3NEW [20:03:39] 10Staging: Create staging-jobrunner (Job runners!) 
- https://phabricator.wikimedia.org/T91550#1089795 (10greg) 3NEW [20:04:26] 10Staging: Create staging-varnish**** - https://phabricator.wikimedia.org/T91551#1089801 (10greg) 3NEW [20:04:51] 10Staging: Create staging-elastic* (ElasticSearch machines) - https://phabricator.wikimedia.org/T91552#1089809 (10greg) 3NEW [20:05:11] 10Staging: Create staging-ms-fe* / staging-ms-be* (swift frontend/backend) - https://phabricator.wikimedia.org/T91553#1089815 (10greg) 3NEW [20:05:57] 10Staging: Create staging-ocg* (OCG servers) - https://phabricator.wikimedia.org/T91555#1089828 (10greg) 3NEW [20:06:12] 10Staging: Create staging-mw-api* (MW Api servers) - https://phabricator.wikimedia.org/T91556#1089834 (10greg) 3NEW [20:06:33] 10Staging: Create staging-tmh* (Video scalers) - https://phabricator.wikimedia.org/T91557#1089841 (10greg) 3NEW [20:06:56] 10Staging: Create staging-mw-imagescalers - https://phabricator.wikimedia.org/T91558#1089847 (10greg) 3NEW [20:06:59] oh my god :D [20:07:18] 10Staging: Create staging-logstash* - https://phabricator.wikimedia.org/T91559#1089853 (10greg) 3NEW [20:07:33] 10Staging: Create staging-rcs* (RC stream) - https://phabricator.wikimedia.org/T91560#1089860 (10greg) 3NEW [20:07:49] 10Staging: Create staging-eventlogging - https://phabricator.wikimedia.org/T91561#1089867 (10greg) 3NEW [20:08:08] 10Staging: staging-mx (Mail server, pollonium replacement) - https://phabricator.wikimedia.org/T91562#1089873 (10greg) 3NEW [20:08:35] :) [20:08:36] sorry [20:09:29] 10Staging: Create staging-mx (Mail server, pollonium replacement) - https://phabricator.wikimedia.org/T91562#1089873 (10greg) [20:09:34] we did it! we officially own our house! [20:09:43] :D [20:10:03] just thought i'd add to the spam [20:10:08] congrats! [20:11:14] thanks! 
we'll have to throw a party next allhands or something [20:12:17] greg-g: looks like you're staging a party above ;) [20:15:26] har har har [20:15:27] ;) [20:16:17] 6Release-Engineering, 10Staging, 3releng-201415-Q3: Determine code update cycle/cadence for the staging cluster - https://phabricator.wikimedia.org/T91563#1089885 (10greg) 3NEW [20:18:24] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #552: FAILURE in 29 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/552/ [20:20:42] 6Release-Engineering: Have Jenkins update nightly.wmflabs.org once per day - https://phabricator.wikimedia.org/T1112#1089904 (10greg) 5Open>3declined Plans changed, now see: {T88701} [20:20:42] 6Release-Engineering: [Quarterly Goal] Create tested nightly builds - https://phabricator.wikimedia.org/T426#1089908 (10greg) [20:20:46] 6Release-Engineering: [Quarterly Goal] Create tested nightly builds - https://phabricator.wikimedia.org/T426#4479 (10greg) [20:20:47] 6Release-Engineering: Add "Nightly" domains to Beta Cluster - https://phabricator.wikimedia.org/T1111#1089910 (10greg) 5Open>3declined Plans changed, now see: {T88701} [20:20:49] 6Release-Engineering: Enable multiversion (het-deploy) on Beta Cluster (for nightly.wmflabs) - https://phabricator.wikimedia.org/T805#1089915 (10greg) 5Open>3declined Plans changed, now see: {T88701} [20:20:50] 6Release-Engineering: [Quarterly Goal] Create tested nightly builds - https://phabricator.wikimedia.org/T426#4479 (10greg) [20:20:54] 6Release-Engineering: Create use cases for two test cluster ("Beta Cluster" and the yet-to-be-created and yet-to-be-named cluster) - https://phabricator.wikimedia.org/T427#1089925 (10greg) [20:20:57] 6Release-Engineering: [Quarterly Goal] Create tested nightly builds - https://phabricator.wikimedia.org/T426#1089920 (10greg) 5Open>3declined a:3greg Plans changed, now see: {T88701} [20:23:15] 
6Release-Engineering, 10Staging: Run browser test suite against staging cluster - https://phabricator.wikimedia.org/T806#1089928 (10greg) [20:23:44] 6Release-Engineering, 10Staging: Run browser test suite against staging cluster - https://phabricator.wikimedia.org/T806#13416 (10greg) [20:23:45] 6Release-Engineering, 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Green nightly builds on the staging cluster (tracking) - https://phabricator.wikimedia.org/T88701#1089931 (10greg) [20:24:19] 6Release-Engineering, 10Staging: Run browser test suite against staging cluster - https://phabricator.wikimedia.org/T806#13416 (10greg) [20:24:20] 6Release-Engineering: Add "Nightly" domains to Beta Cluster - https://phabricator.wikimedia.org/T1111#1089936 (10greg) [20:24:21] 6Release-Engineering: Enable multiversion (het-deploy) on Beta Cluster (for nightly.wmflabs) - https://phabricator.wikimedia.org/T805#1089935 (10greg) [20:24:22] 6Release-Engineering: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1089938 (10greg) [20:24:23] 6Release-Engineering: [Quarterly Goal] Create tested nightly builds - https://phabricator.wikimedia.org/T426#1089933 (10greg) [20:24:24] 6Release-Engineering: Have Jenkins update nightly.wmflabs.org once per day - https://phabricator.wikimedia.org/T1112#1089937 (10greg) [20:25:18] (03CR) 10Krinkle: [C: 032] "Deployed 52 affected *-tox-* jobs." 
[integration/config] - 10https://gerrit.wikimedia.org/r/194357 (owner: 10Krinkle) [20:26:35] 10Quality-Assurance, 7Browser-Tests, 7Easy: Select a set of stable tests for smoke test suite - https://phabricator.wikimedia.org/T52576#1089943 (10greg) [20:26:43] 10Quality-Assurance, 7Browser-Tests, 7Easy: Select a set of stable tests for smoke test suite - https://phabricator.wikimedia.org/T52576#571746 (10greg) p:5Low>3Normal [20:27:10] 10Quality-Assurance, 7Browser-Tests, 7Easy: Select a set of stable tests for smoke test suite - https://phabricator.wikimedia.org/T52576#571746 (10greg) [20:31:25] 10Continuous-Integration, 10Quality-Assurance, 7Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#1089960 (10greg) [20:31:52] (03Merged) 10jenkins-bot: python-jobs: Use remoteonly-zuul instead of remote-zuul [integration/config] - 10https://gerrit.wikimedia.org/r/194357 (owner: 10Krinkle) [20:32:40] 10Quality-Assurance: Run smoke tests on hermetic instances per patchset submitted in mediawiki/core - https://phabricator.wikimedia.org/T54424#1089981 (10greg) [20:32:41] 10Continuous-Integration, 10Quality-Assurance, 7Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted to extensions that run on WMF production - https://phabricator.wikimedia.org/T54425#540581 (10greg) [20:33:34] 10Continuous-Integration, 10Quality-Assurance, 7Browser-Tests: Run subset of browser tests on isolated CI instances per commit submitted in mediawiki/core - https://phabricator.wikimedia.org/T54424#1089986 (10greg) [20:33:43] 10Quality-Assurance: Fix failing Selenium tests for PhantomJS browser on local machine - https://phabricator.wikimedia.org/T51813#1089992 (10greg) [20:33:44] 10Continuous-Integration: Create hermetic test environment runnable on labs infra - https://phabricator.wikimedia.org/T53492#1089990 (10greg) [20:34:22] 
10Continuous-Integration, 6operations, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1089996 (10Krinkle) @dzahn The last occurrence: ``` Mar 4 18:12:35 integration-puppetmaster puppet-master[1325]: Could not parse for environment... [20:35:16] (03CR) 10Hashar: [C: 031] "Dduvall dont be shy, +2 it :]" [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) (owner: 10Zfilipin) [20:35:33] hashar: Hey :) [20:35:43] hashar: Did you see the latest drama? [20:35:55] Krinkle: which one ? :D [20:36:02] hashar: new instances depooled again [20:36:11] hashar: more permissions and dependency issues [20:36:16] This time in tox [20:36:17] :-((( [20:36:44] hashar: btw, what you did to make zuul-cloner work on precise, can you add that to the setup page? [20:36:56] python setup.pyp [20:37:01] python setup.py install [20:37:11] I indent to re-create the instances at least every month. To weed out all issues and clarify our live hacks to a simple bash script. [20:37:19] which downloaded the python-statsd 2.0 or so it needed [20:37:31] intend* [20:37:37] :) [20:37:46] thx [20:37:53] are there any hacks beside the zuul install process ? [20:37:58] hashar: https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup .. [20:38:03] please add the exact commands and cwd [20:38:24] There used to be more hacks but I puppetised a few of them. [20:38:48] ahh the node / npm craziness :( [20:40:08] pfff [20:40:23] I am fighting with Debian: dpkg-buildpackage: cannot combine -S and -F [20:40:37] Krinkle: nothing to add to the page, all is puppetized [20:40:47] brb nature's call [20:41:12] hashar: But not zuul install, right? that obviously failed, as we found out this week. That required the pip install bypassing debian for hte momnet. [20:41:27] Or can I re-create slave1201 now and zuul-cloner will be installed and working? 
[20:42:02] it will not [20:42:09] cause of python-statsd :/ [20:42:13] Exactly [20:42:16] we need 2.0 [20:42:19] So, let's document the hack [20:42:24] apt.wm.o now provides 3.0 [20:42:39] So that next time re-creating instances will not take a week of what-was-the-hack-again cascading error waterfall [20:42:50] just run the provision command ( should be: python setup.py install ) [20:42:58] In which directory [20:43:00] ie without the http_proxy=. [20:43:03] please document it :) [20:43:05] in /usr/local/src/zuul [20:43:15] sure :D [20:43:19] brb [20:46:22] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #503: FAILURE in 4 min 23 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/503/ [20:48:17] (03CR) 10AndyRussG: [C: 031] "Thanks! Done: https://gerrit.wikimedia.org/r/#/c/193552/" [integration/config] - 10https://gerrit.wikimedia.org/r/193361 (https://phabricator.wikimedia.org/T86092) (owner: 10Zfilipin) [20:49:48] 10Continuous-Integration, 6operations, 3Continuous-Integration-Isolation, 7Upstream: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1090047 (10hashar) I managed to get a rough package locally using a random upstream commit without any of our hack. Good progress so far. [20:50:08] 10Continuous-Integration, 6operations, 3Continuous-Integration-Isolation, 7Upstream: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1090049 (10hashar) a:3hashar [20:53:26] 10Continuous-Integration, 5Patch-For-Review: Have jenkins jobs logrotate their build history - https://phabricator.wikimedia.org/T91396#1090060 (10hashar) I have updated most of the jobs. There is still a hundredish that are missing the daysToKeep though. A full list can be gathered via: ssh gallium.wikim... 
[20:56:01] Krinkle: done https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:Integration/Setup&diff=146902&oldid=146232 [20:56:13] 10Continuous-Integration: Jenkins: Assert no PHP errors (notices, warnings) were raised or exceptions were thrown - https://phabricator.wikimedia.org/T50002#516358 (10Krinkle) [20:58:48] 10Continuous-Integration: Jenkins: Assert no PHP errors (notices, warnings) were raised or exceptions were thrown - https://phabricator.wikimedia.org/T50002#1090096 (10Krinkle) [20:59:31] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #497: FAILURE in 16 min: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/497/ [21:01:44] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #504: FAILURE in 43 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/504/ [21:08:59] 10Continuous-Integration, 10MediaWiki-Database, 10MediaWiki-ResourceLoader: Fix "DatabaseSqlite::replace/single-row NOT NULL constraint failed" for md_module table - https://phabricator.wikimedia.org/T91567#1090122 (10Krinkle) 3NEW a:3Krinkle [21:09:08] 10Continuous-Integration: Jenkins: Assert no PHP errors (notices, warnings) were raised or exceptions were thrown - https://phabricator.wikimedia.org/T50002#1090131 (10Krinkle) Known errors to fix first: `SqlBagOStuff::set/single-row 5 database is locked REPLACE INTO objectcache` (T89180) Found every run of med... 
[21:12:32] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #348: FAILURE in 42 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/348/ [21:13:14] hashar: Just checking, but the patch is only needed for Precise instances for Zuul cloner install, right? [21:18:33] 7Blocked-on-RelEng, 6Multimedia, 6Scrum-of-Scrums, 7Puppet: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1090183 (10Tgr) See also the //Guidance on creating Debian packages for puppet// ops thread from January. The consensus there seemed to be that apt packages are an over... [21:19:45] 10Continuous-Integration, 10MediaWiki-Database, 10MediaWiki-ResourceLoader: Fix "DatabaseSqlite::replace/single-row NOT NULL constraint failed" for md_module table - https://phabricator.wikimedia.org/T91567#1090196 (10Krinkle) [21:20:21] Krinkle: should be for Trusty as well [21:20:53] Krinkle: should be *needed* for Trusty as well [21:21:30] hashar: Oh? [21:21:37] hashar: So you applied it there as well? [21:21:50] It was working on trusty instances. I already pooled them secretly yesterday for a while to test it out. [21:22:12] Good to know :) I'll update my edit [21:24:49] 10Continuous-Integration, 6operations, 3Continuous-Integration-Isolation, 7Upstream: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1090218 (10hashar) My super lame commands: Create a cow image for debian sid: cowbuilder --create --debug \ --basepath /p... [21:25:01] Krinkle: nop. I noticed the Trusty intances were just fine [21:25:08] so I havent played with them at all [21:25:15] ok [21:25:45] hashar: Feel like tackling tox tomorrow? [21:25:55] I'd love to do it myself, but I really don't know that area at all. 
[21:26:15] if I am on cc of the task, sure [21:26:29] though I really want to get some Zuul debian package by the end of the week [21:28:03] (03PS2) 10AndyRussG: WIP CentralNotice browser tests: even more platforms and browsers [integration/config] - 10https://gerrit.wikimedia.org/r/193556 (https://phabricator.wikimedia.org/T86092) [21:28:59] (03CR) 10AndyRussG: [C: 04-1] "This PS: just rebased, needs some changes..." [integration/config] - 10https://gerrit.wikimedia.org/r/193556 (https://phabricator.wikimedia.org/T86092) (owner: 10AndyRussG) [21:30:44] hashar: I tracked them both via [21:30:44] https://phabricator.wikimedia.org/T91524 [21:32:32] (03CR) 10Dduvall: [C: 032] "Ok, fine. :)" [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) (owner: 10Zfilipin) [21:32:47] (03Merged) 10jenkins-bot: Move the list of repositories to mediawiki.org [selenium] - 10https://gerrit.wikimedia.org/r/194339 (https://phabricator.wikimedia.org/T91486) (owner: 10Zfilipin) [21:34:12] 10Continuous-Integration, 10MediaWiki-ResourceLoader, 10MediaWiki-Unit-tests: Fix "DatabaseSqlite::replace/single-row NOT NULL constraint failed" for md_module table - https://phabricator.wikimedia.org/T91567#1090255 (10Krinkle) [21:44:13] Krinkle: /usr/local/bin/tox sounds bad :D [21:44:34] ahh [21:44:44] so when python setup.py install got run for Zuul [21:44:48] that upgraded all dependencies :( [21:44:53] which is not really what I wanted [21:44:59] :( [21:45:07] This is on trusty instance though [21:45:23] both show a related issue: they have /usr/local/bin/tox [21:45:24] (03PS3) 10AndyRussG: CentralNotice browser tests: even more platforms and browsers [integration/config] - 10https://gerrit.wikimedia.org/r/193556 (https://phabricator.wikimedia.org/T86092) [21:45:37] though tox should be provided by the Debian package [21:45:42] or maybe it is [21:45:45] I figured maybe something in puppet using the wrong chmod/umask [21:45:54] But if it's a debian 
package that wouldn't matter [21:47:42] So I've worked out how to get the metrics I need from varnish, so that we can build staging cluster availability metrics based on returned http status codes [21:47:45] What I haven't worked out is where to store them. Would logstash be appropriate? [21:47:47] bd808 ^ [21:47:56] aohghe [21:48:12] # Bring tox/virtualenv... from pip bug 44443 [21:48:12] package { 'tox': [21:48:12] provider => 'pip', [21:48:15] Krinkle: ^^ [21:48:17] twentyafterfour: hmmm... maybe? [21:48:32] hashar: aha, yeah, that probably needs umask [21:48:37] Krinkle: but we dont ensure latest. So the old versions have some version of tox, the newly created one have a new version [21:48:40] :) [21:48:40] since it's not apt [21:48:52] Krinkle: and tox 1.9.0 would have some breakage. But yeah might be umask [21:48:55] hashar: Over the lat 2-3 months the default umask in puppet chaned to be root only [21:49:03] changed* [21:49:37] twentyafterfour: "logstash" doesn't really store anything, but there is an elasticsearch cluster paired with it. [21:50:04] find: `/usr/local/lib/python2.7/dist-packages/tox': Permission denied [21:50:04] find: `/usr/local/lib/python2.7/dist-packages/tox-1.9.0.egg-info': Permission denied [21:50:07] Krinkle: yeah umask for the win :( [21:50:12] Krinkle: good catch [21:50:16] twentyafterfour: Is this data that seems like the right thing to stuff in elasticsearch? [21:51:43] bd808: hmm .. well it's just a number that I need to summarize over time and graph for the sake of reporting as a metric on our quarterly reviews [21:51:52] graphite? [21:52:12] greg-g: yeah that was the other one I considered [21:52:20] but I have no clue how to get anything into graphite [21:52:29] twentyafterfour: That sounds like graphite data [21:53:24] it's just throwing a small packet at a port [21:53:43] metric_name value timestamp\n [21:55:13] twentyafterfour: YuviPanda can help you I bet. 
He sticks lots of things in labs graphite servers [21:55:25] I DIDN’T DO IT [21:55:26] oh [21:55:43] I’m in a very shaky bus [21:55:46] reading backscroll [21:55:55] YuviPanda: teach twentyafterfour the tao of collecting data to graph [21:56:07] yesssss massssterrrr [21:56:08] labs? wouldn't this belong in production graphite? [21:56:20] yes [21:56:27] just the principles are the same [21:56:31] "staging cluster availability metrics" sounded like beta [21:56:33] yeah, just tweak URL [21:56:53] I guess it depends on where the thing getting the data lives... [21:56:56] labs graphite is hosted on raw metal as well, and is considered a ‘production service' [21:57:19] hashar: Unless you're on it already, I'll write a puppet patch later for tox/umask and let it re-install. [21:57:47] > So I've worked out how to get the metrics I need from varnish, so that we can build staging cluster availability metrics based on returned http status codes [21:57:52] twentyafterfour: ^ how are you getting them? [21:57:55] python? bash? [21:58:24] YuviPanda: varnishtop -b -i RxStatus -1 [21:58:54] I'd have to parse that to get each metric name (since there would be a few lines of output from that command) [21:59:40] moment. [22:00:47] brb 5 mins? [22:00:50] i swear [22:01:00] YuviPanda: no rush. example output: https://phabricator.wikimedia.org/P357 [22:01:42] wait... 
we have this data in prod already I think [22:01:58] there should be a collector for it that you can just setup for beta [22:02:04] bd808: but I need it for the 'staging' cluster [22:02:52] sure, but we know how to graph 500/4xx response rates in prod is what I'm saying [22:03:13] right on I didn't see that in graphite but it's been a while since I looked [22:03:33] * bd808 tries to remember where that stuff it [22:03:35] *is [22:04:39] twentyafterfour: https://gdash.wikimedia.org/dashboards/reqerror/ [22:05:24] https://github.com/wikimedia/operations-puppet/blob/production/files/gdash/dashboards/reqerror/01.5xx.graph [22:05:43] bd808: I want to compare it to the rate of 200+300 errors so that I can compute an 'availability' metric [22:06:20] so it needs to be something like 2xx + 3xx / 5xx [22:06:35] https://github.com/wikimedia/operations-puppet/blob/b4d1ad8b6f7823b23284c821226753bfff2be6e2/files/udp2log/sqstat.pl [22:06:38] or rather .. 5xx / (2xx+3xx) [22:09:53] twentyafterfour: *nod* graphite can do all that on the fly if you have the raw stats. Which I think would be the thing to get added in to monitor the new environment you are talking about [22:10:24] (03PS1) 10Hashar: Add .gitreview [integration/zuul] (upstream-debian-sid) - 10https://gerrit.wikimedia.org/r/194397 [22:10:26] (03PS1) 10Hashar: Vcs-* points to openstack-infra now [integration/zuul] (upstream-debian-sid) - 10https://gerrit.wikimedia.org/r/194398 [22:10:27] The way that the stats are collected in prod looks like its tied into the analytics data stream from varnishkafka [22:10:28] (03PS1) 10Hashar: (WIP) Bump to 2.0.0 [integration/zuul] (upstream-debian-sid) - 10https://gerrit.wikimedia.org/r/194399 (https://phabricator.wikimedia.org/T48552) [22:10:49] (03CR) 10Hashar: "check experimental" [integration/zuul] (upstream-debian-sid) - 10https://gerrit.wikimedia.org/r/194399 (https://phabricator.wikimedia.org/T48552) (owner: 10Hashar) [22:13:28] bd808: twentyafterfour hey! 
am back [22:13:46] so looks like prod does this using udp2log / varnishkafka [22:13:54] and I’m going to assume you don’t want to set *that* up [22:14:07] bd808: twentyafterfour a simple diamond collector should suffice [22:14:09] * YuviPanda finds example [22:14:42] Project beta-scap-eqiad build #43923: FAILURE in 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/43923/ [22:14:48] varnishtop gives me the info I want. I don't know anything about varnishkafka [22:14:48] twentyafterfour: https://gerrit.wikimedia.org/r/#/c/192335/ has a decent example [22:15:15] so you will have to write a simple bit of python that executes the commands you need, reads the data, and uses self.publish. [22:15:25] diamond will take care of running it every minute and publishing it to the appropriate graphite host [22:15:30] varnishkafka is a full request log stream from the varnish boxes. way more than the test env needs [22:16:01] ok [22:18:17] Krinkle: please do the puppet patch. I am packaging zuul meanwhile. And really going to bed :D [22:18:37] Krinkle: I think I got something working [22:19:27] bd808: YuviPanda thanks guys [22:19:41] twentyafterfour: is there a bug for this? [22:19:49] dpkg-deb: building package `zuul' in `../zuul_2.0.0-1_all.deb'. !!!!! [22:20:09] hashar: And the other zuul error (there were two in that task) [22:20:26] Something we can pip install somehow? 
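[editor's note] The pieces discussed above (parse the varnishtop output, compute 5xx / (2xx+3xx), emit "metric_name value timestamp\n" for graphite) could be sketched roughly as below. This is only an illustration, not the collector that was actually written. The input line shape "<count> RxStatus <code>" is an assumption, since the real sample output is in P357 and not reproduced in this log. The function names and the "staging.varnish" prefix are hypothetical.

```python
import re

# ASSUMPTION: each relevant line of `varnishtop -b -i RxStatus -1` looks like
# "<count> RxStatus <code>". Check against the real output before relying on it.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+RxStatus\s+(\d{3})\s*$")


def count_by_class(text):
    """Sum the counts per status class, e.g. {'2xx': 90.0, '5xx': 5.0}."""
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a status line
        count, code = float(match.group(1)), match.group(2)
        key = code[0] + "xx"
        totals[key] = totals.get(key, 0.0) + count
    return totals


def error_ratio(totals):
    """Return 5xx / (2xx + 3xx), or None when there was no 2xx/3xx traffic."""
    ok = totals.get("2xx", 0.0) + totals.get("3xx", 0.0)
    if ok == 0:
        return None
    return totals.get("5xx", 0.0) / ok


def graphite_lines(prefix, totals, timestamp):
    """Format one 'metric_name value timestamp\\n' line per status class."""
    return ["%s.%s %s %d\n" % (prefix, key, value, timestamp)
            for key, value in sorted(totals.items())]
```

Inside a diamond collector, the same totals would presumably be handed to self.publish(name, value) one by one instead of being formatted by hand, and graphite can compute the ratio on the fly from the raw per-class stats, so error_ratio is mainly useful for a quick sanity check.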
[22:20:44] YuviPanda: https://phabricator.wikimedia.org/T88705 [22:20:45] console_scripts,tox or whatever that is [22:22:53] (03PS2) 10Hashar: (WIP) Bump to 2.0.0 [integration/zuul] (upstream-debian-sid) - 10https://gerrit.wikimedia.org/r/194399 (https://phabricator.wikimedia.org/T48552) [22:23:51] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090394 (10mmodell) [22:24:12] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090396 (10mmodell) a:3mmodell [22:24:27] Krinkle: same issue I guess [22:24:35] heading to sleep sorry :-( [22:24:37] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090397 (10yuvipanda) Perhaps, but I'll note that this also has flaws - if the nginx / varnish machines themselves are down. or DNS is down and people can't reach thi... [22:26:59] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090413 (10mmodell) @yuvipanda: yes I talked to chase about this a while back but he didn't seem to think he had anything we could immediately apply to our stuff, at... [22:33:29] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090443 (10yuvipanda) So our current txstatsd implementation interprets 'no new data' to be 'repeat last data point' (yes, stupid) rather than 'no new data'. I'm sure... [22:35:03] Yippee, build fixed! 
[22:35:03] Project beta-scap-eqiad build #43925: FIXED in 57 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/43925/ [23:18:51] (03CR) 10BryanDavis: [C: 031] Update to dev-master#05e196893b1225898de280ef8f97d5f2be684e8f [integration/composer] - 10https://gerrit.wikimedia.org/r/194364 (https://phabricator.wikimedia.org/T91176) (owner: 10Legoktm) [23:22:05] (03CR) 10Legoktm: [C: 032] Update to dev-master#05e196893b1225898de280ef8f97d5f2be684e8f [integration/composer] - 10https://gerrit.wikimedia.org/r/194364 (https://phabricator.wikimedia.org/T91176) (owner: 10Legoktm) [23:22:13] (03CR) 10Legoktm: [V: 032] Update to dev-master#05e196893b1225898de280ef8f97d5f2be684e8f [integration/composer] - 10https://gerrit.wikimedia.org/r/194364 (https://phabricator.wikimedia.org/T91176) (owner: 10Legoktm) [23:24:19] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090622 (10bd808) >>! In T88705#1090443, @yuvipanda wrote: > So our current txstatsd implementation interprets 'no new data' to be 'repeat last data point' (yes, stup... [23:35:25] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090661 (10greg) >>! In T88705#1090443, @yuvipanda wrote: > My personal preference would be to just wait for Chase to do the prod stuff and just use exactly the same... [23:38:07] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<33.33%) [23:45:11] 6Release-Engineering: Try out hack ( ^d: want to add that to the possible hackathon projects? 
^ [23:46:08] 6Release-Engineering, 10Wikimedia-Hackathon-2015: Try out hack ( (yes) [23:46:37] :) [23:50:35] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090725 (10mmodell) @yuvipanda: the number is cumulative, so repeating the last data point would be the same as no new data. [23:53:29] 10Staging, 3releng-201415-Q3: [Quarterly Success Metric] Stable uptime metrics of the Staging cluster - https://phabricator.wikimedia.org/T88705#1090737 (10chasemp) I'm in ping central today gentlemen. My 2 cents. Yes, txstatsd is not great and will die. Yes, it seems to confuse no metrics with "submit somet...
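[editor's note] mmodell's point about cumulative numbers can be illustrated with a small sketch: if the stored value is a running total, a repeated data point contributes a zero increase, so it reads the same as "no new data". The function name is hypothetical, and treating a drop in the value as a counter reset is an assumption made for the example, not something stated in the task.

```python
def interval_deltas(samples):
    """Turn cumulative counter readings into per-interval increases."""
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            deltas.append(cur - prev)
        else:
            # ASSUMPTION: a lower reading means the counter restarted from zero.
            deltas.append(cur)
    return deltas
```

With readings such as 100, 130, 130, 130, 150 the two repeated points come out as zero increases, which is why a "repeat last data point" behaviour is harmless for a cumulative series but would be misleading for a gauge.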