[01:23:49] legoktm: What about https://gerrit.wikimedia.org/r/#/c/204983/ ?
[01:24:01] (And friends.)
[02:35:31] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #663: FAILURE in 2 min 30 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/663/
[04:06:42] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #530: FAILURE in 14 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/530/
[04:20:34] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #536: FAILURE in 13 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/536/
[04:57:49] all images seem broken on beta, is that a known issue?
[05:06:32] looks like maybe a varnish server is down? connection refused for http://upload.beta.wmflabs.org/wikipedia/en/c/ca/Some_bug_after_deleting_a_page.png
[05:08:35] wikitech says ostriches deleted deployment-cache-upload02 12 hours ago
[05:12:27] upload.beta.wmflabs.org -> deployment-cache-upload04.deployment-prep.eqiad.wmflabs now
[05:12:34] varnish is running there
[05:13:15] but nothing is listening on port 80
[05:19:06] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #161: FAILURE in 3 min 5 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/161/
[05:21:31] !log varnish-fe on deployment-cache-upload04.deployment-prep.eqiad.wmflabs not starting because nginx isn't starting because ssl cert is missing.
No port 80 listener to serve images
[05:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[05:21:59] ostriches: ^ not sure how to fix but you'll figure it out in the morning I'm sure
[06:03:32] Beta-Cluster, Pywikibot-OAuth: Set up a Pywikibot OAuth test client on the Beta cluster - https://phabricator.wikimedia.org/T104764#1538414 (VcamX) Open>Resolved
[06:27:31] Yippee, build fixed!
[06:27:31] Project browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #711: FIXED in 8 min 30 sec: https://integration.wikimedia.org/ci/job/browsertests-Core-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/711/
[07:23:54] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce build #129: FAILURE in 14 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce/129/
[07:52:07] Browser-Tests, Wikidata: [Task] Browsertests for merging items - https://phabricator.wikimedia.org/T101500#1538561 (Jonas)
[07:52:34] Browser-Tests, Wikidata: [Task] Browsertests for merging items - https://phabricator.wikimedia.org/T101500#1340592 (Jonas) Is this still open? Description should be improved.
[08:13:20] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #691: FAILURE in 3 min 19 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/691/
[08:18:54] Browser-Tests, Wikidata: [Task] Browsertests for Special:MergeItems - https://phabricator.wikimedia.org/T101500#1538617 (Tobi_WMDE_SW)
[08:19:27] Browser-Tests, Wikidata: [Task] Browsertests for Special:MergeItems - https://phabricator.wikimedia.org/T101500#1340592 (Tobi_WMDE_SW) @jonas yes, this is still not done. I've changed the title and added a description.
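The 05:13 finding that nothing was listening on port 80 can be reproduced with a plain TCP connect. The following is only a minimal sketch: the function name `port_is_listening` and the 3-second default timeout are illustrative assumptions, not anything used on the beta cluster.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: a refused connection (as seen for
    upload.beta.wmflabs.org at 05:06) or a timeout both yield False.
    """
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `port_is_listening("upload.beta.wmflabs.org", 80)` would have returned False while nginx was failing to start; the check says nothing about why the listener is missing.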
[08:23:20] Deployment-Systems, MediaWiki-extensions-LocalisationUpdate, I18n, Wikimedia-log-errors: l10n-update not updating Vector - https://phabricator.wikimedia.org/T103879#1538623 (Reedy) It'd be interesting to see if this is actually all skins, only vector, all extensions... Or just anything that isn't core
[11:28:33] (PS7) Paladox: Update Blueprint tests [integration/config] - https://gerrit.wikimedia.org/r/226542
[11:28:39] (PS8) Paladox: Update Blueprint tests [integration/config] - https://gerrit.wikimedia.org/r/226542
[11:29:01] (CR) Paladox: "This can be merged now since required patch has been merged." [integration/config] - https://gerrit.wikimedia.org/r/226542 (owner: Paladox)
[12:54:59] Yippee, build fixed!
[12:54:59] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #564: FIXED in 58 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/564/
[13:15:18] bd808: Bah! It was working before....
[13:28:42] bblack: I thought we had puppet running on the beta varnishes, but nginx won't start on -upload04.
[13:28:51] (same usual snafu w/ ssl certs)
[13:30:37] yeah
[13:30:46] I had to manually install the private key on the one we got working
[13:31:05] (the one that's actually exposed in the labs repo anyways and shouldn't be :/)
[13:54:12] bblack: Meh, that was just on -text04 right?
[13:54:36] yeah
[13:55:17] you'll see on text04 /etc/ssl/private/star.wmflabs.org.key
[13:55:22] that's the file I had to put in manually
[13:55:38] it gets nginx loading, but the cert is still invalid for a regular browser
[13:55:41] There's a puppet thing for it iirc.
[13:57:00] Er, was.
I can't find it now
[14:27:09] (PS11) Reedy: Add jenkins tests for EditUser [integration/config] - https://gerrit.wikimedia.org/r/228500 (owner: Paladox)
[14:29:05] (PS12) Paladox: Add jenkins tests for EditUser [integration/config] - https://gerrit.wikimedia.org/r/228500
[15:09:51] integration and beta instances should be back up shortly.
[15:26:01] (CR) Thcipriani: [C: 2] "Tested on soft errors during some downtime provided by the labvirt1003 reboot. Works as expected.2" [tools/scap] - https://gerrit.wikimedia.org/r/231442 (https://phabricator.wikimedia.org/T109007) (owner: BryanDavis)
[15:26:25] (Merged) jenkins-bot: Return super().main() when overriding AbstractSync.main() [tools/scap] - https://gerrit.wikimedia.org/r/231442 (https://phabricator.wikimedia.org/T109007) (owner: BryanDavis)
[15:42:22] Beta-Cluster, Continuous-Integration-Infrastructure, MediaWiki-API, Pywikibot-tests: prevent modules with broken paraminfo being deployed to production wikis - https://phabricator.wikimedia.org/T108322#1539877 (Umherirrender) >>! In T108322#1524624, @Legoktm wrote: >>>! In T108322#1523073, @Anomie...
[15:54:05] (PS1) Florianschmidtwelzow: Remove WikiGrok as dependency from mediawiki-core phpunit [integration/config] - https://gerrit.wikimedia.org/r/231579
[16:14:23] !log fixed deployment-cache-upload04
[16:14:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:14:32] bd808: All fixed. Meh
[16:15:53] ostriches: what was wrong?
[16:15:59] ssl bs.
[16:16:07] * greg-g grumbles
[16:16:31] don't grumble greg-g. It is all in the name of progress
[16:16:43] yay!
[16:16:49] up and to the right!
[16:17:09] mobile's still fucked up, and bits is now yelling about role::cache::bits no longer existing.
[16:17:16] varnishes in beta cluster are getting to the point where bb.lack might actually be able to test things there
[16:17:20] yay!
[16:17:24] bits is dead in prod
[16:17:33] (re bd808, not ostriches ;) )
[16:17:34] or mostly dead anyway
[16:17:42] right
[16:17:46] dead in all but dns
[16:18:00] the bits vhost got moved behind the text-lb I think
[16:18:18] yeah, something like that
[16:18:44] so ostriches you may just be able to point the dns name at another cache and kill the dedicated one
[16:18:50] Yeah I knew we were moving that way
[16:19:01] Lemme try switching it to point at the text cache
[16:19:05] it happened like last week I think
[16:21:31] bits.beta.wmflabs.org is now pointing at text04
[16:21:56] resolving properly for me now
[16:24:10] beta still looking good for folks? styled? js? bits resolving to 208.80.155.135?
[16:25:45] Released all the old DNS
[16:26:00] 208.80.155.137 shouldn't resolve to anything now
[16:27:28] bblack: bits-specific cache is dead in beta now too, just using text :)
[16:29:49] we'll see how the browser tests go :)
[16:30:03] bd808: mobile's screwy because varnish is already bound on :80
[16:30:07] Stray process?
[16:31:02] Yeah, it's already running but puppet ain't happy
[16:34:36] better.
[16:35:12] varnish::zero_update is still fubar, but otherwise we're good now
[16:35:16] (!log that shit ;) )
[16:36:52] images are showing up :) -- http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:Search&search=File&fulltext=Search&profile=images
[16:37:03] some thumbs borked but not all
[16:37:43] Erm, what? https://bits.beta.wmflabs.org/w/load.php?debug=false&lang=en&modules=site&only=styles&skin=vector&*
[16:37:51] I wonder if we are missing a thumb on 404 handler somewhere?
[16:37:59] w/o the https too.
[16:38:41] errr... not being processed as php
[16:39:03] we've seen this before... not remembering the fix
[16:39:13] it's something in the apache config
[16:39:17] Yeah
[16:39:50] where does that stuff hide now? modules/apache?
[16:40:26] modules/mediawiki/files/apache/beta/sites
[16:41:08] Yeah.
[16:41:10] Weird place.
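The 16:38 symptom ("not being processed as php") means the web server returned the script's source instead of its output. A rough way to spot that in a response body is sketched below; the function name and the heuristic itself are assumptions for illustration, and it only looks at the start of the body.

```python
def looks_like_raw_php(body: str) -> bool:
    """Heuristic: unexecuted PHP source normally begins with a '<?php' open tag.

    Hypothetical helper. Leading whitespace and a byte-order mark are skipped
    before the check; rendered CSS or JS from load.php would not match.
    """
    return body.lstrip("\ufeff \t\r\n").startswith("<?php")
```

For instance, `looks_like_raw_php("<?php\nrequire 'includes/WebStart.php';")` is True, while a stylesheet body such as `".mw-body{margin:0}"` is False. A match only flags the symptom; the cause here turned out to be the missing hhvm block in the beta Apache site config.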
[16:41:35] no hhvm block
[16:42:59] Should just copy+paste from wikimedia.conf I think
[16:44:37] See also: stupid duplication of apache config.
[16:46:00] yeah... we should template those files. it's really only the hostnames that should change
[16:49:30] Continuous-Integration-Infrastructure, Wikidata: [Bug] github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1540131 (Jonas)
[16:50:00] Continuous-Integration-Infrastructure, MediaWiki-extensions-WikibaseRepository, Wikidata: [Task] generate patch code coverage on gerrit patch-set upload for wikibase.git - https://phabricator.wikimedia.org/T88435#1540134 (Jonas)
[16:50:34] bd808: Yeah, I have a task for it
[16:51:14] All good
[16:52:28] ostriches: should we just fix it? ;)
[16:52:38] I started on it
[16:52:42] https://gerrit.wikimedia.org/r/#/c/197655/
[16:59:08] create_resources() might be able to do most of what needs to be done -- https://docs.puppetlabs.com/references/latest/function.html#createresources
[16:59:44] which can be powered by hiera lookups
[16:59:54] Deployment-Systems, RESTBase, Services, operations, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1540195 (mmodell) A key quote from the github issue: >"If you run with -vvvv you will see exactly what...
[17:00:24] reasonably long example at https://ask.puppetlabs.com/question/1655/an-end-to-end-roleprofile-example-using-hiera/?answer=1656#post-id-1656
[17:03:36] Deployment-Systems, RESTBase, Services, operations, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1540209 (GWicke) > So it appears that this isn't really fixable and unfortunately detracts from ansible's...
[17:12:43] bd808: Amended.
feedback welcome
[17:17:54] looks like an ok start
[17:23:26] works in beta afaict
[18:20:44] Reedy: btw, like the bugmail spam in #wikimedia-log-errors :)
[18:29:34] Release-Engineering, User-greg: Create #releng-201516-q2 goals - https://phabricator.wikimedia.org/T109115#1540615 (greg) NEW a:greg
[18:35:20] 18:34 < James_F> greg-g: Also, is config dirty?
[18:35:23] 18:34 < James_F> greg-g: 20+ hours of queued beta-mediawiki-config-update-eqiad generally means that, doesn't it?
[18:36:06] (based on the skills matrix): marxarelli thcipriani ostriches: ^^
[18:36:16] * greg-g automates away management
[18:37:09] * James_F grins.
[18:37:41] (Should we have a bot that warns when jobs are queued > 1 hour on Zuul?)
[18:37:54] probably?
[18:38:03] * James_F makes.
[18:38:13] see also: https://phabricator.wikimedia.org/T108750
[18:38:17] James_F: ^
[18:38:24] * James_F nods.
[18:38:28] hmm, looks like all these jobs are just waiting on deployment-bastion
[18:38:30] oh, you're gonna make a bot? or a task? :)
[18:38:41] greg-g: Task. Management, remember. :_)
[18:38:58] James_F: was gonna say :)
[18:39:04] thcipriani: yeah, disconnect/reconnect?
[18:39:19] yeah, trying that first seems like the right thing
[18:39:25] * greg-g nods
[18:39:31] Continuous-Integration-Infrastructure, Wikimedia-IRC: Have a bot warn in RelEng(?)
IRC when anything is queued in CI for > 1 hour - https://phabricator.wikimedia.org/T109118#1540686 (Jdforrester-WMF)
[18:39:34] greg-g: https://phabricator.wikimedia.org/T109118
[18:39:49] ty
[18:40:33] Continuous-Integration-Infrastructure, Wikimedia-IRC: Have a bot warn in -releng IRC when anything is queued in CI for > 1 hour - https://phabricator.wikimedia.org/T109118#1540691 (greg)
[18:41:05] Continuous-Integration-Infrastructure: Have a bot warn in -releng IRC when anything is queued in CI for > 1 hour - https://phabricator.wikimedia.org/T109118#1540695 (Legoktm)
[18:42:12] !log disconnected and reconnected deployment-bastion jenkins slave
[18:42:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:43:56] !log killed some of the queued jobs (beta-scap etc) via clicking on the red X
[18:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:44:34] the beta-mediawiki-config-update-eqiad job is not letting me kill it
[18:45:41] hmm, yeah, me either
[18:46:07] disconnect the slave, then kill the job, before reconnecting?
[18:46:56] Continuous-Integration-Infrastructure: Have a bot warn in -releng IRC when anything is queued in CI for > 1 hour - https://phabricator.wikimedia.org/T109118#1540718 (Krenair) Not #Wikimedia-IRC.
[18:48:23] yeah, lemme see if that works after the fact...
[18:48:43] Yippee, build fixed!
[18:48:43] Project UploadWizard-api-commons.wikimedia.beta.wmflabs.org build #2401: FIXED in 2 min 42 sec: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/2401/
[18:50:58] greg-g: did you keep trying to delete the jobs after I took the slave offline? Or did they just go away after I disconnected the slave?
[18:51:16] I haven't deleted any since I !log'd
[18:51:40] (I don't believe...)
def not in the last 3 minutes
[18:52:03] kk, the job that wouldn't delete went away when I disconnected the slave
[18:52:25] neat
[18:52:32] !log disconnect/reconnect for deployment-bastion jenkins slave - left over stalled jobs went away
[18:52:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:53:32] Yippee, build fixed!
[18:53:33] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #410: FIXED in 31 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/410/
[18:53:39] thcipriani: ty
[18:59:39] marxarelli: (not time urgent) Why the big spike on 8/11 in failed tests (tests overall, too): https://integration.wikimedia.org/ci/view/BrowserTests/view/-Dashboard/
[19:32:10] greg-g: er, which spike?
[19:49:23] marxarelli: good question :)
[19:49:45] I had my date wrong
[19:50:23] looks like a # of jobs spike on 8-01, with a spike in both green and red, then a spike in failing on 8-03
[19:53:28] it seems to correspond directly to an increase in builds overall
[19:53:48] so perhaps it was a bunch of new tests that were unstable
[19:54:20] Release-Engineering, User-greg: Create #releng-201516-q2 goals - https://phabricator.wikimedia.org/T109115#1540947 (greg) a:greg>None
[19:54:23] Release-Engineering, User-greg: Update Master Project List (MPL) by end of Q1 / before Q2 FY 2015-2016 - https://phabricator.wikimedia.org/T108629#1540948 (greg) a:greg>None
[19:54:23] Release-Engineering, User-greg: Get RelEng team members greater access - https://phabricator.wikimedia.org/T107926#1540949 (greg) a:greg>None
[19:54:25] Release-Engineering, User-greg: Review team ownership of projects/things listed on the Developers/Maintainers page - https://phabricator.wikimedia.org/T106751#1540950 (greg) a:greg>None
[19:54:43] marxarelli: /me nods
[19:55:04] just curious if you had an idea of if that meant "bad
things" or "not really an indication of good or bad things"
[19:55:24] "need more data" i think
[19:55:57] word
[19:56:11] if you aren't worried, we can wait :)
[19:56:13] but never "really bad things" when it comes to browser tests
[19:56:14] :)
[20:05:46] Staging, Patch-For-Review: Create staging-db* (databases) - https://phabricator.wikimedia.org/T91545#1540985 (greg) a:thcipriani>None
[20:29:56] greg-g: all the browser test views should be actual dashboards now, with trend charts, etc.
[20:30:15] neato
[20:30:29] so it should be easier to pinpoint massive failure events to a single project
[20:30:44] * marxarelli wipes the Jenkins off his hands
[20:31:33] good luck, it's pretty sticky
[20:34:33] might have to use the gasoline from our mower
[21:57:33] Release-Engineering, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1541433 (Tgr) I'll just upload the correct files then: {F1496921} {F1496922}
[23:02:32] Browser-Tests, MobileFrontend, Reading-Web: MobileFrontend Selenium tests do not use page object pattern - https://phabricator.wikimedia.org/T65620#1541722 (Jdlrobson)
[23:32:12] Hello, I want to ask something about the beta cluster:
[23:32:46] beta eswiki is a wiki for the contenttranslationproject, but since more than 2 months there was nobody, but there are a lot of spambots, at this wiki
[23:33:42] So, my proposal is, that we put this wiki to read-only mode, till somebody wants to use it again
[23:34:22] Luke081515: go ahead and make a task with that as the proposal, putting it in both #beta-cluster and #contenttranslation-cxserver
[23:34:44] I'd want to make sure the language team isn't still using it/don't want to surprise them
[23:34:50] ok
[23:35:00] thanks for helping :)
[23:35:08] no problem ;)
[23:44:17] Beta-Cluster, ContentTranslation-cxserver: Put beta eswiki to read-only mode -
https://phabricator.wikimedia.org/T109157#1541866 (Luke081515) NEW
[23:44:29] done
[23:45:38] Beta-Cluster, ContentTranslation-cxserver: Put beta eswiki to read-only mode - https://phabricator.wikimedia.org/T109157#1541873 (Luke081515) p:Triage>High
[23:46:57] (real) dewiki is down at the moment, do you know why?
[23:47:09] yeah, unfortunately
[23:47:09] Yup
[23:47:10] Being fixed
[23:47:11] all sites are down
[23:47:16] ok, thanks
[23:53:51] (PS1) Niedzielski: Add Android emulator wrapper [integration/jenkins-job-builder] - https://gerrit.wikimedia.org/r/231722 (https://phabricator.wikimedia.org/T107336)
[23:56:44] Release-Engineering, Continuous-Integration-Config, Mobile-App-Sprint-63-Android-Europium, Patch-For-Review, Wikipedia-Android-App: Create jenkins slave instance dedicated to Android runs - https://phabricator.wikimedia.org/T107336#1541892 (Niedzielski) @thcipriani, we're excited to get our tests...
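
The queue-age warning proposed at 18:37 and filed as T109118 comes down to comparing each queued item's enqueue time against a threshold. This is only a sketch under stated assumptions: `stale_queue_items` and `format_warning` are hypothetical names, the input is assumed to be (job name, enqueue time) pairs already fetched from CI, and fetching them from Zuul or Jenkins and posting to IRC are left out.

```python
from datetime import datetime, timedelta, timezone


def stale_queue_items(queued, now, max_age=timedelta(hours=1)):
    """Return (name, age) pairs for items queued longer than max_age, oldest first.

    `queued` is an iterable of (job_name, enqueued_at) pairs with
    timezone-aware datetimes; this shape is an assumption for illustration.
    """
    stale = [
        (name, now - enqueued_at)
        for name, enqueued_at in queued
        if now - enqueued_at > max_age
    ]
    # Largest age first, so the longest-stuck job leads the warning.
    return sorted(stale, key=lambda item: item[1], reverse=True)


def format_warning(name, age):
    """Build a one-line message suitable for an IRC channel."""
    hours = age.total_seconds() / 3600
    return f"{name} has been queued for {hours:.1f}h"


if __name__ == "__main__":
    # Arbitrary example timestamp; the 20-hour figure echoes the 18:35 message.
    now = datetime(2015, 8, 14, 18, 35, tzinfo=timezone.utc)
    queue = [
        ("beta-mediawiki-config-update-eqiad", now - timedelta(hours=20)),
        ("some-other-job", now - timedelta(minutes=30)),  # hypothetical job
    ]
    for name, age in stale_queue_items(queue, now):
        print(format_warning(name, age))
```

Keeping the threshold check separate from fetching and posting makes it testable without a live CI server, and `max_age` can be tuned if one hour proves too noisy.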