[02:10:53] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<60.00%)
[02:12:17] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:12:18] PROBLEM - App Server bits response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:17:07] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 47862 bytes in 0.555 second response time
[02:17:07] RECOVERY - App Server bits response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 3895 bytes in 0.002 second response time
[03:47:34] Continuous-Integration, Wikipedia-Android-App, Wikipedia-iOS-App: Jenkins should run tests for the Wikipedia app before merge - https://phabricator.wikimedia.org/T62720#1196734 (bearND) p:Lowest>High
[05:36:48] Beta-Cluster: /var/lib/l10nupdate fills up deployment-bastion /var partition - https://phabricator.wikimedia.org/T95564#1196813 (mmodell) Can't we just make that path into a hiera variable?
[06:37:55] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:38:24] Deployment-Systems, Epic: EPIC: The future of MediaWiki deployment: Tooling - https://phabricator.wikimedia.org/T94620#1196846 (mmodell)
[06:38:25] Deployment-Systems, Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1196848 (mmodell)
[06:39:33] Continuous-Integration, Wikipedia-Android-App, Wikipedia-iOS-App: Jenkins should run tests for the Wikipedia app before merge - https://phabricator.wikimedia.org/T62720#1196857 (Legoktm) Are there more details about what needs to be done for this? does the jenkins plugin need to be installed? or can...
[06:45:35] (PS1) Krinkle: Update graphs for property renames in Graphite [integration/docroot] - https://gerrit.wikimedia.org/r/203289 (https://phabricator.wikimedia.org/T90111)
[06:46:13] (CR) Krinkle: [C: 2] Update graphs for property renames in Graphite [integration/docroot] - https://gerrit.wikimedia.org/r/203289 (https://phabricator.wikimedia.org/T90111) (owner: Krinkle)
[06:46:15] (Merged) jenkins-bot: Update graphs for property renames in Graphite [integration/docroot] - https://gerrit.wikimedia.org/r/203289 (https://phabricator.wikimedia.org/T90111) (owner: Krinkle)
[06:58:03] morning Krinkle
[06:58:24] twentyafterfour: did someone just point l10nupdate onto NFS?
[06:58:30] it’s using up way too much network bandwidth...
[06:59:53] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:00:11] Deployment-Systems, Epic: EPIC: The future of MediaWiki deployment: Tooling - https://phabricator.wikimedia.org/T94620#1196873 (mmodell)
[07:01:08] PissedPanda: on deployment-bastion...
[07:01:12] werdna: Hi
[07:01:20] yeah?
[07:01:26] I see gzip processes running
[07:01:35] and doing enough IO to cause an alert :)
[07:01:55] well i suspect that might be related to https://phabricator.wikimedia.org/T95564
[07:02:21] I was under the impression that it used to be on nfs until recently
[07:02:53] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[07:02:59] twentyafterfour: ah, nope
[07:03:04] twentyafterfour: put that in /srv please
[07:03:09] no touching NFS on beta
[07:03:29] PissedPanda: I haven't touched it, other than trying to make it a hiera configurable path
[07:03:34] oh
[07:03:35] hmm
[07:03:53] I don’t see anything in https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep for it
[07:03:55] either
[07:03:55] good evening!
[07:04:27] PissedPanda: I just submitted a patch with a default value on the variable ...
didn't put it in hiera yet
[07:04:34] right
[07:04:51] anyway, gotta sleep now :)
[07:16:53] Beta-Cluster, Citoid: On beta cluster citoid should self update and reload after change is merged - https://phabricator.wikimedia.org/T95652#1196889 (hashar) NEW
[07:24:56] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[07:50:44] Krinkle: thanks for updating the Graphs links on the zuul status page :)
[07:50:52] yw :)
[08:05:48] (PS1) Legoktm: Allow specifying jobs to use to merge queues in DependentPipelineManager [integration/zuul] - https://gerrit.wikimedia.org/r/203290
[08:06:17] hashar: ^
[08:07:34] RECOVERY - Citoid on deployment-sca01 is OK: HTTP OK: HTTP/1.1 200 OK - 865 bytes in 0.030 second response time
[08:14:59] legoktm: ahhhh wonderful :D
[08:15:09] legoktm: gotta deploy the Zuul package on gallium
[08:15:28] might be able to cherry pick your patch and some others next week
[08:16:09] awesome :)
[08:17:06] legoktm: ideally you will want to push it to upstream
[08:17:11] but it is a bit late for you to do so :D
[08:17:28] is this something they would want?
[08:18:25] Beta-Cluster: beta-scap-eqiad no more run due to ssh Permission denied - https://phabricator.wikimedia.org/T95562#1196969 (hashar) Open>Resolved
[08:18:54] legoktm: maybe
[08:19:01] I made a point of upstreaming everything
[08:19:10] if it is covered by tests it is probably fine for them
[08:19:48] ok, I'll do that tomorrow
[08:22:59] off to sleep now, good night :)
[08:41:05] Continuous-Integration: Re-create ci slaves (April 2015) - https://phabricator.wikimedia.org/T94916#1197019 (Krinkle) >>! In T94916#1191787, @hashar wrote: >> (/Stage[main]/Zuul/Package[zuul]/ensure) E: Unable to locate package zuul > > I have switched the instances to the Zuul Debian package via cherry pi...
[08:56:43] (PS18) Hashar: Package python deps with dh-virtualenv [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552)
[08:57:53] !log Fixed puppet failure for missing Zuul package on integration-dev by applying patch-integration-slave-trusty.sh
[08:57:57] (CR) Hashar: "Got rid of dh_override_autoinstall since it was empty" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[08:57:58] Logged the message, Master
[09:02:13] !log integration: Refreshed Zuul packages under /home/hashar
[09:02:16] Logged the message, Master
[09:26:02] (PS1) Hashar: Drop lucene-search-2 [integration/config] - https://gerrit.wikimedia.org/r/203295
[09:28:46] (CR) Hashar: [C: 2] Drop lucene-search-2 [integration/config] - https://gerrit.wikimedia.org/r/203295 (owner: Hashar)
[09:28:50] RECOVERY - Puppet failure on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:32:13] (Merged) jenkins-bot: Drop lucene-search-2 [integration/config] - https://gerrit.wikimedia.org/r/203295 (owner: Hashar)
[09:48:04] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » hy,contintLabsSlave && UbuntuTrusty build #38: ABORTED in 19 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=hy,label=contintLabsSlave%20&&%20UbuntuTrusty/38/
[09:56:17] Continuous-Integration: the first time a Jenkins language screenshots job runs, it checks out an old VisualEditor revision - https://phabricator.wikimedia.org/T95668#1197209 (Amire80)
[09:57:35] Continuous-Integration: the first time a Jenkins language screenshots job runs, it checks out an old VisualEditor revision - https://phabricator.wikimedia.org/T95668#1197203 (Amire80)
[10:07:35] Project
browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » hy,contintLabsSlave && UbuntuTrusty build #39: SUCCESS in 18 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=hy,label=contintLabsSlave%20&&%20UbuntuTrusty/39/
[10:29:51] hasharAway: https://tools.wmflabs.org/nagf/?project=integration#h_integration-puppetmaster_cpu
[10:29:57] cpu.system looks crazy
[10:30:09] looks fine on other instances
[10:31:56] !log Pool integration-slave-precise-1011
[10:31:58] Logged the message, Master
[10:35:16] !log Creating integration-slave-precise-1012...integration-slave-precise-1014
[10:35:18] Logged the message, Master
[10:36:11] PROBLEM - Host integration-slave1405 is DOWN: CRITICAL - Host Unreachable (10.68.16.238)
[10:38:53] PROBLEM - Host integration-slave1404 is DOWN: CRITICAL - Host Unreachable (10.68.17.208)
[10:39:30] !log Deleting the old integration1401...integration1405 instances. They've been depooled for 24h and their replacements are OK. This is to free up quota to create new Precise instances.
[10:39:33] Logged the message, Master
[10:42:07] RECOVERY - Puppet failure on integration-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[10:42:11] PROBLEM - Host integration-slave1401 is DOWN: CRITICAL - Host Unreachable (10.68.17.179)
[10:42:49] PROBLEM - Host integration-slave1402 is DOWN: CRITICAL - Host Unreachable (10.68.17.195)
[10:45:19] PROBLEM - Host integration-slave1403 is DOWN: CRITICAL - Host Unreachable (10.68.17.207)
[10:53:01] (PS1) Zfilipin: Move WB_REPO_PASSWORD environment variable to Jenkins Credentials plugin store [integration/config] - https://gerrit.wikimedia.org/r/203309 (https://phabricator.wikimedia.org/T89343)
[11:00:27] hasharAway, Krinkle: is there a problem with Jenkins?
it is really really slow
[11:00:30] (CR) WMDE-Fisch: [C: 1] Move WB_REPO_PASSWORD environment variable to Jenkins Credentials plugin store [integration/config] - https://gerrit.wikimedia.org/r/203309 (https://phabricator.wikimedia.org/T89343) (owner: Zfilipin)
[11:16:49] zeljkof: Anything in particular? Only two changes being tested right now. Seems fine.
[11:17:15] Krinkle: does not let me create new jobs via the web interface
[11:17:24] testing if it works via JJB
[11:20:02] !log Jenkins unable to re-establish Gearman connection. Full restart.
[11:20:07] Logged the message, Master
[11:21:13] Not sure if that was related..
[11:21:31] It's a new kind of error. https://wikitech.wikimedia.org/wiki/Release_Engineering/Argh
[11:23:33] PROBLEM - Puppet failure on integration-slave-precise-1012 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:25:17] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:25:55] OK. that's a first. After a restart it still won't connect to Gearman
[11:26:25] K, got it
[11:26:32] !log Re-established Gearman connection from Jenkins
[11:26:36] Logged the message, Master
[11:29:39] !log Fixed job "Global-Dev Dashboard Data" to be restricted to node "gallium" because it fails to connect to gp.wmflabs.org from lanthanum 1/2 builds.
[11:29:41] Logged the message, Master
[11:29:46] hasharAway: ^ Is that job still needed? Should be in jjb
[11:29:51] It still runs and seems to work
[11:34:50] PROBLEM - Puppet failure on integration-slave-precise-1014 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:38:07] Krinkle: do you have any idea what that "Global-Dev Dashboard Data" job is ?
[11:38:15] hasharAway: I do not.
[11:38:23] https://github.com/wikimedia/analytics-global-dev-dashboard-data
[11:38:26] It polls that repo
[11:38:29] You created it in 2013
[11:38:34] doh
[11:38:37] :D
[11:39:16] it is only run from time to time
[11:40:06] !log Deleting various Jenkins jobs that can be safely deleted (recently removed from jjb-config). Will report the rest to T91410 for inspection.
[11:40:09] Logged the message, Master
[11:43:38] !log Filed https://phabricator.wikimedia.org/T95675 to migrate "Global-Dev Dashboard Data" to JJB/Zuul
[11:43:40] Logged the message, Master
[11:43:56] Continuous-Integration: Migrate Jenkins job "Global-Dev Dashboard Data" to JJB and Zuul trigger - https://phabricator.wikimedia.org/T95675#1197427 (hashar)
[11:46:22] Krinkle: don't you have a script to run commands on all ci slaves ?
[11:46:28] I am wondering whether we should use our salt master
[11:46:35] hashar: https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup#dsh-ci-slaves
[11:46:55] https://github.com/Krinkle/dotfiles/blob/master/hosts/KrinkleMac/dsh-ci-slaves.list
[11:46:55] https://github.com/Krinkle/dotfiles/blob/master/hosts/KrinkleMac/bin/dsh-ci-slaves
[11:46:58] That's what I usually use
[11:47:12] Update the list though :)
[11:47:23] ah
[11:47:28] so with salt
[11:47:33] we could label nodes
[11:48:17] I am wondering whether I should cohost it with integration-puppetmaster or create another instance
[11:52:01] (CR) Hashar: [C: 1] "Deploy at anytime :) You will also want to change the password!"
[integration/config] - https://gerrit.wikimedia.org/r/203309 (https://phabricator.wikimedia.org/T89343) (owner: Zfilipin)
[12:09:42] bah Zuul stalled
[12:11:03] Aye
[12:37:35] PROBLEM - SSH on deployment-lucid-salt is CRITICAL: Connection refused
[12:38:14] Continuous-Integration, Technical-Debt: Delete old jobs not (or no longer) managed by JJB - https://phabricator.wikimedia.org/T91410#1197507 (Krinkle) Recently, we cleaned up a lot of job declarations in integration-config repository. However, they still existed on gallium. I purged 889 jobs from gallium...
[12:38:38] Continuous-Integration, Technical-Debt: Delete old jobs not (or no longer) managed by JJB - https://phabricator.wikimedia.org/T91410#1197518 (Krinkle)
[12:38:39] Continuous-Integration: Migrate Jenkins job "Global-Dev Dashboard Data" to JJB and Zuul trigger - https://phabricator.wikimedia.org/T95675#1197519 (Krinkle)
[12:48:19] Continuous-Integration: Store Jenkins build output outside Jenkins (e.g. static storage) - https://phabricator.wikimedia.org/T53447#1197563 (Krinkle) p:Normal>Low
[12:49:43] Continuous-Integration: Store Jenkins build output outside Jenkins (e.g. static storage) - https://phabricator.wikimedia.org/T53447#1197569 (Krinkle) OpenStack does do this, but they're working on phasing out Jenkins entirely in favour of letting Zuul communicate with Gearman directly. It may be interesting...
[12:50:58] Continuous-Integration: Zuul: python git assert error assert len(fetch_info_lines) == len(fetch_head_info) - https://phabricator.wikimedia.org/T61991#1197576 (Krinkle)
[12:56:23] RECOVERY - Puppet failure on integration-slave-trusty-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:59:00] Continuous-Integration, Performance: Add a voting YSlow job to Jenkins - https://phabricator.wikimedia.org/T59137#1197604 (Krinkle) Open>declined a:Krinkle @ori has been working on various benchmark tools.
If he or other teams have a specific need and proposal for jobs that should run after each...
[13:00:32] Continuous-Integration: Zuul: Job name regex is a bit confusing - https://phabricator.wikimedia.org/T49847#496222 (Krinkle)
[13:00:44] Continuous-Integration: Zuul layout job name regex is a bit confusing - https://phabricator.wikimedia.org/T49847#1197617 (Krinkle)
[13:01:25] Continuous-Integration: Zuul layout job name regex is a bit confusing - https://phabricator.wikimedia.org/T49847#496222 (Krinkle) Open>Resolved a:Krinkle C. Scott's explanation makes perfect sense. I'm not sure there's anything left to do here. Feel free to re-open.
[13:04:02] Continuous-Integration: JJB: generates duplicates jobs with different parameters - https://phabricator.wikimedia.org/T53461#1197626 (Krinkle) Open>Resolved a:Krinkle This appears to be resolved. Feel free to re-open.
[13:13:04] Continuous-Integration, Puppet: Puppet run interrupted by "puppet-agent: Caught TERM; calling stop" - https://phabricator.wikimedia.org/T95683#1197667 (Krinkle) NEW
[13:29:03] Continuous-Integration, Technical-Debt: Delete old jobs not (or no longer) managed by JJB - https://phabricator.wikimedia.org/T91410#1197710 (JanZerebecki) > ```operations-puppet-catalog-compiler``` That is used to manually simulate changes to operations/puppet.git so it needs to be migrated to JJB.
[13:30:16] RECOVERY - Puppet failure on integration-slave-precise-1013 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:38:35] RECOVERY - Puppet failure on integration-slave-precise-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:39:49] RECOVERY - Puppet failure on integration-slave-precise-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:50:15] !log Pool integration-slave-precise-1012..integration-slave-precise-1014
[13:50:19] Logged the message, Master
[13:52:31] Continuous-Integration: Re-create ci slaves (April 2015) - https://phabricator.wikimedia.org/T94916#1197784 (Krinkle)
[13:53:21] Continuous-Integration: Re-create ci slaves (April 2015) - https://phabricator.wikimedia.org/T94916#1175877 (Krinkle) The new integration-slave-precise-10xx instances have been successfully provisioned and are now pooled.
[13:53:28] Continuous-Integration: Re-create ci slaves (April 2015) - https://phabricator.wikimedia.org/T94916#1197786 (Krinkle) Open>Resolved
[13:54:17] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » ka,contintLabsSlave && UbuntuTrusty build #40: SUCCESS in 2 hr 3 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=ka,label=contintLabsSlave%20&&%20UbuntuTrusty/40/
[14:08:08] (PS1) Hashar: tests: factor out fake classes in a module [integration/config] - https://gerrit.wikimedia.org/r/203336
[14:09:40] Yippee, build fixed!
[14:09:40] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #584: FIXED in 38 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/584/
[14:18:49] Release-Engineering, Wikimedia-Git-or-Gerrit, Documentation: Document how to tag extensions in git - https://phabricator.wikimedia.org/T94412#1197907 (demon) Sure. Add it to mediawiki/* ACL and it'll inherit.
Should be already...wonder why it's not :)
[14:20:11] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » bn,contintLabsSlave && UbuntuTrusty build #41: SUCCESS in 20 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=bn,label=contintLabsSlave%20&&%20UbuntuTrusty/41/
[14:39:35] Yippee, build fixed!
[14:39:35] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » gu,contintLabsSlave && UbuntuTrusty build #41: FIXED in 40 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=gu,label=contintLabsSlave%20&&%20UbuntuTrusty/41/
[15:14:00] Deployment-Systems, Epic, releng-201415-Q4: EPIC: The future of MediaWiki deployment: Tooling - https://phabricator.wikimedia.org/T94620#1198087 (greg)
[15:14:32] Release-Engineering, Wikimedia-Git-or-Gerrit: mediawiki/vendor repo has no release branches like REL1_25 - https://phabricator.wikimedia.org/T95704#1198088 (matmarex) NEW
[15:14:59] Continuous-Integration, operations: Upload jenkins-debian-glue for Jessie on apt.wikimedia.org - https://phabricator.wikimedia.org/T95006#1198095 (hashar) a:hashar>None
[15:15:40] Continuous-Integration, operations: Upload jenkins-debian-glue for Jessie on apt.wikimedia.org - https://phabricator.wikimedia.org/T95006#1177778 (hashar) Packages for Jessie are available at http://people.wikimedia.org/~hashar/debs/jenkins-debian-glue/ . The task is now pending upload by #operations .
[15:16:35] (PS1) Hashar: (WIP) debian-glue job for Zuul (WIP) [integration/config] - https://gerrit.wikimedia.org/r/203347
[15:26:35] greg-g: "Super special morning pre-SWAT SWAT"?
[15:27:34] needs something about fridays
[15:28:04] oh, that was an actual thing on monday. okay.
[15:28:07] Continuous-Integration: job creation permission on jenkins for WMDE-Fisch - https://phabricator.wikimedia.org/T95546#1198135 (JanZerebecki) We have process to maintain the ldap group nda. IMHO juristic person centric groups like wmde and wmf should not be used for permissions.
[15:28:15] Yeah. But is it a recurring one?
[15:28:18] James_F: eh?
[15:28:33] oh, uhhhh
[15:28:36] greg-g: In https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=0&oldid=153121 you duped this week for next.
[15:28:39] greg-g: … yeah.
[15:28:49] I was totally paying attention
[15:28:54] :-D
[15:28:57] * greg-g deletes
[15:29:16] Robla interrupted me with mini postcards from the town I grew up in (Hannibal, MO) from the 1950s-ish
[15:30:41] Aww.
[15:31:15] Release-Engineering, Wikimedia-Git-or-Gerrit, MW-1.25-release: mediawiki/vendor repo has no release branches like REL1_25 - https://phabricator.wikimedia.org/T95704#1198141 (greg)
[15:31:44] the one of downtown: "Yep, that grain elevator is gone, and that bridge, too."
[15:31:48] Release-Engineering, Wikimedia-Git-or-Gerrit, MW-1.25-release: mediawiki/vendor repo has no release branches like REL1_25 - https://phabricator.wikimedia.org/T95704#1198144 (Jdforrester-WMF) Does it need them? Is there anyone who'll use it?
[15:34:07] Release-Engineering, Wikimedia-Git-or-Gerrit, MW-1.25-release: mediawiki/vendor repo has no release branches like REL1_25 - https://phabricator.wikimedia.org/T95704#1198145 (matmarex) Possibly; I filed this because of a question on IRC (`#mediawiki`). It's not unreasonable to want to run a MediaWiki de...
[15:38:09] Continuous-Integration-Isolation, Ops-Access-Requests, operations: Grant hashar root access on to be installed labnodepool1001 - https://phabricator.wikimedia.org/T95303#1198163 (Andrew) hashar, I will raise this issue during the Ops meeting on Monday.
[15:43:31] Release-Engineering, MediaWiki-Debug-Logging, Security-Team, operations, Patch-For-Review: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1198164 (Andrew) @fgiunchedi why is daily rotation not practical for unsampled logs? Too big?
[15:49:17] Release-Engineering, MediaWiki-Debug-Logging, Security-Team, operations, Patch-For-Review: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1198174 (fgiunchedi) @andrew yes, difficult to grep and compress and trim if needed
[16:09:12] Continuous-Integration, Patch-For-Review: Migrate all debian-glue jobs to Jessie slaves - https://phabricator.wikimedia.org/T95545#1198251 (Andrew)
[16:09:13] Continuous-Integration: Create CI slaves using Debian Jessie (tracking) - https://phabricator.wikimedia.org/T94836#1198252 (Andrew)
[16:49:51] Browser-Tests, Mobile-Web, MobileFrontend: add metadata to ChunkyPNG image - https://phabricator.wikimedia.org/T67274#1198354 (Jdlrobson) This was needed back in the days we supported photo uploads. When uploading a photo without meta data an alert would show reminding the user that they should be upl...
[16:51:16] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Mobile-Web, MobileFrontend, Multimedia: add metadata to ChunkyPNG image - https://phabricator.wikimedia.org/T67274#1198365 (greg) MediaViewer folks: Is this something you want?
[16:53:12] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Mobile-Web, MobileFrontend, Multimedia: Should be possible in browser tests to use images with meta data or without meta data - https://phabricator.wikimedia.org/T67274#1198370 (Jdlrobson)
[17:02:42] Continuous-Integration, Patch-For-Review: Migrate all debian-glue jobs to Jessie slaves - https://phabricator.wikimedia.org/T95545#1198401 (hashar) I have applied labels [[ https://integration.wikimedia.org/ci/label/DebianJessie/ | DebianJessie ]] and [[ https://integration.wikimedia.org/ci/label/DebianGl...
[17:03:40] (CR) Hashar: [C: -2] "Have to apply the new function to *debian-glue* jobs. Probably want to migrate all of them to the new job template in one go." [integration/config] - https://gerrit.wikimedia.org/r/203347 (owner: Hashar)
[17:16:26] Release-Engineering, Wikimedia-Git-or-Gerrit, MW-1.25-release: mediawiki/vendor repo has no release branches like REL1_25 - https://phabricator.wikimedia.org/T95704#1198438 (Legoktm) Yeah, we advertise it as a way to install MediaWiki purely through git without composer, so it would be a good idea if t...
[17:17:17] Release-Engineering, Wikimedia-Git-or-Gerrit, MW-1.25-release: mediawiki/vendor repo has no release branches like REL1_25 - https://phabricator.wikimedia.org/T95704#1198440 (demon) Sounds good. I'll add it to my todo list for things to fix in make-extension-branches.
[17:33:57] Release-Engineering, Mobile-Web: mwext-MobileFrontend-qunit-mobile issues again - https://phabricator.wikimedia.org/T95430#1198535 (phuedx)
[17:41:58] Continuous-Integration, Release-Engineering, Mobile-Web: mwext-MobileFrontend-qunit-mobile issues again - https://phabricator.wikimedia.org/T95430#1198592 (greg)
[18:05:25] Continuous-Integration, Collaboration-Team, Flow, Mobile-Web, and 2 others: Create Jenkins builds for Editing across repositories (MobileFrontend, VisualEditor etc) - https://phabricator.wikimedia.org/T90647#1198773 (Jdlrobson)
[18:15:41] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Mobile-Web, MobileFrontend, Multimedia: Should be possible in browser tests to use images with meta data or without meta data - https://phabricator.wikimedia.org/T67274#1198908 (Tgr) It's not clear to me what this task is about. As for Chunk...
[18:43:33] Deployment-Systems, Release-Engineering, Services, operations: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1199116 (GWicke)
[18:47:42] Deployment-Systems, Release-Engineering, Services, operations: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1199130 (GWicke)
[18:48:26] Release-Engineering, Multimedia, Parsoid-Team, Services, and 2 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1199132 (Qgil)
[18:49:04] Deployment-Systems, Release-Engineering, Services, operations: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1136887 (GWicke)
[18:50:31] Release-Engineering, MediaWiki-API-Team, MediaWiki-Debug-Logging, Wikimedia-Logstash, and 2 others: Log php fatals with full backtraces again (fatal.log on fluorine) - https://phabricator.wikimedia.org/T89169#1199157 (bd808) >>!
In T89169#1189640, @Legoktm wrote: > Progress! We now have logs that l...
[19:08:23] Release-Engineering, Wikimedia-Git-or-Gerrit, Documentation, Patch-For-Review: Document how to tag extensions in git - https://phabricator.wikimedia.org/T94412#1199228 (MarkAHershberger) oops, sorry for the wrong bug number and the noise.
[19:08:49] andre__: 'stalled' is supposed to mean 'waiting for specific thing X', right? I have the feeling many people interpret it as 'the progress has stalled' instead.
[19:08:56] (in phabricator)
[19:14:59] valhallasw`cloud: it comes from RT so has no specific meaning I guess apart from 'something is blocking this so this is stalled in progress' (as I've seen it used in RT)
[19:16:18] JohnFLewis: right, that's how I interpret it as well. However, I feel the interpretation 'progress on this bug has stalled' (i.e. 'no-one has picked this up') is also valid, and it's causing people to move stuff from 'open' to 'stalled' because of that
[19:16:57] valhallasw`cloud: tbh, stalled is a bad status and should be phased out from its original team and axed
[19:17:05] sounds good
[19:18:01] valhallasw`cloud, https://www.mediawiki.org/wiki/Bug_management/Bug_report_life_cycle
[19:18:02] If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on, the Stalled status is temporarily given.
[19:18:32] hm, maybe we should also merge Phabricator/ and Bug_management/
[19:20:35] andre__: I say we axe it personally but not up to me :)
[19:21:59] JohnFLewis, oh, and regarding your question from a few years ago: Yes, users could change their email address in Bugzilla (tied to their user account which has an internal unique ID)
[19:22:24] I'll open a task with some notes on it.
[19:22:48] andre__: now that makes things slightly more annoying regarding Chris' acceptance of the bugzilla dump process :/
[19:23:00] JohnFLewis: what's your phab username?
[19:23:16] meh
[19:23:23] valhallasw`cloud: JohnLewis :)
[19:29:18] hum, #phabricator is no longer pushed here of course. https://phabricator.wikimedia.org/T95746#1199301
[19:31:23] Release-Engineering, Wikimedia-Git-or-Gerrit, Documentation, Patch-For-Review: Document how to tag extensions in git - https://phabricator.wikimedia.org/T94412#1199328 (Kghbln) I believe that it was possible to tag extensions in the past. Think one had to be the repo owner, but I am not absolutely s...
[19:32:27] valhallasw`cloud: think this should be in -devtools not -releng
[19:33:29] JohnFLewis: yeah, it is, but -devtools has been pretty silent since phabricator was live and running :-)
[19:33:33] it's also in -dev
[19:33:51] valhallasw`cloud: well make it noisy :)
[19:33:56] :"D
[19:34:13] I've tried to cut back a bit on the number of channels, as everyone is everywhere anyway
[19:34:33] Release-Engineering, Wikimedia-Git-or-Gerrit, Documentation, Patch-For-Review: Document how to tag extensions in git - https://phabricator.wikimedia.org/T94412#1199333 (demon) >>! In T94412#1199328, @Kghbln wrote: > I believe that it was possible to tag extensions in the past. Think one had to be th...
[19:46:13] Release-Engineering, Wikimedia-Git-or-Gerrit, MW-1.25-release: mediawiki/vendor repo has no release branches like REL1_25 - https://phabricator.wikimedia.org/T95704#1199415 (demon) Open>Resolved a:demon Created REL1_25 branch from b208abfd6ee56f128c0e0179b2cafcedf6fa5033.
[19:52:34] Release-Engineering, Wikimedia-Git-or-Gerrit, Documentation, Patch-For-Review: Document how to tag extensions in git - https://phabricator.wikimedia.org/T94412#1199466 (Mglaser) Just tried again. It works in `mediawiki/packages/WPI`. However, I tried to push tags in `mediawiki/extensions/BlueSpiceFo...
[20:03:59] Release-Engineering, Wikimedia-Git-or-Gerrit, Documentation, Patch-For-Review: Document how to tag extensions in git - https://phabricator.wikimedia.org/T94412#1199544 (demon) https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/BlueSpiceExtensions,access https://gerrit.wikimedia.org...
[20:05:46] Continuous-Integration-Isolation, Ops-Access-Requests, operations: Grant hashar root access on to be installed labnodepool1001 - https://phabricator.wikimedia.org/T95303#1199550 (hashar) We spoke about it during our weekly Friday checkin. Agreed root access would be a convenience to bootstrap the servi...
[20:13:11] Release-Engineering, Wikimedia-Git-or-Gerrit, Documentation, Patch-For-Review: Document how to tag extensions in git - https://phabricator.wikimedia.org/T94412#1199566 (Mglaser) OK, I remove the rules in BlueSpiceFoundation (so I should inherit everything). Still no success. How long does it take fo...
[20:24:52] Project beta-scap-eqiad build #48507: FAILURE in 34 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/48507/
[20:24:58] Continuous-Integration: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#1199594 (hashar) NEW
[20:37:42] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1199641 (RobH)
[20:39:15] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1179075 (RobH)
[20:39:42] Continuous-Integration, Continuous-Integration-Isolation, hardware-requests, operations: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1199645 (hashar) NEW
[20:43:28] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1199668 (RobH)
[20:47:28] Continuous-Integration: job creation permission on jenkins for WMDE-Fisch
- https://phabricator.wikimedia.org/T95546#1199670 (JanZerebecki) Btw. see https://wikitech.wikimedia.org/wiki/Volunteer_NDA for the process, ignore that it says in review, it can be followed, if you want to. What I neglected to quest... [21:05:18] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:18] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:19] PROBLEM - App Server bits response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:25] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:26] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:26] PROBLEM - App Server bits response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:26] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:26] PROBLEM - App Server bits response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:05:29] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 28677 bytes in 2.801 second response time [21:05:29] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48046 bytes in 1.476 second response time [21:05:29] RECOVERY - App Server bits response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 3895 bytes in 0.002 second response time [21:05:29] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1199682 (RobH) [21:05:29] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 -
https://phabricator.wikimedia.org/T95045#1179075 (RobH) OS is installed, but attempting to sign keys afterwards has lead to an issue. I cannot ssh or ping labnodepool1001.eqiad.wmnet from palladium (puppetmaster). I can do so... [21:05:33] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1199688 (RobH) bastion, carbon, gallium... hosts in public IP vlans can ping the host, but nothing in the private vlans... [21:05:42] Release-Engineering, Wikimedia-Git-or-Gerrit, Documentation, Patch-For-Review: Document how to tag extensions in git - https://phabricator.wikimedia.org/T94412#1199699 (Mglaser) With the help of @cicalese, I found the reason it didn't work: you have to use the ssh url instead of the https one. So wh... [21:08:09] RECOVERY - App Server bits response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 3895 bytes in 0.002 second response time [21:08:10] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 47856 bytes in 5.518 second response time [21:08:24] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 47872 bytes in 0.706 second response time [21:08:34] RECOVERY - App Server bits response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 3895 bytes in 0.006 second response time [21:09:08] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:12:36] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1199739 (RobH) chatted with andrew, this is a known thing, and iron can ssh in.
resuming installation [21:13:00] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 48274 bytes in 2.414 second response time [21:14:41] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:14:50] what's going on? [21:15:35] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1199756 (RobH) a:RobH>None [21:15:41] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1179075 (RobH) puppet/salt accepted, system ready for service implementation. [21:18:07] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1199769 (hashar) Open>stalled Thank you very much @RobH ! Service implementation is pending gaining access to it via T95303 that will be discussed Monday during the Ops meeting.
[21:18:15] huh [21:18:45] some massive load on labs there for a second [21:18:45] thcipriani: see -labs, we're good [21:19:22] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 47862 bytes in 0.563 second response time [21:20:08] Project beta-update-databases-eqiad build #8825: FAILURE in 8.3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/8825/ [21:20:53] Catchable fatal error: Object of class Closure could not be converted to string in /mnt/srv/mediawiki-staging/wmf-config/Wikibase.php on line 184 [21:21:07] from that beta-update-db-eqiad failure [21:34:08] RECOVERY - Puppet failure on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:19] Continuous-Integration: job creation permission on jenkins for WMDE-Fisch - https://phabricator.wikimedia.org/T95546#1199914 (hashar) Originally we only had the `wmf` group which is only for employee / contractors of the wmf. Then LDAP groups `nda` and `wmde` were introduced and in Jenkins they grant the abi... [22:15:17] Continuous-Integration, Collaboration-Team, Flow, Mobile-Web, and 2 others: Create Jenkins builds for Editing across repositories (MobileFrontend, VisualEditor etc) - https://phabricator.wikimedia.org/T90647#1199920 (hashar) We have some utilities to do so. Namely Zuul cloner which let you clone mu...
[22:15:28] Continuous-Integration, Continuous-Integration-Isolation, Epic, releng-201415-Q3, releng-201415-Q4: [Quarterly Success Metric] Jenkins: Run jobs in disposable VMs - https://phabricator.wikimedia.org/T47499#514898 (hashar) [22:16:13] Continuous-Integration, Continuous-Integration-Isolation, Epic, releng-201415-Q3, releng-201415-Q4: [Quarterly Success Metric] Jenkins: Run jobs in disposable VMs - https://phabricator.wikimedia.org/T47499#514898 (hashar) [22:16:15] Continuous-Integration, Collaboration-Team, Flow, Mobile-Web, and 2 others: Create Jenkins builds for Editing across repositories (MobileFrontend, VisualEditor etc) - https://phabricator.wikimedia.org/T90647#1199926 (hashar) [22:22:39] Continuous-Integration: Migrate all jobs to labs slaves - https://phabricator.wikimedia.org/T86659#1199941 (hashar) [22:28:57] Continuous-Integration, Continuous-Integration-Isolation, Labs, Wikimedia-Labs-Infrastructure: OpenStack API account to control `contintcloud` labs project - https://phabricator.wikimedia.org/T86170#1199958 (hashar) Adding @chasemp . We talked about nodepool user/credentials today. The task descri... [22:30:43] Continuous-Integration-Isolation, Scrum-of-Scrums, operations, Blocked-on-Operations: Review Jenkins isolation architecture with Antoine - https://phabricator.wikimedia.org/T92324#1199963 (hashar) We had three meeting already with @chasemp @andrew and @hashar . We are exchanging on a weekly basis.