[00:06:37] 06Release-Engineering-Team, 06Labs, 10wikitech.wikimedia.org: Replace SemanticForms with PageForms - https://phabricator.wikimedia.org/T149866#2766962 (10Reedy) [00:07:53] 06Release-Engineering-Team, 06Labs, 10wikitech.wikimedia.org: Replace SemanticForms with PageForms - https://phabricator.wikimedia.org/T149866#2766981 (10Paladox) [01:03:23] 10Beta-Cluster-Infrastructure, 10Flow, 06Collaboration-Team-Triage (Collab-Team-Q2-Oct-Dec-2016), 07Technical-Debt: Use dedicated $wgFlowCluster and $wgFlowDefaultWikiDb on Beta Cluster - https://phabricator.wikimedia.org/T147523#2767088 (10jmatazzoni) @Catrope, you asked me to remind you to talk to @SBiss... [01:39:22] looks like Zuul is stuck? [01:47:01] greg-g ^^ [01:47:07] https://integration.wikimedia.org/zuul/ [01:47:13] 1hr 17 mins [01:47:36] thcipriani ^^ [01:50:39] is that https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Gearman_deadlock ? [01:51:23] mediawiki-extensions-qunit-jessie for CirrusSearch only just started [01:52:22] Krenair it could be. [01:52:49] that's why I asked paladox [01:53:05] Ok [01:53:16] !log disabling and re-enabling gearman, zuul is not working and could be gearman deadlock [01:53:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [01:53:59] !log followed instructions at https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Gearman_deadlock [01:54:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [01:54:11] well that fixed it :) [01:54:43] did it? [01:54:53] Yep [01:54:55] nice [01:55:31] huh, I was right?
[01:56:08] Krenair yeh [01:58:26] Krenair: yes, thanks [02:01:33] postmerge and publish pipeline seem to be frozen [02:01:34] on zuul [02:01:42] only the test pipeline is being processed [04:01:01] 10Deployment-Systems, 10scap: Considering adding a --no-touch flag to scap that stops automatic touch of InitialiseSettings.php - https://phabricator.wikimedia.org/T149872#2767271 (10Krenair) [04:01:22] 10Deployment-Systems, 10scap: Considering adding a --no-touch flag to scap that stops automatic touch of InitialiseSettings.php - https://phabricator.wikimedia.org/T149872#2767283 (10Krenair) Krenair: yes -- https://github.com/wikimedia/scap/blob/master/scap/tasks.py#L384-L390 happens at the en... [04:34:53] 03Scap3, 10scap: sync-dir labs config change cached wrong version of InitialSettings.php - https://phabricator.wikimedia.org/T149618#2758066 (10bd808) Using sync-dir to deploy a config change that includes adding new wmg variables is a bad idea. We've been bitten by it over and over. I'm not sure that we have... [04:35:48] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [04:37:46] 10Deployment-Systems, 10scap: Considering adding a --no-touch flag to scap that stops automatic touch of InitialiseSettings.php - https://phabricator.wikimedia.org/T149872#2767339 (10bd808) @Krenair and I realized that the nightly `l10nupdate` job does the touch of InitialiseSettings.php too via `scap sync-l10... 
[04:37:48] RECOVERY - Puppet staleness on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [3600.0] [04:38:30] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [05:12:46] 10Deployment-Systems, 10scap: Considering adding a --no-touch flag to scap that stops automatic touch of InitialiseSettings.php - https://phabricator.wikimedia.org/T149872#2767347 (10bd808) [05:13:28] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0] [05:14:55] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [05:15:51] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [05:16:49] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [05:17:05] PROBLEM - Puppet run on deployment-ms-be01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [05:17:27] PROBLEM - Puppet run on deployment-urldownloader is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [05:18:14] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [05:18:16] PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [05:18:38] PROBLEM - Puppet run on deployment-ms-be02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [05:19:06] PROBLEM - Puppet run on deployment-ms-fe01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [05:19:18] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [05:19:30] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] 
[05:19:30] PROBLEM - Puppet run on deployment-stream is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [05:20:06] PROBLEM - Puppet run on deployment-mx is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [05:20:28] PROBLEM - Puppet run on deployment-zookeeper01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [05:20:35] PROBLEM - Puppet run on deployment-pdf01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [05:21:43] PROBLEM - Puppet run on deployment-redis02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [05:22:02] that's me [05:22:21] PROBLEM - Puppet run on deployment-tmh01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [05:22:21] PROBLEM - Puppet run on deployment-elastic05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [05:22:33] PROBLEM - Puppet run on deployment-elastic06 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [05:25:52] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [05:26:48] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [05:31:37] PROBLEM - Puppet run on deployment-db03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [05:36:01] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [05:41:16] PROBLEM - Puppet run on deployment-db04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [05:46:37] RECOVERY - Puppet run on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [05:47:11] most of these are some trusty-wide bug from what I can tell [05:47:31] some of them are recovering as I fix due to the stupid puppet.conf-duplicating bug [05:48:34] PROBLEM - Puppet run on deployment-parsoid09 is CRITICAL: CRITICAL: 50.00% of 
data above the critical threshold [0.0] [05:56:17] RECOVERY - Puppet run on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [06:05:59] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0] [07:22:39] !log fiddled with jenkins jobs in mediawiki-core-doxygen-publish to try to get stuff moving in the postmerge queue again [07:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [07:55:57] 10Gerrit: Gerrit Internal Server Error when trying to cherry-pick patch from master to master - https://phabricator.wikimedia.org/T149878#2767460 (10Nikerabbit) [08:26:39] 10Gerrit, 07Upstream: Gerrit Internal Server Error when trying to cherry-pick patch from master to master - https://phabricator.wikimedia.org/T149878#2767565 (10Paladox) [08:26:54] 10Gerrit, 07Upstream: Gerrit Internal Server Error when trying to cherry-pick patch from master to master - https://phabricator.wikimedia.org/T149878#2767460 (10Paladox) I have filled this upstream at https://bugs.chromium.org/p/gerrit/issues/detail?id=4871 [08:27:40] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2767568 (10Aklapper) >>! In T149609#2766021, @Zppix wrote: > @hashar As you may already know we are actually doing not that we ar... 
[09:23:09] 06Release-Engineering-Team (Deployment-Blockers), 05Release: [Bug] TimestampException from line 213 of /srv/mediawiki/php-master/includes/libs/time/ConvertibleTimestamp.php - https://phabricator.wikimedia.org/T149882#2767631 (10Arseny1992) [09:30:06] 06Release-Engineering-Team (Deployment-Blockers), 10MediaWiki-Database, 05Release: [Bug] TimestampException from line 213 of /srv/mediawiki/php-master/includes/libs/time/ConvertibleTimestamp.php - https://phabricator.wikimedia.org/T149882#2767664 (10Arseny1992) [09:51:33] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:51:36] (03PS3) 10Hashar: mw-tools-codesniffer-mwcore-testrun to Jessie Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/291501 (owner: 10Paladox) [09:52:17] (03CR) 10Hashar: "recheck" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/318880 (https://phabricator.wikimedia.org/T149544) (owner: 10Samwilson) [09:54:22] (03CR) 10Hashar: [C: 032] "The job is non voting. I have made this change to switch directly to Jessie and that seems to run just fine." 
[integration/config] - 10https://gerrit.wikimedia.org/r/291501 (owner: 10Paladox) [09:54:45] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: / 1633 MB (3% inode=94%) [09:55:27] (03Merged) 10jenkins-bot: mw-tools-codesniffer-mwcore-testrun to Jessie Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/291501 (owner: 10Paladox) [10:06:45] RECOVERY - Disk space on contint1001 is OK: DISK OK [10:28:00] (03CR) 10Hashar: [C: 031] Avoid endless recursion when environments.yml is missing [selenium] - 10https://gerrit.wikimedia.org/r/318305 (https://phabricator.wikimedia.org/T149311) (owner: 10Tobias Gritschacher) [10:50:59] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2767871 (10Paladox) @hashar ok it now can automatically restart the ssh connection if it detects that it was dropped. I.E. if ger... [11:29:52] !log deployment-apertium01 fails puppet due to wrong certificate bah [11:29:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [11:32:24] !log deployment-apertium01 manually cleared puppet.conf [11:32:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [11:36:12] Dang you guys are working your asses off early today [11:36:45] Zppix|mobile: it is almost 1pm here [11:36:46] !!
[11:37:03] Its only 6 am [12:02:23] Are you guys ok if i run a quick test on prod ver of grrrit-wm it will take few seconds (and shouldnt leave irc in theory) [12:37:02] (03PS1) 10Hashar: Announce CI maintenance [integration/docroot] - 10https://gerrit.wikimedia.org/r/319557 (https://phabricator.wikimedia.org/T95757) [12:37:48] (03CR) 10Hashar: [C: 032] Announce CI maintenance [integration/docroot] - 10https://gerrit.wikimedia.org/r/319557 (https://phabricator.wikimedia.org/T95757) (owner: 10Hashar) [12:38:26] (03Merged) 10jenkins-bot: Announce CI maintenance [integration/docroot] - 10https://gerrit.wikimedia.org/r/319557 (https://phabricator.wikimedia.org/T95757) (owner: 10Hashar) [13:09:10] 10Continuous-Integration-Config, 10MediaWiki-Releasing, 06Release-Engineering-Team, 13Patch-For-Review: Prepare CI for MediaWiki REL1_28 - https://phabricator.wikimedia.org/T148987#2768157 (10hashar) 05Open>03Resolved [13:17:04] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2768207 (10Aklapper) (Please consider proofreading comments before adding them. I have no idea what "Soniasuing" is.) [13:20:24] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2768213 (10Paladox) Oh sorry [13:23:20] 06Release-Engineering-Team, 13Patch-For-Review, 05WMF-deploy-2016-11-01_(1.29.0-wmf.1): Remove .gitreview from MediaWiki and Extensions - https://phabricator.wikimedia.org/T146293#2740747 (10Jonas) Not sure if this is related but with cognate extension i get this error when downloading a patch: ``` /extensi... 
[14:24:24] (03PS1) 10Hashar: puppet doc: wipe .git from source [integration/config] - 10https://gerrit.wikimedia.org/r/319573 [14:33:49] (03PS1) 10Hashar: Remove nightly dirs from .gitignore [integration/docroot] - 10https://gerrit.wikimedia.org/r/319574 [14:33:58] (03CR) 10Hashar: [C: 032] puppet doc: wipe .git from source [integration/config] - 10https://gerrit.wikimedia.org/r/319573 (owner: 10Hashar) [14:34:11] (03CR) 10Hashar: [C: 032] Remove nightly dirs from .gitignore [integration/docroot] - 10https://gerrit.wikimedia.org/r/319574 (owner: 10Hashar) [14:34:47] (03Merged) 10jenkins-bot: Remove nightly dirs from .gitignore [integration/docroot] - 10https://gerrit.wikimedia.org/r/319574 (owner: 10Hashar) [14:35:09] (03Merged) 10jenkins-bot: puppet doc: wipe .git from source [integration/config] - 10https://gerrit.wikimedia.org/r/319573 (owner: 10Hashar) [14:37:11] (03PS1) 10Hashar: Remove nightly directories for mobile apps [integration/docroot] - 10https://gerrit.wikimedia.org/r/319575 [14:37:19] (03CR) 10Hashar: [C: 032] Remove nightly directories for mobile apps [integration/docroot] - 10https://gerrit.wikimedia.org/r/319575 (owner: 10Hashar) [14:37:41] (03Merged) 10jenkins-bot: Remove nightly directories for mobile apps [integration/docroot] - 10https://gerrit.wikimedia.org/r/319575 (owner: 10Hashar) [14:45:22] hey yall, I added an extension change to today's sf morning swat at https://wikitech.wikimedia.org/wiki/Deployments#Week_of_October_31st [14:45:28] is there anything else I need to do to prep? 
[14:45:50] also, i'd love to test this in beta now, but i'm not sure how/when it would get deployed there [14:46:35] PROBLEM - Puppet run on zuul-dev-jessie is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:52:54] !log deploying ores 0caa589 in deployment-sca03 [14:52:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:53:12] PROBLEM - Host deployment-pdf02 is DOWN: CRITICAL - Host Unreachable (10.68.16.129) [14:54:30] PROBLEM - Host deployment-conftool is DOWN: CRITICAL - Host Unreachable (10.68.20.30) [14:56:30] ottomata: your change is already deployed in beta, after it merges to master, part of the post-merge will deploy it to beta. (beta-code-update and beta-scap-eqiad should do it) [14:57:07] thcipriani: its an extension though [14:57:11] do I need to update the submodule in core? [14:57:26] not for most extensions (this one included) [14:58:18] if you check out deployment-tin:/srv/mediawiki-staging/php-master/extensions/EventBus you can see your change there [14:59:00] ok cool [14:59:03] awesome thanks thcipriani [14:59:06] so [14:59:11] for SWAT you should cherry-pick to whatever branches you need to include this update on. wmf/1.29.0-wmf.1 is on everything except for wikipedias right now. This should roll to them this afternoon. [14:59:16] https://tools.wmflabs.org/versions/ [14:59:17] 11am sf time, i should be prepared [14:59:29] oh [15:00:00] so if you want it everywhere, just make the two cherry-picks and add them to the SWAT calendar. SWAT deployer will merge. [15:00:25] (two cherry-picks == one to wmf/1.28.0-wmf.23, one to wmf/1.29.0-wmf.1) [15:00:41] 05Gitblit-Deprecate, 10Diffusion: Update all on-wiki references to git.wikimedia.org and replace them with the Phabricator equivalent - https://phabricator.wikimedia.org/T137353#2768519 (10Aklapper) After editing dozens of files, looking at https://www.mediawiki.org/w/index.php?title=Special:Search&limit=500&o...
[15:02:51] thcipriani: , so those are tags, can I cherry pick to them? [15:03:05] I do that locally, and then git-review the cherry picks? [15:04:00] you may be able to do this in the gerrit interface via the cherry-pick button [15:04:53] thcipriani: on mw core? [15:05:15] nope, just on the EventBus extension itself. [15:05:17] OH [15:05:26] submodule bumps for core are handled by gerrit automagically [15:05:26] !log deploy 0caa589 in ores to deployment-sca03 [15:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:06:39] thcipriani: why cherry pick and not merge? [15:07:19] as soon as you merge, it'll bump the submodule on mw-core wmf/1.2x.0-wmf.xx (whatever the matching core branch is) [15:07:43] so if someone needs to deploy in an emergency between now and then, they'll end up pulling down unrelated changes on the deploy host [15:07:44] no, i mean, why should i cherry pick this change onto the branch, rather than just merge master into it? [15:07:45] 05Gitblit-Deprecate, 10Diffusion: Redirect git.wikimedia.org HEAD URLs to Diffusion - https://phabricator.wikimedia.org/T141965#2768530 (10Aklapper) **Offtopic here** as this is only about `HEAD`: Dealing with T137353 I've ran into numerous links not redirecting as expected. Want to allow providing feedback so... [15:07:48] and they'll be mad :) [15:08:32] just makes for confusion about what has and hasn't been deployed from tin. [15:09:31] the workflow for swatters is: +2 cherry-pick, that merges to extension wmf branch, and makes a submodule bump to core wmf branch; jump on tin; git fetch; check to make sure you've pulled down what you thought you would; deploy *just that* change; done [15:09:39] ottomata: the most concise answer is that cherry-picks make it easier to see what has deliberately been back-ported. Merging master into an already deployed branch makes the version history murky.
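The local route asked about at 15:03:05 (cherry-pick locally, then git-review) can be sketched as a list of git commands per target branch. This is only an illustration under assumptions: `backport_commands`, the `backport-...` local branch names and the commit placeholder are hypothetical, the remote is assumed to be called `origin`, and `git review <branch>` assumes the git-review tool is installed. The commands only upload the cherry-picks for review; as said in the channel, the SWAT deployer does the merge.

```python
def backport_commands(commit, branches, remote="origin"):
    """Build the git commands to cherry-pick one commit onto each deployed branch.

    Returns a list of argv lists; nothing is executed here.
    """
    commands = []
    for branch in branches:
        # Hypothetical local branch name, e.g. backport-wmf-1.29.0-wmf.1
        local = "backport-" + branch.replace("/", "-")
        commands.append(["git", "fetch", remote])
        commands.append(["git", "checkout", "-b", local, f"{remote}/{branch}"])
        # -x records the original commit id in the backport's commit message
        commands.append(["git", "cherry-pick", "-x", commit])
        # Upload for review against the target branch; do not merge yourself
        commands.append(["git", "review", branch])
    return commands


if __name__ == "__main__":
    # "COMMIT_SHA" is a placeholder for the change already merged to master
    for argv in backport_commands(
        "COMMIT_SHA", ["wmf/1.28.0-wmf.23", "wmf/1.29.0-wmf.1"]
    ):
        print(" ".join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try; each line can be pasted into a checkout of the extension repository.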
[15:10:00] ohhh, I misunderstood the question :) [15:10:44] hm, ok [15:11:13] thcipriani: you do cherry-pick on operations/mediawiki-config changes? [15:11:29] never saw gerrit changes for those cherry-picks [15:11:32] no. only master is deployed there [15:11:36] ^ [15:11:47] no need for cherry-picks [15:11:48] :) [15:14:22] 06Release-Engineering-Team, 13Patch-For-Review, 05WMF-deploy-2016-11-01_(1.29.0-wmf.1): Remove .gitreview from MediaWiki and Extensions - https://phabricator.wikimedia.org/T146293#2768540 (10Paladox) I think I'm the .gitreview file in the host section Instead of doing host=gerrit.wikimedia.org We Should d... [15:18:57] 06Release-Engineering-Team, 13Patch-For-Review, 05WMF-deploy-2016-11-01_(1.29.0-wmf.1): Remove .gitreview from MediaWiki and Extensions - https://phabricator.wikimedia.org/T146293#2768543 (10Paladox) >>! In T146293#2768233, @Jonas wrote: > Not sure if this is related but with cognate extension i get this err... [15:25:19] 10Gerrit, 06Project-Admins, 06Repository-Admins, 10Wikidata: Deactivate/archive WikibaseQueryEngine - https://phabricator.wikimedia.org/T141858#2514694 (10Aklapper) Archived the Phab project; please decide what to do with https://phabricator.wikimedia.org/T75267 [15:25:34] 10Gerrit, 06Project-Admins, 06Repository-Admins, 10Wikidata: Deactivate/archive WikibaseQueryEngine - https://phabricator.wikimedia.org/T141858#2768554 (10Aklapper) [15:27:49] 10Gerrit, 06Project-Admins, 06Repository-Admins, 10WikibaseDatabase, 10Wikidata: Deactivate/archive WikibaseDatabase - https://phabricator.wikimedia.org/T141865#2768560 (10Aklapper) Could this get a confirmation from Wikidata folks please (or a link to a statement declaring this obsolete)? @Lydia_Pintsch... [15:30:19] ok, thcipriani, i've cherry picked the extension into those extension branches [15:30:29] now i just wait for 11am in sf and hang around and listen in #ops? 
[15:30:57] yep :) [15:31:09] also update the deployment calendar with links to the cherry-picks [15:31:53] oh, i put a link to the original gerrit change [15:32:29] thcipriani: Andrew Otto (ottomata) [15:32:30] • https://gerrit.wikimedia.org/r/#/c/319407/ More logging detail for some EventBus errors [15:32:51] do I need the cherry pick links too? [15:32:56] ah, yeah, just link to the cherry-picks instead [15:32:56] (oh, i need to merge the cherry picks) [15:33:06] no don't merge the cherry-picks just yet [15:33:09] swatter will merge [15:34:39] bd808: so one thing that's weird about --no-touch is that it still may change the mtime on wmf-config/InitialiseSettings.php. It will modify the time on a given server to the mtime on tin which will be a change if you've run scap-pull without --no-touch at any point since the last time you ran scap pull --no-touch. Trying to think if that matters at all. [15:34:40] ok [15:35:10] thcipriani: yeah. I'm not sure there is much we can do about that [15:35:56] if InitialiseSettings.php actually changes then we shouldn't be stopping the mtime change for sure [15:36:15] sure [15:36:27] maybe we should only be touching the file on tin? [15:36:39] does that make things less confusing? [15:37:05] the touch on the client is actually extra protection for the weird races that have happened in the past [15:37:17] heh, figures :) [15:37:18] imagine that a MW node is .5s ahead of tin [15:37:42] so it starts a request, caches the data, then gets a changed file that has a lower mtime [15:38:04] the re-touch on the node itself breaks the cache (or should) [15:38:16] so it's comparing the ctime of the cache to the mtime of InitialiseSettings.php? [15:38:32] I guess you don't want it to stat the file when it doesn't have to? [15:38:36] I think so. 
let's pull up that in wmf-config [15:39:48] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:39:58] thcipriani: this is the relevant part of CommonSettings -- https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L153-L159 [15:40:31] it compares the mtime of the cached serialized vars with the mtime of InitialiseSettings [15:40:55] hrm, so the server time being off from tin shouldn't matter in that instance, I think... [15:41:11] oh, nevermind, misreading [15:42:09] yeah. [15:42:23] the place where it causes trouble is when a new wmg* has been added to InitialiseSettings and used in CommonSettings. If the local cache doesn't have the wmg* then we get fatals for undef var [15:42:57] there are other "eventually consistent" issues as well with changing values but they aren't as dangerous [15:43:21] It's certainly prevented issues for a while [15:44:05] Before the auto-touch it was really really common to touch + sync InitialiseSettings if anything looked weird [15:44:24] my concern is that --no-touch doesn't change anything unless you're running sync-l10n back-to-back [15:44:28] which of course was voodoo that only folks like Reedy remembered [15:45:06] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #213: 04FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/213/ [15:45:07] Which was often a first fix attempt for the weird and wonderful [15:45:11] heh, but also doesn't work in that weird race for when an mw node is .5s ahead of tin [15:45:20] the only reason other than sync-l10n I can think that it should possibly be used is for a beta cluster only config change sync [15:45:22] Chad had a patch that did a stat of 'wmf-config' but that was bugged up [15:45:51] hrmm, wait, it would
work, because the mtime is changing, but it'll still be older than the mtime of the cache file. [15:46:36] thcipriani: so maybe we should not add the option to sync-dir and only to sync-file. on sync-file we could call it `--beta-only-change` to make the intent more clear [15:46:43] 06Release-Engineering-Team, 13Patch-For-Review, 05WMF-deploy-2016-11-01_(1.29.0-wmf.1): Remove .gitreview from MediaWiki and Extensions - https://phabricator.wikimedia.org/T146293#2740751 (10Addshore) https://github.com/wikimedia/mediawiki-extensions-Cognate/blob/master/.gitreview Note: this doesn't happen... [15:47:24] since the touch is on the local machine clock skew is removed. both mtimes come from the local clock [15:47:37] (except is we remove the touch) [15:47:40] *if [15:48:13] It's all a mess :( MW is edge cases all the way down to the metal [15:48:22] :) [15:49:02] the no-touch paranoia is itself about an HHVM bug [15:49:17] I think what you have will work for the intended case. I was confusing a change in the initialisesettings mtime with the difference between cache and initialisesettings mtime. [15:49:52] like: initalisesettings.php mtime will change, but it will only get older when you do sync-l10n [15:50:23] which I think fixes the root problem https://phabricator.wikimedia.org/T149872 [15:51:04] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:51:07] the flag will likely never get used by deployers unless they remember what weird magic it holds. maybe --beta-only-change would be a good name? [15:51:50] *nod* I'd be cool with that [15:51:57] yeah, hhvm stuff is going to be tricky if we want to do a depool/restart type-thing. I'm not sure what kind of mess that will create. [15:52:17] thcipriani: actually that will fix it all! 
:) [15:52:47] the problem is that HHVM has some cache spaces for bytecode that don't always purge properly and can fill up [15:52:52] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #213: 04FAILURE in 30 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/213/ [15:53:00] FB never sees this because they restart on each deploy [15:53:19] yeah, but there is some JIT optimization that makes things super slow for a bit. [15:53:20] we didn't see it for quite a while because HHVM would crash for other reasons and get restarted [15:53:26] Can i restart prod grrrit-wm to verify its up to date and works properly [15:53:27] 11 requests [15:53:33] like when we deploy a new version to testwiki [15:53:36] oh, is that all? [15:53:39] yeah [15:53:46] Zppix|mobile it's already up to date [15:53:52] it takes 11 passes to fully warm the JIT [15:54:36] hrm, well, that makes the choice of how to rollout a little more obvious [15:54:48] we do inch yet closer to a more modern deployment: https://phabricator.wikimedia.org/D429 [15:54:59] depool, drain, restart, prime, repool, profit! [15:55:05] ^ patch is a lot of fun [15:55:38] I looked over an early version of it. Seemed to be headed in a good direction [15:55:44] one interesting side-effect of flattening the repo is that git history becomes deployment history [15:56:39] lots of fun stats to be had as a result. [15:56:52] plus a very accurate history of who deployed what when. [16:02:36] We don't have any nice ways of using wikitags, without something in IS, do we? 
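The clock-skew race described between 15:37 and 15:47 can be reproduced with two temporary files. This is a minimal sketch, not the actual CommonSettings.php logic: `cache_is_fresh` and `demo` are hypothetical names, the rule "cache is reused when its mtime is not older than the settings file" is an assumption based on the description at 15:40:31, and the timestamps are exaggerated whole seconds (instead of 0.5s) so the result does not depend on filesystem timestamp granularity.

```python
import os
import tempfile


def cache_is_fresh(cache_path, settings_path):
    """Assumed rule: reuse the cache if it is at least as new as the settings file."""
    return os.stat(cache_path).st_mtime >= os.stat(settings_path).st_mtime


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        settings = os.path.join(tmp, "InitialiseSettings.php")
        cache = os.path.join(tmp, "conf-cache")
        for path in (settings, cache):
            with open(path, "w") as handle:
                handle.write("placeholder")

        # The node's clock runs ahead of the deploy host. The node wrote its
        # cache at local time 1005; the changed settings file arrives later
        # but keeps the deploy host's mtime of 1002.
        os.utime(cache, (1005, 1005))
        os.utime(settings, (1002, 1002))
        stale_cache_reused = cache_is_fresh(cache, settings)

        # Re-touching the file on the node stamps it with the node's clock,
        # so both mtimes come from the same clock and the cache is rebuilt.
        os.utime(settings, (1009, 1009))
        reused_after_touch = cache_is_fresh(cache, settings)

        return stale_cache_reused, reused_after_touch


if __name__ == "__main__":
    print(demo())
```

The first value shows the stale cache being trusted when the synced file keeps the deploy host's older mtime; the second shows the local touch invalidating it, which is the "extra protection" mentioned at 15:37:05.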
[16:03:34] PROBLEM - jenkins_zmq_publisher on gallium is CRITICAL: Connection refused [16:03:54] PROBLEM - jenkins_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [16:04:04] PROBLEM - zuul_gearman_service on gallium is CRITICAL: Connection refused [16:04:14] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server [16:05:12] Gonn have to create something like "wmgElevatedPasswordsForAll" or something [16:06:40] ACKNOWLEDGEMENT - jenkins_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war daniel_zahn T95757 [16:06:40] ACKNOWLEDGEMENT - jenkins_zmq_publisher on gallium is CRITICAL: Connection refused daniel_zahn T95757 [16:06:41] ACKNOWLEDGEMENT - zuul_gearman_service on gallium is CRITICAL: Connection refused daniel_zahn T95757 [16:06:42] ACKNOWLEDGEMENT - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server daniel_zahn T95757 [16:08:14] 06Release-Engineering-Team, 13Patch-For-Review, 05WMF-deploy-2016-11-01_(1.29.0-wmf.1): Remove .gitreview from MediaWiki and Extensions - https://phabricator.wikimedia.org/T146293#2768697 (10Paladox) >>! In T146293#2768631, @Addshore wrote: > https://github.com/wikimedia/mediawiki-extensions-Cognate/blob/mas... [16:10:19] REMINDER: CI is now in maintenance: https://lists.wikimedia.org/pipermail/wikitech-l/2016-October/086882.html [16:10:22] thcipriani: this ok then? 
[16:10:23] Andrew Otto (ottomata) [16:10:24] • https://gerrit.wikimedia.org/r/#/c/319407/ More logging detail for some EventBus errors (cherry-picks: https://gerrit.wikimedia.org/r/#/c/319586/, https://gerrit.wikimedia.org/r/#/c/319587/) [16:12:44] PROBLEM - nodepoold running on labnodepool1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d [16:14:25] (03PS7) 10Hashar: gallium is replaced by contint1001.eqiad.wmnet [integration/config] - 10https://gerrit.wikimedia.org/r/293300 (https://phabricator.wikimedia.org/T137293) [16:20:44] RECOVERY - nodepoold running on labnodepool1001 is OK: PROCS OK: 1 process with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d [16:22:10] ^ not for long [16:25:44] PROBLEM - nodepoold running on labnodepool1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d [16:29:15] :) [16:30:01] PROBLEM - zuul_gearman_service on contint1001 is CRITICAL: connect to address 127.0.0.1 and port 4730: Connection refused [16:30:11] PROBLEM - zuul_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server [16:30:41] RECOVERY - jenkins_service_running on contint1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [16:31:41] RECOVERY - nodepoold running on labnodepool1001 is OK: PROCS OK: 1 process with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d [16:38:41] PROBLEM - nodepoold running on labnodepool1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d [16:50:01] RECOVERY - zuul_gearman_service on contint1001 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 4730 [16:50:11] RECOVERY - zuul_service_running on contint1001 is OK: PROCS OK: 2 processes with regex 
args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-server [16:51:41] RECOVERY - nodepoold running on labnodepool1001 is OK: PROCS OK: 1 process with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d [16:52:53] 10Continuous-Integration-Infrastructure (phase-out-gallium), 10releng-201617-q1, 13Patch-For-Review, 07Wikimedia-Incident: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#2768838 (10Dzahn) https://gerrit.wikimedia.org/r/#/c/319604/ [17:07:42] PROBLEM - jenkins_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [17:09:42] RECOVERY - jenkins_service_running on contint1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [17:13:17] Project UploadWizard-api-commons.wikimedia.beta.wmflabs.org build #4868: 04FAILURE in 4.8 sec: https://integration.wikimedia.org/ci/job/UploadWizard-api-commons.wikimedia.beta.wmflabs.org/4868/ [17:14:02] RECOVERY - jenkins_zmq_publisher on contint1001 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 8888 [17:18:22] PROBLEM - Puppet staleness on deployment-kafka04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0] [17:23:51] (03CR) 10Legoktm: [C: 04-1] "Tests that verify upstream sniffs should go in the generic_* files, not in our custom sniff tests." 
[tools/codesniffer] - 10https://gerrit.wikimedia.org/r/318880 (https://phabricator.wikimedia.org/T149544) (owner: 10Samwilson) [17:35:13] 10Continuous-Integration-Infrastructure: Clear /srv/.git on contint1001 - https://phabricator.wikimedia.org/T149924#2769016 (10hashar) [17:44:28] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.3 deployment blockers - https://phabricator.wikimedia.org/T149927#2769078 (10greg) [17:44:49] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T149338#2749149 (10greg) [17:45:06] 10Gerrit, 06Project-Admins, 06Repository-Admins, 10WikibaseDatabase, 10Wikidata: Deactivate/archive WikibaseDatabase - https://phabricator.wikimedia.org/T141865#2769107 (10Lydia_Pintscher) Yes ok from my side. I closed the two tickets. [17:45:29] puppet should begin to recover across trusty nodes [17:46:39] 10Gerrit, 06Project-Admins, 06Repository-Admins, 10Wikidata: Deactivate/archive WikibaseQueryEngine - https://phabricator.wikimedia.org/T141858#2769112 (10Lydia_Pintscher) Closed the ticket. 
[17:48:53] 10Continuous-Integration-Infrastructure: contint: move .htacess file for doc.wm into regular Apache config - https://phabricator.wikimedia.org/T149928#2769113 (10Dzahn) [17:51:54] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/293300 (https://phabricator.wikimedia.org/T137293) (owner: 10Hashar) [17:52:27] (03CR) 10Hashar: "INFO:jenkins_jobs.builder:Number of jobs generated: 1" [integration/config] - 10https://gerrit.wikimedia.org/r/293300 (https://phabricator.wikimedia.org/T137293) (owner: 10Hashar) [17:53:17] RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0] [17:53:17] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:54:17] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0] [17:54:29] RECOVERY - Puppet run on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0] [17:54:29] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:54:45] RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0] [17:54:53] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [17:55:29] RECOVERY - Puppet run on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:55:38] RECOVERY - Puppet run on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:56:42] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [17:57:18] RECOVERY - Puppet run on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:57:20] RECOVERY - Puppet run on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [17:57:27] RECOVERY - Puppet run on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [17:57:33] RECOVERY - 
Puppet run on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [17:57:41] 10Beta-Cluster-Infrastructure, 06Labs: Move deployment-prep to role::puppetmaster::standalone - https://phabricator.wikimedia.org/T149620#2769165 (10AlexMonk-WMF) [17:58:27] 10Continuous-Integration-Config, 06Reading-Web-Backlog, 07Technical-Debt (RW-Tech-Debt): Popups extension should run browser tests on merge - https://phabricator.wikimedia.org/T149931#2769166 (10jhobs) [18:06:45] hashar can we move some more jobs of precise?, like the ones that test zuul for integration? [18:06:56] Since we no longer need to test it on precise :) [18:07:05] (03PS1) 10Hashar: Revert "Remove nightly dirs from .gitignore" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319628 [18:07:28] (03Abandoned) 10Hashar: Revert "Remove nightly dirs from .gitignore" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319628 (owner: 10Hashar) [18:07:42] (03PS1) 10Hashar: Revert "Announce CI maintenance" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319629 [18:07:46] (03PS2) 10Hashar: Revert "Announce CI maintenance" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319629 [18:08:25] (03CR) 10Hashar: [C: 032] Revert "Announce CI maintenance" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319629 (owner: 10Hashar) [18:08:41] * paladox creates the patch :) [18:09:23] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2740552 (10mmodell) [18:09:53] (03Merged) 10jenkins-bot: Revert "Announce CI maintenance" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319629 (owner: 10Hashar) [18:10:05] (03PS1) 10Paladox: Migrate test integration-zuul-* from Ubuntu Precise to Debian jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319630 [18:10:07] hashar ^^ [18:10:48] (03PS2) 10Paladox: Migrate test integration-zuul-* from Ubuntu Precise to Debian jessie 
[integration/config] - 10https://gerrit.wikimedia.org/r/319630 [18:12:17] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2769205 (10mmodell) [18:13:21] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2740552 (10mmodell) [18:15:24] PROBLEM - Puppet run on deployment-mediawiki06 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [18:15:28] PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:17:31] (03PS1) 10Hashar: Fix up contint1001 oddities [integration/config] - 10https://gerrit.wikimedia.org/r/319634 [18:17:49] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [18:18:51] (03CR) 10Hashar: [C: 032] Fix up contint1001 oddities [integration/config] - 10https://gerrit.wikimedia.org/r/319634 (owner: 10Hashar) [18:20:36] (03Merged) 10jenkins-bot: Fix up contint1001 oddities [integration/config] - 10https://gerrit.wikimedia.org/r/319634 (owner: 10Hashar) [18:21:04] https://integration.wikimedia.org/ci/job/publish-on-contint1001/1/console !! [18:21:07] fixdd [18:22:07] PROBLEM - Puppet run on deployment-jobrunner02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [18:32:03] RECOVERY - Puppet run on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:33:39] RECOVERY - Puppet run on deployment-ms-be02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:34:03] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:34:51] Anyone in release engineering have a moment to troubleshoot an odd phabricator issue? 
I've created a sub task off of https://phabricator.wikimedia.org/T149726 [18:34:57] and it says i cannot view it, even though i just made it [18:35:05] and owners should always be able to view tasks they create [18:35:28] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [18:37:18] robh: i can view T149938 [18:37:25] PROBLEM - Host deployment-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.63) [18:37:47] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [18:38:23] RECOVERY - Puppet staleness on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [3600.0] [18:38:34] mutante: can you set the view and edit of task specifically to include me? [18:38:36] for some reason its not [18:38:42] its listing a bunch of cables right? [18:38:54] 10Beta-Cluster-Infrastructure, 06Labs: Move deployment-prep to role::puppetmaster::standalone - https://phabricator.wikimedia.org/T149620#2769393 (10AlexMonk-WMF) 05Open>03Resolved I shut down the old instance, will probably delete it in a week or two [18:39:23] mutante: actually, lemme pm you [18:40:23] RECOVERY - Puppet run on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [18:42:10] RECOVERY - Puppet run on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:43:35] RECOVERY - Puppet run on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [19:13:00] thcipriani: we just merged another eventbus logging patch [19:13:06] looking in beta, its not there yet [19:13:15] do I run beta-code-update stuff manually...or just wait longer? 
[19:14:04] ottomata: beta-code-update-eqiad is running now [19:14:07] ah ok [19:14:12] i'm just impatient then, thanks :) [19:14:13] https://integration.wikimedia.org/ci/view/Beta/job/beta-code-update-eqiad/128567/console [19:14:15] :) [19:14:18] no worries [19:18:29] PROBLEM - Free space - all mounts on integration-slave-jessie-android is CRITICAL: CRITICAL: integration.integration-slave-jessie-android.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-android.diskspace._srv.byte_percentfree (<33.33%) [19:18:38] 10Continuous-Integration-Infrastructure (phase-out-gallium), 10releng-201617-q1, 13Patch-For-Review, 07Wikimedia-Incident: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#2769568 (10Dzahn) a:03Dzahn [19:19:38] thcipriani: sorry, am doing a bit of thumb twiddling, so i'm impatient, it looks like https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ finished...but i don't see my extension updated on tin [19:21:06] hrm. [19:24:59] thcipriani: i see it now though [19:25:05] on tin [19:25:09] ottomata: I went ahead and manually updated [19:25:15] ah ok [19:25:18] looks like a deploy is in process now [19:25:45] after that's done I can manually deploy to be sure it's out there. [19:26:37] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2769583 (10mmodell) [19:26:39] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2740552 (10mmodell) [19:26:46] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2740552 (10mmodell) [19:27:00] I think it should be fine to run these jobs manually as long as we aren't hammering them. Jenkins should be able to figure it out. 
[19:29:36] ottomata: should definitely be deployed now [19:29:54] k thanks [19:33:29] RECOVERY - Free space - all mounts on integration-slave-jessie-android is OK: OK: integration.integration-slave-jessie-android.diskspace._mnt.byte_percentfree (No valid datapoints found) [19:44:39] (03PS1) 10Hashar: integration-docroot-deploy fix jenkins label [integration/config] - 10https://gerrit.wikimedia.org/r/319657 [19:45:10] (03CR) 10Hashar: [C: 032] integration-docroot-deploy fix jenkins label [integration/config] - 10https://gerrit.wikimedia.org/r/319657 (owner: 10Hashar) [19:45:40] thcipriani: the maintenance announce is finally gone from https://integration.wikimedia.org/zuul/ ! [19:46:40] 10Continuous-Integration-Infrastructure, 07Technical-Debt: Clear /srv/.git on contint1001 - https://phabricator.wikimedia.org/T149924#2769630 (10hashar) [19:47:38] 10Continuous-Integration-Infrastructure (phase-out-gallium), 10releng-201617-q1, 13Patch-For-Review, 07Wikimedia-Incident: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#2769632 (10hashar) The services have been migrated to contint1001 successfully a couple hours ago. [19:48:40] (03Merged) 10jenkins-bot: integration-docroot-deploy fix jenkins label [integration/config] - 10https://gerrit.wikimedia.org/r/319657 (owner: 10Hashar) [19:50:17] ok, thcipriani i want to do this next patch for the evening swat, do I cherry pick to the same branches I did before? 
[19:50:47] hopefully by evening swat we'll be all on wmf.1 [19:51:00] but probably wouldn't hurt to cherry pick to wmf.23 just in case [19:51:04] ok [20:09:11] (03PS1) 10Hashar: Stop generating puppet doc on each change [integration/config] - 10https://gerrit.wikimedia.org/r/319665 (https://phabricator.wikimedia.org/T143233) [20:13:13] (03PS2) 10Hashar: Stop generating puppet doc on each change [integration/config] - 10https://gerrit.wikimedia.org/r/319665 (https://phabricator.wikimedia.org/T143233) [20:14:54] PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [20:14:59] (03CR) 10Hashar: [C: 032] Stop generating puppet doc on each change [integration/config] - 10https://gerrit.wikimedia.org/r/319665 (https://phabricator.wikimedia.org/T143233) (owner: 10Hashar) [20:15:48] (03Merged) 10jenkins-bot: Stop generating puppet doc on each change [integration/config] - 10https://gerrit.wikimedia.org/r/319665 (https://phabricator.wikimedia.org/T143233) (owner: 10Hashar) [20:21:14] (03PS1) 10Hashar: operations-puppet-doc now polls scm repo [integration/config] - 10https://gerrit.wikimedia.org/r/319666 [20:21:28] (03CR) 10Hashar: [C: 032] operations-puppet-doc now polls scm repo [integration/config] - 10https://gerrit.wikimedia.org/r/319666 (owner: 10Hashar) [20:22:17] (03Merged) 10jenkins-bot: operations-puppet-doc now polls scm repo [integration/config] - 10https://gerrit.wikimedia.org/r/319666 (owner: 10Hashar) [20:29:26] (03PS1) 10Legoktm: Whitelist Eloquence [integration/config] - 10https://gerrit.wikimedia.org/r/319669 [20:39:32] (03CR) 10Hashar: [C: 032] Whitelist Eloquence [integration/config] - 10https://gerrit.wikimedia.org/r/319669 (owner: 10Legoktm) [20:39:49] legoktm: migration to contint1001 is complete :] [20:39:56] the fabric file is uptodate [20:40:14] (03Merged) 10jenkins-bot: Whitelist Eloquence [integration/config] - 10https://gerrit.wikimedia.org/r/319669 (owner: 10Legoktm) [20:40:26] ooh, yay! 
[20:40:30] so gallium is dead? [20:40:34] not yet [20:40:39] keeping it around for a bit [20:40:47] but essentially yeah, we no longer rely on gallium [20:41:51] Project selenium-Echo » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #199: 09SUCCESS in 50 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/199/ [20:41:59] Project selenium-Echo » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #199: 09SUCCESS in 59 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/199/ [20:49:54] RECOVERY - Puppet run on repository is OK: OK: Less than 1.00% above the threshold [0.0] [20:53:26] (03PS1) 10Legoktm: Revert "doc: Hide link to OOUI PHP demos because they're broken" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319673 (https://phabricator.wikimedia.org/T127809) [20:53:33] (03CR) 10Legoktm: [C: 032] Revert "doc: Hide link to OOUI PHP demos because they're broken" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319673 (https://phabricator.wikimedia.org/T127809) (owner: 10Legoktm) [20:53:38] hashar: ^ :D [20:54:01] legoktm: ah yeah!!! [20:54:04] finally :] [20:54:27] 10Continuous-Integration-Infrastructure, 06Release-Engineering-Team: doc.wikimedia.org should be running PHP 5.5+, not 5.3 -> demos etc. don't work - https://phabricator.wikimedia.org/T127504#2769819 (10Legoktm) 05Open>03Resolved a:03hashar This is fixed now. 
[20:54:33] (03PS3) 10Hashar: Migrate test integration-zuul-* from Ubuntu Precise to Debian jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319630 (owner: 10Paladox) [20:54:40] (03CR) 10Hashar: [C: 032] Migrate test integration-zuul-* from Ubuntu Precise to Debian jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319630 (owner: 10Paladox) [20:55:46] (03Merged) 10jenkins-bot: Migrate test integration-zuul-* from Ubuntu Precise to Debian jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319630 (owner: 10Paladox) [20:55:57] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/319630 (owner: 10Paladox) [20:56:51] (03PS2) 10Legoktm: Revert "doc: Hide link to OOUI PHP demos because they're broken" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319673 (https://phabricator.wikimedia.org/T127809) [20:57:01] (03CR) 10Legoktm: [C: 032] Revert "doc: Hide link to OOUI PHP demos because they're broken" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319673 (https://phabricator.wikimedia.org/T127809) (owner: 10Legoktm) [20:57:30] (03Merged) 10jenkins-bot: Revert "doc: Hide link to OOUI PHP demos because they're broken" [integration/docroot] - 10https://gerrit.wikimedia.org/r/319673 (https://phabricator.wikimedia.org/T127809) (owner: 10Legoktm) [20:58:15] hashar thanks [21:05:21] 10Continuous-Integration-Config, 05Continuous-Integration-Scaling, 10releng-201516-q3, 07WorkType-NewFunctionality: [keyresult] Migrate php (Zend and HHVM) CI jobs to Nodepool - https://phabricator.wikimedia.org/T119139#2769879 (10hashar) [21:05:24] 10Continuous-Integration-Infrastructure: Investigate installing php5.3 on trusty and/or debian instance - https://phabricator.wikimedia.org/T103786#2769877 (10hashar) 05Open>03declined We will wait until the end of life of MediaWiki 1.23 which is May 2017. 
[21:08:49] 10Continuous-Integration-Infrastructure, 06Labs: Request increased quota for contintcloud labs project - https://phabricator.wikimedia.org/T142877#2769904 (10hashar) [21:08:52] 05Continuous-Integration-Scaling, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Bump quota of Nodepool instances (contintcloud tenant) - https://phabricator.wikimedia.org/T133911#2769905 (10hashar) [21:08:54] 10Continuous-Integration-Infrastructure, 10Monitoring: Alert when Zuul/Gearman queue is stalled - https://phabricator.wikimedia.org/T70113#2769906 (10hashar) [21:08:58] 10Continuous-Integration-Infrastructure, 05Continuous-Integration-Scaling, 06Release-Engineering-Team: Identify metric (or metrics) that gives a useful indication of user-perceived (Wikimedia developer) service of CI - https://phabricator.wikimedia.org/T139771#2769901 (10hashar) 05Open>03Resolved a:03ha... [21:10:40] 10Continuous-Integration-Infrastructure: Update jobs to use zuul-cloner with git cache - https://phabricator.wikimedia.org/T97098#2769913 (10hashar) 05Open>03Resolved a:03hashar The macros are using zuul-cloner cache dir or the git plugin reference when relevant. 
[21:10:42] 10Continuous-Integration-Infrastructure, 07WorkType-NewFunctionality: Jenkins jobs must wipe workspace - https://phabricator.wikimedia.org/T96627#2769916 (10hashar) [21:11:38] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Differential, 07Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2769922 (10hashar) p:05Normal>03Low [21:12:25] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:12:52] 10Continuous-Integration-Infrastructure, 06Operations, 07Nodepool, 13Patch-For-Review: Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2361186 (10hashar) [21:19:28] 10Continuous-Integration-Infrastructure, 06Operations, 07Nodepool, 13Patch-For-Review: Clean up apt:pin of python modules used for Nodepool - https://phabricator.wikimedia.org/T137217#2769982 (10hashar) I have poked the ops-l internal mailling list to get this scheduled. [21:39:25] 10Continuous-Integration-Config, 06Operations, 07Puppet, 07Upstream: post build failures for operations/puppet on operations-puppet-doc - https://phabricator.wikimedia.org/T143233#2770145 (10hashar) The [[ https://integration.wikimedia.org/ci/job/operations-puppet-doc/ | operations-puppet-doc ]] no more tr... 
[21:55:33] Project selenium-Wikibase » chrome,test,Linux,contintLabsSlave && UbuntuTrusty build #162: 09SUCCESS in 1 hr 51 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/162/ [21:58:22] Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #207: 09SUCCESS in 6 min 21 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/207/ [21:58:57] Project selenium-PageTriage » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #197: 09SUCCESS in 56 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/197/ [21:59:04] Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #197: 09SUCCESS in 1 min 4 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/197/ [22:06:00] Project selenium-CentralNotice » chrome,beta,Windows 7,contintLabsSlave && UbuntuTrusty build #200: 09SUCCESS in 38 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=contintLabsSlave%20&&%20UbuntuTrusty/200/ [22:06:01] Project selenium-CentralNotice » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #200: 09SUCCESS in 38 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/200/ [22:06:07] Project selenium-CentralNotice » firefox,beta,Windows 7,contintLabsSlave && UbuntuTrusty build #200: 09SUCCESS in 45 sec: 
https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Windows%207,label=contintLabsSlave%20&&%20UbuntuTrusty/200/ [22:06:11] Project selenium-CentralNotice » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #200: 09SUCCESS in 49 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/200/ [22:06:23] Project selenium-CentralNotice » chrome,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #200: 09SUCCESS in 1 min 1 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/200/ [22:06:28] Project selenium-GettingStarted » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #197: 09SUCCESS in 1 min 0 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/197/ [22:06:38] Project selenium-Math » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #196: 09SUCCESS in 1 min 7 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/196/ [22:06:47] Project selenium-CirrusSearch » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #206: 04FAILURE in 1 min 21 sec: https://integration.wikimedia.org/ci/job/selenium-CirrusSearch/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/206/ [22:06:50] Project selenium-Math » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #196: 09SUCCESS in 1 min 20 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/196/ [22:07:01] Project 
selenium-Flow » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #194: 09SUCCESS in 1 min 35 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/194/ [22:07:03] Project selenium-CentralNotice » firefox,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #200: 09SUCCESS in 1 min 41 sec: https://integration.wikimedia.org/ci/job/selenium-CentralNotice/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/200/ [22:07:16] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #198: 09SUCCESS in 1 min 38 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/198/ [22:07:35] Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #198: 09SUCCESS in 1 min 56 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/198/ [22:07:43] Project selenium-Flow » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #194: 04FAILURE in 2 min 16 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/194/ [22:08:18] * hashar has mass clicked on all selenium jobs [22:08:27] Project selenium-WikiLove » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #196: 09SUCCESS in 2 min 50 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/196/ [22:09:37] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #200: 09SUCCESS in 3 min 59 
sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/200/ [22:09:49] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,contintLabsSlave && UbuntuTrusty build #192: 09SUCCESS in 4 min 7 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/192/ [22:10:25] Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #199: 04FAILURE in 5 min 6 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/199/ [22:11:02] Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #206: 09SUCCESS in 5 min 22 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/206/ [22:16:17] Project selenium-Wikibase » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #162: 04FAILURE in 2 hr 12 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/162/ [22:18:06] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #192: 09SUCCESS in 12 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/192/ [22:21:41] Yippee, build fixed! 
[22:21:41] Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #200: 09FIXED in 1 min 40 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/200/ [22:29:22] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #192: 09SUCCESS in 23 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/192/ [22:29:39] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #214: 04FAILURE in 24 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/214/ [22:32:03] Project selenium-MultimediaViewer » chrome,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #192: 09SUCCESS in 26 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/192/ [22:32:53] ohai browser tests that don't know of their own history [22:37:28] yeah we wiped it [22:37:34] it is all fresh! [22:37:48] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #214: 04FAILURE in 32 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/214/ [22:42:12] so fresh so clean [22:42:58] * Reedy has made various reverts for REL1_28 [22:44:14] :(( [22:45:45] Yippee, build fixed! 
[22:45:46] Project selenium-Flow » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #195: 09FIXED in 1 min 8 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/195/ [23:07:00] what is "dib" [23:07:27] in integration/config [23:08:02] and "Puppet manifests used by wikimedia dib element" [23:09:11] aaaah.. disk image builder .. from openstack.. oh [23:09:20] ok, nevermind [23:09:39] heh it sounded like an image format [23:13:15] Dib? You mean dab? [23:14:33] no, actually dib [23:43:14] mutante dib is to build the nodepool images [23:43:36] that makes sense now :) yep [23:43:42] yep :) [23:44:00] i was just looking at a change in hashar's queue [23:44:07] oh [23:44:45] this https://gerrit.wikimedia.org/r/#/c/318279/ .. it's just installing doxygen [23:44:53] but i didn't know what "dib" is [23:45:08] Oh [23:45:21] yeh he is installing doxygen onto the nodepool images [23:45:24] trusty or jessie [23:45:51] since it isn't like an instance, we have to add the packages we want in nodepool because they are not there automatically. [23:47:16] (03CR) 10Dzahn: [C: 031] [integration/config] - 10https://gerrit.wikimedia.org/r/318279 (owner: 10Hashar) [23:48:01] yes, and it's harmless. i was about to merge, but different project where i don't have +2 [23:48:08] all good [23:48:18] Oh :)