[00:37:04] (PS1) Mattflaschen: Notify Collaboration team of failing browser tests [integration/config] - https://gerrit.wikimedia.org/r/201083
[00:40:35] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:41:09] ^ works fine too.
[00:41:13] not sure why it’s complaining...
[00:44:00] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48367 bytes in 0.575 second response time
[00:55:10] !log Jenkins stuck. Builds are queued in Zuul but nothing is sent to Jenkins.
[00:55:14] Logged the message, Master
[01:00:08] !log Force restarted Zuul, didn't help
[01:00:12] Logged the message, Master
[01:00:16] !log Jenkins is unable to start Gearman connection (HTTP 503);
[01:00:20] Logged the message, Master
[01:00:35] !log Restarting Jenkins
[01:00:39] Logged the message, Master
[01:01:19] (PS6) Krinkle: Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - https://gerrit.wikimedia.org/r/198699 (https://phabricator.wikimedia.org/T94393) (owner: Adrian Lang)
[01:01:23] (CR) jenkins-bot: [V: -1] Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - https://gerrit.wikimedia.org/r/198699 (https://phabricator.wikimedia.org/T94393) (owner: Adrian Lang)
[01:13:46] (CR) Legoktm: [C: 2] Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - https://gerrit.wikimedia.org/r/198699 (https://phabricator.wikimedia.org/T94393) (owner: Adrian Lang)
[01:13:48] (CR) jenkins-bot: [V: -1] Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - https://gerrit.wikimedia.org/r/198699 (https://phabricator.wikimedia.org/T94393) (owner: Adrian Lang)
[01:13:55] oh
[01:22:10] What's taking so long...
[01:30:17] YuviPanda: I just (10m ago) got an email for "integration-slave1002/Free space - all mounts is WARNING" but shinken-wm didn't say anything in here ?
[01:32:24] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/local/bin/zuul-server
[01:32:41] legoktm: uh, I’m not sure...
[01:32:46] legoktm: file a bug? I’ll look at it later?
[01:32:53] PROBLEM - zuul_gearman_service on gallium is CRITICAL: Connection refused
[01:32:58] ok
[01:33:06] umm Krinkle ^ ?
[01:35:15] !log started zuul on gallium
[01:35:20] Logged the message, Master
[01:35:43] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/local/bin/zuul-server
[01:36:13] RECOVERY - zuul_gearman_service on gallium is OK: TCP OK - 0.000 second response time on port 4730
[01:36:48] legoktm: init.d/zuul does not warn against starting an already started service
[01:36:53] I started it a minute earlier I think
[01:36:56] stopped and restarted
[01:37:09] Krinkle: when I checked init.d/zuul status it said "stopped"
[01:37:17] well, it said not running
[01:37:23] Must've been a second before
[01:37:29] :/
[01:37:36] are there two zuul-servers running now?
[01:37:38] When I did ps aux | grep zuu there were two of them both started the same minute
[01:37:42] Not anymore
[01:37:53] Krinkle: stopped and restarted
[01:38:05] ok
[01:47:02] legoktm: I'm stopping Zuul again because there's no point in bothering people with useless NOT_REGISTERED errors which it currently does
[01:47:08] Zuul should handle this better, but it doesn't
[01:50:34] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/local/bin/zuul-server
[01:51:03] PROBLEM - zuul_gearman_service on gallium is CRITICAL: Connection refused
[02:01:13] legoktm: weird stuff
[02:01:23] It's been an hour now
[02:01:27] I'm killing it manually
[02:05:18] !log Restarting Jenkins again..
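The zuul_service_running alerts above match process command lines against the regex `^/usr/bin/python /usr/local/bin/zuul-server`. Below is a minimal Python sketch of that kind of check, not the actual Icinga plugin: it assumes the command lines have already been collected as strings (for example from `ps -eo args`), and the `minimum` threshold is a hypothetical parameter. Note that the RECOVERY lines report OK with 2 matching processes, so a count of two is not by itself proof of a duplicate zuul-server.

```python
import re

# Pattern copied from the zuul_service_running alert text in this log.
ZUUL_PATTERN = re.compile(r"^/usr/bin/python /usr/local/bin/zuul-server")


def count_matching(cmdlines, pattern=ZUUL_PATTERN):
    """Count command lines that match the pattern at their start."""
    return sum(1 for line in cmdlines if pattern.match(line))


def check_state(count, minimum=1):
    """Map a process count to a simplified status string.

    The threshold is hypothetical; the real check's thresholds are not
    shown in this log.
    """
    if count < minimum:
        return "CRITICAL: %d processes" % count
    return "OK: %d processes" % count


if __name__ == "__main__":
    sample = [
        "/usr/bin/python /usr/local/bin/zuul-server",
        "/usr/bin/python /usr/local/bin/zuul-server",
        "grep zuul-server",  # not anchored at /usr/bin/python, so ignored
    ]
    print(check_state(count_matching(sample)))  # OK: 2 processes
```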
[02:05:23] Logged the message, Master
[02:06:02] RECOVERY - zuul_gearman_service on gallium is OK: TCP OK - 0.000 second response time on port 4730
[02:06:25] RECOVERY - Puppet staleness on deployment-eventlogging02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[02:07:12] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/local/bin/zuul-server
[02:24:42] Yippee, build fixed!
[02:24:42] Project browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce build #53: FIXED in 2 min 41 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce/53/
[02:41:37] (CR) Krinkle: [C: 2] Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - https://gerrit.wikimedia.org/r/198699 (https://phabricator.wikimedia.org/T94393) (owner: Adrian Lang)
[02:41:39] (CR) jenkins-bot: [V: -1] Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - https://gerrit.wikimedia.org/r/198699 (https://phabricator.wikimedia.org/T94393) (owner: Adrian Lang)
[02:43:04] (CR) Krinkle: [C: 2] Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - https://gerrit.wikimedia.org/r/198699 (https://phabricator.wikimedia.org/T94393) (owner: Adrian Lang)
[02:47:46] (Merged) jenkins-bot: Switch wikidata qunit jobs from qunit to qunit-karma [integration/config] - https://gerrit.wikimedia.org/r/198699 (https://phabricator.wikimedia.org/T94393) (owner: Adrian Lang)
[02:58:47] (PS2) Krinkle: Move extensions to use generic jshint & jsonlint jobs part 3 [integration/config] - https://gerrit.wikimedia.org/r/200744 (owner: Legoktm)
[03:19:42] (PS1) Krinkle: tests: Remove mention of phpcs-lenient/phpcs-strict [integration/config] - https://gerrit.wikimedia.org/r/201094
[03:21:50] RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:26:48] (PS1) Krinkle: Remove duplicate phpcs-HEAD config [integration/config] - https://gerrit.wikimedia.org/r/201096
[03:32:34] (PS1) Krinkle: Make phpcs voting by default (inverse existing opt-in/opt-out) [integration/config] - https://gerrit.wikimedia.org/r/201097
[03:32:49] (CR) Krinkle: [C: 2] tests: Remove mention of phpcs-lenient/phpcs-strict [integration/config] - https://gerrit.wikimedia.org/r/201094 (owner: Krinkle)
[03:33:17] (CR) Krinkle: [C: 2] "No-op." [integration/config] - https://gerrit.wikimedia.org/r/201096 (owner: Krinkle)
[03:34:08] (Merged) jenkins-bot: tests: Remove mention of phpcs-lenient/phpcs-strict [integration/config] - https://gerrit.wikimedia.org/r/201094 (owner: Krinkle)
[03:34:36] (Merged) jenkins-bot: Remove duplicate phpcs-HEAD config [integration/config] - https://gerrit.wikimedia.org/r/201096 (owner: Krinkle)
[03:38:55] (CR) Mattflaschen: [C: 1] "Thanks for the update, Antoine. I don't have any way to test the Jenkins side of this, but your explanation and change make sense." [integration/config] - https://gerrit.wikimedia.org/r/194749 (https://phabricator.wikimedia.org/T91220) (owner: Mattflaschen)
[03:39:01] (PS6) Mattflaschen: Pass MEDIAWIKI_CAPTCHA_BYPASS_PASSWORD to GettingStarted browser test [integration/config] - https://gerrit.wikimedia.org/r/194749 (https://phabricator.wikimedia.org/T91220)
[03:51:26] (PS2) Krinkle: Make phpcs voting by default (inverse existing opt-in/opt-out) [integration/config] - https://gerrit.wikimedia.org/r/201097
[04:03:25] (CR) Krinkle: [C: 2] "No-op." [integration/config] - https://gerrit.wikimedia.org/r/201097 (owner: Krinkle)
[04:04:40] (Merged) jenkins-bot: Make phpcs voting by default (inverse existing opt-in/opt-out) [integration/config] - https://gerrit.wikimedia.org/r/201097 (owner: Krinkle)
[05:42:20] !log Free up space on integration-slave1001-1004 by removing obsolete phplint and qunit workspaces
[05:42:26] Logged the message, Master
[05:42:38] oh, I should run my script
[05:43:14] legoktm: btw, do you get shinken problem e-mails?
[05:43:17] for integration labs?
[05:43:20] yes
[05:51:43] !log deleting non-existent job workspaces from integration slaves
[05:51:47] Logged the message, Master
[05:57:07] legoktm: script?
[05:57:48] for https://phabricator.wikimedia.org/T94408
[05:58:06] the script is just http://paste.fedoraproject.org/205741/78678751/
[05:59:56] Krinkle: do you still need krinkle-mwcore-mysql-qunit ?
[06:00:06] no
[06:01:03] legoktm: Hm.. interesting
[06:01:10] It doesn't handle jobs that moved from precise to trusty though
[06:01:49] legoktm: WTFPL?
[06:01:59] Can you put it on a Phab paste?
[06:02:03] sure
[06:02:44] Oh nice. Chrome no longer fucks with urls containing foreign characters
[06:02:49] https://sa.wikipedia.org/w/index.php?title=विकिपीडिया:विचारमण्डपम्_(विविधविषयाः)&oldid=292255
[06:03:02] This is both the way the url is displayed and the way it is copied to the clipboard
[06:03:12] (Chrome 43 canary)
[06:03:41] especially the copy part is notable as it used to escape it on the way out, and only render it pretty in the UI
[06:03:51] https://phabricator.wikimedia.org/P465
[06:04:28] jobs.txt is just "ls output" (from jjb)
[06:05:29] legoktm: OK
[06:05:46] legoktm: Note that the short term solution for gc is that we re-create slaves once a month
[06:05:59] medium term is: There is no persistent workspaces in isolated one-off VMs
[06:06:19] Which I'm doing for a 101 other reasons but that makes it 102 :)
[06:06:23] Which, btw, is today.
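legoktm's actual script is only linked (P465), so this is a hypothetical Python sketch of the same idea rather than a copy of it: treat jobs.txt, the job listing from JJB, as the set of live jobs and flag every workspace directory whose job is not in that set. The `@N` suffix handling is an assumption about Jenkins's naming for concurrent-build workspaces. As Krinkle points out, a name-only comparison like this cannot catch jobs that moved from precise to trusty slaves, because their names are still in the job list.

```python
def load_jobs(lines):
    """Build the set of live job names from jobs.txt-style lines (one per line)."""
    return {line.strip() for line in lines if line.strip()}


def base_job_name(workspace_dir):
    """Strip a numeric "@N" suffix (assumed concurrent-build workspace naming)."""
    name, sep, suffix = workspace_dir.rpartition("@")
    if sep and suffix.isdigit():
        return name
    return workspace_dir


def stale_workspaces(workspace_dirs, live_jobs):
    """Return, sorted, the workspace directories whose job no longer exists."""
    return sorted(ws for ws in workspace_dirs if base_job_name(ws) not in live_jobs)


if __name__ == "__main__":
    # Illustrative data; job names other than krinkle-mwcore-mysql-qunit are made up.
    jobs = load_jobs(["mwext-Example-jshint\n", "mediawiki-core-phplint\n", "\n"])
    dirs = [
        "mwext-Example-jshint",
        "mwext-Example-jshint@2",
        "krinkle-mwcore-mysql-qunit",
    ]
    print(stale_workspaces(dirs, jobs))  # ['krinkle-mwcore-mysql-qunit']
```

Because `load_jobs` accepts any iterable of lines, an open jobs.txt file object can be passed straight in; deleting the reported directories is left to the caller.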
[06:06:53] is the creation of new slaves automated?
[06:09:48] (it can be now)
[06:13:18] YuviPanda: Tell me more?
[06:13:45] legoktm: https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup
[06:13:45] Krinkle: integration.yaml in nodes/labs/ in operations/puppet.git.
[06:13:47] Krinkle: https://github.com/wikimedia/operations-puppet/blob/production/nodes/labs/staging.yaml for staging
[06:14:00] YuviPanda: Look at the Setup list
[06:14:01] Krinkle: define regexes, and roles to be applied to them
[06:14:06] oooh
[06:14:07] looking
[06:14:11] afaik nothing can be automated further other than the initial role class
[06:14:17] puppet sign?
[06:14:45] It's important that it does not apply the ci-slave role from the default puppet master first
[06:14:52] that may not be forward compatible with ours
[06:15:03] plus all the patches
[06:15:07] Krinkle: hmm, thinking.
[06:15:14] Krinkle: well, ‘solution’ to that is to not have too many cherry-picks.
[06:15:39] Krinkle: but I am pretty sure that as of now at least, that the roles won’t be applied until puppetmaster is changed
[06:15:53] but I consider that a ‘bug’ and it’ll hopefully be fixed in a few weeks...
[06:16:08] I can use dsh-ci-slaves for the patches
[06:16:16] (a shell script I wrote that does what it sounds like)
[06:16:29] right
[06:16:33] so autosigning is already a thing
[06:16:39] I think it’s even enabled right now?
[06:16:43] I wrote some docs for that. looking
[06:17:00] cool, that'd be nice
[06:17:06] I'm checking cherry-picks now
[06:17:23] don't use 'ndots: 2' in labs resolv.conf
[06:17:23] Change-Id: I05c49e5248cb4a2cd0033899d3d9a71201480eb1
[06:17:29] and
[06:17:35] contint: keep 180 min of puppet reports
[06:17:36] Change-Id: I30e5bfeac398e0f88e538c75554439fe82fcc1cf
[06:17:40] only those 2 at the moment
[06:18:07] Krinkle: https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster#Set_up_project-wide_puppetmaster
[06:18:14] Krinkle: the second one is a noop.
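YuviPanda describes the labs node files (integration.yaml and staging.yaml under nodes/labs/ in operations/puppet.git) as regexes mapped to roles. The Python sketch below shows, hypothetically, how such a mapping could resolve an instance name to roles; the dictionary shape and the role names are placeholders, not the real YAML schema or the ENC's code.

```python
import re

# Hypothetical mapping in the spirit of nodes/labs/*.yaml:
# instance-name regex -> roles to apply. Role names are placeholders.
NODE_ROLES = {
    r"^integration-slave\d+$": ["role::example_ci_slave"],
    r"^integration-puppetmaster$": ["role::example_puppetmaster"],
}


def roles_for(instance, mapping=NODE_ROLES):
    """Collect roles from every pattern matching the instance name, without duplicates."""
    roles = []
    for pattern, pattern_roles in mapping.items():
        if re.match(pattern, instance):
            for role in pattern_roles:
                if role not in roles:
                    roles.append(role)
    return roles


if __name__ == "__main__":
    print(roles_for("integration-slave1410"))  # ['role::example_ci_slave']
```

Collecting roles from every matching pattern (rather than stopping at the first) is a design choice of this sketch; a catch-all `.*` pattern would then add its roles to every instance.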
[06:18:29] I think hashar pointed that out on the patch itself...
[06:18:38] unless he then changed that bit too...
[06:18:51] YuviPanda: No, setting it on the wiki was a no-op afaik
[06:18:54] Krinkle: anyway, if you add the puppetmaster::autosigner class it will autosign things.
[06:19:40] YuviPanda: What's your availability? Would be cool to now or tomorrow have you nearby while I re-create this the "right" way and see how smooth it goes. It should not take long to re-create 6 instances from scratch
[06:19:42] Krinkle: well, the cron runs only once every 8 hours.
[06:20:00] Krinkle: not sure. am getting my Social Security Number tomorrow (Have moved to SF now)
[06:20:11] Last three times I got blocked and then had to keep them in a frankenstein state for several days, running out to almost two weeks each time.
[06:22:01] What does "role::puppet::self::enc": yaml+ldap do? Isn't that redundant with the wiki page? Or do we put stuff in the repo as well?
[06:23:03] I'd rather put it in one place. Curious why it is split
[06:25:33] Krinkle: yeah, there’s a patch in place to get rid of that split
[06:26:06] YuviPanda: Is there documentation on the yaml Hiera page format? E.g. is "classes:" the same as ".*: " effectively?
[06:26:17] it's not obvious from the Staging page what keys mean what
[06:26:55] !log Apply puppetmaster::autosigner to integration-puppetmaster
[06:27:00] Logged the message, Master
[06:27:21] Krinkle: it is the same, yeah. there’s no documentation atm mostly because, uh, it’s all in a bit of a flux. classes: should go away soon once https://gerrit.wikimedia.org/r/#/c/197712/ gets merged...
[06:27:39] but for the moment projectwide puppetmasters need classes:
[06:28:02] because the ENC doesn’t work until after you’ve switched to your project’s puppetmaster and hence you can’t use the ENC to switch to your project’s puppetmaster but you can use classes: to
[06:29:39] Right
[06:30:02] YuviPanda: Can I apply roles to a pattern of instancenames from Hiera:Integration at this time?
[06:30:09] nope
[06:30:12] you can’t.
[06:30:12] ok
[06:30:21] you can only do that from the ENC
[06:30:25] autosigning should work for new instances with that enabled on puppetmaster?
[06:30:30] yup
[06:30:33] it might take a while tho
[06:30:34] 10mins or so
[06:30:40] it runs on a cron
[06:31:02] I’ve to go to bed now...
[06:31:07] stupid paperwork. grr
[06:31:09] OK
[06:31:11] Thx
[06:31:19] have fun. file bugs for anything that seems odd
[06:46:06] !sal
[06:46:07] https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:13:46] PROBLEM - Puppet failure on integration-slave1410 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0]
[07:17:16] * legoktm pats shinken-wm
[07:17:17] !log Creating integration-slave1410 as test. Will re-create our pool later today.
[07:17:22] Logged the message, Master
[07:18:49] RECOVERY - Puppet failure on integration-slave1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:31:54] (PS1) Amire80: Add Hindi and Indonesian to the language screenshots job [integration/config] - https://gerrit.wikimedia.org/r/201141
[07:32:27] (PS2) Amire80: Add Hindi and Indonesian to the language screenshots job [integration/config] - https://gerrit.wikimedia.org/r/201141
[08:23:45] (CR) Hashar: [C: -1] "mediawiki/Vagrant and PoolCounter can be enabled right now, should be done as a different patch that I will be happy to merge on sight." (5 comments) [integration/config] - https://gerrit.wikimedia.org/r/200863 (https://phabricator.wikimedia.org/T1361) (owner: Zfilipin)
[08:25:50] (CR) Hashar: "Rubocop can be enabled on PoolCounter once https://gerrit.wikimedia.org/r/#/c/201148/ is merged." [integration/config] - https://gerrit.wikimedia.org/r/200863 (https://phabricator.wikimedia.org/T1361) (owner: Zfilipin)
[08:36:07] aharoni: have you deployed / refreshed the language screenshot job?
[08:37:35] (CR) Hashar: [C: 1] "Amir have you refreshed the job configuration so we can +2/merge it? :)" [integration/config] - https://gerrit.wikimedia.org/r/201141 (owner: Amire80)
[08:40:35] hashar: Isn't the patch supposed to be merged first?
[08:40:50] aharoni: in this case that updates a single jenkins job
[08:40:54] so you can refresh the job using JJB
[08:41:00] curious
[08:41:00] run it to verify it is working properly
[08:41:04] OK, I can refresh it
[08:41:11] comment back on the Gerrit change saying you refreshed it / verified it works fine
[08:41:20] then someone +2 and it lands in
[08:46:57] (CR) Amire80: "Now I did. See https://integration.wikimedia.org/ci/view/BrowserTests/view/VisualEditor/job/browsertests-VisualEditor-language-screenshot-" [integration/config] - https://gerrit.wikimedia.org/r/201141 (owner: Amire80)
[08:47:15] hashar: done
[09:00:51] zeljkof: and also https://phabricator.wikimedia.org/T92613 :D
[09:42:26] (CR) Zfilipin: [C: 2] Add Hindi and Indonesian to the language screenshots job [integration/config] - https://gerrit.wikimedia.org/r/201141 (owner: Amire80)
[09:46:49] (Merged) jenkins-bot: Add Hindi and Indonesian to the language screenshots job [integration/config] - https://gerrit.wikimedia.org/r/201141 (owner: Amire80)
[09:47:46] (PS2) Hashar: Notify Collaboration team of failing browser tests [integration/config] - https://gerrit.wikimedia.org/r/201083 (https://phabricator.wikimedia.org/T94152) (owner: Mattflaschen)
[09:48:02] (PS3) Hashar: Notify Collaboration team of failing browser tests [integration/config] - https://gerrit.wikimedia.org/r/201083 (https://phabricator.wikimedia.org/T94152) (owner: Mattflaschen)
[09:49:57] (CR) Hashar: "I have updated the commit summary message to point to two bugs we filed (and closed):" [integration/config] - https://gerrit.wikimedia.org/r/201083 (https://phabricator.wikimedia.org/T94152) (owner: Mattflaschen)
[10:10:52] (CR) Addshore: "Most likely :)" [tools/codesniffer] - https://gerrit.wikimedia.org/r/153399 (owner: Addshore)
[12:50:53] (CR) Hashar: [C: 1] Move extensions to use generic jshint & jsonlint jobs part 3 [integration/config] - https://gerrit.wikimedia.org/r/200744 (owner: Legoktm)
[13:01:02] (CR) Hashar: Package python deps with dh-virtualenv (1 comment) [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[13:07:54] (CR) Hashar: Package python deps with dh-virtualenv (2 comments) [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[13:08:57] (PS15) Hashar: Package python deps with dh-virtualenv [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552)
[13:14:32] (CR) Hashar: [C: 1] "Rebuild the precise1 package at http://people.wikimedia.org/~hashar/debs/zuul/?C=M;O=D" [integration/zuul] (debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/195272 (https://phabricator.wikimedia.org/T48552) (owner: Hashar)
[13:41:00] (PS2) JanZerebecki: Make composer install despite state from previous job run [integration/config] - https://gerrit.wikimedia.org/r/200577
[13:46:15] (CR) JanZerebecki: [C: 1] "After thinking about this it is the right solution for this repository. Additionally having the workspace in a defined state would be nice" [integration/config] - https://gerrit.wikimedia.org/r/200577 (owner: JanZerebecki)
[13:48:30] PROBLEM - Puppet staleness on deployment-eventlogging02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[14:30:17] (CR) Hashar: Create prepare-mediawiki-zuul-project builder macro (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/201022 (owner: Legoktm)
[14:33:09] (PS1) Aude: Update Wikidata branch [tools/release] - https://gerrit.wikimedia.org/r/201200
[14:35:13] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #458: FAILURE in 9 min 12 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/458/
[14:39:46] Yippee, build fixed!
[14:39:46] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #69: FIXED in 8 min 36 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/69/
[14:43:37] zeljkof: pff that Gather job is flappy :(
[14:43:49] it failed twice (67 & 68) then magically started to pass
[14:46:28] hashar: it will fail again now :(
[14:47:49] Yippee, build fixed!
[14:47:50] Project browsertests-Wikidata-SmokeTests-linux-firefox-sauce build #204: FIXED in 30 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-SmokeTests-linux-firefox-sauce/204/
[14:49:40] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #70: FAILURE in 8 min 16 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/70/
[14:51:01] well will use another job so
[14:52:10] 00:00:40.000 Performance: No threshold configured for making the test unstable
[14:52:10] 00:00:40.000 Performance: No threshold configured for making the test failure
[14:52:11] great
[14:52:13] gotta figure that out
[15:07:45] <^demon|away> Ugh what a disgusting patch this has become
[15:15:43] <^d> thcipriani: https://gerrit.wikimedia.org/r/#/c/198783/4/multiversion/checkoutMediaWiki.php makes me shudder
[15:15:46] <^d> It's ugly as sin
[15:18:33] ugly but working is a good thing.
[15:19:05] s/good //
[15:19:36] fair.
[15:20:23] ugly > not automated, how about that?
[15:21:33] +1
[15:21:33] <^d> It also allows us to use the same script (albeit faking the process) as prod
[15:22:53] wunderbar. I rebuilt mw01 last night, I think it's about 1 updateWikiVersions away from serving stuffs.
[15:22:59] staging-mw01 that is
[15:23:38] <^d> Now that I have a working staging directory tin I'm going to attempt to get scap & co working
[15:23:41] <^d> *on tin
[15:24:16] cool beans.
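hashar calls the Gather job flappy: per the log, builds #67 and #68 failed, #69 was fixed, and #70 failed again. One simple way to quantify that is to count pass/fail flips across recent results. This is a Python sketch under that assumption; the threshold is arbitrary and not a Jenkins setting.

```python
def transitions(results):
    """Count how often consecutive build results differ."""
    return sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)


def is_flappy(results, threshold=2):
    # Hypothetical rule: two or more flips in the window counts as flapping.
    return transitions(results) >= threshold


if __name__ == "__main__":
    # Gather builds #67-#70 as reported in this log.
    gather = ["FAILURE", "FAILURE", "SUCCESS", "FAILURE"]
    print(transitions(gather), is_flappy(gather))  # 2 True
```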
[15:24:34] <^d> scap will probably fail without a wiki to build i18n against
[15:24:51] <^d> Maybe doing addWiki first is better?
[15:25:19] shell scripting in php make my eyes burn
[15:25:50] <^d> Nobody's written a git pecl module :p
[15:29:08] Well obviously we should implement git in native php. There is no precedent for that being a truly horrible idea.
[15:29:18] * bd808 glares at Jenkins and Gerrit
[15:33:39] <^d> bd808: You know what's an awesome feature of Phabricator?
[15:33:42] <^d> It doesn't use jgit!
[15:33:52] epic!
[15:35:37] <^d> bd808: yesterday I learned you can't run updateBranchPointers on $deploy_host until you've run sync-common at least once.
[15:36:00] <^d> (and since nothing forces a sync-common run after install, you're stuck going "WHY CAN'T I CHECKOUT MW?!?")
[15:36:22] bootstrapping a $deploy_host is a dark art
[15:37:04] <^d> I wonder if we could make puppet run a sync-common once we've setup /srv/mw-staging/
[15:37:25] <^d> (one less manual bit...)
[15:37:33] There is a puppet trigger to create /srv/mediawiki already isn't there?
[15:37:54] <^d> Create the directory, yeah, but nothing's sync'd to it yet.
[15:38:21] <^d> So when you try to run checkoutMW/updateBranchPointers the first time, it blows up because it can't find the targets.
[15:38:24] https://github.com/wikimedia/operations-puppet/blob/2ebe5a3bbd6e3cb2adac8a31a67aa491bd4aedf2/modules/mediawiki/manifests/scap.pp#L48-L56
[15:38:44] <^d> dafuq?
[15:39:13] Whatever makes /srv/mw-staging on the deploy host should add puppet bits to ensure that it is done before that block
[15:40:11] It would fetch from the scap defined default master
[15:40:22] taken from scap.cfg
[15:51:38] <^d> Hmm, it's created by the git::clone operation in master.pp
[15:54:22] <^d> I don't see a not shitty way to do this, ugh
[16:02:58] ^d: yeah I am glad I am not fighting with setting tin up
[16:03:12] <^d> It's mostly ok.
[16:03:15] ^d we should build another one when this is done just to check.
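bd808's suggestion is an ordering edge: whatever creates the staging directory must be applied before the scap.pp block linked above. In Puppet, the `before` and `require` metaparameters express the same edge from opposite ends. This Python sketch (not Puppet; it uses the standard library's graphlib, so Python 3.9+) models that equivalence, and the resource titles in it are hypothetical.

```python
from graphlib import TopologicalSorter


def apply_order(resources, require=(), before=()):
    """Return an order in which to apply resources.

    require: (resource, dependency) pairs; dependency is applied first.
    before:  (resource, successor) pairs; resource is applied first.
    Both are stored the same way: successor -> set of predecessors.
    """
    graph = {name: set() for name in resources}
    for resource, dependency in require:
        graph[resource].add(dependency)
    for resource, successor in before:
        graph[successor].add(resource)
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    clone = "Git::Clone[staging]"       # hypothetical title
    sync = "Exec[initial sync-common]"  # hypothetical title
    # Both calls put the clone first: the edge is the same either way.
    print(apply_order([clone, sync], before=[(clone, sync)]))
    print(apply_order([clone, sync], require=[(sync, clone)]))
```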
[16:03:28] <^d> There's just $magic_incantations you have to do that aren't quite fully puppetized
[16:03:40] ^d: you can add a "before" clause to the git::clone
[16:04:00] <^d> I didn't know that!
[16:04:09] :)
[16:04:19] Puppet has many mysteries
[16:04:32] <^d> Is it implicit?
[16:04:52] * ^d is used to explicitly defined params
[16:04:55] It's a meta property of everything
[16:05:00] <^d> Ahhhh
[16:05:01] <^d> got it
[16:05:21] https://docs.puppetlabs.com/references/latest/metaparameter.html
[16:06:16] it's the inverse of require
[16:07:15] When require/before aren't magic enough you can do things like this too -- https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/mediawiki/manifests/wiki.pp#L148-L151
[16:07:22] <^d> Patch incoming
[16:08:30] <^d> https://gerrit.wikimedia.org/r/#/c/201209/
[16:09:33] <^d> $magic_incantations--; // yay!
[16:09:42] * thcipriani reads scrollback
[16:09:50] fwiw, I made this: https://gerrit.wikimedia.org/r/#/c/198173/
[16:10:02] for overriding scap.cfg variables in hiera
[16:11:03] * ^d adds bd808 to that
[16:11:39] You can write an /etc/scap.cfg file too. That might actually be better
[16:12:10] https://github.com/wikimedia/mediawiki-tools-scap/blob/master/scap/config.py#L47-L51
[16:12:37] ah, yeah, that is cleaner.
[16:13:47] Imma ruminate on the best way to integrate that.
[16:13:56] * thcipriani ruminates
[16:15:16] We have a custom resource to create ini-files from a hash
[16:16:13] Usage example at -- https://github.com/wikimedia/operations-puppet/blob/production/modules/hhvm/manifests/init.pp#L171
[16:16:46] If you are writing to a local file you shouldn't need sections as I recall
[16:17:29] bd808: nice. I'll tweak my patch a bit. Thanks for pointing these things out :)
[16:17:42] The custom function comes from -- https://github.com/wikimedia/operations-puppet/blob/production/modules/wmflib/lib/puppet/parser/functions/php_ini.rb
[16:18:05] thcipriani: yw. I've soaked up a lot of random puppet lore over the last 2 years :)
[16:18:32] And I wrote a lot of the scap mess too
[16:32:48] greg-g: lost internet
[16:32:50] :(
[16:38:13] PROBLEM - Citoid on deployment-sca01 is CRITICAL: Connection refused
[16:40:36] bd808: Yo. Seeing Phab team project requests for Search-Team and Availability-Team, and https://phabricator.wikimedia.org/project/profile/1141/ (MediaWiki-API-Team) already existing, and being afraid that unexperienced Phab users will assign their tasks to team projects instead of code projects (especially in case of "Search-Team), any idea for a sentence to add to project descs that explains //who// is supposed to add such team projects to
[16:40:36] tasks?
[16:41:21] I don't expect folks to really read project descriptions because they require some clicks to reach now (sigh), but better than nothing for the start. We could still go down the "somehow try to restrict who can add this project" road if needed though that won't be trivial at all
[16:44:29] (oh well, I got to leave soon anyway so I likely won't create those projects today, plus I see they are under "Needs Review/Feedback" on the workboard so maybe this is less urgent than I thought)
[16:44:30] * greg-g points andre__ over to either -core or -devtools ;)
[16:44:43] true that, true that. :)
[16:45:00] sry
[16:45:10] no worries, just so others who care can see
[16:45:12] :)
[16:49:45] (CR) 20after4: [C: 2] Update Wikidata branch [tools/release] - https://gerrit.wikimedia.org/r/201200 (owner: Aude)
[16:49:52] (Merged) jenkins-bot: Update Wikidata branch [tools/release] - https://gerrit.wikimedia.org/r/201200 (owner: Aude)
[16:59:47] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:00:07] andre__: I think at this point most of us are used to re-triaging wrongly associated tasks
[17:01:13] * greg-g nods
[17:02:25] * andre__ sighs but yeah
[17:03:56] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 48175 bytes in 0.632 second response time
[17:04:20] andre__: was in a conf call. reading backscroll now
[17:05:19] ah. I think it's ok if random folks assign to those tags. We can triage and remove them as needed
[17:05:40] Protecting projects seems like a slippery slope
[17:12:15] alright
[17:23:56] greg-g: tuesdays @ 20 UTC ? no slots available sooner than that perhaps?
[17:24:07] (that amounts to 10pm local time for me)
[17:24:43] oh, heh, just PM'd you re that
[17:24:54] yeah, that was when it was just people in US
[17:25:05] * greg-g does fancy gcal fiddling
[17:26:47] moved
[17:27:01] mobrovac: I now can't come at all, but that's fine :)
[17:27:26] ah
[17:30:06] <^d> greg-g: Pre-announced https://lists.wikimedia.org/pipermail/wikitech-l/2015-April/081424.html
[17:30:33] ^d: sweet
[17:34:47] <^d> bd808: Puppet patch worked as intended on staging (and passed puppet compiler), correct order now. Letting puppet finish and then I'll see if updateBranchPointers works out of the box :)
[17:54:06] greg-g, hey
[17:54:32] on the importance of qqq: https://translatewiki.net/wiki/MediaWiki:Flow-topic-action-resummarize-topic/qqq
[17:54:59] [ sorry, it was meant for another tab... but it's relevant for all MediaWiki devs :) ]
[17:58:06] ^d, can we have a task for REL1_25 branching to add blockers to?
[17:59:37] Yippee, build fixed!
[17:59:38] Project browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #495: FIXED in 1 min 26 sec: https://integration.wikimedia.org/ci/job/browsertests-Math-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/495/
[18:05:19] Krenair: use the 1.25 project?
[18:05:43] https://phabricator.wikimedia.org/tag/mw-1.25-release/
[18:06:11] thanks
[18:11:51] that
[18:34:47] (PS1) Krinkle: zuul: Update sample data for demo [integration/docroot] - https://gerrit.wikimedia.org/r/201244
[18:34:57] (CR) Krinkle: [C: 2] zuul: Update sample data for demo [integration/docroot] - https://gerrit.wikimedia.org/r/201244 (owner: Krinkle)
[18:35:01] (Merged) jenkins-bot: zuul: Update sample data for demo [integration/docroot] - https://gerrit.wikimedia.org/r/201244 (owner: Krinkle)
[18:45:10] (PS1) Legoktm: make-release: Don't re-list all bundled extensions for each release [tools/release] - https://gerrit.wikimedia.org/r/201246
[18:56:35] (PS1) Legoktm: make-release: Add option to list all bundled extensions [tools/release] - https://gerrit.wikimedia.org/r/201247
[18:56:37] (PS1) Legoktm: make-release: Unbreak the SMW bundle [tools/release] - https://gerrit.wikimedia.org/r/201248
[18:56:42] (CR) jenkins-bot: [V: -1] make-release: Unbreak the SMW bundle [tools/release] - https://gerrit.wikimedia.org/r/201248 (owner: Legoktm)
[18:56:46] (CR) jenkins-bot: [V: -1] make-release: Add option to list all bundled extensions [tools/release] - https://gerrit.wikimedia.org/r/201247 (owner: Legoktm)
[18:56:54] argh
[18:57:38] (PS2) Legoktm: make-release: Unbreak the SMW bundle [tools/release] - https://gerrit.wikimedia.org/r/201248
[18:57:40] (PS2) Legoktm: make-release: Add option to list all bundled extensions [tools/release] - https://gerrit.wikimedia.org/r/201247
[19:18:33] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:23:07] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 48138 bytes in 0.602 second response time
[19:23:09] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 48438 bytes in 0.945 second response time
[19:58:05] bd808, where's the config for what goes into which fluorine:/a/mw-log/*.log file?
[19:59:05] Krenair: wgDebugLogGroups -- https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L4157
[19:59:35] oh it's in InitialiseSettings, of course
[19:59:36] thanks
[19:59:45] The last part of the udp://... URI is the file name
[19:59:45] looks like more problems with updating l10n: scap failed: CalledProcessError Command 'cp '/srv/mediawiki-staging/php-1.25wmf24/cache/l10n/'*.cdb '/tmp/scap_l10n_1816369030'' returned non-zero exit status 1
[20:00:07] because /srv/mediawiki-staging/php-1.25wmf24/cache/l10n/ is empty
[20:00:17] argh.
[20:00:25] did we not deploy the fix for that?
[20:00:31] or was the fix wrong?
[20:00:39] I don't know...
[20:00:45] * bd808 looks
[20:01:08] we didn't deploy the fix looks like
[20:01:40] https://github.com/wikimedia/mediawiki-tools-scap/commit/aaa8832171ea46caf8e1b833958788b4f97e87c3 isn't on tin
[20:02:04] twentyafterfour: want me to deploy it?
[20:02:07] ok so how do I deploy that again? looks like /srv/deployment/scap/scap is owned by root
[20:02:31] It's a trebuchet deploy
[20:02:49] cd /srv/deployment/scap/scap && git deploy start && git checkout master && git fetch && git rebase origin/master && git deploy sync
[20:02:54] bd808: ^ that?
[20:03:01] yup [20:03:05] I'll try it [20:03:37] error: insufficient permission for adding an object to repository database .git/objects [20:04:13] argh [20:04:21] there are some bad permissions in there [20:04:30] drwxr-sr-x 2 root wikidev 4096 Mar 17 19:59 29/ [20:04:39] drwxr-sr-x 2 root wikidev 4096 Mar 17 19:59 38/ [20:05:01] So we will need a root to chmod g+w on the .git directory [20:09:27] bd808, so I assume logs that aren't listed here are just dropped completely, adding an entry for it will cause the file to be created and logs to start being recorded etc.? [20:09:51] correct on both counts [20:13:13] twentyafterfour: directory setgid bit explained -- https://en.wikipedia.org/wiki/Setuid#setuid_and_setgid_on_directories [20:14:07] I think that when trebuchet is run from Puppet it is not setting the right umask which leads to the bad .git permissions [20:14:19] I'll see if I can track that down [20:52:44] anyone mind doing a trivial review on https://gerrit.wikimedia.org/r/201345 ? wgContentHandlerUseDB is enabled everywhere in prod, but only enwiki in beta [20:55:46] <^d> +2 [20:55:51] ^d: thanks [20:56:05] <^d> Make sure to pull to tin so icinga doesn't complain [20:56:05] <^d> :) [20:56:08] omw :) [20:57:00] unstashed changes .... [21:15:55] <^d> Who's got the puppet stabber? 
[21:15:55] * ^d needs it [21:16:11] <^d> Error: Could not set 'file' on ensure: cannot generate tempfile `/etc/php5/apache2/php.ini20150401-17527-etx0c0-9' at 21:/etc/puppet/modules/mediawiki/manifests/php.pp [21:16:11] <^d> Error: Could not set 'file' on ensure: cannot generate tempfile `/etc/php5/apache2/php.ini20150401-17527-etx0c0-9' at 21:/etc/puppet/modules/mediawiki/manifests/php.pp [21:16:12] <^d> Wrapped exception: [21:16:13] <^d> cannot generate tempfile `/etc/php5/apache2/php.ini20150401-17527-etx0c0-9' [21:16:15] <^d> Error: /Stage[main]/Mediawiki::Php/File[/etc/php5/apache2/php.ini]/ensure: change from absent to file failed: Could not set 'file' on ensure: cannot generate tempfile `/etc/php5/apache2/php.ini20150401-17527-etx0c0-9' at 21:/etc/puppet/modules/mediawiki/manifests/php.pp [21:16:28] <^d> There is no /etc/php5/apache2/ [21:16:30] <^d> ugh [21:16:36] ^d: https://gerrit.wikimedia.org/r/#/c/196773/ [21:16:50] _joe_ told me that was the wrong way to fix it but it was on IRC and I forgot why >_> [21:16:53] ah, I know [21:17:00] because it should depend on the apache php package [21:17:01] and it isn't [21:17:05] because it’s just accidentally there... [21:18:42] Deployment-Systems, Documentation: Document Scap - https://phabricator.wikimedia.org/T94618#1172075 (greg) >>! In T94618#1171945, @bd808 wrote: > If doc updates are enforced in code review then everyone will win. :) You're crazy. (But right) [21:42:02] Continuous-Integration, Fundraising Sprint H, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Write Jenkins job builder definition for CiviCRM CI job - https://phabricator.wikimedia.org/T91895#1172167 (awight) [21:47:14] Deployment-Systems, Documentation: Document Scap - https://phabricator.wikimedia.org/T94618#1172212 (mmodell) you know... if we used differential for code review, doc updates would be automatic. Also... 
I didn't even know doc.wikimedia.org existed [21:52:16] <^d> YuviPanda: I mean I suppose I can work around it easy enough... [21:52:23] <^d> But do we want a patch for this though? [21:55:11] I've been working under the assumption that if it can be solved by running puppet a few times, I haven't patched it for staging. [21:55:31] the mediawiki appserver role does the same sort of thing [21:55:57] <^d> This one isn't solved by running N times [21:56:12] yeah, it isn’t. I think it was solved for staging by basically me cherry-picking that patch on to the master [21:56:19] Deployment-Systems, Documentation: Document Scap - https://phabricator.wikimedia.org/T94618#1172259 (bd808) >>! In T94618#1172212, @mmodell wrote: > you know... if we used differential for code review, doc updates would be automatic. > > Also... I didn't even know doc.wikimedia.org existed https://integ... [21:56:29] ^d: I guess thing to do is to poke _joe_, and see if we can re-jig them to do the ‘right thing’, or put a FIXME and go on... [21:56:46] depends on how many yaks we want to shave, demand for yak hair, and how suspicious the yaks are... [21:57:13] and environmental impact of yak shaving, phase of the moon, and Indiana’s yak bleeding freedom law... [21:57:42] <^d> Ugh environmental impact studies are the worst [22:01:06] omg what is that patch doing...! [22:01:17] <^d> Which one? [22:01:19] * bd808 looks for a trout to slap YuviPanda with [22:01:25] https://gerrit.wikimedia.org/r/#/c/196773/2/modules/mediawiki/manifests/php.pp,unified [22:01:26] I know [22:01:26] I know [22:01:35] it’s a terrible patch [22:02:06] That dir comes from some deb for sure [22:02:17] yeah I even know what deb it is, I think. [22:02:50] actually I don’t, but it’s modphp [22:02:58] so I’ll have to track that down and test them, etc... 
[22:03:44] in my defense it’s been unmerged forever :) [22:03:44] <^d> I'm fine with hacking around this in staging for now as long as there's a task/commit to track it [22:03:58] apt-file search /etc/php5/apache2 [22:04:36] Or I think: dpkg -S /etc/php5/apache2 [22:04:50] I am pretty sure it’s libapache2-mod-php5 [22:04:59] that sounds right [22:05:13] that would be the PHP SAPI for apache [22:05:58] <^d> libapache2-mod-php5: /etc/php5/apache2/conf.d [22:05:58] <^d> libapache2-mod-php5filter: /etc/php5/apache2filter/conf.d [22:06:12] <^d> Which is funny, because I /do/ have an apach2filter directory when puppet fails [22:06:14] <^d> Just not apache2 [22:12:42] (CR) Mattflaschen: "We have a private list, e2 (this might be too spammy for the public list). e2 does go to all team members, not just devs, but they might " [integration/config] - https://gerrit.wikimedia.org/r/201083 (https://phabricator.wikimedia.org/T94152) (owner: Mattflaschen) [22:13:14] ^d: wanna try the same patch but less terrible now on staging or wherever? [22:13:28] <^d> Yeah lemme cherry-pick it in [22:16:35] ^d: actually, I realized that won’t actually work, *because* it didn’t fix itself in multiple puppet runs [22:16:46] ^d: so I guess that package isn’t being installed, actulaly [22:16:56] I remember now, this is why I didn’t do this earlier [22:17:36] <^d> Oh dur, 2 channels [22:17:41] <^d> Anyway, failed for obvious reason [22:18:58] ^d: yeah... [22:19:50] <^d> Why are we creating apache2/php.ini if we don't have apache2modphp5 installed? [22:24:53] ^d: good question. 
I didn’t dig that at all :D [22:24:55] as in [22:24:56] dig into [22:24:59] not the american ‘dig' [22:27:18] <^d> I mean for the deploy master, I'm not sure why you need apache at all [22:27:21] <^d> It's not serving anything [22:28:09] ^d: apache is needed, trebuchet uses it [22:28:17] but yeah, no idea why *php* is needed [22:28:25] <^d> Well PHP is needed [22:29:01] well, apache’s modphp rather [22:29:22] <^d> Oh yeah no all we need is cli [22:30:04] <^d> Actually, I think the fix is to move that config out of that class. It should be where modphp5 is installed, not this. [22:30:14] <^d> This is just cli and config stuff that can depend on php-common [22:30:27] <^d> php-cli, rather [22:32:33] <^d> Or not move it, but hrm [22:32:36] <^d> This freaking sucks [22:35:53] * ^d ragequits puppet for awhile [22:54:12] (CR) Mattflaschen: "Erik also favors individual." [integration/config] - https://gerrit.wikimedia.org/r/201083 (https://phabricator.wikimedia.org/T94152) (owner: Mattflaschen) [23:04:31] James_F: fyi: https://www.mediawiki.org/w/index.php?title=MediaWiki_1.25&diff=1509973&oldid=1502448 [23:04:48] greg-g: OK. [23:05:25] /me pats ^d [23:07:37] greg-g: That's not a Wednesday, BTW. [23:07:50] :-) [23:08:30] it's the tarball release, it's ok [23:08:42] it's the last day of hackathon, we'll sprint it out there [23:08:51] (that, and chad's going on vacation on the 26th) [23:09:32] <^d> hehee :) [23:09:49] * James_F grins. [23:16:21] greg-g: Also, can I move the totally-not-a-canary-wiki zerowiki to group1 where it belongs? [23:16:30] * James_F grumbles. [23:18:05] it's in group0? [23:18:36] James_F: ^ [23:18:54] greg-g: Apaprently. :-( [23:19:01] weeeird [23:19:16] I think yurik was impatient for new code. [23:19:22] usually [23:19:24] But it's not really a good reason to disrupt the train.
[23:23:22] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:43] PROBLEM - SSH on deployment-bastion is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:26:59] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48347 bytes in 0.643 second response time [23:36:32] RECOVERY - SSH on deployment-bastion is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [23:42:43] PROBLEM - SSH on deployment-bastion is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:52:35] RECOVERY - SSH on deployment-bastion is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)