[00:01:07] RECOVERY - SSH on deployment-salt is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[00:14:13] PROBLEM - Puppet failure on deployment-test is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[00:17:35] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[00:19:17] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[00:20:03] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[00:20:11] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[00:20:13] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0]
[00:20:36] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[00:20:50] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[00:21:50] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[00:22:34] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0]
[00:22:44] PROBLEM - Puppet failure on deployment-stream is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[00:23:02] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[00:23:12] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[00:23:16] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0]
[00:23:52] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[00:24:02] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[00:24:38] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[00:27:41] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[00:29:05] PROBLEM - Puppet failure on deployment-db1 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[00:29:17] PROBLEM - Puppet failure on deployment-db2 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[00:29:55] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[00:29:55] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[00:30:25] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[00:30:41] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0]
[00:31:37] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[00:32:18] !log seeing heavy swapping on deployment-salt; puppet processes using 250M+ memory each
[00:32:21] Logged the message, Master
[00:33:15] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [0.0]
[00:34:11] PROBLEM - Puppet failure on deployment-parsoid01-test is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[00:34:49] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0]
[00:37:04] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[00:38:22] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[00:55:47] PROBLEM - Puppet failure on deployment-salt is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[01:09:07] (PS1) Legoktm: Have utils.check_php_opening_tag check the file extension suffix [tools/scap] - https://gerrit.wikimedia.org/r/197460
[01:10:00] (PS2) Legoktm: Have utils.check_php_opening_tag check the file extension suffix [tools/scap] - https://gerrit.wikimedia.org/r/197460
[01:11:24] (CR) BryanDavis: [C: 2] Have utils.check_php_opening_tag check the file extension suffix [tools/scap] - https://gerrit.wikimedia.org/r/197460 (owner: Legoktm)
[01:11:40] (Merged) jenkins-bot: Have utils.check_php_opening_tag check the file extension suffix [tools/scap] - https://gerrit.wikimedia.org/r/197460 (owner: Legoktm)
[01:17:58] (PS1) 20after4: Improved test for content preceeding
(CR) jenkins-bot: [V: -1] Improved test for content preceeding
!log deployment-salt still unresponsive, lots of io wait (94%) + swapping
[01:20:11] Logged the message, Master
[01:21:32] "Attempt to authenticate with the salt master failed" from trebuchet
[01:21:45] marxarelli: puppet gone nuts?
[01:22:08] (PS2) 20after4: Improved test for content preceeding
bd808: not sure
[01:22:13] lots of io wait
[01:22:21] and swapping
[01:22:29] https://tools.wmflabs.org/nagf/?project=deployment-prep#h_deployment-salt_cpu
[01:22:58] puppet is using a crap ton of memory, but i think that's "normal" with the killer ruby 1.8 + WEBrick combo
[01:23:19] I can't get a shell to look
[01:23:23] iostat says about 1M/s reads
[01:23:30] wtf
[01:23:34] on vda1
[01:23:51] what's running?
[01:23:52] ~ 150 iops
[01:24:00] (CR) Legoktm: Improved test for content preceeding
looks like I may get a shell after all...
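The "Puppet failure" alerts above report a percentage of recent datapoints over a threshold of 0.0. Below is a minimal sketch of a check that would produce messages in that shape. It assumes, since the log does not say, that the check counts the non-null datapoints of a per-host puppet failure metric that are strictly above the threshold, and goes CRITICAL at 1% or more (mirroring the "Less than 1.00%" wording of the OK messages). The function names are hypothetical.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    # Assumption: missing samples show up as None and are ignored.
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def check_message(datapoints: Sequence[Optional[float]],
                  threshold: float = 0.0,
                  critical_pct: float = 1.0) -> str:
    """Format a result in the style of the alerts in this log (hypothetical logic)."""
    pct = percent_above(datapoints, threshold)
    if pct >= critical_pct:
        return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"
    return f"OK: Less than {critical_pct:.2f}% above the threshold [{threshold}]"


if __name__ == "__main__":
    # One failing sample out of eight, like deployment-parsoidcache02 at 12.50%.
    print(check_message([0, 0, 0, 0, 0, 0, 0, 1]))
    # Every sample failing, like deployment-salt at 00:55:47.
    print(check_message([1, 1, 1, 1]))
    # No failures at all.
    print(check_message([0, 0, None, 0]))
```

Under this reading, the odd-looking percentages (12.50, 28.57, 66.67) are simply small fractions such as 1/8, 2/7 and 2/3 of the samples in the check's time window.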
[01:24:58] i don't see anything running that shouldn't be
[01:25:18] No giant pile of salt processes or anything?
[01:25:18] just a lot of io (and cpu wait) for some reason
[01:25:45] how many is a pile?
[01:25:55] oh, there are quite a few
[01:25:55] 5 puppetsigner.py crons running
[01:26:02] and salt looks nutso
[01:26:12] ah, ok
[01:26:18] should i restart salt master?
[01:26:27] something weird is going on
[01:26:45] I'd start with those crons for YuviPanda|zzzz's new auto signer
[01:26:53] and then kill the salt-master
[01:27:32] We have seen salt just go completely nutso before
[01:28:30] !log restarting salt master on deployment-salt
[01:28:33] Logged the message, Master
[01:28:40] (PS3) BryanDavis: Improved test for content preceeding
(CR) BryanDavis: Improved test for content preceeding
(CR) Legoktm: "@20after4: We fixed that in Ie1d16423787a25e3c45e77d9447e8e2d51fd0299, to have the function check for the extension suffix." [tools/scap] - https://gerrit.wikimedia.org/r/197462 (https://phabricator.wikimedia.org/T92534) (owner: 20after4)
[01:31:38] (CR) 20after4: "I see, concurrent development. doh!" [tools/scap] - https://gerrit.wikimedia.org/r/197462 (https://phabricator.wikimedia.org/T92534) (owner: 20after4)
[01:32:33] (CR) 20after4: "I still think this is a bit of an improvement to the readability of check_php_opening_tag. Merge or abandon?" [tools/scap] - https://gerrit.wikimedia.org/r/197462 (https://phabricator.wikimedia.org/T92534) (owner: 20after4)
[01:33:14] (CR) Legoktm: "I think the "if '
(PS4) 20after4: Improved test for content preceeding
(CR) jenkins-bot: [V: -1] Improved test for content preceeding
(CR) 20after4: [C: 1] "removed the check for if '
Continuous-Integration: Jenkins: Create shell wrapper to setup MySQL database - https://phabricator.wikimedia.org/T57788#1127243 (Krinkle) p:Low>High
[01:39:06] (PS5) 20after4: Improved test for content preceeding
(CR) jenkins-bot: [V: -1] Improved test for content preceeding
Continuous-Integration: Jenkins: Create shell wrapper to setup MySQL database - https://phabricator.wikimedia.org/T57788#643272 (Krinkle) Raising priority per https://phabricator.wikimedia.org/T37912#1078582 and because SQLite is continuing to be unstable and cause critical problems and failing jobs blocking...
[01:40:33] (CR) 20after4: "well without that check, it fails some tests." [tools/scap] - https://gerrit.wikimedia.org/r/197462 (owner: 20after4)
[01:45:47] !log kill 9'd puppetmaster processes on deployment-salt after repeated attempts to stop
[01:45:51] Logged the message, Master
[01:48:04] !log memory usage, swap, io wait seem to be back to normal on deployment-salt after kill/start of puppetmaster
[01:48:07] Logged the message, Master
[01:49:19] bd808: guess it was puppet master.
the processes were all in D state and one was going crazy
[01:53:38] (PS6) 20after4: Improved test for content preceeding
(CR) 20after4: [C: 2] Improved test for content preceeding
(Merged) jenkins-bot: Improved test for content preceeding
RECOVERY - Puppet failure on deployment-videoscaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:59:41] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[02:02:33] !log Updated scap to I58e817b (Improved test for content preceeding
Logged the message, Master
[02:02:43] RECOVERY - Puppet failure on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[02:03:09] RECOVERY - Puppet failure on deployment-apertium01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:03:09] RECOVERY - Puppet failure on deployment-parsoid05 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:03:21] RECOVERY - Puppet failure on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:03:53] RECOVERY - Puppet failure on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:04:09] RECOVERY - Puppet failure on deployment-kafka02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:04:21] RECOVERY - Puppet failure on deployment-db2 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:04:56] RECOVERY - Puppet failure on deployment-cxserver03 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:07:42] RECOVERY - Puppet failure on deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:08:22] RECOVERY - Puppet failure on deployment-mediawiki03 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:09:08] RECOVERY - Puppet failure on deployment-parsoid01-test is OK: OK: Less than 1.00% above the threshold [0.0]
[02:09:22] RECOVERY - Puppet failure on deployment-parsoidcache02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:09:48] RECOVERY - Puppet failure on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:10:31] RECOVERY - Puppet failure on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:11:59] RECOVERY - Puppet failure on deployment-memc03 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:13:30] RECOVERY - Puppet failure on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:15:05] RECOVERY - Puppet failure on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:15:33] RECOVERY - Puppet failure on deployment-jobrunner01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:15:45] RECOVERY - Puppet failure on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:15:45] RECOVERY - Puppet failure on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:16:33] RECOVERY - Puppet failure on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:19:12] RECOVERY - Puppet failure on deployment-test is OK: OK: Less than 1.00% above the threshold [0.0]
[02:19:58] RECOVERY - Puppet failure on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:39:27] 1783 Invalid parameter for message "logentry-massmessage-failure": a:1:{i:0;s:13:"edit-conflict";} in /srv/mediawiki/php-1.25wmf20/includes/Message.php on line 1007
[03:39:29] legoktm
[03:41:09] Bunch of different indexes for this as well:
[03:41:09] 5 Undefined index: 4 in /srv/mediawiki/php-1.25wmf21/extensions/BounceHandler/includes/ProcessBounceEmails.php on line 121
[03:46:44] Yippee, build fixed!
[03:46:45] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #377: FIXED in 39 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/377/
[04:01:25] Yippee, build fixed!
[04:01:26] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce build #541: FIXED in 30 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-chrome-sauce/541/
[04:39:53] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #505: FAILURE in 4 min 55 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/505/
[04:49:23] Continuous-Integration, Labs, operations: Evaluate options to make puppet errors more visible - https://phabricator.wikimedia.org/T92710#1127424 (Krinkle) >>! In T92710#1118969, @scfc wrote: > Looking at http://shinken.wmflabs.org/service/integration-slave1402/Puppet%20failure, shinken seems to have no...
[05:05:34] Krenair: I have no idea what that error message means :/
[05:18:17] RECOVERY - Puppet failure on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:10:02] Yippee, build fixed!
[06:10:02] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #372: FIXED in 42 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/372/
[06:55:02] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:00:18] RECOVERY - Puppet failure on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:15:51] Staging, Patch-For-Review: Create staging-sca* (Service Cluster A, currently hosting apertium, mathoid, citoid and cxserver) - https://phabricator.wikimedia.org/T91554#1127552 (yuvipanda) This basically works now, including re-creations :D
[08:47:19] (CR) Gilles: "This is trivial and I could run jjb on it myself, but I don't have +2 here :)" [integration/config] - https://gerrit.wikimedia.org/r/197291 (owner: Gilles)
[08:58:58] good morning
[08:59:21] (PS2) Hashar: Add multimedia alerts list to UW tests [integration/config] - https://gerrit.wikimedia.org/r/197291 (owner: Gilles)
[08:59:29] hashar: morning!
[08:59:36] I will be 5 minutes late, have to finish something
[08:59:38] (CR) Hashar: [C: 2] "Here is the +2 I havent run JJB though :(" [integration/config] - https://gerrit.wikimedia.org/r/197291 (owner: Gilles)
[08:59:55] gi11es: bonjour! I will run JJB on that patch
[09:00:04] cool, thank you
[09:00:08] zeljkof: sure, take your time
[09:02:04] (CR) Hashar: "Finally I have updated the jobs :-D" [integration/config] - https://gerrit.wikimedia.org/r/197291 (owner: Gilles)
[09:04:09] (Merged) jenkins-bot: Add multimedia alerts list to UW tests [integration/config] - https://gerrit.wikimedia.org/r/197291 (owner: Gilles)
[09:09:11] hashar: coming to the hangout
[09:18:24] am I allowed to enable wgDebugQueries etc. on deployment-prep temporarily to debug https://phabricator.wikimedia.org/T92232 ?
[10:24:47] RECOVERY - Puppet failure on deployment-salt is OK: OK: Less than 1.00% above the threshold [0.0]
[11:13:31] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[11:43:34] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[12:36:13] Continuous-Integration, operations, Blocked-on-Operations: Jenkins is using php-luasandbox 1.9-1 for zend unit tests; precise should be upgraded to 2.0-7+wmf2.1 or equivalent - https://phabricator.wikimedia.org/T88798#1128277 (akosiaris) a:akosiaris
[13:38:59] <^d> YuviPandaaaaaaaaaaaaa
[13:39:17] * ^d pokes about hiera of ES
[13:39:42] y’know that doesn’t actually alert me
[13:39:55] but I do seem to neurotically scan all channels anyway...
[13:39:57] >_>
[13:41:32] <^d> YuviPanda: I figured the latter would happen :p
[13:42:26] welcome, mobrovac
[13:42:34] grazie!
[13:42:40] mobrovac: ^d thcipriani|afk and twentyafterfour are also able to help with betacluster stuff :
[13:43:02] very good (for me)
[13:43:04] :P
[14:23:04] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 85.71% of data above the critical threshold [0.0]
[14:24:02] that's me
[14:24:29] dunno why this msg, no failures are given to me on the host
[14:25:10] in fact, i'm still trying to get to step 6 in https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Converting_a_host_to_use_local_puppetmaster_and_salt_master
[14:25:54] mobrovac: hmm, I don’t know who wrote that, that part is somewhat outdated
[14:26:03] yey :)
[14:26:04] cool
[14:26:05] ok
[14:26:10] mobrovac: that entire step can now be replaced with just ‘oh, run puppet a few times and wait'
[14:26:20] * YuviPanda edits
[14:26:48] well, it's been running in a while loop for 20 mins now, still no failures
[14:26:58] mobrovac: updated.
[14:27:00] and puppet ca list gives no hosts
[14:27:05] ok, lemme check
[14:27:40] ah, that's easy :)
[14:30:20] YuviPanda: hm, i've enabled the restbase and cassandra roles for the host, and i see the git repo has been synced, but no cassandra
[14:31:09] * YuviPanda wonders how he got involved in so many things :)
[14:31:24] hehe
[14:31:27] mobrovac: I see cassandra, etc specified in the motd
[14:31:34] ah no sorry, cassandra is there, it just failed
[14:31:38] :)
[14:31:45] yeah, I suspect so
[14:32:03] so usually this is where you go and refactor puppet code so that it works on multiple environments :)
[14:32:15] hashar: Given https://gerrit.wikimedia.org/r/#/c/194992/ is merged do we need to +2 and deploy https://gerrit.wikimedia.org/r/#/c/194990/ ?
[14:32:18] eh that was my next question
[14:32:45] YuviPanda: ok, thnx will try to figure something out in ops/puppet
[14:33:13] YuviPanda: btw, is there a place in hiera there to place betacluster-specific configs?
[14:33:55] mobrovac: yeah, there’s hieradata/labs/deployment-prep.yaml, and also wikitech.wikimedia.org/wiki/Hiera:deployment-prep
[14:34:10] mobrovac: basically your role should work fine in prod and beta, and you should be able to use hiera for anything that differs
[14:34:32] perfect, because i surely need to adjust the peers for cassandra
[14:34:38] thnx YuviPanda :)
[14:34:44] :)
[14:38:04] RECOVERY - Puppet failure on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:44:03] Staging, Patch-For-Review: Create staging-mx (Mail server, pollonium replacement) - https://phabricator.wikimedia.org/T91562#1128676 (thcipriani) Open>Resolved
[14:44:04] Staging: Create staging cluster (tracking) - https://phabricator.wikimedia.org/T88702#1128677 (thcipriani)
[14:47:48] PROBLEM - Host deployment-restbase02 is DOWN: CRITICAL - Host Unreachable (10.68.17.189)
[14:50:49] PROBLEM - Host deployment-restbase01 is DOWN: CRITICAL - Host Unreachable (10.68.17.220)
[14:58:20] RECOVERY - Host deployment-restbase01 is UP: PING OK - Packet loss = 0%, RTA = 0.72 ms
[14:59:08] (PS4) Legoktm: Don't ignore l10n-bot in gate-and-submit pipeline [integration/config] - https://gerrit.wikimedia.org/r/194990 (https://phabricator.wikimedia.org/T91707)
[15:00:08] (CR) Legoktm: [C: 2] Don't ignore l10n-bot in gate-and-submit pipeline [integration/config] - https://gerrit.wikimedia.org/r/194990 (https://phabricator.wikimedia.org/T91707) (owner: Legoktm)
[15:01:24] (Merged) jenkins-bot: Don't ignore l10n-bot in gate-and-submit pipeline [integration/config] - https://gerrit.wikimedia.org/r/194990 (https://phabricator.wikimedia.org/T91707) (owner: Legoktm)
[15:06:33] !log deployed https://gerrit.wikimedia.org/r/194990
[15:06:36] Logged the message, Master
[15:09:16] legoktm: seems l10nbot is not ignored by the test pipeline :D
[15:09:26] I am writing a test
[15:10:19] hashar: I don't see it creating jobs for https://gerrit.wikimedia.org/r/#/q/owner:%22L10n-bot+%253Cl10n-bot%2540translatewiki.net%253E%22,n,z
[15:10:41] oh, are you pairing with raymond right now so ?
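YuviPanda's advice earlier in this exchange (keep the role identical in production and beta, and put only what differs, such as the Cassandra peers, into hieradata/labs/deployment-prep.yaml or Hiera:deployment-prep) relies on Hiera returning the value from the most specific data source that defines a key. The following is a minimal Python model of that first-match ("priority") lookup. The key names and values are hypothetical, and the real hierarchy order is whatever the puppetmaster's Hiera configuration defines.

```python
from typing import Any, Mapping, Sequence

_MISSING = object()


def hiera_lookup(key: str,
                 layers: Sequence[Mapping[str, Any]],
                 default: Any = _MISSING) -> Any:
    """Return the value from the first (most specific) layer that defines `key`."""
    for layer in layers:
        if key in layer:
            return layer[key]
    if default is _MISSING:
        raise KeyError(f"{key} not found in any hierarchy level")
    return default


# Hypothetical data: a shared default plus a per-project override for beta.
common = {"example::cassandra_seeds": ["prod-cassandra-a", "prod-cassandra-b"],
          "example::cluster_name": "restbase"}
deployment_prep = {"example::cassandra_seeds": ["deployment-restbase01",
                                                "deployment-restbase02"]}

# Most specific level first, e.g. a labs/<project> file before common data.
beta_hierarchy = [deployment_prep, common]
prod_hierarchy = [common]

if __name__ == "__main__":
    print(hiera_lookup("example::cassandra_seeds", beta_hierarchy))
    print(hiera_lookup("example::cassandra_seeds", prod_hierarchy))
    # Not overridden for beta, so it falls through to the common value.
    print(hiera_lookup("example::cluster_name", beta_hierarchy))
```

The role code only ever asks for the key. Which value it gets depends solely on the hierarchy the host is evaluated against, so the same role works in production and on the Beta Cluster.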
[15:11:04] maybe it just gets ignored already
[15:11:21] I'm not
[15:11:29] I just saw him running the script
[15:11:43] heh, the gate-and-submit queue is pretty big right now
[15:12:56] yeah
[15:13:00] but processed in parallel
[15:13:03] Release-Engineering: Add mediawiki-ruby-api repository to Release Engineering GitHub team - https://phabricator.wikimedia.org/T93080#1128811 (zeljkofilipin) NEW
[15:13:15] though zuul might have a window limiting the number of changes being run in the gate
[15:13:55] I have no clue who could resolve the above bug ^
[15:14:21] I have tried adding github tag/project in phab, but looks like there is nothing like that
[15:15:03] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[15:15:05] zeljkof: i can
[15:15:14] hashar: great
[15:20:14] !log setting gallium # of executors from 5 back to 3. When jobs run on it, they slow down the zuul scheduler and merger!
[15:20:17] Logged the message, Master
[15:21:01] hashar: only problem is that if one fails, the whole queue has to be retried...
[15:22:30] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0]
[15:24:02] legoktm: yeah :)
[15:24:13] legoktm: cause patches behind might fail because of the change ahead that failed
[15:24:34] zuul limits the number of jobs in the pipeline
[15:24:56] http://ci.openstack.org/zuul/zuul.html#pipelines look for "window"
[15:24:57] hashar: I think that's more of an edge case...most extensions in this case don't depend upon other extensions
[15:28:21] (PS1) Aude: Update Wikidata branch to wmf/1.25wmf22 [tools/release] - https://gerrit.wikimedia.org/r/197633
[15:31:08] Beta-Cluster: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences lacks normal styling - https://phabricator.wikimedia.org/T93050#1128868 (greg) p:Triage>Normal
[15:32:05] Beta-Cluster: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences lacks normal styling - https://phabricator.wikimedia.org/T93050#1127937 (greg) The only thing I can think of is if a #BetaFeatures is causing this.
[15:34:21] zeljkof: github wikimedia org has 'releng' and 'release engineering' teams!
[15:34:44] hashar: is that good or bad? ;)
[15:34:49] Release-Engineering: Add mediawiki-ruby-api repository to Release Engineering GitHub team - https://phabricator.wikimedia.org/T93080#1128876 (hashar) Looking at https://github.com/orgs/wikimedia/teams has 'releng' and 'release engineering' teams!
[15:34:50] feel free to merge them
[15:35:12] Daniel and Brion are in one https://github.com/orgs/wikimedia/teams/releng
[15:35:18] you are in the other https://github.com/orgs/wikimedia/teams/release-engineering
[15:35:29] I can not even open this one https://github.com/orgs/wikimedia/teams/releng
[15:35:32] heh
[15:35:37] hashar: is there a reason mwext-Foo-testextension-zend jobs have a node: productionSlaves set?
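hashar's point that Zuul bounds how many changes the gate tests at once (the "window" setting in the linked Zuul documentation), together with legoktm's concern that one failure forces the changes behind it to be retried, can be illustrated with a small model. This is a minimal sketch, not Zuul's code. The class, its fields and their values are hypothetical: the window grows by a fixed step after each merge and is divided after each failure, never dropping below a floor. Consult the Zuul documentation for the real option names and defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateWindow:
    """Hypothetical model of a dependent pipeline's active window."""
    size: int = 20             # changes tested concurrently (illustrative value)
    floor: int = 3             # the window never shrinks below this
    increase_step: int = 1     # linear growth after a change merges
    decrease_divisor: int = 2  # multiplicative shrink after a failure

    def on_merge(self) -> None:
        self.size += self.increase_step

    def on_failure(self) -> None:
        self.size = max(self.floor, self.size // self.decrease_divisor)


def active_changes(queue: List[str], window: GateWindow) -> List[str]:
    """Only the first `window.size` queued changes get jobs started."""
    return queue[:window.size]


def changes_to_retest(queue: List[str], failed: str) -> List[str]:
    """Each change is tested on top of the ones ahead of it, so every
    change behind a failed one must be re-run without it."""
    return queue[queue.index(failed) + 1:]


if __name__ == "__main__":
    queue = [f"change-{n}" for n in range(1, 31)]
    window = GateWindow()
    print(len(active_changes(queue, window)))        # 20
    window.on_failure()
    print(window.size)                               # 10
    print(changes_to_retest(queue[:5], "change-3"))  # ['change-4', 'change-5']
```

Shrinking the window after a failure limits how much work is thrown away when a change near the head of the queue breaks. That is the cost legoktm describes, and it is largely wasted effort when, as he notes, most of the queued extensions do not depend on one another.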
[15:35:37] i didn’t even know i was in that :D
[15:35:52] hashar: I do not know what to do
[15:36:00] hashar: maybe change my team to ruby?
[15:36:03] and just put ruby gems there?
[15:37:48] hashar: MySQL support. I'd like to evaluate it today and implement (if feasible for one guy in < 2 days) this week.
[15:38:21] Beta-Cluster, MediaWiki-ResourceLoader: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences lacks normal styling - https://phabricator.wikimedia.org/T93050#1128882 (Jdforrester-WMF)
[15:38:22] hashar: SQLite issues have reached their limit. We have to either 1) backport precise sqlite, 2) switch to MySQL, or 3) change ResourceLoader to use objectcache instead of dedicated busy tables
[15:38:45] I've been blocked for a month, mediawiki core team has done a lot but it's on us to finish it now
[15:39:36] Beta-Cluster, MediaWiki-ResourceLoader: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences lacks normal styling - https://phabricator.wikimedia.org/T93050#1127937 (Jdforrester-WMF) This feels like it might possibly be caused by the recent ResourceLoader change by @ori to resequence module load...
[15:39:52] Release-Engineering: Add mediawiki-ruby-api repository to Release Engineering GitHub team - https://phabricator.wikimedia.org/T93080#1128886 (zeljkofilipin) I do not even see releng team. Feel free to rename my team to ruby and just add the ruby gems there.
[15:41:06] hashar, legoktm: Putting the i18n jobs through Jenkins is awesome. Thanks so much.
[15:41:10] Release-Engineering: Add mediawiki-ruby-api repository to Release Engineering GitHub team - https://phabricator.wikimedia.org/T93080#1128894 (hashar) Open>Resolved a:hashar I have deleted the github 'releng' team and added the requested repo to https://github.com/orgs/wikimedia/teams/release-engineer...
[15:41:18] :)
[15:41:34] We have a Release Engineering GitHub team?
[15:41:43] who knew
[15:41:46] :)
[15:41:55] Ah, the RelEng team at GitHub :P
[15:42:04] as represented in its virtual form
[15:42:05] :D
[15:42:20] that is basically to grant rights to close pull requests
[15:42:21] https://github.com/orgs/wikimedia/teams/release-engineering is 404
[15:42:28] Yeah, I get it
[15:42:45] Krinkle: for mysql, awight wrote a bunch of shell helpers to setup/teardown a mysql database. He had the use case to run the CiviCRM tests.
[15:43:13] hashar: Yeah, but for our main jenkins jobs.
[15:43:29] Release-Engineering: Add mediawiki-ruby-api repository to Release Engineering GitHub team - https://phabricator.wikimedia.org/T93080#1128898 (zeljkofilipin) Thanks! :)
[15:43:34] hashar: What would it need beyond a CREATE DATABASE with a build tag or something like that as database name and a drop db in mw-teardown?
[15:43:50] The installer would be adapted to make mediawiki popular the mysql database of course
[15:43:56] populate*
[15:43:58] oh, maybe you can't see the teams if you aren't in the org (I'm not)
[15:44:14] greg-g: Yeah, the link works for me
[15:44:20] GitHub is very paranoid about visibility.
[15:44:25] They use 404 for everything
[15:44:46] huh
[15:49:16] Deployment-Systems, Staging, operations, Puppet: provider => trebuchet doesn't work until manual 'git deploy start' on deployment-server - https://phabricator.wikimedia.org/T92978#1128917 (greg) p:Triage>Normal
[15:50:08] Deployment-Systems, MediaWiki-Core-Team, Patch-For-Review: Can't update l10n cache - https://phabricator.wikimedia.org/T92900#1128921 (greg) I presume this is done/fixed now?
[15:50:47] Deployment-Systems, Release-Engineering, Wikimedia-Hackathon-2015, HHVM: HHVM RepoAuthoritative Hackathon proof of concept - https://phabricator.wikimedia.org/T91074#1128926 (greg) p:Triage>Low
[15:51:10] Krinkle: hey! have you seen https://github.com/wikimedia/operations-puppet/blob/production/nodes/labs/staging.yaml? :)
[15:51:22] interesting
[15:51:25] setting something like that up for integration would allow you to delete / recreate instances easily
[15:51:28] Deployment-Systems: Expose php warnings in mediawiki-config more visibly - https://phabricator.wikimedia.org/T87447#1128929 (greg) p:Triage>Normal
[15:51:42] Deployment-Systems, Release-Engineering, Documentation: update wikitech trebuchet instructions which still mention deployment::target - https://phabricator.wikimedia.org/T90571#1128931 (greg) p:Triage>Normal
[15:51:50] Krinkle: and that + hiera should basically allow you to not depend on wikitech...
[15:52:13] YuviPanda: In 12 days we'll find out, that's when I'm gonna nuke our slaves again and re-create
[15:52:50] Krinkle: :) cool. I’ve already enabled the enc for integration (https://github.com/wikimedia/operations-puppet/blob/production/nodes/labs/integration.yaml)
[15:53:01] Thanks
[15:53:13] down with wikitech wiki!
[15:53:16] or something
[15:53:36] YuviPanda: So the first lines of https://wikitech.wikimedia.org/wiki/Hiera:Integration are redundant
[15:53:37] ?
[15:53:39] greg-g: wikitech wiki itself is fine. ticking checkboxes, on the other hand, is for managers….
[15:53:40] or even all?
[15:53:46] YuviPanda: zing!
[15:54:13] Krinkle: no, they aren’t yet, because of a bug that prevents the ops yaml.file from not being used at all until the puppetmaster is set appropriately...
[15:54:20] Krinkle: so that’s needed, but I suspect that’ll be fixed before 12 days
[15:54:27] OK
[15:54:30] Either is fine
[15:55:53] Krinkle: yeah, so for now just the puppetmaster stuff needs to stay on Hiera:Integration
[15:57:18] Beta-Cluster, MediaWiki-ResourceLoader: http://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences lacks normal styling - https://phabricator.wikimedia.org/T93050#1128952 (Krinkle) These tab styles are provided by `mediawiki.special.preferences.less` from the Vector skin, added to the `mediawiki.speci...
[15:57:19] YuviPanda: what is the fix for that, out of curiosity?
[15:57:20] RECOVERY - Puppet failure on deployment-pdf02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:57:30] RECOVERY - Puppet failure on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:57:54] thcipriani: so fix for that is to make the labs general puppetmaster (virt1000) use yaml+ldap ENC :D
[15:58:14] thcipriani: so that involves converting the script into a small service so it doesn’t load files and start a new LDAP connection for every request...
[16:00:34] YuviPanda: makes sense. I _thought_ I saw something about Horizon in a phab ticket somewhere, got confused.
[16:01:02] thcipriani: yeah, horizon should also help fix this, but I guess the ENC will just be a service that horizon calls out to...
[16:02:23] legoktm: AssertionError: l10-bot should not enter check-voter pipeline
[16:02:24] :D
[16:03:58] woot
[16:05:33] RECOVERY - Puppet failure on deployment-sentry2 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:06:10] (PS2) Awight: Drop jslint jobs for fundraising CRM submodules [integration/config] - https://gerrit.wikimedia.org/r/196847 (https://phabricator.wikimedia.org/T91895)
[16:06:17] (PS14) Awight: Jenkins job builder definition for CRM job [integration/config] - https://gerrit.wikimedia.org/r/195063 (https://phabricator.wikimedia.org/T91895)
[16:14:44] Quality-Assurance, MediaWiki-extensions-Sentry, Multimedia, Multimedia-Sprint-2015-03-18, Patch-For-Review: Automated tests for Sentry error logging - https://phabricator.wikimedia.org/T88078#1129086 (Gilles)
[16:17:33] greg-g: magic review queue in Gerrit : https://gerrit.wikimedia.org/r/#/q/is:open+reviewer:self+label:Code-Review%253D0%252Cuser%253Dself,n,z :D
[16:17:48] hashar: So do you suspect there is more to it other than 1) ensuring mysql is installed and accessible by apache/jenkins-deploy, 2) create db in setup from unique id, 3) delete in teardown
[16:18:01] in conf call with greg
[16:18:12] I'm just checking if I'm missing something before I go about it and hit something you already know won't work that easy :) Appreciate your knowledge!
[16:28:58] Krinkle: na that is about it I think
[16:29:11] Krinkle: much like the scripts that prepare and teardown the sqlite environment
[16:29:36] you can get the user / pass / dbname to be forged out of some build unique id
[16:30:01] Continuous-Integration: Jenkins: Create shell wrapper to setup MySQL database - https://phabricator.wikimedia.org/T57788#1129166 (Krinkle) a:Krinkle
[16:31:57] Continuous-Integration: Jenkins: Create shell wrapper to setup MySQL database - https://phabricator.wikimedia.org/T57788#643272 (Krinkle)
[16:31:58] Continuous-Integration, Scrum-of-Scrums, Blocked-on-MediaWiki-Core: MediaWiki installs in Jenkins frequently fail to access their sqlite database due to locks - https://phabricator.wikimedia.org/T89180#1129169 (Krinkle)
[16:32:37] Continuous-Integration, Scrum-of-Scrums, Blocked-on-MediaWiki-Core: Jenkins jobs using MediaWiki frequently fail due to database locks - https://phabricator.wikimedia.org/T89180#1129171 (Krinkle)
[16:32:46] (PS1) Hashar: check-voter ignore l10-bot + tests [integration/config] - https://gerrit.wikimedia.org/r/197649
[16:33:10] Continuous-Integration: Jenkins jobs using MediaWiki frequently fail due to database locks - https://phabricator.wikimedia.org/T89180#1029348 (Krinkle)
[16:33:16] Continuous-Integration: Jenkins jobs using MediaWiki frequently fail due to database locks - https://phabricator.wikimedia.org/T89180#1029348 (Krinkle)
[16:33:22] (CR) Hashar: "And we forgot the check-voter pipeline. Being done by https://gerrit.wikimedia.org/r/#/c/197649/ which also adds some basic tests covering" [integration/config] - https://gerrit.wikimedia.org/r/194990 (https://phabricator.wikimedia.org/T91707) (owner: Legoktm)
[16:34:45] (CR) Legoktm: [C: 1] check-voter ignore l10-bot + tests [integration/config] - https://gerrit.wikimedia.org/r/197649 (owner: Hashar)
[16:35:25] Continuous-Integration: Jenkins jobs using MediaWiki frequently fail due to database locks - https://phabricator.wikimedia.org/T89180#1129188 (Krinkle) Making MediaWiki core work properly with current versions of SQLite is not a Continuous Integration goal. That is for MediaWiki core to prioritise accordingl...
[16:36:49] Continuous-Integration: Jenkins jobs using MediaWiki frequently fail due to database locks - https://phabricator.wikimedia.org/T89180#1029348 (Krinkle)
[16:43:25] (PS1) Legoktm: Add few extension + skin jobs: [integration/config] - https://gerrit.wikimedia.org/r/197651
[17:01:43] (PS1) Krinkle: Set up npm-test for mediawiki/extensions/Buggy [integration/config] - https://gerrit.wikimedia.org/r/197660
[17:17:09] Beta-Cluster, RESTBase, Patch-For-Review: Update / maintain Beta Cluster restbase cluster - https://phabricator.wikimedia.org/T91102#1129353 (mobrovac)
[17:19:44] Beta-Cluster, RESTBase, Patch-For-Review: Update / maintain Beta Cluster restbase cluster - https://phabricator.wikimedia.org/T91102#1129371 (mobrovac) Once the patch is merged we can test it and make sure RB and Cassandra work in the VMs. RB's config has been changed so that it uses the Parsoid insta...
[17:20:25] (CR) Legoktm: [C: 2] Add few extension + skin jobs: [integration/config] - https://gerrit.wikimedia.org/r/197651 (owner: Legoktm)
[17:20:46] Beta-Cluster, RESTBase, Patch-For-Review: Update / maintain Beta Cluster restbase cluster - https://phabricator.wikimedia.org/T91102#1129383 (mobrovac)
[17:25:24] (Merged) jenkins-bot: Add few extension + skin jobs: [integration/config] - https://gerrit.wikimedia.org/r/197651 (owner: Legoktm)
[17:27:26] !log deployed https://gerrit.wikimedia.org/r/197651
[17:27:31] Logged the message, Master
[17:31:01] greg-g: we don't have any SoS blockers, right?
[17:31:17] our tasked, blocked by others that is
[17:31:21] *task*
[17:32:31] marxarelli: not really, but it's worth a ping to Ops to talk with Antoine re CI Isolation arch
[17:32:36] lemme find the task
[17:33:12] (Abandoned) Krinkle: [WIP] Rewrite beta-update-databases-eqiad jobs as one [integration/config] - https://gerrit.wikimedia.org/r/186559 (owner: Krinkle)
[17:33:13] marxarelli: https://phabricator.wikimedia.org/T92324
[17:33:38] there we go, just added the right #projects to it
[17:35:12] Continuous-Integration: Why are the language screenshot tests stalled by so long? - https://phabricator.wikimedia.org/T89178#1129403 (zeljkofilipin) This should not happen any more, since all browsertests* Jenkins jobs are now aborted if they are not finished in 3 hours. For more information see T92275. Plea...
[17:35:25] Continuous-Integration: Why are the language screenshot tests stalled by so long? - https://phabricator.wikimedia.org/T89178#1129409 (zeljkofilipin) Open>Resolved
[17:37:06] zuul stuck?
[17:37:51] maybe....
[17:38:21] maybe not
[17:38:21] un-stuck!
[17:38:23] :)
[17:38:30] no idea
[17:38:53] greg-g: you were going to start an email thread between a lot of people :D
[17:39:15] YuviPanda: gah.... remind me
[17:39:21] * YuviPanda reminds greg-g
[17:39:28] subject?
[17:39:34] context?
[17:39:40] * greg-g is blanking
[17:39:46] greg-g: oh, right.
[17:39:55] greg-g: ops commitment to beta / staging for next quarter
[17:39:59] ahhhh
[17:40:12] I have another one I want to start instead now :)
[17:40:16] was going to be a meeting, and then you were good manager and asked ‘can it be an email thread instead’ :D
[17:40:16] re ops support
[17:40:22] sure! :)
[17:40:45] s:beta/staging:deployment tooling: basically
[17:46:58] (PS1) Legoktm: Make mwext-WikivoteMapsYandex-testextension.* non-voting, requires composer [integration/config] - https://gerrit.wikimedia.org/r/197674
[17:47:46] Beta-Cluster, RESTBase, Patch-For-Review: Update / maintain Beta Cluster restbase cluster - https://phabricator.wikimedia.org/T91102#1129460 (mobrovac) For hooking up the extension, the pertinent files are [InitialiseSettings-labs.php](https://github.com/wikimedia/operations-mediawiki-config/blob/mast...
[17:49:08] mobrovac: you can test patches by cherry-picking them on to the deployment-salt puppetmaster!
[17:49:22] mobrovac: ssh to deployment-salt.eqiad.wmflabs, /var/lib/git/operations/puppet, and cherrypick your patch :)
[17:49:32] ah right
[17:49:35] YuviPanda: thnx
[17:49:39] (PS1) Legoktm: Make mwext-Cargo-jslint non-voting [integration/config] - https://gerrit.wikimedia.org/r/197675
[17:49:52] mobrovac: yw :) so usually you just test it there, and tweak it till it works, and then merge...
[17:50:09] (CR) Legoktm: [C: 2] Make mwext-WikivoteMapsYandex-testextension.* non-voting, requires composer [integration/config] - https://gerrit.wikimedia.org/r/197674 (owner: Legoktm)
[17:50:14] (CR) Legoktm: [C: 2] Make mwext-Cargo-jslint non-voting [integration/config] - https://gerrit.wikimedia.org/r/197675 (owner: Legoktm)
[17:50:17] YuviPanda: you mean send it for review :P
[17:50:32] potato, box-of-lizards. same thing.
[17:50:48] :P
[17:51:19] (Merged) jenkins-bot: Make mwext-WikivoteMapsYandex-testextension.* non-voting, requires composer [integration/config] - https://gerrit.wikimedia.org/r/197674 (owner: Legoktm)
[17:51:35] (Merged) jenkins-bot: Make mwext-Cargo-jslint non-voting [integration/config] - https://gerrit.wikimedia.org/r/197675 (owner: Legoktm)
[17:52:22] chrismcmahon: do you know the status of https://phabricator.wikimedia.org/T84956 ?
[17:52:25] !log deployed https://gerrit.wikimedia.org/r/197674 and https://gerrit.wikimedia.org/r/197675
[17:52:28] Logged the message, Master
[17:52:33] chrismcmahon: SoS status
[17:53:07] i don't remember anyone taking ownership in our weekly meeting :)
[17:54:58] (CR) 20after4: [C: 2] Update Wikidata branch to wmf/1.25wmf22 [tools/release] - https://gerrit.wikimedia.org/r/197633 (owner: Aude)
[17:55:04] (Merged) jenkins-bot: Update Wikidata branch to wmf/1.25wmf22 [tools/release] - https://gerrit.wikimedia.org/r/197633 (owner: Aude)
[17:55:30] Continuous-Integration, Collaboration-Team, Flow, Documentation: Generate Doxygen documentation for Flow PHP classes to doc.wikimedia.org - https://phabricator.wikimedia.org/T93107#1129505 (Mattflaschen)
[17:58:59] greg-g: > s:beta/staging:deployment tooling: basically
[17:59:10] were you going to say something more after the ‘basically’?
[17:59:21] oooooh
[17:59:25] nevermind...
[17:59:25] no
[17:59:27] :)
[17:59:28] my vim isn’t that strong
[17:59:32] :)
[17:59:35] I just saw what you did there
[18:00:10] I should have commented out the "basically"
[18:00:34] right
[18:01:17] greg-g: either way, we should also do a ‘so how is beta now vs how it was before start of quarter’, and see what actually has improved...
[18:02:41] * greg-g nods
[18:03:05] I'm in back-to-back meetings for the next 3.5 hours (no lunch for greg today), will get to it this afternoon hopefully
[18:03:45] greg-g: <3 cool
[18:07:51] zeljkof: if you want to do a 30 min pairing session tomorrow before your 1:1 i could probably swing an 8am
[18:08:13] marxarelli: sold! :)
[18:08:27] marxarelli: moving the meeting
[18:08:43] zeljkof: cool. let's do 9am for our regular session though
[18:09:23] marxarelli: sure,
[18:09:45] marxarelli: things get complicated when europe moves to daylight savings time
[18:09:51] rad. accepted
[18:09:56] we will adjust then
[18:10:50] marxarelli: moved the meeting next week to 9:30am your time, I have 1:1 with greg-g at 9am
[18:10:53] oh, right. that will make it later for you (assuming you're "Spring-ing" forward)
[18:11:40] marxarelli: let's move the event to 9:30am your time from next week on, I will figure out what to do when the actual time change happens
[18:12:58] marxarelli: sorry for e-mail spam, I am moving the meeting around, made some mistakes, fixing... :(
[18:13:44] zeljkof: sounds good. if 9:30 PDT becomes too late for you, i'll give 8am a shot
[18:14:00] marxarelli: great, let's see how it goes
[18:14:06] I think I have fixed all times now
[18:14:44] Deployment-Systems, Incident-20150312-whitespace, MediaWiki-Core-Team, Patch-For-Review: scap's check_php_syntax() should check for text before 'Resolved
[18:15:14] marxarelli: my understanding of that Sentry ticket is that it is marked blocked on RelEng only because RelEng is very interesting in it's current status
[18:15:25] interested
[18:16:19] chrismcmahon: alrighty
[18:16:40] (CR) Zfilipin: [C: 1] Beta timing out jobs now abort + 45 mins for db update [integration/config] - https://gerrit.wikimedia.org/r/197226 (https://phabricator.wikimedia.org/T92906) (owner: Greg Grossmeier)
[18:26:27] Deployment-Systems, MediaWiki-Core-Team, Patch-For-Review: Can't update l10n cache - https://phabricator.wikimedia.org/T92900#1129666 (bd808) Open>Resolved >>! In T92900#1128921, @greg wrote: > I presume this is done/fixed now? Yup. The underlying problem was fixed with https://gerrit.wikimedia....
[18:30:37] chrismcmahon: RelEng (well, greg-g) offered to do the instrumentation part of Sentry
[18:30:59] not sure if that was official or just a "we'll see if we can help" thing
[18:32:21] tgr: it's on the list for next quarter support, as in, will need to be prioritized during the quarterly review stuff
[18:33:41] tgr: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/201415Q4#.22New.22_Ideas
[18:54:24] what instrumentation needs to be done for sentry?
[18:56:10] twentyafterfour: it needs to be transformed from a pip package with lots of dependencies into something acceptable for ops (trebuchet or debian package), it needs a puppet module, plus some sort of monitoring I assume
[18:56:37] ah
[18:58:11] tgr: yeah that's quite a bit of work, especially the debian packaging part. And that part most likely will require assistance from ops - getting a very simple package accepted by ops was a long and slightly painful process the one time I actually worked on such a task.
[18:59:30] proper packaging is probably not going to happen as it's a lot of effort and sentry upstream has become quite fast recently and dependencies change every month
[18:59:59] eek
[19:00:00] so with debian's separate package for every dependency it would be a neverending work
[19:00:38] so it's either trebuchet, or just running virtualenv and packaging the resulting directory as a .deb
[19:00:39] we don't need "real" Debian Guidelines Compliant packages for internal use-only, but that's still hard
[19:00:50] just ask antoine how he did Zuul (tons of pythong deps)
[19:00:58] -g
[19:01:07] when I asked on the ops list, trebuchet seemed more popular
[19:01:37] I think we should push for virtualenvs here.
[19:01:43] and have a proper way to do security updates there
[19:02:01] ‘fake’ debs that aren’t security maintained don’t seem very fun.
[19:04:02] whether it's .deb or trebuchet or something else, the workflow would be fairly similar - have a packaging server which runs pip update in a virtualenv and exposes the changes for some sort of security review, then freeze the result and deploy it to the production machines
[19:04:51] this is a really good use case for container-based deployment tooling
[19:11:28] (PS1) Krinkle: mw-setup: Move LocalSettings.php from mw-install-sqlite to mw-setup [integration/jenkins] - https://gerrit.wikimedia.org/r/197710
[19:12:19] (PS2) Krinkle: mw-setup: Move LocalSettings.php from mw-install-sqlite to mw-setup [integration/jenkins] - https://gerrit.wikimedia.org/r/197710
[19:12:25] (PS3) Krinkle: mw-setup: Move LocalSettings.php from mw-install-sqlite to mw-setup [integration/jenkins] - https://gerrit.wikimedia.org/r/197710 (https://phabricator.wikimedia.org/T57788)
[19:12:31] (CR) Krinkle: [C: 2] mw-setup: Move LocalSettings.php from mw-install-sqlite to mw-setup [integration/jenkins] - https://gerrit.wikimedia.org/r/197710 (https://phabricator.wikimedia.org/T57788) (owner: Krinkle)
[19:12:39] twentyafterfour++
[19:13:34] (Merged) jenkins-bot: mw-setup: Move LocalSettings.php from mw-install-sqlite to mw-setup [integration/jenkins] - https://gerrit.wikimedia.org/r/197710 (https://phabricator.wikimedia.org/T57788) (owner: Krinkle)
[19:14:30] why does mediawiki-phpunit-zend only run for gate and submit on core patches?
[19:17:16] because it's slowwwwww
[19:17:19] awww
[19:17:36] * aude is resorting now to pushing my patches to github so travis can test it
[19:17:42] :/
[19:17:55] because there are php 5.3 specific things that tend to bite in tests
[19:51:07] legoktm: We could add a 'check zend' thingy
[19:51:27] that sounds like a good idea to me
[19:54:10] tgr: good morning :)
[19:57:02] Krinkle: legoktm would be helpful
[19:57:14] not always does one need zend, i suppose
[19:57:50] hashar: working on mysql now
[19:58:15] hashar: I missed the RelEng meeting yesterday as I just got home from flight from Amsterdam>London the night before.
[19:58:25] hashar: What's crackin' these weeks?
[19:59:18] (PS23) JanZerebecki: Fix WikibaseJavaScriptApi tests [integration/config] - https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: Adrian Lang)
[19:59:21] (CR) jenkins-bot: [V: -1] Fix WikibaseJavaScriptApi tests [integration/config] - https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: Adrian Lang)
[20:01:56] (PS24) JanZerebecki: Fix WikibaseJavaScriptApi tests [integration/config] - https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: Adrian Lang)
[20:02:09] RECOVERY - Puppet failure on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:02:36] twentyafterfour: If you ever want to know which rsync hosts are taking the most traffic, here's some shell magic to find out from the logs on fluorine: cat /a/mw-log/scap.log | python ~bd808/scaplog.py |grep 'Copying to' | awk '{print $9}' | sort | uniq -c
[20:02:38] RECOVERY - Puppet failure on deployment-fluoride is OK: OK: Less than 1.00% above the threshold [0.0]
[20:05:38] (CR) JanZerebecki: "PS23: removed triggering the non-voting extension" [integration/config] - https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: Adrian Lang)
[20:06:30] tgr: I have replied to your sentry dependencies madness by email :) You might want to use git-deploy though
[20:06:47] (CR) jenkins-bot: [V: -1] Fix WikibaseJavaScriptApi tests [integration/config] - https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: Adrian Lang)
[20:06:50] tgr: and in short, Ops aren't fan of using .deb packages for deployment.
[20:07:09] Krinkle: chris is leaving :/
[20:07:27] Krinkle: the new staging cluster is going on though it is a bit of a challenge to setup all the services
[20:07:39] Krinkle: I have praised your and Kunal's work over the last two weeks
[20:07:39] what, no!
[20:08:06] Krinkle: and I have packaged Zuul / requested hardware for CI isolation \o/
[20:09:09] Release-Engineering, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1130056 (bd808)
[20:10:37] Release-Engineering, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1096558 (bd808)
[20:12:37] hashar: Alrighty
[20:12:41] Yeah, I heard about Chris.
[20:13:05] QA will be different without him.
[20:13:24] Or rather, it won't be. He made sure of that :-)
[20:15:57] (CR) Hashar: "Let see what Timo think about python :-)" [integration/config] - https://gerrit.wikimedia.org/r/197649 (owner: Hashar)
[20:16:00] grrrr
[20:16:04] s/think/thinks/
[20:16:23] we need a review process when commenting or sending emails
[20:16:29] hashar: I think I know how you feel when you read some of my javascript code.
[20:16:36] I realise now
[20:16:39] lol
[20:16:40] Magic!
[20:17:06] one difference is that you will probably have a clue at what the python code is doing
[20:17:20] while on my side it will take me an hour figuring out that nodejs is async / using callbacks
[20:17:44] Yippee, build fixed!
[20:17:44] Project browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #547: FIXED in 47 min: https://integration.wikimedia.org/ci/job/browsertests-UploadWizard-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/547/
[20:17:50] good bug wmf-insecte !
[20:19:55] (PS25) JanZerebecki: Fix WikibaseJavaScriptApi tests [integration/config] - https://gerrit.wikimedia.org/r/180418 (https://phabricator.wikimedia.org/T86176) (owner: Adrian Lang)
[20:21:56] Release-Engineering, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1130115 (bd808)
[20:23:38] hashar: I looked at dh-virtualenv some time ago, but it seemed to make a deb package which creates a virtualenv/pulls deps when it is installed, not when it is created
[20:23:53] which seems quite pointless from a security point of view
[20:24:02] might have misread the code though
[20:26:58] thanks Krinkle that means a lot to me
[20:29:52] Continuous-Integration, MediaWiki-Codesniffer, Possible-Tech-Projects, Google-Summer-of-Code-2015, Outreachy-Round-10: Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T89682#1130134 (Platonides) Don't forget about check-vars either.
[20:31:27] tgr: it is on creation :)
[20:32:14] tgr: I have just added a bunch of python- as dependencies, then when building the packages dh-virtualenv can be made to use virtualenv --use-system-package (or something alike)
[20:32:20] and grab all the rest from pypi
[20:32:32] (with pbuilder you need to set the env variable USENETWORK=yes though)
[20:33:19] tgr: I have some version of the package at http://people.wikimedia.org/~hashar/debs/zuul/ with the build log ( http://people.wikimedia.org/~hashar/debs/zuul/zuul_2.0.0-304-g685ca22-wmf1_amd64.build )
[20:36:33] hashar: that sounds great
[20:36:55] how does that work for pip packages which need to be built?
[20:40:22] tgr: pip compiles them
[20:40:36] might end up requiring a Depends: python-dev
[20:40:38] or something alike
[20:41:03] for Zuul I just needed plain python modules (statsd / six / babel maybe a few others)
[20:41:24] I talked about it on the ops list a few weeks ago
[20:41:35] it hasn't been stamped by ops yet. Gotta review it with filipo
[20:42:07] thcipriani: github invite sent
[20:42:08] yeah, but you need to gather source files when the deb package is created but you need to compile them when it is installed to make sure you use the right architecture, libraries, whatnot
[20:42:26] bd808: yup, just got it, thanks
[20:42:32] and pip does not have separate commands for those steps
[20:42:35] or does it?
[20:48:27] tgr: that is done when creating the package.
[20:48:46] tgr: I have built mine using arch=amd64 in a chroot
[20:49:45] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #529: FAILURE in 7 min 27 sec: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/529/
[20:54:26] Beta-Cluster, Continuous-Integration, Math: beta-recompile-math-texvc-eqiad job fails with "/usr/local/bin/scap-recompile: No such file or directory" - https://phabricator.wikimedia.org/T91191#1130233 (Physikerwelt) texvc is not changed frequently and hopefully will disappear soon However, the jenkin...
[21:00:31] hashar: thanks! I'll give dh-virtualenv a try
[21:18:21] Yippee, build fixed!
[21:18:22] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #523: FIXED in 15 min: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/523/
[21:21:44] hashar: is there a reason zend extension jobs are pinned to production slaves and not labs?
[21:22:07] yeah terrible slowness iirc
[21:22:29] and labs instance are/were missing the tmpfs to speed up sqlite
[21:23:21] should be good now :)
[21:23:41] maybe migrate a small extension to labs and see what happens?
[21:23:51] then a high traffic extension and finally the rest
[21:23:58] legoktm: it should be all possible nowadays :)
[21:25:39] Deployment-Systems, MediaWiki-extensions-LocalisationUpdate, I18n, Patch-For-Review: the message Helppage-top-gethelp doesn't appear deployed to the Hebrew Wikipedia - https://phabricator.wikimedia.org/T92823#1130296 (mmodell) So is this one resolved now? The patch has been submitted and today's de...
[21:29:02] Beta-Cluster, RESTBase, Patch-For-Review: Update / maintain Beta Cluster restbase cluster - https://phabricator.wikimedia.org/T91102#1130299 (hashar) > Figure out a way to update code in the beta cluster so that we can test stuff before releasing it into production For parsoid we made the parsoid ins...
[21:30:18] ok, after meetings since 9am, I'm going to take a "lunch break"
[21:30:19] bbiab
[21:34:52] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #373: FAILURE in 45 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/373/
[21:44:33] Continuous-Integration: Migrate .*testextension-zend jobs to labs slaves - https://phabricator.wikimedia.org/T93143#1130352 (Legoktm) NEW
[22:07:31] Continuous-Integration: Jenkins: Assert no PHP errors (notices, warnings) were raised or exceptions were thrown - https://phabricator.wikimedia.org/T50002#1130418 (AndyRussG)
[22:11:38] Deployment-Systems, MediaWiki-extensions-LocalisationUpdate, I18n, Patch-For-Review: the message Helppage-top-gethelp doesn't appear deployed to the Hebrew Wikipedia - https://phabricator.wikimedia.org/T92823#1130423 (bd808) >>! In T92823#1130296, @mmodell wrote: > So is this one resolved now? The...
[22:15:49] Release-Engineering, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1130446 (ssastry)
[22:18:11] twentyafterfour: I've prepared the cherry-picks for l10nupdate in the wmf21 branch. Should I go ahead and merge and prep a submodule bump for swat?
[22:18:40] bd808: sure
[22:19:00] * bd808 will try to remember how to do all the parts
[22:19:45] bd808: is this a sign that I need to automate more parts? ;)
[22:19:57] yes please
[22:20:11] or just how to kill submodules with fire
[22:20:16] yes please ;)
[22:22:43] submodules are slightly less painful since I learned about this:
[22:22:45] [status]
[22:22:47] submoduleSummary = true
[22:23:09] ^ .gitconfig entry causes git status to show submodule summary
[22:23:18] oooh
[22:23:42] but it also causes git status to be slower
[22:28:17] oh
[22:28:25] bd808: I just have a script to do submodule updates
[22:29:44] git submodule update --init --recursive && cd $SOMETHING && git fetch && git rebase origin/$BRANCH && ....
[22:29:48] bd808: http://fpaste.org/199822/14267177/raw/ courtesy of Reedy :)
[22:29:49] legoktm: is it something that could be generally helpful? if so, maybe include it in https://github.com/wikimedia/mediawiki-tools-release
[22:30:10] possibly
[22:30:24] That's a full HEAD reset
[22:30:35] the nuclear option
[22:30:46] that could potentially wipe out current security patches then?
[22:31:09] security patches will get lost on tin anyway
[22:31:15] they aren't in gerrit
[22:31:17] err, this is only for local usage
[22:31:25] and I don't keep security patches locally...
[22:32:11] I have different stuff for deploying the update on tin :P
[22:35:37] Deployment-Systems, MediaWiki-extensions-LocalisationUpdate, I18n, Patch-For-Review: the message Helppage-top-gethelp doesn't appear deployed to the Hebrew Wikipedia - https://phabricator.wikimedia.org/T92823#1130483 (bd808) Backport to 1.25wmf21 is queued up for the [[https://wikitech.wikimedia.or...
[22:51:48] Continuous-Integration, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, Fundraising Sprint Flaming Lips, Patch-For-Review: Write Jenkins job builder definition for CiviCRM CI job - https://phabricator.wikimedia.org/T91895#1130535 (atgo) a:awight>hashar
[22:57:30] Continuous-Integration, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, Fundraising Sprint Flaming Lips, and 2 others: Write Jenkins job builder definition for CiviCRM CI job - https://phabricator.wikimedia.org/T91895#1130568 (Ejegg)
[23:06:42] Release-Engineering, Ops-Access-Requests, Phabricator, operations: Mukunda needs sudo on iridium (phab host) - https://phabricator.wikimedia.org/T93151#1130608 (greg) NEW
[23:12:07] Release-Engineering, Ops-Access-Requests, Phabricator, operations: Mukunda needs sudo on iridium (phab host) - https://phabricator.wikimedia.org/T93151#1130638 (greg) {F100025}