[00:15:21] Yippee, build fixed!
[00:15:22] Project beta-scap-eqiad build #44078: FIXED in 21 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44078/
[00:48:33] (PS4) Legoktm: Create generic "php-composer-test" job [integration/config] - https://gerrit.wikimedia.org/r/194461 (https://phabricator.wikimedia.org/T90943)
[00:50:39] legoktm: What does --ansi provide again?
[00:50:59] --ansi Force ANSI output.
[00:51:03] so helpful :p
[00:51:05] Yeah, useful
[00:51:38] I think it's colors
[00:52:09] yeah, it forces colors
[00:52:53] (PS5) Legoktm: Create generic "php-composer-test" job [integration/config] - https://gerrit.wikimedia.org/r/194461 (https://phabricator.wikimedia.org/T90943)
[00:53:02] forgot to update the zuul config in ps4
[00:54:15] (CR) Krinkle: [C: 1] Create generic "php-composer-test" job [integration/config] - https://gerrit.wikimedia.org/r/194461 (https://phabricator.wikimedia.org/T90943) (owner: Legoktm)
[00:54:15] :)
[00:54:20] Looks great
[01:02:07] (PS1) AndyRussG: CentralNotice: enable qunit-karma job [integration/config] - https://gerrit.wikimedia.org/r/194773 (https://phabricator.wikimedia.org/T86092)
[01:07:10] (PS2) AndyRussG: CentralNotice: enable qunit-karma job [integration/config] - https://gerrit.wikimedia.org/r/194773 (https://phabricator.wikimedia.org/T86092)
[01:07:56] (CR) Krinkle: [C: 1] CentralNotice: enable qunit-karma job [integration/config] - https://gerrit.wikimedia.org/r/194773 (https://phabricator.wikimedia.org/T86092) (owner: AndyRussG)
[01:08:06] (CR) Awight: [C: 1] "Thanks!" [integration/config] - https://gerrit.wikimedia.org/r/194773 (https://phabricator.wikimedia.org/T86092) (owner: AndyRussG)
[01:31:10] Beta-Cluster, MediaWiki-extensions-Sentry, Multimedia, Wikimedia-Logstash: Channel PHP errors from Logstash to Sentry on the beta cluster - https://phabricator.wikimedia.org/T85239#1094548 (Tgr)
[01:31:56] Beta-Cluster, MediaWiki-extensions-Sentry, Multimedia, Wikimedia-Logstash: Channel PHP errors from Logstash to Sentry on the beta cluster - https://phabricator.wikimedia.org/T85239#942133 (Tgr)
[01:31:57] Blocked-on-RelEng, Multimedia, Scrum-of-Scrums, Puppet: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1094551 (Tgr)
[01:34:23] Blocked-on-RelEng, Multimedia, Scrum-of-Scrums, Puppet: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1094573 (Tgr)
[01:34:24] Beta-Cluster, MediaWiki-extensions-Sentry, Multimedia, Wikimedia-Logstash: Channel PHP errors from Logstash to Sentry on the beta cluster - https://phabricator.wikimedia.org/T85239#1094572 (Tgr)
[01:35:39] (PS6) Legoktm: Create generic "php-composer-test" job [integration/config] - https://gerrit.wikimedia.org/r/194461 (https://phabricator.wikimedia.org/T90943)
[01:42:58] Release-Engineering, Team-Practices: Make log responsibilities changes - https://phabricator.wikimedia.org/T89049#1094627 (Tgr)
[01:44:12] (CR) Legoktm: [C: 2] Create generic "php-composer-test" job [integration/config] - https://gerrit.wikimedia.org/r/194461 (https://phabricator.wikimedia.org/T90943) (owner: Legoktm)
[01:50:37] (Merged) jenkins-bot: Create generic "php-composer-test" job [integration/config] - https://gerrit.wikimedia.org/r/194461 (https://phabricator.wikimedia.org/T90943) (owner: Legoktm)
[01:52:20] !log deployed https://gerrit.wikimedia.org/r/194461
[01:52:28] Logged the message, Master
[01:55:57] Release-Engineering: Investigate production and/or beta requirements for Sentry - https://phabricator.wikimedia.org/T89732#1094675 (Tgr)
[02:01:23] (PS7) Legoktm: Use composer for ContentTranslation phplint and phpcs [integration/config] - https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) (owner: Amire80)
[02:02:45] (CR) jenkins-bot: [V: -1] Use composer for ContentTranslation phplint and phpcs [integration/config] - https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) (owner: Amire80)
[02:13:23] (PS8) Legoktm: Use composer for ContentTranslation phplint and phpcs [integration/config] - https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) (owner: Amire80)
[02:14:57] (CR) Legoktm: [C: 2] Use composer for ContentTranslation phplint and phpcs [integration/config] - https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) (owner: Amire80)
[02:16:05] (Merged) jenkins-bot: Use composer for ContentTranslation phplint and phpcs [integration/config] - https://gerrit.wikimedia.org/r/194340 (https://phabricator.wikimedia.org/T90943) (owner: Amire80)
[02:21:25] !log deployed https://gerrit.wikimedia.org/r/194340
[02:21:30] Logged the message, Master
[02:41:42] Krinkle: I just saw https://github.com/wmde/Diff/pull/37#issuecomment-77492123, is there a difference between composer run-script test versus composer test?
[02:41:57] legoktm: One is supported and documented, the other is not.
[02:42:09] unlike vendor/bin being in the path, that might stop working any day
[02:42:13] :D
[02:42:45] heh
[02:43:46] Also, I actually just updated composer on the slaves... https://gerrit.wikimedia.org/r/#/c/194364/ so it appears to be a pretty recent regression
[02:45:45] Continuous-Integration, MediaWiki-Codesniffer, Patch-For-Review: Convert existing legacy phpcs jobs to use composer entry point + versioning - https://phabricator.wikimedia.org/T90943#1094719 (Legoktm)
[03:00:12] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<37.50%)
[03:04:47] (PS1) Legoktm: Add experimental php-composer-test jobs for all voting phpcs extensions [integration/config] - https://gerrit.wikimedia.org/r/194803 (https://phabricator.wikimedia.org/T90943)
[03:06:13] (CR) Legoktm: [C: 2] Add experimental php-composer-test jobs for all voting phpcs extensions [integration/config] - https://gerrit.wikimedia.org/r/194803 (https://phabricator.wikimedia.org/T90943) (owner: Legoktm)
[03:07:16] (Merged) jenkins-bot: Add experimental php-composer-test jobs for all voting phpcs extensions [integration/config] - https://gerrit.wikimedia.org/r/194803 (https://phabricator.wikimedia.org/T90943) (owner: Legoktm)
[03:55:23] Yippee, build fixed!
[03:55:23] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #407: FIXED in 9 min 16 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/407/
[04:49:29] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #555: STILL FAILING in 18 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/555/
[05:01:57] (PS1) Krinkle: Consolidate '{name}-jsduck' jobs [integration/config] - https://gerrit.wikimedia.org/r/194818
[05:04:20] Quality-Assurance, I18n: Add Sikuli to the machines that run browser tests - https://phabricator.wikimedia.org/T56393#1094819 (Mattflaschen) >>! In T56393#1087507, @zeljkofilipin wrote: > @Mattflaschen the last time I have checked sikuli web page, it explicitly said running headless is not supported. Mayb...
[05:07:17] Quality-Assurance, I18n: Add Sikuli to the machines that run browser tests - https://phabricator.wikimedia.org/T56393#1094825 (Mattflaschen) You can also find other people discussing using Sikuli with xvfb, e.g. https://github.com/jatalahd/SikuliRobotLibrary/wiki/Hints-and-Tips
[05:08:25] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #506: FAILURE in 26 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/506/
[05:14:19] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #589: STILL FAILING in 46 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/589/
[05:51:11] PROBLEM - Puppet staleness on deployment-zotero01 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [43200.0]
[05:54:38] Project beta-scap-eqiad build #44112: FAILURE in 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44112/
[06:14:52] Yippee, build fixed!
[06:14:53] Project beta-scap-eqiad build #44114: FIXED in 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44114/
[06:16:36] (CR) Krinkle: [C: 2] "Deployed new 'jsduck' job." [integration/config] - https://gerrit.wikimedia.org/r/194818 (owner: Krinkle)
[06:22:56] (Merged) jenkins-bot: Consolidate '{name}-jsduck' jobs [integration/config] - https://gerrit.wikimedia.org/r/194818 (owner: Krinkle)
[06:23:16] Beta-Cluster: Convert puppetmaster sync cronjob to Jenkins job - https://phabricator.wikimedia.org/T73305#1094908 (yuvipanda) So we have alerts for long lived cherry-picks now, but *not* alerts for rebase conflicts. However, if there are no long lived cherry-picks, there shouldn't be any rebase conflicts eit...
[06:29:34] !log Re-creating integration-slave1401 - integration-slave1404
[06:29:40] Logged the message, Master
[06:35:10] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[09:17:30] !log Jenkins: upgrading and restarting. Wish me luck.
[09:17:34] Logged the message, Master
[09:22:29] PROBLEM - jenkins_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[09:23:38] RECOVERY - jenkins_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[09:58:19] hasharConf: looks like my network does not like me today :(
[09:59:09] hasharConf: see you on monday, I can not connect to hangout any more
[10:01:25] yeah i guess so :)
[10:01:32] we can look at setting up a couple slaves
[11:25:51] in about one hour I will start work on updating salt in deployment-prep from 2014.1.10 to 2014.7.1. it won't be instant, I need to see what instances are broken, what instances have salt issues now, etc etc.
[11:26:22] during the actual upgrade there may be interruptions to salt and/or trebuchet if you were planning to use them
[11:26:42] I will warn just before starting the actual upgrade and will notify when it is completed as well.
[11:28:09] (getting a later start than I expected because I ran into an obstacle at the last minute, which was probably pebkac in the end)
[11:30:31] apergos: wheeeee :)
[11:31:57] yeah
[11:32:03] we'll see how much more funner this gets
[11:32:03] Tobi_WMDE_SW_NA: are we having the meeting today?
[12:14:46] Project beta-scap-eqiad build #44146: FAILURE in 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44146/
[12:20:48] (CR) Zfilipin: [C: -1] "Looks good to me in general. Voting -1 because of a few minor things. See my inline comments." (2 comments) [integration/config] - https://gerrit.wikimedia.org/r/193556 (https://phabricator.wikimedia.org/T86092) (owner: AndyRussG)
[12:22:59] (CR) Zfilipin: CentralNotice browser tests: even more platforms and browsers (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/193556 (https://phabricator.wikimedia.org/T86092) (owner: AndyRussG)
[12:34:53] Yippee, build fixed!
[12:34:53] Project beta-scap-eqiad build #44148: FIXED in 53 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44148/
[12:41:05] anyone know who owns deployment-apertium01? it's unresponsive to ssh or salt, but graphs show it up
[12:41:25] very tempted to reboot it but I don't want to break anything
[12:41:40] (need it to be responsive for the salt upgrade)
[12:44:58] apergos: just reboot it...
[12:45:05] apergos: alex should know what’s up with that
[12:45:07] if there’s anything up with it
[12:45:11] actually I found out that yeah it's alex
[12:45:16] so I'll check w/ him, he's around now
[12:49:52] going to reboot that instance now.
[13:24:07] ok kids, I'm starting the actual upgrade now.
[13:24:28] I expect it to take 1/2 hour so allowing 1.5 hours.
[13:58:46] YuviPanda: I see you have syndic running on deployment-salt
[13:58:54] please fill me in, as I'm about to upgrade
[13:59:03] apergos: oh, I just installed it to see if the package installs. disregard / purge...
[13:59:07] hahahaha
[13:59:07] sorry I should’ve uninstalled...
[13:59:14] no, I'll let it upgrade if it wants
[13:59:16] :-D
[14:00:09] apergos: +1
[14:12:51] * ^demon|away yawns
[14:17:43] <^d> apergos: how's the salt upgrade going on deployment-*?
[14:18:14] YuviPanda: it claims it's removed and not installed, but I can't install over it (syndic) nor remove it
[14:18:19] so heh I'm leaving it to you...
[14:18:21] :-P
[14:18:37] ^d: I have the master done, the clients are still talking to it usin their old versions...
[14:18:49] about to apt-get update on them and then do them all
[14:19:07] all instances are responsive right now so we will get them all done one way or another
[14:19:39] <^d> Sounds good (and under control)
[14:20:05] <^d> I'm going to step away for a few and finish waking up (read: coffee)
[14:40:08] got a couple hosts with broken packages which break apt-get so I'm fixin those first
[14:40:17] forgot how tedious this is
[14:50:47] zeljkof_: have you tried updating to selenium-webdriver 2.45? seems to break page-object for me.
[14:50:57] chrismcmahon: worked for me
[14:51:31] zeljkof_: after update I get 'uninitialized constant ArticlePage::PageObject (NameError)' trying to run a test for MobileFrontend
[14:52:27] from cucumber looks like
[14:53:46] chrismcmahon: you have probably also updated mediawiki_selenium to 1.0 :)
[14:53:51] that is why everything breaks
[14:54:03] ah, could be
[14:54:12] let me see about that...
[14:54:14] revert
[14:54:24] anyone know why a puppetmaster-self host would have an installed (well partially installed) puppetmaster that fails to start up?
[14:54:25] and do bundle update selenium-webdriver
[14:54:40] it's puppet 3.4.3 or so, ruby 1.9.something
[14:54:53] this is one of the two broken instances
[14:55:03] apergos: I have upgraded the puppetmaster for the 'integration' project fairly recently
[14:55:08] went from Precise to Trusty
[14:55:16] and it worked fine
[14:55:21] this is deployment-mx
[14:55:39] apergos: it used to be its own puppetmaster
[14:55:40] it's running trusty
[14:55:49] apergos: and then it was brought back into the family
[14:55:50] ah well the role is still set as puppetmaster-self
[14:55:55] when I look at the config
[14:56:07] apergos: all instances have that set, no?
[14:56:21] and have a puppetmaster variable set as well so they hit the project puppetmaster instead of labs...
[14:56:57] well it doesn't have the variable is the thing
[14:56:59] just the class
[14:57:05] maybe I should add that
[14:57:15] really?
[14:57:16] and it would 'just work'? /me says hopefully
[14:57:18] I remember I set the variable...
[14:57:20] yeah really, have a look
[14:57:21] and it worked...
[14:57:23] oh wow.
[14:58:33] so do you have a minute to poke at that and make it happy
[14:58:44] ?
[14:58:59] I'm gonna look at deployment-apertium01 now, that's the other problem child
[14:59:56] apergos: yeah, am looking now
[15:00:02] thanks a lot
[15:00:04] at -mx
[15:01:37] apergos: interesting
[15:01:40] [agent]
[15:01:40] server = deployment-salt.eqiad.wmflabs
[15:01:46] hahaha
[15:01:46] so it is hitting deployment-salt...
[15:01:55] so someone accidentally cleared the var?
[15:02:01] maybe so
[15:02:08] I’ve set it again
[15:02:21] and also it looks like the local puppetmaster package should somehow get removed, maybe that would get it
[15:02:34] yeah, let me do that
[15:04:38] apergos: interesting again. I set the variable, and now it wants a puppet sign...
[15:04:39] * YuviPanda gives it
[15:04:55] maybe it wasn’t hitting deployment-salt, but the autoupdate made it seem like it was...
[15:05:12] ewww
[15:05:16] anyway...
[15:05:21] it’s hitting deployment-salt now
[15:05:24] ok
[15:05:41] lemme see if that package is still lying around there, obviously it needs to get tossed
[15:05:59] apergos: I tossed puppetmaster and -common package
[15:06:08] ah yay
[15:06:14] ok let's see if my apt-get update works now
[15:06:18] cool
[15:08:48] <^d> Gah scrollback
[15:11:06] yay both fixed up
[15:11:58] apergos: \o/
[15:12:00] * YuviPanda goes for dinner
[15:13:06] ok still haven't actually done the upgrade on the clients :-D making sure we have no extra salt minion daemons running...
[15:13:12] getting closer :-D
[15:24:47] did one client, looked great, and now about to do them all
[15:25:33] boom
[15:25:48] wait a little while for em to go around, wait a while for salt minions to recover
[15:30:13] one bad upgrade, the rest are great
[15:30:21] 081 has some issue, checking it now
[15:40:51] ok fixed up, they are all updated
[15:42:01] YuviPanda: can I leave the syndic package cleanup to you? and also note that if you try to reinstall it before I get the prod cluster upgraded, you will have to add the salt ppa and the pinning on deployment-salt yourself
[15:42:07] I'm about to clean those up
[15:43:27] setup staging-db1 yesterday to test out staging-palladium; everything seems to work as expected on the staging-palladium side. There's still a fair amount to be done to adapt coredb_mysql to run in staging and production.
[15:44:47] ^d: YuviPanda is there a plan for setting up the rest of the machines for staging? divide and conquer?
[15:45:14] <^d> Yeah as long as the puppet/saltmaster is done divide and conquer
[15:46:34] I'm of the opinion it's done. Lots of roles will end up being patched on it like coredb_mysql is now: test stuff before sending it to gerrit.
[15:47:07] ^d: would you be willing to take a look at staging-palladium? See if there's anything I'm missing?
[15:48:02] <^d> I really don't know what I'd be looking for
[15:49:18] is puppetmaster setup more YuviPanda's thing?
[16:03:30] <^d> thcipriani: Yeah, or an ops. I really dunno myself
[16:03:53] <^d> If I see problems setting up another host I can help debug, but as far as just giving it a look, I really wouldn't know what's right/wrong.
[16:14:36] Continuous-Integration: Jenkins: Assert no PHP errors (notices, warnings) were raised or exceptions were thrown - https://phabricator.wikimedia.org/T50002#1095658 (Krinkle)
[16:14:37] Continuous-Integration, MediaWiki-ResourceLoader, MediaWiki-Unit-tests, Patch-For-Review: Fix "DatabaseSqlite::replace/single-row NOT NULL constraint failed" for md_module table - https://phabricator.wikimedia.org/T91567#1095657 (Krinkle) Open>Resolved
[16:27:08] Continuous-Integration, MediaWiki-Codesniffer, Patch-For-Review: Convert existing legacy phpcs jobs to use composer entry point + versioning - https://phabricator.wikimedia.org/T90943#1095675 (Jdforrester-WMF)
[16:27:40] !log Delete integration-slave1005
[16:27:47] Logged the message, Master
[16:32:00] Continuous-Integration: hhvm Jenkins job fill up /tmp with perf-*.map files - https://phabricator.wikimedia.org/T64788#1095691 (Krinkle) Resolved>Open Looks like this is back. Over 2000 files in `/tmp` on integration-slave1008. Oldest one is from Feb 26 04:58.
[16:32:01] Continuous-Integration: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#1095693 (Krinkle)
[16:38:09] Continuous-Integration: Jenkins: Figure out long term solution for /tmp management - https://phabricator.wikimedia.org/T74011#1095712 (Krinkle)
[16:38:10] Continuous-Integration: /tmp/MWDocGen-* files are left behind on Jenkins slaves - https://phabricator.wikimedia.org/T84973#1095709 (Krinkle) stalled>Resolved {19dd605eceb365dd0d6b288f9589601cc5b4cea9}
[16:40:44] !log Jenkins auto-depooled integration-slave1008 due to low /tmp space. Purged /tmp/npm-* to bring back up.
[16:40:48] Logged the message, Master
[16:46:12] RECOVERY - Puppet staleness on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[16:48:13] (PS3) AndyRussG: CentralNotice: enable qunit-karma job [integration/config] - https://gerrit.wikimedia.org/r/194773 (https://phabricator.wikimedia.org/T86092)
[16:49:15] (CR) Awight: [C: 1] CentralNotice: enable qunit-karma job [integration/config] - https://gerrit.wikimedia.org/r/194773 (https://phabricator.wikimedia.org/T86092) (owner: AndyRussG)
[16:51:05] Krinkle: Did you finish the CI instance migration project? I'd like to create the CiviCRM CI instance, if it won't cause quota issues?
[16:51:51] awight: I did not. There's still a bug remaining that will likely affect your instance as well
[16:52:05] awight: Precise or Trusty?
[16:52:10] Either
[16:52:19] You can create 1 trusty instance now.
[16:52:24] ah ok, great!
[16:52:26] I'm creating one as well.
[16:52:45] I deleted the candidate ones for migration. Re-creating them instead.
[16:52:56] https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup#integration-slaveXXXX
[16:53:04] Continuous-Integration, MediaWiki-Codesniffer, Patch-For-Review: Convert existing legacy phpcs jobs to use composer entry point + versioning - https://phabricator.wikimedia.org/T90943#1095774 (Jdforrester-WMF)
[16:53:04] That should contain everything you need.
[16:53:22] You'll want to not give it 'contintLabsSlave' so that it won't get used for other jobs.
[16:53:24] (PS1) Legoktm: Use composer-test for MobileFrontend & VisualEditor phpcs/phplint [integration/config] - https://gerrit.wikimedia.org/r/194892 (https://phabricator.wikimedia.org/T90943)
[16:53:24] thanks for the pointer :D
[16:53:36] And probably not register in Jenkins (last step) until you're ready.
[16:53:57] I've reduced the setup to the minimum, so don't skip anything :D
[16:54:13] (CR) Jforrester: [C: 1] Use composer-test for MobileFrontend & VisualEditor phpcs/phplint [integration/config] - https://gerrit.wikimedia.org/r/194892 (https://phabricator.wikimedia.org/T90943) (owner: Legoktm)
[16:54:14] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #39: ABORTED in 12 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/39/
[16:55:02] (CR) Legoktm: [C: 2] Use composer-test for MobileFrontend & VisualEditor phpcs/phplint [integration/config] - https://gerrit.wikimedia.org/r/194892 (https://phabricator.wikimedia.org/T90943) (owner: Legoktm)
[16:56:08] (Merged) jenkins-bot: Use composer-test for MobileFrontend & VisualEditor phpcs/phplint [integration/config] - https://gerrit.wikimedia.org/r/194892 (https://phabricator.wikimedia.org/T90943) (owner: Legoktm)
[16:57:09] !log deployed https://gerrit.wikimedia.org/r/194892
[16:57:13] Logged the message, Master
[16:58:45] Krinkle: I can't find my notes from our earlier conversation with hashar, but was the agreement that I use something like "integration-civicrm-ci-dev" ?
[17:03:29] awight: So remind me, what is this instance for again?
[17:03:40] the 'ci-' part is redundant
[17:04:11] ^d: so here's the last thing going on, I of course cannot make trebuchet work for the test/testrepo, the fetch goes fine on the 3 live target hosts (4th target is deleted) but the checkout never runs on them
[17:04:14] eyeroll
[17:04:34] pretty much doubt that's a salt issue but still I should dig around and see what's wrong
[17:04:36] groan
[17:05:03] Krinkle: k. The idea is that I need to run a test suite under PHPUnit, but the framework is CiviCRM+Drupal, and it only supports a MySQL schema. Therefore, my provisioning scripts require root mysql access and stuff. Antoine had the good suggestion that I create an isolated ci instance which is only used for this CiviCRM testing.
[17:05:54] Release-Engineering, Phabricator, Wikimedia-Git-or-Gerrit: Gerritbot shouldn't post "Change merged by jenkins-bot:" messages any more - https://phabricator.wikimedia.org/T91766#1095838 (Jdforrester-WMF) NEW
[17:06:01] Yeah, for the moment that seems wise. We should be able to support this in general in the future, but for now it's better to do it one-off without puppet and just document it. Then once it works we can puppetise it
[17:06:21] awight: Make it m1.medium in that case, no need for large as it won't have as many jobs as other slaves
[17:06:25] w/o puppet? I was imagining I could just add the mysql role
[17:06:28] It'll have plenty of headroom still
[17:06:38] awight: I mean the additional things wouldn't be in puppet
[17:06:43] Release-Engineering, Phabricator, Wikimedia-Git-or-Gerrit: Gerritbot shouldn't post "Change merged by jenkins-bot:" messages any more - https://phabricator.wikimedia.org/T91766#1095848 (Jdforrester-WMF)
[17:06:47] Use our slave and mysql roles as starting point yes
[17:06:48] ah, right. yeah it's a short script, but atrocious :)
[17:07:14] It calls into a Byzantine thing to provision the dbs
[17:07:27] Release-Engineering, Phabricator, Wikimedia-Git-or-Gerrit: Gerritbot shouldn't post "Change merged by jenkins-bot:" messages any more - https://phabricator.wikimedia.org/T91766#1095851 (Krenair) I have a feeling I've asked this before, but - do we have those messages from Phabricator on *all* repos...
[17:10:28] Release-Engineering, Phabricator, Wikimedia-Git-or-Gerrit: Gerritbot shouldn't post "Change merged by jenkins-bot:" messages any more - https://phabricator.wikimedia.org/T91766#1095863 (Ciencia_Al_Poder) >>! In T91766#1095851, @Krenair wrote: > I have a feeling I've asked this before, but - do we h...
[17:11:46] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #41: STILL FAILING in 9 min 54 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/41/
[17:12:23] Should I be sad about these errors? https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&project=integration&instanceid=7234f8a4-0db5-4cbc-8dc8-50364f1162c2&region=eqiad
[17:12:48] Error: /Stage[main]/Base::Standard-packages/Package[ack-grep]/ensure: change from purged to latest failed: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install ack-grep' returned 100: Reading package lists...
[17:16:52] maybe... what happens if you run the command from the instance?
[17:17:00] without the -q
[17:17:43] Continuous-Integration, Release-Engineering, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Create CI slave instance for CiviCRM testing - https://phabricator.wikimedia.org/T89894#1095906 (awight) Created a m1.small instance, and following these instructions: https://wi...
[17:18:30] apergos: thx, let me try that...
[17:28:17] Project browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce build #42: ABORTED in 5 min 40 sec: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/42/
[17:30:39] fwiw, I'm pretty sure all those "returned 100: Reading package lists..." errors are apt-get update not being guaranteed to run before package installs.
[17:40:43] Release-Engineering, operations, Patch-For-Review: /usr/local/bin/deploy2graphite broken on tin due to nc command syntax - https://phabricator.wikimedia.org/T1387#1095974 (bd808) >>! In T1387#1043064, @fgiunchedi wrote: > ping? @ori @bd808 there was a question in https://gerrit.wikimedia.org/r/#/c/18356...
[17:41:16] apergos: I don't see how to become root, yet... asking in the labs channel.
[17:41:29] Staging: Create staging-mc* (memcached) - https://phabricator.wikimedia.org/T91546#1095975 (demon) a:demon
[17:42:53] oh. yeah I dunno if you have permissions there or not
[17:43:55] Staging: Create staging cluster (tracking) - https://phabricator.wikimedia.org/T88702#1095982 (greg)
[17:45:54] ^d: just so you're aware with the staging-mc instance. The salt master and the puppetmaster is now autoset to staging-palladium.
[17:46:34] <^d> I saw!
[17:46:43] <^d> So you just have to accept now, no silly config
[17:47:13] yessir
[17:49:24] one tiny problem with that that I just realized today: the salt-minion won't start without removing /etc/salt/pki/minion/minion_master.pub. Thinks the master's public key has changed, rather than the master.
[17:54:06] that's the config order thing
[17:54:23] I have a fix for that in puppet which will go live monday I guess or Tues
[17:55:35] grrrr I can run deploy.checkout from the command line on deployment-bastion and it works. but git sync fails to do the checkout. bleah
[17:55:48] that really makes it not salt's fault but I still don't like it
[18:01:45] Krinkle: Do you have advice on how to test my CI job in a development way? I think you were saying I should hold off on the jenkins-slave role until I've tested? Should I just clone my repos by hand, and make up numbers to export for JOB_ID, etc?
[18:02:07] awight: Nah, create the slave but without any of the standard labels.
[18:02:13] E.g. give it debug-awight as label
[18:02:22] Then create your job and use label: debug-awight
[18:02:54] debug-awight && integration-civicrm-dev if you're paranoid
[18:03:31] great, that sounds good
[18:06:05] Krinkle: when you say, give it a label, you're talking about the Jenkins management panel?
[18:08:52] Staging: Create staging-mc* (memcached) - https://phabricator.wikimedia.org/T91546#1096063 (demon) Blocked on [[ https://gerrit.wikimedia.org/r/#/c/194909/ | gerrit 194909 ]] currently. Can be worked around but ideally the role Just Works
[18:10:14] Fwiw, there's a conflict between role::ci::slave::labs and role::labs-mysql-server,
[18:10:17] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: Package[mysql-server] is already declared in file /etc/puppet/modules/contint/manifests/packages.pp:65; cannot redeclare at /etc/puppet/modules/mysql/manifests/server/package.pp:21 on node i-00000917.eqiad.wmflabs
[18:15:48] (CR) AndyRussG: "I tried to run this on Jenkins manually. Not sure if I got the right parameters, but it failed:" [integration/config] - https://gerrit.wikimedia.org/r/194773 (https://phabricator.wikimedia.org/T86092) (owner: AndyRussG)
[18:17:35] Continuous-Integration, Release-Engineering, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Create and provision CI slave instance for CiviCRM testing - https://phabricator.wikimedia.org/T89894#1096112 (awight)
[18:22:56] <^d> !log staging: set has_ganglia to false in hiera
[18:23:00] Logged the message, Master
[18:23:01] <^d> thcipriani: ^^
[18:23:37] <^d> memcache 5629 0.0 0.0 319980 1028 ? Sl 18:22 0:00 /usr/bin/memcached -m 64 -p 11211 -u memcache -l 127.0.0.1
[18:23:38] <^d> \o/
[18:23:58] hawt diggity. That was quick.
[18:24:34] Staging: Create staging-mc* (memcached) - https://phabricator.wikimedia.org/T91546#1096153 (demon) >>! In T91546#1096063, @demon wrote: > Blocked on [[ https://gerrit.wikimedia.org/r/#/c/194909/ | gerrit 194909 ]] currently. Can be worked around but ideally the role Just Works I lied, just needed to turn of...
[18:26:01] ^d: so I've been stuck on getting staging-db working can I show you where I'm stuck?
[18:26:09] <^d> Yeah, what's up?
[18:26:25] ^d: I’m verifying staging-palladium now
[18:26:29] <^d> thx
[18:26:40] thcipriani: ^d yeah, has_ganglia is nice.
[18:26:49] I spent a couple of weeks a long time ago doing that...
[18:27:11] <^d> Well now it's fixed in hiera for all instances in staging
[18:27:12] <^d> :)
[18:27:14] <^d> No more headache
[18:27:21] ^d: :) Maybe I should set it labswide
[18:27:26] k, so I started with the coredb_mysql class, but I think that's wrong since: https://github.com/wikimedia/operations-puppet/blob/production/modules/coredb_mysql/manifests/packages.pp#L20
[18:27:28] <^d> not a bad idea
[18:28:02] <^d> Ah yes, so we'll either have to go trusty + mariadb or precise + coredb
[18:28:10] Continuous-Integration, Release-Engineering, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Create and provision CI slave instance for CiviCRM testing - https://phabricator.wikimedia.org/T89894#1096172 (awight)
[18:28:23] ^d: thcipriani poke springle? I think precise+coredb is going away...
[18:28:29] in favor of trusty + mariadb
[18:28:32] Continuous-Integration, Release-Engineering, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Create and provision CI slave instance for CiviCRM testing - https://phabricator.wikimedia.org/T89894#1096173 (awight) Open>Resolved
[18:28:33] Continuous-Integration, Release-Engineering, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, and 2 others: Deploy CiviCRM integration job to WMF integration server - https://phabricator.wikimedia.org/T86374#1096174 (awight)
[18:28:33] and he is migrating them one by one...
[18:29:05] right, so I'm on trusty using mariadb::config with staging-db1. If you checkout hieradata/labs/staging/host/staging-db1.yaml on staging-palladium that's the config I'm using.
[18:30:00] 10Continuous-Integration, 6Release-Engineering, 10Fundraising Tech Backlog, 10Wikimedia-Fundraising-CiviCRM, and 2 others: Configure Jenkins to run CiviCRM builds on Fundraising CI slave instance - https://phabricator.wikimedia.org/T89895#1096178 (10awight) [18:30:18] 10Continuous-Integration, 6Release-Engineering, 10Fundraising Tech Backlog, 10Wikimedia-Fundraising-CiviCRM, and 2 others: Configure Jenkins to run CiviCRM builds on Fundraising CI slave instance - https://phabricator.wikimedia.org/T89895#1048130 (10awight) [18:31:11] there were a couple of tweaks I had to make to the mariadb role—the most egregious of which is using ensure => 'versionnumber' to make sure that --force-yes gets appended to apt-get install; however, now I think there's something out of order. I can't get puppet to start mariadb and I have to do some tweaking to get it to acknowledge the sqldata folder. [18:31:47] long story short: I could use another set of eyes/brains [18:32:00] <^d> nitpick, but sqldata should probably be on project storage instead of instance storage [18:32:28] ^d: uh, you shouldn’t be putting sqldata on NFS... [18:32:32] (which is project storage) [18:32:39] <^d> ok, nvm ignore me [18:32:41] it should be on /srv or something [18:32:55] which is where I put it via hiera. [18:33:00] also what was the name of the mc host that was just created? [18:33:11] I just want to verify salt and puppet on palladium before moving on... [18:34:11] <^d> YuviPanda: mc1 is done basically. I just spun up mc2 and 3 but haven't config'd them yet [18:34:19] right [18:34:51] Project beta-scap-eqiad build #44185: FAILURE in 42 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44185/ [18:37:26] greg-g: thanks for bringing up Gather [18:39:05] so, I can get mariadb to start after running https://mariadb.com/kb/en/mariadb/mysql_install_db/ but I feel like there's probably something I'm missing in puppet. [18:41:42] thcipriani: file a bug, and poke sean? 
I don’t know if anyone else really knows about the state of our db puppet code... [18:42:29] will do. Thanks! [18:43:19] chrismcmahon: :) [18:44:59] Krinkle: How are we restricting jobs to nodes with a specific label? Looks like we don't have the Job Restrictions plugin installed. My job is https://integration.wikimedia.org/ci/job/wikimedia-fundraising-civicrm/configure and desired executor is https://integration.wikimedia.org/ci/computer/integration-civicrm-dev/ [18:45:48] awight: It's part of Jenkins core [18:45:55] See any other job. Every job uses labels [18:46:03] OK i see it now, sorry! [18:46:12] "Restrict where this project can be run" [18:46:34] Yup [18:47:01] awight: Every subshell starts in $WORKSPACE [18:47:25] cd /srv/ssd/jenkins-slave/workspace/wikimedia-fundraising-civicrm would also not be reliable since it can include @2 or @3 when run in parallel or if locked for some reason. [18:47:49] And on slaves that ssd doesn't exist. [18:47:53] That's only on gallium prod [18:48:10] yah, I'm making sure to stay decoupled by using the shell vars [18:49:08] thcipriani: ^d hmm, interesting. [18:49:13] root@staging-palladium:/home/yuvipanda# sudo salt '*' cmd.run hostname [18:49:15] i-0000091a.eqiad.wmflabs: [18:49:18] staging-db1 [18:49:19] and that’s it [18:49:21] and salt isn’t running [18:49:26] on deployment-mc01, for example [18:49:35] <^d> no redis yet? [18:49:50] yeah this is due to having to delete the old salt master public key [18:50:19] right [18:50:23] so that has to be done manually [18:50:34] along with accepting the key on the master... [18:50:38] 6Release-Engineering, 10Wikimania-Hackathon-2015, 10Wikimedia-Hackathon-2015, 7Browser-Tests: Create pool of user accounts on beta cluster for browser test builds in Jenkins - https://phabricator.wikimedia.org/T90964#1096328 (10Krinkle) [18:50:45] <^d> I deleted it on mc1 thoughhhh [18:50:46] <^d> I thought [18:51:04] ^d: hmm, salt is still not starting... 
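The workspace advice above (every shell step starts in $WORKSPACE, and a hard-coded path can miss the "@2" or "@3" directories used for parallel builds) can be sketched in a few lines. This is only an illustration: the helper name `workspace_dir` is hypothetical, and the fallback to the current directory is an assumption for running outside Jenkins.

```python
import os
from pathlib import Path


def workspace_dir(env=None):
    """Return the build workspace from $WORKSPACE instead of a fixed path.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    # Jenkins exports WORKSPACE for each build. Parallel builds of one job can
    # get a suffixed directory such as "<job>@2", so the variable is the only
    # reliable source. Falling back to the current directory is an assumption.
    return Path(env.get("WORKSPACE") or os.getcwd())
```

A job script would then build every path from `workspace_dir()` rather than from /srv/ssd/jenkins-slave/workspace/..., which per the discussion only exists on gallium.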
[18:51:08] let me look at log [18:51:13] Continuous-Integration, Browser-Tests, Epic: Make browser tests voting for all repos of WMF deployed code - https://phabricator.wikimedia.org/T91669#1096332 (Krinkle) >>! In T91669#1092979, @dduvall wrote: > Test suites need to be faster as well. Once we work out ways to increase test isolation (for in... [18:51:44] <^d> At one point I had a "Notice: /Stage[main]/Salt::Minion/Service[salt-minion]/ensure: ensure changed 'stopped' to 'running' " [18:51:44] 2015-03-06 18:50:56,706 [salt.crypt ][CRITICAL] The Salt Master server's public key did not authenticate! [18:51:44] The master may need to be updated if it is a version of Salt lower than 2014.1.11, or [18:51:44] If you are confident that you are connecting to a valid Salt Master, then remove the master public key and restart the Salt Minion. [18:51:44] The master public key can be found at: [18:51:45] /etc/salt/pki/minion/minion_master.pub [18:51:46] ^d: [18:51:53] so it starts, and then is stopped again [18:51:58] * YuviPanda rms file [18:52:24] right, and now it starts up properly…. [18:52:41] * ^d grabs something stabby shaped for salt [18:52:51] so we’d have to manually kill that key... [18:52:57] on all the minions and start it back up again [18:54:40] seems not desirable. Probably modify salt::minion to fix? [18:55:09] Yippee, build fixed! [18:55:10] Project beta-scap-eqiad build #44187: FIXED in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/44187/ [18:57:36] thcipriani: hmm, how exactly? [18:57:37] * YuviPanda looks [18:58:18] thcipriani: it’s technically a salt master change, and I wonder if it’s not a bad idea to have that require manual intervention... [18:58:26] ideally the salt master wouldn’t be set to the old one at all... [18:59:32] <^d> But I guess that happens in the original image before we get our chance to run puppet? [18:59:44] I would think it would be desirable to ensure absent on that file for new setups.
Considering we allow salt master to be overridden. [19:00:02] I think it happens in the *first* puppet run [19:00:11] hmm, salt master override should activate on the first run itself, because hiera. [19:00:16] <^d> Yeah [19:00:29] <^d> So I'm wondering if it's part of the base image already [19:01:52] ^d: hmm, salt-minion is installed in base-image... [19:01:59] (see postinst.sh under labs_vmbuilder) [19:02:05] but I don’t know how that’s going to affect the master key [19:02:33] thcipriani: how exactly ‘ensure absent’ and how just for ‘new setups’? [19:02:49] and how would you do that in a ‘generic’ way that’s not going to affect all the other labs projects / prod? [19:02:51] * YuviPanda is curious [19:03:39] <^d> YuviPanda: Bleh, salt-minion getting started on instance boot but /before/ initial puppet run? [19:03:41] that occurred to me after I wrote it :) Could also try specifying public key file contents [19:04:25] ^d: I think the salt-minion starts as it gets installed, and then it gets stopped but maybe damage is done by then? [19:04:28] * YuviPanda isn’t sure [19:04:35] <^d> That's my guess [19:04:42] thcipriani: ah, yeah, that could perhaps work, instead of just setting the fingerprint [19:05:26] right, if the contents of $master_public_key isn't nil dump it into that file.
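The idea in the last message (write the expected master public key into the minion's key file whenever one is configured) can be sketched outside Puppet as follows. This is a minimal illustration, not the salt::minion manifest: the function name `sync_master_key` is hypothetical, and the returned flag is an assumed stand-in for "the minion needs a restart", which is the step the Salt error message quoted earlier asks for.

```python
from pathlib import Path


def sync_master_key(key_file, master_public_key):
    """Make key_file hold master_public_key; return True if it was changed.

    A falsy key (None or "") means nothing is configured, so the file is
    left untouched, mirroring "if $master_public_key isn't nil".
    """
    if not master_public_key:
        return False
    path = Path(key_file)
    current = path.read_text() if path.exists() else None
    if current == master_public_key:
        return False  # already correct, no restart needed
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(master_public_key)
    return True  # caller should restart the minion (assumption)
```

On a minion the file in question would be /etc/salt/pki/minion/minion_master.pub, the path named in the log. Replacing a stale key this way avoids deleting it by hand on every instance.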
[19:06:08] yup, and subscribe the minion to it [19:06:13] that could work [19:10:20] (03Abandoned) 10Awight: Populate CRM test dbs [integration/jenkins] - 10https://gerrit.wikimedia.org/r/158554 (owner: 10Awight) [19:14:06] 10Continuous-Integration: Jenkins: Set up perceptual diffs (visual regression testing) - https://phabricator.wikimedia.org/T64633#1096480 (10Krinkle) [19:14:13] 10Staging, 6operations: mariadb puppet module doesn't start mysql service in labs (possibly anywhere) - https://phabricator.wikimedia.org/T91797#1096483 (10thcipriani) 3NEW [19:17:16] 10Staging: Create staging-db* (databases) - https://phabricator.wikimedia.org/T91545#1096522 (10thcipriani) [19:17:17] 10Staging, 6operations: mariadb puppet module doesn't start mysql service in labs (possibly anywhere) - https://phabricator.wikimedia.org/T91797#1096523 (10thcipriani) [19:26:27] 10Continuous-Integration: Jenkins: Set up perceptual diffs (visual regression testing) - https://phabricator.wikimedia.org/T64633#1096544 (10Krinkle) [19:34:36] 6Release-Engineering, 6Engineering-Community, 6MediaWiki-Core-Team, 6Multimedia, and 3 others: Prepare Platform April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1096558 (10bd808) 3NEW a:3bd808 [19:52:37] 10Continuous-Integration, 10MediaWiki-Codesniffer, 5Patch-For-Review: Convert existing legacy phpcs jobs to use composer entry point + versioning - https://phabricator.wikimedia.org/T90943#1096621 (10Jdforrester-WMF) [20:09:50] (03PS1) 10Legoktm: Use composer-test for TemplateData phpcs/phplint [integration/config] - 10https://gerrit.wikimedia.org/r/194939 (https://phabricator.wikimedia.org/T90943) [20:10:08] (03CR) 10Legoktm: [C: 032] Use composer-test for TemplateData phpcs/phplint [integration/config] - 10https://gerrit.wikimedia.org/r/194939 (https://phabricator.wikimedia.org/T90943) (owner: 10Legoktm) [20:11:39] (03Merged) 10jenkins-bot: Use composer-test for TemplateData phpcs/phplint [integration/config] - 
10https://gerrit.wikimedia.org/r/194939 (https://phabricator.wikimedia.org/T90943) (owner: 10Legoktm) [20:12:25] 6Release-Engineering, 6Engineering-Community, 6MediaWiki-Core-Team, 6Multimedia, and 3 others: Prepare Platform April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1096676 (10bd808) [20:12:52] !log deployed https://gerrit.wikimedia.org/r/194939 [20:12:55] Logged the message, Master [20:14:10] !log deployed https://gerrit.wikimedia.org/r/194939 for reals this time [20:14:14] Logged the message, Master [20:19:06] * legoktm wonders if we can have zuul templates inside zuul templates [20:30:35] legoktm: It's templates all the way down, man. [20:42:33] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #12: FAILURE in 3 min 45 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/12/ [20:42:42] Krinkle: regarding the errors with missing font packages, can you confirm which distribution those instances are on [20:43:12] mutante: Trusty [20:43:34] Krinkle: thanks. (bonus might be that we are also switching from Ubuntu to Debian anyways) [20:44:04] so jessie instances might make sense at some point. i dunno [20:51:11] Yippee, build fixed! [20:51:12] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #507: FIXED in 40 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/507/ [20:55:41] 10Staging, 6operations: mariadb puppet module doesn't start mysql service in labs (possibly anywhere) - https://phabricator.wikimedia.org/T91797#1096760 (10coren) p:5Triage>3Normal I'm not sure that should be considered a bug - mysql_install_db is a very, //very// destructive operation that is probably unw... 
[20:59:30] Project browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce build #501: FAILURE in 15 min: https://integration.wikimedia.org/ci/job/browsertests-UniversalLanguageSelector-commons.wikimedia.beta.wmflabs.org-linux-firefox-sauce/501/ [21:11:15] Continuous-Integration, Browser-Tests, Epic: Make browser tests voting for all repos of WMF deployed code - https://phabricator.wikimedia.org/T91669#1096831 (dduvall) >>! In T91669#1096332, @Krinkle wrote: >>>! In T91669#1092979, @dduvall wrote: >> Test suites need to be faster as well. Once we work ou... [21:23:45] Continuous-Integration, operations, Patch-For-Review, Puppet: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1096843 (Dzahn) no, after re-checking this on a trusty prod host i had to amend and the old package names still exist on Tru... [21:32:25] (PS1) Krinkle: oojs-ui-jsduck-publish: Run build to support @example demos [integration/config] - https://gerrit.wikimedia.org/r/194957 [21:35:25] Zuul packaging is in progress: Successfully installed zuul [21:35:26] :=D [21:36:37] !log Provisioning integration-slave1401 - integration-slave1404 [21:36:41] Logged the message, Master [21:37:10] hasharAway: :) :) [21:41:56] (PS2) Krinkle: oojs-ui-jsduck-publish: Run build to support @example demos [integration/config] - https://gerrit.wikimedia.org/r/194957 [21:42:22] hasharAway: integration-slave1405 has been pooled for a day now. [21:42:26] I think it's working :) [21:42:33] Re-recreated without Zuul patch [21:42:43] so that applies only to precise instances for now [21:42:55] and also the tox dependency issue is still broken on precise so I'm not creating those for now [21:43:19] The permissions issues were fixed by ops. They reverted from the root-only puppet run [21:45:08] Krinkle: amazing!
[21:45:35] greg-g: and I filed bugs https://github.com/spotify/dh-virtualenv/issues/created_by/hashar :) [21:45:42] greg-g: since I am a good OSS citizen [21:46:20] Krinkle: regarding tox, we had the issue previously [21:46:20] hasharAway: yes you are. I'll mail you your cookie. :) [21:46:40] Krinkle: that is puppet installing tox from the package repository, and it never keeps it up to date [21:47:06] Krinkle: we should use a debian package instead. I got much more experience with packaging now, so might be possible [21:47:26] Awesome error message: raise VersionConflict(dist,req) # XXX put more info here [21:48:25] hasharAway: Hm.. so the operations-puppet data admin list test is requiring a sub-module of 'tox' that does not exist in the older version? [21:48:40] Krinkle: I haven't investigated [21:49:07] Krinkle: the new instance probably comes with a new minor version of tox which is behaving slightly differently [21:49:47] Krinkle: a past example https://phabricator.wikimedia.org/T85662 [21:50:12] on Jan 1st .... WTF was I working on that day! [21:51:17] hasharAway: Ah. So it can just be manually upgraded and that fixes it? [21:51:30] hasharAway: I didn't know about that patch. Please try to keep that documented on the setup page :) [21:51:37] Krinkle: wasn't the error due to site-packages being readable by root only ? [21:52:02] hasharAway: There are two tasks blocking 12xx instances. permissions error (on trusty instances) and importerror (on precise instances) [21:52:18] The former was fixed by adding umask to the puppet manifest [21:52:37] and is now redundant with permissions having changed back to how they were. The ops change to umask from 2-3 months ago was reverted.
[21:52:45] hasharAway: https://phabricator.wikimedia.org/T91526 [21:53:05] Continuous-Integration: Fix "ImportError: Entry point ('console_scripts', 'tox') not found" on integration-slave12xx instances - https://phabricator.wikimedia.org/T91526#1096918 (Krinkle) Possibly related: {T85662} [21:54:18] Krinkle: well tox is around for sure [21:54:35] seems it could not read it either [21:54:37] hasharAway: Yes, the error comes from tox. Not bash. [21:54:52] I guess we want to create a fresh Precise image and see what happens [21:54:58] apparently the slave12xx got deleted [21:55:10] hasharAway: That's what I did. That error is from a fresh Precise image and then running a job on it [21:55:23] what is the instance ? [21:55:34] It no longer exists [21:55:58] hasharAway: Because ops made many changes since last week for us, it didn't make sense to fix it. The changes we made were not forward compatible, so it required re-creation. [21:56:10] for example https://phabricator.wikimedia.org/T91525#1092871 [21:56:11] yup [21:56:22] so you can recreate a single Precise instance using the new image [21:56:26] and see whether it still happens [21:56:30] I'm finishing the trusty instances first. [21:56:45] I don't know how to fix/investigate the tox error so I'll leave that to you for when you have time for it. [21:56:56] I am 90% sure it is a perm issue as well [21:57:05] It's okay if it takes longer. I'm just glad trusty is back to being 'clean' or at least documented/fixable. [21:57:17] yeah :) [21:57:21] precise can wait probably [22:01:00] Staging, operations: mariadb puppet module doesn't start mysql service in labs (possibly anywhere) - https://phabricator.wikimedia.org/T91797#1096926 (thcipriani) >>! In T91797#1096760, @coren wrote: > I'm not sure that should be considered a bug - mysql_install_db is a very, //very// destructive operatio... [22:01:07] hasharAway: For testing Zend, do we need/want Precise in the future?
[22:01:24] I suppose there is a way to circumvent the php->hhvm alias on Trusty, right? [22:01:40] but it'll be a different version of course [22:02:52] Curious if we want to keep that around for that reason, or whether that makes more sense to do via Travis CI. Because if we test PHP 5.3 that way, we should also test PHP 5.4/5.5/5.6. Especially since those are increasingly more common. I'd say it's more important to test PHP 5.4+ than php 5.3. [22:03:46] Although we can also do what Travis CI does: compile php 53/4/5/6 on Trusty. For e.g. a regression job that runs once a day to catch regressions. [22:04:08] but that doesn't scale for all projects. And with librarisation, we'll want to do it for all. Hm.... [22:05:30] <^d> Krinkle: `php5` aliases to zend when you've got php -> hhvm [22:05:43] Cool [22:06:46] Krinkle: we just need PHP 5.3 [22:10:59] Continuous-Integration, Jenkins: Launching Jenkins slave agent fails with "java.io.IOException: Unexpected termination of the channel" - https://phabricator.wikimedia.org/T91697#1096947 (Krinkle) [22:12:16] hasharAway: Hm.. once we get to the point that builds are consolidated (e.g. mediawiki-composer-npm) we could just run that on hhvm, and then on gate-and-submit run it on php53/php54/php55/php56/hhvm in separate VMs. If we have the capacity anyway. Seems relatively straightforward. [22:12:22] Krinkle: nobody looks at Travis CI [22:12:38] hasharAway: I know, but it's useful for regression testing where you can look at it before a release [22:12:39] so if one sends some patch which is not 5.3 compatible, that will surely end up landing in the branch [22:12:45] Yeah, that's fine [22:12:52] Same with browser tests [22:13:01] then you end up doing post commit reviews [22:13:12] I am not going to head that way [22:13:26] I would love to just drop PHP 5.3 and update straight to 5.5 [22:13:48] hasharAway: Look at it from the other way: You want to know that MediaWiki works on those php versions before releasing a major version.
You can either 1) test it manually every 3 months, 2) Have something do it for you. [22:14:29] And that can either be Travis CI every commit (and you only look at it once a week or whenever, some volunteer action will also happen), or we do it ourselves. [22:14:31] There isn't another option. [22:14:47] ./projects/integration/build-area/zuul_2.0.0-1_all.deb [22:14:51] * hasharAway flexes [22:15:09] awight: Did you sign your new instance with the integration-puppetmaster? [22:15:59] Krinkle: we support PHP 5.3, so we have to run tests with it [22:16:17] Krinkle: if the problem is getting rid of Precise instances, we can build PHP 5.3 on Trusty [22:16:27] !log beta-scap-eqiad has been waiting for 50 minutes for an executor on deployment-bastion.eqiad (which has 5/5 slots idle) [22:16:31] Logged the message, Master [22:16:41] also Travis uses some kind of utility to easily switch between different versions. Bryan told me about it earlier [22:16:47] hasharAway: I disconnected/relaunched the slave agent, but it won't process the queue. [22:16:50] Ideas? [22:17:36] Krinkle: yep, thanks for including that in the docs. [22:17:45] hasharAway: Yeah, they compile php versions (into an out-of-PATH directory like /usr/something/php/53/, 54, 55, 56) and then you use 'phpvm use 54' which is essentially just doing 'alias php = usr....php54' [22:18:06] So all instances have all php versions [22:19:21] awight: good to see you have an instance! [22:20:30] Krinkle: it ? [22:20:36] hasharAway: Yes, it's going very well, thank you! I'm able to debug the provisioning stuff for my Civi app...
[22:20:42] Krinkle: hoooo deployment-bastion slave is deadlocked again :( [22:20:51] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #557: ABORTED in 2 min 27 sec: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/557/ [22:21:25] One thing I should mention is that the provisioning script downloads some 3rd-party stuff, including packagist tarballz... Is the VPS isolation good enough that I don't have to worry too much about malice? [22:22:21] Krinkle: gotta restart Jenkins :( [22:22:30] hasharAway: Why? [22:22:39] What state is preserved? [22:22:40] Krinkle: I don't have the bug report offhand [22:22:51] Can we delete the slave registry and re-create the node? [22:22:56] but somehow the Jenkins Gearman plugin holds all the executors [22:23:01] Other instances are fine, so there's got to be something somewhere [22:23:02] or considers them locked by something else [22:23:06] and does not assign any jobs to it [22:23:17] OK. Don't restart yet [22:24:02] Re-enabling Gearman did the trick [22:24:10] oh [22:24:22] !log Re-establishing Gearman connection from Jenkins (deployment-bastion was deadlocked) [22:24:26] Logged the message, Master [22:24:30] next thing [22:24:36] get rid of Jenkins and the Gearman plugin :ÿ [22:24:58] hasharAway: Yeah, let's finish that on Monday.
Shouldn't take more than a day [22:28:09] (PS4) Hashar: Package python deps with dh-virtualenv [integration/zuul] (debian-precise-venv) - https://gerrit.wikimedia.org/r/194520 [22:28:25] bah I am using gerrit as a dropbox [22:28:35] (CR) Hashar: "check experimental" [integration/zuul] (debian-precise-venv) - https://gerrit.wikimedia.org/r/194520 (owner: Hashar) [22:29:17] Krinkle: create a Precise instance, apply the puppet manifest and we can check it once it has built over the weekend [22:35:49] and next week [22:35:59] I am going to fight with Debian quilt https://www.debian.org/doc/manuals/maint-guide/modify.en.html :D [22:36:18] or how to hack upstream sources [22:39:53] and you know a tool is bad when it starts by setting up shell aliases [22:39:54] :/ [22:43:32] hasharAway: Hehe [22:43:41] hasharAway: :-( another puppet failure [22:43:50] Why can't puppet just return an error code if something fails? [22:44:14] Krinkle: what do you mean? [22:45:08] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree.value (<37.50%) [22:45:14] Continuous-Integration, operations, Puppet: Puppet (silently) fails to setup apache on some integration-slave14xx instances - https://phabricator.wikimedia.org/T91832#1097104 (Krinkle) NEW [22:45:15] hasharAway: ^ [22:45:24] Mar 6 22:24:50 integration-slave1402 libapache2-mod-php5: apache2_invoke: Enable module php5 [22:45:24] Mar 6 22:24:50 integration-slave1402 libapache2-mod-php5: apache2_reload: Your configuration is broken.
Not restarting Apache 2 [22:45:29] wtf [22:46:22] Continuous-Integration, operations, Puppet: Puppet (silently) fails to setup apache on some integration-slave14xx instances - https://phabricator.wikimedia.org/T91832#1097112 (Krinkle) [22:46:35] ah [22:46:46] I will be happy when we can copy ansi sequences and paste them as colorized markdown [22:46:53] OR [22:46:57] we can switch to Windows [22:47:06] syslog does not have colors [22:47:07] Mar 6 22:24:50 integration-slave1402 libapache2-mod-php5: apache2_invoke: Enable module php5 [22:47:07] Mar 6 22:24:50 integration-slave1402 libapache2-mod-php5: apache2_reload: Your configuration is broken. Not restarting Apache 2 [22:47:15] seems like some apache conf file is wrong? [22:47:22] you might need to run puppet a few times [22:47:23] hasharAway: Yeah, something [22:47:32] hasharAway: I re-ran it a second time and it fixed it. [22:48:10] Continuous-Integration, operations, Puppet: Puppet (silently) fails to setup apache on some integration-slave14xx instances - https://phabricator.wikimedia.org/T91832#1097115 (Krinkle) p:Triage>Low [22:48:29] Krinkle: there is an apache command to test the config [22:48:45] Continuous-Integration, operations, Puppet: Puppet (silently) fails to setup apache on some integration-slave14xx instances - https://phabricator.wikimedia.org/T91832#1097118 (Krinkle) Re-running puppet two (!) more times eventually fixed this. Lowering priority since we're moving on regardless, but it... [22:49:02] hasharAway: I wonder how this is for production? Do they also have to reboot machines and run puppet 10x before the instance is really ready? [22:49:03] apachectl configtest [22:49:20] Krinkle: no clue [22:49:40] While I'm obviously doing other stuff it's taking almost 2 weeks now just to create a couple instances.
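The `apachectl configtest` suggestion above could be wired into a provisioning check so that a broken Apache configuration fails loudly instead of silently. A minimal sketch, assuming `apachectl` is on the PATH and exits non-zero when the configuration is invalid; the function name `config_ok` is hypothetical, and the command is a parameter so the helper can be tried on hosts without Apache.

```python
import subprocess


def config_ok(command=("apachectl", "configtest")):
    """Run a configuration check command and report whether it exited 0.

    A missing executable is treated as a failed check rather than an error.
    """
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        return False
    return result.returncode == 0
```

A wrapper around the puppet run could call `config_ok()` afterwards and exit with a non-zero status when it returns False, which is the kind of error code asked for earlier in the conversation.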
[22:50:06] new image, and we haven't installed slaves for a few [22:50:25] + the Zuul bad installation is definitely not helping [22:51:04] Krinkle: if you are around during our early afternoon, we can pair a bunch [22:51:19] though I really need that zuul.deb ready for testing [22:51:40] hasharAway: Zuul instances from puppet work fine on trusty now [22:51:43] Didn't need a patch [22:52:22] hasharAway: And we're using Hiera now for the self-puppetmaster [22:52:25] Which saves one step [22:54:57] great! [22:55:38] Krinkle: you did wonders this week Timo! You should probably take some rest and enjoy the week-end now [22:55:47] I am surely going to bed crash now [22:56:12] Ha, I'll be working for another 4 hours before I'm done for today, but I'll join you in that thought. [22:56:13] thanks :) [22:56:23] And you as well for joining the investigation. [22:59:05] tried my best while rtfm debian docs :-D [22:59:25] anyway, have all sweet dreams! [23:17:11] (CR) Jforrester: [C: 1] oojs-ui-jsduck-publish: Run build to support @example demos [integration/config] - https://gerrit.wikimedia.org/r/194957 (owner: Krinkle) [23:20:51] Continuous-Integration, operations, Patch-For-Review, Puppet: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1097220 (Dzahn) a:Dzahn [23:21:16] * James_F sighs. [23:21:51] Continuous-Integration, operations, Patch-For-Review, Puppet: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1093266 (Dzahn) ``` @integration-slave1405:~# dpkg -l | grep 'kannada\|oriya\|unfonts\|libertine' ii fonts-linuxlibertine...
[23:25:49] 10Continuous-Integration, 6operations, 5Patch-For-Review, 7Puppet: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1097226 (10Dzahn) 5Open>3Resolved Notice: Finished catalog run in 33.31 seconds root@integration-slave1405:~ not sure how... [23:29:57] !log Pool integration-slave1401 [23:30:04] Logged the message, Master [23:59:22] 10Continuous-Integration, 6operations, 5Patch-For-Review, 7Puppet: Puppet class Mediawiki::Packages::Fonts fails to install various fonts - https://phabricator.wikimedia.org/T91685#1097267 (10Dzahn) still suggesting https://gerrit.wikimedia.org/r/#/c/194828/ but it's just something i noticed while looking...