[00:07:24] RainbowSprinkles what key will we put in secret? [00:07:31] the id_rsa.pub key from gerrit's repo [00:10:01] https://gerrit.wikimedia.org/r/#/c/363755/ [00:15:00] We'd generate a keypair [00:15:06] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: File[/etc/ssh/userkeys/gerrit2] is already declared in file /etc/puppet/modules/ssh/manifests/userkey.pp:70; cannot redeclare at /etc/puppet/modules/ssh/manifests/userkey.pp:70 on node gerrit-test3.git.eqiad.wmflabs [00:15:06] For the labs instances [00:15:07] hmm [00:15:22] We'll want to remove ours then [00:15:28] (where we declare it ourselves) [00:15:30] ah [00:15:40] thanks will do that now [00:17:12] Error: /Stage[main]/Gerrit::Jetty/Scap::Target[gerrit/gerrit]/User[gerrit2]/home: change from /var/lib/gerrit2 to /var/lib/scap failed: Could not set home on user[gerrit2]: Execution of '/usr/sbin/usermod -d /var/lib/scap gerrit2' returned 8: usermod: user gerrit2 is currently used by process 5977 [00:17:49] we can set up a temp user for this [00:17:56] called gerrit-migrate lol [00:20:04] now we are getting somewhere [00:21:19] yay [00:21:22] puppet finsihed [00:21:24] finished [00:21:28] RainbowSprinkles && [00:21:34] woops meant to be ^^ [00:23:58] note: puppet passes after running puppet twice [00:24:01] woop [00:24:16] now need to generate a key we can use in the private repo [00:26:45] yea, i think so [00:27:25] make a new keypair and put the private part in labs/private and the pub part in normal gerrit module as "id_rsa_labs.pub" or so .. sorry, maybe "cloud" [00:27:41] Don't we already have some throwaway keypairs we install in labs? [00:27:43] maybe not unless it's also cloud/private [00:28:01] created a new key [00:28:06] it didn't look like it when glancing at that puppet class [00:28:28] eh.. 108 content => secret('gerrit/id_rsa'), [00:28:29] https://gerrit.wikimedia.org/r/#/c/363755/3/modules/secret/secrets/keyholder/gerrit.pub [00:28:37] oh.. 
keyholder [00:28:46] that is a different key? [00:28:47] Yeah, this has to go in keyholder now [00:29:09] Maybe we should deploy with a 2nd user? The gerrit2 keypair is basically used for replication purposes right now [00:29:26] i've named the user gerrit-migration [00:29:27] that is what confused me.. i was looking at the latter part [00:29:30] gerrit2 will not work [00:29:53] maybe we should not call it just "id_rsa" [00:29:55] but more specific [00:30:05] i called it gerrit.mysql [00:30:06] now that we have 2 uses for it [00:30:08] woops [00:30:11] i meant gerrit.pub [00:30:28] or different keypairs? [00:30:39] i've removed id_rsa.pub [00:30:41] now [00:30:48] i've updated https://gerrit.wikimedia.org/r/#/c/363726/ [00:31:28] this removes ssh::userkey https://gerrit.wikimedia.org/r/#/c/363726/7/modules/gerrit/manifests/jetty.pp [00:31:34] that key is for cluster sync [00:31:35] yeh [00:31:42] as we now do it in keyholder [00:31:45] that is unrelated to deploying [00:31:47] it caused puppet errors [00:31:53] and would break sync between the 2 servers, right [00:32:06] no, that's just the name [00:32:11] the key will still be there [00:32:15] as scap adds it [00:32:22] maybe it's better to treat this separately.. one key for deployment, one for sync [00:32:35] it already confused us right now [00:32:35] you can't. it fails [00:32:36] with [00:32:37] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: File[/etc/ssh/userkeys/gerrit2] is already declared in file /etc/puppet/modules/ssh/manifests/userkey.pp:70; cannot redeclare at /etc/puppet/modules/ssh/manifests/userkey.pp:70 on node gerrit-test3.git.eqiad.wmflabs [00:32:58] because scap calls that ssh::userkey too [00:33:26] hmm.. it wouldn't be a problem if they used different resource names / separate keys? 
[00:35:35] mutante: No, just need to drop the ssh::userkey{} declaration, it's redundant to scap::target{} [00:36:06] Also: we should be able to use the existing keypair you have for labs? It's already committed and live everywhere [00:36:17] All we need to do then is add that key to the master's keyholder [00:36:33] where's that key? [00:37:32] oh.. then ignore me, i thought the key for cluster sync was still needed separate from deploy changes [00:37:34] Actually, I lied. That key's busted. [00:37:44] You know, we could just use service-deploy user or w/e for this. [00:37:57] Because A) we don't want scap actually handling service restarts [00:38:07] and B) Gerrit2 doesn't need to *write* to the jars (nor should it, tbh) [00:38:16] ok [00:38:19] Then we don't need to bother with new keypairs [00:38:23] Or prepping keyholder [00:38:26] where will review_site be? [00:38:30] isn't that key needed totally separate from how gerrit gets deployed though? slightly confused still [00:38:39] /srv/deployment/gerrit/gerrit/review_site? [00:38:41] No, this is a key for deploying [00:38:45] ok [00:38:49] I figured originally we could reuse gerrit2 [00:38:52] But that seems problematic [00:38:57] ok [00:39:01] So yeah, let's just use the standard service-deploy user [00:39:07] ok [00:39:10] paladox: Probably something like that [00:39:10] changing it to service-deploy [00:39:13] ok [00:39:24] will scap overwrite it or try to remove it? [00:39:36] Actually, we can't do that right now. So yeah, it'll be symlinks at first [00:39:47] ok [00:39:59] So we'll deploy to /srv/deployment... and then /var/lib/gerrit2/review_site/ will all symlink as appropriate [00:40:05] gerrit.war + plugins dir [00:40:09] yep [00:40:41] Anyway, it's like almost 6pm, I'm going to have a beer and relax. Good evening gentlemen [00:40:56] enjoy and good evening too [00:41:06] it's 1:40am here. /me will watch tv then. 
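A minimal sketch of the symlink plan discussed at 00:39:59 (deploy under /srv/deployment, then have review_site point at gerrit.war and the plugins dir). It runs in a throwaway temp directory; the helper name `link_review_site` and the exact link locations inside review_site are assumptions for illustration, not what the puppet change actually does.

```python
import os
import tempfile


def link_review_site(deploy_dir, site_dir, names=("gerrit.war", "plugins")):
    """Point each name under site_dir at the deployed copy in deploy_dir.

    Hypothetical helper: the real layout inside review_site may differ.
    """
    os.makedirs(site_dir, exist_ok=True)
    links = []
    for name in names:
        target = os.path.join(deploy_dir, name)
        link = os.path.join(site_dir, name)
        # Only an existing symlink is replaced; a real file or directory at
        # this path makes os.symlink raise instead of being clobbered.
        if os.path.islink(link):
            os.remove(link)
        os.symlink(target, link)
        links.append(link)
    return links


# Demo in a temp dir so nothing under /srv or /var/lib is touched.
root = tempfile.mkdtemp()
deploy = os.path.join(root, "srv/deployment/gerrit/gerrit")
site = os.path.join(root, "var/lib/gerrit2/review_site")
os.makedirs(os.path.join(deploy, "plugins"))
open(os.path.join(deploy, "gerrit.war"), "w").close()

links = link_review_site(deploy, site)
for link in links:
    print(link, "->", os.readlink(link))
```

Running it a second time just re-creates the same links, which is roughly the behaviour you would want from a repeated puppet or scap run.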
[00:44:27] paladox: NCIS [00:44:29] am i right [00:44:32] yes [00:44:36] heh :) [00:44:42] :) [00:45:33] that's played alot over here on 5 usa [00:45:45] but it is also on sky catch up tv service [00:46:41] i stopped paying comcast because they are so evil, so no tv, heh [00:46:59] lol [00:47:28] my tv providers give you the best offers if you threaten to quit. [00:48:04] i was really proud that i managed to quit the right way (they dont just let you go ) [00:48:45] lol, mine is fined if they do that. [00:49:03] the trick was to FIRST personally dump the hardware at their location so you cant use it anymore and then cancel the contract. otherwise they make you mail it back and give you 10 calls [00:49:39] looks for terrestrial TV on nocable.org :) [00:50:18] lol [00:50:19] sad .. everything is yellow or red [00:50:34] * paladox is greatful we have ofcom regulating tv providers. [00:51:14] could have 20 channels if he had an outdoor antenna in a non-existing garden [00:51:35] lol, what about freeview? [00:51:51] what's that? streaming? [00:52:26] also, we are off-topic. sorry, and cu later:) [00:52:28] no, it's digital tv through an arial [00:52:39] oh yea, that's what i tried [00:52:53] though probaly not what you are thinking off [00:53:03] theres digital and analog [00:53:21] i know, analog is completely dead though [00:53:27] yeh [00:53:32] they switched that off years ago [00:53:48] there's freeviewplay which is streaming. [00:53:56] Only 4 years [00:54:03] https://www.freeview.co.uk/why-freeview/freeview-play [00:54:27] They forced us to use a digital dish when they did that. [00:55:27] yea, so i need a new TV or box so that it can be connected to the Internet [00:55:35] only to get what i had before ..without all that [00:56:01] and as a bonus now i have to install security patches on my TV as well.. 
or worry about being watched on the camera, hehe [00:56:30] though you have to pay a tv license to even be able to watch bbc iplayer or own a tv here [00:57:42] hehee, here they laugh about "tv license" but the cable prices are just soo much more [00:58:02] oh. [00:58:03] and at least you got some non-commercial station [00:58:11] hahaha [00:58:12] bbc [00:59:08] paladox: i guess i have to watch starwars on telnet again ... [00:59:16] lol [00:59:28] telnet towel.blikenlights.nl [00:59:54] lol [01:00:01] telnet towel.blinkenlights.nl [01:00:07] that's the right one [01:01:20] yep [01:02:04] * paladox will go now [01:02:15] bye, cu. out [01:02:18] will work on the scap thing again later today (i say today as it's 2am) [01:02:26] thanks and you too [01:16:31] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [01:18:21] (03CR) 10Krinkle: "If the generation itself is meant to be automated from a job, then I'd also support cloning it as part of the job and using it from there " [tools/release] - 10https://gerrit.wikimedia.org/r/356430 (owner: 10Chad) [01:41:23] (03Abandoned) 10MaxSem: Test JsonConfig with Kartographer [integration/config] - 10https://gerrit.wikimedia.org/r/352279 (owner: 10MaxSem) [01:50:32] (03CR) 10Krinkle: [C: 032] Sniff that the short type form is used in @return tags [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/362593 (https://phabricator.wikimedia.org/T145162) (owner: 10Legoktm) [01:51:41] 10MediaWiki-Codesniffer, 10Patch-For-Review: Provide a Codesniffer rule to enforce "short" type definitions: int and bool, not integer and boolean - https://phabricator.wikimedia.org/T145162#3414398 (10Krinkle) In addition to `@return`, we should handle `@param` as well. 
[01:51:44] (Merged) jenkins-bot: Sniff that the short type form is used in @return tags [tools/codesniffer] - https://gerrit.wikimedia.org/r/362593 (https://phabricator.wikimedia.org/T145162) (owner: Legoktm) [01:56:09] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<22.22%) [01:56:33] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [05:03:26] Continuous-Integration-Config, Patch-For-Review: WebPlatformAuth PHPUnit job fails "You MUST install Composer dependencies" - https://phabricator.wikimedia.org/T157419#3414445 (Krinkle) a:Umherirrender [05:03:34] Continuous-Integration-Config, Patch-For-Review: WebPlatformAuth PHPUnit job fails "You MUST install Composer dependencies" - https://phabricator.wikimedia.org/T157419#3004602 (Krinkle) Open>Resolved p:Triage>Normal [05:10:26] Continuous-Integration-Config, MediaWiki-extensions-Other: FlickrAPI test failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T154847#2926008 (Krinkle) [05:19:02] Continuous-Integration-Config, Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Kanban), Patch-For-Review: MediaWiki core PHPCS job should only run against files changed in HEAD - https://phabricator.wikimedia.org/T158974#3414577 (Krinkle) [05:20:03] Release-Engineering-Team (Backlog), Wikimedia-Site-requests: Don't allow non-existent wikis in server configuration files - https://phabricator.wikimedia.org/T115138#3414583 (Krinkle) [05:22:34] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, JavaScript, and 2 others: Refactor webdriverio tests for mediawiki core so users and pages are created via the api - https://phabricator.wikimedia.org/T167502#3414586 (Krinkle) [05:28:09] Browser-Tests-Infrastructure, 
Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: Run WebdriverIO tests in CI for extensions - https://phabricator.wikimedia.org/T164721#3414608 (Krinkle) [05:28:11] Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, JavaScript, and 2 others: Refactor webdriverio tests for mediawiki core so users and pages are created via the api - https://phabricator.wikimedia.org/T167502#3414607 (Krinkle) Open>Resolved [06:24:17] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #414: FAILURE in 1 hr 44 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/414/ [06:55:58] (PS1) Florianschmidtwelzow: Archive AjaxLogin extension [integration/config] - https://gerrit.wikimedia.org/r/363774 (https://phabricator.wikimedia.org/T169670) [07:11:10] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK [08:41:07] (PS5) Revi: Add revi to whitelist. [integration/config] - https://gerrit.wikimedia.org/r/363041 (owner: Zppix) [08:42:49] (PS6) Revi: Add revi to whitelist. [integration/config] - https://gerrit.wikimedia.org/r/363041 (owner: Zppix) [08:44:16] (CR) Revi: "I have some sort of obsession with ABC order, but realized PS5 had typo after I uploaded it." [integration/config] - https://gerrit.wikimedia.org/r/363041 (owner: Zppix) [08:47:06] Beta-Cluster-Infrastructure, Operations, Performance-Team, Thumbor, Patch-For-Review: Beta thumbnails are broken - https://phabricator.wikimedia.org/T169114#3414879 (fgiunchedi) Open>Resolved Looks like this is fixed, we don't have poolcounter in beta I think? Anyways if we do we can... 
[08:54:17] 10Release-Engineering-Team, 10Page-Previews, 10Reading-Web-Backlog, 10Epic: [EPIC] Generate compiled assets from continuous integration - https://phabricator.wikimedia.org/T158980#3414898 (10phuedx) >>! In T158980#3413033, @Jdlrobson wrote: > I replied to Reading-web-team] Using a bundler in another of our... [09:08:16] 10Continuous-Integration-Infrastructure: mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version - https://phabricator.wikimedia.org/T169904#3414934 (10hashar) Over night I filled https://github.com/oerdnj/deb.sury.org/issues/642 #upstream and > I don't think the PHP packages are... [09:21:08] 10Continuous-Integration-Infrastructure: mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version - https://phabricator.wikimedia.org/T169904#3414959 (10hashar) On a Nodepool Jessie instance using an image from Wednesday July 5: ``` $ export PHP_BIN=/usr/bin/php7.0 $ php --version... [09:30:40] 10Continuous-Integration-Infrastructure: mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version - https://phabricator.wikimedia.org/T169904#3412134 (10Legoktm) Upgrading phan could? also resolve this given https://github.com/etsy/phan/commit/80ac316b4672dbe6f9c65f4435e67839de2c133... [09:48:21] 10Continuous-Integration-Infrastructure, 10Upstream: mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version - https://phabricator.wikimedia.org/T169904#3415043 (10hashar) We found out the issue to be a change in php-src between 7.0.20 and 7.0.21: ``` $ git grep -e 'PHP_SQLITE3_V... [09:48:36] 10MediaWiki-Codesniffer, 10Patch-For-Review: Provide a Codesniffer rule to enforce "short" type definitions: int and bool, not integer and boolean - https://phabricator.wikimedia.org/T145162#3415045 (10Ricordisamoa) What about e.g. `double` instead of `float`? 
[09:49:04] legoktm: you are awesome :] [09:49:11] legoktm: (re phan / sqlite version) [09:51:41] addshore: for phan sqlite issue, we gotta upgrade Phan to get https://github.com/etsy/phan/commit/80ac316b4672dbe6f9c65f4435e67839de2c1335 [09:51:58] namely, remove the "ext-sqlite3": "0.7-dev" version requirement [09:52:17] since php7.0.21 has changed the version of the sqlite module to be the php version (instead of the hardcoded 0.7-dev) [09:52:50] 10Continuous-Integration-Infrastructure, 10Upstream: mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version - https://phabricator.wikimedia.org/T169904#3415064 (10hashar) The fix is to upgrade Phan to 0.8.0 or later. [10:26:45] 10Deployment-Systems, 10Scap (Scap3-Adoption-Phase1), 10scap2, 10monitoring, and 2 others: Deploy statsv with scap3 - https://phabricator.wikimedia.org/T129139#3415118 (10fgiunchedi) [10:59:32] PROBLEM - Puppet errors on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [11:31:54] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [12:18:36] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [12:37:08] (03PS1) 10Hashar: Bump Phan 0.7..0.8 for mediawiki/core [integration/config] - 10https://gerrit.wikimedia.org/r/363807 (https://phabricator.wikimedia.org/T169904) [12:39:32] (03CR) 10Hashar: "Did a recheck of mediawiki/core patch https://gerrit.wikimedia.org/r/#/c/92621/" [integration/config] - 10https://gerrit.wikimedia.org/r/363807 (https://phabricator.wikimedia.org/T169904) (owner: 10Hashar) [12:39:45] addshore: are you still around by any chance ? 
[12:39:51] yu [12:39:57] p [12:40:05] addshore: I got bump esty/phan from 0.7 to 0.8 [12:40:30] because 0.7 requires ext-sqlite3 0.7-dev [12:40:43] however php 7.0.21 has changed the sqlite3 extension version [12:40:51] so that bails out ( https://phabricator.wikimedia.org/T169904 for the long story) [12:40:56] looks like core is passing just fine :] [12:41:47] addshore: my question being, is there anything preventing a bump of phan to 0.8 ? [12:42:03] nope, unless it breaks tests [12:42:05] but it shouldnt [12:42:10] (03CR) 10Hashar: [C: 032] Bump Phan 0.7..0.8 for mediawiki/core [integration/config] - 10https://gerrit.wikimedia.org/r/363807 (https://phabricator.wikimedia.org/T169904) (owner: 10Hashar) [12:42:10] good [12:42:10] :D [12:42:20] I am going to bump it for extensions as well so :D [12:43:56] (03Merged) 10jenkins-bot: Bump Phan 0.7..0.8 for mediawiki/core [integration/config] - 10https://gerrit.wikimedia.org/r/363807 (https://phabricator.wikimedia.org/T169904) (owner: 10Hashar) [12:44:43] PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [12:46:36] (03PS1) 10Hashar: Bump Phan 0.7..0.8 for mediawiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/363809 (https://phabricator.wikimedia.org/T169904) [12:47:33] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [12:52:45] that is going to be a mess [12:53:19] (03CR) 10Hashar: [C: 032] "some extensions are going to fail as a result" [integration/config] - 10https://gerrit.wikimedia.org/r/363809 (https://phabricator.wikimedia.org/T169904) (owner: 10Hashar) [12:53:37] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0] [12:54:28] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10Upstream: mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version - 
https://phabricator.wikimedia.org/T169904#3415336 (10hashar) The Jenkins jobs running Phan have been updated to use Phan 0.8. mediawiki/core pass... [12:54:46] (03Merged) 10jenkins-bot: Bump Phan 0.7..0.8 for mediawiki extensions [integration/config] - 10https://gerrit.wikimedia.org/r/363809 (https://phabricator.wikimedia.org/T169904) (owner: 10Hashar) [12:55:30] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10Upstream: Upgrade etsy phan 0.7..0.8 (was mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version) - https://phabricator.wikimedia.org/T169904#3415338 (10hashar) [12:59:35] 00:02:33.730 [12:59:35] 00:02:33.730 [12:59:36] bah [13:02:33] hashar: fixing, do you have a patch where it happens? [13:03:36] dcausse: I have bumped esty/phan to 0.8 got a patch that might fix it [13:04:39] hashar: adding use SearchResultSet on EmptyResultSet.php should fix it, I can upload if you want [13:04:47] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [13:04:54] dcausse: ah yeah please do :] [13:04:57] ok [13:04:58] because I have no idea what I am doing [13:05:15] I was merely blindly s/SearchResultSet/ResultSet/ :D [13:05:32] dcausse: you can refer to T169904 [13:05:32] T169904: Upgrade etsy phan 0.7..0.8 (was mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version) - https://phabricator.wikimedia.org/T169904 [13:13:50] (03Abandoned) 10Hashar: Use phan version 0.8 [integration/config] - 10https://gerrit.wikimedia.org/r/363222 (owner: 10Addshore) [13:16:45] meh... 
now I have Call to deprecated function \ParserCache::singleton [13:18:09] :( [13:22:31] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [13:26:40] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10Upstream: Upgrade etsy phan 0.7..0.8 (was mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version) - https://phabricator.wikimedia.org/T169904#3415362 (10hashar) [13:27:03] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10Upstream: Upgrade etsy phan 0.7..0.8 (was mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version) - https://phabricator.wikimedia.org/T169904#3412134 (10hashar) [13:28:14] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10Upstream: Upgrade etsy phan 0.7..0.8 (was mediawiki-core-php70-phan-jessie requested PHP extension sqlite3 has the wrong version) - https://phabricator.wikimedia.org/T169904#3412134 (10hashar) Follow up of the phan update: CirrusSearch has some... [13:31:25] dcausse: looks like you fixed it :] [13:31:43] addshore: and Wikibase has a few errors https://phabricator.wikimedia.org/T169980 beside that everything else pass :] [13:36:24] hashar: cool [13:38:12] hashar: +2ed, should be merged pretty soon [13:39:02] dcausse: \o/ [13:40:04] addshore: for wikibase, https://gerrit.wikimedia.org/r/#/c/363828/ ignore a rule apparently introduced in phan 0.8 [13:40:37] ack [13:46:42] Yippee, build fixed! 
[13:46:42] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #452: 09FIXED in 2 min 41 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/452/ [13:56:27] !log Nodepool: uploaded new Ubuntu Trusty image [13:56:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:57:28] !log Nodepool: updating snapshot-ci-trusty [13:57:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:58:42] addshore: the wikibase selenium job is broken :( [13:58:52] D: [14:03:59] !log Image snapshot-ci-trusty-1499435837 in wmflabs-eqiad is ready [14:04:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:04:03] PROBLEM - Puppet errors on integration-slave-docker-1000 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [14:33:46] RainbowSprinkles oh we can use gerrit2 for scap i think. switching of manage_user should prevent the user from being created by scap. We will want to create a work around for now until we move everything under /srv and can move the user default home too. 
[14:39:05] RECOVERY - Puppet errors on integration-slave-docker-1000 is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:08] yay works [14:48:41] PROBLEM - Puppet errors on deployment-sca04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:53:23] !log deployment-prep: add port 9632 to security group "sca" https://horizon.wikimedia.org/project/access_and_security/security_groups/593/ - T148129 [14:53:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:53:28] T148129: Productization of Recommendation API - https://phabricator.wikimedia.org/T148129 [14:53:33] (03CR) 10Zppix: [C: 031] "> I have some sort of obsession with ABC order, but realized PS5 had" [integration/config] - 10https://gerrit.wikimedia.org/r/363041 (owner: 10Zppix) [14:53:58] !log deployment-prep: change webproxy http://recommendation-api-beta.wmflabs.org/ to deployment-sca02 (has the proper security rule) - T148129 [14:54:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:55:13] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:57:48] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [14:57:59] hashar: when you get a moment can you review gerrit.wikimedia.org/r/363041 ? [14:59:15] 10Beta-Cluster-Infrastructure, 10Operations, 10Performance-Team, 10Thumbor, 10Patch-For-Review: Beta thumbnails are broken - https://phabricator.wikimedia.org/T169114#3387489 (10hashar) >>! In T169114#3414879, @fgiunchedi wrote: > Looks like this is fixed, we don't have poolcounter in beta I think? Anywa... [14:59:57] Zppix: yup [15:00:02] thanks hashar :) [15:00:25] (03CR) 10Hashar: [C: 032] Add revi to whitelist. 
[integration/config] - 10https://gerrit.wikimedia.org/r/363041 (owner: 10Zppix) [15:00:41] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:01:29] hashar im wondering do you know what class the deployment servers use please? [15:01:34] As it seems im using profile::mediawiki::deployment::server [15:02:07] (03Merged) 10jenkins-bot: Add revi to whitelist. [integration/config] - 10https://gerrit.wikimedia.org/r/363041 (owner: 10Zppix) [15:03:04] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [15:04:29] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: MediaWiki core PHPCS job should only run against files changed in HEAD - https://phabricator.wikimedia.org/T158974#3415728 (10hashar) 05Open>03Resolved... [15:04:35] I was thinking if I should modify comitter identity (typo with new virtualbox vm) but obviously too late [15:04:55] lol [15:06:32] revi: meh I used to have 2 committer identities till i figured out how to modify it (when i used git instead of webui i used another email, but now its all the same) [15:06:56] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikimedia-Hackathon-2017, 10Wikimedia-Logstash, and 2 others: Send Jenkins build log and results to ElasticSearch - https://phabricator.wikimedia.org/T78705#3415735 (10hashar) Quick update: I have been busy with other duties a... 
[15:07:55] Well, I have a preset list of config files [15:07:58] but forgot to apply it [15:10:17] PROBLEM - Puppet errors on deployment-changeprop is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:12:42] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:13:18] PROBLEM - Puppet errors on deployment-db04 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:13:20] PROBLEM - Puppet errors on deployment-mcs01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:13:23] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Operations, 10Patch-For-Review: CI for operations/puppet is taking too long - https://phabricator.wikimedia.org/T166888#3415741 (10hashar) a:05hashar>03None **Status update** There are a few patches for puppet.git that are... [15:13:38] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:15:47] PROBLEM - Puppet errors on deployment-sca03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:19:45] Zppix: and deployed! [15:20:05] hashar: thanks again, you do too much for 1 human to do :) have a good rest of the day! [15:23:46] Zppix: thx :) [15:23:50] np [15:27:31] PROBLEM - Puppet errors on deployment-eventlogging03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:41:12] thcipriani: hi, im wondering is it possible for scap to ssh in as one user and sudo su to another user? [15:41:41] Im trying to deploy gerrit with scap but using a different user is not working as it needs the ssh key added to that user. 
[15:42:02] IIRC setting ssh_user and git_repo_user to different users should do that [15:42:04] * thcipriani double checks [15:42:41] i've set git_repo_user: gerrit2 [15:42:46] in scap/scap.cfg [15:42:53] but scap deploy complains about permissions [15:44:19] hrm, looks like we got rid of git_repo_user at some point. Adding complication and no one was using it :( [15:44:58] oh [15:45:20] so yeah you'll have to be able to ssh as the ssh_user [15:45:41] Is there a way i can add my ssh key to the gerrit2 user? [15:45:51] it's not in ldap so can't add it in wikitech. [15:46:10] do i add the key under .ssh/* [15:46:38] hrm, if it's a local user you'll have to check the ssh config for where authorized_keys go [15:46:44] ah ok [15:47:04] the AuthorizedKeysFile path in /etc/ssh/sshd_config [15:47:21] AuthorizedKeysFile /etc/ssh/userkeys/%u /etc/ssh/userkeys/%u.d/cumin [15:48:14] so yeah %u is the user, so /etc/ssh/userkeys/gerrit2 should work [15:48:41] ah [15:48:42] thanks [15:59:45] hmm fails [16:06:18] Gerrit: Login to Git (gerrit review) doesn't work - https://phabricator.wikimedia.org/T169996#3415866 (Reception123) [16:08:04] RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0] [16:08:30] Gerrit, Release-Engineering-Team (Kanban), Regression, Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#3415886 (Paladox) Happened again T169996 [16:09:47] paladox: hmm this seems to be a recurring error I see [16:10:19] Reception123 it's supposed to be fixed. The index is out of date or something causing your gerrit: part to be removed from the db. 
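Going back to the AuthorizedKeysFile line quoted at 15:47:21: a small sketch of how the %u token resolves for gerrit2. It only handles the %u, %h and %% tokens and ignores sshd's handling of relative paths; the function name is made up for illustration and the home directory is the /var/lib/gerrit2 seen in the earlier usermod error.

```python
def expand_authorized_keys(patterns, user, home):
    """Expand %u, %h and %% in a whitespace-separated AuthorizedKeysFile value.

    Illustrative only: unknown tokens are passed through unchanged.
    """
    replacements = {"u": user, "h": home, "%": "%"}
    paths = []
    for pattern in patterns.split():
        out = []
        i = 0
        while i < len(pattern):
            ch = pattern[i]
            if ch == "%" and i + 1 < len(pattern):
                token = pattern[i + 1]
                out.append(replacements.get(token, "%" + token))
                i += 2
            else:
                out.append(ch)
                i += 1
        paths.append("".join(out))
    return paths


paths = expand_authorized_keys(
    "/etc/ssh/userkeys/%u /etc/ssh/userkeys/%u.d/cumin",
    user="gerrit2",
    home="/var/lib/gerrit2",
)
print(paths)
# ['/etc/ssh/userkeys/gerrit2', '/etc/ssh/userkeys/gerrit2.d/cumin']
```

The first path is the same file as the File[/etc/ssh/userkeys/gerrit2] resource named in the duplicate-declaration error at 00:15:06, so a key dropped there by hand sits at a path puppet manages.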
[16:11:50] Jul 7 16:10:57 gerrit-test3 sshd[26633]: error: AuthorizedKeysCommand /usr/sbin/ssh-key-ldap-lookup returned status 1 [16:12:12] Gerrit: Login to Git (gerrit review) doesn't work - https://phabricator.wikimedia.org/T169996#3415901 (demon) [16:12:17] Gerrit, Release-Engineering-Team (Kanban), Regression, Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#3415904 (demon) [16:18:18] RECOVERY - Puppet errors on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] [16:29:30] ah [16:29:32] fixed it [16:32:17] are you talking about the login error? It still doesn't work for me [16:35:11] nope, i mean scap deploying gerrit2. [16:35:25] demon will have to go into the db and add the gerrit: back there. [16:35:52] ok [16:45:25] hmm [16:45:26] i get this now [16:45:27] https://phabricator.wikimedia.org/P5704 [16:45:28] thcipriani ^^ [16:45:55] hrm [16:46:08] > Host key verification failed. [16:46:42] yep [16:46:46] but i can ssh in fine [16:46:53] ssh gerrit2@gerrit-test3 [16:47:01] scap deploy --force fails [16:47:04] but scap deploy works [16:47:13] what is being rsync'd? [16:47:34] gerrit.war [16:47:36] and the plugins [16:47:44] but i don't see any changes on gerrit-test3 [16:47:45] I think that's where the hostkey is failing [16:47:46] for the files [16:48:37] since it looks like the ssh succeeds, and the fetch succeeds [16:49:28] yep hmm [16:57:48] I was directed to this channel. I am wondering when the 1.29 MediaWiki version will be released [16:58:38] I'll be putting out a new release candidate today [16:58:54] There's been some pretty nasty bugs that cropped up during the first release candidate and we've been trying to get them fixed [16:58:56] thank you RainbowSprinkles [16:58:59] (Sorry, I know we're behind schedule) [16:59:04] thank you for your help [17:01:43] i think for now i will deploy as myself [17:01:53] getting local users to work with ssh is complicated. 
[17:11:55] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Operations, Patch-For-Review: CI for operations/puppet is taking too long - https://phabricator.wikimedia.org/T166888#3416029 (faidon) FTR, as I mentioned on IRC, these three changes are continuing down the path of accumula... [17:13:37] RainbowSprinkles we will need to have an actual user that exists in ldap for scap deploys [17:13:55] It doesn't have to be in LDAP. We don't create them in LDAP in prod [17:14:03] Why will system users not work? [17:14:18] trying to ssh with it, i had to make a lot of hacks [17:14:22] to get it to work [17:14:30] then it failed when trying to scap deploy --force [17:14:44] i guess if it works in prod, i can keep the user i have set. [17:15:05] What sort of hacks? We use scap in other things...not sure what makes gerrit special [17:15:13] *in labs [17:15:21] i had to hack the private repo to add the ssh key for gerrit2 [17:15:28] in /etc/ssh/*/gerrit2 [17:15:34] That's not a hack, that's the correct thing to do [17:15:41] oh [17:15:48] I said yesterday: we need a keypair [17:16:00] How else would you add a keypair if you don't commit to git? [17:16:06] yep. [17:16:15] i have a patch to do that [17:16:25] has a fake ssh key that i generated locally. [17:16:41] though when scapping i got this error [17:16:41] https://phabricator.wikimedia.org/P5704 [17:16:54] "Host key verification failed." [17:16:58] it must have been gitfat [17:18:04] That's because you need the target host's host key in your known_hosts [17:18:13] oh i see [17:18:44] so it needs to be in gerrit2? and in paladox (which i am deploying as but has the ssh user as gerrit2 in the scap config) [17:28:56] yep it's git fat and the ssh key will keep getting overwritten by puppet. So i will use my user. It will work for prod as you will be using archiva which you can clone from but can't push. [17:29:03] I'm using rsync over ssh. 
[17:34:34] anyways it's almost ready for the wip: part to be removed [17:35:14] just need to add a new config to ln -s plugins from /var/lib/gerrit2/review_site/plugins to /srv/deployment/gerrit/gerrit/plugins [17:35:49] reason why i say config is i don't want to end up breaking something [17:35:58] we can remove the config when we move it to scap fully [17:40:17] I'll have a look at this again next week, won't have time today [17:43:46] ok [17:47:57] Reception123 did the login problems start today? [17:49:11] paladox: I very rarely actually log into Gerrit as I don't really contribute a lot, last time I logged in was June 14th. I haven't tried since then, but when I tried today it didn't work. [17:49:25] ah ok thanks [19:31:56] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [19:56:40] I'm trying to upload a 10M json file to phabricator, but it keeps saying: "No configured storage engine can store this file. See "Configuring File Storage" in the documentation for information on configuring storage engines. [19:57:15] file a bug? i'm not finding anything in phab search related [19:58:52] ebernhardson: I think the file size is too big [19:59:09] :S i guess i'll dump it on rutherfordium then [19:59:14] I just uploaded a copy of sitematrix.json to https://phabricator.wikimedia.org/F8648672 [19:59:25] ebernhardson: We limited it due to the WP0 spam [19:59:48] oh, i suppose that makes sense. I was sure i had uploaded similar files before [19:59:58] When? :) [20:00:07] Try a file under 10M and see if it works? :P [20:00:28] just tried a 4.3M, same error [20:00:46] Curious [20:00:52] Did we drop it further? [20:00:58] Mine was only 128L [20:01:37] from my upload history it looks like this file is larger than the ones i've been uploading previously, others were only ~800kB. [20:04:50] twentyafterfour: ^ what's the phab upload limit atm?
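The plugins symlink proposed at 17:35:14 could be sketched as below. This is only an illustration, not the puppet or scap change under discussion: the helper name `ensure_symlink` is hypothetical, and the direction is an assumption (the link lives at /var/lib/gerrit2/review_site/plugins and points at /srv/deployment/gerrit/gerrit/plugins). In line with the worry at 17:35:49 about breaking something, the sketch refuses to replace a real directory and does nothing when the link is already correct.

```python
import os
import tempfile


def ensure_symlink(link_path: str, target_path: str) -> bool:
    """Make link_path a symlink to target_path.

    Returns True if the link was created or repointed, False if it was
    already correct. Raises instead of replacing a real file or directory.
    """
    if os.path.islink(link_path):
        if os.readlink(link_path) == target_path:
            return False
        os.unlink(link_path)
    elif os.path.exists(link_path):
        raise FileExistsError(
            f"{link_path} exists and is not a symlink; refusing to replace it"
        )
    os.symlink(target_path, link_path)
    return True


if __name__ == "__main__":
    # Demonstrated in a temporary directory; the real paths from the
    # conversation would be passed here instead (assumed direction).
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "deployment_plugins")
        os.mkdir(target)
        link = os.path.join(root, "review_site_plugins")
        print(ensure_symlink(link, target))
        print(ensure_symlink(link, target))
```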
[20:08:15] i can live with it though, i can just upload to rutherfordium it just doesn't have the nice history and searchability :) [20:08:46] they are just ML models documenting progress of training a search ranker for tickets [20:09:57] (Video uploads seem to be sufficiently restricted already by [20:09:57] "storage.mysql-engine.max-size" being set to 4194300). [20:10:12] ebernhardson: 4096KB? :P [20:11:00] lol. ok [20:11:13] this overly large model was an aberration anyways, it documents doing something wrong :) [20:23:06] Reedy you should be able to see the limit somewhere in /config/ [20:23:10] as you're an admin :) [20:23:45] Phab annoys me that I have to manipulate urls to get to those pages [20:24:16] oh [20:24:16] Reedy at least it gives you the pages at all [20:24:27] LocalSettings4lyf [20:25:04] lol [20:25:33] Reedy did you find the limit? I can get the full url for you if you want? [20:26:10] https://phab-01.wmflabs.org/config/edit/storage.mysql-engine.max-size/ [20:26:26] -> https://phabricator.wikimedia.org/config/edit/storage.mysql-engine.max-size/ [20:28:07] Yup, so it's 4194300 as I already posted :) [20:28:32] Hilarious [20:28:32] storage.s3.bucket [20:28:32] Amazon S3 bucket. [20:28:32] Current Value: null [20:28:39] S3 is linked to https://phabricator.wikimedia.org/S3 [20:29:39] lol [20:30:46] is that 4mb? [20:30:50] yes [20:30:53] ok thanks [20:31:01] hmm, i wonder if this will affect diffusion [20:31:05] It's seemingly in bytes for some amusing reason [20:31:08] it uses files when the file is large.
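The arithmetic behind the "4096KB?" and "is that 4mb?" exchange can be checked with a short sketch. The value 4194300 and the byte unit come from the conversation; the helper name `fits_limit` and the assumption that the comparison is inclusive are illustrative only, not taken from Phabricator's code.

```python
# Value of "storage.mysql-engine.max-size" quoted at 20:09:57, in bytes.
MAX_SIZE_BYTES = 4194300

KIB = 1024
MIB = 1024 * KIB


def fits_limit(size_bytes: int, limit: int = MAX_SIZE_BYTES) -> bool:
    """Hypothetical check: does a file of size_bytes fit under the limit?

    Whether the real comparison is inclusive is an assumption.
    """
    return size_bytes <= limit


if __name__ == "__main__":
    # The limit is just short of a full 4 MiB (4 * 1024 * 1024 bytes).
    print(4 * MIB - MAX_SIZE_BYTES)      # bytes short of 4 MiB
    print(MAX_SIZE_BYTES / KIB)          # limit expressed in KiB
    print(fits_limit(800 * 1000))        # the earlier ~800 kB uploads
    print(fits_limit(int(4.3 * MIB)))    # the 4.3M file tried at 20:00:28
```

Under these assumptions the ~800 kB uploads fit and the 4.3M file does not, which is consistent with the errors reported in the channel.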
[20:35:04] PROBLEM - Puppet errors on integration-slave-docker-1000 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:38:57] PROBLEM - Host deployment-sca02 is DOWN: PING CRITICAL - Packet loss = 37%, RTA = 2401.80 ms [20:39:16] PROBLEM - Host deployment-mathoid is DOWN: CRITICAL - Host Unreachable (10.68.23.236) [20:40:03] thats happening for us too [20:40:19] RECOVERY - Host deployment-sca02 is UP: PING OK - Packet loss = 0%, RTA = 3.30 ms [20:42:48] RECOVERY - Host deployment-mathoid is UP: PING OK - Packet loss = 0%, RTA = 3.18 ms [21:10:03] RECOVERY - Puppet errors on integration-slave-docker-1000 is OK: OK: Less than 1.00% above the threshold [0.0] [21:28:06] (03CR) 10Bearloga: [C: 031] R based job for wikimedia/discovery/ortiz (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/362309 (https://phabricator.wikimedia.org/T153856) (owner: 10Hashar) [21:34:19] (03PS2) 10Jforrester: Add tests for forbidding use of backtick operator [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/363518 (owner: 10Legoktm) [21:34:26] (03CR) 10Jforrester: [C: 032] Add tests for forbidding use of backtick operator [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/363518 (owner: 10Legoktm) [21:35:21] (03Merged) 10jenkins-bot: Add tests for forbidding use of backtick operator [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/363518 (owner: 10Legoktm) [21:53:03] 10Release-Engineering-Team (Kanban), 10Operations, 10Wikimedia-Site-requests, 10Patch-For-Review: Update to interwiki map - https://phabricator.wikimedia.org/T169979#3416997 (10Zppix) a:03demon [21:53:23] 10Release-Engineering-Team (Kanban), 10Wikimedia-Site-requests, 10Patch-For-Review: Update to interwiki map - https://phabricator.wikimedia.org/T169979#3416999 (10demon) 05Open>03Resolved [22:04:38] (03PS2) 10Jforrester: Add Squiz.Classes.SelfMemberReference to ruleset [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/363525 (owner: 10Legoktm) [22:20:55] (03CR) 
10Jforrester: [C: 031] "Seems entirely sane. Sadly the MW-core test run is useless right now because we can't upgrade yet because of an upstream bug…" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/363525 (owner: 10Legoktm) [22:31:53] 10Release-Engineering-Team (Kanban), 10releng-201617-q4, 10MediaWiki-General-or-Unknown, 10MW-1.29-release, 10Release: Release MediaWiki 1.29 - https://phabricator.wikimedia.org/T153271#2874741 (10MacFan4000) All blockers are now resolved. [22:50:36] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Next): Set scap.cfg's canary_dashboard_url to useful beta logstash url - https://phabricator.wikimedia.org/T168211#3417171 (10greg) [22:51:05] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Next): Set scap.cfg's canary_dashboard_url to useful beta logstash url - https://phabricator.wikimedia.org/T168211#3358637 (10greg) p:05Triage>03Normal [22:55:09] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team, 10MobileFrontend: MobileFrontend Chrome browser test job has become unstable - https://phabricator.wikimedia.org/T167994#3417179 (10greg) Looks like this is better? https://integration.wikimedia.org/ci/view/Reading-Web/job/selenium-MobileFrontend/BR... [22:55:29] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team, 10MobileFrontend: MobileFrontend Chrome browser test job has become unstable - https://phabricator.wikimedia.org/T167994#3417183 (10greg) 05Open>03stalled [23:04:03] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team, 10Reading-Web-Backlog: MobileFrontend Chrome browser test job has become unstable - https://phabricator.wikimedia.org/T167994#3417201 (10Jdlrobson) It's definitely improved. The error at https://integration.wikimedia.org/ci/view/Reading-Web/job/sele... 
[23:26:43] 10MediaWiki-Codesniffer: Allow inline comments in mediawiki codesniffer - https://phabricator.wikimedia.org/T170025#3417287 (10Umherirrender) [23:29:37] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team, 10Reading-Web-Backlog: MobileFrontend Chrome browser test job has become unstable - https://phabricator.wikimedia.org/T167994#3417314 (10greg) Generally, a single failure due to infrastructure instability for the past week seems pretty decent. Not p... [23:35:52] 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-ORES, 10ORES, 10Scoring-platform-team-Backlog: ORES spamming Beat Cluster's logstash - https://phabricator.wikimedia.org/T170026#3417317 (10greg) [23:36:04] 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-ORES, 10ORES, 10Scoring-platform-team-Backlog: ORES spamming Beta Cluster's logstash - https://phabricator.wikimedia.org/T170026#3417330 (10greg) [23:37:25] (03PS1) 10Chad: Fix a billion things wrong with branch handling and submodules [tools/release] - 10https://gerrit.wikimedia.org/r/363986 [23:38:00] (03CR) 10Chad: [C: 032] Fix a billion things wrong with branch handling and submodules [tools/release] - 10https://gerrit.wikimedia.org/r/363986 (owner: 10Chad) [23:39:41] 10Beta-Cluster-Infrastructure, 10Operations, 10HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#3417355 (10greg) >>! In T144006#2689359, @hashar wrote: > What is left is deployment-tmh01 which needs some packaging work for Jessie as I understood it. That was Oct 2016 :... 
[23:40:09] 10Beta-Cluster-Infrastructure, 10Operations, 10HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#3417359 (10greg) [23:46:35] 10Continuous-Integration-Config, 10Release-Engineering-Team, 10Epic: split mediawiki tests in unit/integration/smoke tests to speed up CI - https://phabricator.wikimedia.org/T162350#3417364 (10greg) [23:49:44] 10Continuous-Integration-Config, 10Release-Engineering-Team (Backlog): Enhance debian-glue job packages validation - https://phabricator.wikimedia.org/T158553#3417366 (10greg) [23:53:46] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Next), 10Performance-Team, 10WebPageTest: Where to trigger WebPageTest jobs? - https://phabricator.wikimedia.org/T166756#3417368 (10greg) Looks like the answer here is "dedicated Jenkins worker" and "have Jenkins deal with the concurrency... [23:58:29] 10Release-Engineering-Team (Kanban), 10releng-201617-q4, 10MediaWiki-General-or-Unknown, 10MW-1.29-release, 10Release: Release MediaWiki 1.29 - https://phabricator.wikimedia.org/T153271#2874741 (10mwjames) > All blockers are now resolved. Not sure how you come to this conclusion but for example T167946... [23:58:40] 10Continuous-Integration-Infrastructure, 10Patch-For-Review: Disable core dumps generation on CI labs slaves - https://phabricator.wikimedia.org/T96025#3417377 (10greg) [23:58:47] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10MediaWiki-Core-Tests, 10MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#3417374 (10greg) 05Open>03Res...