[00:00:03] *is borked
[00:00:12] wonder if I ever puppetised that file
[00:00:40] * bd808 tries a hard stop and start
[00:00:54] Release-Engineering-Team, User-greg: Create SOW for contractor - https://phabricator.wikimedia.org/T146711#2669449 (greg) https://docs.google.com/document/d/1PpKPIv9B6bweXU7OQcBYo_s_mJfg75yxucuK_ln2uRg/edit
[00:01:24] A hared restart seems to have fixed it
[00:01:38] *hard (typing sux)
[00:02:02] the unit reports "active (exited)" though which doesn't seem quite right
[00:04:07] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2669408 (bd808) ``` bd808@deployment-fluorine02:~$ sudo service udp2log-mw status ● udp2log-mw.service Loaded: loaded (/etc/init.d/udp2log-mw; static) Active: active (exited) since Wed 2016...
[00:04:52] Krenair: ^ it looks like puppet tries to start it every 30 minutes
[00:05:25] I bet it forks in some weird way that confuses systemd
[00:08:34] well, with systemd the programs generally shouldn't fork by themselves…
[00:09:09] * bd808 is not going to rewrite udp2log to make the systemd authors happy
[00:09:43] actually under no circumstances will I rewrite udp2log :)
[00:13:29] * Krenair sighs
[00:13:40] I'll deal with it later, after putting out a different fire
[00:14:14] You should just punt it to someone on techops
[00:30:55] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2669507 (AlexMonk-WMF)
[00:33:31] yeah
[00:33:37] that'll work
[00:33:59] actually it might, there was a ticket for upgrading prod to jessie like this, and that was created by someone in techops
[00:39:06] bd808, there is a /lib/systemd/system/udp2log-mw.service file
[00:39:09] don't know if it's in use
[00:39:18] it still has the --its-systemd from when I was trying to set it up
[00:40:30] anyway I have to be up in 6-7 hours, gnight
[00:40:39] o/
[00:41:02] Moritz has done systemd stuff that's I've filed in phab before
[00:41:17] typing, it's hard yo
[00:50:36] PROBLEM - Puppet staleness on deployment-prometheus01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0]
[02:41:45] Release-Engineering-Team, Phabricator, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2669559 (Peachey88)
[04:15:23] PROBLEM - Puppet staleness on deployment-db1 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[04:19:05] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #154: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/154/
[04:27:02] PROBLEM - Puppet staleness on deployment-db2 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[06:09:18] Release-Engineering-Team, Phabricator, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2669641 (jcrespo) I doubt myisam supported that in the first place.
[07:58:47] Release-Engineering-Team, Phabricator, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2669820 (Aklapper) >>! In T146673#2668203, @mmodell wrote: > It looks like innodb does not support stemming, among other l...
[08:20:08] twentyafterfour: sorry I missed you yesterday, the net here was bad and I dropped off and my phone has issues as well so it's a blackhole. I reran a full search reindex that it still going post the weird search blocking and dropped tables fiasco. in theory all things are ok and moving towards consistency
[10:25:14] (PS1) Hashar: Fix doc comment for zuul-cloner-extdeps [integration/config] - https://gerrit.wikimedia.org/r/312982
[10:32:31] (CR) Hashar: [C: 2] Fix doc comment for zuul-cloner-extdeps [integration/config] - https://gerrit.wikimedia.org/r/312982 (owner: Hashar)
[10:33:38] (Merged) jenkins-bot: Fix doc comment for zuul-cloner-extdeps [integration/config] - https://gerrit.wikimedia.org/r/312982 (owner: Hashar)
[10:34:51] (PS1) Hashar: [MobileFrontend] exp job for npm run lint:modules [integration/config] - https://gerrit.wikimedia.org/r/312986 (https://phabricator.wikimedia.org/T146748)
[10:37:26] (CR) Hashar: [C: 2] [MobileFrontend] exp job for npm run lint:modules [integration/config] - https://gerrit.wikimedia.org/r/312986 (https://phabricator.wikimedia.org/T146748) (owner: Hashar)
[10:38:10] (Merged) jenkins-bot: [MobileFrontend] exp job for npm run lint:modules [integration/config] - https://gerrit.wikimedia.org/r/312986 (https://phabricator.wikimedia.org/T146748) (owner: Hashar)
[10:52:38] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[10:56:54] Beta-Cluster-Infrastructure: deployment-fluorine02 does not have logs - https://phabricator.wikimedia.org/T146723#2669408 (hashar)
[11:05:18] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2670228 (hashar) p:Unbreak!>High...
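The Puppet staleness and Puppet run alerts in this log report their state as a percentage of datapoints above a threshold: 43200.0 seconds (12 hours) for staleness, 0.0 for the Puppet run check. The log does not show how the check is implemented. The sketch below assumes it counts the datapoints in a window that exceed the threshold and goes CRITICAL once that share reaches a limit. The `critical_percent` default of 1.0 is a hypothetical value picked to match the "Less than 1.00%" wording of the RECOVERY message, not the real monitoring configuration.

```python
from typing import Optional, Sequence

# Threshold taken from the staleness alerts: 43200 seconds == 12 hours.
STALENESS_THRESHOLD_S = 43200.0


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]  # skip gaps in the series
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def check(datapoints: Sequence[Optional[float]], threshold: float,
          critical_percent: float = 1.0) -> str:
    """Hypothetical alert rule: CRITICAL once `critical_percent` or more of the
    datapoints lie above `threshold`, otherwise OK. The message wording mirrors
    the alerts in the log; the real check's configuration is not shown there."""
    pct = percent_above(datapoints, threshold)
    if pct >= critical_percent:
        return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"
    return f"OK: Less than {critical_percent:.2f}% above the threshold [{threshold}]"
```

For example, five staleness samples of which one is older than 12 hours give a 20.00% reading, the same figure as the deployment-prometheus01 alert; the sample values themselves are made up.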
[11:32:38] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:27:45] (PS6) Hashar: (WIP) mediawiki queue testing (WIP) [integration/config] - https://gerrit.wikimedia.org/r/241660
[12:28:42] (CR) jenkins-bot: [V: -1] (WIP) mediawiki queue testing (WIP) [integration/config] - https://gerrit.wikimedia.org/r/241660 (owner: Hashar)
[13:47:10] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #158: FAILURE in 3 min 9 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/158/
[14:35:40] (PS7) Hashar: Test mediawiki gate queue only has mw projects [integration/config] - https://gerrit.wikimedia.org/r/241660 (https://phabricator.wikimedia.org/T107529)
[14:36:19] (CR) jenkins-bot: [V: -1] Test mediawiki gate queue only has mw projects [integration/config] - https://gerrit.wikimedia.org/r/241660 (https://phabricator.wikimedia.org/T107529) (owner: Hashar)
[14:48:59] Release-Engineering-Team, Developer-Relations, Wikimedia-Blog-Content: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2670772 (MelodyKramer) I think it's okay to email them early and tell them about the idea so that they know it exists. What you have i...
[15:15:36] Release-Engineering-Team, Developer-Relations, Phabricator, Wikimedia-Blog-Content: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2670856 (greg)
[15:48:14] (PS1) Hashar: Decouple repos from mediawiki gate queue [integration/config] - https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529)
[15:49:24] (CR) jenkins-bot: [V: -1] Decouple repos from mediawiki gate queue [integration/config] - https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) (owner: Hashar)
[15:50:53] (PS2) Hashar: Decouple repos from mediawiki gate queue [integration/config] - https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529)
[15:51:20] Another fix for gerrit https://gerrit-review.googlesource.com/#/c/87310/ :)
[15:51:43] We will need to update our css file for gerrit until they update gerrit and approve that patch.
[15:51:44] (CR) jenkins-bot: [V: -1] Decouple repos from mediawiki gate queue [integration/config] - https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529) (owner: Hashar)
[15:59:26] (PS3) Hashar: Decouple repos from mediawiki gate queue [integration/config] - https://gerrit.wikimedia.org/r/313028 (https://phabricator.wikimedia.org/T107529)
[16:29:10] !log Cherry-picked https://gerrit.wikimedia.org/r/#/c/313035/ on deployment-puppetmaster
[16:29:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:43:25] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2671121 (Paladox) >>! In T142158#257398...
[16:43:36] Release-Engineering-Team: Remove .gitreview from MediaWiki and Extensions - https://phabricator.wikimedia.org/T146293#2671122 (mmodell) [[ https://www.mediawiki.org/wiki/Gerrit/git-review#Debian.2FUbuntu.2FMint | The instructions on wiki ]] point to install version 1.21
[16:44:44] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T145220#2671126 (AlexMonk-WMF)
[16:55:07] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2671144 (Paladox) Maybe * "(Segmentati...
[17:14:39] hashar i belive i found the fixes for https://phabricator.wikimedia.org/T142158 but im not 100% sure, but the bugs traceback to a specific function in gc looks very similar to ours
[17:17:00] hashar it got fixed in php 5.6.15
[17:17:09] Per changelog
[17:18:27] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2671206 (Paladox) I belive I have found...
[17:18:36] changelog https://phabricator.wikimedia.org/P4123
[17:19:32] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[17:20:50] Release-Engineering-Team: Phab Advanced Search no longer showing typical results - https://phabricator.wikimedia.org/T146789#2671208 (MBinder_WMF)
[17:24:01] Release-Engineering-Team: Phab Advanced Search no longer showing typical results - https://phabricator.wikimedia.org/T146789#2671208 (Paladox) I belive there was a search problem yesterday and they are reindexing search currently. It may take a while to reindex everything since they had to switch from myisa...
[17:24:15] chasemp ^^ would the reindexing cause that?
[17:26:01] paladox: yes (cc chasemp )
[17:26:10] Oh thanks
[17:26:43] I explained the problem but i need to link to the task /me updates my comment to include T146673
[17:29:06] Release-Engineering-Team, Phabricator, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2667974 (Paladox) It seems the switch over to innodb may have caused T146789 Looking online MySQL 5.7 supports full text...
[17:32:14] Release-Engineering-Team, Phabricator: Phab Advanced Search no longer showing typical results - https://phabricator.wikimedia.org/T146789#2671253 (hashar)
[17:43:31] Beta-Cluster-Infrastructure, Wikimedia-Extension-setup, Category, FileAnnotations (Beta Cluster Release): Release FileAnnotations on the Beta Cluster - https://phabricator.wikimedia.org/T144302#2671281 (MarkTraceur)
[17:47:29] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2671325 (Paladox) https://bugs.php.net/...
[17:50:12] Release-Engineering-Team: Remove .gitreview from MediaWiki and Extensions - https://phabricator.wikimedia.org/T146293#2671335 (hashar) That is one thing to update. There is also multiple odd ways of installing: * `sudo` which really should never be used. Instead the recommended way would be to install in...
[17:56:02] Release-Engineering-Team: Remove .gitreview from MediaWiki and Extensions - https://phabricator.wikimedia.org/T146293#2671343 (hashar) Oh and I found some light/duplicate informations on https://www.mediawiki.org/wiki/Gerrit/Tutorial#Installing_git-review :(
[17:56:46] Continuous-Integration-Infrastructure, Graphoid: npm-4 jenkins fails on canvas install - https://phabricator.wikimedia.org/T146783#2671345 (Paladox)
[18:05:46] Continuous-Integration-Infrastructure, Graphoid: npm-4 jenkins fails on canvas install - https://phabricator.wikimedia.org/T146783#2671366 (Paladox) Yep I'm correct https://github.com/dominictarr/feedopensource/issues/25 Were missing package libgif-dev
[18:11:42] (PS1) Paladox: nodepool: Add packages libcairo2-dev, libpango1.0-dev and libgif-dev [integration/config] - https://gerrit.wikimedia.org/r/313044 (https://phabricator.wikimedia.org/T146783)
[18:22:40] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2671457 (Paladox) Fixes we should backp...
[18:24:22] PROBLEM - Host integration-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.42)
[18:24:55] Release-Engineering-Team, Phabricator: Phab Advanced Search no longer showing typical results - https://phabricator.wikimedia.org/T146789#2671464 (greg)
[18:24:58] Release-Engineering-Team, Phabricator, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2671465 (greg)
[18:25:44] Release-Engineering-Team, Phabricator: Phab Advanced Search no longer showing typical results - https://phabricator.wikimedia.org/T146789#2671208 (greg) The reindex appears to still be in progress: https://grafana-admin.wikimedia.org/dashboard/db/mysql?panelId=2&fullscreen&from=1474886388229&to=147490282...
[18:29:23] ebernhardson hi i doint think your patch https://gerrit.wikimedia.org/r/#/c/306072/ correctly disabled garbage collection
[18:29:38] since you did -dzend.enable_gc=0 instead of -d zend.enable_gc=0
[18:30:01] since it wont work if you have -d next to zend since it will consider that a variable or a config
[18:30:09] where in fact it is a ini config
[18:30:57] Release-Engineering-Team, Phabricator, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2667974 (greg) >>! In T146673#2671231, @Paladox wrote: > It seems the switch over to innodb may have caused T146789 That'...
[18:31:12] (Draft1) Paladox: Correctly disable garbage collection [integration/jenkins] - https://gerrit.wikimedia.org/r/313048 (https://phabricator.wikimedia.org/T142158)
[18:31:15] (Draft2) Paladox: Correctly disable garbage collection [integration/jenkins] - https://gerrit.wikimedia.org/r/313048 (https://phabricator.wikimedia.org/T142158)
[18:32:13] Release-Engineering-Team, Phabricator, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2671505 (Paladox) Oh, but In innodb depending on what MySQL or mariadb version you have dosent support full text support....
[18:32:35] ebernhardson ^^ fixed, hopefully that will disable it now
[18:32:36] :)
[18:40:45] (PS1) Paladox: Disable garbage collection for mw-phpunit.sh too [integration/jenkins] - https://gerrit.wikimedia.org/r/313051 (https://phabricator.wikimedia.org/T142158)
[18:40:55] paladox: ?
[18:41:48] ebernhardson I am trying to find out why the problem still happends
[18:41:54] paladox: oh, about garbage collection? The problem is (after talking to a php core contributor, who also works for us :) that disabling garbage collection is really just papering over the problem
[18:41:54] for https://phabricator.wikimedia.org/T142158
[18:42:16] ebernhardson i belive i have found a fix
[18:42:21] It is in php 5.6.15
[18:42:29] paladox: what happens is everything gets dumped into the garbage collector, so somewhere something is wrong and it happens to error in the garbage collector because everything ends up there
[18:42:42] See https://phabricator.wikimedia.org/T142158#2671325
[18:42:54] Which links to the bugs that describe the same problem were having
[18:43:14] But the fix is in php 5.6.15 and not in php 5.5 which is now un supported
[18:43:22] but we could backport the fix and make a wmf release
[18:43:58] paladox: can't hurt to try
[18:44:08] Yep
[18:44:32] (CR) Hashar: [C: -1] "That should be done in operations/puppet.git via service::packages" [integration/config] - https://gerrit.wikimedia.org/r/313044 (https://phabricator.wikimedia.org/T146783) (owner: Paladox)
[18:44:40] But i have no idea how we can backport and make a release for wmf, maybe hashar or legoktm knows?
[18:45:16] Continuous-Integration-Infrastructure, Graphoid, Patch-For-Review: npm-4 jenkins fails on canvas install - https://phabricator.wikimedia.org/T146783#2671552 (hashar) For the context as to how dependencies are installed on the CI slaves see T119693 In puppet.git edit `modules/graphoid/manifests/packa...
[18:46:42] Continuous-Integration-Infrastructure, Graphoid, Patch-For-Review: npm-4 jenkins fails on canvas install - https://phabricator.wikimedia.org/T146783#2671557 (Paladox) @hashar that has already been done, see https://github.com/wikimedia/operations-puppet/blob/3218df65dcc4c9d42ce6deef0e130db817613f58/m...
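ebernhardson's point is that switching the collector off only hides the crash: whatever is wrong lies elsewhere and merely surfaces inside the garbage collector, because everything ends up there. As an analogy only (CPython's `gc` module, not Zend's collector), the sketch below shows what disabling a cycle collector actually buys. Objects in a reference cycle are simply not reclaimed until a collection runs, while the code that created them is left unchanged. `Node` and `cycle_survives_until_collect` are hypothetical names for illustration.

```python
import gc
import weakref


class Node:
    """A tiny object that can point at another Node to form a cycle."""

    def __init__(self):
        self.other = None


def cycle_survives_until_collect():
    """Build a two-node reference cycle with the cyclic GC disabled and report
    whether the cycle is still alive before and after an explicit collect."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # reference cycle: refcounts never reach zero
        probe = weakref.ref(a)    # observe `a` without keeping it alive
        del a, b                  # drop our names; the cycle keeps itself alive
        alive_before = probe() is not None
        gc.collect()              # explicit collection still works while disabled
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()           # restore the interpreter's previous state
```

In CPython this returns `(True, False)`: with the collector off the cycle lingers, and an explicit collection frees it. Turning collection off changes when cyclic garbage is freed; it does not touch the code that was misbehaving.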
[18:47:44] paladox: will be a giant pain, it doesn't look like we currently maintain our own zend fork
[18:48:26] Oh, well we could easly create a repo or find someone who is update php 5.5 and ask them to use those fixes nicely
[18:48:27] ebernhardson: paladox: we used to have cherry pick, at least for Zend 5.3
[18:48:33] but ops are reluctant to keep doing that
[18:48:53] oh
[18:48:53] so I guess we would want to disable the garbage collector again :]
[18:49:01] I thought it was disabled
[18:49:12] hashar i have this patch https://gerrit.wikimedia.org/r/#/c/313051/
[18:49:13] I have linked to the patch iirc
[18:49:25] that disables it for the rest of php unit tests
[18:49:25] it was harnessed with a PHP v < 5.4 or something
[18:49:34] I also have https://gerrit.wikimedia.org/r/#/c/313048/
[18:49:37] so when we have bumped mw requirement to 5.5 the condition has been removed
[18:49:43] oh
[18:49:45] until we hit that bug which causes some random issue in the gc
[18:50:00] Could i have the link so we can bump that to 5.5
[18:50:00] there are most probably upstream bugs filled with a test case
[18:50:03] please?
[18:50:08] and running the test case would confirm
[18:50:18] yep i belive i found some bugs
[18:50:24] that were fixed in php 5.6
[18:50:33] hashar see https://phabricator.wikimedia.org/T142158#2671325
[18:52:02] (CR) Paladox: "@hashar it already exist there at https://github.com/wikimedia/operations-puppet/blob/3218df65dcc4c9d42ce6deef0e130db817613f58/modules/gra" [integration/config] - https://gerrit.wikimedia.org/r/313044 (https://phabricator.wikimedia.org/T146783) (owner: Paladox)
[18:58:02] Release-Engineering-Team, Developer-Relations, Phabricator, Wikimedia-Blog-Content: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2671604 (EdErhart-WMF) Hi all! Thanks for pinging me on this. Mel has some great points above, and all I'd add is the...
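hashar recalls that the GC workaround used to be gated on something like "PHP v < 5.4", so the condition dropped out when MediaWiki's minimum moved to 5.5. The following is a minimal sketch of gating on the release where the fix reportedly landed instead. `GC_FIX_VERSION`, `parse_version`, `php_gc_args` and `old_condition` are hypothetical names. 5.6.15 is only paladox's reading of the PHP changelog, not a confirmed fix for T142158.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]

# Assumption: release paladox cites from the changelog (unverified for T142158).
GC_FIX_VERSION: Version = (5, 6, 15)


def parse_version(text: str) -> Version:
    """Parse 'X.Y.Z' (ignoring any '-suffix' such as a distro revision)."""
    core = text.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return (major, minor, patch)


def php_gc_args(php_version: str) -> List[str]:
    """Extra php CLI args for a test wrapper: disable the cycle collector on
    interpreters older than GC_FIX_VERSION, otherwise add nothing."""
    if parse_version(php_version) < GC_FIX_VERSION:
        return ["-d", "zend.enable_gc=0"]
    return []


def old_condition(php_version: str) -> bool:
    """The earlier gate as hashar remembers it ("PHP v < 5.4 or something")."""
    return parse_version(php_version) < (5, 4, 0)
```

Under the old gate a 5.5 interpreter is never matched, which is how the workaround disappeared; keying the gate on the fixing release keeps it in place for 5.5 until the runners move on.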
[18:58:14] paladox: your zend.gc patch is a good find :]
[18:58:31] Oh thanks :)
[19:03:00] paladox: but that does not change anything
[19:03:15] Oh, i thought it disables gc
[19:03:57] (Abandoned) Hashar: Correctly disable garbage collection [integration/jenkins] - https://gerrit.wikimedia.org/r/313048 (https://phabricator.wikimedia.org/T142158) (owner: Paladox)
[19:04:43] hashar i found out why wikibase is failing
[19:04:44] https://github.com/wikimedia/integration-config/blob/eb480d87db07c0d3ecab68f63f17f922130814a1/jjb/wikidata.yaml#L15
[19:04:51] they wernt applying the disable gc patch
[19:04:56] i am going to upload a patch now
[19:07:07] (PS1) Paladox: Disable garbage collection for wikibase tests [integration/config] - https://gerrit.wikimedia.org/r/313056 (https://phabricator.wikimedia.org/T142158)
[19:07:12] hashar ^^
[19:07:13] :)
[19:09:09] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-Unit-tests, MediaWiki-extensions-WikibaseClient, and 4 others: Job mediawiki-extensions-php55 frequently fails due to "Segmentation fault" - https://phabricator.wikimedia.org/T142158#2671652 (Paladox) >>! In T142158#260329...
[19:24:50] (PS1) Hashar: Drop beta-cxserver-update-eqiad [integration/config] - https://gerrit.wikimedia.org/r/313057
[19:27:36] (CR) Hashar: [C: 2] Drop beta-cxserver-update-eqiad [integration/config] - https://gerrit.wikimedia.org/r/313057 (owner: Hashar)
[19:28:21] (Merged) jenkins-bot: Drop beta-cxserver-update-eqiad [integration/config] - https://gerrit.wikimedia.org/r/313057 (owner: Hashar)
[19:34:57] (PS2) Paladox: Disable garbage collection for wikibase tests [integration/config] - https://gerrit.wikimedia.org/r/313056 (https://phabricator.wikimedia.org/T142158)
[19:37:42] hashar would that ^^ work?
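hashar's reply that the `-dzend.enable_gc=0` patch "does not change anything" is consistent with how getopt-style parsers treat a short option that takes a value. The value may be attached (`-dVALUE`) or passed as the next argument (`-d VALUE`), so the two spellings paladox compared parse identically. The sketch uses Python's `getopt` module, which follows the C `getopt` convention, purely as an illustration. PHP's CLI has its own option parser, and `ini_overrides` is a hypothetical helper.

```python
import getopt
from typing import Dict, List


def ini_overrides(argv: List[str]) -> Dict[str, str]:
    """Collect `-d name=value` INI overrides from a php-like argv.

    getopt stops at the first non-option argument (e.g. a script path),
    so only the leading options are inspected."""
    opts, _rest = getopt.getopt(argv, "d:")  # "d:" means -d requires a value
    overrides: Dict[str, str] = {}
    for flag, value in opts:
        if flag == "-d":
            name, _sep, setting = value.partition("=")
            overrides[name] = setting
    return overrides
```

Both `ini_overrides(["-dzend.enable_gc=0"])` and `ini_overrides(["-d", "zend.enable_gc=0"])` yield `{"zend.enable_gc": "0"}`; the script path used in the tests is made up.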
[19:42:36] (Abandoned) Hashar: nodepool: Add packages libcairo2-dev, libpango1.0-dev and libgif-dev [integration/config] - https://gerrit.wikimedia.org/r/313044 (https://phabricator.wikimedia.org/T146783) (owner: Paladox)
[19:42:59] (CR) Paladox: "oh" [integration/config] - https://gerrit.wikimedia.org/r/313044 (https://phabricator.wikimedia.org/T146783) (owner: Paladox)
[19:43:34] Continuous-Integration-Infrastructure, Graphoid, Patch-For-Review: npm-4 jenkins fails on canvas install - https://phabricator.wikimedia.org/T146783#2671713 (hashar) So the build fails because: ``` 00:00:42.192 ../src/Image.h:19:21: fatal error: gif_lib.h: No such file or directory 00:00:42.192 #inc...
[19:55:41] (Abandoned) Hashar: Revert "Move npm-node-4 off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/306722 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[19:59:53] paladox: the garbage collector thing for Zend, I will look at it tomorrow (unless i forget :D)
[20:07:09] (PS1) Hashar: Bring back npm-node-4 to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/313061 (https://phabricator.wikimedia.org/T143938)
[20:07:36] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2671789 (hashar)
[20:10:17] hashar ok, thanks :)
[20:10:23] (CR) Hashar: [C: 2] Bring back npm-node-4 to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/313061 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[20:10:58] hashar i can remind you tomarror about gc if you want, but then again i may forget, but hopefully not
[20:11:11] (Merged) jenkins-bot: Bring back npm-node-4 to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/313061 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[20:11:14] paladox: yeah that would be appreciated :D
[20:11:22] Ok :)
[20:11:46] !log Reloading Zuul to deploy 3c3289aa1a for T143938 and T146783
[20:11:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:12:00] If deployment-mira is the new deploy server then a lot of things on wikitech need to be updated -- https://wikitech.wikimedia.org/w/index.php?search=deployment-tin
[20:12:13] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2671800 (hashar)
[20:12:26] maybe someone should make a template for the deployment-prep deploy server name
[20:12:43] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2584030 (hashar) I have moved the `node-npm-4` job back. Is roughly a dozen of builds per hours, hardly a dent.
[20:13:01] bd808: I guess we should switch back
[20:14:01] Continuous-Integration-Infrastructure, Graphoid, Patch-For-Review: npm-4 jenkins fails on canvas install - https://phabricator.wikimedia.org/T146783#2671803 (hashar) Open>Resolved a:hashar The job has been moved to the proper instances :] Sorry!
[20:14:02] hashar your getting to have -jessie in the npm 4 name
[20:14:02] paladox: thanks for the Graphoid recheck
[20:14:07] Your welcome
[20:14:14] deployment-tin doesn't even seem to be online. I'm getting "No route to host" when trying to ssh to it
[20:14:16] yeah the name is a bit inconsistent
[20:14:17] getting = forgetting
[20:14:25] Since it wont go offline
[20:14:58] bd808: yeah we named it deployment-tin02 . Will fill a task to drop that one, create a new one named deployment-tin and make it the primary
[20:15:22] !sal
[20:15:22] https://tools.wmflabs.org/sal/releng
[20:15:40] ^^the most useful and most reliable tool on toolslabs :D
[20:15:54] reusing hostnames in labs can have problems. Sometimes things don't clean up properly in out dns
[20:16:06] (PS1) Paladox: Add node-4 to parameter_functions.py for nodepool_params [integration/config] - https://gerrit.wikimedia.org/r/313066
[20:16:07] hashar ^^
[20:16:11] heh. I'm not sure it's the most reliable
[20:16:14] (PS2) Paladox: Add node-4 to parameter_functions.py for nodepool_params [integration/config] - https://gerrit.wikimedia.org/r/313066
[20:17:26] Continuous-Integration-Infrastructure, Graphoid, Patch-For-Review: npm-4 jenkins fails on canvas install - https://phabricator.wikimedia.org/T146783#2671808 (Paladox) This is now fixed as @hashar has brought the tests back to nodepool :)
[20:17:52] Woops didnt notice you closed ^^ that task
[20:18:10] Beta-Cluster-Infrastructure, Graphoid, Services: Automate Graphoid deployment to beta cluster (and auto-rebuild?) - https://phabricator.wikimedia.org/T146810#2671809 (Yurik)
[20:19:19] (Abandoned) Paladox: Add node-4 to parameter_functions.py for nodepool_params [integration/config] - https://gerrit.wikimedia.org/r/313066 (owner: Paladox)
[20:24:14] yurik: for autoupdate of graphoid and other services on beta , we have a task somewhere but I cant find it
[20:24:34] yurik: the idea is that since those repositories are using scap3 nowadays, we can just trigger a job that run on the beta cluster deployment server
[20:24:46] that will then: cd /srv/deployment/$GERRIT_REPO_NAME && scap3 deploy
[20:24:49] or something like that
[20:26:29] autodeployment task: https://phabricator.wikimedia.org/T131857
[20:26:45] we should sprint that at one point :D
[20:26:59] I am pretty sure it is going to be straightforward
[20:27:37] yup. deployment tin is already a jenkins node. Just have to make sure jenkins-deploy is in the deploy-service group
[20:28:21] hashar looks like tmh videojs beta feature is almost ready to be merged https://gerrit.wikimedia.org/r/#/c/312327/ :)
[20:28:29] will be getting a new looking video player
[20:28:52] :), that supports mobiles and plugins, so we can extend it now :)
[20:30:24] hashar a test site https://ogvjs-testing.wmflabs.org/wiki/Video has already been setup by brion, with videojs enabled by default :)
[20:31:50] hashar, that won't work :(
[20:32:13] i wrote in the bug about it -- we only merge the deploy repo right before push to production
[20:32:14] yurik: define "that" ? :D
[20:32:18] ah
[20:32:32] well I guess we can have beta run out of master?
[20:32:51] it would be very cool if we would run the build script (which uses docker) once a day
[20:33:16] plus whenever the master changes
[20:33:19] this way it would pick up the dependencies too
[20:33:31] can we run docker build scripts in jenkins?
[20:33:32] docker?
[20:33:35] no
[20:33:48] bummer - that's how those deploy repos get built
[20:34:26] at the moment we run them on our machines, and commit all the changed files to the deploy repos, merge and deploy it right away
[20:40:37] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Nodepool: Investigate why Nodepool instances are sometime slow to reach READY state - https://phabricator.wikimedia.org/T146813#2671885 (hashar)
[20:41:39] yurik: I guess that should indeed be made automatic
[20:41:52] yurik: what Wikidata people are doing is that they have a build system that triggers daily
[20:42:03] send the build result to gerrit for CI to test the build
[20:42:22] and that is landed on beta days before deployment to prod (iirc)
[20:43:39] hashar, isn't wikidata all php based?
[20:43:39] right, but again - that's a very different tool chain
[20:43:39] all nodejs-based services are built in a docker contaier
[20:46:57] yurik: nodejs/php/docker/instance whatever! In the end it is all about integrating bunch of code together and testing it :]
[20:47:12] if we could at least auto deploy your /deploy repo on beta, that would be a good start
[20:47:13] hashar, true, but i thought you said you cannot run docker in the test env?
[20:47:42] as to how you build it / why you are using Docker.. that is a different problem related to making the build automatic
[20:47:54] (different but equally desirable :D )
[20:47:59] automatize everything!
[20:49:56] i agree, but the "value" of auto-deploying merged deploy repo to beta cluster is fairly low - i usually do a manual test by doing a git review of the deploy repo, but then, before merging it, i pull it into a user dir on the production server and run it with a slightly different configuration (different port, 1 instance) - and test if everything is still good. I know this is not ideal (far from it), but it allows me to catch most problems before
[20:49:56] deploying
[20:55:11] yurik: but that is only one aspect of the beta cluster. it's an integration environment. Not just for you but for everyone else who might depend (including MW) on you/your service.
[20:55:13] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Nodepool: Investigate why Nodepool instances are sometime slow to reach READY state - https://phabricator.wikimedia.org/T146813#2671989 (hashar)
[20:55:35] by only testing in production right before you deploy in prod you cut off a ton of useful testing potential
[20:55:47] greg-g, i agree - that is a major problem
[20:56:07] but it is based on our practice of only merging right before deploy
[20:56:14] so, your "fairly low" above is actually "fairly dang high and important and identifying a major problem"
[20:56:25] which isn't healthy either
[20:56:36] mergning right before deploy is horrible
[20:56:41] full stop.
[20:57:05] yurik: what is your plan to rectify this?
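yurik suggests running the Docker-based build script for a service's deploy repo once a day, plus whenever master changes, so that dependency updates are picked up too. The following is a minimal sketch of that trigger rule. The function name, its parameters and the 24-hour period are assumptions for illustration. It covers only when to build; how the build runs (Docker was said not to be available on CI at the time) and how the result is deployed are left out.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "once a day" taken literally as a 24-hour period.
DAILY = timedelta(days=1)


def should_build(now: datetime,
                 last_build_at: Optional[datetime],
                 master_sha: str,
                 last_built_sha: Optional[str]) -> bool:
    """Return True when a new build of the deploy repo is due.

    - never built before -> build
    - master has a commit we have not built yet -> build
    - otherwise rebuild once a day, so updated dependencies get picked up
    """
    if last_build_at is None or last_built_sha is None:
        return True
    if master_sha != last_built_sha:
        return True
    return now - last_build_at >= DAILY
```

A scheduler polling this every few minutes would rebuild promptly after a merge and still produce at least one build per day; the SHAs in the tests are placeholders.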
[20:57:10] greg-g, agreed once more :) Your proposals are welcome on how we can fix this :)
[20:57:13] lol
[20:58:02] I haven't seen any questions from you/your team about how to do this until now and I haven't seen any good faith guesses/attempts yet either.
[20:58:07] greg-g, this is a practice that was recommended by the services team - they might have changed their recommendation by now
[20:58:33] merging right before deploy? That is wrong.
[20:58:58] the whole "deploy repo" approach is also wrong :(
[20:59:09] because we keep large binary blobs in git
[20:59:23] scap supports git-fat
[20:59:31] * yurik looks up the fat
[21:00:05] yurik: our tooling provides what you need to do this right, I can 98% garauntee it
[21:00:24] greg-g, perhaps we should discuss the overall services approach once more in Januarry
[21:00:30] you might have to change how you do some things, but that's the cost side of the cost/benefit analysis of doing things consistently
[21:01:10] is there a task about auto deploying tileator(?)
[21:01:13] in beta
[21:01:15] i don't think it should be solved just for the two of our services (graphoid & maps) when we have ~10. We should establish a good practice
[21:01:40] tilerator/kartotherian is a huge can of warms for this kind of deployment, because it requires a humongous DB :)
[21:01:42] and what the issues/blockers are for that to happen
[21:01:44] but yes, there is a phab ticket
[21:02:01] bare metal hardware
[21:02:07] with 1TB SSD
[21:02:15] link the task, not hardware requirements
[21:12:14] greg-g, can't find the task at the moment - we had a few discussions about it with ops - its mostly that ops don't have an easy way to configure real servers for testing.
[21:12:25] so its a work in progress
[21:12:36] gehel had some thoughts about it
[21:13:06] graphoid needs real hardware?
[21:13:09] I doubt that [21:13:15] no, it doesn't [21:13:17] graphoid is running in beta right now [21:13:27] i was only talking about maps [21:13:28] is that also a "merge right before deploy"? [21:14:52] yes. I am happy to discuss it - (cc: mobrovac_ and gwicke` ) -- unless it has changed since a year ago, this is how all services are being built [21:15:06] and we have close to 10 i think? [21:15:11] might be mistaken [21:15:17] ok, I saw scrollback of "let's get autodeploying in beta working" and then confusion [21:15:22] so I'm trying to figure out that confusion [21:15:37] parsoid is auto-updated in beta, ftr [21:16:06] not sure it is still autoupdated [21:16:15] I think we (I?) got rid of the autoupdate [21:17:24] i would like the autodeploy to work, and that's why i raised the issue to begin with, but my understanding is that we don't have the needed tooling to build the deployment packages in docker containers [21:18:34] that is two different topics yurik :] [21:19:10] first is how to package up the code+dependencies etc [21:19:15] hashar, the topic of autobuilding is the same as autodeployment :) We don't build it unless we plan to deploy it soonish :) [21:19:31] the result is proposed / hosted somewhere [21:19:42] then the second topic is how to auto deploy that package [21:20:02] and a future topic would be: trigger tests instead of manually checking whether everything is fine :] [21:20:09] Release-Engineering-Team (Deployment-Blockers), MediaWiki-extensions-WikibaseRepository, Wikidata, Beta-Cluster-reproducible, and 7 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesi...
- https://phabricator.wikimedia.org/T145819#2672111 [21:20:24] what I am saying is that build and deploy should be decoupled [21:20:47] and I am pretty sure we can trivially solve the auto deploy part [21:21:13] (after all it is just about pulling the /deploy and running scap3) [21:21:22] BUT you would have to slightly adjust your workflow [21:21:36] so that instead of merging and then deploying straight to prod [21:21:55] you would: merge -> wait for deploy on beta -> do tests / run test suite or whatever -> push to prod [21:22:09] hashar, all is correct, but the true value would come from the master being built automatically and deployed - just like we have with mediawiki. Relying on a human to run the build script is pretty bad [21:22:19] which is really similar to what you are used to doing. But then, I have no idea how other service teams do it [21:22:37] PROBLEM - Keyholder status on deployment-mira is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:22:41] then yeah we can look at making the build automatic [21:23:04] I know WMDE has a build script for Wikidata (which is a collection of extensions and dependencies [21:23:17] hashar, but WMDE is not a service, are they? [21:23:24] it is irrelevant [21:23:34] it is a very different tool chain - that's my point :) [21:23:41] they have the same workflow as you :D [21:23:54] yeah the actual tools are different [21:23:57] i am all for having auto-build packages [21:24:02] but the chain / steps are similar [21:24:36] well, get a task and CC us [21:24:38] (PS1) Harej: Specifying EventLogging as a CollaborationKit dependency. [integration/config] - https://gerrit.wikimedia.org/r/313122 [21:24:40] perhaps.
So again, we really need to involve the services team - they do all the services plumbing - i simply use it [21:24:56] if you can point to material explaining the build tooling you are using, we can come up with some ideas to make it automatic [21:25:22] and you can survey other service maintainers, they might have ideas / different points of view [21:25:44] in the end what matters to me (and I guess releng) is that you devs don't waste time doing manual steps [21:26:03] and get reasonable automatic testing (again to save manual time) [21:26:49] yurik: anyway I am gonna sleep sorry :( But the deploy part can be covered via https://phabricator.wikimedia.org/T131857 or a sub task [21:27:00] hashar, https://github.com/wikimedia/service-runner#service-runner [21:27:03] look for "build" [21:27:06] the autobuild, get us a task with a doc about your build process and we can see :] [21:27:14] s/see/look/ [21:27:43] there used to be a big doc, but they moved it somewhere, looking... [21:28:03] yurik: well I am not going to read it tonight for sure [21:28:22] yurik: but if more or less all services are building their /deploy the same way, that will be fairly trivial to automatize [21:28:39] the same way it is trivial to automatize the deployment (i.e. run scap3) [21:28:57] I sense it might just be about service-runner build [21:28:57] :D [21:30:07] anyway I have to sleep for real :( [21:30:43] good night :) [21:31:20] thx!!! [21:31:37] yurik: oh and Graphoid npm job is fixed for real/Definitely :] [21:41:27] (CR) Paladox: [C: 1] Specifying EventLogging as a CollaborationKit dependency. [integration/config] - https://gerrit.wikimedia.org/r/313122 (owner: Harej) [21:59:03] Yippee, build fixed!
[21:59:03] Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #159: FIXED in 1 min 2 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/159/ [22:22:49] Release-Engineering-Team, Phabricator: Phab Advanced Search no longer showing typical results - https://phabricator.wikimedia.org/T146789#2672384 (mmodell) The innodb fulltext search index may not behave exactly the same as the old search engine. I would like to experiment with elasticsearch again since... [22:30:12] Release-Engineering-Team, Developer-Relations, Phabricator, Wikimedia-Blog-Content: blog.wikimedia.org post on Phabricator improvements - https://phabricator.wikimedia.org/T141457#2672393 (mmodell) >>! In T141457#2671604, @EdErhart-WMF wrote: > Hi all! Thanks for pinging me on this. Mel has some... [22:54:49] Release-Engineering-Team, Phabricator: Phab Advanced Search no longer showing typical results - https://phabricator.wikimedia.org/T146789#2672425 (Paladox) +1 to using elasticsearch, @mmodell could you setup a separate task for that and possibly make that a high priority please?
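The flow hashar proposes at 21:21 (build decoupled from deploy; merge, wait for the beta deploy, run tests there, and only then push to prod) is essentially a gated promotion pipeline. What follows is a minimal Python sketch of that gating logic. Every stage function is hypothetical: none of them calls scap3, service-runner, or Jenkins, and the stubs only record where a commit would have been deployed.

```python
"""Hypothetical sketch of a gated build -> beta -> test -> prod promotion flow.

All stage functions are stand-ins; a real setup would invoke the team's
build tooling and scap3, which this sketch deliberately does not do.
"""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stage:
    name: str
    action: Callable[[str], bool]  # receives a commit id, returns success


@dataclass
class Pipeline:
    stages: List[Stage]
    log: List[str] = field(default_factory=list)

    def run(self, commit: str) -> bool:
        """Run stages in order and stop at the first failure, so a build
        that failed its beta tests never reaches prod."""
        for stage in self.stages:
            ok = stage.action(commit)
            self.log.append(f"{stage.name}: {'ok' if ok else 'FAILED'}")
            if not ok:
                return False
        return True


def make_pipeline(tests_pass: bool, deployed: List[str]) -> Pipeline:
    # Hypothetical stubs: each one only records where the commit went.
    def build(commit: str) -> bool:
        return True  # stand-in for building the /deploy repo automatically

    def deploy_beta(commit: str) -> bool:
        deployed.append(f"beta@{commit}")  # stand-in for a scap3 deploy to beta
        return True

    def test_beta(commit: str) -> bool:
        return tests_pass  # stand-in for an automated test suite run against beta

    def deploy_prod(commit: str) -> bool:
        deployed.append(f"prod@{commit}")  # stand-in for the production deploy
        return True

    return Pipeline([
        Stage("build", build),
        Stage("deploy-beta", deploy_beta),
        Stage("test-beta", test_beta),
        Stage("deploy-prod", deploy_prod),
    ])


if __name__ == "__main__":
    targets: List[str] = []
    pipeline = make_pipeline(tests_pass=False, deployed=targets)
    print(pipeline.run("abc123"), pipeline.log, targets)
```

Stopping at the first failed stage is the design point: a failing test run on beta leaves prod untouched, which is exactly the safety net that "merge right before deploy" gives up.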