[04:30:46] PROBLEM - Puppet errors on deployment-puppetdb01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [04:59:03] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<55.56%) [05:10:47] RECOVERY - Puppet errors on deployment-puppetdb01 is OK: OK: Less than 1.00% above the threshold [0.0] [06:57:19] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [06:58:30] 10Continuous-Integration-Infrastructure (shipyard), 10Operations, 10Patch-For-Review, 10User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3689558 (10Joe) I have a proposal: what about controlling semantic versioning via the changelog but allowing peopl... [07:04:02] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK [08:13:03] hashar: sorry to bother you again ... but https://gerrit.wikimedia.org/r/#/c/383830/ ? [08:19:19] gehel: bonjour. I was too busy yesterday :D [08:19:50] hashar: No problem. This is still not urgent, so you can still feel free to ignore it. [08:20:11] I'll keep pinging you, just to make sure you ignore this on prupose and not just forget about it :) [08:21:28] oh I get mail notifications from Gerrit :d [08:21:39] I am looking at it right now [08:23:53] (03CR) 10Hashar: "A subtile difference is that git submodules would no more be processed. 
I guess I am going to make '{name}-maven' to process them and reb" [integration/config] - 10https://gerrit.wikimedia.org/r/383830 (owner: 10Gehel) [08:24:44] ^ damn, another reason to not like git submodules :) [08:24:47] ;D [08:35:54] (03PS1) 10Hashar: jjb: maven jobs now process submodules [integration/config] - 10https://gerrit.wikimedia.org/r/384664 [08:36:09] gehel: moarr madness ^ [08:36:19] I made the maven jobs to process git submodules [08:37:25] hashar: thanks! [08:41:38] (03CR) 10Gehel: [C: 031] "LGTM (as far as I can understand it)" [integration/config] - 10https://gerrit.wikimedia.org/r/384664 (owner: 10Hashar) [08:42:17] (03PS4) 10Hashar: Switch wikidata query service to use generic maven job template [integration/config] - 10https://gerrit.wikimedia.org/r/383830 (owner: 10Gehel) [08:43:15] (03CR) 10Addshore: [C: 032] final umask & ENV fixes for mwext-php70-phan-docker job [integration/config] - 10https://gerrit.wikimedia.org/r/384597 (owner: 10Addshore) [08:43:27] (03CR) 10Addshore: [C: 032] Switch from mwext-php70-phan-jessie to mwext-php70-phan-docker [integration/config] - 10https://gerrit.wikimedia.org/r/384614 (owner: 10Addshore) [08:43:41] wheeee, lets do it! [08:43:41] (03CR) 10Hashar: [C: 032] "Rebased on top of https://gerrit.wikimedia.org/r/#/c/384664/" [integration/config] - 10https://gerrit.wikimedia.org/r/383830 (owner: 10Gehel) [08:44:00] (03CR) 10Hashar: [C: 032] jjb: maven jobs now process submodules [integration/config] - 10https://gerrit.wikimedia.org/r/384664 (owner: 10Hashar) [08:44:01] hashar: going to switch the phan jobs to docker for extensions today :) [08:44:20] gehel: +2 ed :) [08:44:33] addshore: I am not around today though :D [08:44:43] meh, I will be! :D [08:45:02] addshore: one sure thing: make sure that no state is kept between runs. 
Eg the $WORKSPACE/cache or $WORKSPACE/log end up not being clearable by Jenkins [08:45:07] since files belong to nobody:nogroup [08:45:19] hashar: yup, that was what looks me so long with this [08:45:27] I set the umask before creating any files [08:45:58] have 1 docker image that sets up the src directory, and then another one that runs the tests on the src dir. The dirs are cleared at the start of each run (as with the other docker jobs) [08:46:25] might be worth thinking about always setting the umask for the user [08:46:38] OR, in the base CI image, make a user called CI, and set the umask :) [08:46:51] (03Merged) 10jenkins-bot: final umask & ENV fixes for mwext-php70-phan-docker job [integration/config] - 10https://gerrit.wikimedia.org/r/384597 (owner: 10Addshore) [08:46:53] * addshore ends brain dump [08:47:47] (03Merged) 10jenkins-bot: Switch from mwext-php70-phan-jessie to mwext-php70-phan-docker [integration/config] - 10https://gerrit.wikimedia.org/r/384614 (owner: 10Addshore) [08:47:49] (03Merged) 10jenkins-bot: jjb: maven jobs now process submodules [integration/config] - 10https://gerrit.wikimedia.org/r/384664 (owner: 10Hashar) [08:47:51] (03CR) 10jerkins-bot: [V: 04-1] Switch wikidata query service to use generic maven job template [integration/config] - 10https://gerrit.wikimedia.org/r/383830 (owner: 10Gehel) [08:48:15] (03CR) 10Hashar: [C: 032] Switch wikidata query service to use generic maven job template [integration/config] - 10https://gerrit.wikimedia.org/r/383830 (owner: 10Gehel) [08:52:43] (03PS5) 10Hashar: Switch wikidata query service to use generic maven job template [integration/config] - 10https://gerrit.wikimedia.org/r/383830 (owner: 10Gehel) [08:52:54] (03CR) 10Hashar: [C: 032] Switch wikidata query service to use generic maven job template [integration/config] - 10https://gerrit.wikimedia.org/r/383830 (owner: 10Gehel) [08:53:04] addshore: bah I fetched the repo on contint1001 [08:53:10] addshore: I guess you can fab deploy_zuul 
[08:53:19] ack! will do! [08:53:28] my fab command for some reason is broken right now [08:54:25] !log reload zuul for https://gerrit.wikimedia.org/r/384614 [08:54:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [08:54:35] (03Merged) 10jenkins-bot: Switch wikidata query service to use generic maven job template [integration/config] - 10https://gerrit.wikimedia.org/r/383830 (owner: 10Gehel) [08:55:21] hashar: Thanks! I'll recheck https://gerrit.wikimedia.org/r/#/c/383791/ to validate... [08:55:33] !log delete unused mwext-php70-phan-jessie-docker 'project' in jenkins UI [08:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [08:58:05] sweeeet :) [08:58:57] gehel: change deployed! [08:59:12] gehel: merge conflict apparently [08:59:32] yep, checking... [08:59:39] I am away for a bit, more paperwork [08:59:44] will catch up in half an hour or so [08:59:47] hashar: good luck! [09:02:03] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:10:18] 10Browser-Tests-Infrastructure, 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 6 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3689759 (10zeljkofilipin) [09:13:39] gehel: I did a recheck of the last merged change ( https://gerrit.wikimedia.org/r/#/c/384648/1 ) https://integration.wikimedia.org/ci/job/wikidata-query-rdf-maven/2/console [09:14:36] hashar: I thought you were busy on something else ... :) [09:14:55] * gehel understands... paperwork is sooo boring...
[09:31:54] heh, hashar think phan for core is done for docker now too, I'll switch that too today and watch how it does :) [09:33:18] (03PS1) 10Addshore: mediawiki-core-php70-phan-docker job [integration/config] - 10https://gerrit.wikimedia.org/r/384672 [09:33:20] (03PS1) 10Addshore: Use mediawiki-core-php70-phan-docker for core [integration/config] - 10https://gerrit.wikimedia.org/r/384673 [09:43:21] (03CR) 10Addshore: [C: 032] mediawiki-core-php70-phan-docker job [integration/config] - 10https://gerrit.wikimedia.org/r/384672 (owner: 10Addshore) [09:43:25] (03PS1) 10Addshore: Remove mediawiki-core-php70-phan-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/384676 [09:43:36] (03CR) 10Addshore: [V: 031 C: 032] "https://integration.wikimedia.org/ci/job/mediawiki-core-php70-phan-docker/1/" [integration/config] - 10https://gerrit.wikimedia.org/r/384673 (owner: 10Addshore) [09:44:34] (03Merged) 10jenkins-bot: mediawiki-core-php70-phan-docker job [integration/config] - 10https://gerrit.wikimedia.org/r/384672 (owner: 10Addshore) [09:44:50] (03Merged) 10jenkins-bot: Use mediawiki-core-php70-phan-docker for core [integration/config] - 10https://gerrit.wikimedia.org/r/384673 (owner: 10Addshore) [09:45:48] !log reload zuul for https://gerrit.wikimedia.org/r/384673 [09:45:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:50:22] addshore: awesome :) [09:57:43] (03Abandoned) 10Addshore: docker: git [integration/config] - 10https://gerrit.wikimedia.org/r/378534 (owner: 10Addshore) [09:58:01] (03Abandoned) 10Addshore: docker: puppet, shallow fetches & no second clone [integration/config] - 10https://gerrit.wikimedia.org/r/374507 (owner: 10Addshore) [10:02:16] (03PS1) 10Addshore: docker: mediawiki-phan use phan/phan instead of etsy/phan [integration/config] - 10https://gerrit.wikimedia.org/r/384680 [10:04:19] (03CR) 10Addshore: [C: 032] docker: mediawiki-phan use phan/phan instead of etsy/phan [integration/config] - 
10https://gerrit.wikimedia.org/r/384680 (owner: 10Addshore) [10:06:26] (03Merged) 10jenkins-bot: docker: mediawiki-phan use phan/phan instead of etsy/phan [integration/config] - 10https://gerrit.wikimedia.org/r/384680 (owner: 10Addshore) [10:07:10] (03CR) 10Addshore: "I would be pro trying to switch to the 'pattern' I am using for the phan jobs." [integration/config] - 10https://gerrit.wikimedia.org/r/383560 (https://phabricator.wikimedia.org/T144961) (owner: 10Hashar) [10:07:53] (03CR) 10Addshore: [C: 031] Convert tabs to 4 spaces [integration/config] - 10https://gerrit.wikimedia.org/r/381286 (owner: 10Hashar) [10:13:13] I am off for real ! [10:13:46] (03CR) 10Hashar: "Ahhhh I will have to look at the phan pattern. I guess at some point we will need some kind of reference guide line :]" [integration/config] - 10https://gerrit.wikimedia.org/r/383560 (https://phabricator.wikimedia.org/T144961) (owner: 10Hashar) [10:14:08] (03CR) 10Hashar: "recheck" [integration/config] - 10https://gerrit.wikimedia.org/r/381286 (owner: 10Hashar) [10:14:18] * hashar waves [10:15:00] 10Continuous-Integration-Infrastructure (shipyard): Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3666829 (10Addshore) So I had to add a BUNCH of zuul and other env vars to the docker-zuul-env macro for the phan jobs. It now looks like this. ``` -... [10:15:20] (03CR) 10jerkins-bot: [V: 04-1] Convert tabs to 4 spaces [integration/config] - 10https://gerrit.wikimedia.org/r/381286 (owner: 10Hashar) [10:17:56] 10Continuous-Integration-Infrastructure (shipyard), 10Operations, 10Patch-For-Review, 10User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3689890 (10Addshore) 1 more thing to throw into the mix. Right now we have a mediawiki-phan image, and I want to... [10:18:11] bye hashar! 
[10:18:55] (03PS3) 10Hashar: Convert tabs to 4 spaces [integration/config] - 10https://gerrit.wikimedia.org/r/381286 [10:19:19] (03CR) 10Hashar: "Now ignores binary files:" [integration/config] - 10https://gerrit.wikimedia.org/r/381286 (owner: 10Hashar) [10:19:25] ahh lunchhhh [10:31:41] 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikidata, and 3 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3689932 (10zeljkofilipin) This is no longer CI/Ruby problem, but beta/configuratio... [10:32:38] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10Developer-Relations (Oct-Dec 2017), 10User-zeljkofilipin: WebdriverIO tech talk - https://phabricator.wikimedia.org/T171852#3689934 (10zeljkofilipin) p:05Normal>03High [10:33:04] 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikidata, and 3 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3689935 (10WMDE-leszek) Thanks a lot @zeljkofilipin. I believe we should take it ov... [10:59:30] 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikidata, and 3 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3689991 (10Tobi_WMDE_SW) @zeljkofilipin Thanks! I have a related question, would it... [11:15:06] 10Browser-Tests-Infrastructure, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Wikidata, and 3 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3690000 (10zeljkofilipin) @Tobi_WMDE_SW It should already take screenshots, but I h... 
[11:37:59] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 392 bytes in 0.011 second response time [11:43:26] (03CR) 10Zfilipin: Run Selenium tests for Reading Web extension (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/384041 (https://phabricator.wikimedia.org/T162256) (owner: 10Zfilipin) [12:30:57] 10Gerrit, 10Readers-Web-Backlog, 10Patch-For-Review, 10Readers-Web-Kanban-Board, 10Unplanned-Sprint-Work: Temporarily allow pushing large objects - https://phabricator.wikimedia.org/T178189#3690315 (10bmansurov) >>! In T178189#3688117, @Legoktm wrote: > Is there a reason we're not using the Debian packag... [13:04:28] Yippee, build fixed! [13:04:29] Project selenium-Math » chrome,beta,Linux,BrowserTests build #547: 09FIXED in 27 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/547/ [13:04:34] Yippee, build fixed! 
[13:04:34] Project selenium-Math » firefox,beta,Linux,BrowserTests build #547: 09FIXED in 33 sec: https://integration.wikimedia.org/ci/job/selenium-Math/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/547/ [13:08:08] 10Continuous-Integration-Infrastructure (shipyard), 10WMDE-Analytics-Engineering, 10Patch-For-Review, 10User-Addshore, and 2 others: Have CI run lintr for analytics/wmde/WDCM R files - https://phabricator.wikimedia.org/T176194#3690442 (10Tobi_WMDE_SW) [13:08:13] 10Release-Engineering-Team (Watching / External), 10Contributors-Team, 10MobileFrontend, 10Operations, and 4 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3690441 (10Tobi_WMDE_SW) [13:12:14] (03PS1) 10Aude: Bump wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/384700 [13:14:51] (03CR) 10Aude: [C: 032] Bump wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/384700 (owner: 10Aude) [13:15:30] (03Merged) 10jenkins-bot: Bump wikidata [tools/release] - 10https://gerrit.wikimedia.org/r/384700 (owner: 10Aude) [13:17:03] 10Gerrit: Replace using certificates with tokens when using its-phabricator - https://phabricator.wikimedia.org/T178385#3690478 (10Paladox) [13:17:29] 10Gerrit: Replace using certificates with tokens when using its-phabricator - https://phabricator.wikimedia.org/T178385#3690490 (10Paladox) p:05Triage>03High Setting high priority due to the depreciation and the confusing behaviour. 
[13:20:44] PROBLEM - Host integration-slave-docker-c2-m4-d40-1004 is DOWN: CRITICAL - Host Unreachable (10.68.20.39) [13:33:10] RECOVERY - Host integration-slave-docker-c2-m4-d40-1004 is UP: PING OK - Packet loss = 0%, RTA = 0.76 ms [13:35:37] !log swapped integration-slave-docker-1004 for integration-slave-docker-c2-m4-d40-1004 (So we have more 4GB executors) [13:35:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:02:03] 10Gerrit: Replace using certificates with tokens when using its-phabricator - https://phabricator.wikimedia.org/T178385#3690643 (10Paladox) Please do not change the priority as this is important. This change should be done for gerrit 2.14+ and not for 2.13. I am not sure about keeping support for certificates. [14:05:57] !log deleted slave integration-slave-docker-1004 [14:06:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:08:22] (03CR) 10Addshore: [C: 032] Remove mwext-php70-phan-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/384615 (owner: 10Addshore) [14:08:23] (03CR) 10Addshore: [C: 032] Remove mediawiki-core-php70-phan-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/384676 (owner: 10Addshore) [14:10:44] PROBLEM - Host integration-slave-docker-1004 is DOWN: CRITICAL - Host Unreachable (10.68.17.167) [14:11:41] (03Merged) 10jenkins-bot: Remove mwext-php70-phan-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/384615 (owner: 10Addshore) [14:11:43] (03Merged) 10jenkins-bot: Remove mediawiki-core-php70-phan-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/384676 (owner: 10Addshore) [14:24:03] I feel like 🚂🚃🚃🚃🚃 should be in this channel's topic. [14:24:06] ;) [14:37:15] no_justification: mind if I fuss with the hiera settings for 'chad-jenkins' to get puppet running there? Or are you still working on it? 
[14:43:09] RECOVERY - App Server Main HTTP Response on deployment-mediawiki07 is OK: HTTP OK: HTTP/1.1 200 OK - 46288 bytes in 9.878 second response time [14:49:00] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: Connection refused [14:49:58] PROBLEM - Puppet errors on deployment-mediawiki07 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [14:54:04] RECOVERY - App Server Main HTTP Response on deployment-mediawiki07 is OK: HTTP OK: HTTP/1.1 200 OK - 46290 bytes in 5.496 second response time [15:08:05] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:10:11] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:13:49] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:21:51] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:25:32] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Add createAccount method to nodemw - https://phabricator.wikimedia.org/T173505#3690890 (10zeljkofilipin) a:03zeljkofilipin [15:27:26] (03PS2) 10WMDE-leszek: Only run npm job for changes in data-values/value-view [integration/config] - 10https://gerrit.wikimedia.org/r/383872 (https://phabricator.wikimedia.org/T178083) [15:30:00] RECOVERY - Puppet errors on deployment-mediawiki07 is OK: OK: Less than 1.00% above the threshold [0.0] [15:34:01] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [15:34:05] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%) [15:49:54] milimetric: investigating all the beta puppet errors on 
deployment-aqs0{1,2,3} instances, I think failures are coming from https://gerrit.wikimedia.org/r/#/c/379730/11 where the servers can't find hieradata info, so I see things like: Error 400 on SERVER: Could not find data item profile::aqs::monitoring_enabled in any Hiera data file -- could I get you to take a look? I'm not sure what those values [15:49:56] should be in beta. [15:50:40] thcipriani: thanks for the ping, joal and I will look at it as soon as I get out of this meeting [15:50:50] awesome, thank you :) [15:55:24] thcipriani: ok, so this needs Luca who's on vacation. The puppet config appears compatible with prod which we just joined and seems ok, but caused this regression [15:55:38] is it ok to wait until he gets back? [15:55:55] you're welcome to squelch / shut off those machines or whatever you need to do to keep your sanity [15:56:05] when does he get back? [15:58:34] fwiw it's mostly missing hieradata values that are making puppet freak out, just need to provide info to the profile about the state of the world. [15:59:05] (we have a weird and often frustratingly different hieradata hierarchy in beta) [16:06:06] thcipriani: sorry, tomorrow he gets back [16:06:37] I will try to fumble through the hieradata config then [16:10:14] I have a feeling you may be better able to determine the appropriate values than I would.
thanks and sorry :( [16:11:06] you can see all the places beta is looking for these values using /var/lib/git/operations/puppet/utils/hiera_lookup on deployment-puppetmaster02 [16:11:12] /var/lib/git/operations/puppet/utils/hiera_lookup -v --fqdn=deployment-aqs01.deployment-prep.eqiad.wmflabs profile::aqs::monitoring_enabled [16:26:08] PROBLEM - Host integration-slave-docker-c2-m4-d40-1004 is DOWN: CRITICAL - Host Unreachable (10.68.22.184) [16:33:10] RECOVERY - Host integration-slave-docker-c2-m4-d40-1004 is UP: PING OK - Packet loss = 0%, RTA = 1.37 ms [16:36:46] 10Gerrit, 10Readers-Web-Backlog, 10Patch-For-Review, 10Readers-Web-Kanban-Board, 10Unplanned-Sprint-Work: Temporarily allow pushing large objects - https://phabricator.wikimedia.org/T178189#3683839 (10phuedx) ``` 100644 blob 58300d81c556dd9088653bb656e379609015de8d 13670308 node_modules/clarinet/test/t... [16:43:08] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [16:46:50] andrewbogott: You can kill that instance outright [16:46:55] Test thing, long abandoned [16:47:27] no_justification: thanks! [16:49:49] halfak: nice [16:55:43] 10Beta-Cluster-Infrastructure: Beta: acme-setup failing in beta deployment-cache-upload04 - https://phabricator.wikimedia.org/T178404#3691149 (10thcipriani) [16:56:14] 10Beta-Cluster-Infrastructure: Beta: acme-setup failing in beta deployment-cache-upload04 - https://phabricator.wikimedia.org/T178404#3691176 (10thcipriani) I don't know if this is related to the work on {T174720} but I added folks from that task here. [17:23:06] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [17:27:18] RECOVERY - Puppet staleness on deployment-restbase02 is OK: OK: Less than 1.00% above the threshold [3600.0] [17:27:23] thcipriani: did you create the instances currently used as docker slaves? [17:27:42] addshore: most of them, yeah [17:27:51] What process did you use? 
[17:28:36] I created the 1005 instance 2 weeks ago and everything seemed to go just fine, and I tried creating a 1006 (so we have 2 nodes that can run phan jobs) but kept running into puppet issues [17:28:59] Any docs would be greatly appreciated! [17:29:17] hrm, I assigned classes in the horizon interface. [17:29:56] by clicking the "assign role" button I guess? [17:30:54] under "Puppet configuration" the nodes that you made seem to have "Hiera Config" that lists the class [17:31:30] eh, looks like i put it in the hiera config section: http://tyler.zone/docker-setup-config.png [17:32:01] thcipriani: okay, and is there a "special" way to run puppet on the integration hosts? or just "sudo puppet agent -tv" ? [17:32:28] I think -t implies -v but that's all I do: sudo puppet agent -t [17:32:32] I marked https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Setup#puppet as outdated as the script it references doesnt seem to exist [17:33:47] hrm, I hadn't seen that. That may be true... [17:34:24] Well, anyway, I seem to be having issues spinning up new instances before even trying to apply the role now [17:37:03] PROBLEM - Host integration-slave-docker-1006 is DOWN: CRITICAL - Host Unreachable (10.68.23.165) [17:37:05] so these differences seem like they should be 6 in one half-a-dozen in the other. That is, assigning the role with assign role should work (probably better) than my method. Same with running puppet-run. Seems to just reimplement splay and add output to the log file. 
[17:37:21] *should work the same (probably better) [17:37:54] So, I just deleted integration-slave-docker-1006 again (the one I just created) [17:38:09] PROBLEM - Host integration-slave-docker-c2-m4-d40-1004 is DOWN: CRITICAL - Host Unreachable (10.68.22.136) [17:38:21] but, simply spinning up, waiting until I can log in and then trying "sudo puppet agent -t" fails [17:38:31] I have definitely had some issues recently with waitforcert, but I can't really remember what they were [17:38:55] hmm, yeh, the issue I'm hitting seems to have something to do with the certs [17:39:13] #wikimedia-cloud suggested I look at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes#Switch_to_new_puppetmaster Did you follow that? [17:40:20] 10Release-Engineering-Team (Kanban), 10Release Pipeline (Blubber): Include Blubber metadata in Dockerfile output as labels - https://phabricator.wikimedia.org/T178022#3691432 (10dduvall) [17:40:56] I hadn't seen that page, but it has what I did on it: run puppet, sudo rm -rf /var/lib/puppet/ssl, run puppet again, sign the cert on the puppet master, run puppet again [17:41:49] although our puppetmaster is integration-puppetmaster01 [17:45:37] thcipriani: ack! right :) [17:45:57] thcipriani: I guess I could / should have just gone and looked at the .bash_history of the nodes! that would have told me everything [17:46:37] Also, while doing all of this I found a bug in horizon! woo! https://phabricator.wikimedia.org/T178409 [17:47:41] nicely done :) [17:48:32] lots of weird magic that I've learned to mostly ignore in setting up instances. [17:49:25] which is not good.
[17:50:38] (03PS1) 10Chad: deploy notes: stupid kludge to skip extensions that didn't exist before [tools/release] - 10https://gerrit.wikimedia.org/r/384729 [17:57:42] thcipriani: yeh, I was hunting for docs today and came up a bit blank ;) [17:59:30] looking at the .bash_history of 1003 [17:59:32] rm -fR /var/lib/puppet/ssl [17:59:33] mkdir -p /var/lib/puppet/client/ssl/certs [17:59:33] cp /var/lib/puppet/ssl/certs/ca.pem /var/lib/puppet/client/ssl/certs [17:59:33] puppet agent -tv [17:59:41] I'll try this and if it works write it down somewhere! [17:59:48] huh, don't know why I did that [18:00:49] yeah, I think it makes sense to put it at least on https://www.mediawiki.org/wiki/Continuous_integration and maybe some place for beta, too [18:03:49] (03CR) 10Thcipriani: [C: 032] deploy notes: stupid kludge to skip extensions that didn't exist before [tools/release] - 10https://gerrit.wikimedia.org/r/384729 (owner: 10Chad) [18:31:15] no_justification: It looks like the submodules for bundled extensions aren't automatically updating: https://github.com/wikimedia/mediawiki/tree/REL1_30/extensions [18:31:37] Auto-updating submodules suck so much [18:31:39] They break all the time [18:32:01] https://gerrit.wikimedia.org/r/#/q/branch:REL1_30+status:merged+-project:mediawiki/core I think it's just SyntaxHighlight and ImageMap [18:32:05] it's missing .gitsubmodules [18:32:11] legoktm ^^ [18:32:17] No it isn't [18:32:18] gitmodules I mean [18:32:18] https://github.com/wikimedia/mediawiki/blob/REL1_30/.gitmodules [18:32:23] I see what's up [18:32:24] oh [18:32:27] The whole canonical name issue [18:32:30] * no_justification sighs [18:32:36] Have to drop the /p/ [18:32:51] if we need to do it manually I can do that, I was just expecting it to be automated :) [18:33:06] * legoktm afk -> lunch [18:33:59] https://gerrit.wikimedia.org/r/#/c/384734/ [18:34:09] legoktm no_justification ^^ [18:34:39] damn my editor converted tabs to spaces, will fix that [18:34:50] Wrong repo....
[18:34:53] This is mediawiki core [18:35:38] ah i see [18:35:40] There's no /p/ in that whole file [18:35:44] You just made whitespace changes [18:36:29] legoktm: https://gerrit.wikimedia.org/r/#/c/384736/ [18:36:35] Yeh just realised that. sorry. :) [18:36:59] https://gerrit.wikimedia.org/r/#/c/384735/ [18:37:05] better now ^^ :) [18:37:35] +2'd [18:38:01] thanks :) [18:38:31] 1.29 https://phabricator.wikimedia.org/source/mediawiki/browse/REL1_29/.gitmodules does the same thing [18:39:12] https://gerrit.wikimedia.org/r/#/c/384739/ [18:39:14] :) [18:39:21] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [18:39:46] thanks :) [18:40:59] Also https://gerrit.wikimedia.org/r/#/c/384742/ for REL1_29 like I did in 1_30 [18:41:54] :) [18:41:55] +1 [18:46:14] no_justification there was a qunit failure [18:46:15] https://integration.wikimedia.org/ci/job/mediawiki-extensions-qunit-jessie/43036/console [18:46:25] doint think it's caused by my change [18:46:42] https://gerrit.wikimedia.org/r/#/c/384735/ [18:52:03] I love how the REL1_xx branches load Wikibase even though we don't care on that branch [18:56:47] heh [19:19:12] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #163: 04FAILURE in 30 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/163/ [19:34:57] Project selenium-MinervaNeue » firefox,beta,Linux,BrowserTests build #163: 04FAILURE in 45 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/163/ [19:36:45] 10Release-Engineering-Team, 10Fundraising-Backlog, 10Fundraising Sprint Gondwanaland Reunification Engine, 10MW-1.29-release (WMF-deploy-2017-04-04_(1.29.0-wmf.19)), and 2 others: Fix raw HTML tags breaking 1.29.0-wmf19+ - https://phabricator.wikimedia.org/T162716#3691939 (10DStrine) [19:38:27] 
10Gerrit, 10Readers-Web-Backlog, 10Patch-For-Review, 10Readers-Web-Kanban-Board, 10Unplanned-Sprint-Work: Temporarily allow pushing large objects - https://phabricator.wikimedia.org/T178189#3691942 (10Legoktm) >>! In T178189#3690315, @bmansurov wrote: >>>! In T178189#3688117, @Legoktm wrote: >> Is there... [19:47:44] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - /var/lib/docker/overlay2/35ca40c8e8fcc59fd40848e1a0c40275d7f2db69a5a57323328ae88010578006/merged is not accessible: Permission denied [19:54:44] RECOVERY - Disk space on contint1001 is OK: DISK OK [19:55:45] PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [19:58:55] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts] [19:59:58] ^ sorry, that is because gerrit was down for about 2 minutes due to maintenance [20:00:08] should recover very soon [20:00:27] did downtime for services on cobalt itself but these are indirect [20:02:21] Yippee, build fixed! [20:02:21] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #164: 09FIXED in 21 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/164/ [20:03:59] (03PS2) 10Chad: deploy notes: stupid kludge to skip extensions that didn't exist before [tools/release] - 10https://gerrit.wikimedia.org/r/384729 [20:12:49] 10Gerrit, 10Readers-Web-Backlog, 10Patch-For-Review, 10Readers-Web-Kanban-Board, 10Unplanned-Sprint-Work: Temporarily allow pushing large objects - https://phabricator.wikimedia.org/T178189#3692026 (10bmansurov) >>! In T178189#3691942, @Legoktm wrote: > Is there a plan for constantly keeping it up to dat... 
[20:23:54] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:35:45] RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:40:08] (03PS5) 10Dduvall: WIP Service pipeline jobs [integration/config] - 10https://gerrit.wikimedia.org/r/380551 (https://phabricator.wikimedia.org/T175297) [20:42:56] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #550: 04FAILURE in 1 min 56 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/550/ [20:44:16] * paladox testing a breaking change to its-phabricator https://gerrit-review.googlesource.com/#/c/plugins/its-phabricator/+/133790/ [20:59:48] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #165: 04FAILURE in 24 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/165/ [21:00:25] woo it works [21:00:34] no_justification twentyafterfour ^^ [21:00:55] i am replacing certificates with tokens. [21:01:03] :) [21:01:48] i am dropping certificates and username field [21:01:50] not needed [21:04:26] just need to update the tests and it's ready to land [21:04:57] 10Gerrit: Replace using certificates with tokens when using its-phabricator - https://phabricator.wikimedia.org/T178385#3692159 (10Paladox) a:03Paladox done in https://gerrit-review.googlesource.com/?polygerrit=0#/c/plugins/its-phabricator/+/133790/ [21:10:07] Yippee, build fixed! [21:10:07] Project selenium-MinervaNeue » firefox,beta,Linux,BrowserTests build #165: 09FIXED in 34 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/165/ [21:38:55] paladox: cool [21:39:00] :) [21:41:12] wow gerrit ui has improved a lot (upstream) [21:44:59] Yippee, build fixed! 
[21:45:00] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #166: FIXED in 19 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/166/ [21:47:58] !log delete wmfreleng/mediawiki-extensions-phan from docker hub [21:48:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:48:36] !log added slave integration-slave-docker-1006 (1x 4GB ram executor) [21:48:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:57:28] thcipriani: fyi https://www.mediawiki.org/w/index.php?title=Continuous_integration/Docker&diff=2586762&oldid=2560238 [21:57:29] Release-Engineering-Team (Kanban), Project-Admins: create a new component project for the PhabTaskGraph - https://phabricator.wikimedia.org/T177221#3692346 (mmodell) a:mmodell [21:58:48] addshore: awesome :) [21:59:18] going to switch 100[123] to have the class enabled in horizon too instead of using the hiera config [21:59:31] and in case you missed it, phan for core and extensions is now on docker :) [22:02:08] \o/ nice.
[22:02:27] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), Patch-For-Review: Provide git repositories on docker slaves to act as reference to git clone - https://phabricator.wikimedia.org/T178076#3692350 (Addshore) Open>Resolved [22:06:13] Release-Engineering-Team (Kanban), Project-Admins: create a new component project for the PhabTaskGraph - https://phabricator.wikimedia.org/T177221#3692351 (mmodell) Open>Resolved #PhabTaskGraph [22:08:26] !log replaced integration-slave-docker-c2-m4-d40-1005 with integration-slave-docker-1005 T178409 [22:08:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:08:32] T178409: Applied puppet classes not appearing in horizon for integration-slave-docker-c2-m4-d40-1005.integration.eqiad.wmflabs - https://phabricator.wikimedia.org/T178409 [22:10:13] PROBLEM - Host integration-slave-docker-c2-m4-d40-1005 is DOWN: CRITICAL - Host Unreachable (10.68.21.87) [22:28:29] greg-g: can I grab a deploy window some time this week to deploy ReadingLists to beta? [22:28:33] task is T174651 [22:28:33] T174651: Beta testing of the ReadingLists extension - https://phabricator.wikimedia.org/T174651 [22:29:02] it's not quite production-ready yet but having it on beta allows the apps teams to work on their side of the code [22:32:01] tgr: sure, that's innocuous enough, pick a time that works for you (assuming all is good with pre-deploy needs) [22:32:38] tgr: I don't see a security review, is that planned/in the pipeline? [22:32:59] there was one, I'll link it from the task [22:33:19] there were some concerns but none of them security-related; mostly DB performance [22:33:22] kk, just didn't see it on the dependencies, I didn't search hard :) [22:52:46] (Draft2) Cicalese: Replace cicalese@mitre.org with cindom@gmail.com.
[integration/config] - https://gerrit.wikimedia.org/r/384900 [22:56:28] twentyafterfour woo the tests pass for this https://gerrit-review.googlesource.com/?polygerrit=0#/c/plugins/its-phabricator/+/133790/ new change. Tests fail though because of another change that was merged months ago :) [23:00:30] failing with just [23:00:31] https://phabricator.wikimedia.org/P6143 [23:00:38] which is the same before change is applied [23:03:06] Release-Engineering-Team (Kanban), Project-Admins: create a new component project for the PhabTaskGraph - https://phabricator.wikimedia.org/T177221#3692484 (CCicalese_WMF) Thank you! [23:10:25] Release-Engineering-Team (Kanban), Project-Admins: create a new component project for the PhabTaskGraph - https://phabricator.wikimedia.org/T177221#3692494 (CCicalese_WMF) Hmmm. I just tried adding this to the extension page, and it looks like it's looking for #mediawiki-extensions-PhabTaskGraph instead... [23:25:09] Release-Engineering-Team (Kanban), Project-Admins: create a new component project for the PhabTaskGraph - https://phabricator.wikimedia.org/T177221#3650894 (demon) Fixed. You were using the `bugzilla` parameter which assumes the old-style project names. Swapped for `phabricator` [23:29:54] Release-Engineering-Team (Kanban), Project-Admins: create a new component project for the PhabTaskGraph - https://phabricator.wikimedia.org/T177221#3692531 (CCicalese_WMF) Excellent! Thank you! [23:56:30] (PS1) Gergő Tisza: Add ReadingLists extension [tools/release] - https://gerrit.wikimedia.org/r/384909 (https://phabricator.wikimedia.org/T174651)