[00:31:17] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[00:39:51] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-ms-be03 - https://phabricator.wikimedia.org/T190683#4080501 (EddieGP) The title for swift::init_device comes from a hiera lookup (hiera key `swift_storage_drives`). Openstack browser shows this key is set to the value 'lv-a' on deployment-...
[00:46:54] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-ms-be03 - https://phabricator.wikimedia.org/T190683#4094961 (EddieGP) Related: T184236 and the attached patches.
[07:25:31] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[10:35:57] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-ms-be03 - https://phabricator.wikimedia.org/T190683#4095116 (MarcoAurelio) I've been looking into https://horizon.wikimedia.org/project/puppet/ and apparently I cannot do anything from there but to simply see. I am also unfamiliar with Pupp...
[10:38:32] Beta-Cluster-Infrastructure, Release-Engineering-Team: Request for shell access on deployment-prep - https://phabricator.wikimedia.org/T190925#4095118 (MarcoAurelio) +1 from me. Eddie has proven to be helpful and knowledgeable. No concerns from me.
[10:54:08] PROBLEM - Host deployment-videoscaler01 is DOWN: CRITICAL - Host Unreachable (10.68.19.130)
[10:54:52] PROBLEM - Host deployment-tmh01 is DOWN: CRITICAL - Host Unreachable (10.68.16.211)
[11:43:25] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-ms-be03 - https://phabricator.wikimedia.org/T190683#4095140 (EddieGP) Actually this is a duplicate. After https://gerrit.wikimedia.org/r/#/c/361648/ the "/dev/swift/" part will be implicit as well as the trailing "1", and after https://gerr...
[11:43:45] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-ms-be03 - https://phabricator.wikimedia.org/T190683#4095142 (EddieGP)
[11:43:51] Beta-Cluster-Infrastructure, Operations, media-storage, Patch-For-Review, Puppet: Puppet broken on deployment-ms-be0[34] with evaluation error in swift module - https://phabricator.wikimedia.org/T184236#4095145 (EddieGP)
[12:37:54] Beta-Cluster-Infrastructure, Release-Engineering-Team: Request for access to the beta cluster - https://phabricator.wikimedia.org/T190755#4095176 (EddieGP)
[12:40:12] Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4095178 (zeljkofilipin) >>! In T179188#4094387, @zeljkofilipin wrote: > Created testing job: [[ https://integration.wikimedia.org/ci/view/...
[12:43:52] Beta-Cluster-Infrastructure, DNS, Operations, Traffic, and 3 others: Ferm/DNS library weirdness causing puppet errors on some deployment-prep instances - https://phabricator.wikimedia.org/T153468#4095180 (EddieGP)
[12:48:06] (PS1) Hashar: Explain why we require composer dev dependencies [integration/quibble] - https://gerrit.wikimedia.org/r/423239
[12:48:08] (PS1) Hashar: Factor out code to copy to log directory [integration/quibble] - https://gerrit.wikimedia.org/r/423240
[12:54:19] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#4095186 (EddieGP) Open>Resolved a:EddieGP >>! In T132259#3879429, demon wrote: > Is this really best as a tracking task or should we add it to the...
[14:02:21] Release-Engineering-Team, MediaWiki-Core-Tests, User-zeljkofilipin: Q4 Selenium framework improvements - https://phabricator.wikimedia.org/T190994#4095222 (zeljkofilipin)
[14:02:25] Release-Engineering-Team (Kanban), User-zeljkofilipin: Investigate if WebdriverIO `sync: false` would be useful to us and document how to use it - https://phabricator.wikimedia.org/T182412#4095221 (zeljkofilipin)
[14:02:27] Release-Engineering-Team, MediaWiki-Core-Tests, Epic, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), and 2 others: Q3 Selenium framework improvements - https://phabricator.wikimedia.org/T182421#4095223 (zeljkofilipin)
[14:02:40] Release-Engineering-Team, MediaWiki-Core-Tests, User-zeljkofilipin: Q4 Selenium framework improvements - https://phabricator.wikimedia.org/T190994#4090304 (zeljkofilipin)
[14:02:44] Release-Engineering-Team, MediaWiki-Core-Tests, Epic, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), and 2 others: Q3 Selenium framework improvements - https://phabricator.wikimedia.org/T182421#3822905 (zeljkofilipin)
[14:53:16] PROBLEM - Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76)
[15:11:57] Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4095239 (zeljkofilipin)
[15:48:42] PROBLEM - Puppet errors on integration-slave-docker-1003 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:04:46] (CR) Hashar: [C: 2] Explain why we require composer dev dependencies [integration/quibble] - https://gerrit.wikimedia.org/r/423239 (owner: Hashar)
[17:05:04] (CR) Hashar: [C: 2] Factor out code to copy to log directory [integration/quibble] - https://gerrit.wikimedia.org/r/423240 (owner: Hashar)
[17:05:13] (Merged) jenkins-bot: Explain why we require composer dev dependencies [integration/quibble] - https://gerrit.wikimedia.org/r/423239 (owner: Hashar)
[17:05:36] (PS2) Hashar: Factor out code to copy to log directory [integration/quibble] - https://gerrit.wikimedia.org/r/423240
[17:05:41] (CR) Hashar: [C: 2] Factor out code to copy to log directory [integration/quibble] - https://gerrit.wikimedia.org/r/423240 (owner: Hashar)
[17:06:07] (Merged) jenkins-bot: Factor out code to copy to log directory [integration/quibble] - https://gerrit.wikimedia.org/r/423240 (owner: Hashar)
[17:08:25] Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4095296 (zeljkofilipin)
[17:48:50] Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4095325 (zeljkofilipin)
[18:34:37] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 hphp_invoke - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 287 bytes in 0.004 second response time
[18:35:59] PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:54:29] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:38:53] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:39:36] PROBLEM - Puppet errors on deployment-kafka05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:41:28] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:41:31] PROBLEM - Puppet errors on deployment-kafka04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:47:24] no_justification i found this https://gwtmaterialdesign.github.io/gwt-material-demo/#about lol
[19:47:37] * paladox wishes there was a polymer javascript version of that site :)
[19:47:46] all the ui in there looks pretty :)
[20:02:27] Beta-Cluster-Infrastructure, Release-Engineering-Team, Patch-For-Review, Puppet: Puppet broken on deployment-mira - https://phabricator.wikimedia.org/T191110#4095458 (MarcoAurelio) Still erroring: ``` maurelio@deployment-mira:~$ sudo puppet agent -tv Info: Using configured environment 'future' I...
[20:03:40] eddiegp: ^^
[20:03:51] Hauskatze: give me a minute
[20:03:59] sure
[20:04:07] puppetmaster may need to be updated
[20:04:12] if there are merge conflicts
[20:05:47] ran on puppetmaster02 and no errors there
[20:07:07] Umm, what? "Could not find data item"
[20:07:39] That's the same thing as before, but that file it's now in is definitely part of the hiera for deployment-mira.
[20:07:47] hello Katze
[20:07:58] you can try it again, but the puppetmaster may need a sync :pp
[20:08:01] * mutante hides
[20:08:24] i meant updating the git repo
[20:08:25] and thanks Eddie too
[20:08:28] on the puppetmaster
[20:08:35] what paladox says :)
[20:08:37] /var/lib/git/operations/puppet
[20:08:39] Hmm, is the deployment-prep puppetmaster auto-fetching the operations/puppet repo at all? I guess not.
[20:08:45] * eddiegp is too slow
[20:08:52] eddiegp it should
[20:08:59] but won't if there are merge conflicts
[20:09:08] mutante: I'm tired of socks and puppets
[20:09:10] :P
[20:09:36] Hauskatze: haha, ok :) but just follow paladox on this one :)
[20:09:42] :)
[20:09:47] so _what_ needs to be done here?
[20:09:56] git fetch origin
[20:09:58] git pull .. i think
[20:10:02] and then git rebase origin
[20:10:03] what he says
[20:10:05] I've found http://shinken.wmflabs.org/host/labs-puppetmaster
[20:10:12] git pull will block any future auto updates
[20:10:17] paladox: right, where? which server?
[20:10:18] as i found in the past :)
[20:10:27] good point. maybe that is _exactly_ what happened
[20:10:29] Hauskatze on the deployment-prep puppetmaster
[20:10:31] somebody like me did git pull
[20:10:32] hehe
[20:10:40] and in /var/lib/git/operations/puppet
[20:10:52] paladox: okay, I'll check there
[20:10:53] Or it's just some crazy cherry-pick, which I heard to be normal on deployment-prep ;)
[20:11:01] yea, that too
[20:11:02] yeh :)
[20:11:10] probably it's both on top of each other
[20:11:33] https://en.wikipedia.org/wiki/Turtles_all_the_way_down
[20:11:41] heh
[20:11:45] paladox: var/lib/git has two folders
[20:11:52] Hauskatze /var/lib/git/operations/puppet
[20:11:55] labs and operations
[20:11:55] cd /var/lib/git/operations/puppet
[20:12:22] Your branch is ahead of 'origin/production' by 18 commits.
[20:12:22] (use "git push" to publish your local commits)
[20:12:22] nothing to commit, working tree clean
[20:12:24] heh
[20:12:40] git reset --hard origin/master && git pull origin master
[20:12:46] ?
[20:12:48] Umm, no.
[20:12:51] Hauskatze nope, don't run git reset
[20:12:56] That'd drop all the cherry-picks
[20:12:56] run
[20:12:59] git fetch origin
[20:13:00] and then
[20:13:03] git rebase origin
[20:13:23] maurelio@deployment-puppetmaster02:/var/lib/git/operations/puppet$ git fetch origin
[20:13:23] error: cannot open .git/FETCH_HEAD: Permission denied
[20:13:24] lol
[20:13:29] sudo
[20:13:46] https://xkcd.com/149/
[20:13:51] ;)
[20:13:57] lol
[20:14:18] maurelio@deployment-puppetmaster02:/var/lib/git/operations/puppet$ sudo git rebase origin
[20:14:18] Current branch production is up to date.
[20:14:28] sudo make me a deployment burger
[20:14:37] after sudo git fetch origin
[20:15:04] still:
[20:15:06] maurelio@deployment-puppetmaster02:/var/lib/git/operations/puppet$ sudo git status
[20:15:06] On branch production
[20:15:06] Your branch is ahead of 'origin/production' by 18 commits.
[20:15:06] (use "git push" to publish your local commits)
[20:15:06] nothing to commit, working tree clean
[20:15:13] yep that's ok :)
[20:15:19] so hmm, it's current
[20:15:21] what does git log show on top?
[20:15:30] does it show the Hiera change?
[20:15:47] aha
[20:15:54] it's asking for the group now
[20:16:01] ;)
[20:16:02] profile::kubernetes::deployment_server::git_owner: trebuchet
[20:16:04] is one fix
[20:16:06] now it wants
[20:16:13] profile::kubernetes::deployment_server::git_group
[20:16:14] maurelio@deployment-puppetmaster02:/var/lib/git/operations/puppet$ sudo git log
[20:16:14] commit 82af816b705f6a103b165408adb5f02aa8f09652
[20:16:14] Author: Stephane Bisson
[20:16:14] Date: Tue Mar 27 15:23:23 2018 -0400
[20:16:14] Make 'style' and 'storage id' available to maps services
[20:16:14] Make 'style' and 'storage id' variables available as config
[20:16:16] you fixed one bug and then see the next. situation normal
[20:16:16] vars for kartotherian, tilerator, and tileratorui.
[20:16:18] Bug: T112948
[20:16:19] T112948: All map location names should be in the user's language - https://phabricator.wikimedia.org/T112948
[20:16:20] Change-Id: Iec7a99a7360d71c8fd57545d41800474f78e5c52
[20:16:20] but progress
[20:16:27] Hauskatze we know what's wrong now
[20:16:35] hallelujah
[20:16:38] lol
[20:16:46] profile::kubernetes::deployment_server::git_group
[20:16:50] eddiegp ^^
[20:16:50] Well, top of git log will show the cherry-picks anyway
[20:16:52] that's exactly why i wanted to show how to fix it for next time too
[20:16:56] heh, but nice
[20:17:45] eddiegp change is not on git log mutante
[20:17:54] wikidev
[20:17:59] i guess it should be ^^
[20:18:06] Hauskatze: Sure? Should be the 19th from top, so scroll a bit ;)
[20:18:08] https://github.com/wikimedia/puppet/blob/3a2551ce098b17259b12baea6e683486df8dd28c/hieradata/role/common/deployment_server.yaml#L190
[20:18:26] i expected it on top but there are 18 others above it because they are cherry-picked...
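Note on the "19th from top" remark above: a minimal Python sketch of the arithmetic, assuming the local checkout has been rebased so that its history is linear (local cherry-picks first, then upstream). The function names are hypothetical and are not part of any Wikimedia tooling; the status text is the one pasted in the channel.

```python
import re


def commits_ahead(status_text: str) -> int:
    """Return how many local commits git reports on top of the upstream branch.

    Parses the "Your branch is ahead of '<upstream>' by N commit(s)." line
    from `git status`; returns 0 when no such line is present.
    """
    match = re.search(r"ahead of '[^']+' by (\d+) commits?", status_text)
    return int(match.group(1)) if match else 0


def upstream_tip_position(status_text: str) -> int:
    """1-based position of the newest upstream commit in `git log`.

    Assumption: linear history, so the N local cherry-picks are listed
    first and the upstream tip follows directly after them.
    """
    return commits_ahead(status_text) + 1


# Status output as pasted from deployment-puppetmaster02.
status = """On branch production
Your branch is ahead of 'origin/production' by 18 commits.
  (use "git push" to publish your local commits)
nothing to commit, working tree clean"""

print(commits_ahead(status))          # number of local cherry-picks
print(upstream_tip_position(status))  # where the latest merged change shows up
```

With 18 cherry-picks on top, the freshly merged Hiera change is expected at position 19, which is why it did not appear at the top of `git log`.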
[20:18:29] meh
[20:18:51] nope, it is not there
[20:19:12] cherry picks are at the top
[20:19:12] weird.. since the puppet run says otherwise ?
[20:19:56] running git log -n50 --oneline
[20:20:03] paladox: Where did you get the git_group error message from?
[20:20:11] 167f282b8d hiera: fix deployment-mira, lacking ::git_owner
[20:20:17] eddiegp from https://phabricator.wikimedia.org/T191110#4095458
[20:20:29] if you test something on this you are really testing all these 18 cherry-picks and not what production is ...
[20:20:48] mutante https://gerrit.wikimedia.org/r/#/c/423256/
[20:21:18] Well yeah, the puppet master was updated, I just misread the error message to be still the old one. My bad :)
[20:22:04] ok, merged on prod master
[20:22:06] eddiegp Hauskatze fixed in https://gerrit.wikimedia.org/r/c/423256/
[20:22:07] you should repeat this
[20:22:15] fetch the next fix
[20:22:18] and run puppet
[20:22:22] git fetch origin && git rebase origin ??
[20:22:26] yep
[20:22:31] okay...
[20:23:03] 'git pull --rebase' is shorter and does the same in one step btw
[20:23:18] how many hosts are there starting with deployment- in host name?
[20:23:54] mutante: 74
[20:23:58] lol?
[20:24:07] madness
[20:24:18] is this "mira" one special?
[20:24:21] According to the "instances" list on https://tools.wmflabs.org/openstack-browser/project/deployment-prep
[20:24:29] because production doesnt have "mira" anymore
[20:24:47] is it used?
[20:24:55] Yes, it is.
[20:25:00] ok
[20:25:03] okay so I ran the git fetch/git rebase thing
[20:25:15] doing sudo puppet agent -tv now
[20:25:36] deployment-prep also uses deployment-tin and will continue to do so even if there's no "tin" in prod any more
[20:26:12] sudo puppet agent -tv on puppetmaster -> okay
[20:26:16] now on mira
[20:27:25] no error now
[20:27:34] but a bunch of things I'll paste
[20:28:07] also
[20:28:12] deployment-tin puppet is failing
[20:28:14] [20:38:53] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:28:57] https://phabricator.wikimedia.org/P6917
[20:30:54] no 'recovery' message from shinken?
[20:31:05] I did this for the bot message :|
[20:31:07] :P
[20:31:16] That paste looks like a normal puppet run to me :)
[20:31:29] Shinken runs every 5-10min ish, so just wait for it ;)
[20:31:34] yep
[20:31:49] Hauskatze oh i wonder why deployment-tin showed as failing?
[20:31:50] on -tin
[20:31:53] The last Puppet run was at Thu Mar 29 14:14:27 UTC 2018 (3257 minutes ago).
[20:32:02] now looking there
[20:32:19] oh
[20:32:30] is it the same error as mira
[20:32:59] Error: Error while evaluating a Function Call, Could not find data item profile::kubernetes::deployment_server::git_owner in any Hiera data file and no default supplied at /etc/puppet/modules/profile/manifests/kubernetes/deployment_server.pp:5:16 on node deployment-tin.deployment-prep.eqiad.wmflabs
[20:33:16] yep same error
[20:33:19] diving into var lib git operations puppet
[20:33:26] Yeah, we can copy what we added to deployment-mira to deployment-tin. Or just move it to some common file.
[20:33:38] Probably the latter is a better idea.
[20:34:34] paladox: there's a problem as there's no operations/puppet on deployment-tin on var/lib/git
[20:34:46] yes, avoid using hieradata/hosts/
[20:34:51] Hauskatze mutante https://gerrit.wikimedia.org/r/#/c/423257/
[20:34:51] avoid using host names from prod
[20:34:58] try using hieradata/role/
[20:35:02] ./common/
[20:35:07] Hauskatze yep
[20:35:12] that is only on the puppet master
[20:35:15] not on the clients
[20:35:27] mutante: labs has no role-based hiera lookup
[20:35:39] Just common, project specific, host specific
[20:35:49] and prefix
[20:35:52] by hostname
[20:35:59] you can use deployment-prefix ?
[20:36:00] fixed tin in https://gerrit.wikimedia.org/r/#/c/423257/
[20:36:26] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[20:36:31] hah
[20:36:33] l
[20:37:12] still I'm seeing a critical error: labs-puppetmaster/Labs Puppetmaster HTTPS is UNKNOWN since 4M 2w 5d 1h 34m 24s
[20:37:21] paladox: Really let's not have the same line in both the deployment-mira.yaml and deployment-tin.yaml. move it to deployment-prep/common.yaml instead and get rid of it in -mira
[20:37:24] ok, i will merge that one more change for tin
[20:37:27] but then i'm also out
[20:37:33] ok
[20:37:35] and i dont see a point in doing this for 74 hosts
[20:38:26] deployment-prep has 74 hosts in total. It doesn't have 74 deployment servers, so there's no need to do this for all hosts ;)
[20:38:43] done
[20:39:09] just dont expect me to have time to merge this kind of thing for each VPS that gets created because we hardcode the host names
[20:40:45] mutante this https://gerrit.wikimedia.org/r/#/c/423257/ should fix it for other deployment hosts too
[20:41:01] deployment-prep/common.yaml
[20:41:40] ready to run stuff when told
[20:41:45] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#4095478 (Paladox)
[20:41:49] Beta-Cluster-Infrastructure, Release-Engineering-Team, Patch-For-Review, Puppet: Puppet broken on deployment-mira - https://phabricator.wikimedia.org/T191110#4095477 (Paladox) Open>Resolved
[20:42:05] Beta-Cluster-Infrastructure, Release-Engineering-Team, Puppet: Puppet broken on deployment-mira - https://phabricator.wikimedia.org/T191110#4093928 (Paladox)
[20:42:06] first i'm merging my own actual change that i came here for
[20:42:14] now i'm rebasing that
[20:42:38] Hauskatze: done
[20:42:59] shall I git fetch etc on puppetmaster again?
[20:43:23] Well hiera in labs is just insane, being split across wikitech, horizon, operations/puppet and labs/private. I'd be all in for killing three of those, preferably storing everything in horizon and not needing an operations/puppet merge to change labs hiera.
[20:43:55] paladox: fetch and rebase on puppetmaster again?
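Note on the "move it to deployment-prep/common.yaml" suggestion above: a minimal Python sketch of a layered lookup, to show why one entry in a shared file covers every host in the project. This is not how Hiera itself is implemented. The precedence order (host file before project common file) is an assumption, the prefix layer is left out, `hiera_lookup` and `HieraLookupError` are hypothetical names, and the `wikidev` value is only the guess made in the channel.

```python
from typing import Any, Mapping, Sequence, Tuple

Layer = Tuple[str, Mapping[str, Any]]


class HieraLookupError(LookupError):
    """Raised when no layer defines the requested key."""


def hiera_lookup(key: str, layers: Sequence[Layer]) -> Any:
    """Return the value from the first layer that defines `key`.

    `layers` is ordered from most specific to least specific.
    """
    for _name, data in layers:
        if key in data:
            return data[key]
    raise HieraLookupError(
        f"Could not find data item {key} in any Hiera data file "
        "and no default supplied"
    )


# Shared project-level data (illustrative content only).
project_common = {
    "profile::kubernetes::deployment_server::git_owner": "trebuchet",
    "profile::kubernetes::deployment_server::git_group": "wikidev",  # guessed in chat
}

# Neither host file needs its own copy of the keys.
tin_layers = [
    ("deployment-tin.yaml", {}),
    ("deployment-prep/common.yaml", project_common),
]
mira_layers = [
    ("deployment-mira.yaml", {}),
    ("deployment-prep/common.yaml", project_common),
]

print(hiera_lookup("profile::kubernetes::deployment_server::git_owner", tin_layers))
print(hiera_lookup("profile::kubernetes::deployment_server::git_group", mira_layers))
```

A key that no layer defines raises `HieraLookupError`, which mirrors the "Could not find data item ... and no default supplied" failures seen on deployment-mira and deployment-tin.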
[20:44:41] i abandoned your change because i already did that before seeing it
[20:44:44] just to clean up
[20:45:01] Hauskatze yep
[20:45:20] mutante: no problems, thanks
[20:46:01] done
[20:46:07] ok guys, i have to get off the bus
[20:46:12] which means i lose my wifi
[20:46:17] which is part of the bus :P
[20:46:26] thanks mutante -- see you later :)
[20:46:29] cya
[20:46:31] heh :)
[20:46:35] * paladox has to go too
[20:46:39] thanks mutante :)
[20:46:41] paladox knows i like to use buses
[20:46:43] and work from them
[20:46:48] yeh heh
[20:46:51] * mutante waves
[20:47:20] puppet ran on deployment-tin
[20:48:20] https://phabricator.wikimedia.org/P6918
[20:49:30] Looks all good, just waiting for shinken again
[20:49:48] deployment-eventlog05/Puppet staleness <-- I think we can fix this with sudo puppet agent -tv
[20:50:23] now a silly question: are we supposed to do the fetch and rebase on the puppetmaster server every day?
[20:50:41] nope
[20:50:57] paladox said it should auto-update, I don't know how it does though.
[20:51:01] it should refresh by itself as long as there are no merge conflicts
[20:51:10] every 30 mins
[20:51:20] i think
[20:53:29] eventlog05 error is different
[20:55:26] tin is now down from 100% to CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:58:56] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[20:59:02] :)
[21:03:48] i'm back .. hanging out behind the railway station. but as long as that train doesn't leave i can leech off Amtrak wifi
[21:04:19] nice to see that recovery :)
[21:04:40] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mx - https://phabricator.wikimedia.org/T191151#4095484 (MarcoAurelio)
[21:05:28] my house cat demands that I feed him now
[21:05:35] so brb
[21:05:53] RECOVERY - Puppet staleness on deployment-eventlog05 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:05:54] Hauskatze's house cat :D
[21:05:56] feed the cat
[21:06:09] ah oh, another recovery
[21:06:23] but.. I did sudo puppet agent disable there
[21:06:25] surprise recoveries are the best recoveries
[21:06:27] leaving it as it was
[21:06:31] yea, you should call the cat "Meta"
[21:06:42] it's called 'Miko'
[21:06:44] Hauskatze of Hauskatze is a good one
[21:07:02] Neko Mimi
[21:08:22] PROBLEM - Puppet errors on deployment-mx02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:08:23] regarding that recovery on deployment-eventlog .. thanks to using common.yaml ;)
[21:08:34] more coming in, nice
[21:08:47] mx02
[21:09:07] I guess I can reenable the puppet there on eventlog
[21:12:36] mx02 looks like some letsencrypt stuff
[21:13:33] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mx - https://phabricator.wikimedia.org/T191151#4095500 (EddieGP)
[21:13:37] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-mx due to systemd on trusty - https://phabricator.wikimedia.org/T184244#4095503 (EddieGP)
[21:14:50] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mx02 due to some Letsencrypt stuff - https://phabricator.wikimedia.org/T191152#4095504 (MarcoAurelio)
[21:16:56] Hauskatze: See Krenairs sentence "Created a new system, ran into the problem that https://gerrit.wikimedia.org/r/#/c/403326/ fixes" on T184244. I guess T191152 is just another duplicate to that one.
[21:16:57] T191152: Puppet broken on deployment-mx02 due to some Letsencrypt stuff - https://phabricator.wikimedia.org/T191152
[21:16:57] T184244: Puppet broken on deployment-mx due to systemd on trusty - https://phabricator.wikimedia.org/T184244
[21:17:19] oh, sorry for duplicating then
[21:17:38] No problem, I'll just merge this one as well (already did for the first)
[21:18:07] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mx02 due to some Letsencrypt stuff - https://phabricator.wikimedia.org/T191152#4095516 (EddieGP)
[21:18:10] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet: Puppet broken on deployment-mx due to systemd on trusty - https://phabricator.wikimedia.org/T184244#4095518 (EddieGP)
[21:19:14] deployment-etcd-01 should be fixed too
[21:20:32] or not
[21:20:33] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, Could not find data item profile::etcd::tlsproxy::listen_port in any Hiera data file and no default supplied at /etc/puppet/modules/profile/manifests/etcd/tlsproxy.pp:6:35 on node deployment-etcd-01.deployment-prep.eqiad.wmfla
[21:20:34] bs
[21:20:51] * Hauskatze headesks
[21:23:48] eddiegp: that's not reported yet I think
[21:23:56] the tlsproxy.pp stuff
[21:23:58] * eddiegp looking
[21:24:09] profile::etcd::tlsproxy::listen_port
[21:24:16] no idea what's that
[21:24:59] * eddiegp laughs
[21:25:02] T191107
[21:25:03] T191107: deployment-etcd-01 puppet errors - https://phabricator.wikimedia.org/T191107
[21:25:08] https://github.com/search?q=org%3Awikimedia+profile%3A%3Aetcd%3A%3Atlsproxy%3A%3Alisten_port&type=Code
[21:25:08] And you were the one to report that :D
[21:25:42] * Hauskatze ashamed
[21:26:12] have to run
[21:26:35] ok, me too
[21:28:04] Beta-Cluster-Infrastructure, Puppet: Puppet broken on deployment-mx due to systemd on trusty - https://phabricator.wikimedia.org/T184244#4095534 (MarcoAurelio)
[21:28:26] Beta-Cluster-Infrastructure, Puppet: deployment-etcd-01 puppet errors - https://phabricator.wikimedia.org/T191107#4095535 (MarcoAurelio) https://github.com/search?q=org%3Awikimedia+profile%3A%3Aetcd%3A%3Atlsproxy%3A%3Alisten_port&type=Code
[21:37:14] Actually back, that alert turned out to be nothing.
[21:37:56] Beta-Cluster-Infrastructure, Puppet: deployment-etcd-01 puppet errors - https://phabricator.wikimedia.org/T191107#4095540 (MarcoAurelio) @Paladox Should we add `profile::etcd::tlsproxy::listen_port: ` to https://github.com/wikimedia/puppet/blob/production/hieradata/labs/deployment-prep/host/deplo...
[21:40:08] Hauskatze: sup
[21:40:44] ok, laters
[21:41:34] I'm here
[21:41:39] maps03 fixed
[21:41:49] puppet was disabled and now catalog updated
[21:42:18] !log Ran sudo puppet agent --enable and sudo puppet agent -tv on deployment-maps03 to fix puppet staleness
[21:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:43:20] and critical threshold is lowering now
[21:50:37] Beta-Cluster-Infrastructure, Puppet: Error: Could not find class role::kafka::jumbo::mirror for deployment-kafka04 - https://phabricator.wikimedia.org/T191154#4095542 (MarcoAurelio)
[21:52:18] RECOVERY - Puppet staleness on deployment-maps03 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:52:29] Beta-Cluster-Infrastructure, Release-Engineering-Team, Puppet: Puppet broken on deployment-mira - https://phabricator.wikimedia.org/T191110#4093928 (Dzahn) this also fixed puppet runs on a bunch of other deployment-* hosts thanks to using common.yaml instead of ./hosts/ bonus token for that ! thanks
[21:54:24] mutante: maps03 fixed as well :D
[21:54:43] edited my comment
[21:58:09] heh 17:52
[21:58:11] Beta-Cluster-Infrastructure, Puppet: Error: Could not find class role::kafka::jumbo::mirror for deployment-kafka04 - https://phabricator.wikimedia.org/T191154#4095555 (EddieGP) Role deleted from puppet git by @Ottomata in 661eea7bda, but still applied to deployment-kafka04 according to https://tools.wmfl...
[21:58:15] 23:58 here
[22:04:02] CRITICAL: deployment-puppetmaster02/Long lived cherry-picks on puppetmaster
[22:04:10] shot shot shot
[22:08:35] That one probably wins the "alert no-one cares about" award of the year.
[22:09:26] hopefully your request will be approved and so someone with knowledge about freaking puppet can help there
[22:10:29] I'm optimistic about that :)
[22:14:00] :)
[22:14:58] Beta-Cluster-Infrastructure, Puppet: Error: Could not find class role::kafka::jumbo::mirror for deployment-kafka0[45] - https://phabricator.wikimedia.org/T191154#4095558 (EddieGP)
[22:15:08] Beta-Cluster-Infrastructure, Puppet: Error: Could not find class role::kafka::jumbo::mirror for deployment-kafka0[45] - https://phabricator.wikimedia.org/T191154#4095542 (EddieGP) Affects deployment-kafka05 as well.
[22:26:06] just make tickets where you ask if that instance is even used and if not then andrew will be happy to delete it and the thing can be removed
[22:26:30] based on "role deleted in production" that seems pretty likely
[22:27:00] and you save the time of dealing with shinken details
[22:27:12] Release-Engineering-Team, DNS, Operations, Traffic, and 2 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776#4019041 (KATMAKROFAN) Why can't we just merge it into Meta-Wiki?
[22:40:47] true, will do that in the future. I've already pinged ottomata in that ticket, so he'll probably tell if that instance is useless.
[23:00:21] Release-Engineering-Team, DNS, Operations, Traffic, and 2 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776#4095598 (KATMAKROFAN) After renaming foundationwiki, we should enable use of LDAP accounts on there.