[07:53:43] _joe_: could you take a quick look (and C+1, maybe ;-)) at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/596150 ? releng merged the helm-lint patch, so linting always fails on the raw chart currently
[07:54:06] <_joe_> jayme: sure, gimme 5 minutes
[07:54:57] sure
[07:57:23] <_joe_> jayme: I'm not even sure why that raw chart is there
[07:57:30] <_joe_> alex might have more context
[07:58:27] I feel like I have seen it being used somewhere. But I can't recall where that was
[09:00:02] Is there somewhere that I can look for a WMF plan for php versions that might be coming up?
[09:06:57] <_joe_> addshore: short answer: no. Long answer: we'll be migrating to buster sometime in the next year
[09:07:55] <_joe_> so probably 7.3 next year
[09:08:11] ack! ty
[09:08:32] <_joe_> addshore: we might be able to move faster once we switch to k8s
[09:09:06] that almost sounds like you think that is coming before 7.3 :P but I figure that is not the case
[09:09:07] <_joe_> deploying a new set of docker images (and rolling those back!) is faster than upgrading a fleet this way
[09:09:24] <_joe_> addshore: realistically no, but I'd love to
[09:09:30] <3
[09:09:41] _joe_: that's very true
[09:09:59] I hadn't thought much about the implications of k8s for php version upgrades, that's great though, indeed
[09:16:53] <_joe_> addshore: there are a few fundamental questions we need to answer there
[09:29:32] if anyone is reimaging/installing a stretch system today, please ping me, I've made a change to the installer and want to doublecheck everything is still fine
[09:57:44] _joe_: I've been running mediawiki on k8s with a 1-2GB docker file now for half a year or so, it's been a fun ride
[10:03:46] <_joe_> fun? please do tell
[11:41:48] vgutierrez: I have an acme-chief question
[11:42:18] go ahead
[11:42:26] mmmm
[11:42:55] let me try to articulate an intelligent question
[11:43:36] so I have a toolforge profile that uses `letsencrypt::cert::integrated` and another that uses `acme_chief::cert` and I don't know why
[11:45:23] so letsencrypt::cert is part of the previous LE puppetization
[11:45:27] arturo: the first one is older and "standalone" LE and the second one is using a central server to take care of getting the LE certs. at some point i think Krenair made it work in cloud.
[11:46:01] well, the real thing is that I'm trying to test some toolforge stuff in the toolsbeta CloudVPS project. We don't seem to have any infra for running acme_chief in toolsbeta (we need a VM for that, right?)
[11:46:39] yup, you need a puppet master and an acme-chief instance
[11:46:40] vgutierrez, mutante: that makes sense, thanks
[11:46:47] https://openstack-browser.toolforge.org/puppetclass/role::acme_chief::cloud
[11:47:03] see above, it is already used in deployment-prep and tools
[11:47:14] I think I will simply disable TLS for these testing servers in toolsbeta
[11:47:14] so you should be able to copy from tools to toolsbeta
[11:48:35] i could imagine that disabling TLS might be more work than applying the role to an instance
[11:49:31] well, I don't know, I'm walking unknown territory for me here
[11:50:20] i would recommend to avoid the hack and instead make toolsbeta as much like tools as possible
[11:51:19] since the role is already there and works
[11:52:09] that's what I want to do too
[11:53:26] let me create a phab task to brain dump what would need to be done
[11:59:52] I just created T252762
[11:59:53] T252762: tools/toolsbeta: improve acme-chief integration - https://phabricator.wikimedia.org/T252762
[14:58:45] elukey: hey, are you aware of the errors coming out of prometheus-amd-rocm-stats?
[15:04:17] shdubsh: should already be fixed
[15:04:33] sorry for the spam :(
[15:04:52] no worries! thanks!
[15:30:08] Other SRE, please help me welcome our GSOC student: https://twitter.com/jynus/status/1260955359386181633
[15:31:10] direct medium link: https://medium.com/@ajupazhamayil/gsoc-2020-wikimedia-initial-biweekly-report-ad21905ad5d
[15:42:51] o/
[15:43:26] jynus: very cool
[15:47:22] volans: expect tickets with cumin complaints brought up by the student soon 0:-D
[15:48:06] jynus: in our best tradition they most likely already exist (the tickets) ;)
[15:48:14] probably
[15:48:38] he said to me "ugh, I have no way to hide the cumin stdout messages?"
[15:48:56] maybe it is even possible already or on the latest version
[15:48:57] yes and no
[15:49:19] not yet cleanly, but there is a way and that's what we're doing in spicerack at the moment
[15:49:46] yeah, he wanted to do a hack, but I said to report it first
[15:49:50] also I'd like to know what are your plans, as ideally all the automation efforts should converge into spicerack to take advantage of the framework
[15:49:58] including specific tools as library
[15:50:04] like we do for conftool and cumin
[15:50:05] well, this is going to be its own package
[15:50:29] and last thing I want is an external contributor on an important repo
[15:50:48] *internal
[15:51:18] he is not going to work on automation
[15:51:42] not adequate for a student
[15:51:47] jynus: https://phabricator.wikimedia.org/T212783
[15:52:05] I knew it! :-D
[15:52:27] volans: as long as it is well structured, we can integrate it later into spicerack, I guess
[15:52:51] but for gsoc I think it is better it is separate
[15:53:01] so merge != production
[15:53:09] agree?
[15:53:18] sure, that's what I meant with 'including specific tools like we do for confctl and cumin'
[15:53:25] oh
[15:53:31] sorry, then I misunderstood you
[15:53:43] just develop it thinking not only as a CLI but also how it can be imported
[15:53:46] what did you mean with your question?
[15:53:47] by other code
[15:53:51] yeah, totally
[15:53:57] in fact was literally his first commit
[15:54:12] going from a cli first to a class first
[15:54:29] but please please please don't have high expectations
[15:54:36] which question? [jynus| what did you mean with your question?]
[15:54:50] "also I'd like to know what are your plans"
[15:55:03] this
[15:55:10] what you just said
[15:55:11] :)
[15:55:12] so for GSoC the plan is to be its own thing
[15:55:22] because it makes sense in that context
[15:55:23] this is the trick if you want to suppress cumin output for now
[15:55:24] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/spicerack/+/master/spicerack/remote.py#632
[15:55:31] volans: that is great!
[15:55:34] thanks a lot!
[15:56:09] what I mean is that don't expect this will be a super great thing- we will try to make it better
[15:56:25] and if it works, we will productionize and integrate
[15:56:37] sure, that makes sense
[15:56:59] this is not a 6-month project of mine, just a summer project :-D
[15:57:16] in fact, it is recommended to be considered as a 1-week engineer project kind of scope
[15:57:25] so your expectations are in the right place :-D
[15:57:35] I know I know :)
[15:57:51] I will pass on your code
[15:57:55] thank you a lot!
[15:58:02] anytime
[16:33:41] Should https://wikitech.wikimedia.org/wiki/Incident_documentation/20200501-vc-link-failure not say OUTAGE WORKED AROUND instead of OUTAGEWORKED AROUND
[16:34:50] XioNoX: ^
[16:35:40] RhinosF1: updated
[16:36:19] XioNoX: ty, when it comes to incident docs, seemed a better idea to ask than boldly change :)
[16:42:49] You're fixing a typo, you're not disputing facts/evidence :P
[16:43:02] True
[16:44:13] Reedy: I guess I’m too used to Extension:IncidentReporting and on the site that uses it I don’t have access to fix spelling / factual mistakes and I find many.
[18:09:08] hey folks, what was the name of that handy android app we were all using for alerting on SMS pages?
[18:09:13] was it 'beeper'?
[18:11:04] I've used Klaxon years ago
[18:11:11] these days we're migrating to VictorOps
[18:11:53] I have a new phone that refuses to highlight in any way SMS pages
[18:12:18] and that beeper app was simple and noisy as hell :-P but I lost it
[19:54:37] I have a favor to ask y'all. I filed T252815 for Marti Johnson, but I don't have access to the vm where this code is running now and I also do not have time to chase this bug today. Can anyone poke around a bit on miscweb1002.eqiad.wmnet and see if they can figure out why PHP sessions might be failing? I'm pretty sure they would be stored on the local filesystem somewhere unless there is fancy stuff in the php.ini to send them elsewhere.
[19:54:38] T252815: Iegreview login failing with csrf token missing warning - https://phabricator.wikimedia.org/T252815
[19:55:21] Marti is trying to finish up data export using the app for the new grant approval round starting this weekend.
[20:02:22] bd808: /dev/vda1 ext4 18G 18G 0 100% /
[20:03:26] volans: well that sounds bad
[20:03:36] apparently it has been "fixed"
[20:03:44] according to icinga comment on the ack
[20:03:53] sorry my bad
[20:05:00] misread the icinga table
[20:06:08] bd808: so, 9.1 GB are for old static sites
[20:06:15] and 5.5 are in /root
[20:06:19] not much to free on the fly
[20:07:16] I have no context on the content there
[20:08:41] volans: understood. this is a "pile of small random things" host based on the puppet role
[20:09:08] probably all things that would be better off in containers on a k8s cluster, but that's a different problem
[20:09:55] the stuff on /root are from yesterday
[20:09:59] I guess mutante was working on it
[20:10:23] from 'last'
[20:10:43] what I can do is to move the 2 tar.gz from /root to another host for now
[20:11:14] volans: that would be helpful if you have time
[20:13:08] bd808: on it
[20:15:39] 780MB freed
[20:15:44] running puppet
[20:17:10] bd808: can you retry?
[20:17:30] volans: I'm in! you are a wizard :)
[20:19:23] yw :)