[04:34:15] I will be restarting s5 and s6 primary master in 25 minutes
[04:38:59] heh, and tendril is down of course
[04:39:06] good timing tendril
[05:47:16] good morning, I am checking the analytics-related alerts, but I already know what the problem is, will take a bit before fixing (nothing urgent)
[07:18:51] FYI, I'm removing boron later this morning
[07:31:12] ack
[07:49:13] note for myself, in VictorOps you can leave comments by going on the "timeline" tab of the incident
[07:51:27] if victorops has been working ok for me for a while, is it ok if I remove my contact from icinga to avoid double notification?
[08:00:45] created https://wikitech.wikimedia.org/wiki/Incident_documentation/20200505-wdqs-deploy for the incident
[08:46:51] volans: hi there! how do you feel about having a few quality-of-life packages added to the cumin hosts? (specifically colordiff and byobu right now)
[08:48:18] kormat: HTTP 301; location: moritzm ^^^ ;)
[08:48:41] kormat: ah, I didn't know colordiff. I usually do `diff | vipe` to get colors :)
[08:49:14] personally no prob with colordiff, I use it locally (I also make heavy use of icdiff fwiw)
[08:49:56] byobu I'm not familiar with, and it seems to do quite a few things, not sure about that
[08:50:21] it's basically a set of configs for tmux/screen
[08:50:22] kormat: what specific use cases are you trying to fix?
[08:51:18] volans: i'm doing quite a bit of stuff from cumin1001, incl dbctl and wmf-auto-reimage. having colored diff for dbctl config diff would be nice, and i normally use byobu for running tmux
[08:51:26] neither are high priority
[08:51:57] volans: the package description says "DevOps environment", but byobu is really not much more than nice curtains for screen at the end of the day :)
[08:53:35] I defer to moritzm for that one :)
[08:58:36] what about emojidiff?
[09:07:57] XioNoX, jbond42: do you have a sec to chat about the sre.hosts.rotate-pdu-password cookbook?
[09:08:34] yep
[09:10:37] the first oxymoron is the name, it should really be: sre.pdus.rotate-password
[09:11:33] the second one is adding the change_snmp functionality in its latest CR, that seems worth a different cookbook, like: sre.pdus.rotate-snmp-community (or rotate-snmp for short)
[09:11:58] if we move both to the sre.pdus. directory, then the common functionalities could be moved to the __init__.py and be used by both cookbooks
[09:12:02] to be DRY
[09:13:35] is WET (Why Every Time) the opposite of DRY? :)
[09:14:50] ema: you should add it to the wiki page, the current ones listed are:
[09:14:50] "write every time", "write everything twice", "we enjoy typing" or "waste everyone's time"
[09:14:56] ;)
[09:15:02] aahahah
[09:15:24] https://en.wikipedia.org/wiki/Don%27t_repeat_yourself#DRY_vs_WET_solutions
[09:15:49] volans: I don't have any strong preference
[09:16:06] volans: that would be original work though, can't add it to the wiki page
[09:16:43] lol
[09:17:12] put it on twitter, get a million likes and then add it with a reference ;)
[09:29:13] sorry volans, was going through rebase hell all because s/pdu_fqdn/pdu/ :@
[09:29:48] yeah I saw the CRs, no prob :)
[09:30:33] anyway i don't have a strong opinion on this either; for now ignore the WIP PS and i'll add another one to factor things out into an __init__.py file
[09:43:29] jbond42: that makes sense only if we move that to an sre.pdus
[09:43:43] having those functionalities in the sre.hosts doesn't seem so wise IMHO
[09:44:25] yes sorry, wasn't clear, but i'm creating sre.pdus and sre.pdus.rotate-password
[09:46:21] sounds great, and feel free to do that after the current refactoring
[09:46:24] that's totally ok
[09:46:25] no hurry
[11:42:04] apergos: hey. i'm working on reimaging es1024, which is part of es5. it currently has a bunch of wikiadmin mysql processes. is it ok to take it down for a while to reimage it?
[11:47:20] kormat: can you give me a sample of a query or two?
[11:48:05] oh this is es, not mysql core
[11:48:18] yes, it's fine, go ahead
[11:48:30] great, thanks :)
[12:42:59] is there some way to tell icinga "please check this service now?"
[12:43:55] kormat: yes, reschedule
[12:44:34] click the service, then from the dropdown "Re-schedule next service check" and just hit submit
[12:44:38] (which means 'now')
[12:44:50] ok cool. :)
[13:05:30] <_joe_> kormat: it should be noted that it's not a deterministic action at all
[13:05:54] that might explain why it took about 4mins to complete
[13:06:10] <_joe_> what really happens is you say "hey icinga, can you schedule this check now, pretty please?" and icinga says "yes", but it's really saying "ok, as soon as I get to it"
[13:06:31] gotcha :)
[13:06:44] <_joe_> and ofc, sometimes icinga just doesn't feel like making your check at all
[13:07:01] so.. alertmanager when? :)
[13:08:14] kormat: also our current icinga host is a bit short on CPU, see T251644 too
[13:08:14] T251644: Icinga refresh hardware selection (2020) - https://phabricator.wikimedia.org/T251644
[13:08:14] <_joe_> soon as far as I've heard :P
[13:08:28] for your last question 301 to the Observability team :-P
[13:08:56] volans: trying to test a cookbook and i seem to be doing something wrong, can you help :) https://phabricator.wikimedia.org/P11146
[13:09:38] sure
[13:09:56] you're using the path, not the namespace
[13:10:09] sre.pdus.rotate-snmp
[13:10:11] yes, it's not installed yet, wanted to test from a local checkout
[13:10:18] cookbool -l to list them
[13:10:29] to see if your settings are set correctly
[13:10:33] *cookbook -l
[13:11:04] i thought i could use a file that's not installed
[13:11:06] "COOKBOOK Either a relative path of the Python file to execute"
[13:11:51] have you followed https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Creating_your_local_environment ?
[13:11:58] if so, let's check the config
[13:13:28] no, i'll set up a local config file then
[13:21:09] working now thanks.
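_joe_'s point is easier to see in the external command interface that Icinga 1.x inherited from Nagios, which the web UI's reschedule action writes to: the command carries a requested check time, and the scheduler runs the check when it gets to it. The minimal sketch below only formats such a command line. It assumes a Nagios-compatible command file, the host and service names are hypothetical, and where the command file lives on our Icinga host is not covered here.

```python
import time


def schedule_service_check(host, service, check_time=None, now=None, force=True):
    """Format a Nagios/Icinga 1.x external command requesting a service check.

    Writing this line to the external command file only *requests* a check at
    check_time; the scheduler still decides when it actually runs.
    """
    now = int(time.time()) if now is None else int(now)
    check_time = now if check_time is None else int(check_time)
    # SCHEDULE_FORCED_SVC_CHECK runs even if active checks are disabled or the
    # timeperiod excludes it; SCHEDULE_SVC_CHECK can be skipped in those cases
    # or when an earlier check is already queued.
    command = "SCHEDULE_FORCED_SVC_CHECK" if force else "SCHEDULE_SVC_CHECK"
    return "[{}] {};{};{};{}\n".format(now, command, host, service, check_time)


# Hypothetical host and service; "now" is pinned so the output is reproducible.
print(schedule_service_check("es1024", "MariaDB disk space", now=1588680000), end="")
# → [1588680000] SCHEDULE_FORCED_SVC_CHECK;es1024;MariaDB disk space;1588680000
```

Even the forced variant goes through the scheduler queue, which fits the roughly four minutes kormat saw on a busy, CPU-starved host.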
[13:21:25] great :)
[13:21:32] thanks for testing the docs :D
[13:21:38] :)
[13:21:43] glad it's still up to date
[13:21:57] is that a bug in the argparse help message?
[13:22:06] or am i missing something else?
[13:22:50] specifically this bit https://phabricator.wikimedia.org/P11148
[13:23:28] yes, that still works
[13:23:29] cookbook -c config.yaml sre/wdqs/data-reload.py -h
[13:24:21] oh it's a file relative to `cookbooks_base_dir:`, not PWD
[13:24:31] yes
[13:24:43] ahh ok that was the bit i was missing, cheers
[13:24:52] we can add it
[13:25:11] the original reason was to use file autocomplete
[13:25:16] may just be me :), either way definitely not urgent :)
[13:25:20] but you have to cd into the base path first
[13:25:34] I'm not sure if anyone has used it
[13:25:38] I can check :D
[14:15:58] i was going to ask if anyone was using d-i-test.eqiad, but i see it's been running an installer since november of some year, so i think the question is moot :)
[14:16:34] it's testing very well then
[14:16:44] lol
[14:28:40] volans: how do i run a cookbook with logging.DEBUG?
[14:29:01] * jbond42 would say this is the last one but it's unlikely ;)
[14:29:31] jbond42: lol, -v, --verbose Verbose output, also for the cookbook.
[14:29:34] global options
[14:29:47] so cookbook -v foo.bar.baz --cookbook-options
[14:30:00] if you run with dry-run, it automatically sets DEBUG too
[14:30:45] i'm running like this and i don't even see the info logs
[14:30:46] cookbook -vc ~/cookbook/config.yaml sre.pdus.rotate-snmp --no-rw 10.193.0.3
[14:31:50] sorry, i am seeing info, just not in the format i expected, but i don't see the debug
[14:31:54] * jbond42 goes to double check
[14:32:38] yes, as far as i can tell i only see the info messages with -v
[14:32:44] have you defined a logs_base_dir?
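The P11148 confusion is between the two forms the cookbook CLI accepts: a dotted name such as sre.pdus.rotate-snmp, or a path to a Python file that is resolved against `cookbooks_base_dir`, not the current directory. The helper below is hypothetical and only illustrates that mapping; it is not spicerack's resolution code, and it ignores cookbook groups (directories with an `__init__.py`) and any other lookup rules the real tool has.

```python
from pathlib import PurePosixPath


def cookbook_file(arg, cookbooks_base_dir):
    """Illustrative only: the file a cookbook argument would point at.

    Both forms are resolved against cookbooks_base_dir, never against the
    current working directory.
    """
    if arg.endswith(".py"):
        # Path form, e.g. "sre/wdqs/data-reload.py"
        relative = PurePosixPath(arg)
    else:
        # Dotted form, e.g. "sre.pdus.rotate-snmp" -> sre/pdus/rotate-snmp.py
        relative = PurePosixPath(*arg.split(".")).with_suffix(".py")
    return str(PurePosixPath(cookbooks_base_dir) / relative)


base = "/home/me/cookbooks"  # hypothetical local checkout
print(cookbook_file("sre.pdus.rotate-snmp", base))     # → /home/me/cookbooks/sre/pdus/rotate-snmp.py
print(cookbook_file("sre/wdqs/data-reload.py", base))  # → /home/me/cookbooks/sre/wdqs/data-reload.py
```

This is also why the path form only helps with shell autocompletion once you have `cd`'d into the base directory.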
[14:33:01] logs are there and the -extended one is always with debug
[14:33:06] even when not running in debug more
[14:33:08] *mode
[14:33:20] as for the stdout/err it should be there
[14:33:48] ahh yes i see the extended log file now, thanks
[14:35:22] jbond42: unrelated question for you whenever you have a minute -- am I holding this wrong? https://phabricator.wikimedia.org/P11150
[14:36:04] rzl: it only supports one O, P or C unfortunately
[14:36:12] ahhh got it, thanks
[14:36:38] np
[14:44:42] jbond42: sorry, can I ask you another one? https://phabricator.wikimedia.org/P11151
[14:45:51] weirdly that shows "Finished: FAIL" but https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/22296/console shows "Finished: SUCCESS" and in both outputs it doesn't look like it actually tried to compile the changes
[14:46:34] yeah, weird
[14:46:37] looking, fyi you can add JENKINS_API_TOKEN and JENKINS_USERNAME to your environment and pcc will pick them up
[14:46:45] rzl: I think you need to set a COMPILER_MODE ?
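volans's description (console output follows `-v` or dry-run, while the `-extended` file under `logs_base_dir` always records DEBUG) maps onto the standard `logging` pattern of one logger with per-handler levels. This is a minimal sketch of that pattern only; the file names and format string are assumptions, and it is not spicerack's actual setup code.

```python
import logging
import os
import tempfile


def setup_cookbook_logging(logs_dir, name, verbose=False):
    """Console follows -v; <name>.log gets INFO; <name>-extended.log always gets DEBUG."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # pass everything; the handlers do the filtering
    logger.propagate = False
    for handler in list(logger.handlers):  # avoid duplicate handlers on re-setup
        handler.close()
        logger.removeHandler(handler)

    fmt = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
    handlers = [
        (logging.StreamHandler(), logging.DEBUG if verbose else logging.INFO),
        (logging.FileHandler(os.path.join(logs_dir, name + ".log")), logging.INFO),
        (logging.FileHandler(os.path.join(logs_dir, name + "-extended.log")), logging.DEBUG),
    ]
    for handler, level in handlers:
        handler.setLevel(level)
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    log = setup_cookbook_logging(tmp, "sre.pdus.rotate-snmp")  # no -v
    log.debug("debug detail")              # reaches only the -extended file
    log.info("Rotating SNMP community")    # console, main log and -extended
    for handler in list(log.handlers):
        handler.close()
        log.removeHandler(handler)
    with open(os.path.join(tmp, "sre.pdus.rotate-snmp-extended.log")) as f:
        print("debug detail" in f.read())  # → True
```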
[14:46:53] compiler_mode=change
[14:47:04] or maybe it's MODE=change
[14:47:07] jbond42: yeah, I normally run this through a wrapper script that picks up the change number automatically, so I haven't worried about it :)
[14:47:20] skipping that to troubleshoot this just in case
[14:47:28] rzl: compare with a rebuild I issued of yours https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/22297/console
[14:47:33] ack and cdanis that shouldn't be required
[14:47:36] hm okay then
[14:48:04] it looks like it is
[14:48:11] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/22298/console this is a straight rebuild of 22296
[14:48:18] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/22297/console this is with MODE=change
[14:49:01] cdanis: i'll clarify, it didn't use to be required unless something has changed recently
[14:50:24] cdanis: I'll use that at least as a workaround, thanks
[14:50:55] hey, I don't know what anything does, I just push buttons and see what happens
[14:51:45] i did push some work on this a few weeks ago so it's possible i broke things, not sure how much it's used
[14:52:06] i also think i remember the underlying docker image was updated recently?
[14:53:02] cdanis: where are you setting $MODE? putting it in ./utils/pcc's env doesn't seem to affect anything
[14:53:13] rzl: idk, I did the rebuilds from the jenkins UI
[14:53:15] ahh okay
[14:53:21] cdanis: I changed that jenkins job earlier today (!log'ed it)
[14:53:44] it was a local hack and I have ported it to the tool we use to define the jobs (jjb), so maybe that broke it despite me testing it
[14:53:55] pcc.py is strange about it rzl
[14:53:58] see line 152
[14:54:06] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594497
[14:54:17] i think this fixes it (at least in my testing)
[14:54:25] that looks right to me jbond42
[14:55:02] jbond42: thank you!
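The COMPILER_MODE confusion comes down to how Jenkins treats parameterised jobs: when a build is triggered through `buildWithParameters`, any parameter left out of the request falls back to the job's default, and the pcc job briefly had none. The hedged sketch below builds (but does not send) such a request with the standard library, reading the `JENKINS_USERNAME` and `JENKINS_API_TOKEN` variables jbond42 mentions for HTTP basic auth. `COMPILER_MODE` is the name from the job; `GERRIT_CHANGE_NUMBER` and `LIST_OF_NODES` are assumed parameter names, and this is not what `./utils/pcc` does internally.

```python
import base64
import os
import urllib.parse
import urllib.request

JENKINS_URL = "https://integration.wikimedia.org/ci"
PCC_JOB = "operations-puppet-catalog-compiler"


def pcc_build_request(change, nodes, mode="change", env=None):
    """Build (but do not send) a Jenkins buildWithParameters POST for pcc."""
    env = os.environ if env is None else env
    params = {
        "GERRIT_CHANGE_NUMBER": str(change),  # assumed parameter name
        "LIST_OF_NODES": nodes,  # assumed parameter name
        # Always send the mode explicitly instead of relying on the job default.
        "COMPILER_MODE": mode,
    }
    url = "{}/job/{}/buildWithParameters?{}".format(
        JENKINS_URL, PCC_JOB, urllib.parse.urlencode(params))
    credentials = "{}:{}".format(env["JENKINS_USERNAME"], env["JENKINS_API_TOKEN"])
    auth = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": "Basic " + auth})


# Hypothetical node; passing the request to urllib.request.urlopen() would trigger a build.
req = pcc_build_request(594497, "cumin1001.eqiad.wmnet",
                        env={"JENKINS_USERNAME": "u", "JENKINS_API_TOKEN": "t"})
print(req.get_method(), req.full_url)
```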
[14:55:33] that's merging now, let me know if you hit any other issues
[14:56:02] i'll have a dig on friday to see if i can find the underlying issue but i probably won't spend much time on it
[14:56:15] unless more issues come up :)
[14:56:23] sgtm
[14:57:22] btw rzl, the script you have that uses pcc, if you could share it: i have a task at the bottom of my wish list to create a post-commit hook which automatically runs pcc
[14:58:43] sure, not much to it -- the interesting part is due to cdanis
[14:58:47] this is in my .zshrc https://phabricator.wikimedia.org/P11152
[14:59:05] oh I'm happy to be credited with such things
[15:02:23] that's useful, i didn't know about the detail page :) thanks both
[15:02:51] yeah gerrit's API is alright
[15:03:26] yes i should really look into it more
[15:03:55] we could get it from the output of git review too
[15:05:47] ack thx
[15:31:01] cdanis: rzl: my best guess re pcc is that before hash.ar's PS (https://gerrit.wikimedia.org/r/c/integration/config/+/594479/1/jjb/operations-puppet-catalog-compiler.yml) the local hack had a default MODE hardcoded
[15:38:37] OHHH
[15:38:49] jbond42: the job does not have a default for mode indeed
[15:38:49] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/build?
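rzl's wrapper (P11152) gets the change number from Gerrit's REST API. This is not that script, but a minimal standard-library sketch of the same idea: query `/changes/` with `commit:<sha>`, which matches changes having that commit as one of their patch sets, and read `_number` from each result. Gerrit prefixes its JSON responses with `)]}'` to prevent XSSI, so that prefix has to be stripped before decoding. The function names are hypothetical, and the `fetch` parameter exists only so the example can run offline.

```python
import json
import urllib.parse
import urllib.request

GERRIT_URL = "https://gerrit.wikimedia.org/r"
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body):
    """Strip Gerrit's anti-XSSI prefix, then decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def http_get(url):
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def change_numbers_for_commit(sha, fetch=http_get):
    """Return the numbers of the Gerrit changes that have commit `sha` as a patch set."""
    query = urllib.parse.quote("commit:" + sha)
    changes = parse_gerrit_json(fetch("{}/changes/?q={}".format(GERRIT_URL, query)))
    return [change["_number"] for change in changes]


# Offline example with a canned response in Gerrit's format.
fake = lambda url: ")]}'\n[{\"_number\": 594497}]\n"
print(change_numbers_for_commit("0123abcd", fetch=fake))  # → [594497]
```

In a post-commit hook, the SHA would come from `git rev-parse HEAD`.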
[15:38:57] COMPILER_MODE: _____________
[15:39:09] guess I can make it use "change"
[15:44:14] jbond42: cdanis: rzl: I have updated the job to set the COMPILER_MODE, can be seen at https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/build
[15:44:20] hashar: change would be the sensible option, everything else is experimental and only used by me nowadays i think
[15:44:24] thx
[15:44:26] my fault for breaking it :/
[15:44:43] I guess I missed copying the default value
[15:45:49] \o/
[15:45:51] o/
[15:46:07] though, I did run the job and managed to get some output
[15:46:37] no harm and easy fix, cheers :)
[15:53:13] yeah
[15:53:23] though I should have poked here to have someone double check / verify ;D
[15:53:31] instead of breaking the workflows of a bunch of people bah
[16:18:53] <_joe_> so cdanis, rzl: next level would be finding which classes were modified by the last commit, and using that to pass the right profiles to pcc to test :P
[16:19:47] <_joe_> you can just reuse the logic you find in ruby for running CI tests
[16:24:38] <_joe_> (yes, that was a trap, in case you wondered)
[16:25:27] literally no one, anywhere, wondered
[18:12:03] have we ever patched php-fpm?
[18:21:58] we rebuild it, but don't apply any local patches
[18:22:59] if we apply something, it should be self-contained and small, so that it doesn't block speedy rebases to 7.2.x security releases
[18:23:23] yeah, if I were to go down this road, it would be something self-contained and small, and hopefully upstreamable
[18:24:00] given that we rebuild php7.2 anyway, that's totally viable, then
[20:10:05] Speaking of php7.2, is T250515 something I could try to do myself or would I just be wasting time? https://wikitech.wikimedia.org/wiki/Reprepro is a little dense.
[20:10:05] T250515: Please provide our special component/php72 in buster-wikimedia - https://phabricator.wikimedia.org/T250515
[20:34:54] James_F: depends on how savvy you are with debian packaging?
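The first step of _joe_'s "next level" (finding which classes the last commit touched) follows from Puppet's autoloader layout: `modules/<module>/manifests/init.pp` holds class `<module>`, and `modules/<module>/manifests/a/b.pp` holds `<module>::a::b`. The hypothetical sketch below applies only that mapping to a list of changed files, such as the output of `git diff --name-only HEAD~1`. It does not reproduce the Ruby CI logic _joe_ mentions, and it leaves out the harder part: turning classes into the hosts pcc should compile.

```python
from pathlib import PurePosixPath


def manifest_to_class(path):
    """Map modules/<module>/manifests/....pp to the name Puppet autoloads from it."""
    parts = PurePosixPath(path).parts
    if (len(parts) < 4 or parts[0] != "modules" or parts[2] != "manifests"
            or not parts[-1].endswith(".pp")):
        return None  # not an autoloaded manifest (hieradata, templates, files, ...)
    module = parts[1]
    rest = list(parts[3:])
    rest[-1] = rest[-1][:-len(".pp")]
    if rest == ["init"]:
        return module  # only the module's top-level init.pp maps to the bare name
    return "::".join([module] + rest)


def changed_classes(changed_files):
    """Sorted, de-duplicated class/define names touched by the changed files."""
    return sorted({name for name in map(manifest_to_class, changed_files) if name})


# Hypothetical file list, e.g. from: git diff --name-only HEAD~1
print(changed_classes([
    "modules/profile/manifests/pdu.pp",
    "modules/icinga/manifests/init.pp",
    "hieradata/common.yaml",
]))  # → ['icinga', 'profile::pdu']
```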
[20:35:08] if you gotta learn the toolchain, see you again in august ;]
[20:35:20] I am half kidding
[20:35:22] August 2022, it feels like.
[20:35:38] for php, it is better to let SRE manage it
[20:36:01] I was just gonna say, it might be better to find who you can bribe with $beverage
[20:36:01] Ack.
[20:36:05] I don't know how they have built php7.2, it was not on Gerrit last time I checked