[07:04:07] Alright, esams traffic is draining away
[07:04:38] it's downtimed as much as I can. Will wait a bit and then start tearing it apart
[07:07:26] do we need the geodns fixes to move some traffic out of eqiad, like yesterday? i.e. this sort of thing https://gerrit.wikimedia.org/r/#/c/operations/dns/+/545288/
[07:08:03] XioNoX:
[07:09:01] apergos: we're in the night in the US so I don't think for now, but thx for the link I'll keep it handy if it's needed
[07:11:14] https://gerrit.wikimedia.org/r/#/c/545294/ this is the other patch that goes along with it, just so the link's here
[07:11:51] thx!
[07:32:14] hmm what's the easiest way of getting a valid POST request to https://en.wikipedia.org/api/rest_v1/media/math/check/tex :?
[07:35:07] curl -X POST "https://en.wikipedia.org/api/rest_v1/media/math/check/tex" -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "q=string" that seems like it...
[13:52:11] akosiaris: is there a reasonable/no-op default I should use on cloud-vps for 'profile::backup::ferm_directors'? Getting some complaints about the missing key at the moment.
[14:06:21] jbond42: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/543845/ broke at least some cloud-vps puppetmasters since it looks up undefined hiera things. Should I just set all those to [] ?
[14:06:33] e.g. profile::puppetmaster::common::puppetdb_hosts
[14:08:52] yes andrewbogott it should just be empty sorry
[14:09:21] what about profile::puppetmaster::common::command_broadcast?
[14:09:22] false?
[14:09:40] yes although that was added to labs.yaml i just missed puppetdb_hosts
[14:09:46] cool
[14:10:00] thanks
[14:16:01] akosiaris: nm, fixed
[16:36:17] mutante: what do you mean by cp3057 "not found"?
[16:36:53] bblack: Host cp3057.mgmt.esams.wmnet not found: 3(NXDOMAIN)
[16:37:00] i merged the change to add mgmt names
[16:37:08] and checked them..
and they are all fine except this one
[16:37:28] I think that's just a caching issue
[16:37:29] PS1 had a typo about it .. i fixed it in PS2
[16:37:33] it will self-resolve shortly
[16:37:37] or we can purge it
[16:37:38] then i merged it .. and it's like i did not fix it
[16:37:51] but i can see it directly in /etc/gdnsd/zones
[16:37:54] on authdns1001
[16:38:06] is this a negative caching issue?
[16:38:10] right
[16:38:17] someone hit the hostname before authdns had it
[16:38:20] (or something)
[16:39:13] ok!
[16:40:55] fixing now
[16:41:05] was about to say ..don't think we need to purge it
[16:41:13] ok:)
[16:42:04] hmmm maybe it's something else going on, "host" output is strange
[16:42:22] [authdns1001:/etc/gdnsd/zones] $ grep cp3057 *
[16:42:22] 10.in-addr.arpa:185 1H IN PTR cp3057.mgmt.esams.wmnet.
[16:42:23] wmnet:cp3057 1H IN A 10.21.0.185
[16:42:36] host 10.21.0.185
[16:42:36] 185.0.21.10.in-addr.arpa domain name pointer cp3057.mgmt.esams.wmnet.
[16:42:57] reverse works.. all the other ones in that range work too
[16:43:26] oh it's just rec_control issues
[16:43:43] according to powerdns docs: 'rec_control wipe-cache mgmt.esams.wmnet$' should get them all
[16:43:56] https://gerrit.wikimedia.org/r/#/c/operations/dns/+/545599/2/templates/wmnet
[16:43:59] but it purged zero negative records, whereas an explicit purge of cp3057.mgmt.esams.wmnet caught some
[16:44:09] ah
[16:44:16] works now! thanks
[16:44:18] so their tree-purge thing isn't really reliable, or maybe doesn't cover negatives? donno
[16:45:02] alright, it was just to unblock papaul in esams..seems really busy there
[17:20:56] yes it is :)
[17:37:52] let me know if you need more mgmt names or something
[18:05:14] XioNoX: expected? Host mr1-esams is DOWN
[18:05:21] nop
[18:05:35] ugh
[18:05:58] also alerts now on cr2 and cr3 esams about OSPF status
[18:07:00] just the mgmt router?
[18:07:01] I think MR1 crashed....
[18:07:50] yes, mr1 was first
[18:07:52] yeah I don't have anything on console either and no ports are blinking, all on
[18:08:13] just mgmt, as far as we know?
[18:08:36] looks like it. except cr2/cr3 alerting because of OSPF status
[18:09:00] correct
[18:09:08] I'm connected on console now
[18:10:28] I power cycled it
[19:37:38] Hello is this a Site Reliability Engineer channel?
[19:38:55] Skripter: yes
[19:39:53] Cool, can anyone give me some tips on entering the SRE field? I've been interviewing for positions, but haven't had any luck.
[19:42:14] Skripter: there's the Google book that is free https://landing.google.com/sre/books/
[19:42:25] this is a specific SRE team's channel though, not a general one
[19:42:57] I read some of that book, but not all of it. Sorry, I just want to connect with other SREs, I don't know where to find a general one.
[19:44:08] Skripter: https://boards.greenhouse.io/wikimedia/jobs/1754191
[20:05:25] Skripter: if you are interested in volunteering, all the repos and tickets are public. more info in PM
[20:56:18] Skripter: https://blog.alicegoldfuss.com/how-to-get-into-sre/
[20:56:50] it is a start :)
[21:56:05] Thank you for the big help mutante and effie. I have a lot of information to look through!
[23:55:14] cdanis: in BetterWorks, that practice task that you recommended to complete.. is there any kind of "close" action that is more than just "100%" and less than "delete" ?
[23:55:39] i feel like i want to have it in "resolved" or something and then disappear.. instead of sitting there at 100%
[23:58:16] I just deleted mine
[23:58:42] alright, thanks
[23:59:39] I think I set the due date on mine to the date I "finished" it and then set my default view to include a filter for only "active" things
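
Note on the reverse lookup at 16:42:36: the PTR query name that "host 10.21.0.185" resolves is the address with its octets reversed under in-addr.arpa. The sketch below derives that name with Python's standard ipaddress module. The helper name reverse_name is hypothetical and used only for illustration; it is not part of any tooling mentioned in the channel.

```python
import ipaddress


def reverse_name(address: str) -> str:
    """Return the reverse-DNS (PTR) query name for an IPv4 or IPv6 address."""
    # ip_address() parses the string; reverse_pointer builds the
    # in-addr.arpa (IPv4) or ip6.arpa (IPv6) name, without a trailing dot.
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    # Address taken from the 16:42 messages in the log.
    print(reverse_name("10.21.0.185"))
```

For 10.21.0.185 this gives the same 185.0.21.10.in-addr.arpa name that appears in the host output above, minus the trailing dot.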