[16:30:39] fundraising-tech-ops, Operations, ops-eqiad: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3134508 (RobH) a:Cmjohnson
[16:58:46] (PS2) Ejegg: Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725
[16:59:18] (CR) jerkins-bot: [V: -1] Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725 (owner: Ejegg)
[17:00:43] fr-tech: Kill for the love of killing! Kill for the love of Kali!
[17:00:43] -- Hindu saying
[17:00:44] -- discuss.
[17:00:48] (PS3) Ejegg: Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725
[17:01:16] Tellin ya, slander's a right-winger
[17:01:21] (CR) jerkins-bot: [V: -1] Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725 (owner: Ejegg)
[17:05:28] (PS4) Ejegg: Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725
[17:06:01] (CR) jerkins-bot: [V: -1] Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725 (owner: Ejegg)
[17:07:38] o/
[17:11:04] (CR) Awight: "> What do you think of making the whole lock a context object?" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/343965 (owner: Awight)
[17:12:23] Fundraising-Backlog: Turn process-control lock module into a context manager - https://phabricator.wikimedia.org/T161536#3134651 (awight)
[17:13:05] ejegg[m]: > let's leave run-job accepting a single arg without needing to flag it
[17:13:27] I think we'll be adding more args to the command, actually
[17:13:31] e.g. --one-off
[17:13:59] IMO, the flag helps make usage clear even with the one arg.
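The "whole lock as a context object" idea (T161536) could look roughly like the following. This is a minimal sketch, not the process-control code: the name `job_lock`, the `lock_dir` parameter and the `<job>.lock` file naming are assumptions. It creates the lockfile atomically with `O_CREAT | O_EXCL`, records the owning PID inside it, and removes it when the `with` block exits, even on error.

```python
import contextlib
import os


class LockError(Exception):
    """Raised when another run of the same job already holds the lock."""


@contextlib.contextmanager
def job_lock(job_name, lock_dir="/tmp"):
    # Hypothetical layout: one lockfile per job, named after the job.
    path = os.path.join(lock_dir, job_name + ".lock")
    try:
        # O_EXCL makes creation fail if the file already exists, so two
        # runs can never both believe they hold the lock.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        raise LockError("job %r is already running (see %s)" % (job_name, path)) from None
    try:
        # Record our PID so a separate invocation can find (and signal) us.
        with os.fdopen(fd, "w") as f:
            f.write("%d\n" % os.getpid())
        yield path
    finally:
        # Release the lock even if the job body raised.
        os.unlink(path)
```

Usage would be `with job_lock("paypal-audit"): run_the_job()`, where both names are hypothetical. A second concurrent run fails fast with `LockError` instead of silently running twice.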
[17:18:33] (CR) Awight: Use argparse to read the CLI; cron-generate is flaggy rather than pipey (2 comments) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/343960 (owner: Awight)
[17:18:43] Fundraising-Backlog, FR-Amazon: Amazon Ipad and mobile NULL referrer - https://phabricator.wikimedia.org/T161539#3134700 (DStrine)
[17:18:48] (PS6) Awight: Use argparse to read the CLI; cron-generate is flaggy rather than pipey [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/343960
[17:21:12] ejegg|stdup: ^ I'd like to finish up the arg parsing this morning, so we can iterate on the deployed tool
[17:21:55] (Abandoned) Awight: Comments from thcipriani [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/343389 (owner: Cdentinger)
[17:22:01] (PS5) Ejegg: Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725
[17:22:29] sounds good awight, taking a look at PS6
[17:22:33] (CR) jerkins-bot: [V: -1] Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725 (owner: Ejegg)
[17:23:23] ugh, something timezone-y with python date parsing?
[17:25:06] date handling is one of the few parts of the python libs that I despise
[17:25:20] Shouldn't be timezoney tho
[17:25:21] awight: always-required args feel like they should be positional...
[17:25:38] but if you think it's clearer, I'm not super attached to that
[17:26:18] those tests are passing locally, but failing in CI with a timestamp mismatch
[17:26:30] ejegg: My hesitation is coming from partial discussions with Jeff_Green...
[17:26:37] ooh, are we using the actual audit file date format?
[17:26:43] hesitation to make it positional?
[17:26:46] I think he prefers a single command with options to different stuff
[17:26:49] yeah
[17:26:57] and I lean towards a stable of commands
[17:27:01] e.g. run-job-once
[17:27:06] kill-job
[17:27:12] would run-job ever not need a config file?
[17:27:14] yes
[17:27:20] run-job --kill ;-)
[17:27:32] also, I don't like mixing positional and named args
[17:27:33] killall
[17:27:35] k
[17:27:54] but more realistically, run-job jobfile.yaml --once ?
[17:28:05] I would rather run-job --once --job-file jobfile.yaml
[17:28:31] k, i can accept that
[17:28:36] ejegg: oh hey. It doesn't look like time tuples have a timezone.
[17:29:19] hmm, that didn't change the test failure
[17:29:21] lmao--this is exactly how I imagined the time libs were written: https://wiki.python.org/moin/WorkingWithTime#Time_Zones
[17:29:35] haha
[17:29:47] cwd|afk: you'll appreciate this ^
[17:29:56] (CR) Ejegg: [C: 2] Use argparse to read the CLI; cron-generate is flaggy rather than pipey [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/343960 (owner: Awight)
[17:30:07] ty, sorry to be a goat
[17:30:27] (Merged) jenkins-bot: Use argparse to read the CLI; cron-generate is flaggy rather than pipey [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/343960 (owner: Awight)
[17:30:29] (CR) Ejegg: [C: 2] run-job option flag is required now [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344499 (owner: Awight)
[17:30:38] (PS4) Ejegg: Fixes suggested by thcipriani [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/343965 (owner: Awight)
[17:30:46] (PS3) Ejegg: run-job option flag is required now [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344499 (owner: Awight)
[17:30:56] (PS2) Ejegg: Test for environment parameter [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344501 (owner: Awight)
[17:33:19] awight: i'm not sure if I actually prefer arguments vs stables of commands, what's the context?
[17:33:40] ejegg: Can we just assert int and assume the audit timestamp is epoch seconds?
[17:33:45] Jeff_Green: ohey!
[17:33:56] Jeff_Green: We're talking about this patch,
[17:34:14] ...
[17:34:20] >>> print dateutil.parser.parse('2017/03/22 00:00:44 -0700').strftime('%s')
[17:34:22] hold on, I'm using a java app
[17:34:23] 1490158844
[17:34:26] >>> print dateutil.parser.parse('2017/03/22 00:00:44 -0600').strftime('%s')
[17:34:29] 1490158844
[17:34:31] >>> print dateutil.parser.parse('2017/03/22 00:00:44 -0200').strftime('%s')
[17:34:34] 1490158844
[17:34:37] grrr
[17:34:42] Jeff_Green: https://gerrit.wikimedia.org/r/#/c/343960/6/bin/run-job
[17:34:54] looking
[17:34:55] ejegg: nice one
[17:35:16] iso8601 is not actually supported by py
[17:35:26] but
[17:35:33] print dateutil.parser.parse('2017/03/22 00:00:44 -0700')
[17:35:35] 2017-03-22 00:00:44-07:00
[17:35:46] so parse doesn't just discard the tz
[17:35:51] awight: my preference in this case would be that you don't feed arbitrary job files on the command line, but have a dir where they all live and you select one by a CLI argument
[17:35:54] strftime does :(
[17:36:10] because...we don't want a tool that enables live-hacking, we want it to run only what's in revision control
[17:36:19] Jeff_Green: good point!
[17:37:14] also if it were me I would hate having to remember where the job files are, i'd like to be able to get a list from the script and pick one
[17:37:20] Jeff_Green: Although, we can already run anything we want as jenkins
[17:37:30] so I'm not sure this is a crucial security fix
[17:38:01] that's something we might want to fix someday
[17:38:09] anyway, worthwhile for the simplification either way
[17:38:10] yaah
[17:39:03] the more things we can shuttle through utilities that log and build visibility, the less we have to spelunk when figuring out wtf happened
[17:39:42] hey btw, is there syntax yet for the remote-poke? i haven't enabled that yet b/c I didn't know what to enable
[17:42:13] (sorry, broken glass situation on the home front)
[17:42:28] k
[17:42:31] Jeff_Green: awesome, I will continue to learn these philosophies
[17:42:49] yah there's syntax for the cron generation if that's what you mean
[17:42:54] lessee
[17:42:56] ya
[17:44:34] cron-generate --job-directory ${jobdir} --output-file /etc/cron.d/process-control
[17:44:39] (PS6) Ejegg: Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725
[17:45:17] (CR) jerkins-bot: [V: -1] Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725 (owner: Ejegg)
[17:45:41] Jeff_Green: I'm assuming it will be run on the remote box, so the deployment server doesn't need to have any tools installed
[17:46:03] yes that's right
[17:46:07] btw. I was playing with ansible this weekend.
[17:46:16] It makes me happy that someone is finally gonna kill the puppet
[17:47:05] nice
[17:47:40] It's also highly complementary to puppet, so you can migrate slowly and painlessly
[17:47:54] so...we're going to have a main config file, right? rather than passing all that stuff on the command line (which makes for fugly sudo) can we make it get all that from the config file?
[17:48:05] can do
[17:48:07] k
[17:48:10] oh hey
[17:48:16] ja
[17:48:49] I wanted to ask you--I had an idea for how to kill our /etc/fundraising trickery.
[17:48:56] ohyes?
[17:48:59] I'd like to build /etc/process-control.yaml into the source, and install a symlink on our instances
[17:49:14] /etc/pc -> /etc/f/pc
[17:49:51] Alternatively, I'd be okay with giving up on the /etc/fundraising thing entirely, since devs never touch those paths directly I don't care
[17:49:55] what do you mean re. building /etc/process-control.yaml into the source?
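The `strftime('%s')` results above all come out identical because `%s` is a platform (glibc) extension, not a documented Python directive: it ignores the datetime's `tzinfo` and formats the wall-clock fields as if they were local time. One way to get offset-aware epoch seconds is to go through UTC with `calendar.timegm`. This sketch uses the standard library's `strptime` with `%z` instead of `dateutil`, and assumes the audit stamps look like `YYYY/MM/DD HH:MM:SS ±HHMM`; the helper name `audit_ts_to_epoch` is hypothetical.

```python
import calendar
from datetime import datetime


def audit_ts_to_epoch(stamp):
    """Convert e.g. '2017/03/22 00:00:44 -0700' to integer Unix epoch seconds."""
    # %z parses the numeric UTC offset, yielding a timezone-aware datetime.
    dt = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S %z")
    # utctimetuple() applies the offset and timegm() reads the tuple as UTC,
    # so the host's local timezone never enters the calculation.
    return calendar.timegm(dt.utctimetuple())


for offset in ("-0700", "-0600", "-0200"):
    print(offset, audit_ts_to_epoch("2017/03/22 00:00:44 " + offset))
# -0700 1490166044
# -0600 1490162444
# -0200 1490148044
```

Because nothing depends on the machine's `TZ`, a test using this conversion should give the same answer locally and in CI.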
[17:50:07] meaning, this tool is almost generally useful
[17:50:22] So building the string literal "/etc/fundraising/pc.yaml" into the source would be a pity
[17:50:42] just make it look for /etc/pc.yaml and include an example with the documentation
[17:50:51] exactly.
[17:51:03] that's fine, I'll just have puppet manage /etc/pc.yaml
[17:51:20] ok cool--I'll move in that direction with the rest of our stuff as well.
[17:51:50] i can't really remember why we had /etc/fundraising in the first place...
[17:52:28] cos I spent my childhood in FreeBSD
[17:52:37] :-P
[17:52:45] I have some nostalgia for separating user /etc from system /etc
[17:52:51] although I have scars from it
[17:54:56] Fundraising-Backlog, fundraising-tech-ops: Move all /etc/fundraising config into /etc, drop the subdirectory - https://phabricator.wikimedia.org/T161544#3134816 (awight)
[17:56:05] Jeff_Green: what's your day like? I'm imagining a pc.deb deployment in a couple of hours, and then tickling pilot jobs for a bit.
[17:56:53] I've got kid-pickup at 3 (~20 min), otherwise available
[17:57:36] nice. yeah hours of code review before I'll use the red phone again. We're really close to having a useful, dev-only sandbox.
[17:57:58] cool
[17:58:52] Another kitsch gem from this heap of cassettes: Partisans of Vilna
[17:58:57] Serious stuff
[18:00:56] (PS7) Ejegg: Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725
[18:01:16] better freaking work...
[18:01:58] Jeff_Green: Sudo protects us from rogue CLI args sufficiently, right? So I can leave the argparse in to allow us to override etc config on personal boxes?
[18:02:41] ejegg: sorry to see it! The nice part about py fail is that it's usually over within the day. I appreciate the flexibility to do "anti" patterns when I need to.
[18:02:47] awight: thinking...
[18:03:38] ejegg: In your judgment, should we move process-control/processcontrol/ to p-c/lib/ ?
[18:03:55] agreed, i prefer 'insufficient magic' fails to 'too much magic' fails
[18:04:16] awight: it would be nice not to have to come up with sudo syntax to limit args
[18:04:23] huh
[18:04:34] awight: it's just going in with system python libs now?
[18:04:35] I thought that was the default behavior, somehow
[18:04:38] looking
[18:04:52] ejegg: yeah, like /usr/local/python2.7/dist-packages/processcontrol/*.py
[18:05:35] awight: you can specify what you allow for sudo; unless you specify, it will allow arguments
[18:05:44] I'm not familiar with conventions for homegrown debs
[18:05:45] Jeff_Green: okay, lemme comment that out for now
[18:05:58] so they go in local, unlike official debs?
[18:06:00] ejegg: eh just on the python level
[18:06:06] oh. um
[18:06:19] no /usr/lib
[18:06:28] ^,
[18:06:43] k, sorry, you want to change the python package name to have a - in it?
[18:06:59] ohhh, you just mean in the repo
[18:07:40] eh, it does feel redundant, but that's the standard way to tell it the module name, right?
[18:08:05] I can't quite tell. Sort of the Wild West out there
[18:08:35] Let me... not jiggle the everything and focus on zero-arg mode, I guess
[18:10:01] Jeff_Green: to clarify, I'm going to remove the CLI argparse for now and do all config
[18:10:11] We can go the extra yard next year
[18:10:16] so... etc config file points to job config dir and crontab
[18:10:26] but still need a job name on the cli, right?
[18:10:42] run-job #you know which job
[18:12:41] (PS8) Ejegg: Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725
[18:12:58] ejegg: yup, that sounds right
[18:13:05] (PS2) Ejegg: WIP Test audit data for Express Checkout refunds [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121)
[18:13:33] well wait, we're talking about a couple different things
[18:13:39] (CR) jerkins-bot: [V: -1] WIP Test audit data for Express Checkout refunds [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121) (owner: Ejegg)
[18:15:12] cron-generate without any arguments should read all the *yaml files and generate the cron.d file, that way I can allow fr-tech to run that anytime without arguments
[18:15:27] ut-oh, the raiders are moving to las vegas. oakland can't be happy right now
[18:15:41] (PS1) Awight: zero-argument mode from cron-generate [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979
[18:15:45] Jeff_Green: oh, you were just worried about cron-generate!
[18:15:56] right, b/c it writes stuff as root
[18:16:02] that's the sudoy one yeah
[18:16:03] (PS1) Awight: ignore build products [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344980
[18:16:06] (CR) jerkins-bot: [V: -1] zero-argument mode from cron-generate [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 (owner: Awight)
[18:16:33] the other script IMO should only accept arguments that allow it to work with existing *yaml files in the configured jobs-file directory
[18:16:40] perfect.
[18:16:51] --list-jobs will make a lot more sense
[18:17:04] rather than suddenly having to pass a job-dir and stuff
[18:17:15] then I can allow fr-tech to run that with arguments, unrestricted, as user jenkins
[18:17:35] k, cool
[18:17:38] Jeff_Green: That's sweet.
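Jeff_Green's constraint above (run-job accepts only arguments that select an existing `*.yaml` file in the configured job directory, plus a `--list-jobs`) can be enforced with a name-to-path lookup that never takes a filesystem path from the command line. A minimal sketch; `list_jobs`, `job_path` and the flat `<name>.yaml` layout are assumptions, not the tool's actual API.

```python
import os


def list_jobs(job_directory):
    """Return job names: the *.yaml basenames in job_directory, sorted."""
    return sorted(
        name[: -len(".yaml")]
        for name in os.listdir(job_directory)
        if name.endswith(".yaml")
    )


def job_path(job_directory, job_name):
    """Map a job name to its config file, refusing anything not listed."""
    # listdir() yields bare basenames, so a name like '../../tmp/evil' can
    # never match; only files in the reviewed job directory are runnable.
    if job_name not in list_jobs(job_directory):
        raise ValueError("unknown job %r; try --list-jobs" % job_name)
    return os.path.join(job_directory, job_name + ".yaml")
```

Validating by membership rather than by sanitizing the string keeps the rule simple: if it is not in revision-controlled config, it does not run.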
[18:17:54] and also ensures it only runs stuff with changes logged in git
[18:17:55] jenkins: Thank you for your service.
[18:18:00] so you can have all your kill, stop, runonce, whatever stuff--just as long as the job config is coming from the yaml files so it's necessarily in revision control
[18:18:04] yeah totally
[18:18:40] (PS3) Ejegg: WIP Test audit data for Express Checkout refunds [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121)
[18:18:59] then you adjust jobs in localsettings, and push them out via rsync_blaster just like FR code
[18:19:12] Jeff_Green: remind me, devs will be able to deploy the p-c tool itself?
[18:19:14] (CR) jerkins-bot: [V: -1] WIP Test audit data for Express Checkout refunds [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121) (owner: Ejegg)
[18:19:20] I think we only discussed CR+2
[18:19:46] no. packaging will remain an ops task
[18:19:52] kk
[18:20:01] Nice to have *some* boundaries :p
[18:20:07] ha yeah
[18:23:46] fr-tech: permission to go heads-down for standup?
[18:24:46] long as you bring up some pearls from yr dive!
[18:26:13] I'll just drift 4x4 around the ebbed oyster bed, to start with...
[18:26:21] There's some lovely filth down 'ere!
[18:34:12] (PS2) Awight: Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979
[18:34:40] (CR) jerkins-bot: [V: -1] Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 (owner: Awight)
[18:37:10] (PS4) Ejegg: Mark express checkout refunds with correct gateway [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121)
[18:37:18] XenoRyet: ^^^ might be the one
[18:37:31] Cool, let me take a look.
[18:37:46] (CR) jerkins-bot: [V: -1] Mark express checkout refunds with correct gateway [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121) (owner: Ejegg)
[18:38:50] lessee about that fail now
[18:38:54] Fundraising-Backlog, FR-Amazon: NULL referrers - https://phabricator.wikimedia.org/T161539#3134963 (Pcoombe)
[18:39:18] right, I never remember to lint before reviewing
[18:39:53] Yea, me either.
[18:40:02] (PS5) Ejegg: Mark express checkout refunds with correct gateway [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121)
[18:41:12] There was a brief initiative to have "composer test" do the nasty, locally
[18:42:49] Fundraising-Backlog, FR-Amazon: NULL referrers - https://phabricator.wikimedia.org/T161539#3134969 (Pcoombe) These are *not* only from Amazon. See a breakdown here: https://docs.google.com/spreadsheets/d/10Pkd3W72s3BlEI6IX5VzECk0fiOIdlhZAbdxijaP9qg/edit#gid=0 Amazon does have a particularly large number...
[18:46:21] awight: 'tox' is supposed to do it all in the tools repo
[18:46:59] but it gives me a buncha import errors I haven't bothered to investigate
[18:50:37] We can prune stale tests in the interest of making that functional...
[18:55:20] think it's something local - pip is dying trying to find urllib3 somewhere in the tox env setup
[18:55:33] but i've got python-urllib3 installed systemwide
[18:55:47] oh? needs to be in requirements.txt
[18:55:57] yeah it's missing there
[18:56:19] once you add it, do something like tox --recreate
[18:56:23] or just rm .tox
[18:56:43] it's only used by tox and pip though, not our actual code
[18:56:56] True Love - Todd Rundgren
[18:56:57] Where have I been!
[18:56:57] they oughtta know their own reqs
[18:57:12] oh?
[18:57:16] * awight boggles
[18:57:38] I get 19 tests passed.
[18:57:51] lmk if I can help compare the setup?
[18:57:53] with just 'tox'
[18:57:55] ?
[18:57:58] (CR) XenoRyet: [C: 2] Mark express checkout refunds with correct gateway [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121) (owner: Ejegg)
[18:58:02] nosetests works perfectly for me
[18:58:06] thanks XenoRyet!
[18:58:06] yeah, trying now with removed .tox
[18:58:12] No worries
[18:58:12] that's got a predecessor too
[19:02:40] XenoRyet: any luck with the same issue on the IPN front?
[19:02:59] ejegg: yeah tox works out of the box
[19:03:25] ehh, I can run pyflakes . && nosetests
[19:03:33] will worry about tox later
[19:03:34] kk awkward though
[19:03:39] Still working on it, I've been kind of bogged down with bureaucracy this morning, but I think I'll be able to work something out pretty quick.
[19:04:14] k. you saw the tools patch had a parent to CR?
[19:04:28] Oh, nope, missed that. Looking now.
[19:04:32] thanks!
[19:04:41] back shortly...
[19:05:48] ooh, context manager from inside a condition :(
[19:07:13] (PS13) AndyRussG: [WIP] Custom mixin param handlers [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/343953
[19:07:37] (CR) XenoRyet: [C: 2] Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725 (owner: Ejegg)
[19:07:52] (Abandoned) Awight: run-job option flag is required now [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344499 (owner: Awight)
[19:08:07] (PS3) Awight: Test for environment parameter [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344501
[19:08:53] (PS1) AndyRussG: [WIP] Banner sequence campaign mixin [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/344988
[19:11:03] fr-tech ^ first WIP patch for banner sequence... Still not there, but for the curious, you can see the OOjs widget structure for the admin interface, including drag-and-drop step reordering! Also shows how the custom param handler feature thing (from the preceding patch) actually does stuff ;p
[19:16:53] (Merged) jenkins-bot: Fix dictionary comparison [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344725 (owner: Ejegg)
[19:16:55] (Merged) jenkins-bot: Mark express checkout refunds with correct gateway [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/344727 (https://phabricator.wikimedia.org/T161121) (owner: Ejegg)
[19:19:09] (CR) jerkins-bot: [V: -1] [WIP] Banner sequence campaign mixin [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/344988 (owner: AndyRussG)
[19:23:15] mock.patch hell
[19:27:39] (PS3) Awight: [WIP] Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979
[19:28:04] (CR) jerkins-bot: [V: -1] [WIP] Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 (owner: Awight)
[19:34:47] fr-tech anyone for single-char CR ? https://gerrit.wikimedia.org/r/344684
[19:35:36] (CR) Awight: [C: 2] "Vastly more secure!" [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/344684 (https://phabricator.wikimedia.org/T161153) (owner: Ejegg)
[19:36:57] (Merged) jenkins-bot: Use https endpoint for iDEAL availability [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/344684 (https://phabricator.wikimedia.org/T161153) (owner: Ejegg)
[19:37:11] thanx!
[19:45:40] fr-tech I'm connecting now, but I'm on a very slow librarynet again
[19:46:38] hangout issues, brt
[20:36:02] (PS4) Awight: Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979
[20:36:23] ejegg: Whew! Seems like more furniture than I needed to move, but I think that PS is ready for eyeballs
[20:37:05] thanks, I'll take a peek!
[20:45:48] (PS1) Awight: Makefile for lulz [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006
[20:48:28] Jeff_Green: okay, we're weaned from livehack args. Now the signatures are "cron-generate" and "run-job --job basefilename"
[20:48:55] where basefilename is the job filename under job_directory, without the .yaml suffix
[20:49:08] We'll need to re-up from process-control.example.yaml of course
[20:49:47] Probably not ready to roll the new .deb until tomorrow, fyi.
[20:50:21] ok
[20:50:27] sounds good
[20:52:26] awight: oh wow, you really did have to go through some contortions for the tests
[20:53:35] :) You should see the broken pieces I left on the shop floor
[20:54:00] I'm sure I could have done something better with the config classes themselves, to make the tests not so hair-raising...
[20:55:56] My python is pretty unacculturated, I feel like I understand the Ruby paradigm of how to solve everyday life issues much better than the py one
[20:56:16] hmm. or used to, at least. Now I'm just growing a beard on my desert island
[20:56:28] heh
[20:57:30] (PS1) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008
[20:58:26] ejegg: (and thcipriani!) Thanks for visiting and bringing the fish :)
[21:00:07] Jeff_Green: I'm not sure how to implement --kill...
[21:00:31] The process will be owned by jenkins, so killing should go through the p-c scripts, I s'pose
[21:00:56] hrm
[21:01:27] Also, I don't think a pidfile is a good fit here cos we sometimes want to run duplicate jobs in parallel
[21:01:37] thinking...it's a separate instance of the script killing the first one, so the kill instance doesn't know the PID of the one it's killing, or the subprocess of that one
[21:01:38] though we could set that aside
[21:01:43] yeah
[21:02:03] I can mess with process groups, if that might make the pidfile feasible
[21:02:05] that means you'd want a way of addressing each discrete instance right?
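The signatures settled on above (`cron-generate` with no arguments, and `run-job --job basefilename`, plus the `--list-jobs` action) map naturally onto an argparse mutually exclusive group. This is a sketch only: the parser layout, the `build_run_job_parser` name and the `paypal-audit` job name are assumptions, not the merged run-job script.

```python
import argparse


def build_run_job_parser():
    parser = argparse.ArgumentParser(
        prog="run-job",
        description="Run or inspect a job defined in the configured job_directory.",
    )
    # Exactly one action per invocation; job configs are never passed as paths.
    action = parser.add_mutually_exclusive_group(required=True)
    action.add_argument(
        "--job",
        metavar="BASEFILENAME",
        help="job file name under job_directory, without the .yaml suffix",
    )
    action.add_argument(
        "--list-jobs", action="store_true", help="list the available jobs"
    )
    return parser


args = build_run_job_parser().parse_args(["--job", "paypal-audit"])
print(args.job, args.list_jobs)  # paypal-audit False
```

Making the group `required=True` means a bare `run-job` exits with a usage message instead of guessing, which keeps the sudo rule for this script easy to reason about.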
[21:02:15] yeah, at least on a job-type level.
[21:02:25] kill all pp audit
[21:02:27] oh--
[21:02:48] and ejegg had a great suggestion to allow multiple tags on a job definition
[21:02:56] so we could killall audit or killall paypal
[21:03:22] disable all scheduled queue-touchy jobs
[21:03:37] lockfiles list the pid already
[21:03:48] * awight feels momentarily awesome
[21:03:51] ejegg: ty
[21:03:56] ha
[21:04:11] Let's do that and leave parallel duplicate jobs for a snowy day?
[21:04:13] how does the second running process set its lockfile?
[21:04:26] yeah um I think we're locked out of that at the moment
[21:04:35] the second job will throw an exception "don't be a dummy"
[21:04:48] do we ever run jenkins jobs in parallel now?
[21:04:51] no
[21:05:14] We have daydreamed for years, but most jobs would die in a whirlpool due to assumptions
[21:05:18] what if you make the script listen for a signal?
[21:05:48] i.e. if you send it a sig somethingerother it knows to log and slay its subprocess and exit?
[21:06:17] that would have to be a signum per job type, though? Seems fraught.
[21:06:39] would it?
[21:06:43] lockfiles give us everything we need for non-parallel jobs
[21:07:02] signals do sound like a neat thing to do though
[21:07:10] when you invoke "run-thingis --job party --action slay"
[21:07:35] this run-thingis finds the lockfile for the run-thingis instance running 'party'
[21:07:38] gets the PID
[21:07:44] we could even... have a way to pause jobs safely
[21:07:45] and sends a signal to run-thingis running on that PID
[21:07:56] and release or kill depending on some investigation
[21:07:57] bah
[21:08:00] pipe dreams
[21:08:48] so then you'd need a signal type per action or something I guess?
[21:09:00] or i guess you could have a different way of signaling which action to take
[21:09:23] currently, the lockfiles will suffice
[21:09:29] They're already named after the job
[21:10:06] you could make it listen for HTTP! :-P
[21:10:17] Should lockfiles go in /var/run/process-control, or is /tmp good enuf for now?
[21:10:25] hehe encrypted json payloads
[21:11:00] imo tmp is ok for now, but maybe make it easy to change down the road in case we think of a reason it matters?
[21:12:46] +1
[21:23:43] (PS1) Awight: Show job status in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345022
[21:24:11] (CR) jerkins-bot: [V: -1] Show job status in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345022 (owner: Awight)
[21:24:15] I'm shooting out for a while now - but have just shared a google doc attempting to articulate the goals of the silver pop api work…. if anyone has comments
[21:26:34] (CR) Ejegg: [C: -1] "looks good, and with some snazzy mocking tricks! few small fixes suggested." (7 comments) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 (owner: Awight)
[21:26:49] eileen: Wow, nice analysis! Might that be easier to maintain as a sheet?
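ejegg's suggestion of multiple tags per job definition turns "killall audit" or "killall paypal" into a filter over the loaded job configs, after which each match's lockfile gives the PID. A minimal sketch, assuming each parsed job config is a dict with an optional `tags` list; the field name, the `jobs_with_tag` helper and the job names are hypothetical, not the real YAML schema.

```python
def jobs_with_tag(jobs, tag):
    """Return the sorted names of jobs whose 'tags' list contains tag.

    jobs maps job name -> parsed job config (a dict); a job without a
    'tags' key simply never matches.
    """
    return sorted(name for name, cfg in jobs.items() if tag in cfg.get("tags", ()))


jobs = {
    "paypal-audit": {"tags": ["audit", "paypal"]},
    "adyen-audit": {"tags": ["audit"]},
    "queue-consumer": {},
}
print(jobs_with_tag(jobs, "audit"))   # ['adyen-audit', 'paypal-audit']
print(jobs_with_tag(jobs, "paypal"))  # ['paypal-audit']
```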
[21:27:32] awight: no - then you'll feel entitled to add lots of stuff :-)
[21:27:57] I'll see if it needs moving over once I have feedback
[21:28:56] I think I'm a deadline whore - I really struggled yesterday to articulate anything about what I was doing there & suddenly, just before I was about to shoot out the door to meet someone I started to send an email to Caitlin that morphed into that
[21:30:00] It can be inspiring to feel the pitchfork
[21:34:00] (PS2) Awight: Show job status in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345022
[21:34:02] Fundraising Sprint Far Beer, Fundraising-Backlog, FR-Smashpig, Patch-For-Review, Unplanned-Sprint-Work: Handle iDEAL push notifications - https://phabricator.wikimedia.org/T161153#3135446 (Ejegg)
[21:34:30] Fundraising Sprint Far Beer, Fundraising-Backlog, FR-Smashpig, Patch-For-Review, Unplanned-Sprint-Work: Handle iDEAL push notifications - https://phabricator.wikimedia.org/T161153#3123126 (Ejegg) Ready to deploy, just needs network config
[21:49:07] Jeff_Green: Is it a problem that /tmp files can usually be overwritten by other users, so I could get --kill to send a sigkill as another user to an arbitrary process?
[21:49:23] , if pidfiles are stored in /tmp
[21:49:58] i would set the permissions on the file so that only user jenkins can overwrite them
[21:50:15] right. ty
[21:58:49] (PS1) Awight: --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062
[22:02:52] Jeff_Green: ok, I just proved to myself that I can't remove other users' files in /tmp even though it's sticky +t.
[22:03:07] good!
[22:03:22] whoami
[22:03:49] santa clause?
[22:04:19] Mmm, how about one more cookie
[22:05:00] you two are entertaining to be a bystander for :) this level of banter is why I stay in #frtech :)
[22:05:06] FYI, I'm forcing the pidfile contents to an int before forking "kill {pid}", https://gerrit.wikimedia.org/r/#/c/345062/1/bin/run-job line 28
[22:05:23] * awight high-fives chasemp
[22:05:52] chasemp: did you bring cookies?
[22:06:03] I failed to bring cookies so that's shameful
[22:06:04] chasemp: Want to do some Python CR?
[22:06:32] * awight silently engages trolling motor
[22:07:17] heh :) tag me in coach!
[22:07:41] oh hey, for real
[22:07:43] https://gerrit.wikimedia.org/r/#/projects/wikimedia/fundraising/process-control,dashboards/default
[22:08:51] are you writing some process supervisor
[22:08:52] ?
[22:09:00] nuts, I can't change the gerrit ACL https://gerrit.wikimedia.org/r/#/admin/groups/1315,members
[22:09:33] For situating us historically ;), https://docs.google.com/document/d/1UkfeQFvOQ0FVLdNJN1rJSEVPwCOoXeIb8wJ2AyBY5hc/edit
[22:09:46] We're killing Jenkins
[22:09:56] the FR use cases are pretty simple, we think.
[22:10:38] This tool takes yaml job descriptions and can either control them in a safe enough fashion that fr-tech devs have commandline access on production boxes,
[22:11:02] and we also write a cron.d entry which schedules the scheduled jobs
[22:11:28] It's been fun work so far, and really easy to motivate myself to do every morning:
[22:11:35] kill Jenkins
[22:11:37] do you use jenkins now as a generic distributed scheduling job control mechanism?
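The `--kill-job` approach discussed above (lockfiles named after the job already hold the PID, the contents are coerced to an int, and only files the expected user owns are trusted) might be sketched like this. Assumptions: the `<job>.lock` naming, the `kill_job` name, and sending SIGTERM through `os.kill` rather than forking `kill {pid}` as the patch does. The ownership check matters because /tmp is world-writable: its sticky bit stops other users from deleting or renaming a file they don't own, but not from creating a file with the expected name first.

```python
import os
import signal


def kill_job(job_name, lock_dir="/tmp", sig=signal.SIGTERM):
    """Signal the process recorded in a job's lockfile; return its PID."""
    path = os.path.join(lock_dir, job_name + ".lock")
    st = os.stat(path)
    # Only trust lockfiles owned by the user we run as (e.g. jenkins);
    # otherwise anyone able to write to lock_dir could aim us at any PID.
    if st.st_uid != os.getuid():
        raise PermissionError("%s is not owned by uid %d" % (path, os.getuid()))
    with open(path) as f:
        # int() rejects anything but a plain number, so the file cannot
        # smuggle extra arguments or shell syntax into the kill.
        pid = int(f.read().strip())
    if pid <= 0:
        # os.kill(0, ...) or a negative PID would signal a whole process group.
        raise ValueError("refusing to signal pid %d" % pid)
    os.kill(pid, sig)
    return pid
```

Note this signals only the recorded PID; if the job wrapper spawns a subprocess, the wrapper still has to pass the signal on (or the job has to run in its own process group) for the child to die too.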
[22:11:45] the prime minister would be pleased
[22:11:54] chasemp: yes except not distributed
[22:11:58] heh, almost as motivating as killing activemq
[22:12:06] There's only one instance and it controls just that box
[22:12:13] other boxen are already puppet->cron
[22:12:41] ejegg|phone: :) motivating enuf to be smartfoning to help kill
[22:13:32] I don't anticipate us having any jobs that need to be managed across dynamically scaled servers, so we're fine with a separate set of configuration files per box
[22:13:50] They come as .yaml files, one per job
[22:14:10] the README might be of interest
[22:14:35] I will indeed peruse thanks
[22:14:55] Any time--it's late there eh
[22:15:34] I'll be at it for the rest of the week, then hopefully after that we'll be stable, any feedback still welcome of course.
[22:16:30] cool
[22:21:13] (PS1) Awight: update README [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345063
[22:23:43] Jeff_Green: ejegg|phone: I'd like the next episode to focus on repeated failure handling.
[22:24:25] We have ad hoc things lying around which can temporarily disable jobs when more than some threshold number of jobs fail.
[22:24:45] any thoughts on where to store state?
[22:24:46] This is a solved problem though, I hope?
[22:24:55] not really. /var/log/process-control/sqlite?
[22:25:00] s/log/lib
[22:25:45] Can we monitor the failures with an external tool, and run-job --disable JOB rather than building the machinery ourselves?
[22:25:55] fail2ban?
[22:26:13] hah, seems odd, but maybe?
[22:27:47] would mean a separate set of logs for errors, i think
[22:27:56] https://pypi.python.org/pypi/backoff
[22:28:08] so failmgr has something to watch
[22:28:58] so, disabling the job means rewriting crontab?
[22:29:03] I don't understand where that backoff module stores its tallies...
[22:29:29] or updating the yaml?
[22:29:50] Self-modifying?
[22:29:53] yipes
[22:30:28] I was thinking, run-job --disable would set something in sqlite and we OR with yaml.disabled
[22:31:10] you could store via pickle
[22:31:12] do we check the sqlite only for runs via cron somehow?
[22:31:38] * awight gets acid reflux thinking about .pickle
[22:31:38] why not store the last job state in a file, and always check that at startup?
[22:31:46] why acid?
[22:31:50] Jeff_Green: I think we want >> 1 job's state
[22:32:07] pickle is opaque to the non-pythonic world is all
[22:32:21] so only useful if you really need to store native data structures
[22:32:28] ah. it made sense to me from the perl (storable) world so I figured it was groovy
[22:32:45] well I s'pose the same merits would apply
[22:32:59] but whatever the format, if you store a state file there's no reason you can't keep more history in it?
[22:33:08] you would have to explain why that's okay over a beer
[22:33:10] read it, decide how much of it to keep, write it back at the end?
[22:33:41] sure, db and filesystem are pretty fluid
[22:34:09] hehe and here I am arguing for yet another IP-coupled format, .sqlite
[22:34:31] it's another module to load, more syntax to deal with
[22:35:03] So these history files would persist even when the job has been successful?
[22:35:13] yeah
[22:35:30] that's how, for example, frdeploy crap keeps track of commits etc
[22:35:49] Not to dismiss this interesting tangent, but if the "backoff" module I linked above is worth using, we should probably defer to whatever storage they've worked out.
[22:36:15] awight: is it available as a debian package?
[22:37:28] i don't know how featureful you want to be, but if it's just like "do something different if the past X runs were failures" it seems trivial to store in a little state file between runs
[22:38:26] Hard to believe it, but nvm about the module. It's for sitting there and retrying a command 10x in the same python thread.
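Jeff_Green's suggestion (a small state file that is read at startup, trimmed to however much history is wanted, and written back at the end) could look roughly like this Python sketch. It keeps history for many jobs in one file, per awight's point at 22:31:50, and uses JSON rather than pickle so the file stays readable outside Python. The function names, the file path, and the keep=10 and threshold=3 defaults are illustrative assumptions.

```python
import json
import os
import tempfile


def load_state(path):
    """Return {job_name: [bool, ...]} from the state file, or {} if it doesn't exist yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_state(path, state):
    """Write the state atomically: temp file in the same directory, then rename over."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2, sort_keys=True)
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        os.unlink(tmp_path)
        raise


def record_run(path, job, succeeded, keep=10):
    """Append one run's outcome for `job`, trimming its history to the last `keep` runs."""
    state = load_state(path)
    history = state.get(job, []) + [bool(succeeded)]
    state[job] = history[-keep:]
    save_state(path, state)
    return state[job]


def too_many_failures(path, job, threshold=3):
    """True if the last `threshold` recorded runs of `job` all failed."""
    recent = load_state(path).get(job, [])[-threshold:]
    return len(recent) == threshold and not any(recent)
```

One caveat: this is a read-modify-write cycle, so two jobs finishing at the same moment could overwrite each other's update. A lock around record_run, or the sqlite store suggested at 22:24:55, would avoid that.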
[22:38:30] baahaha
[22:38:39] :-P
[22:39:06] Jeff_Green: yeah I'm happy to implement this feature, however simple.
[22:39:22] I might as well wait until we're across the starting line tho and can migrate the CRM box
[22:39:40] what does jenkins do now?
[22:39:53] it seems to keep trying for some things?
[22:41:04] We don't have any of that configured in J
[22:41:27] a very few jobs have custom, application logic to self-suicide after a fail threshold
[22:41:59] oic
[22:42:13] Fundraising-Backlog, fundraising-tech-ops: process-control repeated failure handling - https://phabricator.wikimedia.org/T161567#3135567 (awight)
[22:42:22] then I like your suggestion of circling back to this feature at a later date
[22:42:27] done!
[22:42:41] I'm getting confused about the MVP, myself.
[22:43:19] The goalposts look quite nice in their new home, so I'm glad they've moved :)
[22:43:35] I'll give the phabricator some TLC...
[22:44:19] awight: curious about the deb makefile - why tar first then debuild?
[22:44:49] ejegg: it's some peculiarity of debuild, AFAIK
[22:45:07] pbuilder is cleaner, but much slower IIRC
[22:45:25] k, debuild expects a tar file, huh?
[22:46:39] it doesn't actually extract the file, but it compares the current files in your repo directory with the .orig.tar.*
[22:46:57] Dies if there are any changes that aren't accounted for by patches, AIUI
[22:55:39] (CR) Ejegg: [C: 2] "fancy!" (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006 (owner: Awight)
[22:55:41] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3135613 (awight)
[23:04:00] fr-tech: Jeff_Green: https://phabricator.wikimedia.org/T161569 I'd love help defining Good Enuf
[23:05:34] awight: looks like the process-control goals listed there are all met by the work in CR right now
[23:05:55] I hope
[23:06:18] But I might not be imaginative enough to foresee what else we need?
[23:06:38] i really want streaming output, but I'd start testing without it
[23:06:56] hmm
[23:07:04] yeah that's for real
[23:07:06] darn
[23:07:44] pretty annoying to code, tho
[23:07:49] ejegg: wait, is syslog good enough though?
[23:07:54] for the short term
[23:08:08] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3135643 (awight)
[23:08:13] assuming that the jobs all syslog whatever goes to stdout?
[23:08:22] They... don't
[23:08:31] But the ones we're concerned about might
[23:08:36] k
[23:08:39] e.g. queue consumers and audit parsers
[23:08:44] rightright
[23:09:14] yeah, i think we can start swapping out jobs without it
[23:09:54] Sure, but I'm trying to keep my eye on "done"
[23:09:59] (CR) Ejegg: [C: 2] --list-jobs action (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (owner: Awight)
[23:10:00] i.e., swapped all jobs
[23:10:42] (CR) Awight: --list-jobs action (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (owner: Awight)
[23:12:03] Fundraising-Backlog, Patch-For-Review: process-control should save a log file per job run - https://phabricator.wikimedia.org/T161155#3135664 (awight) Open>Resolved
[23:12:53] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3135673 (awight)
[23:14:13] awight: k, no streaming is still fine for that. next time we want to tweak silverpop performance we'll have some motivation!
[23:14:37] hehe that thing is a beartrap
[23:16:08] Fundraising-Backlog: process-control streams to log - https://phabricator.wikimedia.org/T161571#3135678 (awight)
[23:16:18] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3135691 (awight)
[23:17:03] (CR) Ejegg: [C: -1] "lock.py needs to know about slug and run_dir, right? just pass to begin() ?" (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345022 (owner: Awight)
[23:17:13] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3135613 (awight)
[23:19:36] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3135708 (awight)
[23:23:32] (CR) Ejegg: "would be really cool if JobWrapper listened for sigkill and logged the output so far plus a message about being killed" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 (owner: Awight)
[23:31:20] ejegg: awesome!
[23:31:40] listening for sigkill?
[23:31:42] I needa family for a few hours, but I'll respond and patch tonight.
[23:31:53] I'll try my hand at that one
[23:32:01] even better :D
[23:32:30] (PS1) Awight: Fix crontab CLI params [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345071
[23:32:40] (CR) jerkins-bot: [V: -1] Fix crontab CLI params [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345071 (owner: Awight)
[23:35:06] (PS5) Awight: Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979
[23:35:08] (PS2) Awight: Makefile for lulz [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006
[23:35:10] (PS2) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008
[23:35:12] (PS3) Awight: Show job status in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345022
[23:35:14] (PS2) Awight: --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062
[23:35:16] (PS2) Awight: update README [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345063
[23:35:18] (PS2) Awight: Fix crontab CLI params [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345071
[23:45:22] Fundraising-Backlog, Wikimedia-Fundraising, MediaWiki-extensions-CentralNotice, Operations, and 2 others: Redo /beacon/impression system (formerly Special:RecordImpression) to remove extra round trips on all FR impressions (title was: S:RI should pyr... - https://phabricator.wikimedia.org/T45250#3135775
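On the JobWrapper idea from 23:23:32, one point is certain: SIGKILL cannot be caught, blocked, or ignored, so no handler will ever run for it. A wrapper can instead trap SIGTERM, which is what a bare "kill PID" sends. The Python sketch below is a hypothetical class, not process-control's actual JobWrapper; it also streams the job's output into the log line by line, in the spirit of the "process-control streams to log" task (T161571). It assumes it runs in the main thread, because signal.signal only works there.

```python
import signal
import subprocess


class JobWrapper(object):
    """Hypothetical job runner that streams output into a per-run log file."""

    def __init__(self, argv, log_path):
        self.argv = argv
        self.log_path = log_path
        self.process = None
        self.killed = False

    def _on_sigterm(self, signum, frame):
        # SIGKILL can never reach a handler; SIGTERM (kill's default) can.
        self.killed = True
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()  # the read loop below then hits EOF

    def run(self):
        """Run the job, append its combined stdout/stderr to the log, return its exit code."""
        previous = signal.signal(signal.SIGTERM, self._on_sigterm)  # main thread only
        try:
            with open(self.log_path, "a") as log:
                self.process = subprocess.Popen(
                    self.argv,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,  # interleave stderr into the same stream
                    universal_newlines=True,
                )
                with self.process:
                    # Write each line as soon as it arrives rather than at exit.
                    for line in self.process.stdout:
                        log.write(line)
                        log.flush()
                    returncode = self.process.wait()
                if self.killed:
                    log.write("[JobWrapper] job terminated by SIGTERM; output above is partial\n")
                return returncode
        finally:
            # getsignal() reports None for a handler installed outside Python
            signal.signal(signal.SIGTERM,
                          previous if previous is not None else signal.SIG_DFL)
```

The child may still buffer its own stdout when writing to a pipe (Python, for example, needs -u or PYTHONUNBUFFERED), so lines can arrive in bursts even though the wrapper flushes each one immediately.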