[01:09:35] (CR) Ejegg: [C: 2] update README (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345063 (owner: Awight) [01:11:03] (CR) Ejegg: [C: 2] Fix crontab CLI params [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345071 (owner: Awight) [01:44:52] (PS1) Ejegg: WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 [01:49:51] (PS2) Ejegg: WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 [04:11:11] (PS6) Awight: Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 [04:11:20] (CR) Awight: Scripts take no CLI arguments (7 comments) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 (owner: Awight) [04:13:33] (PS3) Awight: Makefile for lulz [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006 [04:20:54] (Abandoned) Awight: Show job status in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345022 (owner: Awight) [04:24:46] (PS3) Awight: --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 [04:24:48] (PS3) Awight: update README [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345063 [04:24:50] (PS3) Awight: Fix crontab CLI params [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345071 [04:24:52] (PS3) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 [04:25:23] (CR) jerkins-bot: [V: -1] update README [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345063 (owner: Awight) [04:25:25] (CR) jerkins-bot: [V: -1] Fix crontab CLI params [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345071 (owner: Awight) [04:27:43] Fundraising-Backlog: process-control script to list 
all jobs and statuses - https://phabricator.wikimedia.org/T161584#3136130 (awight) [04:38:37] (PS4) Awight: Makefile for lulz [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006 [04:38:39] (PS4) Awight: --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 [04:38:41] (PS4) Awight: update README [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345063 [04:38:44] (PS4) Awight: Fix crontab CLI params [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345071 [04:38:45] (PS4) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) [04:38:47] (PS1) Awight: Store job slug [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345088 [04:38:49] (PS1) Awight: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 [04:43:06] (Abandoned) Awight: Fix crontab CLI params [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345071 (owner: Awight) [04:44:06] (Abandoned) Awight: update README [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345063 (owner: Awight) [04:44:47] (PS5) Awight: Makefile for lulz [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006 [04:45:40] (PS2) Awight: Store job slug [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345088 [04:45:42] (PS7) Awight: Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 [04:45:44] (PS5) Awight: --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 [04:45:46] (PS5) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) [04:45:48] (PS2) Awight: Configurable 
working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 [04:46:08] (CR) jerkins-bot: [V: -1] Store job slug [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345088 (owner: Awight) [04:47:17] (PS3) Awight: Store job slug [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345088 [04:47:19] (PS8) Awight: Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 [04:47:21] (PS6) Awight: --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 [04:47:24] (PS6) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) [04:47:25] (PS3) Awight: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 [04:52:51] (CR) Awight: WIP still output when killed (4 comments) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [04:54:21] (CR) Awight: [C: 1] "Also, just +1'ing as a nod to this being critical-path functionality, even for MVP. Thanks for thinking of it! I'll add to T161569." [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [04:54:51] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3136151 (awight) [05:05:50] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3136158 (awight) @Jgreen @cwdent I noticed that we're only provisioning to the new CRM server and not the old one. Wasn't the plan to migrate jobs on the o... 
[05:09:31] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3136160 (awight) job files are being provisioned as group www-data, 640, but I don't see why the webservers should be able to read these. We could the jenki... [05:15:52] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3136162 (awight) p:Triage>High [06:22:26] (PS3) Awight: WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [06:22:28] (PS1) Awight: Only speak in slugs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345094 [06:22:30] (PS1) Awight: [WIP] Tests for signal handling [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345095 [06:23:07] (CR) jerkins-bot: [V: -1] [WIP] Tests for signal handling [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345095 (owner: Awight) [06:23:09] (CR) jerkins-bot: [V: -1] WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [06:23:20] (CR) jerkins-bot: [V: -1] Only speak in slugs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345094 (owner: Awight) [06:52:43] (PS2) Awight: Only speak in slugs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345094 [06:52:45] (PS4) Awight: WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [06:52:47] (PS2) Awight: [WIP] Tests for signal handling [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345095 [06:53:24] (CR) jerkins-bot: [V: -1] [WIP] Tests for signal handling [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345095 (owner: Awight) [11:54:54] 
Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3136684 (Jgreen) We packaged it for Precise and puppetflung it to barium last week, is something absent? [14:49:04] Wikimedia-Fundraising, Technical-Debt: Refactor fundraising banner code - https://phabricator.wikimedia.org/T161608#3137017 (Pcoombe) [15:09:08] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3137108 (awight) When I looked last night, there was no /srv/p-c... [15:10:39] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3137109 (Jgreen) Ah, you're right--I just hadn't rsyncblastered it. It's there now. [15:23:34] fundraising-tech-ops, Operations, ops-eqiad, Patch-For-Review: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3137201 (Cmjohnson) frdev1001 is plugged into pfw1 port 5 [15:41:07] fundraising-tech-ops, Operations, ops-eqiad, Patch-For-Review: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3137300 (Cmjohnson) Added to racktables [16:03:03] (CR) Ejegg: "some distinctions between signals to the parent JobWrapper process and to the job subprocess" (3 comments) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [16:04:50] (CR) Ejegg: "we can't catch sigkill, just sigterm and sigint" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345095 (owner: Awight) [16:16:54] (CR) Ejegg: [C: 2] "Looks like some extra code got caught up in the rebase carousel, but it's good code!" 
[wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345088 (owner: Awight) [16:17:18] (Merged) jenkins-bot: Store job slug [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345088 (owner: Awight) [16:23:12] fundraising-tech-ops, Operations, ops-eqiad, Patch-For-Review: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3137505 (Jgreen) [16:34:36] (CR) Ejegg: [C: 2] "Looks good! The wizardry that makes validate_global_config pass under test is a bit too advanced for me." (4 comments) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 (owner: Awight) [16:39:27] (PS6) Ejegg: Makefile for lulz [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006 (owner: Awight) [16:39:48] (CR) Ejegg: [C: 2] Makefile for lulz [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006 (owner: Awight) [16:41:43] (Merged) jenkins-bot: Scripts take no CLI arguments [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344979 (owner: Awight) [16:42:53] (CR) Ejegg: "nother function caught in the rebase blender" (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 (owner: Awight) [16:48:54] (CR) Ejegg: "JobWrapper.status seems to have forgotten about the run_dir now! Maybe you're moving that into the lock code?" (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 (owner: Awight) [16:49:18] (Merged) jenkins-bot: Makefile for lulz [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345006 (owner: Awight) [17:00:44] fr-tech: If you think education is expensive, try ignorance. [17:00:44] -- Derek Bok, president of Harvard [17:00:44] -- discuss. [17:01:16] sorry fr-tech, got to miss another -talk, meeting someone for lunch soon! [17:02:56] ¡Buen provecho! 
[17:08:36] https://www.change.org/p/al-franken-give-spiders-all-the-help-they-need-to-eat-every-human-on-earth-within-one-year?recruiter=701699165 [17:11:08] Fundraising Sprint Waiting for Godot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Patch-For-Review: Fix Civi bug where creating a smart group with total giving >= 1000 saves as = 1000 - https://phabricator.wikimedia.org/T152044#2836375 (DStrine) I see some commits on this. let's discuss h... [17:16:23] haha I love poking into fr-tech-talk and finding out that I'm not late :p [17:17:36] awight I'm missing it today, sorry! [17:17:59] no worries, I'm just chuckling at not being the last one [17:20:27] cwd: hiya, you recover from your trip? [17:23:52] ejegg|lunch: I wrote a simple test for your WIP, as a followup patch, and also rebased your PS so watch out! [17:25:02] awight: more or less, just catching up on email [17:25:09] and the process-control progress [17:25:23] and the apparently unrelated orphan rectifier trash fire? [17:26:43] Yeah that was a nice touch, to discover that a ton of our tests are flawed [17:31:25] awight: it looks like a junk response from their api? [17:33:35] It was us stashing non-cc transactions in the pending db [17:33:56] The orphan rectifier has no way of filtering by payment_method, so it was churning through all the iDEal [17:34:28] Only happens when the initial payment response is a redirect, which is why we didn't notice it earlier [17:36:12] cwd: so uh [17:36:16] wanna do some CR? [17:36:25] https://gerrit.wikimedia.org/r/#/projects/wikimedia/fundraising/process-control,dashboards/default [17:36:56] yeah sure thing [17:36:59] TY [17:37:00] i will dig in to this [17:44:09] can you dig it [17:45:03] awight: what is a slug? [17:46:03] Might not be the industry standard term [17:46:21] but AIUI that's what you call the snippet of URL that looks nice for robots [17:46:34] so e.g. Orphan Rectifier would be orphan-rectifier or whatev [17:46:58] oh hey it is a thing. 
https://en.wikipedia.org/wiki/Slug_%28web_publishing%29 [17:47:37] ah yes SEO urls [17:47:43] didn't know that term [17:48:22] We could give it a better name--this would be the time to do it [17:48:28] "short name" "machine name"? [17:48:55] Alternatively, we could go all radical and use full human naming for even the job files. [17:49:02] Ingenico Orphan Rectifier.yaml [17:49:06] I wouldn't mind. [17:49:25] spaces in filenames trip me out [17:49:31] If we did that, I would probably want to get rid of the "name" key in the file cos DRY [17:49:36] kk let's leave the slugs then [17:49:40] https://en.wikipedia.org/wiki/Slug [17:49:45] (PS1) XenoRyet: Mark refunds with correct gateway [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/345189 (https://phabricator.wikimedia.org/T161121) [17:50:11] cwd: Bigger picture, can you look over these points? https://phabricator.wikimedia.org/T161569 [17:50:24] Let's define the MVP for logging while we're at it. [17:50:49] This being p-c logging, not the stdout/err capture of the subprocess [17:51:11] yeah, nice phab ticket [17:51:13] til that Python logging is an acceptable interface and we don't have to roll our own. [17:51:25] that's good [17:52:06] The end goal is something like, log p-c actions to syslog at info level. But is that necessary for MVP? [17:52:41] Fundraising-Backlog, Analytics: Storage for banner history data - https://phabricator.wikimedia.org/T161635#3137911 (AndyRussG) [17:53:32] hmm [17:54:41] <3 https://docs.python.org/2/library/logging.config.html [17:54:56] So easy. We dump a global config block into logging.config.dictConfig [17:54:59] awight: are we still doing files in /tmp? [17:55:34] cwd: Just the lockfiles at the moment [17:55:43] Happy to move them. 
There's a config for that [17:55:53] you just have to build the puppet theater [17:55:59] yea [17:56:06] it doesn't bother me [17:56:10] using /tmp for that [17:56:16] p-c.example.yaml at the tail of the patchchain has the current defaults [17:58:42] Fundraising-Backlog, Analytics: Storage for banner history data - https://phabricator.wikimedia.org/T161635#3137911 (Nuria) @AndyRussG: can we look at the data to make sure it is safe to retain, the risk normally comes from cross checking datasets and evaluating that risks is something we have to do befo... [18:04:16] awight: how come defaulting to not stdout the crontab? [18:05:58] cwd: You can stdout it if you set output_path=console [18:06:21] But Jeff's opinion was that the normal invocation should have zero CLI arguments [18:06:36] so I think we want the default behavior to be, actually write the thing in place [18:06:49] oh--yeah and ">" is not fun in a sudo script [18:07:09] cron-gen | sudo tee ? [18:07:55] Maybe you can talk Jeff into a wrapper which calls cron-gen and does the > cron.d from there? [18:08:12] iono if that's win though [18:08:35] yeah i'd have to write both scripts and compare [18:08:38] probably 6's [18:09:56] now that i think about it the case for no args is probably legit [18:10:19] It feels sorta nice doing it all in the script, because we can do "final" validation of the output like USER == jerkroot [18:26:57] Fundraising-Backlog, Analytics: Storage for banner history data - https://phabricator.wikimedia.org/T161635#3137911 (DStrine) @Nuria I just added you to an email about the legal requirements. TLDR: we are allowed to keep the data for now. We need a place to park the data so we can aggregate it. That will... 
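[editor's note] The logging.config.dictConfig idea mentioned at 17:54 can be sketched roughly as follows. This is only an illustration under assumptions: the GLOBAL_CONFIG layout, the "logging" key and the init_logging helper are hypothetical names, not process-control's actual configuration, and a stream handler stands in for the syslog target discussed above.

```python
import logging
import logging.config

# Hypothetical global config, shaped like a parsed YAML file.
# The "logging" key and everything under it are illustrative only.
GLOBAL_CONFIG = {
    "logging": {
        "version": 1,  # required by dictConfig
        "disable_existing_loggers": False,
        "formatters": {
            "brief": {
                "format": "%(filename)s %(lineno)d %(levelname)s %(message)s",
            },
        },
        "handlers": {
            # logging.handlers.SysLogHandler could be swapped in here;
            # a stream handler keeps the sketch portable.
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "brief",
                "stream": "ext://sys.stderr",
            },
        },
        "root": {"level": "INFO", "handlers": ["console"]},
    },
}


def init_logging(global_config):
    """Hand the logging block of the global config to the stdlib."""
    logging.config.dictConfig(global_config["logging"])


init_logging(GLOBAL_CONFIG)
logging.getLogger("lock").info("Writing lockfile.")
```

Keeping the whole block in the global config means handlers and levels can change without touching code, which seems to be the appeal voiced in the chat.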
[18:40:43] Fundraising-Backlog, Analytics: Storage for banner history data - https://phabricator.wikimedia.org/T161635#3138098 (Ottomata) FYI, we are planning on improving Hive EventLogging integration next quarter: T153328 [19:11:53] (PS1) Awight: Initialize Python logging from global configuration [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345202 [19:14:25] (CR) Ejegg: [C: -2] "Need to look for something else" (1 comment) [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/345189 (https://phabricator.wikimedia.org/T161121) (owner: XenoRyet) [19:14:54] (CR) Ejegg: [C: -2] "(and tests are always nice to have!)" [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/345189 (https://phabricator.wikimedia.org/T161121) (owner: XenoRyet) [19:15:07] (CR) Ejegg: [C: -1] "(and tests are always nice to have!)" [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/345189 (https://phabricator.wikimedia.org/T161121) (owner: XenoRyet) [19:15:46] sorry for the reviewspam XenoRyet [19:16:00] No worries. Good call on the translation, that slipped my mind. [19:16:26] I'll write up a test for it too, it'll be good to have. [19:16:34] thanks! [19:17:48] I put a couple sample EC uefind IPNsup here: https://gerrit.wikimedia.org/r/344701 [19:17:53] erk [19:18:03] refund IPNs, that is [19:18:24] Yea, I saw those. Thanks, they're useful. [19:18:49] Anyway, gonna go grab a bite now. Be back in a while. [19:30:13] (CR) Ejegg: [C: 2] ignore build products [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344980 (owner: Awight) [19:32:59] (Merged) jenkins-bot: ignore build products [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/344980 (owner: Awight) [19:33:40] awight: is sigkill branch about cleaning up on ^C? 
[19:36:19] logging just got awesome, [19:36:20] --------------------------------------------------------------- Captured log --------------------------------------------------------------- [19:36:24] lock.py 39 INFO Writing lockfile. [19:36:26] job_wrapper.py 94 ERROR Job Timing out job timed out after 0.1 seconds [19:36:29] lock.py 49 INFO Clearing lockfile. [19:36:32] job_wrapper.py 80 ERROR Job Timing out job failed with code -9 [19:36:35] cwd: yeah exactly. [19:36:42] Most importantly, don't drop the captured logs on ^C [19:37:20] woo [19:37:27] yeah good thinking [19:37:41] although hopefully we're not really in a position to ^C these generally [19:38:48] awight: nice! [19:39:12] :D [19:39:26] cwd also to preserve output on --kill-job [19:39:28] cwd: Things happen though, for example fatal PHP error [19:39:39] where the process sends itself a signal [19:39:41] which actually sends sigterm, so, branch is misnamed [19:40:04] awight: ahh, so subprocess CAN send something to its parent? [19:40:21] oh right--well it sends SIG_CHILD when it dies, I believe [19:40:28] * awight wonders now how that can be [19:40:37] but subprocess must be intercepting that. [19:40:48] ^C sends sigint [19:41:05] K I see, cool I'm wrong that PHP failures sig process-control [19:41:13] which i think turns into sigkill if unhandled [19:41:36] whoa [19:41:38] the way we run it i would assume that is handled by apache? [19:42:02] well not in this situation [19:42:05] duh [19:43:01] so if you send sigint to a python process that has shelled out php [19:43:07] does it pass that along to php? [19:43:27] cwd when we intercept it, we can decide what to send to the subprocess [19:43:47] right now we get a sigterm and we kill subprocess with sigkill [19:43:54] maybe a bit harsh! [19:44:31] if you don't do anything does php keep running? 
[19:44:48] yep, you can totally ignore sigterm if you want [19:45:12] magic [19:45:55] sigint might be good in that situation [19:46:23] https://lwn.net/Articles/278717/ [19:46:34] to send to the child process? [19:46:55] yeah [19:47:02] k, sounds good [19:47:26] you could handle it on the php side if there was something special to do there [19:47:33] but otherwise it will just kill it [19:47:47] does that sound reasonable to you? [19:48:08] yeah, makes sense to let the job clean up if it's got a shutdown handler [19:48:36] in fact, our python scripts that use any db stuff try to rollback and disconnect on termination signals [19:50:31] Fundraising-Analysis, Fundraising-Backlog: Upstream remaining Civi fixes - https://phabricator.wikimedia.org/T161645#3138256 (Eileenmcnaughton) [19:58:37] Fundraising Sprint Waiting for Godot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Patch-For-Review: Fix Civi bug where creating a smart group with total giving >= 1000 saves as = 1000 - https://phabricator.wikimedia.org/T152044#3138282 (Eileenmcnaughton) Yep this is done - fixed upstream... [20:00:38] (PS7) Cdentinger: --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 (owner: Awight) [20:09:02] Aargh--log capture is coupled to testing framework, so we can't easily support both nose and pytest [20:10:15] In other exciting news, the nosetests log capture somehow delays string formatting, so you might see a logline with values that weren't assigned until later in control flow. [20:10:19] https://www.rotemy.com/blog/posts/2012/nosetests-logcapture-pitfall/ [20:10:51] Won't affect us, but makes for a dramatic pitfall! 
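[editor's note] A minimal sketch of how the delayed-formatting pitfall described at 20:10 can arise. It assumes a capturing handler that stores log records and only formats them later; whether nose's logcapture plugin works exactly this way is an assumption here, and DeferredCapture is a made-up name.

```python
import logging


class DeferredCapture(logging.Handler):
    """Stores records and formats them only when dump() is called."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # Keep the record as-is; its args still point at the live objects.
        self.records.append(record)

    def dump(self):
        # The %-formatting happens here, long after the log call.
        return [r.getMessage() for r in self.records]


logger = logging.getLogger("pitfall_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
capture = DeferredCapture()
logger.addHandler(capture)

queue = ["a"]
logger.info("queue: %s", queue)  # logged while the list holds only "a"
queue.append("b")                # mutated after the log call

print(capture.dump())  # prints ["queue: ['a', 'b']"]
```

The captured line shows the later value because the record holds a reference to the list, not a snapshot of it.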
[20:16:15] sounds bizarre [20:18:34] (PS5) Ejegg: WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 [20:18:44] (CR) jerkins-bot: [V: -1] WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [20:22:48] hrm? [20:23:06] cwd / awight ^^^ [20:23:13] same failure locally [20:23:32] but it looks like the 'process' property isn't accessible from the signal handler [20:23:50] or maybe I'm not starting that run_thread correctly [20:29:25] (PS2) Awight: Initialize Python logging from global configuration [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345202 [20:29:27] (PS1) Awight: Log, don't print [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345221 [20:34:36] ejegg: That doesn't look like the right way to test the signal handler--can't we send the parent process a real signal? [20:35:00] the job wrapper is running in the same process as the tests, isn't it? [20:35:10] Ah, I guess the test should use subprocess [20:35:23] Was this too messy? https://gerrit.wikimedia.org/r/#/c/345095/2/tests/test_job_wrapper.py [20:36:02] awight: doesn't that make the test try to kill itself? [20:36:09] ah :) [20:36:13] that's why it wasn't working [20:36:21] lemme try with subprocess [20:36:35] I was expecting the job_wrapper signal.signal to protect us, still [20:36:54] hm, right [20:37:17] os should allow a process to send itself whatever signals it wants? [20:38:06] I was wondering, would it be better if the lockfile held child PID? [20:38:21] If we send signals to the child and not the parent, we might get away with reduced complexity [20:38:27] i.e., no need for signal handlers. [20:38:38] oh hey, that sounds good! 
[20:38:54] interesting [20:39:06] the handler will exit anyway once the child does [20:39:07] mm one gotcha, we don't have a child PID when we want to assert the lock [20:39:42] drat [20:39:55] I like that the p-c process can be killed easily, for example if it got stuck somehow. Once we're streaming logs that won't be as big a deal. [20:40:04] k [20:40:17] But we could open the lock with "" or whatever [20:40:27] then write child PID later [20:40:29] eeeew [20:42:11] erk, subprocess is going to make mocking pretty damn hard [20:42:15] hmm [20:42:51] lemme rebase this on top of some logging to see what's actually happening [20:43:35] :) [20:43:47] thanks for that, BTW! [20:44:00] is this for sure an MVP feature? [20:44:09] cwd killing commands? [20:44:19] i feel like we might be getting a little speculative about what we want signal handling to look like [20:44:19] I was about to discuss whether or not syslogging was MVP, then it just happened to write itself [20:44:27] without having actually plugged it in and saw [20:44:40] +1 but we really do have to solve the problem of dropping logs [20:44:44] awight: yeah that seems like an obvious win [20:44:46] we do want to be able to stop stuff, too [20:44:47] syslog [20:45:17] right, the streaming stdout is maybe too tricksy for mvp [20:45:24] signal handling oughtta be simpler [20:45:31] well, since cron is firing these jobs off how does ^C come into the picture? [20:45:34] Yeah I think that was the right direction to go in [20:45:42] cwd: How do we kill a job? [20:45:58] Currently, see https://gerrit.wikimedia.org/r/#/c/345062/2 [20:46:07] how did we kill a jenkins job? [20:46:25] a cryptic button in the web ui, I guess [20:46:33] i don't even remember this button [20:46:47] right. 
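[editor's note] A rough sketch of the "lockfile holds the child PID" idea floated at 20:38. All function names and the lockfile name are hypothetical, not taken from process-control's lock module. Because the lock has to be claimed before the job is spawned, the sketch creates the file empty and fills in the PID afterwards.

```python
import os
import tempfile


def acquire_lock(path):
    """Atomically create the lockfile; raises FileExistsError if already held."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)


def record_child_pid(path, pid):
    """Fill in the PID once the job subprocess exists."""
    with open(path, "w") as f:
        f.write(str(pid))


def read_lock_pid(path):
    """Return the recorded PID, or None if no child has been recorded yet."""
    with open(path) as f:
        text = f.read().strip()
    return int(text) if text else None


def release_lock(path):
    os.remove(path)


# Demo in a throwaway directory; os.getpid() stands in for a real child PID.
run_dir = tempfile.mkdtemp()
lock_path = os.path.join(run_dir, "orphan-rectifier.lock")

acquire_lock(lock_path)
pid_before = read_lock_pid(lock_path)   # None: lock held, nothing spawned yet
record_child_pid(lock_path, os.getpid())
pid_after = read_lock_pid(lock_path)
release_lock(lock_path)
```

A separate kill command could then read the PID and signal the job directly, at the cost of a window in which the lock exists but names no process.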
the little red square in the workers pane [20:46:52] i mean the linux command line provides a facility for killing stuff [20:47:03] yeah but devs can't do that [20:47:22] and the signal handler is needed so killing doesn't mean losing all the job's stdout [20:47:22] So I implemented "run-job --kill-job [sorry] JOB_NAME" [20:47:49] well, i never killed a jenkins job before [20:47:56] Also, jobs die on their own. And in unix that can look like the child got a signal [20:48:16] Good :) [20:48:36] so i think it's safe to say we don't have to do it very often [20:50:07] so all's i'm saying is the use case sounds nebulous to me, but would probably become clear upon using it in prod for a minute [20:50:14] Yeah but the odds of needing that lever as we're deploying an entirely new job runner are higher than usual [20:51:04] and ty for pushing back on fuzzy use cases! [20:51:10] what cases can you see using it besides something appears to be hanging? [20:51:38] cause we have a timeout for that right? [20:53:13] (PS1) Awight: Log notable events [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345236 [20:53:21] (CR) jerkins-bot: [V: -1] Log notable events [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345236 (owner: Awight) [20:54:19] cwd: totally. How about I push it to the end of the patch chain so we don't have to worry... [20:54:49] It shouldn't block more important features [20:56:23] yeah sounds good [20:56:28] (PS3) Ejegg: Initialize Python logging from global configuration [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345202 (owner: Awight) [20:56:37] i am not trying to say i don't think we should have this feature [20:56:41] (CR) Ejegg: [C: 2] "Dig it!" 
[wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345202 (owner: Awight) [20:56:49] just that we might make more informed decisions about it later [20:57:06] once we've seen how this thing behaves IRL [20:57:38] awight: the status stuff looked mergeable if you want to bring that to the front of the line [20:57:41] (PS2) Awight: Log notable events [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345236 [20:57:50] ejegg: kk [20:57:57] You have any unsubmitted PS? [20:58:00] (Merged) jenkins-bot: Initialize Python logging from global configuration [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345202 (owner: Awight) [20:58:09] awight: nope, all my cards are on the table [20:58:09] whew! [20:59:05] oh hey, backlog grooming [21:01:00] I'll sit it out, enjoy! [21:02:19] (PS7) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) [21:02:42] no that's not it... [21:04:26] Fundraising-Analysis, Fundraising-Backlog: Upstream remaining Civi fixes - https://phabricator.wikimedia.org/T161645#3138411 (ggellerman) p:Triage>Normal [21:07:04] Fundraising-Backlog, FR-Amazon: NULL referrers - https://phabricator.wikimedia.org/T161539#3138416 (ggellerman) p:Triage>High [21:09:24] (PS8) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) [21:09:27] ejegg: we can't hear you [21:09:31] you are super-robot [21:09:39] * awight hunts for GIFs [21:09:44] can you see the hangout chat? [21:10:07] dstrine: can you see my chat messages? 
[21:10:11] http://images6.fanpop.com/image/photos/32500000/Robot-Unicorn-Attack-Evolution-robot-unicorn-attack-32539428-900-612.jpg [21:12:33] Fundraising-Backlog, Patch-For-Review: process-control script to list all jobs and statuses - https://phabricator.wikimedia.org/T161584#3138426 (ggellerman) p:Triage>High [21:13:34] Fundraising-Backlog: process-control streams to log - https://phabricator.wikimedia.org/T161571#3138429 (ggellerman) p:Triage>Normal [21:17:05] Fundraising Sprint Waiting for Godot, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, Patch-For-Review: Fix Civi bug where creating a smart group with total giving >= 1000 saves as = 1000 - https://phabricator.wikimedia.org/T152044#3138438 (ggellerman) Open>Resolved [21:17:33] Fundraising-Backlog, fundraising-tech-ops: Move all /etc/fundraising config into /etc, drop the subdirectory - https://phabricator.wikimedia.org/T161544#3138441 (Ejegg) p:Triage>Low [21:18:09] Fundraising-Backlog, fundraising-tech-ops: process-control repeated failure handling - https://phabricator.wikimedia.org/T161567#3138443 (Ejegg) p:Triage>Normal [21:18:26] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Added new Tag to Production Civi - https://phabricator.wikimedia.org/T142439#3138445 (ggellerman) Open>Resolved [21:20:57] Fundraising Sprint Octopus Untangling, Fundraising Sprint Qwerty Thwacking, Fundraising Sprint Rocket Surgery 2016, Fundraising Sprint Stirring The Pot, and 4 others: Epic: Create frack vm cluster - https://phabricator.wikimedia.org/T142533#3138449 (ggellerman) Open>Resolved [21:21:22] Fundraising-Backlog: Turn process-control lock module into a context manager - https://phabricator.wikimedia.org/T161536#3138450 (Ejegg) p:Triage>Low [21:21:39] Fundraising-Backlog, FR-Smashpig: SmashPig pending db pruner is broken - https://phabricator.wikimedia.org/T161260#3138452 (Ejegg) p:Triage>Normal [21:23:38] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Civi: are Ingenico chargebacks creating errors in 
Civi? - https://phabricator.wikimedia.org/T131154#3138467 (ggellerman) Open>Resolved [21:23:50] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Civi: are Ingenico chargebacks creating errors in Civi? - https://phabricator.wikimedia.org/T131154#2157585 (DStrine) @MBeat33 Have we seen any new issues like this? Since this is pretty old we will resolve this until we hear more. [21:36:33] Fundraising-Backlog, FR-Ingenico, MediaWiki-extensions-DonationInterface, MW-1.29-release (WMF-deploy-2017-03-28_(1.29.0-wmf.18)), Patch-For-Review: Orphan rectifier is silent about communication failures - https://phabricator.wikimedia.org/T161160#3138512 (ggellerman) p:Triage>Normal [21:39:15] Fundraising Sprint Far Beer, Fundraising-Backlog, Unplanned-Sprint-Work: Orphan rectifier was getting stuck on non-CC transaction - https://phabricator.wikimedia.org/T161651#3138518 (Ejegg) [21:45:43] fundraising-tech-ops: upgrade all frack servers to debian/jessie - https://phabricator.wikimedia.org/T146479#3138544 (cwdent) [21:45:45] fundraising-tech-ops: frack eqiad hardware refresh - https://phabricator.wikimedia.org/T133524#3138545 (cwdent) [21:45:47] Fundraising Sprint Far Beer, fundraising-tech-ops, Epic: EPIC: build fundraising civicrm (barium) replacement server on Debian Jessie, with HHVM or PHP5.5 - https://phabricator.wikimedia.org/T136959#3138540 (cwdent) stalled>Open a:Jgreen>cwdent [21:46:41] Fundraising-Backlog, fundraising-tech-ops: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3138552 (cwdent) [21:46:43] Fundraising Sprint Far Beer, fundraising-tech-ops, Epic: EPIC: build fundraising civicrm (barium) replacement server on Debian Jessie, with HHVM or PHP5.5 - https://phabricator.wikimedia.org/T136959#2353563 (cwdent) [21:48:34] Fundraising Sprint Deferential Equations, Fundraising Sprint English Cuisine, Fundraising Sprint Far Beer, Fundraising-Backlog: Update fr-tech job roles on wiki - 
https://phabricator.wikimedia.org/T158710#3044743 (cwdent) Open>Resolved Looks great! [21:51:18] Fundraising Sprint Far Beer, Fundraising-Backlog, Patch-For-Review: process-control script to list all jobs and statuses - https://phabricator.wikimedia.org/T161584#3138578 (ggellerman) [21:51:29] (Abandoned) Awight: [WIP] Tests for signal handling [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345095 (owner: Awight) [21:54:32] fr-tech sorry, can't even hear you all now! [21:55:39] awight: how about --kill sends sigint to the subprocess, closes the logs, and exit()s [21:55:42] (PS4) Awight: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 [21:55:44] (PS3) Awight: Only speak in slugs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345094 [21:55:46] (PS6) Awight: WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [21:55:53] does it need to be more complicated than that? [21:56:08] cwd: I think that's already implemented, even. [21:56:32] cwd --kill is in another process though [21:56:41] and can't talk directly to the subprocess [21:56:54] we need the signal handler to get the logs closed [21:56:55] But it can easily send a sigkill to the subprocess [21:56:57] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Civi: are Ingenico chargebacks creating errors in Civi? - https://phabricator.wikimedia.org/T131154#3138587 (MBeat33) Nope, haven't seen any recently so fine with me to resolve - thanks. 
[21:57:08] hum, ~easily [21:57:09] awight: oh right, if we record subprocess pid [21:57:13] right, that [21:57:16] ejegg: well i was just thinking we log "i was killed at " [21:57:25] haha how meta [21:57:29] cwd yeah, that takes a signal handler [21:57:36] it seems like how the php procs clean up is not this thing's problem [21:58:19] right, but ditching all the intermediate logs to something we had to kill is no good for diagnostics [21:58:41] i get that, but why would we kill something? [21:59:46] like what situations do you imagine needing to kill a process? [22:01:07] iono, maybe it's sending tons of email with the wrong text [22:01:26] or deleting contacts or something [22:01:35] would we even know before it finished? [22:01:47] depends on the job [22:02:48] (PS8) Awight: [WIP] --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 [22:03:21] i mean, i fully intend to be around for all the testing of this [22:03:35] and i can kill(1) something if it's thrashing [22:03:51] ejegg: Wait, why does "i was killed" need a signal handler? If the child process is killed by a signal, then the Popen stuff exits normally and p-c has a chance to capture logs, etc. 
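The point about Popen can be illustrated with a minimal sketch. It is not the process-control code: `run_and_describe` and `demo_killed_child` are hypothetical names, and the self-signalling child only stands in for a job that gets killed. What it relies on is documented `subprocess` behaviour: on POSIX, a child that dies from signal N leaves a `returncode` of `-N`, and the parent still collects whatever output was written before the kill.

```python
import subprocess
import sys


def run_and_describe(command):
    """Run a child process and report how it ended, keeping its output."""
    child = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    output, _ = child.communicate()
    if child.returncode < 0:
        # On POSIX, a negative return code -N means the child died from signal N.
        status = "killed by signal {}".format(-child.returncode)
    else:
        status = "exited with code {}".format(child.returncode)
    return status, output


def demo_killed_child():
    # Hypothetical stand-in for a job: prints some output, then sends
    # SIGTERM to itself to simulate being killed from outside.
    script = (
        "import os, signal, sys; "
        "print('partial work'); sys.stdout.flush(); "
        "os.kill(os.getpid(), signal.SIGTERM)"
    )
    return run_and_describe([sys.executable, "-c", script])
```

Under these assumptions the parent needs no signal handler to log "I was killed": the exit status already carries that information, and the captured output survives.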
[22:03:52] but it seems like an edge case [22:04:10] yeah, p-c should exit normally [22:04:15] (CR) jerkins-bot: [V: -1] WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [22:04:22] if the php procs aren't cleaning up properly that's their problem [22:04:37] you can send it sigint so it has an opportunity to respond [22:04:45] (CR) jerkins-bot: [V: -1] [WIP] --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 (owner: Awight) [22:05:01] child proc cleaning up properly isn't a concern, that I know of [22:05:36] The concern is actually about the p-c parent process cleaning up properly [22:06:13] The way I had implemented --kill-job, it would sig the parent [22:06:18] which caused problems. [22:06:37] awight: ah right, 'i was killed' can be determined from the exit code [22:06:37] yeah, i think sigint the child and bail would be the ideal situation [22:07:42] ejegg: I like the compromise you've already coded up--if the parent process receives a signal, it kills the child. [22:07:58] That lets us be lazy about what we sig [22:08:08] but what sends the parent proc the signal? [22:08:13] why not just use an arg? [22:08:27] How do you plan to find the process? [22:08:32] ps | grep? [22:08:37] (PS2) Ejegg: Log, don't print [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345221 (owner: Awight) [22:08:49] We already have lockfiles that tell us whether a job is running and where to kill it [22:08:51] (CR) Ejegg: [C: 2] "Cleaner test runs as a bonus!" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345221 (owner: Awight) [22:08:57] --kill jobname ? [22:09:02] right [22:09:16] (Merged) jenkins-bot: Log, don't print [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345221 (owner: Awight) [22:09:20] i mean, that's all I was thinking. 
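The `--kill jobname` idea, using the lockfile instead of `ps | grep`, could look roughly like the sketch below. Everything specific here is an assumption for illustration: the `<run_dir>/<job>.lock` layout, the lockfile holding a bare PID, and the helper names `lock_path`, `read_locked_pid`, `is_running` and `kill_job`. Whether the recorded PID belongs to the parent runner or the child is exactly what the discussion above is still settling; the sketch simply signals whatever PID the lock records.

```python
import errno
import os
import signal


def lock_path(run_dir, job_name):
    # Assumed layout: one lockfile per job, named after the job.
    return os.path.join(run_dir, job_name + ".lock")


def read_locked_pid(run_dir, job_name):
    """Return the PID stored in the job's lockfile, or None if there is none."""
    try:
        with open(lock_path(run_dir, job_name)) as lockfile:
            return int(lockfile.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Check for a live process without disturbing it (signal 0)."""
    try:
        os.kill(pid, 0)
    except OSError as error:
        # EPERM means the process exists but belongs to another user.
        return error.errno == errno.EPERM
    return True


def kill_job(run_dir, job_name, sig=signal.SIGINT):
    """Send `sig` to the process recorded in the lockfile.

    Returns True if a signal was sent, False if the job was not running.
    """
    pid = read_locked_pid(run_dir, job_name)
    if pid is None or not is_running(pid):
        return False
    os.kill(pid, sig)
    return True
```

Defaulting to SIGINT follows the suggestion above that the job should get a chance to respond; a caller could pass `signal.SIGKILL` as a last resort.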
[22:10:14] although, if the config file changed while it was running it wouldn't know where to look [22:10:37] shell abstractions are always leaky [22:11:10] (PS3) Ejegg: Log notable events [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345236 (owner: Awight) [22:11:26] but only one will run at a time right? [22:11:34] we do not have concurrency support at this stage do we? [22:11:44] one of each type i mean [22:11:47] yeah nothing about our jobs can handle concurrency yet [22:11:58] Except maybe eileen's deduping from the future :_) [22:12:26] (CR) Ejegg: [C: 2] "Useful!" (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345236 (owner: Awight) [22:12:49] (Merged) jenkins-bot: Log notable events [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345236 (owner: Awight) [22:12:58] so it should be easy to look at ps and know which job it is [22:13:44] why would we go to ps when we already have the info tho [22:14:35] the lockfile is named the job name? [22:14:39] y [22:14:44] sure that works too [22:14:56] another thing that we will have to twiddle if we decide we need concurrency [22:15:37] soon, we'll need history and structured status to do fail counts [22:16:00] should be able to get it out of syslog right? [22:16:04] (PS9) Ejegg: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) (owner: Awight) [22:16:30] no, programs shouldn't take anything out of syslog IMO [22:16:47] (CR) Ejegg: "undo the config.py change?" 
(1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) (owner: Awight) [22:17:31] well the data is available anyway [22:18:20] (CR) Awight: --list-jobs action (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) (owner: Awight) [22:19:06] (PS9) Awight: [WIP] --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 [22:19:15] (CR) jerkins-bot: [V: -1] [WIP] --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 (owner: Awight) [22:21:41] (PS10) Awight: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) [22:22:34] cwd: I think we could iterate the p-c.deb today, if you have time? [22:23:04] yeah definitely [22:24:15] (CR) Ejegg: [C: 2] "Nice! Horks on bad config files, but that can be TODO" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) (owner: Awight) [22:24:44] awight: i think we just have to update the packages repo and everything else will happen? 
[22:25:00] (Merged) jenkins-bot: --list-jobs action [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345008 (https://phabricator.wikimedia.org/T161584) (owner: Awight) [22:25:15] cwd: also need to manually update the /etc/process-control.yaml template [22:25:19] and I think tweak the path [22:25:33] U might also look through the default paths and see if they need puppet creating [22:28:52] (PS2) XenoRyet: Mark refunds with correct gateway [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/345189 (https://phabricator.wikimedia.org/T161121) [22:29:31] (PS3) XenoRyet: Mark refunds with correct gateway [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/345189 (https://phabricator.wikimedia.org/T161121) [22:29:33] ok [22:30:33] (PS4) XenoRyet: Mark refunds with correct gateway [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/345189 (https://phabricator.wikimedia.org/T161121) [22:31:36] awight: weird, I guess you /do/ need to import logging.config [22:31:53] awight: have you run a job on live yet? [22:31:54] (CR) XenoRyet: "Existing tests seem to cover this, as we're throwing both old and new style messages through and checking the gateway field on the way out" [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/345189 (https://phabricator.wikimedia.org/T161121) (owner: XenoRyet) [22:32:04] I guess that pulls logging.getLogger in along with it? [22:32:10] argh? I had it succeed both ways locally, and fail both ways locally. [22:32:18] fnord.... [22:32:53] cwd: sort of. yes, but with issues [22:33:25] I ran the orphan rectifier, and learned that we had been choking on iDEALs all week [22:33:42] ah, so were the failmails actually somehow related? [22:33:55] I forget what the failmails were about... 
[22:34:03] something else, maybe [22:34:37] it was some failure on their end i think [22:34:46] but thought maybe you surfaced it by changing something [22:35:12] (PS1) Ejegg: Show jobs with invalid configuration in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345258 [22:37:47] (PS1) Ejegg: Fix logging.config import [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345259 [22:38:40] (PS5) Ejegg: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 (owner: Awight) [22:38:50] (CR) Cdentinger: Show jobs with invalid configuration in --list-jobs (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345258 (owner: Ejegg) [22:42:39] (PS2) Ejegg: Show jobs with invalid configuration in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345258 [22:42:59] (CR) Ejegg: Show jobs with invalid configuration in --list-jobs (1 comment) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345258 (owner: Ejegg) [22:44:02] (PS6) Awight: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 [22:44:04] (PS4) Awight: Only speak in slugs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345094 [22:44:06] (PS7) Awight: WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [22:44:08] (PS10) Awight: [WIP] --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 [22:44:24] ejegg: crap, I think we're destructively rebasing [22:44:42] awight: which paths are you concerned about? we shouldn't need the cron stuff to just test the runner right? [22:44:43] awight: dang, what did we lose? 
[22:44:47] (CR) jerkins-bot: [V: -1] WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [22:44:49] (CR) jerkins-bot: [V: -1] [WIP] --kill-job [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345062 (owner: Awight) [22:44:52] (CR) jerkins-bot: [V: -1] Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 (owner: Awight) [22:44:54] (CR) jerkins-bot: [V: -1] Only speak in slugs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345094 (owner: Awight) [22:45:09] but maybe the /var/log one... [22:45:09] ejegg: Just noticing that you're producing PSs of the same changes as I am [22:45:12] ah, passing tests [22:45:29] awight: oh, i've been rebasing before CR+2ing [22:45:37] kk should be harmless then [22:47:45] awight: oh weird, we did lose 'from processcontrol ' before 'import config' in that lock.py rebase [22:48:24] or.. no we didn't [22:48:35] just have that 'import config' to delete [22:50:17] (PS7) Awight: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 [22:51:02] (PS5) Awight: Only speak in slugs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345094 [22:52:26] awight want to merge https://gerrit.wikimedia.org/r/345259 (restores your logging.config import) [22:52:29] ? [22:52:29] (PS8) Awight: WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [22:52:42] Do we need it? [22:52:51] awight: I can't run stuff locally without it [22:53:02] AttributeError: 'module' object has no attribute 'config' [22:53:09] I sort of want to understand what's going on... [22:53:15] How are you running? [22:53:22] which commit? 
[22:55:10] http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html#the-submodules-are-added-to-the-package-namespace-trap [22:55:16] I'm on bfa63e7f34f7538c4171163a53e0f903a6ccc18e and get success with both "tox" and "pytest" [22:55:20] the example is even logging.config [22:55:46] (CR) jerkins-bot: [V: -1] WIP still output when killed [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345082 (owner: Ejegg) [22:55:46] i'm on d467d5378c9d [22:56:13] ejegg: ty, good find [22:57:03] (PS2) Awight: Fix logging.config import [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345259 (owner: Ejegg) [22:57:11] (CR) Awight: [C: 2] Fix logging.config import [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345259 (owner: Ejegg) [22:57:17] ty! [22:57:40] (Merged) jenkins-bot: Fix logging.config import [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345259 (owner: Ejegg) [23:01:12] (CR) Ejegg: "This is great! maybe rename to run_directory to match output_directory?" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 (owner: Awight) [23:01:55] eh, rename can happen after things merge if that'll make things less confusing [23:02:58] no sweat, I might as well do it now. 
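The `AttributeError: 'module' object has no attribute 'config'` discussed above is the submodule trap from the linked notes: `import logging` does not by itself import `logging.config`, and the attribute only exists once something in the process has imported the submodule. That would explain a result that differs between a test runner and a direct run. A minimal sketch of the safe pattern follows; the `configure_logging` helper, the logger name and the config dict are assumptions for illustration, not the project's actual settings.

```python
import logging
# Import the submodule explicitly. Relying on "import logging" alone only
# works if some other module happens to have imported logging.config first.
import logging.config


def configure_logging(level="INFO"):
    """Apply a small dictConfig setup and return a named logger."""
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
        },
        "handlers": {
            "console": {"class": "logging.StreamHandler", "formatter": "plain"},
        },
        "root": {"level": level, "handlers": ["console"]},
    })
    # Hypothetical logger name, chosen for this example only.
    return logging.getLogger("process-control-demo")
```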
[23:03:03] cool [23:03:10] Wrestling with some log_capture for a minute [23:03:58] (PS1) Awight: Nicer log capture [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345260 [23:05:55] (PS8) Awight: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 [23:07:12] cwd: lmk which SHA you grab if you get around to packaging [23:07:38] Right now, the open CR is internal cleanup and nice to have features [23:08:07] awight: ooh, i sorta thought you did the packaging :) [23:08:23] i was starting to get familiar before i left but am not a pro [23:08:50] iono how Jeff likes to do it, I think I saw a .sh maybe [23:08:59] pbuilder... something [23:09:09] yeah, there's a note in the packages repo [23:09:25] if he did it on a prod box i can find history [23:11:13] yeah i see it [23:11:54] (CR) Ejegg: "Oops, JobRunner.status() still needs to know about this" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 (owner: Awight) [23:13:02] ejegg: I gotta encapsulate this under lock... [23:13:20] k, different patch then? [23:14:02] (PS9) Awight: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 [23:15:04] (PS10) Ejegg: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 (owner: Awight) [23:15:21] (CR) Ejegg: [C: 2] Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 (owner: Awight) [23:15:32] awight: would you like me to try rolling master in? [23:15:48] I've gotta scoot in a few, but I'll be back on late [23:16:07] (Merged) jenkins-bot: Configurable working files directory, run_dir [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345089 (owner: Awight) [23:17:26] cwd: totes. 
I think master has everything we need [23:17:55] & it would be good to get some more real-world feedback before spinning off on hypotheticals ;) [23:18:44] agreed, i think i can trace the bash history [23:19:19] n.p. if it doesn't happen today... [23:19:37] the thing i'm not seeing is it getting built for precise [23:19:41] maybe that didn't happen yet? [23:20:26] it did [23:20:29] iono how though [23:21:09] yup it's deployed on oldbox [23:21:12] mmm i might have found it [23:21:43] * awight imagines a serpentine hallway with perhaps more dragons than actually lie dormant [23:22:19] reading someone else's bash history makes me feel dirty [23:22:46] :D [23:22:59] oh, we forgot to tell you something about this job... :p [23:23:19] heh [23:23:21] it's dirty? [23:23:38] It's a dirty world but it still spins [23:25:55] (PS1) Awight: mini-lock login encapsulation [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345265 [23:28:41] cwd: Idly storytelling for your amusement, the second pilot job I tried to run was the PP audit downloader... [23:29:00] which failed because it relied on shell syntax to set PYTHONPATH env [23:29:31] awight: heh, well maybe it's good we're dredging stuff up [23:30:00] got an environment: key now [23:30:15] undoubtedly good to dredge here [23:30:37] too bad about the time crunch [23:31:09] yeah I'm thrilled to do this, but it sucked that the work is coupled to an urgent upgrade [23:31:45] not too wounded about it however, cos we might have never done it otherwise [23:31:49] i'm not sure what the lesson is besides keeping an EOL calendar [23:32:08] and do everything on it 3 months early :) [23:32:52] The only workarounds I can think of would have been to press harder on the Jenkins backport [23:33:30] but yeah we all sort of agreed to let that one DNR [23:34:17] it may have been a better short term fix [23:34:24] +1 that. 
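The PP audit downloader story above is a common failure when a crontab line moves into a runner: `PYTHONPATH=... command` is shell syntax, so it breaks once the command is executed without a shell. A rough sketch of what an `environment:` key can do instead is shown here. Only the key name `environment` comes from the conversation; the job dict shape, the `command` key and the `run_job` helper are assumptions, not the actual process-control schema.

```python
import os
import shlex
import subprocess


def run_job(job):
    """Run a job's command without a shell, applying its environment overrides.

    `job` is assumed to be a dict such as:
        {"command": "/usr/bin/some-script --flag",
         "environment": {"PYTHONPATH": "/some/lib"}}
    """
    # Start from the runner's own environment, then layer the job's settings
    # on top, so variables like PATH are still available to the child.
    env = dict(os.environ)
    env.update(job.get("environment", {}))
    # Split the command into argv ourselves; no shell is involved, so
    # "VAR=value command" prefixes would not be interpreted here.
    argv = shlex.split(job["command"])
    return subprocess.check_output(argv, env=env)
```

Keeping the variables in config rather than in the command string also makes them visible to anything that lists or validates jobs.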
[23:34:25] i had a lot of Gut Feelings about the scope of the upgrade [23:34:40] it was just messy from the get go [23:34:59] It's just that ruthlessly severing subprojects is the only way to survive an upgrade like this [23:35:05] but i can't prove it would have ended in defeat [23:35:30] like, can we port machines but keep php5.3 for now? I would have gone for that, for example. [23:35:57] but isn't the lack of php5.3 tests the whole issue? [23:36:05] not that it would have helped, but just to demonstrate how seriously I take carving out incremental steps [23:36:49] yeah it's a cost/benefit balance, i have no idea how hard getting 5.3 running on jessie is [23:36:56] in a way that doesn't involve trusting questionable packages [23:37:09] "can't upgrade until new code to replace jenkins is feature-complete" is a *really* bad place to be [23:37:15] but luckily, it's a good time for it [23:38:00] as much as it can be [23:38:10] Plus, this project is a lot more fun than twiddling undocumented PayPal params [23:38:19] i'm mostly sorry that i forgot i was going on vacation [23:38:34] wat that part was fine [23:39:10] hehe "mission-oriented" ~= "guilt-ridden" [23:39:23] heh [23:39:26] basically [23:41:04] (CR) Awight: [C: 2] Show jobs with invalid configuration in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345258 (owner: Ejegg) [23:42:02] (Merged) jenkins-bot: Show jobs with invalid configuration in --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345258 (owner: Ejegg) [23:42:07] i hate to keep defending myself but i also think jenkins has been buggy af lately and what upgrading 6 years of java at once might have caused is a big unknown [23:43:19] This is the hit of the century [23:43:43] That jenkins guy has been parking on my lawn for years [23:44:19] heh [23:48:52] I fired up certbot yesterday--worked nicely, but I'm sad about the lacking puppet support [23:49:26] certbot? 
[23:49:27] Beats the pants off of openssl -arbitraryflags and paying a ton of cash, though
[23:49:35] yah that's the newest letsencrypt frontend
[23:49:41] oh nice
[23:49:43] https://certbot.eff.org/
[23:49:43] i have a cron
[23:50:00] very nice
[23:50:53] this is how we can make this irrelevant: http://www.cbsnews.com/news/internet-privacy-bill-vote-coming-in-the-house/
[23:51:20] omg I was hearing about that
[23:51:27] reckless maniacs
[23:51:37] selling us out for subpennies
[23:51:41] sadly like not even on the radar right now
[23:52:01] https://www.change.org/p/al-franken-give-spiders-all-the-help-they-need-to-eat-every-human-on-earth-within-one-year?recruiter=701699165
[23:53:25] hair-raising!
[23:54:50] hey - Leanne has raised an issue that if the contact id already exists the email is not added - this is in benevity import -but it happens in the main wmf_civicrm_contribution_message_import routine.
[23:54:51] if ( !$msg['contact_id'] ) {
[23:54:52]   wmf_civicrm_message_create_contact($msg);
[23:54:52] }
[23:54:52] else {
[23:54:52]   if (isset($msg['employer_id'])) {
[23:54:52]     civicrm_api3('Contact', 'create', array('contact_id' => $msg['contact_id'],'employer_id' => $msg['employer_id']));
[23:54:52]   }
[23:54:53]   // We have set the bar for invoking a location update fairly high here - ie state,
[23:54:53]   // city or postal_code is not enough, as historically this update has not occurred at
[23:54:54]   // all & introducing it this conservatively feels like a safe strategy.
[23:54:54]   if (!empty($msg['street_address'])) {
[23:54:55]     wmf_civicrm_message_location_update($msg, array('id' => $msg['contact_id']));
[23:54:55]   }
[23:54:56] }
[23:55:44] I could add it in there - like with location update - I guess it won't impact the big processing jobs as they won't have contact_id set....
[23:57:37] eileen: +1, nice edge case [23:57:49] I'm not sure where else we're importing a contact_id [23:58:13] Probably manually keyed Engage matching gifts, but those wouldn't be adding an email addy [23:58:20] no… perhaps recurrings? [23:58:51] huh. yeah, recurrings might be trying to update the email address