[00:00:09] cwd: don't feel like you have to work late to sync with my lazy scheule... [00:00:26] (CR) Ejegg: [C: 1] "Approving awight's improvements. cwd want to give it a final look?" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345768 (owner: Ejegg) [00:02:33] Heh barleynet [00:02:47] vs. maiznet [00:03:26] awight: yeah i'm gonna head out soon but i think i can get a new .deb up [00:05:24] is barleynet gluten free? [00:06:59] heheh [00:07:18] going to watch the NCAA finals in a bit [00:07:19] depends if there's a glut of bits in the pipes [00:07:48] If the fiberoptics doesn't get cleaned regularly they do get a bit gluteny [00:09:59] * awight sees where this is going [00:11:04] somewhere gooey, methinks [00:16:25] (PS2) Awight: Document hardcoded /etc path [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345654 [00:16:49] (CR) Awight: [C: 2] "Self-merging docs" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345654 (owner: Awight) [00:17:13] (Merged) jenkins-bot: Document hardcoded /etc path [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345654 (owner: Awight) [00:17:26] (PS2) Awight: Short -l flag for --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345480 [00:20:25] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3152748 (awight) [00:21:25] * awight is excited to have a shiny new Jenkins-killing machine soon [00:28:13] argh got a broken version in my local reprepro [00:28:38] ooh no [00:28:39] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3152753 (awight) [00:28:42] in what way? [00:28:52] & you definitely had the "force pythonic build"? [00:28:56] ah i built it with the wrong code [00:29:08] whew! 
[00:29:09] because of rsync argument order [00:29:28] i think i managed to delete it [00:29:38] what a labor intensive process [00:29:55] when you get there, lmk if you are building 16b98706f9132e1995ce701b1a09987890544451 into 1.0.3? [00:30:22] in case you need it on hand: https://gerrit.wikimedia.org/r/wikimedia/fundraising/process-control [00:30:40] Finding s*t in Gerrit is the unpleasant [00:31:00] awight: which commit is that? not on master? [00:31:31] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Write basic process-control, something good enough to run all CRM jobs. - https://phabricator.wikimedia.org/T161569#3152754 (awight) [00:31:52] oh? [00:32:18] oops, my checkout was blocked [00:32:38] well i removed the package but it still won't re-add [00:32:42] That one: da7eb9b95ce598b28655e6f35064c0107676f07e [00:32:50] "Document hardcoded /etc path" [00:33:03] not re-adding on production or VM? [00:33:06] it was not in there [00:33:13] but i'm going to have to stall on this anyway [00:33:15] on the vm [00:33:22] but that's where i make changes to the repo [00:33:30] then push them to live if ok [00:34:07] i need to figure out how to purge the package from the repo [00:34:12] Can you push frack puppet changes to a branch? [00:34:28] no changes to puppet yet [00:34:31] just the packages repo [00:34:39] ohh that is beyond me [00:34:55] you bumped the version? [00:37:06] yeah [00:37:13] but the new version has the old code (my fault) [00:37:28] and i don't know how to get it out of the repo [00:37:33] double bump like a camel [00:37:38] i tried the obvious remove command but it didn't take [00:38:15] Isn't it fine to leave the broken version if we push a newer one? 
[00:38:32] i don't know [00:38:39] it doesn't sound like something jeff would green light [00:38:59] hehe, I love that you have this conscience figure [00:39:14] heh [00:39:18] OKOK I think you're right 8) [00:39:18] i'm sure it's simple to remove, i just need the incantation [00:39:33] BTW, T161569 [00:39:34] T161569: [Epic] Write basic process-control, something good enough to run all CRM jobs. - https://phabricator.wikimedia.org/T161569 [00:40:21] We seem to have just two more steps, * cron-generate (giant step), and * verify all features work on production [00:41:29] cwd: In the future, how would you like me to publish frack-puppet changes for you to review? [00:41:38] format-patch? or is there an internal gerrit? [00:41:54] Jeff_Green always talks about wanting to do that, but I don't think it will happen cos Java [00:41:57] there is no gerrit, patch is fine but something more robust would be nice [00:42:07] a la diff -u? [00:42:17] there is a git format-patch [00:42:19] I forget if I have push rights... lemme see [00:42:31] right kk. I like that one, too. [00:43:08] I guess you can have arbitrary comments preceding a diff, which is so cool in a file format [00:44:14] yeah diff/patch is a treat [00:46:46] You know if there's something like "git -p" for raw patch files? [00:47:15] I wish git could do merge markers, but I guess that would require knowing the parent rev. [00:47:55] Looks like "git am" has an interactive mode, maybe that goes down to chunk granularity? [00:48:30] hey i think i got it to reinstall [00:50:00] DOn't miss the game! [00:50:37] eh dani can tell me what happened [00:50:54] if this works i don't mind sticking around, it would be a huge weight off to get this working [00:51:30] Or just check in in two hours :) [00:51:47] heh well the new code built an empty package [00:52:04] oh hell no [00:52:14] which commit, so I can try locally? 
[00:52:37] awight: 7792f501bbb19350543f81b9717afa09c96e280a [00:52:49] you were able to replicate last time right? [00:53:21] yeah totally [00:53:25] it was the Makefile, that time [00:53:38] feel free to remove if it's faster than waiting for me to replicate [00:53:41] ok well maybe i will go watch the game if you wanna give that a shot [00:54:02] i can check in later and build again if you figure it out [00:54:03] So I use "make deb" to build locally [00:54:05] it's a pretty rote process [00:54:13] yeah we can go async no problem [00:54:34] I get all the build files. [00:54:40] ok thanks, hope it wasn't my mistake [00:54:54] but if it was that's probably a good sign to step away for awhile :) [00:54:55] I have usr/bin/ usr/lib, all that [00:55:05] yes, good argument! [00:55:08] see ya [00:55:17] /reboot [00:55:27] later! [01:05:28] Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Patch-For-Review: CentralNotice: Remove unused code for banner preview in banner editor - https://phabricator.wikimedia.org/T161907#3146941 (awight) Yes, thank you! Any future work on banner previews will probably be to add language and countr... [01:07:23] (CR) Awight: [C: 2] "Looking forward to the antipatch that removes this cruft, nice find." (1 comment) [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/345862 (https://phabricator.wikimedia.org/T161907) (owner: AndyRussG) [01:09:22] (CR) Awight: [C: 2] Add FIXME comments in centralnotice.js (1 comment) [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/345864 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:10:33] oh, dear. And when I try to download the patch with real substance, Gerrit 404s [01:10:43] (Merged) jenkins-bot: Comments flagging possibly unused code for inline banner previews [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/345862 (https://phabricator.wikimedia.org/T161907) (owner: AndyRussG) [01:11:35] awight: thanks! 
[01:12:10] AndyRussG: mysterious forces are preventing me from git review -d the real patches tho [01:12:20] * awight ouija's the patch [01:13:55] (Merged) jenkins-bot: Add FIXME comments in centralnotice.js [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/345864 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:14:52] Too much gluten, I guess [01:21:41] (CR) Awight: Admin UI: Consolidate and refactor client-side bucket change handler (2 comments) [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/345872 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:23:12] (CR) Awight: Admin UI: Consolidate and refactor client-side bucket change handler (1 comment) [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/345872 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:30:20] (CR) Awight: [C: 2] "Works for me!" [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/345872 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:32:00] (CR) Awight: [C: 2] ""looks" good" [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/346033 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:33:15] (Merged) jenkins-bot: Admin UI: Consolidate and refactor client-side bucket change handler [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/345872 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:47:54] (PS1) Awight: Protect against symlinks and ".." 
directory transversal [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346229 [01:51:26] (CR) Awight: [C: 2] Admin UI campaign editor: Isolate and selectively load js/css [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/346034 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:51:37] (CR) Awight: [V: 1 C: 2] Admin UI campaign editor: Isolate and selectively load js/css [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/346034 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG) [01:54:47] (PS2) Awight: Protect against symlinks and ".." directory transversal [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346229 [01:55:12] (CR) jerkins-bot: [V: -1] Protect against symlinks and ".." directory transversal [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346229 (owner: Awight) [02:02:33] awight|afk: fun times! [02:24:43] Nice cleanups, though [02:33:45] (PS3) Awight: Protect against symlinks and ".." directory transversal [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346229 [02:44:24] awight: I confess the attempt at clean was calling to me... [02:45:34] I have similar feelings every time I look at CN [02:45:49] I wanna "modernize" all the weird listy UI [02:46:02] Though... I've done enough damage in there already :) [02:47:45] awight: not at all! Also, I think yes it'll be modernity soon ;p [02:48:08] A few things are stopping me from doing so, luckily. [02:52:55] (PS1) Awight: Copy logging to stdout when run interactively [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346230 [12:30:04] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Write basic process-control, something good enough to run all CRM jobs. 
- https://phabricator.wikimedia.org/T161569#3153588 (Jgreen) [12:33:22] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Write basic process-control, something good enough to run all CRM jobs. - https://phabricator.wikimedia.org/T161569#3153594 (Jgreen) I checked-off "cron-generate can somehow write to /etc/cron.d/process-control" -- we have a sudo wrapper /usr/lo... [14:23:27] (CR) Cdentinger: [C: 2] Merge environment, don't clobber [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345768 (owner: Ejegg) [14:40:27] (Merged) jenkins-bot: Merge environment, don't clobber [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345768 (owner: Ejegg) [15:08:41] cwd thanks for the CR [15:08:56] is the deb packaging still frustrating? [15:11:44] quite [15:35:54] cwd I've got my Jessie box running, but I'm still a little unclear on the best way to build the package. Want to walk me through some commands, and I can see if I run into the same issues? [15:36:53] ejegg: i've almost got it, hopefully [15:36:59] removing the makefile helped [15:38:24] ejegg: "make deb" for the easy way (not like production) [15:41:32] cwd: That --build=pybuild stuff didn't do anything? [15:41:44] That was how I got past the Makefile thing [15:42:22] i also removed that [15:42:27] and the debian dir in the git repo [15:42:49] and copied a couple changes to the packages repo debian dirs [15:42:58] and removed a couple deps from the precise dir [15:42:58] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Write basic process-control, something good enough to run all CRM jobs. - https://phabricator.wikimedia.org/T161569#3154120 (awight) @Jgreen Great, thanks! I've verified that it works. [15:43:21] removed that? huh [15:44:18] The forked debian/ is kinda disturbing, hopefully we can rejoin those during cleanup [15:44:30] drat, needs 62MB of packages for devscripts & lintian. How else can I help with process-control while those are downloading? 
[15:44:51] hi all. ejegg I was wondering if we should make this high priority or higher: https://phabricator.wikimedia.org/T162094 [15:45:05] ah, I see more cr to do [15:45:11] hi dstring, looking [15:45:26] MBeat: fyi^^ [15:45:43] ty dstrine [15:45:45] going to try to build the live package now [15:45:45] ejegg: yeah that's just what I was gonna say. https://gerrit.wikimedia.org/r/#/c/346230/ and https://gerrit.wikimedia.org/r/#/c/346229/ would be rad. [15:45:50] I'll fix that merge conflict... [15:46:12] none of that cron-generate stuff is uuuuh making live cron jobs is it? [15:46:26] yessir I just made a live job :D [15:46:28] fail? [15:46:41] It's a harmless command, at least. [15:46:45] but */5 minutes [15:46:48] dstrine should be simple enough grep for those txns [15:47:54] (PS4) Awight: Protect against symlinks and ".." directory transversal [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346229 [15:48:49] ejegg: so is this feeling like this is pointing to a bigger issue? do you think we should add it to the sprint? [15:49:14] my gut says to investigate [15:49:36] dstrine: certainly worth looking! [15:50:05] If there's any issue, I'd guess it's just that the audit parsing takes a long time during the NL campaign [15:50:39] because none of the iDEAL donations come in to Civi via the payments server [15:51:08] so the audit parser has to search for all of them in the logs [15:51:51] ok I added it to the sprint. MBeat please keep updating the task if you get more reports [15:51:56] will do [15:52:27] (PS1) Awight: Fix CLI syntax for run-job in the generated crontab [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346314 [15:52:39] cwd: ^ that one should go with the next package... [15:53:05] is it necessary for things to run? 
[15:53:15] i just got done with 2 hours of building one [15:53:24] omfg [15:53:30] That's terrible to hear [15:53:33] but no, it can wait [15:53:43] just about to build it on live [15:53:50] It's necessary for testing the cron generation but we can hold off [15:53:53] (CR) jerkins-bot: [V: -1] Fix CLI syntax for run-job in the generated crontab [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346314 (owner: Awight) [15:54:22] (PS2) Awight: Fix CLI syntax for run-job in the generated crontab [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346314 [15:55:41] My hope is that we'll be able to test through all the jobs today, and tomorrow we'll be ready to deploy crons... [15:56:12] that sounds lovely [15:56:25] ejegg: Not sure you saw this, but https://docs.google.com/a/wikimedia.org/spreadsheets/d/1qfXaBmhW45qSbFRqgJs_zpeEt6ZIo2hrZcBtRhVXS9w/edit?usp=drive_web [15:56:58] awight: new p-c available on barium [15:57:03] cwd: Remind me where you dumped the text-mangled specs from our old Jenkins jobs? [15:57:07] oh whoot! [15:57:39] cwd: was that 7792f501bbb19350543f81b9717afa09c96e280a ? [15:57:40] awight: see /srv/process-control-jobs on the deploy host [15:57:52] ty [15:57:53] that will probably go away but it can be a template for now [15:58:07] awight: a212cadcbcd1a0ff94b74f9895ccf4be6afe3d2c [15:58:08] I have to step out for an hour, but when I'm back we can make some serios progress [15:58:08] thanks awight [15:58:24] i stuck a couple more commits in the new one [15:58:33] gonna build for jessie now [15:58:45] ejegg: feel free to convert jobs, make localsettings with commented out schedules and test and stuff, if you have p-c time [15:59:27] cwd: oh great, that environment merge was a big one [16:00:16] darn, I can push tags to GH but not gerrit [16:01:24] awight: hopping on a plane back to high bandwidth in an hour, will be able to do more stuff on the servers there. Going to stick to CR for now! 
[16:17:17] (CR) Ejegg: [C: 2] Protect against symlinks and ".." directory transversal [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346229 (owner: Awight) [16:17:49] (Merged) jenkins-bot: Protect against symlinks and ".." directory transversal [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346229 (owner: Awight) [16:28:02] (PS1) Awight: Cast environment variables to string [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346317 [16:29:23] YES. cwd ejegg I just confirmed that "command sequence" works on production [16:30:05] wicked [16:30:15] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Write basic process-control, something good enough to run all CRM jobs. - https://phabricator.wikimedia.org/T161569#3154223 (awight) [16:30:52] Fundraising Sprint Far Beer, fundraising-tech-ops, Epic: EPIC: build fundraising civicrm (barium) replacement server on Debian Jessie, with HHVM or PHP5.5 - https://phabricator.wikimedia.org/T136959#3154228 (awight) [16:30:54] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Write basic process-control, something good enough to run all CRM jobs. - https://phabricator.wikimedia.org/T161569#3135613 (awight) Open>Resolved a:awight Marking this task as done. Next step is to convert and test all the jobs. [16:31:44] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Migrate all jobs to process-control - https://phabricator.wikimedia.org/T162163#3154229 (awight) [16:31:49] awight: did we do the target_host thing? is there something that will prevent them from running on more than one server at a time? [16:33:18] cwd: haha yikes [16:33:22] no there's nothing about target host. [16:33:28] It'll run everywhere that we cron-generate [16:33:35] i think i was talking to Jeff_Green about that [16:33:44] it seems like we don't want any sort of concurrency at this point [16:33:48] Let's figure out what the job repo should look like. 
[16:34:08] right, no concurrency nor any host but one production server, for now [16:34:28] could we add target host to the job config file? [16:34:59] dstrine: Good news, it looks like we're finished with process-control MVP features, and on to testing with real jobs. [16:35:07] Jeff_Green: we could, if that makes sense [16:35:13] is that visible enough for easy maintenance? [16:35:25] but how would we deal with primary vs failover server? [16:35:30] could we only target one server for install for now? [16:35:34] in puppet? [16:35:51] awight i guess we'd have to tweak all the config files [16:36:06] it would be nice if we can use this on the dev/qa server too [16:36:16] Isn't that the sort of manual failover action we don't want, tho? [16:36:18] awight: ejegg cwd Jeff_Green Adam's update is great news. are we still headed to any downtime? Also when are we unfrozen for deploys like paypal work? [16:37:23] dstrine: i'm not sure if there were other things re. downtime, but we'll need some time for a couple cutovers related to switching out barium and also databases [16:38:01] dstrine: My thoughts at the moment are, * move all jobs off of Jenkins, on the old production server. No downtime expected. * Switch to the new server. Downtime. [16:38:15] yeah, what Jeff_Green said. [16:38:28] shouldn't we be able to switch the jobs to the new server without downtime? [16:38:30] We're unfrozen once we're migrated over to the new barium server. [16:38:33] awight: when we migrate jobs that do mail we need to watch carefully what happens re outbound mail [16:39:17] cwd: You might be right about that! [16:39:35] i need to double-check but I think barium has special firewall handling for outbound mail, i.e. 
NAT public IP [16:40:01] i'm sure there will be necessary downtime in the migration but i don't think for the jobs stuff [16:40:10] I don't want to rush any of you or micromanage but I would like to offer you some safety/working space (which might look like planned downtime to stakeholders). [16:40:40] so if you would like that, we need to give notice to fundraising people [16:40:42] totally. And just asking the question is helpful, to catalyze discussion. [16:40:47] yar [16:41:35] dstrine: do you know if there are any existing gaps in the campaign schedule over the next week? [16:42:17] awight / cwd we could certainly have puppet populate a config variable in the global process-control config [16:42:26] cwd: So I'm getting greedy already. I realized that the steps I'll take as I test each job mostly overlap with the steps we would take to actually migrate. [16:43:22] * write p-c job config * disable Jenkins job * run p-c job and read diagnostics. [16:43:37] At that point, I either reenable Jenkins or let it stay on p-c [16:43:45] If cron-generate is working, then I can leave it... [16:43:48] yeah, i think we should just migrate one at a time [16:43:53] oh of course [16:44:17] my vote would be to get everything off jenkins on barium, and then cut over NAT to civi1001, and then enable jobs on civi1001 [16:44:34] at least, that ^^^ for any job that sends mail [16:44:35] Jeff_Green: +1, that's my plan. Minimum change at once [16:44:39] k [16:44:52] Jeff_Green: no real gaps in the campaign schedule for the next few weeks but I heard before that campaigns would not need to come down, only civi [16:45:06] What's on my mind is, while I'm disabling jobs and stuff, I could be *actually* migrating onto barium:process-control [16:45:22] awight: that makes sense [16:45:41] dstrine: ok [16:46:23] so if campaigns need to come down we need a much earlier heads up. 
Civi is easier to schedule [16:46:31] cwd: You might not like this, but what I'm suggesting is that we get a few more minor patches merged and roll another .deb, so I can migrate for real. [16:46:33] dstrine: that might still be the case. we do have a master database swap but I think we can do that with a short downtime, in the ballpark of an hour [16:47:06] meh, I'll just run through everything and get us ready to go without cron. [16:47:08] awight: should be fine, i'm getting a hang of it [16:47:15] why doesn't cron work? [16:47:32] a bug--I forgot the "--job" flag I was so attached to. [16:47:34] we need to solve the target_host thing somehow [16:47:39] ah gotcha [16:49:12] ideas: we could have some kind of group-name for jobs, like "job_group = civicrm" vs "job_group = dev", which lives in the job config files, and then a corresponding group name in the main config that is assigned by puppet [16:49:49] so if the group name from main conf matches a job file, it gets added to local cron [16:49:53] Jeff_Green: That's on our roadmap for sure, but we were thinking of having job groups for things like "touches the queue" so we could disable them en masse [16:50:33] Seems a little bit wrong to reuse the same concept for host selection... [16:51:06] well... [16:51:15] I think we should figure out the multi-host use cases before trying to solve the config file syntax. [16:51:21] agreed [16:51:40] Are we planning to use this for tool for all the non-CRM server jobs, eventually? e.g. listener host cron? [16:51:48] we have one host right now [16:51:56] immediate use case I can think of would be lutetium [16:52:04] so probably a one host solution is ok for first version? [16:52:05] cron=null, IMO [16:52:32] cwd: yes, one host is perfect IMO. AIUI, the question is how to tell civi2001 to not get too excited [16:53:05] also how to make jobs for lutetium that don't get installed on the civi servers? 
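The job-group idea floated at 16:49 (a `job_group` value in each job config, matched against a group name that puppet writes into the main config) can be sketched roughly as below. This is only an illustration of the matching rule as described in the discussion: the function name, the dictionary layout and the sample group assignments are hypothetical and not taken from process-control.

```python
def jobs_for_host(jobs, host_group):
    """Return the names of jobs whose job_group matches this host's group.

    `jobs` maps a job name to its parsed config (a dict); `host_group` is
    the group name that the main config would carry (hypothetical layout).
    """
    return sorted(
        name
        for name, config in jobs.items()
        if config.get("job_group") == host_group
    )


# Made-up job configs, using the group names mentioned in the discussion.
jobs = {
    "civicrm_cron_run": {"job_group": "civicrm"},
    "smite": {"job_group": "dev"},
    "banner_history_queue_consume": {"job_group": "civicrm"},
}

# Only jobs in the host's group would be added to the local cron.
civicrm_jobs = jobs_for_host(jobs, "civicrm")
```

A job without a `job_group` key matches no host in this sketch, which is the conservative default when the aim is to keep jobs from running on more than one server.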
[16:53:29] Jeff_Green: It's probably sort of okay to install jobs, as long as we don't cron them? [16:53:32] can we just install on one server for now? [16:53:39] awight: but what if we want to cron them? [16:53:47] we don't, for now... [16:54:15] I agree that the day may come, but it seems well outside MVPland [16:55:11] yup, the question is whether we're boxing ourselves in to future pain by not thinking through how it will work when that day comes [16:56:22] sure--so we'll want to run cron jobs everywhere, I assume. They can either come from puppet if they're sysadminny, or from localsettings/process-control/ if devs should be able to change them [16:57:03] right now any file in localsettings/process-control gets deployed everywhere [16:57:09] We could have a directory per server role, similar to how we've cloned the payments config [16:57:17] yup [16:57:30] I can't imagine any sane use case for server roles reusing jobs [16:57:41] even prod vs staging Civi [16:57:44] agreed [16:57:47] *especially* prod vs staging civi [16:58:00] cos we want elbow room to be completely experimental on staging [16:58:59] Sort of a cop out, but that setup would make the deployment easy: process-control-crm, process-control-crm-staging, process-control-listener, process-control-queue... [16:59:48] what about a jobs dir per host for now? [16:59:52] If we ever get to the point of having parallel job runners, I think it will be parallelized on a single machine, since nothing is CPU-bound [16:59:57] awight: maybe easier would be process-control/{arbitrary_text} and puppet sets /srv/process-control/{arbitrary_text} as the job conf dir [17:00:01] Jeff_Green: how ugly would that be in the blaster? [17:00:19] fr-tech: Working with Julie Andrews is like getting hit over the head with [17:00:19] a valentine. [17:00:19] -- Christopher Plummer [17:00:19] -- discuss. [17:00:32] lol thanks slander [17:01:01] Jeff_Green: that sounds good. 
And hosts only get files for their own role [17:01:26] we can do it by host, by role, by anything really--we just have to assign the right dir for the host in puppet [17:01:43] and if that's not populated, the whole thing should just laugh at you [17:02:04] for now hostname is good? [17:02:15] at this point i feel like hostname would be most useful [17:02:22] makes it easy to move around too [17:02:26] we are targeting one at a time for now [17:02:27] you just git mv the dir [17:02:30] yeah [17:02:51] then let's use short hostname [17:02:57] perfect [17:03:03] awight: sound good? [17:03:14] localsettings/process-control/barium ? [17:03:18] yes [17:03:34] i'll tweak puppet [17:05:28] :D [17:06:31] awight: you wanna change localsettings? [17:06:37] i can make a new package whenever [17:06:47] does there need to be more stuff merged? [17:07:31] ok puppet change is deployed [17:08:22] cwd: sure thing, here goes [17:08:45] cwd: I do have a list of CR I'd like, if you're in the mood [17:09:16] https://gerrit.wikimedia.org/r/#/c/346317/ https://gerrit.wikimedia.org/r/#/c/346314/ https://gerrit.wikimedia.org/r/#/c/346230/ https://gerrit.wikimedia.org/r/#/c/345480/ [17:09:28] They're all pretty easy [17:09:57] runs good: IOError: [Errno 2] No such file or directory: '/srv/process-control/lutetium/smite.yaml' [17:10:25] * awight wonders if I broke it [17:10:37] whaddya mean? [17:11:07] IOError sounds "pythonic" [17:11:11] hah [17:11:33] it may not be the prettiest error message but the behavior seems right otherwise [17:11:37] self.flaggelate [17:12:05] I don't understand how you would get p-c to give that error. Unless perhaps you're using the 1.0.2 package? [17:12:19] That must be it. cos the jessie package isn't built yet. [17:12:24] ah that ~is~ what's here [17:12:29] lemme update lutetium [17:12:29] gotcha. 
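The per-host layout agreed above (one job directory per short hostname under /srv/process-control, which is where the `/srv/process-control/lutetium/smite.yaml` path in the error comes from) could be resolved along these lines. This is a sketch under assumptions: in the discussion puppet assigns the directory, whereas here it is derived from the hostname for illustration, and the function names and the example domain are made up.

```python
import posixpath
import socket


def job_directory(base="/srv/process-control", hostname=None):
    """Return the job directory for a host, keyed by its short hostname.

    If no hostname is passed, the local one is used. Only the part before
    the first dot is kept, so an FQDN and a short name give the same result.
    """
    short_name = (hostname or socket.gethostname()).split(".")[0]
    return posixpath.join(base, short_name)


def job_path(job_name, base="/srv/process-control", hostname=None):
    """Return the path of the YAML config for one job on one host."""
    return posixpath.join(job_directory(base, hostname), job_name + ".yaml")


# Hypothetical FQDN; only the short name "lutetium" ends up in the path.
smite_config = job_path("smite", hostname="lutetium.example.org")
```

With this layout, moving a set of jobs to another host is a directory rename in the localsettings repository, as noted in the discussion.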
[17:12:38] Cos the 1.0.3 package doesn't take arbitrary job paths [17:13:02] i didn't give it a path [17:13:13] i did --job smite [17:13:20] ookay [17:13:32] you can also --list-jobs [17:13:38] k [17:13:46] that will glob through the job_directory and try to parse all the files [17:14:20] (PS2) Cdentinger: Cast environment variables to string [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346317 (owner: Awight) [17:14:22] i'll try that once it's done updating [17:15:32] (CR) Cdentinger: [C: 2] Cast environment variables to string [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346317 (owner: Awight) [17:16:04] (Merged) jenkins-bot: Cast environment variables to string [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346317 (owner: Awight) [17:16:26] awight: fwiw it's the same error with 1.0.3 when you request a nonexistent job [17:17:10] (PS3) Cdentinger: Fix CLI syntax for run-job in the generated crontab [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346314 (owner: Awight) [17:17:13] certainly close enough, even if the error message is a bit busy it's clear what's happening [17:18:28] (CR) Cdentinger: [C: 2] Fix CLI syntax for run-job in the generated crontab [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346314 (owner: Awight) [17:18:40] Jeff_Green: yeah it makes sense to me now... and it does seem better than a hygenic "your job does not seem to be available [somewhere mysterious]" [17:18:53] (Merged) jenkins-bot: Fix CLI syntax for run-job in the generated crontab [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346314 (owner: Awight) [17:19:05] cwd: Thanks for the hustle. 
I'm going to start converting jobs [17:19:34] cool [17:19:45] i will build the new pkg as soon as these all get merged [17:19:46] There are an awful lot of them [17:19:47] http://farm4.staticflickr.com/3232/3049956413_abd1ffaa95_z.jpg?zz=1 [17:19:54] there are indeed a shitload [17:19:57] awight: is it ok if I deploy your changes to /srv/process-control? [17:20:04] i want to test this on lutetium [17:20:12] haha [17:20:15] that took me a second [17:20:37] Jeff_Green: sure, they're harmless [17:20:40] k [17:21:09] (PS2) Cdentinger: Copy logging to stdout when run interactively [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346230 (owner: Awight) [17:22:02] * awight chuckles at these 2-line changes--imagine what the equivalent Jenkins patch would take out of us [17:22:12] (CR) Cdentinger: [C: 2] Copy logging to stdout when run interactively [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346230 (owner: Awight) [17:22:14] weeks of data flow diagrams... [17:22:39] (Merged) jenkins-bot: Copy logging to stdout when run interactively [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346230 (owner: Awight) [17:23:16] (PS3) Cdentinger: Short -l flag for --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345480 (owner: Awight) [17:23:54] crud. [17:24:01] i have to fix mwdeploy too [17:28:54] (CR) Cdentinger: [C: 2] Short -l flag for --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345480 (owner: Awight) [17:29:30] (Merged) jenkins-bot: Short -l flag for --list-jobs [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345480 (owner: Awight) [17:29:47] ah i guess puppet needs to make /var/log/process-control [17:30:34] Jeff_Green: instead of the package? [17:30:42] the package doesn't seem to do it [17:31:06] i'm going to rebuild it momentarily [17:31:12] do we want to make the package do it? 
[17:31:30] it's ok for puppet to do it, that way we have better control of the ownership & access [17:31:47] cool [17:31:54] i'm going to build with the new code then [17:32:04] unless i should wait for anything? awight ? [17:32:34] cwd: oh rad, lemme take a quick look [17:32:40] should all be on master [17:33:04] cwd: that's perfect, you can put the rubber stamp away :p [17:33:21] Thanks btw for the initial jenkins->text conversion, that's making my life bearable today [17:33:41] great, np [17:33:51] * cwd zips up bunny suit [17:39:02] * awight cries wolf again [17:40:00] awight: need more stuff? [17:40:55] nope, just trying to find an appropriately carnivorous response [17:47:54] restorative justice by random process: [17:47:55] Enter the following 3-digit key to update. [ yh8 / n ]: yh8 [17:50:16] haha [18:12:36] I keep making the same typo... looks like run-job should accept [--job] as an optional flag. One positional parameter obviously means, run it. [18:16:07] i also forget -j a lot [18:19:54] Want to code that? [18:20:27] i can put it on the docket [18:20:31] building .4 right now [18:20:41] niice [18:20:46] sha-1? [18:20:52] err sha-* [18:20:59] * awight is pwned [18:22:37] latest master [18:22:47] after those 4 merges [18:22:54] eb57d7bacad3b12604fdb78b6e481c9f37b64d11 [18:25:42] git tag wmf-1.0.4 [18:25:48] git push gerrit wmf-1.0.4 wmf-1.0.4:refs/heads/wmf-1.0.4 [18:25:56] clearly :p [18:30:24] Fundraising-Backlog, FR-Astropay, FR-WMF-Audit: AstroPay audit downloader does a lot of downloading - https://phabricator.wikimedia.org/T162175#3154530 (awight) [18:31:31] Fundraising-Backlog, FR-Astropay, FR-WMF-Audit: AstroPay audit parser failing to chmod - https://phabricator.wikimedia.org/T162177#3154559 (awight) [18:52:27] cwd: So now that 1.0.4 is deployed, I can do actual migrations, incrementally. fr-tech: Any objections? 
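The run-job tweak proposed at 18:12 (treat a single positional parameter as the job to run, so that `--job` / `-j` becomes optional) might look something like this with argparse. It is a sketch only, not the actual process-control command-line code: the destination names and the error wording are invented here, and only the `--job`, `-j`, `--list-jobs` and `-l` flags come from the discussion.

```python
import argparse


def parse_args(argv):
    """Parse run-job arguments, accepting the job name with or without --job."""
    parser = argparse.ArgumentParser(prog="run-job")
    parser.add_argument("-j", "--job", dest="job_flag",
                        help="name of the job to run")
    parser.add_argument("-l", "--list-jobs", action="store_true",
                        help="list the available jobs and exit")
    # nargs="?" makes the positional job name optional.
    parser.add_argument("job_positional", nargs="?",
                        help="name of the job to run (same as --job)")
    args = parser.parse_args(argv)
    if args.job_flag and args.job_positional:
        parser.error("give the job name once, positionally or with --job")
    # Whichever form was used, expose the result under a single attribute.
    args.job = args.job_flag or args.job_positional
    return args


# Both spellings resolve to the same job name.
positional = parse_args(["smite"])
flagged = parse_args(["--job", "smite"])
```

Rejecting the case where both forms are given avoids silently picking one of two different job names.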
[18:52:58] Specifically, to enable process-control versions of scheduled jobs, and disable jenkins ones [18:53:21] that's what i'm talking about [18:53:23] how can i help? [18:54:25] I think we're ready to do. I'll just drop the needle on the Banner History Queue Consumer in a few minutes. It's configured to run every 5 minutes [18:54:34] s/do/go|do so/ [18:55:24] sweet [18:55:25] !log disabled banner history queue consumer [18:55:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:43] * cwd crosses all the fingers [18:56:50] # Generated from /srv/process-control/barium/banner_history_queue_consume.yaml [18:56:53] 0-55/5 * * * * jenkins /usr/bin/run-job --job banner_history_queue_consume [18:57:04] Next run is 19:00 UTC [18:57:21] !log enabled pilot process-control job: banner history queue consumer [18:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:33] Oh, that's faily--I don't see process-control lines in syslog [18:58:52] We should see lines like "Running job CiviCRM cron run (civicrm_cron_run) [18:59:00] hmm [19:00:54] Pretty important, since that's the only place we log exactly what commandline was run [19:01:54] Can you check for jenkins mail? [19:02:19] one sec [19:03:06] Also, syslogs went out as facility "daemon" [19:03:10] Are those eaten? [19:03:47] i am not seeing anything in mail [19:06:25] The banner history job went off at 19:00 as planned, succeeded in doing things, and logs are in the right place :) [19:06:29] But I'm blocked by the lack of syslog [19:09:56] cwd: Is it easy to grab the timeout from jenkins*xml? 
[19:10:01] If not, I can mouse [19:10:50] probably [19:10:53] one se [19:10:55] c [19:13:08] Verified that the sync blaster removes deprecated jobs :D [19:16:38] fundraising-tech-ops, Operations, ops-eqiad, Patch-For-Review: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3154837 (Cmjohnson) Set the raid cfg to raid 10 [19:17:44] (PS18) AndyRussG: [WIP] Custom campaign mixin param handlers [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/343953 (https://phabricator.wikimedia.org/T144453) [19:24:55] (PS1) Awight: Timeout should be given in minutes [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346344 [19:46:39] (CR) jerkins-bot: [V: -1] Timeout should be given in minutes [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346344 (owner: Awight) [19:48:59] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, fundraising-tech-ops: Format syslogging so that process-control lines can be bucketed. - https://phabricator.wikimedia.org/T162189#3154932 (awight) [19:50:59] cwd: Jeff_Green: ^ there's the TODO fyi [19:51:29] ok [19:52:06] bucketing on the central log collector is not hard [19:52:39] i suggest we avoid getting too involved with bucketing on the syslog host [19:52:45] i mean on the civi host [20:25:27] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Fetch CiviMail Bounces job is broken - https://phabricator.wikimedia.org/T162192#3155031 (awight) [20:28:12] If anyone else wants to tackle the formatter, I can help suggest things. Currently paddling upstream against this wall of jobs to migrate [20:30:18] cwd: I'm getting permissions errors for some jobs, see the status matrix [20:31:07] cool [20:31:52] k, I'm back in the big city with 3 whole Gs of connection strength [20:32:16] owow [20:32:20] taking a look at the syslogging ticket [20:32:22] ejegg: see the migration matrix... 
[20:32:31] ejegg: Syslogging is just simmering, fyi [20:32:56] We got the lines going to syslog, now it's just missing some logging: formatter: fmt: work in the puppetized global config [20:32:59] aww hell yeah, look at all those ys in the 'tested ok' column [20:32:59] We got the lines going to syslog, now it's just missing some logging: formatter: fmt: work in the puppetized global config [20:33:03] oops [20:33:11] 8D [20:37:01] ejegg: We have a pilot job running under cron:run-job, too! [20:37:07] nick awight [20:37:14] sweet! [20:38:17] fundraising-tech-ops: replace db1025 with new hardware running jessie - https://phabricator.wikimedia.org/T145107#3155106 (Jgreen) [20:38:39] fundraising-tech-ops: replace lutetium with new hardware running debian/jessie - https://phabricator.wikimedia.org/T145110#3155109 (Jgreen) [20:38:42] fundraising-tech-ops, Operations, ops-eqiad, Patch-For-Review: rack and cable frdev1001 - https://phabricator.wikimedia.org/T159887#3155107 (Jgreen) Open>Resolved looks good, host is imaged and up! [20:42:26] ejegg: oh hey this is a quick fix: //gerrit.wikimedia.org/r/346344 [20:42:34] curses [20:42:40] https://gerrit.wikimedia.org/r/346344 [20:42:52] heh, thanks for making it clicky [20:43:20] Pretty annoying regression in deb stretch, that ":" is no longer a word character [20:43:25] in gnome-terminal [20:43:50] weird, who ordered that? [20:44:16] I'd like a supersize unusable burger [20:45:51] oh, test just needs .001 or somesuch [20:46:47] I think so [20:46:52] ty! 
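Note on the "Timeout should be given in minutes" patch discussed above: a minimal sketch of the idea, not the actual process-control code. The function name run_with_timeout and its return convention are hypothetical; only the unit (minutes) and the trick of using a tiny fractional value such as .001 in a test come from the log.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_minutes):
    """Run argv and return its exit code, or None if it ran too long.

    timeout_minutes is in minutes (fractions allowed); subprocess.run
    expects seconds, so the value is converted here.
    """
    try:
        completed = subprocess.run(argv, timeout=timeout_minutes * 60)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return None
    return completed.returncode


if __name__ == "__main__":
    # A fractional timeout keeps the test quick, as suggested in the chat.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 0.001))
```

Keeping the conversion in one place means job configs can stay in minutes while the subprocess call still gets seconds.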
[21:08:11] Fundraising-Backlog, MediaWiki-extensions-CentralNotice: CentralNotice: Remove unused code for banner preview in banner editor - https://phabricator.wikimedia.org/T161906#3155172 (AndyRussG) [21:08:14] Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Patch-For-Review: CentralNotice: Remove unused code for banner preview in banner editor - https://phabricator.wikimedia.org/T161907#3155174 (AndyRussG) [21:08:15] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Fetch CiviMail Bounces job is broken - https://phabricator.wikimedia.org/T162192#3155175 (ggellerman) p:Triage>Normal [21:10:13] Fundraising-Backlog, FR-Astropay, FR-WMF-Audit: AstroPay audit parser failing to chmod - https://phabricator.wikimedia.org/T162177#3155183 (ggellerman) p:Triage>High [21:11:11] Fundraising-Backlog, FR-Astropay, FR-WMF-Audit: AstroPay audit parser failing to chmod - https://phabricator.wikimedia.org/T162177#3154559 (ggellerman) p:High>Normal [21:12:08] Fundraising-Backlog: Silverpop import processed twice - https://phabricator.wikimedia.org/T162197#3155188 (CCogdill_WMF) [21:12:43] Fundraising-Backlog, FR-Ingenico, FR-WMF-Audit: Ingenico WR1 parser needs its working files dirs cleaned - https://phabricator.wikimedia.org/T162195#3155200 (ggellerman) p:Triage>Normal [21:15:44] Fundraising-Backlog: Silverpop import processed twice - https://phabricator.wikimedia.org/T162197#3155215 (ggellerman) p:Low>Normal [21:17:04] Fundraising-Backlog, FR-Astropay, FR-WMF-Audit: AstroPay audit downloader does a lot of downloading - https://phabricator.wikimedia.org/T162175#3155218 (ggellerman) p:Triage>Low [21:18:52] Fundraising-Backlog: process-control package sets up service user and directory permissions - https://phabricator.wikimedia.org/T162093#3155225 (ggellerman) p:Triage>Low [21:20:46] Fundraising-Backlog: process-conotrol jobs should target a host - https://phabricator.wikimedia.org/T161931#3155230 (ggellerman) p:Triage>Normal [21:22:39] Fundraising-Backlog, FR-Ingenico: 
SmashPig failing with Ingenico "NO DIRECTORY FOUND" - https://phabricator.wikimedia.org/T162087#3155237 (Ejegg) [21:22:43] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, FR-Smashpig, MediaWiki-extensions-DonationInterface, and 2 others: iDEAL bank lookup should also cache failures - https://phabricator.wikimedia.org/T161072#3155239 (Ejegg) [21:23:12] Fundraising-Backlog, FR-Ingenico: SmashPig failing with Ingenico "NO DIRECTORY FOUND" - https://phabricator.wikimedia.org/T162087#3152043 (Ejegg) Fix here: https://gerrit.wikimedia.org/r/345790 [21:23:29] Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Patch-For-Review: CentralNotice: Remove unused code for banner preview in banner editor - https://phabricator.wikimedia.org/T161907#3155241 (ggellerman) p:Triage>Normal [21:24:56] Fundraising-Backlog: process-control should handle bad YAML syntax - https://phabricator.wikimedia.org/T161858#3155244 (ggellerman) p:Triage>Low [21:25:17] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, FR-Ingenico: SmashPig failing with Ingenico "NO DIRECTORY FOUND" - https://phabricator.wikimedia.org/T162087#3155247 (Ejegg) a:Ejegg [21:25:28] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, FR-Ingenico: SmashPig failing with Ingenico "NO DIRECTORY FOUND" - https://phabricator.wikimedia.org/T162087#3152043 (Ejegg) p:Triage>Normal [21:27:29] Fundraising-Backlog: Silverpop import processed twice - https://phabricator.wikimedia.org/T162197#3155188 (Ejegg) @CCogdill_WMF , I thought this job was just scheduled for once a day. Or is it triggered by our upload? Has Trilogy made any changes to the import lately? 
[21:27:43] !log Finished migrating Fundraising jobs to process-control [21:27:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:27:55] Fundraising-Backlog: Develop shared understanding of definition of Tech Debt - https://phabricator.wikimedia.org/T161817#3155264 (ggellerman) p:Triage>Normal [21:28:51] Fundraising-Backlog: Send interview answers to LegoKTM - https://phabricator.wikimedia.org/T161770#3155268 (ggellerman) p:Triage>Normal [21:29:20] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, fundraising-tech-ops: Migrate all jobs to process-control - https://phabricator.wikimedia.org/T162163#3155273 (awight) [21:29:53] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, fundraising-tech-ops: Migrate all jobs to process-control - https://phabricator.wikimedia.org/T162163#3154229 (awight) Done, now we just watch for fallout. Please monitor all job logs for a few days. [21:31:14] Fundraising-Backlog: process-control should handle bad YAML syntax - https://phabricator.wikimedia.org/T161858#3155281 (awight) Open>Resolved p:Low>Normal [21:31:41] Fundraising-Backlog: process-control should handle bad YAML syntax - https://phabricator.wikimedia.org/T161858#3145252 (awight) eh reverting my change [21:32:29] Fundraising-Backlog: process-control should handle bad YAML syntax - https://phabricator.wikimedia.org/T161858#3155299 (awight) Resolved>Open p:Normal>Low One part of this is done: `run-job --list-jobs` will recover from some syntax issues and will help you find the bad file. We could still... [21:33:53] * awight stares at the slain dragon and shakes his head [21:34:04] If only you hadn't been made of pure Java.
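Note on the run-job command line: the flags named in this log are -l/--list-jobs and -j/--job, plus the wish from earlier in the day that a single positional argument should simply mean "run it". A rough sketch of how that could look with argparse follows. This is an illustration under assumptions, not the real run-job parser; parse_args, job_flag and job_positional are hypothetical names.

```python
import argparse


def parse_args(argv):
    """Parse a run-job style command line (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run-job")
    parser.add_argument("-l", "--list-jobs", action="store_true",
                        help="list the configured jobs and exit")
    # The job can be named with the flag...
    parser.add_argument("-j", "--job", dest="job_flag", metavar="JOB",
                        help="name of the job to run")
    # ...or as a lone positional argument.
    parser.add_argument("job_positional", nargs="?", metavar="JOB",
                        help="name of the job to run (same as --job)")
    args = parser.parse_args(argv)
    if args.job_flag and args.job_positional:
        parser.error("give the job name once, either with --job or positionally")
    args.job = args.job_flag or args.job_positional
    return args


if __name__ == "__main__":
    print(parse_args(["banner_history_queue_consume"]).job)
```

With this shape the generated crontab line can keep using --job while interactive use can drop the flag.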
[21:34:56] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Fundraising is running a duplicate cron job - https://phabricator.wikimedia.org/T161668#3155329 (ggellerman) p:Triage>Normal [21:35:20] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Fundraising is running a duplicate cron job - https://phabricator.wikimedia.org/T161668#3155332 (awight) Open>Resolved a:awight Cleaned this up as part of the migration. [21:35:30] Fundraising-Backlog, Analytics: Storage for banner history data - https://phabricator.wikimedia.org/T161635#3155335 (ggellerman) p:Triage>Normal [21:51:06] Fundraising-Backlog: Silverpop import processed twice - https://phabricator.wikimedia.org/T162197#3155372 (CCogdill_WMF) I believe Silverpop will process any new file on the ftp server in the 3am-4am window, so it ran both files. No recent changes have been made the import; the version we're running has been... [23:07:57] Fundraising-Backlog: error opening Annual Report in Safari Version 9.1.1 (10601.6.17) - https://phabricator.wikimedia.org/T159389#3155621 (MBeat33) Open>Resolved Not much traction on this, so presumably not a noteworthy thing - Resolving. [23:10:14] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, fundraising-tech-ops: Format syslogging so that process-control lines can be bucketed. - https://phabricator.wikimedia.org/T162189#3154932 (Ejegg) The Formatter objects needs to be instantiated from a config file to fit with e... [23:11:42] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, fundraising-tech-ops: Format syslogging so that process-control lines can be bucketed. - https://phabricator.wikimedia.org/T162189#3155632 (awight) @Ejegg IMO we should skip the job name for now. The process-control main log... [23:12:54] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, fundraising-tech-ops: Format syslogging so that process-control lines can be bucketed. 
- https://phabricator.wikimedia.org/T162189#3155634 (Ejegg) k, pretty sure that'll just be an update to process-control.yaml (.example) [23:13:09] ejegg: ^ I think that's exactly it [23:14:40] I'll try to do the timeout -> minutes business [23:17:25] rockin [23:18:03] syslog seems to have an out-of band way to indicate log level, right? [23:18:26] and no need to include date or machine name [23:18:51] I think the formatter has access to the levelname as a template parameter? [23:19:31] it does, but somehow Jeff_Green sorts 'em into .error, etc, without that being in the files [23:19:40] errr, lemme double check that [23:19:46] ejegg: Loglines currently look land in syslog as: Apr 4 23:17:01 HOST Running job Thank you mail send (thank_you_mail_send) [23:20:04] ^look^ [23:20:13] Yep, I think syslog does the date and hostname [23:20:21] ejegg: awight I'm trying not to put this on you at the moment but Leanne just confirmed she needs this fix by early next week https://gerrit.wikimedia.org/r/#/c/345268/ [23:20:53] eileen: oh, I'd looked at that once but got distracted [23:21:05] I'll get back to it after I do one little logging thing! [23:21:52] awight: somehow payments.error doesn't have any textual indication that the messages are at error level [23:22:40] ejegg: rsyslog.d conf can detect the loglevel without needing text to bucket [23:22:51] great, will omit that parameter [23:23:05] That'll be in the central logging config... 
somewhere [23:23:24] (PS2) Awight: Timeout should be given in minutes [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346344 [23:24:02] (PS3) Awight: Timeout should be given in minutes [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346344 [23:27:14] (PS1) Ejegg: Add run-job[pid] to example syslog format [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346479 (https://phabricator.wikimedia.org/T162189) [23:29:25] (CR) Ejegg: [C: 2] Timeout should be given in minutes [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346344 (owner: Awight) [23:30:16] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, fundraising-tech-ops, Patch-For-Review: Format syslogging so that process-control lines can be bucketed. - https://phabricator.wikimedia.org/T162189#3154932 (Ejegg) a:Ejegg [23:30:20] (CR) Awight: [V: 1 C: 2] "That works!" [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346479 (https://phabricator.wikimedia.org/T162189) (owner: Ejegg) [23:32:09] (Merged) jenkins-bot: Timeout should be given in minutes [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346344 (owner: Awight) [23:32:13] (Merged) jenkins-bot: Add run-job[pid] to example syslog format [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346479 (https://phabricator.wikimedia.org/T162189) (owner: Ejegg) [23:33:12] ejegg: that gerrit is a this-week-thing not a today-thing [23:33:50] thanks eileen... looking anyway, so I don't forget again [23:36:44] (PS1) Awight: Syslogs point at syslogd [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346480 [23:37:10] ejegg: I'll do a puppet patch with that [23:38:13] awight: ah, remind me where that repo is? 
[23:38:40] (PS2) Ejegg: Syslogs point at syslogd [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346480 (owner: Awight) [23:39:48] (CR) Ejegg: [C: 2] Syslogs point at syslogd [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346480 (owner: Awight) [23:42:55] (Merged) jenkins-bot: Syslogs point at syslogd [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/346480 (owner: Awight) [23:44:32] (CR) Ejegg: [C: -1] "One concern about the wmf_civicrm change" (2 comments) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/345268 (https://phabricator.wikimedia.org/T161666) (owner: Eileen) [23:45:49] ejegg re. syslog sorting into errors logs, that's based on syslog severity [23:46:08] rockin [23:46:13] easy to set severity by message with the python logger module [23:46:57] yep, looks like we're doing that. I set the default formatter to just be run-job[$pid]: $message [23:46:57] Fundraising-Backlog: process-control failmail should include hostname - https://phabricator.wikimedia.org/T162214#3155815 (awight) [23:47:14] since it looks like syslog is also taking care of machine name and date [23:51:22] Fundraising-Backlog: process-control failmail should include hostname - https://phabricator.wikimedia.org/T162214#3155834 (awight)
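Note on the syslog format settled on above (run-job[$pid]: $message, with date, hostname and severity left to syslog): a minimal sketch with Python's standard logging module. The %-style format string, the "daemon" facility default and the /dev/log address are assumptions for illustration; the real settings live in process-control.yaml and the puppetized config, which are not shown in this log.

```python
import logging
import logging.handlers

# Assumed %-style rendering of the "run-job[$pid]: $message" format from the chat.
SYSLOG_FORMAT = "run-job[%(process)d]: %(message)s"


def make_formatter():
    """Formatter that emits only the tag and message; syslog adds the rest."""
    return logging.Formatter(SYSLOG_FORMAT)


def make_syslog_handler(address="/dev/log"):
    """Build a handler that sends records to the local syslog daemon.

    SysLogHandler maps Python levels to syslog severities on its own, so
    the level name does not need to appear in the message text.
    """
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_DAEMON,
    )
    handler.setFormatter(make_formatter())
    return handler


if __name__ == "__main__":
    record = logging.LogRecord(
        "process-control", logging.INFO, "sketch.py", 1,
        "Running job %s (%s)", ("Thank you mail send", "thank_you_mail_send"),
        None,
    )
    print(make_formatter().format(record))
```

Because severity travels out of band, the central rsyslog config can bucket error lines without any textual marker in the message.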