[00:06:23] (PS2) Awight: Force a pythonic build, ignoring the Makefile [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345479
[00:06:34] cwd: ^ improved, now with more graphite powder
[00:07:08] oh god is space different from equals there?
[00:08:22] awight: srs question ^
[00:08:59] cwd: no, just a quixotic striving for consistency
[00:09:13] * cwd on board
[00:09:19] (PS3) Awight: Force a pythonic build, ignoring the Makefile [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345479
[00:09:21] (PS1) Awight: "make clean" target [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345482
[00:09:23] (PS1) Awight: Remove unnecessary build dependency [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345483
[00:10:23] cwd: seeing that line with both space and equals in all the official docs makes it clear that this whole duct tape boat is built by people eating copypasta with minimal understanding of what's actually happening
[00:10:42] I wanted to pretend I know what was going on, by fixing a superficial typo.
[00:10:52] i'm pretty sure that's the only way a build system of this complexity can possibly exist
[00:10:56] cargo cult af
[00:11:44] My people have always turned the beer can tab pi/2 radians clockwise before deploying
[00:12:02] seems reasonable
[00:12:10] I'm sure you heard the anecdote about the Pythagorean who proved that sqrt(2) is irrational?
[00:12:32] tell me
[00:12:43] Pythagorean
[00:12:45] pretty good
[00:13:58] (I'm checking for urban legend status)
[00:15:26] i heard about that too, but i forget the proof
[00:16:26] There's an easy proof by contradiction, let sqrt(2)=m/n
[00:17:46] cwd: https://en.wikipedia.org/wiki/Hippasus
[00:18:25] The Pythagoreans were obsessed with perfection and supposedly got heated enough about sqrt(2) to toss this guy into the ocean
[00:18:56] * ejegg tosses self overboard
[00:19:17] i thought pythagorean was a clever word for a python programmer
[00:19:19] People were also claiming to discover pyramids within pyramids guarded by minotaurs at the time tho
[00:19:24] i leave this in your capable hands, gentlemen. Don't stay up late!
[00:19:29] pythonoborean
[00:19:44] I am resigning too
[00:20:00] See you tomorrow!
[00:20:21] see ya!
[00:26:21] sqrt(2) = R = m/n, in simplest form where m and n are integers with no common factors
[00:26:24] R^2 = 2 = m^2/n^2
[00:26:27] m^2 = 2n^2, therefore m is even
[00:26:29] if m is even then letting M=m/2,
[00:26:31] 4*M^2=2n^2,
[00:26:34] 2M^2=n^2, so n is also even
[00:26:37] m and n were both divisible by 2 so that's a contradiction
[00:38:45] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Email: retrieve the text/ html and statistics data for mail we have sent - https://phabricator.wikimedia.org/T161758#3142597 (Eileenmcnaughton) a:Eileenmcnaughton
[01:58:11] Fundraising Sprint Deferential Equations, Fundraising Sprint English Cuisine, Fundraising Sprint Far Beer, Fundraising Sprint Gondwanaland Reunification Engine, and 6 others: Mediawiki namespace pages, including CentralNotice banners, are slow to sa... - https://phabricator.wikimedia.org/T158084#3142668
[03:15:37] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Email: retrieve the text/ html and statistics data for mail we have sent - https://phabricator.wikimedia.org/T161758#3142724 (Eileenmcnaughton) So to articulate my thinking here - these ar...
[03:16:40] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Email: retrieve the text/ html and statistics data for mail we have sent - https://phabricator.wikimedia.org/T161758#3142725 (Eileenmcnaughton) @awight @ejegg big brain dump above - commen...
[13:23:24] (Restored) Hashar: Jenkins job validation (DO NOT SUBMIT) [wikimedia/fundraising/dash] - https://gerrit.wikimedia.org/r/132341 (owner: Hashar)
[13:23:36] (CR) Hashar: "check experimental" [wikimedia/fundraising/dash] - https://gerrit.wikimedia.org/r/132341 (owner: Hashar)
[13:46:18] (PS1) Ejegg: Update smashpig [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/345565
[13:46:32] (CR) Ejegg: [C: 2] Update smashpig [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/345565 (owner: Ejegg)
[13:48:29] (PS1) Ejegg: Update smashpig [extensions/DonationInterface/vendor] - https://gerrit.wikimedia.org/r/345566
[13:48:36] (CR) Ejegg: [C: 2] Update smashpig [extensions/DonationInterface/vendor] - https://gerrit.wikimedia.org/r/345566 (owner: Ejegg)
[13:56:51] (Merged) jenkins-bot: Update smashpig [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/345565 (owner: Ejegg)
[14:21:22] Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Upgrade our CiviCRM server php version to 5.5+ - https://phabricator.wikimedia.org/T158717#3143616 (cwdent)
[14:21:25] Fundraising Sprint Far Beer, fundraising-tech-ops, Epic: EPIC: build fundraising civicrm (barium) replacement server on Debian Jessie, with HHVM or PHP5.5 - https://phabricator.wikimedia.org/T136959#3143615 (cwdent)
[16:11:35] Fundraising-Backlog, Analytics: Storage for banner history data - https://phabricator.wikimedia.org/T161635#3143898 (DStrine) We are checking some points with legal here: T161656
[16:26:16] awight: sorry yeah didn't push to packages yet
[16:26:18] one sec
[16:30:52] awight: you should see it now
[16:37:06] Looks good at a distance.
[16:37:30] Once we're off of precise, I think the pybuild flag will work again
[16:37:58] i.e., that workaround is all we need for now, I guess
[16:38:12] this "it's so nice to be off precise" thread is making me depressed
[16:38:58] awight: do you know what the convention for upgrading stuff in puppet is?
[16:39:14] i feel like setting to absent then back again is a lot of git noise
[16:40:08] cwd: IMO we should jump in there and confess to having a few Precises up our sleeves
[16:40:20] ha HA
[16:40:23] i was considering that
[16:40:35] but i didn't want to bring everybody down
[16:40:57] cwd: Have you tried ensure => latest?
[16:41:07] that should probably work
[16:41:12] if we are comfortable with that
[16:41:22] The docs say it's a thing, at least. But it was failing me in a personal project.
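Written out in full, with the step the 00:26 sketch leaves implicit (why an even m² forces m to be even), the proof by contradiction runs:

```latex
\begin{align*}
\sqrt{2} &= \frac{m}{n}, \qquad m, n \in \mathbb{Z},\; n > 0,\; \gcd(m, n) = 1 \\
2 &= \frac{m^2}{n^2} \quad\Rightarrow\quad m^2 = 2n^2 \\
m^2 \text{ even} &\quad\Rightarrow\quad m \text{ even (the square of an odd number is odd)},\quad m = 2M \\
4M^2 &= 2n^2 \quad\Rightarrow\quad n^2 = 2M^2 \quad\Rightarrow\quad n \text{ even} \\
2 &\mid m \text{ and } 2 \mid n, \text{ contradicting } \gcd(m, n) = 1
\end{align*}
```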
[16:41:53] interesting
[16:41:57] i'll try in vb
[16:42:02] We could also pin the version, if we get desperate
[16:42:33] yeah
[16:42:39] i couldn't find an example of that in puppet
[16:42:57] Yeah it feels dirty
[16:44:16] (Abandoned) Awight: [WIP] Change meaning of validation variables [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/280141 (https://phabricator.wikimedia.org/T98447) (owner: Awight)
[16:45:30] (Abandoned) Awight: [WIP] Remove old staging code [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/277989 (owner: Awight)
[16:45:54] (Abandoned) Awight: [WIP] Kafka backend [wikimedia/fundraising/php-queue] - https://gerrit.wikimedia.org/r/280158 (https://phabricator.wikimedia.org/T131269) (owner: Awight)
[16:46:38] (Abandoned) Awight: Integrate with PHP-Queue [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/284597 (https://phabricator.wikimedia.org/T131271) (owner: Awight)
[16:47:14] (PS3) Awight: Remove "too many banners" warning [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/247785 (https://phabricator.wikimedia.org/T109714)
[16:48:20] (Abandoned) Awight: [WIP] Terrible lint:yaml glue [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/296766 (owner: Awight)
[16:48:24] (Abandoned) Awight: [WIP] Remove RapidHTML [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/299263 (owner: Awight)
[16:48:41] (Abandoned) Awight: [WIP] Move RapidHtml inline loader statements to the server side [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/297808 (owner: Awight)
[16:49:14] (PS3) Awight: Relax private access control [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/300906
[16:49:24] (Abandoned) Awight: Dependencies += SmashPig [extensions/DonationInterface/vendor] - https://gerrit.wikimedia.org/r/301010 (owner: Awight)
[16:50:50] (Abandoned) Awight: Update composer libs [wikimedia/fundraising/SmashPig] (deployment) - https://gerrit.wikimedia.org/r/310904 (owner: Awight)
[16:50:52] (Abandoned) Awight: Revert "CiviCRM submodule update" [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/310143 (owner: Awight)
[16:50:56] (Abandoned) Awight: Include a null default for $wgStompServer [extensions/FundraisingEmailUnsubscribe] - https://gerrit.wikimedia.org/r/310040 (owner: Awight)
[16:51:00] (Abandoned) Awight: Update PHPMailer per T154209 [wikimedia/fundraising/SmashPig/vendor] - https://gerrit.wikimedia.org/r/329377 (owner: Awight)
[16:51:02] (Abandoned) Awight: Update PHPMailer per T154209 [wikimedia/fundraising/SmashPig] (deployment) - https://gerrit.wikimedia.org/r/329378 (owner: Awight)
[16:51:53] (CR) Awight: "fr-tech: bump, I don't want this repo to die." [wikimedia/fundraising/stats] - https://gerrit.wikimedia.org/r/324774 (owner: Awight)
[16:52:53] Fundraising-Analysis, Fundraising-Backlog: Create new git repository for fundraising stats tools - https://phabricator.wikimedia.org/T151982#3144155 (awight) Open>Resolved
[16:53:47] Fundraising-Backlog: Develop shared understanding of definition of Tech Debt - https://phabricator.wikimedia.org/T161817#3144156 (ggellerman)
[16:55:03] Fundraising-Analysis, Fundraising-Backlog: Shepherd wikimedia/fundraising/stats repo a bit - https://phabricator.wikimedia.org/T161818#3144171 (awight)
[16:55:14] Fundraising-Analysis, Fundraising-Backlog: Shepherd wikimedia/fundraising/stats repo towards usefulness - https://phabricator.wikimedia.org/T161818#3144184 (awight)
[16:56:05] (PS6) Awight: Form should not validate if manual errors are present [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/341592 (https://phabricator.wikimedia.org/T98447)
[16:56:19] (PS6) Awight: Remove deprecated function [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/341721
[17:00:24] Fundraising-Backlog: Develop shared understanding of definition of Tech Debt - https://phabricator.wikimedia.org/T161817#3144217 (ggellerman) a:ggellerman
[17:00:44] fr-tech: The ripest fruit falls first.
[17:00:44] -- William Shakespeare, "Richard II"
[17:00:44] -- discuss.
[17:02:05] i'll stop by in a few
[17:10:49] agile development: https://i.redd.it/etxk36gkujoy.gif
[17:12:08] I love the Nick Park stuff! Mari and I discovered Shaun the Sheep just last year...
[17:13:15] (Abandoned) Awight: Bump version: 1.0.1-rc1 [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345273 (owner: Awight)
[17:13:32] i used to watch that wallace and grommet a lot as a kid
[17:13:44] So that's what happened...
[17:13:58] :S
[17:15:48] awight: so 'ensure latest' will upgrade the package, but doesn't implicitly run apt-get update
[17:16:02] hehe
[17:16:14] no autorequire, huh
[17:16:35] i'm sure there's a way to make it update but i'd want to ask jeff first if he wants that
[17:18:46] require => Exec['apt-update']
[17:19:20] although nothing else in the repo does that.
[17:19:50] ah, "subscribe =>" is better
[17:19:57] but still, we're out on a limb
[17:20:08] i love how puppet encodes simple commands into slightly unsettling fake versions of such
[17:20:18] apt-update, gtfo
[17:20:42] it's the name of something in the apt module. but yes it's insanity
[17:20:51] also how it helpfully omits '/files' from file paths because that is such a bother to type
[17:21:10] i would much rather type uncanny non existent path names
[17:22:09] anyway there are about 2 million ways i could update this package
[17:22:23] agreed, the template() syntax is blood-curdling
[17:22:35] no rush, I'm happy tinkering with the sequential execution patch
[17:25:06] (PS5) Awight: [WIP] Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035)
[17:25:30] (CR) jerkins-bot: [V: -1] [WIP] Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035) (owner: Awight)
[17:31:33] (PS6) Awight: [WIP] Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035)
[17:32:15] (CR) jerkins-bot: [V: -1] [WIP] Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035) (owner: Awight)
[17:39:19] awight: the thing is, i don't see any "absent" activity in puppet git
[17:39:30] makes me think that is not jeff's preferred approach
[17:39:37] oh hell no it's a terrible thing to do
[17:39:47] why are you looking for that?
[17:40:05] i'm trying to deduce how he would want me to upgrade this package
[17:40:19] i can easily just remove it on barium and let puppet replace it
[17:40:32] apt update ; ensure latest is a no-go?
[17:40:53] lessee what we do with the other custom packages...
[17:41:06] i feel like apt-update is a little drastic?
[17:41:12] maybe we can just update frack?
[17:41:33] simple ensure=>present it seems
[17:41:41] So I don't understand what would ever trigger an update.
[17:42:05] yeah, well it looks to me like nothing does
[17:42:15] which is why i'm hesitant to add that
[17:42:26] makes me think there's a separate channel
[17:47:00] awight: why a single timeout for the command group?
[17:48:48] ejegg: cos I think that's right :p
[17:49:10] could be a good idea, i'm just still thinking in the Jenkins style of a job specifying a next job
[17:49:27] In that case, the commands are /usr/bin/run-job job2 and the subjob can have its own timeout
[17:50:20] ok, that works
[18:12:58] rsync error: unexplained error (code 255)
[18:13:01] that's nice
[18:13:59] wheee!
[18:14:06] so is that 8-bit -1? :p
[18:14:26] Fundraising Sprint Far Beer, Fundraising Sprint Gondwanaland Reunification Engine, Fundraising-Backlog, FR-Smashpig, and 2 others: Handle iDEAL push notifications - https://phabricator.wikimedia.org/T161153#3144569 (Ejegg)
[18:14:27] fr-tech: ejegg: wanna spend a minute grooming the scope for this beast?
[18:15:19] awight: sure, you mean the sequence in particular?
[18:15:27] no the whole deal
[18:15:40] I want to move the goalposts closer
[18:16:35] i'm just going to manually update these packages
[18:16:41] tyler said that's normal
[18:16:51] and i'm like wow
[18:16:59] :) ty that's amusing
[18:17:00] such scaffold. many abstraction.
[18:17:00] awight aren't we basically there? seems like the deb packaging is the tough part
[18:17:07] and yet this simple case isn't covered
[18:17:11] ejegg: naw deb is solved
[18:17:22] awight: don't say that!
[18:17:24] ejegg: look at unchecked boxes: https://phabricator.wikimedia.org/T161569
[18:17:27] it will break again
[18:17:35] * awight knocks on wood
[18:18:33] we can check off 'has some kind of failmail'
[18:18:46] and the log boxes
[18:18:50] I was hoping to check things once they're verified, though
[18:18:53] also, the service user
[18:18:56] ah, I see
[18:19:15] I'll count even dev box verification
[18:19:56] I don't think we solved service user actually. You can still run-job as your own user, with devastating consequences.
[18:20:06] e.g. file perms shot
[18:20:28] hmm? should only run as jenkins...
[18:20:30] awight: oh, huh, even with setuid bit?
[18:20:39] it doesn't have setuid
[18:20:51] In fact, AFAIK we don't even have a sudo wrapper
[18:20:57] ohwai
[18:21:04] should we just make it -x?
[18:21:09] oh, what was our plan for devs running jobs then?
[18:21:20] currently, it's sudo -u jenkins /usr/bin/run-job...
[18:21:21] 744?
[18:21:28] that is whitelisted in sudoers
[18:21:40] 744 is a nice quick hack
[18:22:00] k, I'll do a quick patch to bail if not service user
[18:22:01] I like it more than app code to lock out !jenkins
[18:22:10] ejegg: cwd is saying we can just do it with -x
[18:22:17] oho, true
[18:22:23] cos patch will be hellof annoying for devs
[18:22:24] yep, better
[18:22:53] cwd: would that be a puppet or .deb responsibility? puppet, I think?
[18:23:15] deb shouldn't assume a specific user, should it?
[18:23:42] That's a common install pattern, not that it's good
[18:23:51] awight: yeah i'd say puppet
[18:23:54] kk
[18:24:38] add-user jenkins # he's been with the family for decades
[18:24:39] notated in the task.
[18:24:46] harr
[18:24:59] nobody waiting on the inheritance, I see
[18:24:59] awight: yesterday you were saying there were process-control.yaml changes what needed done too?
[18:25:02] ejegg: ha ha
[18:25:14] cwd: yup, re-up from the .example
[18:25:20] gotcha
[18:25:25] and you might see some dirs that need puppetizing
[18:25:36] I tried to list those in the epic, T161569
[18:25:36] T161569: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569
[18:26:27] so, 'devs can kill jobs' is the last uncoded thing I see on the non-ops list
[18:26:44] 345062 master [WIP] --kill-job
[18:27:04] we want to make changes, I guess?
[18:27:12] cwd what version of node is in jessie?
[18:27:14] Also: can we cut that from scope? IMO yes
[18:27:21] awight sure
[18:27:32] https://packages.debian.org/search?keywords=node
[18:28:03] ty
[18:28:05] What else can we take out?
[18:28:25] oops, https://packages.debian.org/search?keywords=nodejs
[18:28:39] ejegg: well, npm is available so i don't know if the nodejs package is super relevant?
[18:29:00] apt-get install npm && npm install n && n 0.8
[18:29:20] ah, ok. got a patch from hashar upgrading the dash to node 6 :)
[18:29:21] Cos this is for the dash, hence not quite production?
[18:29:23] i have no idea if there's a convention wrt the apt version of nodejs
[18:29:46] deb is still at 0.10
[18:29:51] ancient
[18:30:01] yerp
[18:30:16] i tend to think in this situation i'd trust the more recent code more than the ancient code
[18:30:25] with something like node that is so new in general
[18:30:34] freal
[18:31:22] awight: should jenkins own /usr/bin/run-job ?
[18:31:25] that seems a little nuts
[18:31:51] how abotu we just promise to not run it as a user other than jenkins and see what jeff thinks
[18:32:18] some kind of mollyguard would be nice
[18:32:31] to avoid scrambling permissions
[18:33:08] What did jeff do with drush?
[18:33:39] we have a sudo wrapper script, and the binary is simply hidden.
[18:33:46] So not really comparable
[18:33:54] We could do something like alternatives
[18:33:56] awight: looks like it runs as www-data
[18:34:01] drush
[18:34:15] move run-job to run-job.real, and run-job is a sudo script
[18:34:16] right, it has to b/c crm template dir permissions
[18:34:37] awight: yeah there is a ton of that in /u/l/bin
[18:35:02] i'll dig up the puppet part
[18:36:13] oh? I don't see it
[18:38:10] ejegg: No more scope cuts that you can think of?
[18:38:22] IMO, "code already written" doesn't matter
[18:38:26] We can always circle back
[18:38:45] kill job seems fine to put off
[18:38:47] mebbe " Log actions and errors to syslog."?
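On the 18:14 "8-bit -1?" question: POSIX hands only the low 8 bits of a process's exit value to its parent, so a program that exits with -1 is reported as 255. A minimal Python sketch, assuming a POSIX host (Windows reports exit codes differently); it only demonstrates the truncation and makes no claim about why rsync produced 255 here.

```python
import subprocess
import sys


def exit_status(code: int) -> int:
    """Return what a POSIX parent sees: only the low 8 bits of the exit value survive."""
    return code & 0xFF


# A child that calls sys.exit(-1) is reported to its parent as 255.
child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(-1)"])
print(child.returncode, exit_status(-1))
```

Any status from 1 to 255 is non-zero, and so is the negative returncode that `subprocess` uses for a child killed by a signal. A plain "non-zero means failure" check therefore catches both cases.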
[18:38:57] awight: there are other scripts that sudo in there i mean
[18:39:06] awight that's not covered by the python logging?
[18:39:22] ejegg: maybe. but if it doesn't work, are we going to fix it or forge ahead.
[18:39:28] so members of www-data can sudo to that user
[18:39:34] not sure how we want to relate that to jenkins
[18:39:49] awight forge ahead works for me, as long as we get the job output
[18:39:51] we could add everyone to the jenkins group
[18:39:54] kk
[18:39:55] but that seems silly
[18:40:16] ejegg: How about failmail?
[18:40:23] cwd: Doesn't sudoers already have that magic?
[18:40:51] awight: what magic?
[18:41:06] oh the jenkins thing
[18:41:07] sudoers already defines who can run run-job, so what are we determining right now?
[18:41:10] yah
[18:41:17] ah yes
[18:41:21] fr-tech can sudo jenkins
[18:41:35] yeah that works perfect
[18:42:32] i'd be comfortable chgrp jenkins /usr/bin/run-job and chmod 754
[18:42:51] how's about that?
[18:43:59] That looks like a good interim solution
[18:44:07] We'll still have to manually sudo, but that's fine
[18:44:21] we can make a wrapper script
[18:44:27] see /usr/local/bin/*
[18:44:33] corner-cutting R us
[18:45:40] fr-tech: k I've updated T161569 with my understanding of our shared understanding.
[18:45:40] T161569: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569
[18:46:04] ejegg: I'd still like to drop anything failmail for MVP
[18:46:16] We already get spamballed like there's no tomorrow
[18:48:09] hmm, i think we need that or an indication of jobs that failed last time htey ran in the status list
[18:48:31] otherwise things like SP export or audit downloads will fail silently
[18:48:46] and we might not know for weeks
[18:48:59] Let's tighten the feature at least
[18:49:07] so, "failmail when job exits with non-zero"?
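For reference, here is what the modes floated in this thread look like when decoded with Python's standard `stat` module: the 744 quick hack from 18:21, and the `chgrp jenkins` plus 754 plan from 18:42. The sketch only interprets octal modes. It assumes run-job is owned by root and group-owned by jenkins as proposed, and it ignores root's override and ACLs.

```python
import stat


def describe_mode(octal_mode: int) -> str:
    """Render permission bits the way `ls -l` shows them for a regular file."""
    return stat.filemode(stat.S_IFREG | octal_mode)


def can_execute(octal_mode: int, *, is_owner: bool, in_group: bool) -> bool:
    """Apply the one class of bits the kernel checks: owner, else group, else other."""
    if is_owner:
        return bool(octal_mode & stat.S_IXUSR)
    if in_group:
        return bool(octal_mode & stat.S_IXGRP)
    return bool(octal_mode & stat.S_IXOTH)


for mode in (0o755, 0o744, 0o754):
    print(oct(mode), describe_mode(mode))
```

Under 754, members of the jenkins group keep the execute bit and everyone else loses it. Under 744, only the owner can execute. Since run-job is presumably a Python script, dropping the execute bit works as a mollyguard rather than a security boundary, because anyone who can read the file can still hand it to the interpreter.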
[18:49:18] sure, that's good enough
[18:49:20] kk
[18:50:09] ok, after lunch I'll try to finish the command sequence patch
[18:50:31] do you feel like T161751 has been proven well enough to tick off?
[18:50:31] T161751: WikibaseQualityExternalValidation test failures - https://phabricator.wikimedia.org/T161751
[18:50:40] uh. T171571
[18:50:47] stashbot: yeah you
[18:50:47] See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[18:50:54] T161571
[18:50:54] T161571: process-control streams to log - https://phabricator.wikimedia.org/T161571
[18:51:14] awight works on my machine!
[18:51:35] [tick]
[18:51:46] ok perms thing works on vb, pushing to irl
[18:52:51] cwd: fyi also note that the /etc file is .. and not in /etc/fundraising now
[18:53:04] (maybe a good use of ensure=>absent ;)
[18:53:18] /etc/process-control.yaml?
[18:53:44] yep
[18:54:02] i dont' see that in the example conf?
[18:54:20] hmm?
[18:54:26] oh good point, that should be documented.
[18:54:29] ty!
[18:55:17] awight: puppet still has /etc/fundraising...
[18:55:37] you could ensure=>absent that file, and copy the declaration
[18:55:39] what looks for it where?
[18:55:45] it needs to be in /etc
[18:55:49] that's hardcoded
[18:55:56] gotcha
[18:56:11] I considered symlinking, but Jeff convinced me that we can abandon my /etc/fundraising BSD-hangover fetish.
[18:56:33] cwd: fyi T161544
[18:56:33] T161544: Move all /etc/fundraising config into /etc, drop the subdirectory - https://phabricator.wikimedia.org/T161544
[18:56:39] eventually...
[18:57:01] The strongest argument IMO is that we're building tools that will be used by non-fundraising groups
[18:57:01] hehe
[18:57:10] yeah
[18:57:14] that would be the hope
[19:01:25] wikibugs: where have you been all my life?
[19:02:05] Our bots are not very social
[19:02:37] back in a few
[19:15:52] ejegg: i notice log level DEBUG is specified twice in the new file
[19:15:56] are they both necessary?
[19:16:46] in fact the second one messes with my auto indent
[19:16:52] i wonder if that is a mistake?
[19:21:33] cwd oops, might be!
[19:22:30] hmm, let's see, different contexts
[19:23:00] i'm trying it on vb as is
[19:25:39] cwd looks kosher
[19:25:49] both loggers and handlers take a level setting
[19:26:08] so you could have one logger at debug level
[19:26:38] with a syslog handler at warning level and a file handler at debug level
[19:28:44] ah cool
[19:28:46] got it
[19:29:15] ok i'm just giving the new conf a once over
[19:29:21] everything looks to have worked
[19:29:51] i am finally getting to the point where i can make puppet behave in a quasi predictable manner
[19:30:27] yay!
[19:32:37] ejegg: did we ever figure out a way to deploy the jobs? i see /var/lib/p-c but i do not see that on barium
[19:33:17] wondering if adam manually added the files
[19:33:29] I though I saw Jeff_Green say he'd done something for that
[19:33:31] lessee now
[19:33:48] yeah, we were talking about making the deploy script push them somewhere
[19:36:42] i see them at /srv
[19:38:32] hmm, just deploying like a regular project then
[19:39:59] k, fundraising_code_update now has a project called process-control
[19:40:17] yeah
[19:40:25] it must be deploying to /srv/process-control
[19:40:58] ejegg: ah yeah there's a dir in localsettings
[19:41:07] yep, just found it myself!
[19:41:12] looks like it will go to /srv/ not /var/lib tho
[19:42:04] k, so the /etc/ finle needs updating
[19:43:23] cool, well i can make the change and we can add to .default whenever
[19:43:54] default's probably OK as is - /srv is a local-ism, right?
[19:44:40] heh i'm not sure of the scope of that convention
[19:44:53] it seems like people put in /srv what they used to put in /var/www
[19:45:07] and /var/www is painfully uncool now
[19:45:28] oops, stdup
[19:46:07] 1~k. still feels like a quirky place to put job descriptions if you're not using fundraising_code_update
[19:47:20] cwd - I saw a note about php 5.6 vs hhvm - Civi doesn't officially support hhvm - but it does officially support php7. Having said that, Civi test run on the earliest version it supports, not all of them due to resources, & I think most of the hhvm intolerances also apply to php 7?
[19:48:51] eileen: i think mostly but 7 has a few of its own...my feeling on this issue would be to stick with 5.6 but switch to fpm. what do you think?
[19:50:12] afaik everything would "just work" and we'd get a small to medium performance boost
[19:50:29] depending on the code being run
[19:54:48] yeah - 5.6 is a bit safer - safety in numbers
[19:55:11] php7 gives some of the benefits (speed) and has reasonable numbers
[19:55:40] you can see the spread over versions here https://stats.civicrm.org/?tab=technology
[19:56:04] for 4.7 sites:
[19:56:19] 7.1: 20
[19:56:24] 7.0 367
[19:56:34] 5.6 3,068
[19:56:41] 5.3 129
[19:56:58] that's a pretty narrow bell
[19:57:01] but of course 7.0 & 7.1 are growing & 5.3 is shrinking…
[19:57:08] 5.5 1,322
[19:57:14] 5.4 373
[19:57:35] (not quite as narrow - but yeah - well over 50% of 4.7 users are on 5.6
[20:00:09] wow!
[20:00:28] well yeah i definitely feel safest being just one of the big crowd
[20:24:03] cwd: +1 to backscroll :)
[20:24:15] kk rad, I'll try running stuff.
[20:24:34] boom.
[20:25:19] good boom or bad boom?
[20:25:38] I cannot hear
[20:26:12] :(
[20:26:26] What commit did you build?
[20:26:33] just so's I can tag it in git
[20:27:27] b4ca767669343bef136799cf9086585de198ccf0 ?
[20:29:00] cwd: ^
[20:29:35] cwd: nvm I got it: af8f191ef3cbaa9bd06789268849fddc593a4de1
[20:29:44] yep sorry
[20:29:48] makin coffee
[20:29:56] np!
[20:46:45] cwd: okay, it's cos fake job is so fake.
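The 19:25 explanation (both loggers and handlers take a level, so one DEBUG logger can feed a WARNING-level syslog handler and a DEBUG-level file handler) can be sketched with the standard `logging.config.dictConfig`, which accepts the same dictionary shape a YAML logging section loads into. The real config file isn't shown in the log. To keep the sketch self-contained, two in-memory `StringIO` streams stand in for the syslog and file handlers, and the logger name `process-control` is an assumption.

```python
import io
import logging
import logging.config

syslog_stand_in = io.StringIO()  # stands in for a SysLogHandler
file_stand_in = io.StringIO()    # stands in for a FileHandler

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"plain": {"format": "%(levelname)s %(message)s"}},
    "handlers": {
        # Each handler applies its own level after the logger has let a record through.
        "syslog": {"class": "logging.StreamHandler", "level": "WARNING",
                   "formatter": "plain", "stream": syslog_stand_in},
        "file": {"class": "logging.StreamHandler", "level": "DEBUG",
                 "formatter": "plain", "stream": file_stand_in},
    },
    "loggers": {
        # The logger's level is the first gate: DEBUG lets everything reach the handlers.
        "process-control": {"level": "DEBUG", "handlers": ["syslog", "file"]},
    },
})

log = logging.getLogger("process-control")
log.debug("spawning child")
log.warning("job exited non-zero")

print(repr(syslog_stand_in.getvalue()))  # only the WARNING record
print(repr(file_stand_in.getvalue()))    # both records
```

Two DEBUG settings in one file are therefore not redundant. If the logger's level were raised to WARNING, the file handler's DEBUG setting would never see a debug record.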
[20:46:52] deploying some real now
[20:46:58] :)
[20:47:05] we should make it fail softer too
[20:48:01] Fundraising-Backlog: process-control should handle bad YAML syntax - https://phabricator.wikimedia.org/T161858#3145252 (awight)
[20:53:55] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3145280 (awight)
[20:55:33] I'll try the audit parser now
[20:56:39] hahaha, immediate return with no output.
[20:56:55] any syslog?
[20:56:58] nope
[20:57:03] :P
[20:57:34] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3145281 (awight)
[20:57:56] hmm. add a console logger to config?
[20:58:02] oh wait, /var/log/p-c
[20:58:23] ejegg: I'm thinking, add the console logger whenever os.isatty()
[20:58:34] nice
[20:58:34] That way you see logging about "no console logger configured" :)
[20:59:12] ejegg: hey it's that bug you were anticipating.
[20:59:24] ooh, which one?
[20:59:28] overriding the environment clears out things we assumed would be there
[20:59:37] ohhh yeah
[20:59:46] child jobs don't like the barren wastes
[21:00:12] Seems like an assumption we shouldn't make, regardless
[21:00:23] makes me wonder what we've set jenkins env to
[21:00:28] to make things work
[21:01:54] awight env is full of nice things
[21:02:05] like the machine telling us its locale
[21:02:07] None of which we should trust ;-)
[21:02:12] true
[21:02:17] and printer paper size
[21:02:22] yeah I agree with your initial suggestion that we should merge
[21:02:23] hahaha
[21:02:38] and thousands of arcane characters explaining the "termcap"
[21:03:00] *.yuv=01;35
[21:03:10] heads-up, job running!
[21:03:16] woohoo!
[21:03:20] darn, I missed the chance to --list-jobs it
[21:03:28] the paypal audit?
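The 20:58 idea of adding a console logger whenever the process is attached to a terminal might look like the sketch below. The helper name is hypothetical. It calls the stream's own `isatty()`, which for a real file object reports the same thing as `os.isatty()` on its descriptor. Runs launched from cron or Jenkins have no terminal, so they keep writing only to the configured handlers.

```python
import io
import logging
import sys


def add_console_handler_if_interactive(logger: logging.Logger, stream=None) -> bool:
    """Attach a StreamHandler only when `stream` is an interactive terminal.

    Hypothetical helper: defaults to sys.stderr, which is what someone running
    run-job by hand is watching. Returns True if a handler was added.
    """
    stream = sys.stderr if stream is None else stream
    if not stream.isatty():
        return False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return True


if __name__ == "__main__":
    log = logging.getLogger("process-control")
    log.setLevel(logging.INFO)
    attached = add_console_handler_if_interactive(log)
    log.info("console handler attached: %s", attached)
    # A StringIO is never a terminal, so nothing is attached here.
    print(add_console_handler_if_interactive(logging.getLogger("demo"), io.StringIO()))
```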
[21:03:43] try the gc audit, it'll stick around for a while :P
[21:03:59] pp audit downloader checks out.
[21:04:05] nice !
[21:04:10] Should we build a spreadsheet for this migration?
[21:04:11] cwd: ^
[21:04:29] awight: did you just set LOGNAME?
[21:04:37] hehe yup
[21:04:40] nice
[21:04:45] not one to suffer from pride
[21:05:00] oh, nice fail mail
[21:05:05] ejegg: Holler if you're jumping on any of these and I'll do likewise
[21:05:20] cwd: F* yeah!
[21:05:21] awight: yeah a migration etherpad sounds nice
[21:05:24] I'll do env merge
[21:05:35] i can check on the cron-generate thing
[21:05:55] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3145321 (awight)
[21:06:03] tyty--I'll finish the command sequencing
[21:06:19] (last codewise checkbox!)
[21:07:16] heh maybe i'll just run the generator for today
[21:07:34] There's time to do that one right...
[21:07:36] since the obvious fix is whitelisting sudo :P
[21:07:46] yeah i'd like to know jeff's opinion of that
[21:07:50] ha!
[21:08:11] I'll run the orphan rectifier for fun
[21:08:35] !log disable ingenico orphan rectifier (jenkins)
[21:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:08:46] awight: is having the schedule commented out effectively turning off the job?
[21:09:14] yeah
[21:09:21] although we need to talk about how that should really work
[21:09:41] like a disable knob which *really* prevents a job from being run one-off or anything
[21:09:48] really really
[21:09:55] ingenico_orphan_rectifier - Ingenico Orphan Rectifier {pid: 3375, status: running}
[21:10:07] * awight warms heart by garbage can fire
[21:10:25] yikes--it's buffering all logs
[21:10:28] This will be fun
[21:11:11] we knew that would happen right?
[21:11:33] yeah
[21:11:42] the next commit on master is supposed to fix that
[21:12:11] word
[21:12:41] only 16G ram on this box
[21:12:44] better watch out!
[21:13:26] (not even joking)
[21:15:37] heh so there is an example of a job printing to stderr that we were ignoring before
[21:15:59] We... don't know yet
[21:16:09] Maybe these drush buggers
[21:16:10] cwd think it only stderrored b/c of missing env
[21:16:12] I'm running one now
[21:16:32] ejegg: the orphan rectifier?
[21:16:41] there is a new fail mail
[21:16:47] cwd oh I thought you meant the audit download
[21:16:49] !log reenabled ingenico orphan rectifier (jenkins)
[21:16:51] it is much less drenched in PII than before
[21:16:55] which is nice
[21:16:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:05] well it depends on the job
[21:17:18] we (ejegg and I) decided yesterday to drop the stderr stuff
[21:17:20] mail delivery is slow down here in the jungle
[21:17:32] Instead, all stdout+stderr goes to the logfile
[21:17:40] p-c only cares about non-zero exit code.
[21:17:52] Any other failmail is sent by application logic and we don't want to know.
[21:18:08] did the rectifier exit non-zero? looks like a successful run to me...
[21:18:11] wat, is orphan rectifier putting EVERYTHING on stderr?
[21:18:15] On a related topic, I tried to convince Jeff that we should look at integrating with icinga, but I think you were there for that
[21:18:35] cwd we haven't changed the code to allow stderr without failmail yet
[21:18:39] ejegg: hahaha I was thinking we might find some of these specimens. Yeah that must be drush/watchdog/syslog_watchdog
[21:18:43] ah right on
[21:19:17] stdout from the rectifier:
[21:19:17] * No output *
[21:19:41] awight: so you think drush just pipes everything to stderr?
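The 21:17 decision (all stdout and stderr go to the job's logfile, and process-control only reacts to a non-zero exit code), together with the "buffering all logs" problem seen at 21:10, suggests handing the child the log file's descriptor directly instead of reading pipes in Python. A minimal sketch under those assumptions. `run_job_to_log` and the log path are hypothetical and not process-control's actual code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def run_job_to_log(argv: Sequence[str], log_path: Path) -> int:
    """Run one command with its stdout and stderr appended to a single log file.

    The child inherits the log file's descriptor for both streams, so no output
    passes through (or is buffered in the memory of) this Python process.
    """
    with open(log_path, "ab") as log:
        completed = subprocess.run(list(argv), stdout=log, stderr=subprocess.STDOUT)
    # Per the decision above, only the exit code decides whether to raise an alarm.
    return completed.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "job.log"
        noisy = "import sys; print('to stdout'); print('to stderr', file=sys.stderr); sys.exit(3)"
        print(run_job_to_log([sys.executable, "-c", noisy], log_path))
        print(log_path.read_text())
```

Because both streams share one descriptor, a chatty job such as a drush command writing everything to stderr costs disk space rather than RAM. A job that only logs to stderr is no longer mistaken for a failure either.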
[21:20:08] yeah at least the way we use it
[21:20:17] nice--I'd say the orphan rectifier checks out, too
[21:20:30] any reason at this point, besides excessive spam, that we couldn't move this job over?
[21:20:50] Only that we're gonna be iterating on the tool
[21:21:01] yeah
[21:21:05] (PS1) Ejegg: Merge environment, don't clobber [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345768
[21:21:09] let's set a target then
[21:21:12] awight: oh damn, we use watchdog for a lot of stuff
[21:21:14] Maybe we can start migrating jobs over on the oldbox
[21:21:15] cause we will be iterating for eternity
[21:21:22] well, good thing we're streaming that to the log!
[21:21:25] and then we blast experimental tool releases to newbox?
[21:21:32] I don't know if it makes sense
[21:21:49] ejegg: fwiw, this .deb didn't have the streaming code yet
[21:21:53] yeah i could see that
[21:22:08] ah, ok
[21:22:24] I'll get rid of that buffering to Queues
[21:22:28] oh ty
[21:22:41] actually--if it works, leave it for now?
[21:22:54] command sequence is all that same code
[21:22:57] sort of
[21:23:01] k
[21:25:26] (CR) jerkins-bot: [V: -1] Merge environment, don't clobber [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345768 (owner: Ejegg)
[21:25:41] don't we not want more than one server running this program at once?
[21:26:04] ugh, something in my output fetcher is fragile
[21:26:08] is there rigorous resource locking that im not aware of?
[21:26:48] cwd: It should be okay as long as it's not running the same jobs
[21:26:56] Does seem like a screwy migration plan though
[21:26:59] but wouldn't it be?
[21:27:00] Any other suggestions?
[21:27:15] like if cron-generate worked wouldn't all 3 servers be running all the same jobs?
[21:28:06] yeah we would have to stop stuff
[21:28:51] no worries about running civi in multiple places at once?
[21:28:51] should we maybe just target one server for deploy of p-c?
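On the resource-locking question: a common per-host guard is an exclusive, non-blocking `flock` on a per-job lock file. This is a sketch under assumptions. `JobLock` and the lock-directory layout are hypothetical and not part of process-control, and because `flock(2)` locks are local to one machine, this guard cannot stop a second server from running the same job. That still needs something like the "target one server" decision.

```python
import fcntl
import os


class JobLock:
    """Hypothetical per-host lock: refuse to start a job whose previous run
    on this machine still holds the lock.

    flock(2) locks are local to one host, so this cannot stop another
    server from running the same job at the same time.
    """

    def __init__(self, lock_dir, job_name):
        self.path = os.path.join(lock_dir, job_name + ".lock")
        self.fd = None

    def __enter__(self):
        self.fd = os.open(self.path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            # LOCK_NB: fail immediately instead of waiting for the other run.
            fcntl.flock(self.fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(self.fd)
            self.fd = None
            raise RuntimeError(f"{self.path} is held: job is already running") from None
        return self

    def __exit__(self, exc_type, exc, tb):
        fcntl.flock(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
        self.fd = None
        return False
```

The kernel drops the lock when the holding process dies, so a crashed run does not leave a stale lock behind, unlike a plain "pidfile exists" check.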
[21:29:09] ejegg: good question
[21:29:36] Fundraising-Backlog, fundraising-tech-ops, Epic: [Epic] Basic process-control good enough to run all CRM jobs - https://phabricator.wikimedia.org/T161569#3145395 (awight)
[21:29:38] i imagine most of the writes are fired off through the UI
[21:29:45] besides this stuff
[21:30:00] there is some weird filesystem caching
[21:30:12] might just be templates
[21:30:23] maybe we're just getting lucky cause nobody uses the alternate servers
[21:30:34] (PS2) Ejegg: Merge environment, don't clobber [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345768
[21:31:54] (CR) jerkins-bot: [V: -1] Merge environment, don't clobber [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345768 (owner: Ejegg)
[21:40:06] ejegg|biab: oh. The getLogger(self.slug) is not playing nice with my sequence code
[21:43:55] ejegg|biab: no, that's not it
[21:44:50] ooh the streams are all in their own thread, and writing to the same file.
[21:45:46] why is there more than one stream?
[21:46:09] (stdout,stderr) x three subprocesses
[21:46:34] sorry, why 3 subprocesses?
[21:46:38] I'd rather do some file descriptor magic and connect the pipes directly, or with a thin filter
[21:46:42] see the privchan
[21:46:59] The feature I'm working on is T161035
[21:46:59] T161035: process-control will need to run some jobs sequentially - https://phabricator.wikimedia.org/T161035
[21:47:18] So if the first command fails, we don't run the next
[21:47:31] oh, you had 3 jobs there?
[21:47:39] for my test thing yeah
[21:47:43] same job chained 3 times/
[21:47:44] ?
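The sequencing rule for T161035 (if the first command fails, don't run the next) needs no threads: it is a plain loop over blocking calls. This is a minimal sketch, assuming each command is an argv list and all commands append to one already-open log file. `run_sequence` is a hypothetical name, not the code under review in change 345429.

```python
import subprocess


def run_sequence(commands, log):
    """Run argv lists strictly in order, stopping at the first failure.

    `log` is an open binary file; every command appends its combined
    stdout+stderr to it. Returns 0 if all commands succeed, otherwise the
    exit code of the first command that failed (later ones never start).
    """
    for argv in commands:
        # subprocess.call blocks until the command exits, so nothing overlaps.
        returncode = subprocess.call(argv, stdout=log, stderr=subprocess.STDOUT)
        if returncode != 0:
            return returncode
    return 0
```

With the three-command test chain (`/bin/echo hello`, `/bin/ls`, `/bin/echo goodbye`), a failing `/bin/ls` would leave `goodbye` unrun and its exit code would be what gets reported.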
[21:47:55] 14:42 < awight> - /bin/echo hello
[21:47:55] 14:42 < awight> - /bin/ls
[21:47:55] 14:42 < awight> - /bin/echo goodbye
[21:48:01] sorry it's opaque cos not checked in
[21:48:42] chained jobs almost seems worthy of a new thing
[21:48:55] rather than making this thing multi threaded
[21:49:03] it's not multithreaded for chained jobs
[21:49:10] they are run in serial
[21:49:34] The multithreading seems to be the only way to stream stdout* from child processes in Python :(
[21:49:52] See https://gerrit.wikimedia.org/r/345424
[21:50:33] this is the only way to get partial output if we kill the parent process?
[21:51:16] I think there are other ways to do that
[21:51:32] but we also want to stream the output to the logfile so we can see what's happening in realtime
[21:51:40] Feel free to push for !MVP
[21:52:00] I'm on the fence, all I care for now is that we never lose loglines
[21:52:17] So streaming seemed like a good fix which gets us other nice properties
[21:52:21] so, if we kill the child process the python thread will still finish
[21:52:25] right?
[21:52:28] yes
[21:52:30] and write the log files?
[21:52:32] yep
[21:53:19] cwd: and awight mind if I add that jenkins matrix to the fr-tech folder?
[21:53:19] the inherent complexity of multithreading seems like a high cost for a relative edge case
[21:53:26] dstrine: Thanks!
[21:53:42] cwd: what about streaming the logs
[21:53:46] not exactly edgy
[21:54:03] well, does it matter for anything besides reading them in real time?
[21:54:51] no, it's just those two things: seeing them in realtime, e.g. when you start a long-running job; and being pretty sure that we're not losing large amounts of irreplaceable data
[21:55:26] you would, tailf the file?
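For reference, the thread-per-stream pattern being debated looks roughly like this when both pipes feed one shared logger. It is a sketch, not the code in change 345424: `run_streaming` and `_pump` are hypothetical names. Each reader thread loops until EOF, which arrives when the child exits or is killed, so lines the child already flushed still reach the log. `logging` handlers take a per-handler lock around each emit, so two threads writing through the same handler do not interleave within a line. The relative order of stdout and stderr lines is not guaranteed.

```python
import logging
import subprocess
import threading


def _pump(stream, logger, level):
    # readline() blocks only this thread; the loop ends at EOF, i.e. once
    # the child exits (or is killed) and its end of the pipe closes.
    for raw in iter(stream.readline, b""):
        logger.log(level, raw.decode("utf-8", "replace").rstrip("\n"))
    stream.close()


def run_streaming(argv, logger):
    """Run argv, forwarding stdout at INFO and stderr at WARNING as lines arrive."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threads = [
        threading.Thread(target=_pump, args=(proc.stdout, logger, logging.INFO)),
        threading.Thread(target=_pump, args=(proc.stderr, logger, logging.WARNING)),
    ]
    for t in threads:
        t.start()
    returncode = proc.wait()
    for t in threads:
        t.join()  # make sure the last lines are logged before returning
    return returncode
```

Two threads are needed here only because there are two pipes. Reading one pipe to EOF while the other fills up can deadlock the child.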
[21:55:40] uh yeah probably
[21:57:16] I really will write a curses interface for this :)
[21:57:38] i guess i think, it sounds like a useful feature for development that would probably be rarely used afterwards
[21:57:54] we mostly rely on fail mail to see that something is wrong
[21:57:58] I can't quite agree, I watch lots constantly
[21:58:01] *logs
[21:58:05] in jenkins?
[21:58:07] yep
[21:58:13] like many times per day
[21:58:23] Especially when changing code
[21:58:26] well i can't argue with that
[21:58:53] The realtime thing is probably less important than I think, though
[21:59:10] Cos I will rarely kill the job or do anything realtimey in response to what I'm seeing
[21:59:20] Probably just a fetish
[21:59:21] so what this thing does is fork a new process that attaches to stdout?
[21:59:25] :(
[21:59:38] ejegg|biab and I were talking about simplifying a bit
[21:59:43] but yeah, for now.
[21:59:48] eh i'm not trying to talk you out of thinking it's useful
[21:59:58] just wondering if there's a less drastic thing that's possible
[22:00:05] I think we might be able to just plug something into Popen's stream args
[22:00:22] multithreading seems like it ^2 the complexity of the whole thing
[22:00:43] when what we want to do is run things strictly in serial
[22:00:45] plus, the Queue stuff is actually sending the entire logs over IPC, which makes me wish I were blind
[22:01:14] yeah i'm pretty sure you can do this with Popen
[22:01:21] we'd lose whatever is cool about subprocess
[22:01:26] We did comb the obvious SO threads...
[22:01:33] subprocess.Popen so no worries
[22:05:03] i'm sure this is old news http://stackoverflow.com/questions/2715847/python-read-streaming-input-from-subprocess-communicate/17698359#17698359
[22:05:14] it looks like they are just setting a tiny buffer?
[22:05:18] and dumping it as it goes?
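One less drastic variant of the readline-loop idea in that Stack Overflow thread: pass `stderr=subprocess.STDOUT` to Popen so the child writes both streams into a single pipe. One blocking `readline()` loop in the main thread is then enough, with no threads and no Queue. This is a sketch under assumptions: `run_merged` is a hypothetical name, and the ISO timestamp prefix is just one way to stamp lines. Lines arrive in the order the child wrote them, but a child that block-buffers its own stdout can still emit stdout text later than its stderr.

```python
import datetime
import subprocess


def run_merged(argv, log):
    """Stream a child's combined output into `log` line by line, no threads.

    stderr=STDOUT makes the child write both streams into one pipe, so a
    single blocking readline() loop cannot deadlock on a second pipe.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for raw in iter(proc.stdout.readline, b""):
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        log.write(f"{stamp} {raw.decode('utf-8', 'replace')}")
        log.flush()  # make each line visible to a `tail -f` reader right away
    proc.stdout.close()
    return proc.wait()
```

`readline()` blocking is harmless here because there is only one stream to wait on. What this design gives up is the ability to tell stdout and stderr apart afterwards.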
[22:07:32] looks blocking, I'm not sure
[22:07:49] We're def going to rewrite this, so it's great to collect snippets that might work
[22:07:56] blocking of what?
[22:08:25] that readline() is blocking IO, I'm not sure what will happen there
[22:08:36] it'll certainly be impossible to get stdout and stderr in sync
[22:10:11] maybe we should have written this in bash
[22:10:46] this logging thing is a crappy problem to have
[22:10:49] indeed
[22:11:11] i'll have to ask tyler when he's around
[22:11:13] Python has just a few warts but I feel like I'm up against them every time
[22:11:26] i hear that
[22:11:42] the language itself seems fine but the std libs are all wacky
[22:13:32] awight: https://docs.python.org/2/library/subprocess.html#subprocess.Popen
[22:13:39] this says default behavior is unbuffered?
[22:14:05] That sounds familiar
[22:15:35] have we tried naming the log file in there instead of PIPE?
[22:15:50] cwd: That was something ejegg|biab and I were talking about, yeah
[22:16:01] I think that's a big simplification
[22:16:07] the docs make it sound like the behavior we want is the default
[22:16:12] Also, we don't need to collect stdout and err so the Queue is not needed.
[22:16:13] but it could be a big disappointment
[22:16:35] I'm hoping for "good enough" in this round
[22:16:47] i say we give that a shot
[22:16:49] can't hurt
[22:17:16] maybe PIPE is beholden to some other buffer
[22:17:37] I just got it to work perfectly a second ago, it seems the problem is actually the logger layer
[22:18:12] ah ha
[22:27:24] man there are a lot of batshit solutions to problems that i can't tell if are this one or not
[22:28:21] awight: we got timestamps on logging?
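Handing Popen the log file itself instead of PIPE looks like this. Popen's `bufsize` only configures the pipe file objects that the parent creates when PIPE is used. In the Python 2 docs it defaults to 0 (unbuffered), and since Python 3.3.1 it defaults to -1 (buffered). When stdout is a real file there is no parent-side pipe object at all: the child writes straight to the inherited descriptor, and the only buffering left is the child's own stdio buffering, which a Python child can disable with `-u`. This is a sketch; `start_logged` is a hypothetical helper.

```python
import subprocess


def start_logged(argv, log_path):
    """Start argv with stdout and stderr appended directly to log_path.

    The child inherits a file descriptor for the log, so no PIPE, thread or
    Queue sits between it and the file; whatever the child has flushed is
    already on disk even if this parent process dies afterwards.
    """
    with open(log_path, "ab") as log:
        # Popen duplicates the descriptor into the child, so closing the
        # parent's copy when the with-block exits does not affect the child.
        return subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
```

This covers the "never lose loglines" requirement without threads. What it does not give you is a hook for per-line processing, such as prefixing timestamps, in the parent.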
[22:28:54] yeah, it's rad enough that I'm fighting to keep it
[22:28:59] prognosis is good
[22:29:09] (PS7) Awight: [WIP] Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035)
[22:29:14] so in theory we don't really need to weave err/out together
[22:29:20] thcipriani's idea, btw!
[22:29:27] right, it's a bonus
[22:29:29] he's full of em
[22:29:40] but i think he's out today
[22:30:09] but i have a feeling that when they deploy with scap they aren't waiting for it to finish to see the log ;)
[22:33:12] although that's probably a bunch of smaller commands
[22:33:15] not one huge one
[22:36:22] (CR) jerkins-bot: [V: -1] [WIP] Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035) (owner: Awight)
[22:36:52] awight: https://pypi.python.org/pypi/sarge/0.1.4
[22:38:25] (PS8) Awight: [WIP] Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035)
[22:39:28] That looks like just the thing, maybe stale though?
[22:39:55] (CR) jerkins-bot: [V: -1] [WIP] Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035) (owner: Awight)
[22:40:24] Very nice "block=" param, http://pythonhosted.org/sarge/tutorial.html#capturing-stdout-and-stderr-from-commands
[22:40:48] awight: 2015, not that bad in python terms
[22:41:30] i can't believe this is a problem that needs a library
[22:41:34] but w/e
[22:42:34] ejegg|biab: ^
[22:44:22] you know probably the whole problem with subprocess is windows compat
[22:44:33] there should be a "i'm never going to run this on windows" flag where you get all the good libs
[22:44:45] I bet that's the back story
[22:45:07] I've written a few cross-platform libs
[22:45:10] a million little compromises
[22:45:13] and here we are
[22:45:15] remember nmake?
[22:45:15] can't stream to a file
[22:45:37] heh yeah
[22:45:47] apparently there's some sort of ubuntu vm in windows now
[22:46:11] but i think it translates to system calls
[22:46:23] so maybe it's more like an ubuntu lxc
[22:47:03] (PS9) Awight: Run commands in sequence [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035)
[22:47:07] anyway an abstraction is only as good as the worst thing it abstracts
[22:47:24] .pay_me_now()
[22:47:40] avarice.setTrue()
[23:01:38] (PS1) Awight: Relax a test until we can fix the potential double-failmail. [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345781
[23:01:42] I'm a bad person ^
[23:02:56] ejegg|biab: I had to mangle OutputStreamer, the biggest issue turned out to be that we were adding the filehandler multiple times, the way I had initially integrated my patch.
[23:03:25] While I was floundering trying to figure that out, I trimmed some of the yak hairs we had gone over
[23:09:04] ejegg|biab: btw, code is all yours. I'm out for the night shortly and not back until Monday
[23:11:01] so long awight!
[23:11:26] You might especially like https://gerrit.wikimedia.org/r/345781
[23:11:38] reading backscroll
[23:11:40] in other news, https://docs.google.com/spreadsheets/d/1qfXaBmhW45qSbFRqgJs_zpeEt6ZIo2hrZcBtRhVXS9w/edit#gid=0
[23:11:54] and, command sequence is "good enough" IMO
[23:12:14] oh, the run-job run-job was doubling up the file handles?
[23:12:30] no, it was actually the getLogger
[23:12:41] logging.getLogger creates the logger *if it doesn't already exist*
[23:12:54] so we were getting the same object each time, and adding multiple file handlers
[23:13:02] removeHandler was not my finest work, but expedient.
[23:13:20] oh wow, a lot can happen in a couple hrs...
[23:13:35] aha, gotcha
[23:13:41] yeah, in a fresh code base
[23:14:51] darn, pytest-3 is broken
[23:15:31] oh hey, miler singleton might fix that, huh?
[23:15:54] *mailer
[23:15:59] seems like it?
[23:16:05] I don't think so
[23:16:25] It looks like we're mailing, then killing and a second mailing may or may not take place depending on timing
[23:16:33] how would a singleton help?
[23:16:42] a singleton for the file handle?
[23:17:43] ejegg was suggesting, for the mailer, in reference to my nasty: https://gerrit.wikimedia.org/r/345781
[23:17:57] That edge case sometimes triggers two failmails.
[23:18:00] We can live with it
[23:18:26] oh sry
[23:18:35] oh, because the first mail is in a subprocess?
[23:18:54] or... race condition?
[23:20:04] race I think
[23:20:14] I don't think there's a subprocess involved
[23:20:31] On second thought, I'm surprised that this race is possible cos, GIL
[23:21:06] well that's not quite what I mean
[23:21:27] I would have imagined that sendmail is all single-threaded, and puts that message on a queue before returning.
[23:33:14] gtg, good luck comrades
[23:37:08] (CR) Ejegg: "Looking good so far, tinkering a bit locally. Would be nice to test nested execution, though subprocess should theoretically mean no leaka" (6 comments) [wikimedia/fundraising/process-control] - https://gerrit.wikimedia.org/r/345429 (https://phabricator.wikimedia.org/T161035) (owner: Awight)
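The handler bug follows from how `logging.getLogger(name)` works: it returns the same Logger object for a given name on every call, so calling `addHandler()` each time a job starts in the same process stacks FileHandlers and writes every line once per handler. The sketch below avoids the bug without `removeHandler` by attaching the handler idempotently. `get_job_logger` is a hypothetical helper, and identifying handlers by `FileHandler.baseFilename` is an assumption about what counts as "the same" handler.

```python
import logging
import os


def get_job_logger(slug, log_path):
    """Return the logger for `slug` with exactly one FileHandler for log_path.

    logging.getLogger(slug) returns the same Logger object on every call, so
    adding a handler unconditionally on each run stacks duplicates and every
    line is written once per handler.
    """
    logger = logging.getLogger(slug)
    target = os.path.abspath(log_path)
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return logger  # already attached on an earlier call
    logger.addHandler(logging.FileHandler(target))
    return logger
```

Calling it twice for the same job and path returns the same logger with a single handler, so each line lands in the file once.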
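On the double failmail and the GIL: the GIL lets the interpreter switch threads between any two bytecodes, so a "have we mailed yet?" check followed by a send is not atomic, and two threads can both pass the check. Below is a sketch of the send-once guard that a mailer singleton could provide. It assumes both failmail paths (for example the kill path and the exit-code path) run as threads in one process. A `threading.Lock` would not help if the second mail came from another process. `OnceMailer` and its `send` callable are hypothetical.

```python
import threading


class OnceMailer:
    """Deliver at most one failmail per job run, however many threads
    decide the job failed."""

    def __init__(self, send):
        self._send = send  # hypothetical callable that actually delivers mail
        self._lock = threading.Lock()
        self._sent = False

    def fail(self, message):
        # The lock makes check-and-set a single step; the GIL alone does not.
        with self._lock:
            if self._sent:
                return False
            self._sent = True
        self._send(message)  # only the one thread that flipped the flag gets here
        return True
```

Sending outside the lock keeps a slow mail server from blocking other threads. The flag has already been flipped by then, so no second sender can slip through.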