[10:39:41] folks as FYI the /var/lib/git/operations/private dir on puppetmaster1001 is still showing staged changes for the requestctl cloud ipblocks
[10:40:02] I reproed locally in https://phabricator.wikimedia.org/T368023#10041581 and it is definitely an issue of the external_clouds_vendors code that commits to /srv/private
[10:40:08] but I haven't narrowed down the cause
[10:40:21] at the moment it should block any puppet private commit propagation
[10:40:49] and it shouldn't be an issue for prod since IIUC those ip blocks are only used in volatile and etcd (correct me if I am wrong)
[10:40:58] if you have ideas lemme know :)
[10:41:29] <_joe_> elukey: not a problem per se, but why is that script running on puppetmaster1001?
[10:41:37] <_joe_> if it's also running on another host, I mean
[10:41:53] nono I rolled back, it only runs on puppetmaster1001
[10:42:02] for days now
[10:42:13] <_joe_> ok
[10:42:19] <_joe_> but it wasn't an issue before
[10:42:22] <_joe_> was it?
[10:42:26] <_joe_> so something has happened
[10:42:31] I suspect that we simply didn't know about it
[10:42:41] because the code clearly causes this
[10:42:49] <_joe_> and code only got synced when people committed to /private
[10:43:01] <_joe_> I suspect it's some env variables missing
[10:43:17] <_joe_> what did you try exactly?
[10:43:51] two repos locally, one cloned from the other, and a post-commit hook with "cd $path; git pull"
[10:44:08] basically to mimic /srv/private and /var/lib/git/etc..
[10:44:26] then I just used the python code of external_cloud_vendors that commits to the repo
[10:44:29] <_joe_> uhm and the git pull *never happens*?
[10:44:40] <_joe_> or does it fail?
[10:45:14] it does happen, but for some reason there is (what seems to me) a staged revert in the repo that git pull runs on
[10:45:19] commits etc. are all good
[10:45:40] <_joe_> yes I am looking at the git history and it seems ok
[10:46:40] nothing in the reflog either, and I tried to log as many things as possible in the python code and the gitpython library locally in my repro, but I found nothing conclusive
[10:47:02] <_joe_> did you try running the git pull with GIT_TRACE=1 ?
[10:47:46] I tried GIT_TRACE=true, there is more info logged, but I also added a git status in the post-commit hook's script, and when it runs it doesn't show any staged content
[10:47:50] <_joe_> the revert is for more than just one patch
[10:47:51] that puzzles me even more
[10:48:08] for puppetmaster1001's repo yes
[10:48:18] <_joe_> the staged revert, even
[10:49:15] <_joe_> what's the status of the private repos on the other servers?
[10:49:29] I found out about it when I moved the timer to puppetserver1001: the first time it caused an unstaged revert, and Balthazar hit an error while post-commit was trying to propagate. It never happened afterwards, only staged ones
[10:49:39] every time I checked they were good
[10:49:49] you only see the problem where the script commits
[10:50:13] but it seems consistent
[10:50:37] <_joe_> so only where there is a pull instead of a push
[10:51:00] <_joe_> have you tried to actually do as follows
[10:51:04] <_joe_> skip the hook
[10:51:09] <_joe_> then do it manually?
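
(A minimal sketch of the local repro described around 10:43, assuming a throwaway scratch directory and made-up repo names; the real setup involves /srv/private, /var/lib/git/operations/private and the gitpuppet user, none of which is reproduced here.)

    #!/bin/bash
    # Two local repos, one cloned from the other, with a post-commit hook
    # on the "source" repo that pulls the clone -- mimicking how commits to
    # /srv/private get propagated. Paths and names are illustrative.
    set -e
    BASE=/tmp/hook-repro                      # hypothetical scratch dir
    mkdir -p "$BASE" && cd "$BASE"

    git init source
    git -C source commit --allow-empty -m "initial commit"   # assumes user.name/email are configured
    git clone source mirror

    # post-commit hook on the source repo: cd into the clone and pull,
    # as described in the chat ("cd $path; git pull")
    cat > source/.git/hooks/post-commit <<EOF
    #!/bin/bash
    cd $BASE/mirror
    git pull
    EOF
    chmod +x source/.git/hooks/post-commit

    # From here, any commit made in "source" (e.g. via the gitpython-based
    # external_clouds_vendors code) fires the hook and pulls "mirror".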
[10:51:42] <_joe_> because my first instinct was thinking the problem might be with the su -c gitpuppet
[10:52:00] yes I did, it works that way
[10:52:01] <_joe_> but clearly that's not the case per your experiments
[10:52:18] <_joe_> ok, so the problem is most likely some env variable missing
[10:52:29] <_joe_> but why only for those commits
[10:52:36] <_joe_> uhm wait
[10:52:58] they must be special, maybe the way we add them to the index via gitpuppet creates some inconsistency
[10:53:04] it is the only thing I can think of
[10:53:59] <_joe_> so we run the script as root
[10:54:15] <_joe_> I'm looking at the systemd stuff
[10:54:19] <_joe_> uhm
[10:54:30] <_joe_> can you paste me your git hook in your experiment?
[10:54:37] <_joe_> the post-commit hook
[10:57:55] sure, one sec, I was cleaning it up from all the horrors I added for logging, and I found something that I didn't see before (since at the time I hadn't yet added a ton of print() calls to all of gitpython's commit code :D)
[10:58:19] <_joe_> I don't think the problem is gitpython per se
[10:58:30] <_joe_> but I kinda feel there might be some stray env vars
[10:58:50] <_joe_> anyways, going to lunch, ttyl :)
[10:59:01] ack! I'll try to paste everything in the task
[11:00:36] okok now I see that it happens when the post-commit runs
[11:01:06] and I hadn't noticed this one before: 'GIT_INDEX_FILE': '/srv/git/private/.git/index'
[11:02:18] that seems to be the culprit, I have unset it in the post-commit and now I don't see the issue anymore
[11:02:29] _joe_ --^
[11:06:31] all right, going to lunch as well, great progress, thanks for the brainbounce!
[14:15:36] btullis: if you are back working can you please comment on https://phabricator.wikimedia.org/T370465? gehel, if btullis is not back from work can you find someone else to work on that task? thx
[14:16:21] andrewbogott: I'm back today, so I'll get on it. g.ehel is out for another week.
[14:16:27] thx
[15:19:21] _joe_: created https://gerrit.wikimedia.org/r/c/operations/puppet/+/1059899 as a workaround for the GIT_INDEX_FILE issue, lemme know (when you have time, no rush) if it could work
[15:19:30] also if anybody wants to chime in, feel free :)
[15:52:56] <_joe_> elukey: reading about GIT_INDEX_FILE and I'm still unsure how the heck we got where we got, tbh :D
[15:53:11] <_joe_> but it makes sense it could cause unexpected results
[15:53:48] thanks for the review!
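
(A sketch of the fix discussed at 11:02 and addressed by the linked gerrit change, under the assumption that the propagation hook is roughly "cd <clone>; git pull"; the actual patch may differ. When git or gitpython invokes the post-commit hook it can leave GIT_INDEX_FILE pointing at the committing repo's index, so the git pull in the other checkout operates against the wrong index and shows phantom staged changes; unsetting the variable before touching the other repo avoids that.)

    #!/bin/bash
    # post-commit hook sketch: drop git environment variables inherited from
    # the repo that triggered the hook before operating on another checkout.
    unset GIT_INDEX_FILE
    # GIT_DIR / GIT_WORK_TREE can leak in a similar way; clearing them too is
    # a common precaution in hooks that touch other repositories.
    unset GIT_DIR GIT_WORK_TREE

    cd /var/lib/git/operations/private   # path taken from the chat above
    git pull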