[00:05:06] (PS1) Ejegg: Update libraries (upstream php-queue) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349136
[00:07:28] (PS1) Ejegg: Update libs [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/349138
[00:08:01] (CR) Ejegg: [V: 2 C: 2] Update libs [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/349138 (owner: Ejegg)
[00:11:34] (PS1) Ejegg: Blank deprecated repo [wikimedia/fundraising/php-queue] - https://gerrit.wikimedia.org/r/349139
[00:11:57] (CR) jerkins-bot: [V: -1] Update libs [wikimedia/fundraising/crm/vendor] - https://gerrit.wikimedia.org/r/349138 (owner: Ejegg)
[00:13:50] (CR) Ejegg: [C: 2] Update libraries (upstream php-queue) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349136 (owner: Ejegg)
[00:24:58] (Merged) jenkins-bot: Update libraries (upstream php-queue) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349136 (owner: Ejegg)
[00:25:40] (PS1) Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/349141
[00:31:42] (CR) Ejegg: [C: 2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/349141 (owner: Ejegg)
[00:31:49] (Merged) jenkins-bot: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/349141 (owner: Ejegg)
[00:37:14] !log updated CiviCRM from 90d679b171ee2791fbc30417265c09cd7140bfc7 to 51dbbad9f7822a7b3df730a2bd92ee3fb176b3ec
[00:37:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:41:42] so far so good
[00:45:17] jobs are back on
[00:45:49] (CR) Ejegg: [C: 2] Blank deprecated repo [wikimedia/fundraising/php-queue] - https://gerrit.wikimedia.org/r/349139 (owner: Ejegg)
[00:46:01] (Merged) jenkins-bot: Blank deprecated repo [wikimedia/fundraising/php-queue] - https://gerrit.wikimedia.org/r/349139 (owner: Ejegg)
[00:46:13] (Abandoned) Ejegg: PDO: create/delete table doesn't need arg [wikimedia/fundraising/php-queue] - https://gerrit.wikimedia.org/r/287942 (owner: Ejegg)
[00:53:52] (Abandoned) Ejegg: Reconnect to Redis on idle timeout [wikimedia/fundraising/php-queue] - https://gerrit.wikimedia.org/r/333793 (https://phabricator.wikimedia.org/T155150) (owner: Ejegg)
[00:58:36] (PS4) Ejegg: Comments and todos [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/342547 (owner: Awight)
[01:15:54] (CR) Ejegg: [C: 2] Comments and todos [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/342547 (owner: Awight)
[01:17:41] (Merged) jenkins-bot: Comments and todos [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/342547 (owner: Awight)
[02:43:02] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Thank you mail send should have a time limit - https://phabricator.wikimedia.org/T163412#3196461 (Ejegg)
[03:44:08] Fundraising-Backlog, Analytics, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042#3196485 (LilyOfTheWest) >>! In T115042#3195557, @Nuria wrote: >>We are interested in getting some data to them to judge the effectiv...
[03:48:27] fundraising-tech-ops, Operations: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3196489 (Dzahn) relevant puppet code: `modules/monitoring/manifests/service.pp` ``` 39 # If a service is set to critical and 40 # paging is not disabled for this machine in...
[04:24:00] Fundraising-Backlog, Analytics, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042#3196508 (Nuria) >Basically, because banners are heavily used by the community, it makes sense to empower the community to analyze th...
[04:39:28] fundraising-tech-ops, Operations: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3196511 (Dzahn) my suggestion would be: - (optional) rename group "sms" to "core-ops" (or maybe "core-ops-sms") since it specifies a list of people, not a notification method, or at th...
[07:26:31] (PS3) Gergő Tisza: Switch TestingAccessWrapper to librarized version [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/349092
[11:35:52] (PS4) Gergő Tisza: Switch TestingAccessWrapper to librarized version [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/349092 (https://phabricator.wikimedia.org/T163434)
[14:15:58] Fundraising-Backlog: Paypal recurring donations missing contribution_tracking link since 2017-04-10 - https://phabricator.wikimedia.org/T163443#3197692 (Pcoombe)
[14:18:17] fundraising-tech-ops, Operations: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3197707 (Jgreen) @Dzahn thanks for the many clarifications! I think I understand. So as of today if "sms" does not show up in contact_groups for a host or service, individual Ops don't...
[14:21:44] Fundraising-Backlog, FR-Paypal, Recurring-Donations: Paypal recurring donations missing contribution_tracking link since 2017-04-10 - https://phabricator.wikimedia.org/T163443#3197715 (Pcoombe) p:Triage>High
[14:31:38] Fundraising Sprint Homebrew Hadron Collider, Fundraising-Backlog, FR-PayPal-ExpressCheckout, FR-Paypal, Epic: Paypal Express checkout 1 hour test - https://phabricator.wikimedia.org/T131816#3197762 (Pcoombe) Actually the Spain traffic for mobile looks really low now. It's not much more work t...
[14:55:39] fundraising-tech-ops: frack eqiad hardware refresh - https://phabricator.wikimedia.org/T133524#3197834 (Cmjohnson)
[15:01:42] fundraising-tech-ops, Operations: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3197848 (Jgreen) I removed 'sms' from notification for frack hosts, and changed myself to 24x7.
[15:15:06] fundraising-tech-ops, Operations: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3197901 (Jgreen) >>! In T163368#3197848, @Jgreen wrote: > I removed 'sms' from notification for frack hosts, and changed myself to 24x7. ...and removed myself from 'sms'
[15:25:01] fundraising-tech-ops, Operations: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3197946 (Jgreen) Another question...does it make sense to move IRC notifications out of #wikimedia-operations and into #wikimedia-fundraising? I'm not sure of the mechanics of doing tha...
[15:36:37] fundraising-tech-ops, Operations, ops-eqiad: move frdb1002 from pfw1 to pfw2 - https://phabricator.wikimedia.org/T163268#3197979 (Jgreen) a:Jgreen>Cmjohnson @Cmjohnson assigning this to you. The destination port on pfw1 should be ready to go, so just give me a little warning before you do the...
[16:15:48] Fundraising-Backlog, FR-PayPal-ExpressCheckout: Paypal: deal with 10486 (redirect donor back to pp) - https://phabricator.wikimedia.org/T163458#3198208 (Ejegg)
[16:28:19] (PS1) Ejegg: Add paypal_ec to form settings [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/349250
[16:28:26] XenoRyet: ^^^
[16:28:38] saw a few 'could not determine error form' messages
[16:29:10] Yea, usually following that 10486 error
[16:29:16] We'll have to deal with that.
[16:29:58] That patch should at least give em a decent looking error message for now
[16:32:53] fundraising-tech-ops, Operations: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3198304 (Dzahn) > So as of today if "sms" does not show up in contact_groups for a host or service, individual Ops don't get email or sms notification. If that's correct, we're much clo...
[16:34:33] should be trivial to CR if you have a sec!
[16:36:20] Heh, suppose I can tear my eyes away from the logs for a moment.
[16:37:20] (CR) XenoRyet: [C: 2] Add paypal_ec to form settings [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/349250 (owner: Ejegg)
[16:39:50] (Merged) jenkins-bot: Add paypal_ec to form settings [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/349250 (owner: Ejegg)
[16:40:28] thanks!
[16:55:01] Fundraising Sprint Homebrew Hadron Collider, Fundraising-Backlog, FR-Paypal, Recurring-Donations: Paypal recurring donations missing contribution_tracking link since 2017-04-10 - https://phabricator.wikimedia.org/T163443#3198409 (Ejegg) Darn, sounds like a regression in our new recurring normaliz...
[16:55:27] Fundraising Sprint Homebrew Hadron Collider, Fundraising-Backlog, FR-Paypal, Recurring-Donations: Paypal recurring donations missing contribution_tracking link since 2017-04-10 - https://phabricator.wikimedia.org/T163443#3198413 (Pcoombe) Based on the Express Checkout pre-test it seems this isn't...
[17:00:20] fr-tech: Professor Gorden Newell threw another shutout in last week's Chem Eng. 130
[17:00:20] midterm. Once again a student did not receive a single point on his exam.
[17:00:20] Newell has now tossed 5 shutouts this quarter. Newell's earned exam average
[17:00:20] has now dropped to a phenomenal 30%.
[17:00:20] -- discuss.
[17:01:22] Fundraising Sprint Homebrew Hadron Collider, Fundraising-Backlog, FR-Paypal, Recurring-Donations: Paypal recurring donations missing contribution_tracking link since 2017-04-10 - https://phabricator.wikimedia.org/T163443#3198434 (Pcoombe) @jrobell @spatton @TSkaff FYI, we should hold off on doin...
[17:03:08] (PS1) Ejegg: Delete old PP-specific normalization [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349253
[17:04:09] (PS1) Ejegg: Always update c_t table for recurring [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443)
[17:06:56] XenoRyet: does that look like the way to go for the issue pcoombe was talking about ? ^^^^
[17:07:07] Looking right now.
[17:07:13] (CR) jerkins-bot: [V: -1] Delete old PP-specific normalization [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349253 (owner: Ejegg)
[17:09:02] (CR) jerkins-bot: [V: -1] Always update c_t table for recurring [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443) (owner: Ejegg)
[17:13:29] cleaning up tests
[17:13:31] ...
[17:13:57] Yea, other than that it seems reasonable to me though.
[17:14:54] back in a few, gotta grab the lappy's plug
[17:15:11] must have electrons!
[17:18:06] fundraising-tech-ops, Operations, Patch-For-Review: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3198569 (Dzahn) The way the custom IRC notifications work: - add a special notification command which writes to a new logfile (https://gerrit.wikimedia.org/r/34...
[18:06:31] aaaaarg refactor refactor refactor
[18:07:18] dstrine: compatibility with large banner seen identifiers from the old large banner limit mixin is important, eh?
[18:08:00] OK postponing the refactor and leaving a rough edge
[18:08:10] * AndyRussG grimaces and moves on
[18:08:36] noooooooooooo I can't doo it......
[18:10:16] Hrrgg K forget it, the refactor that was trying to control my brain isn't so important
[18:10:33] * AndyRussG continues to battle self in public
[18:17:50] Hmm I think I reached an agreement with myself
[18:24:46] AndyRussG: I'm not sure what you were asking above. Shall we talk at standup?
[18:25:05] I'm going to run to lunch
[18:39:15] (PS2) Ejegg: Delete old PP-specific normalization [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349253
[18:42:19] (PS3) Ejegg: Delete old PP-specific normalization [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349253
[18:57:38] fr-tech can anyone look at that stuff ^^ ?
[19:06:57] Jeff_Green: ^
[19:08:12] (PS4) Ejegg: Delete old PP-specific normalization [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349253
[19:08:14] test for T163368
[19:08:28] mutante: woo!
[19:08:49] the only thing that confused me for a while
[19:08:59] is that today the icinga server switched from einsteinium to tegmen
[19:09:15] and i of course used einsteinium as always and wondered why it doesnt pick up my changes
[19:09:24] mutante: ah right. fwiw ~everything~ about icinga confuses me for a long time everytime
[19:09:47] i think I finally may have wrapped my head around how warning vs critical alerts end up in inboxes vs pagers
[19:10:05] heh :)
[19:10:12] yea, i added comments how the IRC thing works
[19:10:34] i added a contact "irc-fundraising" to both of your groups
[19:10:42] so stuff sent to either of them will also show here
[19:10:57] cool
[19:11:06] (CR) Thiemo Mättig (WMDE): "I think having a composer.lock submitted is a mistake, and it should be deleted. No extension I know does have one." (1 comment) [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/349092 (https://phabricator.wikimedia.org/T163434) (owner: Gergő Tisza)
[19:11:08] (PS2) Ejegg: Always update c_t table for recurring [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443)
[19:11:26] do you also want logmsgbot by the way?
[19:11:34] i just happened to find out how to do that :p
[19:11:48] that's the !log feature and writing to SAL
[19:12:41] i'm not sure what the fr-tech folks have been doing so far but I've just used the one in #wikimedia-operations to the extent that I use it
[19:12:43] mutante: think we've already got that guy hanging out here
[19:12:44] thanks!
[19:13:25] We've been using that to log all the payments-wiki and CiviCRM upgrades
[19:13:28] super-handy!
[19:13:54] oh, but i dont see logmsgbot in here
[19:14:14] hmm, lessee who's been responding to !log
[19:14:39] yet another thing is the reaction to T12345
[19:14:40] T12345: Create "annotation" namespace on Hebrew Wikisource - https://phabricator.wikimedia.org/T12345
[19:14:43] ah :)
[19:15:04] !log test logging in fundraising channel
[19:15:08] Oh hey, stashbot's been doing the !logging for us too!
[19:15:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:15:16] versatile
[19:15:33] aha! i get it now. yea, things are changing in the bot world , heh
[19:15:41] operations still has the old one
[19:15:45] alrighty
[19:15:55] (CR) jerkins-bot: [V: -1] Always update c_t table for recurring [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443) (owner: Ejegg)
[19:16:54] mutante: the one(?) thing I'm still a little foggy on is how paging vs email for individual users works
[19:17:43] CUSTOM - Host alnilam is UP: PING OK - Packet loss = 0%, RTA = 2.65 ms
[19:17:53] ^ test via actual icinga web ui
[19:18:04] most Operations stuff I noticed has 'admin' as well as 'sms' where the former goes to mailing list, and the latter goes to individuals who may have email or sms configured individually
[19:18:37] where did you see the mailing list part? because i thought that's what it was but didnt see it anymore
[19:18:42] for "admin"
[19:18:46] finding....
[19:19:51] unrelated: why did icinga just page me for alnilam coming up without first paging that it was down?
[19:20:17] fundraising-tech-ops, Operations, Patch-For-Review: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3199208 (Dzahn) ``` 12:09 -!- icinga-wm [~icinga-wm@tegmen.wikimedia.org] has joined #wikimedia-fundraising 12:11 < icinga-wm> test for T163368 ``` The second lin...
[19:20:27] (CR) Gergő Tisza: Switch TestingAccessWrapper to librarized version (1 comment) [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/349092 (https://phabricator.wikimedia.org/T163434) (owner: Gergő Tisza)
[19:20:49] Jeff_Green: because i went to the Icinga web ui, selected a random fundraising host, alnilam
[19:20:54] and clicked "send custom notification"
[19:20:56] oh ha ok!
[19:21:01] to test that it really shows up here
[19:21:02] and it did
[19:21:16] ok. still looking for the admins thing
[19:22:10] i felt like admins did create extra mail for everything, crit or not
[19:22:22] but then it did not anymroe at some point
[19:22:40] and i coudlnt find the actual mail in my inbox except for crit stuff
[19:22:53] as if ops was removed from "admins"
[19:23:15] to reduce (partially duplicate) mail
[19:23:43] also tried "from:icinga to:ops@lists.wikimedia.org " in gmail
[19:27:04] and i think when it still mailed all of us it was to root@
[19:27:22] admin@wm.org is an alias for root@ and root@ is = all ops
[19:27:47] so it would make sense if the "admins" icinga group used to have email notifications to admin@wm.org
[19:28:49] sorry, I got IRL interupted
[19:29:03] np, i am on a train myself, sometimes my connection drops
[19:30:25] ah ha
[19:31:02] ok so in contacts.cfg there's 'team-operations' which is 24x7 destination admins@wikimedia.org
[19:31:24] there we go, right
[19:31:40] but:
[19:31:52] really admins@ or admin@?
[19:32:14] sorry alerts@wikimedia.org actually
[19:32:33] ok, so that is a thing i setup
[19:32:39] to get all critical alerts by mail
[19:33:00] for the ops meeting section "pages for awareness"
[19:33:02] ok, that explains why I'm still getting mail for the main platform
[19:33:08] so that i can just search for mail to alerts@
[19:33:13] and see all things that paged
[19:33:22] independent of timezone settings
[19:33:27] for my personal account
[19:33:47] but this is also just all services that are set "critical => true"
[19:34:04] and email-only
[19:34:31] about admins@ and admin@ i was about to point out this:
[19:34:36] 8 admin: root
[19:34:36] 9 admins: :fail:
[19:34:39] from exim aliases
[19:37:40] right
[19:38:10] ok, so this is all making more sense. i saw that other teams have similar team contact lists that also mail for warnings
[19:38:15] alerts = root
[19:38:19] yeah
[19:39:00] so we would have to remove you from roots@ but then you get nothing else either
[19:39:31] right. I don't mind getting that email, I can filter it or whatever. I just didn't want 24x7 harassment to my phone for core infrastructure
[19:39:48] ok, yea. sounds good then
[19:40:23] question: is this ^^ bot set to get warnings+critical, or just critical?
[19:40:39] mutante: ooh our very own icingabot?
[19:42:08] cwd: it's technically all one bot, but we taught it about a new channel and made icinga write a new custom log for just fundraising things:)
[19:42:25] so it will only talk about relevant stuff (that has the fr contactgroup)
[19:42:25] neat!
[19:42:58] alright, i jsut arrived in SF on caltrain. gotta move. cu cuys
[19:43:08] thanks for the upgrade!
[19:43:10] (laggy connection that's why :)
[19:43:11] welcome
[20:29:53] (CR) Cdentinger: [C: 2] "Nice line count!" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349253 (owner: Ejegg)
[20:34:16] thanks cwd ! I'll try to get the following one passing tests shortly
[20:34:35] (Merged) jenkins-bot: Delete old PP-specific normalization [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349253 (owner: Ejegg)
[20:34:53] ejegg: sounds good! what are the other major arcs right now?
[20:35:23] The next big thing is planning the Ingenico reintegration
[20:36:26] I was thinking we should meet and talk about how to do that, maybe review Omnipay and list some things to learn from them
[20:40:55] ok sounds good
[20:40:59] someone's here, brb...
[20:46:17] ejegg: did you have other top CR concerns?
[20:46:41] cwd just the follow-on to that cleanup patch
[20:47:04] I'm puzzling through the missing link between contribution_tracking and recurring donations
[20:47:25] it was something we broke when we moved the recurring message normalization back into the listeners and audit parser
[20:48:02] is there only one c_t row for each subscription? or is it each payment?
[20:50:40] PROBLEM - Host rigel is DOWN: PING CRITICAL - Packet loss = 60%, RTA = 11595.65 ms
[20:50:41] PROBLEM - Host saiph is DOWN: PING CRITICAL - Packet loss = 53%, RTA = 11097.88 ms
[20:50:49] RECOVERY - Host saiph is UP: PING WARNING - Packet loss = 0%, RTA = 1897.40 ms
[20:51:22] woot. paging works :-)
[20:52:04] cwd yep, only one for each subscription
[20:53:10] Jeff_Green: \o/
[20:53:18] works on my phone now too
[20:53:57] looking at civi1001:/var/log/process-control . . . that's bloating fast, jenkins style
[20:54:37] assuming we can't live without the detail in log collection, what are the chances of gzipping on the fly?
[20:55:10] otherwise I guess I can adapt the jenkins log collector
[20:55:19] RECOVERY - Host rigel is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms
[20:56:54] ergh yeah we might want to prioritize rotating those
[20:57:43] i'm just not sure how to identify what's currently being written to, to exclude it from gzipping in place
[20:59:47] what would happen in that collision case?
[21:00:35] good question
[21:00:51] i guess maybe gzip would complain that the file changed while it was being compressed?
[21:02:24] (PS3) Ejegg: Always update c_t table for initial recurring [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443)
[21:02:50] also getting those backed up off-host is fraught for the same reason
[21:02:55] 4.3G /var/log/process-control/
[21:03:07] so that's for a few weeks worth?
[21:03:19] yeah, we have some time to figure it out, there's still 33G available on that partition
[21:03:33] yep but it should be a priority
[21:03:42] we can also move it to /srv
[21:03:43] or we will be hearing from icinga :)
[21:03:59] icinga loves u
[21:05:19] i'm guessing mtime is written when the file handle is closed?
[21:06:11] seems logical yeah
[21:06:51] if p-c is keeping the filehandle open, and has the file locked, we could check for that state
[21:09:59] mutante: around?
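The log-compression question above (gzip the files under /var/log/process-control without touching one that is still being written) can be sketched with an idle-time heuristic. On POSIX systems a successful write(2) marks the file's modification time for update, so mtime advances as data is written, not only when the handle is closed; a file whose mtime is well in the past is therefore a reasonable, though not guaranteed, sign that nothing is writing to it. The function name, the one-hour default threshold, and the re-check after compressing are illustrative assumptions, not existing process-control or Wikimedia tooling.

```python
import gzip
import os
import shutil
import time
from pathlib import Path


def compress_idle_logs(log_dir, min_idle_seconds=3600, now=None):
    """Gzip plain files under log_dir whose mtime is older than min_idle_seconds.

    Hypothetical helper. Assumption: a file untouched for min_idle_seconds is
    no longer being written to. This is a heuristic, not a lock check.
    Returns the list of .gz paths that were created.
    """
    now = time.time() if now is None else now
    compressed = []
    # sorted() materializes the listing first, so new .gz files are not revisited.
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file() or path.suffix == ".gz":
            continue
        st = path.stat()
        if now - st.st_mtime < min_idle_seconds:
            continue  # modified recently: possibly still open, leave it alone
        gz_path = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # If the file changed while we compressed it, discard the archive
        # and keep the original for a later run.
        after = path.stat()
        if (after.st_size, after.st_mtime_ns) != (st.st_size, st.st_mtime_ns):
            gz_path.unlink()
            continue
        # Carry the original timestamps over to the archive, then remove the original.
        os.utime(gz_path, ns=(st.st_atime_ns, st.st_mtime_ns))
        path.unlink()
        compressed.append(gz_path)
    return compressed
```

If the writer really does hold a lock on the active file, checking for that lock (as suggested at 21:06:51) would be more precise than an mtime threshold. The idle check is simply the option that needs no cooperation from the writer.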
[21:25:35] fundraising-tech-ops, Operations, Patch-For-Review: Revisit paging strategy for frack servers - https://phabricator.wikimedia.org/T163368#3199856 (Jgreen) I removed 'admins' from contact_groups for frack hosts, so we should stop seeing alerts re. frack hosts in #wikimedia-operations, so frack host al...
[21:35:27] (CR) Ejegg: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443) (owner: Ejegg)
[21:36:06] (CR) jerkins-bot: [V: -1] Always update c_t table for initial recurring [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443) (owner: Ejegg)
[21:36:33] Jeff_Green: yeah if that behavior is as simple as it sounds that would make a great archiver
[21:38:11] (CR) Ejegg: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443) (owner: Ejegg)
[21:38:42] (CR) jerkins-bot: [V: -1] Always update c_t table for initial recurring [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443) (owner: Ejegg)
[21:43:03] Jeff_Green: what's up
[21:45:38] mutante: i figured it out, i wanted to stop alerting in #wikimedia-operations re. frack hosts, and was having trouble finding the new irc stuff. I was just being a dunce, had forgotten to update puppet git locally
[21:54:32] Fundraising-Backlog: Compile list of systems we depend on and should check all the time - https://phabricator.wikimedia.org/T163509#3199915 (ggellerman)
[22:00:12] Jeff_Green: ah , yea i was thinking about that too. ok cool
[22:00:57] mutante: should we close the ticket?
[22:10:34] Jeff_Green: regular ops still get paged for fundraising things, no?
[22:11:08] Jeff_Green: dont we still need some kind of change in puppet base module that treats stuff different based on "in frack or not"
[22:13:03] mutante: all the frack hosts/services are passive, configured in nsca_frack.cfg in the private repo, so i just removed all instances of admins and sms from that file
[22:15:13] Jeff_Green: oooh! well that i did not think about and made things easier , heh
[22:15:49] Jeff_Green: well, i would say if you are happy with it for your team.. maybe point out what you just said on the ticket , if you haven't already, and then close it. re-opening is cheap
[22:15:50] it took me a long time to get my head wrapped around it enough to see how simple the changes were
[22:16:09] yup, already commented on the ticket
[22:16:50] cool, yea, just call it resolved then, if others have comments i'm sure they will add them either way or reopen
[22:16:59] k
[22:17:09] thanks for all your help on it
[22:17:50] you're welcome!
[22:20:17] the only follow-up i would have had is renaming the existing groups because their names are kind of bad and will confuse the next person. but also touching the existing setup is a bit of a pita.. so .. meh
[22:20:29] "sms" and "admin" i mean
[22:20:45] yeah those names make no sense
[22:20:50] (and it's not FR stuff so not this ticket)
[22:20:59] maybe
[22:21:07] yes, they make no sense
[22:22:26] i'm a little puzzled why we'd want to email individuals only during their oncall time
[22:23:07] seeing that's how it was set up, I finally understood why I would get no 'warning' stuff before some 'critical' thing half the time
[22:23:07] yea, good point
[22:23:30] if every user would have -email and -sms user
[22:23:37] then it would actually be exactly like it was in watchmouse
[22:23:41] yeah
[22:23:57] true, actually.. i would take that part
[22:23:58] or we could just never use individual emails, and just rely on the mailing lists
[22:24:11] and change that for all ops .. unless people object
[22:24:43] other teams seem to use the mailing team list for critical+warning whereas ops only uses it for critical
[22:24:48] whenever i created icinga contacts for people i mentioned the timezone option
[22:25:10] then most people would say to not even bother.. give me 24x7 and i handle it on my side
[22:25:14] yeah
[22:25:17] some people would care though
[22:25:23] and then others would use android apps
[22:25:29] to login on icinga web ui
[22:25:35] and set their own alerts there
[22:25:40] yeah
[22:26:01] to me it's just confusing to get warning email only during oncall hours, but critical email 24x7
[22:26:23] which reminds me, I was going to remove my email address from my contact
[22:26:38] yea, it's all about giving one contact more than one notification method
[22:26:48] they need to all just have a single one
[22:27:02] then it would be more copy/paste work but also clear
[22:28:28] hmm.. or .. we need to make our own new notification method and move the timezone logic into that.. set it 24x7 in Icinga but tell the method itself to look at the time and send both or just one
[22:29:01] that's an interesting idea
[22:29:24] that almost seems more annoying to do than having 2 users though..because you have to handle the date/time stuff yourself.. dunno yet
[22:30:12] most users get their contacts created for them by the same few people and dont even want to know the details
[22:30:54] what if we just stop dealing in individual email addresses
[22:31:20] that reminds me about another thing.. the idea to move it out of private repo
[22:31:26] into public repo where the groups are
[22:31:29] so you either get the 24x7 email via team@list.wm.o or you get paged during your preferred hours
[22:31:37] except the phone numbers itself.. treat it like passwords
[22:31:42] in templates
[22:32:02] that would be a little less confusing
[22:32:18] it's a little maddening now trying to find all the pieces in various puppet places
[22:32:22] more people would upload their own changes to their contacts
[22:32:28] like a timezone change when travelling
[22:32:33] right
[22:32:45] i don't understand why this is all done outside of the icinga UI
[22:32:53] your idea with the team email is also very interesting
[22:32:57] seems totally bizarre
[22:34:09] we're already mostly doing the team email, at least all the critical stuff goes that way
[22:34:22] in a job before wikimedia we had a nagios and the notification method was writing a file into config firectory of an Asterisk server.. that would call people, as in normal phone call but from the server
[22:34:34] i'm guessing it's exim that's preventing double messages, not sure about that though
[22:34:49] then there was some scripting for a phone menu in asteriks, so you could reply to when called and ACK it on the phone
[22:34:57] which would feed back into icinga web ui
[22:34:58] ha, that's terrifying but pretty neat
[22:35:06] that sounds pretty awesome
[22:35:18] sorry honey the robot is calling
[22:38:57] and one more thing that would be nice is a shell script or web ui button that lets you send custom messages to contacts/groups. like the "CUSTOM" thing i did earlier as test but with real content. for the situation when a whole team gets paged and you went to tell them all "i got it" or "X is on his way" so the others can relax again
[22:39:35] (instead of going to office wiki and a contact list all manually and typing numbers)
[22:40:24] or everybody getting a laptop and booting to get to IRC just to check if somebody already responded
[22:40:50] isn't that what 'ack' is for?
[22:41:16] although that ties the message to a specific alert which maybe isn't desireable
[22:41:45] yea, but i have never seen an ACK in an SMS.. hmm
[22:41:57] hmm
[22:42:15] I'm not sure whether I have, never thought about it
[22:42:22] maybe it's just simple config about the notification types in the by-sms-method
[22:45:20] or nobody ever clicks ACK on a "critical" service but eh.. gotta check that some time
[22:46:01] to me the problem is the overhead to log in and find the alert, it would be spiffy if you could just reply to the SMS
[22:46:17] that whole "critical => true" but we mean "paging => true" but it's actually "contactgroup => ops" is also such great stuff :P
[22:46:46] oh yeah, the puppet vs actual config is mind boggling
[22:48:09] alright, so yea, there is more to improve here, i'll get back to it, but for now i have to leave for a bit. maybe checking in later and then tomorrow day off
[22:48:41] cool. well afaict the fundraising stuff is where we can happily leave it for the forseeable future, thanks again!
[22:49:03] and more icinga users and contacts to create for performance... and their permissions to do things in web ui , heh
[22:49:13] alright, no problem.
[22:49:15] cu later then
[22:58:38] Fundraising Sprint Homebrew Hadron Collider, Fundraising-Backlog, FR-Paypal, Recurring-Donations, Patch-For-Review: Paypal recurring donations missing contribution_tracking link since 2017-04-10 - https://phabricator.wikimedia.org/T163443#3197692 (Ejegg) a:Ejegg
[23:08:57] (CR) Ejegg: "recheck" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443) (owner: Ejegg)
[23:10:57] (CR) jerkins-bot: [V: -1] Always update c_t table for initial recurring [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349254 (https://phabricator.wikimedia.org/T163443) (owner: Ejegg)
[23:37:03] (PS16) AndyRussG: [WIP] Banner sequence campaign mixin [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/344988 (https://phabricator.wikimedia.org/T144453)
[23:39:39] (CR) jerkins-bot: [V: -1] [WIP] Banner sequence campaign mixin [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/344988 (https://phabricator.wikimedia.org/T144453) (owner: AndyRussG)
[23:41:24] ejegg ^ works!
[23:41:46] fr-tech ^
[23:41:48] or at least mostly, so far... a few tests to clean up still..
[23:41:57] exciting!
[23:42:08] and still gotta re-check the code and fix up inline doc
[23:42:11] yurrrrp
[23:42:42] aaarg just a wee bit more complex than expected...
[23:43:46] Aaarg actually doesn't work now.....
[23:45:08] Ah no sorry that last "doesn't work" was just I forgot to click "submit"...
[23:50:00] K yes still appears to work :) oooff there'll be soooo many code paths to test tho
[23:52:55] (PS1) Ejegg: WIP: fix civibuild options for updated amp [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349357
[23:53:28] (CR) jerkins-bot: [V: -1] WIP: fix civibuild options for updated amp [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349357 (owner: Ejegg)
[23:53:59] (PS2) Ejegg: WIP: fix civibuild options for updated amp [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349357
[23:55:55] (CR) jerkins-bot: [V: -1] WIP: fix civibuild options for updated amp [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/349357 (owner: Ejegg)