[00:18:04] 45k emails have gone now
[00:27:05] 👏
[01:48:00] mail runs seem to have balanced out a bit between the colos. but still chugging along fine.
[02:07:10] 61k gone out
[02:41:02] cool!
[06:10:11] 100k
[12:37:36] dstrine: I love the slinky metaphor
[12:38:06] 161k sent | 498k queued | 4 failures
[12:38:20] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: recurring is 7694 7500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 59 days 11 hours - memory use is 6.55M (peak 9.13M, 0.14% of max, fragmentation 1.70%), connected_slaves is 3, donations is 46, jobs is 0, jobs-adyen is 0, jobs-paypal is 99, payments-antifraud is 4, payments-init is 1, pending is 0, refund is 0, unsubscribe is 3 https://icinga.wikimedi
[12:38:20] -bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[12:38:39] recurring tidal wave incoming
[12:41:21] g'day fr-tech!
[12:48:16] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: recurring is 7761 7500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 59 days 11 hours - memory use is 6.57M (peak 9.13M, 0.13% of max, fragmentation 1.66%), connected_slaves is 3, donations is 4, jobs is 0, jobs-adyen is 0, jobs-paypal is 84, payments-antifraud is 7, payments-init is 4, pending is 1, refund is 0, unsubscribe is 3 https://icinga.wikimedia
[12:48:16] bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[12:53:19] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: recurring is 8016 7500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 59 days 11 hours - memory use is 6.56M (peak 9.13M, 0.14% of max, fragmentation 1.67%), connected_slaves is 3, donations is 0, jobs is 0, jobs-adyen is 0, jobs-paypal is 8, payments-antifraud is 15, payments-init is 8, pending is 1, refund is 0, unsubscribe is 4 https://icinga.wikimedia
[12:53:19] bin/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[12:58:22] PROBLEM - check_redis on frqueue1003 is CRITICAL: CRITICAL: recurring is 8029 7500 - REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 59 days 11 hours - memory use is 6.42M (peak 9.13M, 0.13% of max, fragmentation 1.73%), connected_slaves is 3, donations is 1, jobs is 0, jobs-adyen is 0, jobs-paypal is 0, payments-antifraud is 4, payments-init is 1, pending is 1, refund is 0, unsubscribe is 4 https://icinga.wikimedia.
[12:58:22] in/icinga/extinfo.cgi?type=2&host=frqueue1003&service=check_redis
[13:23:19] RECOVERY - check_redis on frqueue1003 is OK: OK: REDIS 5.0.14 on 127.0.0.1:6379 has 1 databases (db0) with 9 keys, up 59 days 12 hours - memory use is 5.64M (peak 9.13M, 0.13% of max, fragmentation 1.84%), connected_slaves is 3, donations is 2, jobs is 0, jobs-adyen is 0, jobs-paypal is 0, payments-antifraud is 11, payments-init is 9, pending is 1, recurring is 6450, refund is 0, unsubscribe is 1 https://icinga.wikimedia.org/cgi-bin/icin
[13:23:19] o.cgi?type=2&host=frqueue1003&service=check_redis
[15:33:17] good point for the accidental yubikey paste
[15:34:13] y'all came back!
[15:34:34] (net split)
[15:38:26] 16:33:18 good point for the accidental yubikey paste
[15:38:28] 16:34:13 y'all came back!
[15:38:30] 16:34:34 (net split)
[15:41:48] ah maybe it was on my side
[15:41:53] I'll show you what I saw
[15:43:20] https://phabricator.wikimedia.org/F34908460
[15:45:43] Ohh I didn't realise when I quit too :)
[15:49:56] (Abandoned) Mepps: CampaignType: Hide forced campaign type selections in user prefs [extensions/CentralNotice] - https://gerrit.wikimedia.org/r/654682 (https://phabricator.wikimedia.org/T268646) (owner: Mepps)
[15:50:04] (Abandoned) Mepps: WIP: Pull adapter resources out of mustache, consolidate monthlyconvert logic [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/635891 (https://phabricator.wikimedia.org/T250918) (owner: Mepps)
[15:50:13] (Abandoned) Mepps: Merge branch 'master' into deployment [extensions/DonationInterface] (deployment) - https://gerrit.wikimedia.org/r/628904 (owner: Mepps)
[15:50:21] (Abandoned) Mepps: Make $tracking_data into instance variable [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/604489 (owner: Mepps)
[15:50:34] (Abandoned) Mepps: Make subscription id optional in paypal refund script [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/574864 (owner: Mepps)
[15:51:07] mepps_: :)
[15:58:24] jgleeson and I are on the wrong side of the Pond it seems
[16:00:30] indeed
[17:02:21] is there an autopsy today?
[17:02:21] Hi Grey is in the meeting ~
[17:12:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 34.04, 27.30, 15.79 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[17:17:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 33.89, 31.43, 20.77 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[17:22:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 33.95, 32.95, 24.39 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[17:22:31] hmm. load avg 34 is maximum awesomeness
[17:27:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 31.86, 32.83, 26.74 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[17:32:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 26.91, 30.29, 27.45 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[17:33:10] Looks like we're getting some now Jeff_Green
[17:33:18] want me to try ACKing one of those
[17:35:49] yeah
[17:36:49] so if I follow one of those links after having already logged into icinga, "Acknowledge this service problem" is a viable link for me, takes me to the form to ack the alert
[17:36:56] is it different for you?
[17:37:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 18.98, 24.22, 25.69 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[17:39:15] jgleeson: ^^
[17:40:01] yep I see the form but when submitting I get https://phabricator.wikimedia.org/F34908560
[17:40:04] Jeff_Green:
[17:41:45] ok
[17:42:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 2.64, 12.60, 20.45 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[17:43:32] I think the best thing to do is to create a task for SRE requesting access for FR-Tech users.
[18:07:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 35.93, 28.51, 21.51 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[18:11:55] sounds good Jeff_Green
[18:12:04] I'll do it shortly
[18:12:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 33.62, 32.07, 25.04 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[18:17:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 28.83, 30.73, 26.49 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[18:22:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 27.92, 28.77, 26.80 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[18:27:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 20.83, 24.64, 25.64 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[18:27:41] fr-tech looks like we need to skip over one of the eoy emails for it to keep going
[18:27:58] each run is failing in doContactsHaveActiveRecurring
[18:32:10] PROBLEM - check_load on frdb1003 is CRITICAL: CRITICAL - load average: 16.28, 19.64, 23.24 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[18:33:43] wanna set that email to failed ejegg and then maybe circle back on it?
[18:33:54] I saw we already had some marked as failed
[18:42:38] yep, can you tell which it is?
[18:43:53] nvm, I think I can do it
[18:44:36] just did a select limit 1
[18:44:43] I think it'd be the same one
[18:46:09] k, ran a slow-start and one worked
[18:46:14] nice!
[18:47:29] !log localsettings changed from 2d371ed1 to 3df415c1
[18:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:02:10] RECOVERY - check_load on frdb1003 is OK: OK - load average: 0.29, 0.38, 3.90 https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=frdb1003&service=check_load
[19:06:38] Fundraising-Backlog, SRE: Fundraising-Tech engineers unable to ACK icinga alerts on fr-tech host groups - https://phabricator.wikimedia.org/T298649 (jgleeson)
[19:11:52] Wrapping up for today fr-tech. EOY email counts at pixel time are: Sent 209160 | failed 13 | queued 450496
[19:12:14] bye for now o/
[19:15:36] Fundraising-Backlog, SRE: Fundraising-Tech engineers unable to ACK icinga alerts on fr-tech host groups - https://phabricator.wikimedia.org/T298649 (Dzahn) @jgleeson I see in the screenshot you are logged in as "Jgleeson". Try (in a new browser session since there is no logout button) to login instead as...
[19:53:34] fundraising-tech-ops, observability: check_mysql / load on fr* is extremely spammy - https://phabricator.wikimedia.org/T296811 (Dzahn) > This isn't useful at all and just sends spammy notifications and will normalise people to ignoring them. > Andy said they weren't important enough to wake anyone. For...
[20:13:53] looks like we may have another email holding up the EOY sends.
[20:14:18] the last 2 runs have failed.
[20:26:14] just looking into those fails -
[20:48:35] cool thx.
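The workaround used between 18:27 and 18:46 (mark the one queued EOY email that kept failing in doContactsHaveActiveRecurring as failed, so later runs could carry on, then circle back to it) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the QueuedEmail record, the UnusableEmailError exception and the check_recurring callable only stand in for the real CiviCRM job and queue, whose code is not shown in the log.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


class UnusableEmailError(Exception):
    """Hypothetical error raised when a queued email can no longer be processed."""


@dataclass
class QueuedEmail:
    # Hypothetical record; the real EOY mailing queue schema is not shown in the log.
    email_id: int
    contact_id: int
    status: str = "queued"
    has_active_recurring: Optional[bool] = None


def run_batch(emails: Iterable[QueuedEmail],
              check_recurring: Callable[[QueuedEmail], bool]) -> List[int]:
    """Process every still-queued email. A record that raises UnusableEmailError
    is marked 'failed' so one bad record cannot stall the whole run.
    Returns the ids processed during this call."""
    processed: List[int] = []
    for email in emails:
        if email.status != "queued":
            continue  # already sent, or marked failed on an earlier run
        # Only the expected per-record error is caught; any other exception
        # still aborts the run so genuine bugs are not silently swallowed.
        try:
            # Hypothetical stand-in for the per-contact lookup
            # (doContactsHaveActiveRecurring in the log).
            email.has_active_recurring = check_recurring(email)
        except UnusableEmailError:
            email.status = "failed"  # skip it for now; circle back on it later
            continue
        email.status = "sent"  # actual delivery is out of scope for this sketch
        processed.append(email.email_id)
    return processed
```

Catching one narrow exception type, rather than everything, keeps the "skip and mark failed" behaviour limited to records that are known to be unusable.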
[20:50:34] eileen: we just manually marked one failed the last time
[20:50:52] ejegg: yeah - I did that - I think I can see a fix if you want to review it
[20:51:12] definitely eileen
[20:54:29] (PS1) Eileen: Throw exception (rather than allow type fail) for no-longer-usable email [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/751801
[20:54:35] ejegg: ^^ should do it
[20:55:47] (PS1) Eileen: Don't allow emails to go to deleted contacts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/751804
[20:57:33] eileen should there be a ! there?
[20:57:42] if ($contactDetails['ids']) {
[20:58:05] ejegg: oops
[20:58:43] (PS2) Eileen: Throw exception (rather than allow type fail) for no-longer-usable email [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/751801
[20:58:44] fixed
[20:59:59] (CR) Ejegg: [C: +2] "Thanks!" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/751801 (owner: Eileen)
[21:00:47] ejegg: there is another patch after it - it's just the first rabbit hole I went down but I think it makes sense to merge it too
[21:00:55] yep, just looking now
[21:01:12] (PS2) Ejegg: Don't allow emails to go to deleted contacts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/751804 (owner: Eileen)
[21:01:28] what I can't quite figure out is why the email I just manually failed was calculated in the first place -
[21:01:38] (CR) Ejegg: [C: +2] "Looks good!" [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/751804 (owner: Eileen)
[21:02:10] eileen: guessing some email was edited?
[21:02:36] ejegg: yeah - that makes sense - just trying to confirm
[21:06:47] ejegg: are you working today or just checking in?
[21:14:05] ejegg: so this contact - https://civicrm.wikimedia.org/civicrm/contact/view?reset=1&cid=16715008 had an email change on the 5th/6th when the paypal $ came in I guess. That seems normalish I suppose
[21:14:28] (Merged) jenkins-bot: Throw exception (rather than allow type fail) for no-longer-usable email [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/751801 (owner: Eileen)
[21:14:30] (Merged) jenkins-bot: Don't allow emails to go to deleted contacts [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/751804 (owner: Eileen)
[21:14:55] if we ran the job again we would probably get him picked up on his new email & added in
[21:16:03] (PS1) Eileen: Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/751808
[21:16:19] (CR) Eileen: [C: +2] Merge branch 'master' of https://gerrit.wikimedia.org/r/wikimedia/fundraising/crm into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/751808 (owner: Eileen)
[21:25:47] !log civicrm revision 32d7370a -> 67264062
[21:25:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:36:30] eileen: sorry, half working, half on kid duty
[21:36:40] ejegg: that's fine - just checking
[23:38:54] Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Regular CiviCRM upgrade post code freeze - https://phabricator.wikimedia.org/T298664 (Eileenmcnaughton)