[01:07:21] PROBLEM - check_gcsip on payments1002 is CRITICAL: CRITICAL - Socket timeout after 121 seconds
[01:10:10] RECOVERY - check_gcsip on payments1002 is OK: HTTP OK: HTTP/1.1 200 OK - 343 bytes in 0.121 second response time
[10:55:06] PROBLEM - check_disk on bismuth is CRITICAL: DISK CRITICAL - free space: / 5677 MB (10% inode=87%): /sys/fs/cgroup 0 MB (100% inode=99%): /dev 7988 MB (99% inode=99%): /run 1599 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 7999 MB (100% inode=99%): /run/user 100 MB (100% inode=99%): /a 384415 MB (99% inode=99%): /boot 181 MB (72% inode=99%)
[11:00:06] PROBLEM - check_disk on bismuth is CRITICAL: DISK CRITICAL - free space: / 5561 MB (10% inode=87%): /sys/fs/cgroup 0 MB (100% inode=99%): /dev 7988 MB (99% inode=99%): /run 1599 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 7999 MB (100% inode=99%): /run/user 100 MB (100% inode=99%): /a 384415 MB (99% inode=99%): /boot 181 MB (72% inode=99%)
[11:05:16] PROBLEM - check_disk on bismuth is CRITICAL: DISK CRITICAL - free space: / 5541 MB (10% inode=87%): /sys/fs/cgroup 0 MB (100% inode=99%): /dev 7988 MB (99% inode=99%): /run 1599 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 7999 MB (100% inode=99%): /run/user 100 MB (100% inode=99%): /a 384415 MB (99% inode=99%): /boot 181 MB (72% inode=99%)
[11:08:58] ACKNOWLEDGEMENT - check_disk on bismuth is CRITICAL: DISK CRITICAL - free space: / 5541 MB (10% inode=87%): /sys/fs/cgroup 0 MB (100% inode=99%): /dev 7988 MB (99% inode=99%): /run 1599 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 7999 MB (100% inode=99%): /run/user 100 MB (100% inode=99%): /a 384415 MB (99% inode=99%): /boot 181 MB (72% inode=99%): Casey Dentinger we hear you buddy
[12:15:06] RECOVERY - check_disk on bismuth is OK: DISK OK - free space: / 27464 MB (51% inode=87%): /dev 7988 MB (99% inode=99%): /run 1599 MB (99% inode=99%): /srv 361064 MB (93% inode=99%): /boot 181 MB (72% inode=99%)
[14:12:49] (PS1) Ejegg: audit_echochar: handle more methods, don't crash [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370204
[14:43:46] PROBLEM - Host alnitak is DOWN: PING CRITICAL - Packet loss = 100%
[14:50:17] (PS1) Ejegg: Ingenico WX audit: flag recurring transactions [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/370214 (https://phabricator.wikimedia.org/T86090)
[15:10:10] (CR) Mepps: [C: 2] Ingenico WX audit: flag recurring transactions [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/370214 (https://phabricator.wikimedia.org/T86090) (owner: Ejegg)
[15:11:09] (Merged) jenkins-bot: Ingenico WX audit: flag recurring transactions [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/370214 (https://phabricator.wikimedia.org/T86090) (owner: Ejegg)
[15:15:35] hmm, looks like the globalcollect wr1 auditor never actually caught recurring donations
[15:19:01] probably never needed to, since the globalcollect recurring charge mechanism directly inserts records as soon as it makes the charge
[15:19:35] ugh, and we're not actually tracking the ingenico-side effort id
[15:19:44] (PS7) Mepps: WIP rectify orphan function [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/369996
[15:19:51] just incrementing the 'processor_id' field in civicrm_contribution_recur for each charge
[15:20:02] ejegg ahh
[15:20:12] that's not even what that field is for...
[15:20:39] OK, I need to change how we record those donations.
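
[Editor's note: the bookkeeping described above amounts to bumping processor_id on the recurring row each time a charge goes through. A minimal sketch of that pattern in Python with sqlite3, using a heavily simplified stand-in for the civicrm_contribution_recur schema — the real table has many more columns, and the actual code is PHP inside CiviCRM:]

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE civicrm_contribution_recur (
            id INTEGER PRIMARY KEY,
            processor_id INTEGER  -- being (mis)used here as a per-charge counter
        )
    """)
    conn.execute(
        "INSERT INTO civicrm_contribution_recur (id, processor_id) VALUES (1, 0)"
    )

    def record_recurring_charge(recur_id):
        # Each successful charge bumps the counter; per the chat, this value
        # then doubles as the locally generated effort id for the processor.
        conn.execute(
            "UPDATE civicrm_contribution_recur"
            " SET processor_id = processor_id + 1 WHERE id = ?",
            (recur_id,),
        )
        conn.commit()

    record_recurring_charge(1)
    print(conn.execute(
        "SELECT processor_id FROM civicrm_contribution_recur WHERE id = 1"
    ).fetchone())  # (1,)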
[15:25:40] (CR) jerkins-bot: [V: -1] WIP rectify orphan function [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/369996 (owner: Mepps)
[15:37:51] huh, so we're creating the EffortID on our side, from that column
[15:38:05] and... everything will be different when we switch to Connect recurring
[15:38:15] on second thought, I'm not going to change how we record those
[15:38:43] it looks like it's pretty solidly keeping the column in line with the actual contribution ID
[15:39:07] mepps: I'm going to fix the 4 recurring records without the audit processor
[15:39:39] but there's one tiny display bug making it crash: https://gerrit.wikimedia.org/r/370204
[15:39:52] That one's small enough I'd feel fine releasing it today
[15:40:17] and we could start backfilling all the OBT contributions and things that were manually pushed through
[15:47:12] oh hey, the rain let up. I'm going to bike up to the co-working space and pick things up from there
[15:53:44] (CR) Mepps: [C: 2] audit_echochar: handle more methods, don't crash [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370204 (owner: Ejegg)
[15:56:09] fundraising-tech-ops, Operations, netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3501481 (Jgreen) >>! In T171962#3492728, @mark wrote: > No objections from me. It does add complexity somewhat and will probably add some failure modes wher...
[15:59:56] (Merged) jenkins-bot: audit_echochar: handle more methods, don't crash [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370204 (owner: Ejegg)
[16:26:54] Fundraising-Backlog, fundraising-tech-ops: Can't access CiviCRM - https://phabricator.wikimedia.org/T172233#3501553 (cwdent) Open→Resolved a:cwdent The old clear cookies/restart browser did it
[16:40:28] (PS1) Mepps: WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225
[16:41:52] (PS2) Mepps: WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225
[16:45:49] (CR) jerkins-bot: [V: -1] WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225 (owner: Mepps)
[16:52:18] (PS3) Mepps: WIP Orphan Slayer Module [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370225
[16:59:29] hey ejegg, want to dial me into the call starting shortly?
[16:59:52] will do!
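
[Editor's note: the "rectify orphan function" / "Orphan Slayer Module" patches above are WIP and their contents aren't shown in the log, but the general orphan-rectifier idea in this codebase is: find payment attempts that never reached a final status, ask the processor what actually happened, and either import or discard them. A sketch of that loop under those assumptions — StubProcessor and all field names are hypothetical, not the real DonationInterface API:]

    from datetime import datetime, timedelta, timezone

    class StubProcessor:
        """Stand-in for a payment processor status API (illustrative only)."""
        def get_payment_status(self, order_id):
            return "PAID" if order_id % 2 == 0 else "FAILED"

    def rectify_orphans(orphans, processor, max_age=timedelta(minutes=20)):
        """Resolve payment attempts that never got a final status."""
        cutoff = datetime.now(timezone.utc) - max_age
        actions = []
        for orphan in orphans:
            if orphan["started"] > cutoff:
                continue  # too recent; the donor may still be mid-payment
            status = processor.get_payment_status(orphan["order_id"])
            if status == "PAID":
                # charge actually succeeded: push it to the donations queue
                actions.append(("import_donation", orphan["order_id"]))
            else:
                actions.append(("discard", orphan["order_id"]))
        return actions

    orphans = [
        {"order_id": 2, "started": datetime.now(timezone.utc) - timedelta(hours=1)},
    ]
    print(rectify_orphans(orphans, StubProcessor()))
    # [('import_donation', 2)]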
[17:02:55] mepps / XenoRyet sorry, gonna mess with audio for a sec
[17:23:36] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:37] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:37] RECOVERY - Host alnitak is UP: PING OK - Packet loss = 0%, RTA = 36.59 ms
[17:23:38] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:39] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:39] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:40] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:40] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:41] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:41] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:42] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:42] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:43] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:43] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:44] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:44] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:45] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:45] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 34.97, 34.22, 32.15
[17:23:56] PROBLEM - check_zombie on mintaka is CRITICAL: PROCS CRITICAL: 201 processes with STATE = Z
[17:25:06] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 8.42, 25.58, 29.28
[17:25:16] RECOVERY - check_zombie on mintaka is OK: PROCS OK: 0 processes with STATE = Z
[17:25:33] RIP my cell phone
[17:26:53] aw snap
[17:27:13] Jeff_Green: i don't know what the heck that was but i suspect ossec
[17:27:25] o.o
[17:30:16] PROBLEM - check_load on mintaka is CRITICAL: CRITICAL - load average: 0.38, 9.60, 21.29
[17:42:28] hmm.
[17:42:37] sorry was in a meeting, looking
[17:44:55] why icinga decided to alert us 8B times I do not know
[17:46:02] i don't see anything suspicious in the logs
[17:46:17] and i have no real basis for blaming ossec
[17:46:38] ha I was wondering
[17:47:12] 17:23:27
[17:47:21] load spikes to 35
[17:47:36] sits there for a while
[17:50:10] maybe nfs when alnitak faceplanted
[17:50:22] that could be
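
[Editor's note: the check_zombie alert above counts processes in state Z (presumably a check_procs-style probe with a state filter). For reference, the same measurement can be taken directly on Linux by reading each process's state out of /proc — a minimal sketch, not the actual Nagios plugin:]

    import os

    def count_zombies():
        """Count Linux processes whose state is Z (zombie), via /proc."""
        zombies = 0
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                with open(f"/proc/{pid}/stat") as f:
                    stat = f.read()
            except OSError:
                continue  # process exited while we were scanning
            # The state field comes right after the parenthesized comm name,
            # which may itself contain spaces and parens, hence the rsplit.
            state = stat.rsplit(")", 1)[1].split()[0]
            if state == "Z":
                zombies += 1
        return zombies

    print(f"{count_zombies()} processes with STATE = Z")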
[17:52:02] Fundraising Sprint Navel Warfare, Fundraising Sprint Outie Inverter, Fundraising-Backlog, FR-PayPal-ExpressCheckout, and 2 others: Are PayPal refunds for recurring donations incorrectly being tagged as EC or vice versa? - https://phabricator.wikimedia.org/T171351#3501848 (XenoRyet) a:XenoRyet
[17:55:16] RECOVERY - check_load on mintaka is OK: OK - load average: 0.62, 0.64, 4.62
[18:03:41] (PS1) Ejegg: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/370238
[18:03:49] (CR) Ejegg: [C: 2] Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/370238 (owner: Ejegg)
[18:05:27] (Merged) jenkins-bot: Merge branch 'master' into deployment [wikimedia/fundraising/crm] (deployment) - https://gerrit.wikimedia.org/r/370238 (owner: Ejegg)
[18:34:17] Fundraising-Backlog, MediaWiki-extensions-CentralNotice, MediaWiki-extensions-Translate, Performance-Team, WMDE-Fundraising-CN: WMDE banners failing to save - Timing out on save - https://phabricator.wikimedia.org/T170591#3501942 (AndyRussG) Hi... I'll be digging into this right away. I'll le...
[18:48:16] (PS4) Mepps: Approve and Cancel payment [wikimedia/fundraising/SmashPig] - https://gerrit.wikimedia.org/r/366565 (https://phabricator.wikimedia.org/T163952)
[19:41:58] !log updated CiviCRM from f1fd7f0f9e89f59a8fc4daaa5e95803a2f60acbb to f24ba787f711ed38029594f3f3049bd79221ddd7
[19:42:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:56] fr-tech are we standupping?
[19:47:09] Was just popping on myself.
[19:48:54] i think so ejegg, i asked dstrine to call me in
[21:32:03] Fundraising Sprint Outie Inverter, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM, FR-Ingenico: Deal with recurring donations stuck in 'In Progress' status - https://phabricator.wikimedia.org/T171868#3502399 (Ejegg) The drush recurring-globalcollect-patch-history command still works! I just h...
[21:40:23] (PS1) Ejegg: FIXMEs regarding processor_id column [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370285
[21:44:24] !log stopped donations and refund queue consumers
[21:44:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:17] ejegg: breakage?
[21:47:09] cwd nope, just about to run the new audit processor and I want to look at the messages before they get consumed
[21:48:26] righteous
[21:50:06] running it...
[21:55:15] ah, QueueWrapper is overwriting the source_ fields with inaccurate values
[21:56:05] but that's not /too/ bad
[22:20:17] (PS1) Ejegg: Set source fields correctly [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370289 (https://phabricator.wikimedia.org/T95647)
[22:20:53] ok, going to import those
[22:24:15] cool!
[22:24:38] besides the source fields thing, it looks like the WX parser is actually working
[22:24:51] BPay transactions are actually coming in
[22:25:16] fr-tech anyone want to look at https://gerrit.wikimedia.org/r/370289 ?
[22:25:32] We're currently setting all the source_ fields wrong in the audit parsers :(
[22:26:29] (CR) jerkins-bot: [V: -1] Set source fields correctly [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370289 (https://phabricator.wikimedia.org/T95647) (owner: Ejegg)
[22:34:30] (PS2) Ejegg: Set source fields correctly [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/370289 (https://phabricator.wikimedia.org/T95647)
[22:37:47] !log restarted donations and refund queue consumers
[22:38:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:38:33] ok folks, I'm heading out. I'll be working Monday
[22:38:39] have a great weekend!
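
[Editor's note: on the source_ fields bug discussed above — a queue wrapper unconditionally stamping provenance fields clobbers whatever the audit parser already recorded. The fix under review presumably defaults the fields only when the producer left them unset. A minimal sketch of that "default, don't overwrite" pattern; the field names are illustrative, not the exact SmashPig message schema:]

    import time

    def add_default_source_fields(msg, source_name, source_type):
        """Fill in provenance fields without clobbering producer-set values."""
        defaults = {
            "source_name": source_name,              # e.g. the audit parser's name
            "source_type": source_type,              # e.g. "audit" vs "listener"
            "source_enqueued_time": int(time.time()),
        }
        for key, value in defaults.items():
            # The reported bug amounts to unconditional assignment here;
            # setdefault preserves anything the producer already wrote.
            msg.setdefault(key, value)
        return msg

    # A message that already carries its own provenance survives intact:
    msg = {"gross": 10, "source_name": "ingenico_wx_audit", "source_type": "audit"}
    print(add_default_source_fields(msg, "queue_wrapper", "listener"))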