[01:50:34] Fundraising Sprint Owls, Fundraising-Backlog: Mailing data double counting in CiviCRM? - https://phabricator.wikimedia.org/T200240 (Eileenmcnaughton) a:Eileenmcnaughton
[01:56:58] Fundraising Sprint Owls, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: slow anonymous save - https://phabricator.wikimedia.org/T199753 (Eileenmcnaughton) Query on view contact for count ``` SELECT COUNT(DISTINCT(tbl.activity_id)) as count FROM ( SELECT civicrm_activit...
[14:05:46] (PS1) Jgleeson: Add opt_in/opted_in field to silverpop export [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/447807
[14:06:26] (CR) jerkins-bot: [V: -1] Add opt_in/opted_in field to silverpop export [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/447807 (owner: Jgleeson)
[14:24:40] (PS2) Jgleeson: Add opt_in/opted_in field to silverpop export [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/447807
[14:33:55] Fundraising-Backlog, Fr-CiviCRM-dedupe-FY2017/18: Civi Dedupe searches yielding minimal results - https://phabricator.wikimedia.org/T197481 (MBeat33) Open>Resolved a:MBeat33 The method of deduping by CID number ranges is yielding much higher rates of potential merges, so we can distribute the...
[14:38:35] (CR) Jgleeson: [C: 2] "The text makes the form unusually bigger for me although I don't think it will be a problem with a fixed width container." [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/445327 (https://phabricator.wikimedia.org/T199278) (owner: Ejegg)
[14:52:50] (Merged) jenkins-bot: Add opt_in field for selected countries [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/445327 (https://phabricator.wikimedia.org/T199278) (owner: Ejegg)
[14:57:40] dstrine MBeat: reminder we have a test starting in a few minutes. Using the new Ingenico gateway. I'll be monitoring
[14:57:55] thanks, pcoombe
[15:02:08] Banners are up!
[15:50:07] whoah failmails
[15:52:45] fr-tech ^
[15:53:27] UNKNOWN_SERVER_ERROR
[15:53:29] oh good
[15:53:40] Request returned http status (500):
[15:54:04] looking
[15:54:34] thanks!
[15:56:04] we're still getting mostly successful txns
[15:57:50] ejegg: this is a queue consumer right?
[15:58:13] can we turn it off and fix it w/o front end impact?
[15:58:46] The test is ending in 2 minutes
[15:59:15] i really want to get on top of the fail mail events because the last one was bad when it broke gmail
[16:00:02] cwd no this is all from the front end
[16:00:21] bummer
[16:00:34] looks like a bunch of timeouts to API calls as well as the 500 errors
[16:00:54] We had upped the timeout from default 7 seconds to 12 seconds for the old API
[16:01:02] last time we spammed root@, which is all of SRE and i'm not sure who else, right in the middle of 2 other outages
[16:01:05] I can adjust that for the new API too
[16:01:35] cwd this will end in the next few minutes
[16:01:50] ok
[16:01:55] it doesn't look like enough to cause a problem
[16:01:57] as the banners have already come down
[16:02:12] but can we make actionables when these turn up?
[16:02:42] either stop the cause of the mail, or the mail if we can't
[16:03:25] cc dstrine
[16:06:05] MBeat: these errors are all coming from the front end, so there may be around 50 donors who were unable to donate in the last 15 min of the test
[16:06:17] thanks, ejegg
[16:06:30] do you know what their experience would look like, ejegg ?
[16:06:50] mostly looks like errors on our first request to set up the iframe
[16:07:30] so they don’t even send their card #s, sounds like
[16:07:55] I'm glad I had backscroll to read on this one. The actual mail message could use some additional context...
[16:08:19] I'd have guessed orphan rectifier.
[16:11:25] ...yeah, I keep looking at the emails, and nothing about this says "donor facing" to me. Let's Descriptive.
[16:12:07] yeah, so they would get the red error message with a ref # instead of the iframe
[16:13:17] ejegg: Oh hey. Did you notice the last few are substantially different?
[16:13:25] Same subject line pattern, though.
[16:14:20] status (405): Method Not Allowed
[16:16:31] ...or maybe that's just the last *one*.
[16:20:08] ty
[16:22:04] K4-713: yeah
[16:24:22] We probably need more readily available info in the subject line for these things. The codebase as an identifier used to be good enough, but Smashpig is becoming ubiquitous, so we should be more specific. Server name would be great. :p
[16:24:46] K4-713: yeah, I made a task for that last week actually
[16:24:51] hehe, awesome.
[16:25:07] T200245
[16:25:07] T200245: Failmail should always indicate which machine it comes from - https://phabricator.wikimedia.org/T200245
[16:25:23] or... monday
[16:26:03] If you haven't already, I'd have mepps|vacation and jgleeson|away weigh in on what would demystify these messages for them, too.
[16:26:34] yep, good call
[16:27:16] relocating
[16:27:21] If we had all the time in the world, I'd totally want to route all these things to a central pain management system.
[16:27:41] right, with shut-off switches
[16:27:49] And long-term stats.
[16:28:22] part of the problem from our/ops end is it's not clear what to do when the e-spam starts
[16:28:37] Jeff_Green: I think it's not clear generally.
[16:28:41] my inclination is "lots of mail, looks like error, something should be shut off"
[16:28:44] yeah
[16:29:39] Should we scope something out for a central pain system? Even if we can't work on it right now, it would be good to know where we'd like to go.
[16:31:26] Thinking about it would be a good exercise even if we can't arrive at a central system, so I'm for it.
[16:31:52] Yeah, I figure there will be some useful iterative steps in there somewhere.
[16:32:31] I'll find some time. An hour or something should get us started.
[16:41:36] And, done.
[17:07:28] !log Updated SmashPig settings: extended Ingenico Connect API timeout to 12 seconds
[17:07:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:01] cwd: Jeff_Green K4-713 ejegg maybe this is a good topic for next monday's ops chat?
[17:16:20] also sorry all I lost internet at home ~5 minutes into the test
[17:16:29] dstrine: K4-713 sent out a separate invite
[17:16:37] oh ok
[17:17:00] also fr-tech: just added an entry for the lockfile species of failmail: https://www.mediawiki.org/wiki/Fundraising_tech/Failmail_zoo
[17:19:30] XenoRyet: want to collect Ingenico errors in a google doc? I'd say etherpad but something private might be better till we can make sure we've scrubbed logs
[17:21:10] Yea, google doc seems right.
[17:21:18] I'll set one up
[17:22:17] (CR) Ejegg: [C: 2] Add opt_in/opted_in field to silverpop export [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/447807 (owner: Jgleeson)
[17:22:49] (Merged) jenkins-bot: Add opt_in/opted_in field to silverpop export [wikimedia/fundraising/tools] - https://gerrit.wikimedia.org/r/447807 (owner: Jgleeson)
[17:25:37] fr-tech any news for scrum of scrums?
[17:25:46] None here
[17:27:40] ejegg, sorry I'm back later than expected I hit rush hour traffic
[17:27:45] jgleeson: sorry, I forgot about the Scrum of Scrums meeting!
[17:28:03] that'll be 20 min or so
[17:28:40] no problem, I'm gonna start dinner now but I'll be about for another hour
[17:28:42] but the basic idea is that with 'variant' you can swap out any of DonationInterface's yaml files in the XXX_gateway/config directories
[17:28:57] so if you finish and still have time, I should catch you
[17:29:13] so we just need to figure out the best way to use that to add an element to the form
[17:31:06] got it
[17:32:58] ejegg, would something like a merge/extend with country_fields.yaml work which just adds the fields to all if loaded?
[17:33:37] jgleeson that'll be part of it.
[17:33:53] the file from the 'variant' dir will completely replace the file from the config dir though
[17:34:06] so we don't have the issue we have with SmashPig overrides
[17:34:21] i.e. it is possible for the variant file to have fewer required fields
[17:55:26] jgleeson: OK, I'm free to chat whenever you are
[18:04:27] So, people are thinking seriously about how to mark certain areas as 'dangerous'
[18:04:35] for tens of thousands of years
[18:04:46] like radioactive waste dumps
[18:05:09] That is an interesting problem that it's sad we have to think about.
[18:05:11] Skull & crossbones could lose its meaning
[18:05:13] etc
[18:05:32] here's one of the wackier ideas I've seen: http://www.theraycatsolution.com/
[18:05:57] !
[18:06:12] That is...
[18:06:18] well, it's an idea.
[18:06:24] OMG, the idea was first proposed in 1981
[18:06:39] So, where's my hypercolor cat?
[18:06:53] I thought this was the future.
[18:07:15] Brico.bio is working on it!
[18:08:02] whoa, it's a makerspace with CRISPR facilities
[18:08:05] yikes
[18:08:17] I'd settle for a hedgehog.
[18:08:35] just, like, a normal hedgehog?
[18:08:40] they're pretty cute
[18:08:54] I wasn't thinking a *normal* hedgehog, but... yes.
[18:13:03] wow, did not realize warning signs are a pretty recent "invention"
[18:13:16] https://en.wikipedia.org/wiki/Warning_sign#History
[18:13:57] raycats...
[18:14:05] ejegg, i'm free now if it works?
[18:16:19] sure jgleeson !
[18:16:57] Fundraising Sprint Owls, Fundraising-Backlog: publish Oanda exchange rates to internal, private google doc - https://phabricator.wikimedia.org/T200227 (jgleeson) a:jgleeson
[18:17:48] jgleeson: ok, I'm in the usual hangout
[18:53:56] have a good evening fr-tech!
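[Editor's note: the 'variant' override semantics ejegg describes above (17:28:42–17:34:21) can be sketched in a few lines. This is a minimal Python sketch, not DonationInterface's actual implementation; the dict names and the country_fields.yaml contents are hypothetical stand-ins for parsed YAML.]

```python
def resolve_config(base_configs, variant_configs, filename):
    """Whole-file override: if the variant supplies this config file,
    it is used instead of the base file. Nothing is merged key-by-key,
    so a variant file may declare fewer required fields than the base.
    """
    if filename in variant_configs:
        return variant_configs[filename]
    return base_configs[filename]

# Hypothetical contents, standing in for parsed YAML files:
base = {"country_fields.yaml": {"required": ["email", "first_name", "last_name"]}}
variant = {"country_fields.yaml": {"required": ["email"]}}

resolved = resolve_config(base, variant, "country_fields.yaml")
```

The point of contrast with the SmashPig overrides mentioned in the log: a replace-whole-file scheme lets a variant drop settings entirely, which a per-key merge cannot do.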
[19:07:22] fundraising-tech-ops: encrypt fundraising mariadb replication - https://phabricator.wikimedia.org/T170320 (Jgreen) After much pain and suffering, I seem to have MariaDB's 10.0.35 packages working in virtualbox, with OpenSSL and TLSv1.2, and using puppet's CA and host certificates. I'm not sure I have the ri...
[19:21:54] XenoRyet: want to use the tech-talk slot to go over this morning's errors?
[19:25:10] Sure
[19:28:40] I'm actually not sure we saw much that was new, which is a good thing.
[19:30:30] Give me one more sec to get the doc a little more readable and I'll be in.
[19:30:40] ok, I'm still in a DS meeting right now anyway
[19:30:50] lmk when you're ready and I'll switch over
[19:31:03] We could do it after you're done with that if it's useful for you to stay in there.
[19:31:24] I think eileen_'s taking point from here on out
[19:32:29] ok, jumping on the usual hangout
[19:32:48] or I guess tech talk has its own new style one. I'll use that.
[19:33:51] fr-tech: Anyone else coming for tech-talk?
[19:53:04] XenoRyet: I never found that getNameScore function you mentioned yesterday. Maybe it’s not in DonationInterface, but somewhere else?
[19:53:35] awight: give me a few minutes and I'll point you at it.
[19:57:54] Is there currently any code collecting the donor’s local timezone?
[19:59:11] awight: The name filter is in gateway.adapter.php right about line 2600. Apparently I forgot how I named it yesterday, it's getScoreName();
[20:00:27] XenoRyet: awesome, thanks for digging that up!
[20:00:54] No worries
[20:03:31] It looks like NameFilterRules is only configured in the private files, so maybe you can pass the values along to saurabhbatra at some point?
[20:04:29] Yea, that wouldn't be a problem
[20:12:33] saurabhbatra: I left a few notes on the features doc. The only thing I noticed is that we should be taking into account previous attempts from the user_ip and previous contributions from the contact.
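[Editor's note: for readers without access to the code discussed above — the real getScoreName() lives in DonationInterface's gateway.adapter.php, and the NameFilterRules it reads are configured only in private files, so both the rule shape and the scores below are invented. A toy Python sketch of a substring-rule name score, for intuition only:]

```python
def get_score_name(name, rules):
    """Toy stand-in for a name-based fraud score: check the donor name
    against (substring, score) rules and take the highest matching
    score, or 0 if nothing matches. The real NameFilterRules format
    is private and may differ entirely.
    """
    lowered = name.lower()
    return max((score for substring, score in rules if substring in lowered), default=0)

# Hypothetical rules -- illustrative only, not the production config:
rules = [("test", 50), ("asdf", 100)]
```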
[20:13:02] Also, it might be worth having fr-tech and MBeat come up with some more features
[20:14:39] We're scheduled to be having some discussions surrounding fraud this sprint anyway. We should definitely combine forces.
[20:14:52] looking at the notes rn
[20:14:56] ah nice
[20:15:03] I'll let you know when that meeting actually gets on a calendar.
[20:15:19] I’d be interested in attending, for sure.
[20:15:39] Need to get my anti-deja-vu helmet refitted, I suppose
[20:15:46] heh
[20:17:09] re: previous contributions from ip/contact
[20:17:48] it'll require us to run a query per record
[20:18:06] which should be fast enough i think because ids will be indexed
[20:18:41] but just wanted to make sure this was what you had in mind
[20:18:48] That sounds right to me
[20:20:06] we'll also have to run a query per record per filter to get the payment_fraud_breakdown filter values
[20:20:40] because they are stored with a schema - payment_fraud_id, filter_name, filter_value
[20:21:27] I’m not sure what form the prior contribution features should take, though. AFAIK there’s no way to use a variable-length vector, so maybe we have to distill like “difference in amount between second-to-last contribution and this contribution”. Also don’t know how to indicate in a feature like that there is no prior contribution.
[20:21:49] Query per filter doesn’t sound right, can’t we get all the values in one query?
[20:22:24] we can use group_concat if that makes it easier to process
[20:23:48] we can get all filters in a single query, but it returns 5 rows corres. to a payment_fraud_id
[20:24:11] we could have a sub-query run on that result though
[20:24:28] yeah that part is annoying, but if you group by payment_fraud_id, then you can group_concat the payment_fraud_breakdown.filter_name and .filter_value
[20:25:46] yup
[20:25:48] got it
[20:25:56] should not be that hard then
[20:26:24] and one query should suffice
[20:26:41] saurabhbatra: In unrelated good news, halfak (original author of ORES) was positive about my suggestion that we use this opportunity to extract some of the reusable bits into an independent library...
[20:27:19] that sounds great!
[20:27:25] It probably won’t be ready in time for GSoC, but would be helpful in the future.
[20:27:37] yeah exactly what i was about to say
[20:27:54] i'm planning on sticking around after and completing this
[20:28:11] so maybe put that into a "some day in the future" bucket
[20:28:17] It means there’s less pressure to engineer the feature extraction, dependency solving, and command-line stuff to be perfect, because in theory we can migrate to that framework when ready.
[20:28:28] ^ wonderful, great to hear it!
[20:28:59] yup
[20:30:08] so i was thinking - how do we keep incorporating fraud trends into our model as we get new data
[20:30:19] yeah...
[20:30:43] this is the "ambitious" part though, not too worried about it rn
[20:31:18] we probably want to have a structured pipeline which updates the model every, say 6 months or so
[20:31:58] I guess the possible approaches are, * regular re-training using automated collection of new frauddy observations, or * active learning, where the model prompts us to manually review low-confidence cases
[20:32:13] We probably need to re-train whenever new payment methods come online...
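[Editor's note: the group-by-plus-group_concat approach awight describes (20:24:28) can be demonstrated end to end. SQLite stands in for MariaDB here since both support GROUP_CONCAT; the schema columns match the ones named in the log (payment_fraud_id, filter_name, filter_value), but the filter names and values are made up.]

```python
import sqlite3

# In-memory table mimicking payment_fraud_breakdown's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_fraud_breakdown (
    payment_fraud_id INTEGER,
    filter_name TEXT,
    filter_value REAL
);
INSERT INTO payment_fraud_breakdown VALUES
    (1, 'getScoreCountryMap', 20.0),
    (1, 'getScoreUtmCampaignMap', 0.0),
    (2, 'getScoreCountryMap', 35.0);
""")

# One query for every record: one row per payment_fraud_id, with the
# per-filter names and values packed into parallel CSV strings. Both
# aggregates see the group's rows in the same order, so the pairing
# between the two strings is consistent.
rows = conn.execute("""
    SELECT payment_fraud_id,
           GROUP_CONCAT(filter_name)  AS names,
           GROUP_CONCAT(filter_value) AS vals
    FROM payment_fraud_breakdown
    GROUP BY payment_fraud_id
    ORDER BY payment_fraud_id
""").fetchall()

# Unpack each row back into a {filter_name: filter_value} dict.
features_by_id = {
    pf_id: dict(zip(names.split(","), map(float, vals.split(","))))
    for pf_id, names, vals in rows
}
```

This replaces a query per record per filter with a single aggregate query, at the cost of one split/zip step in the consumer.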
[20:32:29] i think manual review is definitely going to still be a thing
[20:32:56] it's just that if we do a better job than minFraud, we can reduce the number of frauds which get through
[20:33:06] hence reducing man hours (hopefully dramatically)
[20:33:33] Yeah, luckily there are strong, external motivations that cause our data to eventually become correct over time.
[20:34:05] yup, so if we manually upgrade the model with data from the past 6 months, we can keep up with fraud trends easily
[20:34:12] +1 that your new fraud model will be “human-assisted” or vice-versa, at least initially
[20:34:31] re - active/online learning
[20:34:38] that won't be possible with sklearn
[20:34:57] because it doesn't support online training of random forests
[20:35:15] yeah “live” is too scary for fundraising anyway
[20:35:16] as in you call fit once and you're done
[20:36:10] so when we make the pipeline down the line, we can work on incorporating ORES features as an independent library and then leverage it
[20:37:32] btw, how do you think we should sample normal data?
[20:37:54] as in do you think that dates matter or just take the top 16k entries?
[20:38:38] Interesting. My instinct is to randomly choose from the last 2 years or something, then exclude known fraud
[20:39:06] it’s tricky—most countries will only see one FR campaign per year
[20:39:31] so country-specific payment methods, name spellings, etc. will only appear during those campaigns.
[20:39:42] The time of year may shift between years
[20:40:15] how about this
[20:40:48] select one normal transaction from the same date as a fraudulent transaction
[20:42:14] It seems contrived…
[20:42:17] although this sounds too resource intensive w.r.t query time
[20:42:19] yes
[20:42:22] i.e., I’m not sure what we win
[20:42:52] just that we sample equally cross-campaigns
[20:43:15] there should be a simpler strategy though
[20:43:22] yeah, good point. English wikipedia is massively overrepresented in donations
[20:45:10] I guess that’s only a problem if it’s under-represented in the fraud sample, which is probably not the case.
[20:46:13] Either way, I think we can test for what you’re describing by checking fit for subsets of the test set, split by payment_method and country.
[20:46:36] BTW, have we discussed LIME yet?
[20:46:47] a bit in a mail thread
[20:47:21] cool. I found it makes surprising feature importance ’n’ stuff very visible: https://github.com/adamwight/ores-lime/blob/master/Explain%20edit%20quality.ipynb
[20:49:27] yup, i did mess around with it using our dummy data
[20:49:46] it was kinda inconclusive because the features weren't labelled
[20:50:53] I mention because I think it would help with the sampling dilemmas: If we’re overfitting on payment_method or country, it might be obvious using LIME… unless there are other important features which also vary with method :-/
[20:51:04] which I’m sure there are.
[20:51:14] oh yeah, that makes sense!
[20:52:14] so how about doing a count of transactions grouped by campaign?
[20:52:58] and then including a similar number of data points from normal contribs
[20:56:58] Won’t a random sample do that already?
[21:00:03] if we assume that more contribs in a campaign = more fraud contribs
[21:00:08] tehn yes
[21:00:10] *then
[21:01:16] although now that i think about it, utm_campaign seems like a useless field to me
[21:01:35] not like future transactions are going to use this label ever
[21:01:41] *past labels ever
[21:03:11] hmm, yeah, though the difference between anything and nothing might be relevant
[21:03:32] and there might be fraudsters using particular saved links
[21:05:46] yup that makes sense too
[21:06:35] so how about sampling normal transactions 1:1 to fraud by year?
[21:06:58] we were planning to truncate campaign names "C15..." to C15 anyway
[21:07:33] Reducing labels and making sure our sampling is fairly random as well
[21:15:02] Fundraising Sprint Owls, Fundraising-Backlog: Add explainer text to CC payment form (for banner checkbox experience) - https://phabricator.wikimedia.org/T200218 (Ejegg) a:Ejegg
[21:18:28] awight: signing off for now, cya tomorrow!
[21:20:40] (PS1) Ejegg: JS module to add English email explainer [extensions/DonationInterface] - https://gerrit.wikimedia.org/r/447919 (https://phabricator.wikimedia.org/T200218)
[21:56:23] hi XenoRyet !
[21:57:01] Fundraising Sprint Owls, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: slow anonymous save - https://phabricator.wikimedia.org/T199753 (Eileenmcnaughton) a:Eileenmcnaughton I have an upstream pr here https://github.com/civicrm/civicrm-core/pull/12557
[21:57:53] hi ejegg
[22:44:49] Fundraising-Backlog: Ingenico error: Response body is not valid JSON: 'Method Not Allowed' - https://phabricator.wikimedia.org/T200377 (XenoRyet)
[22:45:40] Fundraising-Backlog: Ingenico error: Can't communicate or internal error: Failed data validation - https://phabricator.wikimedia.org/T200378 (XenoRyet)
[23:12:32] (CR) Ejegg: [C: 2] "Looks good! One suggestion inline" (1 comment) [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/444487 (https://phabricator.wikimedia.org/T196644) (owner: Eileen)
[23:13:13] ejegg:
[23:13:20] what say I add that caching as a follow up
[23:13:53] ah - you did +2 it so I guess that was implicit
[23:13:55] Yeah, sure! I C+2ed it as-is, just thought the caching would be nice for later!
[23:19:57] (Merged) jenkins-bot: Add code to delete select contact fields & custom fields [wikimedia/fundraising/crm] - https://gerrit.wikimedia.org/r/444487 (https://phabricator.wikimedia.org/T196644) (owner: Eileen)
[23:33:17] /nick AndyRussG
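[Editor's note: the sampling strategy saurabhbatra proposes at 21:06:35 — drawing normal transactions 1:1 with fraud, matched by year — is simple enough to sketch. A minimal Python sketch; in practice the inputs would come from queries against the payments/Civi data rather than in-memory lists.]

```python
import random
from collections import Counter, defaultdict

def sample_normals_by_year(fraud_years, normal_txns, seed=0):
    """Draw non-fraud transactions 1:1 with fraud, matched by year.

    fraud_years: one year per known-fraud transaction.
    normal_txns: (year, txn_id) pairs for known-good transactions.
    Returns txn_ids sampled so each year contributes as many normals
    as it had frauds (capped by how many normals are available).
    """
    rng = random.Random(seed)        # fixed seed for reproducible samples
    wanted = Counter(fraud_years)    # frauds per year = quota per year
    pool = defaultdict(list)
    for year, txn_id in normal_txns:
        pool[year].append(txn_id)
    sample = []
    for year, n in wanted.items():
        candidates = pool.get(year, [])
        sample.extend(rng.sample(candidates, min(n, len(candidates))))
    return sample
```

Because the quota is per year rather than per campaign, this keeps the class balance roughly even across fundraising cycles without needing the per-campaign counts discussed earlier.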