[00:10:40] Fundraising-Backlog: Mailing data double counting in CiviCRM? - https://phabricator.wikimedia.org/T200240 (CCogdill_WMF) [00:10:44] Fundraising-Backlog: Mailing data double counting in CiviCRM? - https://phabricator.wikimedia.org/T200240 (CCogdill_WMF) p:Triage>Normal [00:47:02] Fundraising-Backlog, FR-Smashpig: Failmail should always indicate which machine it comes from - https://phabricator.wikimedia.org/T200245 (Ejegg) [03:11:56] Fundraising Sprint Naming Sprints Is Not Important, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Civi: contribs export failing for DS agent - https://phabricator.wikimedia.org/T196569 (Eileenmcnaughton) @krobinson I'll ping you about this tomorrow -I'm still stumped but perhaps if we look on s... [03:12:34] Fundraising Sprint Naming Sprints Is Not Important, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Civi: contribs export failing for DS agent - https://phabricator.wikimedia.org/T196569 (Eileenmcnaughton) a:Eileenmcnaughton [05:27:02] (PS2) Eileen: [WIP] CiviCRM 5.4 [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/444769 [05:33:02] (CR) jerkins-bot: [V: -1] [WIP] CiviCRM 5.4 [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/444769 (owner: Eileen) [14:11:52] hi jgleeson! [14:12:01] hey ejegg ! [14:12:26] did that charset bit help with the db connection? [14:13:46] I haven't tried just yet, thanks for the tip tho. I started a little late and got distracted with setting up stuff locally. I'll give it a go now though [14:13:56] cool, cool [14:14:11] I've been ripping out my php5 setup locally, I broke a few things setting that up [14:14:12] I re-did the opt-in patch to make it a radio button [14:14:16] nice! [14:14:53] looks like there's a ton of eileen code to review - suppose I'll start in on that! [14:52:11] ejegg, it worked!!!! [14:52:37] jgleeson: nice! [14:53:04] there's some info here: https://www.mediawiki.org/wiki/Fundraising_tech/tools [14:53:18] but it may be all stuff you've figured out already [14:53:53] yeah the logging block took a while to figure out [14:53:58] the generic name threw me [14:54:06] it isn't included in the sample.yaml [14:54:14] I ended up pulling it down from the server [14:54:37] ah shoot [14:54:49] yeah, the logging is some of the newest code there [14:54:53] anyway, another learning experience [14:55:04] If you want to update the sample.yaml, please feel free! [14:55:23] It forced me to get remote debugging working with python/vagrant which will hopefully be useless going forward [14:55:27] sure! [14:57:04] so in terms of how we pass on the 'opt_in' information, can we do something like only export records which have the flag or do we need to send it as a new field? [14:57:10] ejegg ^ [14:57:24] jgleeson: we'll export it as a new field [14:57:44] ccogdill can use the new field in queries within the Silverpop system [14:58:40] ok cool [15:12:33] ejegg, do we just wanna add a simple 'opt_in' field to the generated DatabaseUpdate csv files? 
alongside a new home for it in the intermediate sql tables [15:13:20] man it's tragic what's happening in greece [15:13:22] those poor people [15:13:42] 74 confirmed dead [15:13:45] horrifying [15:14:53] jgleeson: yeah, just a new field [15:15:01] * ejegg checks news [15:15:08] https://www.theguardian.com/world/2018/jul/23/greeks-urged-to-leave-homes-as-wildfires-spread-near-athens [15:15:21] jeez [15:16:11] I can't even imagine how hot a fire has to be to be lethal even while you're standing in the water on a beach [15:16:26] cripes [15:18:16] Yeah horrific [15:29:15] back a little later [15:48:26] fr-tech for sample event data to put in the Python library git repo. Taking a sample of real data and scrubbing it... It's probably sufficient to scrub ip address, right? There's no full ua string, just the parsed info about it, and nothing about the article that was being viewed. Here's a sample (from a request from me): https://tools.wmflabs.org/paste/view/7bbc5ca6 [15:50:08] uuid is the event ID. You could only link that to real user data if you had access to FR analytics [15:50:44] hmm, looks ok [15:51:36] think 'region' is fuzzy enough to not be invasive? [15:52:45] ejegg: hmmm yeah good point, I'll scrub that too [15:52:53] hehe, how many logged-in linux/firefox users are there in ciudad méxico browsing enwiki? [15:53:24] Somos un chingo y seremos más [15:54:15] (Standard chant at demonstrations--"There are a shit-ton of us and there'll be more") [15:54:21] nice! [15:54:44] (apologies for foul language) [15:55:11] Maybe I'll go through it manually and change country values too [15:55:29] That's pretty easy, and better safe than sorry [15:56:56] Sure, if that's easy enough [15:57:52] Yeah why nots ;p [16:00:06] Actually most (or all?) of the real traffic is just bots hangin' out at aawikibooks [16:01:37] oh yeah, not much for real users over thataway [16:02:38] Seems good enuf to serve temporarily as test data anyway [16:06:49] Unfortunately we don't have log files from the time we temporarily turned up the CN EL rate [16:11:36] cwd: Jeff_Green: hi!! two minor things about the new centralnotice eventlogging log files... It looks like it's all going to /srv/banner_logs/2018, same place as the old stuff (at least on alnitak)... Also, for centralnotice log files, we'll need to include the sample rate in the filename, though for the new format, it should now go _after_ the timestamp... [16:11:47] At least tentatively [16:12:03] Mmmm maybe I should wait for others on fr-tech to review the new scripts to confirm that last point... [16:12:13] thx!!! [16:14:56] AndyRussG: from our perspective it's simpler to have all the proxy logs ending up in the same directory tree, does it really matter if they're separated from the other logs? [16:15:34] unless it matters my inclination is to just rename that to something like /srv/proxy_logs/* so it's less specific [16:18:05] Jeff_Green: yeah that's ok too... Was just wondering since I see a /srv/kafkatee directory there too... Out of curiosity, why do you call them "proxy logs"? [16:18:35] /srv/kafkatee is the 'work' directory used by the kafkatee process [16:18:55] aren't these logs all originating from the production proxies?
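A minimal sketch of the event-scrubbing step discussed above (15:48-15:57), assuming events arrive one JSON object per line; the field names ('ip', 'region', 'country') follow the conversation, but the exact schema keys and file names here are illustrative, not the real event format:

    import json

    # Fields agreed above as worth blanking before publishing sample data.
    SCRUB_FIELDS = ('ip', 'region', 'country')

    def scrub_event(event):
        """Return a copy of the event with potentially identifying fields blanked."""
        return {k: ('scrubbed' if k in SCRUB_FIELDS else v) for k, v in event.items()}

    with open('sample_events.json') as infile, open('scrubbed_events.json', 'w') as outfile:
        for line in infile:
            outfile.write(json.dumps(scrub_event(json.loads(line))) + '\n')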
[16:19:30] ah ok [16:19:57] Hmmm [16:19:59] just an idea anyway [16:20:14] Well, in all cases the varnish servers are in the pipeline [16:20:46] The old style came pretty directly from the Varnishes' web logs I think [16:20:49] yeah [16:20:53] The new ones come down the eventlogging pipeline [16:21:36] So those are client-side requests that the proxies somehow pass on to EventLogging, which validates the events, and pipes them along to the appropriate kafka topic [16:21:41] Something like that anyway :) [16:21:54] so is it actually hitting an application server of some kind? [16:22:22] Yeah something or other that validates the event [16:22:36] against the EventLogging schema [16:22:44] Invalid events get sent to a different Kafka topic [16:22:52] Heheheh here's hoping it'll scale!!!! [16:23:21] yeah no kidding, and that there's some protection against abuse! maybe /srv/traffic_logs/2018 ? [16:23:47] Jeff_Green: heheh not much protection, but we'll be adding a bit more [16:24:34] If you don't want to see multiple stored examples of SQL injection attempts, don't check out the country and project tables in the pgehres database [16:25:22] Jeff_Green: fwiw wrt names, I'm calling the new script/library 'fr_user_event_consumer' [16:25:24] ugh. going forward we should treat those as 'events' and ticket them and they should trigger a code review to figure out why they made it that far [16:25:33] * cwd shudders [16:27:07] AndyRussG: the more I think about it, the less I think banner_logs is inaccurate [16:28:08] Jeff_Green: well there's also landing page logs, which are not from banners [16:28:29] could be /srv/user_event_logs/ ? [16:28:35] sure [16:29:34] re. filename, the current convention is to add the sample rate if it's sampled, and not 1:1 [16:32:33] (CR) Ejegg: [C: 2] Add apis to retrieve information about the ultimate destination contact and source contacts for merged contacts. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/446519 (https://phabricator.wikimedia.org/T199748) (owner: Eileen) [16:33:24] I'm also unclear why we would want to move the timestamp around, is there some reason that matters? [16:35:05] Hmmm only the centralnotice (banner) logs, not the landingpage logs should get sampled [16:35:12] so filenames for landingpage logs don't need sample rate [16:35:46] for centralnotice logs, it'd be nice to have sampled in there if somehow it ever gets up to 100% server-side, though I don't imagine that'll happen [16:36:43] The change in order is so that we can be sure that alphanumerical ordering of the files will put them in chronological order, which will make the new (more flexible) file selection thingy easier [16:37:29] So something like 'centralnotice-impressions-20180705-220001-sampled10.log.gz' would work nicely [16:37:58] Currently I see 'centralnotice-impressions-20180705-220001.log.gz' style [16:38:21] ah [16:38:30] If u like, wait for me to get some feetback from others on fr-tech on how the new script is gonna work [16:39:14] Always nice to have one's feet returned [16:39:14] banner logs are sampled 1:10, landingpages and centralnotice-impressions are not sampled, so the filenames are sane [16:39:24] Hmmm [16:39:36] centralnotice-impressions I think should get sampled 1:10 too [16:39:50] we can do that [16:40:01] That'll eventually ramp up to the same scale as banner logs (which centralnotice-impressions replaces) [16:40:12] cool thx!!!!!
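A quick illustration of the ordering point above: with the fixed-width YYYYMMDD-HHMMSS timestamp ahead of the sample-rate suffix, a plain lexicographic sort is also a chronological sort (the filenames are hypothetical examples of the proposed scheme):

    files = [
        'centralnotice-impressions-20180705-220001-sampled10.log.gz',
        'centralnotice-impressions-20180704-230001-sampled10.log.gz',
        'centralnotice-impressions-20180705-210001-sampled10.log.gz',
    ]
    # Zero-padded timestamps compare correctly character by character, so
    # sorted() yields the files in time order regardless of any suffix.
    for name in sorted(files):
        print(name)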
[16:40:24] maybe we should switch it to {timestamp}-{source}-{format}-{samplerate} or something [16:40:28] (Merged) jenkins-bot: Add apis to retrieve information about the ultimate destination contact and source contacts for merged contacts. [wikimedia/fundraising/crm/civicrm] - https://gerrit.wikimedia.org/r/446519 (https://phabricator.wikimedia.org/T199748) (owner: Eileen) [16:41:11] Yeah currently the client-side sampling is nearly turned off... the plan is to gradually ramp up the client-side sample rate to test scale [16:41:24] what happens now is {thismessisdefinedbykafkateeconfig}-{timestampisdefinedbylogrotationscript} [16:41:42] Ah I see hmmm [16:42:03] we can make a smarter script but we'd have to make it smart enough not to break the old-style consumer [16:42:31] wrt '{timestamp}-{source}-{format}-{samplerate}', I'm imagining '{source}-{timestamp}-{format}-{samplerate}' would be prettier [16:42:50] If it's complicated I think we can devise some workaround in the scripts [16:44:14] We can just make a custom sorting function in the Python scripts to ignore the sampled section of the filename, and order by the timestamp even if sampled comes before it in the filename [16:44:43] are you storing data on which files have been read somewhere? [16:44:45] Might be a wee bit less performant, but pro'lly not that much [16:44:59] yep [16:45:17] one side or the other has to be made smart, my thinking is that the log rotation stuff is by nature trivial and not smart [16:46:03] Hmmm [16:46:12] it's doing stuff like: rename $filename "/srv/banner_logs/${year}/${filename}.${timestamp}" [16:47:03] hmmm [16:47:43] I can probably work around it by having puppet control the filenames in both places [16:47:51] hmmm [16:48:17] I don't know much about what can or should be done on that end [16:48:46] it's probably a similar amount of work to fix it on that end as it is on the parser script end [16:49:10] If it's pretty trivial to make it stick the timestamp somewhere in the new filename before the sampled section, that might be nicer in the long run [16:49:16] but if the parser script is just looking for new files, does it even matter at all? [16:49:52] well, it won't read all the filenames in the directory and check the database for the processed status of every one [16:50:13] Rather, the option this is for is backfill [16:50:39] where you should be able to make it try to read in files within a specific timestamp range in the filename [16:50:49] AndyRussG: did you want to talk this out live at the tech talk today? [16:50:50] so it'll only try to consume those files [16:51:00] ejegg: yes fer sure :) [16:51:03] AndyRussG: i see [16:51:10] cool cool [16:51:45] Jeff_Green: cwd: more than welcome to join in too if you'd like! (coming up at 10 am Pacific Time) [16:52:15] Jeff_Green: I'd rather not rely on the actual write times of the files [16:52:19] seems more dicey [16:53:40] Also I'd change the functioning of the current "recent" feature.
Currently it tries to load files whose filename timestamps correspond to the last hour and the preceding hour [16:54:00] to me a live mtg doesn't add anything, I think we're on the same page about the things we're trying to fix so it's going to boil down to squinting at existing code/config and finding the sanest option [16:54:52] Jeff_Green: :) heheh yeah [16:55:33] For the recent feature I'd like it to instead look in the database for the timestamp of the most recently processed file, and then try to read all files with a timestamp after that [16:55:36] Seems more resilient [16:55:45] ya [16:56:12] we have a very standard timestamp format, so parsing that from whatever filename is easy at least [16:56:57] So the idea is basically to be able to find files in a specific time range easily based on the timestamp in the filename. The easiest way I've found so far in Python is to order them alphanumerically then go through them in reverse order checking that the timestamp in the filename is within range [16:57:15] That way, when we get outside the range of the "since" parameter, we can just stop [16:57:35] Which seems nice, assuming there'll be a large number of files in the dir [16:57:54] i'm only quasi around, heading back from nyc this aft [16:58:22] i see, so that's why it's easier for you for these files not to mix with banner/landing logs [16:59:58] Mmmm potentially yeah, though that bit can be solved with a pretty easy file glob [17:00:08] But it would kinda feel cleaner [17:00:10] AndyRussG: what I might do is...first get the latest timestamp from the db, and then rather than bothering with ordering, cycle the filenames evaluating the timestamp in each against the db value [17:00:24] unless it matters what order you process them in [17:00:47] No, the order of processing doesn't matter. But I just didn't want to cycle through _all_ the files in the dir [17:01:02] fr-tech, are we tech talkin [17:01:03] ? [17:01:07] jgleeson: yeah one sec! [17:01:07] it might be faster than globbing [17:01:23] ah ok, one sec also! [17:02:13] cwd: have a good trip! [17:02:42] :) [17:06:50] AndyRussG: (after that mtg) is there an existing ticket for this project? I'd like to get the changes we've discussed tacked to it for posterity [17:16:23] Fundraising-Backlog: Generate new Civi certificate for Kristie - https://phabricator.wikimedia.org/T200281 (MBeat33) [17:46:36] Jeff_Green: yeah for sure... Hmmm checking [17:47:05] Jeff_Green: so this is the epic for the whole project: T183978 [17:47:06] T183978: [Epic] Kafkatee changes - https://phabricator.wikimedia.org/T183978 [17:47:45] This is the one for the CN files: T189820 [17:47:45] T189820: Create job to deliver the eventlogging_CentralNoticeImpression topic - https://phabricator.wikimedia.org/T189820 [17:48:03] So that's where I can summarize some of the above ^ [17:48:18] ok, I'll make some subtasks for the -ops stuff [17:48:41] Cool thx!!!! [17:49:44] I think we don't have a task for creating files from the eventlogging_LandingPageImpression topic, mmmm maybe I was supposed to add that and forgot 8p [17:50:20] Comments on new ingress scripts themselves can go here: T195594 [17:50:20] T195594: New scripts to ingress data from Kafkatee into MySQL - https://phabricator.wikimedia.org/T195594 [17:50:26] ok [17:50:41] Also here's the code so far: https://github.com/AndrewGreen/fr_user_event_consumer [17:50:44] thx!!!!!!!!!!!!!
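A sketch of the file-selection approach described above, assuming the proposed '{source}-{timestamp}-{samplerate}' naming; the regex and function names are hypothetical, not the actual fr_user_event_consumer code:

    import os
    import re
    from datetime import datetime

    # Matches e.g. 'centralnotice-impressions-20180705-220001-sampled10.log.gz';
    # the optional sample-rate suffix is ignored when ordering by timestamp.
    TIMESTAMP_RE = re.compile(r'-(\d{8})-(\d{6})(?:-sampled\d+)?\.log\.gz$')

    def filename_timestamp(name):
        """Parse the YYYYMMDD-HHMMSS timestamp embedded in a log filename."""
        match = TIMESTAMP_RE.search(name)
        return datetime.strptime(''.join(match.groups()), '%Y%m%d%H%M%S') if match else None

    def files_since(directory, prefix, since):
        """Yield one log type's files newer than since, newest first, stopping
        as soon as a filename timestamp falls out of range."""
        # The "easy file glob" mentioned above: restrict to one log type so
        # that reverse lexicographic order is purely chronological.
        names = [n for n in os.listdir(directory) if n.startswith(prefix)]
        for name in sorted(names, reverse=True):
            timestamp = filename_timestamp(name)
            if timestamp is None:
                continue
            if timestamp <= since:
                break  # everything later in the scan is older; stop here
            yield name

For the reworked "recent" mode, since would simply be the timestamp of the most recently processed file, looked up from the database.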
[17:58:27] I should definitely not get involved, but I’m surprised that the consumer reads from log files, it seems like an extra level of indirection and loses the benefits of Kafka... [17:58:52] Kafkatee only exists to write those particular log files, and could be deprecated otherwise, for example [18:03:43] hi awight! [18:04:01] :-) hi saurabhbatra [18:04:28] up for a feature selection discussion? :-) [18:04:41] certainly! [18:05:23] so we have 4 tables to get data off of [18:05:41] contact, contribution, wmf_contribution_extra, payments_fraud [18:06:03] contribution_tracking might have a few more details, too [18:06:13] awight: true... It was considered as an option to de-file the system [18:06:20] hehe [18:06:20] However backfill [18:06:35] Kafka is actually brilliant at backfill [18:06:35] And just wishing to jiggle things as little as possible at the outset [18:07:52] I shouldn’t let my urge to kill kafkatee blind me to the other facts :-) [18:08:07] is contrib_tracking a part of Civi? [18:08:08] awight: not sure if the kafka messages would be ok to keep for that long... Basically we'd need to dig into data retention requirements [18:08:11] and allowances [18:08:44] whereas if we don't jiggle the (human/legal) system much for now, we can just keep files/retention as is [18:08:53] /o\ that’s real. [18:09:16] I can definitely see a time when we'd no longer want the files and could also consume realtime [18:09:28] also a time when instead of filling up a MariaDB database we use something better [18:09:50] apparently there is moar better analytics tools coming along soon, too, as far as I understand [18:10:02] saurabhbatra: It’s a weird one. contribution_tracking is in the “drupal” database IIRC, and I’m forgetting how to join it [18:10:15] I'll have a look see [18:10:35] so the idea was just to leave the re-scoping of retention requirements for later when things might get re-done in a more serious way [18:10:58] AndyRussG: sorry to butt in! I can nurture my vendetta in private, until the time is ripe. [18:11:11] * awight casts the evil eye towards kafkatee [18:11:14] awight: heheh no worries, au contraire, thanks much for the input!!! [18:12:01] for now it's just making the pipeline more stable, and more maintainable, and trying to make whatever new pieces useful for the brave new future [18:12:23] saurabhbatra: https://collab.wikimedia.org/wiki/Fundraising/Engineering/Fun_SQL_Queries#contribution_tracking_joins_with_civicrm_contribution [18:12:38] I think it joins on contribution_id [18:13:02] AndyRussG: That makes much sense. Killing django_banner_stats is its own level boss [18:13:11] awight: any comments/recounting of fond memories appreciated wrt this, too!! https://www.mediawiki.org/wiki/User:AGreen_(WMF)/Draft:Mapping_of_EventLogging_properties_to_FundraisingImpressions_database [18:14:36] LOL [18:14:46] I'm not whitelisted for collab [18:14:48] ;p [18:15:16] saurabhbatra: Ah, all that I was going to point to was the join: [18:15:17] It's been a fun ride so far...... [18:15:18] LEFT JOIN drupal.contribution_tracking ct [18:15:19] ON cc.id = ct.contribution_id [18:15:33] yup, that's what I did [18:16:01] there are some interesting fields here which might help us [18:16:52] utm information seems to be new [18:17:38] Totally. I imagine the banner and campaign info will be helpful. For now, just extracting the components (e.g. 
“C16”) might be enough [18:18:18] whitelisting reminds me, frdev is cut off from the internet, can't get python up and running on it :-( [18:18:37] it won't let me do pip installs [18:20:37] cwd: ^ Is there proxying allowed, or should saurabhbatra just scp over Python wheels? [18:21:57] I could scp, but pip is antiquated on the server [18:22:19] it's running pip version 1.5 vs pip 10.0 on my system [18:23:35] Jeff_Green might be able to help, not sure if casey's back yet [18:25:45] saurabhbatra: yes, that is by design [18:26:02] what workaround would you suggest? [18:26:05] I can help you with package installs of the stuff you need [18:26:15] We like having old packages cos it makes us feel young by comparison :p [18:26:21] ha [18:26:23] haha :-) [18:26:37] we like data not flowing off the machine :-) [18:26:41] i think just a pip upgrade would be great [18:27:04] then i can scp in my virtualenv folder [18:27:24] That probably won’t work [18:27:33] and hopefully everything will work out, the sun will shine and birds will chirp [18:27:38] virtualenv folders are really twitchy about host machine and path [18:27:51] i take that back then [18:27:51] bundling as wheels seems more likely to work… [18:27:59] * awight looks for commandline [18:28:58] pip wheel -r requirements.txt -w wheels [18:29:09] That will dump the required wheels into a directory "wheels" [18:30:29] oh great! [18:30:36] haven't used it before [18:31:21] although now that i look at my requirements, they're not that complicated [18:31:52] just need numpy, sklearn, matplotlib, pickle and flask [18:32:34] + dependencies, I believe [18:33:28] python-numpy 1:1.8.2-2 is already there [18:33:36] let me give wheels a try [18:33:49] python-sklearn 0.14.1-3 available as a package [18:34:08] saurabhbatra: heads-up that you should be using the same version of python for best results, of course [18:34:33] python-matplotlib 1.4.2-3.1 available as package [18:35:12] is pickle the same as python-jsonpickle ya think? [18:35:56] python-flask 0.12.1-1~bpo8+1 is available as a package [18:36:30] python-jsonpickle 0.8.0-1 is available as a package, assuming it's the right thing [18:37:39] i think it's different, looking at the documentation [18:38:01] json-pickle serializes to json while pickle serializes to a byte stream [18:39:45] weird, cos pickle is core Python functionality [18:39:51] oh yeah [18:40:03] darn, pickle should be there de facto [18:40:21] just the python function, yeah I think that's part of the core package [18:42:01] >>> numpy.version.version [18:42:01] '1.14.3' [18:42:02] >>> sklearn.__version__ [18:42:03] '0.19.1' [18:42:06] >>> matplotlib.__version__ [18:42:07] '2.2.2' [18:42:16] these are the versions i'm currently working with [18:42:35] btw, “pip freeze" [18:42:56] ok, hopefully the slightly older versions are viable [18:43:19] i'll see if i can work with the versions available [18:43:35] pip install should work right? [18:43:57] It sounds like that won’t be necessary cos Jeff_Green is installing the libs as packages [18:44:19] I'll have the debian versions of these installed momentarily [18:45:07] cool, thanks! [18:45:43] back to feature selection [18:46:26] saurabhbatra: Do you have a plan for how to store the extracted feature values? 
currently i'm outputting the mysql query results into a csv file [18:47:08] kk [18:47:30] selecting only the columns which have information that seems useful [18:47:45] currently i've shortlisted these columns [18:47:46] Then maybe the final extraction to vectors will happen during training? [18:47:48] id,financial_type_id,payment_instrument_id,receive_date,total_amount,currency,gateway,payment_method,country,user_ip,server,utm_source,utm_medium,utm_campaign [18:48:15] oh yeah, i'm not too worried about that part (maybe I should be) [18:48:20] I think you should leave out user_ip and server [18:49:01] user_ip for the reasons we were exploring yesterday, though I understand if you’re still not convinced! And server because it’s irrelevant IMO [18:49:14] ok install is done [18:49:32] because of the not to be biased against certain regions argument? [18:49:54] saurabhbatra: yes, but sorry now I see that it might be useful for other processing if not directly as a feature. [18:50:08] how about we keep the ip but no geo-location info? [18:50:12] i.e., we can pull more information about that user_ip’s activity and use as features [18:51:17] not sure how that'd be helpful though, so maybe we should drop it right now [18:51:53] thanks Jeff_Green, tested and it works! [18:51:58] excellent! [18:53:20] saurabhbatra: I’m okay with one of the compromises we mentioned yesterday: that we include the IP, but attach a big disclaimer about our reservations regarding that feature, and ideally compare model health with and without it. [18:53:50] But it makes sense to have it in the CSV in any case, since there are more extractions we can do using the IP. [18:54:20] Great news that the python libs are ready! [18:54:38] alright, leaving it in there for now [18:54:54] should i be using contact names as well? [19:06:19] We should extract them, but calculated features will be more like “number of characters in name”, “number of words in name”, “number of digits in name” [19:06:29] makes sense [19:06:56] a donation from "jhgfkjhgf hjgfd" could be a dead giveaway :-) [19:07:53] regarding "smart" features - we could also do some queries like contact.created_date-donation.received_date [19:08:15] Yeah—There must be good ways to look for gibberish. Vowel ratio... [19:09:14] Interesting date calculation. For sure, let’s play with lots of ideas! Want to make an etherpad, https://pypi.org/search/?q=pickle [19:10:36] I’m starting to see some places where the ORES framework might be nearly reusable already, maybe I’ll poke at that today. [19:11:23] I guess the techniques would be the same cross-concerns [19:11:36] damn near! [19:12:16] Really, it seems that a few ORES packages should be upstreamed to sklearn [19:12:44] so i guess the problem boils down to - how can we use these columns to make feature trends easily recognizable [19:13:06] awight: We've got a gibberish name detector that's working pretty well. Doesn't get everything, but we've seen fairly few false positives out of it. [19:14:19] XenoRyet: Perfect! I wonder if it would be better to output a scalar like “certainty”, or just take the threshold cutoff as-is… [19:14:49] I think taking the threshold as is would work well enough [19:15:12] I see a lot of "iononioin ioniooinino" and "hygtfr ujhygt"s in the data we pulled [19:15:19] Yeah, I guess the way this works is that we can try both eventually [19:15:49] XenoRyet: so how is the name detector made available? as a service?
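A sketch of the kind of calculated name features mentioned above (character, word, and digit counts, plus the vowel-ratio idea for spotting gibberish); the feature names are illustrative:

    def name_features(name):
        """Derive numeric features from a contact name, per the discussion above."""
        letters = [c for c in name if c.isalpha()]
        vowels = sum(1 for c in letters if c.lower() in 'aeiou')
        return {
            'char_count': len(name),
            'word_count': len(name.split()),
            'digit_count': sum(c.isdigit() for c in name),
            # Keyboard mashing like "jhgfkjhgf hjgfd" tends toward a low vowel ratio.
            'vowel_ratio': vowels / len(letters) if letters else 0.0,
        }

    print(name_features('jhgfkjhgf hjgfd'))  # vowel_ratio == 0.0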
[19:16:12] * awight quickly plugs ears to avoid hearing "PHP" [19:16:48] It's just one of the fraud filters DonationInterface is running internally right now. [19:17:15] oh cool—so its results go into payments_fraud? [19:17:23] Yep [19:17:28] :100%: [19:17:34] the filter name is getNameScore I think [19:17:45] payments_fraud_breakdown I think [19:17:54] saurabhbatra: nice. [19:18:08] saurabhbatra: On that note, we should just import all that jazz as individual features, eh? [19:18:21] i have some reservations regarding that [19:18:42] so I'm not sure if we're looking to complement minFraud or replace it altogether in the future? [19:18:54] Definitely, the long-term is to replace it [19:19:05] Yea, definitely [19:19:05] But for now, yes only complement and no need to be redundant with it. [19:19:30] But the payments_fraud numbers are internal calculations which will certainly form a part of the minfraud replacement tech [19:19:38] oh is that so [19:19:49] i thought fredge was populated post minFraud [19:20:04] Yes but not all those numbers come from minfraud [19:20:20] (particularly filter scores) [19:20:30] even if not though, minfraud output seems perfectly suited as an input to our post-fraud filter [19:21:13] yeah but if we put that in as a feature, we require that new transactions have that field populated [19:21:42] so any transaction that does not have minFraud scores is just new ground for the model [19:22:41] oho, good point. But since we’re only building the post-minfraud model in this phase, it doesn’t seem like a blocker. [19:22:58] cos everything does have a minfraud score by the time we process it [19:24:07] When we do the next phase, a pre-fraud model would definitely have different inputs. And to your point about similar features, that suggests that we’ll need to have several different models, which can be used only as their dependencies are fulfilled. [19:24:52] makes sense [19:24:59] I'll include that data in as well [19:25:17] That’s sort of how antifraud is currently designed in DonationInterface at least. There are layers like “IP was recently seen”, which are triggered before certain steps. [19:26:01] saurabhbatra: Unless you have a page already… https://etherpad.wikimedia.org/p/Antifraud_features [19:26:02] alright so the filters are minFraud + internal scores [19:26:56] switching to mobile, be back in a minute [19:28:36] Back [19:29:48] sorry that etherpad is terrible on mobile [19:30:15] I'll have a look when I'm back on my laptop, should take 15 minutes [19:30:34] I pasted what I think we have so far, please edit when you have time [19:31:29] I think I'll be able to follow up on my own [19:31:49] We can probably go over it together tomorrow to finalize it [19:32:28] Argh, it’s late there! o/ [19:32:33] eileen joining the channel reminds me to sleep every day :-) [19:32:48] hahaha [19:33:48] I'll fill out the etherpad in a bit, see you tomorrow! [20:02:10] Fundraising-Backlog: publish Oanda exchange rates to internal, private google doc - https://phabricator.wikimedia.org/T200227 (DStrine) p:Triage>Lowest [20:04:44] Fundraising-Backlog: Add privacy policy link to Bitpay donation form - https://phabricator.wikimedia.org/T200292 (MBeat33) [20:14:24] Fundraising Sprint Karma chameleons hide amongst us, Fundraising Sprint Lactose is unusually tolerant, Fundraising Sprint Matt Damon to head up Space Force, Fundraising Sprint Naming Sprints Is Not Important, and 3 others: Help switch over foundation pages ...
- https://phabricator.wikimedia.org/T193663 [20:14:28] Fundraising Sprint O 2018, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Import log files not including all skipped rows - https://phabricator.wikimedia.org/T200031 (DStrine) [20:14:30] Fundraising Sprint O 2018, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: slow anonymous save - https://phabricator.wikimedia.org/T199753 (DStrine) [20:14:32] Fundraising Sprint Karma chameleons hide amongst us, Fundraising Sprint Lactose is unusually tolerant, Fundraising Sprint Matt Damon to head up Space Force, Fundraising Sprint Naming Sprints Is Not Important, and 2 others: New scripts to ingress data from K... - https://phabricator.wikimedia.org/T195594 [20:14:34] Fundraising Sprint Naming Sprints Is Not Important, Fundraising Sprint O 2018, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Civi: contribs export failing for DS agent - https://phabricator.wikimedia.org/T196569 (DStrine) [20:14:36] Fundraising Sprint Fhabricator is spelled with an "F", Fundraising Sprint Gravity wasn't always this pushy, Fundraising Sprint HTTP originally stood for Happy Turtle Transfer Protocol, Fundraising Sprint Ivory and eggshell white are the same color, and 7 ot... - https://phabricator.wikimedia.org/T189613 [20:14:38] Fundraising Sprint Karma chameleons hide amongst us, Fundraising Sprint Lactose is unusually tolerant, Fundraising Sprint Naming Sprints Is Not Important, Fundraising Sprint O 2018, and 3 others: Ingenico We don't have an order status after doing a GET_ORDE... - https://phabricator.wikimedia.org/T194517 [20:14:40] Fundraising Sprint Gravity wasn't always this pushy, Fundraising Sprint HTTP originally stood for Happy Turtle Transfer Protocol, Fundraising Sprint Ivory and eggshell white are the same color, Fundraising Sprint Junebugs prefer July, and 7 others: Ingenic... - https://phabricator.wikimedia.org/T190098 [20:14:42] Fundraising Sprint Asymmetrical Earth Theory, Fundraising Sprint Bermuda Rhombus (where things disappear then reappear), Fundraising Sprint Cottage Cheese isn't Made of Cottages, Fundraising Sprint Dinosaur Cookies co-existed with Gingerbread People, and 1... - https://phabricator.wikimedia.org/T178930 [20:14:44] Fundraising Sprint O 2018, Fundraising-Backlog: Add explainer text to CC payment form (for banner checkbox experience) - https://phabricator.wikimedia.org/T200218 (DStrine) [20:14:57] Fundraising Sprint O 2018, Fundraising-Backlog: Queries and maybe scripts to verify equivalence of data in new-Kafka-pipeline-testing and pgehres production databases - https://phabricator.wikimedia.org/T198752 (DStrine) [20:14:59] Fundraising Sprint Lactose is unusually tolerant, Fundraising Sprint Matt Damon to head up Space Force, Fundraising Sprint Naming Sprints Is Not Important, Fundraising Sprint O 2018, Fundraising-Backlog: Write a specification for mapping banner/landing ... - https://phabricator.wikimedia.org/T196563 [20:15:01] Fundraising Sprint O 2018, Fundraising-Backlog: Mailing data double counting in CiviCRM? - https://phabricator.wikimedia.org/T200240 (DStrine) [20:15:03] Fundraising Sprint Karma chameleons hide amongst us, Fundraising Sprint Lactose is unusually tolerant, Fundraising Sprint Matt Damon to head up Space Force, Fundraising Sprint Naming Sprints Is Not Important, and 5 others: Update Ingenico WX audit parser to... 
- https://phabricator.wikimedia.org/T195337 [20:15:05] Fundraising Sprint O 2018, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: civi: report filter for total USD within a time period - https://phabricator.wikimedia.org/T151271 (DStrine) [20:15:07] Fundraising Sprint O 2018, Fundraising-Backlog: publish Oanda exchange rates to internal, private google doc - https://phabricator.wikimedia.org/T200227 (DStrine) [20:15:14] Fundraising Sprint Karma chameleons hide amongst us, Fundraising Sprint Lactose is unusually tolerant, Fundraising Sprint Matt Damon to head up Space Force, Fundraising Sprint Naming Sprints Is Not Important, and 7 others: Create Ingenico orphan rectifier m... - https://phabricator.wikimedia.org/T163949 [20:31:30] ejegg, do you have a few minutes to help me work out why my seemingly straightforward changes to the silverpop export app are breaking? [20:34:11] sure [20:34:25] want to put up a WIP? [20:34:25] great, queenmary? [20:34:33] sure [20:34:36] ok, I'll be right there [21:43:42] ejegg, looks like something else is still amiss, the case statement isn't doing what it should be. Gonna try and work it out tomorrow. Have a good evening fr-tech! [22:24:39] Fundraising-Backlog: Translate CountryNope message to other languages - https://phabricator.wikimedia.org/T199255 (Pcoombe) @stjn Thanks, I made both of these changes. [22:37:09] Fundraising Sprint Naming Sprints Is Not Important, Fundraising Sprint Owls, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Civi: contribs export failing for DS agent - https://phabricator.wikimedia.org/T196569 (Eileenmcnaughton) I just tried to ping Kristie - I realised I don't know her tz [23:02:39] Fundraising Sprint Owls, Fundraising-Backlog: Mailing data double counting in CiviCRM? - https://phabricator.wikimedia.org/T200240 (Eileenmcnaughton) So I don't think the spec is to only keep the most recent action - this was implemented to keep both received & opened actions in the DB (with them being f... [23:10:38] Fundraising Sprint Owls, Fundraising-Backlog: Mailing data double counting in CiviCRM? - https://phabricator.wikimedia.org/T200240 (CCogdill_WMF) Ah cool, good to know my understanding re: most recent action was wrong. That's good for us at the end of the day! Also good to know CPS was reading the data... [23:23:15] I’m running some queries on staging in case there is any slowness [23:51:40] Fundraising Sprint Owls, Fundraising-Backlog: Mailing data double counting in CiviCRM? - https://phabricator.wikimedia.org/T200240 (Eileenmcnaughton) @CCogdill_WMF I found one where it seems like more than timestamp - am hoping you can help me interpret where I would see the equivalent in the UI SELECT... [23:56:38] Fundraising Sprint Owls, Fundraising-Backlog: Mailing data double counting in CiviCRM? - https://phabricator.wikimedia.org/T200240 (Eileenmcnaughton) Hmm - I tracked this contact through & found 'View Contact Insights' civicrm/contact/view?reset=1&cid=21237654 - and it seems the same email really was sen...