[06:42:03] <awight_mob>	 eileen: I can type here, if there's anything I can add to the meeting?
[06:42:47] <eileen>	 awight_mob: ah just your mobile?
[06:42:55] <awight_mob>	 Exactly
[06:42:59] <eileen>	 We were wondering if same time tomorrow would work?
[06:43:12] <awight_mob>	 Yes, sounds good!
[06:43:36] <eileen>	 great - the one thing we covered was sample data
[06:43:47] <awight_mob>	 I should have the guest wifi password by then ;-)
[06:43:53] <eileen>	 :-)
[06:44:06] <eileen>	 so what is the deal - you are living where at the moment?
[06:44:28] <saurabh>	 hi @awight_mob @eileen
[06:44:39] <eileen>	 hey saurabh
[06:44:47] <awight_mob>	 Okay perfect, my only input about sample data is that things are easier if we have a "balanced" set, meaning 50% fraud and 50% not.
[06:44:59] <awight_mob>	 Hi saurabh o/
[06:45:26] <saurabh>	 Yup, but that's probably not going to be the case in a real world scenario
[06:45:44] <awight_mob>	 I might be getting ahead of things though, if you're just discussing a few snippets to analyze the format
[06:46:12] <saurabh>	 What we can do is only pick non-fraudulent data points equivalent to the number of fraudulent data-points in the training dataset
[06:46:37] <eileen>	 saurabh:  can you post your dataset link?
[06:46:44] <saurabh>	 So I found this dataset about credit card fraud - https://www.kaggle.com/mlg-ulb/creditcardfraud
[06:47:14] <saurabh>	 The problem being "we have 492 frauds out of 284,807 transactions"
[06:48:03] <awight_mob>	 Right, that's the ticket, to synthesize the balanced set...  I'm not certain whether it makes a difference for all ML algorithms, but believe it does for some, so for exploring algos this will make it simpler. Also, the model health statistics will be easier to interpret.
[06:48:25] <awight_mob>	 500/500 might be fine to start with!
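A minimal sketch of the balancing idea discussed above (keep every fraud sample, draw an equal number of non-fraud samples at random). The function name, the 1/0 label convention, and the toy data are assumptions for illustration, not taken from the Kaggle dataset or from anyone's actual code.

```python
import random


def balance_by_undersampling(samples, labels, seed=0):
    """Return a shuffled list of (sample, label) pairs with equal class counts.

    Assumes label 1 means fraud and label 0 means non-fraud (hypothetical
    convention for this sketch).
    """
    rng = random.Random(seed)
    fraud = [s for s, y in zip(samples, labels) if y == 1]
    legit = [s for s, y in zip(samples, labels) if y == 0]

    # Use the size of the smaller class so both classes end up equal.
    n = min(len(fraud), len(legit))
    picked_fraud = rng.sample(fraud, n)
    picked_legit = rng.sample(legit, n)

    balanced = [(s, 1) for s in picked_fraud] + [(s, 0) for s in picked_legit]
    rng.shuffle(balanced)
    return balanced


# Toy data: 5 "fraud" rows out of 100, standing in for the real imbalance.
toy_labels = [1] * 5 + [0] * 95
toy_samples = list(range(100))
balanced = balance_by_undersampling(toy_samples, toy_labels)
```

Fixing the seed keeps the drawn non-fraud subset reproducible between experiments; a different seed gives a different subset, which can be useful for checking that results do not depend on one lucky draw.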
[06:49:35] <eileen>	 awight_mob: so you are suggesting it's not real world but it's easier to work with during coding in your experience?
[06:50:16] <awight_mob>	 Yah, exactly.
[06:50:24] <saurabh>	 Yup, makes sense to train on that
[06:51:06] <saurabh>	 So which model do you think I should start experimenting with?
[06:51:28] <awight_mob>	 Basically, the additional 284k nonfraud samples will just make our ai better at recognizing nonfraud, which isn't what we need it to specialize in.
[06:51:51] <saurabh>	 Yup
[06:52:00] <saurabh>	 But another issue being
[06:52:15] <saurabh>	 If we use all 500 of the fraudulent ones, how can we test our model?
[06:52:30] <saurabh>	 A 450/50 training/testing split?
[06:52:36] <awight_mob>	 saurabh: I should be clear that I'm a beginner in ML myself :-) but from what I've seen, we should try a handful of models, it's really hard to guess what will work.
[06:53:19] <saurabh>	 So how about I start with the usual classifying algorithms like Logistic Regression and SVMs etc?
[06:54:43] <awight_mob>	 Scikit-learn includes some cross validation stuff, tldr, it makes several folds, e.g. all ten 90-10% splits, trains and tests each of those models, and if they all behave roughly the same, we can feel justified in training using the full set.
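A rough sketch of the "all ten 90-10% splits" idea, written with the standard library only so the mechanics are visible. The function name and the 1000-sample size are assumptions for illustration; in practice scikit-learn's own cross-validation helpers would do this, and a stratified variant would likely be preferable so every fold keeps the fraud/non-fraud ratio.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Every sample lands in exactly one test fold; the remaining
    k-1 folds form the training set for that round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# With a 1000-sample balanced set, each round trains on 900 and tests on 100.
splits = list(k_fold_indices(1000, k=10))
```

Each round would train a fresh model on `train_idx` and score it on `test_idx`; if the ten scores are close to each other, that supports training the final model on the full set.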
[06:55:06] <awight_mob>	 Sure, SVM, gradient boosting, random forest...
[06:55:40] <awight_mob>	 I found a nice one page guide to shooting from the hip when choosing algorithms, if you ping me later I can forward
[06:55:53] <saurabh>	 Yeah I'll drop you a mail
[06:56:03] <saurabh>	 And get started with some light experimentation
[06:56:11] <awight_mob>	 :100%:
[06:56:22] <awight_mob>	 Exciting stuff!
[06:56:43] <saurabh>	 The kaggle link also has posts by people who've experimented with the data with various models
[06:57:21] <saurabh>	 I'll go through them once too, start off with something that looks promising
[06:58:52] <awight_mob>	 This was the page I was thinking of, https://blogs.sas.com/content/subconsciousmusings/2017/04/12/machine-learning-algorithm-use/
[07:00:19] <awight_mob>	 Cool, with a 1k set it shouldn't take much CPU time. Feature engineering and dimensionality reduction will probably be the fun part.
[07:00:45] <saurabh>	 Yup
[07:01:06] <saurabh>	 Although the dataset columns aren't labeled I think which might cause some problem
[07:01:08] <saurabh>	 *problems
[07:01:20] <awight_mob>	 Thanks for putting up with my ELIZA appearance today, hopefully I'm full on Max Headroom by tomorrow
[07:03:31] <eileen>	  saurabh - you'll send through a new invite?
[07:04:32] <saurabh>	 Yup sure
[07:06:47] <saurabh>	 Same time tomorrow works for everybody?
[07:12:58] <eileen>	 for me yes
[15:18:20] <wikibugs_>	 Fundraising Sprint Junebugs prefer July, Fundraising-Backlog: Cancel & refund the remaining unintended recurring donations from Big EN - https://phabricator.wikimedia.org/T192958#4220909 (mepps) @XenoRyet can you update your query for globalcollect? And can you see how many donors in paypal would be affe...
[15:19:24] <wikibugs_>	 Fundraising Sprint Junebugs prefer July, Fundraising-Backlog: Cancel & refund the remaining unintended recurring donations from Big EN - https://phabricator.wikimedia.org/T192958#4220912 (mepps) @MBeat33 When you look at those donors still affected by this banner, were any of their charges refunded?
[15:24:11] <wikibugs_>	 Fundraising Sprint Junebugs prefer July, Fundraising-Backlog: Cancel & refund the remaining unintended recurring donations from Big EN - https://phabricator.wikimedia.org/T192958#4220920 (MBeat33) @mepps I'm not sure how to query all donations from that banner to spot-check. All I know is it was in the l...
[15:27:24] <wikibugs_>	 Fundraising Sprint Junebugs prefer July, Fundraising-Backlog: Cancel & refund the remaining unintended recurring donations from Big EN - https://phabricator.wikimedia.org/T192958#4220931 (mepps) @MBeat33 yeah @XenoRyet can do the query, I was more curious what you were seeing with the donors reporting.
[15:30:03] <wikibugs_>	 Fundraising Sprint Junebugs prefer July, Fundraising-Backlog: Cancel & refund the remaining unintended recurring donations from Big EN - https://phabricator.wikimedia.org/T192958#4220942 (MBeat33) Ah, yes, the donors who are reaching out are ones where the donations need to be refunded as well as cancele...
[15:36:47] <wikibugs_>	 Fundraising Sprint Junebugs prefer July, Fundraising-Backlog: Cancel & refund the remaining unintended recurring donations from Big EN - https://phabricator.wikimedia.org/T192958#4220957 (mepps) @MBeat33 Got it, I just wanted to make sure one or the other process didn't fail. It sounds like they just wer...
[19:32:08] <dstrine>	 AndyRussG:  meeting?
[19:32:30] <AndyRussG>	 :)
[19:36:49] <wikibugs_>	 Fundraising-Backlog: Prospect Tab- Reviewed Field - https://phabricator.wikimedia.org/T194784#4221230 (DStrine)
[19:37:42] <wikibugs_>	 Fundraising-Backlog: Contact Report Filters not displaying tag set tags - https://phabricator.wikimedia.org/T194783#4208403 (DStrine)
[19:39:45] <wikibugs_>	 Fundraising-Backlog: Testing infrastructure for EventLogging ingress of banner impression and landing page data - https://phabricator.wikimedia.org/T195259#4221243 (AndyRussG)
[20:02:33] <wikibugs_>	 Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Improve remind me later JS for advancement banners - https://phabricator.wikimedia.org/T195260#4221269 (DStrine)
[20:05:16] <wikibugs_>	 Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Assess and implement GDPR - https://phabricator.wikimedia.org/T195261#4221282 (DStrine)
[21:45:14] <wikibugs_>	 Fundraising-Backlog: Creating a new record without email - https://phabricator.wikimedia.org/T195266#4221475 (NNichols)
[21:56:44] <AndyRussG>	 cwd: hey... I'm here anytime you want to look at this...
[22:05:35] <cwd>	 AndyRussG: i'm around
[22:06:04] <cwd>	 was just working w/ otto some more, still can't quite get this to work
[22:06:33] <cwd>	 but entirely separate issue
[22:11:02] <AndyRussG>	 cwd: okok yeah no rush :) mebbe can u post here the exact kafkacat command you'd use currently to monitor the topic? I can start re-checking the JS
[22:14:01] <cwd>	 AndyRussG: kafkacat -C -b kafka-jumbo1002.eqiad.wmnet:9092 -t eventlogging_CentralNoticeImpression
[22:17:11] <AndyRussG>	 cwd: okok.... From which box? Also, why kafka-jumbo? This example uses kafka1012.eqiad.wmnet from stat1002: https://wikitech.wikimedia.org/wiki/Kafka#Consume
[22:17:16] <AndyRussG>	 Maybe there's some doc I'm missing
[22:17:18] <AndyRussG>	 thx!!!
[22:17:36] <cwd>	 AndyRussG: i believe they replaced all the kafka servers
[22:17:41] <cwd>	 now it is jumbo1-6
[22:17:53] <cwd>	 do it from alnitak
[22:17:57] <cwd>	 it is a codfw server
[22:18:20] <AndyRussG>	 alnitak.codfw.wmnet?
[22:18:30] <cwd>	 alnitak.frack.codfw.wmnet
[22:19:30] <AndyRussG>	 Ah okok
[22:26:51] <AndyRussG>	 cwd: to get there, I should be going through frbast.wikimedia.org, and using my frack credentials, no?
[22:27:51] <AndyRussG>	 Hmm looks like I still have some updating to do in my ssh config as per https://wikitech.wikimedia.org/wiki/Fundraising/tech/ssh_config
[22:30:29] <cwd>	 AndyRussG: that works
[22:30:36] <cwd>	 you can also use rigel which is the codfw bastion
[22:30:38] <cwd>	 doesn't really matter
[22:33:37] <AndyRussG>	 hmmm
[22:34:18] <cwd>	 AndyRussG: having trouble?
[22:34:28] <AndyRussG>	 yeah for some reason it was trying to go through bast2001 and trying to use the ssh key for normal prod
[22:34:35] <AndyRussG>	 no just gonna try that config, one sec
[22:34:49] <AndyRussG>	 hopefully it'll work :)
[22:35:05] <cwd>	 fail2ban is pretty sensitive but i can unblock you
[22:35:10] <cwd>	 if it hangs
[22:43:03] <AndyRussG>	 cwd: all good, got in with that ssh config :) thx!!!
[22:45:34] <AndyRussG>	 Can I check the fingerprints anywhere?
[22:47:50] <AndyRussG>	 https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints
[22:58:31] <cwd>	 AndyRussG: ah, good call, i will add them to a wiki
[22:58:37] <cwd>	 i don't think there is one atm
[22:58:47] <AndyRussG>	 ah ok thx much :)
[23:01:14] <AndyRussG>	 btw again I didn't get the event from the campaign on ruwiki, even though I did see it sent from the browser
[23:02:18] <AndyRussG>	 however I do see it when sent from the campaign on wikibooks
[23:02:29] <AndyRussG>	 At least that, I doubt, is a JS issue
[23:06:42] <AndyRussG>	 hrrmmm also not getting the event on enwiki
[23:06:48] <cwd>	 AndyRussG: that was my experience also
[23:08:02] <cwd>	 but i also can't get the wikibooks one to work in ff
[23:08:06] <cwd>	 and i do not see the request
[23:10:32] <cwd>	 i have a few privacy extensions but they are all green lights
[23:19:38] <AndyRussG>	 cwd: when you don't see the wikibooks one in ff, do you see it in kafkacat?
[23:20:05] <AndyRussG>	 can you maybe run a different firefox instance without the plugins (like maybe as a different user on your machine)?
[23:21:18] <AndyRussG>	 (I often run X programs using more than one local user, just give them permission first: sudo xhost +SI:localuser:some_other_user)
[23:21:51] <AndyRussG>	 (after that you can log in as some_other_user and run ff or anything else under a fresh profile)
[23:22:34] <AndyRussG>	 I wonder what happens to events that are somehow not validated by the EL schema... Do they not come down the Kafka pipe, I guess?
[23:23:08] <cwd>	 good question
[23:23:42] <AndyRussG>	 Hmmm they go to their own topic: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Verify_received_events
[23:24:52] <AndyRussG>	 cwd: yep that's it
[23:25:30] <AndyRussG>	 ok gotta relocate in a sec, but that must indeed be a JS issue
[23:25:39] <AndyRussG>	 so I'll dig at that...
[23:26:00] <AndyRussG>	 or well, I should say, likely is a JS issue
[23:28:10] <AndyRussG>	 K back in a bit!
[23:30:35] <cwd>	 ok, i'll be around