[00:00:25] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Milimetric) review from @Ottomata is appreciated. And btw, why is the mediawiki_history fetch disabled? Is there some (problem/d...
[00:03:27] Analytics, Analytics-Kanban, Performance-Team (Radar): Upgrade python-kafka to 1.4.7 - https://phabricator.wikimedia.org/T234808 (Krinkle) Ack.
[00:03:34] Analytics, Analytics-Kanban, Performance-Team (Radar): Upgrade python-kafka to 1.4.7 - https://phabricator.wikimedia.org/T234808 (Krinkle) a:Krinkle
[00:59:28] nuria: Just checking in. I've been looking over turnilo quite a bit to try to better understand the data. Also began rereading the phab tickets to get a better understanding of what's wanted for the new resulting dataset. Will continue to read the phab tickets and then look over the past public datasets to acquaint myself with the format of those datasets
[09:38:21] Analytics, Desktop Improvements, Event-Platform, Readers-Web-Backlog (Kanbanana-2019-20-Q2): [SPIKE 8hrs] How will the changes to eventlogging affect desktop improvements - https://phabricator.wikimedia.org/T233824 (ovasileva) p:Normal→High
[10:52:16] Analytics, Analytics-EventLogging: Update pingback reports to use heartbeat pings to filter data - https://phabricator.wikimedia.org/T236178 (Aklapper) @fdans: #good_first_bug tasks are self-contained, non-controversial issues with a clear approach and should be well-described with pointers to help a com...
[13:11:24] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Ottomata) > why is the mediawiki_history fetch disabled {T234229}
[14:09:46] ah sorry i thought i was online!
[14:35:47] Analytics, Community-Tech, Product-Analytics (Kanban): Hash all pageTokens or temporary identifiers from the EL Sanitization white-list for Community Tech - https://phabricator.wikimedia.org/T226861 (aezell) @mforns Thanks for that info. It was very helpful. @nettrom_WMF As best I can tell from the...
[15:03:18] Analytics-Kanban, Product-Analytics, Patch-For-Review: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Nuria) @Milimetric the mediawiki history is quite a large fileset that requires a hadoop client on the dump servers to rsync in a...
[15:03:30] lexnasser: sounds good, let's chat in person today
[15:17:03] ottomata: what was the url in labs to search mediawiki codebase?
[15:17:50] hmmm
[15:17:52] nuria: i don't know
[15:29:07] (PS1) Mforns: Add report generation for data quality oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/546212 (https://phabricator.wikimedia.org/T235486)
[15:31:57] (CR) Mforns: [C: -2] "Still needs testing." [analytics/refinery] - https://gerrit.wikimedia.org/r/546212 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[15:41:38] hey ottomata :], would it be possible to execute an rsyn from an oozie shell-action from HDFS to a stats machine?
[15:41:45] *rsync
[15:42:00] given that the user executing the oozie coordinator is analytics
[16:14:39] Analytics, Better Use Of Data, Event-Platform, Product-Analytics, and 2 others: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (Ottomata) I want to start a discussion about MW Extensions and our intentions with where code should live. Part of this discussion has alre...
[16:16:11] Analytics, Better Use Of Data, Event-Platform, Product-Analytics, and 2 others: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (Ottomata) My preferences:
- Strong preference for Config-1.
- In favor of Producer-1. Producer-2 is ok too. Don't want Producer-3.
- Pref...
[16:18:53] Analytics, Better Use Of Data, Event-Platform, Product-Analytics, and 2 others: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (Ottomata) Also @mpopov
[16:22:28] AGHH STANDUP
[16:22:29] WHAT
[16:27:40] Analytics, Data-release, Privacy: An expert panel to produce recommendations on open data sharing for public good - https://phabricator.wikimedia.org/T189339 (leila) @Nuria can you provide an update about the status of this task? (I'm getting more questions about it in Wikidata Con 2019;).
[16:37:55] Analytics, Product-Analytics: Start refining all blacklisted EventLogging streams - https://phabricator.wikimedia.org/T212355 (Neil_P._Quinn_WMF)
[16:38:08] Analytics, Product-Analytics: Start refining all blacklisted EventLogging streams - https://phabricator.wikimedia.org/T212355 (Neil_P._Quinn_WMF)
[16:43:54] Analytics, Product-Analytics: Start refining InputDeviceDynamics events - https://phabricator.wikimedia.org/T212368 (Neil_P._Quinn_WMF) Open→Declined This data stream is now inactive.
[16:43:56] Analytics, Product-Analytics: Start refining all blacklisted EventLogging streams - https://phabricator.wikimedia.org/T212355 (Neil_P._Quinn_WMF)
[16:59:55] Analytics-Kanban: Per referer mediarequests returns requests count as string - https://phabricator.wikimedia.org/T233622 (Nuria) Open→Resolved
[17:00:03] Analytics, Community-Tech, Product-Analytics (Kanban): Hash all pageTokens or temporary identifiers from the EL Sanitization white-list for Community Tech - https://phabricator.wikimedia.org/T226861 (nettrom_WMF) Open→Resolved @aezell : I agree with you. I went through [[ https://meta.wikimed...
[17:00:06] Analytics, Product-Analytics, VisualEditor: Hash all pageTokens or temporary identifiers from the EL Sanitization white-list - https://phabricator.wikimedia.org/T220410 (nettrom_WMF)
[17:02:29] Analytics, Data-release, Privacy: An expert panel to produce recommendations on open data sharing for public good - https://phabricator.wikimedia.org/T189339 (Nuria) @leila sorry, but we reprioritized this task to be able to work in the three upcoming public datasets. 1) geoeditors, editors stats p...
[17:39:17] Analytics, Better Use Of Data, Epic, Performance-Team (Radar), Product-Infrastructure-Team-Backlog (Kanban): Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) >>! In T235189#5604098, @Nuria wrote: >In that case it seems that the next step that ne...
[17:40:56] Analytics, Better Use Of Data, Event-Platform, Product-Analytics, and 2 others: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (Nuria) >@Nuria, I know you are wary of creating a new extension to replace EventLogging. From what I understand, your worry is more about lo...
[17:42:46] Analytics, Analytics-EventLogging, QuickSurveys, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Isaac) As I go to do this analysis, what UTC day/hour should be my cut-off for when QuickSur...
[18:11:03] Analytics, Better Use Of Data, Event-Platform, Product-Analytics, and 2 others: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (Ottomata) > the transport, enqueing and sampling remains so the majority of the code https://github.com/wikimedia/mediawiki-extensions-Even...
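The pageToken hashing tracked in T226861 and T220410 can be illustrated with a minimal sketch. It assumes that a keyed hash (HMAC-SHA256 with a secret salt) is an acceptable way to replace temporary identifiers; that choice, the helper names `hash_token` and `sanitize_event`, and the salt value are hypothetical and are not taken from the EventLogging sanitization code.

```python
import hashlib
import hmac
from typing import Any, Dict, Iterable


def hash_token(token: str, secret_salt: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of `token` keyed by `secret_salt`.

    Hypothetical helper: the same token and salt always give the same
    64-character digest, while the raw token is no longer kept.
    """
    return hmac.new(secret_salt, token.encode("utf-8"), hashlib.sha256).hexdigest()


def sanitize_event(event: Dict[str, Any], fields_to_hash: Iterable[str],
                   secret_salt: bytes) -> Dict[str, Any]:
    """Return a copy of `event` with each listed identifier field hashed.

    Hypothetical helper: fields that are absent or None are left untouched,
    and the input dict is not modified.
    """
    sanitized = dict(event)
    for field in fields_to_hash:
        value = sanitized.get(field)
        if value is not None:
            sanitized[field] = hash_token(str(value), secret_salt)
    return sanitized


if __name__ == "__main__":
    raw = {"pageToken": "abc123", "action": "click"}
    clean = sanitize_event(raw, ["pageToken"], b"example-salt")  # hypothetical salt
    print(clean["action"], len(clean["pageToken"]))  # click 64
```

Because the same salt always yields the same digest, hashed tokens stay joinable across events for as long as that salt is kept; rotating or discarding the salt breaks the linkage.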
[18:26:53] Analytics, Better Use Of Data, Event-Platform, Product-Analytics, and 2 others: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (Nuria) >The only code in https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/modules/ext.eventLogging/core.js that wi...
[18:45:06] Analytics, Better Use Of Data, Event-Platform, Product-Analytics, and 2 others: Event Platform Client Libraries - https://phabricator.wikimedia.org/T228175 (Ottomata) > I think my thoughts align with what you describe here: Config-1, Producer-3 but I am not sure I understand what you are proposin...
[18:53:33] nuria: ottomata: have a couple mins to hash out this maxLength business on the error logging patchset?
[19:12:23] hello yes
[19:12:25] was afk for a few mins
[19:12:29] nuria: yt too?
[19:22:32] ottomata, hip: i have 5 mins before meeting
[19:23:24] ottomata, hip: batcave?
[19:23:25] oh hi
[19:23:29] ya sure
[19:23:39] hip: https://meet.google.com/rxb-bjxn-nip
[19:23:58] nuria: am there
[19:47:27] Analytics, Better Use Of Data, Event-Platform, Patch-For-Review, Product-Infrastructure-Team-Backlog (Kanban): Create client side error schema - https://phabricator.wikimedia.org/T229442 (jlinehan) New patch submitted and reviewed (https://gerrit.wikimedia.org/r/536792). Plan is to merge as p...
[20:12:57] nuria: Successfully sshed into the server/hive
[20:13:04] lexnasser: ohhh
[20:13:17] lexnasser: nice, did you try jupyter notebooks?
[20:13:27] lexnasser: if you do and it works let me know
[20:13:50] lexnasser: ssh -N notebook1003.eqiad.wmnet -L 8000:127.0.0.1:8000
[20:14:01] lexnasser: and after that try to access localhost:8000
[20:14:02] nuria: have not tried Jupyter, will try now. For future reference, will I be creating the public dataset from `wmf` or `wmf_raw`?
[20:14:11] lexnasser: wmf
[20:14:21] lexnasser: the very large webrequest table
[20:18:03] nuria: was able to ssh into Jupyter, but got this message at localhost. Should I configure https with Jupyter or is it fine as is? https://usercontent.irccloud-cdn.com/file/e0SXbqe9/jupyter_https
[20:18:29] lexnasser: that is fine cause you are bypassing https via ssh tunnel
[20:18:39] you can use your wikitech credentials to log in
[20:18:52] lexnasser: user/pw
[20:20:49] nuria: successfully logged in, file structure has just a single `venv` folder
[20:21:02] lexnasser: ok, now
[20:21:14] lexnasser: ssh into the box with another terminal
[20:21:30] lexnasser: and cp /home/nuria/Detailed_Pageview_Report.ipynb to your homedir
[20:21:41] lexnasser: "/home/lexnasser"
[20:22:03] the box meaning the stat1007 server?
[20:22:12] lexnasser: the notebook one
[20:22:26] lexnasser: notebook1003.eqiad.wmnet
[20:22:32] ssh youruser@notebook1003.eqiad.wmnet
[20:24:22] nuria: successfully copied to my files
[20:24:55] lexnasser: ok, now if you go to http://localhost:8000 again you should see the file; click on it
[20:26:22] nuria: yes, I see it and clicked on it. I'm assuming that running the queries currently in the notebook won't mess anything up
[20:26:36] lexnasser: no, it is read only
[20:26:43] lexnasser: the notebook that is
[20:27:00] nuria: got it
[20:27:04] lexnasser: so you can execute queries and see plots and use that as a basis to explore the data we will be using for caching
[20:27:12] lexnasser: keep in mind this data is PII
[20:27:34] lexnasser: meaning that we are extremely careful with it and it never leaves our machines
[20:28:06] nuria: understood, is there anything else I should do to check my ssh is set up properly?
[20:28:26] lexnasser: no, that's it
[20:28:44] lexnasser: let's talk on our next call about data sensitivity and next steps for dataset
[20:29:17] nuria: 👍
[21:40:30] (PS2) Ottomata: Add HDFSCleaner to aid in cleaning HDFS tmp directories [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543897 (https://phabricator.wikimedia.org/T235200)
[21:42:06] (CR) Ottomata: "Haven't had time to test this code today but I implemented your ideas!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/543897 (https://phabricator.wikimedia.org/T235200) (owner: Ottomata)
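The Jupyter tunnel nuria suggests at 20:13:50 can be unpacked with a small Python sketch that only builds the ssh argument list and does not open a connection. `-N` tells ssh not to run a remote command, and `-L local_port:bind_host:remote_port` forwards a local port through the SSH session to a port reachable from the remote host, which is why the notebook then answers at localhost:8000. The helper `tunnel_command` and its parameters are hypothetical illustrations, not part of any Wikimedia tooling.

```python
import shlex
from typing import List, Optional


def tunnel_command(host: str, local_port: int, remote_port: int,
                   remote_bind: str = "127.0.0.1",
                   user: Optional[str] = None) -> List[str]:
    """Build (but do not run) argv for an SSH local port forward.

    Hypothetical helper for illustration only.
    -N : open the connection without executing a remote command.
    -L : listen on local_port here and forward each connection through
         the SSH session to remote_bind:remote_port as seen from `host`.
    """
    target = f"{user}@{host}" if user else host
    forward = f"{local_port}:{remote_bind}:{remote_port}"
    return ["ssh", "-N", target, "-L", forward]


if __name__ == "__main__":
    cmd = tunnel_command("notebook1003.eqiad.wmnet", 8000, 8000)
    # Prints: ssh -N notebook1003.eqiad.wmnet -L 8000:127.0.0.1:8000
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` would start the tunnel. The optional `user` argument mirrors the `youruser@notebook1003.eqiad.wmnet` form used later in the log.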