[03:06:47] Analytics, Product-Analytics: Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to send data - https://phabricator.wikimedia.org/T202664 (Nuria) The cookie you would need is WMF-Last-Acess-Global and it is been there since "incepti...
[06:34:03] morning!
[06:34:15] so the +r flag has finally made some changes in spam
[07:01:36] \o/ :) Morning elukey
[07:01:47] bonjour! :)
[07:02:04] so it is not the best since people not registered in Freenode will not be able to reach us
[07:02:22] we can try to remove it in a few days
[07:02:26] yup - Maybe we can regularly check for spam presence?
[07:02:30] hopefully the spam will be away
[07:02:34] yup
[07:29:41] an-coord1003 still not ready from the hw side, probably we'll need to schedule another maintenance window for it
[07:29:52] k elukey
[07:31:58] (afk for a bit)
[10:35:30] ok I am kinda functioning now
[10:35:52] I had to take a couple of naps, for some reason this time the jet lag is making me sleep more than usual :D
[10:35:57] (and I have slept all night!)
[11:23:02] elukey: I'm completely unproductive so far - Did an insomnia last night, woke up at 3am
[11:23:26] elukey: Still have to be up to manage kids and all, but completely unproductive :(
[11:26:36] :)
[12:11:56] * elukey lunch!
[12:20:56] Analytics: Add CitationUsage fields to EL purging white-list - https://phabricator.wikimedia.org/T205272 (Miriam) p:Triage>High
[13:49:45] Analytics, Readers-Web-Backlog, Wikimedia-Site-requests, MobileFrontend (MobileFrontend.js), Patch-For-Review: Turn on MinervaErrorLogSamplingRate (Schema:WebClientError) - https://phabricator.wikimedia.org/T203814 (Ottomata) > The point I was trying to make above is that neither Druid nor Su...
[14:08:14] Analytics: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (Ottomata) So, I just tried to reproduce this, and can't. (notebook1004 was missing a package that caused a different error, but I'm fixing that now. notebook1003 works). I ran this comm...
[14:16:38] helloooo a-team I just arrived to Spain. I’m going to rest now and miss the meetings but I’ll be connected later in the evening
[14:16:52] yoohooo
[14:17:01] heyall
[14:25:11] Analytics, Analytics-Cluster: Upgrade Hive to ≥ 2.0 - https://phabricator.wikimedia.org/T203498 (mpopov) Oooh, exciting!!! :D
[14:33:32] * elukey hugs fdans
[14:33:43] also, nobody said this yet
[14:33:48] THERE IS A MASSIVE SALE!
[14:34:11] * elukey auto-kicks himself from the chan for spamming
[14:53:32] Analytics: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (Ottomata) Ah, sorry, I responded too hastily. The problem is from inside a Jupyter notebook. I'm a bit stumped at the moment, as I can't see why this would work on the shell but not in a...
[14:53:47] haha
[15:04:28] Analytics-Kanban, Patch-For-Review: Refactor Refine job scalaopt to use property files and CLI overrides - https://phabricator.wikimedia.org/T203804 (Ottomata) p:Triage>Normal
[15:05:24] (PS10) Milimetric: Annotate wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705)
[15:07:01] (CR) Milimetric: "@mforns: this is ready for testing, with URL http://localhost:5000/dist-dev/#/et.wikipedia.org/contributing/editors/normal|line|2-Year|~to" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705) (owner: Milimetric)
[15:07:27] milimetric, ok!
[15:20:12] Analytics: stats.wikimedia.org home page should link to wikistats 2 - https://phabricator.wikimedia.org/T191555 (ezachte) I wonder: is there a link in Wikistats 2.0 to Wikistats 1.0 ? If not, why not? After all most of the reports that contain the above-mentioned description have no counterpart in Wikistats...
[15:29:51] (CR) Nuria: Update wikistats2 top endpoints date (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/461666 (https://phabricator.wikimedia.org/T204707) (owner: Joal)
[15:32:37] (PS11) Milimetric: Annotate wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705)
[15:38:29] Analytics: Add CitationUsage fields to EL purging white-list - https://phabricator.wikimedia.org/T205272 (Nuria) Yes, it is possible, you just need to submit a request (in the form of a CR) to whitelist the country configuration : https://github.com/wikimedia/analytics-refinery/blob/master/static_data/eventlo...
[15:40:17] Analytics-Kanban: Deprecate Python 2 software from the Analytics infrastructure - https://phabricator.wikimedia.org/T204734 (Nuria)
[15:41:46] Analytics, Analytics-Kanban: Try multi group by in druid 0.11 with current data - https://phabricator.wikimedia.org/T204765 (Milimetric) p:Triage>Low
[15:43:22] Analytics, Analytics-Kanban: Add index to mediawiki_page_create_3 - https://phabricator.wikimedia.org/T204572 (Milimetric) a:Ottomata
[15:43:34] Analytics, Analytics-Kanban: Add index to mediawiki_page_create_3 - https://phabricator.wikimedia.org/T204572 (Milimetric) p:Triage>High
[15:43:36] Analytics, Analytics-Kanban: Add index to mediawiki_page_create_3 - https://phabricator.wikimedia.org/T204572 (Nuria) p:High>Triage
[15:44:04] Analytics: MIgrate all reportupdater queries to hive - https://phabricator.wikimedia.org/T205296 (Nuria)
[15:44:07] Analytics, Fundraising-Backlog: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (Milimetric) p:Triage>High
[15:45:23] Analytics, Analytics-Kanban, Fundraising-Backlog: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (Milimetric) a:mforns
[15:46:07] Analytics, Analytics-Kanban, Fundraising-Backlog: Identify source of discrepancy between HUE query in Count of event.impression and druid queries via turnilo/superset - https://phabricator.wikimedia.org/T204396 (mforns) Will look into this in the next couple days.
[15:46:51] Analytics: Improve AQS request-parameter validation/normalization - https://phabricator.wikimedia.org/T204958 (Milimetric) p:Triage>Normal
[15:47:36] Analytics, Analytics-Kanban: Raise Edit Data Quality to the point where we can offer snapshots on Cloud (labs) environment - https://phabricator.wikimedia.org/T204953 (Milimetric) p:Triage>High
[15:47:39] Analytics, Analytics-Kanban: Raise Edit Data Quality to the point where we can offer snapshots on Cloud (labs) environment - https://phabricator.wikimedia.org/T204953 (Milimetric)
[15:48:44] Analytics, Analytics-Kanban: Presto cluster online and usable with test data pushed from analytics prod infrastructure accessible by Cloud (labs) users - https://phabricator.wikimedia.org/T204951 (Milimetric) p:Triage>High
[15:49:31] Analytics: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950 (Milimetric)
[15:49:38] Analytics: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users - https://phabricator.wikimedia.org/T204950 (Milimetric) p:Triage>Normal
[15:55:18] Analytics, Performance-Team: Rename column on old hive data for a few tables - https://phabricator.wikimedia.org/T204922 (Milimetric) We talked this over and are hesitant to do this kind of work, because we want to encourage backwards-compatible changes going forward. Is it too much of a pain for you to...
[15:56:00] Analytics, Analytics-Wikistats: 500 error on wikimedia stats.wikimedia.org - https://phabricator.wikimedia.org/T205163 (Milimetric) a:ezachte
[15:56:33] Analytics: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (Milimetric) p:Triage>High
[15:56:49] Analytics, Analytics-Kanban: heirloom-mailx fails trying to send out email from SWAP notebook - https://phabricator.wikimedia.org/T168103 (Milimetric) a:Ottomata
[15:59:01] Analytics, Analytics-Kanban, Readers-Web-Backlog, Wikimedia-Site-requests, and 2 others: Turn on MinervaErrorLogSamplingRate (Schema:WebClientError) - https://phabricator.wikimedia.org/T203814 (Milimetric)
[16:04:55] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Milimetric) a:Nuria
[16:41:41] Analytics: stats.wikimedia.org home page should link to wikistats 2 - https://phabricator.wikimedia.org/T191555 (Nuria) >If people want detailed reports on content, editors and edits ( like those reports listed on https://www.mediawiki.org/wiki/Analytics/Wikistats/DumpReports/Future_per_report ), Wikistats 1...
[17:04:06] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Nuria) Misc is no longer in service, all requests have been migrated to 'text'
[17:06:22] (PS1) Bmansurov: Add CitationUsage and CitationUsagePageLoad to EL whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/462521 (https://phabricator.wikimedia.org/T205272)
[17:07:56] (PS2) Bmansurov: Add CitationUsage and CitationUsagePageLoad to EL whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/462521 (https://phabricator.wikimedia.org/T205272)
[17:16:19] (CR) Joal: [V: 2 C: 2] "Merging after review. Next patch to follow tomorrow" [analytics/aqs] - https://gerrit.wikimedia.org/r/461666 (https://phabricator.wikimedia.org/T204707) (owner: Joal)
[17:28:42] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (elukey) Yep exactly, cache misc (where query.wikidata.org was hosted) has been migrated to cache text, therefore all the Hive que...
[17:31:15] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Ottomata) See also {T200822} and {T164609}. Sorry yall didn't know about this. I wonder if there is a better way we can configu...
[17:31:45] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Ottomata) > all the Hive queries (and related) should be using 'webrequest_text' from now on. e.g. `WHERE webrequest_source = 'te...
[17:33:13] (PS1) Framawiki: app.py: make it possible to block a user from the site [analytics/quarry/web] - https://gerrit.wikimedia.org/r/462534 (https://phabricator.wikimedia.org/T104322)
[17:33:43] (PS2) Framawiki: app.py: make it possible to block a user from running queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/462534 (https://phabricator.wikimedia.org/T104322)
[17:34:48] (CR) Nuria: Add CitationUsage and CitationUsagePageLoad to EL whitelist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/462521 (https://phabricator.wikimedia.org/T205272) (owner: Bmansurov)
[17:35:20] Analytics, Product-Analytics, Wikipedia-iOS-App-Backlog: Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to send data - https://phabricator.wikimedia.org/T202664 (chelsyx)
[17:35:44] (PS3) Framawiki: app.py: make it possible to block a user from running queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/462534 (https://phabricator.wikimedia.org/T104322)
[17:36:11] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Nuria) a:Nuria>mpopov
[17:36:40] (PS4) Framawiki: app.py: make it possible to block a user from running queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/462534 (https://phabricator.wikimedia.org/T205286)
[17:37:08] (CR) Zhuyifei1999: [C: 1] app.py: make it possible to block a user from running queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/462534 (https://phabricator.wikimedia.org/T205286) (owner: Framawiki)
[17:38:04] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Nuria) Assigned to @mpopov Again, our apologies that the data sources are hardcoded like this. As I mentioned on our meeting abe...
[17:43:14] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (mpopov) Thanks for looking into it, @Nuria! And for confirming, @elukey @Ottomata! :) A note for #operations: this is not the fi...
[17:47:00] (CR) Framawiki: [C: 2] app.py: make it possible to block a user from running queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/462534 (https://phabricator.wikimedia.org/T205286) (owner: Framawiki)
[17:47:53] (Merged) jenkins-bot: app.py: make it possible to block a user from running queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/462534 (https://phabricator.wikimedia.org/T205286) (owner: Framawiki)
[17:52:44] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata) At the Analytics Engineering offsite last week, we were talking about how the current naming of the various Modern Event Platform compon...
[17:58:47] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[18:05:42] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[18:06:42] * elukey off!
[18:10:56] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Stream Intake Service - https://phabricator.wikimedia.org/T201068 (Ottomata)
[18:11:28] Hey ottomata - Nice prose on https://phabricator.wikimedia.org/T185233#4611779 :)
[18:12:21] ottomata: something that could be added as a matter of example of decorrelation between streams and schemas is the idea of having 2 streams sharing the same schema (one for Android, the other iOS for instance) - Just a detail though :)
[18:17:12] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 2 others: RFC: Modern Event Platform: Scalable Event Intake Service - https://phabricator.wikimedia.org/T201963 (Ottomata)
[18:17:32] mforns: Please take a look at https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/462521/ there might be other alternatives I am not thinking of
[18:17:36] joal we actually do this currently with the resource-change event, that is used (i believe) to expire restbase caches
[18:17:45] nuria, ok
[18:18:14] or uh, something: https://github.com/wikimedia/mediawiki-event-schemas/blob/master/config/eventbus-topics.yaml#L65
[18:18:20] ottomata: multiple streams - same schema ? Great :)
[18:22:58] hola mforns :) quick question about data purging. we want to whitelist fields in our citation usage schema, retaining session tokens. https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/462521/2/static_data/eventlogging/whitelist.yaml@71
[18:23:15] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform: Stream Configuration Service - https://phabricator.wikimedia.org/T205319 (Ottomata) p:Triage>Normal
[18:23:19] hey miriam_ I was just looking into this
[18:23:22] would it be ok if we hash page id - title, but retain revision id?
[18:23:23] :]
[18:23:27] ahh mforns many thanks!!
[18:24:59] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform: Stream Configuration Service - https://phabricator.wikimedia.org/T205319 (Ottomata)
[18:26:35] miriam_, I see many fields that potentially hold user interests, like revision_id, page_id, page_title, link_text, link_url, footnote_number and citation_identifier_label
[18:26:46] I assume you want to keep all those
[18:27:11] mforns: yes, ideally yes
[18:27:58] miriam_, so because of that, I'd say that we should avoid potential identifiers as much as we can
[18:28:08] page_token is fine
[18:28:24] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform: Stream Configuration Service - https://phabricator.wikimedia.org/T205319 (Ottomata)
[18:28:27] but session_token is dangerous...
[18:29:18] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Event Schema Repository - https://phabricator.wikimedia.org/T201063 (Ottomata)
[18:29:20] especially, because mw.user.sessionId() is shared by all instrumentations AFAIK
[18:29:29] mforns: sorry would you mind explaining the difference?
[18:29:56] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201063 (Ottomata)
[18:30:12] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Event Schema Registry - https://phabricator.wikimedia.org/T201063 (Ottomata)
[18:30:16] so, mw.user.sessionId() returns the same id, regardless of schema you're trying to send
[18:30:29] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[18:30:57] meaning, i.e. MobileWebBlahBlah will have the same session_id as CitationUsage for the same user at the same period
[18:31:10] mforns: yes
[18:32:10] miriam_, so, imagine MobileWikiBlahBlah has some partial identifiers that are OK in the context of MobileWiki, because that schema does not contain any user-preference fields
[18:32:38] but if you join MobileWikiBlahBlah with CitationUsage, then you have identifiers on one side, and user-preference on the other
[18:32:51] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform: Stream Configuration Service - https://phabricator.wikimedia.org/T205319 (Pchelolo) As an **engineer**, I want to specify concrete settings for different topics like the number of partitions or the...
[18:33:00] the session_token lets you do that join
[18:33:18] mforns: I see, do you think we can then hash session_token
[18:33:37] mforns: and make it unique to our schema?
[18:33:46] miriam_, if we salt it and rotate every 90 days yes
[18:34:17] but then you won't be able to link events across quarters, only within the same quarter
[18:34:42] mforns: so if we hash session_token, salt it, rotate every 90 days, do you think we could keep revision_id, etc?
[18:34:49] yes
[18:35:38] mforns: ok! last question: is the hash schema-specific?
[18:35:57] hmmmmmm....
[18:37:18] miriam_, you made me think that... salting and hashing won't make much of a difference...
[18:37:45] because, no, the salt and hash of a cross-schema field, will continue to be cross-schema...
[18:37:53] I'm getting back a 500 Internal Server Error trying to access superset. Is that known/expected?
[18:38:25] Specifically, getting the message "AttributeError: 'bool' object has no attribute 'login_count'" when I access https://superset.wikimedia.org/login/
[18:38:37] mforns: oh so, it's not schema-specific?
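mforns's point that salting and hashing a shared field does not make it schema-specific can be illustrated with a short Python sketch. This is a minimal, hypothetical illustration and not the EventLogging sanitization code: the rotation start date, the in-memory salt store, the `salted_hash` helper, and the token value are all assumptions. It applies a plain salted SHA-256 using whichever salt is active for the 90-day period, as discussed above.

```python
import hashlib
import secrets
from datetime import date

# Hypothetical start date of the salt-rotation schedule.
ROTATION_EPOCH = date(2018, 1, 1)
ROTATION_DAYS = 90

# One random secret salt per 90-day period, created on first use.
# (In-memory only; a real pipeline would store and later destroy these.)
_salts = {}


def salt_for(day):
    """Return the secret salt for the 90-day period containing `day`."""
    period = (day - ROTATION_EPOCH).days // ROTATION_DAYS
    if period not in _salts:
        _salts[period] = secrets.token_bytes(16)
    return _salts[period]


def salted_hash(session_token, day):
    """Plain salted SHA-256 of a session token.

    The schema name is not an input: whichever schema the token came
    from, the same token in the same period hashes identically.
    """
    return hashlib.sha256(salt_for(day) + session_token.encode("utf-8")).hexdigest()


# Hypothetical mw.user.sessionId() value shared by two instrumentations.
token = "a1b2c3d4e5"
day = date(2018, 9, 24)

mobile_web_row = {"schema": "MobileWebBlahBlah", "session_token": salted_hash(token, day)}
citation_row = {"schema": "CitationUsage", "session_token": salted_hash(token, day)}

# Still joinable across schemas within one period...
joinable_across_schemas = mobile_web_row["session_token"] == citation_row["session_token"]
# ...but not linkable once the salt has rotated (2018-12-30 is in a later period).
linkable_across_periods = salted_hash(token, date(2018, 12, 30)) == salted_hash(token, day)
```

Because the schema never enters the hash, rows from any two schemas that recorded the same `mw.user.sessionId()` still join on the hashed `session_token` within a period; only rotation breaks the link, which is also why events cannot be linked across quarters.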
[18:38:54] miriam_, theoretically, you would be the first ones doing that, so you could do it and be ok, but then the next schema using the same solution, would be joinable with yours...
[18:39:04] no, it's cross-schema
[18:39:12] because the session_token is cross-schema
[18:39:12] right, cross schema
[18:39:51] miriam_, is it possible for you to generate an original uuid? not mw.user.sessionId()?
[18:40:17] mforns: sorry, what do you mean?
[18:40:33] mforns: it is possible to generate it per page
[18:40:43] cc miriam_
[18:40:53] but not per session
[18:40:58] instead of calling mw.user.sessionId() from the instrumentation, generate a uuid there and store it somewhere?
[18:41:33] mforns: yes, we can generate a uuid different from session_token
[18:41:35] though using cookies or localstorage might be controversial, even if we salt and hash them...
[18:41:52] mforns, miriam_ : the problem with that is that it is no longer a session token
[18:42:20] mforns: it is just a random identifier that they would need to roll out their own code to persist per session (not trivial)
[18:42:30] yea
[18:42:48] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 2 others: RFC: Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201643 (Ottomata)
[18:43:07] mforns: having a per-pageview token is supported but that is not carried across pageviews
[18:43:17] yea
[18:43:22] mforns: so it will only match events sent within the same pageview
[18:43:50] marlier: just clear cookies
[18:43:54] nuria, having a different salt for each schema would work
[18:44:18] marlier: or start an incognito session
[18:44:22] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 2 others: RFC: Modern Event Platform: Scalable Event Intake Service - https://phabricator.wikimedia.org/T201963 (Ottomata) BTW, in case you missed it, I just updated {T185233}, the main parent task that describes the difference co...
[18:44:25] marlier: and let us know
[18:45:10] nuria: no go. Permissions issue, maybe? Is access to superset restricted in a different way than access to wikistats and etc?
[18:46:01] mforns,nuria: could we have a different salt for each schema? or, could we generate a random UUID for each session_token, and retain the UUID only?
[18:46:05] miriam_: so we understand, what is the research trying to establish? user patterns of clicking on citations? user patterns on clicking on citations per category?
[18:46:19] marlier: can you access http://turnilo.wikimedia.org?
[18:46:32] yep
[18:46:51] marlier: looking
[18:46:51] marlier: mmmm....bad turnilo give me a sec, what is your ldap name?
[18:47:25] nuria: user needs to be created manually (email unique bug)
[18:47:38] nuria: this is the bug with adding a new user
[18:48:00] Analytics, Product-Analytics, Wikipedia-iOS-App-Backlog: Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to send data - https://phabricator.wikimedia.org/T202664 (chelsyx)
[18:48:05] ottomata: ya, i thought that was fixed on this version, will add ian with marlier ldap right?
[18:48:18] marlier: is your ldap username marlier?
[18:48:22] imarlier
[18:48:22] nuria: doing it
[18:48:25] k
[18:48:35] joal: i am doing it
[18:48:39] ok
[18:48:45] nuria: basically both.
[18:48:51] nuria: it is fixed only for the very first person that tries to get an automatic account created
[18:48:53] daisy won that race
[18:49:04] Analytics, Product-Analytics, Wikipedia-iOS-App-Backlog: Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to send data - https://phabricator.wikimedia.org/T202664 (JMinor) p:Triage>Normal
[18:49:24] marlier: is "marlier" your LDAP?
[18:49:33] nuria: imarlier
[18:49:39] Including first initial
[18:49:54] marlier: now it should work
[18:50:19] nuria: we would need to retain a form of uuid for users, to make sure that we draw our statistics correctly (if we aggregate data at, say pageview level, statistics might be biased by super-users)
[18:50:49] miriam_: right
[18:51:08] dropping for tonight team - see you tomorrow
[18:51:09] miriam_: but stats per user would not necessarily need pageview ids
[18:51:12] joal: ciao
[18:51:45] nuria: you mean page ids?
[18:52:22] miriam_: seems that easiest is to do away with pageview_ids and hash sessionIds so you can infer quite a bit from users' behaviour around citations w/o page specifics
[18:52:42] nuria: joal: ottomata: working now, thanks all!
[18:52:49] miriam_: a subsequent experiment would need to address behaviour arround categories
[18:52:51] nuria, miriam_, maybe there's another way to tell super-users apart from regular users that does not involve tokens?
[18:52:52] *around
[18:54:59] nuria: sorry just to clarify. We should hash session ids and remove page_ids/title?
[18:56:26] miriam_: hash them too
[18:56:44] so you would have (sessionId, pageId) pairs
[18:58:31] cc mforns
[18:58:33] Analytics: Migrate all reportupdater queries to hive - https://phabricator.wikimedia.org/T205296 (Framawiki)
[18:58:51] nuria, I'm not completely understanding
[18:59:29] nuria, you mean hash page_ids?
[18:59:43] which will give you stuff like for session 001 : (001, 003, 5), (001, 004, 2) which means that user with session 001 visited pages 003 and 004 and clicked on citations 5 times and 2 times respectively
[18:59:56] mforns: right, hash pageids
[19:00:17] nuria: ok! So we should hash page_id; page_title; session_id
[19:00:28] mforns: as they are not needed (i think) to assess user behaviour around citations, amkes sense?
[19:00:33] *makes sense?
[19:00:54] miriam_, how do you use all user-preference data, like page_id, page_title, urls and stuff?
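The (session, page, clicks) triples Nuria describes can be sketched as a simple aggregation. This is a hypothetical Python illustration, not the actual analysis code: the event field names (`session_id`, `page_id`, `action`) and the `"click"`/`"hover"` action values are assumptions rather than the real CitationUsage schema, and the ids are placeholders for the hashed values under discussion.

```python
from collections import Counter


def clicks_per_session_page(events):
    """Count citation clicks per (session, page) pair.

    `events` is an iterable of dicts with hypothetical keys `session_id`,
    `page_id` and `action`; in the proposal both ids would already be hashed.
    Returns sorted (session_id, page_id, clicks) triples.
    """
    counts = Counter(
        (e["session_id"], e["page_id"]) for e in events if e["action"] == "click"
    )
    return sorted((session, page, n) for (session, page), n in counts.items())


def clicks_per_session(triples):
    """Sum clicks per session, so heavy users can be weighed separately."""
    totals = Counter()
    for session, _page, n in triples:
        totals[session] += n
    return dict(totals)


# Nuria's example: session 001 clicks citations 5 times on page 003 and
# 2 times on page 004 (a non-click event is ignored).
events = (
    [{"session_id": "001", "page_id": "003", "action": "click"}] * 5
    + [{"session_id": "001", "page_id": "004", "action": "click"}] * 2
    + [{"session_id": "001", "page_id": "004", "action": "hover"}]
)
triples = clicks_per_session_page(events)
totals = clicks_per_session(triples)
```

Here `triples` holds `("001", "003", 5)` and `("001", "004", 2)`. Keeping a stable per-session value is what lets the per-session totals separate super-users from everyone else, the bias Miriam raises, without needing the raw page ids.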
[19:01:14] would it be ok for you to hash all those?
[19:02:15] mforns: ideally we could live without page_id / title. But the urls etc are the data on citations we are looking at
[19:03:05] mforns: on meeting will check in a bit
[19:03:11] mforns: for most things we might need page_id as well
[19:03:12] k
[19:03:34] mforns: at least revision_id
[19:03:57] from which we can get topic/quality and other properties of the page from ORES
[19:04:35] I see
[19:05:21] miriam_, and are you using session_token for something other than telling users from super-users apart?
[19:06:21] maybe we can find another way to do that
[19:07:16] mforns: well we are using it to identify users
[19:07:26] hm
[19:07:35] mforns: to see for example, how many times a user converts a pageview into a citation
[19:07:46] mforns: not identify, tell users apart
[19:07:51] aha
[19:08:04] miriam_, I just had an idea that might help here
[19:08:24] essentially we use session_id = user_id
[19:08:24] but I have to think more about it, and tell the team, when nu-ria is back from meeting
[19:10:02] mforns: ok, thanks
[19:10:51] mforns: in short, we need to aggregate data at user-level, not at pageview or click level: and that is why we need session_token :)
[19:11:16] understand
[19:12:20] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Neil_P._Quinn_WMF) >>! In T185233#4611779, @Ottomata wrote: > One of the descriptive and confusing concepts I mention is a 'schema topic usage'....
[19:20:22] miriam_, what if the session_token 90-day salted hash was not cross-schema, would that bother you?
[19:23:00] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata) Yes the right track for sure! I'll add that we will be using event 'stream' to very technically refer to any semantically grouped set o...
[19:23:26] mforns: it would be ok to have not cross-schema salted hash for session token, however we might need to cross CitationUsage with PageLoadCitationUsage cc pirroh who just joined, he is in the research team
[19:23:36] working on citation usage.
[19:23:53] miriam_, but only with PageLoadCitationUsage?
[19:24:06] mforns: yes
[19:24:21] that schema is not ready no?
[19:24:32] yet
[19:24:47] mforns: ah yes, it's running now
[19:25:15] mforns: for this round of data which is about to be purged, we are looking at citationusage only
[19:25:19] miriam_, shouldn't it be here? https://meta.wikimedia.org/wiki/Schema:PageLoadCitationUsage
[19:25:56] mforns: sorry -- late here: I inverted the word https://meta.wikimedia.org/wiki/Schema:CitationUsagePageLoad
[19:26:03] oh ok!
[19:27:31] mforns: so, for this round, we need citationusage only, a schema-specific hash would work
[19:27:42] miriam_, I think there's a solution that would work (and it's not super complex), but need to check
[19:27:51] will create a task and discuss it tomorrow with the team
[19:28:02] Analytics, Product-Analytics, Wikipedia-iOS-App-Backlog: Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to send data - https://phabricator.wikimedia.org/T202664 (chelsyx)
[19:28:03] mforns: for the next, we would *ideally* need to have same hash for pageload and citationusage
[19:28:10] it might be that there are cryptographic issues though...
[19:28:13] mforns: thanks a lot!
[19:28:19] yea, that would be ok with the solution
[19:28:35] np!
[19:29:39] mforns: amazing! Please keep us posted :) I think a little dinner break now :)
[19:32:40] mforns: thanks a lot! the solution that miriam_ proposed (matching hash of session_token in both the pageload and the citationusage schema) is perfect.
[20:00:08] Analytics-Kanban, Beta-Cluster-Infrastructure, Operations, Puppet, Readers-Web-Backlog (Readers-Web-Kanbanana-Board-2018-19-Q1): exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (Ottoma...
[20:01:21] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, Wikidata-Query-Service: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (mpopov) >>! In T204415#4611729, @Nuria wrote: > Assigned to @mpopov Again, our apologies that the data sources are hardcoded like...
[20:01:59] Analytics-Kanban, Beta-Cluster-Infrastructure, Operations, Patch-For-Review, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (Ottomata) I think ^ is what is needed (sorry wa...
[20:15:52] Analytics, Performance-Team (Radar): Rename column on old hive data for a few tables - https://phabricator.wikimedia.org/T204922 (Imarlier)
[20:16:31] Analytics: Allow "scoped" EventLogging sanitization hashes - https://phabricator.wikimedia.org/T205331 (mforns)
[20:18:20] (PS12) Milimetric: Annotate wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705)
[20:19:14] Analytics: Allow "scoped" EventLogging sanitization hashes - https://phabricator.wikimedia.org/T205331 (mforns)
[20:20:41] (CR) Milimetric: "ok, latest patch also addresses:" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705) (owner: Milimetric)
[20:24:25] Analytics: Allow "scoped" EventLogging sanitization hashes - https://phabricator.wikimedia.org/T205331 (mforns) Another option that sounds more safe than concatenating the scope with the field string (or with the salt) is: -> HMAC the field value with key=scope first, and then HMAC the result of that with ke...
[21:31:37] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, and 2 others: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (mpopov) @Ottomata @Gehel: I tried editing `stat1005:/srv/published-datasets/discovery/metrics/wdqs/basic_usage.tsv` but couldn't because the fi...
[21:33:39] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, and 2 others: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Ottomata) @mpopov, since that file is managed by Puppet, you'll have to make a puppet patch to change it!
[21:34:29] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, and 2 others: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Ottomata) Oh sorry, misunderstood. Yes we should be able to make the output of the file writable by you somehow.
[21:43:56] Analytics, Discovery-Analysis, Product-Analytics, Wikidata, and 2 others: Query stats dashboard not updating - https://phabricator.wikimedia.org/T204415 (Ottomata) Ok, I've added the analytics-search system user to the analytics-search-users group. You should make your script `chgrp analytics-sea...
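The two-step HMAC option from T205331 can be sketched with Python's standard `hmac` module. This is a minimal, hypothetical illustration, not the refinery implementation: the scope names, the `SCOPES` mapping, and the key value are assumptions. The ticket excerpt is cut off before it names the key for the second HMAC, so the `secret_key` parameter here is only a stand-in.

```python
import hashlib
import hmac

# Hypothetical scope assignment: the two citation schemas share a scope so
# their hashes match; other schemas get a scope of their own.
SCOPES = {
    "CitationUsage": "citation-usage",
    "CitationUsagePageLoad": "citation-usage",
    "MobileWebBlahBlah": "mobile-web",
}


def scoped_hash(value, scope, secret_key):
    """Two-step HMAC-SHA256: first keyed by the scope, then by a secret key."""
    # Step 1: HMAC the field value with key=scope.
    scoped = hmac.new(scope.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).digest()
    # Step 2: HMAC that result with a secret key (hypothetical stand-in,
    # since the ticket excerpt is truncated before naming the second key).
    return hmac.new(secret_key, scoped, hashlib.sha256).hexdigest()


def hash_session_token(token, schema, secret_key):
    """Hash a session token using the scope assigned to `schema`."""
    return scoped_hash(token, SCOPES[schema], secret_key)


key = b"hypothetical-secret"
token = "a1b2c3d4e5"  # hypothetical mw.user.sessionId() value
citation = hash_session_token(token, "CitationUsage", key)
page_load = hash_session_token(token, "CitationUsagePageLoad", key)
mobile = hash_session_token(token, "MobileWebBlahBlah", key)
```

Giving CitationUsage and CitationUsagePageLoad the same scope yields the matching session hashes Miriam and pirroh asked for (`citation == page_load`), while a schema in a different scope gets an unrelated value (`citation != mobile`), so the cross-schema join mforns warned about cannot be made on the hashed field.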