[03:58:49] 10Analytics, 10Product-Analytics: Wikistats API for legacy pagecounts does not have mobile data before October 2014 - https://phabricator.wikimedia.org/T235143 (10kzimmerman) [05:56:54] 10Analytics, 10Analytics-Kanban: Superset not able to load a reading dashboard - https://phabricator.wikimedia.org/T234684 (10elukey) >>! In T234684#5560354, @Nuria wrote: > @elukey I think increasing timeout would do little here the 4 year query for browsers is not executable (cold cache) in druid so superse... [06:24:01] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Allow all Analytics tools to work with Kerberos auth - https://phabricator.wikimedia.org/T226698 (10elukey) Added also a hive2druid job to the testing cluster, and fixed it with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/541554... [06:30:15] 10Analytics, 10Operations, 10Research-management, 10Patch-For-Review, 10User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (10elukey) Given the fact that the GPU on stat1005 seems to work and we h... [06:30:30] 10Analytics, 10Operations, 10Research-management, 10Patch-For-Review, 10User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (10elukey) @Nuria thoughts? [06:53:10] 10Analytics, 10Analytics-Kanban, 10Operations, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) Interesting: ` Oct 10 03:00:01 krb2001 kpropd[26599]: Connection from krb1001.eqiad.wmnet Oct 10 03:26:24 krb2001 systemd[1]: Stopping Kerb... [07:01:13] created a more official page: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos [07:05:36] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) ` elukey@krb2001:~$ sudo systemctl cat krb5-kpropd.service # /lib/systemd/system/krb5-kpropd.service [Unit] Descriptio... [07:06:04] 10Analytics, 10Discovery, 10Event-Platform, 10Wikidata, and 3 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (10dcausse) [07:06:35] Hi team [07:12:15] o/ [07:15:32] joal: interesting discovery yesterday about druid and kerberos [07:16:07] I'm interested by the interesting! [07:16:20] s/by/in [07:16:22] sorry [07:16:23] hive2druid was failing so I tried to check all the logs, and eventually found that the map reduce job launched by the middle manager failed due to the user 'druid' not present on the worker [07:17:05] this was kinda known since yarn will run containers on the host as the user who launched the job [07:17:16] BUT it is easy to forget! [07:17:22] it is indeed! [07:17:42] IIRC before kerb, JVMs were launched as yarn user, right? [07:17:50] exactly [07:18:16] for the same reason we'll have to deploy privatedata users to all the workers [07:18:24] (withou ssh access) [07:19:29] right - This makes a lot more user-management on the workers :( [07:19:48] nah it is transparent for us, only a hiera flag [07:19:56] ok [07:19:59] it required a ton of work before in puppet [07:20:01] :P [07:20:13] This I can guess [07:21:06] joal: when you have time https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos [07:21:17] let me know that it is not clear, incomplete,etc.. 
[07:21:23] still WIP but ready for review [07:21:25] Will read :) [07:22:30] it is mostly ops related but I'd love if it was as friendlier as possible [07:22:33] :D [07:58:14] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) Nope, it seems that puppet is causing the stop/start of kpropd and rsync: ` Notice: /Stage[main]/Profile::Kerberos::K... [08:02:20] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) Two daemons make sense: ` elukey@krb2001:~$ sudo systemctl status rsync ● rsync.service - fast remote file copy progr... [08:09:12] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) >>! In T226089#5562297, @elukey wrote: > > The kadmin server seems stopping by itself, but kadmin.local works on 2001... [09:04:33] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) Seems fixed now. The culprit I believe it was: ` elukey@krb2001:~$ sudo systemctl cat krb5-kpropd.service # /lib/syst... [09:11:09] Looks like something weird happened yesterday at deploy time for pageviews - Job is stuck - Will kill it and rerun it [09:13:13] !log Kill stuck oozie launcher in yarn (application_1569878150519_43184) [09:13:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:13:55] !log rerun failed pageview hour after manual job killing (pageview-hourly-wf-2019-10-9-19) [09:13:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:28:02] elukey: can you confirm me that superset uses druid1003 as broker? [10:31:28] yep [10:31:35] hm [10:31:49] it is set in the datasource config [10:32:10] joal: any issue? [10:32:52] elukey: I'm trying to get a proper understanding of what happens with the pageview-per-browser-family chart [10:33:20] joal: in my view, the 500 reported mentions a timeout from a historical port, that is 10s [10:33:38] and the chart fails exactly after 10s of loading time [10:33:49] so druid seems to do what it is told to [10:33:50] elukey: I confirm that - Now I'm willing to make sure I have a correct understanding around caching etc [10:34:11] ah ok that is another thing :) [10:34:31] :) [10:34:50] Confirmed \o/ [10:40:37] going to lunch!! [11:43:53] 10Analytics, 10Analytics-Kanban: Superset not able to load a reading dashboard - https://phabricator.wikimedia.org/T234684 (10JAllemandou) Here is my understanding of the different things at play here: - Caching is made by the broker **by query** - **per-segment**. This means for a given set query-parameters... [11:44:20] elukey, nuria -- ^ Please let me know if my writing makes sense [11:46:04] yep looks good! 
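To make the caching discussion above concrete, here is a minimal sketch of the kind of topN query Superset sends to the Druid broker for the per-browser-family chart. It is not the exact query Superset generates; the broker host/port, datasource, dimension and metric names are assumptions for illustration. The point it illustrates is that the broker caches per query shape and per segment, so a different interval or dimension is a cache miss that hits the historicals again.

```python
# Sketch only: one possible shape of a topN query against the Druid broker.
# Datasource, dimension, metric, host and port below are assumptions.
import json
import requests

query = {
    "queryType": "topN",
    "dataSource": "pageviews_daily",             # assumed datasource name
    "intervals": ["2015-10-01/2019-10-01"],      # the multi-year range that times out on a cold cache
    "granularity": "all",
    "dimension": "browser_family",               # assumed dimension name
    "metric": "view_count",                      # assumed metric name
    "threshold": 10,
    "aggregations": [
        {"type": "longSum", "name": "view_count", "fieldName": "view_count"}
    ],
}

response = requests.post(
    "http://druid1003.eqiad.wmnet:8082/druid/v2/",  # assumed broker host/port
    data=json.dumps(query),
    headers={"Content-Type": "application/json"},
    timeout=60,
)
response.raise_for_status()
print(response.json())
```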
[11:48:52] my impression was way more superficial but I thought that superset, as it is now, wouldn't have been too smart when issuing queries [11:49:10] elukey: it's actually not that bad [11:49:22] cache warmup is also tricky to make, difficult to generalize [11:49:58] elukey: the by-default way to issue a query with a group-by is well, with a groupBy - The fact that superset uses 2 topN is a very good optimisation [11:50:06] agreed about cache warmup [11:50:07] too smart in the sense of applying heuristics etc.. very complicated to do [11:51:07] here the thing would actually be: query-success through iterative querying cache warmup to mitigate timeouts :) [11:51:08] the dashboard used to work before due to the unlimited timeouts.. [11:51:16] of course [11:51:56] elukey: maybe 1min timeout for queries is not that long if we can make sure they don't pile up? [11:52:20] the second part is the issue I am afraid :) [11:53:23] I can't disagree :) [11:53:31] this is the first case that I can see showing up the problem.. maybe the dashboard could be refined to make less heavy queries? I mean, is it necessary to support such big queries? [11:53:40] if yes we can try with more, say 30s [11:56:52] hello joal, would you have two minutes in the batcave? I know I'm doing something wrong, but I'm not sure why [11:58:30] joining fdans [12:00:45] elukey: I don't know if we need to support so long time-periods - we should ask analysts about that [12:01:01] completely agree [12:21:00] elukey: I have read the kerb page - It mostly makes sense even for a non-ops [12:21:42] elukey: one thing to update: machine names are inconsistent over 2 lines (krb2001 or 1002)? [12:22:20] And last, an introduction line about what kerb is at a very high level could be nice - but the system-content and ops-doc is very good :) [12:22:26] Thanks a lot for that elukey [12:22:43] ah typo! Fixing :) [12:22:52] also will try to add more info! [12:23:33] specifically, what about kerberos would you like to know? [12:24:58] elukey: a one-liner explaining that kerberos is an authentication system for distributed systems would be perfect, maybe with a link to https://en.wikipedia.org/wiki/Kerberos_(protocol) [12:25:11] ahhh okok ack [12:25:32] I am wondering if an intro to the team about the set up would be good [12:25:45] or if only boring since too ops related [12:26:09] elukey: about the setup I can't say, about global archi and how it works, I'd say surely [12:42:40] just tested the failover, all working! [12:49:53] \o/ [12:50:37] once bacula is configured, then we are done done done [12:50:42] really happy about it :) [12:51:26] I think that we are now ready to ask some people to test it [13:03:43] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10MoritzMuehlenhoff) >>! In T226089#5559672, @MoritzMuehlenhoff wrote: >>>! In T226089#5559492, @elukey wrote: >>>>! In T226089#... [13:19:02] 10Analytics: dumps.wikimedia.org/other/mediawiki_history is missing some files - https://phabricator.wikimedia.org/T235112 (10Ottomata) Yes, and I re-ran it and it didn't pick up the file; everything was rsynced according to rsync.
Let's look again today @mforns [13:23:20] 10Analytics, 10Discovery, 10Event-Platform, 10Wikidata, and 3 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (10Ottomata) > do we need to do something on the refinery/hadoop side to create the hive table Depending on the name of the s... [13:35:20] ottomata: o/ - shall we remove the el mysql consumer?? [13:38:53] yaaaa [13:38:59] gimme few mins [13:41:36] yes yes even hours :) [13:51:30] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) >>! In T226089#5562842, @MoritzMuehlenhoff wrote: >>>! In T226089#5559672, @MoritzMuehlenhoff wrote: >>>>! In T226089#... [13:53:37] 10Analytics, 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10MoritzMuehlenhoff) >>! In T226089#5562958, @elukey wrote: >>>! In T226089#5562842, @MoritzMuehlenhoff wrote: >>>>! In T226089#... [14:01:29] 10Analytics, 10Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (10Ottomata) @JAllemandou can you take a look at Martin's code and see if there is anything he can optimize? [14:07:46] haaa joal, the udf returns null when the file name is way too long [14:07:47] that makes [14:07:49] sense [14:11:35] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (10Ottomata) [14:14:48] elukey: ok am going to merge and disable mysql consumer [14:15:44] \o/ [14:15:51] not in labs too right? [14:16:01] not in labs right [14:16:07] https://gerrit.wikimedia.org/r/c/operations/puppet/+/541359 [14:16:19] oh didn't add you as reviewer before [14:16:38] super! [14:16:38] in labs the profile is included directly [14:16:39] not the role [14:16:45] so the role hiera isn't applied [14:17:37] ottomata: regarding the webrequest logs, how to check whether the request was served by varnish (as recommended by nuria)? what role does it play with respect to checking the loggedin-status? [14:23:07] mgerlach: I'm not sure I know what she means there. all webrequests are 'served' by varnish, but maybe she's saying to check if the request is cached or not? [14:23:13] that is available i believe in the cache_status field [14:23:32] as for logged-in, i don't know how you can tell that from webrequest [14:23:55] unless perhaps it is set in the x_analytics_map field [14:23:55] what is the difference between a cached and not-cached request?
[14:24:15] yes, the logged-in comes from the x_analytics_map [14:24:17] ah [14:24:17] https://wikitech.wikimedia.org/wiki/X-Analytics [14:24:17] yes [14:24:19] ok cool [14:24:31] so varnish is the frontend webcache [14:24:54] most page views, especially for logged out users [14:25:12] for most of those page views, the requests do not make it to the mediawiki application servers [14:25:34] if the page has been viewed and has not been edited since the last time it was viewed [14:25:38] varnish is likely to have it in cache somewhere [14:26:06] so the http response content will be served directly from varnish's cache [14:35:30] 10Analytics, 10Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (10Ottomata) > We can identify new users from visits to "Create Account" pages BTW, in the future if this is something we wanted to regularly measure and analyze, we could create a mediawiki.user-create s... [14:36:47] ottomata: so this means it will not appear in the webrequest-log? or simply the number of views might be too low? [14:37:06] no, all http requests are in webrequest table [14:37:08] including cached ones [14:37:24] cache_status will just indicate if the response was served from cache or not [14:37:29] via 'hit' [14:37:45] if it goes to mediawiki app, it will probably say 'miss' [14:37:55] but maybe 'pass' if the request is not supposed to be cached [14:38:36] mgerlach: i'm not really sure what nuria's point is :p [14:39:20] ottomata: ok, in any case, thanks for the explanation [14:39:26] i think that pages are generally not cached for logged in users (not 100% on that) [14:39:38] ottomata: sorry, i wanted to make sure all requests (even the ones not cached by varnish, like they are for logged users) were on webrequest [14:39:43] ottomata: and yes, they are [14:39:50] so, maybe she's saying that this would affect pageview behavior...? since page would take (slightly) longer to load? [14:39:58] ah ok [14:39:59] yes [14:40:04] then yes they are all there :) [14:40:12] hello nuria :) [14:40:56] ottomata: hola [14:41:33] ebernhardson: mind if I move some of your spark yarn docs to main spark page? [14:41:33] https://wikitech.wikimedia.org/wiki/User:EBernhardson/pyspark_on_SWAP [14:42:57] mgerlach: you can cross reference your data with new account creations per wiki per month, there will be a bunch of bot behaviour but I have to say 53 accounts overall seems quite small [14:43:06] mgerlach: for enwiki [14:43:39] mgerlach: see https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/new-registered-users/normal|bar|2-year|~total|daily [14:43:54] mgerlach: this data comes from mw itself [14:44:26] nuria: thanks for the suggestion and the pointer [14:44:35] excellent ida [14:44:38] *idea [14:44:44] ottomata: go for it [14:44:58] mgerlach: note the staggering difference in numbers [14:45:24] hm, aren't user account creations available in mw history? [14:45:38] mgerlach: i think it will be worth assessing methodology before optimizing queries [14:45:55] ah but you want pageviews [14:45:56] right.
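A hedged sketch of how the cache_status and x_analytics_map fields discussed above can be inspected, assuming a SparkSession named `spark` is already available (for example in a SWAP notebook) and read access to the wmf.webrequest table; the partition values are placeholders, and the 'loggedIn' key is the one documented on the X-Analytics page linked above.

```python
# Sketch only: break pageviews for one hour down by cache_status and logged-in
# status. Assumes an existing SparkSession called `spark`; partition values
# below are placeholders.
pageview_cache_breakdown = spark.sql("""
    SELECT
        cache_status,
        x_analytics_map['loggedIn'] AS logged_in,
        COUNT(*)                    AS requests
    FROM wmf.webrequest
    WHERE webrequest_source = 'text'
      AND year = 2019 AND month = 10 AND day = 10 AND hour = 14
      AND is_pageview
    GROUP BY cache_status, x_analytics_map['loggedIn']
    ORDER BY requests DESC
""")
pageview_cache_breakdown.show(50, truncate=False)
```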
[14:46:03] ottomata: yes, but without any info with which you can cross reference [14:46:30] ottomata, mgerlach: the reality is that what you want to measure requires custom instrumentation, i think it will be hard to do with the request stream [14:46:31] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_user_history [14:46:36] *webrequest [14:47:02] you could cross ref it in the same way, if you are just trying to verify if the approach to detect new accounts is correct [14:47:40] date, count(*) group by day_date(user_registration_timestamp) as date [14:47:41] ? [14:48:04] i guess the wikistats stuff is generated from mw history anyway :p [14:48:36] but ya mgerlach i think nuria is right, if you wanted to be precise we could actually log user-create events into its own hive table [14:48:49] what we talkin' about? [14:49:04] milimetric: https://phabricator.wikimedia.org/T234188 [14:49:05] I saw a ping on Tuesday... haven't had a chance to look at those queries [14:49:25] but, hm, nuria even if we did that [14:49:40] ok, I'll take a close look at this, see if there's anything I can help with [14:49:42] we'd need to tie the user-create event then to the page view session [14:49:54] i guess you could fingerprint the user-create event with the same fields that are in webrequest [14:50:13] ottomata: that is why it requires targeted instrumentation as in an EL funnel [14:50:22] ottomata: similar to the work that morten and roan have done [14:50:29] oh meaning actually log the page view events too? [14:50:32] for new users? [14:50:36] ottomata: yes [14:50:38] aye ya [14:50:55] ottomata: it is already mostly done by roan on the new homepage project [14:50:58] cc mgerlach [14:51:02] nuria: i think this is the exact use case jason was talking about with the wildcarded stream config stuff [14:51:10] say there is a 'pageview' schema [14:51:24] and the instrumentation knows how to construct it and send it [14:51:33] nuria ottomata: I got ~20,000 new registration events for a full week for enwiki. order of magnitude seems to be ok. worth to check in more detail [14:51:46] if we wanted to start logging all page views the first hour of new user accounts [14:52:04] then jason's fancy stream config condition stuff could be used to just do hat [14:52:05] that [14:52:08] ottomata: you need custom instrumentation to do that [14:52:15] as long as the instrumentation knows how to interpret the condition config [14:52:32] yes, but i think the instrumentation could be made fancy (not in eventlogging client) to use stream config to do that [14:52:44] especially for things like cohort selection and time ranges [14:52:56] ottomata: you need custom instrumentation unless you want to bloat one pageview instrumentation with all the possible use cases for which pageview needs to be sent [14:53:05] not all, but many!
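A rough version of the group-by sketched above, for cross-checking registrations per day against the webrequest-based numbers. Again a sketch only: it assumes a SparkSession named `spark`; the snapshot, wiki and date range are placeholders, and the field names are the ones documented on the Mediawiki_user_history page linked above.

```python
# Sketch only: new account registrations per day on enwiki from the
# mediawiki_user_history dataset. The history table can have several rows per
# user, hence the COUNT(DISTINCT ...). Snapshot and dates are placeholders.
new_accounts_per_day = spark.sql("""
    SELECT
        SUBSTR(user_registration_timestamp, 1, 10) AS registration_date,
        COUNT(DISTINCT user_id)                    AS new_accounts
    FROM wmf.mediawiki_user_history
    WHERE snapshot = '2019-09'
      AND wiki_db = 'enwiki'
      AND user_registration_timestamp >= '2019-09-01'
      AND user_registration_timestamp <  '2019-09-08'
    GROUP BY SUBSTR(user_registration_timestamp, 1, 10)
    ORDER BY registration_date
""")
new_accounts_per_day.show()
```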
[14:53:21] ottomata: we want to shy away from having tokens+ pageviews as that is likely to cause many privacy issues [14:53:22] even if you just matched on things like user account age [14:53:29] ottomata: in this case you just want to load a funnel [14:53:54] ottomata: and a funnel can be logged trivially with custom instrumentation that persists state of funnel across pages [14:54:41] ottomata: i would not try to do that generically until we have succeeded in doing that in some cases, thus far the only funnel instrumentation we have that has worked well that i know of is the work for newcomers homepage [14:55:29] ottomata: i think that use case does not really check out with the little sophistication of our current login. [14:55:54] (03CR) 10Mforns: Add the MobileWebUIActionsTracking schema to EventLogging whitelist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: 10MNeisler) [14:58:17] mgerlach: ok, nice, just like you pointed out a bunch of those are going to be fake bots jumping the not-so-sophisticated captchas [14:58:45] mgerlach: but the (citation needed) *majority* of accounts I think should be users [14:58:50] nuria: perhaps! i'm not totally sure what you mean by funnel state, i think the main point is that for schemas that might be reusable like pageview, it might be possible to configure collection of pageview events to specific streams without having to write new schemas or instrumentation code every time [14:59:10] ottomata: that sounds like a good goal, agree [14:59:25] if there is some way on each page to detect A. the stream name a pageview event should be sent to and B. if that page view event should be sent (via funnel state?), then the code could just send it! [14:59:38] ottomata: but it cannot be at the cost of having a major piece of code that is aware of every single condition that can affect the logging of pageviews [14:59:38] so, maybe the config is just some funnel state id [14:59:44] and instrumentation keeps some information about the funnel state [15:00:03] nuria: every single condition sounds unreasonable [15:00:06] but limited conditions sounds ok [15:00:07] ottomata: like user-edit-bucket, just-created-account, vietnamese-wikipedia-editor, ..etc [15:00:28] nuria those are funnel states ^ ? [15:01:15] ottomata: rather than having that code centralized i would give control like [15:02:38] ottomata: pageview.emitEventToAskOtherListenerWhetherItNeedsToBeLogged() [15:02:56] ottomata: and let other "listeners" in the page log that according to their schema [15:03:09] milimetric joal if one of yall could give this a quick look whenever you have time, the backfilling is dependent on it [15:03:36] ottomata: we want to shy away from having a datastore that records a pageview and every single piece of information about that user with that pageview as it blows out our privacy practices [15:04:31] ottomata: rather have instrumentation for schemas receive that pageview event (by some event emitter internal to the page) and log the info if needed, anyways, we can talk about this in bc if you want [15:11:08] 10Analytics, 10Analytics-Kanban: Superset not able to load a reading dashboard - https://phabricator.wikimedia.org/T234684 (10Nuria) @JAllemandou what a great piece of insight. I think i am going to close ticket cause: - bumping timeout seemed to resolve less intense queries - pageviews per browser family f...
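A very rough sketch of the "emit and let listeners decide" idea described above, in Python purely for illustration (the real thing would be client-side instrumentation code). Nothing here corresponds to an existing API; every name is hypothetical.

```python
# Hypothetical sketch of the emitter/listener split discussed above: the
# central pageview code only announces a pageview; each instrumentation owns
# its own condition and decides whether to log to its own schema/stream.
from typing import Callable, Dict, List


class PageviewEmitter:
    """Central code: announces pageviews, stores nothing itself."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[Dict], None]] = []

    def add_listener(self, listener: Callable[[Dict], None]) -> None:
        self._listeners.append(listener)

    def emit_pageview(self, pageview: Dict) -> None:
        for listener in self._listeners:
            listener(pageview)


def newcomer_funnel_listener(pageview: Dict) -> None:
    # Example condition owned by one instrumentation: only pageviews during
    # the first hour after account creation would be logged to its stream.
    if pageview.get("account_age_seconds", 10**9) < 3600:
        print("would log to the newcomer-funnel stream:", pageview["title"])


emitter = PageviewEmitter()
emitter.add_listener(newcomer_funnel_listener)
emitter.emit_pageview({"title": "Main_Page", "account_age_seconds": 120})
```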
[15:11:43] 10Analytics, 10Analytics-Kanban: Superset not able to load a reading dashboard - https://phabricator.wikimedia.org/T234684 (10Nuria) [15:15:24] 10Analytics, 10Analytics-Kanban, 10Operations, 10User-Elukey: Make the Kerberos infrastructure production ready - https://phabricator.wikimedia.org/T226089 (10elukey) Tested the failover and improved the puppet code to do proper clean ups when failing back to the original state. Tested a change in password... [15:15:54] * elukey moves --^ to done [15:16:02] \o/ \o/ \o/ [15:19:22] 10Analytics, 10Analytics-Kanban: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (10elukey) [15:21:41] 10Analytics, 10Analytics-Kanban: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (10elukey) We are finally able to allow somebody external to the Analytics team to test the Hadoop test cluster. [15:22:14] 10Analytics, 10Analytics-Kanban: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (10elukey) [15:24:12] (03CR) 10Nuria: [C: 04-1] Add the MobileWebUIActionsTracking schema to EventLogging whitelist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: 10MNeisler) [15:25:04] 10Analytics, 10Operations, 10Research-management, 10Patch-For-Review, 10User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (10Nuria) 05Open→03Resolved [15:25:07] 10Analytics, 10Discovery-Search, 10Multimedia, 10Reading-Admin, and 3 others: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (10Nuria) [15:25:12] 10Analytics, 10Operations, 10Research-management, 10Patch-For-Review, 10User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (10Nuria) ta-tachannnn!!!! [15:25:46] 10Analytics, 10Analytics-Kanban: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (10elukey) @EBernhardson Hi! - I am wondering if you have some spare cycles to dedicate to test kerberos in the Hadoop test cluster (see https://w... [15:27:47] (03CR) 10Nuria: "Updating per our conversation yesterday, we will be documenting the vetting of user agent data in the light of UA parser upgrade and proce" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/541557 (https://phabricator.wikimedia.org/T215863) (owner: 10Mforns) [15:32:35] nuria: sorry was in meeting, sure sounds good :) [15:32:40] i mean your idea sounds fine to me [15:33:15] i'm actually not opinionated about how client achieves stuff, i just like the idea of being able to turn on new streams of predetermined schema type via config [15:41:06] 10Analytics, 10Better Use Of Data, 10Product-Infrastructure-Team-Backlog, 10Epic: Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10Milimetric) [15:41:45] fdans: what's this? 
[15:43:44] milimetric: sorry [15:43:49] https://gerrit.wikimedia.org/r/#/c/analytics/aqs/+/540433/ [15:43:52] this :( [15:49:29] PROBLEM - Kerberos KAdmin daemon on krb1001 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/sbin/kadmind https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos%23Daemons_and_their_roles [15:53:01] (03CR) 10Milimetric: [C: 03+2] Add mediarequests tops metric endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/540433 (owner: 10Fdans) [15:53:17] gerrit is slooooooooooooooow [15:53:40] milimetric: I would use many more adjectives than that but yes [15:53:55] (03Merged) 10jenkins-bot: Add mediarequests tops metric endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/540433 (owner: 10Fdans) [15:54:02] thank you milimetric [15:54:18] np, good change [16:00:13] Thanks a lot milimetric for the review - I'd have done it later (was gone for kids at the time - forgot to ping here) [16:00:48] milimetric: let's please talk about this in standup: https://phabricator.wikimedia.org/T235189 [16:01:14] milimetric: I think we can help on that work but i do not think we should own it [16:01:50] nuria: sure, happy to talk about it. I'm making a KR for myself just to drive it, not own it [16:14:33] ty ebernhardson https://wikitech.wikimedia.org/wiki/SWAP#Launching_as_SparkSession_in_a_Python_Notebook [16:34:43] RECOVERY - Kerberos KAdmin daemon on krb1001 is OK: PROCS OK: 1 process with args /usr/sbin/kadmind https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos%23Daemons_and_their_roles [16:35:08] :) [16:40:14] ottomata: excellent. hopefully it helps someone and doesn't just confuse them [16:43:04] 10Analytics, 10Better Use Of Data, 10Product-Infrastructure-Team-Backlog, 10Epic: Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10Milimetric) [16:52:58] hey, are there some specs on the event-gate HTTP API (to push events)? Or perhaps a ready to use Java client? [16:55:21] 10Analytics, 10Better Use Of Data, 10Product-Infrastructure-Team-Backlog, 10Epic: Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10fdans) p:05Triage→03Normal [16:57:35] ok found https://github.com/wikimedia/eventgate#usage [17:06:19] 10Analytics, 10Analytics-Kanban: dumps.wikimedia.org/other/mediawiki_history is missing some files - https://phabricator.wikimedia.org/T235112 (10fdans) [17:06:37] 10Analytics, 10Analytics-Kanban: dumps.wikimedia.org/other/mediawiki_history is missing some files - https://phabricator.wikimedia.org/T235112 (10fdans) p:05Triage→03High [17:08:11] 10Analytics, 10Product-Analytics: Wikistats API for legacy pagecounts does not have mobile data before October 2014 - https://phabricator.wikimedia.org/T235143 (10Milimetric) Yes, unfortunately we don't have mobile data going back before that. Before 2014, we had: * per-article pagecounts counted from the de... 
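Roughly along the lines of the SWAP page linked above ([16:14:33]), a minimal sketch of building a YARN-backed SparkSession from a Python notebook. The Spark install path and the resource settings are illustrative assumptions, not the exact values from the wiki page.

```python
# Sketch only: start a SparkSession against YARN from a notebook.
# The spark2 path and resource settings below are assumptions.
import findspark
findspark.init('/usr/lib/spark2')   # assumed Spark 2 install location on the notebook hosts

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master('yarn')
    .appName('swap-notebook-example')
    .config('spark.dynamicAllocation.maxExecutors', 32)
    .config('spark.executor.memory', '4g')
    .config('spark.driver.memory', '2g')
    .getOrCreate()
)

spark.sql('SHOW DATABASES').show()
```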
[17:08:21] 10Analytics, 10Better Use Of Data, 10Performance-Team, 10Product-Infrastructure-Team-Backlog, 10Epic: Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10Krinkle) [17:08:25] 10Analytics, 10Product-Analytics: Wikistats API for legacy pagecounts does not have mobile data before October 2014 - https://phabricator.wikimedia.org/T235143 (10Milimetric) 05Open→03Resolved a:03Milimetric [17:08:39] 10Analytics, 10Better Use Of Data, 10Performance-Team, 10Product-Infrastructure-Team-Backlog, 10Epic: Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10Krinkle) What is the timeline for this development, in which quarter is the perf review needed? [17:09:34] 10Analytics, 10Better Use Of Data, 10Performance-Team, 10Product-Infrastructure-Team-Backlog, 10Epic: Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10jlinehan) >>! In T235189#5563826, @Krinkle wrote: > What is the timeline for this development, in which quarter is the perf r... [17:09:54] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Help panel: delete sanitized data from before Oct 1 - https://phabricator.wikimedia.org/T234870 (10fdans) p:05High→03Unbreak! [17:10:57] 10Analytics, 10Analytics-Cluster: 500k files in hdfs /tmp - https://phabricator.wikimedia.org/T234954 (10fdans) p:05Triage→03High [17:10:59] 10Analytics, 10Analytics-Cluster: Create HDFS /tmp/ cleaner - https://phabricator.wikimedia.org/T235200 (10Ottomata) [17:11:05] 10Analytics, 10Analytics-Cluster: Create HDFS /tmp/ cleaner - https://phabricator.wikimedia.org/T235200 (10Ottomata) p:05Triage→03High [17:11:11] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Help panel: delete sanitized data from before Oct 1 - https://phabricator.wikimedia.org/T234870 (10fdans) p:05Unbreak!→03High [17:17:27] 10Analytics, 10Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (10Milimetric) I looked at this and it's a clever way to get some rough information to answer the main question. But I just wanted to point out: it's not by accident that this kind of correlation is hard... [17:20:53] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Help panel: delete sanitized data from before Oct 1 - https://phabricator.wikimedia.org/T234870 (10fdans) @nettrom_WMF can we confirm that the range to be deleted is beginning of time up to Oct 1? Would this be deleting all fields? [17:28:27] 10Analytics, 10Product-Analytics: Wikistats API for legacy pagecounts does not have mobile data before October 2014 - https://phabricator.wikimedia.org/T235143 (10kzimmerman) I want to be sure I understand: - Erik compiled data from sampled logs which are no longer available - We may have some aggregate d... [17:28:40] 10Analytics, 10Better Use Of Data, 10Performance-Team, 10Product-Infrastructure-Team-Backlog, 10Epic: Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10Nuria) To be perfectly honest i doubt we can do this work by EOQ, we can probably start some of this work this quarter but no... [17:29:32] * elukey off! [17:45:12] 10Analytics, 10Better Use Of Data, 10Performance-Team, 10Product-Infrastructure-Team-Backlog, 10Epic: Prototype client to log errors - https://phabricator.wikimedia.org/T235189 (10Nuria) I think an attainable goal is to have an client side error library ready in vagrant for @Krinkle to CR by end of quart... 
[17:58:43] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Help panel: delete sanitized data from before Oct 1 - https://phabricator.wikimedia.org/T234870 (10nettrom_WMF) @fdans : Can confirm that the range to be deleted is the beginning of time (which is like April 2019) up to Oct 1. And yes, all fie... [18:12:05] mforns: let's look at the history rsync [18:13:36] dcausse: also [18:13:36] https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate#WMF_EventGate_implementation [18:13:37] and [18:13:40] https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate#Development_in_Mediawiki_Vagrant [18:14:05] using the eventgate-wikimedia repo (from gerrit) is probably better, as it has more specific wmf stuff, including configs to use schema repos [18:14:12] the way we do [18:14:50] https://github.com/wikimedia/eventgate#eventgate-wikimedia-implementation-and-use-as-a-dependency [18:15:55] ottomata, yes! [18:17:43] mforns: what is the file that is missing? [18:17:48] rsync says everything is copied [18:18:42] ottomata, everything after wikidatawiki alphabetically [18:18:44] for example [18:18:48] zhwiki [18:19:09] see: https://dumps.wikimedia.org/other/mediawiki_history/2019-08/zhwiki/ [18:19:31] interesting! [18:19:37] those files exist but are not world readable.. [18:19:40] but it's present in: stat1007:/home/mforns/mediawiki_history_dumps//2019-08/zhwiki [18:19:46] hm... [18:19:57] dunno why that would be... [18:20:11] I can see they are 644 [18:20:28] like the others no? [18:20:37] ah! you mean in dumps.wikimedia.org? [18:20:52] yeah [18:20:53] huh [18:20:59] so -a is supposed to preserve perms [18:21:01] but it didn't? [18:21:06] adding -p to the command [18:21:08] fixed it! [18:21:10] I can see them now! [18:21:19] cooooool :] [18:21:38] thank you! [18:21:53] huh the puppetized one does [18:21:54] --chmod=go-w [18:21:55] I wonder why some wikis have data in 2025 [18:21:55] which should work [18:22:04] 2025? [18:22:07] yea [18:22:10] oh [18:22:10] weird [18:22:12] https://dumps.wikimedia.org/other/mediawiki_history/2019-08/zhwiki/ [18:22:29] yeah [18:22:35] should filter that out in the dump job [18:23:15] anyway, thanks! I'll move the task to done :] [18:23:31] ellery! [18:24:08] there's a single record [18:24:13] in the zhwiki one anyway [18:24:39] 10Analytics, 10Analytics-Kanban: dumps.wikimedia.org/other/mediawiki_history is missing some files - https://phabricator.wikimedia.org/T235112 (10Ottomata) Hm, rsync -a did not properly preserve permissions. Adding -p fixed.
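For the EventGate question above, a hedged sketch of pushing an event over HTTP with Python's requests (dcausse asked about a Java client, but the HTTP shape is the same). The /v1/events path comes from the eventgate README linked above; the host, port, stream name and schema URI below are placeholders, not a real production stream.

```python
# Sketch only: POST a single event to an EventGate instance.
# Host/port, stream name and schema URI are placeholders; see the
# eventgate README and the wikitech pages linked above for real values.
import requests

event = {
    "$schema": "/test/event/1.0.0",        # schema URI, resolved against the configured schema repositories
    "meta": {
        "stream": "test.event",            # must be a stream the instance is configured to accept
        "dt": "2019-10-10T18:15:00Z",
    },
    "test": "hello eventgate",
}

response = requests.post(
    "http://localhost:8192/v1/events",     # assumed local dev instance, e.g. in MediaWiki-Vagrant
    json=[event],                          # the endpoint takes a single event object or an array of them
    timeout=10,
)

# Per the README: 201 means all events were accepted, 207 partial success,
# 400 means all events failed validation.
print(response.status_code, response.text)
```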
[18:28:30] (03CR) 10Mforns: Add the MobileWebUIActionsTracking schema to EventLogging whitelist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: 10MNeisler) [18:31:21] (03CR) 10Nuria: [C: 04-1] Add the MobileWebUIActionsTracking schema to EventLogging whitelist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: 10MNeisler) [18:38:42] nuria: in betterworks [18:38:58] i made an objective with a due date in Q4, but it has KRs with other quarter due dates in it [18:39:12] now it only shows up in the Q42020 view [18:39:42] AH NM [18:39:45] i just answered my own q [18:39:49] i had to make the start date Q2 [18:41:36] ottomata: on meeting , can talk in a bit [18:42:13] nuria: no worries i figured it out [18:42:20] for a second i thought i had deleted my okr i just made [18:42:33] but it just disappeared from the default view since the time for it was in the future [19:04:00] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (10Ottomata) @JAllemandou let's make this happen! The .deb is ready to go! :) [19:54:33] milimetric, joal: i think we need to add to this page the descriptions of tables for geoeditors_daily /monthly.. or .. are those in another wikitech doc that I am not finding [20:05:29] nuria: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors [20:05:50] (btw, my secret is I google "wikitech <>" and it's always the first result, it's amazing how good Google is) [20:05:50] milimetric: ok, will link [20:06:21] like check out https://www.google.com/search?q=wikitech+druid+maintenance [20:07:30] milimetric: indeed [20:57:08] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (10Ottomata) Confirmed Spark 2.4.4 works with Refine in local mode with existing refinery-job jar (compiled with 2.3.1) and a new refinery-job compiled with 2.4.4. Running in YARN seems t... [20:58:33] (03PS1) 10Ottomata: Bump spark.version to Spark 2.4.4 in pom.xml [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/542226 (https://phabricator.wikimedia.org/T222253) [21:00:07] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (10Ottomata) @JAllemandou what do we need to do to test other Refinery jobs, mostly just test mw history somehow? [22:21:01] 10Analytics, 10Product-Analytics, 10Growth-Team (Current Sprint): Help panel: delete sanitized data from before Oct 1 - https://phabricator.wikimedia.org/T234870 (10MMiller_WMF) @fdans -- thank you for working on this. I just want to mention that we consider this a high priority task, and hoping for it to b...