[05:44:47] (PS5) Lex Nasser: Modify external webrequest search engine classification and add tests. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625)
[08:04:08] Good morning tem
[08:04:11] +a
[08:05:11] o/
[09:14:35] Analytics, Analytics-Kanban, Patch-For-Review: Enable encryption in Spark 2.4 by default - https://phabricator.wikimedia.org/T240934 (elukey) Today I started with `spark2-submit --conf spark.io.encryption.enabled=false --conf=spark.network.crypto.enabled=false --conf spark.dynamicAllocation.enabled=f...
[09:22:03] Analytics, Analytics-Kanban, Patch-For-Review: Enable encryption in Spark 2.4 by default - https://phabricator.wikimedia.org/T240934 (elukey) commons-crypto 1.0.0 is contained in the spark-assembly jar on HDFS, but possibly this is only a Python issue with crypto libraries?
[09:22:14] joal: I added some thoughts to --^
[09:22:30] the AES RCP crypto option seems to be the culprit for python
[09:22:40] iirc we have never had issues with scala-based code right?
[09:23:11] elukey: nope - oozie only, but scala worked straight
[09:24:41] interesting elukey !!
[09:27:10] I can't think about something else
[09:27:30] I tried to check DEBUG logs etc.. but there nothing really clear that points to a precise direction
[09:30:19] :S
[09:30:41] elukey: need to go AFK, will be back in some
[09:31:23] o/
[10:43:36] elukey: helloooo, I reviewed your patch for the v2 move, nothing looks weird to me
[10:43:41] thank you for doing this luca :)
[10:44:41] fdans: hellooo!!
[10:45:02] elukey: didn't +1 because you said it was still wip
[10:47:07] fdans: I checked for more options but this one looks the easiest
[10:47:09] so I think it is final
[10:47:42] I'd need to test the config a bit more in my test env, and possibly I'd love to have a better test in a vps in labs
[10:47:53] there is the risk of making a mess in my opinion
[10:49:33] fdans: do you think that we could test this in labs? Maybe applying thorium's puppet role in there and adding "Fake" html content to mimic wikistats1
[10:49:41] then we add the change and test it
[10:49:55] more annoying I know
[10:52:17] elukey: that sounds like a lot, but I'm with whatever you think
[10:52:40] remember that we agreed it was ok to downtime wikistats 1 for a little while
[11:02:51] fdans: sure but it doesn't mean that it needs to happen. Now that we have more visibility, is seems not great to serve weird content to users.. there is also the problem of caching at varnish/ats level, that might be messed up
[11:03:08] it shouldn't be much, just a test in labs
[11:03:12] will try to do it later on
[11:37:15] * elukey lunch!
[12:09:56] (PS6) Fdans: Add vue-i18n integration, English strings [analytics/wikistats2] - https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617)
[12:38:05] Analytics, Pywikibot, Wikimedia-Site-requests, User-Urbanecm: Provide some Pywikibot usage statistics for Python2.7 and Python3.x - https://phabricator.wikimedia.org/T242157 (Urbanecm) Is there anything else I can help you with, or can we close this task?
[12:54:52] !log restart hue to re-apply user hive limits (again)
[12:54:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:14:33] fdans: are you there?
[13:14:52] * joal watches elukey summoning an fdans
[13:15:37] elukey: sorryyy I’m in the lunch sphere
[13:15:56] ah np!
[13:28:24] Analytics, Pywikibot, Wikimedia-Site-requests, User-Urbanecm: Provide some Pywikibot usage statistics for Python2.7 and Python3.x - https://phabricator.wikimedia.org/T242157 (Xqt) Probably it would be interesting to have a new statistic in few months but this is enough for me now. Thanks a lot.
[13:29:46] Analytics, Pywikibot, Wikimedia-Site-requests, User-Urbanecm: Provide some Pywikibot usage statistics for Python2.7 and Python3.x - https://phabricator.wikimedia.org/T242157 (Urbanecm) Open→Resolved Okay, closing to hide it from my dashboard. Feel free to re-open once you need a followup...
[13:42:28] fdans: for when you are ready (leaving some notes)
[13:42:31] ssh -L 8088:wikistats.eqiad.wmflabs:80 wikistats.eqiad.wmflabs
[13:42:41] and then localhost:8088 in the browser
[13:42:54] I have created the v1/index.html and v1/test1.html pages
[13:43:01] plus there is the usual v2/ stuff
[13:43:29] in theory just doing localhost:8088/test1.html should work, and of course localhost:8088 should lead to wikistats v2
[13:43:34] let me know if you see anything weird
[13:43:41] I found some issues in my config, just updated my patch
[13:43:47] should be good now
[13:45:52] elukey: everything looks good to me!
[13:47:04] elukey: one thing
[13:47:41] so navigating to localhost:8088/test1.html shouldn't rewrite to localhost:8088/v1/test1.html right?
[13:48:19] no it should give you the content with /test1.html
[13:48:34] elukey: sounds good
[13:48:57] basically httpd goes first into v2/ and then into v1/ to check if anything is there
[13:50:52] links with /v2/etc.. should keep working
[13:51:03] also old v1 links should work as well
[13:51:07] anything missing?
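Editor's note: the lookup order described above (httpd checks v2/ first, then v1/, while the URL in the browser stays unchanged) can be modelled roughly as below. This is a minimal Python sketch, not the httpd config from the patch. The `resolve` function and the two file sets are hypothetical and only mirror the test pages mentioned in the channel.

```python
from typing import Optional, Set

# Hypothetical docroot contents, modelled on the test pages mentioned above.
V2_FILES: Set[str] = {"/index.html"}
V1_FILES: Set[str] = {"/index.html", "/test1.html"}


def resolve(path: str, v2_files: Set[str] = V2_FILES,
            v1_files: Set[str] = V1_FILES) -> Optional[str]:
    """Return the on-disk location that would serve `path`, or None if nothing matches.

    The requested URL itself is never rewritten; only the lookup location changes.
    """
    if path.startswith("/v2/"):
        # Explicit /v2/... links keep working and are only looked up in v2/.
        candidates = [("/v2", path[len("/v2"):])]
    else:
        # Everything else is tried in v2/ first, then in v1/.
        candidates = [("/v2", path), ("/v1", path)]

    for prefix, rel in candidates:
        if rel.endswith("/"):
            rel += "index.html"  # directory requests map to the index page
        files = v2_files if prefix == "/v2" else v1_files
        if rel in files:
            return prefix + rel
    return None
```

With these sets, `resolve("/")` lands in v2/ and `resolve("/test1.html")` falls through to v1/, which matches the behaviour confirmed in the messages above.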
[13:52:11] I am currently backupping all htdocs into /srv/backup_wikistats_1 on thorium
[13:52:17] after that we can merge and do the procedure
[13:56:49] elukey: awyissss
[14:00:50] elukey: I think links with /v2/ should be redirected to /
[14:01:13] otherwise we're keeping up two sets of urls serving the same purpose
[14:07:50] should be doable with a RedirectMatch ^v2/ / in theory
[14:07:58] does it need to be done now or as second step?
[14:08:14] fdans: --^
[14:09:27] elukey: I'd rather have it now unless it's a pain, sorry luca
[14:14:09] fdans: with the current settings it is a bit of a pain, since behind the scenes the /v2/ is used so if redirected it causes a loop
[14:14:20] but I'll try to find a solution
[14:15:00] fdans: one question though - why are you saying that we maintain two sets of urls?
[14:15:09] elukey: I mean I could do it on the client
[14:15:41] fdans: let's bc for a sec if you have time
[14:15:48] elukey: yess
[14:24:18] !log Releasing hdfs-tools 0.0.3 to archiva
[14:24:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:30:58] elukey: o/ i should have time to brainstorm on newpyter this week if you do, i started collecting thoughts on friday in etherpad
[14:31:05] Analytics, Analytics-Kanban, Product-Analytics, SDC General, Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (Milimetric) The output is here: https://analytics.wikimedia.org/published/datasets/periodic/reports/metrics/structured-data/
[14:31:20] ottomata: hello! Sure let me know a day so I can prep
[14:34:16] elukey: what is the bastion name for labs so i can help test?
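Editor's note: the redirect loop mentioned at 14:14:09 can be illustrated with a toy model. This Python sketch assumes, as that message suggests, that a /v2/-to-/ redirect would also be evaluated against the path that the server maps back into /v2/ behind the scenes. The two rules and the `trace` and `loops` helpers are hypothetical simplifications, not the real httpd behaviour or config.

```python
import re
from typing import List


def trace(path: str, max_hops: int = 4) -> List[str]:
    """List the paths seen while the two hypothetical rules are applied in turn."""
    seen = [path]
    for _ in range(max_hops):
        if re.match(r"^/v2/", path):
            # Rule A (external redirect): /v2/foo -> /foo
            path = "/" + path[len("/v2/"):]
        else:
            # Rule B (internal mapping): /foo -> /v2/foo
            path = "/v2" + path
        seen.append(path)
    return seen


def loops(path: str) -> bool:
    """True if the same path shows up more than once in the trace."""
    seen = trace(path)
    return len(set(seen)) < len(seen)
```

In this model `trace("/v2/index.html")` alternates between `/v2/index.html` and `/index.html` and never settles, which is the kind of loop described. Breaking it would need one of the two rules to stop matching, for example by not applying the redirect to internally mapped requests.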
[14:35:40] nuria: any labs bastion is ok
[14:35:56] Analytics, Analytics-Kanban, Product-Analytics, SDC General, Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (Nuria) Open→Resolved
[14:36:02] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (Nuria)
[14:36:10] elukey: this one no longer exists though : ebastion-eqiad.wmflabs.org
[14:36:11] but I think that there is a limit in my config, it is not easy to add redirects for something like /v2/ to /
[14:36:16] Analytics: Newpyter - First Class Jupyter Notebook system - https://phabricator.wikimedia.org/T224658 (Ottomata) p:Normal→Unbreak!
[14:36:19] Analytics: Newpyter - First Class Jupyter Notebook system - https://phabricator.wikimedia.org/T224658 (Ottomata) p:Unbreak!→High
[14:36:25] elukey: https://etherpad.wikimedia.org/p/newpyter
[14:36:35] ottomata: yes yes I have seen it :)
[14:36:37] oh ok
[14:36:56] i dunno, tomorrow? i still don't know how to do some of the main use cases, even though JEG makes things easier
[14:37:07] i might try and set some stuff up, maybe in labs...or maybe test cluster?
[14:37:16] sure
[14:37:24] elukey: i have not looked at config but letting v2 remain is also an option
[14:37:45] so I have restricted.bastion.wmflabs.org as bastion, but not sure if you can use it
[14:37:51] lemme check the sre docs
[14:39:15] fdans: what bastion do you have in the config to ssh to labs?
[14:39:55] (PS1) Joal: Deploy hdfs-tools 0.0.3 [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/564022
[14:39:55] nuria: https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances#Accessing_Cloud_VPS_instances
[14:40:04] ottomata: --^ :)
[14:40:06] found it, took a bit :)
[14:40:13] ooof, this service node template update is not at all smooth
[14:40:23] nuria: bast1002.wikimedia.org
[14:40:31] oh -shaded, nice
[14:40:32] fdans: for labs?
[14:40:38] oh wait
[14:40:43] primary.bastion.wmflabs.org ?
[14:40:45] (CR) Ottomata: [V: +2 C: +2] Deploy hdfs-tools 0.0.3 [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/564022 (owner: Joal)
[14:40:53] yes exactly, matches the docs
[14:40:55] primary.bastion.wmflabs.org
[14:41:07] I think that Nuria is already testing
[14:41:55] nuria: I was telling luca that we can also remove the v2 client side
[14:42:22] fdans: we do not need to do that now
[14:43:58] fdans: I think we can live with the v2 for now if it makes things easier
[14:44:31] nuria: yep, the task is fulfilled either way
[14:45:48] nuria: my main doubt though is that if this change adds so many constaints to our settings it might bite us in the future
[14:46:29] elukey: ok, maybe we should reevaluate how are we doing this
[14:47:02] I am still looking the redirect, it might be that I am currenlty not caffeinated enough
[14:47:32] speaking if which, I am going to get a coffee
[14:47:36] maybe it will help :)
[14:50:08] fdans: are you talking about SEO considerations though?
[14:50:49] milimetric: yeah that was part of my thinking
[14:51:26] hm, I wonder if that's the lowest hanging fruit, I never seriously thought about it for ws 2
[14:51:30] fdans: seo for older urls correct?
[14:51:35] fdans: as it only applies there
[14:51:36] no, for the new site
[14:51:38] ottomata: can I deploy hdfs-tools-deploy from deploiyment?
[14:51:57] nuria: elukey: milimetric another consideration is that right now all wikipedias point to /v2
[14:51:59] nuria: you get marked down if you have two URLs pointing to the same content, crawlers don't like that
[14:52:17] which is our main source of traffic, by far
[14:52:30] fdans: it's ok to have a redirect from /v2 to / from the SEO point of view
[14:52:37] just not ok to serve the same content at two addresses
[14:52:46] milimetric: yeah that's what I'm saying
[14:52:47] milimetric: that is not a problem if you use canonical to resolve multiple urls
[14:52:47] so the wikipedia links wouldn't break, but we should update to save the redirect
[14:53:29] Maybe we can take some time today in groskin to talk about this?
[14:53:31] oh there's a meta tag to say / is the "canonical" site?
[14:53:33] sure
[14:53:37] milimetric: ya
[14:53:55] cool, yeah, ok, mini-review for SEO on ws2 at grosking today
[14:55:21] seo isn't a big deal to me, it's more the fact that allowing both / and /v2 to be navigated seems wrong to me
[14:58:35] yes
[14:58:38] joal proceed!
[14:58:41] fdans: i see, that might be the easiest option for now
[14:58:47] ack ottomata :)
[15:00:30] !log Deploy hdfs-tools 0.0.3 using scap
[15:00:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:00:37] fdans: also i think we need to talk about how are users going to access the main screen of wikistats1 , right? Ideally stats.wikimedia.org will take you to the new wikistats but you would have there an link to the older site, correct?
[15:01:58] joal: do you want to re-do last week's tests?
[15:02:10] yes please elukey :)
[15:03:09] elukey: batcave?
[15:04:18] joal: sure!
[15:05:23] nuria: we didn’t task this but yea we could add a link to the footer
[15:25:17] Analytics: [Wikistats2] Normalize pageviews per country by population - https://phabricator.wikimedia.org/T242621 (mforns)
[15:31:48] Analytics, Operations: Grant access to archiva-deployers for zpapierski - https://phabricator.wikimedia.org/T242622 (dcausse)
[15:35:13] Analytics, Operations: Grant access to archiva-deployers for mstyles - https://phabricator.wikimedia.org/T242624 (dcausse)
[15:37:07] elukey: any chance you have a minute to approve T242622 ?
[15:37:08] T242622: Grant access to archiva-deployers for zpapierski - https://phabricator.wikimedia.org/T242622
[15:37:29] We're going through the deployment process for WDQS with Zbyszko and we're blocked on it
[15:38:25] hey gehel, please go ahead, for these things let's just keep us informed but no need for our approval
[15:38:43] ok, then I'll go ahead! Thanks!
[15:38:48] and to keep you informed: T242624
[15:38:49] thanks for the heads up!
[15:38:49] T242624: Grant access to archiva-deployers for mstyles - https://phabricator.wikimedia.org/T242624
[15:38:58] ack :)
[15:39:11] you also know 10 times more than me how archiva works gehel
[15:39:12] :D
[15:39:20] so it is better handled in your hands!
[15:39:34] you're still the one who will need to deal with the crap if they break things :)
[15:39:55] gehel: hahahah yes correct but I can blame you as well in the process
[15:40:10] damn, there's no escape for me :)
[15:40:27] that's why you prefer not to formally approve those request: plausible deniability :=
[15:44:30] you got it
[15:44:33] :D
[15:44:46] Analytics, Operations: Grant access to archiva-deployers for zpapierski - https://phabricator.wikimedia.org/T242622 (Gehel) Open→Resolved a:Gehel access granted
[15:45:56] Analytics, Operations: Grant access to archiva-deployers for mstyles - https://phabricator.wikimedia.org/T242624 (Gehel) Open→Declined @Mstyles is already a member of that group
[15:46:34] milimetric: nuria FYI Jason just told me he just looked at the error logging patch and realized there is more too it than he rememberfed
[15:46:42] and it would def help if we could take it over
[15:47:32] mmmmm, I'm game but let's talk at standup
[15:51:00] (PS1) Fdans: Add language selection functionality to Wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/564047 (https://phabricator.wikimedia.org/T238752)
[15:51:46] Analytics, Analytics-Wikistats: Wikistats Bug - Can't find stats for number of "Very Active" editors - https://phabricator.wikimedia.org/T242451 (mforns) Open→Invalid Hi @Clayoquot! The metric is there, but maybe not directly visible: You have to select the "editors" metric, and then enable the...
[15:56:51] (PS7) Fdans: Add vue-i18n integration, English strings [analytics/wikistats2] - https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617)
[15:58:07] (CR) Fdans: Add vue-i18n integration, English strings (2 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617) (owner: Fdans)
[16:01:32] ottomata: let me talk to jason and joaquin
[16:01:39] k
[16:10:56] Analytics, Analytics-Kanban: Hourly labeling of "automated" traffic before loading of pageviews into pageview_hourly - https://phabricator.wikimedia.org/T238361 (Nuria) a:Nuria
[16:33:16] nuria: wait following mforns 's standup order approach, who starts?
[16:33:33] fdans: THE BOSS
[16:33:38] fdans, nuria says who starts
[16:33:48] ahhh ok
[16:35:13] Analytics, Analytics-Kanban, Product-Analytics, Research, Patch-For-Review: Improve quality of external referer data - https://phabricator.wikimedia.org/T239625 (lexnasser) @Isaac One last thing to resolve: There are a few Google Translate referers with the parameter `prev=/search...` (ex. `...
[16:55:54] Analytics, Analytics-Kanban, Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (Nuria) An alternative approach: - Let's have wikistats1 and wikistats2 in the same directory - Let's create a wikistats deployment git repo that only con...
[17:04:15] a-team: are you coming to our hangtime? I see on your calendars that you have a "tasking" meeting going on right now.
[17:28:45] Analytics, Analytics-Kanban, Product-Analytics, Research, Patch-For-Review: Improve quality of external referer data - https://phabricator.wikimedia.org/T239625 (Nuria) >(ex. prev=/search%3Fq%3DBARON%2BDE%2BHIRSCH%26hl%3Del%26rlz%3D1T4GGLL_elGR398GR398%26prmd%3Divns). Should these also be cla...
[17:35:27] joal: writing in here so it is quicker :)
[17:35:46] I'll run the puppet compiler so we'll know exactly what puppet will try to render
[17:35:48] sure elukey :)
[17:35:49] and possibly check it
[17:35:53] \o/
[17:36:39] hello Analytics! Could I nudge you about https://phabricator.wikimedia.org/T242525 ? not urgent, except that people are waiting for the most-viewed pages of 2019
[17:36:42] but, as you know, systemd ExecStart is not like bash, so we'd need to verify that it works as expected. worst case scenario we'll need to create a separate file (can be an option/flag, annoying I know but way safer)
[17:36:49] also, did you see my comment for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/564066/3/modules/dumps/manifests/web/fetches/analytics/job.pp ?
[17:37:11] no I had not seen it :)
[17:37:14] looking!
[17:38:34] elukey: musikanimal is requesting kerberos credentials in https://phabricator.wikimedia.org/T242525
[17:39:16] bearloga: o/ didn't see it yet but will work on it today I promise :D
[17:39:33] :)
[17:43:13] Analytics: Kerberos credentials for musikanimal - https://phabricator.wikimedia.org/T242525 (elukey) ` elukey@krb1001:~$ sudo manage_principals.py create musikanimal --email_address=lziemba@wikimedia.org Principal successfully created. Make sure to update data.yaml in Puppet. Successfully sent email to lziem...
[17:43:30] ping fdans coming to bc?
[17:43:40] elukey: thank you :)
[17:43:45] fdans: let's finish our discussion on localization
[17:43:51] ok
[17:44:32] elukey: pushed a version with comment
[17:44:48] only because I like musikanimal, he is great, but don't tell it to him
[17:45:24] hehe :) thanks
[17:46:17] musikanimal: o/ you should be all set, let me know if something doesn't work etc..
[17:46:22] should all be written in the guide
[17:46:31] yup! appears to be working :) many thanks
[17:46:50] Analytics, Patch-For-Review: Kerberos credentials for musikanimal - https://phabricator.wikimedia.org/T242525 (MusikAnimal) Open→Resolved a:MusikAnimal Looks like I'm in. Thank you!
[17:49:58] Analytics, Analytics-Kanban, Product-Analytics, Research, Patch-For-Review: Improve quality of external referer data - https://phabricator.wikimedia.org/T239625 (Isaac) > Let's keep things simple and let's document that this format is not covered. Works for me -- thanks!
[17:58:04] joal: so I tried to run the command but there were some parsing issues
[17:58:08] /bin/date: invalid option -- '1'
[17:58:09] etc..
[17:58:16] hm
[17:58:30] I guess that it works on your testing settings?
[17:58:36] elukey: date -1 month normally works!
[17:58:59] yes I think it is the amount of special chars etc..
[17:59:28] elukey: can you paste the command?
[18:00:00] it is in the code review
[18:00:08] also I just noticed that it uses /mnt/hdfs
[18:00:17] (brb)
[18:01:43] oh yes!!!! of course !!!! oh man elukey - I'm bad at that thing :)
[18:08:48] joal: ok if we deploy tomorrow morning?
[18:09:16] elukey: forgot to mention in stadup - I'll be ogg tomorrow - ok on wednesday?
[18:09:22] off (not ogg)
[18:12:47] joal: sure!
[18:13:01] Awesome thanks :)
[18:13:11] all right, logging off o/
[18:52:42] ottomata: talked to jason, since we are busy this week (and both milimetric and mforns have work to finish) we will touch base again by friday if he hasn't had time to put into the errorlogging, let me know if this sounds good
[18:56:35] ok
[18:56:49] nuria it will take at least a week to get it deploy after it is merged
[18:57:02] so it'd have to be merged this week if we wanted it deployed before all hands
[18:57:11] let alone used/configured etc.
[18:58:34] (PS6) Milimetric: Modify external webrequest search engine classification and add tests. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625) (owner: Lex Nasser)
[18:58:39] ottomata: I see, to avoid context switching i rather we finish ongoing staff we have this week (alarms, el patch and localization/wikistats) so it might be we cannot have this working before all hands
[18:58:51] k
[18:59:01] (CR) Milimetric: "Why is CI not running on this change? Am I missing something?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625) (owner: Lex Nasser)
[19:00:21] milimetric: if you want come to cpt sync ?
[19:17:33] (CR) Jforrester: "> Patch Set 6:" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625) (owner: Lex Nasser)
[19:29:03] (CR) Joal: "Some comments on naming, organisation, parameters. Nothing major :)" (13 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/561674 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[19:29:16] mforns: just reviewed the scala bit - Let me know if my comments make sense :)
[19:32:13] Analytics, Core Platform Team Workboards (Green): Flink Spike - https://phabricator.wikimedia.org/T241185 (Ottomata) > Returns: a count of all edits for a given page This use case (from T240387) is relatively simple and wouldn't require emitting any new events. The [[ https://schema.wikimedia.org/repos...
[19:49:36] (CR) Joal: "Some comments (naming, things I'd do differently, version)" (13 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/563200 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[19:49:46] mforns: another bunch of comments in there
[19:50:16] mforns: I'm gonna get diner, let's recombine on wednesday :)
[19:50:26] gone for now - see you team
[20:06:40] Analytics-EventLogging, Analytics-Kanban, Event-Platform, CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (Ottomata)
[20:54:34] (CR) Milimetric: [C: +2] "thanks @Jforrester, today I learned about the CI whitelist. Nice that it ran after my rebase, I'll consider adding Lex later. For now th" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625) (owner: Lex Nasser)
[20:56:27] fyi: I'm going to deploy Lex's referer classification change, cc: nuria
[21:00:12] (PS1) Milimetric: Bump changelog.md to 0.0.111 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/564125
[21:00:24] (CR) Milimetric: [V: +2 C: +2] Bump changelog.md to 0.0.111 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/564125 (owner: Milimetric)
[21:06:26] (PS1) Milimetric: Update jar to use new referer classification [analytics/refinery] - https://gerrit.wikimedia.org/r/564128
[21:11:35] (CR) Milimetric: [V: +2 C: +2] "merging in anticipation of the release" [analytics/refinery] - https://gerrit.wikimedia.org/r/564128 (owner: Milimetric)
[21:32:04] !log killing webrequest bundle for restart
[21:32:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:32:37] Please, anyone: is there any place in the *local* filesystem (not hdfs) where analytics-privatedata (the one we use for kerberos) can write?
[21:39:19] GoranSM: you mean temporarily? /tmp folder
[21:44:12] weird... I did sudo -u analytics kerberos-run-command analytics oozie job and it asked me to kinit my own user... does that make sense in some way I'm missing?
[21:45:25] !log webrequest restarted
[21:45:26] GoranSM: you want to write something into your homedir?
[21:45:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:45:32] mkdir ./dir
[21:45:38] chmod 777 dir
[21:45:49] then analytics-privatedata will be able to write there
[21:45:59] we should probably put that user into the wikdev group
[21:46:04] so that it will be able to automatically
[21:46:07] actually
[21:46:07] hm
[21:46:09] better than that
[21:46:13] don't chmod777
[21:46:35] chgrp analytics-privatedata-users dir
[21:46:35] then
[21:46:36] milimetric: yes, it can write to /tmp but then no one else can acess the files, neither analytics-privatedata can copy the files from /tmp to /srv, which is what I need...
[21:46:41] chmod 775 dir
[21:46:57] ottomata: probably the reason why you are offering this approach
[21:49:26] ottomata: will analytics-privatedata be able to copy from ./dir to /srv/published-datasets in that case?
[21:50:29] ottomata: because if I use /tmp for analytics-privatedata to write in the local filesystem, what enters there, stays there: it cannot be copied to /srv, neither to my homedir, neither can I access it as a user.
[21:50:31] oh hm no. but you should be able to do the same for a dir in /srv/published-datasets
[21:51:02] ottomata: so, the whole procedure as you have described, but for a new dir in /srv/published-datasets?
[21:51:06] yeah try that
[21:51:28] FYI /srv/published-datasets is now a sorta deprecated symlink to /srv/published/datasets, it is more correct to use that
[21:51:34] we maintain that symlink just for backwards compat
[21:52:07] ottomata: thanks a million times. I am running something in PySpark that is really difficult to collect, while - paradoxically, or at least for me - works when converted toPandas() and saved locally .
[21:52:12] GoranSM: you should be able to chgrp and chmod and existing there in there too
[21:52:23] dir in there*
[21:52:31] ottomata: and thanks for the info on /srv/published/datasets
[21:55:31] Analytics, Analytics-Kanban, Product-Analytics, Research: Improve quality of external referer data - https://phabricator.wikimedia.org/T239625 (lexnasser) Open→Resolved Deployed with the help of @Milimetric ! Hope you find these changes helpful!
[22:05:51] Analytics, Analytics-Kanban, Product-Analytics, Research: Improve quality of external referer data - https://phabricator.wikimedia.org/T239625 (Nuria) The last thing to do here is to update docs for dataset: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest#Changes_and_kn...
[22:25:01] Analytics, Analytics-Kanban, Product-Analytics, Research: Improve quality of external referer data - https://phabricator.wikimedia.org/T239625 (Isaac) This is great -- thanks @lexnasser and others who supported! I'll rerun some of the queries that inspired this work in a few days and let you know...
[23:45:49] (CR) Nuria: "Consolidating how locale is read looks good. Let's circle back once you have had a bit of time to dedicate to investigate if there is a l" (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617) (owner: Fdans)
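Editor's note: the permissions advice given between 21:45 and 21:52 (use group ownership plus mode 775 on the directory instead of chmod 777) can be sketched in Python as below. This is an illustration only. It works on a throwaway temporary directory, assumes a POSIX system, and leaves the group change as a comment because the `analytics-privatedata-users` group exists only on the hosts discussed in the channel. The helper names are hypothetical.

```python
import os
import stat
import tempfile


def make_group_writable(path: str) -> int:
    """Apply mode 775 to `path` and return the resulting permission bits."""
    # On the real host the chgrp step would come first, for example:
    #   shutil.chown(path, group="analytics-privatedata-users")
    # It is not executed here because that group is an assumption about the host.
    os.chmod(path, 0o775)
    return stat.S_IMODE(os.stat(path).st_mode)


def demo() -> int:
    """Create a scratch directory, make it group-writable, and return its mode."""
    with tempfile.TemporaryDirectory() as base:
        shared = os.path.join(base, "dir")
        os.mkdir(shared)
        return make_group_writable(shared)
```

Compared with 777, mode 775 keeps write access limited to the owner and the owning group, so only members of the group set by the chgrp step can write into the directory.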