[01:31:29] Analytics, User-Elukey: Redesign architecture of irc-recentchanges on top of Kafka - https://phabricator.wikimedia.org/T234234 (faidon) Thanks @Krinkle, very much appreciate all this! I have code from a couple of weeks ago that basically implements all this: consuming from SSE and formatting into IRC log...
[02:10:52] Analytics, Performance-Team, Security-Team, WMF-Legal, Privacy: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (APalmer_WMF) Legal signs off. Thanks, all!
[04:21:23] PROBLEM - Check the last execution of monitor_refine_mediawiki_job_events on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_mediawiki_job_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:54:02] Analytics, Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (1a1a11a) Hi @Nuria, @lexnasser and everyone else, thank you for the dataset, they are great assets to the research community! Same as Daniel's question, grafana...
[07:01:41] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (MoritzMuehlenhoff) JFTR, I also created myself a "jmm" principal in the prod setup.
[07:21:26] Analytics, Analytics-Cluster, Analytics-Kanban: an-coord1001 hive metastore not listening on ipv6 - https://phabricator.wikimedia.org/T240255 (elukey)
[07:42:10] !log execute reset-failed for monitor_refine_mediawiki_job_events on an-coord1001
[07:42:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:45:21] RECOVERY - Check the last execution of monitor_refine_mediawiki_job_events on an-coord1001 is OK: OK: Status of the systemd unit monitor_refine_mediawiki_job_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:49:12] Good morning team
[07:51:35] Analytics, Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (lexnasser) @1a1a11a Thanks for the question! Nuria is more aware of the intricacies of the source of the data than I, but I believe the main other factor that l...
[07:56:49] joal: bonjour :)
[07:57:04] lexnasser: good evening :D
[07:57:56] elukey: haha college life, good morning
[07:59:00] :D
[07:59:32] lexnasser: since you are here, please check the docs about kerberos if you haven't done so, on Monday we'll do the switch..
[08:00:05] elukey: is that just the kinit command setup?
[08:01:03] lexnasser: basically yes, you'll need to do it to be able to use the cluster
[08:01:17] great, already completed :)
[08:01:32] lexnasser: ack then, remember to kinit every time though :)
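(A minimal sketch of the kinit workflow elukey describes above. The commands are standard Kerberos/Hadoop tooling; the example path is an illustrative assumption, not a documented one.)

    kinit                     # obtain a Kerberos ticket; prompts for the user's password
    klist                     # verify the ticket cache and its expiry
    hdfs dfs -ls /wmf/data    # Hadoop commands now authenticate using the ticket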
[08:14:21] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (elukey) >>! In T237605#5724934, @Halfak wrote: > @elukey, @ACraze not being in `analytics-privatedata-users` seems like an oversight. I'll check on that and come back to re-request when we've g...
[08:14:46] joal: there is a suspicious user asking for kerberos credentials in https://phabricator.wikimedia.org/T237605#5738518
[08:15:10] :D
[08:15:48] moritzm: I am officially reporting suspicious activity --^
[08:15:50] Meh?
[08:49:01] joal: what is the status of hdfs-rsync? I am reading the backlog but it is not super clear :(
[08:49:07] do we have a repo etc..
[08:49:52] elukey: code is on github, Andrew reviewed most of it, no gerrit repo as of now AFAIK, and Andrew started debianization
[08:50:19] elukey: I'd like to have you play a bit with it if you don't mind, to be sure I'm not the only one who has tested it :)
[08:50:28] yes sure
[08:50:39] the other thing that we'd need to do is review the procedure for Monday
[08:50:41] elukey: tested on prod cluster and test cluster, kinit success :)
[08:50:46] yessir
[08:51:30] the timers are working fine and already in place, when you guys are ready it will only be a matter of changing the command parameter in puppet
[08:51:44] awesome
[08:51:50] but we'll do it on Monday after the whole mess :)
[08:52:32] elukey: actually I'd like to see if pseudo real rsync (dry-run) gives correct results from labstore
[08:52:40] elukey: could we sneak in and test?
[08:53:03] joal: not really, the ferm rules are not in place yet to contact hadoop
[08:53:18] but if it works from say stat1004 there is no difference
[08:53:30] It has
[08:54:16] elukey: he sounds dangerous!
[08:54:29] when does the fun start on Monday? have a rough ETA timewise?
[08:57:51] moritzm: I think around 9:30/10 CET more or less.. is it ok for you?
[08:58:04] joal: mind reviewing https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/556976/ when you have a moment?
[09:00:57] elukey: yeah, that's perfect
[09:01:03] super thanks :)
[09:01:07] joal: <3
[09:05:12] going to fill the gaps in https://etherpad.wikimedia.org/p/analytics-kerberos-deployment
[09:08:49] (brb)
[09:13:46] (PS3) Lex Nasser: Modify external webrequest search engine classification and add tests. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625)
[09:14:58] (Abandoned) Lex Nasser: Fix style and correct incorrect test case. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556730 (https://phabricator.wikimedia.org/T239625) (owner: Lex Nasser)
[09:51:10] Analytics, Performance-Team, Security-Team, WMF-Legal, Privacy: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (Gilles) Open→Resolved Fantastic, thank you very much!
[09:53:12] dcausse: I have had a quick look at wdqs events if you're interested :)
[09:53:51] joal: thanks! want to chat about it? :)
[09:54:59] dcausse: from what I see, roughly the same traffic from internal and external, with more variability for internal
[09:55:48] interesting, when you say same, is it in volume or do the queries look similar?
[09:56:01] dcausse: About query-time, internal has a P99 of ~100+ while external has ~1000+
[09:56:10] dcausse: volume only for now :)
[09:56:12] ok
[09:56:27] I think internal is getting super simple queries IIRC
[09:56:54] dcausse: probably, P95 is actually very low for internal
[09:57:21] for external, P95 is ~100+ and P90 ~100-
[09:57:37] so there really is a hanful of queries costing a lot
[09:57:47] handful sorry
[09:57:52] right
[09:58:07] were you able to group by status_code?
[09:58:27] dcausse: didn't do it, but will do it as we speak :)
[09:58:37] cool :)
[09:58:46] * joal has nerd sniped dcausse - moar ideas :)
[09:58:55] indeed!
[10:05:48] dcausse: for internal, almost all traffic is 200, a handful of 429
[10:06:17] ok
[10:06:50] for external however there is more variability - still mostly 200, but also some 403, 429, 400 and 500
[10:07:09] dcausse: --^
[10:07:37] joal: yes this makes sense
[10:08:02] dcausse: can we assume 500 are timeouts?
[10:08:26] I need to double check, but I think it can be a good proxy
[10:08:30] k
[10:08:54] basically I hope that query syntax errors are returned as 400
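(A sketch of the status-code breakdown joal runs above. The log never names the event table or its fields, so event.wdqs_external_sparql_query, http_status, query_time and the partition values below are all assumptions for illustration.)

    spark2-sql -e "
      SELECT http_status,
             COUNT(*)                             AS queries,
             percentile_approx(query_time, 0.95)  AS p95_query_time
      FROM event.wdqs_external_sparql_query
      WHERE year = 2019 AND month = 12 AND day = 13
      GROUP BY http_status
      ORDER BY queries DESC;"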
[10:09:04] dcausse: I assume next step is query analysis, right?
[10:09:31] yes basically there are a few questions to answer
[10:09:46] are there features that almost always time out
[10:10:19] are reified statements heavily used (truthy vs the full reified graph)
[10:10:35] Makes sense
[10:11:32] and how values and references are used (we can talk about this one later if you need more context)
[10:13:57] perhaps a simple aggregation of the URIs in the sparql that refer to our predicates might be interesting (I can help with that)
[10:15:05] that sentence is not clear to me dcausse :)
[10:16:38] joal: extract whenever a URI used in the sparql refers to a known predicate in our graph
[10:16:45] in ?item wdt:P31*/wdt:P279* wd:Q16917;
[10:16:47] wdt:P625 ?geo .
[10:17:05] that would be wdt:P31 wdt:P279 and wdt:P625 excluding wd:Q16917
[10:17:36] counting predicates
[10:18:15] yes to know which ones are important and which ones are never used
[10:18:31] ack :)
[10:19:11] that could help to drive some decisions if we are forced to prune the graph at some point
[10:19:41] which I hope we won't
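(A rough sketch of the predicate aggregation dcausse describes above: pull predicate URIs out of the SPARQL strings and count them. The input file is hypothetical, and a real pass would also cover the p:, ps:, pq: etc. prefixes, not just wdt:.)

    # wdqs_queries.txt: one SPARQL query string per line (assumed export)
    grep -oE 'wdt:P[0-9]+' wdqs_queries.txt | sort | uniq -c | sort -rn | head
    # high counts (e.g. wdt:P31, wdt:P279) mark heavily used predicates;
    # predicates that never appear are candidates if the graph must be pruned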
[10:24:04] Analytics, ArticlePlaceholder, Wikidata, Wikidata-Campsite, wikidata-tech-focus: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Ladsgroup) Can I just use `normalized_host` field instead?
[10:46:10] (PS1) Ladsgroup: Fix WikidataArticlePlaceholderMetrics query [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556988 (https://phabricator.wikimedia.org/T236895)
[10:46:31] Analytics, ArticlePlaceholder, Wikidata, wikidata-tech-focus, and 3 others: ArticlePlaceholder dashboard stopped tracking page views - https://phabricator.wikimedia.org/T236895 (Ladsgroup) a: Ladsgroup
[11:11:50] elukey: hola - do we take a minute to talk about plans?
[11:12:25] joal: sure, I'd need to step out in ~15mins, we can do a bit now and later or directly later
[11:12:38] elukey: let's start :)
[11:13:00] in da cave!
[11:27:59] * elukey lunch!
[12:59:09] (CR) Addshore: [V: +1 C: +1] "Looks good to me, and i verified the query in HUE." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556988 (https://phabricator.wikimedia.org/T236895) (owner: Ladsgroup)
[13:36:56] elukey: there's an alert for analytics1057: CRITICAL: 13 LD(s) must have write cache policy WriteThrough, currently using: WriteBack, WriteBack, WriteBack, WriteBack, WriteBack, WriteBack, WriteBack, WriteBack, WriteBack, WriteBack, WriteBack, WriteBack, WriteBack
[13:36:58] Analytics, Product-Analytics: Develop a consistent rule for which special pages count as pageviews - https://phabricator.wikimedia.org/T240676 (Neil_P._Quinn_WMF)
[13:41:05] Analytics, Product-Analytics: Develop a consistent rule for which special pages count as pageviews - https://phabricator.wikimedia.org/T240676 (Neil_P._Quinn_WMF) I think @Nuria is happy for Product Analytics to propose a rule here (T239672#5733146). It sounds like her preferred implementation would a wh...
[13:51:21] o/ joal :)
[13:51:27] Hi ottomata :)
[13:51:59] Analytics, Product-Analytics: Many special pages missing from pageview_hourly dataset starting on July 23, 2019 - https://phabricator.wikimedia.org/T239672 (Neil_P._Quinn_WMF) Open→Resolved >>! In T239672#5733146, @Nuria wrote: > Erik's initial definition counted all special pages and did not do...
[13:51:59] was thinking last night about deployment, the debian thing is feeling really hacky since i'm not building from a release tarball and i'm not building from source
[13:52:02] it just installs a binary file
[13:52:06] we might as well scap + git fat it.
[13:52:13] maybe even using the same repo... not sure, will try
[13:52:24] same repo?
[13:52:26] refinery?
[13:52:38] Ah no, the main one, I get it
[13:53:59] ok - Also, I have changed the main algorithm quite a bit since yesterday, after some interesting discoveries
[13:54:12] oh ya?
[13:54:13] ottomata: --^
[13:54:20] i see the multisource one
[13:55:04] yes - needed to correctly handle globs as the shell explodes them
[13:55:06] ty for the basePath change too, much more readable to me!
[13:55:11] :)
[13:55:26] I tried to make an effort for readability
[13:57:20] cool does the dir creation one actually create multiple dest levels if it can?
[13:57:22] like mkdir -p?
[13:57:30] yes
[13:57:32] cool
[13:57:45] why not eh!? fs.mkdirs is plural! :)
[13:57:56] not too complicated ;)
[13:58:14] wow multi src
[13:58:17] does rsync support that?
[13:58:22] how does that work with --delete?
[13:58:22] very much!
[13:58:38] i guess files in any source won't be deleted
[13:58:46] sounds complicated tho!
[13:58:57] ottomata: --delete is for dest :)
[13:59:11] yes, but delete deletes files not in src
[13:59:20] ?
[13:59:29] I don't get it :(
[13:59:36] what if file A is in src2 but not in src1
[13:59:43] or actually other way around
[13:59:47] file A is in src1 but not in src2
[13:59:51] src1 copies to dst fine
[14:00:08] src2 copies, but no file A is present, will file A that was copied to dst from src1 be deleted?
[14:00:46] hehe :) I get your point - Actually no it won't - Multisource is different from 2 consecutive rsyncs
[14:01:13] it merges recursive sources before copying?
[14:01:24] In multisource, we build data to be copied at the same time in all sources (if they share folder path to copy)
[14:01:27] yessir
[14:01:38] does rsync do that????
[14:01:48] if it does that, we don't need this complicated hardsync stuff I do for published datasets
[14:01:49] !
[14:01:52] it's not a merge before, it's a merge while doing it :)
[14:02:03] when is delete applied tho?
[14:02:16] ah sorry i get it
[14:02:18] after merge
[14:02:18] ok
[14:02:50] ottomata: example: rsync -r -t --delete --exclude readme.html hdfs:///wmf/data/archive/{pageview,projectview}/legacy/hourly file:/blah
[14:03:32] I discovered yesterday that the glob for {pageview,projectview} is not treated in rsync, but in shell: rsync receives 2 sources!
[14:04:29] And, the thing works thanks to merge, since the folder hierarchy is the same in pageview and projectview folders
[14:04:31] huh wow!
[14:06:11] hm, joal previously the recursion into dirs was done at copy time, right? now you are building the full source hierarchy ahead of time?
[14:06:27] Nope, while copying
[14:07:22] I build one level of recursion at a time, and process it at random (meaning I can actually do a depth-first pass)
[14:07:34] OH sorry thought getSrcFilesAndFolders was recursing
[14:07:34] got it
[14:08:06] the trick is the groupby: keep sources with same name at a given level together to be merged
[14:08:35] aye cool
[14:08:47] reminds me of something... (hive struct refine cough cough)
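(A quick demonstration of the discovery at 14:03:32 above: the braces are bash brace expansion, so the shell, not rsync, turns one argument into two source paths.)

    printf '%s\n' hdfs:///wmf/data/archive/{pageview,projectview}/legacy/hourly
    # hdfs:///wmf/data/archive/pageview/legacy/hourly
    # hdfs:///wmf/data/archive/projectview/legacy/hourly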
[14:10:49] Mwahaha :)
[14:18:23] I am back, sorry for the long absence but there is a snow storm here now and of course I was in the car when the worst happened (had to leave it not far from home but I couldn't drive it up to my home)
[14:18:35] Wow
[14:20:03] moritzm: thanks for the ping, that is known, the BBU is faulty and sometimes it comes back to life and alerts.. need to find a way to set writethrough permanently via megacli or just disable the alarm
[14:23:16] ack
[14:31:07] SNOW storm LUCKY DUCK
[14:36:42] (PS1) Joal: Fix python oozie lib before kerberos [analytics/refinery] - https://gerrit.wikimedia.org/r/557029
[14:36:53] elukey: --^
[14:38:54] joal: can you add a comment about the rationale behind this change? I blindly trust you but I don't completely get what the change is for :D
[14:39:26] Yes !
[14:41:14] (PS2) Joal: Fix python oozie lib before kerberos [analytics/refinery] - https://gerrit.wikimedia.org/r/557029
[14:42:31] joal just submitted a bunch of comments
[14:42:59] reading ottomata :)
[14:46:04] joal, also maybe we can do the right thing with this shaded jar as described in https://phabricator.wikimedia.org/T217967
[14:46:22] maybe we can make maven build both versions? totally unshaded (no scala, etc.), and our shaded with scala?
[14:47:16] ottomata: maven does that by default I think, we just keep the shaded one only :)
[14:47:28] hm
[14:47:43] It's a naming thing more than anything else I think
[14:47:46] oh original-hdfs-tools-0.0.1-SNAPSHOT.jar
[14:47:59] ok yeah, let's name the shaded one with -shaded
[14:48:02] and the unshaded one as normal
[14:48:13] i think we should eventually switch to doing that in refinery too
[14:48:57] works for me ottomata - would we name them -jumbo- ?
[14:49:04] haha
[14:49:08] no just -shaded, no?
[14:49:10] That's another classic way of talking about fat-jars :)
[14:49:15] oh really?
[14:49:17] -shaded is fine :)
[14:49:20] jumbo jars
[14:49:26] i've also seen 'uber'
[14:49:27] is that the same?
[14:49:30] uber jar?
[14:49:37] I think it is yes
[14:50:11] But the proper name is 'shaded' ...
[14:50:44] https://maven.apache.org/plugins/maven-shade-plugin/examples/attached-artifact.html
[14:50:44] aye
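(A sketch of the naming convention agreed above, assuming the maven-shade-plugin "attached artifact" setup from the linked page, i.e. shadedArtifactAttached=true with shadedClassifierName=shaded; the actual pom change is not shown in the log.)

    mvn clean package
    ls target/
    # hdfs-tools-0.0.1-SNAPSHOT.jar          <- plain, unshaded artifact
    # hdfs-tools-0.0.1-SNAPSHOT-shaded.jar   <- fat jar bundling the scala deps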
[15:03:10] ottomata: leaving for kids now - will be back at standup and we'll finalize code and deploy after?
[15:03:40] k sounds good, i'll work on scap stuff now
[15:03:47] Thanks :)
[15:22:49] ottomata: o/ - if you have time, I'd love to go through https://etherpad.wikimedia.org/p/analytics-kerberos-deployment with you to get your suggestions/ideas/etc..
[15:52:16] (CR) Elukey: [C: +1] "I don't have a lot of context about this change but it looks simple and reasonable, Joal please merge :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/557029 (owner: Joal)
[15:54:07] joal: if you want we can finish the procedure check
[15:55:01] elukey: o/ sorry missed ping!
[15:55:21] gimme 5 mins then batcave?
[15:56:51] sure!
[16:02:40] am in bc and am reading
[16:05:29] elukey: ^
[16:09:36] ottomata: coming sorry, was making a tea
[16:09:49] np!
[16:22:11] joal: I have a proposal for Monday, related to hdfs-rsync. I know that everything is basically ready, but to avoid the stress of live testing and debugging, we could do the following:
[16:22:25] 1) add /mnt/hdfs to labstore100[6,7]
[16:23:26] 2) as part of the kerberos migration, we modify the rsyncs to fetch from the local /mnt/hdfs (rather than stat1007). It should work since the rsync will be kerberized, in theory (need to check)
[16:24:00] 3) then, when everything else is working and not exploding, we move one of the rsyncs on labstore to hdfs-rsync and see how it goes
[16:24:04] debug if needed, fix etc..
[16:24:11] and finally we extend it to all
[16:26:37] yes I confirm that rsync works if I run kinit before
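(A sketch of the check behind step 2) above: with a valid Kerberos ticket the /mnt/hdfs fuse mount is readable, so a plain local rsync works. The source and destination paths and the dry-run flags are illustrative assumptions, not the actual labstore job.)

    kinit    # authenticate first, as elukey confirms above
    rsync -rt --dry-run /mnt/hdfs/wmf/data/archive/pageview/ /srv/dumps/pageview/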
[16:33:31] afk for a bit, back for standup!
[17:01:25] ping fdans
[17:01:32] hOLAAAAA fdans
[17:22:42] elukey, ottomata - Do we go to batcave 2 for rsync?
[17:22:57] ya
[17:23:03] gimme link
[17:29:17] brb for a bit!
[18:04:38] Gone for dinner, back in a bit
[18:54:19] * elukey off!
[19:33:18] joal: trying to run HdfsRsyncCLI on stat1004
[19:33:20] how do you do it?
[19:33:26] java -classpath ./target/hdfs-tools-0.0.1-SNAPSHOT.jar org.org.wikimedia.analytics.hdfstools.HdfsRsyncCLI
[19:33:27] ?
[19:33:31] ottomata: give me a minute :)
[19:33:46] ottomata: java -cp /home/joal/hdfs-tools/target/hdfs-tools-0.0.1-SNAPSHOT.jar:/usr/lib/spark2/jars/*:$(/usr/bin/hadoop classpath) org.wikimedia.analytics.hdfstools.HdfsRsyncCLI
[19:33:53] OH i had a bad classname
[19:33:59] ah danke
[19:34:21] ottomata: you should remove the spark bit of it, not needed since we have scala
[19:34:22] got it
[19:34:26] ya
[19:34:30] java -classpath $(hadoop classpath):./target/hdfs-tools-0.0.1-SNAPSHOT.jar org.wikimedia.analytics.hdfstools.HdfsRsyncCLI
[19:34:39] \o/
[19:35:04] ottomata: I'm reprocessing all the comments and code, our talk helped a lot in clarifying :)
[19:35:12] great
[19:51:35] (PS1) Ottomata: Add hdfs-rsync wrapper script [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557099 (https://phabricator.wikimedia.org/T238326)
[19:54:27] joal: this is such a great tool
[19:57:53] \o/
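(A guess at the shape of the wrapper script added in change 557099 above; the real content lives in gerrit, so the deploy path and jar name here are assumed for illustration.)

    #!/bin/bash
    # hdfs-rsync: run HdfsRsyncCLI with the Hadoop classpath, mirroring the
    # java -classpath invocation shown in the chat above
    jar=/srv/deployment/analytics/hdfs-tools/deploy/artifacts/hdfs-tools-0.0.1-SNAPSHOT.jar
    exec java -cp "$(hadoop classpath):${jar}" \
        org.wikimedia.analytics.hdfstools.HdfsRsyncCLI "$@"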
[19:58:43] joal: i'm ready to go whenever you are :)
[20:06:47] ottomata: I think I'm good :)
[20:06:53] ottomata: wanna proof read once?
[20:22:17] ah missed ping joal sorry!
[20:22:47] joal push your recent changes to refacto_multisource branch?
[20:22:52] or you want to BC?
[20:23:15] ottomata: since no answer, I continued to look :)
[20:23:19] pushing in a minute
[20:23:21] k
[20:26:09] joal: also in case you missed this comment i just added: https://github.com/jobar/hdfs-tools/pull/2/files#r357806568
[20:26:48] I had missed it indeed :)
[20:31:02] ottomata: PR updated !
[20:31:32] looking!
[20:31:57] joal: can you make it just 'hdfs-rsync' not 'hdfs-rsync tool'
[20:31:57] ?
[20:32:03] Ah !
[20:32:08] that value gets used as usage header
[20:32:12] Did it above but not there - doing
[20:33:00] ahh heheh, joal this is totally fine but i see i wasn't clear about what I meant with plurals!
[20:33:03] i mean either:
[20:33:12] srcPaths or srcPathList
[20:33:28] Meeh :(
[20:33:30] :)
[20:34:28] ottomata: convention now is: srcsList = Map[filename, Seq(..)], srcs = Seq(...)
[20:34:29] It is a list, and a list is a singular thing with multiple elements, so srcPathList is fine, but also srcPaths implies that the type has multiple entries... some kind of iterable
[20:34:34] oh
[20:34:36] uhh
[20:34:37] but
[20:34:39] Map is not a List?
[20:34:48] it is an iterable though
[20:34:53] hm
[20:34:55] yesssss
[20:35:04] srcsMap?
[20:35:07] that's fine
[20:35:21] I like list because it goes with listing files
[20:35:52] hm
[20:36:15] haha, whatever i'm not that picky on this one, as long as it is consistent! :)
[20:39:54] ottomata: what a nice wrapper :)
[20:40:21] (CR) Joal: [C: +2] "Nice !" [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557099 (https://phabricator.wikimedia.org/T238326) (owner: Ottomata)
[20:40:23] i think you missed one BasePath :) line 392 in Exec
[20:40:43] looking
[20:41:28] I did indeed! you good-eye hoot!
[20:41:42] Waiting for more changes before pushing
[20:41:46] ottomata: --^
[20:42:51] ok joal one more naming thing ok and then we can go!
[20:43:01] sure
[20:43:02] i'm not sure about the name mergeOrProcessCoherentSrcsList
[20:43:20] remind me again why it is merge OR process?
[20:43:21] I was not sure either :)
[20:43:45] this function splits the ways for dirs and files
[20:44:07] create+merge for dirs, process (copy?) for files
[20:44:11] hm, but you either call processDir or processFile
[20:44:16] so it isn't merge OR process, right?
[20:44:37] merge is actually the result of processing dirs...
[20:44:48] right, but you always process
[20:45:02] so maybe this is processSrcsList
[20:45:02] ?
[20:45:09] (merging is implied when processing dirs ?)
[20:45:17] \o/
[20:45:41] sounds great - I wonder about the apply now, which processes as well, but eh
[20:45:51] well that's ok
[20:45:53] apply is entrypoint i think
[20:46:03] it just preps some stuff and then processes?
[20:46:09] yeah, process isn't the best name but its ok here
[20:46:10] ok - processSrcsList it'll be
[20:46:11] right?
[20:46:29] hmmmm
[20:46:43] I was thinking in the area of: routeSrcsFilesAndDirs, but maybe too specific
[20:47:11] route!
[20:47:13] but you actually do copies!
[20:47:14] heheh
[20:47:16] hm, joal
[20:47:17] q
[20:47:30] ottomata: shall I update the pom to generate a named shaded and non-named original?
[20:47:36] deleteExtraneousDsts(srcsList, dstList)
[20:47:36] happens in applyRecursive, before filtering
[20:47:39] is that right?
[20:47:43] joal: yes, i think that is good
[20:47:50] ok ottomata (pom)
[20:48:05] yes, delete happens before filtering and processing
[20:48:24] delete (clean) dst is the first thing done
[20:48:48] oh, but deleteExtraneousDsts uses the filters?
[20:48:57] i see ok from config
[20:49:25] same filters for src and dst
[20:49:45] k
[20:49:56] ok, i think i have no more comments other than that function name
[20:50:00] processSrcsList
[20:50:13] actually: processSrcs
[20:50:18] it's not a map!
[20:50:21] :)
[20:50:53] good ottomata ? --^
[20:51:31] processTargetCoherentSrcs ?
[20:51:44] ok!
[20:51:44] processSameTargetSrcs
[20:51:45] sure!
[20:51:51] processSrcs i like it
[20:51:53] processSrcs
[20:51:56] got it !
[20:51:56] because Srcs might be files or dirs
[20:52:00] yes
[20:52:00] and it decides which to do
[20:53:18] ok here we are
[20:53:25] I guess we should merge that PR :)
[20:57:40] ya!
[20:57:47] done
[20:57:49] joal we should also move to wikimedia github
[20:57:52] i wonder if you can do that
[20:57:57] github allows you to rename repos.
[20:58:00] and change orgs
[20:58:01] I don't think so
[20:58:04] i can do it as admin if it is my repo
[20:58:19] we could also just push it to new wmf one
[20:58:21] or fork it
[20:58:29] but it is nice to start at wikimedia and fork the other way
[20:58:34] yup
[20:58:43] joal maybe you can give me owner perms on this repo?
[20:58:46] not sure if that is a thing
[20:59:10] go to settings tab
[20:59:12] for repo
[20:59:32] there might be a Transfer ownership option at the bottom
[20:59:36] in the 'Danger zone'
[20:59:37] :)
[20:59:38] I'm trying that
[21:00:51] ottomata: teams access: analytics + operations
[21:00:57] hm
[21:01:00] ottomata: wikidata?
[21:01:01] all?
[21:01:07] oh
[21:01:13] i unno
[21:01:20] plenty teams!
[21:01:20] analytics + operations sounds fine
[21:01:25] ok
[21:01:41] you aren't in analytics team
[21:01:42] adding you!
[21:01:52] Thanks :)
[21:01:52] oh
[21:01:53] you are
[21:02:13] ah you are yes
[21:02:17] Adding research and scoring platform
[21:02:30] Done !
[21:02:56] Nice!
[21:03:04] greaaat stuff
[21:03:28] ok, so, shall we make a 0.0.1 tag, and change version there, bump master to 0.0.2-SNAPSHOT
[21:03:28] ?
[21:03:33] then we can mvn release from 0.0.2 tag?
[21:03:39] sorry
[21:03:41] 0.0.1 tag?
[21:03:42] works for me :)
[21:03:46] ok i will do
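(One way to do the tag-and-bump ottomata proposes above; a sketch, not the exact commands used. The versions-maven-plugin and the v-prefixed tag name are assumptions.)

    mvn versions:set -DnewVersion=0.0.1
    git commit -am "Release 0.0.1" && git tag v0.0.1
    mvn versions:set -DnewVersion=0.0.2-SNAPSHOT
    git commit -am "Bump version to 0.0.2-SNAPSHOT"
    git push origin master v0.0.1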
[21:06:26] ottomata: navigating github I realise analytics-refinery is not in the team's repos :)
[21:06:32] source is but not refinery
[21:06:40] oh hm!
[21:07:46] fixed!
[21:07:54] Thanks !
[21:08:55] joal weirdly that test still fails for me, but only on macOS, it succeeds on stat1004
[21:08:57] - should correctly parse URIs *** FAILED ***
[21:08:57] org.scalatest.exceptions.TestFailedException was thrown. (TestHdfsRsyncCLI.scala:19)
[21:09:04] ¯\_(ツ)_/¯
[21:09:09] MEH !
[21:09:16] Error: Argument parsing error:
[21:09:17] Error validating src list:
[21:09:17] file:/root does not exist
[21:09:17] Try --help for more information.
[21:09:26] hehehe
[21:09:31] ok makes sense
[21:09:50] ahh
[21:09:52] I used linux generic folder names for tests - I should have double checked
[21:09:54] you are testing for /root dir i see
[21:10:02] ok cool np, you can fix but not a blocker :)
[21:10:06] maybe /tmp is better?
[21:10:11] very much
[21:10:40] ottomata: I use /home, /tmp and /
[21:10:46] safe?
[21:11:00] /home nope!
[21:11:09] hm
[21:11:12] i mean best thing to do would be to create tmp dirs as some fixture
[21:11:15] buuut that's annoying
[21:11:24] wow mvn deploy just worked!
[21:11:24] ok will use that
[21:11:30] \o/ !
[21:11:35] https://archiva.wikimedia.org/#quicksearch~hdfs-tools
[21:11:40] Same settings as refinery :)
[21:11:42] ok time for deploy repo
[21:12:06] (CR) Ottomata: [V: +2] Add hdfs-rsync wrapper script [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557099 (https://phabricator.wikimedia.org/T238326) (owner: Ottomata)
[21:13:49] ottomata: Shall I resolve the original PR for comments?
[21:15:26] yes
[21:15:29] (PS1) Ottomata: Add hdfs-tools 0.0.1 .jar with git-fat [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557117 (https://phabricator.wikimedia.org/T238326)
[21:15:55] (CR) Ottomata: [V: +2 C: +2] Add hdfs-tools 0.0.1 .jar with git-fat [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557117 (https://phabricator.wikimedia.org/T238326) (owner: Ottomata)
[21:23:03] ottomata: I'm not useful anymore - I'm gonna go to sleep :)
[21:23:10] ottomata: except if you need me :)
[21:23:16] OK!
[21:23:19] joal: you da best
[21:23:20] this is awesome
[21:23:21] thank you!
[21:23:28] i'm going to hopefully get the scap stuff to work now
[21:23:34] ottomata: Thank YOU for the set up :)
[21:23:38] and then we can do stuff next week to puppetize jobs
[21:23:42] \o/
[21:23:57] ottomata: I have ideas to make this tool even funnier :)
[21:24:32] oh boy
[21:24:42] and i think we should move the cleaner stuff into this repo too
[21:24:45] that is a useful thing
[21:24:49] very much yes
[21:26:05] Ok gone for tonight - Thanks again mate :)
[21:29:22] (PS1) Ottomata: Fix hostnames for labstore scap targets [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557120 (https://phabricator.wikimedia.org/T234229)
[21:29:43] (CR) Ottomata: [V: +2 C: +2] Fix hostnames for labstore scap targets [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557120 (https://phabricator.wikimedia.org/T234229) (owner: Ottomata)
[21:48:39] (PS1) Ottomata: Fix git_repo in scap.cfg [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557128 (https://phabricator.wikimedia.org/T234229)
[21:48:52] (CR) Ottomata: [V: +2 C: +2] Fix git_repo in scap.cfg [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557128 (https://phabricator.wikimedia.org/T234229) (owner: Ottomata)
[21:57:16] (PS1) Ottomata: Update scap targets to hosts with profile::analytics::cluster::client [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557132 (https://phabricator.wikimedia.org/T234229)
[21:59:07] (CR) Ottomata: [V: +2 C: +2] Update scap targets to hosts with profile::analytics::cluster::client [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/557132 (https://phabricator.wikimedia.org/T234229)
[22:30:27] (CR) Nuria: [C: -1] Fix WikidataArticlePlaceholderMetrics query (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/556988 (https://phabricator.wikimedia.org/T236895) (owner: Ladsgroup)
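(The deploy flow that the 557117-557132 patches above set up, sketched from standard scap3 and git-fat usage; the working directory and deploy message are assumptions, not taken from the log.)

    cd /srv/deployment/analytics/hdfs-tools/deploy
    git fat pull          # materialize the jar tracked via git-fat
    scap deploy "Deploy hdfs-tools 0.0.1 to analytics cluster clients"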