[00:58:56] 10Analytics, 10Discovery: Ingest wdqs metrics into druid - https://phabricator.wikimedia.org/T240498 (10Nuria) [00:59:34] 10Analytics, 10Discovery: Ingest wdqs metrics into druid - https://phabricator.wikimedia.org/T240498 (10Nuria) [02:57:44] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10SDC General, 10Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (10Milimetric) Ok, seems like some of this confusion is getting cleared up. For my part, here's what I'm planning to do next... [03:12:05] (03CR) 10Milimetric: "This looks really good. I haven't read the requirements, so I'd need to go over those more closely, but I couldn't find anything wrong wi" (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625) (owner: 10Lex Nasser) [04:20:27] PROBLEM - Check the last execution of monitor_refine_mediawiki_job_events on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_mediawiki_job_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [07:57:07] Hi team [07:58:46] mber2019 [08:12:11] o/ [08:12:48] elukey: if you wanna have a look (not yet finished, but looking reasonably ok IMO): https://github.com/jobar/hdfs-tools [08:20:10] joal: my experience with Scala is zero so I cannot really make any judgement, but I am sure it is super good :) [08:20:46] elukey: I thank you for the blind trust, but probably would not do it myself for myself :D [08:21:08] joal: we can surely test it today/tomorrow :) [08:21:44] elukey: when you wish :) main missing feature is exclude, the rest should work as expected [08:22:15] elukey: The shaded jar is 5M - I guess this is ok ;) [08:22:59] joal: looks super good [08:23:05] how will it be deployed? [08:23:12] via refinery or another repo? [09:25:39] excuse me elukey I missed your last ping [09:25:48] I think it'll [09:25:59] be deployed via a new repo [09:33:12] ok I didn't get this part [09:33:29] I am asking since we'd need to come up with the puppet code before monday (possibly) [09:33:41] completely agreed elukey [09:37:51] joal: so let's try to schedule a plan, since I'll need to follow up with Brooke and Ariel [09:38:34] yes elukey - I hope to have a full version (with exclude) either tonight or tomorrow morning [09:38:56] elukey: Andrew reviews my code (hopefully there are not too big of changes :S) [09:39:21] elukey: Andrew told me yesterday he'd set up a github for the project once reviewed etc [09:40:22] We then need to have a new repo to store the jar, the scripts (to facilitate running the jar) and the deploy config [09:40:33] elukey: -- Does that sound correct about actions to be taken? [09:41:06] joal: do you have a min for batcave?
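For anyone following the monitor_refine_mediawiki_job_events alert above: the usual way to inspect a unit like this is with standard systemd commands (nothing WMF-specific; the unit name is taken from the alert text, and reset-failed is the same command elukey runs later in the day once the cause is understood):

    # check why the unit is marked failed and read its recent output
    systemctl status monitor_refine_mediawiki_job_events
    journalctl -u monitor_refine_mediawiki_job_events -n 50
    # once the failure is understood/fixed, clear the failed state so the alert recovers
    systemctl reset-failed monitor_refine_mediawiki_job_events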
[09:41:11] sure [09:48:17] 10Analytics, 10Performance-Team, 10Security-Team, 10WMF-Legal, 10Privacy: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (10Gilles) a:05Gilles→03Slaporte [09:59:50] joal: other thing that I forgot to tell you is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/556633/ [10:00:10] just to give you the idea of what I am doing [10:00:26] the hadoop-config.sh script is installed by our dear packages [10:00:27] elukey: I have no idea what augeas is :S [10:00:46] nono leave it aside, it was an experiment, look only to the file in puppet [10:00:50] hadoop-config.sh [10:00:58] that thing contains, by cloudera, this [10:01:13] # Disable ipv6 as it can cause issues [10:01:14] HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true" [10:01:34] for some horrible reasons it ends up everywhere, in hive-server2/metastore, hdfs-datanode,etc.. [10:01:41] (in their run time parameters) [10:01:51] \o/ ! Welcome to a the new old-internet :) [10:02:17] so in the past to fix the problem in hadoop I appended -Djava.net.preferIPv4Stack=false via hadoop-env.sh [10:02:22] err hdfs-env.sh, etc.. [10:02:25] I think I recall that [10:02:35] now the problem is that with hive this is not possible [10:02:54] so I decided to remove the problem for source :D [10:03:10] in hadoop test hive is now correctly binding to ipv4 and ipv6 [10:03:24] wow - nice! [10:06:06] elukey: while being awesome, this frightens me a bit :) [10:06:33] joal: all the other daemons are running with the ipv6 settings, so I am reasonably sure it is ok [10:06:53] * joal trust elukey blindly :) [10:08:00] I will apply the change to all the workers and roll restart gently, to be sure [10:08:12] so we decouple this from monday's maintenance [10:16:28] ack elukey :) [10:19:00] 10Analytics, 10Product-Analytics, 10SDC General, 10Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (10daniel) >>! In T238878#5730630, @Milimetric wrote: >>>! In T238878#5708257, @daniel wrote: >> By the way, if you... [11:14:29] !log stop timers on an-coord1001 as prep step for hive/oozie restart [11:14:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:21:26] interesting: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Djava.net.preferIPv4Stack=true [11:21:30] this is a yarn container [11:21:54] not really a big deal but worth to follow up [11:22:27] tested the restart of a datanode btw, all good [11:29:29] ok so going to have lunch and then I'll come back to restart hive/oozie and the hadoop workers [11:39:44] heya teammmm :] [11:49:53] (03CR) 10Mforns: [V: 03+2] "@Milimetric, @Nuria: If you both suggested data_quality_stats, I think it's good! I also like data_quality_stats :], I was more against us" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [12:14:59] is "CRITICAL: Status of the systemd unit monitor_refine_mediawiki_job_events" a known issue for an-coord1001= [12:15:02] is "CRITICAL: Status of the systemd unit monitor_refine_mediawiki_job_events" a known issue for an-coord1001? [12:20:11] moritzm: yes it is, we are working on it, will be solved hopefully by EOD [12:21:36] ack, thx [12:28:15] fdans: you got the mediawiki_job refine thing? [12:28:23] oh! 
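To make the ipv6 explanation above concrete: the packaged hadoop-config.sh line elukey quotes at 10:01, and the old per-daemon workaround he mentions (appending the opposite flag in hadoop-env.sh / hdfs-env.sh, relying on the JVM taking the last -D value on the command line), look roughly like this. The workaround snippet is a sketch, not the exact puppet-managed content:

    # shipped in hadoop-config.sh by the cloudera packages:
    HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
    # old workaround appended via hadoop-env.sh / hdfs-env.sh (later -D wins):
    HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=false"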
he's in Austin, duh, sorry, nvm [12:39:39] !log restart hive and oozie on an-coord1001 to pick up ipv6 settings [12:39:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:40:18] !log enable timers on an-coord1001 after maintenance [12:40:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:41:39] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: an-coord1001 hive metastore not listening on ipv6 - https://phabricator.wikimedia.org/T240255 (10elukey) Looks better now! ` elukey@stat1004:~$ telnet an-coord1001.eqiad.wmnet 9083 Trying 2620:0:861:105:10:64:21:104... Connected to an-coord1001.eqiad.wmn... [12:41:41] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: an-coord1001 hive metastore not listening on ipv6 - https://phabricator.wikimedia.org/T240255 (10elukey) Looks better now! ` elukey@stat1004:~$ telnet an-coord1001.eqiad.wmnet 9083 Trying 2620:0:861:105:10:64:21:104... Connected to an-coord1001.eqiad.wmn... [12:54:25] joal: on an-coord1001 all is binding on ipv4/6, nothing exploding so far [12:54:37] I am going to roll restart the hadoop workers team [12:54:56] (since I have removed the prefer ipv4 false option, not needed anymore) [12:55:10] k elukey [12:59:25] !log roll restart hadoop workers to pick up the new settings (removed prefer ipv4 false after T240255) [12:59:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:59:28] T240255: an-coord1001 hive metastore not listening on ipv6 - https://phabricator.wikimedia.org/T240255 [14:46:21] joal: decided to propose this first step https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/556681/ [14:46:41] basically replacing crons with timers, and make sure all works fine [14:47:05] then I'll prep a patch to merge on monday, so we'll change only what it runs in the define [14:48:38] (03PS14) 10Mforns: Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486) [14:52:48] wow - I have -1 a patch on puppet ... first tine [14:54:22] joal: I copied the current version of the cron, not added something new.. 
I'd prefer not to change it a lot from the cron if possible [14:54:45] works for me in that case :p [14:56:22] joal: I didn't get "Having a trailing slash in source means only the source content will be copied, not the last-dir of the source path" [14:58:17] elukey: this is one of rsync's tricks: if you do rsync /my/src/folder /my/dst, you'll end up with 'folder' inside /my/dst - If you do rsync /my/src/folder/ /my/dst you'll have only folder's content in /my/dst, not the parent folder [14:59:02] joal: sure but this seems how the rsyncs are set up now [14:59:53] elukey: my complete bad then - I have not looked at the rsync-cron, assuming it would take src as is - please disregard :S [15:00:11] like [15:00:13] 31 * * * * bash -c '/usr/bin/rsync -rt --delete --exclude readme.html --chmod=go-w stat1007.eqiad.wmnet::hdfs-archive/unique_devices/ /srv/dumps/xmldatadumps/public/other/unique_devices/' [15:00:25] I see [15:00:26] joal: no no I am double checking with you, that's it [15:00:29] nothing more :D [15:00:33] sounds great :) [15:01:11] let's keep trailing slashes then - But I'd rather put them in $src variable than hardcoded in command :) [15:01:14] Maybe for later [15:01:18] elukey: --^ [15:01:51] joal: yes yes we'll refactor it as second step, now I'd prefer to avoid diverging from the cron if you agree [15:02:05] ack elukey :) [15:02:34] Gone for kids :) [15:03:20] 10Analytics, 10Operations, 10SRE-Access-Requests: Add accraze to analytics-privatedata-users - https://phabricator.wikimedia.org/T240243 (10jcrespo) 05Open→03Resolved @ACraze seems to be unavailable. Resolving, but please reopen if you found issues later. [15:03:23] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to stats machines/ores hosts hosts for Andy Craze - https://phabricator.wikimedia.org/T226204 (10jcrespo) [15:03:32] 10Analytics, 10Operations, 10SRE-Access-Requests: Add accraze to analytics-privatedata-users - https://phabricator.wikimedia.org/T240243 (10jcrespo) a:05ACraze→03jcrespo [15:21:03] RECOVERY - Check the last execution of monitor_refine_mediawiki_job_events on an-coord1001 is OK: OK: Status of the systemd unit monitor_refine_mediawiki_job_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [15:23:11] !log execute systemctl reset-failed monitor_refine_mediawiki_job_events after Andrew's comment on alerts@ [15:23:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:40:24] (03PS15) 10Mforns: Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486) [15:44:12] hi team, looking at the data loss alarms if no one has [15:44:24] fdans: how dare you [15:45:15] elukey: luca this all hands we'll finally have a duel on the corridors of the hotel [15:45:42] these constant attacks shall not remain unpunished [15:46:05] fdans: I accept with pleasure [15:46:25] oh! I'll be the ref :D [15:47:09] mforns: did you see the update for superset presto and kerberos from github? [15:47:12] :( [15:47:22] elukey, I saw it yesterday, is there sth new? [15:47:46] mforns: nono, but it is kinda troublesome, I didn't find a package to replace pyhive.. [15:47:59] maybe we could try to contact dropbox somehow [15:48:25] hm [15:49:50] the immediate alternative for us is Druid, but it's not as appealing from the dashboarding point of view...
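As a side note on the trailing-slash behaviour joal describes at 14:58, a minimal illustration with placeholder paths:

    # without a trailing slash the directory itself is copied into the destination
    rsync -a /my/src/folder  /my/dst/    # result: /my/dst/folder/...
    # with a trailing slash only the directory's contents are copied
    rsync -a /my/src/folder/ /my/dst/    # result: /my/dst/...

which matches the trailing slashes kept on both source and destination in the production cron quoted at 15:00.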
[15:50:32] oh, actually, if we ingest data quality stats into druid, they could be shown in superset anyway... [15:52:10] mforns, elukey : Hola! i think for the presto case it is worth contacting the superset owners to see if they have contacts/plan/ideas [15:52:20] elukey, already did [15:52:21] nuria: Hola! [15:52:28] hehe [15:52:31] how long should headers be truncated to? 400? [15:52:35] HOLA EVERYBODY [15:52:37] that is what we have as MAX_UA_LENGTH [15:52:48] also, if we want to consider XFF parsing in eventgate to set client_ip [15:52:55] should I just grab the left most one? [15:52:56] always? [15:52:58] ottomata: sounds fine, anything beyond 200 chars is likely automated traffic [15:53:02] or sorry, rightmost one? [15:53:36] ottomata: let me remember what some proxies like googleweblight do [15:54:08] (03PS8) 10Mforns: Add data quality metric: traffic variations per country [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) [15:54:27] ah left most [15:55:20] and, if that is not set nuria, should I use the connected socket's remote ip addr? [15:56:12] joal: want to see something cool? ssh -L 8080:an-airflow1001.eqiad.wmnet:8778 an-airflow1001.eqiad.wmnet [15:56:15] :D [15:56:46] all team probably would like it as well --^ [15:56:49] COOOL [15:57:11] very cool [16:00:27] I am trying to figure out what user is running those tasks now [16:01:22] ottomata: so no x-forwarded-for to be found on googleweblight, but looks like list starts from left [16:02:03] ya [16:02:04] ottomata: header starts from left that is [16:02:20] i think varnish uses the rightmost non WMF IP address [16:02:33] but, we don't have a good way of getting WMF IPs in eventgate code [16:02:54] that makes sense, as we really want the IP that sent the request to us [16:06:34] elukey: hmm, the script that checks if the data loss is a false positive is getting stuck in [16:06:36] 19/12/12 15:59:50 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf. [16:07:02] it's been 7 min like that, not sure if that's normal [16:07:21] never seen this before [16:10:00] elukey: nvm, it's gone through, but I don't remember it taking so long last time, maybe I'm wrong [16:10:17] confirmed false positive [16:11:13] (03PS9) 10Mforns: Add data quality metric: traffic variations per country [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) [16:13:27] ottomata: ya, the only list of IPs that is any trustworthy is the one maintained by Bryan with addresses on labs [16:13:49] (03CR) 10Milimetric: [C: 04-1] "oh wait, no I tested the wrong branch by accident, you do have one test failure:" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625) (owner: 10Lex Nasser) [16:16:58] 10Analytics, 10Analytics-Kanban: Estimate percentage wise the number of requests on mediarequest dataset that are previews - https://phabricator.wikimedia.org/T240362 (10fdans) Added note about percentage in the "Limitations" section of the API docs and added date range to study of signal and noise docs. [16:20:45] (03CR) 10Nuria: [C: 03+1] "I see, this is just a correction of typo of filename, correct?
If so +1 on my end" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) (owner: 10Mforns) [16:24:07] ebernhardson: o/ - sorry to keep asking the same question, but I am wondering what user is launching jobs to hadoop from airflow (spark jobs I assume) [16:27:27] I am checking now the airflow dashboard of an-airflow1001 and I can't find anything.. maybe the current DAGs are not doing any hadoop related thing? [16:32:30] elukey: the service is run by the 'airflow' user, the jobs are submitted to the cluster as 'analytics-search' [16:33:45] elukey: currently the dag is in the "off" state, so it won't run automagically. I do test runs with `airflow test ` [16:35:12] (03PS5) 10Mforns: Add Spark job to update data quality table with incoming data [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549115 (https://phabricator.wikimedia.org/T235486) [16:35:46] and we can make the daemon run as analytics-search if that makes everything significantly easier, it is simply for conceptual reasons that i wanted the service to run as an unrelated user (and in theory that would make it easier for separate airflow instances to be spun up for other teams, maybe) [16:36:21] ebernhardson: ah okok [16:36:48] ebernhardson: where is 'analytics-search' specified? [16:37:58] elukey: arguments to individual tasks (generally inherited from the DAG defaults) [16:38:20] elukey: basically the spark task takes two arguments, principal and keytab, and goes from there [16:38:56] ebernhardson: ok so the airflow user runs the spark task, that then has to read the keytab. [16:39:06] (03CR) 10Mforns: "@Nuria, yes. The last patch set (9) is just replacing data_quality_metrics by data_quality_stats, and I took the opportunity to change the" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) (owner: 10Mforns) [16:39:08] elukey: right, the spark task just runs the spark-submit CLI command [16:39:16] (basically the user running the scheduler is the one executing tasks) [16:39:17] (03PS6) 10Mforns: Add Spark job to update data quality table with incoming data [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549115 (https://phabricator.wikimedia.org/T235486) [16:39:21] elukey: yes [16:40:22] ebernhardson: all right thanks :) will file a change after meetings, I have clearer ideas [16:40:33] ok, thanks! Because my ideas are much less clear atm :) [16:44:58] (03PS2) 10Lex Nasser: Modified external webrequest search engine classification and added tests. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/556449 (https://phabricator.wikimedia.org/T239625) [16:45:02] (03PS1) 10Lex Nasser: Fix style and correct incorrect test case. [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/556730 (https://phabricator.wikimedia.org/T239625) [17:00:44] ping milimetric [17:00:54] ping fdans [17:01:06] omw [17:04:37] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: an-coord1001 hive metastore not listening on ipv6 - https://phabricator.wikimedia.org/T240255 (10elukey) p:05Triage→03Normal a:03elukey [17:04:58] haha elukey you beat me to that by a millisecond [17:09:56] (03PS1) 10Milimetric: Report structured data use for commons [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/556741 (https://phabricator.wikimedia.org/T239565) [18:25:45] ebernhardson: an-airflow1001 kerberized [18:31:11] elukey: awsome! 
thanks [18:32:03] ebernhardson: so the idea is that user 'airflow' is now able to read the airflow.keytab file, that contains credentials for the 'analytics-search' principal [18:33:12] we'll maybe tune it in the future, but let's see if this works first [18:33:21] what do you think? [18:33:32] elukey: ahh, ok that makes sense. I'll put up a patch and try it (can it auth yet? or do i need to wait for monday) [18:34:20] ebernhardson: it can auth now but probably best to test it on monday after kerberos is up, since spark may be confused if you ask it to authenticate on a non-secured cluster [18:34:46] alright, i'll just prep and deploy them, but wont expect it to work yet [18:35:11] ack [18:38:44] ottomata, milimetric: https://github.com/jobar/hdfs-tools [18:38:49] gone for diner, back after [18:43:01] (03CR) 10Nuria: Report structured data use for commons (033 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/556741 (https://phabricator.wikimedia.org/T239565) (owner: 10Milimetric) [18:45:32] (03CR) 10Nuria: "code looks good, before merging it we should throughly tested it on the cluster with 1 hour of data using a jar build with this code on th" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/556730 (https://phabricator.wikimedia.org/T239625) (owner: 10Lex Nasser) [18:46:42] !log rsync timers deployed on labstore100[6,7] [18:46:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:46:47] \o/ [18:49:44] (03CR) 10Nuria: [C: 03+1] "Looks good, +2 when we have merged" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) (owner: 10Mforns) [18:50:10] (03CR) 10Nuria: [C: 03+1] "Sorry , +2 when we have tested it" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) (owner: 10Mforns) [18:59:18] 10Analytics: Archive /home/ezachte data on stat1007 - https://phabricator.wikimedia.org/T238243 (10elukey) @Erik_Zachte Hi! Gentle ping to see if you have time to review the files during the next days :) [19:00:19] all right timers on labstore are working [19:00:21] gooooood [19:00:26] going to dinner :) [19:36:47] hurray for timers on labstore :) [19:45:11] joal: o/ [19:45:14] hi [19:45:30] if you create a branch in github at your initial commit [19:45:35] then you can create a PR against it [19:45:40] and we can use that for review [19:45:45] (otherwise I don't know how to leave comments! :) ) [19:45:50] Ah! [19:45:57] makes sense [19:48:58] ottomata: https://github.com/jobar/hdfs-tools/pull/1 [19:49:27] joal: i thikn you want to make the branch at https://github.com/jobar/hdfs-tools/commit/eed7a7cff21ffd1ad1649fdf3eb8ea24d614b602 [19:49:48] hm [19:49:57] if you make it there, you can make a PR from master to it [19:50:08] and all the commits in master will be part of the PR, and will be able to review them [19:50:23] lke this [19:50:23] https://github.com/wikimedia/eventgate/pull/1 [19:51:45] ok - will merge the current PR then push a branch from 1st commit [19:51:58] k [19:52:06] joal: i think i don't get "We need to differentiate those cases as copying a folder also copies its content, [19:52:06] * and we don't want that when doing recursion into folders." [19:52:11] in createOrCopy new [19:52:28] copying a folder also copies its content? [19:52:33] ottomata: yessir [19:52:38] not sure i understand what that means [19:52:51] ottomata: batcave? 
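On the Airflow/Kerberos thread above (16:38 and 18:32): the pattern described is the scheduler process (running as the 'airflow' user) shelling out to spark-submit with an explicit principal and keytab, so the job itself runs on YARN as analytics-search. A rough sketch of such an invocation follows; the realm, paths and script name are placeholders, not the actual configuration:

    spark-submit \
        --master yarn \
        --principal analytics-search@EXAMPLE.REALM \
        --keytab /path/to/airflow.keytab \
        /path/to/some_job.py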
[19:52:53] k [20:12:37] ottomata: https://github.com/jobar/hdfs-tools/pull/2 [20:12:51] ottomata: please note the comments at the top of the HdfsRsyncExec file :) [20:13:23] Nice [20:20:44] joal: i did not know rsync had --chmod=CHMOD flag! [20:21:11] ottomata: it is one of the required features to mimic the current rsync :) [20:23:54] joal: do we use --filter? or just --exclude [20:23:55] ? [20:24:17] ottomata: any :) exclude is an alias for filter - X [20:24:23] huh. [20:24:55] The way it works is it stacks the rules, and applies them in order for every file, the first match is the good one: inclide or exclude - No match means include [20:29:24] joal, does dst have to be an existing folder? or does its parent just have to exist? [20:29:27] i often do [20:29:49] cd mystuff; rsync -av ./ remote.host.wmnet:~/mystuff/ [20:29:52] ottomata: dst needs to exist I think [20:30:02] remote.host.wment:~/mystuff/ might not exist [20:30:03] but will be created [20:30:11] ah? [20:30:25] yeah, i think it creates the dst dir if it doesn't exist [20:30:27] I fail in that case... [20:30:30] i think it will fail of the full path doesn't exist [20:30:42] if dst is A/B/C [20:30:47] and A or B don't exist, it will fail [20:30:51] but it will create C [20:31:14] that is nice for first rsyncs; kinda sucks to have to create a directory before your first copy [20:32:28] 10Analytics: Add mediarequests dataset to druid (just some dimensions) - https://phabricator.wikimedia.org/T240613 (10Nuria) [20:34:12] ottomata: later feature? [20:34:24] k [20:34:44] hm joal that means we'll have to puppetize dir creation, (or create dir in script), right? [20:34:57] hm [20:35:36] joal i'm not sure i understand the mlutiple chmod command parser thing [20:35:44] huhu [20:35:45] i get that you might have 2, one for F one for D [20:35:49] or you might have just one for both [20:35:55] but why do you need fold them all together? [20:35:59] ottomata: correct - No prefix means both [20:36:17] ottomata: I loop over the list once [20:36:40] I wondered about looping multiple times - more readable I guess? [20:37:31] i thikn you loop over the list twice? once for files and once for dirs? [20:38:20] ottomata: 3 passes - 1 for validation, 1 for files ChmodParser creation, 1 for dirs ChmodParser creation [20:38:47] would it be better to just validate the number of commands and do it one way or th eother [20:39:01] you should either be given: 1 chmodCommand with NO prefix [20:39:15] or, 1 or 2 commands with prefixes [20:39:16] right? [20:39:27] seems weird to provide one command with prefix and one without? [20:40:03] ottomata: you can do: go-wx D+x [20:40:33] for instance - can be written differently though [20:40:43] ya but why would you want to allow that? [20:40:58] (03PS2) 10Milimetric: Report structured data use for commons [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/556741 (https://phabricator.wikimedia.org/T239565) [20:41:00] (03CR) 10Milimetric: Report structured data use for commons (033 comments) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/556741 (https://phabricator.wikimedia.org/T239565) (owner: 10Milimetric) [20:41:11] hmmm [20:41:13] ottomata: multi-chmod command for symbolic is by default [20:41:14] oh i think i see... [20:41:27] it actually works in chmod :) [20:41:27] that is crazy stuff [20:41:36] D F don't do they? [20:41:38] that's an rsync thing, no? [20:41:41] yessir [20:41:52] so multi-command, over 2 slots ... 
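For readers not familiar with the rsync option being mimicked here: plain chmod takes comma-separated symbolic clauses but has no notion of "directories only" vs "files only", while rsync's --chmod adds the D/F prefixes joal refers to, so a single option can carry several clauses over the two "slots". Roughly:

    # plain chmod: comma-separated clauses, applied to whatever you name
    chmod go-w,u+x somefile
    # rsync --chmod: unprefixed clauses apply to files and dirs,
    # D... to directories only, F... to files only (joal's "go-wx D+x" example)
    rsync -rt --chmod=go-wx,D+x src/ dst/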
[20:41:58] ⊙ω⊙ [20:42:06] mwahahaha :) [20:42:23] wait [20:42:25] no not in chmod [20:42:29] you can't prefix D in chmod, can you? [20:42:32] just in rsync. [20:42:33] right? [20:42:37] nope - But in rsync you can [20:42:39] ah ok [20:42:40] yes [20:42:48] phew thought i was a bonkers man [20:42:53] ok [20:43:32] geez ok [20:44:20] ottomata: I alwas wanted to learn about rsync - Well now I'm kinda ok :) [20:53:05] joal: added some comments :) [20:53:29] 10Analytics, 10Analytics-Wikistats: Create English strings json for vue-i18n to use - https://phabricator.wikimedia.org/T240617 (10fdans) [20:55:38] 10Analytics, 10Analytics-Wikistats: Include locale string jsons as webpack chunks so that only the required language is bundled - https://phabricator.wikimedia.org/T240618 (10fdans) [20:58:20] 10Analytics, 10Analytics-Wikistats: Add stats.wikimedia.org/v2 as a TranslateWiki project - https://phabricator.wikimedia.org/T240621 (10fdans) [20:58:48] nuria: just tasked i18n as requested: https://phabricator.wikimedia.org/T238752 [21:19:33] joal: FYI one of the tests is failing for me [21:19:41] meeh :( [21:20:00] - should copy src to dst recursively with size-only and not copy existing *** FAILED *** [21:20:00] 2 did not equal 3 (TestHdfsRsyncExec.scala:160) [21:20:48] :( [21:21:01] all works for me (intellij +- maven [21:28:11] i'm just doing mvn package on CLI hm [21:28:24] (03PS1) 10Milimetric: Migrate to new dashiki instances [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/556818 (https://phabricator.wikimedia.org/T236586) [21:28:32] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Migrate to new dashiki instances [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/556818 (https://phabricator.wikimedia.org/T236586) (owner: 10Milimetric) [21:29:43] hm joal i guess we should install hdfs-tools.jar into its on directory? not in default hadoop classpath usr/lib/hadoop, because of the shaded jar? [21:30:22] It so so ottomata - No haddop def present, and only scala related stuff, but who knows [21:30:37] yeah, but if we run spark [21:30:40] i think it will load the hadoop CP [21:30:53] yup [21:31:03] might get conflicting scala version if we upgrade spark or something [21:35:41] 10Analytics, 10Analytics-Kanban: Dashiki: Read multiple wikis from single file - https://phabricator.wikimedia.org/T236941 (10Milimetric) @srishakatux: just a ping that this is done. I still want to update the Dashiki docs which are in a very sad state, but before I get to that. To use the feature, you have... [21:37:37] 10Analytics, 10Analytics-Kanban, 10Cloud-VPS (Debian Jessie Deprecation), 10Patch-For-Review: "dashiki" Cloud VPS project jessie deprecation - https://phabricator.wikimedia.org/T236586 (10Milimetric) This is done, updated docs and deployment code, deleting the instances now. [21:40:19] ottomata: I answered to the comments and pushed a new patch with new stuff for FilterRule [21:40:59] looking! [21:41:45] joal: why dropWhile instead of filter? [21:41:54] where? [21:42:01] iiuc dropWhile stops dropping at first non-match? [21:42:08] Yes [21:42:56] We drop first char if it is F or D, or We keep the rest [21:42:57] OH you are just removing it for parsing [21:42:58] GOT It [21:43:00] :) [21:43:04] not for filtering for the type [21:43:25] still, a little weird to use drop while, no? [21:43:28] ottomata: Yes, we already know the type since it is filtered [21:43:28] that will allow someone to do [21:43:35] DDDDDDDDDDo+w [21:43:35] ? 
[21:43:54] Nope, fails regexp - But would be processed correctly indeed [21:43:57] oh [21:44:21] ottomata: this function is about preparing config internals, not validation [21:44:28] why not just mod.tial [21:44:30] mod.tail [21:44:30] ? [21:44:35] oh [21:44:36] or [21:44:39] huhu [21:44:42] if startWith D or F [21:44:44] mod.tail [21:44:44] ? [21:44:52] Would work the same [21:45:07] i think it would be more readable, is strange to loop over the string with a while to drop the first character [21:45:07] if you prefer :) [21:46:14] hm also, don't you always need to drop the first char if F or D, even if not acc.isEmpty? [21:46:19] ottomata: I like the dropWhile syntax, concise - but can change :) [21:46:50] good catch !!! [21:46:56] milimetric: do you have a couple minutes on the bc? I want to run something by you as an i18n sanity check :) [21:47:09] ofc fdans, omw cave [21:48:16] (03CR) 10Nuria: "Thanks for taking care of this." [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/556818 (https://phabricator.wikimedia.org/T236586) (owner: 10Milimetric) [21:48:37] ottomata: https://gist.github.com/jobar/9a2fb8b0a3ba622f40bfbc06dcd7f2c9 [21:50:42] 10Analytics, 10Analytics-Kanban: Dashiki: Read multiple wikis from single file - https://phabricator.wikimedia.org/T236941 (10Nuria) @srishakatux might not need this cause she is not using the vital-signs layout [21:51:07] lgtm joal, much more readable, maybe just add a comment about removing the rsync specific F or D qualifier to make it compatible with normal chmod arg [21:51:36] yup [22:02:00] (03CR) 10Nuria: Report structured data use for commons (031 comment) [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/556741 (https://phabricator.wikimedia.org/T239565) (owner: 10Milimetric) [22:08:31] ottomata: can i get +2 at eventgate depot? [22:10:00] nuria: i dont [22:10:04] i don't seem to have powers to change [22:10:09] ottomata: k [22:10:26] ottomata: also, are tests run with npm test? or is there anything else? [22:10:31] just npm test [22:12:46] (03CR) 10Nuria: "This change should probably be a new patch on https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/556449/" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/556730 (https://phabricator.wikimedia.org/T239625) (owner: 10Lex Nasser) [22:14:52] nuria: dan showed me how to make it a new patch, will fix later today [22:15:04] lexnasser: sounds good, no rush [22:29:40] 10Analytics, 10Cloud-Services, 10Developer-Advocacy (Oct-Dec 2019): Further improvements to the WMCS edits dashboard - https://phabricator.wikimedia.org/T240040 (10srishakatux) Here is an update on suggested improvements: * In the [[ https://wmcs-edits.wmflabs.org/#wmcs-edits/wmcs-edits-tabular-view | Tabula... [22:30:28] 10Analytics, 10Cloud-Services, 10Developer-Advocacy (Oct-Dec 2019): Further improvements to the WMCS edits dashboard - https://phabricator.wikimedia.org/T240040 (10srishakatux) And, changes can be seen here for testing https://wmcs-edits.wmflabs.org. [22:32:20] ottomata: I have a final version I think :) [22:35:18] ya? [22:36:59] joal: transferTree? [22:37:12] transferRootTree ) [22:38:09] that's just to use glob or to use listStatus? [22:38:17] if source had /* in it? [22:39:08] used for filter-rules [22:40:20] ah [22:42:52] hmm, joal i have a hard time understanding how that is used... [22:44:38] transferTreeRoot is supposed to be the root of the src independent of any globs in src path?
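The treeRoot/srcBasePath question ties back to how rsync itself evaluates filters: rules are checked in the order given for every path below the transfer root, the first matching rule decides include vs exclude, and a path matching no rule is included (joal's summary at 20:24). A small illustrative example of that ordering, with placeholder paths:

    # keep only .gz files: directories are included so recursion can continue,
    # *.gz matches before the catch-all rule, everything else hits --exclude='*'
    rsync -rt --include='*/' --include='*.gz' --exclude='*' src/ dst/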
[22:45:15] ottomata: it needs to be computed per file at tree-root only - BUG ! [22:45:22] i think the word 'transfer' is confusing me [22:45:29] treeRoot? [22:45:47] ottomata: treeRoot can be thought of as / [22:45:57] is it always from source path? [22:45:59] here we talk about the / in the context of the transfer [22:46:03] it is used in filter match [22:46:06] but not computed from it, right? [22:46:15] maybe a better name: [22:46:18] srcBasePath [22:46:19] ? [22:46:24] ottomata: used in src and dst (we don't delete extraneous excluded) [22:46:37] aye ok, but it means the same? [22:46:46] dst just doesn't have a glob [22:46:46] so [22:46:48] But no need to pass it or dst as it doesn't change [22:46:50] yup [22:46:51] it will be just dest? [22:46:54] correct [22:47:14] hm [22:47:31] i guess treeRoot is ok, if it is explained what it means a little bit. [22:47:34] srcTreeRoot [22:47:34] ? [22:47:52] you need to pass that around just for the filter matching? [22:47:55] is that right? [22:47:56] works for me (basePath doesn't sound bad either) [22:48:02] yup [22:48:04] i like base path better [22:48:06] srcBasePath [22:48:09] srcBasePath [22:48:19] yeah, esp. since you have srcPath [22:48:24] makes sense to get the 'base path' out of it [22:49:30] joal i gotta go pretty soon! i have a rudimentary deb ready to just deploy a jar file [22:49:38] dunno if we need a little wrapper or not [22:49:47] Great - doing some more testing [22:49:52] k cool [22:50:02] wrapper would be good ottomata I think [22:50:14] java -cp /home/joal/code/hdfs-tools/target/hdfs-tools-0.0.1-SNAPSHOT.jar:/usr/lib/spark2/jars/*:$(/usr/bin/hadoop classpath) org.wikimedia.analytics.hdfstools.HdfsRsyncCLI [22:50:23] Not so nice ottomata --^ [22:52:46] aye k [22:53:34] joal cool, will do tomorrow! hopefully we can even deploy it tomorrow yesss and puppetize [22:53:36] let'sg OoOOoO [22:53:42] \o/ [22:53:46] Thanks ottomata [22:54:12] laterrrss! [23:01:59] (03CR) 10Nuria: "Virtual +2 if we have tested the job. Seems quite straight forward, thanks for changing the name of the class" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549115 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [23:03:42] (03CR) 10Nuria: "Given that the last patch should only be a name change, virtual +2 if we have tested the job" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [23:06:13] 10Analytics, 10Analytics-Kanban: Dashiki: Read multiple wikis from single file - https://phabricator.wikimedia.org/T236941 (10srishakatux) Thanks @Milimetric for sharing the updates! Currently, we are using the `tabs` layout and if/when we plan to use the `metrics-by-layout` these changes will be helpful. Ma... [23:17:12] 10Analytics, 10Operations, 10decommission, 10ops-eqiad: Decommission analytics100[1,2] - https://phabricator.wikimedia.org/T205507 (10Jclark-ctr) a:05Cmjohnson→03Jclark-ctr
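A possible shape for the little wrapper discussed at 22:49-22:50, built around joal's java -cp command above; the installed jar location is an assumption (the chat only floats installing the jar in its own directory rather than on the default hadoop classpath), and the rest simply mirrors the quoted invocation:

    #!/bin/bash
    # run the HdfsRsync CLI with the spark scala and hadoop jars on the classpath,
    # passing all arguments through to the tool
    HDFS_TOOLS_JAR="${HDFS_TOOLS_JAR:-/usr/lib/hdfs-tools/hdfs-tools.jar}"
    exec java -cp "${HDFS_TOOLS_JAR}:/usr/lib/spark2/jars/*:$(/usr/bin/hadoop classpath)" \
        org.wikimedia.analytics.hdfstools.HdfsRsyncCLI "$@"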