[06:40:29] (PS8) Fhocutt: Add user names to json report, corresponding tests [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/181770 (https://phabricator.wikimedia.org/T74747)
[08:04:17] Analytics, MediaWiki-Core-Team, Wikimedia-Site-requests: Ran out of captcha images - https://phabricator.wikimedia.org/T91760#1099579 (matthiasmullie) Open→Resolved a:matthiasmullie Seems like someone has already fixed this. Special:UserLogin&type=signup works again and the exception is no lo...
[14:39:11] Heya
[14:39:30] mobile_apps uniques jobs started
[14:39:53] I am monitoring the 'qchris' way (or at least trying)
[14:39:59] Let me know if anything goes wrong
[14:40:01] :-)
[14:40:18] ;)
[14:42:11] Thx qchris for balancing datanodes this weekend :)
[14:42:21] It's better not to have unhealthy nodes!
[14:42:24] yw.
[14:42:28] Yes. Definitely :-D
[14:42:51] It will take some 10 more days to have them fully balanced.
[14:43:16] But I guess we should discuss with ottomata if we even want a fully balanced hdfs.
[14:44:03] I guess 'fully balanced' might not be needed, but at least 'reasonably balanced' :)
[14:44:39] The most misbalanced not has "DFS used" at 84% right now.
[14:44:39] joal: btw, did you try anything that would require the wmf ldap group membership?
[14:44:50] s/not/node/
[14:45:23] (lowest "DFS used" is at 48%)
[14:45:29] qchris: well, that looks quite misbalanced :)
[14:45:40] YuviPanda: not yet
[14:46:00] joal: can you try? simplest would be to try logging in to graphite.wmflabs.org
[14:46:03] qchris sorted me out with a trick on Friday, so I didn't test with it
[14:46:11] Meh. Not so badly misbalanced. It's already the same digits. Only at different places.
[14:46:39] huhu :D
[14:47:04] :-D
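(For reference, the per-node "DFS used" figures quoted above come from the namenode's datanode report. A minimal sketch of how to read them, assuming shell access to a host with the hdfs client configured; the grep filter is a convenience over the report's text layout, not a stable interface:)

```sh
# Print capacity and usage for every datanode; the "DFS Used%" line is the
# per-node figure discussed above (84% on the fullest node, 48% on the least).
hdfs dfsadmin -report

# Illustrative filter to show only node names and usage percentages:
hdfs dfsadmin -report | grep -E 'Name:|DFS Used%'
```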
[14:47:44] YuviPanda: seems not to work :(
[14:48:10] boo
[14:48:23] joal: do you need it immediately or can I poke around later?
[14:48:49] I'm ok now, you can get back to it when you have time
[14:48:53] Thx YuviPanda
[14:49:00] alright
[14:51:06] (CR) Nuria: Add user names to json report, corresponding tests (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/181770 (https://phabricator.wikimedia.org/T74747) (owner: Fhocutt)
[14:51:30] Daily jobs moving forward correctly
[14:52:16] (CR) Nuria: [C: 2] Add user names to json report, corresponding tests [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/181770 (https://phabricator.wikimedia.org/T74747) (owner: Fhocutt)
[14:53:28] (PS1) QChris: Add quiet mode to script that runs all guards [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195258
[14:53:30] (PS1) QChris: Add verbose mode to script that runs all guards [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195259
[14:53:32] (PS1) QChris: Do not attempt to run skeleton script for guards when running all guards [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195260
[14:53:34] (PS1) QChris: Stop shifting twice when passing --rebuild-jar to script that runs all guards [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195261
[15:02:56] joal: what username did you use to try logging into graphite?
[15:03:43] YuviPanda: Joal
[15:03:49] and joal (no caps)
[15:03:52] hmm
[15:03:53] ok
[15:11:39] AH! i overslept!
[15:11:40] a lot!
[15:12:03] ottomata: :)
[15:12:27] wow, i never set an alarm, but today I guess I should have
[15:12:42] sorry i missed standup yall!
[15:12:43] Sleep is good as well
[15:14:06] no worries, DST is evil
[15:14:20] still though, i overslept the non-DST time too
[15:22:53] ottomata: daylight savings BOOO
[15:23:10] I'm not sure I can blame DST for this
[15:23:35] i slept in more than I usually would have even if no DST
[15:25:56] qchris, thanks for dealing with the balancer stuff this weekend
[15:26:03] that was something that I had put off thinking about until this week :)
[15:26:10] i just wanted to hackathon last week!
[15:26:14] ottomata: yw.
[15:26:43] The balancer is still running.
[15:26:49] But all nodes are healthy again.
[15:27:22] awesooome
[15:27:23] Just let me know when you think it's balanced "enough" and I'll kill it ... or you can just kill it yourself.
[15:27:38] so, just running with a higher throughput sped it up and actually balanced?
[15:27:50] Yup.
[15:27:50] i'm considering running that more periodically
[15:27:55] maybe once a day? once a week? not sure
[15:28:11] i think the reason we've really had to balance is because I haven't deleted refined data yet, so we are storing much more
[15:28:14] Do we want a fully balanced cluster at all?
[15:28:24] I mean ...
[15:28:26] i mean, we don't need 100%
[15:28:31] Some machines have more disks ...
[15:28:36] yes
[15:28:39] well, more space
[15:28:39] yes
[15:28:42] Balancing means getting more data onto those machines.
[15:28:53] Which means ... overall ...
[15:29:01] not that much data is local to CPUs.
[15:29:03] oh, i think that's fine. that means that load won't be balanced
[15:29:08] hm
[15:29:26] But I haven't fully thought this through.
[15:29:35] or, those machines will just have more jobs on them than the others. although i suppose if our cluster gets close to 100% utilized all the time
[15:29:38] then yes, things won't be local
[15:30:12] i know that I don't want the smaller nodes to be full while there is plenty of space on the others
[15:30:22] Full ACK.
[15:30:24] and, jobs that write data want to do so locally
[15:30:26] right?
[15:30:36] Not sure about that.
[15:30:40] But it would make sense.
[15:30:42] me neither
[15:30:43] aye
[15:31:05] Aaaaanyways. Running some balancing periodically sounds like a great idea.
[15:31:13] yeah
[15:31:24] And running with 40x the default bandwidth helped get the balancer to do work.
[15:31:31] And the cluster was still very responsive.
[15:33:02] cool, ok, good to know
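(To make the "40x the default bandwidth" remark concrete: the balancer's per-datanode transfer rate is capped by dfs.datanode.balance.bandwidthPerSec, which commonly defaults to 1 MB/s, and the cap can be raised at runtime without restarting datanodes. A sketch of that kind of invocation, assuming the 1 MB/s default; these are not the exact commands qchris ran:)

```sh
# Raise the per-datanode balancing bandwidth cap to 40x an assumed 1 MB/s
# default (value is bytes/second; takes effect without a datanode restart).
hdfs dfsadmin -setBalancerBandwidth 41943040

# Run the balancer until each datanode is within 10 percentage points of
# the cluster-wide average utilization -- "reasonably balanced", not 100%.
hdfs balancer -threshold 10
```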
[15:45:04] (CR) Ottomata: [C: 2 V: 2] Add quiet mode to script that runs all guards [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195258 (owner: QChris)
[15:45:29] (CR) Ottomata: [C: 2 V: 2] Add verbose mode to script that runs all guards [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195259 (owner: QChris)
[15:46:16] (CR) Ottomata: [C: 2 V: 2] Do not attempt to run skeleton script for guards when running all guards [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195260 (owner: QChris)
[15:46:32] (CR) Ottomata: [C: 2 V: 2] Stop shifting twice when passing --rebuild-jar to script that runs all guards [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195261 (owner: QChris)
[15:48:54] mforns: I had a quick look at the original example, I know what the issue is
[15:49:07] Hey Guys
[15:49:13] The "absolute" bars are blocking the hover event
[15:49:23] is the splitwise finished?
[15:49:25] (on the "relative" bars)
[15:49:26] oh, ok, I am fixing the missing semicolons and such, and I'll push it to gerrit
[15:49:37] milimetric, I see
[15:49:53] ping me if you wanna debug it together, I'm not 100% sure on the fix but I've had similar problems
[15:50:08] joal: yea, splitwise is done, but I couldn't find a way to tell it what my paypal is
[15:50:24] Heh ...
[15:50:26] in any case, I have venmo and paypal, both should be under dan.andreescu@gmail.com, so whichever one you prefer
[15:51:45] Done through paypal
[15:52:05] Let me know if you get credited!
[15:52:30] milimetric, yes, we can debug it in the batcave
[15:53:17] milimetric, yes you're right
[15:53:26] brt
[15:53:51] the absolute bar height is blocking the hover event, ninja
[16:00:12] hmm, joal, i see a mobile_apps_uuids query running that is in the default queue
[16:00:16] is that january?
[16:00:39] It is
[16:00:50] hm, the daily ones are in default too?
[16:00:53] ottomata: We decided to run uniques in the default
[16:00:57] yup
[16:01:02] hm, ok
[16:01:08] We ran into a problem friday evening :S
[16:01:16] oh
[16:01:17] ?
[16:01:26] uniques are big jobs, so they starve the cluster if in the essential queue
[16:01:37] aye, hm
[16:01:41] Thx to qchris it was noticed before causing trouble
[16:01:45] ha, maybe we need a 3rd queue!
[16:01:55] perhaps adhoc needs to come back :)
[16:01:57] or
[16:01:57] But it has been killed and restarted with monitoring today
[16:02:05] something for prod jobs that are not essential
[16:02:11] Well, I think it would be a good idea
[16:02:59] I would even create one more: one for prod jobs that can be slow (uniques monthly for instance)
[16:04:34] milimetric: Got the confirmation from paypal
[16:08:19] ottomata: killed the daily ones ... Need to change concurrency to 1
[16:08:54] joal, I mean, there is an easier way to speed up the uniques job
[16:09:00] well, not easier. But, more sustainable.
[16:09:23] implement it as an actual mapreduce job using a streaming framework. We only actually care about one row at a time and an ever-growing list, so..
[16:10:24] ottomata: Will do the same for monthly actually
[16:11:39] Ironholds: Not talking about efficiency yet
[16:11:42] But why not
[16:13:03] ?
[16:13:07] using a streaming job?
[16:13:14] *framework?
[16:13:29] (PS1) Joal: Correct concurrency bug on mobile_apps uniques computation [analytics/refinery] - https://gerrit.wikimedia.org/r/195285
[16:14:11] (PS2) Joal: Correct concurrency bug on mobile_apps uniques computation [analytics/refinery] - https://gerrit.wikimedia.org/r/195285
[16:14:19] does that really help a lot joal? makes sense maybe for monthly since these jobs are so far apart
[16:14:27] and use a lot of data if you were backfilling
[16:14:28] but
[16:14:47] but, 2 on dailies just means that a couple of jobs would be running at once, right?
[16:14:48] mforns: it works with visibility
[16:14:56] milimetric, oh!
[16:15:03] it's just this line: style('pointer-events', 'all')
[16:15:07] how did you do it?
[16:15:12] that needs to change to 'visible' instead of 'all'
[16:15:22] ottomata: problem is that since we overwrite data, having concurrency 2 fails
[16:15:27] because otherwise it's capturing the mouse regardless of visibility
[16:15:34] oh, you mean pointer-events = visible
[16:15:36] ottomata: makes sense?
[16:15:53] OH
[16:15:57] because it shares the same file now
[16:15:58] hm
[16:16:02] (CR) Nuria: [C: 1] Correct concurrency bug on mobile_apps uniques computation [analytics/refinery] - https://gerrit.wikimedia.org/r/195285 (owner: Joal)
[16:16:02] Correct!
[16:16:28] well, you have to make pointer-events visible, yes, but there's a style call on the absolute bars that sets it to 'all'
[16:16:38] ok
[16:16:39] ottomata: So we'll get slower to catch up (or we can even do a big job manually if you prefer)
[16:16:42] hm, ok. can you add a comment there explaining that, so someone doesn't accidentally increase it in the future?
[16:16:44] and it looks like .style('pointer-events', 'all')
[16:16:54] thx nuria
[16:17:02] (I can't tell you the line 'cause I hacked it up too much :))
[16:17:11] milimetric, I found it
[16:19:07] milimetric, it works :], thanks
[16:19:13] cool
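(The fix mforns and milimetric converge on above, as a small d3 sketch. In SVG, pointer-events: all makes an element capture mouse events even when it is not visible, so the overlaid "absolute" bars were swallowing hovers meant for the "relative" bars beneath them; 'visible' ties event capture to actual visibility. The selection and class name here are hypothetical; only the changed style value comes from the discussion:)

```js
// Hypothetical selection of the overlaid "absolute" bars (the class name
// is illustrative). With 'all', the bars capture the mouse regardless of
// visibility; with 'visible', hidden bars let events through to the
// "relative" bars underneath.
d3.selectAll('rect.absolute-bar')
    .style('pointer-events', 'visible');  // was: .style('pointer-events', 'all')
```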
[16:20:01] milimetric, I'll try to avoid reordering of the columns and push it for review
[16:20:45] k, cool. kevin did it by appending a number in front of the user class and sorting by that
[16:21:00] i'm looking into the anonymous numbers and if they make any dang sense :)
[16:21:51] milimetric, the data from the csv is already sorted correctly, the chart is reordering it by freq
[16:22:15] ok
[16:24:24] ottomata: What do you think?
[16:25:17] ?
[16:25:31] ottomata: About concurrency
[16:25:43] oh: "hm, ok. can you add a comment there explaining that, so someone doesn't accidentally increase it in the future"
[16:26:00] Sure :)
[16:28:07] (PS3) Joal: Correct concurrency bug on mobile_apps uniques computation [analytics/refinery] - https://gerrit.wikimedia.org/r/195285
[16:28:18] Need to run, will check the thing later tonight
[16:28:59] k laters
[16:29:17] (CR) Ottomata: [C: 2 V: 2] Correct concurrency bug on mobile_apps uniques computation [analytics/refinery] - https://gerrit.wikimedia.org/r/195285 (owner: Joal)
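(For context on the concurrency fix just merged: the uniques jobs overwrite a shared output file, so two coordinator actions materialized at once clobber each other. In Oozie, that cap lives in the coordinator's controls block. A hedged sketch of the shape of such a coordinator, not the actual refinery patch; the app name, frequency, throttle value, and property names are illustrative:)

```xml
<coordinator-app name="mobile_apps_uniques_daily"
                 frequency="${coord:days(1)}"
                 start="${start_time}" end="${stop_time}"
                 timezone="Universal"
                 xmlns="uri:oozie:coordinator:0.4">
    <controls>
        <!-- Keep this at 1: concurrent runs overwrite the same output
             file, so raising it reintroduces the bug fixed in
             https://gerrit.wikimedia.org/r/195285. -->
        <concurrency>1</concurrency>
        <!-- Let backfill runs queue up instead of running in parallel. -->
        <throttle>2</throttle>
    </controls>
    <action>
        <workflow>
            <app-path>${workflow_file}</app-path>
        </workflow>
    </action>
</coordinator-app>
```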
[16:32:36] nuria, are you coming to the retrospective?
[16:44:31] mornin' springle. Do you have 15 min today so I can ask you for some advice on my queries?
[16:45:03] I was able to improve some of them to run faster, but some are still taking a long time, springle.
[16:54:53] hi milimetric. I'm gonna look at the query for VE you shared in the afternoon and will let you know if I have any clue.
[17:22:49] heyYYy, check this out: https://yarn.wikimedia.org/cluster/scheduler :)
[17:28:44] milimetric: shouldn't we be keeping the IP address for anons as their usernames?
[17:31:43] leila: yeah, the IP address is kept, but the data we're using here is the EL data
[17:32:07] oh, I see, milimetric.
[17:32:19] so, I
[17:32:19] but yeah, the next step is for me to join to the different wikis and try to figure out if the data matches what we have in the db
[17:32:40] so, I'm looking at dewiki.user, and I can't figure out how we keep track of anons there.
[17:32:50] it's not in .user, it's in .revision
[17:33:01] and it's basically, "most of the time if rev_user = 0"
[17:33:07] oww, got it.
[17:33:19] https://www.mediawiki.org/wiki/Manual:Revision_table#rev_user
[17:33:37] and rev_user_text is the IP you were mentioning
[17:33:59] although the IP can be mangled in early history, as you can see :)
[17:34:35] yeah, now I see it.
[17:48:07] kevinator, I did not understand if I should ping you or you should ping me, anyway I'm ready whenever you are
[17:48:30] mforns: I was about to ping you...
[17:48:44] I'm joining the batcave now
[17:48:47] kevinator, ok
[19:02:08] ottomata: awesome access to the scheduler :)
[19:03:09] yeah, the annoying thing is that i can't figure out a way to tell it to always use that url
[19:03:14] e.g. if you click on an ApplicationMaster
[19:03:24] the proxied url you get always uses analytics1001.eqiad.wmnet:8088
[19:03:26] Yeah, I know what you mean
[19:03:28] yeah
[19:03:35] but, this is better than it was before
[19:03:39] :)
[19:03:41] Definitely
[19:03:46] Thx for that
[19:03:55] I am gonna deploy now, then launch the jobs
[19:11:34] Deployed
[19:12:32] Launched daily jobs
[19:12:42] Will wait for monthly until the daily ones have caught up
[19:14:08] cool
[19:14:24] you are rerunning for january, right?
[19:14:37] correct
[19:14:54] The definition changed, so decided to re-run everything
[19:16:04] ok cool.
[19:16:12] i want to delete january data soon, so let me know when that looks good
[19:16:30] Yup
[19:37:51] halfak, soo, if the p-value for the fisher exact test is <0.00000000000000022... ;p
[19:39:56] milimetric, yt?
[19:40:02] hi mforns
[19:40:04] hi!
[19:40:18] question on the chart
[19:40:44] milimetric, how are we going to query this chart? is it supposed to have a date selector?
[19:41:04] because until now, I've treated it as 'not timeboxed'
[19:41:25] right, that's fine - if it had a date selector it would just operate on the data passed to the chart
[19:41:29] so the chart wouldn't have to know
[19:41:45] but mforns, I don't know if you saw the recent thread on -internal
[19:41:49] so we can pass it things like monthly reports
[19:41:55] yes...
[19:42:09] but I suppose the instrumentation will be corrected, right?
[19:42:35] I don't know the problem yet. I was just talking to memeht
[19:43:01] I would say, for now, we should either pause the user type analysis work or look for the instrumentation problem
[19:43:18] ok, milimetric, anyway I finished fixing the bugs
[19:43:32] cool, yeah, so don't worry about timeboxing for now
[19:43:47] and if we do not need any date selector, I think I'll push what I have
[19:44:06] all right
[19:44:26] thx
[21:15:23] (PS1) Mforns: Query and visualization for failure vs user analysis [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/195436 (https://phabricator.wikimedia.org/T91123)
[21:16:07] ottomata: Looks like analytics1021 has been having issues again for about an hour
[21:16:27] Oh ... it's just coming back?
[21:16:41] i just did election
[21:16:41] * qchris looks puzzled
[21:16:46] Ah. Yes :-D
[21:16:50] That explains things.
[21:16:52] (PS2) Mforns: Query and visualization for failure vs user analysis [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/195436 (https://phabricator.wikimedia.org/T91123)
[21:25:37] ottomata: About the syncing of the mediacounts files ... it's currently set up to rsync the plain mediacounts files from stat1002's hdfs mount onto dataset1001.
[21:25:58] aye
[21:26:08] Now from those plain mediacounts files, ezachte computes top1000 files which hold the top1000 entries for some columns.
[21:26:22] He computes them outside of the cluster, so the files end up on the plain fs.
[21:26:46] They should get rsynced to the same directory as the plain mediacounts files.
[21:27:02] Would you prefer ezachte to "hdfs dfs -put" them onto hdfs
[21:27:07] (so everything is in one place)
[21:27:21] or should I set up a separate rsync cron job to get the files in place?
[21:29:33] hm
[21:29:40] i think putting them in hdfs is fine
[21:30:23] cool.
[21:30:26] Analytics-Engineering: Missing dialect-specific directories in the pageviews definition - https://phabricator.wikimedia.org/T92020#1101669 (Ironholds) NEW
[21:32:44] ottomata: privilege-wise ... /wmf/data/archive/mediacounts/daily/2015 is currently owned by hdfs:hadoop
[21:32:58] So ezachte does not have access to write there.
[21:33:16] Is it ok if I just adjust the privileges in hdfs, or
[21:33:25] should that go through some puppet code, or
[21:33:33] should ezachte get added to the needed groups?
[21:34:04] hm,
[21:34:38] Or should I add a cron that copies the files as the hdfs user?
[21:34:42] qchris: i think you should just adjust manually, make it group owned and writable by analytics-privatedata-users
[21:34:55] k. Will do.
[21:35:00] do -R from the mediacounts/ level i think
[21:35:06] k
[21:35:25] Thanks.
[21:35:51] yup :)
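(The manual adjustment agreed on above, sketched as shell commands run by someone with HDFS superuser rights, e.g. the hdfs user. The target directory and group come from the discussion; the group-write mode and the local file name are assumptions:)

```sh
# Make the whole mediacounts archive group-owned and group-writable by
# analytics-privatedata-users, recursively from the mediacounts/ level.
hdfs dfs -chgrp -R analytics-privatedata-users /wmf/data/archive/mediacounts
hdfs dfs -chmod -R g+w /wmf/data/archive/mediacounts

# ezachte can then put the computed top1000 files next to the plain
# mediacounts data (local file name is hypothetical):
hdfs dfs -put top1000-2015-03-09.tsv /wmf/data/archive/mediacounts/daily/2015/
```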
[22:07:40] Analytics-General-or-Unknown, CA-team, Community-Liaison, Wikimedia-Extension-setup: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1101954 (Legoktm) The Piwik extension is not hosted in WMF's gerrit installation and the last time I asked the maintainer was not interest...
[22:16:32] (PS1) QChris: In quiet mode of script that runs all guards, report failed jar rebuild [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195457
[22:22:52] (CR) Ottomata: [C: 2 V: 2] In quiet mode of script that runs all guards, report failed jar rebuild [analytics/refinery/source] - https://gerrit.wikimedia.org/r/195457 (owner: QChris)
[22:23:06] Thanks!
[22:23:44] ottomata: Since you're merging all the patches to the script ... did you see the change to get the script into a cron?
[22:23:58] https://gerrit.wikimedia.org/r/#/c/195262/
[22:25:41] i did not!
[22:30:02] Analytics-Kanban: Missing dialect-specific directories in the pageviews definition - https://phabricator.wikimedia.org/T92020#1102031 (kevinator)
[22:34:07] qchris: lemme think about that more tomorrow
[22:34:10] i gotta run
[22:34:11] laaters!
[22:34:12] sure.