[07:11:01] Analytics-General-or-Unknown, Language-Engineering, Mobile-Apps, Wikipedia-Android-App-Backlog, and 2 others: there should be a comparison of clicks count on interlanguage links on different platforms - https://phabricator.wikimedia.org/T78351#2340348 (Arrbee)
[08:31:37] Analytics-Kanban, Operations, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2340456 (elukey) I discussed with @ema the inconsistency that we are seeing and we came to the conclusion that this change could be...
[08:38:44] * elukey going afk for a couple of hours
[10:51:29] * elukey back
[10:51:42] I checked Kafka's metrics and we are kinda good
[10:52:10] kafka1013 is getting below 20% for free space available on some partitions
[10:52:18] (disk partitions)
[10:52:46] buuut theoretically today/tomorrow we should see some log purging
[10:53:30] I'll keep watching and in case lower down upload's retention to two days to trigger a manual purge
[11:14:02] Analytics-Kanban, Operations: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2340779 (elukey) @mforns: Sure! I think that we should follow up on the items in the task's description, namely: 1) follow up with upstream and package/test/deploy th...
[11:15:17] Analytics-Kanban: Ease restarting and backfilling of jobs in cluster {hawk} - https://phabricator.wikimedia.org/T115985#2340780 (JAllemandou) @nuria: I wrote a script allowing to restart jobs either based individually on an ID, or globally based on folder-to-name convention. See refinery/bin/refienry-oozie...
[11:34:01] * elukey commuting to the office
[11:55:12] joal: aloha!
[11:55:23] whenever you have a minute we can discuss your results
[11:55:26] elukey: \o
[11:56:25] elukey: batcave?
[11:57:15] sure, give me a minute
[11:57:20] sure :)
[12:46:00] elukey: actually, higher throuput when loading !!!!!
[12:46:14] joal: what??
:/
[12:46:25] elukey: sorry, when compacting load
[12:46:38] elukey: must be related to cached files
[12:47:06] yeah probably
[13:03:58] o/ joal & milimetric
[13:04:00] will join shortly
[13:04:12] bienthi halfak_ !
[13:04:23] Hi halfak_ sorry, I'm also late-ish
[13:06:08] Analytics-Kanban, Operations: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2340918 (mforns) Thanks @elukey!
[14:33:38] elukey: I have a conflict for analyticops meeting later on today
[14:34:03] I trust you'll make good discussion with ottomata on kafka and take good decision (you know my view on it :)
[14:34:58] joal: sure!
[14:35:07] WIPE ALL THE THINGS!
[14:35:08] :D
[14:35:14] :D
[15:06:38] ottomata: o/
[15:06:45] HIIII
[15:18:14] Hey ottomata :)
[15:18:38] hiyaaa
[15:18:46] How are you todayyyy?
[15:19:50] sleeeeppyyy! long bike ride yesterday and my body is tirrreeed. got up early for an acupuncture appointment, which is basically just a nap time, so it made me sleeeeppyyy
[15:19:52] how are you!?
[15:19:59] :)
[15:20:02] Goooood :)
[15:20:22] Happy from a nice weekend, some good wether, and good progress on casandra ;)
[15:20:32] ottomata: i added tons of things to https://etherpad.wikimedia.org/p/analytics-ops so you'll be super sleepy during standup :P
[15:21:14] ottomata: Had a question about caravel: Can I try to change bits?
[15:21:49] ottomata: Actually, looks like I can't save ;)
[15:22:02] :)
[15:22:10] bits?
[15:22:13] ottomata: I'd like to add view_count as a SUM, MIN and MAX metric
[15:22:31] Oh, my modif were actually saved !
[15:22:34] in caravel? or
[15:22:34] haha
[15:22:39] joal you know as much about caravel as me! :)
[15:22:41] i just turned it on
[15:23:13] :)
[15:23:16] elukey: nice :)
[15:23:26] ottomata: I'd like to have view_count instead of COUNT(*)
[15:23:54] joal i think it gets its metrics (dimensions?)
from the dataset definition in druid
[15:26:07] it does seem like the druid support is pretty alpha
[15:26:33] i guess, except that was the original use case
[15:26:35] Analytics-EventLogging, Need-volunteer, Story: User clicks on link to event capsule schema while viewing a schema - https://phabricator.wikimedia.org/T74745#2341229 (Danny_B)
[15:27:08] Analytics-Dashiki, Story: EEVSUser selects ALL wikis - https://phabricator.wikimedia.org/T70478#2341236 (Danny_B)
[15:31:56] joal: https://plus.google.com/hangouts/_/wikimedia.org/wikiscan
[15:32:15] though the others aren't there either
[15:32:40] Analytics-Kanban: Ease restarting and backfilling of jobs in cluster {hawk} - https://phabricator.wikimedia.org/T115985#2341277 (Nuria) Open>Resolved
[15:39:45] Quarry, Labs, Labs-Infrastructure: Long-running Quarry query (querry?) produces strangely incorrect results - https://phabricator.wikimedia.org/T135087#2341321 (chasemp) p:Triage>Normal @yuvipanda any ideas?
[16:27:55] Analytics-Kanban, Operations, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2341533 (Ottomata) > This is a problem in the way we check data integrity rather than in vk itself, so we should fix our calculation...
[16:27:59] milimetric: looks like hangout is having trouble
[16:28:05] Can't connect bakc
[16:29:13] milimetric: please excuse me to both Marti and Akeron :(
[16:29:30] Analytics-Kanban, Operations, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2341538 (elukey) >>! In T136314#2340456, @elukey wrote: > 1) vk is correctly adding the start timestamp to our logs but this trigger...
[16:29:38] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: More solid Eventlogging alarms for raw/validated {oryx} [8 pts] - https://phabricator.wikimedia.org/T116035#2341539 (Ottomata) Resolved>Open Nuria, this was legitimately re-opened. I apparently have some loose...
[16:30:46] you're right joal no worries they understood
[16:30:54] there wasn't much more to talk about
[16:31:11] Akeron was just saying that he thinks these kinds of statistics are very useful for volunteers, motivating and fun
[16:31:25] milimetric: he's a nice guy from his voice :)
[16:31:42] ok, I'll send notes about the call to -internal, joal, and it looks like staff got pushed 30 minutes so let me know when you want to sync up on the schema work
[16:31:54] I'm just gonna grab some lunch until then
[16:31:59] milimetric: k
[16:34:01] milimetric, mforns : batcave on schema when you have time?
[16:34:37] Analytics, Pageviews-API, RESTBase-API: AQS: query multiple articles at the same time - https://phabricator.wikimedia.org/T118508#2341556 (Milimetric) I agree, and there's support for this in the python client I maintain, along with other stuff that we don't want to do on the server like month-level...
[16:34:48] ok, joal, going
[16:35:13] milimetric: Please take your time for lunch first !
[16:36:15] ottomata: kafka delete policy merged, I also executed it on kafka1013 and all looks good
[16:36:28] ok cool, executed it?
[16:36:30] oh puppet
[16:36:30] aye
[16:48:05] milimetric: how was meeting with grant requestor?
[16:48:26] meeting with joseph now about schemas, will send notes after but we can chat in cave if you like
[16:55:13] elukey: , do you mind if i tweak your vk dashboard a little bit? mostly just some aesthetic things
[16:56:26] ottomata: please do :)
[17:00:13] cool, elukey interesting! just noticed deployment-cache-text04 is in these graphs! :)
[17:00:17] dunno how that got in there, its from labs!
[17:12:30] (CR) Ottomata: [C: 1] "One nit, aside from that +1 can merge." (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/290639 (https://phabricator.wikimedia.org/T130123) (owner: Madhuvishy)
[17:13:46] ottomata: Hi - i didn't know if you just put a string in its own line it will eval on its own
[17:14:07] sorta makes sense though
[17:14:13] madhuvishy: i think it will, but you should verify
[17:14:17] yeah
[17:14:20] you worried about shell wildcard?
[17:14:21] okay will do
[17:14:24] yes
[17:14:47] wikimedia/mediawiki-extensions-EventLogging#560 (wmf/1.28.0-wmf.4 - ae29826 : thcipriani): The build has errored.
[17:14:47] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/ae298268f9af
[17:14:47] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/134227514
[17:17:59] ottomata:
[17:18:03] without the eval
[17:18:06] https://www.irccloud.com/pastebin/hsRrmt6f/
[17:19:29] m, weird
[17:19:30] hm
[17:20:07] testing a bit more
[17:20:58] ottomata: oh
[17:21:01] the add works
[17:21:04] the commit doesn't
[17:21:22] because it has a string inside that something goes wrong with
[17:21:52] hm
[17:22:03] it is somehow
[17:22:08] trying to do git add
[17:22:28] because the first word in the commit message is add
[17:22:34] and is treating it as a command
[17:22:39] and trying to add the rest
[17:22:42] hmm, that doesn't sound right
[17:22:42] which are not files
[17:22:43] really?
[17:22:53] it is 'Add'
[17:23:00] if you read the log i pasted
[17:23:00] not 'add'
[17:23:13] the commit message is git commit -m 'Add refinery-source jars for v0.0.30 to artifacts'
[17:23:15] that i werid
[17:23:18] right but,
[17:23:20] Add != add
[17:23:20] if you did
[17:23:22] git Add
[17:23:23] it would error
[17:23:23] i know
[17:23:27] oh, but maybe not on your mac.
[17:23:30] not sure
[17:23:41] yeah, it does
[17:23:42] hm
[17:23:50] madhuvishy: echo each of the commands out
[17:23:56] i did
[17:23:59] oh you are in dry run
[17:24:33] no ottomata I'm trying with review, but I commented out the push part and added some echos
[17:24:37] aye ok
[17:24:51] huh i don't understand the add thing you are seing
[17:24:54] can't see how it would do that
[17:25:02] I have no idea
[17:25:13] if you do
[17:25:16] echo $COMMIT_COMMAND
[17:25:17] but somehow its not treating the commit message as a string
[17:25:17] what does it have?
[17:25:19] what you'd expect?
[17:26:26] https://www.irccloud.com/pastebin/MhBMPAR2/
[17:26:41] this is what is happening
[17:26:50] ottomata: the echo looks right
[17:27:14] if i do eval $COMMIT_COMMAND though, it does the right thing
[17:27:30] oh
[17:27:40] madhuvishy: ha, it isn't interpreting the 'Add'
[17:27:41] sorry
[17:27:41] yeah
[17:27:49] git commit with args will attempt to add them first before committing
[17:27:57] oh like that
[17:28:00] hm, ok, welp, i guess leave the add then!
[17:28:09] leave the eval?
[17:28:13] sorry
[17:28:13] yeah
[17:28:31] okay then i'll leave it in in both lines for consistency?
[17:30:06] a-team: staff on batcave?
[17:30:13] nuria_: joining!
[17:30:22] OO
[17:30:36] (CR) Madhuvishy: Add script for jenkins to commit latest source jars to artifacts (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/290639 (https://phabricator.wikimedia.org/T130123) (owner: Madhuvishy)
[17:36:24] Analytics, Wikipedia-Android-App-Backlog: Investigate recent decline in views and daily users - https://phabricator.wikimedia.org/T132965#2341753 (Tbayer) It does not look like the daily active users (`wmf.mobile_apps_uniques_daily`) have been fixed yet: {F4094958, width=80%} Data source: ```lang=sql...
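Note on the eval exchange above (17:13 to 17:28): a minimal sketch, assuming the script keeps the whole command in one string named `COMMIT_COMMAND`, as the chat suggests; the actual script in the gerrit change may differ. It counts the arguments each invocation style would pass on, which illustrates why a plain `$COMMIT_COMMAND` hands the words of the message to `git commit` as separate arguments (which git then treats as paths), while `eval` keeps the message whole. `count_args` and `commit_cmd` are hypothetical names, and the array form is offered only as a possible alternative to `eval`, not as what the script does.

```shell
#!/usr/bin/env bash
# Assumed shape of the variable discussed in the chat (hypothetical reconstruction).
COMMIT_COMMAND="git commit -m 'Add refinery-source jars for v0.0.30 to artifacts'"

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# 1) Plain expansion: the string is only word-split, the single quotes stay
#    literal, so every word of the message becomes its own argument.
plain_count=$(count_args $COMMIT_COMMAND)

# 2) eval re-parses the string as shell code, so the quotes group the message.
eval_count=$(eval "count_args $COMMIT_COMMAND")

# 3) A bash array keeps each argument intact without needing eval.
commit_cmd=(git commit -m 'Add refinery-source jars for v0.0.30 to artifacts')
array_count=$(count_args "${commit_cmd[@]}")

echo "plain: $plain_count, eval: $eval_count, array: $array_count"
# prints "plain: 10, eval: 4, array: 4"
```

With the array form the command would be run as `"${commit_cmd[@]}"`, which avoids re-parsing the string and the wildcard concern raised at 17:14:20.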
[17:45:54] (CR) Ottomata: [C: 2 V: 2] Add script for jenkins to commit latest source jars to artifacts [analytics/refinery] - https://gerrit.wikimedia.org/r/290639 (https://phabricator.wikimedia.org/T130123) (owner: Madhuvishy)
[17:52:23] madhuvishy: merged yo thang
[17:52:33] ottomata: yup just aw thank youuuu
[17:52:36] saw
[17:52:39] joal: forgot to tell you, kafka1013 auto-purged itself
[17:52:42] all good
[17:52:47] elukey: goooood :)
[17:53:36] urandom: hi! time for a cassandra question?
[17:56:47] elukey: sure
[17:58:34] thanks!
[17:59:19] so we are wondering what kind of failure resistance we will be able to offer with the new AQS cluster
[18:00:00] since we have raid0 for each /srv/cassandra-{a,b} mountpoint, a disk failure will mean a cassandra instance failure
[18:00:29] so one disk failure shouldn't be a big problem since the cluster will be able to move forward (read and writes)
[18:00:53] but in the unlikely event of a second disk failure, there might be some stall IIUC
[18:00:58] both reads and writes
[18:01:47] well, there is always some level of failure that will reduce availability
[18:02:03] so I was wondering if it would be ok to remove a cassandra instance from the ring straight after a disk failure
[18:02:20] to force the cluster to rebalance and restore the replication factor
[18:02:56] well
[18:02:57] this would allow to tolerate another disk failure easily (at least on paper), but I am not sure about the overhead
[18:03:34] you could, yeah, but assuming the plan to fix the array and array and re-add the instance, you'll be double handling the data
[18:04:20] meaning, you'll have to restream data around to get your replication factor back up after removing the failed node, and then, when you add the node to replace it, the bootstrap will require moving all of that data around again
[18:04:48] yep yep, I would use this option only as "red button"
[18:05:15] but I wanted to get your opinion about this idea
[18:05:55] also, dependening on the
amount of data, that red button might not bring you back to "normal" right away
[18:06:08] right, this is also true
[18:10:58] urandom: will double check also what restbase does tomorrow, and probably I'll ask more questions :)
[18:11:07] thanks!
[18:15:09] a-team: logging off! Have a good evening/afternoon :)
[18:15:24] laters!
[18:16:44] Analytics-Kanban: Create separate archiva credentials to be loaded to the Jenkins cred store {hawk} - https://phabricator.wikimedia.org/T132177#2341946 (Ottomata) Hm, I'm having a strange problem with my ops pwstore. I want to save this new Archiva user's creds to the archiva file, but I keep getting: ```...
[19:00:46] milimetric: joal, also to play:
[19:00:53] /home/otto/plyql/bin/plyql -h druid1001.eqiad.wmnet -q 'describe pageviews-hourly;'
[19:01:23] cool, did you have to setup anything extra for plyql to work?
[19:01:41] is pivot or druid responding to that?
[19:01:44] its a node module, so i had to do just like i did for pivot
[19:01:46] druid
[19:01:46] is
[19:01:51] i just npm installed in labs
[19:01:56] and then copied everything manually to stat1002
[19:02:15] oh, ok, but pivot has the interface and talks to druid after it translates the plyql to druid query json
[19:02:16] cool
[19:02:17] thx
[19:02:20] yes
[19:02:44] ottomata: what should I do to help productionize pivot?
[19:04:00] milimetric: not sure. i'm actually kinda looking at that now, seeing if it is not too hard to make a deb package from it...but never done that with npm
[19:04:04] /node
[19:04:27] for node don't we usually just freeze the deps like with aqs?
[19:04:48] aqs-deploy has the results of npm install
[19:05:11] yes that might be possible too
[19:05:29] wanted to see if i could make it into a deb, since it is more of a service like other things, rather than a self developed app
[19:07:36] hm, aqs is a service, but ok, a deb is cool too.
So if you think that's too hard I can setup pivot and pivot-deploy or something (I know we don't like tech names in repo names)
[19:12:53] Analytics, Wikipedia-Android-App-Backlog: Investigate recent decline in views and daily users - https://phabricator.wikimedia.org/T132965#2342225 (Nuria) mmm.. given that daily users relies on app install id and that at least 1 request must have been sent to the regular php endpoint (correct?) as app can...
[19:13:04] milimetric: tech names in repo names are fine, no?
[19:13:14] anyway, ja i think if i can do deb, it would be preferable in this case
[19:13:41] mainly because we aren't in charge of the release process. the node deploy repo thing is (afaik) usually used for things we develop in house
[19:13:48] that's more along the lines of what we deploy
[19:13:57] often
[19:14:17] this is more like hue or something
[19:14:28] where it would be nice to install the thing, puppetize the confs, and have puppet run it
[19:14:36] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: More solid Eventlogging alarms for raw/validated {oryx} [8 pts] - https://phabricator.wikimedia.org/T116035#2342236 (Nuria) Ah, sorry, I was moving things around and must have closed it by mistake. Very sorry.
[19:14:45] ottomata: I played quite a bit with caravel too, it's really nice
[19:14:51] ja?
[19:15:01] ottomata: a bit more complex than pivot, but cool
[19:15:01] it seems a little annoying that it errors for first load on many things
[19:15:12] not true anymore
[19:15:12] its running in debug mode, so that's why you can see those stack traces
[19:15:17] that can be turned off
[19:15:17] k
[19:15:20] no?
[19:15:26] it helped me actually :)
[19:15:35] joal: you should make a dashboard for pageview-hourly! :)
[19:15:36] ottomata: quick batcqave for demo?
[19:15:48] ok!
[19:15:53] you're completely right, thought about it as well :)
[19:30:56] the in-house vs.
third-party distinction makes sense, but when we npm install for aqs it still brings in tons of third party stuff, so it's not completely black and white
[19:31:05] I support trying to make a deb though
[19:36:51] np milimetric, was a short little one :)
[19:36:54] milimetric: aaaaaaa, i dunno
[19:36:58] about deb for pivot
[19:37:00] it is possible
[19:37:04] but would be also very hacky
[19:37:14] since we already do the deploy repo thing, maybe we should just do that
[19:37:32] was just exploring with npm2debian
[19:37:35] to see what it would do
[19:37:38] it does nothing good :)
[19:37:40] :p
[19:37:42] ha
[19:38:56] yeah, I can help make the case for the deploy repo thing if others oppose it
[19:39:54] Analytics-Kanban: Create separate archiva credentials to be loaded to the Jenkins cred store {hawk} - https://phabricator.wikimedia.org/T132177#2342374 (MoritzMuehlenhoff) Giuseppe's key expired, so it's treated as invalid for now. @Joe can you please create a new key and sign it with your old one (or create...
[19:42:36] milimetric: i think it won't be controversial
[19:43:00] if you want and can make a pivot thing deployed with scap, i can do puppetization for it
[19:43:21] kindof a random question ... looking to process the full commoncrawl dumps, looks to be ~22k 1GB files. I'm not sure how to get them off the web and into hdfs though, as web access is blocked from stat1002 (i think there is some proxy but i can't remember what?)
[19:43:36] also ... is there any issue with me dumping 20TB of files into hdfs as long as i deleted them a few days later after processing?
[19:44:35] ebernhardson: https://wikitech.wikimedia.org/wiki/Http_proxy
[19:44:41] and, that should be fine :)
[19:44:52] ottomata: that was too obvious, thanks!
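Note on the proxy question above (19:43 to 19:44): a rough sketch of streaming one remote file through an HTTP proxy straight into HDFS, so nothing has to be staged on local disk. The proxy address, the HDFS target directory and the function name `fetch_to_hdfs` are assumptions for illustration only; the wikitech Http_proxy page linked in the chat is the place to check the real proxy settings. By default the function only prints the pipeline it would run.

```shell
#!/usr/bin/env bash
# Assumed values, not confirmed by the log: proxy address and HDFS target directory.
PROXY="${PROXY:-http://webproxy.eqiad.wmnet:8080}"
HDFS_DIR="${HDFS_DIR:-/user/${USER:-nobody}/commoncrawl}"

# Hypothetical helper: print (dry run, the default) or execute the download
# of one URL into HDFS. "hdfs dfs -put -" reads the file content from stdin.
fetch_to_hdfs() {
  local url="$1" name
  name="${url##*/}"   # file name = last path component of the URL
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "curl -sfL -x $PROXY $url | hdfs dfs -put - $HDFS_DIR/$name"
  else
    curl -sfL -x "$PROXY" "$url" | hdfs dfs -put - "$HDFS_DIR/$name"
  fi
}
```

A list of URLs could then be fed through it with something like `while read -r u; do DRY_RUN=0 fetch_to_hdfs "$u"; done < urls.txt`, where `urls.txt` is again a hypothetical file name.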
[19:48:48] Analytics, Wikipedia-Android-App-Backlog: Investigate recent decline in views and daily users - https://phabricator.wikimedia.org/T132965#2342415 (JAllemandou) @Tbayer: I'm sorry I have been too fast in assuming just the pageview check was incorrect. I have found the culprit and I'm currently sending a p...
[19:49:11] Analytics: Set up a deployment repository for Pivot - https://phabricator.wikimedia.org/T136640#2342416 (Milimetric)
[19:51:55] (PS1) Joal: Correct bug in mobile_apps uniques jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/291977
[19:52:28] bearloga: squinting at http://discovery.wmflabs.org/external/#traffic_summary , that peak in "none (direct)" at the beginning of april looks interesting - any idea what happened there? just one spider getting busy?
[19:52:51] (there's no corresponding increase in https://vital-signs.wmflabs.org/#projects=all/metrics=Pageviews , so i assume it wasn't human traffic)
[19:56:51] nuria_: just saw your comment on the mobile_apps uniques thing
[19:56:58] joal: aham
[19:57:01] nuria_: I submitted a patch (see above)
[19:57:12] joal: have not looked at code yet
[19:58:14] milimetric: can you CR my two small patches in anayltics.wikimedia.org?
[19:58:20] sure
[19:58:34] joal: i see, sorry
[19:59:04] joal: but did you run the query to make sure it returns different resukts?
[19:59:07] *results
[19:59:10] nuria_: I did
[19:59:14] (CR) Milimetric: [C: 2 V: 2] Adding missing icon file [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/291833 (https://phabricator.wikimedia.org/T136217) (owner: Nuria)
[19:59:19] joal: ah!
[19:59:52] joal: ya, i though the app always needed requests to php api but i might be wrong
[20:00:06] (CR) Milimetric: [C: 2 V: 2] Rename 'readers' dashboard to vital-signs [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/291284 (https://phabricator.wikimedia.org/T136426) (owner: Nuria)
[20:00:24] For yesterday: Android ---- 1169020 with new patch, 605002 before
[20:00:57] nuria_: That's the awhole point of the pageview def change for android: using restbase urls instead of apip.php
[20:01:13] nuria_: And now with restbase, no more query params
[20:01:25] joal: but i thought it was only for some content though
[20:01:30] joal: i guess i was wrong
[20:01:52] joal: i thought app needed both apis
[20:02:00] joal: but sounds like it doesn't
[20:02:01] nuria_: makes sense
[20:02:11] nuria_: I actually don't know
[20:02:19] but given the diff in numbers, looks like not :)
[20:02:23] joal: i cc-ed dbrant
[20:02:28] joal: he can clarify
[20:02:32] cool
[20:03:12] HaeB: I'm sorry I didn't double check the mobile_apps uniques backfill before running it
[20:03:32] HaeB: Thanks for being so thorough in checking correctness :)
[20:06:56] joal: thanks for fixing it! yeah if we could have something like automated plausibility checks, or otherwise integrate that into the process, that would be great
[20:07:11] HaeB: Completely agreed !
[20:08:02] HaeB: I usually try to have that integrated, but once in a while one fix goes without (and it always the one that is not correct: )
[20:10:26] (PS1) Milimetric: Update for May meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/291982
[20:10:36] (CR) Milimetric: [C: 2 V: 2] Update for May meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/291982 (owner: Milimetric)
[20:17:13] I'll be afk for a bit, back later
[20:18:14] likewise!
biking home for a bit
[20:19:43] HaeB: no idea; probably a bot as you said
[20:22:44] Analytics, Wikipedia-Android-App-Backlog: Investigate recent decline in views and daily users - https://phabricator.wikimedia.org/T132965#2342531 (Dbrant) @Nuria It's entirely possible for the app to use RestBase entirely (at least for viewing pages). The only instances where the app still uses the php...
[20:28:04] (CR) Nuria: [C: 2 V: 2] Correct bug in mobile_apps uniques jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/291977 (owner: Joal)
[20:34:42] (PS1) Joal: Correct bug in mobile_apps uniques jobs (monthly) [analytics/refinery] - https://gerrit.wikimedia.org/r/291989
[20:34:49] nuria_: forgot about monthly one, obviously --^
[21:30:48] Analytics, Analytics-Kanban, Datasets-Webstatscollector, Pageviews-API, and 2 others: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2342797 (kaldari)
[22:23:16] milimetric: what do we do to pagetitles that have / in them when they are loaded into cassandra?
[22:59:43] Analytics-Kanban: Get pageview dataset ready to be loaded to druid to test querying capabilities - https://phabricator.wikimedia.org/T136330#2343120 (JAllemandou) a:JAllemandou
[23:01:15] madhuvishy: I'm pretty sure we don't do anything, the client is responsible for encoding it, restbase automatically decodes it and when we do the lookup it's an exact match
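Note on the last answer (23:01:15): a small sketch of the client-side step it describes, percent-encoding a title that contains `/` before placing it in the request path, so that the server can decode it back to the exact stored title. The helper `urlencode`, the example title and the URL layout are assumptions for illustration; the sketch only handles ASCII titles and does not send any request.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use as one URL path segment.
# ASCII-only sketch: non-ASCII titles would need byte-wise UTF-8 handling.
urlencode() {
  local s="$1" out="" c hex i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;        # unreserved characters pass through
      *) printf -v hex '%%%02X' "'$c"      # everything else becomes %XX
         out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

title='AC/DC'
encoded=$(urlencode "$title")   # the slash becomes %2F

# Assumed URL layout of a per-article pageview request; the dates are arbitrary.
url="https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/${encoded}/daily/20160501/20160530"
echo "$url"
```

Without the encoding step the `/` in the title would be read as a path separator, so the request would no longer name a single article.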