[00:11:19] Analytics-EventLogging, Analytics-Kanban: Some events not validating for MultimediaViewerNetworkPerformance - https://phabricator.wikimedia.org/T91347#1090782 (Nuria) Any updates on this? [00:21:11] Analytics-EventLogging, Analytics-Kanban: Some events not validating for MultimediaViewerNetworkPerformance - https://phabricator.wikimedia.org/T91347#1090818 (mforns) @Nuria, I reproduced the MultimediaViewerNetworkPerformance:image logs from Beta Labs. However, I did not find any piece of code breaking... [00:36:53] Analytics-EventLogging, Analytics-Kanban: Some events not validating for MultimediaViewerNetworkPerformance - https://phabricator.wikimedia.org/T91347#1090854 (Nuria) Ok, let's talk tomorrow. [00:51:35] Analytics-EventLogging, Popups: Large number of popup events not validating - https://phabricator.wikimedia.org/T91272#1090878 (Nuria) Any updates or ETAs on this issue? [08:33:03] Analytics-Wikistats, I18n, LE-Sprint-82, Patch-For-Review: stats.wikimedia.org - the "User" namespace for Nynorsk is wrong - https://phabricator.wikimedia.org/T89387#1091512 (Arrbee) Open>Resolved a:Arrbee [09:24:09] Analytics-EventLogging, Popups: Large number of popup events not validating - https://phabricator.wikimedia.org/T91272#1091614 (Prtksxna) I have [[ https://www.mediawiki.org/wiki/Extension:EventLogging/Guide#QA | Ori's script ]] in my common.js but I can't seem to reproduce these errors. {T86378} tracked... [09:31:02] Analytics-EventLogging, Popups: Large number of popup events not validating - https://phabricator.wikimedia.org/T91272#1091636 (Prtksxna) >>! In T91272#1082224, @Nuria wrote: > Also, wiki is coming as null in many wikis, which again produces a huge number of events that do not validate > Failed validatin... [09:33:05] Analytics-Wikistats, Easy, JavaScript: broken javascript on SquidReportPageViewsPerCountryBreakdownHuge - https://phabricator.wikimedia.org/T43663#1091643 (exploreshaifali) I could not find "squids/SquidReportPageViewsPerCountryBreakdownHuge.html" file inside git repo https://github.com/wikimedia/analy... [14:49:46] (PS1) KartikMistry: Add ky and pa to Beta feature dashboard [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/194521 [14:52:02] morning kevinator, ggellerman :) [14:52:23] good morning! [14:52:47] Analytics-Kanban: Failure Types by User Type - https://phabricator.wikimedia.org/T91123#1092130 (mforns) I wrote a query for this and got to the following: ``` +-------------+------------------------+------------+ | usertype | failuretype | percentage | +-------------+------------------------+-... [14:54:54] (PS2) KartikMistry: Add ky and pa to Beta feature dashboard [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/194521 [14:58:52] +Ironholds good morning! [15:09:15] kevinator, where is everyone in philly that they're not online? ;p [15:15:26] Analytics: Adhoc Analysis: Guided Tour activations - https://phabricator.wikimedia.org/T90942#1092175 (ggellerman) [15:16:49] they were just getting coffee before our standup :-) [15:17:13] it’s snowing hard outside! [15:37:16] Hey guys ! [15:37:40] Managed to get home before the baby ;) [15:38:46] yay! [15:38:50] boy or girl! [15:38:54] Congratulations?" [15:38:58] Congratulations! 
[15:39:03] Analytics-EventLogging, Popups: Large number of popup events not validating - https://phabricator.wikimedia.org/T91272#1092244 (Nuria) @Prtksxna You can test event validation in vagrant with the test server: http://www.mediawiki.org/wiki/Extension:EventLogging/Guide#Installing_the_EventLogging_devserver... [15:40:18] Don't know yet, still not here ;) [15:41:02] How is it in Philly's Hackathon ? [15:48:58] joal, like in the movies!!!! [15:50:09] nuria ? [15:50:17] joal: yessir [15:51:19] What is like in the movies :-/ [15:51:34] My brain is kinda very slow ... [15:52:21] :) you flying back for your baby [15:52:40] https://www.youtube.com/watch?v=Y3O4IOaKZqU [15:53:23] joal: you going across the world to see your baby being born [15:53:36] Okkkkkk ! [15:53:41] joal: only that if it was a movie he will be showing up as soon as you step foot in the hospital [15:53:44] Pfff really slow today ! [15:54:18] True, Melissa might even get back home if it doesn't go faster [15:54:33] I am happy to be part of it :) [15:54:42] Thanks again Dan for the fast reaction ! [15:58:51] my pleasure, I'm happy everything worked out :) [15:59:21] Yeah, I have been almost 24h in transit, and now feel kind of tired :) [15:59:24] nuria, I'm back [15:59:33] hey joal! [15:59:44] Hi Marcel ! [15:59:53] how is everything? [16:00:11] Good ! I'm back home, the baby's not yet here [16:00:22] ok, that's good [16:00:39] So Yeah, my wife feels much better [16:00:50] and me too actually ! [16:01:02] nice, good you could get back quickly [16:01:24] Yes, really nice [16:02:15] nuria: you wanna look at the monthly apps stuff, ja? [16:02:22] you want to do that before I deploy? [16:08:04] nuria, yt? [16:08:22] ottomata: yes, was just doing that [16:08:31] ottomata: give me 2 mins [16:08:47] let me know if you wanna talk nuria [16:09:07] joal: sorry i did not get to review this yesterday [16:09:16] np :) [16:10:28] joal, ottomata : so the job has a starting date of 2015-01-01T00:00Z [16:10:31] Analytics, Fundraising Tech Backlog, Wikimedia-Fundraising, Fundraising Sprint Enya: Strategy banner impressions - https://phabricator.wikimedia.org/T90635#1092319 (awight) Open>Resolved [16:10:52] As usual, you can tweak with -Dstart_time [16:10:52] joal, ottomata but refined tables do not have data for that long do they? [16:11:16] I think there is refined data for mobile and text since 2015-01-01 [16:11:19] joal: right but oozie will try to run for jan for which ( i might be wrong) [16:11:27] there is no data [16:11:46] nuria: refined tables do still have all that data [16:11:54] i haven't fixed my deletion scripts to work with that data format yet [16:11:58] ottomata: from jan 1st? 
[16:12:08] actually, i want to do a review of the hive partitions layouts of both tables [16:12:10] and maybe redo them [16:12:12] so i am waiting on that [16:12:13] yes [16:12:19] i will delete manually when we get close to 90 days [16:12:25] but ja, i think i will not run since jan 1 anyway [16:12:26] just feb [16:12:28] maybe run feb [16:12:33] and if we want to run jan's manually we can [16:12:46] but, ja, the committed start_date doesn't matter [16:12:49] i will override that anyway [16:13:07] nuria, i'd appreciate if you mostly looked at the query, as it is fancy schmancy and i want to make sure you are cool with it [16:13:11] it seems cool to me [16:13:13] At least from a partition perspective, there seems to be data since january in refined webrequest [16:13:31] yes, the data is there in the refined table [16:13:40] it is being auto dropped from the raw, but not refined yet [16:13:46] right [16:14:20] Analytics, Fundraising Tech Backlog, Wikimedia-Fundraising, Fundraising Sprint Enya: Strategy banner impressions - https://phabricator.wikimedia.org/T90635#1092328 (awight) Results have been delivered. I balked at making them public, because some auth tokens are slipping through the url parsing a... [16:14:23] joal: ok, sounds good, ottomata : looking at query, let me look at my notes [16:17:34] joal, ottomata : looks good, did we do a test run over real data to see how long it takes? [16:20:34] Nope, didn't do that [16:20:38] only fake [16:21:26] where is the dataviz???? [16:21:52] what dataviz? [16:22:56] andrew's stuff from [16:22:59] yday [16:22:59] ottomata: ok, let's deploy it and see how long it takes, i am guessing hours [16:23:06] haha [16:23:20] ok will turn it on, it doesn't look that coooooool but the backend stuff is awesooome [16:23:23] hang on... [16:23:46] It might even take one or two days .... [16:24:34] Still happy with SparkStreaming ottomata ? [16:25:19] yes [16:25:22] ok [16:25:26] tnegrin: (or whoever) [16:25:34] ssh -N bast1001.wikimedia.org -L 3000:stat1002.eqiad.wmnet:3000 [16:25:34] then [16:25:38] http://localhost:3000 [16:26:12] milimetric: feature requests! [16:26:15] ok -- it's up now -- what are the terms in the cloud? [16:26:22] currently showing [16:26:31] yah [16:26:32] top 10 pageviews from all wikis in the last 60 seconds [16:26:37] updated every 5 seconds [16:26:52] momofuku_ando is the google doodle! [16:27:34] hella cool [16:27:45] guess the spark stuff is working? [16:27:56] ja! i mean, it is fragile because i dunno what the heck i'm doing [16:28:02] pretty sure it will die after a while [16:28:12] but ja! it is awesome [16:28:39] milimetric: ja feature request: [16:28:39] - strip underscores from page titles in display [16:28:39] - hover over titles to see pageview count numbers [16:28:48] oh, i'm going to normalize those numbers into a rate...i'll send you thos [16:28:48] those [16:28:55] are they all mobile on purpose? [16:29:03] hmm, i thought i was doing both [16:29:10] they're not all mobile [16:29:12] are they? [16:29:21] momofuku isn't mobile [16:29:43] uh oh, something is unhappy [16:29:44] heheh [16:29:50] ok turning it off, back to working on it! 
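The demo described above (top 10 pageviews over the last 60 seconds, refreshed every 5 seconds) is a textbook Spark Streaming sliding-window aggregation. Below is a minimal PySpark sketch of that pattern, not the actual job running on stat1002: the ZooKeeper quorum, topic name and the parse_title() helper are placeholders.

```
# Sketch of a sliding-window "top pages" count in the spirit of the demo
# above. Not the production job; the ZooKeeper quorum, topic name and
# parse_title() are placeholders.
from operator import add, sub

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils


def parse_title(log_line):
    # Hypothetical parser: pull the page title out of a webrequest log line.
    return log_line.split('\t')[8]


def show_top(rdd):
    # Print the 10 most viewed titles in the current window.
    for title, count in rdd.takeOrdered(10, key=lambda kv: -kv[1]):
        print('%7d  %s' % (count, title.replace('_', ' ')))


sc = SparkContext(appName='top-pageviews-demo')
ssc = StreamingContext(sc, 5)                      # 5-second batches
ssc.checkpoint('hdfs:///tmp/top-pageviews-demo')   # required for windowed ops

lines = KafkaUtils.createStream(
    ssc, 'zk1:2181', 'top-pageviews-demo', {'webrequest_text': 1}
).map(lambda kv: kv[1])

# Count titles over a 60-second window that slides every 5 seconds.
counts = (lines.map(lambda l: (parse_title(l), 1))
               .reduceByKeyAndWindow(add, sub, 60, 5))
counts.foreachRDD(show_top)

ssc.start()
ssc.awaitTermination()
```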
[16:30:16] actually, first will deploy [16:30:17] ok [16:30:51] yeah, i've a version that does all that but i'm working on the scaling and layout [16:31:00] it'll look better soonish [16:31:02] cool [16:31:16] milimetric: then we should see if we can combine that with the timeseries graph [16:31:19] of all pagecounts [16:32:51] (CR) Nuria: "Looks good, let's make sure to run it on cluster for 1 day to see how long is the query taking. Are we going to re-run jobs to populate th" [analytics/refinery] - https://gerrit.wikimedia.org/r/194400 (owner: Joal) [16:33:37] oh joal, are you working? [16:33:40] do you want to deploy? [16:33:45] nuria, but zsub looks at the output of the forwarder, no? [16:34:05] zsub just connects to a zeromq , whatever you tell it [16:34:13] ottomata: I am currently [16:34:29] do you want to? you can say no :) [16:34:30] mforns: you can look with zsub at any of the streams [16:34:42] i'm happy to do it [16:34:48] nuria, sure, so in zeromq is where the forwarder output is [16:34:48] I'll go for it, but still need to take notes on how you do it [16:34:51] sure [16:34:54] ya will guide ya [16:35:01] :) [16:35:01] so, step 1. deploy refinery [16:35:02] so [16:35:05] ssh tin.eqiad.wmnet [16:35:13] cd /srv/deployment/analytics/refinery [16:35:21] git pull [16:35:23] git deploy start [16:35:25] git deploy sync [16:35:36] go ahead and do that, lemme know how it goes [16:36:37] Analytics-Cluster, Analytics-Kanban, Easy: Mobile Apps PM has monthly report from oozie about apps uniques [8 pts] - https://phabricator.wikimedia.org/T88308#1092432 (Ottomata) PSSH this aint resolved! Almost though! :) [16:36:40] mforns: depends on the port, 8421 i think is the input port [16:36:51] nuria, here: http://pastebin.com/9d6a9yDe are two examples I took from zsub localhost:8422, the output port of the forwarder [16:37:06] Analytics, MediaWiki-Vagrant: role::hadoop will not provision on Ubuntu 14.04 (MediaWiki-Vagrant default) - https://phabricator.wikimedia.org/T70302#1092433 (Ottomata) Open>Resolved [16:37:18] nuria, 8421 is the port for server-side events [16:37:43] mforns: do input & output look the same for the forwarder? [16:37:47] qchris's chart says so: https://wikitech.wikimedia.org/w/images/d/d6/Eventlogging-backend.svg [16:38:09] ottomata: Will do, got a crash [16:38:18] I'm not sure the input of the forwarder is in zeromq? [16:38:51] mforns: I think is udp [16:38:58] ottomata: What does tin stands for ? [16:39:18] mforns: but client side events are coming to 8422 [16:39:38] nuria, yes, but I think they are the forwarder output [16:39:48] and they are already truncated [16:40:12] mforns: batcave? [16:40:15] ok [16:41:06] So, 'git deploy sync' tells me 'Deployment Finish' [16:41:12] I guess, I have deployed :) [16:43:52] tin is just a name [16:43:58] k [16:43:59] misc servers in eqiad are named after elements [16:44:00] hehe [16:44:06] cool, now log into stat1002.eqiad.wmnet [16:44:10] done [16:44:11] cd /srv/deployment/analytics/refinery [16:44:16] hdfs not updated yet ! [16:44:17] git log, make sure it is at the comit you want [16:44:22] yup, then [16:44:35] if it looks good [16:44:35] then [16:44:36] sudo -u hdfs bin/refinery-deploy-to-hdfs --no-dry-run --verbose [16:45:02] Ok [16:45:17] Since there is no change in others jobs, will we need to restart them ? [16:45:35] nope [16:46:35] tnegrin: ! I was not supposed to show you yet [16:46:41] apparently it is a secret [16:46:49] ottomata: could restart it ? 
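For reference, zsub in the discussion above is essentially a ZeroMQ SUB socket with an empty subscription. A minimal pyzmq equivalent (a sketch, assuming pyzmq is installed) that can be pointed at either endpoint mentioned, 8421 for server-side events or 8422 for the forwarder's client-side output:

```
# zsub-style subscriber (sketch, assumes pyzmq): connect to an EventLogging
# ZeroMQ endpoint and print every message that comes through.
import sys

import zmq

endpoint = sys.argv[1] if len(sys.argv) > 1 else 'tcp://localhost:8422'

ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.connect(endpoint)
sock.setsockopt_string(zmq.SUBSCRIBE, u'')  # empty prefix = subscribe to all

while True:
    print(sock.recv_string())
```

Run it as, for example, `python zsub.py tcp://localhost:8422`.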
[16:46:50] the stuff dan has already looks better [16:46:55] didn't manage to see it [16:46:56] joal: ? [16:47:03] restart what? [16:47:04] what dataviz? [16:47:08] yup [16:47:08] yes [16:47:14] oh [16:47:20] joal: maybe when dan tells me it is ok! [16:47:21] haha [16:47:37] np :) [16:47:52] let me know when running [16:49:20] ottomata: Starting the job now for february ? [16:49:43] sure, do it! [16:49:43] :) [16:50:02] oh, joal [16:50:08] you know to start it in essential queue? [16:50:19] -Dqueue=essential ? [16:50:45] yup [16:50:47] k [16:50:50] also do sudo -u hdfs when you submit [16:50:53] (CR) KartikMistry: "recheck" [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/194521 (owner: KartikMistry) [16:50:54] cause only hdfs is allowed to submit to that queue [16:51:05] I go for it then, with start_date and queue special params [16:51:08] doo it! [16:51:13] :D [16:51:18] oh [16:51:20] i also do this [16:51:26] -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/2015* | tail -n 1 | awk '{print $NF}') \ [16:51:28] if you want to as well [16:51:54] oh your properties doesn't have that [16:51:56] then use oozie dir [16:51:57] um [16:52:05] -Doozie_directory=$(hdfs dfs -ls -d /wmf/refinery/2015* | tail -n 1 | awk '{print $NF}')/oozie \ [16:54:12] (CR) KartikMistry: [C: 2] "This need to be sync on stat1003 (or whatever name is :)). Dan, please sync this :)" [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/194521 (owner: KartikMistry) [16:54:31] (Merged) jenkins-bot: Add ky and pa to Beta feature dashboard [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/194521 (owner: KartikMistry) [16:56:11] ottomata: to lauch the job, I use the properties file in /srv/deployment/analytics/refinery/oozie/mobile_apps/uniques/monthly ? [16:56:57] yes [16:56:59] exactly [16:57:04] cool [16:58:15] Ready ? [16:58:27] Go ! [16:58:40] nuria, I can't get netcat to output anything for 8422... :\ [16:59:04] I'm using: netcat -u localhost 8422 [16:59:58] mforns: mmm.. there are couple versions of nc i think, you might need to try to stop the forwarder in labs and listen on that port yourself [17:00:10] nuria, ok [17:00:23] mforns: is it localhost though? [17:00:25] coool! [17:00:26] go! [17:00:48] nuria, not sure [17:01:28] mforns: [17:01:34] nuria, could it be vanadium? [17:01:49] mforns: do not try this on vanadium please [17:01:54] mforns: beta labs [17:02:06] mforns: that is what is there for [17:02:44] sure [17:04:12] https://www.irccloud.com/pastebin/gxRjPuap [17:04:32] nuria, the varnish instance is in http://bits.beta.wmflabs.org/? [17:05:03] mforns: yes, https://wikitech.wikimedia.org/wiki/EventLogging/Testing/BetaLabs#How_to_log_an_event_to_beta_labs_directly [17:07:17] ok, I got it [17:08:15] Well job launched ... 
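Since netcat is being uncooperative here, the alternative nuria suggests (stop the forwarder in labs and listen on the port yourself) is only a few lines with the standard library. A throwaway sketch, with the port taken from the 8422 mentioned above; adjust it to whatever the beta labs forwarder actually reads:

```
# Throwaway UDP listener (sketch): stop the forwarder on the beta labs EL
# host, bind its input port yourself and dump each datagram as it arrives.
import socket

PORT = 8422  # assumption: the port the forwarder normally listens on

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('0.0.0.0', PORT))

while True:
    data, addr = sock.recvfrom(65535)  # one datagram per call
    print('%s  %d bytes  %s'
          % (addr[0], len(data), data.decode('utf-8', 'replace').rstrip()))
```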
[17:08:22] the log comes already truncated: http://pastebin.com/zYkrr9gn [17:08:26] nuria, ^ [17:08:26] mforns: you can 1) add another listener on udp socket 2) modify forwarder locally and have it output what is listening to 3) turn of forwarder and make your own python listener using python "socket" module [17:08:37] But I can't really track it using tunneling [17:08:47] So we'll see how long it takes [17:09:01] I'd have liked to check how many mappers [17:09:10] nuria, netcat works now, I was just pointing to the wrong ip [17:09:19] the logs come truncated: http://pastebin.com/zYkrr9gn [17:09:21] mforns: but wait where is the other info varnishncsa is adding (like ua, host) [17:09:40] joal: ok [17:10:10] mforns: that should come on the udp stream too, it was there on your prior events right? [17:10:16] yes [17:10:39] and some of the logs from netcat do have it, but others don't [17:10:59] For info nuria: with our setting of 2 parallel coordinators, the one for march is already there, waiting for last parition :) [17:11:21] nuria, I have to leave now... [17:11:24] joal: nice [17:11:36] mforns: k talk to you later [17:11:39] if it's ok for you, I'll continue in 2 hours [17:11:46] mforns: of course! [17:11:52] ok, see you! [17:12:28] joal: where was the change to have 2 parallel coordinators? [17:14:46] nuria: in coordinator.xml : 2 [17:15:22] joal: ahhhh [17:15:38] :) [17:16:01] joal, feb is running? [17:16:27] So, I expressed myself bad : not 2 coordinators, but two coordinator actions in parallel (in our case, workflow) [17:16:41] yup ottomata [17:16:55] But in the wrong queue :( [17:17:22] doh, hm, you might want to kill it? it might be preempted! [17:17:49] hmmmm true [17:17:53] I'll do that [17:20:41] ottomata: Me like application_1424966181866_15378 :D [17:21:22] Job relaunched, I had messed up the queue param name [17:21:26] Now good [17:23:36] hha, cool [17:23:47] ja am currently trying to better parallelize the input dstreams [17:23:57] 1 per kafka topic partition [17:24:04] Makes sense [17:24:32] Maybe even a dedicated multiplexing, depending on flow density [17:25:27] not working yet though, it look slike it is, but no data really coming in. [17:25:31] i see more consumers though [17:25:31] hm [17:26:05] nuria: I checked, for feb mobile monthly run, 147 mappers, 1 reducer [17:26:51] for the first mapred job [17:30:11] Nononon [17:30:41] nuria: Made a mistake --> 23291 mappers :) [17:30:46] Analytics-EventLogging, Popups: Large number of popup events not validating - https://phabricator.wikimedia.org/T91272#1092675 (Nuria) And ... I do not see the wiki error anymore, so if we could fix the duration it will be great. In case it helps, failures and user agents for about a day: 35 Mozilla/5.... [17:31:16] joal: is that a normal number? [17:31:52] Sounds too much [17:33:39] Arch, maybe not [17:34:07] I need to double how many files per partition for mobile and text usually [17:34:39] cause for feb, we read 24*28*2 = 1344 partitions [17:35:09] Makes like 17 files in average for mobile/text [17:36:42] And since it's parquet format, it's difficult to judge as well how many files need to be read per partition [17:36:53] We don't use that many columns really [17:43:00] nuria: what do you think for mappers [17:43:32] joal: i have no idea, in my imaginary hadoop there was 1 mapper per partition ... 
but I totally made that up [17:43:48] joal: "sounded" sensical [17:44:01] Well, there is usually one mapper per block [17:44:08] not even file [17:44:26] So let's say we are good with that :) [17:47:14] i think that is a normal sounding amount of mappers for a month [17:56:05] wikimedia/mediawiki-extensions-EventLogging#362 (wmf/1.25wmf20 - a2bf69e : jenkins-bot): The build passed. [17:56:05] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/wmf/1.25wmf20 [17:56:05] Build details : http://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/53224753 [18:19:08] ori: yt? [18:48:22] ok, joal, still WIP [18:48:25] check it out now [18:48:32] ssh -N bast1001.wikimedia.org -L 3000:stat1002.eqiad.wmnet:3000 [18:48:39] its a 60 second window [18:48:45] not 10 minutes like the title says [18:48:45] atm [18:48:51] i'm going to leave this running through lunch [18:59:26] ottomata: nothing for me ... just title ... [18:59:33] In chrome, weird [19:10:48] Wow, not even that bad to compute mobile_apps_uniques_monthly over a month : 1h :) [19:43:25] nice! [19:43:26] its done? [19:43:33] Sure :) [19:43:47] You've seen my note about dataviz ? [19:43:55] Not working for me ;-) [19:44:12] hmmm [19:44:27] yeah must have gotten laggy, i haven't worked much on figuring out how to make sure it can keep running [19:45:02] np :) [19:45:35] You ok to resolve the task now ;)? [19:45:42] oh yeah! [19:45:43] :) [19:45:44] do it. [19:45:45] hehe [19:45:47] uhuhu [19:45:54] go ahead and run jan if you want [19:46:01] you could just do it via query, or via oozie by setting stop_time too [19:46:39] Yeah, I'll go oozie style, but now is dining time, so either later or tomorrow :) [19:48:00] Ironholds: can we turn off any crons that might be publishing data to: http://datasets.wikimedia.org/aggregate-datasets/apps/ [19:48:09] nuria, sure [19:48:26] already did, I thought? [19:48:48] Ironholds: i wasn't sure ... [19:49:05] cool [19:49:10] well, the last file update is January, sooo.. ;p [19:49:15] I've checked; they were commented out [19:49:31] joal, ottomata, Ironholds : just checked numbers for mobile uniques for february [19:49:49] 2015 2 Android 5924045 [19:49:49] 2015 2 iOS 74465 [19:49:56] joal, nice work! [19:50:40] cool [19:51:25] hmm, yeah, something is not working with the way I just parallelized it joal, will try to figure that out... [19:51:39] joal: will send an e-mail to dan with cc to analytics, numbers for Android are within 20% of our rough estimate (expected), numbers for iOS are a lot lower [19:52:20] ottomata: not working as in "less performant" ? [19:52:31] joal: The time (1h) is awesome [19:55:32] nuria, naw, not working as in, sometimes some of the parallel receivers don't consume from kafka [19:55:38] not always though, sometimes it works fine [19:55:40] dunno. [19:56:23] kevinator: https://edit-analysis.wmflabs.org/adhoc.html#ve-success-rate-by-user-type.tsv [20:00:25] joal: its still weird, but try it now [20:01:49] also not done, going to add a new level of fancy i think soon [20:16:37] ottomata: just looked at unique "daily" data and there is no data since 02/24 [20:17:28] Analytics-EventLogging, Analytics-Kanban: Some events not validating for MultimediaViewerNetworkPerformance - https://phabricator.wikimedia.org/T91347#1093345 (mforns) I've tested again in Beta Labs, this time listening to the udp logs coming from varnish, the logs I sent from the browser came in the foll... [20:18:35] ottomata: did you turn those jobs off on purpose? 
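The receiver parallelization being debugged above (one input DStream per Kafka topic partition, some of which occasionally stop consuming) matches the standard receiver-based pattern: create several createStream receivers in the same consumer group and union them. A PySpark sketch of the idea follows; the real job is Scala, and the topic, receiver count and ZooKeeper quorum here are placeholders.

```
# Sketch of "one receiver per topic partition, then union" (illustrative
# only; the actual job is Scala, and these names are placeholders).
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName='parallel-kafka-receivers')
ssc = StreamingContext(sc, 5)

ZK = 'zk1:2181'
GROUP = 'pageview-demo'
TOPIC = 'webrequest_text'
NUM_RECEIVERS = 12  # assumption: one per partition of the topic

# All receivers join the same consumer group, so the high-level consumer
# spreads the topic's partitions across them instead of duplicating data.
streams = [KafkaUtils.createStream(ssc, ZK, GROUP, {TOPIC: 1})
           for _ in range(NUM_RECEIVERS)]

# Union them back into a single DStream for the downstream computation.
unified = ssc.union(*streams).map(lambda kv: kv[1])
unified.count().pprint()

ssc.start()
ssc.awaitTermination()
```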
[20:18:38] nuria, https://phabricator.wikimedia.org/T91347 I added some comments [20:19:27] mforns: and what do you see when you look at your browser console when sensing those events? [20:19:44] nuria, the url is well formed [20:19:45] mforns: in the network panel [20:20:03] nuria, and the response is 204 - no content [20:20:22] example: http://bits.beta.wmflabs.org/event.gif?%7B%22event%22%3A%7B%22type%22%3A%22image%22%2C%22contentHost%22%3A%22en.wikipedia.org%22%2C%22isHttps%22%3Atrue%2C%22total%22%3A583%2C%22urlHost%22%3A%22upload.wikimedia.org%22%2C%22status%22%3A200%2C%22XCache%22%3A%22cp1064%20hit%20(25)%2C%20cp3016%20hit%20(3)%2C%20cp3010%20frontend%20miss%20(0)%22%2C%22varnish1%22%3A%22cp1064%22%2C%22varnish1hits%22%3A25%2C%22varnish2% [20:20:22] 22%3A%22cp3016%22%2C%22varnish2hits%22%3A3%2C%22varnish3%22%3A%22cp3010%22%2C%22varnish3hits%22%3A0%2C%22XVarnish%22%3A%22598562358%20567337670%2C%201059408435%201051057839%2C%203889109054%22%2C%22contentLength%22%3A32166%2C%22age%22%3A26094%2C%22timestamp%22%3A1425572831%2C%22lastModified%22%3A1383188919%2C%22redirect%22%3A0%2C%22dns%22%3A67%2C%22tcp%22%3A255%2C%22request%22%3A92%2C%22response%22%3A168%2C%22cache%22%3 [20:20:23] A1%2C%22uploadTimestamp%22%3A%2220050406000000%22%2C%22imageWidth%22%3A320%2C%22country%22%3A%22BR%22%7D%2C%22revision%22%3A11030254%2C%22schema%22%3A%22MultimediaViewerNetworkPerformance%22%2C%22webHost%22%3A%22en.wikipedia.org%22%2C%22wiki%22%3A%22enwiki%22%7D; [20:20:43] oops, too long [20:20:51] mforns: well, response as long as you hit event.gif will always be 204 [20:21:02] yes [20:21:44] nuria, from the browser side, it seems fine [20:22:04] mforns: you are sure the url (not in js console) but network tab [20:22:09] is being sent complete? [20:22:51] nuria, i did not [20:23:36] ottomata: then .. there is something going on as last daily job file is 2015-02-24 [20:24:46] aye [20:24:54] k will look into it nuria, is it urgent? [20:25:22] nuria, http://ibin.co/1toOtwwaGL5s [20:25:41] ottomata: well, if we are going to give the mobile team the results for monthly unqiues on february we need to make sure the 24 to 28 is included [20:26:02] k [20:26:11] deadling? meaning: can I fix it next week :D [20:26:15] deadline* [20:26:44] ottomata: I don't know what you just did, but it didn't work, and then worked ! [20:26:54] ottomata: their deadline for jan and feb data is march 10th (to present to execs) [20:27:04] that is next Tuesday [20:27:16] so .. ahem ... kind of tight [20:27:28] nuria, ottomata : Maybe there is an error with iOS regexp for extraction? [20:27:52] joal: i do not think so, just checked uas for 1 hour [20:28:12] joal: and all IOS urls have IPhone [20:28:48] joal: it could be that uniques for IOS are now coming in other ways (via x-analytics) [20:28:58] might [20:29:34] About your problem for daily being late, if it's calm babywise tomorrow I'll be able to test and deploy, [20:29:47] joal! i stopped it! [20:29:49] mforns: did you looked at varnishncsa logging length limits? 
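The flooded paste above is simply the client-side transport at work: the whole event capsule travels as percent-encoded JSON in the query string of event.gif, and the 204 response carries no body. A small decoder (sketch only) makes such pastes readable and fails loudly on exactly the kind of truncated line this task is about:

```
# Decode an event.gif URL back into its JSON capsule (sketch). A truncated
# query string raises ValueError from json.loads, which is essentially what
# the processor runs into on cut-off lines.
import json
try:
    from urllib.parse import unquote, urlsplit  # Python 3
except ImportError:
    from urllib import unquote                  # Python 2
    from urlparse import urlsplit


def decode_event(url):
    query = urlsplit(url).query.rstrip(';')
    return json.loads(unquote(query))


if __name__ == '__main__':
    url = ('http://bits.beta.wmflabs.org/event.gif?'
           '%7B%22schema%22%3A%22MultimediaViewerNetworkPerformance%22%2C'
           '%22wiki%22%3A%22enwiki%22%2C'
           '%22event%22%3A%7B%22type%22%3A%22image%22%7D%7D;')
    print(json.dumps(decode_event(url), indent=2))
```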
[20:29:52] i restart it when i make changes [20:30:14] nuria, looking [20:30:17] Riiight ottomata [20:30:20] mforns: ok [20:30:52] nuria: the perf is I think good thanks to parquet :) [20:31:30] joal: it is VERY Good, it used to take over a day [20:32:13] The power of columnar format for filtering/aggregation :) [20:33:05] Analytics-EventLogging, Analytics-Kanban: Some events not validating for MultimediaViewerNetworkPerformance - https://phabricator.wikimedia.org/T91347#1093404 (Nuria) Things to check would be: - varnishncsa url logging limits: https://www.varnish-cache.org/docs/3.0/reference/varnishncsa.html - udp packe... [20:33:49] Guys, time-to-sleep for me :) [20:34:06] See you tomorrow with some new uniques ;) [20:34:35] BTW nuria, if you find the reason for low number of iOS, I'm interested ! [20:35:02] joal: i think that would be Ironholds more than me [20:35:16] nuh-uh [20:35:16] Ok understood [20:35:24] I work for The Great Oz now ;p [20:35:38] geeks prefer wizards :) [20:36:14] ottomata: If you wish, maybe we'll find somme time to talk about parallelism in streams tomorrow [20:41:05] Ironholds: just cc you on e-mail. We might need to update queries (easyly done) cc joal|night [20:41:42] nuria, cool [20:41:48] note that there's a UDF for extracting XFF fields [20:42:07] ("there's a UDF for that" is the new "there's an App for that") [20:44:07] Ironholds: I know, but i am 90% sure those changes are not deployed to IOS [20:44:20] Ironholds: just sent e-mail and dan can get back to us [20:44:50] sure! [20:44:55] I mean, you can always do a CASE WHEN check [20:45:10] the UDF consistently returns NULL if there's no [param] entry [20:46:07] Ironholds: coding wise yes, but what i want to know is if we need to change those queries now [20:46:44] *nods* [20:46:44] Ironholds: also if possible when those changes are deployed to users apps team should let us know via phab ticket or something [20:46:48] indeed [21:04:56] milimetric: its going! [21:05:01] nuria, couldn't find anything on varnishncsa log size limits [21:05:26] ah the page counts are not normalized to60 seconds [21:05:33] but the total is [21:05:44] I found something on linux udp receive buffer size, but I checked and the size in that instance should be enough [21:06:47] nuria, maybe it is a problem of netcat... I'll try to modify the forwarder and output its input [21:11:20] mforns: I see that you asked for the (for sure by now outdated) EventLogging diagram yesterday. [21:11:23] It is at: [21:11:30] https://wikitech.wikimedia.org/w/images/d/d6/Eventlogging-backend.svg [21:11:38] hi qchris! [21:11:44] Hi :-) [21:12:00] yes, nuria passed it to me in the stand-up meeting today [21:12:10] Ah. ok. [21:12:16] I had lost it inside my bookmarks somehow [21:12:26] but thanks anyway! [21:12:48] Sorry for being too late :-/ [21:12:56] np [21:13:13] my fault [21:13:48] Btw ... I have the source file (dia) for the svg but I am not sure where to put it (in case someone wants to update/fix it) [21:14:10] So if you want to mess with it ... either let me know and I'll email it to you, or tell me where I should upload it. [21:17:04] qchris, if you want you can send it to me and next stand-up we'll discuss where to put it [21:17:50] and we can upload it there [21:21:57] mforns: ok, I would ask ops on varnishncsa log limits [21:22:09] ok [21:22:42] mforns: for url size [21:23:28] although nuria, I found one log that was considerably shorter and had the same problem... 
[21:23:29] hola qchris [21:23:48] mforns: so it might be getting chopped off due to some character [21:23:56] I see [21:33:27] mforns: You've got mail. [21:33:43] qchris, seen it [21:33:48] thanks! [21:33:51] yw [21:36:16] mforns: the urls you pasted like this one [21:36:41] ?%7B%22event%22%3A%7B%22xhrSupported%22%3Atrue%2C%22xdomainSupported%22%3Afalse%2C%22imgAttributeSupported%22%3Atrue%2C%22scriptAttributeSupported%22%3Atrue%2C%22scriptLoaded%22%3Atrue%2C%22samplingFactor%22%3A1%2C%22isHttps%22%3Afalse%2C%22isAnon%22%3Atrue%2C%22country%22%3A%22US%22%7D%2C%22revision%22%3A10884476%2C%22schema%22%3A%22ImageMetricsCorsSupport% [21:36:41] 22%2C%22webHost%22%3A%22en.wiki?%7B%22event%22%3A%7B%22type%22%3A%22image%22%2C%22isHttps%22%3Atrue%2C%22total%22%3A583%2C%22urlHost%22%3A%22upload.wikimedia.org%22%2C%22status%22%3A200%2C%22XCache%22%3A%22cp1064%20hit%20(25)%2C%20cp3016%20hit%20(3)%2C%20cp3010%20frontend%20miss%20(0)%22%2C%22varnish1%22%3A%22cp1064%22%2C%22varnish1hits%22%3A25%2C%22varnish2 [21:36:41] %22%3A%22cp3016%22%2C%22varnish2hits%22%3A3%2C%22varnish3%22%3A%22cp3010%22%2C%22varnish3hits%22%3A0%2C%22XVarnish%22%3A%22598562358%20567337670%2C%201059408435%201051057839%2C%203889109054%22%2C%22contentLength%22%3A32166%2C%22age%22%3A26094%2C%22timestamp%22%3A1425572831%2C%22lastModified%22%3A1383188919%2C%22redirect%22%3A0%2C%22dns%22%3A67%2C%22tcp%22%3A [21:36:41] 255%2C%22request%22%3A92%2C%22response%22%3A168%2C%22cache%22%3A1%2C%22uploadTimestamp%22%3A%2220050406000000%22%2C%22imageWidth%22%3A320%2C%22country%22%3A%22BR%22%7D%2C%22revision%22%3A11030254%2C%22schema%22%3A%22MultimediaViewerNetworkPerformance%22%2C%22webHost%22%3A%22en.wikipedia.org%22%2C%22wiki%22%3A%22enwiki%22%7D [21:36:41] deploymen?%7B%22event%22%3A%7B%22xhrSupported%22%3Atrue%2C%22xdomainSupported%22%3Afalse%2C%22imgAttributeSupported%22%3Atrue%2C%22scriptAttributeSupported%22%3Atrue%2C%22scriptLoaded%22%3Atrue%2C%22samplingFactor%22%3A1%2C%22isHttps%22%3Afalse%2C%22isAnon%22%3Atrue%2C%22country%22%3A%22US%22%7D%2C%22revision%22%3A10884476%2C%22schema%22%3A%22ImageMetricsCor [21:36:42] sSupport%22%2C%22webHost%22%3A%22en.wikipedia.beta.wmflabs.org%22%2C%22wiki%22%3A%22enwiki%22%7D; deployment-cache-bits01.eqiad.wmflabs 251794 2015-03-05T19:54:20 162.222.73.11 Mozilla/5.0 (X11; Linux i686; rv:35.0) Gecko/20100101 Firefox/35.0 [21:36:45] arghhhh [21:36:48] hehehe [21:36:49] SORRY everyone [21:38:05] mforns: trying again: http://pastebin.com/knf8kQE7 [21:38:09] ok [21:38:13] this is not 1 event but two [21:38:31] yes, but they are on the same line [21:38:36] not separated by end of line [21:39:15] qchris: hello! [21:39:15] they are 2 events in the same log [21:39:45] heya ottomata [21:40:09] wanna see somehting COOOOL? [21:40:12] (man i am constantly wanting to show off here! ) [21:40:19] Sure. [21:40:21] Show off :-D [21:40:25] ssh -N bast1001.wikimedia.org -L 3000:stat1002.eqiad.wmnet:3000 [21:40:30] http://localhost:3000 [21:40:33] mforns: is that what you are getting when you try to post one event to varnish? [21:40:42] yes, exactly [21:41:01] well, there are other random events coming from somewhere [21:41:09] and obviously they get mixed up [21:41:13] ottomata: Hahahaha :-) [21:41:18] Neat. [21:41:20] can I gutcheck with someone on our geolocation? [21:41:46] hha [21:41:59] mforns: two identical events? [21:42:01] we're grabbing postcode, latitude and longitude [21:42:04] tgr: yt? 
[21:42:19] nuria, well, yes, I sent the same event a couple of times [21:42:26] (1) we have never, to my knowledge, used any of these datapoints in analytics, (2) I cannot see any use case for them and (3) having data that granular sitting around we don't use makes me really uncomfortable. [21:42:27] ottomata: what is this based on? [21:42:45] qchris: spark streaming on kafka stream [21:42:56] so, realtime pageview counts! [21:42:56] Finally spark :-D [21:42:59] Yay! [21:43:41] mforns: ok, so if you send the event 1 time you get 1 on the logs [21:43:42] I like seeing Angelsberg there :-D [21:43:50] mforns: correct? [21:44:22] nuria, yes [21:44:29] that is working as expected [21:44:38] sorry for not being clear with that [21:45:50] :) [21:46:04] mforns: ok, understood [21:46:21] mforns: so can you repro reliably one event being cut off? [21:46:25] nuria, I would interpret it as either: 1) a problem with my netcat command 2) some problem with varnishncsa, race condition when writing? flush problem? [21:46:30] yes [21:46:31] uggh, Angelsberg [21:46:37] the hill on which my battle with Ops died [21:46:45] ? [21:46:53] ottomata, Angelsberg's traffic is entirely artificial [21:47:01] oh?? [21:47:04] howso? [21:47:08] Ops decided to have Python ping it millions of times a week to check our site availability and response time [21:47:11] and not to tell us about it [21:47:13] its showing up as one of the top 10 viewed pages :) [21:47:16] haha [21:47:40] and when I patched puppet to have those events ping "Undefined" since we already exclude those, Faidon told me I was an idiot and it was our responsibility to deal with their (non-transparent) poor decisions and -1/2d [21:47:48] * tgr reads scrollback [21:47:52] and this is why that page makes me grumble [21:48:23] mforns: ok, we should be able to rule out 1) [21:48:57] Analytics, MediaWiki-Authentication-and-authorization, MediaWiki-Core-Team: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1093688 (bd808) NEW [21:50:14] nuria, ok, will try to modify the forwarder [21:51:44] mforns: what doesn't add up here is that varnish would be cutting the log, seems like if this is gather with resource timing [21:52:16] we might have run into a limit there too, although I think data size for beacon is 64 bits: If the User Agent limits the amount of data that can be queued to be sent using this API and the size of data causes that limit to be exceeded, this method returns false. [21:52:59] nuria: from what I remember from my random walks in the Schema: namespace, MultimediaViewerNetworkPerformance is not particularly huge [21:54:57] tgr: ya, no way it's close to 64K [21:55:45] tgr, mforns : i will be off for the next hour, will check back with you later. [21:55:53] ok nuria [21:57:17] Analytics, MediaWiki-Authentication-and-authorization, MediaWiki-Core-Team: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1093732 (csteipp) Adding captchas would be good: * Captcha presentation * Captcha sol... [22:01:42] nuria: turns out it *is* large compared to the rest [22:01:43] https://phabricator.wikimedia.org/P365 [22:02:25] mforns: does that match wich schemas you are seeing often in the error logs? 
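For completeness, here is one way a per-schema size comparison like the P365 paste could be assembled from raw client-side log lines, where (as in the pastes above) the first whitespace-separated field is the percent-encoded capsule. This is a sketch of the idea, not necessarily how P365 was actually produced:

```
# Per-schema payload sizes from raw client-side log lines on stdin (sketch;
# assumes the first field of each line is the "?<encoded json>;" capsule).
import json
import sys
from collections import defaultdict
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote        # Python 2

sizes = defaultdict(list)

for line in sys.stdin:
    if not line.strip():
        continue
    raw = line.split(None, 1)[0]           # "?%7B...%7D;"
    try:
        capsule = json.loads(unquote(raw.lstrip('?').rstrip(';')))
    except ValueError:
        continue                           # truncated or garbled line
    sizes[capsule.get('schema', 'unknown')].append(len(raw))

for schema, lengths in sorted(sizes.items(), key=lambda kv: -max(kv[1])):
    print('%-45s n=%-7d max=%-5d avg=%d'
          % (schema, len(lengths), max(lengths),
             sum(lengths) // len(lengths)))
```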
[22:04:02] tgr its curious that the size of this schema logs is around 500 chars [22:04:43] Analytics, MediaWiki-Core-Team, VisualEditor, VisualEditor-Performance, and 3 others: Apply Schema:Edit instrumentation to WikiEditor - https://phabricator.wikimedia.org/T88027#1093918 (Krenair) >>! In T88027#1045283, @gerritbot wrote: > Change 191221 had a related patch set uploaded (by Alex Monk... [22:04:50] UDP packet size? [22:04:52] tgr, no never mind [22:05:02] maybe [22:07:05] tgr, from what I can see, udp packet size is 64k [22:08:03] the theoretical maximum, yes [22:08:57] but is that actually used by udp2log? [22:09:07] or whatever is in its place these days [22:10:00] 512 bytes of payload is a common default for UDP [22:10:13] aha [22:10:32] this would make sense [22:10:34] and if you have UDP packet loss, it won't cause any loggable errors for a <512 payload [22:10:55] but if it's split into two packets and one of them gets lost... [22:10:59] yes, and that would explain unended logs and two events in the same line [22:11:09] it seems like a reasonable hypothesis [22:11:47] ok, I'll try to play with the size of the logs around this 512 limit [22:12:05] see if <512 logs do not get the error [22:48:01] tgr, if I send events with a query string of >1014 chars (including leading '?' and trailing ';') it gets cut by varnishncsa [22:48:33] or another step of the pipeline before the EL forwarder [22:57:12] Analytics-EventLogging, Analytics-Kanban: Some events not validating for MultimediaViewerNetworkPerformance - https://phabricator.wikimedia.org/T91347#1094093 (mforns) Trying another way to read the UDP logs from varnishncsa I outputed the read lines from inside the forwarder. The '2 events in one line'... [23:09:40] Analytics-EventLogging, Analytics-Kanban: Some events not validating for MultimediaViewerNetworkPerformance - https://phabricator.wikimedia.org/T91347#1094132 (mforns) This person has the same problem: https://www.varnish-cache.org/forum/topic/126 [23:14:05] Analytics-EventLogging, Analytics-Kanban: Some events not validating for MultimediaViewerNetworkPerformance - https://phabricator.wikimedia.org/T91347#1094144 (Tgr) Per P365 it seems MultimediaViewerNetworkPerformance has the largest payload amongst frequent events, but this bug will affect other events a... [23:17:33] Analytics-EventLogging, Analytics-Kanban: EventLogging query strings are truncated to 1014 bytes by varnishncsa - https://phabricator.wikimedia.org/T91347#1094159 (Tgr) [23:30:54] Eloquence, http://datavis.wmflabs.org/agents/ [23:33:57] mforns, tgr back... [23:34:25] hi nuria [23:34:33] added some comments on the task [23:34:49] https://phabricator.wikimedia.org/T91347#1094132 [23:34:57] tgr changed its name [23:36:24] Analytics-EventLogging, Analytics-Kanban: EventLogging query strings are truncated to 1014 bytes by varnishncsa - https://phabricator.wikimedia.org/T91347#1094198 (Nuria) @tgr, not sure it i s a bug, rather a limit we need to abide to given varnish logging. Will consult with ops on wether this can be ch... [23:37:14] Analytics-EventLogging, Analytics-Kanban: EventLogging query strings are truncated to 1014 bytes by varnishncsa - https://phabricator.wikimedia.org/T91347#1094202 (mforns) And looking at varnishncsa code there seems to be some code that trims the query string and as a side effect truncates it, if I see it... 
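Given the 1014-character cut-off measured above, a pre-flight size check is easy to sketch: encode the capsule roughly the way the client would and compare lengths. The 1014 figure is the empirical limit from this discussion, not a documented constant, and Python's quote() only approximates the browser's encodeURIComponent:

```
# Pre-flight size check (sketch) mirroring the experiment above: how long is
# "?<percent-encoded capsule>;" for a given event, versus the ~1014-char
# budget observed before truncation? Approximation only.
import json
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2

OBSERVED_LIMIT = 1014  # empirical cut-off from the tests above


def encoded_length(capsule):
    payload = quote(json.dumps(capsule, separators=(',', ':')), safe='')
    return len('?' + payload + ';')


if __name__ == '__main__':
    capsule = {
        'schema': 'MultimediaViewerNetworkPerformance',
        'revision': 11030254,
        'wiki': 'enwiki',
        'webHost': 'en.wikipedia.org',
        'event': {'type': 'image', 'total': 583, 'status': 200,
                  'country': 'BR'},
    }
    n = encoded_length(capsule)
    print('%d chars -> %s' % (
        n, 'fits' if n <= OBSERVED_LIMIT else 'will be truncated'))
```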
[23:37:30] nuria: I would say this is a bug in EventLogging that might or might not be caused by hard limits in varnishncsa :) [23:38:00] https://www.varnish-cache.org/trac/browser/bin/varnishncsa/varnishncsa.c?rev=79c2d962221bb7ce3582caa1a0be5df4841e0832#L271 [23:38:39] in varnishncsa code the qs gets trimmed, but I do not see where that 'len' param comes from [23:39:26] tgr: If it's not udp packet size it's varnishncsa, there is no other thing up the chain, mforns could you follow up with brandon or faidon on varnish logging limits? [23:39:42] nuria, sure [23:40:18] tgr: sorry, i mean, there is no other EL code up the pipeline from varnishncsa, that is where the trail stops [23:41:37] well, EventLogging is a system to get structured data from the browser to a database [23:41:53] if that's not working, that's a bug IMO [23:42:37] an event logger with a limit on the length after JSON + percent encoding would be rather ridiculous [23:43:35] if there is no way to fix the varnishncsa limitation, the JS module for EL could split events and the server side could reassemble them [23:43:53] or replace varnishncsa with varnishkafka [23:44:03] etc [23:44:29] tgr: so two options here short term: 1) we can change url logging size on varnish 2) we cannot change it and 1024 will be a limit we have to abide by [23:46:12] tgr: you can call it a bug or a limitation of the system as is, EL has other limitations (like sizes of db fields) [23:47:06] or table sizes [23:48:20] in the short term, I'm not sure there is a problem to solve [23:48:31] given that this has been going on for a year or so [23:49:07] tgr: Wouldn't you want to get data for those events though? [23:49:17] of course if varnishncsa can simply be reconfigured that's cool [23:50:52] MediaViewer is not actively developed anymore, so even if we fail to record half the events that has limited impact [23:52:21] nuria, tgr, I spoke with bblack, he asked me to file a task in phab for analytics and operations with needs-triage and assign it maybe to ottomata initially... [23:52:59] mforns: did he know where to change the limits? i cannot find those on puppet anywhere... [23:53:04] he said it's probably a udp packet thing [23:53:25] mforns: rather than varnish? [23:53:27] that has to fit the log into a 1500b packet, that minus the headers gets 1014 bytes left [23:53:36] mforns: ah i see [23:53:57] mforns: so then what is the action, doesn't seem like there is any, right? [23:54:11] mforns: or is it changing udp packet size + logging limit? [23:54:38] add operations to the task, see if ottomata can have a look / convince ops to have a look? [23:55:19] nuria, if it's udp, I don't know... [23:55:41] I showed the varnishncsa code to bblack and he made no comments on that [23:56:07] mforns: ok, i am not sure what is the action then. We can talk to ottomata tomorrow [23:58:36] Analytics-EventLogging, Analytics-Kanban, operations: EventLogging query strings are truncated to 1014 bytes by varnishncsa - https://phabricator.wikimedia.org/T91347#1094281 (mforns) [23:59:44] nuria, bblack thinks the varnish code does not truncate the qs, so yea, let's talk with andrew tomorrow