[00:06:44] something odd i don't understand...oozie job for a workflow started a second attempt, but the previous attempt doesn't look to have failed. http://analytics1001.eqiad.wmnet:8088/cluster/app/application_1456242175556_17453 [00:06:48] what should i be looking at? [00:08:14] it won't be a big deal (other than network traffic) for the es side of things to duplicate, it throws away the duplicate update on the es side. but it seems odd [00:08:34] it's http://analytics1001.eqiad.wmnet:8088/cluster/app/application_1456242175556_17453 [00:32:33] Analytics-Kanban: Migrate limn-mobile-data/reportupdater reports to use standalone reportupdater - https://phabricator.wikimedia.org/T128375#2074328 (mforns) a:mforns [00:35:09] (PS1) Mforns: Migrate the reports to the standalone reportupdater [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/274031 (https://phabricator.wikimedia.org/T128375) [00:38:40] (PS1) Mforns: Migrate the reports to use standalone reportupdater. [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/274032 (https://phabricator.wikimedia.org/T128375) [00:40:38] any reason dumps.wikimedia.org is not HTTPS-only? [00:41:02] (PS1) Mforns: Migrate the reports to use standalone reportupdater. [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/274034 (https://phabricator.wikimedia.org/T128375) [00:43:23] (PS1) Mforns: Migrate the reports to use standalone reportupdater. [analytics/limn-flow-data] - https://gerrit.wikimedia.org/r/274035 (https://phabricator.wikimedia.org/T128375) [00:46:41] (PS1) Mforns: Migrate the reports to use standalone reportupdater. [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/274037 (https://phabricator.wikimedia.org/T128375) [00:48:07] (PS1) Mforns: Migrate the reports to use standalone reportupdater. 
[analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/274038 (https://phabricator.wikimedia.org/T128375) [00:49:23] (PS1) Mforns: Migrate the reports to use standalone reportupdater. [analytics/limn-extdist-data] - https://gerrit.wikimedia.org/r/274039 (https://phabricator.wikimedia.org/T128375) [01:07:00] madhuvishy: assertion is so you can see table was created [01:07:13] madhuvishy: uses clientIp just like it could use any other field passes in [01:07:16] *passed in [01:07:30] madhuvishy: hopefully this makes sense [01:07:49] madhuvishy: in fact after your autoincrement changes you can assert on that field i think [01:28:09] nuria: the event that comes in won't have auto inc Id. I changed it to uuid though [01:30:09] that should work [04:46:04] Analytics-Kanban: Pageviews API reporting inaccurate data for pages titles containing special characters - https://phabricator.wikimedia.org/T128295#2074796 (MusikAnimal) I assume this is related, but I also noticed that the `/pageviews/top` endpoint sometimes returns the wrong characters when there should be... [08:55:03] Analytics-Kanban: Pageviews API reporting inaccurate data for pages titles containing special characters - https://phabricator.wikimedia.org/T128295#2074948 (JAllemandou) @MusikAnimal : The issue @Ottomata is describing leads to badly formatted data loaded in the API. Now that the problem is found and solve... [09:23:44] Hello folks o/ [09:24:08] I am going to work on Debian re-imaging today (and possibily varnish-kafka) but I'll be available for any ping if you need me! [09:24:48] Hi elukey :) [09:25:03] I'll be on backfilling correct data in hadoop :) [09:25:17] Same as you, doing stuff, but available on ping when Lino sleeps :) [09:25:35] joal: what was the final root cause of the encoding issue? [09:25:53] elukey: I don't know ... 
It's scary :S [09:26:21] elukey: I assume that upgrade did a weird restart, but how come precisely, no idea :S [09:26:45] :( [09:29:40] elukey: At least it's solved ... At some point yesterday, I was feeling like we would be looking for the solution for days [09:40:01] Analytics-Tech-community-metrics: Misc. improvements to MediaWikiAnalysis (which is part of the MetricsGrimoire toolset) - https://phabricator.wikimedia.org/T89135#2075039 (Qgil) In that case, I recommend to decline this task. I'm removing #possible-tech-projects for now. [09:57:05] joal: (whenever you have time) what is the procedure to backfill X days of data? Manually remove the files and then restart oozie? [11:36:55] Analytics, Operations, Traffic: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2075280 (elukey) Adding a link that might be useful for the future if we want to move away from varnish-kafka written in C: https://github.com/xcir/python-varnishapi (ava... [12:16:34] Analytics-Tech-community-metrics, Possible-Tech-Projects: Data Analytics toolset for MediaWikiAnalysis - https://phabricator.wikimedia.org/T116509#2075402 (Aklapper) a:Anmolkalia>None @Anmolkalia: I am resetting the assignee of this task because there have been no signs of progress lately (please c... [13:00:20] ottomata: killing your spark shell used yesterday for encoding debugging [13:03:35] elukey: to backfill: 1 - Define which jobs need to be rerun using the oozie diagram [13:04:39] 2 - Remove _SUCCESS files to prevent new jobs from starting on existing runs that were successful (instead of new ones) based on dependencies [13:05:27] 3 - Launch new coordinators with [start|stop]_time defined as needed (rerunning the coordinator will not take _SUCCESS file removal into account) [13:05:33] 4 - Monitor ;) [13:06:56] ahhh because it was camus the one with issues on existing files!
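The backfill procedure joal lists above (drop the _SUCCESS flags, then relaunch coordinators over the window) can be sketched as a dry run that prints the commands instead of executing them. The dataset path, property file and property names below are hypothetical placeholders, not the real refinery job configuration:

```shell
# Dry-run sketch of the backfill steps; dataset path and coordinator
# properties are placeholders, not actual refinery configs.
backfill_cmds() {
    dataset="$1"; start="$2"; stop="$3"
    # Step 2: drop the _SUCCESS flag so dependent jobs no longer see
    # the bad run as complete.
    echo "hdfs dfs -rm ${dataset}/_SUCCESS"
    # Step 3: launch a fresh coordinator over the window to recompute
    # (rerunning the old coordinator would ignore the _SUCCESS removal).
    echo "oozie job -run -config coord.properties -Dstart_time=${start} -Dstop_time=${stop}"
}

backfill_cmds /wmf/data/example/2016/02/23 2016-02-23T00:00Z 2016-02-26T00:00Z
```

Step 4 (monitoring) is then just watching the new coordinator in the oozie web UI.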
[13:07:04] sorry confusion, thanks for the clarification :) [13:07:33] * joal not understand :S [13:26:16] joal: I keep confusing camus with oozie when we work on the cluster, and the various cases that can block them (like camus not overriding files etc..) [13:29:23] right elukey, I can understand that :) [13:30:25] you have a lot of patience :) [13:36:29] elukey: not really :) [13:56:43] Analytics, ArchCom-RfC, Discovery, EventBus, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#2075637 (mobrovac) [14:10:12] thanks for backfilling jo [14:10:27] joal: do you know how many days need to be backfilled? [14:11:57] ah, never mind, I see https://phabricator.wikimedia.org/T128295#2074948 [14:22:46] milimetric: hello! [14:46:49] Analytics, Operations, Traffic: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2075747 (Ottomata) Sounds good to me! Re: python varnish C types: cool! I didn’t know there was an existing lib out there for that. Ori and I wrote some python varnish b... [15:23:24] Analytics-Tech-community-metrics: Data Analytics toolset for MediaWikiAnalysis - https://phabricator.wikimedia.org/T116509#2075864 (Aklapper) Removing #Possible-Tech-Projects due to T89135#2073997 [15:25:55] Hey milimetric, I assume you have found your answer :) [16:05:02] milimetric: yt? [16:05:08] hey nuria yea [16:06:02] milimetric: we should plan on you continuing with marcel's work on browser data, if you have the availability as he will be out for a week and i think now we just need to tie it all together [16:06:12] Analytics-Kanban: Pageviews API reporting inaccurate data for pages titles containing special characters - https://phabricator.wikimedia.org/T128295#2076006 (MusikAnimal) @JAllemandou Thanks for the explanation. So are you saying that new pageview data coming in should be accurate? Looks like things are smoot... 
[16:06:38] nuria: yes, that's what I was reading up on this morning [16:06:54] I was about to look at your patch, and I have to help nathaniel with something [16:08:19] milimetric: sounds good, let's talk later, i have manager's meeting but should be free after [16:08:36] milimetric: 1pm your time [16:08:43] k [16:09:27] (CR) Ottomata: [C: -1] "Seeing as refinery-drop-api-action-partitions, coordinator.xml and workflow.xml are the same as the CirrusSearchRequestSet load jobs, I th" [analytics/refinery] - https://gerrit.wikimedia.org/r/273557 (https://phabricator.wikimedia.org/T108618) (owner: BryanDavis) [16:12:38] milimetric: can you comment: https://gerrit.wikimedia.org/r/#/c/270151/ [16:12:39] ? [16:17:10] * milimetric looks [16:21:18] ottomata: what'd you guys decide to do with https://github.com/wikimedia/operations-puppet/tree/production/modules/statistics/manifests/limn/ yesterday? [16:21:25] is that keeping that name? [16:22:37] uhh [16:22:59] hm https://gerrit.wikimedia.org/r/#/c/273487/ [16:23:04] no, its being deleted [16:23:42] ah thanks milimetric didn't know about that other patch [16:24:26] ottomata: what's the beta version of https://meta.wikimedia.org/beacon/event ? [16:24:31] is that documented anywhere? [16:25:38] ? [16:25:42] is that eventlogging endpoint? [16:26:16] ottomata: yea [16:27:15] milimetric: i dunno where it would be documented, dunno where beta eventlogging extension configs live [16:27:17] but i guess there? [16:27:30] but, i know that eventlogging-load-tester's default uses [16:27:37] bits.beta.wmflabs.org/event.gif [16:27:49] ah! 
ok, that must be it [16:27:58] I'll try to find the place where it should be, and write it there [16:28:05] i'm sure beacon/event should work too [16:28:05] schana: ^ [16:28:27] schana: http://bits.beta.wmflabs.org/event.gif or http://bits.beta.wmflabs.org/beacon/event should work [16:28:41] I can troubleshoot with you to make sure we see the events [16:28:44] but I've got standup now [16:29:02] and then we should find where in the docs this should be and update :) [16:30:52] a-team: standddupppp [16:30:57] comnigngg [16:31:37] oo slow internet [16:31:53] thanks milimetric - I'm actually tied up until probably later tomorrow, so maybe get together sometime later this week? [16:32:38] sure schana [16:57:13] Analytics-Tech-community-metrics, DevRel-March-2016, Patch-For-Review: What is contributors.html for, in contrast to who_contributes_code.html and sc[m,r]-contributors.html and top-contributors.html? - https://phabricator.wikimedia.org/T118522#2076201 (Aklapper) Adding #DevRel-March-2016 as this is do... [16:57:41] nuria: ottomata: got dropped off. will ask questions here [16:58:07] Analytics, Operations, Traffic, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2076213 (ema) So the trivial part is done, see https://gerrit.wikimedia.org/r/274135. Now we need to figure out the tricky one. :) Essentially, the... [16:59:02] nuria: none of the changes can happen until quick surveys is done - i can do the initial patch now - but the rest we can do after?
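For exercising the beta endpoint discussed above, a minimal sketch of building a beacon URL, assuming EventLogging accepts the event capsule as percent-encoded JSON in the query string. The schema name, revision and event fields here are made-up placeholders for illustration, not a real schema:

```python
import json
import urllib.parse

# Beta EventLogging beacon endpoint mentioned in the channel.
BETA_ENDPOINT = "http://bits.beta.wmflabs.org/beacon/event"

def build_event_url(schema, revision, event, endpoint=BETA_ENDPOINT):
    """Build a beacon URL carrying one event capsule as
    percent-encoded JSON (capsule shape is an assumption here)."""
    capsule = {"schema": schema, "revision": revision, "event": event}
    return endpoint + "?" + urllib.parse.quote(json.dumps(capsule))

url = build_event_url("TestSchema", 1, {"action": "click"})
```

Fetching the resulting URL (for example with curl) should then show up on the beta EL consumers if everything is wired correctly.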
[16:59:29] madhuvishy: let's get timeline for wrapping up quicksurveys from laila [16:59:43] she updated the thread let me see [16:59:45] madhuvishy: they bumped up their sampling recently so they might not have that much more to go [16:59:57] nuria: https://phabricator.wikimedia.org/T128407 [16:59:58] madhuvishy: to be effective a survey cannot run for a long time [17:00:03] March 8 or 14 [17:01:55] madhuvishy: then we should go ahead with code changes , so code is ready to roll as it is only 1 week [17:05:37] nuria: alright [17:13:32] ottomata: https://gerrit.wikimedia.org/r/#/c/273488/ merged in mediawiki-config [17:13:55] nothing is exploding and the sockets are good on the mw hosts :) [17:16:46] Analytics-Kanban: Pageviews API reporting inaccurate data for pages titles containing special characters - https://phabricator.wikimedia.org/T128295#2076264 (JAllemandou) @MusikAnimal : You're welcome, thanks for having filed the bug ! The breakage occurred Feb 23rd, polluting data between from the 23rd includ... [17:23:19] Analytics-Kanban: Pageviews API reporting inaccurate data for pages titles containing special characters - https://phabricator.wikimedia.org/T128295#2076283 (JAllemandou) @Ottomata : > @JAllemandou, in order to make sure we aren't bitten by this, do you think we should attempt to set file.encoding for Hado... [17:31:39] Hey a-team, how is it? [17:37:25] Analytics-Tech-community-metrics, MediaWiki-extension-requests, Possible-Tech-Projects: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1299500 (Sumit) IMPORTANT: This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Wikimedia has been acc...
[18:03:30] (PS2) Joal: Update oozie diagram to last status [analytics/refinery] - https://gerrit.wikimedia.org/r/268216 [18:03:55] (PS3) Joal: Update oozie diagram to last status [analytics/refinery] - https://gerrit.wikimedia.org/r/268216 [18:09:23] (PS1) Joal: Correct mobile_apps uniques jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/274157 [18:16:19] (CR) Nuria: [C: 2 V: 2] "Looks good." [analytics/refinery] - https://gerrit.wikimedia.org/r/274157 (owner: Joal) [18:20:16] nuria: hi, who is the owner of the pageview api ? [18:20:36] matanya: owner? as in who maintains the pageview counts? analytics [18:20:54] matanya: infrastructure is maintained by services/ops and analytics [18:20:57] as in, who is the person to ask about weird info [18:21:13] matanya: aham, as in strange pageview counts? [18:21:21] matanya: here, the channel [18:21:31] so ask away [18:21:35] as in a huge drop in views [18:21:49] https://he.wikipedia.org/wiki/%D7%95%D7%99%D7%A7%D7%99%D7%A4%D7%93%D7%99%D7%94:%D7%9B%D7%99%D7%9B%D7%A8_%D7%94%D7%A2%D7%99%D7%A8#.D7.94.D7.A9.D7.95.D7.95.D7.90.D7.AA_.D7.A6.D7.A4.D7.99.D7.95.D7.AA_.D7.91.D7.99.D7.9F_.D7.95.D7.99.D7.A7.D7.99.D7.A4.D7.93.D7.99.D7.95.D7.AA [18:21:49] D7: Testing: DO not merge - https://phabricator.wikimedia.org/D7 [18:21:58] there are many huge drops in views [18:22:23] a wikimedian in he.wiki created a graph based on the tool showing aggregated views for all articles of a wiki [18:22:42] the last graph in the link above is commons [18:23:14] why did it drop from ~ 15M to 5M in Sept 12 ?
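The spider/crawler tagging joal goes on to describe works by matching the request's user agent against a regex. A heavily simplified sketch follows; the real, far more extensive pattern lives in refinery's Webrequest.java (linked below in the discussion), so the keywords here are only illustrative:

```python
import re

# Heavily simplified version of the agent-type tagging; the production
# regex in Webrequest.java covers many more cases than this.
SPIDER_RE = re.compile(r'bot|spider|crawler|curl|wget|https?://', re.IGNORECASE)

def agent_type(user_agent):
    """Tag a request as 'spider' or 'user' from its user-agent string."""
    return 'spider' if SPIDER_RE.search(user_agent or '') else 'user'
```

As noted in the channel, this is best-effort: automated clients sending browser-like user agents (estimated at around 5% of traffic) still get tagged as users.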
[18:23:19] Hi matanya, I can help [18:23:27] and stayed this way [18:23:31] hi joal thank you [18:23:35] matanya: see https://vital-signs.wmflabs.org/#projects=hewiki/metrics=Pageviews [18:23:41] matanya: anD: [18:23:53] https://vital-signs.wmflabs.org/#projects=hewiki,commonswiki/metrics=Pageviews [18:23:58] and annotations around [18:24:05] This drop is explained by us making our crawler detection remove more non-user [18:24:07] matanya: sorry, annotations [18:24:19] indeed nuria :) [18:24:22] on the graphs [18:24:34] i.e api calles ? [18:24:40] calls [18:25:32] matanya: what? [18:25:50] matanya: no, sorry, we get a lot of automated traffic as in crawlers [18:26:05] I see [18:26:07] arg, there is a css bug and annotations are not visible, need to fix that [18:26:24] matanya: you can find the regexp we use here: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L49 [18:26:47] thank you joal that is useful [18:26:53] matanya: we tag as spider/crawler any user-agent matching this regexp [18:27:42] matanya: It's not perfect, we know some automated traffic still gets tagged as user (we think about 5%), but it's what we have now [18:28:07] not bad at all [18:28:19] thank you very much [18:28:30] You're welcome :) [18:29:35] logging off a-team! [18:29:39] byyyyeeeeeeee o/ [18:29:42] Bye elukey :) [18:29:46] Have a good evening [18:31:33] byee [18:35:50] wikimedia/mediawiki-extensions-EventLogging#538 (wmf/1.27.0-wmf.15 - 64807c4 : Chad Horohoe): The build has errored.
[18:35:50] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/64807c46d3c4 [18:35:50] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/112924280 [18:43:25] nuria, milimetric : can you confirm this format is ok for uniques publishing: hdfs:///tmp/joal/laud_archive/000000_0.gz [18:43:51] joal: let me look [18:49:00] joal: looks good but would we have 1 file per day? mmm...let me see how other dumps look , we would need two endpoints where to serve files, 1 daily one monthly [18:50:34] I guess milimetric would know better but we need to decide how you will get to files, like: http://dumps.wikimedia.org/other/unqiue-devices/2015/2015-05/ ? [18:51:13] nuria: why not, I don't know if we prefer to continue old folder format or have a new one :) [18:51:36] joal: I would say go for a format that you think makes sense and is useful [18:51:43] huhu :) [18:51:46] then if people also want a different format, we can generate that one as a derivative [18:52:04] I think , makes sense [18:52:07] milimetric: generated a simple TSV [18:52:20] nuria: TSV or CSV (I don't mind) [18:52:23] as for where on dumps.wikimedia.org it goes, I think that'll fit nicely into the new structure I started working on with https://gerrit.wikimedia.org/r/#/c/269696/ [18:54:22] milimetric, nuria: with date in filename as for other reports [18:54:42] joal: ok, and all daily files available in a monthly folder? [18:55:07] nuria: works for me [18:55:12] milimetric: so structure would be: http://dumps.wikimedia.org/analytics/unqiue-devices/2015-05/ [18:55:14] ? [18:55:19] We could have all daily files and the monthly file in that folder [18:55:30] and what about monthly data? where would that go?
[18:55:39] yes, nuria, that's what I got on the mailing list with a couple of approvals [18:55:51] milimetric, joal: k [18:55:57] but then we got hit with email-a-geddon, so I have been hesitant to restart the discussion [18:56:08] http://dumps.wikimedia.org/analytics/unique-devices/daily/2015-05/ [18:56:10] :D [18:56:20] do we add "daily" to url? [18:56:36] nuria: not if we want to put daily and monthly in the same folder [18:56:38] or how do we surface monthly resolution? [18:56:47] joal: k, i am ok with that too [18:57:43] milimetric: for "all" in dashiki plots, there is no other solution than filing a ticket right? [18:57:48] nuria: filename patterns: last_access_uniques_daily-20150501.tsv last_access_uniques_monthly-201505.tsv [18:58:05] nuria: +.gz, sorry [18:58:19] joal: rather: unique_devices_daily_blah [18:58:27] good for me [18:58:29] joal: otherwise last access is pretty confusing [18:58:36] yeah makes sense [18:58:43] ok great :) [18:59:01] joal: we need to document the file contents too in wikitech, although that is easy, it's just a pointer to the docs that already exist [19:00:33] yup [19:02:45] ottomata: around? mysql doesn't seem to be running in deployment-eventlogging03 - any idea why [19:02:46] ? [19:04:04] madhuvishy: I think because it is installed ad hoc, it is not on puppet [19:04:13] nuria: I know - i tried to start it [19:04:20] madhuvishy: and?
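The layout the discussion above converges on (one folder per month holding both the daily and the monthly files, with a unique_devices prefix) can be sketched as path builders. The base URL is a placeholder: the final location on dumps.wikimedia.org was still being discussed at this point:

```python
from datetime import date

# Base URL is a placeholder; the final dumps.wikimedia.org location
# was still under discussion in the channel.
BASE = "http://dumps.wikimedia.org/analytics/unique-devices"

def daily_url(day):
    """URL for one daily file, dated in the filename as for other reports."""
    return "%s/%04d-%02d/unique_devices_daily-%s.tsv.gz" % (
        BASE, day.year, day.month, day.strftime("%Y%m%d"))

def monthly_url(year, month):
    """URL for the monthly file, living in the same folder as the daily ones."""
    return "%s/%04d-%02d/unique_devices_monthly-%04d%02d.tsv.gz" % (
        BASE, year, month, year, month)
```

Keeping both resolutions in one monthly folder avoids the separate daily/ path segment questioned above.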
[19:04:31] https://www.irccloud.com/pastebin/2FiqTqfL/ [19:04:46] ah no, i think you cannot start with upstart [19:04:48] let me see [19:05:47] madhuvishy: /etc/init.d/mysql start --default-storage-engine=tokudb [19:05:54] aah [19:06:01] madhuvishy: https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/TestingOnBetaCluster#Database [19:06:25] nuria: still same error though [19:06:34] madhuvishy: sudo [19:06:46] madhuvishy: and let me know [19:07:20] nuria: no - tried with sudo too [19:07:33] madhuvishy: k let me ssh [19:07:37] thanks [19:10:53] madhuvishy: looks like permissions are bad, let me see where that lock file is [19:13:37] madhuvishy: [19:13:39] https://www.irccloud.com/pastebin/tns0zc6k/ [19:13:56] nuria: yup [19:13:56] madhuvishy: root@deployment-eventlogging03:/var/log# more mysql.err [19:14:10] huh [19:14:13] so...ahem... let's disable huge pages , let me see how is that done [19:14:43] i think [19:14:52] just running that echo statement should do it [19:15:46] nuria: started mysql [19:15:55] k [19:15:56] let [19:16:07] let's add that to troubleshoot wiki [19:16:13] let me add it [19:16:15] thanks :) [19:16:34] nuria: also, default storage engine is already configured to be tokudb now [19:16:39] no need to pass it in [19:17:49] madhuvishy: k, docs updated [19:18:50] milimetric: yt? [19:21:37] madhuvishy: let me know when you are done testing on eventlogging beta [19:21:44] nuria: sure [19:24:17] madhuvishy: hiiii sorry [19:24:26] ottomata: np, nuria helped :) [19:27:04] milimetric: let me know when you are back [19:34:13] nuria: back [19:34:35] in meeting, will talk in a bit [19:42:17] ottomata: nuria I wrote this up - https://etherpad.wikimedia.org/p/el-clientips-drop [19:43:47] are new mysql tables created when capsule's revision changes?
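The startup failure above came down to transparent huge pages being enabled on the host, which TokuDB refuses to run with. A dry-run sketch of the fix discussed ("that echo statement" plus the service start); it prints the commands rather than executing them, since both need root on deployment-eventlogging03:

```shell
# Dry-run sketch: TokuDB won't start while transparent huge pages are
# enabled, so turn them off and then start MySQL. Printed rather than
# executed because both commands need root.
thp_fix_cmds() {
    echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
    echo '/etc/init.d/mysql start'
}

thp_fix_cmds
```

Per the channel, the default storage engine on this host is already configured as tokudb, so no --default-storage-engine flag is needed any more.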
[19:45:12] madhuvishy: cool, answered your qs [19:45:13] uhhh [19:45:17] madhuvishy: no i don't think so [19:45:30] the table is named after the specific schema revision, not the capsule [19:45:34] right [19:45:48] but the capsule rev change in el code will cause the event to be validated against the composite schema using the new rev [19:46:05] i think its ok to leave the NULL clientIp in the table for now [19:46:08] that's why i am a bit confused? it will keep the client_ip field in the db until the schema revision changes? [19:46:12] if/when someone makes a new schema change, it will make a new table [19:46:13] new tables would be quite disruptive ;) [19:46:14] yes [19:46:16] that is right [19:46:17] right [19:46:53] HaeB: well you won't have new tables unless your schema changes - which causes new tables anyway [19:46:58] so that's fine i think [19:47:32] ottomata: okay so plan looks fine? [19:47:38] madhuvishy: great, just saying ;) [19:50:30] hmmm - ottomata so, when we update EL puppet to have new format specifier and have it run on all EL machines, and then drop the ip from varnish - because we can't control varnish puppet runs - there will be instances where it's still sending ip - but our format string is wrong no? [19:50:37] that's where I'm confused [19:50:58] * madhuvishy is confused a lot [19:50:58] ah, yes, hm. madhuvishy [19:51:05] hm [19:51:28] so errors are bound to happen until all varnishes have updated puppet [19:51:35] I can't think of a clean way [19:52:43] HMMMMM [19:52:56] madhuvishy: what if, we just add a new format specifier for do nothing? [19:53:02] instead of making clientIp None [19:53:03] like [19:54:09] ottomata: that would work - but not if we drop the clientip from varnishkafka - unless you send an empty string or something [19:54:31] hmmm [19:54:32] actually [19:54:34] does that already work?
[19:54:36] what if it was [19:54:39] yes [19:54:56] '%q %{recvFrom}s %{seqId}d %t %i %{userAgent}i' [19:54:59] if you sent some garbage from varnishkafka instead of ip [19:55:01] %i instead of %h [19:55:07] or hm [19:55:14] i guess it needs to be optional [19:55:18] the format specifier [19:55:21] in the regex [19:55:21] hmm [19:55:26] so we do need a new one [19:55:27] yes - but it can't be in the current way [19:55:46] 'n': (r'(?P<%s>[^\t]+)?', str), [19:55:48] would that work? [19:55:51] with the ? at the end? [19:56:08] not sure [19:56:17] hm, could make it simpler too, remove the name matching [19:57:50] ottomata: hmmm [19:58:17] ok let me try [20:00:10] milimetric: back, yt? [20:00:23] yep [20:00:49] weiiird [20:00:53] my keyboard just stopped working [20:00:53] milimetric: then for the "all" projects [20:00:59] then i had to force restart [20:01:00] weiiird [20:01:10] milimetric: there is nothing we can do but file a ticket correct? [20:01:38] yeah the only solution for "all" involving the API is to add it to the /aggregate endpoint as "all" in the {project} param or something like that [20:02:05] and that needs some oozie updates and loader job updates [20:02:16] we could of course still use the existing file [20:02:33] we could even just hard-code special case that until it becomes available in the API [20:02:52] like, if project = "all", fetch that TSV [20:03:13] ottomata: may be the problem is this - we have only one format string - if it was source event format string - and dest format string we'd have more flexibility? 
dunno [20:03:50] milimetric: puff but not so pretty, and also "all" doesn't give that much info , is 90% enwiki [20:04:22] madhuvishy: i'm suggesting the optional format string to make it possible for the same format string to work with or without a field [20:04:27] not sure if that works [20:04:37] it might mess with the regex, not sure [20:04:53] milimetric: so i am not sure "all" provides much value, but maybe you think otherwise [20:05:53] well, I don't personally care too much about statistics, but a lot of people certainly want this data [20:06:11] ottomata: the parser expects the event to conform to the format string - if we have an optional one in the middle - the parser should figure out somehow whether or not the event has the optional field [20:06:13] I believe it was the most requested thing after we released the graph [20:07:34] Krinkle: just happened to see https://commons.wikimedia.org/wiki/File:Wikipedia_webrequest_flow_2015-10.png , which looks awesome ... but shouldn't the arrow from Kafka go to Hadoop, instead of Eventlogging? [20:07:43] cf. https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest [20:07:46] nuria: I know the hack wouldn't be pretty, of course, it would be an eye-sore. But I'm not too worried if it's only there for a few weeks while we update the API [20:07:48] ottomata: in this case ip and ua are at the end - if our format string goes "%blah %blah %opt %ua" and you didn't have an ip in the event - how does the parser understand that there's no ip but there's a UA [20:07:50] madhuvishy: ja, if an 'optional' field in the regex will work [20:08:30] good q madhuvishy. hm maybe it won't? maybe we'll just have to change the varnishkafka string to put a blank field there [20:08:31] '-' [20:08:31] ?
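madhuvishy's objection above can be shown concretely: if the clientIp field is made optional in the middle of a space-separated format, the regex happily "finds" an IP in a line that has none, eating the start of the user agent. A stripped-down two-field sketch of the problem (not the actual EL parser regex):

```python
import re

# Two-field toy version of the ambiguity: an optional clientIp followed
# by a free-form userAgent, space-separated as in the raw log line.
pat = re.compile(r'(?:(?P<clientIp>\S+) )?(?P<userAgent>.+)')

# With the IP present, both fields parse as intended.
ok = pat.match('10.0.0.1 Mozilla/5.0 (X11; Linux x86_64)')

# With the IP absent, 'Mozilla/5.0' is mis-parsed as the clientIp.
bad = pat.match('Mozilla/5.0 (X11; Linux x86_64)')
```

This is why a sentinel like '-' from varnishkafka (so the field is always present) is the safer route during the transition.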
[20:08:32] milimetric: but , does it really provide value cause "all" is basically the enwiki curve just higher on y-axis [20:08:36] nuria: aaaahahahaha: https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/all-access/all-agents/daily/2015100100/2015103000 [20:08:43] (and is Eventlogging data considered part of webrequest?) [20:08:46] ottomata: right - and that would work as is [20:08:51] with no changes to parser [20:09:02] ja true, with just %i instead of the code changes, ja? [20:09:12] as in with my recent patch [20:09:17] milimetric: ja! [20:09:19] wouldn't it work now with just %i [20:09:21] without your patch? [20:09:23] HaeB: Maybe, but I don't think so. Firstly, this is not a global services overview, but just within the reachable scope of a page view web request. So it doesn't contain orthogonal services or secondary services triggered independently. A few things are mentioned but not in detail, such as logging. Secondly, as I understand it, for the purpose of [20:09:23] EventLogging, Kafka is used as both consumer and producer. [20:09:25] ottomata: ya - %i is already something though [20:09:28] yes [20:09:30] but if you don't name the field [20:09:31] %h is ip [20:09:33] i think it won't put it in the event [20:09:36] nuria: I agree that it's usually the enwiki curve, but some people (like news orgs, comms, etc, just want the total number, so it's bigger) [20:09:46] HaeB: Hadoop, in that view, is merely an implementation detail behind the Kafka service [20:09:46] am I wrong?
[20:10:29] ottomata: ah [20:10:30] hmmm [20:10:30] HaeB: and apps too, so "requests" [20:10:37] ottomata: yes [20:10:43] i guess that wouldn't work then [20:10:48] nuria: that's very ambiguous terminology then ;) [20:11:05] ok madhuvishy ja, i'm not sure how to go about this deploy smoothly either [20:11:25] HaeB: There is another graph made by Elukey that intends to be a more global overview (more inclusive, but less detailed on intimate MediaWiki-oriented relationships) - https://wikitech.wikimedia.org/wiki/File:Infrastructure_overview.png - which includes Hadoop, as well as multiple data centres [20:11:26] HaeB: in an architecture graph? well "http requests" if you like that better [20:11:26] ...if the webrequest table does not contain the information that is denoted as "webrequest log" in that diagram [20:11:43] so [20:11:48] ottomata: yup, we can make varnishkafka emit '-' always [20:11:49] Krinkle: yes, saw that too [20:11:52] madhuvishy: i guess let's use your patch, and plan to do that [20:11:53] with - [20:11:56] and it would keep working [20:11:59] with a big comment as to why [20:12:06] HaeB: that graph is not about webrequest table, sorry. [20:12:10] and then maybe one day if we have a convenient el downtime or something, we can remove it [20:12:17] who knows, maybe during the codfw switchover [20:12:21] milimetric: ok, will add the "all" special request then [20:12:21] ottomata: if we want to drop it - yes you finished my sentence [20:12:28] hehe [20:13:10] nuria: you know what I mean about varying by pattern too?
[20:13:21] I will see if I can figure out how to do it and comment in gerrit [20:13:23] in any case, if i read https://commons.wikimedia.org/wiki/File:Wikipedia_webrequest_flow_2015-10.png without these unmentioned assumptions (for example when coming from https://en.wikipedia.org/wiki/Wikimedia_Foundation#Hardware ), i would assume that WMF uses EventLogging to count pageviews, for example [20:13:47] milimetric: I just looked on vital signs, yes, i see what you mean .. but not what i changed in teh code to have broken it though [20:13:57] k, thinking [20:14:51] milimetric: but again.. i did not understand how the dashing was happening before [20:15:04] milimetric: let me add the "all" [20:15:40] Krinkle: "Hadoop, in that view, is merely an implementation detail behind the Kafka service" - but so would EL be, no? [20:15:49] i think the diagram is needlessly confusing in that respect [20:17:12] Analytics, ArchCom-RfC, Discovery, EventBus, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#2077248 (mobrovac) >>! In T114443#2076614, @RobLa-WMF wrote: > @mobrovac - I'm confused, why don't you think T120212 is a blocker for this? It is, but it's an indirect one: it is bl... 
[20:18:24] madhuvishy: gotta run for a bit, back online later [20:18:59] ottomata: instead of the %h specifier - we can come up with one for "ignore"/"omit" say "o" and have the format specifiers dict be 'o': (r'(?P<%s>\S+)', None), [20:19:03] ottomata: alright [20:19:05] ttyl [20:19:27] (just to remove the clientIP language from the code) [20:19:50] madhuvishy: that would be cool [20:19:56] +1 to that idea [20:20:02] and we can also update our puppet [20:20:04] ya okay [20:20:08] i'll do that [20:22:20] (PS1) Joal: Add archive stage to last_access_uniques jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/274187 (https://phabricator.wikimedia.org/T126767) [20:22:38] (CR) Milimetric: Fetch Pageview Data from Pageview API (3 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/270867 (https://phabricator.wikimedia.org/T124063) (owner: Nuria) [20:22:56] I think that gives some hints about how to vary the patterns, nuria ^ [20:24:07] Analytics-Wikistats: Problems with Erik Zachte's Wikipedia Statistics - https://phabricator.wikimedia.org/T127359#2077299 (Samat) Dear Erik, 13 years is a very long time, and I fully understand your answer. I would be happy if we didn't loose your efforts writing the scripts and your scripts would be mainta... [20:28:25] HaeB: No, because EventLogging is also consumed again. [20:28:46] And because it is piped into statsd as well, which is also on it [20:28:54] (e.g. Navigation Timing [20:29:33] EventLogging is effectively consumer facing via a web request for things like Nav Timing. Without it on the map, it's not obvious how it ends up there. [20:29:42] I could remove Kafka, but not Event Logging [20:29:54] but kept it to also show that we log requests in general (not just for EL) [20:30:12] Which seems useful as a starting point to learn where to go for different things [20:30:23] Anyway, I'll do another revision later this year, got various other comments as well. 
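The "omit" specifier madhuvishy proposes above can be sketched as follows. Function names and the reduced specifier table are illustrative, not the real EL parser code; the key idea from the chat is a None caster, meaning the token is still matched (so the raw line format stays intact) but never lands in the parsed event:

```python
import re

# Reduced specifier table: (regex fragment, caster). A None caster means
# the field is matched but omitted from the event, per the 'o' idea.
SPECIFIERS = {
    'h': (r'(?P<%s>\S+)', str),   # e.g. clientIp / hostname, kept
    'i': (r'(?P<%s>.+)', str),    # free-form field, e.g. userAgent
    'o': (r'(?P<%s>\S+)', None),  # matched, but dropped from the event
}

def compile_format(fmt):
    """Turn e.g. '%{clientIp}o %{userAgent}i' into (regex, casters)."""
    parts, casters = [], []
    for m in re.finditer(r'%\{(\w+)\}([hio])', fmt):
        name, spec = m.groups()
        fragment, caster = SPECIFIERS[spec]
        parts.append(fragment % name)
        if caster is not None:
            casters.append((name, caster))
    return re.compile(r'\s+'.join(parts)), casters

def parse(fmt, line):
    regex, casters = compile_format(fmt)
    match = regex.match(line)
    if match is None:
        raise ValueError('line does not match format')
    return {name: cast(match.group(name)) for name, cast in casters}
```

With '%{clientIp}o', varnishkafka can keep emitting the field (or a '-' placeholder) while the parsed events no longer carry it.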
[20:30:24] thanks :) [20:31:13] Analytics: Define what constitutes a search pageview - https://phabricator.wikimedia.org/T120249#1848951 (tomasz) [20:32:16] Krinkle: obviously one can argue endlessly about the semantics of what is a detail or not, or what consuming means exactly ... but isn't Hadoop data consumed again too? ;) [20:33:08] anyway, it's a great chart overall, i just think that hadoop is a major part of the infrastructure too, and that "webrequest" should not be used in two different meanings [20:33:16] milimetric: ok, will look [20:35:56] HaeB: Not by me :P [20:36:02] (I don't consume Hadoop) [20:37:32] i thought it was written from the view of a general webrequest, not from the view of a particular employee ;) [20:37:34] "the life of a Wikipedia webrequest inside Wikimedia Foundation infrastructure" [20:40:53] milimetric: also, my colors are wrong when it comes to the left bar squares and the color on graph, ja! [20:41:55] Hm... I didn't notice that. I think I didn't fully context switch when I first looked [20:42:07] HaeB: I think you are going to have to come to terms with the fact that "web request" is used in many contexts ... [20:43:31] nuria: *shrug* still no need to use the exact same term there when it's easy to avoid that particular confusion in this case [20:43:47] HaeB: i am kidding ... ay ay [20:45:03] nuria: don't worry, i'm not on a crusade to convert the entire web analytics world to the correct dogma ;) [20:45:15] HaeB: jaja [20:52:46] milimetric: need to send a pull request to pageview npm module to support "all-access".. doing that [20:54:37] ottomata: I messed up :( [20:55:32] oh no, actually I didn't ... 
Something else did [21:06:47] milimetric: created pull request, looking at colors next [21:07:35] cool, I'm sure he'll merge it quickly, he's very responsive [21:17:37] Analytics, ArchCom-RfC, Discovery, EventBus, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#2077418 (RobLa-WMF) I realize that the blocking relationship is transitive, but given Otto's comment (T114443#2072426), it would seem that it would be clearer to make the blocking re... [21:21:56] a-team: camus broken ... Trying to sort it [21:22:05] joal: do you need help? [21:22:54] madhuvishy: hm, I know the thing, what would be useful is to stop puppet agent and prevent camus from running automatically [21:23:03] madhuvishy: But I can do without that [21:23:16] joal: ah - i cant stop puppet i think - dont have sudo on stat1002 [21:23:25] it's on analytics1027 [21:23:42] i dont think i have sudo anywhere - only hdfs user [21:23:50] same for me [21:34:47] ottomata: can we merge this so I can test with a new format string on beta? https://gerrit.wikimedia.org/r/#/c/274286/1 [21:36:41] looking [21:37:01] ottomata: okay - related change to EL is https://gerrit.wikimedia.org/r/#/c/274152/2 [21:40:21] Hey ottomata ... I don't know why but camus broke today at 17:10 UTC :( [21:40:58] cool madhuvishy ja looks ok to me [21:40:59] whaa? [21:41:02] looking [21:41:16] ottomata: I'm on it: same as usual, moving files issue [21:41:33] hmmm [21:41:47] joal: so when that happens, its like camus stopped in the middle while moving files to their final dest, right? [21:41:48] ottomata: I'm in the process of solving it [21:41:53] and then the next run fails because it can't overwrite the filename? [21:42:35] ottomata: I don't know what the root cause is: log before the fail looks ok, then fails because it can't move files [21:42:48] madhuvishy: am checking for changes on eventlog1001 [21:42:57] ottomata: possibly the issue comes at writing history dir ?
[21:43:04] joal: do the two runs start at the same offset? [21:43:14] I have not double checked [21:43:31] ottomata: currently making the thing work again, will investigate more after :) [21:43:35] ok [21:43:43] i'll look too, lemme finish a couple things [21:43:57] ottomata: Can you stop puppet and comment automatic camus ? [21:43:59] madhuvishy: ! :) [21:44:00] https://puppet-compiler.wmflabs.org/1897/ [21:44:02] https://puppet-compiler.wmflabs.org/1897/eventlog1001.eqiad.wmnet/ [21:44:05] sure [21:44:28] joal: just webrequest? [21:44:35] ottomata: umm what is this? [21:44:40] just checked webrequest actually [21:45:26] madhuvishy: ! [21:45:28] puppet compiler [21:45:43] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/build?delay=0sec [21:46:44] ottomata: i don't understand - my change is not merged right? what are we looking at? [21:48:30] ha! its compiling the puppet catalog on that node based on just the ops/puppet repo [21:48:39] for head and your change [21:48:41] and then diffing them [21:49:06] ah interesting [21:49:10] and it looks weird [21:49:20] https://www.irccloud.com/pastebin/eTOO0Cv6/ [21:49:40] is that what you're saying? [21:50:16] ottomata: ^ [21:50:44] ja [21:50:55] ya that is indeed weird [21:51:48] madhuvishy: ja [21:52:26] ottomata: doesn't do the same for the client side change [21:52:48] ottomata: Don't bother stopping puppet and camus, I'm almost done [21:53:24] joal: already done, just stopped webrequest [21:53:30] will reenable whenever you say [21:53:45] oh its different [21:54:05] hm ottomata: crontab -l on analytics1027 still gives me camus :( [21:54:31] ottomata: calling hiera() does something weird to the {} - puppet substitution? [21:56:03] ottomata: i'm not in a hurry - so deal with camus first, then we can figure this out [21:57:03] ? [21:57:17] joal: # Puppet Name: camus-webrequest [21:57:17] #*/10 * * * * /srv/deployment/anal ... [21:57:18] no? [21:57:43] now yes, when I wrote the other, not ...
weird ottomata :) [21:57:53] ottomata: nevermind :) [21:58:07] ok! [21:58:08] :) [21:59:13] ottomata: for the client side thing it says [21:59:16] https://www.irccloud.com/pastebin/fWiEPVEH/ [22:00:04] wondering if doing hiera() of the string triggered some weird substitution rules [22:00:36] for {..} [22:01:52] milimetric, are we doing "most popular files" yet? I swear Chris/Erik worked on that but don't recall what happened in the end [22:02:25] IIRC erik has a cron job that extracts this data out of mediacounts. [22:02:26] ottomata: successful camus run ! [22:02:34] ottomata: You can restore :) [22:02:55] Ironholds: as in media files? [22:03:19] ok! [22:03:28] There they are :) [22:03:31] Ironholds: http://dumps.wikimedia.org/other/mediacounts/daily/2016/ [22:03:35] thanks joal [22:03:37] Search for 'top1000' [22:04:11] qchris: :) Ironholds hopefully we'll get them also into the AQS (pageview api) infrastructure soon [22:04:45] https://phabricator.wikimedia.org/T88775 [22:05:31] sweet! [22:05:37] y'all are the best :) [22:06:33] I don't know about me, but these folks ^ , they're pretty amazing :) [22:08:57] ottomata: In those times of troubled cluster, I bless faster oozie response times ! [22:11:02] oh is marcel out til next week? [22:11:17] :) [22:11:32] when i find myself in times of trouble, faster ooooooozie comes to me [22:12:34] * joal is laughing very loud :D [22:12:43] :D [22:20:38] joal: madhuvishy, is marcel out for a week?
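The Camus failure mode ottomata and joal describe above (a run dies partway through moving files to their final destination, then the rerun fails because it can't overwrite filenames that are already there) can be sketched like this. This is an assumption-laden illustration in Python, not Camus's actual Java code; `move_to_final` and its parameters are made-up names for the pattern being discussed:

```python
import os
import shutil

def move_to_final(src, dest, allow_existing=False):
    """Move a finished file into its final directory.

    If an earlier run died mid-move, `dest` may already exist, and a
    blind move then fails -- matching the "fails because it can't move
    files" symptom above. Letting a rerun replace an existing file
    makes the move phase safe to repeat.
    """
    if os.path.exists(dest):
        if not allow_existing:
            raise IOError('destination already exists: %s' % dest)
        os.remove(dest)  # rerun after a partial move: replace the leftover
    os.makedirs(os.path.dirname(dest) or '.', exist_ok=True)
    shutil.move(src, dest)
```

The recovery joal performed manually amounts to the `allow_existing=True` path: clear (or accept) the half-moved leftovers so the rerun can complete.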
[22:20:48] i was trying to do the review of his stuff so he could work on it tomorrow :) [22:21:05] ottomata: aah yes I think milimetric took over his work though [22:21:28] I think you're right madhuvishy [22:23:52] oh ok [22:23:53] ha [22:24:03] milimetric: i'm having a lot of trouble with this patch [22:24:07] it doesn't make sense to me [22:24:10] i keep trying to make it make sense [22:24:16] i'm writing a comment on the task now [22:24:45] ok, let's make it make sense then [22:24:52] we're good at that [22:25:40] ja, let's talk about it tomorrow [22:25:50] ottomata: what to do about this EL puppet change - do you have any idea why that weirdness happens? [22:25:51] k, cool [22:26:00] oh sorry madhuvishy am trying to finish up this comment... [22:26:05] ottomata: np :) [22:26:07] i haven't been able to look at it [22:31:57] Analytics, ArchCom-RfC, Discovery, EventBus, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#2077786 (mobrovac) [22:32:11] Analytics, ArchCom-RfC, Discovery, EventBus, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1710260 (mobrovac) >>! In T114443#2077418, @RobLa-WMF wrote: > I realize that the blocking relationship is transitive, but given Otto's comment (T114443#2072426), it would seem that... [22:35:41] Analytics-Kanban, Patch-For-Review: Puppetize reportupdater to be executed in stat1002 and run the browser reports {lama} - https://phabricator.wikimedia.org/T127327#2039936 (Ottomata) HMmMMm, we should talk about this more. I was going to comment a bunch on the most recent patch, but then a lot of thing... [22:45:19] a-team, cluster seems stabilized, I'm going to have some sleep and will continue to monitor tomorrow :) [22:45:29] good night joal :) [22:46:10] nite! [23:03:27] laters a-team! 
[23:03:34] nitey [23:03:34] Analytics, Pageviews-API: Commons traffic 3-fold drop around September 2015 - https://phabricator.wikimedia.org/T128496#2077930 (Milimetric) Just for clarity, the annotations are at the bottom of the graph as little "A"s. @Yurik, we could add those to the vega graphs if you like, it would be a wiki-to-wi... [23:08:48] Analytics, Discovery, Discovery-Search-Sprint: Update camus and mediawiki to be able to read/write avro protocols for logging - https://phabricator.wikimedia.org/T128530#2077941 (EBernhardson) [23:10:57] (PS1) EBernhardson: Update camus to support reading avro schemas from an avro protocol [analytics/refinery/source] - https://gerrit.wikimedia.org/r/274307 (https://phabricator.wikimedia.org/T128530) [23:12:09] Analytics, Pageviews-API: Pageviews API not updated on 2/18/2016 at 8;34 utc - https://phabricator.wikimedia.org/T127414#2077959 (Milimetric) I think that 5 hour estimate was an optimistic one considering sometimes things can go wrong. Usually it's actually faster than 5 hours, but the cluster is often... [23:22:57] Analytics, Discovery, Discovery-Search-Sprint, Patch-For-Review: Update camus and mediawiki to be able to read/write avro protocols for logging - https://phabricator.wikimedia.org/T128530#2078035 (EBernhardson) [23:30:08] Analytics, Discovery, Discovery-Search-Sprint, Patch-For-Review: Update camus to be able to read/write avro protocols for logging - https://phabricator.wikimedia.org/T128530#2078066 (EBernhardson)