[04:21:48] Analytics, MediaWiki-extensions-ContentTranslation, ContentTranslation-Release5: Limn language dashboard: eswiki graph is wrong/stuck - https://phabricator.wikimedia.org/T99074#1316600 (Springle) Indeed, I reported S7 issues to analytics@ a couple weeks ago (they started after the full /tmp problems i...
[07:41:04] halfak: Good morning ?
[07:41:14] Good morning :)
[07:41:18] :)
[07:41:23] Hey ! \o/
[07:41:42] Thanks for the feedback on worklog
[07:41:54] I'll try not to forget the signature thing :)
[07:44:29] :) It's awesome to have you writing logs in the same space.
[07:45:09] It's a practice that has saved me so many headaches and won some substantial praise -- now if only I could make them easier to find for people interested in reading the research.
[07:45:39] halfak: Very interesting point :)
[07:46:04] * joal scratches his head with another new problem to try to solve ;)
[07:47:12] * halfak too
[08:36:06] (PS1) KartikMistry: Add languages to be deployed on 20150528 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/214309
[08:49:16] Analytics-Tech-community-metrics: Median time to review for Gerrit Changesets, per month - https://phabricator.wikimedia.org/T97715#1316986 (Dicortazar) @Aklapper, as far as I know, those charts were wrong at some point and were moved to the third and fourth chart in korma.wmflabs.org/browser/gerrit_review_q...
[08:59:18] Analytics-Tech-community-metrics, ECT-May-2015: Community Metrics for IRC channels not updated since 09/2013 - https://phabricator.wikimedia.org/T96371#1316997 (Dicortazar) Data updated in korma.
[09:45:31] Analytics-Engineering, MediaWiki-API, Wikipedia-Android-App, Wikipedia-iOS-App: Add page_id and namespace to X-Analytics header in App / api requests - https://phabricator.wikimedia.org/T92875#1317063 (Mattflaschen) Yeah, take a look at https://www.mediawiki.org/wiki/API:Query#Specifying_pages ....
[10:29:45] joal / halfak: where are your work logs? I'll make an attempt at joining if there's an emerging standard :)
[10:29:58] \o/
[10:30:03] * halfak gets link
[10:30:34] See the list here: https://meta.wikimedia.org/wiki/Research_talk:Measuring_edit_productivity
[10:30:54] * joal likes emerging standards !
[10:31:04] If you click the button, some wiki-templates will create a new page, but if there is already a page for today, it will bring you to it.
[10:31:16] Treat it like a talk page and put new stuff at the bottom.
[10:31:19] And sign your stuff.
[10:31:21] No other rules.
[10:31:22] :)
[10:31:41] Sorry, can't miss that opportunity : http://xkcd.com/927/?cmpid=pscau
[10:31:47] * joal hides
[10:31:57] If you want to work on a new project, just drop the {{work log}} template on the talk page.
[10:38:45] * halfak should write up some docs on how to worklog
[10:38:52] o/ lzia
[10:38:56] how's ICWSM?
[10:39:08] hellooo halfak. :-) goood.
[10:39:13] DarTar said the workshop went amazingly well :)
[10:39:43] yeah. I'm quite happy. :-)
[10:39:50] We owe you a summary email. ;-)
[10:39:59] I was gonna ask. I look forward to it :)
[10:40:52] why are you here this early in the day btw? :D
[10:41:05] I'm in Copenhagen.
[10:41:12] oh! that makes sense
[10:41:19] I've got a workshop this weekend. "Open Collaboration Data Factory"
[10:41:31] Also, must get VE analysis done before SF wakes up.
[11:14:54] templates are fascinating :) I think this work log thing seems easy enough that even I could use it
[11:18:03] Analytics, MediaWiki-extensions-ContentTranslation, ContentTranslation-Release5: Limn language dashboard: eswiki graph is wrong/stuck - https://phabricator.wikimedia.org/T99074#1317235 (Milimetric) Thanks, Sean, I saw your note but didn't realize that was still going on. Is there another phab task to...
[11:32:31] Analytics-Tech-community-metrics, Phabricator, ECT-May-2015: Metrics for Maniphest - https://phabricator.wikimedia.org/T28#1317252 (Aklapper)
[13:05:19] Analytics, MediaWiki-extensions-ContentTranslation, ContentTranslation-Release5: Limn language dashboard: eswiki graph is wrong/stuck - https://phabricator.wikimedia.org/T99074#1317428 (Amire80) p:Triage>Normal
[14:46:23] (PS1) Joal: Add get pageview_info udf and underlying functions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/214349
[14:50:25] (PS3) Milimetric: Use Dygraphs in Vital Signs [analytics/dashiki] - https://gerrit.wikimedia.org/r/214270 (https://phabricator.wikimedia.org/T96339)
[14:50:32] mforns: I DID IT!! :)
[14:50:42] milimetric, ?
[14:50:44] I'm going to take a look at your code now, and then do you wanna talk?
[14:51:01] my latest patch, it should finish the task
[14:51:16] milimetric, oh
[14:51:47] milimetric, I had rebased my change on your previous patch on that changeset
[14:51:56] the annotations bug turned out to be a really tricky one
[14:52:05] milimetric, and was changing stuff
[14:52:08] oh, awesome, did you push the latest?
[14:52:23] not yet, it doesn't work right now
[14:52:32] wanna talk about it?
[14:52:43] it's likely I could point out any problems
[14:53:01] milimetric, and with your latest changes, the parser for tabular/categorized data will not be needed any more
[14:53:17] milimetric, ok, batcave
[14:53:20] ok, omw
[14:54:27] milimetric, having problems entering the hangout..
[14:54:40] mforns: authentication or Google being weird?
[14:54:42] i can invite you again
[15:11:01] Quarry: SQL String functions not working - https://phabricator.wikimedia.org/T100057#1317797 (Aklapper)
[15:17:50] (PS2) KartikMistry: Add languages to be deployed on 20150528 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/214309
[15:19:05] (CR) KartikMistry: [C: 2] Add languages to be deployed on 20150528 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/214309 (owner: KartikMistry)
[15:26:05] ottomata: joal So I tried to change the app session metrics code to not use HiveContext, but use SQLContext and load the parquet files directly as in - https://gerrit.wikimedia.org/r/#/c/212541/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/PageviewAggregator.scala - and ran into the same OOM error
[15:26:23] !
[15:26:23] hm.
[15:27:46] ottomata: I tried with 2 days of data and that happened - gonna try for an hour now
[15:28:09] oh hm, madhuvishy try increasing your driver mem too
[15:28:12] ottomata: ok, this works:
[15:28:13] i think the default is 512
[15:28:15] https://www.irccloud.com/pastebin/oDHk8hpi
[15:28:24] and that sometimes even causes OOMs in hive if the data is large
[15:28:31] increase to 1500M or something
[15:28:51] ottomata: is that a CLI param?
[15:29:41] executor-memory - this one?
[15:29:50] driver memory
[15:29:51] umm
[15:30:10] --driver-memory
[15:30:24] the OOMs we got were def in the driver
[15:31:19] but ja, try for an hour first
[15:31:25] we got OOMs even on a single tiny hour
[15:33:05] madhuvishy: ottomata suggested defining some reusable utility code to load parquet files as temporary tables in sqlContext
[15:33:13] I think it would be a great idea :)
[15:33:42] joal: yeah, let's talk about that after standup/other meetings
[16:00:09] Analytics-Kanban: Load Test Event Logging - https://phabricator.wikimedia.org/T100667#1317914 (Milimetric) NEW a:Milimetric
[16:02:24] Analytics-EventLogging, Analytics-Kanban: Load Test Event Logging {oryx} - https://phabricator.wikimedia.org/T100667#1317930 (kevinator)
[16:03:37] (CR) Ottomata: Add get pageview_info udf and underlying functions (8 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/214349 (owner: Joal)
[16:04:32] Analytics-Tech-community-metrics, ECT-May-2015: Maniphest backend for Metrics Grimoire - https://phabricator.wikimedia.org/T96238#1317941 (Dicortazar) Another update, we already have a set of JSON files to be visualized. Working now on the viz part :).
[16:09:28] ottomata, what's the use case for dialect and title over a project/project_class map, do you know?
[16:09:55] It seems like kind of a bass-ackwards way to be doing things. "we know what page they looked at" "what language was it in?" "no idea" "okay. What sort of page was it? Wiktionary, Wikipedia..." "also no idea"
[16:10:21] (CR) OliverKeyes: Add get pageview_info udf and underlying functions (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/214349 (owner: Joal)
[16:10:28] Ironholds: please advise joal!
[16:10:36] joseph is implementing this, i am just reviewing
[16:10:39] and i know that you know much more than I do
[16:10:53] gotcha!
[16:10:56] eh, not really
[16:11:19] can I invite you to a meeting tomorrow
[16:11:19] ?
[16:11:26] we will be discussing this stuff with
[16:11:38] just did :)
[16:15:05] Sure
[16:15:20] I'm just really confused by the prioritisation here, from the consumer POV
[16:15:26] that reminds me, I have a patch to submit *pops knuckles*
[16:27:26] Ironholds: re: the above, we're just guessing so your input will be considered
[16:27:41] on prioritisation or on the implementation?
[16:28:02] on exactly what fields we expose through our hourly aggregate
[16:28:15] and on what field transformations we prioritize, yea
[16:28:24] our "customers" include you, everyone
[16:28:33] ..huh. gotcha.
[16:46:57] Ironholds: look at the GetPageviewInfoUdf, I think it's what you are describing (more or less)
[16:49:25] joal, it's not
[16:49:30] I want a map of project and project class
[16:51:15] :)
[16:51:54] (this is also great! But, knowing it was to the page on Pittsburgh isn't on its own tremendously valuable if I can't tell what language the page was in)
[16:52:01] Ironholds: splitting the project part of the map by the dot I guess
[16:53:20] Ironholds: I don't really go to tasking meetings, so i don't get the prioritisation much either
[16:53:35] We'll go deeper on that tomorrow, but I have first been asked to just leave the project (en.wikipedia for instance)
[16:53:37] joal, but also factoring in mobile and wap and sometimes wikidata is going to be the project which means 'www' will be the language and..
[16:53:56] Instead of going through project and class, and not knowing when it's a language
[16:54:14] Ironholds: I remove www
[16:54:23] and mobile/zero bits
[16:54:41] I only keep language and project-related classes (commons.wikimedia for instance)
[16:55:29] fair!
[16:55:57] So why not split by the dot and have the class and project
[16:56:01] I don't mind :)
[16:56:11] I would even prefer it in fairness ;)
[16:58:21] I have no problem with that if we're confident in the regex
[16:58:37] I tend to use logic chains rather than regex because C++98, but ;p
[16:58:46] huhu :)
[16:58:58] Ironholds: currently running a test on one hour of data
[16:59:03] Will send you the results
[16:59:13] cool!
[17:02:55] Analytics-Cluster, operations: Fix llama user id - https://phabricator.wikimedia.org/T100678#1318116 (Ottomata) NEW a:Ottomata
[18:19:15] is labs down or is it just me?
[18:20:38] Ironholds: DNS issues.
[18:20:59] specifically, the DNS server was /not running/. Bblack restarted it, so should be OK now
[18:22:35] that...is a DNS issue
[18:22:43] oh wait, I get it
[18:22:45] TLA collision
[18:22:58] Domain Name System, not Do Not Start ;p
[18:23:09] I knew we should've added a devil's dictionary to our onboarding guide
[18:23:43] valhallasw, on the subject of "deliberately droll humour in technical events": have you read the Open Location Code standard?
[18:24:18] I thoroughly recommend the passage that ends in "This means that we cannot exactly represent latitude 90 in a code. We are willing to accept this shortcoming since there is no permanent settlement at the North Pole."
[18:27:50] Ironholds: is that the TomTom one?
[18:28:01] no, it's not it seems
[18:28:03] valhallasw, Google!
[18:28:12] it's actually pretty neat (I'm writing the C++ implementation. Because bored.)
[18:29:15] Ironholds: http://www.mapcode.com/
[18:29:18] that's the TomTom one
[18:30:53] ahh, neat!
[18:31:30] Ironholds: I quite liked their approach 'storage is cheap, so we're just going to use a 20kB data table to help get shorter codes'
[18:31:45] (Abandoned) Milimetric: [WIP] Update to Vega 2.0 (not ready to merge) [analytics/dashiki] - https://gerrit.wikimedia.org/r/212455 (owner: Milimetric)
[18:42:58] joal: ottomata So I tried for an hour of data and it's fine - even for 3 hours. But when I try for a day it says heap space exceeded or something like that.
[18:43:19] hmm
[18:43:26] hm, in the driver process, right?
[18:43:30] did you increase --driver-memory
[18:43:31] ?
[18:43:36] hive even does that too sometimes
[18:43:48] ottomata: yeah.. I did. let me paste command
[18:43:51] hm
[18:44:12] spark-submit --driver-memory 1500M --num-executors=6 --executor-cores=2 --executor-memory=2g --class org.wikimedia.analytics.refinery.job.AppSessionMetrics --verbose /home/madhuvishy/workplace/refinery-source/source/refinery-job/target/refinery-job-0.0.12-SNAPSHOT.jar -o /user/madhuvishy/tmp/ -y 2015 -m 5 -d 20 -n 2
[18:44:31] hm
[18:44:43] you get the OOM printed out on the CLI to you?
[18:44:50] ottomata: yeah
[18:45:27] I think you forgot --master yarn
[18:45:32] madhuvishy: --^
[18:45:49] Running local for those is too big :)
[18:45:52] joal: aah, okay let me try that.
[18:46:17] oh! :)
[18:58:06] madhuvishy: works better ?
[18:58:38] joal: trying for 2 days - it's still running
[18:58:47] sounds better indeed :)
[18:59:17] joal: :) it did say java.lang.OutOfMemoryError: Java heap space a couple times, but is continuing to run
[18:59:22] not sure if that's normal
[18:59:32] Sounds not normal no ...
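The project/class discussion earlier in the log comes down to splitting the pageview hostname on the dot, after stripping the www/mobile/zero qualifiers, so a consumer gets both the language prefix (when there is one) and the project class. Below is a minimal sketch of that idea; it is not the refinery GetPageviewInfoUdf (which is Java/Scala and handles many more edge cases), and the project-class and qualifier lists are illustrative, not exhaustive.

```python
# Rough sketch of the "split the host by the dot" idea: turn a pageview
# hostname into a (prefix, project_class) pair, dropping www/m/zero/wap
# qualifiers. The prefix is usually a language code, but can be e.g. "commons".
PROJECT_CLASSES = {"wikipedia", "wiktionary", "wikibooks", "wikinews",
                   "wikiquote", "wikisource", "wikiversity", "wikivoyage",
                   "wikidata", "wikimedia", "mediawiki"}
QUALIFIERS = {"www", "m", "mobile", "wap", "zero"}


def split_host(host):
    """Return (prefix_or_None, project_class) for a hostname like
    'en.m.wikipedia.org', or (None, None) if it doesn't look like one."""
    parts = [p for p in host.lower().split(".") if p not in QUALIFIERS]
    if parts and parts[-1] == "org":
        parts.pop()
    if not parts or parts[-1] not in PROJECT_CLASSES:
        return (None, None)
    project_class = parts[-1]
    prefix = parts[0] if len(parts) > 1 else None
    return (prefix, project_class)


assert split_host("en.m.wikipedia.org") == ("en", "wikipedia")
assert split_host("commons.wikimedia.org") == ("commons", "wikimedia")
assert split_host("www.wikidata.org") == (None, "wikidata")
```

The wikidata case shows why the map approach Ironholds asks for is useful: some projects simply have no language, and the consumer can see that directly instead of getting 'www' back as a language.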
[19:00:13] joal: hmmm waiting for it to finish - to see what happens
[19:00:16] but let it finish (fail or work)
[19:00:18] yeah
[19:19:21] joal: so the 2-day job failed, but 1 day finished fine.
[19:22:53] madhuvishy: how did it fail?
[19:23:11] (PS3) Madhuvishy: [WIP] Productionize app session metrics - Parse args using scopt - Move away from HiveContext to reading Parquet files directly - Change reports to run for last n days instead of daily or monthly (not sure if this is gonna work yet) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/212573 (https://phabricator.wikimedia.org/T97876)
[19:23:20] ottomata: multiple heap space errors
[19:23:41] https://www.irccloud.com/pastebin/CgXhG0p5
[19:24:32] hmm, ok, interesting, totally different errors than the hive context ones
[19:24:42] looks like an executor failed several times
[19:24:44] possibly with OOM?
[19:24:50] where did you see the OOMs?
[19:25:58] ottomata: yeah. I think at least 2 of them were OOMs. I lost the trace - so not sure where
[19:27:38] application_id?
[19:30:03] ottomata: 1 sec finding
[19:33:34] ottomata: can't find the one i ran. rerunning - application id is application_1430945266892_36786. There's been one heap space error so far
[19:33:55] k thanks
[19:34:39] madhuvishy: try more executors
[19:35:49] ottomata: okay will try that. maybe double?
[19:36:50] ottomata: in any case - this is for 2 days. this report is supposed to run over the last 30 days of data - do you think that'll succeed even given more executors, or should we revisit our approach?
[19:36:58] joal: ^
[19:37:43] not sure.
[19:37:48] madhuvishy: can you paste your submit command again?
[19:37:55] the code that existed before could run daily or monthly. i changed it to run for the last 30 days - given the requirement was weekly reports for the last 30 days.
[19:38:00] ottomata: sure.
[19:38:28] yeah, hm.
[19:38:29] spark-submit --driver-memory 1500M --num-executors=6 --executor-cores=2 --executor-memory=2g --class org.wikimedia.analytics.refinery.job.AppSessionMetrics --verbose /home/madhuvishy/workplace/refinery-source/source/refinery-job/target/refinery-job-0.0.12-SNAPSHOT.jar -o /user/madhuvishy/tmp/ -y 2015 -m 5 -d 20 -n 2
[19:42:08] ottomata: also code is here - https://gerrit.wikimedia.org/r/212573. it still needs some work but maybe i'm doing something obviously stupid
[19:47:37] madhuvishy: no --master yarn
[19:47:38] ?
[19:49:26] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[19:49:47] madhuvishy: i'm running home from this cafe, got a little bit more work left, and I think your thing is going to take some deep diving, can I give you my tomorrow for this? :)
[19:50:03] ottomata: uhhh i was pretty sure i put that. maybe i pasted wrong.
[19:50:14] yeah, i think you did
[19:50:15] ottomata: yeah, okay let's do that.
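For reference, here is a PySpark-flavoured sketch of the reusable "parquet to temporary table" helper joal suggests above, using a plain SQLContext instead of HiveContext so no Hive metastore is involved. The actual AppSessionMetrics job is Scala in refinery-source, and the HDFS path and table name below are illustrative assumptions, not the real refinery layout; the API shown is the Spark 1.x-era one in use at the time.

```python
# Sketch only: load a parquet dataset straight into a SQLContext (no
# HiveContext) and expose it as a temporary table for SQL queries.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("AppSessionMetricsSketch")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)


def register_parquet_table(sql_ctx, path, table_name):
    """Load a parquet directory and register it as a temporary table.
    (Spark 1.x API; newer Spark would use sql_ctx.read.parquet(path).)"""
    df = sql_ctx.parquetFile(path)
    df.registerTempTable(table_name)
    return df


# Hypothetical partition path -- adjust to the real dataset layout.
register_parquet_table(
    sqlContext,
    "hdfs:///some/parquet/dataset/year=2015/month=5/day=20",
    "webrequest_sample",
)
counts = sqlContext.sql("SELECT COUNT(*) FROM webrequest_sample").collect()
```

Whatever the language, the job still has to be submitted with --master yarn and an explicit --driver-memory: without --master yarn, spark-submit typically falls back to local mode inside a single JVM, which is what produced the repeated heap-space errors in the conversation above.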
[19:50:17] otherwise the job wouldn't be in yarn
[19:50:18] :)
[19:50:27] yup :)
[19:58:55] mforns: hi
[19:59:07] I just saw the icinga alarm, was going to look into it but wanted to check with you
[19:59:07] Hey madhuvishy
[19:59:22] I am off for tonight, let's talk about your code tomorrow :)
[20:00:16] G'd night team :)
[20:01:25] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0]
[20:04:46] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0]
[20:09:37] hm, mforns: I looked into it and it looks like it's corrected now, but basically I found MobileWikiAppArticleSuggestions and Edit were two schemas not validating on the client stream. The validation exceptions seemed real, no system problems
[20:09:59] halfak: Grace was saying that you're going to try to make the meeting in a couple of hours
[20:10:10] if you're too tired, I'm happy to catch up with you now and represent both of us
[20:10:13] o/ milimetric yes that's right.
[20:10:37] No worries. :) Much appreciated though.
[20:11:04] I've got a bit of nuance to work through for our next go/no-go.
[20:11:09] So I'd better make it.
[20:11:28] Also, apparently there was some issue that might have affected the pilot that I've got to learn more about
[20:11:52] BUT, I'd love another person to look at what I've been doing and think with me about it if you can spare 30 minutes to read through.
[20:13:50] halfak: I'll read up, but I'm about to message about some invalid events in the Edit channel
[20:14:03] I see you guys are chatting about possible issues there too, maybe it's related?
[20:18:43] Maybe
[20:18:45] Indeed
[20:21:52] halfak: on second thought, it sounds unrelated. So, what should I read?
[20:25:22] https://meta.wikimedia.org/wiki/Research_talk:VisualEditor%27s_effect_on_newly_registered_editors/Work_log/2015-05-27
[20:25:29] https://meta.wikimedia.org/wiki/Research_talk:VisualEditor%27s_effect_on_newly_registered_editors/Work_log/2015-05-28
[20:25:37] Sampling and measurement of VE stuff.
[20:25:51] I found some concerning things towards the end of the 28th log
[20:26:13] If you were to skim the stuff leading to that and look carefully at the analysis of the 28th, that would be very helpful. :)
[21:02:25] Analytics-General-or-Unknown, CA-team, Community-Liaison, Wikimedia-Extension-setup: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1318815 (Tgr) It is not clear (to me, anyway) that the WMF privacy policy covers the WM-RU site. The policy explicitly mentions websites r...
[23:00:35] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Forwarder modification to produce to multiple outputs, 1 to zmq and 1 to Kafka [8 points] {oryx} - https://phabricator.wikimedia.org/T98779#1319135 (madhuvishy)
[23:00:37] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Configuration for Python & Kafka in Beta labs [8 points] {oryx} - https://phabricator.wikimedia.org/T98780#1319134 (madhuvishy)
[23:02:13] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Forwarder modification to produce to multiple outputs, 1 to zmq and 1 to Kafka [8 points] {oryx} - https://phabricator.wikimedia.org/T98779#1277503 (madhuvishy) T98779 and T98780 are covered in the same patch - https://gerrit.wikimedia.org/r...
[23:05:04] milimetric: still around?
[23:05:23] hi madhuvishy, yes, but i'm in a meeting. I'll be done in 25
[23:16:21] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Configure Kafka in beta labs + Forwarder modification to produce to multiple outputs [16 points] {oryx} - https://phabricator.wikimedia.org/T98779#1319178 (madhuvishy)
[23:16:40] milimetric: no worries, I spoke to Kevin and figured it out
[23:28:39] milimetric: if you have time after - one question - https://gerrit.wikimedia.org/r/#/c/210701/ I don't see which task this patch belongs to.
[23:29:25] madhuvishy: looking, I remember this task
[23:31:13] madhuvishy: I think it was actually this one that you merged in phab: https://phabricator.wikimedia.org/T98779
[23:31:55] the config part of that task is handled by the gerrit patch that's already linked in T98779
[23:32:11] milimetric: Hmmm. the wording confuses me
[23:32:13] and the forwarder change in eventlogging's server folder is https://gerrit.wikimedia.org/r/#/c/210701/
[23:32:27] forwarder to produce in one, forwarder to accept in the other
[23:32:30] so, there are four logical changes:
[23:32:40] forwarder and producer, code and config
[23:33:03] both of the config patches are in https://gerrit.wikimedia.org/r/#/c/210765/
[23:33:41] the forwarder code patch is this one: https://gerrit.wikimedia.org/r/210701
[23:33:54] and the producer code patch is the one I did but can't find now
[23:34:09] madhuvishy: tell me if that doesn't make sense still, as I'm not sure it makes sense to me either :)
[23:34:23] milimetric: okay. for my understanding - is the forwarder producing multiple outputs or accepting multiple outputs?
[23:35:14] milimetric: the forwarder code patch claims it's accepting, the phab task says forwarder to produce
[23:35:34] that's why i thought they were two different things
[23:37:38] madhuvishy: ok, the config should be configuring two separate forwarders, one for "server side" and one for "client side"
[23:38:15] each of these forwarders should be able to forward to multiple outputs, in our case we're trying to get them to forward to zeromq and kafka at the same time
[23:40:24] milimetric: okay, so the description in the patch is slightly misleading?
[23:40:43] which one? :)
[23:41:10] milimetric: this - https://gerrit.wikimedia.org/r/#/c/210701/
[23:42:59] madhuvishy: no, I don't think so
[23:43:18] i'd make a few grammar improvements, but it seems correct, why, what did I say that was different?
[23:43:31] basically, each forwarder can have more than one OUTPUT
[23:43:38] and there are multiple forwarders
[23:44:10] milimetric: no I get that, i just got confused - thought "forwarder accepts" means that the forwarder consumes from multiple things
[23:44:18] anyway, makes sense now
[23:44:59] let me add a comment on the task with this patch.
[23:45:39] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Configure Kafka in beta labs + Forwarder modification to produce to multiple outputs [16 points] {oryx} - https://phabricator.wikimedia.org/T98779#1319277 (madhuvishy) The python code changes for the forwarder to produce multiple outputs (zmhtt...
[23:46:51] madhuvishy: you're welcome to add a new patchset to that Gerrit change and link the Bug: T98779 properly
[23:47:16] I've gotta sign off for tonight, getting hungry and dizzy :)
[23:47:44] milimetric: yeah I'll do that. Thanks for helping out :) Good night!
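To make milimetric's point concrete: a forwarder has one input and any number of outputs, and the goal in this conversation is to have each forwarder write the same events to zeromq and Kafka at the same time. Below is a minimal, self-contained sketch of that shape using pyzmq and kafka-python; it is not the EventLogging forwarder code under review in change 210701, and the port, topic name, broker address, and stdin input are illustrative assumptions.

```python
# Minimal "one forwarder, multiple outputs" sketch: read events from a single
# input and publish each event to both a zeromq PUB socket and a Kafka topic.
import sys

import zmq                              # pyzmq
from kafka import KafkaProducer         # kafka-python


def run_forwarder(lines, zmq_port=8600, kafka_topic="eventlogging-client-side"):
    # One forwarder, two outputs: a zeromq publisher and a Kafka producer.
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:%d" % zmq_port)

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    for line in lines:
        event = line.rstrip("\n")
        if not event:
            continue
        # The same raw event goes to every configured output.
        pub.send_string(event)
        producer.send(kafka_topic, event.encode("utf-8"))

    producer.flush()


if __name__ == "__main__":
    run_forwarder(sys.stdin)   # e.g. pipe raw events in on stdin
```

Reading from one input and fanning out to several outputs is the "produce to multiple outputs" in the task title; in this reading, the patch description's "accept multiple outputs" describes the same behaviour from the configuration side rather than a forwarder consuming from multiple inputs.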