[00:04:07] having an oozie problem i still dont understand, my job is logging: ActionInputCheck:: In checkListOfPaths: hdfs://analytics-hadoop/wmf/data/wmf/pageview/hourly/year=2015/month=11/day=24/hour=0/_SUCCESS is Missing. [00:04:18] i can clearly see that exact path from `hdfs dfs -ls /wmf/data/wmf/pageview/hourly/year=2015/month=11/day=24/hour=0/_SUCCESS` though [00:42:21] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1846657 (kevinator) @SLaporte just suggested another feature that would make life easier: It would be nice if the Top API included the Page ID and the Namespace... [01:13:05] i don't know if i broke something...but the oozie command has stopped responding on stat1002 :S both the job i was trying to submit, and just general `job -info ...` commands :S [01:20:08] ebernhardson: [01:20:10] Fatal Error - Oozie Job discovery-popularity-score-wmf.pageview_hourly->discovery.popularity_score-2015,11,24-wf [01:20:19] i have gotten a few of these emails [01:20:33] as far as oozie responding [01:20:35] madhuvishy: yea it's my testing today. didn't realize this would email you all every time i test again [01:20:48] ya i din't think so either, dont know why [01:20:51] madhuvishy: it seems to have started again, it took ~10 minutes between me doing `oozie job ... -run` [01:20:55] yaaa [01:20:59] all other times that took ~5s [01:21:00] that's sometimes terrible [01:21:15] ok good, i was worried i broke something :) [01:21:32] i been throwing garbage at oozie all day as i figure out what i'm doing wrong [01:22:02] madhuvishy: thanks! [01:22:42] no problem, let me know if you need any help [01:22:48] oozie can be exhausting [01:23:25] i think i'm getting close...it's finally starting my spark app [01:24:15] oh good! [01:24:36] do you know how to find spark logs for jobs run through oozie [01:24:40] i wrote it somewhere [01:24:47] yea in hdfs /var/log, found it in the oozie docs [01:24:57] oh yay good [01:25:26] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark [01:27:23] although this error doesn't mean anything to me :S in /var/log/hadoop-yarn/apps/ebernhardson/logs/application_1441303822549_24310/analytics1048.eqiad.wmnet_8041 [01:27:32] yeahhh [01:27:40] that is why [01:27:58] the actual spark logs get swallowed by the console logger [01:28:05] well, it has an error: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf [01:28:10] but i dont see how that can be :S [01:28:37] how are you launching it? [01:29:04] madhuvishy: https://gerrit.wikimedia.org/r/#/c/256167/5/oozie/popularity_score/workflow.xml [01:29:10] thats the latest version on git [01:29:39] something is different between launching on oozie and launching with spark-submit, with spark-submit i didn't need to set SPARK_HOME=/bogus either [01:30:00] looking [01:30:01] [01:30:10] this is why it's sending us emails i think [01:30:20] oh duh, of course that's why it's emailing you all. i'll comment that out for now [01:30:27] you can may be error to kill now [01:30:28] (and need to replace it with one that emails our team instead of yours) [01:31:35] ebernhardson: can you paste the command you use to launch the job? [01:32:39] oozie job -Ddiscovery_oozie_directory=hdfs://analytics-hadoop/user/ebernhardson/discovery_oozie -Ddiscovery_data_directory=hdfs://analytics-hadoop/user/ebernhardson -config discovery_oozie/popularity_score/coordinator.properties -verbose -Dstart_time=2015-11-24T00:00Z -Dstop_time=2015-12-01T00:00Z -run [01:35:26] ebernhardson: my guess is that the analytics_oozie_directory is not getting picked up [01:35:48] when we deploy we pass an actual directory that's not a symlink [01:35:54] madhuvishy: wouldn't it have to for the email send to work? [01:36:09] hmmm [01:36:11] but sec i'll try pointing to real dir [01:40:20] madhuvishy: same error. it's almost like the classpath for java is wrong, or it's not finding the right jars for hive or something? [01:41:33] ebernhardson: hmmm, may be. what is the workflow id? [01:42:01] madhuvishy: oozie is 0116447-150922143436497-oozie-oozi-C the job it started and failed is application_1441303822549_243207 [01:42:39] ebernhardson: I see [01:42:42] https://www.irccloud.com/pastebin/3Qz5vlOz/ [01:43:02] madhuvishy: where you find that? [01:43:06] do you have access to hue? [01:43:16] if so [01:43:17] https://hue.wikimedia.org/oozie/list_oozie_workflow/0116448-150922143436497-oozie-oozi-W/?coordinator_job_id=0116447-150922143436497-oozie-oozi-C [01:43:23] wikitech creds [01:43:27] if not [01:43:50] oozie job -log 0116448-150922143436497-oozie-oozi-W [01:43:59] note that this is the Workflow id [01:44:20] coordinator launches multiple workflows, this is one of them, they end with C and W respectively [01:44:27] madhuvishy: it seems to like my ldap credentials [01:44:34] oh good [01:44:59] and indeed, end_year was replaced with year. i tried looking through oozie job -log but there is sooo much going on in there [01:45:13] yeah [01:45:24] hue makes it a little easier to look at, when it works [01:45:32] :) [01:46:04] ok fixed, killed and resubmitted [01:47:58] failed again i see [01:48:25] hue seems to have died for me too... just now [01:48:51] kevinator: hue always keeps dying [01:49:48] ebernhardson: I think the current error is [01:49:49] org.apache.oozie.action.ActionExecutorException: OozieClientException: org.apache.oozie.DagEngineException: E0738: The following 1 parameters are required but were not defined and no default values are available: location [01:50:02] just in case oozie is slow [01:54:05] found it, thats the add partition workflow which runs after spark. [01:55:48] ya looks like it [01:59:15] ok ran again, it looks like oozie thinks spark completes 'OK' even though it's failing [02:03:10] ebernhardson: yeah [02:03:11] i think it might be class path, i just booted up my job with spark-submit and it spits out a rediculously long list of jars, including (picked one at random): /usr/lib/hive/lib/hive-exec.jar [02:03:24] https://hue.wikimedia.org/jobbrowser/jobs/job_1441303822549_243239/tasks/task_1441303822549_243239_m_000000/attempts/attempt_1441303822549_243239_m_000000_0/logs lists the class path used for the oozie job [02:03:29] and hive-exec.jar isn't in there anywhere [02:03:30] hmm [02:03:45] for the job kicked off by oozie i mean [02:04:07] i would look into the spark logs [02:04:36] i wonder if i should just have it use a shell script and use spark-submit ... would be the easiest way :) [02:05:22] the spark logs are the ones that say HiveConf not found (978 times) [02:06:12] still? [02:06:55] where are they? [02:08:03] looking, i didn't look at the most recent run [02:08:52] it would be in hdfs /var/log/hadoop-yarn/apps/ebernhardson/logs/application_1441303822549_243247 [02:09:31] thats the run by oozie, here is one by spark-submit (that i canceled before completing with ctrl-c): /var/log/hadoop-yarn/apps/ebernhardson/logs/application_1441303822549_243255 [02:11:36] oh these dont give you anything [02:12:01] you can also just access them like yarn logs -applicationId [02:13:59] i think i might have found it, spark-submit sources /usr/lib/spark/conf/spark-env.sh [02:14:16] that does: for f in ${GIVE_GOME}/lib/*;jar; do SPARK_CLASSPATH=$SPARK_CLASSPATH:$f;done [02:14:20] s/GIVE/HIVE/ [02:14:25] so, it force includes all the hive jar's [02:14:29] okay.. [02:15:41] that is put in place by puppet, i wonder if its on all the servers... [02:16:10] i'm sure it is. these errors feel to me like something else is going on [02:18:11] i'm pretty sure oozie's spark runner isn't using the spark-submit thing though, but only a theory from docs i've read i'm not at all sure [02:18:25] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark create a log4j.properties file like the one there, and put it on hdfs. [02:18:51] link it on spark-opts with extra arg --files hdfs://.../log4j.properties [02:20:01] ebernhardson: ^ [02:23:31] ok started 0116520-150922143436497-oozie-oozi-C with the log4j in place pointing to hdfs://analytics-hadoop/user/ebernhardson/spark.log [02:23:58] spark has now started [02:25:37] job now killed, but no log :( [02:26:08] ebernhardson: where is your log4j file? [02:26:57] madhuvishy: hdfs://analytics-hadoop/user/ebernhardson/discovery_oozie/popularity_score/log4j.properties [02:29:35] currently launching with oozie job -Ddiscovery_oozie_directory=hdfs://analytics-hadoop/user/ebernhardson/discovery_oozie -Ddiscovery_data_directory=hdfs://analytics-hadoop/user/ebernhardson -config discovery_oozie/popularity_score/coordinator.properties -verbose -Dstart_time=2015-11-24T00:00Z -Dend_time=2015-12-01T00:00Z -Danalytics_oozie_directory=hdfs://analytics-hadoop/wmf/refinery/2015-12-01T16.01.05Z--47ad904/oozie -run [02:29:50] hmmm that's weird [02:30:14] i have not tried using an hdfs directory as the spark log output directory though, although i doubt that's the problem [02:30:36] thats what the docs said to do for oozie: Putting the file on a temp directory on Hadoop and using a hdfs:// url should do the trick [02:31:23] ohh, i meant for the properties file [02:31:27] sorry i wasn't clear [02:32:45] ebernhardson: can you try just /tmp/spark.log - i can look into the actual worker machine and see if it shows up if you dont have access [02:32:55] sure [02:34:04] I remember I was gonna log a ticket to make file logging the default for spark, probably haven't done it. this should not be so hard [02:34:18] ok kicked off again to 0116536-150922143436497-oozie-oozi-C [02:36:01] looks to have run and died [02:36:14] the driver should have been analytics1038.eqiad.wmnet:8042 i believe [02:37:25] i think it wrote something, but the error is probably on an executor [02:37:40] hmm, i can re-run it with 2 executors instead of 48 [02:38:12] 48 was kinda chosen randomly...it kicks off ~1300 tasks to chew through initially [02:38:14] yeah, i'll look at yarn spark and see if it shows me which ones after you do [02:39:29] ok starting again as 0116562-150922143436497-oozie-oozi-C [02:40:28] kicked off application_1441303822549_243407 which is now finished (dead) [02:41:00] ok looking [02:44:10] looks like it should be 1043 or 1051 for the executors, 1031 for the driver [02:44:39] ebernhardson: [02:44:41] actually [02:44:45] if you look at yarn logs [02:44:47] it shows [02:45:13] http://pastebin.com/rPaVFF1e [02:45:57] yea there is one of those lines for each matching NoClassDefFoundError exception [02:46:53] well, almost. 978 NoClassDefFound and 980 java_gateway.py [02:48:19] ebernhardson: aah it exceeded the recursion depth when logging? [02:49:02] (there's nothing in any of the spark logs btw) they just say Driver commanded a shutdown [02:50:06] maybe, i'm not entirely sure :S the odd thing was i tried to lookup those lines and the only java_gateway.py in spark's v1.3.0 git tag is https://github.com/apache/spark/blob/v1.3.0/python/pyspark/java_gateway.py [02:50:29] but that has 111 lines, and our error comes from line 364, 369 and 483 :S [02:50:52] can probably disassemble this jar somehow i suppose... [02:52:30] hmmm [02:52:45] this all runs when you use pyspark submit? [02:52:57] not sure, i always used spark-submit [02:53:11] ah yeah spark-submit [02:54:34] my best guess is still because spark-submit is a shell script that sources /usr/lib/spark/conf/spark-env.sh, which includes all the hive jars in SPARK_CLASSPATH. otto added that via puppet [02:54:45] https://gerrit.wikimedia.org/r/#/c/203358/ [02:57:12] yeah but we have other spark jobs running via oozie, how is this different? [02:57:26] do they use HiveContext? [02:57:29] or maybe just SqlContext instead? [02:57:46] SqlContext is needed for plain data frames, HiveContext would be required to talk to hive metadata [02:58:37] ebernhardson: aah [02:59:05] that makes sense [02:59:13] may be it was never tested [02:59:24] mostly i'm only guessing because https://hue.wikimedia.org/jobbrowser/jobs/job_1441303822549_243332/tasks/task_1441303822549_243332_m_000000/attempts/attempt_1441303822549_243332_m_000000_0/logs lists java.class.path= [02:59:28] and that doesn't include hive [03:01:18] hmmm [03:01:27] we'd have to ask andrew/joseph [03:01:57] ok i'll check tomorrow. I appreciate all the help! i must have sucked up about 2 hours of your time now... [03:02:13] but i did learn a bunch of new commands to find things, and hue :) [03:02:14] helps a bunch [03:02:21] No problem! :) [06:00:08] Analytics, TimedMediaHandler, Wikimedia-Video: Record and report metrics for audio and video playback - https://phabricator.wikimedia.org/T108522#1847111 (Tbayer) See also T88775 BTW I guess that by "low-level stats", you mean the [[https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts |med... [07:12:57] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1847214 (RobLa-WMF) @ottomata: I don't know all of the details, but I think the ID idea is a good one. One //possible hitch//:... [08:51:13] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1847358 (mobrovac) >>! In T116206#1846067, @Milimetric wrote: > @mobrovac: those sound like things I can do? Let me know if you've started on... [08:51:26] joal: ^^ [08:51:46] joal: doesn't need to be done today, but soon-ish would be really good [08:58:28] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1847388 (Lcanasdiaz) Producing new data. The 2nd bug was related with the period of time which was overwritten for some studies. [09:02:56] Analytics, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1847396 (jcrespo) NEW a:Milimetric [09:04:17] Analytics, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1847408 (ori) @Nuria, could you help plan out this work? [09:24:06] Analytics-Tech-community-metrics, Gerrit-Migration: Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#1847441 (Qgil) Bitergia's work is committed through monthly sprints, so all we know today is that this task wi... [09:56:42] (CR) Joal: [C: 2 V: 2] "Ottomata: see attached task comment: https://phabricator.wikimedia.org/T116772" [analytics/refinery] - https://gerrit.wikimedia.org/r/256027 (https://phabricator.wikimedia.org/T116772) (owner: Joal) [10:02:20] Analytics-Backlog, Research-and-Data: Historical analysis of edit productivity for English Wikipedia - https://phabricator.wikimedia.org/T99172#1847524 (JAllemandou) Question on the previous link "reflection" section: What about bots? [10:23:20] hey mobrovac [10:23:42] Just saw your ping [10:24:15] About test data, I wonder what would be an easy way to go : for now everything is loaded through hadoop [11:43:23] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1847718 (Lcanasdiaz) Final test passed. We're resuming the gathering process and generating the new data. It should be in korma in a few of h... [12:11:05] Analytics-Backlog: Make EL mysql consumer's worker thread consume from the left side of the queue - https://phabricator.wikimedia.org/T120209#1847820 (mforns) NEW [12:12:00] Analytics-Backlog: Fix EL mysql consumer's deque push/pop usage - https://phabricator.wikimedia.org/T120209#1847833 (mforns) [12:12:11] Analytics-Backlog: Fix EL mysql consumer's deque push/pop usage {oryx} - https://phabricator.wikimedia.org/T120209#1847820 (mforns) [14:38:17] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1848085 (Ottomata) Hm, not sure I follow. We are proposing that a schema be ID-able via a URI, and also remotely locatable if t... [15:11:28] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1848209 (JanZerebecki) I think that only means that a client that gets a URL ending in '/' for an API should not... [15:17:33] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1848214 (Ottomata) Hm, I think I see. We are coupling the URI to the ID, which according to the W3C should not be relied upon.... [15:18:09] (PS5) Milimetric: Archives are downloaded in .txt.gz format: fix matching and opening [analytics/wikistats] - https://gerrit.wikimedia.org/r/92066 (owner: Nemo bis) [15:18:19] (PS6) Milimetric: Move download limiter to proper place and comment it as it's only a test [analytics/wikistats] - https://gerrit.wikimedia.org/r/92056 (owner: Nemo bis) [15:18:39] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCM repository number mismatch on korma - https://phabricator.wikimedia.org/T116483#1848215 (Lcanasdiaz) We had an error in our library so the process didn't finish correctly. This morning I took the code from the branch we w... [15:19:09] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1848216 (mobrovac) From my POV, the URL **is** the ID. [15:26:35] hi a-team! [15:26:39] Hey mforns [15:27:26] hiya [15:29:17] Analytics-Backlog: mobile_apps_uniques_monthly not updating - https://phabricator.wikimedia.org/T120224#1848250 (Tbayer) NEW [15:29:48] a-team, I need to leave now and should be back for standup. In case I'm not (who knows), today I have debugged hte discovery team spark job [15:30:04] aha joal [15:30:05] The patch is not yet pushed, but will when I come back [15:30:20] ok [15:30:54] if you're not there I'll bring this up in standup, see you in a while joal [15:31:38] madhuvishy or others, could you give this a quick (https://phabricator.wikimedia.org/T120224 ) and maybe prod oozie a bit? The November numbers are needed for https://www.mediawiki.org/wiki/Wikimedia_Product#Reading which supposed to be updated today [15:45:23] Analytics-Backlog, Research-and-Data: Historical analysis of edit productivity for English Wikipedia - https://phabricator.wikimedia.org/T99172#1848357 (Halfak) +1. I think bots and other tools may explain a lot of an increase in efficiency of content production. [15:57:21] Analytics-Cluster, operations: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1848408 (BBlack) [15:57:41] Analytics-Backlog: mobile_apps_uniques_monthly not updating - https://phabricator.wikimedia.org/T120224#1848410 (Nuria) p:Triage>High [15:59:14] Analytics-Cluster, operations: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1848418 (BBlack) a:BBlack Yeah this is all the same issue and still present. I think @fgiunchedi is on the right track here about streaming, I'm going to write up a gene... [15:59:20] Analytics-Kanban: mobile_apps_uniques_monthly not updating - https://phabricator.wikimedia.org/T120224#1848250 (Nuria) [16:00:40] HaeB: we will try to look at this within the day today but we might get to it tomorrow. [16:00:57] HaeB: i will ping you later on today [16:17:47] Analytics-Cluster, Traffic, operations, Patch-For-Review: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1848484 (BBlack) [16:31:25] Analytics-Backlog, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1848516 (Milimetric) p:Low>Normal [16:37:26] joal: whenever you get back let me know if you have made a script to generate fake data for AQS. Otherwise I think that'd be useful for the test cluster [16:40:12] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1848553 (Milimetric) >>! In T112956#1846657, @kevinator wrote: > @SLaporte just suggested another feature that would make life easier: It would be nice if the T... [16:43:54] milimetric: yt? [16:45:29] in a hangout, nuria, I might be a little late for standup [16:45:36] k [16:46:09] Analytics-Tech-community-metrics, DevRel-December-2015, Patch-For-Review: Review/update mailing list repositories in korma - https://phabricator.wikimedia.org/T116285#1848568 (Aklapper) p:Low>Normal [16:46:27] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1848570 (Aklapper) p:Lowest>Low [16:50:55] Analytics-Cluster, Traffic, operations, Patch-For-Review: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1848582 (BBlack) Open>Resolved This should be resolved now with the change above applied. I've tested the files from this ticket and... [16:55:51] great milimetric, I had forgotten about that ! [16:56:03] milimetric: It'll be perfect for the test clusterI assume :) [17:01:10] a-team: stadduppp [17:01:14] *standuppp [17:01:41] Morning ebernhardson :) [17:01:51] I have news for you (after standup and many meetings though) [17:02:21] be there in a sec... [17:03:00] start without me guys - bit late [17:03:27] (finishing up with wes...) [17:03:45] ottomata: k [17:03:50] ottomata: we will wait 2 mins [17:11:46] ottomata: so fun thing i found out about oozie + spark yesterday. Hive jars are included via spark-submit and other required environment variables are set when using the spark-submit shell script, but not when kicking off jobs via oozie [17:12:00] ottomata: any hints on where i would look to integrate those environment variables into oozie? [17:12:18] * ebernhardson fails at english...some day i will figure it out [17:13:04] alternatively, i could just make oozie run a shell script instead of the direct spark loader. And run spark-submit from the shell script [17:15:13] ebernhardson: if you give me an hour, I'll have some updates for you on the thing [17:15:36] joal: oh sweet, you can have a day i'm not in a rush :) [17:19:07] hmmm, the jars aren't included? that's weird. i coudl see maybe not the hive-site.xml file, which it probably needs [17:26:02] joal: dunno what is up with EL [17:26:09] mysql consumer logs say Data inserted [17:26:34] but latest data in Edit_13457736 is 20151202164718 [17:26:40] i can see events for Edit flowing in through files [17:26:43] all-events.log [17:26:59] looks good [17:27:00] 2015-12-03 17:24:36,322 (MainThread) Edit_13457736 queue is large or old, flushing [17:27:01] dunno.. [17:27:37] also, anyone know what changed here? [17:27:37] https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=6&fullscreen [17:28:26] starting a few days ago insertAttempted rate gits all spikey [17:28:48] mforns: ?^ [17:28:52] gets* [17:28:56] ottomata, looking [17:29:40] the average is the same as inserted.rate, so it is probalby ok...but kinda weird, dunno what would have changed [17:30:01] is 2015-12-03 17:29:54,267 (MainThread) Sleeping 5 seconds [17:30:16] is that from the kafka consumption throttling? [17:30:21] ottomata, yes [17:30:25] the queue seems full [17:30:49] do we have more events coming in than we used to? [17:30:51] no [17:31:00] ottomata, is this sleep recent? [17:31:06] yes [17:31:08] happening constantly [17:34:57] !log restarting eventlogging on eventlog1001 [17:35:55] Analytics-Kanban: mobile_apps_uniques_monthly not updating - https://phabricator.wikimedia.org/T120224#1848798 (JAllemandou) Thanks for noticing that ! It actually shown us an error in this job execution: it was launched the 15th of the month instead of the 1st, skewing data half a month late. This means that... [17:36:02] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Checking_consumer_offsets [17:38:09] Analytics-Backlog: Eventlogging devserver needs maintenance - https://phabricator.wikimedia.org/T120245#1848811 (Nuria) NEW [17:42:27] ottomata: I realized I don't have access to eventlog1001! Do I have to make Ops-Access-Request? [17:44:19] whaa yeah for sure [17:44:27] ask for eventlogging-admins [17:44:55] mforns: how is it sleeping 1 seconds [17:44:56] ? [17:44:57] 2015-12-03 17:44:43,050 (MainThread) Sleeping 1 seconds [17:45:01] code looks like it has to sleep 5 [17:45:30] ottomata, it may be that when backfilling I changed my local code in /home/mforns [17:45:46] and executed: sudo python setup.py develop [17:45:59] develop? [17:46:26] yes, exaclty, I thought this would create a local build, but it seems it modified the system libs [17:46:30] hmm whoops [17:46:34] don't do sudo! :p [17:46:43] mmm [17:46:58] yeah wow, it is consuming very slow from kafka [17:47:14] why no burrow email!? [17:47:23] but the only thing I changed was the queue size [17:48:40] well, it wasn't running your code until i restarted just now [17:48:50] before it was still running the old code, because it was sleeping 5 [17:48:50] ottomata, yes [17:48:54] sure [17:48:56] Analytics-Kanban: Define what constitutes a search pageview - https://phabricator.wikimedia.org/T120249#1848951 (Nuria) NEW [17:49:06] still not sure why its consuming so slow [17:49:08] and also [17:49:10] WHY NO BURROW EMAIL@ [17:49:12] GRRR [17:49:14] ottomata, let me install the current production code again [17:49:55] ottomata: yeah! i couldn't test that then, cos nothing was going wrong - may be it doesn't take comma separated emails [17:50:21] may be change it to just yours [17:50:25] or an-internal [17:50:35] OH! madhuvishyi think we aren't monitoring the right consumer group? [17:50:36] i can't think of anything else [17:50:39] mforns: ok [17:50:44] [email "madhuvishy@wikimedia.org, otto@wikimedia.org"] [17:50:44] group=eqiad,eventlogging-00 [17:50:44] group=eqiad,eventlogging-mysql-00 [17:50:44] group=eqiad,eventlogging-files-00 [17:51:02] identity=mysql-m4-master [17:51:04] ! :o [17:51:29] ottomata, may I restart eventlogging? [17:51:36] these are the wrong groups? [17:51:42] Analytics-Backlog, Analytics-Dashiki, Google-Code-In-2015, Need-volunteer: Vital-signs layout is broken - https://phabricator.wikimedia.org/T118846#1848985 (Aklapper) [17:52:32] yes [17:52:34] mforns: [17:53:03] madhuvishy: yes, [17:53:07] but burrow still says error false [17:53:09] for the rigiht one [17:53:14] fixing the config... [17:53:29] ottomata, done [17:53:45] Analytics-Backlog, Analytics-Dashiki, Google-Code-In-2015, Need-volunteer: Vital-signs layout is broken - https://phabricator.wikimedia.org/T118846#1849023 (Aklapper) https://codein.withgoogle.com/dashboard/tasks/4876388897128448/ [18:06:50] !log restarting eventlogging on eventlog1001 [18:07:21] Analytics-Backlog: Define what constitutes a search pageview - https://phabricator.wikimedia.org/T120249#1849074 (Milimetric) [18:07:52] Analytics-Kanban: mobile_apps_uniques_monthly not updating {hawk} - https://phabricator.wikimedia.org/T120224#1849078 (Milimetric) a:JAllemandou [18:08:15] Analytics-Kanban: Backfill EL data for 2015-11-27 incident {oryx} - https://phabricator.wikimedia.org/T119981#1849082 (Milimetric) [18:09:40] ottomata, look at the consumer log, it's inserting veeery slowly [18:10:03] there's only one insertion thread, is that right? [18:10:04] 2015-12-03 18:09:35,832 (Thread-15 ) Data inserted 4000 [18:10:10] yes ottomata [18:10:48] Analytics-Cluster, Analytics-Kanban, Easy: Update client IP in webrequest table to use IP [5 pts] {hawk} - https://phabricator.wikimedia.org/T116772#1849099 (Milimetric) [18:10:49] Analytics-Kanban: Remove client_ip computation from refine - https://phabricator.wikimedia.org/T120105#1849098 (Milimetric) [18:11:14] Analytics-Kanban: Remove avro schema from jar [1 pts] - https://phabricator.wikimedia.org/T119893#1849108 (Milimetric) [18:11:20] Analytics-Kanban, Services: Response times pageview API. Dashboard . [8 pts] - https://phabricator.wikimedia.org/T119886#1849110 (Milimetric) [18:17:10] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1849158 (Nuria) p:Normal>High [18:17:43] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1849159 (Nuria) p:Triage>High [18:17:47] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1849160 (Milimetric) p:High>Normal [18:17:58] mforns: you should join @wikimedia-operations [18:18:33] ottomata, I'm there [18:18:57] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1779777 (Milimetric) p:Normal>High [18:19:46] Analytics-Backlog: Eventlogging devserver needs maintenance - https://phabricator.wikimedia.org/T120245#1849171 (Milimetric) p:Normal>Low [18:21:04] Analytics-Backlog: Change the Pageview API's RESTBase docs for the top endpoint - https://phabricator.wikimedia.org/T120019#1849183 (Milimetric) p:Triage>High [18:28:50] madhuvishy: @! [18:28:51] 2015-12-03 18:26:49 [ERROR] Failed to send email message:501 : malformed address: , otto@wikimedia.org> may not follow oh hooo [18:29:14] [email "<%= @to_emails %>"] [18:29:17] bad template fixing [18:29:25] hmm maybe not [18:29:32] Analytics-Backlog: Use a new approach to compute monthly top 1000 articles (brute force in hive doesn't work) - https://phabricator.wikimedia.org/T120113#1849249 (Milimetric) p:Triage>High [18:30:15] madhuvishy: do you know it takes multiple emails? [18:34:11] Analytics-Backlog, Datasets-Webstatscollector, Language-Engineering: Investigate anomalous views to pages with replacement characters - https://phabricator.wikimedia.org/T117945#1849262 (Milimetric) We looked into this issue quite a bit as part of the data loading jobs for the Pageview API. The proble... [18:34:59] ottomata: no it doesn't say anything about multiple emails, I thought comma separated would work. [18:35:09] I'll open a bug upstream too [18:41:23] k i think we can render multipe notifier sections in the meantime [18:43:28] ottomata: I got an email! [18:43:41] a-team sorry for hard stop, my computer just crashed (never happened with this one before :) [18:43:44] oh! [18:43:54] mforns, milimetric : I leave you EL sdtuff ? [18:44:08] ebernhardson: I have time for you now :) [18:44:17] joal, yes I'm looking at it with ottomata [18:44:31] thanks a lot mfi [18:44:34] mforns: --^ [18:45:24] https://www.irccloud.com/pastebin/HuCOe5vc [18:45:56] nuria: the november mobile apps job as started - hadoop will take care of it - application_1441303822549_245239 [18:46:12] k [18:47:39] ottomata: it went from 12 partitions have errors to 6 partitions have errors [18:48:10] hm! [18:48:18] cool [18:48:19] looking too [18:48:20] i get them too! [18:48:45] huh madhuvishy thats becuase I just started another consumer?! [18:48:46] huh. [18:53:31] ebernhardson: plop ? [18:54:22] joal: here [18:54:27] great :) [18:54:29] was in stand up [18:54:31] batcave ? [18:54:43] oops, sorry ebernhardson, didn't mean to disturb :) [18:55:04] i don't bring my laptop to standup, i'm in the office both days :) [18:55:09] sure lemme grab a room [18:55:13] k [18:57:07] hm, madhuvishy, mforns, dunno about that burrow email, it is going back and forth about what it considers to be behind [18:57:26] Yeah [18:58:45] Quarry, Labs, Labs-Infrastructure, HTTPS: Quarry should be HTTPS-only - https://phabricator.wikimedia.org/T107627#1849414 (Dzahn) [18:59:29] Quarry, Labs, Labs-Infrastructure, HTTPS: Quarry should be HTTPS-only - https://phabricator.wikimedia.org/T107627#1849419 (Dzahn) looks like https works just fine and only a redirect is missing from http->https, will look into it [18:59:34] Quarry, Labs, Labs-Infrastructure, HTTPS: Quarry should be HTTPS-only - https://phabricator.wikimedia.org/T107627#1849420 (Dzahn) a:Dzahn [19:24:48] Analytics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1849544 (Deskana) #discovery is already recording some survival time data: http://discovery.wmflabs.org/metrics/#survival [19:24:50] Analytics-Backlog, Datasets-Webstatscollector, Language-Engineering: Investigate anomalous views to pages with replacement characters - https://phabricator.wikimedia.org/T117945#1849545 (Tbayer) Thanks Dan! I'm still curious about the actual reason and whether it could affect either the validity of our... [19:29:18] back laters... [19:34:59] Analytics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1849603 (JKatzWMF) @deskana oooh! this is excellent. Is this sitewide or limited to specific pages/actions? Is it session length or how long a user stays on the page? [19:39:55] Analytics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1849637 (Deskana) >>! In T119352#1849603, @JKatzWMF wrote: > @deskana oooh! this is excellent. Is this sitewide or limited to specific pages/actions? The population that this data is recorded for is everyone who arrives... [19:39:57] (PS10) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the ProgramMetrics API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) [19:40:31] a-team, I update the Spark doc to ease its use with oozie: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark [19:43:08] thanks! [19:43:43] ebernhardson: Still waiting for my last test to finish, then push :) [19:44:40] milimetric: I might have broken wikimetrics-staging for a bit. fixing but it's fine if it's broken for a little while right? [19:44:54] madhuvishy: of course, no prob [19:44:56] okay [19:45:14] madhuvishy: of course! [19:46:03] (PS11) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the ProgramMetrics API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) [19:53:16] milimetric: back up [19:53:19] and working [19:53:20] https://metrics-staging.wmflabs.org/reports/program-metrics/create/ [19:53:27] invalid cohorts will fail though [19:53:45] k madhuvishy, that's 11 minutes of downtime, so you have to save me 11 cookies [19:53:49]