[00:04:07] having an oozie problem i still don't understand, my job is logging: ActionInputCheck:: In checkListOfPaths: hdfs://analytics-hadoop/wmf/data/wmf/pageview/hourly/year=2015/month=11/day=24/hour=0/_SUCCESS is Missing.
[00:04:18] i can clearly see that exact path from `hdfs dfs -ls /wmf/data/wmf/pageview/hourly/year=2015/month=11/day=24/hour=0/_SUCCESS` though
[00:42:21] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1846657 (kevinator) @SLaporte just suggested another feature that would make life easier: It would be nice if the Top API included the Page ID and the Namespace...
[01:13:05] i don't know if i broke something... but the oozie command has stopped responding on stat1002 :S both the job i was trying to submit, and just general `job -info ...` commands :S
[01:20:08] ebernhardson:
[01:20:10] Fatal Error - Oozie Job discovery-popularity-score-wmf.pageview_hourly->discovery.popularity_score-2015,11,24-wf
[01:20:19] i have gotten a few of these emails
[01:20:33] as far as oozie responding
[01:20:35] madhuvishy: yea it's my testing today. didn't realize this would email you all every time i test again
[01:20:48] ya i didn't think so either, don't know why
[01:20:51] madhuvishy: it seems to have started again, but it took ~10 minutes from me running `oozie job ... -run` to it responding
[01:20:55] yaaa
[01:20:59] all other times that took ~5s
[01:21:00] that's sometimes terrible
[01:21:15] ok good, i was worried i broke something :)
[01:21:32] i've been throwing garbage at oozie all day as i figure out what i'm doing wrong
[01:22:02] madhuvishy: thanks!
[01:22:42] no problem, let me know if you need any help
[01:22:48] oozie can be exhausting
[01:23:25] i think i'm getting close... it's finally starting my spark app
[01:24:15] oh good!
[01:24:36] do you know how to find spark logs for jobs run through oozie
[01:24:40] i wrote it somewhere
[01:24:47] yea, in hdfs /var/log, found it in the oozie docs
[01:24:57] oh yay good
[01:25:26] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark
[01:27:23] although this error doesn't mean anything to me :S in /var/log/hadoop-yarn/apps/ebernhardson/logs/application_1441303822549_24310/analytics1048.eqiad.wmnet_8041
[01:27:32] yeahhh
[01:27:40] that is why
[01:27:58] the actual spark logs get swallowed by the console logger
[01:28:05] well, it has an error: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
[01:28:10] but i don't see how that can be :S
[01:28:37] how are you launching it?
[01:29:04] madhuvishy: https://gerrit.wikimedia.org/r/#/c/256167/5/oozie/popularity_score/workflow.xml
[01:29:10] that's the latest version on git
[01:29:39] something is different between launching on oozie and launching with spark-submit, with spark-submit i didn't need to set SPARK_HOME=/bogus either
[01:30:00] looking
[01:30:10] this is why it's sending us emails i think
[01:30:20] oh duh, of course that's why it's emailing you all. i'll comment that out for now
[01:30:27] you can maybe change the error transition to kill for now
[01:30:28] (and need to replace it with one that emails our team instead of yours)
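For reference, a minimal sketch of digging that NoClassDefFoundError out of the aggregated YARN logs, using the application ID from the failed run above (the grep context widths are arbitrary):

```bash
# Fetch the aggregated logs for the failed application from YARN and look
# for the class that failed to load, with a little surrounding context.
yarn logs -applicationId application_1441303822549_24310 \
    | grep -B 2 -A 10 'NoClassDefFoundError'

# The same logs live in HDFS under the aggregation directory, one file per
# NodeManager host, e.g. the analytics1048 file mentioned above:
hdfs dfs -ls /var/log/hadoop-yarn/apps/ebernhardson/logs/application_1441303822549_24310/
```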
[01:31:35] ebernhardson: can you paste the command you use to launch the job?
[01:32:39] oozie job -Ddiscovery_oozie_directory=hdfs://analytics-hadoop/user/ebernhardson/discovery_oozie -Ddiscovery_data_directory=hdfs://analytics-hadoop/user/ebernhardson -config discovery_oozie/popularity_score/coordinator.properties -verbose -Dstart_time=2015-11-24T00:00Z -Dstop_time=2015-12-01T00:00Z -run
[01:35:26] ebernhardson: my guess is that the analytics_oozie_directory is not getting picked up
[01:35:48] when we deploy we pass an actual directory that's not a symlink
[01:35:54] madhuvishy: wouldn't it have to for the email send to work?
[01:36:09] hmmm
[01:36:11] but sec, i'll try pointing to the real dir
[01:40:20] madhuvishy: same error. it's almost like the classpath for java is wrong, or it's not finding the right jars for hive or something?
[01:41:33] ebernhardson: hmmm, maybe. what is the workflow id?
[01:42:01] madhuvishy: oozie is 0116447-150922143436497-oozie-oozi-C, the job it started and failed is application_1441303822549_243207
[01:42:39] ebernhardson: I see
[01:42:42] https://www.irccloud.com/pastebin/3Qz5vlOz/
[01:43:02] madhuvishy: where'd you find that?
[01:43:06] do you have access to hue?
[01:43:16] if so
[01:43:17] https://hue.wikimedia.org/oozie/list_oozie_workflow/0116448-150922143436497-oozie-oozi-W/?coordinator_job_id=0116447-150922143436497-oozie-oozi-C
[01:43:23] wikitech creds
[01:43:27] if not
[01:43:50] oozie job -log 0116448-150922143436497-oozie-oozi-W
[01:43:59] note that this is the Workflow id
[01:44:20] a coordinator launches multiple workflows, this is one of them; they end with C and W respectively
[01:44:27] madhuvishy: it seems to like my ldap credentials
[01:44:34] oh good
[01:44:59] and indeed, end_year was replaced with year. i tried looking through oozie job -log but there is sooo much going on in there
[01:45:13] yeah
[01:45:24] hue makes it a little easier to look at, when it works
[01:45:32] :)
[01:46:04] ok, fixed, killed and resubmitted
[01:47:58] failed again i see
[01:48:25] hue seems to have died for me too... just now
[01:48:51] kevinator: hue always keeps dying
[01:49:48] ebernhardson: I think the current error is
[01:49:49] org.apache.oozie.action.ActionExecutorException: OozieClientException: org.apache.oozie.DagEngineException: E0738: The following 1 parameters are required but were not defined and no default values are available: location
[01:50:02] just in case oozie is slow
[01:54:05] found it, that's the add partition workflow which runs after spark.
[01:55:48] ya, looks like it
[01:59:15] ok, ran again, it looks like oozie thinks spark completes 'OK' even though it's failing
[02:03:10] ebernhardson: yeah
[02:03:11] i think it might be classpath, i just booted up my job with spark-submit and it spits out a ridiculously long list of jars, including (picked one at random): /usr/lib/hive/lib/hive-exec.jar
[02:03:24] https://hue.wikimedia.org/jobbrowser/jobs/job_1441303822549_243239/tasks/task_1441303822549_243239_m_000000/attempts/attempt_1441303822549_243239_m_000000_0/logs lists the classpath used for the oozie job
[02:03:29] and hive-exec.jar isn't in there anywhere
[02:03:30] hmm
[02:03:45] for the job kicked off by oozie i mean
[02:04:07] i would look into the spark logs
[02:04:36] i wonder if i should just have it use a shell script and use spark-submit ... would be the easiest way :)
[02:05:22] the spark logs are the ones that say HiveConf not found (978 times)
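A quick sketch of the coordinator-vs-workflow distinction above, assuming the oozie CLI is already pointed at the right server as it is on stat1002: `-info` on a coordinator ID (ending in -C) lists the workflow runs it launched, and `-log` on a single workflow ID (ending in -W) is far less noisy than pulling the log of the whole coordinator:

```bash
# List the workflow runs this coordinator has launched (coordinator IDs end in -C)
oozie job -info 0116447-150922143436497-oozie-oozi-C

# Then pull the log for one specific workflow run (workflow IDs end in -W)
oozie job -log 0116448-150922143436497-oozie-oozi-W
```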
[02:06:12] still?
[02:06:55] where are they?
[02:08:03] looking, i didn't look at the most recent run
[02:08:52] it would be in hdfs /var/log/hadoop-yarn/apps/ebernhardson/logs/application_1441303822549_243247
[02:09:31] that's the run by oozie, here is one by spark-submit (that i canceled before completing with ctrl-c): /var/log/hadoop-yarn/apps/ebernhardson/logs/application_1441303822549_243255
[02:11:36] oh, these don't give you anything
[02:12:01] you can also just access them like yarn logs -applicationId
[02:13:59] i think i might have found it, spark-submit sources /usr/lib/spark/conf/spark-env.sh
[02:14:16] that does: for f in ${HIVE_HOME}/lib/*.jar; do SPARK_CLASSPATH=$SPARK_CLASSPATH:$f; done
[02:14:25] so, it force-includes all the hive jars
[02:14:29] okay..
[02:15:41] that is put in place by puppet, i wonder if it's on all the servers...
[02:16:10] i'm sure it is. these errors feel to me like something else is going on
[02:18:11] i'm pretty sure oozie's spark runner isn't using the spark-submit script though, but that's only a theory from docs i've read, i'm not at all sure
[02:18:25] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark create a log4j.properties file like the one there, and put it on hdfs.
[02:18:51] link it on spark-opts with the extra arg --files hdfs://.../log4j.properties
[02:20:01] ebernhardson: ^
[02:23:31] ok, started 0116520-150922143436497-oozie-oozi-C with the log4j in place pointing to hdfs://analytics-hadoop/user/ebernhardson/spark.log
[02:23:58] spark has now started
[02:25:37] job now killed, but no log :(
[02:26:08] ebernhardson: where is your log4j file?
[02:26:57] madhuvishy: hdfs://analytics-hadoop/user/ebernhardson/discovery_oozie/popularity_score/log4j.properties
[02:29:35] currently launching with oozie job -Ddiscovery_oozie_directory=hdfs://analytics-hadoop/user/ebernhardson/discovery_oozie -Ddiscovery_data_directory=hdfs://analytics-hadoop/user/ebernhardson -config discovery_oozie/popularity_score/coordinator.properties -verbose -Dstart_time=2015-11-24T00:00Z -Dend_time=2015-12-01T00:00Z -Danalytics_oozie_directory=hdfs://analytics-hadoop/wmf/refinery/2015-12-01T16.01.05Z--47ad904/oozie -run
[02:29:50] hmmm that's weird
[02:30:14] i have not tried using an hdfs directory as the spark log output directory though, although i doubt that's the problem
[02:30:36] that's what the docs said to do for oozie: Putting the file on a temp directory on Hadoop and using a hdfs:// url should do the trick
[02:31:23] ohh, i meant for the properties file
[02:31:27] sorry, i wasn't clear
[02:32:45] ebernhardson: can you try just /tmp/spark.log - i can look into the actual worker machine and see if it shows up if you don't have access
[02:32:55] sure
[02:34:04] I remember I was gonna log a ticket to make file logging the default for spark, probably haven't done it. this should not be so hard
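A sketch of the file-logging setup being suggested here; the property names are stock log4j 1.x rather than the canonical file from the wikitech page, so treat them as an approximation:

```bash
# Write a minimal log4j.properties that routes Spark's logging to a file
# instead of the console logger that swallows it under oozie.
cat > log4j.properties <<'EOF'
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/spark.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF

# Put it somewhere on HDFS that the oozie launcher can read...
hdfs dfs -put -f log4j.properties hdfs://analytics-hadoop/user/ebernhardson/discovery_oozie/popularity_score/

# ...and reference it from the workflow's <spark-opts>:
#   --files hdfs://analytics-hadoop/user/ebernhardson/discovery_oozie/popularity_score/log4j.properties
```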
[02:34:18] ok, kicked off again to 0116536-150922143436497-oozie-oozi-C
[02:36:01] looks to have run and died
[02:36:14] the driver should have been analytics1038.eqiad.wmnet:8042 i believe
[02:37:25] i think it wrote something, but the error is probably on an executor
[02:37:40] hmm, i can re-run it with 2 executors instead of 48
[02:38:12] 48 was kinda chosen randomly... it kicks off ~1300 tasks to chew through initially
[02:38:14] yeah, i'll look at yarn spark and see if it shows me which ones after you do
[02:39:29] ok, starting again as 0116562-150922143436497-oozie-oozi-C
[02:40:28] kicked off application_1441303822549_243407 which is now finished (dead)
[02:41:00] ok looking
[02:44:10] looks like it should be 1043 or 1051 for the executors, 1031 for the driver
[02:44:39] ebernhardson:
[02:44:41] actually
[02:44:45] if you look at yarn logs
[02:44:47] it shows
[02:45:13] http://pastebin.com/rPaVFF1e
[02:45:57] yea, there is one of those lines for each matching NoClassDefFoundError exception
[02:46:53] well, almost. 978 NoClassDefFound and 980 java_gateway.py
[02:48:19] ebernhardson: aah, it exceeded the recursion depth when logging?
[02:49:02] (there's nothing in any of the spark logs btw), they just say Driver commanded a shutdown
[02:50:06] maybe, i'm not entirely sure :S the odd thing was i tried to look up those lines and the only java_gateway.py in spark's v1.3.0 git tag is https://github.com/apache/spark/blob/v1.3.0/python/pyspark/java_gateway.py
[02:50:29] but that has 111 lines, and our error comes from lines 364, 369 and 483 :S
[02:50:52] can probably disassemble this jar somehow i suppose...
[02:52:30] hmmm
[02:52:45] this all runs when you use pyspark submit?
[02:52:57] not sure, i always used spark-submit
[02:53:11] ah yeah, spark-submit
[02:54:34] my best guess is still because spark-submit is a shell script that sources /usr/lib/spark/conf/spark-env.sh, which includes all the hive jars in SPARK_CLASSPATH. otto added that via puppet
[02:54:45] https://gerrit.wikimedia.org/r/#/c/203358/
[02:57:12] yeah, but we have other spark jobs running via oozie, how is this different?
[02:57:26] do they use HiveContext?
[02:57:29] or maybe just SQLContext instead?
[02:57:46] SQLContext is needed for plain data frames, HiveContext would be required to talk to the hive metastore
[02:58:37] ebernhardson: aah
[02:59:05] that makes sense
[02:59:13] maybe it was never tested
[02:59:24] mostly i'm only guessing because https://hue.wikimedia.org/jobbrowser/jobs/job_1441303822549_243332/tasks/task_1441303822549_243332_m_000000/attempts/attempt_1441303822549_243332_m_000000_0/logs lists java.class.path=
[02:59:28] and that doesn't include hive
[03:01:18] hmmm
[03:01:27] we'd have to ask andrew/joseph
[03:01:57] ok, i'll check tomorrow. I appreciate all the help! i must have sucked up about 2 hours of your time now...
[03:02:13] but i did learn a bunch of new commands to find things, and hue :)
[03:02:14] helps a bunch
[03:02:21] No problem! :)
[06:00:08] Analytics, TimedMediaHandler, Wikimedia-Video: Record and report metrics for audio and video playback - https://phabricator.wikimedia.org/T108522#1847111 (Tbayer) See also T88775 BTW I guess that by "low-level stats", you mean the [[https://wikitech.wikimedia.org/wiki/Analytics/Data/Mediacounts |med...
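The spark-env.sh theory above is cheap to check from a shell on the launch host, since spark-submit sources that file while oozie's spark action apparently does not; a sketch:

```bash
# Source the puppetized spark-env.sh the way spark-submit does, then list
# the hive jars it appended to SPARK_CLASSPATH (hive-exec.jar should show up).
source /usr/lib/spark/conf/spark-env.sh
echo "$SPARK_CLASSPATH" | tr ':' '\n' | grep hive
```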
[07:12:57] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1847214 (RobLa-WMF) @ottomata: I don't know all of the details, but I think the ID idea is a good one. One //possible hitch//:...
[08:51:13] Analytics-Backlog, Beta-Cluster-Infrastructure, Services, Scap3, WorkType-NewFunctionality: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1847358 (mobrovac) >>! In T116206#1846067, @Milimetric wrote: > @mobrovac: those sound like things I can do? Let me know if you've started on...
[08:51:26] joal: ^^
[08:51:46] joal: doesn't need to be done today, but soon-ish would be really good
[08:58:28] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1847388 (Lcanasdiaz) Producing new data. The 2nd bug was related with the period of time which was overwritten for some studies.
[09:02:56] Analytics, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1847396 (jcrespo) NEW a:Milimetric
[09:04:17] Analytics, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1847408 (ori) @Nuria, could you help plan out this work?
[09:24:06] Analytics-Tech-community-metrics, Gerrit-Migration: Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#1847441 (Qgil) Bitergia's work is committed through monthly sprints, so all we know today is that this task wi...
[09:56:42] (CR) Joal: [C: 2 V: 2] "Ottomata: see attached task comment: https://phabricator.wikimedia.org/T116772" [analytics/refinery] - https://gerrit.wikimedia.org/r/256027 (https://phabricator.wikimedia.org/T116772) (owner: Joal)
[10:02:20] Analytics-Backlog, Research-and-Data: Historical analysis of edit productivity for English Wikipedia - https://phabricator.wikimedia.org/T99172#1847524 (JAllemandou) Question on the previous link "reflection" section: What about bots?
[10:23:20] hey mobrovac
[10:23:42] Just saw your ping
[10:24:15] About test data, I wonder what would be an easy way to go: for now everything is loaded through hadoop
[11:43:23] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCR repository number mismatch on korma - https://phabricator.wikimedia.org/T116484#1847718 (Lcanasdiaz) Final test passed. We're resuming the gathering process and generating the new data. It should be in korma in a few of h...
[12:11:05] Analytics-Backlog: Make EL mysql consumer's worker thread consume from the left side of the queue - https://phabricator.wikimedia.org/T120209#1847820 (mforns) NEW
[12:12:00] Analytics-Backlog: Fix EL mysql consumer's deque push/pop usage - https://phabricator.wikimedia.org/T120209#1847833 (mforns)
[12:12:11] Analytics-Backlog: Fix EL mysql consumer's deque push/pop usage {oryx} - https://phabricator.wikimedia.org/T120209#1847820 (mforns)
[14:38:17] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1848085 (Ottomata) Hm, not sure I follow. We are proposing that a schema be ID-able via a URI, and also remotely locatable if t...
[15:11:28] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1848209 (JanZerebecki) I think that only means that a client that gets a URL ending in '/' for an API should not...
[15:17:33] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1848214 (Ottomata) Hm, I think I see. We are coupling the URI to the ID, which according to the W3C should not be relied upon....
[15:18:09] (PS5) Milimetric: Archives are downloaded in .txt.gz format: fix matching and opening [analytics/wikistats] - https://gerrit.wikimedia.org/r/92066 (owner: Nemo bis)
[15:18:19] (PS6) Milimetric: Move download limiter to proper place and comment it as it's only a test [analytics/wikistats] - https://gerrit.wikimedia.org/r/92056 (owner: Nemo bis)
[15:18:39] Analytics-Tech-community-metrics, DevRel-December-2015: Explain / sort out / fix SCM repository number mismatch on korma - https://phabricator.wikimedia.org/T116483#1848215 (Lcanasdiaz) We had an error in our library so the process didn't finish correctly. This morning I took the code from the branch we w...
[15:19:09] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1848216 (mobrovac) From my POV, the URL **is** the ID.
[15:26:35] hi a-team!
[15:26:39] Hey mforns
[15:27:26] hiya
[15:29:17] Analytics-Backlog: mobile_apps_uniques_monthly not updating - https://phabricator.wikimedia.org/T120224#1848250 (Tbayer) NEW
[15:29:48] a-team, I need to leave now and should be back for standup. In case I'm not (who knows), today I have debugged the discovery team spark job
[15:30:04] aha joal
[15:30:05] The patch is not yet pushed, but I will when I come back
[15:30:20] ok
[15:30:54] if you're not there I'll bring this up in standup, see you in a while joal
[15:31:38] madhuvishy or others, could you give this a quick look (https://phabricator.wikimedia.org/T120224) and maybe prod oozie a bit? The November numbers are needed for https://www.mediawiki.org/wiki/Wikimedia_Product#Reading which is supposed to be updated today
[15:45:23] Analytics-Backlog, Research-and-Data: Historical analysis of edit productivity for English Wikipedia - https://phabricator.wikimedia.org/T99172#1848357 (Halfak) +1. I think bots and other tools may explain a lot of an increase in efficiency of content production.
[15:57:21] Analytics-Cluster, operations: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1848408 (BBlack)
[15:57:41] Analytics-Backlog: mobile_apps_uniques_monthly not updating - https://phabricator.wikimedia.org/T120224#1848410 (Nuria) p:Triage>High
[15:59:14] Analytics-Cluster, operations: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1848418 (BBlack) a:BBlack Yeah this is all the same issue and still present. I think @fgiunchedi is on the right track here about streaming, I'm going to write up a gene...
[15:59:20] Analytics-Kanban: mobile_apps_uniques_monthly not updating - https://phabricator.wikimedia.org/T120224#1848250 (Nuria)
[16:00:40] HaeB: we will try to look at this today but we might get to it tomorrow.
[16:00:57] HaeB: i will ping you later on today
[16:17:47] Analytics-Cluster, Traffic, operations, Patch-For-Review: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1848484 (BBlack)
[16:31:25] Analytics-Backlog, DBA: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1848516 (Milimetric) p:Low>Normal
[16:37:26] joal: whenever you get back let me know if you have made a script to generate fake data for AQS. Otherwise I think that'd be useful for the test cluster
[16:40:12] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1848553 (Milimetric) >>! In T112956#1846657, @kevinator wrote: > @SLaporte just suggested another feature that would make life easier: It would be nice if the T...
[16:43:54] milimetric: yt?
[16:45:29] in a hangout, nuria, I might be a little late for standup
[16:45:36] k
[16:46:09] Analytics-Tech-community-metrics, DevRel-December-2015, Patch-For-Review: Review/update mailing list repositories in korma - https://phabricator.wikimedia.org/T116285#1848568 (Aklapper) p:Low>Normal
[16:46:27] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1848570 (Aklapper) p:Lowest>Low
[16:50:55] Analytics-Cluster, Traffic, operations, Patch-For-Review: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1848582 (BBlack) Open>Resolved This should be resolved now with the change above applied. I've tested the files from this ticket and...
[16:55:51] great milimetric, I had forgotten about that !
[16:56:03] milimetric: It'll be perfect for the test cluster I assume :)
[17:01:10] a-team: standuppp
[17:01:41] Morning ebernhardson :)
[17:01:51] I have news for you (after standup and many meetings though)
[17:02:21] be there in a sec...
[17:03:00] start without me guys - bit late
[17:03:27] (finishing up with wes...)
[17:03:45] ottomata: k
[17:03:50] ottomata: we will wait 2 mins
[17:11:46] ottomata: so fun thing i found out about oozie + spark yesterday. Hive jars are included and other required environment variables are set when using the spark-submit shell script, but not when kicking off jobs via oozie
[17:12:00] ottomata: any hints on where i would look to integrate those environment variables into oozie?
[17:12:18] * ebernhardson fails at english... some day i will figure it out
[17:13:04] alternatively, i could just make oozie run a shell script instead of the direct spark loader. And run spark-submit from the shell script
[17:15:13] ebernhardson: if you give me an hour, I'll have some updates for you on the thing
[17:15:36] joal: oh sweet, you can have a day, i'm not in a rush :)
[17:19:07] hmmm, the jars aren't included? that's weird. i could see maybe not the hive-site.xml file, which it probably needs
[17:26:02] joal: dunno what is up with EL
[17:26:09] mysql consumer logs say Data inserted
[17:26:34] but latest data in Edit_13457736 is 20151202164718
[17:26:40] i can see events for Edit flowing in through files
[17:26:43] all-events.log
[17:26:59] looks good
[17:27:00] 2015-12-03 17:24:36,322 (MainThread) Edit_13457736 queue is large or old, flushing
[17:27:01] dunno..
[17:27:37] also, anyone know what changed here?
[17:27:37] https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=6&fullscreen
[17:28:26] starting a few days ago insertAttempted rate gets all spikey
[17:28:48] mforns: ?^
[17:28:56] ottomata, looking
[17:29:40] the average is the same as inserted.rate, so it is probably ok... but kinda weird, dunno what would have changed
[17:30:01] 2015-12-03 17:29:54,267 (MainThread) Sleeping 5 seconds
[17:30:16] is that from the kafka consumption throttling?
[17:30:21] ottomata, yes
[17:30:25] the queue seems full
[17:30:49] do we have more events coming in than we used to?
[17:30:51] no
[17:31:00] ottomata, is this sleep recent?
[17:31:06] yes
[17:31:08] happening constantly
[17:34:57] !log restarting eventlogging on eventlog1001
[17:35:55] Analytics-Kanban: mobile_apps_uniques_monthly not updating - https://phabricator.wikimedia.org/T120224#1848798 (JAllemandou) Thanks for noticing that ! It actually showed us an error in this job execution: it was launched the 15th of the month instead of the 1st, skewing data half a month late. This means that...
[17:36:02] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Checking_consumer_offsets
[17:38:09] Analytics-Backlog: Eventlogging devserver needs maintenance - https://phabricator.wikimedia.org/T120245#1848811 (Nuria) NEW
[17:42:27] ottomata: I realized I don't have access to eventlog1001! Do I have to make an Ops-Access-Request?
[17:44:19] whaa yeah for sure
[17:44:27] ask for eventlogging-admins
[17:44:55] mforns: how is it sleeping 1 seconds
[17:44:56] ?
[17:44:57] 2015-12-03 17:44:43,050 (MainThread) Sleeping 1 seconds
[17:45:01] code looks like it has to sleep 5
[17:45:30] ottomata, it may be that when backfilling I changed my local code in /home/mforns
[17:45:46] and executed: sudo python setup.py develop
[17:45:59] develop?
[17:46:26] yes, exactly, I thought this would create a local build, but it seems it modified the system libs
[17:46:30] hmm whoops
[17:46:34] don't do sudo! :p
[17:46:43] mmm
[17:46:58] yeah wow, it is consuming very slow from kafka
[17:47:14] why no burrow email!?
[17:47:23] but the only thing I changed was the queue size
[17:48:40] well, it wasn't running your code until i restarted just now
[17:48:50] before it was still running the old code, because it was sleeping 5
[17:48:50] ottomata, yes
[17:48:54] sure
[17:48:56] Analytics-Kanban: Define what constitutes a search pageview - https://phabricator.wikimedia.org/T120249#1848951 (Nuria) NEW
[17:49:06] still not sure why it's consuming so slow
[17:49:08] and also
[17:49:10] WHY NO BURROW EMAIL!
[17:49:12] GRRR
[17:49:14] ottomata, let me install the current production code again
[17:49:55] ottomata: yeah! i couldn't test that then, cos nothing was going wrong - maybe it doesn't take comma-separated emails
[17:50:21] maybe change it to just yours
[17:50:25] or an-internal
[17:50:35] OH! madhuvishy: i think we aren't monitoring the right consumer group?
[17:50:36] i can't think of anything else
[17:50:39] mforns: ok
[17:50:44] [email "madhuvishy@wikimedia.org, otto@wikimedia.org"]
[17:50:44] group=eqiad,eventlogging-00
[17:50:44] group=eqiad,eventlogging-mysql-00
[17:50:44] group=eqiad,eventlogging-files-00
[17:51:02] identity=mysql-m4-master
[17:51:04] ! :o
[17:51:29] ottomata, may I restart eventlogging?
[17:51:36] these are the wrong groups?
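Two hedged ways to double-check a consumer group's lag by hand: the first assumes Burrow's v2 HTTP endpoint layout with a placeholder host and port, the second uses the Kafka 0.8-era offset checker with a placeholder zookeeper URL:

```bash
# Ask Burrow directly what it thinks of the group's status/lag
# (host:port is a placeholder for wherever Burrow runs).
curl -s http://localhost:8000/v2/kafka/eqiad/consumer/eventlogging-mysql-00/status

# Or query Kafka itself with the 0.8-era offset checker
# ($ZOOKEEPER_URL is a placeholder for the cluster's zookeeper connect string).
kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
    --zookeeper "$ZOOKEEPER_URL" \
    --group eventlogging-mysql-00
```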
[17:51:42] Analytics-Backlog, Analytics-Dashiki, Google-Code-In-2015, Need-volunteer: Vital-signs layout is broken - https://phabricator.wikimedia.org/T118846#1848985 (Aklapper)
[17:52:32] yes
[17:52:34] mforns:
[17:53:03] madhuvishy: yes,
[17:53:07] but burrow still says error false
[17:53:09] for the right one
[17:53:14] fixing the config...
[17:53:29] ottomata, done
[17:53:45] Analytics-Backlog, Analytics-Dashiki, Google-Code-In-2015, Need-volunteer: Vital-signs layout is broken - https://phabricator.wikimedia.org/T118846#1849023 (Aklapper) https://codein.withgoogle.com/dashboard/tasks/4876388897128448/
[18:06:50] !log restarting eventlogging on eventlog1001
[18:07:21] Analytics-Backlog: Define what constitutes a search pageview - https://phabricator.wikimedia.org/T120249#1849074 (Milimetric)
[18:07:52] Analytics-Kanban: mobile_apps_uniques_monthly not updating {hawk} - https://phabricator.wikimedia.org/T120224#1849078 (Milimetric) a:JAllemandou
[18:08:15] Analytics-Kanban: Backfill EL data for 2015-11-27 incident {oryx} - https://phabricator.wikimedia.org/T119981#1849082 (Milimetric)
[18:09:40] ottomata, look at the consumer log, it's inserting veeery slowly
[18:10:03] there's only one insertion thread, is that right?
[18:10:04] 2015-12-03 18:09:35,832 (Thread-15 ) Data inserted 4000
[18:10:10] yes ottomata
[18:10:48] Analytics-Cluster, Analytics-Kanban, Easy: Update client IP in webrequest table to use IP [5 pts] {hawk} - https://phabricator.wikimedia.org/T116772#1849099 (Milimetric)
[18:10:49] Analytics-Kanban: Remove client_ip computation from refine - https://phabricator.wikimedia.org/T120105#1849098 (Milimetric)
[18:11:14] Analytics-Kanban: Remove avro schema from jar [1 pts] - https://phabricator.wikimedia.org/T119893#1849108 (Milimetric)
[18:11:20] Analytics-Kanban, Services: Response times pageview API. Dashboard . [8 pts] - https://phabricator.wikimedia.org/T119886#1849110 (Milimetric)
[18:17:10] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1849158 (Nuria) p:Normal>High
[18:17:43] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1849159 (Nuria) p:Triage>High
[18:17:47] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1849160 (Milimetric) p:High>Normal
[18:17:58] mforns: you should join #wikimedia-operations
[18:18:33] ottomata, I'm there
[18:18:57] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1779777 (Milimetric) p:Normal>High
[18:19:46] Analytics-Backlog: Eventlogging devserver needs maintenance - https://phabricator.wikimedia.org/T120245#1849171 (Milimetric) p:Normal>Low
[18:21:04] Analytics-Backlog: Change the Pageview API's RESTBase docs for the top endpoint - https://phabricator.wikimedia.org/T120019#1849183 (Milimetric) p:Triage>High
[18:28:50] madhuvishy: @!
[18:28:51] 2015-12-03 18:26:49 [ERROR] Failed to send email message: 501: malformed address: , otto@wikimedia.org> may not follow oh hooo
[18:29:14] [email "<%= @to_emails %>"]
[18:29:17] bad template, fixing
[18:29:25] hmm maybe not
[18:29:32] Analytics-Backlog: Use a new approach to compute monthly top 1000 articles (brute force in hive doesn't work) - https://phabricator.wikimedia.org/T120113#1849249 (Milimetric) p:Triage>High
[18:30:15] madhuvishy: do you know if it takes multiple emails?
[18:34:11] Analytics-Backlog, Datasets-Webstatscollector, Language-Engineering: Investigate anomalous views to pages with replacement characters - https://phabricator.wikimedia.org/T117945#1849262 (Milimetric) We looked into this issue quite a bit as part of the data loading jobs for the Pageview API. The proble...
[18:34:59] ottomata: no, it doesn't say anything about multiple emails, I thought comma separated would work.
[18:35:09] I'll open a bug upstream too
[18:41:23] k, i think we can render multiple notifier sections in the meantime
[18:43:28] ottomata: I got an email!
[18:43:41] a-team sorry for hard stop, my computer just crashed (never happened with this one before :)
[18:43:44] oh!
[18:43:54] mforns, milimetric: I leave you EL stuff?
[18:44:08] ebernhardson: I have time for you now :)
[18:44:17] joal, yes I'm looking at it with ottomata
[18:44:31] thanks a lot mforns
[18:45:24] https://www.irccloud.com/pastebin/HuCOe5vc
[18:45:56] nuria: the november mobile apps job has started - hadoop will take care of it - application_1441303822549_245239
[18:46:12] k
[18:47:39] ottomata: it went from 12 partitions have errors to 6 partitions have errors
[18:48:10] hm!
[18:48:18] cool
[18:48:19] looking too
[18:48:20] i get them too!
[18:48:45] huh, madhuvishy, that's because I just started another consumer?!
[18:48:46] huh.
[18:53:31] ebernhardson: plop ?
[18:54:22] joal: here
[18:54:27] great :)
[18:54:29] was in stand up
[18:54:31] batcave ?
[18:54:43] oops, sorry ebernhardson, didn't mean to disturb :)
[18:55:04] i don't bring my laptop to standup, i'm in the office both days :)
[18:55:09] sure, lemme grab a room
[18:55:13] k
[18:57:07] hm, madhuvishy, mforns, dunno about that burrow email, it is going back and forth about what it considers to be behind
[18:57:26] Yeah
[18:58:45] Quarry, Labs, Labs-Infrastructure, HTTPS: Quarry should be HTTPS-only - https://phabricator.wikimedia.org/T107627#1849414 (Dzahn)
[18:59:29] Quarry, Labs, Labs-Infrastructure, HTTPS: Quarry should be HTTPS-only - https://phabricator.wikimedia.org/T107627#1849419 (Dzahn) looks like https works just fine and only a redirect is missing from http->https, will look into it
[18:59:34] Quarry, Labs, Labs-Infrastructure, HTTPS: Quarry should be HTTPS-only - https://phabricator.wikimedia.org/T107627#1849420 (Dzahn) a:Dzahn
[19:24:48] Analytics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1849544 (Deskana) #discovery is already recording some survival time data: http://discovery.wmflabs.org/metrics/#survival
[19:24:50] Analytics-Backlog, Datasets-Webstatscollector, Language-Engineering: Investigate anomalous views to pages with replacement characters - https://phabricator.wikimedia.org/T117945#1849545 (Tbayer) Thanks Dan! I'm still curious about the actual reason and whether it could affect either the validity of our...
[19:29:18] back laters...
[19:34:59] Analytics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1849603 (JKatzWMF) @deskana oooh! this is excellent. Is this sitewide or limited to specific pages/actions? Is it session length or how long a user stays on the page?
[19:39:55] Analytics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1849637 (Deskana) >>! In T119352#1849603, @JKatzWMF wrote: > @deskana oooh! this is excellent. Is this sitewide or limited to specific pages/actions? The population that this data is recorded for is everyone who arrives...
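The "multiple notifier sections" idea sketched out: one [email "..."] block per recipient instead of a comma-separated list, which is what produced the malformed-address 501 above. This only mirrors the section/key structure pasted earlier; whatever other keys burrow requires per section are omitted:

```bash
# Hypothetical rendering of the notifier config with one section per address.
cat <<'EOF'
[email "madhuvishy@wikimedia.org"]
group=eqiad,eventlogging-mysql-00

[email "otto@wikimedia.org"]
group=eqiad,eventlogging-mysql-00
EOF
```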
[19:39:57] (PS10) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the ProgramMetrics API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308)
[19:40:31] a-team, I updated the Spark doc to ease its use with oozie: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Spark
[19:43:08] thanks!
[19:43:43] ebernhardson: Still waiting for my last test to finish, then push :)
[19:44:40] milimetric: I might have broken wikimetrics-staging for a bit. fixing, but it's fine if it's broken for a little while, right?
[19:44:54] madhuvishy: of course, no prob
[19:44:56] okay
[19:45:14] madhuvishy: of course!
[19:46:03] (PS11) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the ProgramMetrics API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308)
[19:53:16] milimetric: back up
[19:53:19] and working
[19:53:20] https://metrics-staging.wmflabs.org/reports/program-metrics/create/
[19:53:27] invalid cohorts will fail though
[19:53:45] k madhuvishy, that's 11 minutes of downtime, so you have to save me 11 cookies
[19:53:49] :P
[19:53:51] :D
[19:53:56] done
[19:54:07] no worries, it's good to just demo the flow and we can debug the invalid cohorts later
[19:54:12] yeah
[19:54:33] (today, of course, but after your meeting with Amanda if you don't have time after metrics)
[19:54:55] milimetric: yup, i'll ping you if i have time in between lunch and that
[20:03:30] mforns: is there an update on EL? ops channel has a lot of conversations...
[20:03:53] nuria
[20:04:17] ottomata spawned another consumer in parallel
[20:04:28] and it seemed to have a positive effect
[20:05:08] ... ahem... but that doesn't explain what is going wrong now ...
[20:05:23] if you look at grafana, you'll see that the inserted.rate is higher than the Sum of Schema Events
[20:05:43] nuria, hehe, no, the problem is with the database
[20:05:55] mforns: ya, now we are catching up
[20:06:11] jaime said that the db has been seeing a lower insert rate for a couple of days
[20:06:19] mforns: aham
[20:06:42] it may be related to the disk of the m4-master machine filling up
[20:07:06] mforns: ya, that was going to be my next question
[20:07:30] ebernhardson: Pushed a patch, hopefully ok :)
[20:07:34] nuria, we didn't define any action items on the db though
[20:07:38] Need to go for tonight
[20:07:43] Bye a-team !
[20:07:46] joal, bye!
[20:07:53] See you tomorrow :)
[20:08:02] mforns: scala maybe tomorrow?
[20:08:07] joal, yes! np :]
[20:08:12] mforns: probably depends on EL I guess
[20:08:13] aha
[20:08:14] :]
[20:08:47] Arf, anyway :)
[20:08:56] nuria, we're still following the catching-up of the db
[20:08:56] nite!
[20:09:19] mforns: ok, will talk with otto to see if we need to follow up cause it sounds like we might
[20:09:47] andrew submitted a patch to improve the consumer logs, and I suggested a couple changes
[20:10:54] nuria, yes, and maybe we need to have a look at mysql alternatives :/
[20:12:22] mforns: mmm... i do not see how we can be at the limits of mysql with EL
[20:12:43] mforns: database speed was fine not that long ago
[20:13:07] (hi just joining)
[20:13:13] mforns: i think nuria is right, because when I started up that 2nd consumer, inserts doubled
[20:13:51] ottomata, aha, so.. then we should consider having multiple consumers
[20:14:08] well, it would be good to know why one can't insert faster, but yes, i think we should
[20:14:42] nuria, ottomata, and I thought also... now that we have kafka, it makes no sense to have an insertion queue inside python
[20:14:53] we built that because zmq was losing events
[20:15:14] now we could just block consumption from kafka
[20:15:20] and have a single thread
[20:15:29] no?
[20:15:40] yea for sure
[20:15:44] agree
[20:16:06] you could just batch consume N events, and then insert, and then do another N
[20:16:07] etc.
[20:16:33] ottomata, yes, we should continue batching by schema, but no need for queueing
[20:16:44] ya
[20:17:36] and the other thing that would help a lot is avoiding splitting the batches because some fields are not defined, this would also improve the insertion speed
[20:18:41] dunno what that is about, but ok! :)
[20:18:51] hehe
[20:19:00] mforns: it is pretty easy to puppetize more consumer processes now
[20:19:01] if we want to
[20:19:07] oh cool!
[20:20:11] btw ottomata I wrote some comments on the patch
[20:22:06] mforns, ottomata: the way i see it, if a 2nd consumer fixes things, the problem is not the db
[20:22:24] mforns, ottomata: but it is an OK short term patch, do not get me wrong
[20:22:31] very Ok
[20:22:47] nuria, yes
[20:23:03] mforns: cool
[20:23:10] like the comments, just amended
[20:23:20] nuria: agree
[20:23:22] but still the db is slower than before, insertion statements with 3000 events are taking 20 seconds...
[20:23:25] nuria, ^
[20:23:27] that's true.
[20:23:32] hm
[20:23:35] yeah iunnooo
[20:23:48] mforns: are we sure that many events used to be faster
[20:23:48] ?
[20:23:51] it usually doesn't get that high, eh?
[20:24:14] mforns: do we have numbers from before written down anywhere? i wrote up the testing i did here
[20:24:55] nuria, ottomata, mmm I don't know for sure about previous performance
[20:25:06] maybe we can look at the log archive
[20:25:10] for the consumer
[20:25:43] mforns: i just amended the patch again with some wording changes. since table name actually comes from get_table, i'd rather not say it is a table
[20:25:50] mforns: my prior numbers
[20:25:52] now i just say 'inserted 1234 Edit_12344 events'
[20:25:52] etc.
[20:25:54] https://www.irccloud.com/pastebin/lZl9uNKl/
[20:25:57] in vanadium
[20:26:05] ah ok
[20:26:05] cool
[20:26:12] then i highly doubt 3000 should be 20+ seconds
[20:26:20] mforns: but again
[20:26:25] that was maybe tricky
[20:26:29] the way i calculated that
[20:26:45] mforns: let's add timing to the log line! :)
[20:26:54] aha
[20:28:25] ottomata, there is a missing ' at the end of the first log
[20:28:44] the second is cool!
[20:28:50] Analytics-Backlog: Define what constitutes a search pageview - https://phabricator.wikimedia.org/T120249#1849821 (Nuria)
[20:29:36] woops
[20:29:41] patchy too fasty
[20:30:59] there mforns
[20:31:25] nuria, yes, in the back of my head I had more or less the same numbers for insertion times
[20:31:57] mforns: if db is slower now due to storage limitations we should address those, let me see where the ticket for that was
[20:32:11] nuria: i'm not so sure about the db being slower yet
[20:32:14] if we merge this patch i think it will tell us
[20:32:21] ottomata: k,
[20:32:26] sure, let's do that
[20:32:48] milimetric: got time now?
[20:34:56] ottomata, in the second log, we'll need scid[0], scid[1] instead of *scid, this can only be the last parameter
[20:35:18] k
[20:35:20] :)
[20:35:21] glad you are here
[20:35:24] spot checking my crap
[20:35:24] ehhe
[20:35:38] xD
[20:36:16] well, I pushed and popped from the same end of the queue
[20:36:19] :]
[20:36:26] mforns: i'm going to just do indexes on the first log too
[20:36:32] instead of *
[20:36:37] ottomata, cool
[20:36:51] ok amended
[20:40:25] page view api grafana thing has advanced a bit more https://usercontent.irccloud-cdn.com/file/jCgA22ju/
[20:40:54] addshore: coool
[20:41:02] ottomata, the time will be negative, it should do: time.time() - insert_started_at, instead
[20:41:16] anyone any idea if there is a grafana labs instance?
[20:41:36] http://grafana.wmflabs.org/
[20:41:38] i think
[20:41:49] hahaha
[20:41:51] not sure if it's being maintained
[20:41:51] mforns: GEEZ
[20:41:55] i am just firing away here aren't i
[20:41:57] oh cool!
[20:41:59] i should read the code before i git review it
[20:42:14] ottomata, hehehe
[20:42:16] :D
[20:42:45] mforns: ok amended again
[20:42:46] WHAT NOW?!
[20:43:09] xD
[20:44:24] ottomata, merged!
[20:44:26] :]
[20:46:25] k mforns, deploying...
[20:51:18] mforns: just restarted eventlogging with the change
[20:51:22] only one mysql consumer proc running now
[20:51:30] it queues a lot
[20:51:32] ok cool!
[20:51:32] 5000 at a time
[20:51:38] for lots of queues
[20:51:49] yup
[20:51:49] 2015-12-03 20:51:45,984 (Thread-15 ) Inserted 4855 MobileWebUIClickTracking_10742159 events in 43.245376 seconds
[20:52:02] fuuuu
[20:52:29] it looks like it is constantly queueing 5000 events
[20:52:40] huge backlog
[20:52:55] yeah, i mean that makes sense
[20:53:03] ottomata, yes, we should fix the queue size, I'll submit a patch
[20:53:04] but, nuria, if the actual insert call takes 43 seconds
[20:53:07] ...
[20:53:10] that indicates it is a db thing, ja?
[20:53:53] ottomata: mmmm
[20:54:23] ottomata: that table must be huge, what about others?
[20:54:36] there is only one insert thread
[20:54:46] yea, others are smaller
[20:54:46] 2015-12-03 20:54:16,640 (Thread-15 ) Inserted 83 Edit_13457736 events in 0.648424 seconds
[20:55:04] 2015-12-03 20:54:33,841 (Thread-15 ) Inserted 1749 Edit_13457736 events in 16.056973 seconds
[20:55:09] dunno
[20:55:33] 2015-12-03 20:54:46,559 (Thread-15 ) Inserted 39 Edit_13457736 events in 5.092891 seconds
[20:55:35] varies
[20:55:36] ottomata: have interview in 5 mins SORRY, will catch up, but the edit table is also huge
[20:55:39] k
[20:55:41] ottomata, the too-big queue is filling up the memory, I changed its size, may I restart?
[20:56:01] mforns: yes
[20:56:04] ok
[20:56:05] well, one time 83 events took 0.6 seconds, then next time 39 events took 5 seconds
[20:56:44] done
[20:56:58] still seeing Queueing 5000
[20:57:00] that right?
[20:57:34] ottomata, yes
[20:57:44] but the queue size instead of 1000 is 100
[20:58:31] now the process should consume up to 10% of memory only
[21:00:05] before it was queueing up to 1000 batches of 5000 events each, now only 100 batches of 5000 events each
[21:00:09] ohhhh
[21:00:13] queue size == # of batches?
[21:00:17] yes
[21:00:20] <3 new pageviews API: https://upload.wikimedia.org/wikipedia/commons/b/b8/Pageviews_ORES_and_Revscoring.svg
[21:00:21] hm, that is a misleading name :)
[21:00:25] yes
[21:00:36] :)
[21:00:37] halfak: :D
[21:00:39] names are really bad... sorry
[21:00:46] I had to hack on the JS to get meta working :)
[21:00:50] Was easy :D
[21:02:33] yeah, mforns, i think there must be something wrong with the db
[21:02:35] 2015-12-03 21:02:14,650 (Thread-15 ) Inserted 4842 MobileWebUIClickTracking_10742159 events in 38.798655 seconds
[21:02:36] too long.
[21:02:42] yes
[21:02:52] mforns: how about I puppetize more processes to get around it? :D
[21:03:02] maybe db will hurt, but then someone will notice :)
[21:03:03] ottomata, at this rate, we could never have inserted 200 events per second
[21:03:04] heheh
[21:03:06] yeah
[21:03:26] ottomata, yes, I think this is a great idea, even when db goes back to normal
[21:03:37] k
[21:03:38] how many?
[21:03:56] don't know... 3?
[21:04:08] let's do 4
[21:04:12] ok :]
[21:05:39] Analytics-EventLogging, Deployment-Systems, Scap3: Move EventLogging service to scap3 - https://phabricator.wikimedia.org/T118772#1850014 (greg)
[21:10:09] Analytics-EventLogging, Deployment-Systems, Scap3: Move EventLogging service to scap3 - https://phabricator.wikimedia.org/T118772#1850031 (Ottomata) eventlogging-service is using systemd, and we want to port all of eventlogging to Jessie and systemd sometime in the not too distant future.
[21:10:27] mforns: https://gerrit.wikimedia.org/r/#/c/256755/1
[21:16:52] mforns: there they gooooo
[21:17:18] ottomata, aha
[21:17:34] I was reading the code, but couldn't do a proper review
[21:17:38] s'ok
[21:17:42] just wanted you to see it
[21:17:50] i'm watching mem on eventlog1001
[21:17:52] ok
[21:18:02] since there are now 4 procs doing the same queueing
[21:18:22] yes... I didn't think of that
[21:18:29] oh!
[21:18:48] and the code is reset to master, which has queue_size = 1000
[21:18:53] oh, it is?
[21:18:56] yes
[21:19:03] why? didn't you setup.py install?
[21:19:21] ottomata, if I execute eventloggingctl stop / start, it will restart the 4 consumers?
[21:19:22] hmm, it seems to be stabilizing now
[21:19:23] mem usage
[21:19:25] yes
[21:19:46] ottomata, oh no, the code is ok, queue_size = 100
[21:20:05] cool
[21:20:59] yeah, def big pauses in inserts, i think whatever is happening with large inserts on the master is what was slowing things down
[21:21:10] aha
[21:21:44] !log restarted eventlogging with 4 mysql consumer processes running in parallel
[21:24:29] ottomata, the effects are already in grafana
[21:25:21] oh?
[21:25:21] show me
[21:25:21] the el dash?
[21:25:21] looking
[21:25:31] hehe yeah
[21:25:39] 13k attempted!
[21:26:09] ottomata, that's what fits in the 4 queues
[21:26:21] 100 * 5000 * 4 = 2000000
[21:26:24] aye
[21:26:48] but they didn't get inserted yet
[21:27:05] mforns: yeah, just looking at logs, it's clear that long running (large?) inserts into certain tables are bottlenecking the rest of the data
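With timing now on the insert log line, slow inserts are easy to pull out; a sketch, where the log path is only a guess at wherever the mysql consumer's output lands on eventlog1001:

```bash
# Print insert lines slower than 10 seconds; the duration is the
# second-to-last field, e.g.
#   ... Inserted 4855 MobileWebUIClickTracking_10742159 events in 43.245376 seconds
awk '/Inserted .* events in/ && $(NF-1) > 10' /var/log/upstart/eventlogging_consumer-mysql-*.log
```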
[21:27:05] we could make the eventlogging-valid-mixed topic keyed by schema
[21:27:09] yea
[21:27:15] yes
[21:27:18] meh, that wouldn't really help
[21:27:19] hm
[21:27:29] it would help in that all, say, Edit events would go to the same process
[21:27:33] oh, I don't know about the topic
[21:27:35] and be queued together
[21:27:44] might be a little more efficient
[21:27:47] I see
[21:27:58] but, there wouldn't be any way to keep big tables together
[21:28:07] so, no matter what, probably any given process would have some bottlenecker
[21:28:09] :/
[21:28:16] meh, we should just figure out what's up with the db
[21:28:21] aha
[21:28:43] ottomata, jaime has plans to: keep all the data in the replica
[21:28:53] and auto-purge the m4-master after 90 days
[21:29:09] this will keep tables relatively "small"
[21:29:26] and put a limit on disk usage
[21:36:24] mforns: overall.inserted.rate is up
[21:36:43] around 500/sec
[21:37:01] guess attempted will be spikey til it catches up while it fills queues
[21:38:01] ottomata, if you look at the last 30 minutes, attempted is not spikey any more, it already stabilized
[21:40:20] wth
[21:40:21] milimetric: hey
[21:40:26] hey madhuvishy
[21:40:28] debug?
[21:40:34] still in the meeting with amanda
[21:40:40] oh
[21:40:46] i'm around later, just ping me
[21:41:01] milimetric: she mentioned that she thought the wikipage stuff would be done by the 12th
[21:41:12] wikipage?
[21:41:19] ya, the template parsing
[21:41:27] and the results populating on the page
[21:41:30] of the event
[21:42:15] hm, but that couldn't be anyway because we talked to her about needing volunteers to build that into a gadget
[21:42:44] like, no matter what we did we'd have to work with some sort of mediawiki dev
[21:43:09] mforns: that's because queues are full i think
[21:43:10] no?
[21:43:30] yes, but some inflow is expected, even if very small
[21:43:40] there's a little here and there
[21:46:28] there's another attempted peak... O.o
[21:46:48] milimetric: can you join us for 10 minutes?
[21:47:32] madhuvishy: sure!
[21:47:40] batcave?
[21:47:47] yeah
[22:07:36] FYI 11:07 PM Labs, Graphite: Install WmfPageview datasource plugin on Labs Grafana install - https://phabricator.wikimedia.org/T120298#1850439 (Addshore)
[22:14:41] madhuvishy: this is the metric that got abandoned: https://gerrit.wikimedia.org/r/#/c/174773/
[22:19:55] Analytics, MobileFrontend: Make MobileWebUIClickTracking schema usable (too big) - https://phabricator.wikimedia.org/T108723#1850605 (Jdlrobson) What needs to happen here?
[22:22:17] milimetric: i really do not want to add a wiki ui on top of wikimetrics
[22:22:33] nuria: batcave? I'm just about to talk about related-ish things with madhu
[22:26:35] laters all!
[22:26:49] bye!
[22:39:53] Analytics-Kanban: Create a set of celery tasks that can handle the global metric API input {kudu} [0 pts] - https://phabricator.wikimedia.org/T117288#1850738 (Milimetric)
[22:44:07] bye a-team!
see you tomorrow
[22:45:46] ciao
[22:52:50] (Restored) Milimetric: Add pages edited metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/174773 (https://bugzilla.wikimedia.org/73072) (owner: Mforns)
[22:53:08] (PS2) Milimetric: Add pages edited metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/174773 (https://bugzilla.wikimedia.org/73072) (owner: Mforns)
[22:53:56] (CR) jenkins-bot: [V: -1] Add pages edited metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/174773 (https://bugzilla.wikimedia.org/73072) (owner: Mforns)
[23:20:49] (PS3) Milimetric: Add pages edited metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/174773 (https://bugzilla.wikimedia.org/73072) (owner: Mforns)
[23:51:26] milimetric: do you have ideas on how to create invalid cohorts to test