[02:03:33] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Pchelolo) Implementing a new event in the current #eventbus system made me think about a fairly random idea. We have a...
[07:26:30] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (mobrovac) I like this idea, with the exception of making these classes JSON-serialisable. These objects may (and probab...
[08:10:44] Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (JAllemandou) Many thanks @Ottomata ! Those notebooks are awesome :)
[08:15:00] Analytics, Analytics-Cluster, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Can't write from Spark to local FS - https://phabricator.wikimedia.org/T200609 (JAllemandou) Hi @GoranSMilovanovic , please excuse me I meant to follow-up on this task but then forgot... I think @Milimetric is right...
[10:25:56] Analytics, EventBus, Product-Analytics, MW-1.32-release-notes (WMF-deploy-2018-08-14 (1.32.0-wmf.17)), and 2 others: Load change tags into the Analytics Data Lake on a daily basis - https://phabricator.wikimedia.org/T201062 (Neil_P._Quinn_WMF) Thanks @Pchelolo! This will be super helpful for us....
[10:56:52] I tried to look into the varnishkafka log producer alarm, but I have no permission to ssh into cp2022, see grafana: https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=cp2022&var-network=eth0&from=now-3h&to=now
[11:01:37] Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (diego) @Ottomata for example this: ``` # # Counting Links in Wikipedias using a parquet DUMP #Setup import pandas as pd sqlContext.sql('use wmf') # Define UDF from pys...
[11:01:51] mforns: Same for me - I think we should either ask in ops chan, or wait for ottomata
[11:03:21] joal, do you know what exactly the webrequest varnishkafka log producer is? Does it being out mean that we're losing webrequests irrecoverably?
[11:03:34] mforns: I think so yes
[11:03:42] yes, then we should ping ops
[11:05:54] pinged them
[11:06:56] Thanks mforns - Following with you over there
[11:09:32] ok mforns - nothing to worry about - Thanks for having checked!
[11:09:55] yea, cool, thank you too!
[13:23:27] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Ottomata) Maybe instead of JSON-serializable, they could be array (object/dict) serializable? Or have a toArray functi...
[13:40:05] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Nuria) I second @mobrovac concerns. I think that mixing entities (user) with events (revision-create) it is likely to...
[13:55:53] Analytics, Product-Analytics, Reading-analysis: Assess impact of ua-parser update on core metrics - https://phabricator.wikimedia.org/T193578 (Tbayer) >>! In T193578#4420882, @Tbayer wrote: >>>!
In T193578#4393587, @fdans wrote: >> Ping @Tbayer let's mark this as resolved? > We got a lot of good info...
[14:19:21] hi joal
[14:19:59] do you know if there is a jupyter command to shut down a kernel? So I can leave a process running and add a line of code at the end saying, when this finishes, shut down the kernel
[14:21:28] dsaez: Hi !
[14:21:37] dsaez: if there is one I don't know it
[14:22:14] dsaez: python kernels are not super important - Spark ones are a bit more annoying since they don't facilitate job management
[14:22:30] dsaez: You could possibly explicitly close the spark session at the end?
[14:22:57] joal: let me check, I think there is a spark.close() or something like this
[14:23:17] joal dsaez: see https://github.com/jupyter/notebook/issues/1880 & https://github.com/jupyterlab/jupyterlab/issues/4775 ?
[14:23:43] bearloga: excellent !
[14:23:46] thx
[14:29:09] Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Milimetric) My apologies. I incorrectly assumed that wikimedia.org.il was the same as il.wikimedia.org. Now that I understand the situation, @Nuria is right here: the best course of action is to deploy a separat...
[14:29:17] Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Milimetric) Open>declined
[14:31:48] Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Nuria) Also, wikimedia.org.il is not a site hosted by WMF, again, I think your best option is an install of piwik on labs.
[14:32:27] dsaez: spark.stop()
[14:32:27] ?
[14:32:41] seems to do it!
[14:38:43] Analytics, Analytics-Kanban, EventBus, ORES, and 3 others: Fix "score_schema" -- invalid JSON Schema - https://phabricator.wikimedia.org/T197828 (Halfak)
[14:48:27] ottomata ...yep that also works..
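The `spark.stop()` suggestion above can be wrapped so that the session is released even when the job itself fails. A minimal sketch, assuming `spark` is the `SparkSession` that the notebook kernel already provides; `run_then_stop` and `job` are hypothetical names used only for illustration:

```python
def run_then_stop(spark, job):
    """Run job(spark) and always stop the session afterwards.

    `spark` is assumed to be any object with a stop() method, such as a
    pyspark SparkSession. `job` is any callable that takes the session.
    """
    try:
        return job(spark)
    finally:
        # Runs whether job() returned or raised, so the session never lingers.
        spark.stop()
```

In the last cell of a notebook this could be called as `run_then_stop(spark, lambda s: s.range(10).count())`. It stops the Spark session only; the Jupyter kernel process itself keeps running, which is the part the two linked GitHub issues discuss.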
[14:51:44] dsaez: so the memory problem you are having is not a jupyter related one
[14:51:47] i can repro on the CLI too
[14:51:52] we should solve there first
[14:51:59] ottomata: was trying that as well :)
[14:52:04] but, there is a larger problem of not being able to set spark settings in the notebook
[14:52:08] i'm trying to figure that out
[14:52:11] i can *kinda* do it
[14:52:19] I understand that is a general problem with pyspark
[14:52:27] the overhead memory is used to perform the python operations
[14:52:32] by setting environment variables in my .jupyter/jupyter_notebook_config.py file, and then restarting my jupyter server
[14:52:33] outside spark
[14:52:40] aye
[14:52:46] i've increased both executor memory and overhead
[14:52:48] 8G and 4G
[14:52:52] still getting same error
[14:53:03] i'm going to make a separate task for your problem
[14:53:14] ok
[14:53:21] ottomata I have the feeling it's a driver issue
[14:54:51] nope
[15:00:11] Analytics: pyspark2 job killed by YARN for exceeding memory limits - https://phabricator.wikimedia.org/T201519 (Ottomata) p:Triage>Normal
[15:00:42] Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata)
[15:01:08] Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata)
[15:01:36] Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata) Diego's problem is a larger pyspark issue, not related to Jupyter Notebooks. I've created {T201519} to track it.
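The environment-variable approach mentioned above might look roughly like this inside `.jupyter/jupyter_notebook_config.py`. This is a sketch under assumptions: the chat does not say which variable was set, so `PYSPARK_SUBMIT_ARGS` (which PySpark reads when it launches the JVM) is a guess, and `build_submit_args` is a hypothetical helper. The 8G and 4G values are the ones quoted above, which by themselves did not make the error go away. `spark.executor.memoryOverhead` is the property name from Spark 2.3 on; earlier releases on YARN used `spark.yarn.executor.memoryOverhead`.

```python
import os


def build_submit_args(conf):
    """Turn a dict of Spark properties into a PYSPARK_SUBMIT_ARGS string."""
    flags = ["--conf {}={}".format(key, value) for key, value in sorted(conf.items())]
    # PySpark expects the argument string to end with the "pyspark-shell" token.
    return " ".join(flags + ["pyspark-shell"])


# Values quoted in the chat; assumed here to apply to executors.
spark_conf = {
    "spark.executor.memory": "8g",
    "spark.executor.memoryOverhead": "4g",
}

# The variable has to exist before a kernel starts, hence the server restart.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(spark_conf)
```

Because the config file is evaluated by the Jupyter server process, kernels started afterwards inherit the variable; changing the settings means editing the file and restarting the server, which is why this only "kinda" works per user.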
[15:01:59] ping ottomata standduppp
[15:39:16] a-team: to the batcave (if you wanna talk about the community health initiative dashboard project (https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit))
[15:39:42] milimetric: we still are da cave :D
[16:18:19] ottomata: I'm gone for dinner, but will be back after - Will you have a minute to talk about the WikiDump data reading issue?
[16:18:30] ottomata: and notebook for revscores
[16:31:11] joal sure
[16:31:15] sounds grrr8
[16:38:22] blog post up yeehaw
[16:38:22] https://wikimediafoundation.org/2018/08/08/eventstreams-updates/
[16:45:24] ottomata: pinged them about missing image
[16:45:32] ottomata: do you see it?
[16:46:49] ah yes it is missing
[16:47:34] Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Framawiki) >>! In T199046#4486794, @Nuria wrote: > @Itzike would you consider installing piwik in a lab hosts and maintaining it yourself? My main concerns here are not so much priorities or traffic but rather acc...
[16:50:57] (CR) Ottomata: "OO yes much better with copy" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450861 (https://phabricator.wikimedia.org/T198908) (owner: Ottomata)
[16:51:07] (PS5) Ottomata: Add email error reporting to CamusPartitionChecker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450861 (https://phabricator.wikimedia.org/T198908)
[16:53:11] (PS6) Ottomata: Add email error reporting to CamusPartitionChecker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450861 (https://phabricator.wikimedia.org/T198908)
[17:48:15] Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata) > Need to be able to set custom Spark settings before Kernel is launched I'm not sure we can do this. :/ I'm trying to figure some nice way to allow a user in th...
[18:44:57] Heya ottomata - I'm ready when you want :)
[18:45:35] heyaaa
[18:45:36] gr8
[18:45:37] bc?
[18:45:38] joal: ?
[18:45:42] yay ! omw
[19:05:45] (PS4) Milimetric: Annotate wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705)
[19:23:36] ottomata, I have an issue with the EL sanitization spark job email alert, because spark logs regular stuff to stderr, and that would trigger the alert email even when nothing failed...
[19:34:54] maybe we could add sth like `result=$? ; if [ $result -ne 0 ]; then echo "error"; fi` to the end of the command...?
[19:35:26] (having muted stderr before - 2>&1)
[19:37:00] maybe we can pass config parameters for log4j?
[20:08:16] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Pchelolo) After a quick h-o with @Ottomata and @JAllemandou we've understood that the `/precache` endpoint used to produc...
[20:19:25] (mforns sorry in hangout with joal)
[20:26:04] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Halfak) That's roughly right. Precache will always produce ORES native format that has been designed for JSON tool devel...
[20:30:20] Analytics, MinervaNeue, Readers-Web-Backlog, Design, Readers-Web-Kanbanana-Board: [Spike 8hrs] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (Jdlrobson) > number of sections opened/closed per page (we could potentially use the section usage schema: https://meta.wik...
[20:37:42] mforns: don't remember how the job is launched by puppet
[20:37:45] does it use a wrapper script?
[20:37:47] the refine jobs do
[20:37:54] if so, then def, that would be easy
[20:38:04] also, this probably falls under the larger cronspam cleanup we wanted to do
[20:38:11] can't remember if we made that a goal this quarter or not
[20:38:23] using something like https://habilis.net/cronic/
[20:39:48] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Ottomata) Sooooo, could we make a /v4/precache endpoint that does this?
[20:41:44] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Pchelolo) Ok, I revoke the idea of having this in ores - @Halfak said that within an API version regardless of the model...
[20:43:59] ottomata, not sure if it uses a wrapper script, I think not, it just calls spark-submit
[20:44:38] cronic looks good, but do we have it available?
[20:51:28] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Pchelolo) And also, looking into native ORES response, like https://ores.wikimedia.org/v3/scores/enwiki/854077897 the onl...
[21:07:57] mforns: not really, but we wanted to do something about these cron emails
[21:10:37] mforns: i gotta run, sorry, let's talk more tomorrow
[21:10:43] driving back to NYC tonight, 6.5 h drive yyyuuuuck
[21:10:50] sure ottomata np!
[21:11:03] oooou, drive safe!
[21:18:01] Analytics: pyspark2 job killed by YARN for exceeding memory limits - https://phabricator.wikimedia.org/T201519 (JAllemandou) We did some tests in PySpark CLI with @Ottomata this evening and found memory settings that work (with some minor changes in code). Job succeeded for both Pyspark and Scala-shell with...
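The wrapper idea discussed above (stay silent unless the command fails, which is what cronic does) can be sketched in Python. This is an illustration only, not the script that puppet deploys; `run_quietly` is a hypothetical name. Note that `2>&1` merges stderr into stdout rather than muting it, so the sketch captures both streams together and hands them back only when the exit code is non-zero:

```python
import subprocess


def run_quietly(cmd):
    """Run cmd (a list of arguments) and stay silent on success.

    Returns (exit_code, report). `report` is an empty string when the
    command exits 0, otherwise a header plus the captured output.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as the shell's 2>&1
        universal_newlines=True,
    )
    if proc.returncode == 0:
        # Routine log lines on stderr are dropped when nothing went wrong.
        return 0, ""
    report = "command failed with exit code {}\n{}".format(proc.returncode, proc.stdout)
    return proc.returncode, report
```

A cron entry would call a small script that passes the `spark-submit` command line to `run_quietly`, prints `report` only when it is non-empty, and exits with the returned code. Since cron mails whatever a job prints, a successful run that merely logs to stderr would then produce no email.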
[21:18:13] Off for tonight a-team - See you tomorrow :)
[21:18:24] byeeee
[21:23:51] OMG !!! https://dawn.cs.stanford.edu/2018/08/07/sparser/
[21:23:54] Impressive :)
[21:28:38] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Halfak) FWIW, I am interested in supporting a super-basic format, but I don't want to call it v4 because it will be a dow...
[21:29:58] !log Webrequest data-loss warnings for upload and text for hour 2018-08-08-18 contained only false positives (possibly related to a network glitch?)
[21:30:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log