[02:03:33] <wikibugs_>	 Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Pchelolo) Implementing a new event in the current #eventbus system made me think about a fairly random idea. We have a...
[07:26:30] <wikibugs_>	 Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (mobrovac) I like this idea, with the exception of making these classes JSON-serialisable. These objects may (and probab...
[08:10:44] <wikibugs_>	 Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (JAllemandou) Many thanks @Ottomata ! Those notebooks are awesome :)
[08:15:00] <wikibugs_>	 Analytics, Analytics-Cluster, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Can't write from Spark to local FS - https://phabricator.wikimedia.org/T200609 (JAllemandou) Hi @GoranSMilovanovic , please excuse me I meant to follow-up on this task but then forgot... I think @Milimetric is right...
[10:25:56] <wikibugs_>	 Analytics, EventBus, Product-Analytics, MW-1.32-release-notes (WMF-deploy-2018-08-14 (1.32.0-wmf.17)), and 2 others: Load change tags into the Analytics Data Lake on a daily basis - https://phabricator.wikimedia.org/T201062 (Neil_P._Quinn_WMF) Thanks @Pchelolo! This will be super helpful for us....
[10:56:52] <mforns>	 I tried to look into the varnishkafka log producer alarm, but I don't have permission to ssh into cp2022, see grafana: https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=cp2022&var-network=eth0&from=now-3h&to=now
[11:01:37] <wikibugs_>	 Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (diego) @Ottomata for example this:   ```  # # Counting Links in Wikipedias using a parquet DUMP   #Setup import pandas as pd sqlContext.sql('use wmf')   # Define UDF from pys...
[11:01:51] <joal>	 mforns: Same for me - I think we should either ask in ops chan, or wait for ottomata 
[11:03:21] <mforns>	 joal, do you know what exactly the webrequest varnishkafka log producer is? if it is down, does that mean we're losing webrequests irrecoverably?
[11:03:34] <joal>	 mforns: I think so yes
[11:03:42] <mforns>	 yes, then we should ping ops
[11:05:54] <mforns>	 pinged them
[11:06:56] <joal>	 Thanks mforns - Following with you over there
[11:09:32] <joal>	 ok mforns - nothing to worry about - Thanks for having checked!
[11:09:55] <mforns>	 yea, cool, thank you too!
[13:23:27] <wikibugs_>	 Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Ottomata) Maybe instead of JSON-serializable, they could be array (object/dict) serializable?  Or have a toArray functi...
[13:40:05] <wikibugs_>	 Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Nuria)  I second @mobrovac concerns. I think that mixing entities (user) with events (revision-create)  it is likely to...
[13:55:53] <wikibugs_>	 Analytics, Product-Analytics, Reading-analysis: Assess impact of ua-parser update on core metrics - https://phabricator.wikimedia.org/T193578 (Tbayer) >>! In T193578#4420882, @Tbayer wrote: >>>! In T193578#4393587, @fdans wrote: >> Ping @Tbayer let's mark this as resolved? > We got a lot of good info...
[14:19:21] <dsaez>	 hi joal
[14:19:59] <dsaez>	 do you know if there is a jupyter command to shut down a kernel? So I can leave a process running and add a line of code at the end saying: when this finishes, shut down the kernel
[14:21:28] <joal>	 dsaez: Hi !
[14:21:37] <joal>	 dsaez: if there is one I don't know it
[14:22:14] <joal>	 dsaez: python kernels are not super important - Spark ones are a bit more annoying since they don't facilitate job management
[14:22:30] <joal>	 dsaez: You could possibly explicitly close the spark session at the end?
[14:22:57] <dsaez>	 joal: let me check, I think there is a spark.close() or something like this
[14:23:17] <bearloga>	 joal dsaez: see https://github.com/jupyter/notebook/issues/1880 & https://github.com/jupyterlab/jupyterlab/issues/4775 ?
[14:23:43] <dsaez>	 bearloga: excellent !
[14:23:46] <dsaez>	 thx
[14:29:09] <wikibugs_>	 Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Milimetric) My apologies.  I incorrectly assumed that wikimedia.org.il was the same as il.wikimedia.org.  Now that I understand the situation, @Nuria is right here: the best course of action is to deploy a separat...
[14:29:17] <wikibugs_>	 Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Milimetric) Open>declined
[14:31:48] <wikibugs_>	 Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Nuria) Also, wikimedia.org.il is not a site hosted by WMF,  again, I think your best option is an install of piwik on labs.
[14:32:27] <ottomata>	 dsaez: spark.stop()
[14:32:27] <ottomata>	 ?
[14:32:41] <ottomata>	 seems to do it!
[14:38:43] <wikibugs_>	 Analytics, Analytics-Kanban, EventBus, ORES, and 3 others: Fix "score_schema" -- invalid JSON Schema - https://phabricator.wikimedia.org/T197828 (Halfak)
[14:48:27] <dsaez>	 ottomata ...yep that also works.. 
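The `spark.stop()` suggestion above can be wrapped so the session is released even when the job raises. This is a minimal sketch, assuming `session` is the notebook's `SparkSession` (usually bound to `spark`); `run_then_stop` and `job` are hypothetical names used for illustration, not part of any library.

```python
def run_then_stop(session, job):
    """Run job(session) and always stop the session afterwards.

    `session` is assumed to expose a stop() method, as SparkSession does.
    `job` is any callable taking the session and returning a result.
    """
    try:
        return job(session)
    finally:
        # Release the cluster resources whether the job succeeded or failed.
        session.stop()
```

In a notebook this could be the last cell, for example `run_then_stop(spark, lambda s: s.sql("SELECT 1").count())`. Note that this stops the Spark application only; the Python kernel itself keeps running.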
[14:51:44] <ottomata>	 dsaez:  so the memory problem you are having is not a jupyter related one
[14:51:47] <ottomata>	 i can repro on the CLI too
[14:51:52] <ottomata>	 we should solve there first
[14:51:59] <joal>	 ottomata: was trying that as well :)
[14:52:04] <ottomata>	 but, there is a larger problem of not being able to set spark settings in the notebook
[14:52:08] <ottomata>	 i'm trying to figure that out
[14:52:11] <ottomata>	 i can *kinda* do it
[14:52:19] <dsaez>	 I understand that is a general problem with pyspark
[14:52:27] <dsaez>	 the overhead memory is used to perform the python operations
[14:52:32] <ottomata>	 by setting environment variables in my .jupyter/jupyter_notebook_config.py file, and then restarting my jupyter server
[14:52:33] <dsaez>	 outside spark
[14:52:40] <ottomata>	 aye
[14:52:46] <ottomata>	 i've increased both executor memory and overhead
[14:52:48] <ottomata>	 8G and 4G
[14:52:52] <ottomata>	 still getting same error
[14:53:03] <ottomata>	 i'm going to make a separate task for your problem
[14:53:14] <dsaez>	 ok
[14:53:21] <joal>	 ottomata I have the feeling it's a driver issue
[14:54:51] <joal>	 nope
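The workaround ottomata describes (environment variables set in `.jupyter/jupyter_notebook_config.py`, picked up after a server restart) might look roughly like the sketch below. The log does not say which variables were set, so everything here is an assumption: `PYSPARK_SUBMIT_ARGS` is the variable pyspark reads when it launches its JVM gateway, `build_submit_args` is a hypothetical helper, and the 8g/4g values simply mirror the "8G and 4G" mentioned above. On Spark versions before 2.3 the overhead setting is named `spark.yarn.executor.memoryOverhead` instead.

```python
import os

# Values taken from the chat ("8G and 4G"); adjust as needed.
EXECUTOR_MEMORY = "8g"
EXECUTOR_MEMORY_OVERHEAD = "4g"


def build_submit_args(executor_memory, memory_overhead):
    """Build a PYSPARK_SUBMIT_ARGS string (hypothetical helper)."""
    return " ".join([
        "--executor-memory", executor_memory,
        "--conf", "spark.executor.memoryOverhead=" + memory_overhead,
        # pyspark expects the argument string to end with "pyspark-shell".
        "pyspark-shell",
    ])


# Kernels started by the Jupyter server inherit its environment, so this
# only takes effect for kernels launched after the server is restarted.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(
    EXECUTOR_MEMORY, EXECUTOR_MEMORY_OVERHEAD)
```

Whether a given kernel honours `PYSPARK_SUBMIT_ARGS` depends on how that kernel creates its Spark session, which is why this only *kinda* works as a general mechanism.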
[15:00:11] <wikibugs_>	 Analytics: pyspark2 job killed by YARN for exceeding memory limits - https://phabricator.wikimedia.org/T201519 (Ottomata) p:Triage>Normal
[15:00:42] <wikibugs_>	 Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata)
[15:01:08] <wikibugs_>	 Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata)
[15:01:36] <wikibugs_>	 Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata) Diego's problem is a larger pyspark issue, not related to Jupyter Notebooks.  I've created {T201519} to track it.
[15:01:59] <nuria_>	 ping ottomata standduppp
[15:39:16] <milimetric>	 a-team: to the batcave (if you wanna talk about the community health initiative dashboard project (https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit, ))
[15:39:42] <joal>	 milimetric: we still are da cave :D
[16:18:19] <joal>	 ottomata: I'm gone for dinner, but will be back after - Will you have a minute to talk about the WikiDump data reading issue?
[16:18:30] <joal>	 ottomata: and notebook for revscores
[16:31:11] <ottomata>	 joal sure
[16:31:15] <ottomata>	 sounds grrr8
[16:38:22] <ottomata>	 blog post up yeehaw
[16:38:22] <ottomata>	 https://wikimediafoundation.org/2018/08/08/eventstreams-updates/
[16:45:24] <nuria_>	 ottomata: pinged them about missing image
[16:45:32] <nuria_>	 ottomata: do you see it?
[16:46:49] <ottomata>	 ah yes it is missing
[16:47:34] <wikibugs_>	 Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Framawiki) >>! In T199046#4486794, @Nuria wrote: > @Itzike would you consider installing piwik in a lab hosts and maintaning it yourself?  My main concerns here are not so much priorities or traffic but rather acc...
[16:50:57] <wikibugs_>	 (CR) Ottomata: "OO yes much better with copy" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450861 (https://phabricator.wikimedia.org/T198908) (owner: Ottomata)
[16:51:07] <wikibugs_>	 (PS5) Ottomata: Add email error reporting to CamusPartitionChecker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450861 (https://phabricator.wikimedia.org/T198908)
[16:53:11] <wikibugs_>	 (PS6) Ottomata: Add email error reporting to CamusPartitionChecker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450861 (https://phabricator.wikimedia.org/T198908)
[17:48:15] <wikibugs_>	 Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata) > Need to be able to set custom Spark settings before Kernel is launched  I'm not sure we can do this.  :/  I'm trying to figure some nice way to allow a user in th...
[18:44:57] <joal>	 Heya ottomata - I'm ready when you want :)
[18:45:35] <ottomata>	 heyaaa
[18:45:36] <ottomata>	 gr8
[18:45:37] <ottomata>	 bc?
[18:45:38] <ottomata>	 joal: ?
[18:45:42] <joal>	 yay ! omw
[19:05:45] <wikibugs_>	 (PS4) Milimetric: Annotate wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705)
[19:23:36] <mforns>	 ottomata, I have an issue with the EL sanitization spark job email alert, because spark logs regular stuff to stderr, and would trigger the alert email even if no failure...
[19:34:54] <mforns>	 maybe we could add sth like `result=$? ; if [ $result -ne 0 ]; then echo "error"; fi` to the end of the command...?
[19:35:26] <mforns>	 (having muted stderr before - 2>&1)
[19:37:00] <mforns>	 maybe we can pass config parameters for log4j?
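The exit-status idea above can be sketched as a small wrapper that stays silent on success and prints the captured output only when the command fails, which is roughly the behaviour cron needs to avoid mailing on routine stderr logging. This is an illustrative sketch, not the job's actual launcher: `run_quietly` is a hypothetical name. Note that `2>&1` on its own merges stderr into stdout rather than silencing it, so the output still has to be captured and discarded on success.

```python
import subprocess


def run_quietly(command):
    """Run `command` (a list of arguments) and report only on failure.

    Returns (exit_code, report). `report` is an empty string when the
    command exits with status 0, so a cron job printing it stays silent.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
        universal_newlines=True,
    )
    if completed.returncode == 0:
        # Success: drop the regular log chatter so cron sends no email.
        return 0, ""
    report = "command failed with exit code {}: {}\n{}".format(
        completed.returncode, " ".join(command), completed.stdout)
    return completed.returncode, report
```

A cron entry would then call a script that passes its arguments to `run_quietly`, prints `report` if it is non-empty, and exits with the returned code.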
[20:08:16] <wikibugs_>	 Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Pchelolo) After a quick h-o with @Ottomata and @JAllemandou we've understood that the `/precache` endpoint used to produc...
[20:19:25] <ottomata>	 (mforns sorry in hangout with joal)
[20:26:04] <wikibugs_>	 Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Halfak) That's roughly right.  Precache will always produce ORES native format that has been designed for JSON tool devel...
[20:30:20] <wikibugs_>	 Analytics, MinervaNeue, Readers-Web-Backlog, Design, Readers-Web-Kanbanana-Board: [Spike 8hrs] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (Jdlrobson) > number of sections opened/closed per page (we could potentially use the section usage schema: https://meta.wik...
[20:37:42] <ottomata>	 mforns:  don't remember how the job is launched by puppet
[20:37:45] <ottomata>	 does it use a wrapper script?
[20:37:47] <ottomata>	 the refine jobs do
[20:37:54] <ottomata>	 if so, then def, that would be easy
[20:38:04] <ottomata>	 also, this probably falls under the larger cronspam cleanup we wanted to do
[20:38:11] <ottomata>	 can't remember if we made that a goal this quarter or not
[20:38:23] <ottomata>	 using something like https://habilis.net/cronic/
[20:39:48] <wikibugs_>	 Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Ottomata) Sooooo, could we make a /v4/precache endpoint that does this?
[20:41:44] <wikibugs_>	 Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Pchelolo) Ok, I revoke the idea of having this in ores - @Halfak said that within an API version regardless of the model...
[20:43:59] <mforns>	 ottomata, not sure if it uses a wrapper script, I think not, it just calls spark-submit
[20:44:38] <mforns>	 cronic looks good, but do we have it available?
[20:51:28] <wikibugs_>	 Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Pchelolo) And also, looking into native ORES response, like https://ores.wikimedia.org/v3/scores/enwiki/854077897 the onl...
[21:07:57] <ottomata>	 mforns:  not really, but we wanted to do something about these cron emails
[21:10:37] <ottomata>	 mforns:  i gotta run, sorry, let's talk more tomorrow
[21:10:43] <ottomata>	 driving back to NYC tonight, 6.5 h drive yyyuuuuck
[21:10:50] <mforns>	 sure ottomata np!
[21:11:03] <mforns>	 oooou, drive safe!
[21:18:01] <wikibugs_>	 Analytics: pyspark2 job killed by YARN for exceeding memory limits - https://phabricator.wikimedia.org/T201519 (JAllemandou) We did some tests in PySpark CLI with @Ottomata this evening and found memory settings that work (with some minor changes in code).  Job succeeded for both Pyspark and Scala-shell with...
[21:18:13] <joal>	 Off for tonight a-team - See you tomorrow :)
[21:18:24] <mforns>	 byeeee
[21:23:51] <joal>	 OMG !!! https://dawn.cs.stanford.edu/2018/08/07/sparser/
[21:23:54] <joal>	 Impressive :)
[21:28:38] <wikibugs_>	 Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Halfak) FWIW, I am interested in supporting a super-basic format, but I don't want to call it v4 because it will be a dow...
[21:29:58] <joal>	 !log Webrequest data-loss warnings for upload and text for hours 2018-08-08-18 contained only false positives (possibly related to a network glitch?)
[21:30:00] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log