[07:11:26] 10Analytics, 10LDAP-Access-Requests: LDAP access for HWalls and GVarnum - https://phabricator.wikimedia.org/T201468 (10MoritzMuehlenhoff) @herron: There's a number of issues with this task: - Heather was in both the cn=nda and cn=wmf LDAP groups, but a user should not be in both. Generally cn=wmf is for WMF st... [07:56:04] 10Analytics: How to get display statistics of the content published on Commons - https://phabricator.wikimedia.org/T201180 (10WMDE-leszek) Thanks @Milimetric ! My colleagues are looking for some kind of API that can be queried more "interactively". So the dump solution would sadly not be an option here. In any cas... [08:31:56] Hi team, reminder that I'm on holiday this week :) [08:55:10] holaaa [08:59:00] 10Analytics, 10LDAP-Access-Requests: LDAP access for HWalls and GVarnum - https://phabricator.wikimedia.org/T201468 (10Nuria) @varment: I have corrected it so that wikipedia15 has access to wikimediafoundation [10:11:49] (03CR) 10Zhuyifei1999: "Was trying this in my vagrant install and got:" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/428140 (https://phabricator.wikimedia.org/T192731) (owner: 10Framawiki) [10:16:01] (03CR) 10Zhuyifei1999: "Wait no.. that's my session screw-up after the first login attempt failed with:" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/428140 (https://phabricator.wikimedia.org/T192731) (owner: 10Framawiki) [12:00:00] (03PS3) 10Amire80: Measure articles published using CX2 [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/442860 (https://phabricator.wikimedia.org/T196435) [13:21:00] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Niedzielski) > Can someone now document the difference between mw.eventLog.pageviewToken... [14:04:48] milimetric: yt?
can you help with https://phabricator.wikimedia.org/T201420#4495642 or show me where to fix it? [14:05:05] yep, looking [14:06:23] ottomata: it's all the sql here: https://github.com/wikimedia/analytics-reportupdater-queries/tree/master/page-creation but I'll do it [14:08:21] (03PS1) 10Milimetric: Update table name [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/452372 [14:08:58] (03CR) 10Milimetric: [V: 032 C: 032] Update table name [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/452372 (owner: 10Milimetric) [14:10:26] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (10Milimetric) Thanks @Nettrom. Backfilling that data so it shows up on the dashboard is a bit of a pain, but if you think it would be useful I'll do it. I've updated the ta... [14:11:20] ottomata: I'm also going to set reportupdater to re-run back to whenever mediawiki_page_create_3 has data [14:13:43] milimetric: ok thanks [14:14:08] morning a-team :) [14:16:54] morning fdans [14:17:02] ok, set to rerun 2018-07-27 to 2018-08-13 [14:32:09] (03CR) 10Ottomata: [V: 032 C: 032] "This works with older refinery-camus too, so it should be backwards compatible. Merging." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/451869 (https://phabricator.wikimedia.org/T198908) (owner: 10Ottomata) [14:32:54] thanks milimetric [14:32:57] hi fdans! 
[14:33:56] (03PS9) 10Fdans: Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) [14:34:32] (03CR) 10Fdans: Adds empty dir removal to hive partition dropping jobs (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: 10Fdans) [14:34:45] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Niedzielski) @tbayer, the following came up in @Krinkle's code review and I was hoping t... [14:34:50] ottomata: o/ [14:37:37] fdans: i want to do a refinery-source and refinery deploy today [14:38:01] is ^^^ ready to go? [14:38:03] shall we merge? [14:38:06] yep [14:38:26] alllrighty [14:38:30] (03CR) 10Ottomata: [V: 032 C: 032] Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: 10Fdans) [14:40:01] a-team any objection to a refinery source deploy? 
[14:40:09] it only has one change ( a fix to my camus checker email stuff) [14:59:52] !log deploying refinery-0.0.69 and refinery changes for T198908 [14:59:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:59:57] T198908: Alarms on throughput on camus imported data - https://phabricator.wikimedia.org/T198908 [15:28:03] 10Analytics, 10Analytics-Kanban, 10Datasets-General-or-Unknown, 10Documentation: Missing documentation for pageviews dataset - https://phabricator.wikimedia.org/T201653 (10Milimetric) p:05Triage>03Normal a:03Milimetric [15:33:03] (03PS1) 10Ottomata: Fix for camus checker check_dry_run [analytics/refinery] - 10https://gerrit.wikimedia.org/r/452399 [15:33:19] (03CR) 10Ottomata: [V: 032 C: 032] Fix for camus checker check_dry_run [analytics/refinery] - 10https://gerrit.wikimedia.org/r/452399 (owner: 10Ottomata) [15:54:49] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (10Nettrom) @Milimetric Thanks for taking care of the SQL queries! I don't see a need for backfilling the data at the moment, there's not a benefit warranting that cost. As me... [15:58:53] 10Analytics, 10LDAP-Access-Requests: LDAP access for HWalls and GVarnum - https://phabricator.wikimedia.org/T201468 (10Varnent) @Nuria - wonderful - thank you!! [16:13:45] 10Analytics, 10Analytics-Kanban, 10wikimediafoundation.org: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (10Varnent) 05Open>03Resolved a:03Nuria This appears to be working - thank you @Nuria! [17:53:16] 10Analytics, 10Analytics-Kanban, 10EventBus, 10ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (10Ottomata) Just had a meeting with Aaron and Petr. We talked about how there is a Mediawiki JobQueue based job that also... 
[18:10:24] (03PS1) 10Ottomata: camus - Add --check-java-opts and --check-emails-to option [analytics/refinery] - 10https://gerrit.wikimedia.org/r/452439 (https://phabricator.wikimedia.org/T198908) [18:11:42] (03CR) 10Ottomata: [V: 032 C: 032] "This is all getting a little messy, but I'm just pushing this through for now. :/" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/452439 (https://phabricator.wikimedia.org/T198908) (owner: 10Ottomata) [19:30:31] (03CR) 10Ottomata: [V: 032 C: 032] Add spark yarn scala and pyspark 'large' kernels [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/451781 (https://phabricator.wikimedia.org/T201519) (owner: 10Ottomata) [19:32:17] (03PS1) 10Ottomata: Fix display name of Spark Scala - YARN (large) [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/452463 [19:32:24] 10Analytics, 10ORES, 10Scoring-platform-team, 10Services (designing): ORES hooks - https://phabricator.wikimedia.org/T201869 (10Pchelolo) [19:32:43] (03CR) 10Ottomata: [V: 032 C: 032] Fix display name of Spark Scala - YARN (large) [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/452463 (owner: 10Ottomata) [19:42:07] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (10Ottomata) I merged https://gerrit.wikimedia.org/r/#/c/analytics/jupyterhub/deploy/+/451781/ which adds 2 new 'large' kernels that bump executor memory to 4g and memoryOverhea... [19:42:25] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (10Ottomata) [19:46:08] dsaez: yt? [19:48:41] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: pyspark2 job killed by YARN for exceeding memory limits - https://phabricator.wikimedia.org/T201519 (10Ottomata) @diego, I've installed a PySpark YARN (large) notebook kernel that has 4g executors with 2g memoryOverhead. This should be sufficient for your...
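For reference, the 'large' kernel settings quoted above (4g executor memory, 2g memoryOverhead) would be expressed as spark-submit confs in the kernel's environment. The values mirror the numbers in the log, but the exact contents of the analytics/jupyterhub/deploy kernel specs are an assumption here:

```shell
# Hypothetical kernel env fragment for a "large" PySpark kernel; the
# deployed kernels may differ. On Spark older than 2.3 the overhead key
# is spark.yarn.executor.memoryOverhead instead.
PYSPARK_SUBMIT_ARGS="--master yarn \
  --conf spark.executor.memory=4g \
  --conf spark.executor.memoryOverhead=2g \
  pyspark-shell"
```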
[20:00:54] HaeB, regarding ingestion of PageIssues into Druid, I'd prefer not to modify the generic EventLoggingToDruid job to add schema-specific features. But one thing we could do is add to it the ability to stringify arrays of simple types like: Array[String], into a string field that could be indexed to Druid. That would solve the PageIssues case, no? [20:02:17] Another option would be to modify the schema, if it is still in develoment. [20:02:17] *development [20:02:17] * ebernhardson already has SWAP bugs to file :P [20:03:46] dsaez: are you working on the latex? I can't compile anymore. :D [20:03:57] dsaez: sorry. wrong channel. switching. [20:05:02] 10Analytics: [EventLoggingToDruid] Allow ingestion of simple-type arrays by converting them to strings - https://phabricator.wikimedia.org/T201873 (10mforns) [20:06:11] 10Analytics, 10Analytics-Cluster: Can't install packages to SWAP / pyspark cluster mode - https://phabricator.wikimedia.org/T201874 (10EBernhardson) [20:13:35] ebernhardson: awesome! [20:13:36] :) [20:14:25] ebernhardson: venv/ isn't in sys.path of pyspark notebook??? [20:14:37] that works for regular python notebook, yes? [20:21:54] ottomata: yes, loading a python 3 notebook loads the library installed in the spark notebook [20:22:03] hmmMMMM [20:22:04] interesting [20:22:06] that is very strange [20:22:10] good bug! 
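A minimal sketch of the stringification mforns proposes above for EventLoggingToDruid. The function name and the comma separator are my own choices, not the job's actual behavior; sorting is included as an option because it trades element order for lower dimension cardinality:

```python
def stringify_array(values, sort=True):
    """Collapse an array of simple values into one string suitable for a
    Druid string dimension.

    Sorting first makes permutations like [1, 2] and [2, 1] map to the
    same value (fewer distinct strings to index), but discards any
    meaning the original element order carried.
    """
    items = sorted(values) if sort else list(values)
    return ",".join(str(v) for v in items)

print(stringify_array(["def", "abc"]))      # abc,def
print(stringify_array([2, 1], sort=False))  # 2,1
```

A single-element array like ['abc'] round-trips losslessly either way, which is the case the discussion below cares most about.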
[20:22:10] looking [20:23:27] oh i know why [20:23:28] hmmmm [20:23:29] hm [20:23:30] hm [20:23:30] h [20:23:31] hm [20:23:32] ottomata: for comparison, sys.executable reports $HOME/venv/bin/python3 for one, and /usr/bin/python3 for the other [20:23:35] yup [20:24:03] ottomata: you may need to set spark.pyspark.python and spark.pyspark.driver.python separately, if this is being booted through the spark scripts [20:24:21] i'm betting it's done so that the python path is the same on driver and executors [20:25:03] for real magic you could tar up venv and ship it to the executors via `--archives venv.tar.gz#venv` :) [20:33:34] ebernhardson: but you couldn't then use the !pip installed things [20:33:46] since that is done after the notebook (and spark shell) is launched [20:34:20] ottomata: well, you would have to restart the kernel to rebuild the venv i guess? But the alternative is you couldn't run custom code (like tensorflow) on the executors [20:36:25] ebernhardson: something like that might be more for https://wikitech.wikimedia.org/wiki/SWAP#Custom_Spark_Kernels [20:37:43] ottomata: nifty, i can work with that [20:39:12] ebernhardson: hm, these kernels are installed globally for all users [20:39:17] hmm [20:40:26] ottomata: if you can use $HOME in that thing, you can set the confs in PYSPARK_SUBMIT_ARGS [20:40:29] checking [20:42:49] (03PS1) 10Ottomata: Fix pyspark local to not use --master yarn [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/452568 [20:43:08] (03CR) 10Ottomata: [V: 032 C: 032] Fix pyspark local to not use --master yarn [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/452568 (owner: 10Ottomata) [20:45:10] ebernhardson: $HOME and ~ don't look like they work [20:45:17] at least not for argv [20:45:29] i was trying to make the notebook use ~/venv/bin/python3 when launching [20:45:56] yea [20:48:53] 10Analytics, 10Analytics-Cluster: Can't install packages to SWAP / pyspark cluster mode -
https://phabricator.wikimedia.org/T201874 (10Ottomata) I don't think we can really fix this. The Spark kernels are installed globally for all users, so we can't refer to a user's venv. To use your own custom kernel, you... [20:48:56] yeah, ebernhardson i dunno if we can really help there :/ [20:49:07] i mean, i could install a custom kernel for every single user, and make one get installed automatically [20:49:08] ottomata: i can make it work for me at least, good enough :) [20:49:28] but the way swap is set up (i think i'd love to refactor the whole thing, but that's another big project), makes that really difficult to maintain [20:49:40] the kernels and venvs aren't puppetized, they are just created when the user first logs in [20:49:49] so there's no way to update/upgrade them by admins. [20:50:06] kinda a trade off. folks can install stuff in venv, but we don't have any good way to keep track of that [20:50:19] it was hard enough when moving folks to the new notebook serers [20:50:20] servers [20:50:54] yea i can imagine [21:13:42] ottomata ebernhardson: i don't know if it's the same issue, but i similarly fail to import NeilPatelQuinn[m 's wmfdata package even after installing it successfully on SWAP: [21:14:07] https://www.irccloud.com/pastebin/Z6pHR8mA/package%20import%20fail%20on%20SWAP [21:14:44] (here, it seems it's being installed into a temp folder) [21:17:06] mforns: yes, i mean, the squashed array would not be very useful in the case of >1 values, but it would solve the case of 1 value [21:17:40] HaeB: which notebook kernel are you using? [21:17:43] it's not super important to solve this right now (after all this is the very first time we are doing this), but it would provide some value for the analysis [21:17:59] ottomata: Python 3 [21:18:10] hm [21:19:27] HaeB, sorry did not understand, what would be the squashed array? [21:20:25] the stringified array (e.g. ['abc', 'def'] --> [21:20:57] mforns: i mean the stringified array (e.g.
['abc', 'def'] --> 'abc,def' - i guess this is what you meant?) [21:21:07] HaeB, yes [21:21:36] would that work for your analyses in Druid? [21:24:36] HaeB: i'm not sure this is a notebook issue...this seems to do the same on the CLI too [21:24:37] mforns: yes, i meant to say above that it would be a suitable workaround (not very useful in the case of >1 values, but it would solve the case of 1 value) [21:24:43] maybe wmfdata doesn't work with pip install so well? [21:24:47] or...have you gotten that to work before? [21:25:03] ottomata: i haven't, but neil has apparently [21:25:30] ottomata: do you think the temp folder is the issue? [21:25:46] i'm not sure, it does install wmfdata-0.1-py3.5.egg-info [21:25:51] into the venv [21:25:52] but there don't seem to be any sources [21:29:42] ottomata: hmm... any further hints we can convey to NeilPatelQuinn[m ? [21:30:06] HaeB: i just tried it on my local mw-vagrant [21:30:09] same behavior there [21:30:15] HaeB, we can make sure that the elements in the array are sorted before stringifying, so that we reduce possible combinations of string values [21:30:15] so it's def a problem with the library [21:30:39] i don't know much about packaging for pip [21:30:48] but whatever is going on (probably something in setup.py?) isn't right [21:38:08] 10Analytics: [EventLoggingToDruid] Allow ingestion of simple-type arrays by converting them to strings - https://phabricator.wikimedia.org/T201873 (10mforns) Should we sort the array before stringifying? Like: [1,2] and [2,1] generate the same string "[1,2]"? This would reduce the number of possible string value... [21:43:30] (03CR) 10Milimetric: "This looks good! Except for the access issue I mentioned, this could be merged and set up to run in puppet (I do that part).
Let me know abo" (031 comment) [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/442860 (https://phabricator.wikimedia.org/T196435) (owner: 10Amire80) [21:51:10] mforns: yes, that sounds like a good idea [21:51:47] HaeB, hm, now on second thought, if the order of the array has a meaning, that would corrupt it [21:52:22] ottomata: ok, thanks for looking into it - will make sure neil sees it when he is back online [21:53:43] mforns: yes, it should have (corresponding to the order of maintenance templates on the page), but we could live with that loss of information - actually it might even be desirable, because otherwise the range of possible values would be too complex to display meaningfully in a graph [21:54:35] HaeB, ok, left a comment on the task, will mention it in tomorrow's standup [21:56:10] mforns: great, thanks - and again, if it threatens to become too complex, feel free to deprioritize it in favor of getting the rest of the schema into Druid/Superset [21:56:29] ok, makes sense [22:47:29] 10Analytics, 10Analytics-Cluster: Can't install packages to SWAP / pyspark cluster mode - https://phabricator.wikimedia.org/T201874 (10EBernhardson) 05Open>03declined [22:58:44] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: pyspark2 job killed by YARN for exceeding memory limits - https://phabricator.wikimedia.org/T201519 (10diego) @Ottomata, thanks for the update. I've tried the same code (the version posted above by @JAllemandou ) and got the same error. ``` Py4JJavaError...
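On the wmfdata import failure discussed above (the egg-info lands in the venv but no sources do): one classic cause of exactly that symptom is a setup.py that never declares its packages, so pip installs metadata only. This is a hedged guess, not a diagnosis of wmfdata's actual setup.py; the sketch queries its own metadata rather than installing anything, so it runs standalone:

```python
import sys
from setuptools import setup, find_packages

# Run a metadata query instead of an install command, so this sketch
# executes without touching the environment.
sys.argv = ["setup.py", "--name"]

dist = setup(
    name="wmfdata",
    version="0.1",
    # Without a packages= (or py_modules=) argument, pip installs only
    # the .egg-info metadata and no importable sources -- matching the
    # "installed successfully but cannot import" symptom above.
    packages=find_packages(),
)
print(dist.get_name())  # wmfdata
```

If the real setup.py already declares its packages, the temp-folder observation above would point elsewhere (e.g. pip's build directory), so this is only one hypothesis to check.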
[23:42:01] (03PS5) 10Fdans: Filter out unwanted wikis from wmf.virtualpageview_hourly [analytics/refinery] - 10https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) [23:43:40] (03CR) 10Fdans: Filter out unwanted wikis from wmf.virtualpageview_hourly (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) (owner: 10Fdans) [23:44:32] (03PS6) 10Fdans: Filter out unwanted wikis from wmf.virtualpageview_hourly [analytics/refinery] - 10https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) [23:46:13] (03CR) 10Fdans: "Applied changes, tested query in a small subset of the virtual pageview data and no rogue domains are slippin'" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) (owner: 10Fdans) [23:58:58] 10Analytics, 10Analytics-EventLogging, 10MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), 10Patch-For-Review, 10Performance-Team (Radar): Spin out a tiny EventLogging RL module for lightweight logging - https://phabricator.wikimedia.org/T187207 (10Krinkle)
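Returning to the earlier idea of shipping a virtualenv to the YARN executors: a rough sketch of the packing step. The spark-submit `--archives` flag and the `spark.pyspark.python` conf are real Spark options; the archive name, alias, and `my_job.py` are illustrative, and a real venv would of course pip-install its dependencies before being packed:

```shell
# Build and pack a throwaway virtualenv. --without-pip keeps the sketch
# fast and offline; a real venv would bootstrap pip and install deps.
python3 -m venv --without-pip demo_venv
tar -czf demo_venv.tar.gz demo_venv

# YARN unpacks the archive as ./venv in each container, so the python
# path below is relative to that alias. (Not executed here.)
# spark2-submit \
#   --master yarn \
#   --archives demo_venv.tar.gz#venv \
#   --conf spark.pyspark.python=venv/demo_venv/bin/python3 \
#   my_job.py

ls demo_venv.tar.gz
```

As noted in the log, anything `!pip`-installed after kernel startup would not be in the shipped archive, so the venv has to be rebuilt and repacked (and the kernel restarted) to pick up new packages.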