[07:11:26] 10Analytics, 10LDAP-Access-Requests: LDAP access for HWalls and GVarnum - https://phabricator.wikimedia.org/T201468 (10MoritzMuehlenhoff) @herron: There's a number of issues with this task: - Heather was in both the cn=nda and cn=wmf LDAP groups, but a user should not be in both. Generally cn=wmf is for WMF st... [07:56:04] 10Analytics: How to get display statistics of the content published on Commons - https://phabricator.wikimedia.org/T201180 (10WMDE-leszek) Thanks @Milimetric ! My colleagues are looking for some kind of API that can be queried more "interactively". So the dump solution would sadly not be an option here. In any cas... [08:31:56] Hi team, reminder that I'm on holiday this week :) [08:55:10] holaaa [08:59:00] 10Analytics, 10LDAP-Access-Requests: LDAP access for HWalls and GVarnum - https://phabricator.wikimedia.org/T201468 (10Nuria) @varment: I have corrected it so that wikipedia15 has access to wikimediafoundation [10:11:49] (03CR) 10Zhuyifei1999: "Was trying this in my vagrant install and got:" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/428140 (https://phabricator.wikimedia.org/T192731) (owner: 10Framawiki) [10:16:01] (03CR) 10Zhuyifei1999: "Wait no.. that's my session screw-up after the first login attempt failed with:" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/428140 (https://phabricator.wikimedia.org/T192731) (owner: 10Framawiki) [12:00:00] (03PS3) 10Amire80: Measure articles published using CX2 [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/442860 (https://phabricator.wikimedia.org/T196435) [13:21:00] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Niedzielski) > Can someone now document the difference between mw.eventLog.pageviewToken... [14:04:48] milimetric: yt?
can you help with https://phabricator.wikimedia.org/T201420#4495642 or show me where to fix it? [14:05:05] yep, looking [14:06:23] ottomata: it's all the sql here: https://github.com/wikimedia/analytics-reportupdater-queries/tree/master/page-creation but I'll do it [14:08:21] (03PS1) 10Milimetric: Update table name [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/452372 [14:08:58] (03CR) 10Milimetric: [V: 032 C: 032] Update table name [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/452372 (owner: 10Milimetric) [14:10:26] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (10Milimetric) Thanks @Nettrom. Backfilling that data so it shows up on the dashboard is a bit of a pain, but if you think it would be useful I'll do it. I've updated the ta... [14:11:20] ottomata: I'm also going to set reportupdater to re-run back to whenever mediawiki_page_create_3 has data [14:13:43] milimetric: ok thanks [14:14:08] morning a-team :) [14:16:54] morning fdans [14:17:02] ok, set to rerun 2018-07-27 to 2018-08-13 [14:32:09] (03CR) 10Ottomata: [V: 032 C: 032] "This works with older refinery-camus too, so it should be backwards compatible. Merging." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/451869 (https://phabricator.wikimedia.org/T198908) (owner: 10Ottomata) [14:32:54] thanks milimetric [14:32:57] hi fdans! 
[14:33:56] (03PS9) 10Fdans: Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) [14:34:32] (03CR) 10Fdans: Adds empty dir removal to hive partition dropping jobs (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: 10Fdans) [14:34:45] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Niedzielski) @tbayer, the following came up in @Krinkle's code review and I was hoping t... [14:34:50] ottomata: o/ [14:37:37] fdans: i want to do a refinery-source and refinery deploy today [14:38:01] is ^^^ ready to go? [14:38:03] shall we merge? [14:38:06] yep [14:38:26] alllrighty [14:38:30] (03CR) 10Ottomata: [V: 032 C: 032] Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: 10Fdans) [14:40:01] a-team any objection to a refinery source deploy? 
[14:40:09] it only has one change ( a fix to my camus checker email stuff) [14:59:52] !log deploying refinery-0.0.69 and refinery changes for T198908 [14:59:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:59:57] T198908: Alarms on throughput on camus imported data - https://phabricator.wikimedia.org/T198908 [15:28:03] 10Analytics, 10Analytics-Kanban, 10Datasets-General-or-Unknown, 10Documentation: Missing documentation for pageviews dataset - https://phabricator.wikimedia.org/T201653 (10Milimetric) p:05Triage>03Normal a:03Milimetric [15:33:03] (03PS1) 10Ottomata: Fix for camus checker check_dry_run [analytics/refinery] - 10https://gerrit.wikimedia.org/r/452399 [15:33:19] (03CR) 10Ottomata: [V: 032 C: 032] Fix for camus checker check_dry_run [analytics/refinery] - 10https://gerrit.wikimedia.org/r/452399 (owner: 10Ottomata) [15:54:49] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (10Nettrom) @Milimetric Thanks for taking care of the SQL queries! I don't see a need for backfilling the data at the moment, there's not a benefit warranting that cost. As me... [15:58:53] 10Analytics, 10LDAP-Access-Requests: LDAP access for HWalls and GVarnum - https://phabricator.wikimedia.org/T201468 (10Varnent) @Nuria - wonderful - thank you!! [16:13:45] 10Analytics, 10Analytics-Kanban, 10wikimediafoundation.org: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (10Varnent) 05Open>03Resolved a:03Nuria This appears to be working - thank you @Nuria! [17:53:16] 10Analytics, 10Analytics-Kanban, 10EventBus, 10ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (10Ottomata) Just had a meeting with Aaron and Petr. We talked about how there is a Mediawiki JobQueue based job that also... 
[18:10:24] (03PS1) 10Ottomata: camus - Add --check-java-opts and --check-emails-to option [analytics/refinery] - 10https://gerrit.wikimedia.org/r/452439 (https://phabricator.wikimedia.org/T198908) [18:11:42] (03CR) 10Ottomata: [V: 032 C: 032] "This is all getting a little messy, but I'm just pushing this through for now. :/" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/452439 (https://phabricator.wikimedia.org/T198908) (owner: 10Ottomata) [19:30:31] (03CR) 10Ottomata: [V: 032 C: 032] Add spark yarn scala and pyspark 'large' kernels [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/451781 (https://phabricator.wikimedia.org/T201519) (owner: 10Ottomata) [19:32:17] (03PS1) 10Ottomata: Fix display name of Spark Scala - YARN (large) [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/452463 [19:32:24] 10Analytics, 10ORES, 10Scoring-platform-team, 10Services (designing): ORES hooks - https://phabricator.wikimedia.org/T201869 (10Pchelolo) [19:32:43] (03CR) 10Ottomata: [V: 032 C: 032] Fix display name of Spark Scala - YARN (large) [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/452463 (owner: 10Ottomata) [19:42:07] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (10Ottomata) I merged https://gerrit.wikimedia.org/r/#/c/analytics/jupyterhub/deploy/+/451781/ which adds 2 new 'large' kernels that bump executor memory to 4g and memoryOverhea... [19:42:25] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (10Ottomata) [19:46:08] dsaez: yt? [19:48:41] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: pyspark2 job killed by YARN for exceeding memory limits - https://phabricator.wikimedia.org/T201519 (10Ottomata) @diego, I've installed a PySpark YARN (large) notebook kernel that has 4g executors with 2g memoryOverhead. This should be sufficient for your...
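For reference, the 'large' kernel settings quoted above (4g executor memory, 2g memoryOverhead) would be expressed as spark-submit confs in the kernel's environment. The values mirror the numbers in the log, but the exact contents of the analytics/jupyterhub/deploy kernel specs are an assumption here:

```shell
# Hypothetical kernel env fragment for a "large" PySpark kernel; the
# deployed kernels may differ. On Spark older than 2.3 the overhead key
# is spark.yarn.executor.memoryOverhead instead.
PYSPARK_SUBMIT_ARGS="--master yarn \
  --conf spark.executor.memory=4g \
  --conf spark.executor.memoryOverhead=2g \
  pyspark-shell"
```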
[20:00:54] HaeB, regarding ingestion of PageIssues into Druid, I'd prefer not to modify the generic EventLoggingToDruid job to add schema-specific features. But one thing we could do is add to it the ability to stringify arrays of simple types like: Array[String], into a string field that could be indexed to Druid. That would solve the PageIssues case, no? [20:02:17] Another option would be to modify the schema, if it is still in develoment. [20:02:17] *development [20:02:17] * ebernhardson already has SWAP bugs to file :P [20:03:46] dsaez: are you working on the latex? I can't compile anymore. :D [20:03:57] dsaez: sorry. wrong channel. switching. [20:05:02] 10Analytics: [EventLoggingToDruid] Allow ingestion of simple-type arrays by converting them to strings - https://phabricator.wikimedia.org/T201873 (10mforns) [20:06:11] 10Analytics, 10Analytics-Cluster: Can't install packages to SWAP / pyspark cluster mode - https://phabricator.wikimedia.org/T201874 (10EBernhardson) [20:13:35] ebernhardson: awesome! [20:13:36] :) [20:14:25] ebernhardson: venv/ isn't in sys.path of pyspark notebook??? [20:14:37] that works for regular python notebook, yes? [20:21:54] ottomata: yes, loading a python 3 notebook loads the library installed in the spark notebook [20:22:03] hmmMMMM [20:22:04] interesting [20:22:06] that is very strange [20:22:10] good bug! 
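A minimal sketch of the stringification mforns proposes above for EventLoggingToDruid. The function name and the comma separator are my own choices, not the job's actual behavior; sorting is included as an option because it trades element order for lower dimension cardinality:

```python
def stringify_array(values, sort=True):
    """Collapse an array of simple values into one string suitable for a
    Druid string dimension.

    Sorting first makes permutations like [1, 2] and [2, 1] map to the
    same value (fewer distinct strings to index), but discards any
    meaning the original element order carried.
    """
    items = sorted(values) if sort else list(values)
    return ",".join(str(v) for v in items)

print(stringify_array(["def", "abc"]))      # abc,def
print(stringify_array([2, 1], sort=False))  # 2,1
```

A single-element array like ['abc'] round-trips losslessly either way, which is the case the discussion below cares most about.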
[20:22:10] looking [20:23:27] oh i know why [20:23:28] hmmmm [20:23:29] hm [20:23:30] hm [20:23:30] h [20:23:31] hm [20:23:32] ottomata: for comparison, sys.executable reports $HOME/venv/bin/python3 for one, and /usr/bin/python3 for the other [20:23:35] yup [20:24:03] ottomata: you may need to set spark.pyspark.python and spark.pyspark.driver.python separately, if this is being booted through the spark scripts [20:24:21] i'm betting it's done so that the python path is the same on driver and executors [20:25:03] for real magic you could tar up venv and ship it to the executors via `--archives venv.tar.gz#venv` :) [20:33:34] ebernhardson: but you couldn't then use the !pip installed things [20:33:46] since that is done after the notebook (and spark shell) is launched [20:34:20] ottomata: well, you would have to restart the kernel to rebuild the venv i guess? But the alternative is you couldn't run custom code (like tensorflow) on the executors [20:36:25] ebernhardson: something like that might be more for https://wikitech.wikimedia.org/wiki/SWAP#Custom_Spark_Kernels [20:37:43] ottomata: nifty, i can work with that [20:39:12] ebernhardson: hm, these kernels are installed globally for all users [20:39:17] hmm [20:40:26] ottomata: if you can use $HOME in that thing, you can set the confs in PYSPARK_SUBMIT_ARGS [20:40:29] checking [20:42:49] (03PS1) 10Ottomata: Fix pyspark local to not use --master yarn [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/452568 [20:43:08] (03CR) 10Ottomata: [V: 032 C: 032] Fix pyspark local to not use --master yarn [analytics/jupyterhub/deploy] - 10https://gerrit.wikimedia.org/r/452568 (owner: 10Ottomata) [20:45:10] ebernhardson: $HOME and ~ don't look like they work [20:45:17] at least not for argv [20:45:29] i was trying to make the notebook use ~/venv/bin/python3 when launching [20:45:56] yea [20:48:53] 10Analytics, 10Analytics-Cluster: Can't install packages to SWAP / pyspark cluster mode -
https://phabricator.wikimedia.org/T201874 (10Ottomata) I don't think we can really fix this. The Spark kernels are installed globally for all users, so we can't refer to a user's venv. To use your own custom kernel, you... [20:48:56] yeah, ebernhardson i dunno if we can really help there :/ [20:49:07] i mean, i could install a custom kernel for every single user, and make one get installed automatically [20:49:08] ottomata: i can make it work for me at least, good enough :) [20:49:28] but the way swap is set up (i think i'd love to refactor the whole thing, but that's another big project), makes that really difficult to maintain [20:49:40] the kernels and venvs aren't puppetized, they are just created when the user first logs in [20:49:49] so there's no way to update/upgrade them by admins. [20:50:06] kinda a trade off. folks can install stuff in venv, but we don't have any good way to keep track of that [20:50:19] it was hard enough when moving folks to the new notebook serers [20:50:20] servers [20:50:54] yea i can imagine [21:13:42] ottomata ebernhardson: i don't know if it's the same issue, but i similarly fail to import NeilPatelQuinn[m 's wmfdata package even after installing it successfully on SWAP: [21:14:07] https://www.irccloud.com/pastebin/Z6pHR8mA/package%20import%20fail%20on%20SWAP [21:14:44] (here, it seems it's being installed into a temp folder) [21:17:06] mforns: yes, i mean, the squashed array would not be very useful in the case of >1 values, but it would solve the case of 1 value [21:17:40] HaeB: which notebook kernel are you using? [21:17:43] it's not super important to solve this right now (after all this is the very first time we are doing this), but it would provide some value for the analysis [21:17:59] ottomata: Python 3 [21:18:10] hm [21:19:27] HaeB, sorry did not understand, what would be the squashed array? [21:20:25] the stringified array (e.g. ['abc', 'def'] --> [21:20:57] mforns: i mean the stringified array (e.g.
['abc', 'def'] --> 'abc,def' - i guess this is what you meant?) [21:21:07] HaeB, yes [21:21:36] would that work for your analyses in Druid? [21:24:36] HaeB: i'm not sure this is a notebook issue...this seems to do the same on the CLI too [21:24:37] mforns: yes, i meant to say above that it would be a suitable workaround (not very useful in the case of >1 values, but it would solve the case of 1 value) [21:24:43] maybe wmfdata doesn't work with pip install so well? [21:24:47] or...have you gotten that to work before? [21:25:03] ottomata: i haven't, but neil has apparently [21:25:30] ottomata: do you think the temp folder is the issue? [21:25:46] i'm not sure, it does install wmfdata-0.1-py3.5.egg-info [21:25:51] into the venv [21:25:52] but there don't seem to be any sources [21:29:42] ottomata: hmm... any further hints we can convey to NeilPatelQuinn[m ? [21:30:06] HaeB: i just tried it on my local mw-vagrant [21:30:09] same behavior there [21:30:15] HaeB, we can make sure that the elements in the array are sorted before stringifying, so that we reduce possible combinations of string values [21:30:15] so it's def a problem with the library [21:30:39] i don't know much about packaging for pip [21:30:48] but whatever is going on (probably something in setup.py?) isn't right [21:38:08] 10Analytics: [EventLoggingToDruid] Allow ingestion of simple-type arrays by converting them to strings - https://phabricator.wikimedia.org/T201873 (10mforns) Should we sort the array before stringifying? Like: [1,2] and [2,1] generate the same string "[1,2]"? This would reduce the number of possible string value... [21:43:30] (03CR) 10Milimetric: "This looks good! Except for the access issue I mentioned, this could be merged and set up to run in puppet (I do that part).
Let me know abo" (031 comment) [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/442860 (https://phabricator.wikimedia.org/T196435) (owner: 10Amire80) [21:51:10] mforns: yes, that sounds like a good idea [21:51:47] HaeB, hm, now on second thought, if the order of the array has a meaning, that would corrupt it [21:52:22] ottomata: ok, thanks for looking into it - will make sure neil sees it when he is back online [21:53:43] mforns: yes, it should have (corresponding to the order of maintenance templates on the page), but we could live with that loss of information - actually it might even be desirable, because otherwise the range of possible values would be too complex to display meaningfully in a graph [21:54:35] HaeB, ok, left a comment on the task, will mention it in tomorrow's standup [21:56:10] mforns: great, thanks - and again, if it threatens to become too complex, feel free to deprioritize it in favor of getting the rest of the schema into Druid/Superset [21:56:29] ok, makes sense [22:47:29] 10Analytics, 10Analytics-Cluster: Can't install packages to SWAP / pyspark cluster mode - https://phabricator.wikimedia.org/T201874 (10EBernhardson) 05Open>03declined [22:58:44] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: pyspark2 job killed by YARN for exceeding memory limits - https://phabricator.wikimedia.org/T201519 (10diego) @Ottomata, thanks for the update. I've tried the same code (the version posted above by @JAllemandou ) and got the same error. ``` Py4JJavaError...
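On the wmfdata import failure discussed above (the egg-info lands in the venv but no sources do): one classic cause of exactly that symptom is a setup.py that never declares its packages, so pip installs metadata only. This is a hedged guess, not a diagnosis of wmfdata's actual setup.py; the sketch queries its own metadata rather than installing anything, so it runs standalone:

```python
import sys
from setuptools import setup, find_packages

# Run a metadata query instead of an install command, so this sketch
# executes without touching the environment.
sys.argv = ["setup.py", "--name"]

dist = setup(
    name="wmfdata",
    version="0.1",
    # Without a packages= (or py_modules=) argument, pip installs only
    # the .egg-info metadata and no importable sources -- matching the
    # "installed successfully but cannot import" symptom above.
    packages=find_packages(),
)
print(dist.get_name())  # wmfdata
```

If the real setup.py already declares its packages, the temp-folder observation above would point elsewhere (e.g. pip's build directory), so this is only one hypothesis to check.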
[23:42:01] (03PS5) 10Fdans: Filter out unwanted wikis from wmf.virtualpageview_hourly [analytics/refinery] - 10https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) [23:43:40] (03CR) 10Fdans: Filter out unwanted wikis from wmf.virtualpageview_hourly (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) (owner: 10Fdans) [23:44:32] (03PS6) 10Fdans: Filter out unwanted wikis from wmf.virtualpageview_hourly [analytics/refinery] - 10https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) [23:46:13] (03CR) 10Fdans: "Applied changes, tested query in a small subset of the virtual pageview data and no rogue domains are slippin'" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) (owner: 10Fdans) [23:58:58] 10Analytics, 10Analytics-EventLogging, 10MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), 10Patch-For-Review, 10Performance-Team (Radar): Spin out a tiny EventLogging RL module for lightweight logging - https://phabricator.wikimedia.org/T187207 (10Krinkle)
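Returning to the earlier idea of shipping a virtualenv to the YARN executors: a rough sketch of the packing step. The spark-submit `--archives` flag and the `spark.pyspark.python` conf are real Spark options; the archive name, alias, and `my_job.py` are illustrative, and a real venv would of course pip-install its dependencies before being packed:

```shell
# Build and pack a throwaway virtualenv. --without-pip keeps the sketch
# fast and offline; a real venv would bootstrap pip and install deps.
python3 -m venv --without-pip demo_venv
tar -czf demo_venv.tar.gz demo_venv

# YARN unpacks the archive as ./venv in each container, so the python
# path below is relative to that alias. (Not executed here.)
# spark2-submit \
#   --master yarn \
#   --archives demo_venv.tar.gz#venv \
#   --conf spark.pyspark.python=venv/demo_venv/bin/python3 \
#   my_job.py

ls demo_venv.tar.gz
```

As noted in the log, anything `!pip`-installed after kernel startup would not be in the shipped archive, so the venv has to be rebuilt and repacked (and the kernel restarted) to pick up new packages.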