[00:00:18] (PS7) Nuria: Classification of actors for bot detection [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361)
[00:01:01] (CR) Nuria: [C: -1] "Still testing oozie workflows, please feel free to comment on naming." [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361) (owner: Nuria)
[00:02:21] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (nshahquinn-wmf)
[00:10:08] Analytics, Inuka-Team, Product-Analytics: Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (nshahquinn-wmf)
[00:13:14] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (nshahquinn-wmf) @Nuria is the list of steps in the description complete?
[00:13:29] Analytics, Inuka-Team, Product-Analytics: Set up preview counting for KaiOS app - https://phabricator.wikimedia.org/T244548 (nshahquinn-wmf) @Nuria is the list of steps in the description complete?
[00:18:32] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (Nuria) mmm no, the pageview header was used for something else (discarding "previews" on some app functionality that is - I think- no longer alive) Can't these pageviews be...
[00:24:55] Analytics, Inuka-Team, Product-Analytics: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (nshahquinn-wmf) >>! In T244547#5858544, @Nuria wrote: > mmm no, the pageview header was used for something else (discarding "previews" on some app functionality that is - I th...
[00:56:03] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (Krinkle) >>!
In T242712#5856672, @elukey wrote: > […] I don't see any event related to wiki=login, are we sure that we have it in recent changes? >...
[01:27:16] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (Krinkle) The filename is not unique globally, and it is quite normal for e.g. en...
[02:24:03] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (Tgr) >>! In T234590#5858682, @Krinkle wrote: > The only complication with this i...
[04:36:43] nuria and milimetric: I wrote out a few potential options for the API design. Look over them if you'd like before tomorrow's meeting https://www.irccloud.com/pastebin/f9jrGWor/design_options.txt
[06:42:58] helloooo
[06:45:50] o/
[06:45:55] fdans: buongiorno :D
[06:46:22] elukey: o/
[06:52:39] Analytics, Analytics-Kanban, Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (fdans) For the record this is the approach being followed by the team, as a result of which I've made the above change to the way Wikistats bundles its f...
[09:20:10] Analytics, Analytics-Kanban, User-Elukey: Kerberize Superset to allow Presto queries - https://phabricator.wikimedia.org/T239903 (elukey) Sent an email to dev@superset: https://lists.apache.org/thread.html/rfd9d61e017bc643c898fba6add57c13e85037e1102414ecfb9df7a49%40%3Cdev.superset.apache.org%3E It s...
[09:39:16] <_joe_> hi!
I need to make a query to turnilo but it times out
[09:39:27] <_joe_> I wanted to ask for guidance on how to proceed
[10:25:16] (following up with Joe)
[10:28:36] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (elukey) @Ottomata any opinion about the last two ideas from Timo?
[11:36:28] * elukey lunch!
[12:44:06] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (MarcoAurelio)
[13:01:34] Analytics: Issues querying table in Hive - https://phabricator.wikimedia.org/T244484 (JAllemandou) Nothing to add to what Andrew said. Adding partitions with folders not following the pattern convention needs to be done 'manually' (can be done through a script, but with explicit single partition commands).
[13:04:04] Analytics: Request for database on hadoop user space - https://phabricator.wikimedia.org/T244504 (JAllemandou) Open→Resolved
[13:18:18] (CR) Joal: "2 comments:" [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361) (owner: Nuria)
[14:24:30] Analytics, stewardbots, User-Elukey: Deprecation (if possible) of the #central channel on irc.wikimedia.org - https://phabricator.wikimedia.org/T242712 (Ottomata) I think any of these ideas would work. For the third option (the EventBus option), if we do that, maybe a more generic account creation s...
[14:44:32] Analytics: Issues querying table in Hive - https://phabricator.wikimedia.org/T244484 (Ottomata) More docs for you: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Data_representations https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Data_retention_and_auto-purging#Hive_...
[14:58:03] hey joal, yt?
[14:58:08] Hi dsaez
[14:58:14] Here for 5 minutes no more :)
[14:58:17] kids after
[14:58:42] very quick
[14:59:02] I'm trying to read 'hdfs:///wmf/data/wmf/mediawiki/wikitext/history/snapshot=YYYY-MM/wiki_db=WIKI_DB'
[14:59:25] df = spark.read.parquet('hdfs:///wmf/data/wmf/mediawiki/wikitext/history/snapshot=2019-10/wiki_db=enwiki')
[14:59:31] but this gives an error
[14:59:37] I guess because it is avro
[14:59:39] and not parquet
[14:59:46] but not sure how to do it
[14:59:46] yes
[14:59:47] dsaez: spark.read.avro
[14:59:48] ?
[15:00:02] really?
[15:00:06] let me see
[15:00:09] ottomata: in order for this to work we need to make avro lib available everywhere
[15:00:11] ah no
[15:00:12] hehe
[15:00:18] just guessing!
[15:00:26] spark.read.format("avro").load("PATH")
[15:00:40] joal tried that, but also got an error
[15:00:49] in order to have this working, you should preload refinery-job.jar
[15:00:57] dsaez: I guess you use a notebook right?
[15:01:24] ooh
[15:01:29] yes
[15:01:30] joal: this is a reason why we *should* deploy refinery artifacts to notebook hosts (but i guess diego does notebook on stat box)
[15:01:53] ottomata, this I'm trying in notebook1003
[15:01:57] ottomata: could we update scala-spark kernels to use hdfs:///wmf/refinery/current/artifacts/refinery-job.jar ?
[15:02:22] dsaez: if you use notebook on stat, you should be able to do it
[15:02:25] yes, but i'm considering dropping those kernels in newpyter :p
[15:02:29] right
[15:02:35] I know that you can add jars when starting the notebook
[15:02:41] also we'd need to deploy the jar to notebook*
[15:02:45] which we don't for space reasons right now
[15:02:50] ottomata: joars are on hdfs
[15:02:53] ottomata: jars are on hdfs
[15:02:55] oh right
[15:03:14] dsaez: until we find a better solution, can you try with spark2-shell?
[15:03:22] dsaez: will paste you the starting command here
[15:03:52] ook.
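The read discussed above can be sketched in PySpark roughly as follows. This is only a minimal sketch under stated assumptions: the helper names `history_path` and `read_wikitext_history` are made up for illustration, and the session is assumed to have been started with an Avro data source on its classpath (per the discussion, by preloading refinery-job.jar). Without that, the `load` call is expected to fail as it did in the channel.

```python
# Root of the wikitext history dataset, as quoted in the channel.
HISTORY_ROOT = "hdfs:///wmf/data/wmf/mediawiki/wikitext/history"


def history_path(snapshot, wiki_db):
    """Build the partition path for one snapshot and wiki (hypothetical helper)."""
    return "{}/snapshot={}/wiki_db={}".format(HISTORY_ROOT, snapshot, wiki_db)


def read_wikitext_history(spark, snapshot, wiki_db):
    """Load one partition as Avro rather than Parquet.

    Assumption: `spark` is an existing SparkSession that already has an
    Avro data source available (e.g. a jar passed with --jars at startup).
    """
    return spark.read.format("avro").load(history_path(snapshot, wiki_db))


if __name__ == "__main__":
    # Only prints the path; the actual read needs a live SparkSession.
    print(history_path("2019-10", "enwiki"))
```

In a notebook or shell that already has `spark` defined, `read_wikitext_history(spark, "2019-10", "enwiki")` would stand in for the failing `spark.read.parquet(...)` call.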
[15:03:56] from stat1007 or stat1004: spark2-shell --master yarn --executor-memory 8G --executor-cores 4 --driver-memory 16G --conf spark.dynamicAllocation.maxExecutors=64 --conf spark.executor.memoryOverhead=2048 --jars /srv/deployment/analytics/refinery/artifacts/refinery-job.jar
[15:03:56] sure
[15:04:01] well joal you can make a python notebook and load up a spark session with whatever you want
[15:04:11] neil's wmfdata lib is good for that
[15:04:15] true ottomata
[15:04:16] or you can just use findspark
[15:04:18] I'll try
[15:04:24] https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/spark.py
[15:04:36] anyway - need to go for kids laters !
[15:04:47] https://wikitech.wikimedia.org/wiki/SWAP#Launching_as_SparkSession_in_a_Python_Notebook
[15:05:24] for starting the notebook to load other packages I do this (this for adding the xml parser)
[15:05:26] pyspark2 --master yarn --deploy-mode client --executor-memory 8g --driver-memory 8g --conf spark.dynamicAllocation.maxExecutors=128 --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=we
[15:05:26] bproxy.eqiad.wmnet -Dhttp.proxyPort=8080 -Dhttps.proxyHost=webproxy.eqiad.wmnet -Dhttps.proxyPort=8080" --packages com.databricks:spark-xml_2.11:0.4.1
[15:05:49] I think if you start with --jars /srv/deployment/analytics/refinery/artifacts/refinery-job.jar it will do the magic
[15:06:46] that'll do!
[15:06:46] ottomata, what I do to run notebooks on stat machines, is first set these env variables:
[15:07:06] export PYSPARK_DRIVER_PYTHON=jupyter
[15:07:06] export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
[15:07:06] export PYSPARK_PYTHON=/usr/bin/python3.5
[15:07:06] export PYSPARK_PYTHON=/srv/home/dsaez/3.6/bin/python
[15:07:13] and then, the command I told you
[15:07:21] oh cool!
[15:07:30] so pyspark command will actually launch the notebook
[15:07:31] that's cool
[15:07:34] didn't know you could do that
[15:07:38] yep
[15:08:20] to load the .jar in the notebook1003 machine...
I think it is possible to upload the .jar from the same notebook, but I'll double check
[15:53:40] Analytics, Analytics-Kanban, Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (Nuria) mmm, /assets-v2 should point to git/clone/latest/dist/assests-v2? >Rename current index.html to index.old.html and add a link to it from new i...
[16:11:51] Analytics, Analytics-Kanban, Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (Ottomata) > For this to work we also need additional apache configuration, can we push also the apache changes to CR them? I'm not sure if we do? Whate...
[16:14:41] fdans, ottomata - has anybody tested --^ in labs or similar?
[16:28:00] elukey: ottomata hmmm I thought the prerequisite was this, which I pushed yesterday
[16:28:00] https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/570667/
[16:28:59] fdans: looks good!
[16:29:14] we should be able to merge and deploy that with no symlinks and everything will work as is now, right?
[16:30:21] yes ok but was it tested anywhere?
[16:43:58] Analytics, Better Use Of Data, Product-Infrastructure-Team-Backlog: EventLogging MEP Upgrade Phase 1 - https://phabricator.wikimedia.org/T244521 (mforns) Thanks a lot @jlinehan for refactoring the older task and putting together this one.
[16:44:09] elukey: nope!
[17:01:06] ping mforns standup
[18:00:14] mforns: i'm eating lunch but can work with ya whenever
[18:00:20] there is some manual setup for the eventstreamconfig
[18:00:24] but the rest should work without it
[18:00:27] ottomata, OK!
[18:00:35] wanna tardis?
[18:00:44] i think you should be able to get regular eventlogging working with eventgate with just that puppet patch i linked
[18:00:48] mforns: sure, now?
[18:01:02] also, I HATE tardis because it notified me every day for like a year before I figured out what it was
[18:01:03] whenever is good for you
[18:01:12] i thought it was some annoying sre networking calendar thing
[18:01:12] xD
[18:01:14] or calendar spam
[18:01:17] ok gimme 5 mins
[18:01:18] hahaha
[18:01:36] ottomata, hey take time to eat, actually I will have a snack too now
[18:01:51] let's meet at half-past
[18:01:55] ok pfct
[18:01:59] cool
[18:07:38] Analytics, Better Use Of Data, Product-Infrastructure-Team-Backlog: EventLogging MEP Upgrade Phase 1 - https://phabricator.wikimedia.org/T244521 (jlinehan)
[18:07:54] gone for dinner with kids, back when they're in bed
[18:09:57] joal: presto is on statXXXX and notebookXXXX
[18:10:12] for later if you want to check :)
[18:10:20] going afk folks, have a good weekend!
[18:29:02] :)
[18:31:03] ottomata, so batcave? given that you hate tardis :]
[18:31:12] ya
[18:31:16] :] omw
[18:33:44] nuria: not working on 2/14
[18:33:46] sorryyyy
[18:57:36] Analytics, Analytics-Kanban, Event-Platform, Security Readiness Reviews, user-sbassett: Security Review For EventStreamConfig extension - https://phabricator.wikimedia.org/T242124 (sbassett) !!**Security Review Summary - T242124 - 2020-02-07**!! Overall, this extension looks fine from a secur...
[19:22:52] Analytics, Analytics-Kanban: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (Nuria)
[19:23:24] Analytics, Analytics-Kanban: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (Nuria) Design Document: https://docs.google.com/document/d/1D-v2vTtFt94xZ9HVSky7BKpzF4H2LZ2yC5H9KR_rgKI/edit
[19:29:52] Analytics, Analytics-Kanban: Create intermediate table that holds public data for geoeditors dataset so it can be used to load cassandra - https://phabricator.wikimedia.org/T244597 (Nuria)
[19:30:41] Analytics, Analytics-Kanban: Create intermediate table that holds public data for geoeditors dataset so it can be used to load cassandra - https://phabricator.wikimedia.org/T244597 (Nuria) pinging @milimetric and @JAllemandou so they know @lexnasser is doing this work
[20:06:26] Analytics, Analytics-EventLogging, Performance-Team: Performance perception survey stopped collecting data at 2020-02-07T17:00:00Z UTC - https://phabricator.wikimedia.org/T244599 (Gilles)
[20:07:27] anything happened in the EventLogging pipeline at 17:00 UTC today?
[20:08:39] gilles: let me see, there are no alarms that fired, one sec
[20:09:52] gilles: all schemas?
[20:10:10] no, just the quicksurveys ones
[20:10:50] actually wait I'm seeing the same thing when I query navigationtiming on hive
[20:11:08] gilles: mmm.. there is something periodic https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=NavigationTiming&from=now-2d&to=now
[20:11:19] ordering by descending dt value, the most recent one is 2020-02-07T16:59:59Z
[20:12:08] SELECT * FROM event.navigationtiming WHERE year = 2020 AND month = 2 AND day = 7 ORDER BY dt DESC LIMIT 10;
[20:12:12] for example
[20:13:19] is that graph measured before the data gets to hive? is it really close to the varnish intake?
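One way to tell a stalled pipeline from ordinary ingestion delay is to measure how far the newest `dt` value in Hive trails the time of the query. The sketch below does that for the timestamp quoted above. It rests on assumptions: `dt` is taken to be a UTC string in the `YYYY-MM-DDTHH:MM:SSZ` form shown in the message, the channel timestamps are taken to be UTC, and the helper names `parse_dt` and `hive_lag` are made up for illustration.

```python
from datetime import datetime, timezone


def parse_dt(dt):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into an aware UTC datetime."""
    return datetime.strptime(dt, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def hive_lag(latest_dt, now):
    """Return how far the newest row trails `now`, as a timedelta."""
    return now - parse_dt(latest_dt)


if __name__ == "__main__":
    # Newest dt reported in the channel, compared with the moment it was
    # reported (20:11:19, assumed to be UTC).
    reported_at = datetime(2020, 2, 7, 20, 11, 19, tzinfo=timezone.utc)
    print(hive_lag("2020-02-07T16:59:59Z", reported_at))
```

Comparing the result against the delay the team considers normal gives a quick check before filing a task.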
[20:13:38] maybe something is just wrong with my query
[20:14:04] this is very odd though, I was fairly sure you could order by dt
[20:14:18] MAX(hour) gives me the same thing
[20:14:44] gilles: that periodic thing i think is a metrics artifact cause it goes back a month, so not related
[20:14:46] no gilles that graph is before hive
[20:14:48] that is messages in kafka
[20:15:21] gilles: the graph is kafka so after varnish but before hive
[20:15:42] is that delay to hive expected? that's basically 3 hours behind at the moment
[20:15:47] that sounds about right
[20:15:49] ok
[20:15:51] thanks
[20:15:54] gilles: ya, always 2 at least
[20:16:09] nevermind, then, sorry for the fire drill
[20:16:20] np! let us know if it doesn't come in
[20:17:03] Analytics, Analytics-EventLogging, Performance-Team: Performance perception survey stopped collecting data at 2020-02-07T17:00:00Z UTC - https://phabricator.wikimedia.org/T244599 (Gilles) Seems like the delay of data to Hive is expected, generally more than 2 hours. Let's see if the data turns up later.
[20:33:51] nuria: let me know if you have some time today for a 15-20 min chat. it's not urgent.
[21:04:53] Analytics, Analytics-Kanban, Event-Platform, Security Readiness Reviews, and 2 others: Security Review For EventStreamConfig extension - https://phabricator.wikimedia.org/T242124 (Ottomata) Thanks @sbassett! > Vulnerable Packages Since both of these are from stylelint-config-wikimedia@0.8.0, and...
[22:34:59] Analytics, Analytics-Kanban, Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (saper) What is the canonical URL for the new stats after go live? For example, will this be https://stats.wikimedia.org/v2/#/pl.wikipedia.org/contributi...
[22:37:49] Analytics, Analytics-Kanban, Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (Nuria) The url: https://stats.wikimedia.org/v2/#/pl.wikipedia.org/contributing/active-editors/normal|line|2-year|~total|monthly would keep on working a...
[22:40:59] (CR) Nuria: [C: -1] Classification of actors for bot detection (5 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361) (owner: Nuria)
[22:46:54] Analytics, Analytics-Kanban, Event-Platform, Security Readiness Reviews, and 2 others: Security Review For EventStreamConfig extension - https://phabricator.wikimedia.org/T242124 (sbassett) >>! In T242124#5860849, @Ottomata wrote: > Since both of these are from stylelint-config-wikimedia@0.8.0, a...
[22:54:56] Analytics, Analytics-Wikistats: Canonical wikistats v2 URLs should be permalinks to the period the graph is referring to - https://phabricator.wikimedia.org/T244618 (saper)
[23:14:42] Analytics, Analytics-Kanban, Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (saper) Thanks. I just realized there is something I don't like with those links, just filed {T244618} for this... possibly a duplicate though
[23:18:01] (PS8) Nuria: Classification of actors for bot detection [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361)
[23:18:32] (CR) Nuria: "On patch * added "actor/rollup/hourly" directory." [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361) (owner: Nuria)
[23:20:36] (CR) Nuria: [C: -1] "Still, testing oozie, -1-ing myself" (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/562368 (https://phabricator.wikimedia.org/T238361) (owner: Nuria)