[00:03:42] ottomata: it works! Says 'hi dan' [01:38:35] awesooome!~ [07:39:39] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (10elukey) >>! In T237752#5802478, @saper wrote: > > I love the power `mod_rewrite` gives me, but after 6 months I always scratch my head again... The sol... [08:16:22] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Enable encryption in Spark 2.4 by default - https://phabricator.wikimedia.org/T240934 (10elukey) Sent an email to users@spark.apache.org, let's see if anybody comes back with suggestions! [08:41:36] 10Analytics, 10Analytics-Kanban, 10Operations, 10Wikimedia-Logstash, and 6 others: Move AQS logging to new logging pipeline - https://phabricator.wikimedia.org/T219928 (10elukey) [08:42:54] PROBLEM - Check the last execution of wikimedia-discovery-golden on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:42:58] 10Analytics, 10Analytics-Kanban, 10Operations, 10Wikimedia-Logstash, and 6 others: Move AQS logging to new logging pipeline - https://phabricator.wikimedia.org/T219928 (10elukey) 05Stalled→03Open Dan deployed the new version of service-runner for aqs, I applied the puppet patch and verified that the ne... [08:43:20] PROBLEM - Check the last execution of reportupdater-wmcs on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:44:28] !log move aqs to the new rsyslog-logstash pipeline [08:44:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:47:10] isaacj: o/ [08:47:19] I see that you are having fun on stat1007 :D [08:53:42] RECOVERY - Check the last execution of wikimedia-discovery-golden on stat1007 is OK: OK: Status of the systemd unit wikimedia-discovery-golden https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:54:08] RECOVERY - Check the last execution of reportupdater-wmcs on stat1007 is OK: OK: Status of the systemd unit reportupdater-wmcs https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [09:01:29] https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=stat1007&var-datasource=eqiad%20prometheus%2Fops&var-cluster=analytics&fullscreen&panelId=8 [09:01:33] lol [09:04:06] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:04:50] PROBLEM - aqs endpoints health on aqs1006 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:05:26] PROBLEM - aqs endpoints health on aqs1008 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:05:30] PROBLEM - aqs endpoints health on aqs1007 is CRITICAL: 
/analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:05:46] PROBLEM - aqs endpoints health on aqs1005 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:07:20] PROBLEM - aqs endpoints health on aqs1009 is CRITICAL: /analytics.wikimedia.org/v1/edits/per-page/{project}/{page-title}/{editor-type}/{granularity}/{start}/{end} (Get daily edits for english wikipedia page 0) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:19:58] RECOVERY - aqs endpoints health on aqs1008 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:20:02] RECOVERY - aqs endpoints health on aqs1009 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:20:02] RECOVERY - aqs endpoints health on aqs1007 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:20:18] RECOVERY - aqs endpoints health on aqs1005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:20:23] segment deletion :( [09:20:28] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:21:14] RECOVERY - aqs endpoints health on aqs1006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/aqs [09:28:59] I got a jstack from a broker this time [09:40:01] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Dropping data from druid takes down aqs hosts - https://phabricator.wikimedia.org/T226035 (10elukey) 05Resolved→03Open Re-happened this morning :( This time I have a thread dump of the druid1004's broker: ` 2020-01-15 09:17:58 Full t... 
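For reference, jstack (mentioned above) is the standard JDK tool for dumping a JVM's thread stacks, which is what elukey attached to the Druid task. A minimal sketch of capturing such a dump from a Druid broker, assuming the broker runs as the druid user; the pgrep pattern and output path are illustrative and may need adjusting to the local setup:

    # find the broker JVM (the match on the command line is illustrative;
    # the exact main class differs across Druid versions)
    BROKER_PID=$(pgrep -f 'druid.*broker' | head -n 1)
    # take the thread dump as the user owning the JVM, with a timestamped filename
    sudo -u druid jstack "$BROKER_PID" > "/tmp/druid-broker-$(date +%Y%m%d-%H%M%S).jstack"

Capturing the dump while the brokers are wedged (as during the segment-deletion incident above) is what makes it useful for the Phabricator task.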
[10:37:08] !log remove spark-core flume-ng from all the hadoop workers - T242754 [10:37:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:37:11] T242754: Removed not used CDH packages from Hadoop nodes - https://phabricator.wikimedia.org/T242754 [10:39:20] !log remove flume-ng from all stat/notebooks - T242754 [10:39:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:44:17] !log remove flume-ng and spark-python/core packages from an-coord1001,analytics1030,analytics-tool1001,analytics1039 - T242754 [10:44:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:44:20] T242754: Removed not used CDH packages from Hadoop nodes - https://phabricator.wikimedia.org/T242754 [10:44:55] 10Analytics, 10Analytics-Kanban: Removed not used CDH packages from Hadoop nodes - https://phabricator.wikimedia.org/T242754 (10elukey) p:05Triage→03Normal [10:47:09] PROBLEM - Oozie Server on an-coord1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.catalina.startup.Bootstrap https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie [10:48:10] ouch [10:49:52] so flume-ng is a dep of the oozie package, my bad [10:50:47] RECOVERY - Oozie Server on an-coord1001 is OK: PROCS OK: 1 process with command name java, args org.apache.catalina.startup.Bootstrap https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie [11:23:59] PROBLEM - eventlogging Varnishkafka log producer on cp4031 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka [11:24:05] PROBLEM - statsv Varnishkafka log producer on cp4031 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka [11:24:35] PROBLEM - Webrequests Varnishkafka log producer on cp4031 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka [11:35:31] RECOVERY - Webrequests Varnishkafka log producer on cp4031 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka [11:36:45] RECOVERY - eventlogging Varnishkafka log producer on cp4031 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/eventlogging.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka [11:36:51] RECOVERY - statsv Varnishkafka log producer on cp4031 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/statsv.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka [11:42:22] Hi Analytics! Question, we got approval from Security for the release of aggregated data related to a research project from one of our formal collaborators - see https://phabricator.wikimedia.org/T235309. The data is in the home directory of the collaborator on a stat machine. Could you tell me what would be the next steps for the data release and/or who is the best person to ask for next steps? Thanks! [11:46:27] miriam: o/ is the end goal to publish it in a place like https://dumps.wikimedia.org/other/analytics/ ?
[11:46:45] in any case, I'd open a task with the Analytics tag explaining the use case [11:46:50] then we can take it from there [11:47:21] (we periodically groom the incoming column of the analytics tag so multiple people will check) [11:48:11] grazie elukey :) the analytics dataset page would be a good place, but any public repo like figshare would also work [11:49:29] I'll leave the final judgment to my analytics overlords then :) [11:49:40] (going to lunch!) [11:49:54] elukey: thanks and enjoy your lunch! [12:16:28] 10Analytics, 10Research, 10Security-Team: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (10Miriam) [12:48:41] Here for siesta time team [12:58:44] (03CR) 10Joal: [C: 03+1] "LGTM :) Still wondering about having TimeSeries in the name, but this is detail :)" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/561674 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [13:06:16] (03CR) 10Joal: [C: 03+1] "Found 1 nit triple checking something else. Minor though :)" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/561674 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [13:16:07] (03CR) 10Joal: "Another bunch of annoying comments, but minimal stuff - Should be ready to merge tonight I think" (0310 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/563200 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [13:35:29] 10Analytics, 10Analytics-Kanban: The guava error still persists in data quality bundles - https://phabricator.wikimedia.org/T241375 (10JAllemandou) Git tells me that we bumped guava from 12 to 18 when adding jsonschema loader. IIRC the `com.github.java-json-tools - json-schema-core` dependency need `jackson-co... [14:08:39] elukey: yep, overnight script. i see it was killed (or died). i think i know the issue. apologies for not running in nice if it was getting in the way of other processes. i didn't think it was going to use that much memory but i had tweaked it and accidentally was storing a bunch more info than i needed in memory [14:11:09] isaacj: np! [14:11:27] there are limits that kill processes consuming X% of memory [14:11:41] it was 90% of total, I lowered down to 80 [14:12:04] ahhh that makes sense. yeah, it was slowly accumulating english wikipedia text in memory which it didn't need to do :) [14:12:39] i adjusted and am running again but i'll monitor memory consumption just to make sure [14:36:25] lovely morning indeed [14:42:34] FIRE FIRE FIRE [14:43:09] 10Analytics, 10Core Platform Team Workboards (Green): Flink Spike - https://phabricator.wikimedia.org/T241185 (10Milimetric) For most of the use cases there we can generate historical events and store them somewhere, that way we can replay them through our Flink compute layer whenever we want. So one decision... [14:45:59] milimetric: funny you comment on there --^ [14:46:26] oh yeah? [14:46:33] milimetric: I'm about to post an example of regular-update (daily batches) that would work for almost all use case listed :) [14:46:37] ah! [ANNOUNCE] Apache Superset (Incubating) version 0.35.2 Released [14:46:44] joal: wait... 
don't :) [14:47:09] we know we can do it in batch, that's what made ottomata sad when I mentioned it in the meeting [14:47:09] ok milimetric - need to drop (Lino just awakened), let's talk later about that :) [14:47:15] k ttyl [14:51:08] 10Analytics, 10Core Platform Team Workboards (Green): Flink Spike - https://phabricator.wikimedia.org/T241185 (10Ottomata) Like revision-create? HDFS/Hive is fine no? A benefit of using Flink to process these is it treats batch as a special case of a bounded stream. Aside from some job setup, the same cod... [14:51:31] (03PS11) 10Fdans: Add vue-i18n integration, English strings [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617) [14:52:20] (03CR) 10jerkins-bot: [V: 04-1] Add vue-i18n integration, English strings [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617) (owner: 10Fdans) [14:52:59] ffs I forgot to pull milimetric 's changes [15:03:57] heya teeaaammm [15:07:18] (03PS1) 10Elukey: Release Superset 0.35.2 [analytics/superset/deploy] - 10https://gerrit.wikimedia.org/r/565037 [15:15:41] 10Analytics, 10serviceops, 10Product-Analytics (Kanban): Clarify multi-service instance concepts in helm charts and enable canary releases - https://phabricator.wikimedia.org/T242861 (10Ottomata) [15:18:19] (03PS12) 10Fdans: Add vue-i18n integration, English strings [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617) [15:18:55] (03CR) 10jerkins-bot: [V: 04-1] Add vue-i18n integration, English strings [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617) (owner: 10Fdans) [15:20:02] (03PS13) 10Fdans: Add vue-i18n integration, English strings [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617) [15:20:14] 10Analytics, 10Research, 10Security-Team: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (10Michele.tizzoni) Hi everyone. I am the first author of the reference paper and also the corresponding author for the publicatio... [15:20:25] really great, even superset 0.35.2 has issues [15:20:26] sigh [15:20:39] * elukey cries in a corner [15:21:04] going to try to drop/restore the superset db in staging to be sure [15:27:14] ah no nice better now [15:31:53] 10Analytics, 10Analytics-Kanban: Archive /home/ezachte data on stat1007 - https://phabricator.wikimedia.org/T238243 (10Milimetric) Maybe I have rose-colored glasses on, but I read the facebook comments a couple of times and it seems to me we have plans to address all the concerns: * people like the huge table... [15:32:19] 10Analytics, 10Analytics-Kanban, 10Release Pipeline, 10Patch-For-Review, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (10Ottomata) FYI we will be using service-runner native prometheus metrics being developed in {T205870} for this. [15:35:30] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Upgrade to Superset 0.35.2 - https://phabricator.wikimedia.org/T242870 (10elukey) [15:35:48] 10Analytics, 10Core Platform Team Workboards (Green): Flink Spike - https://phabricator.wikimedia.org/T241185 (10Milimetric) HDFS is fine to store events, given Flink is agnostic to fetching batches/streams. That's great. 
Where to put the output is a great question, and neither Druid nor Cassandra seem like... [15:36:28] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Upgrade to Superset 0.35.2 - https://phabricator.wikimedia.org/T242870 (10elukey) Deployed https://gerrit.wikimedia.org/r/#/c/analytics/superset/deploy/+/565037/ to an-tool1005, anybody can test it via: ` ssh an-tool1005.eqiad.wmnet -L 9080:an-t... [15:36:50] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics, 10User-Elukey: Upgrade to Superset 0.35.2 - https://phabricator.wikimedia.org/T242870 (10elukey) p:05Triage→03Normal a:03elukey [15:40:12] (03PS15) 10Mforns: Add Spark/Scala module for timeseries anomaly detection [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/561674 (https://phabricator.wikimedia.org/T235486) [15:41:26] (03CR) 10Mforns: "Thanks!" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/561674 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [15:55:35] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics, 10User-Elukey: Upgrade to Superset 0.35.2 - https://phabricator.wikimedia.org/T242870 (10Nuria) pinging @mforns who has ops duty this week to test [15:56:49] (03PS4) 10Mforns: Add anomaly detection to data quality stats workflow [analytics/refinery] - 10https://gerrit.wikimedia.org/r/563200 (https://phabricator.wikimedia.org/T235486) [15:57:10] (03CR) 10Mforns: "Thanks!" (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/563200 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [16:01:33] nuria: client error logging sync? [16:01:42] ottomata: yes [16:02:06] mforns: any idea why sometimes pageview API shows 0 views and sometimes unknown? I was looking at the query and having a momentary blank: [16:02:07] https://tools.wmflabs.org/pageviews/?project=ru.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=%D0%92%D0%B8%D0%BB%D0%B0-%D0%9D%D0%BE%D0%B2%D0%B0-%D0%B4%D0%B8-%D0%A4%D0%BE%D1%88-%D0%9A%D0%BE%D0%B0 [16:03:19] milimetric, could it be because sometimes there are records found in cassandra that then are filtered out in AQS code resulting in 0 [16:03:27] whereas other times no records are found at all? [16:03:39] resulting in absent [16:03:54] but why would there be a record in Cassandra with 0 views, wouldn't that be filtered out by the loading query: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra/daily/pageview_per_article.hql [16:04:10] and why sometimes... [16:06:32] milimetric, I don't recall if filtering for e.g. user_type is done when querying cassandra or in AQS code, if the latter, you can have a record in cassandra that contains only bots, and then when applying the filter user_type=user, then you get a 0 [16:06:44] but if the former, then this makes no sense [16:06:59] oh right 'cause we store multiple counts in one record! [16:07:05] yes [16:07:08] ok, cool [16:07:29] thanks! That must be it, but it does look weird from a user perspective [16:07:34] definitely [16:07:50] I'll add this to the FAQ [16:07:53] maybe we could fill in zeros in AQS before returning... [16:12:27] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics, 10User-Elukey: Upgrade to Superset 0.35.2 - https://phabricator.wikimedia.org/T242870 (10mforns) Will do!
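The behavior mforns and milimetric just worked out (a returned 0 versus a missing day) is visible straight from the public REST API: days where Cassandra has a record for the article appear as items, possibly with 0 views once the agent/access breakdown is filtered, while days with no record at all are simply absent from the items array. A minimal sketch against the per-article endpoint; the article title "Example" and the jq filter are illustrative:

    # list date/count pairs; gaps in the dates are the "unknown" days
    curl -s 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/ru.wikipedia/all-access/user/Example/daily/2019122500/2020011400' \
      | jq -r '.items[] | "\(.timestamp) \(.views)"'
    # dates missing from this output had no Cassandra record at all,
    # while listed dates with views 0 had a record whose counts were filtered out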
[16:21:31] milimetric: this query returns 0's but not "unknowns" https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/ru.wikipedia/all-access/user/%D0%92%D0%B8%D0%BB%D0%B0-%D0%9D%D0%BE%D0%B2%D0%B0-%D0%B4%D0%B8-%D0%A4%D0%BE%D1%88-%D0%9A%D0%BE%D0%B0/daily/2019122500/2020011400 [16:22:36] nuria: exactly, because the "unknown" date is not in the result, look for 20200113 (Jan 13th) which is the unknown result [16:24:03] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats2/Metrics/FAQ#Why_are_pageviews_numbers_sometimes_0_and_sometimes_undefined? [16:25:01] milimetric: you mean there is no record for the 20200113? [16:25:26] 10Analytics, 10Analytics-Kanban, 10Pageviews-API, 10RESTBase-API: REST pageviews API doesn't handle requests with page titles contains parentheses - https://phabricator.wikimedia.org/T241671 (10Milimetric) Good point! It's kind of a bug but one that would be very costly to fix. I explained it in the FAQ... [16:25:28] milimetric: with that small number of views it makes sense there will not be any records at all for 1 day [16:26:16] nuria: yeah, that's the problem. So in Cassandra, for Jan 12 for that page we have probably like {desktop/bot: 1} [16:26:29] and so when someone queries for desktop/user we return 0 [16:26:49] because we know that if we aggregated desktop/bot it's basically known that other counts are 0 [16:26:57] but for Jan 13 we just have no data [16:26:59] milimetric: we can verify that but seems possible [16:27:17] milimetric: that would be my guess but let's verify, we can do it after standup [16:27:21] I'm 100% sure, I remember from when we set this up it was an issue but I thought we put it up on an FAQ already [16:32:00] 10Analytics, 10Product-Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (10Milimetric) I wrote to Yael, will wait for their reply. [16:47:09] elukey, nuria, I'm able to query wmf.data_quality_stats via Presto from Superset O.o [16:53:24] mforns: whattt [16:53:26] elukey: I'm having a weird Kerberos issue. My credentials (on notebook1003) are still valid; it's before the "renew until" time and `klist` exits with status 0. However, I can't start a spark session. Any ideas about what might cause this or what I should look for? [16:54:45] (The inability to start the Spark session is definitely a Kerberos issue; it hangs and after several minutes gives me an error mentioning Kerberos. I've also encountered this in the past and fixed it by running `kinit` again) [16:56:02] nuria: luca and I are discussing hardware in bc, we have q for you [16:56:06] if you have a moment to join before standup [16:56:52] ottomata: k joining [16:57:12] (03CR) 10Joal: [C: 03+2] "LGTM! Merging :)" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/561674 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [16:57:14] elukey: how would you feel about me getting another person to ask you a question right now? :) [16:57:44] it's like a human version of the Slashdot effect [17:02:00] (03CR) 10Joal: [C: 03+2] "LGTM !
Not merging (jar not deployed) - feel free to do it whenever :)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/563200 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [17:02:36] (03Merged) 10jenkins-bot: Add Spark/Scala module for timeseries anomaly detection [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/561674 (https://phabricator.wikimedia.org/T235486) (owner: 10Mforns) [17:02:53] neilpquinn: in a meeting now, I promise that I'll answer right after it :) [17:03:16] no worries at all! take your time :) [17:19:39] neilpquinn: ok I am here now [17:21:16] neilpquinn: if I sudo as you on notebook1003 I see an expired toke [17:21:18] *token [17:21:44] return code is 0, I don't think it returns 1 if the credentials are expired [17:22:20] expires says 11/27/2019 11:01:51 [17:31:46] elukey: well, mine says expires 01/15/2020 02:55:48, renew until 01/15/2020 16:55:40—so are we looking at different things? [17:32:30] elukey: also, this is within my Jupyter virtual environment so that's probably different from my ticket on notebook1003 bare metal [17:33:47] elukey: ahh, it looks like the exit status is only set when you run `klist -s` [17:33:53] so that answers that question [17:34:39] but it still leaves the question of why my commands were failing even before "renew until" time. Is that renewal up to 24 hours not automatic? [17:36:52] neilpquinn: ah ok so the ticket inside the notebook is different, this is something that I didn't get before now [17:37:20] ah, glad to know I helped you a bit :) [17:38:10] now I am not sure how a notebook handles a ticket, because there might be a bug in that [17:38:31] when you do klist, what file is mentioned? [17:39:40] Here's my current klist output: [17:39:46] https://www.irccloud.com/pastebin/efbx2EDQ/ [17:42:04] neilpquinn: something interesting - try to klist on a regular ssh session on 1003 [17:42:12] the file is the same [17:42:43] elukey: huh, you're right...and yet they have different expiration dates [17:43:11] neilpquinn: but it doesn't make any sense, the file with the kerberos ticket is the same [17:43:59] super weird... [17:47:59] and in theory in the credentials cache there should be only the last ticket [17:48:35] neilpquinn: can you try to kinit in the ssh session and then check what klist shows in the notebook? [17:49:13] elukey: huh, `ls /tmp` gives different results using SSH and using the Jupyter console [17:49:53] lovely [17:50:08] Okay, just did `kinit` via SSH [17:50:22] Here's `klist` via SSH: [17:50:25] https://www.irccloud.com/pastebin/yFU5EL0Z/ [17:50:37] And here's `klist` on the Jupyter terminal: [17:51:05] https://www.irccloud.com/pastebin/wmvhyp6T/ [17:52:26] neilpquinn: I'll need to dig into how jupyter handles kerberos, it seems that there is a different view when in the notebook [17:52:32] it wasn't what I thought [17:54:30] elukey: check out http://tljh.jupyter.org/en/latest/topic/security.html#per-user-tmp...I doubt we're using this specific JupyterHub distribution, but something similar is probably being done here [17:55:47] sigh [17:55:51] thanks neilpquinn [17:57:12] no prob! honestly, it's been working fine for me so I'm not sure you need to change anything...I work almost exclusively in Jupyter so in all these months I never noticed I had to kinit separately in the two environments [17:59:35] elukey: I was actually here for a different question (my ticket in the Jupyter environment hadn't expired, but Spark was still giving me Kerberos errors).
In the meantime, my ticket expired for real so this instance of the problem is gone. I'll ask separately if I get it again :) [18:00:44] neilpquinn: thanks for the ping! I'll try to gather more info [18:02:49] hah, yuvi is all over jupyter docs [18:02:56] neilpquinn: q for you [18:03:12] what kind of notebook sharing do you wish you had? [18:03:27] just sharing with other swap users? sharing publicly? readonly? editable? [18:07:16] ottomata: hmm, good question! I would say I have two main use cases: (1) public, readonly sharing—"publication" and (2) version control-style collaboration with other SWAP users—"collaboration" [18:10:35] There is a use case for private publication (privication?), but honestly it hasn't come up that often for me because even when the raw data can't be shared, _usually_ the aggregated output can be shared. The main thing that I worry about doing is accidentally publishing a sample of rows that contain private data (from doing `df.head()` or similar) [18:12:39] Currently, we use either GitHub (because it has a nice viewer for notebook files) or the published folder for notebook publication. GitHub is nice because you get the notebook right by the file source and there's a nice interface for pushing updates [18:13:55] ; on the other hand, the published folder is nice because notebooks published that way can run JavaScript so we can include a toggle that hides the code cells by default (because asking a PM to look at all the code cells is a terrible reading experience) and can also do interactive stuff [18:14:50] also, the published folder lends itself better to doing automatic periodic updates of the notebook with a cron or similar [18:18:38] for collaboration, we essentially need version control; we use Git (and GitHub) for this currently, but the main problem is that the notebook format mixes input and output so looking at the source or the diff is basically useless. [18:19:00] The Jupyter project has tools to help with that, but we haven't started using them; I tried installing nbdime (https://github.com/jupyter/nbdime) at one point on one of the notebook machines but it failed in a weird way and I wasn't able to figure it out [18:20:35] ottomata ^ I could probably talk a lot more about this—happy to jump on a call or chat during All Hands :) [18:24:47] logging off for today! [18:48:18] Gist updated team: https://gist.github.com/jobar/940f08c644b337b22758f8ec5b15f3c3 [18:48:34] numbers are a lot higher (12M distinct images from commons) [18:49:02] I will also investigate how many image-loads come from action=edit (if possible) [19:06:55] 10Analytics: Newpyter - First Class Jupyter Notebook system - https://phabricator.wikimedia.org/T224658 (10Ottomata) Hi all! Luca and I are giving some thought this quarter into how to re-write the existing SWAP system into something better! I've put a few requirements at the top of this ticket, but let's disc... [19:22:44] milimetric: would you have a minute? [19:44:52] Gone for dinner - Back after [19:52:20] 10Analytics, 10Pageviews-API, 10Tool-Pageviews: Add support for private wikis to pageviews API - https://phabricator.wikimedia.org/T242793 (10Nuria) If we feel a wiki's pageview data is of broad interest that wiki should probably not be marked as private. The idea of marking it as such is to restrict data harves... [20:44:39] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (10saper) Thank you.
I think that getting rid of "/v2" is not a very worthy goal in itself. It could also be changed to something else. Unfortunately, I don... [20:44:57] 10Analytics, 10Analytics-Kanban: The guava error still persists in data quality bundles - https://phabricator.wikimedia.org/T241375 (10Nuria) >We could try using version 16, and also test refine :) +1 [20:45:00] hi. where would I find historical pageview data? I am looking for pageviews from a country as of March 2017 [21:04:07] sukhe: Hi - The pageview table in hadoop contains data since 2015 [21:08:03] joal: thanks! trying! [21:14:14] gone for tonight team [22:00:21] (03CR) 10Nuria: [C: 04-1] "This changeset renders a white page with error http://localhost:5000/dist-dev/bundle.js.en not found" (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/558702 (https://phabricator.wikimedia.org/T240617) (owner: 10Fdans) [22:47:55] 10Analytics, 10Pageviews-API, 10Tool-Pageviews: Add support for private wikis to pageviews API - https://phabricator.wikimedia.org/T242793 (10sbassett) Are we comfortable with every page //title// on private wikis potentially being made public, such that allowing any user to publicly find and/or enumerate th... [22:55:15] 10Analytics, 10Pageviews-API, 10Tool-Pageviews: Add support for private wikis to pageviews API - https://phabricator.wikimedia.org/T242793 (10Nuria) >Are we comfortable with every page title on private wikis potentially being made public No, we are not. Declining ticket on our end. [22:55:39] 10Analytics, 10Pageviews-API, 10Tool-Pageviews: Add support for private wikis to pageviews API - https://phabricator.wikimedia.org/T242793 (10Nuria) 05Open→03Declined
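For reference, a minimal sketch of the kind of query joal points sukhe to above, run from a stat host (with a valid Kerberos ticket) against wmf.pageview_hourly; the country code, agent filter, and LIMIT are illustrative, and the column names follow the table's documentation on wikitech:

    hive -e "
    SELECT project, SUM(view_count) AS views
    FROM wmf.pageview_hourly
    WHERE year = 2017 AND month = 3   -- March 2017 partitions
      AND country_code = 'IN'         -- illustrative country
      AND agent_type = 'user'         -- exclude spider traffic
    GROUP BY project
    ORDER BY views DESC
    LIMIT 20;"

Restricting on the year/month partition columns keeps the scan to the requested month rather than the whole table.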