[00:22:58] RECOVERY - Disk space on Hadoop worker on analytics1070 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[02:17:56] RECOVERY - Disk space on Hadoop worker on an-worker1081 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[02:51:34] (CR) Nuria: [C: +1] Preserve sanitized TwoColConflict events [analytics/refinery] - https://gerrit.wikimedia.org/r/621682 (https://phabricator.wikimedia.org/T260965) (owner: Awight)
[02:51:40] (CR) Nuria: [C: +2] Preserve sanitized TwoColConflict events [analytics/refinery] - https://gerrit.wikimedia.org/r/621682 (https://phabricator.wikimedia.org/T260965) (owner: Awight)
[02:54:50] (CR) Nuria: [C: +2] Sanitize ReferencePreviews events [analytics/refinery] - https://gerrit.wikimedia.org/r/621683 (https://phabricator.wikimedia.org/T260969) (owner: Awight)
[04:42:58] RECOVERY - Disk space on Hadoop worker on an-worker1084 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[06:13:33] RECOVERY - Disk space on Hadoop worker on an-worker1083 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[06:32:32] Analytics: Check home/HDFS leftovers of lulu - https://phabricator.wikimedia.org/T261089 (MoritzMuehlenhoff)
[06:50:51] !log Dropping wikitext-history snapshots 2020-04 and 2020-05 keeping two (2020-06 and 2020-07) to free space in hdfs
[06:50:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:53:53] RECOVERY - Disk space on Hadoop worker on an-worker1078 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[06:54:01] RECOVERY - Disk space on Hadoop worker on an-worker1094 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[06:54:13] RECOVERY - Disk space on Hadoop worker on an-worker1095 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[07:54:56] Analytics-Clusters, Discovery, Discovery-Search (Current work): mjolnir-kafka-msearch-daemon dropping produced messages after move to search-loader[12]001 - https://phabricator.wikimedia.org/T260305 (Zbyszko) Quick test showed that reducing the number of partitions does not fix the issue. Our current...
[08:47:22] Analytics, Analytics-EventLogging, Event-Platform, Performance-Team: Parse user agents in navtiming instead of relying on eventlogging to do it - https://phabricator.wikimedia.org/T260580 (Gilles) a:Gilles
[11:04:57] Analytics, Growth-Team, Product-Analytics: PrefUpdate captures user preference modifications at registration - https://phabricator.wikimedia.org/T260867 (phuedx) This appears to have been caused by {30731c2}, which removed the `PrefUpdateInstrumentation::isKnownSettingsPage` test prior to sending the...
[11:10:00] I don't see the recommended way to start a PySpark kernel with ~/venv made available. Maybe I have to configure a custom kernel with PYSPARK_PYTHON set to ~/bin/python3?
[11:15:12] awight_: are you talking about PAWS notebooks?
[11:15:54] joal: exactly. I swear I was able to "import wmfdata" at some point, but struggling to reconnect the ~/venv library to my kernel now.
[11:17:13] A pointer to the current recommendation would be great, I can probably work out the details from there...
[11:18:42] awight_: you can't change the venv used by PAWS - Normally, the default venv is used by PAWS
[11:19:44] Isn't the default venv ~/venv? At least, it seemed to act that way once.
[11:19:58] awight: I thought it is
[11:20:26] okay, I'll scratch my head for a bit :-)
[11:23:40] When I "import pandas", the path is reported as: although I also have pandas in my ~/venv
[11:24:09] hm
[11:24:37] I'll delete my virtualenv and recreate...
[11:29:24] awight: I realise I skip these issues by pip-installing package in the venv through the PAWS terminal
[11:34:41] (same outcome after recreating ~/venv)
[11:34:47] Okay will try the terminal next :-)
[11:37:44] Maybe I'm misunderstanding: creating a new terminal won't find "pip" until I manually run "source ~/venv/bin/activate", so I'm not sure how this is any different than my ssh/screen terminal. Running a "New console for notebook" then "!pip install wmfdata" shows "Requirement already satisfied", which is interesting.
[11:38:00] especially,
[11:38:02] Requirement already satisfied: pandas>=0.20.1 in /srv/home/awight/venv/lib/python3.5/site-packages (from wmfdata) (0.25.3)
[11:38:38] printing the pandas module as a string still responds with the /usr/lib... path.
[11:39:50] ooh I think I see the point. I can (and should) "!pip install ..." from inside the notebook itself.
[11:40:25] still, ImportError: No module named 'wmfdata'
[11:40:28] O_O
[11:41:43] same results when installing other packages! I've fully shut down all kernels a few times, fwiw.
[11:42:50] Getting spooky: http://localhost:8001/user/awight/lab -> Cannot find template: "error.html"
[11:42:55] In "/srv/home/awight/venv/share/jupyter/lab/static"
[11:44:49] Blank page http://localhost:8001/user/awight/tree -- I'm just gonna try stat1005 now. This has all been from stat1007.
[11:52:25] Same there.
[11:58:22] Unrelated bug report: I believe that *only* on stat1005, the edit cursor is invisible.
[12:02:09] ^ please disregard, it was my "text editor key map" setting.
[12:11:29] awight: Let's wait for otto to be there - he is the master of paws
[12:25:41] joal: Thanks--a few days' wait would be fine, of course.
[12:26:31] I... ran the wmfdata queries in question earlier today (!) so am not blocked on the coding, I know it works.
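Aside on the symptom above (pip reports the package in ~/venv, but the kernel imports pandas from /usr/lib and cannot find wmfdata): one way to check this from inside a kernel is to ask Python where it would load a module from, and whether that location sits inside a given environment. This is only a minimal sketch; the helper names module_origin and is_under_prefix are made up for illustration and are not part of Jupyter or wmfdata.

```python
import importlib.util
import os
import sys


def module_origin(name):
    """Return the file this interpreter would import `name` from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module without a spec.
        return None
    return spec.origin if spec is not None else None


def is_under_prefix(path, prefix=None):
    """True if `path` lies inside `prefix` (defaults to this interpreter's sys.prefix)."""
    root = os.path.realpath(prefix if prefix is not None else sys.prefix)
    target = os.path.realpath(path)
    # os.path.join(root, "") appends a trailing separator, so "/a/b" does not match "/a/bc".
    return target == root or target.startswith(os.path.join(root, ""))


if __name__ == "__main__":
    for mod in ("pandas", "wmfdata"):
        origin = module_origin(mod)
        if origin is None:
            print(mod, "-> not importable from", sys.executable)
        else:
            print(mod, "->", origin, "| inside sys.prefix:", is_under_prefix(origin))
```

Running the same cell in the notebook and in the ssh terminal would show whether the two interpreters resolve modules from different places.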
[12:29:02] Off-topic but maybe more on-channel, I'm excited to start playing around with a new dataset, the templates whose invocations are most frequently edited (like an editor-facing version of Special:MostLinkedPages)
[12:29:56] The results look reasonable so far: "Cite web", "Infobox person", and so on. But this lets me find the most popularly human-edited templates in any language.
[12:31:18] Nice awight!
[12:32:22] :-) It's derived from a TemplateWizard event stream for now. I think the equivalent events for VisualEditor will need extra instrumentation to provide the template name.
[13:14:31] hello team!
[13:16:12] ottomata: if you're interested in the PAWS "import" mystery above, I have one more data point: I get the same behavior under old Jupyter, Jupyter Lab, and Newpyter with a "new stacked Conda environment".
[13:16:28] hi mforns :)
[13:16:53] (awight: don't have backscroll before 30 mins ago :) )
[13:20:51] http://wm-bot.wmflabs.org/browser/index.php?display=%23wikimedia-analytics if you'd like. But tl;dr is that I cannot "pip install" a package and then "import" it from any pyspark kernel in PAWS.
[13:21:34] I've tried: * port 8000, port 8880, stat1005, stat1007 (acting very broken now), the /user/awight/tree interface and /user/awight/lab interface
[13:22:10] It seems to be a new problem as of this morning, previously I would get my entire ~/venv loaded transparently.
[13:22:44] Analytics, Patch-For-Review, Product-Analytics (Kanban): Whitelist new VisualEditorFeatureUse fields - https://phabricator.wikimedia.org/T256048 (Ottomata) Hi @MNeisler! I just added the #Analytics tag to make sure this gets triaged by our team. In the future you can do the same. I'm going to also...
[13:27:01] I have to pick small people up from school, please feel free to ping me here or on Phabricator if I can provide any information, and please kill any processes of mine if it's helpful.
[13:36:13] awight: in paws?
[13:36:16] in cloudvps?
[14:21:44] (CR) Milimetric: [C: +2] "not merging, subject to refinery-source deploy to make sure the jar version matches" [analytics/refinery] - https://gerrit.wikimedia.org/r/621702 (https://phabricator.wikimedia.org/T254233) (owner: Joal)
[14:24:31] (CR) Milimetric: [C: +2] Update mediawiki-history dumper to fix sorting bug [analytics/refinery/source] - https://gerrit.wikimedia.org/r/621511 (https://phabricator.wikimedia.org/T254233) (owner: Joal)
[14:30:22] (Merged) jenkins-bot: Update mediawiki-history dumper to fix sorting bug [analytics/refinery/source] - https://gerrit.wikimedia.org/r/621511 (https://phabricator.wikimedia.org/T254233) (owner: Joal)
[14:37:33] joal: I was trying to go through refinery-source to figure out if unique devices counts activity on the API, to answer https://phabricator.wikimedia.org/T258748#6351495
[14:37:47] basically, do we need WMF-Last-Access on the REST API requests?
[14:37:52] and I think yes, we do
[14:38:10] or, at least, we currently count those requests if they are identified as pageviews, which some of them are
[14:38:49] Analytics, Analytics-Kanban, Design-Research: Setup and integrate analytics for Design Research Website - https://phabricator.wikimedia.org/T259322 (Nuria) It is the same site, so yes. the piwik scripts are reused.
[14:38:57] Analytics, Analytics-Kanban, Design-Research: Setup and integrate analytics for Design Research Website - https://phabricator.wikimedia.org/T259322 (Nuria) Open→Resolved
[15:02:54] ping ottomata milimetric fdans elukey , coming to batcave?
[15:03:02] nuria: we are here
[15:26:54] Analytics-Radar, Better Use Of Data, MediaWiki-extensions-WikimediaEvents, Product-Infrastructure-Data, and 2 others: Limit the number of mw-client-errors from a single client (suggested: maximum 5 errors should be recorded for a single IP on a single page) - https://phabricator.wikimedia.org/T259371...
[15:37:32] Analytics, Analytics-Wikistats: Add overall ORES scores to Wikistats - https://phabricator.wikimedia.org/T178019 (Milimetric) Declined→Open The idea in the referenced discussion was to average scores for different classes of articles, per wiki. Like: enwiki, new articles, average revision ORES...
[15:43:46] ottomata: Hi, this is all in PAWS.
[15:47:18] awight: we don't maintain paws, that is in cloud vps / labs
[15:47:24] https://wikitech.wikimedia.org/wiki/PAWS
[15:48:26] ottomata: btw, will you have time to look at https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/609222 in the next day or so?
[15:48:40] ya i see it! will look at it today
[15:49:05] ty :)
[16:03:05] Analytics: Check home/HDFS leftovers of lulu - https://phabricator.wikimedia.org/T261089 (Milimetric) p:Triage→High a:mforns
[16:05:41] Analytics, Platform Team Sprints Board (Sprint 2), Platform Team Workboards (Green): Ingest api-gateway.request events to turnillo - https://phabricator.wikimedia.org/T261002 (Milimetric) p:Triage→Medium We won't get to this soon, but it's a short task, so bug us if you either want one of us...
[16:06:17] (CR) Milimetric: [V: +2] Preserve sanitized TwoColConflict events [analytics/refinery] - https://gerrit.wikimedia.org/r/621682 (https://phabricator.wikimedia.org/T260965) (owner: Awight)
[16:07:19] Analytics, Analytics-Kanban, Two-Column-Edit-Conflict-Merge, Patch-For-Review, User-awight: Sanitize and store historical conflict events - https://phabricator.wikimedia.org/T260965 (Milimetric)
[16:09:57] Analytics-Radar, Growth-Team, Product-Analytics: PrefUpdate captures user preference modifications at registration - https://phabricator.wikimedia.org/T260867 (Milimetric) this is just instrumentation work, right?
[16:10:44] Ok Razzi here! Let me know if my client is working
[16:11:04] Hi razzius - works for me :)
[16:11:13] :)
[16:13:08] Analytics, Analytics-Kanban: Create new mailing list for analytics systems users - https://phabricator.wikimedia.org/T260849 (Milimetric) a:Ottomata How about analytics-announce?
[16:14:41] Analytics, Analytics-Kanban, Event-Platform: Eventgate crashes on invalid event - https://phabricator.wikimedia.org/T260839 (Milimetric) p:Triage→High a:fdans
[16:15:28] Analytics-Radar, MediaWiki-REST-API, Platform Team Workboards (Green), Story: Followups to access logging after envoy 1.16 - https://phabricator.wikimedia.org/T260820 (Milimetric)
[16:19:54] Analytics, Analytics-EventLogging, Event-Platform, Structured-Data-Backlog: Migrate EventLogging MediaViewer data to Event Platform - https://phabricator.wikimedia.org/T260582 (Milimetric) p:Triage→Medium
[16:19:56] Analytics, Analytics-EventLogging, Event-Platform, Structured-Data-Backlog: Migrate EventLogging MediaViewer data to Event Platform - https://phabricator.wikimedia.org/T260582 (Milimetric) >>! In T260582#6394558, @Tgr wrote: > All the other logging mechanisms have been removed from MediaViewer AF...
[16:24:23] Analytics-Clusters: Establish what data must be backed up before the HDFS upgrade - https://phabricator.wikimedia.org/T260409 (Milimetric) (can do a quick checksum to check the main cluster data against the little backup cluster data)
[16:26:09] Analytics: Investigate showing realtime the eventlogging banner stream (currently sampled at 1%) - https://phabricator.wikimedia.org/T255446 (Milimetric) p:High→Medium
[16:27:06] Analytics, Analytics-Kanban, Patch-For-Review, Product-Analytics (Kanban): Whitelist new VisualEditorFeatureUse fields - https://phabricator.wikimedia.org/T256048 (Milimetric)
[16:31:32] (CR) Mforns: [C: +2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/607309 (https://phabricator.wikimedia.org/T256048) (owner: MNeisler)
[16:32:01] Analytics-Radar, Growth-Team, Product-Analytics: PrefUpdate captures user preference modifications at registration - https://phabricator.wikimedia.org/T260867 (phuedx) >>! In T260867#6406731, @Milimetric wrote: > this is just instrumentation work, right? That's correct.
[16:32:19] (CR) Mforns: [V: +2 C: +2] Add the new VisualEditorFeatureUse fields to eventlogging whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/607309 (https://phabricator.wikimedia.org/T256048) (owner: MNeisler)
[16:32:31] (CR) Nuria: [V: +2 C: +2] Sanitize ReferencePreviews events [analytics/refinery] - https://gerrit.wikimedia.org/r/621683 (https://phabricator.wikimedia.org/T260969) (owner: Awight)
[17:33:00] Analytics, Analytics-Kanban: Create new mailing list for analytics systems users - https://phabricator.wikimedia.org/T260849 (Ottomata) Does that exist?
[17:35:20] Analytics, Analytics-Kanban, Event-Platform: Eventgate crashes on invalid event - https://phabricator.wikimedia.org/T260839 (Ottomata) This just needs some safety checks around the .toString() calls added to the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/eventgate-wikimedia/+/refs/heads/master...
[17:38:37] Analytics, Analytics-Kanban, Event-Platform: Eventgate crashes on invalid event - https://phabricator.wikimedia.org/T260839 (Ottomata) Actually, I can take this one, as I'll be making a [[ https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/609222 | change ]] in eventgate-wikimedia today for cdanis.
[17:38:49] Analytics, Analytics-Kanban, Event-Platform: Eventgate crashes on invalid event - https://phabricator.wikimedia.org/T260839 (Ottomata) a:fdans→Ottomata
[17:51:49] ottomata: Sorry for the confusion--my report is regarding stat100* / Jupyter / SWAP.
[18:03:38] (CR) Nuria: [C: +2] Tune threshold for hourly traffic anomaly detection [analytics/refinery] - https://gerrit.wikimedia.org/r/620767 (https://phabricator.wikimedia.org/T251814) (owner: Mforns)
[18:03:48] (CR) Nuria: [V: +2 C: +2] Tune threshold for hourly traffic anomaly detection [analytics/refinery] - https://gerrit.wikimedia.org/r/620767 (https://phabricator.wikimedia.org/T251814) (owner: Mforns)
[18:03:59] ah!
[18:04:08] awight and this is on all stat nodes?
[18:04:25] with regular jupyterhub on port 8000, (not the new newpyter conda stuff on stat1008 8880) ?
[18:14:07] ottomata: Well, stat1007 completely melted down on me and I started getting high-level errors preventing the interface from displaying at all. But prior to that, yeah it was behaving in the same bad way.
[18:15:26] Currently, I'm on stat1005 and everything seems normal except for this py module loading glitch. I've tried 8000 (regular jupyterhub and jupyter "lab"), and also 8880.
[18:16:01] stat1008 port 8880 is a totally different jupyterhub and env
[18:16:17] =
[18:16:25] awight: can you try
[18:16:26] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter#Resetting_user_virtualenvs
[18:16:28] on stat1007?
[18:16:39] +1 happy to focus on any one environment, I just wanted to gather information at first
[18:17:04] cool
[18:17:04] sure! I did try this, but I'll follow the instructions exactly and see what happens
[18:17:10] oh interesting
[18:19:09] * awight notes that I was trying old-school "virtualenv" on the last time around
[18:26:20] ottomata: Those instructions make my notebook look slightly healthier, for example I get colored text, but still seeing the ImportError
[18:26:44] ok that's good
[18:27:43] hmm awight i'm going to restart your notebook server real quick, i'm not sure if it was restarted
[18:28:02] can you try now?
[18:29:17] Analytics, Analytics-Kanban, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (Nuria)
[18:29:19] Analytics, Analytics-Kanban, Patch-For-Review: Update PageviewDefinition to only include /api/rest_v1/page/mobile-html requests with X-Analytics: pageview=1 in pageviews - https://phabricator.wikimedia.org/T257860 (Nuria) Open→Resolved
[18:29:34] Analytics, Analytics-Kanban: Create intermediate table that holds public data for geoeditors dataset so it can be used to load cassandra - https://phabricator.wikimedia.org/T244597 (Nuria) Open→Resolved
[18:29:36] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Add editors per country data to AQS API (geoeditors) - https://phabricator.wikimedia.org/T238365 (Nuria)
[18:29:59] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: EventStreamConfig's auto-topics config is incorrect - https://phabricator.wikimedia.org/T255888 (Nuria) Open→Resolved
[18:30:58] ottomata: Maybe a new permissions error? "restarting kernel" -> "kernel error"
[18:37:31] hm
[18:37:55] let's clean the slate for sure
[18:37:58] awight: log out of jupyterhub
[18:38:06] then, i'll stop your notebook server
[18:38:17] then you can log back in, start the server back up, and fingers crossed it'll be ok
[18:39:38] let me know when you've done that
[18:39:48] when you've logged out of jupyterhub*
[18:42:01] ottomata: okay, I'm shut down and logged out
[18:42:34] ok awight log back in start server and try
[18:42:36] i'm watching logs
[18:43:43] ImportError :-(
[18:43:57] is there more info?
[18:44:00] jupyter looks fine
[18:44:04] looks like a problem with the venv
[18:44:37] wait ImportError
[18:44:37] ?
[18:44:50] from just a python import statement?
[18:44:52] what are you importing?
[18:45:09] is it installed? (you just recreated your venv)
[18:47:25] I'm running !pip install from within my notebook. I'm importing the wmfdata-python package, but have seen the same issue with any package. For example, I'll take a trending pypi package:
[18:47:29] !pip install filediffs
[18:48:12] ugh bad example, it has requirements we can't satisfy
[18:49:54] okay here's a good example,
[18:49:54] !pip install pyaltt2
[18:49:57] import pyaltt2
[18:50:35] The "import" works fine from the venv python REPL in ssh, but not in the notebook
[18:51:17] that's strange, so the pip install goes fine
[18:51:21] I've also tried pip installing from the ssh commandline, and from a Jupyter terminal
[18:51:24] exactly
[18:51:28] and it works from a notebook terminal, and it's good
[18:51:30] +1
[18:51:38] but import in a python notebook kernel has importerror
[18:51:43] The issue seems to be the glue between Jupyter python and venv
[18:51:45] does it just say the package doesn't exist?
[18:51:51] show me
[18:51:53] import sys
[18:51:55] sys.prefix
[18:51:56] and
[18:51:56] ImportError: No module named 'pyaltt2'
[18:52:01] sys.path
[18:52:07] in the notebook
[18:52:15] and vs in the terminal
[18:52:28] from the notebook: ['/tmp/spark-83ba134b-8a4c-4e35-8c63-8e7430022db8/userFiles-c17e4379-1f17-4fec-9347-69f8b57a2379', '', '/usr/lib/spark2/python/lib/py4j-src.zip', '/usr/lib/spark2/python', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages',
[18:52:34] '/usr/lib/python3/dist-packages/IPython/extensions', '/srv/home/awight/.ipython']
[18:53:13] from the terminal:
[18:53:13] import sys
[18:53:13] print(sys.prefix)
[18:53:17] (oops)
[18:53:26] /home/awight/venv
[18:53:33] ['', '/srv/home/awight', '/srv/deployment/analytics/refinery/python', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/home/awight/venv/lib/python3.5/site-packages', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages']
[18:54:13] hmm hard to read here, which one is sys.prefix and which is sys.path
[18:54:14] ?
[18:54:51] the arrays are sys.path
[18:55:06] am i missing sys.prefix for the notebook?
[18:55:07] looks like I failed to paste the notebook sys.prefix, it's just /usr
[18:55:10] :-)
[18:55:12] just usr!
[18:55:13] hm
[18:55:18] what kind of notebook is this?
[18:55:22] kernel
[18:55:25] just a regular python kernel?
[18:55:26] pyspark local
[18:55:30] ohhh pyspark
[18:55:31] hmmm
[18:55:51] has this worked before?
[18:55:58] hmmm
[18:55:59] can you do
[18:56:01] !which pip
[18:56:01] ?
[18:56:05] what's that show in the notebook
[18:56:05] ?
[18:56:16] ooh +1 the plain python3 kernel is definitely running in the venv as hoped
[18:57:01] this is a reason why i want to get rid of the custom spark kernels
[18:57:04] they cause too much confusion
[18:57:19] if you like you can try use spark via a regular python kernel with neil's wmfdata
[18:57:35] https://github.com/wikimedia/wmfdata-python/blob/master/wmfdata/spark.py#L69
[18:57:48] i guess it always does yarn? but we should make it smarter
[18:58:01] :-) nice wrapper
[18:58:02] but awight has pip install + import worked in pyspark notebook before?
[18:58:12] for you?
[18:58:22] actually, it makes sense that that doesn't work
[18:58:23] hm
[18:58:36] I usually do the pip install from a commandline, so not 100% sure about that. But I am sure that I've been using venv modules in pyspark kernels until today
[18:58:58] pyspark yarn won't boot :-/
[18:59:15] the kernel?
[18:59:19] or via wmfdata?
[18:59:27] the "pyspark yarn" kernel, sorry.
[18:59:30] hm
[18:59:36] oh
[18:59:37] did you kinit
[18:59:37] ?
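Aside: the two sys.path lists pasted above (18:52 from the pyspark notebook, 18:53 from the terminal) are hard to compare by eye in a chat window. A small order-preserving diff makes the difference explicit. The lists below are copied from the log; the helper name only_in is made up for this sketch.

```python
# sys.path as pasted from the "pyspark local" notebook kernel (sys.prefix was /usr).
notebook_path = [
    "/tmp/spark-83ba134b-8a4c-4e35-8c63-8e7430022db8/userFiles-c17e4379-1f17-4fec-9347-69f8b57a2379",
    "",
    "/usr/lib/spark2/python/lib/py4j-src.zip",
    "/usr/lib/spark2/python",
    "/usr/lib/python35.zip",
    "/usr/lib/python3.5",
    "/usr/lib/python3.5/plat-x86_64-linux-gnu",
    "/usr/lib/python3.5/lib-dynload",
    "/usr/local/lib/python3.5/dist-packages",
    "/usr/lib/python3/dist-packages",
    "/usr/lib/python3/dist-packages/IPython/extensions",
    "/srv/home/awight/.ipython",
]

# sys.path as pasted from the terminal (sys.prefix was /home/awight/venv).
terminal_path = [
    "",
    "/srv/home/awight",
    "/srv/deployment/analytics/refinery/python",
    "/usr/lib/python35.zip",
    "/usr/lib/python3.5",
    "/usr/lib/python3.5/plat-x86_64-linux-gnu",
    "/usr/lib/python3.5/lib-dynload",
    "/home/awight/venv/lib/python3.5/site-packages",
    "/usr/local/lib/python3.5/dist-packages",
    "/usr/lib/python3/dist-packages",
]


def only_in(first, second):
    """Entries of `first` that do not appear in `second`, in original order."""
    seen = set(second)
    return [entry for entry in first if entry not in seen]


missing_from_notebook = only_in(terminal_path, notebook_path)
extra_in_notebook = only_in(notebook_path, terminal_path)

if __name__ == "__main__":
    print("only in terminal:", missing_from_notebook)
    print("only in notebook:", extra_in_notebook)
```

The venv's site-packages directory shows up only on the terminal side, which matches the ImportError: the pyspark kernel's interpreter never looks in ~/venv.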
[18:59:38] Trying neil's builder workaround next, that looks promising
[18:59:48] yes, and klist shows the active tokens
[18:59:49] you have to kinit on the jupyter terminal
[18:59:59] j,,
[19:00:04] :facepalm:
[19:00:21] yeah unfortunately it is a different session than the regular ssh terminal one
[19:01:21] ottomata: mind if I switch to the "lab" version, still on port 8000? if that's expected to act the same...
[19:01:27] that should be exactly the same
[19:01:37] awight: if you can get the regular python kernel instead of the custom spark kernels and get things to work, i'd recommend that
[19:01:48] i'd like to remove the custom spark kernels eventually
[19:01:54] and i'd rather not spend time figuring out why they aren't working
[19:02:17] Perfect for my needs as well
[19:02:28] an import of a lib in your venv in a custom spark kernel probably won't work, not sure why it would have in the past, and it def would not work in yarn
[19:02:51] FYI, I didn't have kerberos tokens in that last environment. But that wouldn't affect simple "!pip install" and "import", right?
[19:03:14] right, kerberos is just for interaction with hadoop and yarn stuff
[19:03:40] * awight hopes you all don't star to regret letting the amateurs self-serve :-)
[19:03:43] *start
[19:04:04] :) spark is complicated
[19:04:08] distributed is complicated
[19:05:42] cdanis: merged that change
[19:05:52] wooot thanks ottomata !
[19:05:54] you ok with deploying it to wherever you need? i guess you need it in logging?
[19:06:05] or analytics?
[19:06:08] yeah, in logging
[19:06:11] aye
[19:06:27] I might wait on deploying it for a bit -- I also need to write jsonschema for the NEL format and do some local testing
[19:06:38] cool
[19:06:40] will definitely be hassling you again :)
[19:06:41] ok i'll wait then too
[19:06:43] sounds good
[19:10:20] ottomata: Thanks for all the debugging help! Plain python kernel + wmfdata wrappers FTW :-D
[19:10:29] great!
[19:10:32] glad it works
[19:13:55] Hmm, losing the "local" master might be a bit of a drawback in my case, I think my smaller queries are much slower. But not a blocker.
[19:17:19] yeah awight i think that should be supported by wmfdata
[19:17:25] i betcha neil would appreciate a patch ;)
[19:17:36] that should probably be the default, actually
[19:17:39] hmm
[19:17:56] or maybe not, since it is also used in his lib as an abstracted way to make sql queries
[19:17:57] to hive
[19:18:10] but ya should at least be parameterized
[19:20:33] Great, I'll tinker with this. Running into some other little gotchas, like mariadb queries need to include a constant "database()" column in order to group results by wiki.
[19:20:45] Growing pains :-)
[19:51:33] Analytics, Event-Platform, Technical-blog-posts: Story idea for Blog: Wikimedia's Event Platform - https://phabricator.wikimedia.org/T253649 (Ottomata) Thanks! I just re-read and edited the 3rd post a bit more, but I think it looks good. Images: I don't have any good ideas for the first two posts....
[20:11:14] Analytics, Analytics-Kanban: Create new mailing list for analytics systems users - https://phabricator.wikimedia.org/T260849 (Milimetric) No, sorry, I was suggesting that as the name
[20:14:21] Analytics, Event-Platform, Technical-blog-posts: Story idea for Blog: Wikimedia's Event Platform - https://phabricator.wikimedia.org/T253649 (srodlund) @Ottomata RE waiting until next week -- this is fine. I already moved these to the blog to prep, but we can definitely wait for additional feedback a...
[20:17:04] Analytics, Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (Milimetric) >>! In T257071#6335139, @A455bcd9 wrote: > Hi, > > Actually there's maybe something simpler and as useful: instead of displaying the Wikipedias read by country o...
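Aside on the 19:13-19:18 exchange about keeping a "local" master available: the idea of a parameterized master can be sketched with plain PySpark. This is not wmfdata's actual API. The function names resolve_master and build_spark_session, the default app name, and the core count are all assumptions made for illustration. Only SparkSession.builder with .master(), .appName() and .getOrCreate() are real PySpark calls.

```python
def resolve_master(local=False, cores=2):
    """Pick a Spark master URL: a local[N] master for small queries, else YARN."""
    return "local[{}]".format(cores) if local else "yarn"


def build_spark_session(local=False, cores=2, app_name="notebook-session"):
    """Build (or reuse) a SparkSession with the chosen master.

    pyspark is imported lazily so the helper above stays usable where
    pyspark is not installed.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(resolve_master(local=local, cores=cores))
        .appName(app_name)
        .getOrCreate()
    )


if __name__ == "__main__":
    # Small exploratory query: stay on the local machine.
    print(resolve_master(local=True, cores=4))
    # Larger job: go through YARN (needs a valid Kerberos ticket, as noted above).
    print(resolve_master())
```

Defaulting to YARN mirrors the behaviour described in the chat ("i guess it always does yarn?"), with local mode as an explicit opt-in.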
[20:18:45] Analytics, Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (A455bcd9) Thanks for your answer. Too bad :(
[20:23:40] Analytics, Event-Platform, Technical-blog-posts: Story idea for Blog: Wikimedia's Event Platform - https://phabricator.wikimedia.org/T253649 (mforns) > I don't have any good ideas for the first two posts. For the EventGate one, I kinda like the idea of using a picture of a stile, maybe this or this?...
[21:04:39] Analytics, Analytics-EventLogging, Event-Platform, Structured-Data-Backlog: Migrate EventLogging MediaViewer data to Event Platform - https://phabricator.wikimedia.org/T260582 (Nuria) pinging @MarkTraceur and @Ramsey-WMF so they confirm they no longer wish to process this data, note that instrume...
[21:08:28] Analytics, Analytics-Kanban, Product-Analytics: NULL-values for useragent column in event.searchsatisfaction - https://phabricator.wikimedia.org/T259944 (Nuria) Open→Resolved
[21:46:54] Hey team, I'm wondering if anybody has opinions on using a personal nickname, connected to a personal email, here (this one is). I'm looking into a cloak and am wondering if that should be done with a nickname connected to my work email.
[21:51:48] Analytics, Analytics-EventLogging, Event-Platform, Structured-Data-Backlog: Migrate EventLogging MediaViewer data to Event Platform - https://phabricator.wikimedia.org/T260582 (Ramsey-WMF) Confirming that there's no need to process this data. The SD team will replace the code with Event Platform...
[22:17:57] Analytics, Product-Analytics: Get "edits hourly" on a daily basis - https://phabricator.wikimedia.org/T231938 (kzimmerman) @JKatzWMF @Tnegrin wanted to give you a heads up that this task has moved out of Analytics Engineering's backlog and falls under their work to make incremental updates available in t...
[22:18:59] Analytics-Kanban, Product-Analytics: Data Lake incremental Data Updates - https://phabricator.wikimedia.org/T258511 (kzimmerman)
[22:53:45] razzius: I think it's up to you! but other SREs might have better opinions on that
[22:54:02] you could join in #wikimedia-sre and ask there
[23:09:58] Analytics, Analytics-EventLogging, Event-Platform, Structured-Data-Backlog: Migrate EventLogging MediaViewer data to Event Platform - https://phabricator.wikimedia.org/T260582 (Nuria) if up for a little while is 1 month, no. But leaving that code unmaintained for a long time will just eventually...