[05:28:19] Analytics, Analytics-Kanban: Configure Oozie job for loading geoeditors data into Cassandra - https://phabricator.wikimedia.org/T248289 (lexnasser)
[05:48:04] (PS1) Lex Nasser: Configure Oozie job for loading geoeditors data into Cassandra [analytics/refinery] - https://gerrit.wikimedia.org/r/582638 (https://phabricator.wikimedia.org/T248289)
[08:54:47] o/
[09:01:08] If I wanted to find all historical page names of a current page, would anyone have any ideas? Right now I would plan on using page_history in hadoop and using page_name and page_name_historical (but would love more up-to-date data rather than monthly snapshots...)
[09:08:36] It'd be great to be able to pull this data out and account for it in one or more of the pageview APIs, actually
[09:08:46] I guess that isn't already done anywhere
[09:10:28] Also, if there has been any thought about a "google trends" type thing, I'd love to talk about that (with some magic power from wikidata too)
[09:25:25] addshore: o/ - I am not authoritative enough to give you an answer, but maybe you could reach out to the analytics@ mailing list? There's more chance that members of my team will read it
[09:25:36] :D
[09:25:54] maybe joal will be around later on, but our schedule is a little bit more complicated these days
[09:25:56] basically I went down a rabbit hole yesterday and ended up writing this https://addshore.com/2020/03/covid-19-wikipedia-pageviews/
[09:26:13] Yup, very understandable, I've been off work for the last week with this covid thing :P
[09:27:30] really nice :)
[09:27:38] (the page)
[09:29:45] I want to try to get to some sort of vaguely automated thing where I pass it a wikidata entity representing a topic, such as COVID-19; from there it fans out on wikidata finding other entities within the topic, then finds all the wikipedia articles for those entities, finds all historical names for those articles, and then finds the "interest" / page views for the whole topic.
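The final step of that plan is to sum the "interest" for a topic over every title its articles have had. Below is a minimal Python sketch of that step. The input shapes, the `topic_pageviews` name and the toy numbers are made up for illustration. Fetching the Wikidata entities, the articles, their historical titles and the pageview counts is left out.

```python
from collections import defaultdict


def topic_pageviews(titles_by_page, daily_views):
    """Sum daily pageviews for a topic across every title its pages have had.

    titles_by_page: {(project, page_id): {title, ...}}, holding the current and
        historical titles of each page in the topic (hypothetical input shape).
    daily_views: {(project, title, date): views}, e.g. counts exported from a
        per-article pageview source (hypothetical input shape).
    """
    # Flatten to (project, title) pairs; the set avoids counting a title twice.
    wanted = {
        (project, title)
        for (project, _page_id), titles in titles_by_page.items()
        for title in titles
    }
    totals = defaultdict(int)
    for (project, title, date), views in daily_views.items():
        if (project, title) in wanted:
            totals[date] += views
    return dict(sorted(totals.items()))


# Toy input: one page that was moved from Old_title to New_title.
titles = {("en.wikipedia", 1): {"Old_title", "New_title"}}
views = {
    ("en.wikipedia", "Old_title", "2020-02-10"): 100,
    ("en.wikipedia", "New_title", "2020-02-12"): 250,
    ("en.wikipedia", "Old_title", "2020-02-12"): 5,
    ("en.wikipedia", "Unrelated", "2020-02-12"): 7,
}
print(topic_pageviews(titles, views))  # {'2020-02-10': 100, '2020-02-12': 255}
```

Titles are keyed by project as well as by name, because the same title can exist on several wikis.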
[09:30:14] Then a cool thing to add on the front of that would be to automatically generate the "topics" of interest from the current top viewed articles on wikipedia :D
[09:30:37] all it really needs is a bit of glue :D everything else is already there, yay
[09:30:54] anyway, I'll let you get back to work :)
[09:31:35] feel free to write/chat anytime in here, I replied just to avoid a long wait for you, happy to brainbounce if I can help :)
[09:35:49] sweet, I'll probably take another look at it all this evening
[10:21:40] Analytics, Operations, Product-Analytics, SRE-Access-Requests: Hive access - https://phabricator.wikimedia.org/T248097 (Volans) p:Triage→Medium
[11:29:18] Analytics, Analytics-Cluster, Analytics-Kanban, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) >>! In T244499#5986989, @elukey wrote: > Opened https://issues.apache.org/jira/browse/BIGTOP-3330 Fixed, deployed and tested. >>! In T244...
[11:44:06] * elukey lunch!
[11:45:23] Analytics, Analytics-Kanban, Product-Analytics, Inuka-Team (Kanban), Patch-For-Review: Set up pageview counting for KaiOS app - https://phabricator.wikimedia.org/T244547 (nshahquinn-wmf) Resolved→Open Thanks for checking, @Nuria! I see them too now. | date | KaiOS_app_version | pagev...
[12:36:00] Hi addshore - there is currently no API (public or not) that allows retrieving historical names
[12:37:19] addshore: for recent data, events.mediawiki_page_move is your friend, grouping by page_id
[12:37:38] addshore: for historical data, mediawiki_page_history should do
[12:38:18] addshore: putting that info in an API and allowing a better count of pageviews is definitely on our minds as well :)
[12:38:56] aaah yes, events.mediawiki_page_move would allow me to see recent things outside of the history snapshots, nice!
[12:41:33] addshore: last, but not least - Thanks a lot for the great analysis of the covid pages :)
[12:59:01] hey team :]
[13:10:33] joal: how long does that event logging page move data hang around for?
[13:11:08] I guess I could use that in conjunction with mediawiki_history to create my full, accurate list of old page names: the event logging data filling in for the last month and the mediawiki_history table doing the rest
[13:11:34] I'm actually quite psyched to go and have another dive into it all now xD
[13:12:49] addshore: it's not eventlogging, it's eventbus I think (never mind, just to be precise) - It's been present since 2017 (2018 for codfw)
[13:16:08] Analytics, Release-Engineering-Team: wmfphab reverting github/wikimedia/KafkaSSE master to old commit - https://phabricator.wikimedia.org/T248170 (Ottomata) Thank you!
[13:18:58] aaah yes, eventbus!!
[13:19:12] do the events there never get pruned?
[13:19:49] exactly addshore - those events don't contain PII data so they are kept
[13:19:58] schweeet
[13:20:37] I should be able to pretty easily generate a trend for COVID-19 since the start of time in that case, across all sites for all article names :)
[13:20:40] <3
[13:20:54] \o/
[13:21:02] I suspect it to be quite exponential :)
[13:25:42] hii elukey ! i'm still going to try to do an eventgate-logging-external deploy today
[13:25:44] want to do it with me?
[13:27:39] ottomata: sure! But I'll be back in ~20 mins
[13:28:06] is it ok?
[13:29:58] ok! i think so. i'm going to prep some things and explain those parts for ya when you get back
[13:38:40] ottomata: ok I am ready
[13:39:48] ok!
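joal's suggestion can be combined with addshore's plan. Page-move events grouped by page_id supply the recent names, and the history snapshot supplies the older ones. Below is a minimal Python sketch of that merge. It does not query the real tables. The tuple layouts and the `all_titles_by_page` name are hypothetical stand-ins, so the actual column names of `events.mediawiki_page_move` and of the history table need checking before this is ported to Hive or Spark. Grouping by page_id works because a MediaWiki page move keeps the page's ID.

```python
from collections import defaultdict


def all_titles_by_page(history_rows, move_events):
    """Collect every title each page has had, keyed by page_id.

    history_rows: (page_id, title) pairs from a monthly history snapshot.
    move_events: (page_id, prior_title, new_title) tuples from page-move events,
        which cover moves made after the snapshot was taken.
    """
    titles = defaultdict(set)
    for page_id, title in history_rows:
        titles[page_id].add(title)
    # Group move events by page_id; both sides of a move are names the page had.
    for page_id, prior_title, new_title in move_events:
        titles[page_id].update((prior_title, new_title))
    return dict(titles)


history = [(42, "Older_title"), (42, "Old_title")]
moves = [
    (42, "Older_title", "Old_title"),  # already reflected in the snapshot
    (42, "Old_title", "New_title"),    # happened after the snapshot
]
print(sorted(all_titles_by_page(history, moves)[42]))
# ['New_title', 'Old_title', 'Older_title']
```

The results are kept in sets, so moves that appear in both the snapshot and the event data need no special de-duplication.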
[13:39:54] ok here's where I am
[13:40:04] we are doing https://phabricator.wikimedia.org/T226986#5986998
[13:40:17] i forgot to disable client side error logging on friday, so i'm doing that now
[13:40:24] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/582804/1/wmf-config/InitialiseSettings.php
[13:40:36] i'm syncing out that mediawiki-config change now
[13:40:40] we'll undo that when we are done
[13:41:03] we are also doing https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate/Administration#EventGate_/_eventgate-wikimedia_Code_Change
[13:41:23] so, we aren't making a code change, but we need the docker image to be rebuilt, so eventgate-wikimedia gets a commit
[13:41:45] in this case, we are bumping up the git sha of the schema repos that are baked into eventgate-wikimedia's image
[13:41:52] https://gerrit.wikimedia.org/r/c/eventgate-wikimedia/+/582806/1/.pipeline/blubber.yaml
[13:42:12] and! it looks like the image build pipeline has finished
[13:42:26] if you expand the last PipelineBot comment on that change
[13:42:42] you'll see that it finished and posted the image version tag we want to deploy
[13:42:45] 2020-03-23-133653-production
[13:42:49] ok so the blubber change will yield a new docker image right?
[13:42:52] next steps i'm going to let you do :)
[13:42:53] yup!
[13:43:59] * elukey checks https://docker-registry.wikimedia.org/v2/wikimedia/eventgate-wikimedia/tags/list
[13:44:40] so I am now at 4. (todo) right?
[13:45:19] OHHH no we still need to do 3. ah the code change, and we have to backport it to the wmf branch that we need to deploy
[13:45:28] ok this is just mw extension deploy stuff
[13:45:38] do you want to do it or shall I? i can provide instructions
[13:45:52] please do it, if you can comment as you go that is enough for me :)
[13:45:54] so I'll follow
[13:46:02] ok
[13:46:05] so i'm going to follow
[13:46:06] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Case_1b:_extension/skin/vendor_changes
[13:46:25] to deploy this
[13:46:25] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/578951
[13:46:58] so am merging ^ now
[13:47:07] then we backport to the wmf version branch
[13:47:52] let's see what the proper branch is rn...
[13:48:57] * elukey follows
[13:49:47] ah ha, this one will do
[13:49:47] https://www.mediawiki.org/wiki/Special:Version
[13:49:54] https://www.mediawiki.org/wiki/MediaWiki_1.35/wmf.24
[13:50:58] so in my local WikimediaEvents checkout
[13:51:03] git fetch --all
[13:51:41] git branch --track wmf/1.35-wmf.24 gerrit/wmf/1.35.0-wmf.24
[13:51:46] (my remote is gerrit not origin)
[13:52:08] i checkout that branch
[13:52:13] and then cherry pick our change
[13:52:24] this one
[13:52:24] * ottomata https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/578951
[13:54:17] ack
[13:54:38] so this is safe since you previously disabled error logging
[13:55:09] yes
[13:55:10] because otherwise eventgate would receive "malformed" events
[13:55:11] okok
[13:55:39] yes, which for our purposes would be ok too, since we are still running this as a trial
[13:55:48] and we haven't deployed everywhere yet
[13:55:54] so error rates would be very low
[13:56:00] yepyep
[13:56:03] we might get some anyway, since the config change has to go out to clients
[13:56:12] and some clients probably have the old config still cached
[13:58:13] while i'm waiting for some jenkins
[13:58:25] we also have to make a commit to deployment-charts to deploy our new eventgate-wikimedia image
[13:59:33] point 4
[14:00:49] heh, actually that is part of point 2.
[14:00:54] deploy to eventgate-logging-external
[14:00:56] so
[14:00:59] there are 3 moving parts here.
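The backport steps can be captured as a small Python helper that only builds the git command lines. This is a sketch under several assumptions:

- The `backport_commands` name is hypothetical.
- The commit SHA is a placeholder; you would take it from the merged change.
- The local branch reuses the remote branch name. In the chat the local branch was named `wmf/1.35-wmf.24`.
- The remote defaults to `gerrit`, as in the chat.

```python
import shlex


def backport_commands(version, commit, remote="gerrit"):
    """Build the git commands to cherry-pick a merged change onto a wmf branch.

    version: the deployed MediaWiki version as shown on Special:Version,
        e.g. "1.35.0-wmf.24".
    commit: SHA of the change already merged on master (placeholder here).
    remote: name of the Gerrit remote in the local checkout.
    """
    branch = f"wmf/{version}"
    return [
        ["git", "fetch", "--all"],
        # Local branch tracking the remote deployment branch.
        ["git", "branch", "--track", branch, f"{remote}/{branch}"],
        ["git", "checkout", branch],
        ["git", "cherry-pick", commit],
    ]


for cmd in backport_commands("1.35.0-wmf.24", "<merged-commit-sha>"):
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True, cwd=...)` inside the extension checkout. The cherry-picked commit then still goes through Gerrit review against the wmf branch, as change 582819 does later in the log.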
[14:01:02] the schema repos
[14:01:28] the eventgate-logging-external service (and the eventgate image version it is running, which includes the schema repos)
[14:01:38] and the WikimediaEvents extension code change
[14:01:50] (well, 4 moving parts if you count the error logging config)
[14:01:59] sure ok but I am following https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate/Administration#EventGate_/_eventgate-wikimedia_Code_Change
[14:02:05] oh
[14:02:11] OH
[14:02:11] and point 3 is related to the docker image, which IIUC we already have, no?
[14:02:19] yes sorry i was looking at the deploy plan on the phab ticket
[14:02:19] YES
[14:02:25] ah snap my bad :(
[14:02:29] you are correct! you got it erigiht
[14:02:30] right
[14:02:39] so
[14:02:40] https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/582808
[14:02:44] ottomata, dcausse: is now's meeting cancelled?
[14:02:54] joal: yes
[14:03:03] ack dcausse )
[14:03:05] joal: unless something to share?
[14:03:15] nothing new on our side
[14:03:17] nothing special nope
[14:03:28] ok :)
[14:03:40] Analytics, Event-Platform, Core Platform Team (Icebox), Services (later): revision-create events are sometimes emitted in a secondary DC - https://phabricator.wikimedia.org/T207994 (WDoranWMF)
[14:03:41] elukey: we can deploy eventgate and the wmevents extension in either order, since error logging is disabled
[14:03:44] doesn't matter which order
[14:03:47] Analytics, ChangeProp, Event-Platform, Core Platform Team (Icebox): RESTBase content rerenders sometimes don't pick up the newest changes - https://phabricator.wikimedia.org/T176412 (WDoranWMF)
[14:03:55] am still waiting for jenkins to finish merging
[14:03:58] the WMEvents stuff
[14:03:59] so
[14:04:12] let's deploy eventgate-logging-external
[14:04:31] yep change looks good with 2020-03-23-133653-production
[14:04:48] cool, let's have you push these buttons this time!
[14:04:56] joal: a quick question, do you think it's ok to normalize http header names (lowercase them) at event generation?
[14:05:27] context is wdqs logs, I found it annoying to have mixed case in some header names
[14:05:28] so i just merged that
[14:05:28] so elukey , log into deployment.eqiad.wmnet
[14:05:28] cd /srv/deployment-charts/helmfile.d/services/staging/eventgate-logging-external
[14:05:30] hm - I can't think of what it impacts
[14:05:48] ottomata: I was wondering which host; can I quickly fix the docs?
[14:05:51] for n00bs like me
[14:06:01] sure
[14:06:03] point 5 is from deployment etc.. right?
[14:06:21] joal: I've added you as a reviewer to https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/581626 just so that you're aware of the change
[14:06:48] yup
[14:07:39] so the change is merged afaics, going to deploy1001
[14:08:09] k
[14:08:29] tried with source .hfenv; helmfile diff but I don't see the change yet
[14:08:38] let's see
[14:08:42] I see
[14:08:43] - image: docker-registry.wikimedia.org/envoy-tls-local-proxy:1.11.2-1
[14:08:46] + image: docker-registry.wikimedia.org/envoy-tls-local-proxy:1.12.2-1
[14:09:24] oh i think there is a dirty checkout, fixing
[14:09:33] just did sudo git reset --hard
[14:09:39] ack
[14:10:34] ok looks better
[14:12:28] yes I do see
[14:12:28] - image: "docker-registry.wikimedia.org/wikimedia/eventgate-wikimedia:2020-02-25-183224-production"
[14:12:31] + image: "docker-registry.wikimedia.org/wikimedia/eventgate-wikimedia:2020-03-23-133653-production"
[14:12:34] two times
[14:12:36] dcausse: let's ask ottomata as well (added him to the review) - ottomata: lower casing http-header - any problem you might think of?
[14:12:56] no, sounds like a good idea in general
[14:13:07] ottomata: so next step 'source .hfenv; helmfile apply'
[14:13:25] joal, ottomata thanks!
[14:13:27] yup
[14:13:48] ottomata: doing
[14:13:52] change looks good to me dcausse :)
[14:13:57] thanks! :)
[14:14:13] so i need to update the docs a little bit elukey, since we now have a canary release
[14:14:25] but this is actually upgrading both releases
[14:14:29] canary and production
[14:14:38] we can target one or the other if we wanted to be more careful
[14:14:47] canary release only has a single pod
[14:15:00] but it serves real traffic (well, not in staging, but in the other k8s clusters)
[14:15:10] * elukey nods
[14:15:36] also /srv/scap-helm/eventgate/test_event_0.0.2.json still exists, so that curl test command should work
[14:16:09] i kinda want to make testing all potential events with eventgate easier, which would be easier after https://phabricator.wikimedia.org/T242454
[14:16:31] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Services (watching): Add examples to all event schemas - https://phabricator.wikimedia.org/T242454 (Ottomata) a:Ottomata→None
[14:18:22] elukey: did it finish?
[14:18:35] ottomata: might have done something wrong, my shell was hanging on deploy1001 and I did control+c
[14:18:41] hm
[14:19:06] i think it looks ok
[14:19:17] if you do
[14:19:17] kubectl get pods
[14:19:23] you can see that the two pods are each new
[14:19:27] 2m and 5m olds
[14:19:28] old
[14:20:03] ah that curl command won
[14:20:07] won't work on an https port
[14:20:16] ah ok
[14:20:59] ok so, this part is not documented because it's hard to do for each event and service
[14:21:02] since there are so many
[14:21:05] i want to make this part better
[14:21:07] but
[14:21:19] in my home dir there is an eventgate-post.sh script that will help
[14:21:28] we can find the port we want by doing
[14:21:28] ahhh you have secrets
[14:21:31] :D
[14:21:41] kubectl get services
[14:21:46] ok if it works in staging should we proceed with codfw?
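The header-name normalization dcausse asks about can be sketched in Python as below. HTTP field names are case-insensitive, so lowercasing them loses no information. Two names can differ only in case; this sketch merges them by joining their values with ", ". That rule is an assumption, borrowed from how repeated header fields are generally combined, and it does not hold for Set-Cookie. The linked wdqs change is Java code and may handle this differently. `normalize_header_names` is a hypothetical name.

```python
def normalize_header_names(headers):
    """Return a copy of headers with lowercase names.

    Names that differ only in case are merged by joining their values with
    ", " (an assumption; this is not valid for Set-Cookie).
    """
    normalized = {}
    for name, value in headers.items():
        key = name.lower()
        if key in normalized:
            normalized[key] = f"{normalized[key]}, {value}"
        else:
            normalized[key] = value
    return normalized


print(normalize_header_names({"User-Agent": "curl/7.64", "X-Client-IP": "a", "x-client-ip": "b"}))
# {'user-agent': 'curl/7.64', 'x-client-ip': 'a, b'}
```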
[14:21:49] we want the tls port
[14:21:53] 4392
[14:21:57] yes going to test this there now
[14:22:04] also, we need to target the staging pod directly
[14:22:30] the tls certificate only has e.g. the discovery URL or the public URL in the SAN
[14:22:36] so we need to do a special curl --resolve command
[14:22:40] which eventgate-post does
[14:22:47] so we get the pod's IP
[14:22:55] kubectl get pods -o wide
[14:23:02] let's use
[14:23:02] 10.64.75.42
[14:23:43] so
[14:23:45] ~otto/eventgate-post.sh eventgate-logging-external.discovery.wmnet:4392 ~otto/event-examples/eventgate-logging-external/mediawiki_client_error_1.0.0.json 10.64.75.42
[14:24:02] (i also have a local copy of a mediawiki_client_error_1.0.0.json event example)
[14:24:03] < HTTP/1.1 201 Created
[14:24:05] looks good!
[14:24:05] let
[14:24:12] let's proceed with codfw and eqiad
[14:24:34] /srv/deployment-charts/helmfile.d/services/codfw/eventgate-logging-external
[14:24:39] yup
[14:25:20] doing codfw
[14:25:47] +1
[14:26:19] does helm kill pods one by one, replacing them with the newer ones?
[14:26:32] ok done
[14:26:34] yea
[14:27:01] not sure of the exact order
[14:27:21] ready for eqiad when you are done testing
[14:27:34] but it depools the old one from service routing, spawns the new one, checks for readiness, and then pools the new one
[14:27:36] /srv/deployment-charts/helmfile.d/services/eqiad/eventgate-logging-external
[14:28:02] k, doing same command in codfw on 10.192.64.246
[14:28:10] to test
[14:28:16] < HTTP/1.1 201 Created
[14:28:21] proceed with eqiad!
[14:28:32] doing
[14:29:09] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Services (watching): Add examples to all event schemas - https://phabricator.wikimedia.org/T242454 (Ottomata) I wonder...maybe we should add example schema validation checking to jsonschema-tools! Then we could use CI to fo...
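The `--resolve` trick used here can be sketched as a Python helper that builds the curl command. The chat does not show the contents of `~otto/eventgate-post.sh`, so this is a hypothetical reconstruction of the idea, not that script.

With `--resolve HOST:PORT:ADDRESS`, curl connects to the pod IP but still sends HOST for SNI and checks the certificate against it. That is why the SAN mismatch goes away. The `/v1/events` path is assumed to be EventGate's intake endpoint. Depending on the host, curl may also need the internal CA bundle to verify the certificate.

```python
import shlex


def eventgate_post_command(host_port, event_file, pod_ip, path="/v1/events"):
    """Build a curl command that POSTs one JSON event file to a specific pod."""
    host, port = host_port.rsplit(":", 1)
    return [
        "curl", "-v",
        # Connect to pod_ip, but keep `host` for SNI and certificate checks.
        "--resolve", f"{host}:{port}:{pod_ip}",
        "-H", "Content-Type: application/json",
        "--data-binary", f"@{event_file}",
        f"https://{host}:{port}{path}",
    ]


cmd = eventgate_post_command(
    "eventgate-logging-external.discovery.wmnet:4392",
    "mediawiki_client_error_1.0.0.json",
    "10.64.75.42",
)
print(shlex.join(cmd))
```

With `-v`, curl prints the response status line, in the same `< HTTP/1.1 201 Created` form shown in the log.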
[14:29:54] Analytics: SparkR Kernel not starting in Jupyter & JupyterLab - https://phabricator.wikimedia.org/T248314 (EYener)
[14:30:22] ottomata: done!
[14:30:50] ok! same thing now on 10.64.64.200
[14:31:00] < HTTP/1.1 201 Created
[14:31:01] hooray!
[14:31:25] \o/
[14:31:40] great, now we juust need to deploy the extension code change
[14:31:41] buuuuut
[14:31:45] jenkins still hasn't merged it
[14:32:10] cool! and i do see a validation error or two in codfw! :p
[14:32:16] as expected! :)
[14:35:10] Analytics: SparkR Kernel not starting in Jupyter & JupyterLab - https://phabricator.wikimedia.org/T248314 (elukey) Hi! Do you mean notebook1003? I don't see recent activities in there, but I have restarted your notebook just in case you want to retry. But maybe you are trying from a different host? Would hel...
[14:39:17] Analytics: SparkR Kernel not starting in Jupyter & JupyterLab - https://phabricator.wikimedia.org/T248314 (EYener) Hi @elukey! Apologies - yes, notebook1003. This seems to have fixed the problem! Is there a way I can restart on my end? Also, is there a way I can access error logs? I would be happy to includ...
[14:39:59] ok i got the extension change to merge
[14:40:05] back to the backport
[14:42:38] https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaEvents/+/582819
[14:42:44] same thing but cherry picked to current branch
[14:43:07] * elukey nodes
[14:43:10] *nods
[14:43:37] * ottomata more jenkins waiting
[14:43:44] * joal sees elukey is now part of the matrix
[14:44:11] sure :D
[14:52:59] ok!
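The rollout here follows a fixed order: staging, then codfw, then eqiad. In each cluster's directory it runs `helmfile diff` and `helmfile apply`, then checks a test event before moving on. That order can be sketched as a Python driver. The directory layout and the `source .hfenv` step come from the chat. The `rollout` name, the `verify` callback and the stop-on-failure rule are assumptions. By default the driver only records the commands (`dry_run=True`) and does not run them.

```python
import subprocess

BASE = "/srv/deployment-charts/helmfile.d/services"
CLUSTERS = ("staging", "codfw", "eqiad")  # order used in the chat


def rollout(service, verify, dry_run=True):
    """Diff and apply a helmfile service cluster by cluster.

    verify(cluster) is a caller-supplied check (e.g. POSTing a test event to a
    pod in that cluster) that must return True before the next cluster is done.
    Returns the (workdir, command) pairs in the order they were (or would be) run.
    """
    steps = []
    for cluster in CLUSTERS:
        workdir = f"{BASE}/{cluster}/{service}"
        for action in ("diff", "apply"):
            # .hfenv has to be sourced in the same shell that runs helmfile.
            cmd = ["bash", "-c", f"source .hfenv; helmfile {action}"]
            steps.append((workdir, cmd))
            if not dry_run:
                subprocess.run(cmd, cwd=workdir, check=True)
        if not verify(cluster):
            raise RuntimeError(f"verification failed in {cluster}, stopping rollout")
    return steps


for workdir, cmd in rollout("eventgate-logging-external", verify=lambda cluster: True):
    print(workdir, cmd[-1])
```

In practice someone reads the diff before applying, as elukey did in staging. A real driver would pause at that point rather than run `apply` straight after `diff`.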
[14:53:04] merged into branch
[14:53:11] time to deploy
[14:59:14] got it onto deploy host submodule
[14:59:15] now running
[14:59:16] scap sync-file php-1.35.0-wmf.24/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js '[[gerrit:578951|clientError: Changes event fields (T226986)]]'
[14:59:16] T226986: Client side error logging production launch - https://phabricator.wikimedia.org/T226986
[15:01:30] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, Services (watching): Add examples to all event schemas - https://phabricator.wikimedia.org/T242454 (jlinehan) >>! In T242454#5991930, @Ottomata wrote: > I wonder...maybe we should add example schema validation checking to js...
[15:02:25] a-team standup ?
[15:12:24] Analytics: SparkR Kernel not starting in Jupyter & JupyterLab - https://phabricator.wikimedia.org/T248314 (elukey) Open→Resolved a:elukey @EYener we'd prefer to do it ourselves, reporting the host and username is sufficient for us to find the problem usually :) thanks! Going to close the task bu...
[15:22:28] Analytics, Analytics-Kanban, Event-Platform: Event schemas common schema should set additionalProperties: false - https://phabricator.wikimedia.org/T248173 (Ottomata)
[15:22:45] Analytics, Analytics-Kanban, EventStreams: KafkaSSE issue tracker and continuous integration - https://phabricator.wikimedia.org/T248044 (Ottomata)
[15:27:50] Analytics, EventStreams: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (Ottomata) Hi! I'm sorry that this is the first time I'm seeing this task! I'm surprised that this was happening back in November to you. We recently (this...
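This sketch relates to the EventStreams task above. EventStreams is served as Server-Sent Events (SSE). When an SSE client loses its connection, it normally reconnects and sends the last event id it saw in a `Last-Event-ID` header, so the stream can resume. The Python sketch below shows only the client-side parsing, which tracks that id. It follows the basic SSE field rules but ignores the `event:` and `retry:` fields. The `parse_sse` name is hypothetical. The log does not say whether client-side resumption is how the task gets resolved.

```python
def parse_sse(lines):
    """Parse Server-Sent Events lines into (last_event_id, data) pairs.

    Returns the events and the last id seen, which is the value a client would
    send back in a Last-Event-ID header when it reconnects.
    """
    events, data, last_id = [], [], None
    for line in lines:
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                events.append((last_id, "\n".join(data)))
            data = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field == "id":
            last_id = value  # persists until the server sends another id
    return events, last_id


stream = ["id: a1", "data: first", "", ": keep-alive", "id: a2",
          "data: second", "data: line", ""]
events, resume_from = parse_sse(stream)
print(events)       # [('a1', 'first'), ('a2', 'second\nline')]
print(resume_from)  # a2
```

On reconnect, a client would pass `{"Last-Event-ID": resume_from}` as a request header to its HTTP library.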
[15:34:44] Analytics, EventStreams: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (Ottomata) a:Ottomata
[15:34:56] Analytics, Analytics-Kanban, EventStreams: EventStreams drops the connection after 15 minutes, which makes it unreliable - https://phabricator.wikimedia.org/T242767 (Ottomata)
[15:37:10] Analytics, Analytics-Kanban: Automated deletion of actor data for bot prediction after 90 days - https://phabricator.wikimedia.org/T247344 (Ottomata) p:Triage→High
[15:37:20] Analytics, Analytics-Kanban: Create UDF for action id generation - https://phabricator.wikimedia.org/T247342 (Ottomata) p:Triage→High
[15:38:51] Analytics: Refine + EventLoggingSchemaLoader should use api.svc instead of meta.wikimedia.org directly. - https://phabricator.wikimedia.org/T247510 (Ottomata) p:Triage→Medium a:Ottomata
[15:39:20] Analytics: Refine + EventLoggingSchemaLoader should use api.svc instead of meta.wikimedia.org directly. - https://phabricator.wikimedia.org/T247510 (Ottomata) p:Medium→Triage
[15:40:15] Analytics, Analytics-Kanban, Analytics-SWAP, Product-Analytics: pip not accessible in new SWAP virtual environments - https://phabricator.wikimedia.org/T247752 (Ottomata)
[15:40:30] Analytics, Analytics-Kanban, Analytics-SWAP, Product-Analytics: pip not accessible in new SWAP virtual environments - https://phabricator.wikimedia.org/T247752 (Ottomata) a:elukey
[15:47:30] Analytics: Refine + EventLoggingSchemaLoader should use api.svc instead of meta.wikimedia.org directly. - https://phabricator.wikimedia.org/T247510 (elukey) If we want to make this switch `api.svc.eqiad.wmnet ` will need to be whitelisted in the Analytics' VLAN firewall rules, and we'd have a problem when/if...
[15:52:14] OH joal i forgot to mention i worked on https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/582131 and am looking for review!
[15:54:06] ack ottomata :)
[16:05:30] Analytics, Better Use Of Data, Desktop Improvements, Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Ottomata) Ok! changes deployed. I verified that mw.track('global.error', ...} will produce the expected...
[16:06:55] Analytics, Analytics-Kanban, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (Jdforrester-WMF) >>! In...
[16:12:58] Analytics, Better Use Of Data, Desktop Improvements, Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Ottomata) Ah we haven't yet resolved the x-request-id stuff. >> In T226986#5959396, @Tgr wrote: >> Putt...
[16:19:56] Analytics, Analytics-Kanban, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (Ottomata) Sounds like a...
[16:22:07] Analytics, Analytics-Kanban, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (Jdforrester-WMF) >>! In...
[16:30:05] Analytics, Analytics-Kanban, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (greg) Hey ya'll, tl;dr...
[16:38:18] Analytics, Analytics-Kanban, Patch-For-Review, Product-Analytics (Kanban): SQL definition for structure data in commons metrics - https://phabricator.wikimedia.org/T247101 (jwang) Task is done and reviewed. Keep this ticket open until we reach the final decision.
[16:48:16] Analytics, Analytics-Kanban, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (JAllemandou) p:Mediu...
[17:02:55] Analytics: Spike: look at how old Pageview API data is accessed - https://phabricator.wikimedia.org/T247539 (JAllemandou) Did that in a notebook - ask me to show you :) Basic numbers: - Almost all traffic is for per-article pageviews, using `all-access` access-method and both `user` and `all-agent` agent-ty...
[17:06:20] Analytics: Test aqs_hourly job from (Search team's) Airflow - https://phabricator.wikimedia.org/T248328 (mforns)
[17:06:49] Analytics: Test aqs_hourly job from (Search team's) Airflow - https://phabricator.wikimedia.org/T248328 (mforns)
[17:06:53] Analytics, Patch-For-Review: Spike: POC of refine with airflow - https://phabricator.wikimedia.org/T241246 (mforns)
[17:13:58] Analytics: Test aqs_hourly job from (Search team's) Airflow - https://phabricator.wikimedia.org/T248328 (mforns)
[17:14:08] elukey: Hey, how would you feel if we dropped at least 700GB from wikidata's replica in the analytics cluster?
[17:25:17] Analytics: Test aqs_hourly job from (Search team's) Airflow - https://phabricator.wikimedia.org/T248328 (mforns) Hi @EBernhardson! As we dicussed in IRC, this is the task I was mentioning. I'd like to access your Airflow instance to be able to test our job, could you help me with that? I couldn't find any do...
[17:57:42] Analytics, Product-Analytics, Readers-Web-Backlog (Needs Product Owner Decisions), covid-19: Weekly updates on editors & readers - https://phabricator.wikimedia.org/T247873 (MMiller_WMF) Hi! I just want to post here to underscore that I would find more frequent edits really valuable. Our team c...
[18:03:50] Amir1: sorry I was in a meeting
[18:04:20] Amir1: where are those 700GB? On Hadoop or on a stat host? (sorry, didn't get it)
[18:04:32] I think both
[18:04:39] wb_terms table in wikidatawiki
[18:04:57] ahhh okok then joal is probably the best poc for this
[18:05:51] Amir1: also one unrelated note - on all stat boxes there are now xmldumps mountpoints
[18:05:59] if you want to move away from stat1007 etc..
[18:06:13] (also, except for stat1007, all stat boxes are now the same)
[18:06:15] elukey: oh nice, thanks
[18:06:36] I will run the thing on another server now \o/
[18:06:47] elukey: I think Amir is talking about mysql
[18:06:56] the mw mysql replica
[18:09:46] Analytics, Growth-Team, Product-Analytics: Hash edit session ID in EditAttemptStep and VisualEditorFeatureUse whitelisting - https://phabricator.wikimedia.org/T244931 (nettrom_WMF) We discussed this in our sync meeting between our two teams, and decided to reach out to the Security team to make sure...
[18:11:32] Analytics: Test aqs_hourly job from (Search team's) Airflow - https://phabricator.wikimedia.org/T248328 (EBernhardson) Correct, there isn't (yet) any documentation, but that page is where it would go. All WMF specific code does into the repository you linked, there is a second repository (search/airflow) for...
[18:13:14] Analytics: Test aqs_hourly job from (Search team's) Airflow - https://phabricator.wikimedia.org/T248328 (EBernhardson) Oh one other limit I implied but didn't call out above, I couldn't figure out kerberos + multi-tenancy in airflow. It could perhaps be figured out, but we didn't need it at the time so I wen...
[18:14:29] ottomata: wouldn't they get sqooped?
[18:16:06] i think so ya, just was clarifying for luca that you weren't talking about stat boxes
[18:16:42] leila:
[18:16:45] ufff
[18:16:47] sorry
[18:16:51] :D
[18:17:05] elukey: let me know.
[18:17:17] ottomata: yep I got that it wasn't stat boxes, I thought it was more related to sqoop
[18:17:33] leila: tab completion failed me, sorry :D
[18:17:39] but hi!
[18:18:02] :)
[18:19:32] elukey: sorry. I got excited that I have a ping from you. :D I'll live with the email you sent for now. ;)
[18:19:35] ttyl
[18:30:12] (PS3) Ottomata: Support multiple possible schema base URIs in EventSchemaLoader [analytics/refinery/source] - https://gerrit.wikimedia.org/r/582131 (https://phabricator.wikimedia.org/T240985)
[19:04:03] * elukey off!
[19:34:53] Analytics, Analytics-Kanban, Release Pipeline, Patch-For-Review, and 2 others: Migrate EventStreams to k8s deployment pipeline - https://phabricator.wikimedia.org/T238658 (Ottomata) @akosiaris eventstreams in k8s looks good to me! Let's proceed removing it from scb. There are a couple of open c...
[20:13:34] Analytics, Better Use Of Data, Desktop Improvements, Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Krinkle) >>! In T226986#5991713, @gerritbot wrote: > Change 582804 **merged** by Ottomata: > [operations...
[20:22:24] Analytics, Better Use Of Data, Desktop Improvements, Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Ottomata) Oh, I understood that we did want to just for good measure, but that it didn't really matter a...
[20:33:46] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - First Class Jupyter Notebook system - https://phabricator.wikimedia.org/T224658 (Ottomata) Draft design document here! https://docs.google.com/document/d/1r-oqMXViWvQCqsYz0qzezZBWpip8LvkvCGF6GivFB_8
[20:37:22] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Ottomata)
[20:37:38] Analytics, Analytics-SWAP: Jupyter Notebooks TLC 2018-2019 - https://phabricator.wikimedia.org/T188275 (Ottomata)
[20:37:41] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Ottomata)
[20:38:10] Analytics: Test aqs_hourly job from (Search team's) Airflow - https://phabricator.wikimedia.org/T248328 (mforns) @EBernhardson thanks a lot for all explanations!
[20:38:32] Analytics, Analytics-SWAP: Functionality to share & view SWAP notebooks - https://phabricator.wikimedia.org/T156934 (Ottomata)
[20:38:37] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Ottomata)
[20:39:53] Analytics, Analytics-SWAP: Support R Kernels by default for all users. - https://phabricator.wikimedia.org/T190453 (Ottomata)
[20:39:57] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Ottomata)
[20:40:00] Analytics, Analytics-SWAP: Notebook machine to double as RStudio Server? - https://phabricator.wikimedia.org/T190769 (Ottomata)
[20:40:04] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Ottomata)
[20:41:32] Analytics, Analytics-SWAP, Discovery-Analysis, Product-Analytics: Get 'sparklyr' working on stats1005 - https://phabricator.wikimedia.org/T139487 (Ottomata) Hi, is this still relevant? Is SparkR enough?
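The patch "Support multiple possible schema base URIs in EventSchemaLoader" rests on one idea: try a relative schema URI against several base URIs and use the first one that resolves. That idea can be sketched in Python as below. The real change is in analytics/refinery/source and its code is not shown here, so `load_schema`, the `fetch` callback and the example.org URIs are all hypothetical.

```python
def load_schema(schema_uri, base_uris, fetch):
    """Resolve a relative schema URI against several base URIs; first hit wins.

    fetch(url) is a caller-supplied loader that returns the schema text and
    raises OSError when the schema is missing (urllib's URLError and
    FileNotFoundError are both OSError subclasses).
    Returns (url, schema_text).
    """
    errors = []
    for base in base_uris:
        url = base.rstrip("/") + "/" + schema_uri.lstrip("/")
        try:
            return url, fetch(url)
        except OSError as exc:
            errors.append(f"{url}: {exc}")
    raise LookupError("schema not found under any base URI: " + "; ".join(errors))


# In-memory stand-in for remote/local schema repositories.
available = {"https://schemas.example.org/secondary/test/event/1.0.0": "{...}"}


def fake_fetch(url):
    if url not in available:
        raise FileNotFoundError(url)
    return available[url]


bases = ["https://schemas.example.org/primary", "https://schemas.example.org/secondary"]
print(load_schema("/test/event/1.0.0", bases, fake_fetch)[0])
# https://schemas.example.org/secondary/test/event/1.0.0
```

Collecting every failed URL in the final error makes it easier to see which repositories were tried when a schema is missing everywhere.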
[20:41:42] Analytics, Analytics-SWAP: Users should be able to read their jupyter instance logs - https://phabricator.wikimedia.org/T198764 (Ottomata)
[20:41:47] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Ottomata)
[20:41:53] Analytics, Analytics-SWAP, Discovery-Analysis, Product-Analytics: Get 'sparklyr' working on stats1005 - https://phabricator.wikimedia.org/T139487 (Ottomata)
[20:41:57] Analytics, Analytics-Kanban, Patch-For-Review: Newpyter - SWAP Juypter Rewrite - https://phabricator.wikimedia.org/T224658 (Ottomata)
[21:12:31] Analytics, Analytics-SWAP, Discovery-Analysis, Product-Analytics: Get 'sparklyr' working on stats1005 - https://phabricator.wikimedia.org/T139487 (GoranSMilovanovic) @Ottomata No as far as I am concerned (switched to Pyspark, while SparkR would do - if I ever need it).