[04:39:19] 10Analytics, 10Analytics-Kanban, 10Tool-Pageviews: Add mediarequests metrics to wikistats UI - https://phabricator.wikimedia.org/T234589 (10Nuria)
[06:59:21] (03CR) 10Elukey: [C: 03+1] "\o/" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/549427 (owner: 10Joal)
[07:17:21] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (10elukey)
[07:42:48] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (10elukey) >>! In T212258#5645221, @Isaac wrote: >> I think it is the suggested way, and keep in mind that you'll have to do it o...
[08:01:43] Good morning elukey
[08:02:45] bonjour!
[08:10:04] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for next deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/549427 (owner: 10Joal)
[08:29:31] joal: going to roll restart druid analytics via cumin ok?
[08:29:38] ok for me elukey
[08:29:47] elukey: slooooowly, right?
[08:31:43] joal: we have a cumin cookbook
[08:32:29] one node at a time:
[08:32:34] - historical first, sleep 300s
[08:32:48] - all the other daemons (one at a time), sleep 3-s
[08:32:51] *30s
[08:33:00] then next host
[08:33:07] \o/
[08:33:22] I just need to fire the command, that's it :)
[08:33:32] and for the public one, it also depools
[08:33:36] and repools
[08:34:07] !log restart kafka on kafka-jumbo1001 to test openjdk
[08:34:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:34:20] !log roll restart druid daemons on druid analytics to pick up the new jvm
[08:34:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:34:57] also curious to see what happens to realtime netflow
[08:35:47] interesting :)
[09:05:02] joal: roll restart done, realtime still working
[09:05:06] not bad!
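The ordering elukey describes for the druid roll restart (one host at a time, historical first with a long pause, then the remaining daemons with short pauses, depool/repool for the public cluster) can be sketched roughly as below. This is not the actual cumin cookbook code; host names, the daemon list, and the `restart_service` helper are hypothetical stand-ins:

```python
import time

def restart_service(host, daemon):
    """Hypothetical stand-in for the cookbook's per-daemon restart action."""
    print(f"restarting {daemon} on {host}")

def roll_restart(hosts, depool=None, repool=None,
                 historical_sleep=300, daemon_sleep=30):
    """Restart druid daemons one host at a time, historical first.

    Returns the list of (host, daemon) actions in the order executed,
    so the plan can be inspected.
    """
    actions = []
    other_daemons = ["broker", "coordinator", "middlemanager", "overlord"]
    for host in hosts:
        if depool:                          # public cluster: depool first...
            depool(host)
        restart_service(host, "historical")
        actions.append((host, "historical"))
        time.sleep(historical_sleep)        # long pause after historical
        for daemon in other_daemons:
            restart_service(host, daemon)
            actions.append((host, daemon))
            time.sleep(daemon_sleep)        # short pause between the others
        if repool:                          # ...and repool afterwards
            repool(host)
    return actions

# Sleeps zeroed here only so the sketch runs instantly.
plan = roll_restart(["druid1001", "druid1002"],
                    historical_sleep=0, daemon_sleep=0)
print(plan[0])  # ('druid1001', 'historical')
```

The point of the ordering is that historicals need time to reload segments before the next daemon is touched, which is why they get the 300s pause.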
[09:05:10] This is great :)
[09:05:10] proceeding with public ok?
[09:05:15] yessir!
[09:05:22] !log roll restart druid daemons on druid public to pick up the new jvm
[09:05:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:24:39] Need to run for an errand - will be back in a bit
[10:09:04] going to restart the hadoop masters
[10:41:13] addshore: o/ - let me know when you have a min
[10:41:23] HI!
[10:41:31] i have 50 of them!
[10:41:33] ish
[10:41:35] maybe 30
[10:42:02] will need 5!
[10:42:25] not sure if you are aware but we are close to deploying Kerberos to the Hadoop cluster
[10:42:32] we have been testing it for months
[10:42:44] so there will be some changes for users, like a new password etc..
[10:42:46] oooh
[10:43:10] I am setting up some informal meetings with teams (Research, prod analytics, etc..) to explain what is happening
[10:43:15] okay
[10:43:23] do you think that we could arrange something for WMDE? Interested?
[10:43:30] Yes, sounds good to me
[10:43:42] so anyone needing access to things in the cluster will need to know this stuff right?
[10:43:53] how close is close to deploy? :)
[10:46:20] yes correct, say we are two weeks out more or less
[10:46:38] okay
[10:48:10] next week is tech conf, so wouldn't be ideal to do anything that week (lots of chaos)
[10:48:21] probably the week after might work
[10:48:29] I'll forward this onto leszek and see what he says
[10:51:46] sure.. let me know when it would be best to do a meeting
[11:16:18] joal: stopped timers on an-coord1001, need to restart hive and oozie (FYI)
[11:20:16] ack elukey
[11:20:43] I am going to do AQS this afternoon or next week
[11:20:56] same thing for kafka
[11:21:34] going to lunch!
[12:22:36] !log restart oozie and hive daemons on an-coord1001
[12:22:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:34:54] !log roll restart cassandra on aqs to pick up new openjdk upgrades
[12:34:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:34:39] (03PS1) 10Fdans: Strip project from www. if it's specified in the URL [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520)
[13:35:32] mforns ^ this fix is not the most generic one ever, if you see a better place to put this, you know the router better than I do
[13:42:01] 10Analytics, 10Inuka-Team (Kanban): Add KaiOS to the list of OS query options for pageviews in Turnilo - https://phabricator.wikimedia.org/T231998 (10SBisson) Version 0.6.9 includes the PR
[13:54:10] * elukey interview, be back in ~1h
[13:59:55] (03PS1) 10Ladsgroup: Add query to track WDQS updater hitting Special:EntityData [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998)
[14:00:32] 10Analytics, 10WMDE-Analytics-Engineering, 10Wikidata, 10Patch-For-Review, and 2 others: Track WDQS updater UA in wikidata-special-entitydata grafana dashboard - https://phabricator.wikimedia.org/T218998 (10Ladsgroup) a:03Ladsgroup
[14:02:54] (03CR) 10jerkins-bot: [V: 04-1] Add query to track WDQS updater hitting Special:EntityData [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998) (owner: 10Ladsgroup)
[14:27:23] (03PS1) 10Joal: [WIP] Add python oozie lib and oozie-dumper script [analytics/refinery] - 10https://gerrit.wikimedia.org/r/549861 (https://phabricator.wikimedia.org/T237271)
[14:50:14] (03CR) 10Fdans: [V: 03+2 C: 03+2] Add python script to generate intervals for long backfilling [analytics/refinery] - 10https://gerrit.wikimedia.org/r/547750 (https://phabricator.wikimedia.org/T237119) (owner: 10Fdans)
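The backfilling-intervals script fdans merges above is not quoted in the log; as a rough idea of what such a helper does, here is a hypothetical sketch (function name, chunk size, and dates are illustrative) that splits a long date range into consecutive intervals, so each chunk can be launched as its own backfill coordinator:

```python
from datetime import date, timedelta

def backfill_intervals(start, end, days=7):
    """Split [start, end) into consecutive intervals of at most `days` days."""
    intervals = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        intervals.append((cur, nxt))
        cur = nxt
    return intervals

# One month split into 10-day chunks; the last chunk is shorter.
chunks = backfill_intervals(date(2019, 1, 1), date(2019, 2, 1), days=10)
for lo, hi in chunks:
    print(lo.isoformat(), "->", hi.isoformat())
```

Chunking like this keeps each oozie coordinator's materialization window small, which matters when a backfill spans months of data.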
[15:03:50] (03PS2) 10Joal: Add python oozie lib and oozie-dumper script [analytics/refinery] - 10https://gerrit.wikimedia.org/r/549861 (https://phabricator.wikimedia.org/T237271)
[15:05:37] wow! --^
[15:12:10] fdans, heya! Why is the www change needed? what is the problem?
[15:12:55] mforns: visit wikistats from the footer link here: https://www.mediawiki.org/wiki/MediaWiki
[15:13:00] 10Analytics: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Danielsberger) Hi @lexnasser , I have some thoughts and questions about the overall dataset / example from above and on the save flag. Overall dataset: - Are we narrowing the query...
[15:13:37] fdans, I see
[15:27:35] (03CR) 10Mforns: Strip project from www. if it's specified in the URL (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520) (owner: 10Fdans)
[15:28:33] mforns: thank you, that place probably makes more sense, let me try :)
[15:28:46] ok, let me know if it also fixes the problem
[15:29:21] fdans: one qs - do you launch new coordinators in oozie via crontab every time?
[15:29:26] to backfill I mean
[15:29:36] elukey: yes
[15:29:59] I'm asking because when the time comes and kerberos is activated, you'll need to have valid credentials to submit jobs to oozie
[15:30:23] 10Analytics: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Danielsberger) Ok, I looked up the current Varnish/ATS server assignment in puppet/conftool-data/node. I think esams looks like an interesting and stable workload. Specifically, two s...
[15:30:41] say that you kinit and your ticket expires after 24h, the next oozie submit will fail
[15:30:47] (if you don't renew)
[15:31:09] do you submit as analytics or fdans?
[15:31:19] hmmm elukey analytics
[15:31:49] ok then we'll be able to kinit from the keytab, it should be a quick change to the crontab
[15:31:57] keep it in mind when we switch :)
[15:34:03] (03PS2) 10Fdans: Strip project from www. if it's specified in the URL [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520)
[15:34:08] (03CR) 10Fdans: Strip project from www. if it's specified in the URL (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520) (owner: 10Fdans)
[15:34:44] elukey: thank you for reminding me of this, I hadn't considered it :)
[15:35:37] fdans: np, I know that kerberos will be a hassle very soon, I am used to it now but a lot of people will have to adapt.. :(
[16:05:16] (03PS1) 10Fdans: Fixed - reduce SLA time, include start time in name of cassandra daily coords [analytics/refinery] - 10https://gerrit.wikimedia.org/r/549876
[16:08:38] (03PS2) 10Ladsgroup: Add query to track WDQS updater hitting Special:EntityData [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998)
[16:11:21] (03CR) 10jerkins-bot: [V: 04-1] Add query to track WDQS updater hitting Special:EntityData [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998) (owner: 10Ladsgroup)
[16:14:43] (03PS3) 10Ladsgroup: Add query to track WDQS updater hitting Special:EntityData [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998)
[16:16:08] 10Analytics, 10Inuka-Team (Kanban): Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (10Nuria)
[16:17:43] (03CR) 10jerkins-bot: [V: 04-1] Add query to track WDQS updater hitting Special:EntityData [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998) (owner: 10Ladsgroup)
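elukey's point about ticket expiry is that a ticket obtained with an interactive `kinit` lives for a limited time (he cites 24h), so an unattended cron-launched oozie submission eventually fails, whereas `kinit -kt <keytab> <principal>` can re-authenticate non-interactively right before each submission. A minimal sketch of that "quick change to the crontab"; the keytab path and principal below are made up for illustration, not the real ones:

```python
import shlex

# Hypothetical keytab path and principal, for illustration only.
KEYTAB = "/etc/security/keytabs/analytics.keytab"
PRINCIPAL = "analytics/an-coord1001.eqiad.wmnet"

def submission_command(oozie_args):
    """Prefix an oozie submission with a non-interactive kinit so a
    cron-launched job always runs with fresh Kerberos credentials."""
    kinit = f"kinit -kt {KEYTAB} {PRINCIPAL}"
    oozie = "oozie " + " ".join(shlex.quote(a) for a in oozie_args)
    return f"{kinit} && {oozie}"

cmd = submission_command(["job", "-run", "-config", "coordinator.properties"])
print(cmd)
```

The same command string would go in the crontab entry in place of the bare oozie invocation; `kinit -kt` reads the key from the keytab, so no password prompt blocks the cron job.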
[16:18:29] (03PS4) 10Ladsgroup: Add query to track WDQS updater hitting Special:EntityData [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998)
[16:22:23] (03CR) 10Nuria: [C: 04-1] Strip project from www. if it's specified in the URL (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520) (owner: 10Fdans)
[17:01:35] ping joal standup
[17:06:06] milimetric: o/
[17:06:10] nuria: are you familiar with this: "My reading is that we can get the action=submit part from the uri_query"? I haven't seen any uri_query values with action=submit https://phabricator.wikimedia.org/T225538#5647788
[17:06:26] milimetric: is there a reason that the geoeditors-monthly release is only for wikipedia projects?
[17:07:33] leila: yes, that’s all that was requested for public access, the rest is of course available on the private cluster
[17:08:48] milimetric: I see. two things. I highly recommend making it clear in the documentation that it's about wikipedia data only. both in https://dumps.wikimedia.org/other/geoeditors/readme.html and https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors/Public
[17:09:44] milimetric: also, if it's cheap to produce for other projects, why not? For example, GLAM folks may benefit from having this data for Commons. And Wikidata is another place which is a good one to get a better sense of the spread of contributions across the globe (to identify gaps of contributors).
[17:22:15] milimetric: I'll put these on meta. (I came up with another one;)
[17:40:31] 10Analytics, 10Operations, 10decommission, 10ops-eqiad, 10Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Papaul) ` papaul@asw2-d-eqiad# show | compare [edit interfaces] - ge-1/0/7 { - description dbstore1002; - enable; - }
[17:40:44] mforns: cave?
[17:40:51] joal, sure!
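The wikistats2 fix under review above boils down to this: a visitor arriving via a `www.`-prefixed host (fdans's example is the footer link on https://www.mediawiki.org/wiki/MediaWiki) should be routed to the bare `mediawiki.org` project, so an optional leading `www.` has to be stripped from the project string. The actual patch lives in the wikistats2 router and is JavaScript; this Python sketch just shows the normalization idea:

```python
def strip_www(project):
    """Normalize a project hostname by dropping a leading 'www.'."""
    prefix = "www."
    if project.startswith(prefix):
        return project[len(prefix):]
    return project

print(strip_www("www.mediawiki.org"))  # mediawiki.org
print(strip_www("en.wikipedia.org"))   # already bare, unchanged
```

Doing the strip once, at the point where the router parses the project out of the URL, is what mforns suggests: every downstream lookup then sees a canonical project name.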
[17:41:00] (03CR) 10Nuria: [C: 04-1] "Sorry, the replace is taking two args, i do not know what i was thinking." [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520) (owner: 10Fdans)
[17:41:16] 10Analytics, 10Operations, 10decommission, 10ops-eqiad, 10Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Papaul)
[17:44:17] leila: for geoeditors, this data is really not that useful for large projects like wikidata or commons
[17:44:28] nuria: why?
[17:44:36] leila: it is not a technical issue, it is that the communities are really large
[17:44:54] leila: and it is most useful in very small emerging communities
[17:45:20] leila: might be of use but it has never been brought up as a use case
[17:45:50] leila: whereas this has come up very many times for smaller communities in smaller wikis, where editors located in a country
[17:45:58] speak as to efforts to be done in that country
[17:46:46] nuria: I don't agree. You can identify gaps in contributors in these large languages, too. For example, a competition such as Wiki Loves Africa can learn through this data the distribution of editors they brought to Commons in month x (not for certain, but they can receive some signals). This is very powerful information for them as they're trying to engage more countries and contributions from specific countries.
[17:46:58] large projects*
[17:48:29] leila: then, let them request the data, all requests we had to date (and they were many) came from folks working with emerging communities, the data has been available internally for quite some time and there has been very little interest for any projects that are not the ones releaesed.
[17:48:33] *released
[17:48:52] nuria: what is the cost of releasing it for us?
[17:51:01] leila: for wikidata and commons? not much
[17:51:44] leila: but note that it does not have the 1-4 bucket
[17:52:17] leila: so the wiki loves africa case you are thinking about might all be on that bucket (very sparse access to 1 project in one month)
[17:52:44] nuria: yup re 1-5 bucket, which is fine. That's actually another interesting piece of information. How much extra activity beyond the one edit (upload of a photo) will the new editors of these contests do (in different geographies).
[17:53:09] nuria: I should emphasize that I'm advocating for it if the cost is almost 0 for you. Otherwise, waiting for hearing for the need makes sense to me.
[17:54:16] leila: well, it is not almost zero cause we need to revisit jobs and retest everything, it has a cost, it is just not high
[17:55:03] leila: do file a ticket if you think it is a good use case for commons
[17:56:18] nuria: ok. I'm writing a wikimedia-l announcement about it for affiliates/volunteers to know more about it. maybe I'll see what the interest level is after posting that.
[17:56:24] thanks!
[17:56:45] leila: ya, ideally requests for data would come from the communities, just like the original request for this data did.
[17:56:54] nuria: uhu
[17:57:13] leila: you had another question about the caching dataset?
[17:57:14] nuria: I have a sense we will send the Global Innovation Index folks to this dataset moving forward, right?
[17:57:27] nuria: not caching. it wasn't me.
[17:59:32] leila: no, these are EDITORS, they count EDITS
[17:59:54] leila: they might want to count editos (which yeah, totally, makes a lot more sense)
[17:59:57] *editors
[18:00:09] leila: but we proposed that earlier on and they declined
[18:00:12] nuria: I'm not sure what you mean. (did I forget a question I had?)
[18:00:46] leila: the GII people use metrics around edits, makes sense?
[18:00:50] leila: not editors
[18:01:59] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (10Papaul) ` papaul@asw2-c-eqiad# show | compare [edit interfaces] - ge-4/0/18 { - description "analytics1003 - no-bw-mon"; - }
[18:02:20] nuria: oh! that! you're right.
[18:20:42] 10Analytics: Make stats.wikimedia.org point to wikistats2 by default - https://phabricator.wikimedia.org/T237752 (10Nuria)
[18:51:37] 10Analytics, 10Inuka-Team (Kanban): Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (10SBisson) a:05SBisson→03None Unassigning myself here so someone from the Analytics team who actually knows what they're doing can take it.
[18:57:06] 10Analytics, 10DC-Ops, 10Operations, 10decommission, and 2 others: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (10Papaul)
[18:57:25] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Replace the Analytics Hadoop coordinator - Hive/Oozie/etc... (hardware refresh) - https://phabricator.wikimedia.org/T205509 (10Papaul)
[18:57:27] 10Analytics, 10DC-Ops, 10Operations, 10decommission, and 2 others: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (10Papaul) 05Open→03Resolved Complete
[19:02:40] 10Analytics, 10Operations, 10decommission, 10ops-eqiad, 10Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Papaul)
[19:02:58] 10Analytics, 10Operations, 10decommission, 10ops-eqiad, 10Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (10Papaul) 05Open→03Resolved Complete
[19:25:44] (03CR) 10Nuria: [C: 03+1] "Looks good, it is missing ticket." [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/542999 (owner: 10Fdans)
[19:29:27] (03CR) 10Nuria: Add query to track WDQS updater hitting Special:EntityData (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/549859 (https://phabricator.wikimedia.org/T218998) (owner: 10Ladsgroup)
[19:30:39] 10Analytics, 10Analytics-Kanban, 10Tool-Pageviews: Add mediarequests metrics to wikistats UI - https://phabricator.wikimedia.org/T234589 (10Nuria) a:03fdans
[19:31:09] (03CR) 10Nuria: [C: 03+1] Add mediarequests per referer metric (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/542999 (owner: 10Fdans)
[19:50:07] who is the person on your end organizing the office hours starting in January?
[19:51:02] mgerlach on our end has put together a plan for ours (which will start in January, too) and I wanted to see if it makes sense if we do a joint office hours?
[19:51:59] (03CR) 10Mforns: Strip project from www. if it's specified in the URL (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520) (owner: 10Fdans)
[20:01:16] 10Analytics: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Nuria) >we'll need the save flag. If i understood your requirement you need a unique identifier that links the page with the "save" so as to know when cache has expired for a given ite...
[20:01:40] leila: our office hours are deserted
[20:01:56] leila: so a joint office hour can only be an improvement
[20:01:58] nuria: in what sense?
[20:02:37] leila: in that they have not happened for a while, it's more like users stop by in the chat anytime, so yes, seems like a good idea
[20:03:49] nuria: ok. nice. who on your end manages them? I think mgerlach should have a chat with them? (I'd like us to send our announcement out by the 4th week of november and we were thinking of the first one to be in January.)
[20:04:10] (03PS3) 10Fdans: Strip project from www. if it's specified in the URL [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520)
[20:04:23] we're thinking of doing those for 6 months, experimentally, and then assess if they make sense. to me, joining with you makes a lot of sense as there may be data related questions that you all may know better than us.
[20:04:40] (03CR) 10Fdans: Strip project from www. if it's specified in the URL (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520) (owner: 10Fdans)
[20:19:15] nuria: is the save flag relevant to upload.wikimedia? It seems that there are no uri_queries with `action` under upload.... Does this mean that upload.wikimedia should not be the uri_host that should be used for the data?
[20:19:49] lexnasser: from their request they want two distinct datasets, one you have 90% of it, the upload one
[20:20:20] lexnasser: the other one is distinct in nature and that was not clear until his last comment
[20:20:28] so is the save flag relevant for the upload one specifically?
[20:20:33] lexnasser: no
[20:20:33] nuria:^
[20:20:41] lexnasser: only for the text one
[20:20:48] nuria: ok, sounds good
[20:40:12] (03CR) 10Mforns: [C: 03+1] "LGTM! I let Nuria give the +2" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520) (owner: 10Fdans)
[20:44:39] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up automatic deletion for netflow datasource in Druid - https://phabricator.wikimedia.org/T229674 (10mforns) Yes, @Nuria, the data starts 17th of August, so we can merge end of next week. Or on Monday the 18th? Better chance of having an ops person to...
[21:20:51] 10Analytics, 10Analytics-Kanban, 10Research: Add data quality metric: traffic variations per country - https://phabricator.wikimedia.org/T234484 (10mforns) Hi all! One idea that maybe can reduce false positives when there are traffic peaks for any given reason. I assume that when there's a traffic peak in a...
[21:55:24] (03CR) 10Nuria: [C: 03+2] Strip project from www. if it's specified in the URL [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/549851 (https://phabricator.wikimedia.org/T237520) (owner: 10Fdans)
[21:57:40] 10Analytics, 10Analytics-Kanban, 10Research: Add data quality metric: traffic variations per country - https://phabricator.wikimedia.org/T234484 (10Nuria) >So, maybe, when counting pageviews per country, we can leave the top N% most visited articles out of the calculation. Nice! I think this is an excellent...
[22:00:14] 10Analytics: Address refinery security vulnerabilities with jackson and netty - https://phabricator.wikimedia.org/T237774 (10Nuria)
[22:00:23] 10Analytics: Address refinery security vulnerabilities with jackson and netty - https://phabricator.wikimedia.org/T237774 (10Nuria)
[22:01:39] 10Analytics: Address refinery security vulnerabilities with jackson and netty - https://phabricator.wikimedia.org/T237774 (10Nuria) The netty issue seems quite easy to deal with. For the jackson one we probably want to build locally and try out some jobs before updating globally.