[01:35:32] 10Analytics-Kanban, 10Design-Research: Setup and integrate analytics for Design Research Website - https://phabricator.wikimedia.org/T259322 (10Volker_E)
[04:33:28] 10Analytics-Kanban, 10Design-Research: Setup and integrate analytics for Design Research Website - https://phabricator.wikimedia.org/T259322 (10Nuria) > (up to a few thousand visits a day, up to tens of thousands) but, yeah
[04:33:40] 10Analytics, 10Design-Research: Setup and integrate analytics for Design Research Website - https://phabricator.wikimedia.org/T259322 (10Nuria)
[06:03:27] good morning :)
[06:32:49] !log roll restart of druid brokers on druid100[4-8] to pick up new changes
[06:32:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:33:02] the cache metrics need a monitor stated in the conf now, sigh
[06:42:56] I'll also restart the historicals when the brokers' cache is more populated
[09:00:19] !log SET GLOBAL expire_logs_days=14; on an-coord1001's mysql
[09:00:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:00:45] !log SET GLOBAL expire_logs_days=14; on matomo1002's mysql
[09:00:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:24:19] 10Analytics-Clusters, 10Patch-For-Review, 10User-Elukey: Upgrade Druid to its latest upstream version (currently 0.18.1) - https://phabricator.wikimedia.org/T244482 (10elukey) The upgrade of the public cluster went fine, I think that next week we'll be able to upgrade the Analytics one without too many probl...
[09:41:09] from a quick test, varnishkafka seems to compile and work fine with varnish 6
[09:41:22] without any change to the source code
[09:58:28] PROBLEM - Hue Kerberos keytab renewer on an-tool1009 is CRITICAL: PROCS CRITICAL: 0 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue kt_renewer https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[09:59:11] ah!
[09:59:14] this is the downtime expiring
[09:59:52] PROBLEM - Hue CherryPy python server on an-tool1009 is CRITICAL: PROCS CRITICAL: 0 processes with command name python2.7, args /usr/lib/hue/build/env/bin/hue runcherrypyserver https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue/Administration
[12:06:57] going afk for one or two hours (need to attend to something unexpected), if needed I'll be reachable via phone!
[12:14:51] 10Analytics, 10MediaWiki-REST-API, 10Platform Team Sprints Board (Sprint 0), 10Platform Team Workboards (Green), 10Story: System administrator reviews API usage by client - https://phabricator.wikimedia.org/T251812 (10WDoranWMF)
[13:05:40] 10Analytics, 10Event-Platform, 10Technical-blog-posts: Story idea for Blog: Wikimedia's Event Platform - https://phabricator.wikimedia.org/T253649 (10Ottomata) @srodlan sounds good! No hurry, I'll just keep writing! :)
[13:37:07] 10Analytics-Radar, 10Performance-Team, 10Readers-Web-Backlog (Tracking): Review referer configuration of origin/origin-when-crossorigin/origin-when-cross-origin - https://phabricator.wikimedia.org/T248526 (10TheDJ) Just an FYI: Google just announced they are moving their default (when you have NOT specified...
[14:09:54] 10Analytics: Fix TLS certificate location and expire for Hadoop/Presto/etc.. and add alarms on TLS cert expiry - https://phabricator.wikimedia.org/T253957 (10elukey) I recently discovered that we have `base::expose_puppet_certs` in puppet. The class is aimed to copy the host's puppet TLS certificate and the Pupp...
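The T253957 update just above is about alerting on TLS certificate expiry for Hadoop, Presto and friends. As a minimal sketch of the kind of check such an alarm could wrap, using plain openssl (the certificate path below is a hypothetical placeholder, not the real location on those hosts):

    # Exit 0 if the cert is still valid 30 days from now, non-zero otherwise.
    # /etc/ssl/localcerts/example.crt is an illustrative path, not the actual one.
    CERT=/etc/ssl/localcerts/example.crt
    if openssl x509 -checkend $((30*24*3600)) -noout -in "$CERT"; then
        echo "OK: certificate valid for at least 30 more days"
    else
        echo "CRITICAL: certificate expires within 30 days"
        openssl x509 -enddate -noout -in "$CERT"
    fi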
[14:28:31] elukey: hola! did we deploy the parquet indexation changes?
[14:28:35] cc mforns
[14:29:08] hi nuria, yes, but I still have to restart the oozie jobs
[14:29:11] doing now
[14:29:28] mforns: for pageviews and mediawiki reduced?
[14:29:51] webrequest and mediawiki reduced
[14:31:08] perfect
[14:31:26] it is lovely that sqoop will start tomorrow
[14:45:52] mforns: ok, let's just babysit for a while
[14:46:21] !log restarted mediawiki history reduced oozie job
[14:46:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:46:36] !log restarted webrequest oozie bundle
[14:46:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:50:46] nuria: in theory next week we should have a new indexation on mon/tues for the druid public cluster, then we could upgrade the analytics cluster
[14:51:04] with the rolling upgrade there shouldn't be any user downtime
[14:51:54] elukey: let's see how the mw history job does; joseph fixed a bug with skewed joins so i think it will work ok, but we'll see
[14:54:46] nuria: sure, after the next indexation, but I am pretty sure it will be ok since we ran one and didn't see any issues
[14:55:31] elukey: ya, i am more worried about other changes in mw than druid
[14:56:10] nuria: yep yep
[15:24:30] 10Analytics, 10Event-Platform, 10Technical-blog-posts: Story idea for Blog: Wikimedia's Event Platform - https://phabricator.wikimedia.org/T253649 (10srodlund) Awesome! I look forward to digging into these when I get back!
[15:29:30] 10Analytics: RU reportupdater-ee-beta-features keeps logging a lot of daily errors to its logs - https://phabricator.wikimedia.org/T256195 (10mforns) Just got notified by the Growth team that this data set is no longer used. I will remove the job from puppet.
[15:32:47] 10Analytics, 10Analytics-Kanban: [SPIKE] Prototype of incremental updates for mediawiki history for simplewiki, including reverts using apache hudi - https://phabricator.wikimedia.org/T258532 (10Milimetric) p:05Triage→03High
[15:37:45] * elukey brb
[15:52:03] I'm unable to start Jupyter on stat1007, not sure why.
[15:53:55] Not a blocker, I've moved my work to stat1005.
[15:54:21] awight: weird
[15:54:49] awight: can I try to remove your venv and retry?
[15:55:04] elukey: Sure, happy to help!
[15:55:37] awight: can you retry now?
[15:56:19] elukey: Success :)
[15:56:30] perfect :)
[15:56:35] So was this caused by something in my homedir?
[15:56:58] sometimes it happens: the venv was not created correctly and/or some deps were missing
[15:57:19] it should be auto-created by jupyterhub, so probably not your fault
[15:57:47] Okay, well, thanks for the instant resolution!
[15:58:12] np!
[16:08:43] logging off, have a good weekend folks!
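For reference, the stat1007 Jupyter fix above amounted to removing the user's auto-created virtualenv so that JupyterHub rebuilds it on the next login. A rough sketch of the equivalent self-service steps, assuming the per-user venv lives at ~/venv (the actual path on the stat hosts may differ):

    # On the stat host, as the affected user: move the broken venv aside
    # rather than deleting it outright, in case anything needs recovering.
    mv ~/venv ~/venv.broken.$(date +%Y%m%d)
    # Then log in to JupyterHub again; a fresh venv should be provisioned.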
[16:16:25] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10MediaWiki-extensions-WikimediaEvents: Limit the number of mw-client-errors from a single client (suggested: maximum 5 errors should be recorded for a single IP on a single page) - https://phabricator.wikimedia.org/T259371 (10Jdlrobson)
[16:16:38] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10MediaWiki-extensions-WikimediaEvents: Limit the number of mw-client-errors from a single client (suggested: maximum 5 errors should be recorded for a single IP on a single page) - https://phabricator.wikimedia.org/T259371 (10Jdlrobson)
[17:09:27] 10Analytics, 10Better Use Of Data, 10Event-Platform: Goal: New schema and instruments must use the MEP system - https://phabricator.wikimedia.org/T259157 (10jlinehan)
[17:15:52] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10MediaWiki-extensions-WikimediaEvents: Limit the number of mw-client-errors from a single client (suggested: maximum 5 errors should be recorded for a single IP on a single page) - https://phabricator.wikimedia.org/T259371 (10Jdlrobson)
[17:17:13] GAHHH MY STACKED CONDA ENV IS NOT WORKING!
[17:17:21] IT WAS WORKING
[17:17:21] WHY IS IT NOW NOT WORKING!
[17:17:25] GAHH
[17:28:55] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10MediaWiki-extensions-WikimediaEvents, 10Product-Infrastructure-Team-Backlog: Limit the number of mw-client-errors from a single client (suggested: maximum 5 errors should be recorded for a single IP o... - https://phabricator.wikimedia.org/T259371
[18:17:24] nuria et al. o/
[18:18:18] hi!
[18:18:33] question: is it correct to assume that country-level pageviews for each project are not something your team is planning to work on during the next 6 months?
[18:18:40] (for public release)
[18:18:59] hi mforns :)
[18:28:23] leila: you mean by project, by country and by article, right?
[18:32:05] mforns: not by article. but now that you asked, let me check something first. ;)
[18:39:54] mforns: I have a few questions/requests related to having access to (human) pageview counts for wikipedia projects by country (on a monthly time-scale).
[18:42:39] 10Analytics-Radar, 10Better Use Of Data, 10Product-Analytics, 10Epic, and 2 others: Session Length Metric. Web implementation - https://phabricator.wikimedia.org/T248987 (10Nuria) >I don't mind, but maybe I don't understand what is confusing? Since many of the comments will no longer apply when we rework...
[18:44:07] leila: i would not say that is correct, no, we will probably work on that in the next 6 months
[18:44:47] nuria: got it. is there a planned publish time? (rough is fine. it will help me understand what "in-the-meantime" solutions we should think of.)
[18:44:54] leila: remember per-project and country aggregates are already public; per-article are not
[18:46:04] leila: we do not have a deadline for that, no.
[18:50:14] nuria: ok re the latter.
[18:50:34] nuria: re the former, should I be able to get to the per-project and country data via https://dumps.wikimedia.org/other/pageviews/readme.html ?
[18:50:41] (I'm likely overlooking something)
[18:52:28] leila: it is on the API: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews#Pageviews_split_by_country
[18:52:46] leila: on privacy: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews/Pageviews_by_country
[18:53:47] nuria: yup. helpful. thanks.
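The per-project, per-country aggregates nuria links above are served by AQS over plain HTTP, so they can be fetched directly with curl. Something like the following should return the top countries by pageviews for English Wikipedia in June 2020 (the parameter values here are illustrative; see the wikitech page above for the exact contract):

    curl -s 'https://wikimedia.org/api/rest_v1/metrics/pageviews/top-by-country/en.wikipedia/all-access/2020/06'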
[18:58:02] 10Analytics, 10Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (10leila) @A455bcd9 will [[ https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews#Pageviews_split_by_country | pageviews split by country ]] give you what you need? while...
[19:01:22] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10Product-Analytics (Kanban): SQL definition for wikidata metrics for tunning session - https://phabricator.wikimedia.org/T247099 (10Isaac) For reference, I followed up here: T249654#6352573
[19:05:51] hi, I have a question about permissions in superset for my account, is this the right place?
[19:23:19] denny_wmf: SURE
[19:23:24] denny_wmf: ask away
[19:23:32] denny_wmf: do you have a wikitech account?
[19:23:42] yes
[19:24:05] denny_wmf: and a phabricator user?
[19:24:08] https://www.irccloud.com/pastebin/HzF2c7NP/
[19:24:19] yes to both!
[19:24:32] oh, what was this snippet thing, sorry, I hope that is readable
[19:24:37] or else I'll repost
[19:24:49] denny_wmf: there are different levels of access
[19:26:28] denny_wmf: let me see, do you have access to the stat machines? like, can you ssh to stat1007.eqiad.wmnet? (which will give you access to the hadoop infra; superset is not the best place to run queries that might take longer than 1 minute, as it is just a dashboarding tool)
[19:27:31] haven't tried to ssh into any of our machines yet. that should be a short query :)
[19:30:16] denny_wmf: right, that table is small, but unless you have permission to access raw data you will not be able to query hadoop directly through superset
[19:30:36] I didn't go through the Analyst onboarding and set up the ssh keys for that, will do that at a later time, thanks!
[19:33:32] denny_wmf: all you need is a phab ticket similar to https://phabricator.wikimedia.org/T239654
[19:33:53] denny_wmf: since you are a WMF employee only verification of employment is needed
[19:34:14] that step I already did: https://phabricator.wikimedia.org/T258837
[19:34:26] and also I can log in to turnilo and superset
[19:34:40] denny_wmf: ah, let me look
[19:35:17] 10Analytics-Clusters, 10Analytics-Radar, 10Operations, 10observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (10herron)
[19:35:20] denny_wmf: turnilo and superset access many data sources; most of it is druid, and it is processed data
[19:35:48] denny_wmf: querying hadoop directly (which you can do through presto via superset) requires access to raw data
[19:37:33] ah, so if I can find the same data in druid, I can already access it
[19:37:47] thanks, yes, now I see it
[19:38:05] denny_wmf: for your line of work you are going to need access to infra/raw data and ssh, so i would set those up
[19:38:20] OK, thanks for the pointer to the docs on that!
[19:38:28] sounds good! Thank you! that was super helpful!
[19:38:29] denny_wmf: do please take a look at the sample ticket i sent you
[19:38:39] I will
[19:38:44] denny_wmf: you will see you just need to paste your public key
[19:39:18] yes, now I see it is a different ticket than I thought you sent me :)
[19:40:10] denny_wmf: normally there is just one ticket, and given that and the person's job we grant the appropriate access
[19:40:47] denny_wmf: sorry for the confusion, in druid there is only geoeditors monthly, but not daily
[19:41:17] monthly is actually sufficient for what I need now, so that's all good!
[19:41:33] https://usercontent.irccloud-cdn.com/file/LQVEfAmP/Screen%20Shot%202020-07-31%20at%2012.41.17%20PM.png
[19:42:06] denny_wmf: ok, this you can access in turnilo as well; see screenshot for the country split for wikidata editors last month
[19:42:24] yes, what I am looking for and what I cannot get from turnilo is "how many wikipedia wikis have fewer than 10 active monthly editors?"
[19:42:53] with active being in the 5+ edit buckets
[19:43:23] turnilo cuts the list off at a limit, so I don't get to the smaller wikis
[19:45:38] denny_wmf: ya, for that the raw data store is the way to go; druid is for cubes of data of hundreds of millions of datapoints, but it is not made for groupbys
[19:51:14] denny_wmf: this?
[19:51:17] select distinct(wiki_db) from geoeditors_monthly where activity_level!='1 to 4' and distinct_editors < 10 and wiki_db like '%wiki' and month='2020-05';
[19:51:47] yes! and the count is enough
[19:52:14] denny_wmf: this is wikis though, not just wikipedias (the family), so..
[19:53:02] ah, yeah, then the list is better :)
[19:56:29] denny_wmf i think this is the right select
[19:56:38] select wiki_db, sum(namespace_zero_distinct_editors) as total from geoeditors_monthly where activity_level!='1 to 4' and sum(namespace_zero_distinct_editors) < 10 and wiki_db like '%wiki' and month='2020-05' group by activity_level
[19:58:10] looks right to me!
[19:59:38] thank you!
[20:01:41] denny_wmf:
[20:02:14] https://www.irccloud.com/pastebin/prY1GqzY/
[20:02:19] denny_wmf: select is
[20:02:21] select wiki_db, sum(namespace_zero_distinct_editors) as total from geoeditors_monthly where activity_level!='1 to 4' and wiki_db like '%wiki' and month='2020-05' group by wiki_db, activity_level having sum(namespace_zero_distinct_editors) < 10;
[20:04:13] perfect! thank you for the help!
[20:04:45] denny_wmf: ssh keys next please!
[20:05:32] denny_wmf: note numbers might be very different if you consider other namespaces, i think you know this but just making sure
[20:05:56] already have the ssh key in my clipboard
[20:06:02] yes, thanks for the warning!
[20:10:15] https://phabricator.wikimedia.org/T259388 done
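To spell out why the query above went through three iterations: the first attempt filtered on the wrong column, the second placed an aggregate function in the WHERE clause (which SQL does not allow; aggregates belong in HAVING), and the final version fixes that. Here is an annotated variant that groups by wiki_db alone, so each wiki appears once with its editor count summed across the 5+ activity buckets (the hive -e wrapper and the default database are assumptions, not confirmed from the log):

    hive -e '
      SELECT wiki_db,
             SUM(namespace_zero_distinct_editors) AS total
        FROM geoeditors_monthly
       WHERE activity_level != "1 to 4"   -- keep only the 5+ edit buckets
         AND wiki_db LIKE "%wiki"         -- rough filter; matches more than just Wikipedias
         AND month = "2020-05"
       GROUP BY wiki_db                   -- one row per wiki
      HAVING SUM(namespace_zero_distinct_editors) < 10;
    '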
[20:29:08] 10Analytics, 10Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (10A455bcd9)
[20:31:06] 10Analytics, 10Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (10A455bcd9) @leila No, as said in my initial message this is not enough. It's for instance impossible to quickly answer "What's the most used language in Morocco?" (French? Eng...
[20:32:54] denny_wmf: shiny star of the week for prompt responses, thank you
[20:33:19] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (10Nuria)
[20:34:04] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (10Nuria)
[20:34:59] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (10Nuria)
[20:39:58] ottomata: let's talk monday about your eventgate server stuff
[20:40:05] k
[20:42:41] 10Analytics, 10Analytics-Wikistats: "Page views by edition of Wikipedia" for each country - https://phabricator.wikimedia.org/T257071 (10A455bcd9) And when I try to use the API with "all-wikipedia-projects" I get: https://wikimedia.org/api/rest_v1/metrics/pageviews/top-by-country/all-wikipedia-projects/all-acc...
[20:57:32] bearloga: you around? and do you have a working mw + eventlogging dev env?
[20:57:40] if so, want to try something for me? :)
[20:58:04] ottomata: yep, still around, and yep to working dev env
[20:58:17] ottomata: HIT. ME. UP. YO.
[20:58:23] ok, can you pull this change down?
[20:58:23] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventLogging/+/617770
[20:58:32] in your local EventLogging working copy?
[20:58:34] then run
[20:58:36] npm install .
[20:59:02] sure! i'll let you know when it's done
[20:59:06] k
[21:00:20] ottomata: inside or outside of vagrant? like should I `vagrant ssh` first or nah?
[21:00:35] hm, shouldn't matter actually!
[21:00:40] do it outside, let's try that
[21:04:08] ottomata: oh boy, that's a lot of errors from npm install.
[21:04:29] whaa
[21:04:30] :(
[21:04:34] ok, try it in mw vagrant then instead
[21:04:58] i bet they are from trying to compile node-rdkafka and your mac (you run macos?) is weird
[21:05:15] YUUUUP
[21:07:03] okay, things are looking much better inside vagrant
[21:08:03] ok, lemme know when done
[21:08:06] actually, in the meantime
[21:08:13] different errors
[21:08:16] oh
[21:08:17] :(
[21:08:19] what?
[21:09:03] can i follow up on slack to share the error messages? i prefer sharing snippets and text files there
[21:39:22] 10Analytics, 10Operations, 10SRE-Access-Requests: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (10Nuria)
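For anyone retracing the EventLogging test above: node-rdkafka ships native bindings that often fail to build on macOS, which is why npm install only succeeded inside MediaWiki-Vagrant. A rough reconstruction of the steps, with the patchset number and the /vagrant/mediawiki mount point assumed rather than confirmed:

    # On the host: fetch the Gerrit change into the EventLogging working copy.
    # refs/changes/70/617770/1 assumes patchset 1; check Gerrit for the latest.
    cd mediawiki/extensions/EventLogging
    git fetch https://gerrit.wikimedia.org/r/mediawiki/extensions/EventLogging refs/changes/70/617770/1
    git checkout FETCH_HEAD

    # Inside the VM, where the Linux toolchain can compile node-rdkafka:
    vagrant ssh
    cd /vagrant/mediawiki/extensions/EventLogging   # default MW-Vagrant mount, assumed
    npm install .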