[06:59:37] !log restart eventlogging on eventlog1002 to pick up new logging settings
[06:59:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:46:16] Analytics, ChangeProp, EventBus, WMF-JobQueue, Services (designing): Consider disabling automatic topic creation in main-kafka - https://phabricator.wikimedia.org/T199432 (mobrovac) The caveat with the maximum number of topics is that Kafka has no hard limit on it because it depends on zookee...
[08:51:36] PROBLEM - Hadoop NodeManager on analytics1051 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[08:52:02] this is me checking --^
[08:52:46] RECOVERY - Hadoop NodeManager on analytics1051 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:17:36] Analytics, Patch-For-Review, User-Elukey: Refactor puppet code for the Hadoop Analytics cluster to roles/profiles - https://phabricator.wikimedia.org/T167790 (elukey) The last patch ends the journey of this big refactoring, since we are now able to have multiple hadoop clusters defined in puppet (pro...
[09:17:51] Analytics, Analytics-Kanban, User-Elukey: Refactor puppet code for the Hadoop Analytics cluster to roles/profiles - https://phabricator.wikimedia.org/T167790 (elukey)
[09:19:44] !log restart webrequest-load-wf-text-2018-8-1-7 (died due to yarn restarts)
[09:19:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:05:15] Analytics-Kanban, Patch-For-Review: Simplify and document how to increase log verbosity/level for Eventlogging - https://phabricator.wikimedia.org/T200765 (elukey) Added also https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration#Raise_log_verbosity_to_debug
[10:05:23] Analytics-Kanban, Patch-For-Review: Simplify and document how to increase log verbosity/level for Eventlogging - https://phabricator.wikimedia.org/T200765 (elukey)
[10:06:39] !log restart all the yarn nodemanagers after minor max memory allocation change
[10:06:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:09:32] Analytics, Beta-Cluster-Infrastructure, Services (watching): deployment-tin:/srv/mediawiki-staging/wmf-config/event-schemas is out of sync - https://phabricator.wikimedia.org/T199270 (dcausse) Open>Resolved a:dcausse
[10:40:36] Analytics, Operations, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345 (elukey) Any update on this one?
[10:41:29] Analytics, Operations, hardware-requests: eqiad: (2) hardware refresh for analytics1003 - https://phabricator.wikimedia.org/T198685 (elukey) Quick check in to see if we can get the quotes for this host during the next couple of weeks.
[11:51:16] Analytics, ChangeProp, EventBus, WMF-JobQueue, Services (designing): Consider disabling automatic topic creation in main-kafka - https://phabricator.wikimedia.org/T199432 (elukey) >>! In T199432#4468290, @mobrovac wrote: > The caveat with the maximum number of topics is that Kafka has no hard...
[12:05:41] Analytics, User-Elukey: Upgrade Archiva (meitnerium) to Debian Stretch - https://phabricator.wikimedia.org/T192639 (elukey) Created T200895, pending work in there before proceeding.
[13:10:54] mforns: hola :)
[13:11:00] got back without any issue?
[13:11:12] it might have been related to freenode then
[13:26:04] ottomata: o/
[13:26:06] morningggg
[13:26:17] if you have time today can you make me channel's op?
[13:27:46] Analytics, ChangeProp, EventBus, WMF-JobQueue, Services (designing): Consider disabling automatic topic creation in main-kafka - https://phabricator.wikimedia.org/T199432 (Ottomata) > One thing that might be good to do is to limit (via ACLs) the users that can create topics. I started commen...
[13:28:12] hiaaa
[13:28:24] elukey: of analytlics?
[13:28:27] i don't thikn I am an op
[13:28:33] is anybody?
[13:28:50] ottomata: IIRC you were one!
[13:29:04] yep for this channel I mean
[13:30:20] i am?
[13:31:03] elukey: i just tried
[13:31:04] and got
[13:31:05] You're not a channel operator
[13:31:29] :(
[13:33:41] I tried msg chanserv access #wikimedia-analytics list
[13:33:52] and you and me are listed with a lot of flags
[13:40:54] I could not join without identifying, it said I was banned
[13:42:53] elukey: nice work with the multi hadoop patch thing
[13:43:02] \o/
[13:43:04] is that in anticipation of data lake stuff?
[13:43:20] i kept trying to think of other ways to do it but couldn't think of anything better
[13:45:31] same thing, I tried to come up with other solutions for the data lake in labs but they were all horrible
[13:45:54] in theory now we can have multiple clusters in prod
[13:46:22] (PS4) Mforns: Add saltrotate, a script that manages cryptographic salts [analytics/refinery] - https://gerrit.wikimedia.org/r/449249 (https://phabricator.wikimedia.org/T199899)
[13:48:04] so I need to go in a bit to run an errand before going to India.. I hope to be back before standup (since I should be done in ~30/40 mins), but I might miss it :(
[13:48:08] in case I'll send e-scrum
[13:48:49] * elukey afk!
[13:49:00] OK!
[13:49:23] Analytics: Use Snakebite instead of subprocess.Popen in HdfsUtils - https://phabricator.wikimedia.org/T200904 (mforns)
[14:07:44] addshore@stat1006:~$ mysql --defaults-extra-file="/etc/mysql/conf.d/research-client.cnf" -h analytics-slave.eqiad.wmnet
[14:07:44] ERROR 1275 (HY000): Server is running in --secure-auth mode, but 'research'@'10.64.53.31' has a password in the old format; please change the password to the new format
[14:07:45] :(
[14:10:25] ottomata: any idea ^^ should I file a ticket?
[14:17:25] oh huh!
[14:17:40] addshore: let's ask in #wikimedia-operations
[15:00:54] Analytics, Analytics-Kanban, EventBus, Wikimedia-Stream, and 3 others: Redesign EventStreams for better multi-dc support - https://phabricator.wikimedia.org/T199433 (Ottomata)
[15:18:48] (CR) Elukey: [C: 1] Add MobileApp fixes to EL sanitization whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/447088 (https://phabricator.wikimedia.org/T200095) (owner: Mforns)
[15:20:24] hi ottomata or joal! Is there an easy way to get pageview_hourly for non logged-in users only? thankss :)
[15:32:45] maybe ping for mforns ^?
[15:33:11] ottomata, thx!
[15:33:33] ottomata, you mean miriam__'s?
[15:34:26] hola mforns :)
[15:34:31] holaaa
[15:34:36] thanks ottomata!
[15:36:19] hey mforns, we are trying to get an idea of number of pageviews from non logged-in users. Should I use the webrequest table?
[15:36:23] miriam__, you mean generating a pageview_hourly-analogous data set for non-logged-in users?
[15:36:38] mforns, yes :) if possible :)
[15:36:43] miriam__, I don't think we have logged-in/non-logged-in information in webrequest....
[15:36:48] not sure though, looking
[15:37:07] merci mforns :)
[15:39:16] miriam__, yes, webrequest has that info stored within x-analytics field
[15:39:22] https://wikitech.wikimedia.org/wiki/X-Analytics
[15:40:34] amazing! so, from this, we should be able to draw hourly statistics for non logged-in users
[15:40:35] but pageview_hourly does not have such info
[15:41:10] yes (according to documentation)
[15:41:13] yes true, could we aggregate the info in the webrequest table and get similar statistics?
[15:41:39] mforns, ok, great, we'll give it a try!
[15:42:23] if the x-analytics::loggedIn field is working, sure! would you like to have that as a one-off ?
[15:46:06] mforns, yes, we would need these stats over a short period of time (i.e. the 12 days - June 28/Jul 9- when we ran the data collection on citation usage)
[15:46:23] aha
[15:48:06] miriam__, looks like the field is operational:
[15:48:11] select x_analytics_map['loggedIn'], count(*) as c from wmf.webrequest where year=2018 and month=7 and day=31 and hour=0 group by x_analytics_map['loggedIn'];
[15:48:23] OK
[15:48:23] _c0 c
[15:48:23] NULL 279131128
[15:48:23] 1 2993200
[15:48:23] Time taken: 42.354 seconds, Fetched: 2 row(s)
[15:48:56] mforns, looks amazing :) thanks SO much!!
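The one-off discussed above (hourly counts for non-logged-in users, based on the `loggedIn` key of the X-Analytics map) could be sketched roughly as follows. This is a toy in-memory illustration, not the Hive job itself: the record layout, the `count_anonymous_by_hour` name, the `is_pageview` flag, and the sample rows are all assumptions. It only mirrors the idea that a missing `loggedIn` key (the NULL row in the query output above) marks a non-logged-in request.

```python
from collections import Counter


def count_anonymous_by_hour(records):
    """Count pageviews per (year, month, day, hour) that carry no loggedIn key.

    Each record is a dict shaped loosely like a webrequest row (assumed layout).
    """
    counts = Counter()
    for rec in records:
        # Skip anything that is not flagged as a pageview (assumed field).
        if not rec.get("is_pageview", False):
            continue
        x_analytics = rec.get("x_analytics_map") or {}
        # A missing key corresponds to the NULL group in the Hive output.
        if x_analytics.get("loggedIn") is None:
            key = (rec["year"], rec["month"], rec["day"], rec["hour"])
            counts[key] += 1
    return dict(counts)


# Hypothetical sample rows, for illustration only.
sample = [
    {"year": 2018, "month": 7, "day": 31, "hour": 0, "is_pageview": True,
     "x_analytics_map": {"ns": "0"}},
    {"year": 2018, "month": 7, "day": 31, "hour": 0, "is_pageview": True,
     "x_analytics_map": {"ns": "0", "loggedIn": "1"}},
    {"year": 2018, "month": 7, "day": 31, "hour": 1, "is_pageview": True,
     "x_analytics_map": None},
    {"year": 2018, "month": 7, "day": 31, "hour": 1, "is_pageview": False,
     "x_analytics_map": {}},
]

anonymous_counts = count_anonymous_by_hour(sample)
```

In the real table the same grouping would be done in Hive over the partitions for the 12 days of interest; the sketch only shows the filter and the hourly key.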
[15:49:01] np :]
[15:58:16] (PS4) Sahil505: Made Bar-chart popup dynamic [analytics/wikistats2] - https://gerrit.wikimedia.org/r/449029 (https://phabricator.wikimedia.org/T192416)
[15:58:41] (CR) Sahil505: [C: -1] "WIP" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/449029 (https://phabricator.wikimedia.org/T192416) (owner: Sahil505)
[16:18:50] (CR) Mforns: [V: 2 C: 2] "Self-merging to fix EL sanitization alarms." [analytics/refinery] - https://gerrit.wikimedia.org/r/447088 (https://phabricator.wikimedia.org/T200095) (owner: Mforns)
[16:23:30] !log deploying refinery using scap
[16:23:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:29:40] ottomata: qq - I am checking the copyright section of the debian archiva dir.. there are a lot of things reported :D
[17:30:14] do you remember if it was mandatory to specify all the different licenses for jars etc.. ?
[17:40:35] elukey: all i remember is it was the first time we/I decided to give up on building java packages from sources :p
[17:41:50] elukey: it's mandatory for something uploaded to Debian, but we can live without it for apt.wikimedia.org
[17:42:06] Analytics, Analytics-EventLogging, EventBus, Operations, and 2 others: RFC: Modern Event Platform - Choose Schema Tech - https://phabricator.wikimedia.org/T198256 (kchapman) Reminder there is a meeting today August 1st at 2pm PST(22:00 UTC, 23:00 CET) in #wikimediaoffice
[17:42:54] moritzm, ottomata thanks! I got the source of all these, the LICENSE file in archiva, I am trying to simplify the debian one to list only fewer things
[17:49:49] ottomata, could you please give me a +1 if appropriate to this short change? https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/443508/
[17:50:14] it fixes a casing problem when sanitizing nested map fields
[17:50:39] ooo right tricky mforns. hm
[17:50:45] so this doesn't change the data schema at all, right?
[17:50:50] just uses it for comparing the field names?
[17:52:35] yea, it keeps the casing in the output
[17:52:48] but it ensures the whitelist is case insensitive
[17:53:31] (CR) Ottomata: [C: 1] Fix case insensibility for MapMaskNodes in WhitelistSanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/443508 (https://phabricator.wikimedia.org/T193176) (owner: Mforns)
[17:53:37] :] thanks!
[17:54:54] ottomata, this is tested already, now I will start re-sanitizing appInstallId schemas since 11-2017, so that all appInstallIds are salted and hashed in event_sanitized database
[17:55:29] event database will still keep the raw unsanitized data since 11-2017 until reading can calculate their appInstallId metrics retroactively
[17:55:48] then, we'll productionize the script to delete old partitions from the event database
[17:56:42] (PS3) Mforns: Fix case insensibility for MapMaskNodes in WhitelistSanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/443508 (https://phabricator.wikimedia.org/T193176)
[17:56:43] alllirght cool
[17:57:51] (PS6) Mforns: Add ability to salt and hash to eventlogging sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/446592 (https://phabricator.wikimedia.org/T198426)
[18:11:39] * elukey off!
[18:29:30] dsaez: yt?
[18:29:39] is the pyspark jupyter stuff working better for you now?
[18:40:17] ottomata: great news about EventStreams announcement! I think we can use it in WDQS now with all the latest improvements, awesome work! :)
[18:46:22] (CR) Mforns: [C: 2] Fix case insensibility for MapMaskNodes in WhitelistSanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/443508 (https://phabricator.wikimedia.org/T193176) (owner: Mforns)
[18:46:56] (CR) Mforns: [C: 2] Add ability to salt and hash to eventlogging sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/446592 (https://phabricator.wikimedia.org/T198426) (owner: Mforns)
[18:48:02] SMalyshev: cool!
[18:48:30] SMalyshev: kafka direct woudl probbly be a better choice for you, especially since you are pulling so much data at once
[18:48:31] buuuut
[18:48:32] as you like!
[18:48:48] ottomata: kafka is not suitable for public consumption
[18:48:50] there will be a blog post about that and some other new features coming out soon
[18:49:01] so if you are not inside WMF, you can't do kafka
[18:49:07] SMalyshev: that is true, but don't your wdqs instances run inside?
[18:49:09] (including labs)
[18:49:13] ah for labs
[18:49:22] yeah, we plan on getting a kafka cluster with prod data in labs eventually
[18:49:24] but that will be a while
[18:49:29] mine in production are, but there are others, and there are labs
[18:49:36] ah
[18:49:51] and in general supporting one timestamp and letting eventstreams manage the rest is cleaner :)
[18:50:00] so I'd like to have both options
[18:50:26] now I think I finally can
[18:51:01] aye
[18:55:34] (Merged) jenkins-bot: Fix case insensibility for MapMaskNodes in WhitelistSanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/443508 (https://phabricator.wikimedia.org/T193176) (owner: Mforns)
[18:57:42] ottomata: yeah, they are working better! thanks for that.
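The two refinery changes merged above (whitelist matching that ignores case but keeps the original casing in the output, and salting plus hashing of selected fields) can be illustrated with a small sketch. This is not the refinery Scala code: the `sanitize` and `salt_and_hash` names, the `"keep"` / `"hash"` action labels, the choice of HMAC-SHA256, and the sample field names are all assumptions made for illustration.

```python
import hashlib
import hmac


def salt_and_hash(value, salt):
    """Return a keyed SHA-256 digest of value (HMAC is an assumed choice)."""
    return hmac.new(
        salt.encode("utf-8"), str(value).encode("utf-8"), hashlib.sha256
    ).hexdigest()


def sanitize(event, whitelist, salt):
    """Keep only whitelisted fields, matching names case-insensitively.

    Output keys keep the casing found in the event, not in the whitelist.
    Nested dicts are handled by recursing with the nested whitelist.
    """
    lowered = {name.lower(): action for name, action in whitelist.items()}
    out = {}
    for field, value in event.items():
        action = lowered.get(field.lower())
        if isinstance(action, dict) and isinstance(value, dict):
            out[field] = sanitize(value, action, salt)
        elif action == "keep":
            out[field] = value
        elif action == "hash":
            out[field] = salt_and_hash(value, salt)
        # Fields without a whitelist entry are dropped.
    return out


# Hypothetical event and whitelist, for illustration only.
event = {
    "Action": "click",
    "ip": "192.0.2.1",
    "event": {"appInstallId": "abc-123", "Extra": "x"},
}
whitelist = {"action": "keep", "EVENT": {"appinstallid": "hash"}}
sanitized = sanitize(event, whitelist, salt="example-salt")
```

Lowercasing only the lookup key is what lets the comparison ignore case while the emitted field names stay exactly as they appeared in the input.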
[19:01:37] coooool
[19:02:38] Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (Ottomata)
[19:03:51] (Merged) jenkins-bot: Add ability to salt and hash to eventlogging sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/446592 (https://phabricator.wikimedia.org/T198426) (owner: Mforns)
[19:11:02] ottomata: what was the trick?
[19:11:28] dsaez: we dropped toree for pyspark
[19:11:31] and just did ipython
[19:11:38] had to manually configure the kernel
[19:11:42] but it works better that way
[19:11:52] toree is better for scala (and R?) though, so we're keeping it for those
[19:22:05] Analytics-Kanban: Set a timeout for regex parsing in the Eventlogging processors - https://phabricator.wikimedia.org/T200760 (Milimetric) Thanks @Volans, didn't know about the contextmanager, that's nicer. And @elukey, as for concerns about the use of signal complicating the code, after reading more about i...
[20:23:11] Analytics, Analytics-Kanban, Analytics-Wikistats: [Wikistats 2] Bug in time-range selector on detail page - https://phabricator.wikimedia.org/T200497 (mforns)
[21:42:20] Analytics, Analytics-EventLogging, EventBus, Operations, and 2 others: RFC: Modern Event Platform - Choose Schema Tech - https://phabricator.wikimedia.org/T198256 (Ottomata) In todays RFC meeting, there was consensus to move forward with JSONSchema with strict policies about what is allowed (e.g....
[21:48:08] Is it possible to check on the login for wikipedia15 for piwik? I sent over an email - but basically the username/password we have is not working. Would it be possible to get one setup for the new site or password for that one sent to me? gvarnum@wikimedia.org
[21:50:59] Analytics, Analytics-Kanban, wikimediafoundation.org: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (Varnent) Resolved>Open The login information we have for wikipedia15 does not appear to be working. Can you email me updated info or set up a ge...
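The approach mentioned in the T200760 comment earlier in this log (a context manager around regex parsing, built on `signal`) might look roughly like this. It is a sketch under assumptions, not the EventLogging patch: `time_limit` and `match_with_timeout` are hypothetical names, and the code relies on `SIGALRM` and `signal.setitimer`, which are available only on Unix and usable only from the main thread.

```python
import re
import signal
from contextlib import contextmanager


@contextmanager
def time_limit(seconds):
    """Raise TimeoutError in the enclosed block after `seconds` (Unix, main thread)."""

    def _on_alarm(signum, frame):
        raise TimeoutError("timed out after %s seconds" % seconds)

    # Install our handler and remember the previous one so it can be restored.
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer and put the old handler back, even on error.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def match_with_timeout(pattern, text, seconds=1.0):
    """Return re.match(pattern, text), or None if the time limit is hit."""
    try:
        with time_limit(seconds):
            return re.match(pattern, text)
    except TimeoutError:
        return None
```

One caveat with this design: the handler only runs when the interpreter gets a chance to process pending signals, so how promptly a long-running match is interrupted depends on the Python implementation and version.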