[00:17:18] Quarry, Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564 (zhuyifei1999) >>! In T188564#4795463, @Framawiki wrote: > If they were not stored as sqlite files would the problem be partially resolved? I have trouble seeing the interest...
[00:28:43] Analytics, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (RobH)
[00:30:28] Analytics, Patch-For-Review, Services (done): Refinery Spark HiveExtensions schema merge should support merging of arrays with struct elements - https://phabricator.wikimedia.org/T210465 (Pchelolo) Everything looks perfect from CP side. Please close whenever you feel like it.
[00:32:45] Analytics, Analytics-Kanban: Failure while refining webrequest upload 2018-12-01-14 - https://phabricator.wikimedia.org/T211000 (Nuria) I looked at webrequests for "Problematic requests for end-of-hour upload 2018-12-01T14" (all "sequences" in table above). Of the 82 requests 52 come from Asia (at 11pm...
[00:37:04] Analytics, Analytics-Kanban: Failure while refining webrequest upload 2018-12-01-14 - https://phabricator.wikimedia.org/T211000 (Nuria) Also, response sizes do not seem any special: 16264 /wikipedia/commons/thumb/8/80/Mamas_and_the_Papas%27_John_Phillips_in_1967.JPG/438px-Mamas_and_the_Papas%27_John_Phi...
[02:17:43] Analytics, Product-Analytics: Investigate referrer class change on Chrome Mobile from September 13, 2018 - https://phabricator.wikimedia.org/T211077 (Tbayer)
[04:46:40] Analytics, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (RobH)
[04:53:32] Analytics, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: rack/setup/install cloudvirtan100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (RobH) a: RobH>Cmjohnson Ok, this has had puppet run on all of the hosts. This is now ready for @cmjohnson to attach the othe...
[09:38:14] Analytics-Tech-community-metrics, Developer-Advocacy: Advertise wikimedia.biterg.io more widely in the Wikimedia community - https://phabricator.wikimedia.org/T179820 (Aklapper) p: Low>Lowest Not happening for Q4/2018 and not soon either: Due to changes to "increase security" and introduction of H...
[09:56:43] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Move turnilo to nodejs 10 - https://phabricator.wikimedia.org/T210705 (elukey) a: elukey
[09:57:47] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Move turnilo to nodejs 10 - https://phabricator.wikimedia.org/T210705 (elukey) Upgraded turnilo in labs (turnilo.eqiad.wmflabs), if anybody wants to test it: `ssh -N turnilo.eqiad.wmflabs -L 9091:turnilo.eqiad.wmflabs:9091` As far as I...
[10:08:00] RECOVERY - Check the last execution of check_webrequest_partitions on an-coord1001 is OK: OK: Status of the systemd unit check_webrequest_partitions
[10:14:22] \o/
[10:33:20] Analytics, Analytics-Kanban, DBA, Data-Services, and 2 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Banyek) In https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/477503/ I have a proposal about how to install...
[10:50:46] elukey joal hellooo deploying this as agreed yesterday? https://gerrit.wikimedia.org/r/#/c/analytics/aqs/deploy/+/477292/
[10:53:13] fdans: sure!
[10:53:28] coooool beans
[10:53:46] (CR) Fdans: [V: 2 C: 2] "Proceeding as agreed" [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/477292 (owner: Fdans)
[10:55:58] !log deploying AQS to expose offset and underestimate numbers on unique devices
[10:55:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:58:08] fdans: please check https://grafana.wikimedia.org/dashboard/db/aqs while you deploy
[10:58:22] yessir
[10:58:56] elukey: ah yes
[10:59:08] elukey: I don't have permission to deploy to production
[10:59:12] -.-
[10:59:57] yes of course, we keep forgetting that
[11:00:01] elukey: I'm going to open a task so that I get permissions
[11:00:10] sure
[11:00:11] elukey: mind initing the deployment?
[11:00:50] should I start with scap deploy? (I mean, did you already update the repo on deploy1001 etc..)
[11:02:11] yes seems so
[11:03:36] elukey: yes that's right :)
[11:03:38] fdans: aqs1004 is done, do you want to check?
[11:04:03] elukey: can't check, that's another permission that I'm asking for in the task
[11:04:17] fdans: you can surely check the endpoint no?
[11:04:41] elukey: internally aqs1004?
[11:04:48] I can't log in to the aqs machines
[11:05:07] I mean externally
[11:05:19] you should be able to curl aqs from stat boxes
[11:05:25] or from deploy1001
[11:05:53] oh
[11:09:47] fdans: ?
[11:10:02] 1 sec
[11:11:29] elukey: please roll back, we need to parseInt the values
[11:11:43] ack
[11:12:17] we thought this was an old thing and that we didn't need to do it now, but it seems all numeric values coming from cassandra need to be inted
[11:12:26] goddammit
[11:12:44] thank you for your patience elukey
[11:13:17] fdans: super fine, rollback completed
[11:14:34] I am wondering if we could improve the labs setup to allow this kind of testing before reaching prod
[11:14:45] let me know if I can do anything from the ops point of view
[11:14:56] I know that the cluster in deployment-prep is not optimal
[11:15:25] (we can create anything in the analytics project in horizon though)
[11:17:08] fdans: ok if I go afk now? (need to go to the dentist)
[11:17:41] elukey: yes we can resume in the afternoon, thank youuuu
[11:17:48] <3
[11:18:40] * elukey lunch!
[11:33:35] Analytics, Operations, SRE-Access-Requests: Grant fdans permissions to deploy AQS in prod, and accessing the aqs hosts - https://phabricator.wikimedia.org/T211095 (fdans)
[11:34:02] (PS1) Fdans: Force numeric uniques values to be ints [analytics/aqs] - https://gerrit.wikimedia.org/r/477512 (https://phabricator.wikimedia.org/T164201)
[12:11:48] Hi folks - pausing a bit my admin work to look at aqs - I'm super sorry fdans about not remembering the long-to-int cassandra-mod thing :(
[12:12:09] fdans: Looking at pageviews code, we indeed need to parse long values
[12:12:12] joal: I should have remembered
[12:12:22] fdans: I should have had as well
[12:12:55] Thanks fdans for managing our bad-memory issues :)
[12:12:57] :)
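(For the record, the endpoint check elukey suggests — curling aqs from a stat box — would also catch the long-to-int bug that forced the rollback: the symptom is JSON numbers arriving as strings. A minimal sketch of such a smoke test; the host, port 7232, and exact route are assumptions, not taken from this log, and the 19700101 dates are the test-data dates discussed later in the day:)

```python
# Smoke-test the unique-devices endpoint from a stat box (hypothetical host/port).
import requests

url = ('http://aqs1004.eqiad.wmnet:7232/analytics.wikimedia.org/v1/'
       'unique-devices/en.wikipedia.org/all-sites/daily/19700101/19700102')

for item in requests.get(url, timeout=10).json()['items']:
    for field in ('devices', 'offset', 'underestimate'):
        # Cassandra longs that are not run through parseInt show up here as
        # strings instead of JSON numbers, which is what forced the rollback.
        assert isinstance(item[field], int), (field, item[field])
print('all values are proper ints')
```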
[12:13:32] Oh by the way fdans - I also have a request for you: can I drop the "local_group_default_T_unique_devices_TEST" keyspace in cassandra?
[12:14:15] fdans: and also I want to ask to pick names with more differences to prod ones, having test at the beginning
[12:14:27] fdans: For instance: test_uniques_2018_11
[12:14:44] fdans: Like this, no possible mixing with prod keyspaces
[12:14:53] joal: yessir
[12:15:03] (not that the TEST postfix is not visible :)
[12:15:36] fdans: I assume the yes is about dropping keyspace?
[12:15:48] about all the things!!!
[12:15:56] Wow, that's a big yes :D
[12:16:28] !log Drop cassandra test keyspace "local_group_default_T_unique_devices_TEST"
[12:16:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:16:48] joal: ah you dropped? was about to do so
[12:18:07] Done :)
[12:20:27] (CR) Joal: [C: 1] "LGTM ! thanks for the patch fdans :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/476220 (https://phabricator.wikimedia.org/T167539) (owner: Fdans)
[12:33:34] joal: i have some questions for you to help me answer with your graph :P
[12:40:39] joal: something like this Number of occurrences of https://www.irccloud.com/pastebin/vL3jlSQs/
[12:40:53] * Number of occurrences of description/label/alias text
[12:40:59] * addshore is dashing out to lunch now though
[12:41:07] but i imagine this would be easy to ask with the graph thing :D
[12:44:38] Hi addshore :)
[12:45:44] For this type of thing, the 'graph' aspect is not even needed I think
[12:46:00] I'll wait for you to be back to discuss more details
[12:57:36] I can discuss now ;)
[13:04:26] joal: o/
[13:04:30] Hi elukey
[13:04:37] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/477529/ - I might have found a way to force druid to log properly
[13:04:46] Hoooo :)
[13:05:46] there is also a way to get jetty HTTP access logs
[13:06:00] not sure if we want it or not
[13:08:37] elukey: I don't know :S
[13:12:51] elukey: I guess if we have an easy way to get rolling-access-logs for brokers and overlord that could be nice - in case
[13:15:26] all right will try to check that as well
[13:15:30] so we roll restart only once
[13:15:38] sure :)
[13:23:05] * fdans back from the lunchylunch
[13:37:05] joal: let's merge this? https://gerrit.wikimedia.org/r/#/c/analytics/aqs/+/477512/
[13:37:26] reading fdans
[13:41:17] (CR) Joal: [C: -1] "Not ready yet :)" (5 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/477512 (https://phabricator.wikimedia.org/T164201) (owner: Fdans)
[13:43:30] sorry joal, I had corrected that but didn't make it into the PS, correcting commit message now
[13:43:44] np fdans :)
[13:44:12] fdans: it proves my review was thorough :)
[13:44:46] haha yeah because you reeeeeally need to prove that :D
[13:44:54] fdans: :-P
[13:46:19] (PS2) Fdans: Force numeric uniques values to be ints [analytics/aqs] - https://gerrit.wikimedia.org/r/477512 (https://phabricator.wikimedia.org/T164201)
[13:47:36] (CR) Joal: [V: 2 C: 2] "LGTM ! Merging" [analytics/aqs] - https://gerrit.wikimedia.org/r/477512 (https://phabricator.wikimedia.org/T164201) (owner: Fdans)
[13:48:42] Analytics, Analytics-Kanban: Failure while refining webrequest upload 2018-12-01-14 - https://phabricator.wikimedia.org/T211000 (elukey) As FYI ema told me that https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/477424/ reverted https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/476311, an exper...
[13:49:18] joal: --^
[13:49:33] did we check if the weird vk issue happened in hours before the 29th ?
[13:50:30] elukey: I have not yet a valid check - tried one thing, but seems too erratic
[13:50:51] elukey: Will be back onto that once my admin stuff is done (probably tomorrow I assume :S)
[13:51:00] elukey: Thanks for the notification :)
[13:56:26] ack :)
[14:08:32] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Move turnilo to nodejs 10 - https://phabricator.wikimedia.org/T210705 (elukey) Before doing this, we probably need to run npm install for turnilo with the nodejs10... Just realized it
[14:09:47] o/ How can I install gensim for use in a SWAP notebook? Doing '!pip install gensim' doesn't seem to work.
[14:20:10] Analytics, Patch-For-Review, Services (done): Refinery Spark HiveExtensions schema merge should support merging of arrays with struct elements - https://phabricator.wikimedia.org/T210465 (Ottomata) Ah, @Pchelolo one more (hopefully easy) change please. @JAllemandou, we should have thought of this....
[14:20:55] bmansurov: that should be how
[14:20:59] what doesn't work?
[14:21:25] ottomata: importing the package into a script doesn't work
[14:21:30] says the package is not found
[14:22:33] "ImportError: No module named 'gensim'"
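(The log does not record how bmansurov fixed this below, but a frequent cause on SWAP-style setups is that `!pip` resolves to a different interpreter than the one backing the notebook kernel, so the package lands in the wrong environment. A hedged sketch of the usual workaround — installing with the kernel's own interpreter:)

```python
# Install into the environment the running kernel actually uses.
import subprocess
import sys

subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'gensim'])

import gensim  # should now import without the ImportError above
```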
[14:25:22] joal: do you know if we are collecting viewport size anywhere?
[14:25:57] Hi Seddon - I actually don't know about viewport size ?
[14:26:15] Seddon: If you tell me more, I might be able to answer (maybe)
[14:27:18] joal: it's the viewable size of the browser the user is on. So not display resolution but viewport resolution.
[14:29:04] Analytics, Patch-For-Review, Services (done): Refinery Spark HiveExtensions schema merge should support merging of arrays with struct elements - https://phabricator.wikimedia.org/T210465 (Ottomata) I made a PR: https://github.com/wikimedia/change-propagation/pull/298
[14:29:10] nm figured it out
[14:32:26] Seddon: I don't think we have that
[14:32:58] Seddon: Could be available in some specific events (if those events log them explicitly), but I'm not aware
[14:43:31] (PS1) Fdans: Update aqs to 57426a3 [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/477559
[14:43:55] ok joal this is the good one
[14:45:17] (CR) Joal: [V: 2 C: 2] "Let's have that thing working!" [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/477559 (owner: Fdans)
[14:45:32] \o/
[14:46:57] joal: pending this I can't deploy aqs yet... https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/477524/1/modules/admin/data/data.yaml
[14:47:13] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Create report for "articles with most contributors" in Wikistats2 - https://phabricator.wikimedia.org/T204965 (ezachte) Wonderful. I think bots can be excluded at all for this report. No need to make it optional.
[14:47:35] fdans: I'll do it and let you test?
[14:47:56] joal: yessir, I'll check 1004 when you tell me
[14:49:02] fdans: Let's apply the change in test data before breaking everything
[14:49:45] joal: test data?
[14:50:07] The update I sent on the aqs-deploy repo for values in test-data in cassandra
[14:51:37] Analytics, Analytics-Kanban, DBA, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Banyek) @Milimetric could you be a little bit more specific please? As I understood currently you run the querie...
[14:53:39] joal: sorry, I'm not sure what you mean, sorry I'm being thick
[14:56:13] fdans: The spec of the endpoint says: if you request data for 19700101 you should get those results https://github.com/wikimedia/analytics-aqs/blob/master/v1/unique-devices.yaml#L64-L87
[14:56:38] !log restart druid broker and historical on druid1001
[14:56:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:57:16] fdans: The patch you merged the other day was to make sure the manual script to generate test data was up-to-date (https://gerrit.wikimedia.org/r/c/analytics/aqs/deploy/+/477251)
[14:57:44] fdans: Now before deploying the patch that actually expects that data to be in cassandra, I need to manually update the test data
[14:58:20] joal: butbut the data matches, doesn't it?
[14:58:34] fdans: have you updated it?
[15:00:11] i mean this:
[15:00:15] https://www.irccloud.com/pastebin/K3XmLjQ1/
[15:00:21] matches this, right? :
[15:00:27] https://www.irccloud.com/pastebin/Es6jPC9M/
[15:00:31] joal
[15:00:48] fdans: https://gist.github.com/jobar/2c5e56eccca7861457721b9bfafe705c
[15:00:56] fdans: from prod --^
[15:01:29] OH
[15:01:30] fdans: Need to go for the kids - The values have been updated
[15:01:51] !log Update test values for uniques in cassandra before deploy
[15:01:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:02:00] thank you joal
[15:02:23] fdans: deployment.eqiad.wmnet is ready to be worked on if elukey has a minute
[15:02:31] fdans: or I'll do it just after standup
[15:02:37] gone for kiiiiiids !
[15:06:07] heyyy
[15:09:35] (PS2) Mforns: Allow for custom transforms in DataFrameToDruid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477295 (https://phabricator.wikimedia.org/T210099)
[15:11:16] (CR) jerkins-bot: [V: -1] Allow for custom transforms in DataFrameToDruid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477295 (https://phabricator.wikimedia.org/T210099) (owner: Mforns)
[15:13:12] (PS3) Mforns: Allow for custom transforms in DataFrameToDruid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477295 (https://phabricator.wikimedia.org/T210099)
[15:14:50] (CR) jerkins-bot: [V: -1] Allow for custom transforms in DataFrameToDruid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477295 (https://phabricator.wikimedia.org/T210099) (owner: Mforns)
[15:25:14] !log rolling restart of broker/historical/middlemanager on druid100[1-3] to pick up new logging settings
[15:25:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:28:06] joal: sorry, ended up getting distracted by a whole bunch of stuff
[15:28:10] and running into another meeting now
[15:33:20] (CR) Mforns: [C: 1] "Just spotted a typo in the comments, see inline. LGTM!" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/476220 (https://phabricator.wikimedia.org/T167539) (owner: Fdans)
[15:38:52] (PS3) Fdans: Add project families to uniques loading job [analytics/refinery] - https://gerrit.wikimedia.org/r/476220 (https://phabricator.wikimedia.org/T167539)
[15:39:05] Analytics, Analytics-EventLogging, EventBus, Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Decide whether to use schema references in the schema registry - https://phabricator.wikimedia.org/T206824 (Ottomata) > We have managed to create a global schema resolver, so now we will...
[15:39:22] mforns: nice catch! corrected now
[15:46:36] (CR) Mforns: [C: 2] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/476220 (https://phabricator.wikimedia.org/T167539) (owner: Fdans)
[15:48:58] Analytics, Analytics-Kanban, DBA, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Milimetric) Thanks to @Banyek for the questions and a talk we just had over hangouts, we decided to go in two di...
[15:52:40] (CR) Fdans: [V: 2] Add project families to uniques loading job [analytics/refinery] - https://gerrit.wikimedia.org/r/476220 (https://phabricator.wikimedia.org/T167539) (owner: Fdans)
[15:57:44] fdans: I'm assuming aqs has not been deployed, right?
[15:57:55] you assume right!
[15:58:26] ok - I'm gonna do it now during standup as kids are sick and I'll need to bring them to the doc after
[15:58:43] elukey: ok for that --^ ?
[15:59:49] Analytics, Analytics-Kanban, DBA, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Milimetric) 2. exact query for a materialized view that would allow us to import the revision table into Hadoop....
[16:00:01] a-team: both elukey and myself will be late to standup
[16:02:20] AH
[16:17:54] Analytics, Tool-Pageviews: Statistics for views of individual Wikimedia Commons images - https://phabricator.wikimedia.org/T210313 (Milimetric) Still, if we got all the media requests from Media Viewer into EventLogging, we would not have all media requests for mediawiki in general. To do that, we'd hav...
[17:11:37] Analytics, Services, Wikimedia-Stream, Patch-For-Review: EventStreams process occasionally OOMs - https://phabricator.wikimedia.org/T210741 (mobrovac) >>! In T210741#4795035, @Pchelolo wrote: > Mm.. I will not be that certain the deserialization of JSON is the issue here. We deserialize much much...
[17:12:00] !log cleanup logs on /var/log/druid on druid100[1-3] after change in log4j settings
[17:12:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:12:18] so in theory now we don't need the crons for pruning anymore etc..
[17:12:29] but I need to let a day pass to be sure :)
[17:12:40] will restart the daemons on druid public tomorrow if all goes well tonight
[17:12:59] ottomata: --^
[17:14:05] root fs on notebook1004 just filled up :(
[17:14:30] ah snap again
[17:14:31] :(
[17:14:34] checking ebernhardson
[17:15:41] i tried clearing out what i could find of my data in /tmp, but that only added up to ~100M
[17:17:15] ah sorry this is new, usually the srv fills up (where the home dirs are)
[17:17:26] checking with du the usual suspects
[17:17:53] spark was attempting to write to that fs and gave up on a no space left, but i was assuming that was trying to write to /tmp
[17:19:15] yeah it is your notebook :D
[17:19:17] 29G
[17:19:23] ! where is it writing?
[17:19:39] 29G systemd-private-cec57b14de5e438b8f3c043bfe66f1db-jupyter-ebernhardson-singleuser.service-KlO3zi
[17:19:44] (under /tmp)
[17:20:01] oh, interesting. I can't even enter that dir though
[17:20:27] can I delete it or do you need it?
[17:20:40] elukey: i think you can delete it, i don't know what could be there and can't look inside
[17:21:58] there is a tmp dir with some spark dirs
[17:22:04] snappy etc...
[17:22:19] yea, whatever might be there i imagine spark could re-calc from scratch
[17:23:04] oh, i wonder if .persist() on a non-yarn dataframe writes there. And if it clears it out
[17:24:01] done with the cleanup
[17:26:01] thanks!
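(On the `.persist()` hunch: in local mode Spark spills persisted blocks and shuffle files under `spark.local.dir`, which defaults to `/tmp` — consistent with a 29G `systemd-private-*/tmp` directory full of spark dirs. A sketch of pointing the spill directory at a bigger volume when building a local session; `/srv/spark-tmp` is an illustrative path, not an existing mount:)

```python
# Keep local-mode spill/persist data off the root filesystem.
# spark.local.dir must be set before the SparkContext is created.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master('local[4]')
         .config('spark.local.dir', '/srv/spark-tmp')  # illustrative path
         .getOrCreate())

df = spark.range(10**6).persist()  # blocks spilled to disk now land under /srv/spark-tmp
```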
[17:33:49] heya elukey - Are you around for a minute for me to deploy aqs?
[17:45:25] joal: sure
[17:45:42] elukey: shall I start now?
[17:46:20] +1
[17:46:27] Thanks :)
[17:48:10] joal: question, for the "family" edits metrics, is there any place where we are putting the docs on "deduplication" strategy (when pertinent) for edit related metrics?
[17:48:54] hmm nuria - I don't get the question :( deduplication strategy ?
[17:50:19] successful deploy on aqs1004, tests look good - rolling over to other machines
[17:50:46] !log Deploying aqs using scap for offset and underestimate values in unique-devices endpoints
[17:50:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:51:55] joal: sorry, do we have any "active users" metrics like "active editors" aggregated per "family"
[17:51:57] ?
[17:53:29] joal: those might have a rationale of 'what is counted as an active editor' across wikis
[17:53:39] Ah I get it nuria - we don't have digest-oriented metrics for project-families (meaning editors and edited-pages) Those require digest precomputation at project level
[17:53:56] at project-family level sorry
[17:53:58] joal: i see, so now we only have "edits" right?
[17:54:08] joal: which does not have this problem
[17:54:34] we have edits and byte-differences
[17:54:58] joal: ok, neither needs this deduplication, i see, makes sense
[17:56:03] nuria: currently checking if edited-pages/new and registered-users/new work by default
[17:57:48] nuria: We have enabled project-families for the two above (new pages and new registered users)
[17:59:52] nuria: it works cause new-pages don't need dedup, and because new-users use the self-created restriction, preventing auto-creations by central-auth from showing up
[18:00:06] nuria: makes sense?
[18:01:37] in the end, we miss families for editors and edited-pages aggregated metrics (we can even use families for top metrics)
[18:02:05] but it doesn't make a lot of sense in terms of pages (more for users)
[18:03:52] I confirm top works for project families
[18:04:02] joal: sorry, new-pages makes sense but not registered users
[18:04:12] why?
[18:04:24] joal: you cannot register in two wikis with same user, is that what you mean?
[18:04:47] nuria: well now with central-auth you can, you're the same user, and it's done automagically for you
[18:04:59] Deployment finished elukey :)
[18:05:02] joal: ah i see, but how about on past metrics?
[18:05:07] Thanks for having kept an eye
[18:05:53] joal: for say 2014 metrics on new registered users some deduplication would be needed, no?
[18:06:19] nuria: well, before the mess-cleaning at the time of central auth, users with the same name in 2 wikis could be either 2 users, or the same - and we don't know
[18:06:33] Analytics, DBA, Data-Services, User-Elukey: Hardware for cloud db replicas for analytics usage - https://phabricator.wikimedia.org/T210749 (elukey) Opened a procurement task for 1 Cloudb replica in T211135. We are not planning to buy two hosts with the following assumption: 1) We import data onc...
[18:08:15] fdans: aqs deploy done
[18:08:21] joal: ok, i see, since the metric is available for project families, let's please document the caveats, not sure where.. here? https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2/Metrics_Definition#Newly_Registered_Users
[18:08:50] yesssss awesome thank you joal
[18:08:50] nuria: Will do it here, yes
[18:08:55] np fdans :)
[18:09:02] fdans: sorry for the delay :S
[18:09:32] nonono I'm getting my perms any time now so I won't be bothering anymore with aqs deploys
[18:09:47] And by the way nuria, I'll add the project-family bit on the metrics that need it on that page, ok ?
[18:10:12] * joal feels the power flowing to fdans :)
[18:10:53] joal: ok, sounds good, also can fdans add the per family examples that are working now to the examples page: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[18:11:34] joal: all good with aqs right?
[18:14:59] elukey: charts look ok
[18:15:04] yep
[18:15:21] joal: whenever you have time, check /var/log/druid on druid1001
[18:15:49] * joal prepares for the cleanliness after the mess
[18:16:30] even if I think I still need to figure out a solution to cap the number of rotated log files
[18:16:40] because it seems not to work
[18:16:45] elukey: for metrics, it would be good yes
[18:16:56] yup, seems not to work indeed
[18:17:15] but now we don't have the 2018-XX-XX.log anymore
[18:17:22] and we have -request.log instead
[18:17:26] and also -access.log
[18:19:08] Super cool elukey
[18:19:32] elukey: I think access will be less useful than request, but who knows
[18:19:35] :)
[18:19:40] Thanks again for that :)
[18:19:45] :)
[18:25:27] elukey: super nice
[18:26:38] !log reenabled refinement of mediawiki_revision_score
[18:26:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:27:39] nuria: \o/
[18:27:55] joal: it seems that the max files works, I forgot to delete the -metrics ones
[18:28:02] the other hosts look fine
[18:28:04] let's see tomorrow
[18:28:05] :)
[18:28:17] ottomata: wanted to say - Sorry for not having foreseen the array-issue :(
[18:28:26] and thanks ottomata for the change
[18:28:39] * joal feels bad about the wrong advice
[18:28:49] Great elukey :)
[18:30:58] joal: i didn't see it either!
[18:31:05] i mean, we should have tested but I think we are getting impatient
[18:31:14] s/we/I/g :)
[18:38:44] * elukey goes afk!
[19:10:58] (PS1) Ottomata: HiveExtensions normalize should convert all bad chars to underscores [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477614
[19:35:00] ottomata, do you have 10 mins to brainbounce on a solution to the ELSanitization RefineMonitor problem? Found the problem, but struggling with the solution...
[19:35:04] can be later!
[19:38:00] mforns: for sure!
[19:38:03] now is good
[19:38:05] ok
[19:38:10] IRC or bc?
[19:38:50] either
[19:38:53] bc y not
[19:38:54] mforns: just saw that, makes total sense and i wonder if the sanitization process should not be in charge of marking those dirs as _SANITIZED so we can tell apart not existing dirs (that should exist) versus dirs that should not be full with data
[19:38:55] ok let's start here
[19:39:00] oh ok
[19:39:10] mforns: totally unflushed thought though
[19:39:24] mforns: where in puppet is the sanitize job declared?
[19:39:34] in data_drop.pp
[19:40:01] data_purge ya?
[19:40:04] nuria, I'm not sure _SANITIZED files are enough to avoid RefineMonitor alarms
[19:40:08] sorry yea
[19:40:20] it's a refine_job
[19:40:38] mforns: ok cool
[19:40:53] ottomata, the problem is ELSanitization only refines part of the available partitions, depending on the whitelist
[19:41:04] while RefineMonitor does not know about the whitelist
[19:41:20] and you need to tell it to only check the tables that are in whitelist.yaml?
[19:41:23] there's the table_whitelist_regex param that we could use
[19:41:27] yes
[19:41:54] but I can't imagine a way to extract the list of tables from the EL whitelist and pass them to RefineMonitor in puppet???
[19:42:24] I thought first of using a hacky bash one-liner
[19:42:32] i'm thinking of hacky things too.....
[19:42:38] since the job_opts get rendered in a bash script...
[19:42:43] Analytics, Analytics-Kanban, DBA, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Bstorm) To sum up discussion with @Banyek this morning: From the Cloud perspective this is a problem with the re...
[19:42:48] but when passing that to the refine_job, that would write it into a properties file, and not evaluate
[19:43:13] the contents of a property file do not evaluate in bash right?
[19:44:18] nuria, I think the idea of creating "empty" sanitized tables for those tables that are not in the whitelist is ok!
[19:44:29] but we would have tons of empty tables in event_sanitized...
[19:45:08] mforns: well empty partitions
[19:45:12] right?
[19:45:19] I believe (ottomata correct me if I'm wrong) that RefineMonitor directly checks the database for existing data..
[19:45:46] mforns: i'm pretty sure it checks only files
[19:45:51] oh
[19:46:04] mforns: job_config goes to the config file
[19:46:13] but, job_opts is what has --config_file and passes to spark_job
[19:46:37] so, we could make an $extra_job_opts param
[19:46:39] in refine_job
[19:46:44] ottomata, but not to RefineMonitor no?
[19:46:47] that will get rendered in the wrapper script
[19:46:56] well, they'd both use it
[19:47:12] would it hurt if sanitize spark_job set --table_whitelist_regex ?
[19:47:16] probably not, i think it would just be ignored
[19:47:37] ottomata, no it would not hurt
[19:47:44] I checked, extra params are ok
[19:48:05] but how to compile that list from the whitelist in puppet?
[19:49:01] I like the $extra_job_opts idea
[19:49:21] and those could use the hacky bash one-liner I guess
[19:49:49] yeah
[19:49:51] it's very hacky tho
[19:49:57] yeaaaa
[19:51:05] mforns:
[19:51:09] cat static_data/eventlogging/whitelist.yaml | grep -E '^\w+:$' | grep -v __defaults__ | sed 's@:$@@g' | paste -sd '|' -
[19:51:10] ?
[19:51:12] so hacky
[19:51:21] extra_job_opts => $(cat static_data/eventlogging/whitelist.yaml | grep -E '^\w+:$' | grep -v __defaults__ | sed 's@:$@@g' | paste -sd '|' -)
[19:51:36] maybe a yaml parser would be nicer...
[19:51:46] i have yq installed locally, but not avail in debian :/
[19:51:50] hmm i could make a package
[19:51:56] man that would be nice to have in general
[19:52:42] ottomata: RefineMonitor code is in refinery or elsewhere?
[19:52:51] cat static_data/eventlogging/whitelist.yaml | yq -r 'keys[]' | grep -v __defaults__ | paste -sd '|' -
[19:52:53] ayayay
[19:52:54] it's RefinerySource
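(The same extraction without the grep/sed fragility, in Python for reference — a sketch assuming, as the one-liners above do, that the whitelist is a flat mapping whose top-level keys are table names plus a `__defaults__` entry:)

```python
# Build RefineMonitor's table_whitelist_regex from the EL sanitization whitelist.
import yaml

with open('static_data/eventlogging/whitelist.yaml') as f:
    whitelist = yaml.safe_load(f)

table_whitelist_regex = '|'.join(t for t in whitelist if t != '__defaults__')
print(table_whitelist_regex)
```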
[19:54:47] mforns: i suppose you could just run RefineMonitor from your Sanitize code directly
[19:54:53] similar to how you run Refine from it
[19:55:06] then you could pass it the whitelist regex in code, since you already parsed the whitelist yaml file
[19:55:43] you'd have all the same relevant configs anyway, since the current RefineMonitor command is using the same config file
[19:55:47] as your Refine job
[19:55:52] mforns, ottomata: you could also extend RefineMonitor and call it SanitizeMonitor and have it read the whitelist
[19:56:26] hm, and then pass a monitor_job_class
[19:56:28] that would work too
[19:56:36] might be more flexible mforns
[19:57:26] ottomata, both options make sense...
[19:58:02] mforns, ottomata: actually (filter style) you can have a job that is RefineMonitor and adds another filter that is whitelist based
[19:58:08] (PS2) Ottomata: HiveExtensions normalize should convert all bad chars to underscores [analytics/refinery/source] - https://gerrit.wikimedia.org/r/477614
[19:58:10] maybe calling RefineMonitor from ELSanitization is cleaner no?
[19:58:16] refineMonitor.apply(currentFilter).apply(whitelistFilter)
[19:58:19] less parameter passing involved
[19:58:24] mforns: true
[19:58:48] Analytics, Analytics-Kanban, DBA, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Milimetric) +1 to @Bstorm's framing of the problem. I was going to say the same thing, and withdraw my revision...
[19:58:55] nuria: doing that would require that we do both these things! (have Sanitization call RefineMonitor)
[19:59:02] mforns: i think I'm fine with either way
[19:59:19] ok ottomata nuria I will consider both options and try to implement
[19:59:39] coo
[19:59:44] wait, no, we just need a new class that calls RefineMonitor and checks the output a 2nd time with the whitelist constraint
[19:59:48] sorry
[19:59:49] ya that'll be way better than a fragile hacky bash command
[19:59:52] not what i meant
[20:02:19] ya, the fragile bash command passed as optional arguments that are not really optional is a bit hacky cc ottomata mforns
[20:05:56] nuria, but if we call RefineMonitor from inside ELSanitization.scala, then we're fine, because RefineMonitor already accepts a table whitelist
[20:06:07] and ELSanitization already computes that whitelist
[20:06:47] ottomata, the only difference would be that RefineMonitor would run every hour... as opposed to once a day
[20:11:33] hm, aye. prob not a big deal, but yeah still annoying
[20:11:38] maybe in that case the other way is better
[20:11:46] and more flexible too, because we can just use any refine monitor class
[20:11:52] if we have another problem like this again in the future
[20:11:57] the extend-RefineMonitor idea
[20:12:13] aha, ok will try
[20:13:01] Analytics, Analytics-Kanban, DBA, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Milimetric) And to follow up on my first bullet from before: 1. after more closely looking at our temporary sol...
[20:17:36] Analytics, DBA, Data-Services, User-Elukey: Hardware for cloud db replicas for analytics usage - https://phabricator.wikimedia.org/T210749 (Milimetric) The other reason for a single host instead of redundant hosts is this: our only critical use of the box is during the first few days of the month...
[20:25:03] Analytics, Research: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (bmansurov)
[20:27:02] Analytics, Research: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (bmansurov) @Nuria hopefully the new task description makes it clear what we're trying to achieve. Please let me know if anything else needs clarification. Thanks!
[21:31:16] ottomata: o/ When I get the recommendation data ready (hopefully, tomorrow), would you help me import it to a MySQL database?
[21:31:50] It would involve reading the mysql password from the private puppet repo and running a Python import script.
[21:34:09] Quarry, Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564 (Framawiki) >>! In T188564#4796104, @zhuyifei1999 wrote: >>>! In T188564#4795463, @Framawiki wrote: >> The problem is that the recording at the end of the workers' task is to...
[21:37:36] Analytics, Operations, SRE-Access-Requests, Patch-For-Review: Grant fdans permissions to deploy AQS in prod, and accessing the aqs hosts - https://phabricator.wikimedia.org/T211095 (jijiki) p: Triage>Normal
[21:38:22] Analytics, Operations, SRE-Access-Requests, Patch-For-Review: Grant fdans permissions to deploy AQS in prod, and accessing the aqs hosts - https://phabricator.wikimedia.org/T211095 (jijiki) Pending approval from SRE meeting
[21:46:22] Analytics, Research: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (Nuria) Thanks for the description. It helps. I think a design doc for this will help you clarify what you expect from each team. It does not have to be very detailed but eno...
[21:47:35] bmansurov: looked at the ticket and it is more clear, there are still a few questions and I think having a design document will help you interact with teams as you will need to do some coordination with SRE teams and DBAs at least
[21:48:05] nuria: thanks, most of the work is already done, the last bit is to import the data
[21:48:22] nuria: I'm already talking to DBAs and SRE team members
[21:49:56] Analytics, Analytics-Kanban: Refactor Sqoop, join actor and comment from analytics replicas - https://phabricator.wikimedia.org/T210522 (Milimetric) I have done a limited test on 3 wikis: etwiki, simplewiki, and hawiktionary. Description of the test and results: * sqoop 2018-09 snapshot with new sqoop...
[21:50:52] bmansurov: from your update these are the things that are remaining: 1) process to generate data on the cluster 2) process to move those tsvs to mysql 3) process to load that data and "generate" new tables 4) process to delete old data and 5) strategy for rollback for the system
[21:51:47] bmansurov: we can probably work on a process to generate the data in the cluster but that gets you just the tsv files on the mysql machine, quite a bit will remain in terms of data ingestion on the mysql end
[21:52:38] bmansurov: let me know if this makes sense?
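(For step 1 and 2 of nuria's list — producing the tsv files cluster-side — a hypothetical sketch of the export in pyspark; the table name, columns, and output path are made up for illustration, not taken from bmansurov's actual job:)

```python
# Hypothetical export of recommendation rows as TSV, ready for a LOAD DATA /
# mysqlimport step on the MySQL side.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

recs = spark.sql(
    'SELECT wikidata_id, target_wiki, score '      # made-up columns
    'FROM bmansurov.article_recommendations')      # made-up table

(recs.coalesce(1)                                  # single file simplifies the import
     .write.mode('overwrite')
     .option('sep', '\t')
     .csv('/user/bmansurov/recommendations_tsv'))  # made-up HDFS path
```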
[21:53:11] nuria: yeah makes sense. Ingestion part is also done, it's hidden in the gerrit repo. But I see your point, I'll make these clearer.
[22:02:13] bmansurov: ok, another thing you can start doing is working on getting your oozie jobs ready that would run pyspark
[22:02:49] bmansurov: makes sense?
[22:03:21] bmansurov: sorry missed your ping!
[22:03:24] for sure i can help!
[22:05:22] Analytics, Operations, Performance-Team, Traffic: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (jijiki) p: Triage>Normal
[22:05:30] bmansurov: oozie is the scheduler that will run your jobs, also i would look into having guards for data (programmatic criteria to know your data is good) so you can execute them as part of your job; this probably requires persisting your data to hive so you can run your guards, hopefully this makes sense, let us know otherwise
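(A minimal sketch of the kind of data guard nuria describes — fail the job, and with it the oozie action, when the freshly produced data looks wrong; the table, snapshot, and threshold below are made up:)

```python
# Minimal data guard: abort if the new snapshot looks empty or suspiciously small.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

row_count = spark.sql(
    "SELECT COUNT(*) AS c FROM bmansurov.article_recommendations "  # made-up table
    "WHERE snapshot = '2018-12'").first().c

if row_count < 1000:  # arbitrary sanity threshold
    raise RuntimeError('Guard failed: only {} rows in snapshot 2018-12'.format(row_count))
```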
[22:09:15] ottomata: thanks, I'll ping you when the data is ready.
[22:10:04] nuria: sure, I'll have to pair with someone from Analytics to get started on oozie. Given everyone's busy this time of the year, I'm hoping to do so come January.
[22:10:17] Analytics, Tool-Pageviews: Statistics for views of individual Wikimedia Commons images - https://phabricator.wikimedia.org/T210313 (Nuria) @Milimetric Right. I see your point, there needs to be some parsing of the firehose of requests cause not all media consumption can be "eventy-fied" (true for images...
[22:10:42] bmansurov: you can get started following the handy guide that we made: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie
[22:10:53] nuria: awesome!
[22:11:09] bmansurov: it has tons of examples and you can start running a baby version of your job
[22:11:26] bmansurov: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie#Running_a_real_oozie_example
[22:11:45] nuria: great
[22:11:54] bmansurov: if stuck, asking for help here is best, everyone on the team uses oozie so we all can help
[22:12:37] nuria: OK, I'll ask for help here.
[22:41:56] Analytics, Patch-For-Review, Services (done): Refinery Spark HiveExtensions schema merge should support merging of arrays with struct elements - https://phabricator.wikimedia.org/T210465 (Ottomata) LOOKING GOOD! ` 18/12/04 22:20:58 INFO DataFrameToHive: Writing DataFrame to /wmf/data/event/mediawiki...
[22:42:53] Analytics, Analytics-Kanban, Patch-For-Review, Services (done): Refinery Spark HiveExtensions schema merge should support merging of arrays with struct elements - https://phabricator.wikimedia.org/T210465 (Ottomata) p: Triage>Normal a: Ottomata
[22:47:26] I dunno if anyone has seen this or has an idea, but in jupyter on notebook1004, `multiprocessing.RLock()` results in an `OSError: Read-only file system`; doing the same thing from python3 or ipython3 in a shell has no error
[23:22:09] fwiw it also fails on notebook1003, but works fine in PAWS
[23:47:34] Analytics, Analytics-SWAP: Cannot instantiate multiprocessing.RLock - https://phabricator.wikimedia.org/T211163 (EBernhardson)
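(A plausible explanation for T211163, not confirmed in this log: `multiprocessing.RLock` is backed by a POSIX semaphore that CPython creates under `/dev/shm`, so a systemd sandbox that mounts `/dev/shm` read-only for the singleuser notebook service would produce exactly this `OSError` while plain shells stay unaffected. A quick check of that hypothesis from inside a notebook:)

```python
# Check whether /dev/shm (where POSIX semaphores live on Linux) is writable
# inside the notebook process.
import multiprocessing
import tempfile

try:
    multiprocessing.RLock()
    print('RLock OK')
except OSError as e:
    print('RLock failed:', e)

try:
    with tempfile.NamedTemporaryFile(dir='/dev/shm'):
        print('/dev/shm is writable')
except OSError as e:
    # If this fails too, the read-only filesystem is /dev/shm itself,
    # pointing at the service's mount sandbox rather than at Python.
    print('/dev/shm not writable:', e)
```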