[00:00:46] kostajh: and what prevents us from waiting for the test environment to be back up and testing there? [00:01:29] nuria: afaik, beta doesn't have DB replication, which I think is why we didn't notice the out-of-order events on deferred updates, and it also doesn't have the same Redis config as production (the cache value issue) [00:02:00] kostajh: i see, but your events validate on beta then and you have tested that they are emitted, correct? [00:02:18] nuria: yes, no problems there [00:03:26] kostajh: i see, and if your fix were not to work, the state of affairs would be like it was before, i.e. events validating but out of sequence [00:04:04] nuria: right, out of sequence. And also, we are hashing some of the data that we log, and that depends on retrieving a value from redis. retrieving worked fine in Beta but we ran into some issues in production [00:04:38] kostajh: and if things were not to work, what is the path? [00:05:08] nuria: you mean, if cache retrieval fails? [00:05:26] kostajh: no, if your fix does not do what you expect, are you redeploying it today? [00:05:47] nuria: if the fix doesn't work, we will leave it alone until the next swat window, next week [00:05:52] 10Analytics, 10Analytics-Kanban, 10New-Readers, 10Patch-For-Review: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (10atgo) I've asked the social media folks to give me the data for clicks that they're seeing on those platforms to compare to what we're seeing in Piwik. [00:06:45] nuria: some background: https://phabricator.wikimedia.org/T210003 and https://phabricator.wikimedia.org/T210004 [00:06:49] kostajh: then i would prefer that your team verify the fix, given that no rollback is needed if it were not to work, and if there is a fix you will not deploy it right away [00:12:07] kostajh: meaning that it is really not urgent [00:17:04] nuria: we decided not to deploy it. We'll push it out next week. 
thanks for your time & consideration [00:22:56] kostajh: ok, as you think is best [06:11:48] 10Analytics, 10Product-Analytics: Event counts from Mysql and Hive don't match - https://phabricator.wikimedia.org/T210006 (10Nuria) a:03Nuria [07:39:43] https://www.apache.org/dyn/closer.lua/bigtop/bigtop-1.3.0/ [07:39:44] :) [08:19:04] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10elukey) >>! In T203669#4763197, @AndyRussG wrote: >>>! In T203669#4731161, @elukey wrote: >> @AndyRussG @Jseddon Hi! So I have something to show to you in: https://tu... [08:20:05] joal: o/ - do you think that it could work --^ ? [09:45:16] I was able to get deployment-server.analytics.wmflabs kinda working [09:45:28] in theory now we have a deployment server in there [09:45:43] (so we don't rely anymore on deployment-prep) [10:35:00] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: eventlogging logs taking a huge amount of space on eventlog1002 and stat1005 - https://phabricator.wikimedia.org/T206542 (10elukey) >>! In T206542#4707202, @Nuria wrote: > Did we update docs with the new location for logs older than 90 days? Added a line... [11:23:47] (03PS3) 10Fdans: Add offset and underestimate to uniques loading job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/473746 (https://phabricator.wikimedia.org/T167539) [11:25:09] joal: helloooo you think we can merge this and backfill the new fields? [11:29:22] fdans: kids day, he'll be off until later on in the afternoon :) [11:30:03] elukey: yesyes, just leaving this here for when he's around, thank you luca :) [11:30:11] ack :) [11:39:25] hallo [11:39:48] a question about https://phabricator.wikimedia.org/T189475#4706502 and the comment after it [11:40:01] I'm testing things first at research_prod@db1108.eqiad.wmnet [11:40:22] can I do "create table" there to test that it works? 
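[Editor's note] Regarding the "create table" test asked about above: a minimal sketch of what such a throwaway check in the staging database could look like. The table and column names here are hypothetical, chosen only to illustrate verifying CREATE TABLE rights; the table should be dropped afterwards.

```sql
-- Hypothetical scratch table to verify CREATE TABLE rights in the shared
-- staging database (names invented for this sketch).
USE staging;

CREATE TABLE aharoni_create_test (
  id INT UNSIGNED NOT NULL,
  note VARCHAR(255),
  PRIMARY KEY (id)
);

-- Round-trip a row to confirm write and read access work.
INSERT INTO aharoni_create_test VALUES (1, 'permissions check');
SELECT * FROM aharoni_create_test;

-- Clean up so the shared staging database stays tidy.
DROP TABLE aharoni_create_test;
```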
[11:41:27] I'm connecting using `mysql -pnotactuallypassword -u research_prod -h db1108.eqiad.wmnet` from mwmaint1002 [11:41:52] fdans joal ^ [11:43:09] aharoni: hi! :) Did you see marcel's answer? [11:43:32] elukey: yeah, but it says: [11:43:35] If we store the data in the 'staging' database in analytics-store host [11:43:45] is it the same as research_prod@db1108.eqiad.wmnet? [11:44:10] ahhh yes yes he confused the names, feel free to create a table in the staging db on db1108 [11:44:20] (that is analytics-slave, analytics-store is dbstore1002) [11:44:27] (but it's easy to confuse the names :) [11:45:10] need to go now for an errand (sorry), will read later on so if you have any trouble feel free to write in here :) [11:45:17] * elukey lunch + errand! [11:45:36] elukey: thanks [12:01:25] \o [12:06:22] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10JAllemandou) >> We could add an event property with the value calculated per event--that is, the 1/sample rate. I imagine that float value could be put directly in Dr... [12:11:35] (03CR) 10Joal: [C: 04-1] "A copy-paste error, then I think we're good" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/473746 (https://phabricator.wikimedia.org/T167539) (owner: 10Fdans) [12:11:45] fdans: --^ [12:29:48] ouioui [12:31:52] (03PS4) 10Fdans: Add offset and underestimate to uniques loading job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/473746 (https://phabricator.wikimedia.org/T167539) [12:39:33] thanks for the rev joal, want to merge this and start the jobs? 
[12:39:47] this is what I'm thinking: [12:40:00] 1 - deploy refinery [12:40:52] 2 - kill uniques jobs [12:40:52] 3 - start jobs from beginning of time up to today (backfill) [12:41:12] 4 - restart jobs from today [12:41:55] fdans: not now, might have to leave any time [12:42:22] cool, anytime you're available joal :) [12:43:15] fdans: For backfilling we can start manually (without deploy), as data presence won't impact prod [12:43:55] fdans: About the regular job, I'd like to wait until end of month, as the cassandra bundle needs to be started the 1st [12:44:05] ohhh right [12:44:22] fdans: We can nonetheless do as you suggest, allowing us to move faster [12:44:44] joal: is it ok if I start the backfill now then? [12:44:52] fdans: Let's backfill up to now, when this is done we'll readjust [12:44:53] :) [12:45:00] cooool [12:45:34] joal: let's merge and I'll checkout refinery from my home dir and launch the jobs from there? [12:47:07] (03CR) 10Joal: [V: 032 C: 032] "Let's go !" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/473746 (https://phabricator.wikimedia.org/T167539) (owner: 10Fdans) [12:48:04] fdans: merged - You can backfill (please be careful, for instance make a job with 1 day, once it's done check the data in AQS AND in cassandra, then full start) [12:48:24] yessir [12:48:29] Thank you :) [12:54:34] !log testing backfill of daily uniques in production for 2018-11-13 [12:54:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:01:25] looking good on aqs [13:05:34] looking good on cassandra [13:05:36] !log test backfill on 13 Nov daily uniques successful [13:05:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:06:25] !log launching backfilling jobs for daily and monthly uniques from beginning of time until Nov 20 [13:06:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:33:24] (03CR) 10WMDE-leszek: "Something like this might be of use, but 
let's better do it when needed, and how needed. This one has been a bit too hacky maybe?" [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/440133 (https://phabricator.wikimedia.org/T196883) (owner: 10WMDE-leszek) [14:33:27] (03CR) 10WMDE-leszek: "will abandon, will check if CI runs for patches now first though :)" [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/440133 (https://phabricator.wikimedia.org/T196883) (owner: 10WMDE-leszek) [14:33:58] (03CR) 10WMDE-leszek: "recheck" [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/440133 (https://phabricator.wikimedia.org/T196883) (owner: 10WMDE-leszek) [14:34:32] (03Abandoned) 10WMDE-leszek: Added item.label.total metric [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/440133 (https://phabricator.wikimedia.org/T196883) (owner: 10WMDE-leszek) [14:34:41] (03Abandoned) 10WMDE-leszek: Add item.label.length.avg metric [analytics/wmde/toolkit-analyzer] - 10https://gerrit.wikimedia.org/r/440876 (https://phabricator.wikimedia.org/T196883) (owner: 10WMDE-leszek) [14:52:03] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10elukey) >>! In T203669#4765604, @JAllemandou wrote: >>> We could add an event property with the value calculated per event--that is, the 1/sample rate. I imagine that... [14:59:04] joal: --^ \o/ [15:39:20] nuria: scrum of scrums conflicts with 1on1, sorry just noticed. I will write our update in the SoS etherpad and read everyone else's after the meeting. [15:53:06] milimetric: sounds fine [15:55:54] a-team: i am going to move standup half an hour down today, sorry about the trouble but i have a conflict. 
[16:02:49] 10Analytics, 10Pageviews-API: Pageviews top endpoint in descending order as of 2018-11-20 - https://phabricator.wikimedia.org/T210091 (10MusikAnimal) [16:06:10] 10Analytics, 10EventBus, 10Core Platform Team Kanban (Done with CPT), 10MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), 10Services (done): EventBus extension started emitting rev_count as a string - https://phabricator.wikimedia.org/T210013 (10mobrovac) 05Open>03Resolved a:03Pchelolo [16:31:42] nuria: I think it's ok to skip standup and send an escrum if you have a conflict, no? [16:53:15] hey y'all. Not sure if there's an easy fix for T210091 that you can make before the long holiday break? [16:53:16] T210091: Pageviews top endpoint in descending order as of 2018-11-20 - https://phabricator.wikimedia.org/T210091 [16:58:15] fdans: I prefer moving standup cause we have two holiday days ahead and i will not be able to attend the next two [16:58:49] musikanimal: whatatata? we totally need to look at that, thanks for pointing it out [16:59:49] nuria ah fair enough [17:01:42] mforns: the only code change i could find about sanitization is this one: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454562/ but i think this is the patch you mentioned you will abandon? [17:02:06] nuria, yes [17:02:49] Will be there in 1/2 hour for standup :) It actually makes my life easier as well [17:04:26] musikanimal: I'll look into that sort right now, cc fdans, joal [17:04:35] thanks! [17:12:16] 10Analytics, 10Analytics-Kanban, 10Pageviews-API: Pageviews top endpoint in descending order as of 2018-11-20 - https://phabricator.wikimedia.org/T210091 (10Milimetric) p:05Triage>03High a:03Milimetric [17:12:25] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10mforns) @elukey > Confirmed that it works! I used recordImpressionEventSampleRate as measure, everything works like a charm (caveat: the datasource in turnilo needs... 
[17:14:03] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10elukey) @AndyRussG Would it be possible to add the new 1/sample-rate field to the schema? [17:26:38] elukey, does the error java.lang.NoClassDefFoundError: org/apache/hadoop/hive/common/auth/HiveAuthUtils ring a bell to you? [17:28:41] mforns: IIRC it is the problem Andrew solved by temporarily downgrading the used jars for spark-refine [17:29:27] joal, makes sense [17:30:45] a-team: will be 5 minutes late to standup [17:30:52] np nuria [17:38:15] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: eventlogging logs taking a huge amount of space on eventlog1002 and stat1005 - https://phabricator.wikimedia.org/T206542 (10Nuria) 05Open>03Resolved [17:55:13] 10Analytics: druid ingestion should calculate 1/sample rate to be able to normalize event counts - https://phabricator.wikimedia.org/T210099 (10Nuria) [17:56:34] 10Analytics: druid ingestion should calculate 1/sample rate to be able to normalize event counts - https://phabricator.wikimedia.org/T210099 (10Nuria) a:03mforns [18:03:51] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10mforns) @AndyRussG We discussed in our daily meeting about this, and decided to modify our codebase to adapt to your needs, so that we can ingest 1/sampleRate as a n... [18:06:13] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10RobH) p:05Triage>03Normal [18:08:57] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10RobH) a:03Niharika @niharika: Please review https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups It seems that Hive has both the public and priva... 
[18:09:11] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10RobH) [18:20:50] * elukey off! [18:30:47] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10Niharika) @RobH Currently I only need access to Eventlogging data for TemplateWizard (As mentioned in task). I don't know if it's public or private - @Milimetric can pe... [18:32:40] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10DannyH) Yes, I approve. [18:40:08] 10Analytics, 10Analytics-Kanban, 10Pageviews-API: Pageviews top endpoint in descending order as of 2018-11-20 - https://phabricator.wikimedia.org/T210091 (10Nuria) Problem found: it was due to a deployment we did yesterday that changed one of our underlying dependencies; we will file a bug with the services team for... [18:48:57] mforns_brb: this is the issue with hive that broke refine after upgrade for which andrew needed to put the "older" jars on path: https://phabricator.wikimedia.org/T209407 [18:58:43] (03PS1) 10Milimetric: Sort articles by rank in top endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/475131 (https://phabricator.wikimedia.org/T210091) [18:59:46] (03CR) 10Framawiki: Add "/.health/summary/v1/" API endpoint (031 comment) [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/474532 (https://phabricator.wikimedia.org/T205151) (owner: 10Rafidaslam) [19:03:28] (03CR) 10Milimetric: "shall we deploy this if it's good?" 
[analytics/aqs] - 10https://gerrit.wikimedia.org/r/475131 (https://phabricator.wikimedia.org/T210091) (owner: 10Milimetric) [19:04:17] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10RobH) a:05Niharika>03Milimetric @Milimetric: I'm assigning to you for feedback on if @nikarika needs the private-data version or not. Please advise and unassign yo... [19:04:46] ok, filed the task with services, pushed the patch for us [19:06:44] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10Milimetric) Private. Niharika would benefit from being a part of analytics-privatedata-users, including access to data before it's sanitized. [19:06:50] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10Milimetric) a:05Milimetric>03None [19:12:33] a-team: if someone can review my patch, we can still deploy today? [19:12:45] https://gerrit.wikimedia.org/r/#/c/analytics/aqs/+/475131/ [19:18:28] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow access to Data Lake/Hive for Niharika - https://phabricator.wikimedia.org/T210022 (10RobH) >>! In T210022#4766563, @Milimetric wrote: > Private. Niharika would benefit from being a part of analytics-privatedata-users, including access to data before it... [19:20:16] 10Analytics, 10Analytics-Kanban, 10Core Platform Team, 10Core Platform Team Backlog, and 2 others: Not able to scoop comment table in labs for mediawiki reconstruction process - https://phabricator.wikimedia.org/T209031 (10Anomie) >>! In T209031#4763909, @Bawolff wrote: > I was assuming bases on this comme... [19:21:44] (03CR) 10Nuria: [C: 032] "Looks good, I take it unit tests succeed?" 
(031 comment) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/475131 (https://phabricator.wikimedia.org/T210091) (owner: 10Milimetric) [19:22:10] yep, unit tests work nuria [19:23:31] milimetric: and rank is defined always, correct? [19:24:01] 10Analytics, 10Analytics-EventLogging, 10Front-end-Standards-Group, 10MediaWiki-extensions-WikimediaEvents, 10Readers-Web-Backlog: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (10Jdlrobson) [19:24:08] (03CR) 10Mforns: [C: 031] "LGTM as well, Nuria's question makes sense though" [analytics/aqs] - 10https://gerrit.wikimedia.org/r/475131 (https://phabricator.wikimedia.org/T210091) (owner: 10Milimetric) [19:24:17] (03CR) 10Milimetric: Sort articles by rank in top endpoint (031 comment) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/475131 (https://phabricator.wikimedia.org/T210091) (owner: 10Milimetric) [19:24:32] nuria: yes, but it's ok even if it's not [19:24:44] it only does weird things if many entries have rank: null, which shouldn't be possible [19:24:53] ok, so I'll merge and deploy this then [19:24:59] milimetric: i will merge [19:25:03] (03CR) 10Nuria: [V: 032 C: 032] Sort articles by rank in top endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/475131 (https://phabricator.wikimedia.org/T210091) (owner: 10Milimetric) [19:37:21] hm, I get this when building aqs, any thoughts: [19:37:22] https://www.irccloud.com/pastebin/vAeTlovZ/ [19:37:38] (running sudo ./server.js build --deploy-repo --force -c config.test.yaml) [19:41:35] milimetric, maybe not sudo? dunno, don't remember having this problem when Fran and I deployed last [19:42:10] mforns: thanks, yeah, fixing my docker now to work without sudo [19:42:54] milimetric, oh yea, I remember having to run a command and restart session for docker to work properly... 
[19:43:36] I’ll update docs if this works [19:47:11] (03PS1) 10Milimetric: Update aqs to 402e9ae [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/475141 [19:47:34] (03CR) 10Milimetric: [V: 032 C: 032] Update aqs to 402e9ae [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/475141 (owner: 10Milimetric) [19:49:13] !log deploying AQS [19:49:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:50:01] milimetric: Let's test on aqs1004 once canary deploy is finished before moving on to the rest ;) [19:50:10] yep, that's how I roll [19:50:16] Great [19:55:08] ok joal, confirmed all good, and deploy is done [19:55:19] awesome milimetric :) Many thanks for that ! [19:59:56] (03PS1) 10MarcoAurelio: Bump dependencies in requirements.txt to their latest versions [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/475142 (https://phabricator.wikimedia.org/T209945) [20:00:32] (03CR) 10jerkins-bot: [V: 04-1] Bump dependencies in requirements.txt to their latest versions [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/475142 (https://phabricator.wikimedia.org/T209945) (owner: 10MarcoAurelio) [20:01:26] (03PS2) 10MarcoAurelio: Bump dependencies in requirements.txt to their latest versions [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/475142 (https://phabricator.wikimedia.org/T209945) [20:01:59] (03CR) 10jerkins-bot: [V: 04-1] Bump dependencies in requirements.txt to their latest versions [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/475142 (https://phabricator.wikimedia.org/T209945) (owner: 10MarcoAurelio) [20:02:42] (03CR) 10MarcoAurelio: "Some flake8 errors make CI fail for this repo." [analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/475142 (https://phabricator.wikimedia.org/T209945) (owner: 10MarcoAurelio) [20:03:20] (03CR) 10MarcoAurelio: "Note: compatibility of the updated dependencies should be tested between them all." 
[analytics/wikimetrics] - 10https://gerrit.wikimedia.org/r/475142 (https://phabricator.wikimedia.org/T209945) (owner: 10MarcoAurelio) [20:16:19] 10Analytics, 10Analytics-Kanban, 10Pageviews-API, 10Patch-For-Review: Pageviews top endpoint in descending order as of 2018-11-20 - https://phabricator.wikimedia.org/T210091 (10Milimetric) The fix for this has been deployed, but it'll take a while to clear the cache. Sorry for the inconvenience. [20:18:03] nuria, yes, adding the jars that otto used does solve the issue with EL san. [20:18:34] I will push the fix in refinery-source and create a fix in puppet to add the jars to eventlogging_to_druid_job.pp [20:57:58] \o [21:27:39] 10Analytics, 10Analytics-Kanban: [EventLogging Sanitization] Fix Refine parameters and cdh jars to unbreak production - https://phabricator.wikimedia.org/T210110 (10mforns) [21:29:34] 10Analytics, 10Analytics-Kanban: [EventLogging Sanitization] Fix Refine parameters and cdh jars to unbreak production - https://phabricator.wikimedia.org/T210110 (10mforns) When working on this, I discovered that we need to add some extra jars to the execution, otherwise, the same problems observed in Refine a... [21:30:35] (03PS1) 10Mforns: Correctly pass input_path_regex to Refine from EventLoggingSanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/475220 (https://phabricator.wikimedia.org/T210110) [21:35:16] No partition predicate found for Alias "mediawiki_user" Table "mediawiki_user" [21:35:16] :( [21:35:49] addshore: need help? [21:35:58] yup :D [21:36:05] addshore: with what query? 
[21:36:27] I was trying to write a hive query to get me wikidata item creations by user essentially [21:36:42] https://www.irccloud.com/pastebin/TYFg3kBi/ [21:36:56] as far as I can tell I just need to add a join onto that to get the user name [21:36:58] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: [EventLogging Sanitization] Fix passing of input_path_regex params to Refine - https://phabricator.wikimedia.org/T210110 (10mforns) [21:39:13] addshore: that sql ... [21:39:22] :P [21:39:37] addshore: you mean "create table table_blah as select . blah " [21:40:20] addshore: or with blahTable as (select some from other table) select blah1 from blahTable [21:41:01] addshore: or you are getting all data from 1 table, wait [21:41:21] so I was trying to join on wmf_raw.mediawiki_history [21:41:33] but maybe that isn't the best approach? [21:42:25] I think I should also be able to use wmf.mediawiki_user_history [21:43:23] addshore: let me see.... the error " "mediawiki_user" Table "mediawiki_user"" cannot come from the sql above as that table is not in there [21:43:39] nope, wait, let me paste the one that gave the error too [21:43:40] sorry :P [21:44:02] addshore: i see, you need user names for the query above [21:44:07] addshore: ok, sorry, i get it [21:44:09] yup! 
:D [21:44:10] sorry [21:45:00] SELECT u.user_name, COUNT(*) AS sum FROM wmf.mediawiki_page_history h LEFT OUTER JOIN wmf_raw.mediawiki_user u ON ( u.user_id = h.caused_by_user_id ) WHERE h.wiki_db = 'wikidatawiki' AND u.wiki_db = 'wikidatawiki' AND h.page_namespace = 0 AND h.caused_by_event_type = 'create' AND h.snapshot = "2018-05" AND u.snapshot = "2018-10" GROUP BY u.user_name ORDER BY sum DESC LIMIT 100 [21:45:02] bah [21:45:08] https://www.irccloud.com/pastebin/565nisT4/ [21:45:17] ^^ that's the sort of query i was trying when i got the error [21:46:26] addshore: ok, partitions are different in mediawiki_user and mediawiki_page_history [21:46:45] addshore: you can see partitions in a table doing: [21:46:56] addshore: show create table mediawiki_page_history [21:47:19] addshore: you will see mediawiki_user is partitioned by snapshot and wiki [21:47:27] okay [21:47:48] yup, i see that [21:47:48] addshore: and mediawiki_page_history is partitioned by snapshot alone [21:49:58] so, my terrible hadoop knowledge tells me that I should always try to query by the partitions, which as far as I can tell I am, but I guess there is an issue because the partitions are different between the 2 tables? [21:51:16] addshore: mmm, that looks good in your query though ... 
both partitions are in the where clause, one sec [21:52:19] then, my mind in its sleepy guessy mode thought maybe it is just having issues because the partitions have the same name across different tables in different dbs and something is just getting confused [21:52:20] :p [21:53:29] addshore: i think partitions need to be on the join too, one sec [21:53:42] ooooh [21:54:20] addshore: [21:54:23] https://www.irccloud.com/pastebin/6J4SXTBJ/ [21:54:33] addshore: just a guess but this does not error [21:54:39] oooh, okay [21:55:19] I think the snapshots differ between the 2 tables, but you're right, it needs the partition in the join :D [21:55:27] https://www.irccloud.com/pastebin/zXdEDcNg/ [21:55:33] ^^ that is enough, with just the wikidb in it [21:55:41] addshore: ah sorry, yes, they need to be specified explicitly [21:55:55] Thanks! :) it probably would have taken me another hour to eventually figure that out [22:26:37] 10Analytics, 10Product-Analytics: Event counts from Mysql and Hive don't match - https://phabricator.wikimedia.org/T210006 (10Nuria) By looking at some of this data I can see that web crawler events are getting into hive but not into mysql (that would be something for us to fix). You can see that by comparing... [22:26:52] 10Analytics, 10Product-Analytics: Event counts from Mysql and Hive don't match. Hive is persisting data from crawlers. 
- https://phabricator.wikimedia.org/T210006 (10Nuria) [22:52:26] (03CR) 10Nuria: [C: 032] Correctly pass input_path_regex to Refine from EventLoggingSanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/475220 (https://phabricator.wikimedia.org/T210110) (owner: 10Mforns) [22:58:15] (03Merged) 10jenkins-bot: Correctly pass input_path_regex to Refine from EventLoggingSanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/475220 (https://phabricator.wikimedia.org/T210110) (owner: 10Mforns) [22:59:30] thx nuria, I was looking into fixing the EL alarms, and I think we just need to enable RefineMonitor (as Andrew suggested), I created a puppet change [22:59:51] mforns: ok, sounds good [23:37:14] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is CRITICAL: connect to address 10.64.36.107 port 5666: Connection refused
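[Editor's note] The partition-predicate fix that nuria and addshore converged on above (the actual working query was only shared via pastebin) can be sketched as follows. This is a reconstruction, not the exact query from the log: the key point is that Hive wants a partition filter for every partitioned table in the query, and the joined table's wiki_db/snapshot predicates go into the JOIN's ON clause. Snapshot values are the ones from the conversation and would need updating; the COUNT alias is renamed from "sum" to "creations" to avoid the reserved-sounding name.

```sql
-- Hedged reconstruction of the fixed Hive query: mediawiki_user's partition
-- predicates (wiki_db, snapshot) live in the ON clause so Hive can prune the
-- joined table's partitions and no longer raises
-- "No partition predicate found for Alias mediawiki_user".
SELECT
  u.user_name,
  COUNT(*) AS creations
FROM wmf.mediawiki_page_history h
LEFT OUTER JOIN wmf_raw.mediawiki_user u
  ON (u.user_id = h.caused_by_user_id
      AND u.wiki_db = 'wikidatawiki'
      AND u.snapshot = '2018-10')
WHERE h.snapshot = '2018-05'          -- mediawiki_page_history: partitioned by snapshot alone
  AND h.wiki_db = 'wikidatawiki'      -- plain column filter, not a partition
  AND h.page_namespace = 0
  AND h.caused_by_event_type = 'create'
GROUP BY u.user_name
ORDER BY creations DESC
LIMIT 100;
```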