[04:31:27] Analytics, Discovery, MediaWiki-General-or-Unknown, Services, and 5 others: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#1386418 (Mattflaschen) This could also be seen as a replacement for some use cases of MW hooks and hook listeners. Right now, if the hook li... [06:11:25] (PS6) Madhuvishy: Add oozie job to schedule mobile app session metrics spark job. [analytics/refinery] - https://gerrit.wikimedia.org/r/216009 (https://phabricator.wikimedia.org/T97876) [06:47:48] Analytics-Tech-community-metrics, Engineering-Community: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1386502 (Qgil) NEW [06:59:56] Analytics-Tech-community-metrics, Engineering-Community: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1386519 (Qgil) Meanwhile, http://korma.wmflabs.org/browser/scr.html shows in "Code review users vs. Code review com... [07:03:55] Analytics-Tech-community-metrics, ECT-June-2015: Active changeset *authors* and changeset *reviewers* per month - https://phabricator.wikimedia.org/T97717#1386521 (Qgil) This data is also useful to check what is going on at {T103292}. [07:10:12] Analytics-Tech-community-metrics, ECT-June-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1386528 (Qgil) Related, although at this point this cannot be considered a blocker of this goal: {T103292} I'm realizing that we hav... [09:06:44] Quarry, Easy: String "Your query is currently executing" should be "This query..." - https://phabricator.wikimedia.org/T103275#1386635 (Aklapper) [09:53:51] Analytics-EventLogging, Beta-Cluster: puppet agent disabled on beta cluster deployment-eventlogging02.eqiad.wmflabs instance - https://phabricator.wikimedia.org/T96921#1386731 (hashar) a:yuvipanda [09:54:15] Analytics-EventLogging, Beta-Cluster: puppet agent disabled on beta cluster deployment-eventlogging02.eqiad.wmflabs instance - https://phabricator.wikimedia.org/T96921#1386732 (hashar) Open>Resolved Puppet has been reenabled by @Yuvipanda and it is passing. [10:12:34] Analytics, Beta-Cluster: Puppet does not pass on beta cluster instance deployment-zookeeper01 - https://phabricator.wikimedia.org/T103301#1386770 (hashar) NEW [10:13:17] Analytics, Beta-Cluster: Puppet does not pass on beta cluster instance deployment-zookeeper01 - https://phabricator.wikimedia.org/T103301#1386787 (hashar) ``` # cat /etc/resolv.conf ## THIS FILE IS MANAGED BY PUPPET ## ## source: modules/base/resolv.conf.labs.erb ## from: base::resolving domain eqiad... [10:14:14] Analytics, Beta-Cluster: Puppet does not pass on beta cluster instance deployment-zookeeper01 - https://phabricator.wikimedia.org/T103301#1386788 (hashar) a:hashar And puppet.conf ``` # cat /etc//puppet/puppet.conf # This file is managed by Puppet! [main] logdir = /var/log/puppet vardir = /var/lib/p... [10:17:09] Analytics, Beta-Cluster: Puppet does not pass on beta cluster instance deployment-zookeeper01 - https://phabricator.wikimedia.org/T103301#1386792 (hashar) That fixed the original issue: ``` Warning: Unable to fetch my node definition, but the agent run will continue: Warning: getaddrinfo: Name or service... 
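The puppet failure quoted above comes down to name resolution ("getaddrinfo: Name or service not known") rather than the manifests themselves. A minimal sketch of the kind of resolver check that confirms this before re-running the agent; the puppetmaster hostname below is a placeholder, not taken from the log:

```
import socket

# Placeholder hostname: the actual beta-cluster puppetmaster name is not
# given above, so this is for illustration only.
PUPPETMASTER = "deployment-puppetmaster.eqiad.wmflabs"

def can_resolve(hostname):
    """Return True if the local resolver can look up `hostname`.

    A failure here matches the 'getaddrinfo: Name or service not known'
    warning quoted in T103301, i.e. a broken /etc/resolv.conf rather than
    a puppet manifest problem.
    """
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

if __name__ == "__main__":
    print("%s resolvable: %s" % (PUPPETMASTER, can_resolve(PUPPETMASTER)))
```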
[10:17:47] Analytics, Beta-Cluster: Puppet does not pass on beta cluster instance deployment-zookeeper01: Could not find class role::analytics::zookeeper::server - https://phabricator.wikimedia.org/T103301#1386801 (hashar) [10:17:57] Analytics, Beta-Cluster: Puppet does not pass on beta cluster instance deployment-zookeeper01: Could not find class role::analytics::zookeeper::server - https://phabricator.wikimedia.org/T103301#1386770 (hashar) a:hashar>None [10:28:40] Analytics, Beta-Cluster: deployment-kafka02 does not pass puppet: Error 400 on SERVER: $brokers[$::fqdn] is :undef, not a hash or array at /etc/puppet/modules/kafka/manifests/server.pp:194 - https://phabricator.wikimedia.org/T103304#1386829 (hashar) NEW [10:29:00] Analytics, Beta-Cluster: deployment-kafka02 does not pass puppet: Error 400 on SERVER: $brokers[$::fqdn] is :undef, not a hash or array at /etc/puppet/modules/kafka/manifests/server.pp:194 - https://phabricator.wikimedia.org/T103304#1386836 (hashar) And I rebooted the instance to get rid of the NFS shares. [10:36:44] Analytics, Beta-Cluster: deployment-kafka02 does not pass puppet: Error 400 on SERVER: $brokers[$::fqdn] is :undef, not a hash or array at /etc/puppet/modules/kafka/manifests/server.pp:194 - https://phabricator.wikimedia.org/T103304#1386847 (hashar) Removed the class `role::analytics::kafka::server` to b... [13:43:09] Analytics, Discovery, MediaWiki-General-or-Unknown, Services, and 5 others: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#1387533 (Ottomata) BTW, T102082 is mainly about analytics eventlogging, but the confluent stuff would be good for an event bus used for appli... [13:52:23] Analytics, Beta-Cluster: Puppet does not pass on beta cluster instance deployment-zookeeper01: Could not find class role::analytics::zookeeper::server - https://phabricator.wikimedia.org/T103301#1387537 (Ottomata) Open>Resolved a:Ottomata Zookeeper classes have moved out of analytics:: context.... [13:57:19] ottomata: hello :) [13:57:49] ottomata: we have made some maintenance work on the beta cluster instances last week. Some had puppet reenabled and none shouldrely on NFS anymore [13:58:34] shouldrely > should rely [13:58:53] Analytics, Beta-Cluster: deployment-kafka02 does not pass puppet: Error 400 on SERVER: $brokers[$::fqdn] is :undef, not a hash or array at /etc/puppet/modules/kafka/manifests/server.pp:194 - https://phabricator.wikimedia.org/T103304#1387541 (Ottomata) Open>Resolved a:Ottomata Edited hieradata to... [13:59:09] hashar: cool danke [14:00:15] ottomata: the eventlogging had a huge number of puppet changes applied [14:00:23] not sure whether it is still functional though :( [14:03:47] ottomata: Heya [14:03:53] Do you have a minute ? [14:04:37] joal: yup, gimme 5 to finish checking my email [14:04:41] sure np [14:07:57] oook, joal wasssshappenin [14:08:25] I'd like you opinion on a dilmna about projectviews [14:08:31] batcave for aminuteb? [14:09:22] More precisely, it'a about hourly partitions or not [14:10:01] pros are: easier to paralelize, dataset to be easily schedule jobs easy to put in place [14:10:03] aye k [14:10:12] cons: a lot of folders/files for small data [14:40:13] ottomata: https://wikitech.wikimedia.org/wiki/Analytics/Data/Projectview_hourly Ok ? [14:41:40] +1 looks good! [14:41:45] Cool :) [14:41:52] thx [15:00:21] mili|away: hiii [15:00:22] oh away! 
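joal's dilemma above (hourly partitions are easy to parallelize and schedule, but produce a lot of directories holding small files) comes down to simple arithmetic over the partition layout. A small sketch, assuming a refinery-style year=/month=/day=/hour= directory convention; the base path is a placeholder:

```
from calendar import monthrange

def partition_paths(base, year, month, hourly=True):
    """List the partition directories for one month, using the
    year=/month=/day=/hour= layout (assumed) that refinery datasets follow."""
    paths = []
    days = monthrange(year, month)[1]
    for day in range(1, days + 1):
        if hourly:
            for hour in range(24):
                paths.append("%s/year=%d/month=%d/day=%d/hour=%d"
                             % (base, year, month, day, hour))
        else:
            paths.append("%s/year=%d/month=%d/day=%d" % (base, year, month, day))
    return paths

hourly = partition_paths("projectview/hourly", 2015, 6, hourly=True)
daily = partition_paths("projectview/hourly", 2015, 6, hourly=False)
print(len(hourly), "hourly dirs vs", len(daily), "daily dirs")  # 720 vs 30
```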
[15:00:40] joal: FYI, I am testing deployment of the new eventlogging code in beta [15:00:51] Cool [15:00:57] Hey ottomata [15:01:07] oh hey [15:01:36] just forgot to change my nick [15:02:20] oook so. EL! [15:02:37] indeed EL [15:02:39] i'm gonna deploy in beta and also cherry pick the puppet changes on deployment-salt [15:02:49] that will let me test those puppet changes directly [15:02:53] you ok with me proceeding? [15:03:10] yeah, that works [15:03:50] if anyone's testing in beta we'd have to be ready to put it back if it's not working though [15:04:51] k [15:09:56] looking good milimetric [15:10:00] events coming all the way through [15:10:04] to all-events.log [15:10:14] sweet [15:10:44] so this is puppet working with the new code, but no kafka configuration yet [15:10:47] yup [15:10:49] exactly [15:11:00] oh, lemme see if I can consume raw events elsewhere now [15:12:31] mforns: I got a weird report about EL [15:12:37] milimetric, hey [15:12:40] The person said events are coming in [15:12:42] what was that? [15:12:46] but when using a new schema [15:12:52] the table isn't created [15:12:54] Analytics-Kanban: enforce policy for each Schema [8 pts] {tick} - https://phabricator.wikimedia.org/T102518#1387803 (kevinator) [15:13:01] I was thinking about our batching code and wondering if you tested that scenario [15:13:08] milimetric, mmmmm [15:13:11] I thought I checked for that, but now I'm not sure [15:13:48] milimetric, a new schema that never had an event before? [15:13:53] yes [15:14:14] milimetric, aha [15:15:09] milimetric, and what do they mean by "events are comming in"? [15:15:36] I guess they are passing validation... [15:15:56] yeah, like the logs are showing events going through the pipeline, then they end up in the db [15:16:12] so that works fine for existing tables, but not for new tables in this particular scenario [15:16:32] it could be a problem on their end (it's an Android app) but I wanted to make sure we tested this with the new batching code [15:17:04] if we didn't that's ok, I can test it now [15:19:10] milimetric, wouldn't it be because there aren't enought events yet in the queue? so the table has not been created yet? [15:19:31] the table is only created when the schema queue gets flushed [15:20:14] yeah, that could be, for sure, I'm not suggesting there's a problem [15:20:20] just wondering if we've tested (it's ok, I'm testing now) [15:20:33] milimetric, aha, I did not test that particular case [15:21:47] the table creating code remained unchanged, so my initial thought would be it should work. But yes, I should have tested that, thanks for spotting it [15:24:03] milimetric, just discarding hypothesis, how much time ago did this schema start sending events? [15:24:15] last week in beta [15:24:28] i'm sending some events through right now, but they're not making it through to all-events.log for some reason [15:24:41] mmm oh [15:24:42] they're in client-side-events.log so they might just be invalid [15:24:50] I see [15:26:06] btw, this is an example: [15:26:06] mw.eventLog.logEvent('MobileWikiAppArticleSuggestions', {"action":"shown", "appInstallID":"dan-test", "pageTitle":"dan-testing-page", "readMoreList": "one|two|three", "readMoreSource": 1}) [15:27:03] it might be because it's monday but how come "revision" isn't specified... 
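On the missing-table question: with the new batching code, a brand-new schema only gets its table created when its queue is first flushed, which is the hypothesis mforns raises above. The following is a toy model of that behaviour, not the actual EventLogging mysql consumer; the batch size and flush interval are assumptions:

```
import time
from collections import defaultdict

BATCH_SIZE = 3000      # assumed batch size; the real value lives in the EL config
FLUSH_INTERVAL = 300   # assumed maximum seconds to hold events before flushing

class BatchingWriter:
    """Toy model of a batching MySQL writer: tables are created lazily, at
    flush time, which is why a new schema can validate events for a while
    before its table shows up in the database."""

    def __init__(self):
        self.queues = defaultdict(list)
        self.last_flush = defaultdict(lambda: time.time())
        self.created_tables = set()

    def handle(self, schema, event):
        self.queues[schema].append(event)
        if (len(self.queues[schema]) >= BATCH_SIZE
                or time.time() - self.last_flush[schema] > FLUSH_INTERVAL):
            self.flush(schema)

    def flush(self, schema):
        if schema not in self.created_tables:
            print("CREATE TABLE %s (...)" % schema)   # stand-in for the real DDL
            self.created_tables.add(schema)
        print("INSERT %d events into %s" % (len(self.queues[schema]), schema))
        self.queues[schema] = []
        self.last_flush[schema] = time.time()
```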
[15:37:40] (CR) Joal: "Still one the previous ones :)" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/216009 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [15:39:11] (CR) Madhuvishy: Add oozie job to schedule mobile app session metrics spark job. (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/216009 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [15:56:56] milimetric: can you merge this? [15:56:57] https://gerrit.wikimedia.org/r/#/c/218914/ [15:57:34] Analytics-Kanban: Gather information on all the schemas [13 pts] {tick} - https://phabricator.wikimedia.org/T103366#1387986 (mforns) NEW a:madhuvishy [15:58:27] ottomata: apparently not... I gave it +2 but I have no submit button [15:58:31] me neither... [15:58:39] weird [15:59:15] Analytics-Kanban: Gather information on all the schemas [13 pts] {tick} - https://phabricator.wikimedia.org/T103366#1387997 (mforns) p:Triage>Normal [15:59:51] madhuvishy, I created a task for you (copy-pasted from the one I have assigned in "in progress") [15:59:54] huh, edited permissions to allow analytics devs to submit [15:59:57] not sure why we couldn't before [16:00:43] Analytics-Kanban: Gather information on all the schemas [13 pts] {tick} - https://phabricator.wikimedia.org/T102515#1388020 (mforns) [16:14:09] Analytics-Cluster, Analytics-Kanban: Add Pageview aggregation to Python {musk} [13 pts] - https://phabricator.wikimedia.org/T95339#1388117 (kevinator) [16:34:16] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [30.0] [16:34:57] ok, EL alert, but i think everything loks fine? [16:34:57] hm [16:35:10] as far as I can tell [16:35:20] joal: i just did an eventlogging deploy, but everything looks ok. [16:35:30] on eventlog1001 [16:35:40] Analytics-Cluster, Analytics-Kanban: Deploy oozie reporting of last-access counts - https://phabricator.wikimedia.org/T103376#1388219 (kevinator) NEW [16:35:42] hmm [16:35:53] ottomata: In standup, will double check [16:35:59] Analytics-Cluster, Analytics-Kanban: Deploy oozie reporting of last-access counts {bear} - https://phabricator.wikimedia.org/T103376#1388219 (kevinator) [16:36:01] ottomata: in tasking sorry [16:39:01] ok i see a number of server side events not validating [16:39:07] or [16:39:08] not able to process [16:39:12] no indication as to why [16:39:42] hmm, could be all server sides [16:40:23] hm, no seq id anymore! [16:40:24] HMMM [16:40:36] AH go tit [16:42:20] ottomata: where from ? [16:42:38] joal: https://gerrit.wikimedia.org/r/#/c/219853/ [16:42:40] my fault [16:42:42] i broke it [16:43:00] hm [16:43:03] Happens ! [16:43:11] Thanks for having found it ! [16:43:21] Let me know if you need me to step up from meeting [16:43:23] good thing for alerts! [16:43:25] ottomata: --^ [16:44:20] joal: when you are done maybe you can teach me about backfilling server side events :) [16:44:31] huhuhu :) [16:44:36] will do ! [16:44:42] Very manual ;) [16:56:00] Analytics-Kanban: enforce policy for each Schema [8 pts] {tick} - https://phabricator.wikimedia.org/T102518#1388328 (kevinator) Friendliest thing to do is to truncate data in table... but do not delete the table (just in case someone is doing a join on it, this won't break the code). [16:59:37] ottomata: Icinga didn't told me that everything is back to normal [16:59:42] ottomata: Is that notrmal ? 
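For reference, the alert that fires below compares the overall raw and validated EventLogging message rates in graphite. A rough sketch of the underlying arithmetic only; the exact Icinga/graphite check definition is not reproduced here, and the thresholds simply mirror the numbers in the alert and recovery lines:

```
def relative_gap(raw_rate, valid_rate):
    """Percentage of raw events that did not make it through validation."""
    if raw_rate == 0:
        return 0.0
    return 100.0 * (raw_rate - valid_rate) / raw_rate

# Assumed thresholds, matching the numbers in the alert/recovery messages.
CRITICAL = 30.0
WARNING = 20.0

def check(raw_rate, valid_rate):
    gap = relative_gap(raw_rate, valid_rate)
    if gap > CRITICAL:
        return "CRITICAL (%.1f%% of raw events not validated)" % gap
    if gap > WARNING:
        return "WARNING (%.1f%%)" % gap
    return "OK (%.1f%%)" % gap

# e.g. server-side events failing to process (the broken seq id) -> CRITICAL
print(check(raw_rate=600.0, valid_rate=400.0))
```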
[17:01:06] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0] [17:01:48] hm [17:01:51] Ok :) [17:01:55] Thanks ;) [17:02:02] yes everything is normal [17:05:41] Analytics-Kanban: Reach out to half the schema owners [8 pts] {tick} - https://phabricator.wikimedia.org/T102517#1388349 (kevinator) [17:07:24] Analytics-EventLogging, Analytics-Kanban: Reach out to half the schema owners [8 pts] {tick} - https://phabricator.wikimedia.org/T103380#1388358 (kevinator) NEW [17:07:49] Analytics-EventLogging, Analytics-Kanban: Reach out to half the schema owners [8 pts] {tick} - https://phabricator.wikimedia.org/T102517#1388367 (kevinator) a:mforns [17:07:56] Analytics-EventLogging, Analytics-Kanban: Reach out to half the schema owners [8 pts] {tick} - https://phabricator.wikimedia.org/T103380#1388369 (kevinator) a:madhuvishy [17:08:20] Analytics-EventLogging, Analytics-Kanban: Reach out to half the schema owners [8 pts] {tick} - https://phabricator.wikimedia.org/T103380#1388358 (kevinator) p:Triage>Normal [17:09:04] Analytics-Backlog, Analytics-Cluster: Deploy oozie reporting of last-access counts {bear} - https://phabricator.wikimedia.org/T103376#1388376 (kevinator) [17:10:10] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, Performance: Implement Unique Clients report on cluster using x-analytics header & last access date {bear} [13 pts] - https://phabricator.wikimedia.org/T92977#1388381 (kevinator) We have enough data to start the validation task... I recommend we... [17:20:24] Analytics-Backlog, Analytics-EventLogging: Can Search up sampling to 5%? {oryx} - https://phabricator.wikimedia.org/T103186#1388420 (kevinator) [17:20:26] Analytics-EventLogging, Analytics-Kanban: Load Test Event Logging {oryx} [8 pts] - https://phabricator.wikimedia.org/T100667#1388421 (kevinator) [17:21:25] Analytics-EventLogging, Analytics-Kanban: Open tcp ports 8421 and 8422 to eventlog1001 to the Analytics VLAN - https://phabricator.wikimedia.org/T103381#1388430 (Ottomata) NEW a:Ottomata [17:21:50] Analytics-EventLogging, Analytics-Kanban, operations: Open tcp ports 8421 and 8422 to eventlog1001 to the Analytics VLAN - https://phabricator.wikimedia.org/T103381#1388430 (Ottomata) a:Ottomata>akosiaris [17:23:29] Analytics-Backlog: Change mediawiki-storage api queries to adapt to the api changes [5 pts] {crow} - https://phabricator.wikimedia.org/T101539#1388450 (kevinator) [17:23:35] Analytics-EventLogging, Analytics-Kanban: Prep work for Eventlogging on Kafka {stag} - https://phabricator.wikimedia.org/T102831#1388453 (akosiaris) [17:23:38] Analytics-EventLogging, Analytics-Kanban, operations: Open tcp ports 8421 and 8422 to eventlog1001 to the Analytics VLAN - https://phabricator.wikimedia.org/T103381#1388451 (akosiaris) Open>Resolved Done and tested. Resolving [17:26:20] Analytics-Backlog: Change mediawiki-storage api queries to adapt to the api changes [5 pts] {crow} - https://phabricator.wikimedia.org/T101539#1388454 (Milimetric) [17:26:21] HOOo boy there they go milimetric, joal: [17:26:34] ottomata: wut ? [17:26:45] https://gist.github.com/ottomata/4e7a0241d44d16782b99 [17:27:19] neat ! [17:28:02] COOOL EventError is working too! [17:28:16] This part is REALLY cool ! [17:29:33] pretty COOOOOoool [17:29:36] ok lunch time! 
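The same stream can be read from Python instead of kafkacat. A minimal sketch, assuming a recent kafka-python client is available on the host; the topic and broker come from the kafkacat command above, port 9092 is the Kafka default (not stated in the log), and the printed fields reflect the usual EventLogging capsule:

```
import json
from kafka import KafkaConsumer  # assumes a recent kafka-python is installed

consumer = KafkaConsumer('eventlogging_Edit',
                         bootstrap_servers='analytics1012:9092',
                         auto_offset_reset='latest')

for message in consumer:
    event = json.loads(message.value.decode('utf-8'))
    # 'schema' and 'timestamp' are standard EL capsule fields (assumption
    # here, not shown in the log excerpt above).
    print(event.get('schema'), event.get('timestamp'))
```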
[17:29:57] Analytics, Engineering-Community, ECT-June-2015: Research & Date offsite - https://phabricator.wikimedia.org/T103382#1388470 (Rfarrand) NEW a:Rfarrand [17:30:20] Analytics, Engineering-Community, ECT-June-2015: Analytics Team offsite - https://phabricator.wikimedia.org/T103383#1388477 (Rfarrand) NEW a:Rfarrand [17:30:25] milimetric: joal, try it on stat1002! [17:30:29] kafkacat -C -t eventlogging_Edit -b analytics1012 [17:30:49] Analytics, Engineering-Community, ECT-June-2015: Analytics Team Offsite - Before Wikimania - https://phabricator.wikimedia.org/T90602#1388492 (Rfarrand) [17:31:01] wooow, that's a few events ;) [17:32:19] (PS1) Joal: Fix 2 projectview_hourly aggregation bugs. [analytics/refinery] - https://gerrit.wikimedia.org/r/219862 [17:32:40] ottomata: --^ [17:32:52] If you have 5 mins, that would be great ;) [17:33:17] ottomata: I need to keep focused on pageviews ... I WILL COME EL soon :) [17:33:29] Analytics, Analytics-Backlog: Reportupdater: put history and pid files inside the project folder [5 pts] {lamb} - https://phabricator.wikimedia.org/T103385#1388497 (kevinator) NEW [17:33:39] ottomata: I want to play with that and sparkstreaming ;) [17:34:06] OHHHH joal [17:34:08] yeah, hm [17:34:13] we probably shoudln't call that column count, eh? [17:34:18] is it too late? [17:34:29] it is for pageviews, yeah [17:34:49] I don't know how parquet handles column renaming [17:35:01] oof, we are already doing pageviews in prod? [17:35:03] hm. [17:35:04] Analytics, Analytics-Backlog: Debug blank datafiles generated by generate.py [8 pts] {lamb] - https://phabricator.wikimedia.org/T103387#1388517 (kevinator) NEW [17:35:05] Hive would do that easy, but I don't know parquet [17:35:15] ja would have to regenerate data likely [17:35:25] that is going to be annoying. [17:35:26] pageviews have been backfilled over may, yeah [17:35:46] do we ahve a prod job running htough? [17:35:50] oozie? generating hourly pageviews/ [17:35:50] ? [17:35:52] yes we do [17:35:52] not yet, right? [17:35:54] oh we d. [17:35:55] do. [17:35:55] ok. [17:35:56] hm. [17:36:07] errrf [17:36:25] so june pageview data is mostly in then? [17:36:30] yeah, that count name was not the best choice I have made :( [17:36:40] june is up to date [17:36:45] hm. [17:36:48] i didn't catch it eitiher [17:37:18] is it worth actually fixing? it would be easier to do sooner rather than later? we could just recreate the files from the existing ones. [17:37:22] make a new table with a different column name [17:37:25] select from insert into... [17:37:28] I can double check if we can rename column ezasily [17:37:39] hm [17:37:40] then afterwords move the data in place [17:37:57] parquet issue though --> column named in metadata [17:38:15] Analytics-Backlog, MediaWiki-extensions-ExtensionDistributor: Set up graphs and dumps for ExtensionDistributor download statistics - https://phabricator.wikimedia.org/T101194#1388545 (Milimetric) It seems like the graphs on http://edit-reportcard.wmflabs.org/ are outdated, so we can remove those, but wou... [17:38:30] eys [17:38:33] i don't htink we can rename [17:38:40] i think we'd ahve to create new data from the existing one [17:38:43] then replace the old data [17:38:47] and drop the old table and create the new one [17:39:00] right ... [17:40:28] Shall I go that way ? 
[17:40:41] I don't like it, but I don't like the count either [17:42:05] Analytics-Backlog: list of tasks to present to volunteers at wikimania - https://phabricator.wikimedia.org/T102980#1388568 (kevinator) p:Triage>Normal [17:54:06] ottomata: I'll do the change like now if you think it's necessary [17:56:12] Analytics, Analytics-Backlog, Mobile-Web: Debug blank datafiles generated by generate.py [8 pts] {lamb] - https://phabricator.wikimedia.org/T103387#1388645 (Jdlrobson) [17:58:05] ottomata: out of meeting now [17:59:17] ottomata: What name should we use then: pageview_count ? [18:14:35] joal: hi sorry, now i'm in meeting! [18:14:48] um, hm. i think it is probably the right thing to do, as painful as it is [18:14:53] it will be less painful to fix now [18:15:00] at least we don't have to recompute the data from the webrequest table [18:15:08] hm [18:15:23] name. hm [18:15:28] aggregate_count [18:15:28] ? [18:15:50] view_count [18:15:50] ? [18:22:37] ottomata: legacy pageview have: count_views [18:23:18] Analytics-EventLogging, Analytics-Kanban, WikiEditor, VisualEditor 2014/15 Q4 blockers: Reduce WikiEditor 'Edit' EventLogging schema sampling rate to 6.25% (1/16th) - https://phabricator.wikimedia.org/T103036#1388729 (Jdforrester-WMF) Open>Resolved [18:23:22] I suggest we change, and use either view_count, or total_count ? [18:23:27] ottomata: --^ [18:24:14] I'll start the column shift tonight and will self-merge / deploy the thing tomorrow morning if it's done (and if you give me your blesdsing [18:24:21] ottomata: --^ [18:24:22] view_count is good [18:24:34] totally cool with that [18:24:44] unless, you think we might want to sorta standardize something for other tables too [18:24:44] Ok great [18:24:47] and always use the same coulmn name [18:24:51] aggregate_count might be good? [18:24:52] hm. [18:24:58] i like view_count better aesthetically though [18:25:02] it is clearer what it is [18:25:09] Good for me [18:25:11] ok ok , reasoned with myself [18:25:14] view_count, lets do it [18:25:25] I'll put the same name in projectview, ok ? [18:30:05] yes [18:30:19] thanks joal, sorry about that, i think this is the right thing to do, right? [18:30:31] This is the right thing to do [18:30:43] But I am kinda fed up with backfilling the stuff ;) [18:30:59] * joal thinks "This too shall pass" [18:31:05] :D [18:35:54] Diner time, will be back after ! [18:38:24] Analytics-EventLogging, Analytics-Kanban, WikiEditor, VisualEditor 2014/15 Q4 blockers: Reduce WikiEditor 'Edit' EventLogging schema sampling rate to 6.25% (1/16th) - https://phabricator.wikimedia.org/T103036#1388807 (Jdforrester-WMF) [18:41:25] joal: you are awesome :) [19:02:40] (PS7) Madhuvishy: Add oozie job to schedule mobile app session metrics spark job. [analytics/refinery] - https://gerrit.wikimedia.org/r/216009 (https://phabricator.wikimedia.org/T97876) [19:28:51] how often is stat1002 -> datasets rsync? [19:29:04] because I've been twiddling my thumbs for...a while now [19:31:31] (CR) Joal: [C: 2 V: 2] "Ok let's merge that :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/216009 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [19:32:32] joal: thank you :) [19:34:21] h;) [19:36:48] Ironholds: should be every 30 mins [19:37:11] Time for me to sleep folks ! [19:37:16] See you tomorrow ! [19:58:39] heyaaa mili|lunch! 
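The rename can't be done in place because the column name is baked into the Parquet file metadata, so the plan above is to rewrite the data into a new table and swap it in. A minimal sketch of that select-into-a-new-table step via pyspark's HiveContext; the table name, the column list, and the handling of the hourly partitioning are all simplified or assumed, and the actual file move plus DROP/CREATE would happen afterwards:

```
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="rename-count-to-view_count")
hc = HiveContext(sc)

# Rewrite the data with the new column name. Table name is assumed; only a
# couple of columns are shown, and the real job would also recreate the
# year/month/day/hour partitions, omitted here for brevity.
hc.sql("""
    CREATE TABLE wmf.pageview_hourly_renamed
    STORED AS PARQUET
    AS
    SELECT project, `count` AS view_count
    FROM wmf.pageview_hourly
""")

# Sanity check before dropping the old table and moving the new data in place.
old = hc.sql("SELECT COUNT(*) FROM wmf.pageview_hourly").collect()[0][0]
new = hc.sql("SELECT COUNT(*) FROM wmf.pageview_hourly_renamed").collect()[0][0]
assert old == new, "row counts differ: %s vs %s" % (old, new)
```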
[19:59:17] what is with me and my nicks [19:59:23] hey hey [19:59:24] https://gist.github.com/ottomata/da72913f664d772477c3 [19:59:36] it is way easier to work with variable schemas in spark than in hive [19:59:45] s/variable/nested [19:59:48] or complex [20:00:04] in order to use hive to do this, you'd have to create a table with levels of Hive Structs that match the json schema [20:00:09] and then add partitions [20:00:10] and then query it [20:00:25] we might want to figure that out for certain high value eventlogging schemas [20:00:28] Edit maybe? i dunno! [20:00:37] what's all this: _Edit.18.0.3507.69750.1434999600000 [20:00:39] but spark is much easier [20:00:43] that's the file name that camus wrote [20:00:52] oh sorry [20:00:55] we can leave that off [20:00:58] and just specify the dir [20:01:03] edting [20:01:24] if we weren't writing sequence files [20:01:26] which we don't have to [20:01:36] is it possible to do something like hourly/2015/06/* ? [20:01:39] then that woudl remoe that step of converting the seqeuence file to a sgring [20:01:44] hm, i don't think so, let me try [20:01:52] you can pass multple dirs to the func [20:01:55] 'cause i mean, that's what partitions let you do, right? [20:02:02] yes, indeed. [20:02:19] this is why madhu had to write a thing for mobile sessions job to map date ranges to directoreis [20:02:27] because we weren't able to use HiveContext at the time [20:03:55] oh is that whole issue sorted now? [20:04:03] like, it doesn't scan all the partitions anymore? [20:04:23] i think joseph figured out why it was doing that, and was able to have it not do it by setting a flag [20:04:36] cool [20:04:49] ok, this is great [20:05:12] so, how is this deployed? I saw the list of topics you pasted above, but didn't read through what you guys were saying [20:05:27] are all events in prod going into both mysql and hdfs? [20:05:39] cool! * works milimetric! [20:05:54] milimetric: the hdfs part is manual at the moment, it is whenever i run camus [20:06:04] sweet, that will work nicely then, I don't really see a need for partitions [20:06:07] i will probably automate that now [20:06:19] ok, cool, but it's going into the topics [20:06:32] yup [20:06:43] (btw, I think Joseph and Madhu had that task to do at some point soon) [20:06:44] i have an eventlogging-processor running on analytics1010 [20:06:49] so you just stole it from them :) [20:07:04] heheh [20:07:25] now the last hurdle is really getting mediawiki data into hdfs so we can finally have one place to query everything [20:07:30] it is mostly a matter of saying: timestamp.field = timestamp, timestamp.format = unix_seconds, whitelist.topics=eventlogging_* [20:07:33] and then running camus :) [20:07:39] sweet, that's great [20:11:27] so um, yeah, i did puppetize this on analytics1010 [20:11:35] so it isnt' 'production' because that is a cisco and i don't want to rely on it [20:11:44] but, running camus regularly won't hurt [20:11:50] hm, milimetric, can we batcave for a min? [20:12:21] brt [20:21:48] milimetric: around? [20:26:38] mforns: can i test your patch on vagrant? or does that not work anymore? 
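The gist linked above reads the camus output directly with Spark; roughly, it looks like the sketch below. The path, the hourly layout, and the field names are assumptions based on the discussion, and the extra map() is the sequence-file-to-string step mentioned:

```
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="el-edit-from-camus")
sqlContext = SQLContext(sc)

# Assumed path layout for camus output; wildcards work the same way they do
# for any other RDD input, which answers the hourly/2015/06/* question above.
path = "/wmf/data/raw/eventlogging/eventlogging_Edit/hourly/2015/06/*"

# Camus wrote sequence files, so each record is a (key, value) pair and the
# JSON event is the value.
raw = sc.sequenceFile(path).map(lambda kv: kv[1])

# Let Spark infer the nested EventLogging schema from the JSON itself, which
# is what makes this easier than declaring matching Hive structs.
events = sqlContext.jsonRDD(raw)
events.registerTempTable("edit_events")
sqlContext.sql(
    "SELECT event.action, COUNT(*) FROM edit_events GROUP BY event.action"
).show()
```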
[20:26:54] mforns: the wikimetrics one [20:26:58] madhuvishy, you mean wikimetric's [20:27:00] madhuvishy, ok [20:27:20] madhuvishy, yes you can test in vagrant :] [20:27:27] mforns: alright [20:27:39] remember to alembic upgrade/ downgrade -1 [20:27:52] thanks [20:28:03] mforns: cool [20:32:47] madhuvishy: sorry yes, in batcave with 'drew tho [20:33:01] milimetric: no problem, ping when free :) [20:33:07] will do [20:41:14] mforns: I tried a cohort with non-latin chars in description - and ran a report for it - and it all went fine. is there anything else i should test? [20:42:49] ottomata, ping [20:49:33] sod it, I'll send an email [20:49:46] madhuvishy, in this case, the difficult part is the deployment, because of the database collations [20:50:08] madhuvishy, if you enter your vagrant mysql and use: "show full columns in report" [20:50:23] madhuvishy, which collation do you get for the description column? [20:51:23] hmmm mforns i dont see description.. [20:51:41] madhuvishy, sorry, parameters [20:51:52] not description [20:51:56] mforns: collation is null [20:52:08] after applying the migration right? [20:52:14] mforns: yes [20:52:48] madhuvishy, ok, you can try: alembic downgrade -1 to undo the migration and try again to see the collation? [20:53:49] mforns: now its utf8_general_ci [20:54:21] madhuvishy, cool, it seems you have an identical setup as in production [20:55:29] mforns: cool. so it all looks good I guess. the code lgtm. Should I merge it? [20:55:38] madhuvishy: k [20:55:51] madhuvishy, ok! [20:55:59] :] [20:56:26] (CR) Madhuvishy: [C: 2 V: 2] "Verified on my local. LGTM." [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/219373 (https://phabricator.wikimedia.org/T100781) (owner: Mforns) [20:56:36] o/ [20:56:40] \o [20:57:07] :) [20:57:15] madhuvishy: so what's up [20:57:24] milimetric: are you back? my question was this - https://meta.wikimedia.org/wiki/Schema:SendBeaconReliability - Matt Flaschen mentioned this was an analytics project [20:57:30] yes, back [20:57:37] uh... yea, Nuria worked on it [20:57:55] he said - Developed as part of an analytics project to measure reliability of sendBeacon. You can list either me or the analytics team as owner. [20:57:56] you know, I have no idea if it's a good thing to leave up [20:57:59] I'd say leave it up [20:58:07] and you can list our team as the owner [20:58:23] it's basically used to help with technical decisions for client side EL code going forward [20:58:25] milimetric: okay - and keep Matt as contact? or someone from our team? [20:58:35] you can keep me or Nuria as contact, that's fine [20:58:53] milimetric: cool thanks. but it's inactive as of now right? [20:59:00] I'm not sure what to do about old data, I'd rather talk to Nuria first [20:59:06] lemme check [20:59:06] Ironholds: don't fully understand [20:59:16] ottomata, compare the two files I linked? [20:59:17] milimetric: aah, alright. we've time to decide that. 
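The collation check madhuvishy runs by hand below ("show full columns in report") can also be scripted. A small sketch against a local wikimetrics vagrant database, with a placeholder connection string, for checking the `parameters` column before and after `alembic upgrade head` / `alembic downgrade -1`:

```
from sqlalchemy import create_engine

# Placeholder connection details for a local wikimetrics vagrant setup.
engine = create_engine("mysql://wikimetrics:wikimetrics@localhost/wikimetrics")

def column_collation(table, column):
    """Return the collation MySQL reports for one column; None/NULL is what a
    non-text (binary) column shows, which is the post-migration state seen
    in the channel."""
    rows = engine.execute("SHOW FULL COLUMNS FROM {}".format(table))
    for row in rows:
        if row["Field"] == column:
            return row["Collation"]
    return None

# utf8_general_ci before the migration, None after applying it.
print(column_collation("report", "parameters"))
```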
[20:59:22] thanks [20:59:45] 8d [@stat1002:/a/aggregate-datasets/search] $ tail -n 5 app_event_counts.tsv [20:59:45] 2015-06-21 "Result pages opened" "Android" 41435 [20:59:45] 2015-06-21 "search sessions" "iOS" 513100 [20:59:45] 2015-06-21 "clickthroughs" "iOS" 425224 [20:59:45] 2015-06-21 "search sessions" "Android" 9876 [20:59:45] 2015-06-21 "clickthroughs" "Android" 8249 [20:59:54] root@stat1001:/srv/aggregate-datasets/search# tail -n 5 app_event_counts.tsv [20:59:55] 2015-06-21 "Result pages opened" "Android" 41435 [20:59:55] 2015-06-21 "search sessions" "iOS" 513100 [20:59:55] 2015-06-21 "clickthroughs" "iOS" 425224 [20:59:55] 2015-06-21 "search sessions" "Android" 9876 [20:59:55] 2015-06-21 "clickthroughs" "Android" 8249 [21:00:13] ottomata, okay [21:00:19] now go to http://datasets.wikimedia.org/aggregate-datasets/search/ and manually open the file [21:00:41] Ironholds: is it just me or are these files different formats? [21:00:55] that link does not have the device level dimension [21:00:58] madhuvishy: k, there's no data since May, but I thought either it stopped a while ago or it didn't stop. So that's confusing. Makes me think it shouldn't have stopped and it's failing on accident somehow. Let's not truncate for now, leave me as the contact, and I'll shoot Nuria an email and see what she thinks [21:01:02] they are different formats [21:01:06] I was switching formats when it stopped working [21:01:07] so that is what is wrong? [21:01:09] ah [21:01:16] so, those are the old files ;p [21:01:32] that it's in /srv/ is great but it doesn't seem to be reflected publicly [21:01:33] milimetric: yeah alright. I put your name down on the sheet. [21:02:04] hmmm [21:02:09] it is correct in the document root [21:02:32] the docroot has a symlink [21:02:33] so it is the same file [21:02:38] weird, uhh, cached? [21:02:39] mforns: so we are still waiting on Prateek, Deskana, Amir and Ori - apart from Adam and Aaron who said they'll get back soon. [21:03:20] madhuvishy, exactly, I've checked that all stated projects are currently valid links [21:03:26] and marked them green [21:03:37] Ironholds [21:03:38] curl http://datasets.wikimedia.org/aggregate-datasets/search/app_event_counts.tsv [21:03:38] mforns: oh great [21:03:51] I was thinking on starting to send requests for purging [21:03:54] i think our browsers are caching the old file [21:04:24] hmmm wai tno [21:04:36] madhuvishy, should we keep our owner divisions, so you follow up with the ones you already sent the first email? and I follow up with the ones I sent? [21:04:38] whoa [21:04:42] i get different results [21:04:43] it is varnish [21:04:46] mforns: aah. cool [21:04:52] ottomata, argh varnish [21:05:03] the two different varnish servers are caching different versions of the file [21:05:04] hehehe [21:05:06] mforns: we need to ask them to choose between the three things we talked about today [21:05:11] ? [21:05:20] madhuvishy, mmm [21:05:24] good question :] [21:05:36] yes, I guess so [21:05:59] mforns: need to change the sheet a little bit to reflect that [21:06:05] aha [21:07:51] madhuvishy, also ask when we can delete the data older than 90 days [21:08:00] ? [21:08:14] mforns: yeah - we might also get aggregation requests [21:08:15] ottomata, and the fix is..? [21:08:26] don't know! [21:08:31] i am asking bblack in #ops [21:09:21] madhuvishy, do you think we should anticipate this in our email? like: "if you want the data to be aggregated..."? 
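A quick way to confirm the "two varnish servers, two versions" diagnosis above is to fetch the file from each misc cache host with the Host header overridden, the same trick as the curl command in the Phabricator ticket below, and compare fingerprints. The cp hostnames here are pure placeholders, since they are truncated in the ticket excerpt:

```
import hashlib
import requests

# Placeholder hostnames: the real cp10xx frontends are truncated in the
# ticket excerpt, so these are not the actual servers.
CACHE_HOSTS = ["cp-misc-frontend-1.example", "cp-misc-frontend-2.example"]
URL_PATH = "/aggregate-datasets/search/app_event_counts.tsv"

def fingerprint(host):
    """Fetch the file from one specific cache host, overriding the Host header
    the way the curl command does, and hash the body."""
    resp = requests.get("http://%s%s" % (host, URL_PATH),
                        headers={"Host": "datasets.wikimedia.org"})
    return hashlib.md5(resp.content).hexdigest()

hashes = {host: fingerprint(host) for host in CACHE_HOSTS}
print(hashes)
if len(set(hashes.values())) > 1:
    print("caches disagree -> one of them is serving a stale copy")
```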
[21:11:06] madhuvishy, I feel like if we need to implement aggregations ourselves this will take a long time [21:12:07] (PS1) Ottomata: Add camus/camus.eventlogging.properties [analytics/refinery] - https://gerrit.wikimedia.org/r/219981 (https://phabricator.wikimedia.org/T98784) [21:12:48] (PS2) Ottomata: Add camus/camus.eventlogging.properties [analytics/refinery] - https://gerrit.wikimedia.org/r/219981 (https://phabricator.wikimedia.org/T98784) [21:13:02] (PS3) Ottomata: Add camus/camus.eventlogging.properties [analytics/refinery] - https://gerrit.wikimedia.org/r/219981 (https://phabricator.wikimedia.org/T98784) [21:15:19] Ironholds: i'm not sure what to do [21:15:23] what's breaking? [21:15:30] surely if you wait everything will fix itself [21:15:32] :) [21:15:56] ottomata, if I wait an indefinite period of time the dashboards will stop breaking the instance anyone looks at them, yes [21:15:57] (CR) Ottomata: [C: 2 V: 2] Add camus/camus.eventlogging.properties [analytics/refinery] - https://gerrit.wikimedia.org/r/219981 (https://phabricator.wikimedia.org/T98784) (owner: Ottomata) [21:16:16] funny story; guess how big a fan customers are of "it'll fix itself some time in the next Inf" [21:18:04] haha [21:19:59] Ironholds: I'd say: file a ticket, CC me and bblack? I'll update it with what I know. [21:20:03] no response in #ops [21:23:36] ottomata, sure [21:23:42] hello ottomata. when you have 5 min, can you let me know? :-) [21:24:16] ottomata, which project? [21:26:36] Analytics-Engineering: Varnish caching around datasets.wikimedia.org is causing breakages - https://phabricator.wikimedia.org/T103423#1389620 (Ironholds) NEW [21:26:55] Analytics-Engineering: Varnish caching around datasets.wikimedia.org is causing breakages - https://phabricator.wikimedia.org/T103423#1389628 (Ironholds) [21:28:28] ottomata, done [21:29:13] leila: i ahve a quick 5 minutes rigghtht nowowwow [21:29:22] great, ottomata. [21:29:35] so, it's re Bob's question about mysql dump of enwiki revision table. [21:30:08] Can you let me know who I should contact that has the previllages to help him with change of parameters, ottomata? Bob can provide the query. [21:31:11] Analytics-Engineering: Varnish caching around datasets.wikimedia.org is causing breakages - https://phabricator.wikimedia.org/T103423#1389675 (Ottomata) Ja, one of the two misc eqiad varnish hosts has invalid cached data. I don't know how to purge this. ``` curl -H 'Host: datasets.wikimedia.org' http://cp1... [21:31:26] mforns: sorry [21:31:32] stepped away for a sec [21:31:33] oh rihgt sorry [21:31:34] um [21:31:45] madhuvishy, np! [21:31:59] np ottomata. just want to move with this forward as we are under time pressure. :-) [21:32:26] mforns: I'm not sure myself [21:32:34] leila: why is sqoop too slow? [21:33:03] not sure, Bob said that he tried it a month ago and it was too slow. he wasn't sure why [21:33:32] leila: um, i'd say contact sean pringle and/or Jaime Crespo (jynus) [21:33:47] did you know there is now a #wikimedia-databases channel? :D [21:33:55] or email ops@ [21:34:01] I did NOT! :D [21:34:09] or make a phab ticket and CC folks (including me) [21:34:12] I will. thanks for the pointers, ottomata. :-) [21:34:20] madhuvishy, how about: http://etherpad.wikimedia.org/p/analytics-design line 82? [21:34:41] yup! [21:35:16] madhuvishy, I wrote this really crappy, just as a starting point [21:35:23] mforns: looking. 
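On the enwiki revision-table question: the channel only records that a single sqoop run was too slow and that the DBAs should be contacted, so the following is purely a hedged sketch of one common fallback, dumping the table in rev_id ranges so no individual query runs long. Connection details are placeholders, and in practice this would go through the DBAs rather than ad-hoc scripts:

```
import pymysql

# Placeholder connection details.
conn = pymysql.connect(host="enwiki-replica.example", user="research",
                       password="placeholder", db="enwiki")

CHUNK = 1000000  # width of each rev_id range; tune so each query stays short

def dump_revisions(out_path, max_rev_id):
    """Write selected revision columns to a TSV file, one rev_id range at a time."""
    with open(out_path, "w") as out:
        with conn.cursor() as cur:
            for start in range(0, max_rev_id, CHUNK):
                cur.execute(
                    "SELECT rev_id, rev_page, rev_timestamp, rev_user "
                    "FROM revision WHERE rev_id >= %s AND rev_id < %s",
                    (start, start + CHUNK))
                for row in cur:
                    out.write("\t".join(str(c) for c in row) + "\n")
```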
[21:39:23] mforns: delete the whole schma [21:39:23] delete revions that haven't had data in 90 days [21:39:23] keep all the revisions - these were the three things we came up in tasking today [21:39:41] madhuvishy, aha [21:40:00] mforns: i'm wondering if the email reflects all three [21:40:18] madhuvishy, I don't think so [21:41:14] mforns: it would be nice to have a form to send out with radio buttons to choose [21:41:25] for each schema [21:41:34] madhuvishy, that's a good idea :] [21:42:13] mforns: ha ha but i dont know how [21:42:29] madhuvishy, we should think beforehand though, if we can easily do what we offer [21:42:55] mforns: yeah. it must be late for you now! [21:42:58] I mean if we can easily so selectively delete the schemas and schema revisions, depending on their contents [21:43:29] automatically [21:43:40] mforns: hmmm, yeah [21:43:48] i dont know a lot about that [21:44:11] Analytics-Engineering, operations: Varnish caching around datasets.wikimedia.org is causing breakages - https://phabricator.wikimedia.org/T103423#1389771 (Ottomata) [21:44:20] madhuvishy, me neither, and sean didn't answer to my email from friday [21:44:39] mforns: hmmm, it seems like we should talk more before reaching out about this [21:44:44] btw, I didn't copy you on that one [21:44:50] byeee [21:44:54] madhuvishy, yes, makes sense [21:45:08] we can talk in the morning after standup if it's too late for you now [21:45:18] and please, feel free to change everything in my email suggestion [21:45:36] madhuvishy, yes, I think I'll wrap up now [21:45:41] mforns: :) we can rewrite it if needed after we know everything [21:45:47] cool! ttyl then :) [21:45:51] good night [21:45:54] ok, thanks! good night [23:06:14] Analytics, Discovery, MediaWiki-General-or-Unknown, Services, and 5 others: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#1390116 (GWicke) [23:47:37] Analytics-Kanban, Patch-For-Review: Add cache headers to the datasets.wikimedia.org/limn-public-data/metrics folder {lion} [5 pts] - https://phabricator.wikimedia.org/T101125#1390258 (Catrope) >>! In T101125#1370116, @gerritbot wrote: > Change 218534 merged by Ottomata: > Add cache headers for datasets.wik... [23:59:26] Analytics-Kanban, Patch-For-Review: Add cache headers to the datasets.wikimedia.org/limn-public-data/metrics folder {lion} [5 pts] - https://phabricator.wikimedia.org/T101125#1390308 (Catrope) Resolved>Open
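The three options being offered to schema owners above (drop the whole schema, drop schema-revision tables that have had no data in 90 days, or keep everything) map onto fairly mechanical actions. A sketch of that mapping only; the per-revision table naming (Schema_revisionId) and the 14-digit timestamp column are treated as assumptions, and the connection string is a placeholder:

```
from datetime import datetime, timedelta
from sqlalchemy import create_engine

# Placeholder DSN; the real EventLogging MySQL store is not named above.
engine = create_engine("mysql://user:pass@localhost/log")

def mw_ts(dt):
    """14-digit MediaWiki-style timestamp (assumed format of the EL column)."""
    return dt.strftime("%Y%m%d%H%M%S")

def plan(table, choice, days=90):
    """Return the SQL statements implied by each option for one per-schema-revision
    table, e.g. 'Edit_11448630' (assumed naming convention)."""
    if choice == "delete-schema":
        return ["DROP TABLE `%s`" % table]
    if choice == "drop-if-stale":
        cutoff = mw_ts(datetime.utcnow() - timedelta(days=days))
        newest = engine.execute("SELECT MAX(timestamp) FROM `%s`" % table).scalar()
        if newest is None or str(newest) < cutoff:
            return ["DROP TABLE `%s`" % table]
        return []   # still receiving data within the window, keep it
    if choice == "keep-all":
        return []
    raise ValueError("unknown choice: %s" % choice)

print(plan("Edit_11448630", "drop-if-stale"))
```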