[00:58:12] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1714299 (Fhocutt) This looks interesting, and I can help with exploring and working with the MediaWiki API. [00:59:30] Analytics-Tech-community-metrics: Implement some missing information from the MediaWiki API - https://phabricator.wikimedia.org/T114440#1714305 (Fhocutt) I can be a resource for this. [02:35:42] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out [02:37:22] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 3.004 second response time on port 9042 [02:42:32] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out [02:49:22] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 3.001 second response time on port 9042 [02:54:31] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out [03:04:42] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.004 second response time on port 9042 [03:09:53] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out [03:16:32] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.011 second response time on port 9042 [03:38:33] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out [03:46:53] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 3.003 second response time on port 9042 [04:30:23] PROBLEM - Analytics Cassanda CQL query interface on aqs1003 is CRITICAL: Connection refused [04:47:02] RECOVERY - Analytics Cassanda CQL query interface on aqs1003 is OK: TCP OK - 0.006 second response time on port 9042 [04:55:41] PROBLEM - Analytics Cassanda CQL query interface on aqs1002 is CRITICAL: Connection timed out [04:57:13] RECOVERY - Analytics Cassanda CQL query 
interface on aqs1002 is OK: TCP OK - 0.998 second response time on port 9042 [09:15:51] !log Restart cassandra on aqs1002 [09:15:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [09:45:40] (CR) Joal: [C: -1] Fix inconsistent mobile uniques reports due to partial job runs (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/244604 (https://phabricator.wikimedia.org/T114406) (owner: Madhuvishy) [11:10:32] (CR) Joal: "Comments inline." (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [11:13:25] hi a-team1 [11:13:30] mforns: ! [11:13:37] xD [11:13:44] Are you good enough to really be here ? [11:13:52] yes, a lot better [11:13:54] by good, I mean well, sorry [11:13:56] :) [11:13:58] Cool :) [11:14:13] xD I understood [11:14:18] Thanks for yesterday presentation, i was really great :) [11:14:21] would have written it the same way [11:14:27] oh, cool [11:14:30] IT was ... pfff ... big fingers [11:14:47] big fingers? [11:14:57] I say that when I make typos :) [11:15:05] ok ok [11:15:08] :] [11:15:42] Like not being able to properly type is a real handicap for our kind of job ! [11:15:47] :D [11:16:17] hehehe [11:16:21] I have broken cassandra a bit this night :) [11:16:32] Hammering it with hadoop is kinda tough [11:16:49] oh, but np, pageview api is still a tier...10 system [11:16:59] :) [11:17:12] what's the problem? [11:17:34] I think there was too much pressure from hadooop (load too high) [11:17:39] aha [11:17:54] how many machines does our cassandra have? [11:18:00] I will start investigating using another insert method (more bulk style) [11:18:05] aha [11:18:07] hopefully it could reduce load [11:18:15] cassandra have 3 machine [11:18:32] mmm, it seems enough for aggregated data... [11:18:37] no? 
[11:19:00] And even if I tell hadoop no to use too many writers, I am still writing with 6 writers x 4 [11:19:08] :) [11:19:16] aha [11:19:16] http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Analytics+Query+Service+eqiad&h=&tab=m&vn=&hide-hf=false&m=bytes_in&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name [11:19:25] * mforns looks [11:19:48] There is one thing I need to ask the services team: why is there so many bytes out from cassandra [11:19:55] bytes in , ok I get it, but out ? [11:20:02] Need to investigate [11:20:05] aha [11:20:20] aren't we using our own analytics cassandra cluster? [11:20:34] We are [11:20:45] It's just that the services team knows more about cassandra than I do [11:22:48] maybe when you insert data into cassandra, cassandra returns the inserted data [11:23:02] backfilling-wise: per-project daily/hourly is done, top daily is done, and per-article daily and hourly are still ongoing [11:23:15] aha [11:23:19] The per-article ones are the big ones in term of data size [11:23:26] sure [11:24:04] But still, it shouldn't be that long: ~3G gzipped compress daily to upload to cassandra [11:24:13] Should be faster [11:24:24] * joal gets back to investigate better loading ! [11:24:30] aha [11:24:30] ok [11:54:26] a-team, I'm away for 1 hour or so [11:54:31] later ! [11:54:35] ok, later! [12:57:50] * joal is back ! [13:33:40] (CR) Ottomata: "Probably the stuff in refinery-job can stay there, just the stuff that is now in refinery-core should move to refinery-camus." 
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [13:35:42] (CR) Ottomata: Add libjars optional arg to Camus python wrapper script (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 (owner: Madhuvishy) [13:42:25] (CR) Ottomata: "I would structure the src/main/avro schemas in the same way the rest of the class hierarchy is structured, e.g src/main/avro/org/wikimedia" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [14:06:44] hey ottomata I need to borrow your permissions for a bit [14:06:57] trying to figure out what's wrong with aqs [14:10:53] k [14:10:55] wassup? [14:11:03] i saw those alerts form aqs1002 lastnihg [14:11:04] tnight [14:11:06] night8 [14:11:07] AH [14:11:11] last night* [14:13:48] ottomata: I think it has suffered from overload :( [14:14:05] ottomata: I have restarted cassandra this morning, seems to be bakc in the game [14:15:08] Analytics-EventLogging, Database: db1046 innodb signal 6 abort and restart - https://phabricator.wikimedia.org/T104748#1715187 (Milimetric) Sorry, Jaime, I missed this problem when it happened. The project we all monitor is Analytics-Backlog, but we've been meaning to clean that up, there are too many co... [14:15:48] joal: it's still returning empty responses [14:15:59] so I wanted to change the logging to get local logs so we can see what the heck is going on [14:17:47] ottomata: so I just need to be able to fiddle with /etc/restbase/config.yaml (but puppet generates that... hm...) [14:20:07] milimetric: normally you have aqs-admin rights [14:20:15] milimetric: maybe it's not enough ? 
[14:20:33] joal: I'm lost, as always [14:20:39] milimetric, ottomata : The problem milimetric is describing is not related to this night issue :) [14:20:41] I can't seem to do "service restbase restart" [14:20:54] nor edit /etc/restbase/config.yaml [14:21:04] sudo ? [14:21:06] milimetric: sudo [14:21:07] ? [14:21:12] :) [14:21:12] no, asks for pw [14:21:14] you can't edit the config, since that is managed by puppet [14:21:15] hm [14:21:45] well, somehow I need to be able to debug this thing, and it's getting a bit ridiculous. workers are dying all the time and I have no idea why [14:22:00] nor any way to see the logs, because logstash doesn't seem to tell me anything [14:22:01] hm, i tlooks like you should be able to sudo service restbase restart [14:22:03] milimetric: sudo service restbase restart works for me on aqs1001 [14:22:10] the last hundred or so errors have the helpful message "HOST" [14:22:13] milimetric: i can temporarily let you edit the config file! [14:22:14] :) [14:22:14] it also worked with cassandra service [14:22:17] k [14:22:17] you want aqs1002? [14:22:21] sure [14:22:24] k [14:22:50] milimetric: you'll temporary be an ops guys !!! How cool :) [14:22:58] and then i need to be able to restart restbase - joal how did you restart cass? [14:23:00] there you go [14:23:01] try now [14:23:18] milimetric: sudo service restbase restart has worked for me on aqs1001 [14:24:57] ok joal sweet, that for some insane reason worked [14:25:04] :D [14:25:17] when I go "service restbase status" it says the command "service" doesn't exist [14:25:19] :P [14:25:29] mwarf :) [14:25:33] ok, if you tail /tmp/debug.log I'm about to try and figure out what's up with these empty messages [14:25:35] on 1002 [14:26:41] milimetric: are you sudoing? 
[14:26:56] /usr/sbin is not in your user's path [14:27:04] I was able to "sudo service restbase restart", yes [14:27:04] you have to sudo for it to even know where the 'service' command is [14:27:08] but I can't sudo other stuff [14:27:11] right [14:27:17] you can do: [14:27:19] %aqs-admins ALL = NOPASSWD: /usr/sbin/service cassandra * [14:27:19] %aqs-admins ALL = (cassandra) NOPASSWD: ALL [14:27:19] %aqs-admins ALL = NOPASSWD: /usr/sbin/service restbase * [14:27:19] %aqs-admins ALL = (restbase) NOPASSWD: ALL [14:27:19] %aqs-admins ALL = NOPASSWD: /bin/journalctl * [14:27:29] you should be able to sudo -u restbase [14:27:30] milimetric: tailing ! [14:27:31] and do anytihng [14:27:32] or maybe [14:27:35] sudo -u cassandra [14:27:42] so, anything restbase or cassandra user can do, I think you can do. [14:28:12] joal: I think I got it, formatting so I can paste properly [14:28:32] Yeah, I have seen it as well : timeuuid, right ? [14:30:14] https://www.irccloud.com/pastebin/IPaTzUv3/ [14:30:32] milimetric: yes [14:30:36] Rahhhhh [14:30:47] Looking into it now [14:31:06] I insert an empty string as a timeuuid --> cassandra doesn't complain [14:31:49] but the driver that restbase uses to hit cassandra has a problem with it? [14:32:05] seems so :( [14:32:11] It's an unused field for us [14:32:22] created by default by restbase [14:32:32] Since it's part of primary key, can't be set to null [14:34:47] Analytics-Cluster, Analytics-Kanban: Move camus properties out of refinery and into puppet - https://phabricator.wikimedia.org/T115114#1715208 (Ottomata) NEW a:Ottomata [14:35:19] joal: ok, and I'm assuming it's not as simple as update table set uuid='not empty string'? [14:35:28] or uuid=newid() or something? 
[14:35:54] milimetric: it's a pain to generate in java :( [14:36:06] milimetric: I'll double check on how we can do that [14:36:08] i mean directly in cassandra [14:36:13] Yup got it [14:36:24] (reading up too) [14:36:44] meanwhile i'm going to remove this logging and let it go to logstash again [14:37:56] milimetric: ok thanks [14:39:16] it'd be nice to have the logs on disk too by default, wouldn't it? [14:39:52] yeah, especially since the errors seem to not be going to logstash [14:40:25] i'll try and see if you can have both in that streams section (https://wikitech.wikimedia.org/wiki/RESTBase#Debugging) [14:40:30] Analytics-Kanban, Database: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1715233 (mforns) [14:41:18] joal: so when I do select * from data limit 1; I don't see uuid, is it the _tid column or something? [14:41:31] also milimetric we should have a talk about insertion rate --> not really happy of current [14:41:42] milimetric: it is [14:42:35] milimetric: providing a playground for testing [14:43:03] hm? [14:43:19] "Test_Project" keyspace [14:43:26] milimetric: --^ [14:43:37] no, I'm here :) I'm just not understanding what you mean [14:43:51] ah. ok [14:43:54] say a full thought, my interpolation is bad in the morning :) [14:44:06] and the afternoon most of the time [14:44:42] I kind of know you just enough now to know you do as I do: self depreciation as good humour :-P [14:45:08] Currently feeding a new keyspace with some data to see if we can modify [14:45:35] ah ok [14:45:47] I'm trying to figure out if I can update the _tid on a specific record and get it to work [14:45:56] any idea what default value they expect there? [14:46:02] now() [14:46:06] sweet :) [14:46:12] is the easiest way to create :) [14:49:16] I have killed aqs1003 now :( [14:51:23] joal: killed? 
[14:51:44] not killed, but need restart (timeout, as for 1002 before [14:51:55] 1002 was timing out a lot more this morning [14:52:00] wasn't yesterday [14:52:35] so it looks like Cassandra doesn't let you call UPDATE on part of the PRIMARY KEY [14:52:37] makes sense [14:52:39] right, before me taking care of it :) [14:52:54] milimetric: makes sense indeed :( [14:53:00] mwarf [14:53:04] looks like there's a fast "COPY" method [14:53:14] Means full reimport with a fake timeuuid right ? [14:53:41] milimetric: can be tried [14:53:43] I'm trying to find an alternative, I'm going to try deleting and re-inserting just a row just to make sure this solves the problem [14:53:50] but also looking into COPY [14:54:02] PROBLEM - Analytics Cassanda CQL query interface on aqs1003 is CRITICAL: Connection refused [14:54:25] !log Cassandra restarted on aqs1003 [14:54:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [14:55:04] milimetric: you can use the test keyspace if you want --> ready to play [14:55:52] RECOVERY - Analytics Cassanda CQL query interface on aqs1003 is OK: TCP OK - 0.005 second response time on port 9042 [14:56:29] lol, I love how "now()" returns a guid [14:56:32] how completely stupid [14:56:44] joal: well, that won't be hooked up through restbase though [14:57:00] ok, so now if you do select * from data where "_domain" = 'analytics.wikimedia.org' and project = 'en.wikipedia' and access = 'all-access' and agent = 'all-agents' and granularity = 'daily' and timestamp = '2015100100'; [14:57:06] you get two records instead of one [14:57:36] which keyspace? [14:58:33] ok found it [14:58:51] sorry, per-project [14:58:58] haha, now i don't know how to delete the old one [14:59:17] 'cause i have to specify a timeuuid and i don't know how [14:59:38] empty string [15:00:21] ottomata, hi! qq: are eventlogging validation errors written to a log file also or just to kafka? [15:00:21] doesn't work milimetric ? 
[15:00:49] this doesn't work: select * from data where "_domain" = 'analytics.wikimedia.org' and project = 'en.wikipedia' and access = 'all-access' and agent = 'all-agents' and granularity = 'daily' and timestamp = '2015100100' and "_tid" = ''; [15:01:07] Invalid STRING constant () for "_tid" of type timeuuid [15:01:15] mwarf [15:01:51] aha! [15:01:51] select * from data where "_domain" = 'analytics.wikimedia.org' and project = 'en.wikipedia' and access = 'all-access' and agent = 'all-agents' and granularity = 'daily' and timestamp = '2015100100' and "_tid" > dfcec280-6e95-11e5-89ab-55ce467c43aa; [15:01:57] <> doesn't work, but > does [15:02:06] wow, well done :) [15:02:59] also, delete doesn't work with that where clause :/ [15:03:22] milimetric: MANNN ! [15:03:27] What have Ivdone :( [15:03:49] it's ok, we'll figure this out. Also, cassandra is supposed to be simple, wtf [15:04:01] milimetric: timeuuid is not simple . [15:04:08] We hsould have avoided that [15:04:20] can we help it? It's part of restbase, right? [15:04:43] It's part of restbase cassandra module I think [15:05:22] that _tid column should just have a default [15:05:31] right [15:05:32] if you don't specify a value on insert, does it fill it in? [15:06:08] nope, says _tid is missing [15:06:46] zactly [15:07:42] tried that before .... [15:08:57] figured it out, deleted the two others and just inserted a new record [15:09:00] duh [15:09:01] :) [15:09:05] ok, so now to query! [15:09:28] How have you managed that ? [15:09:46] worked milimetric !!! [15:09:46] oh snap!!! [15:09:55] :) k, so that's the only problem [15:09:56] phew [15:10:00] right [15:10:12] I was dug in, expecting to find like 30 other nested ones [15:10:22] :D [15:10:33] milimetric: how have you manged to delete the two rows ? [15:10:35] sweet, so.... hm.... now lemme see, it says you can COPY the data out to a CSV file [15:10:42] oh, just don't specify the _tid [15:10:44] milimetric: correct [15:10:53] just delete from ... 
where [15:11:18] ok, that worked because _tid is not part of the partition key [15:11:20] makes sense [15:11:56] milimetric: retrying again to load data without setting _tid [15:12:14] joal: well, won't that take a lot longer than figuring out how to set the _tid to now() everywhere? [15:12:41] milimetric: both are needed actually [15:12:56] ...? [15:13:10] Well, I don't want to coninue pushing wrong data ! [15:13:16] oh! [15:13:32] you said "retrying" I thought you meant you were deleteing everything and starting over [15:14:04] why don't we hang out in the batcave :) [15:14:35] OMW [15:17:23] (Abandoned) Ottomata: [WIP] Add properties file for importing mediawiki data [analytics/refinery] - https://gerrit.wikimedia.org/r/244594 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [15:18:54] (PS1) Ottomata: Removing camus/ properties files. This has been moved to puppet [analytics/refinery] - https://gerrit.wikimedia.org/r/244694 (https://phabricator.wikimedia.org/T115114) [15:19:37] !log moved camus property files out of refinery repository and into puppet. Camus properties now live on an27 at /etc/camus.d, and camus log files are in /var/log/camus [15:19:39] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [15:19:40] joal: ^ :) [15:19:50] awesome ottomata [15:19:52] Thanks [15:20:07] (CR) Ottomata: [C: 2 V: 2] Removing camus/ properties files. 
This has been moved to puppet [analytics/refinery] - https://gerrit.wikimedia.org/r/244694 (https://phabricator.wikimedia.org/T115114) (owner: Ottomata) [15:20:32] adding new camus jobs is much nicer now [15:20:43] https://gerrit.wikimedia.org/r/#/c/244601/2/manifests/role/analytics/refinery.pp [15:26:32] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Move camus properties out of refinery and into puppet [5 pts] - https://phabricator.wikimedia.org/T115114#1715380 (Ottomata) [15:26:57] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban: logrotate camus logs on analytics1027 [3 pts] - https://phabricator.wikimedia.org/T110598#1715384 (Ottomata) [15:27:01] Analytics-Backlog, The-Wikipedia-Library: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1715388 (Halfak) NEW [15:28:11] Analytics-Backlog, The-Wikipedia-Library: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1715396 (Halfak) I **boldly** added this to the #Analytics-Backlog in hope that they might be able to pick up requests like this since the #The-Wikipedia-Library doesn't have the engin... [15:32:28] Analytics-Backlog: Add the schema name to the EL EventError topic - https://phabricator.wikimedia.org/T115121#1715415 (mforns) NEW [15:34:28] milimetric: http://stackoverflow.com/questions/23191933/cassandra-inserting-timeuuid-error [15:34:37] :( [15:37:08] Analytics-Kanban, Wikimedia-Logstash, Patch-For-Review: Make Logstash consume from Kafka:eventlogging_EventError {oryx} [8 pts] - https://phabricator.wikimedia.org/T113627#1715442 (mforns) [15:37:12] Analytics-Kanban, Wikimedia-Logstash, Patch-For-Review: Make Logstash consume from Kafka:eventlogging_EventError {Oryx} [8 pts] - https://phabricator.wikimedia.org/T113627#1715443 (ggellerman) [15:37:43] joal: and we're not allowed the datastax driver? 
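[note] Background on the `_tid` problem discussed above: a Cassandra `timeuuid` is a version 1 (time-based) UUID, so an empty string can never parse as one. The sketch below uses only Python's standard `uuid` module, not the Java driver mentioned in the channel. It checks whether a string is a usable timeuuid and recovers the timestamp embedded in the `_tid` value quoted in the queries above. The helper names `is_valid_timeuuid` and `timeuuid_to_datetime` are made up for illustration.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def is_valid_timeuuid(text):
    """Return True if text parses as a version 1 (time-based) UUID."""
    try:
        return uuid.UUID(text).version == 1
    except ValueError:
        # Raised for malformed input, including the empty string.
        return False


def timeuuid_to_datetime(value):
    """Return the UTC datetime embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    unix_seconds = (value.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# The _tid value that appears in the CQL queries earlier in this log.
logged_tid = uuid.UUID("dfcec280-6e95-11e5-89ab-55ce467c43aa")

# Two freshly generated time-based UUIDs (hypothetical per-row values).
first_tid = uuid.uuid1()
second_tid = uuid.uuid1()

if __name__ == "__main__":
    print(is_valid_timeuuid(""))                 # empty string is rejected
    print(timeuuid_to_datetime(logged_tid))      # generation time of the logged _tid
    print(first_tid != second_tid)               # successive uuid1() calls differ
```

The same idea applies in any client: generate the value with a time-based UUID function rather than passing a string literal, and expect successive calls to yield different values.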
[15:38:12] milimetric: we are, just realised that I have a function to generate an uuid [15:38:34] you read the question and got sad, but didn't read the answer :) [15:38:55] I did, but thought that it was only a python thing :) [15:39:02] But found the thing in java ! [15:39:05] milimetric: --^ [15:39:07] :) [15:39:51] that question you linked was java! :) [15:40:00] http://stackoverflow.com/a/23198388/180664 [15:41:26] Analytics-Cluster, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Replace uses of monitoring::ganglia with monitoring::graphite_* [5 pts] - https://phabricator.wikimedia.org/T90642#1715469 (Ottomata) [15:44:49] milimetric: I feel silly now :( [15:45:08] I'll ensure stuff get's back in track before weekend :) [15:45:25] psh, whatsamatter, you can't have a meeting debug code, run tests, read SO answers, and talk to me at the same time? [15:45:31] weaaak! [15:45:31] :P [15:45:42] :D [15:48:48] milimetric: select * from "Test_Project"."data"; [16:06:06] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production {dove} [2 pts] - https://phabricator.wikimedia.org/T114881#1715497 (kevinator) Open>Resolved [16:06:28] Analytics-Backlog, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1715499 (madhuvishy) [16:06:39] Analytics-Kanban, netops, operations, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1715503 (kevinator) Open>Resolved [16:08:13] Analytics-Kanban, RESTBase-API: create RESTBase endpoints [34 pts] {slug} - https://phabricator.wikimedia.org/T107053#1715506 (kevinator) Open>Resolved [16:08:47] Analytics-Kanban: Deploy the Analytics RESTBase {slug} [13 pts] - https://phabricator.wikimedia.org/T113991#1715508 (kevinator) Open>Resolved [16:08:52] (PS2) Madhuvishy: Fix inconsistent mobile uniques reports due to 
partial job runs [analytics/refinery] - https://gerrit.wikimedia.org/r/244604 (https://phabricator.wikimedia.org/T114406) [16:09:23] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1715513 (kevinator) [16:09:24] Analytics-Kanban: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1715512 (kevinator) Open>Resolved [16:10:35] Analytics-Backlog, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: logrotate camus logs on analytics1027 [3 pts] - https://phabricator.wikimedia.org/T110598#1715519 (kevinator) Open>Resolved [16:10:35] (CR) Madhuvishy: Add libjars optional arg to Camus python wrapper script (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 (owner: Madhuvishy) [16:11:53] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Move camus properties out of refinery and into puppet [5 pts] - https://phabricator.wikimedia.org/T115114#1715521 (kevinator) Open>Resolved [16:12:30] Analytics-Cluster, Analytics-Kanban, operations, Patch-For-Review: Fix active namenode monitoring so that ANY active namenode is an OK state. 
[8 pts] - https://phabricator.wikimedia.org/T89463#1715523 (kevinator) Open>Resolved [16:16:28] Analytics-Kanban: Gain permission to delete articles on wikitech and mediawiki (needed for doc cleanup) [3 pts] - https://phabricator.wikimedia.org/T114672#1715532 (kevinator) Open>Resolved [16:18:52] Analytics-Kanban, Wikimedia-Logstash, Patch-For-Review: Make Logstash consume from Kafka:eventlogging_EventError {Oryx} [8 pts] - https://phabricator.wikimedia.org/T113627#1715541 (kevinator) Open>Resolved [16:19:50] (PS2) Madhuvishy: Add libjars optional arg to Camus python wrapper script [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 [16:20:18] Analytics-Kanban: Update camus-wmf to be deployed by maven (missing jars otherwise) {hawk} [8 pts] - https://phabricator.wikimedia.org/T114657#1715545 (kevinator) Open>Resolved [16:21:53] Analytics-Cluster, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Replace uses of monitoring::ganglia with monitoring::graphite_* [5 pts] - https://phabricator.wikimedia.org/T90642#1715547 (kevinator) Open>Resolved [16:22:24] madhuvishy: how about [16:22:31] i just realized :D [16:22:33] changing [16:22:37] oh ok, heh [16:22:42] not sure to what... [16:22:44] was gonn ajust say [16:23:07] {3} ... .format( ... "-libjars " + libjars if libjars else '' ) [16:23:09] or seomthing like that [16:23:41] aah, i thought of doing something like what you did in puppet [16:25:41] Analytics-EventLogging, Analytics-Kanban: {stag} EventLogging on Kafka - https://phabricator.wikimedia.org/T102225#1715551 (kevinator) Open>Resolved a:kevinator This project & 2015-16 Fiscal Q1 goal is DONE as of Sept 30 2015! Goals page updated: https://www.mediawiki.org/wiki/Wikimedia_Engineeri... 
[16:26:52] (PS3) Madhuvishy: Add libjars optional arg to Camus python wrapper script [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 [16:27:07] aye madhuvishy that is kinda like that [16:27:21] madhuvishy: that is fine [16:27:47] i would probably do it the way I mentioned in python, i can't do it like that in puppet because i can't do conditionals or variable reassignment so easily in puppet [16:27:51] but this way is totally fine [16:28:09] hmmm, not sure about those double quotes [16:28:15] "{3}" [16:28:25] not needed may be? [16:28:29] i think that would make the two opts be passed as a single arg to java [16:28:39] since {3} is now [16:28:43] milimetric: I have values for _tid [16:28:45] -libjars aaa,bbb,ccc [16:28:57] milimetric: BUT, they are the same (and supposed to be different) [16:29:21] milimetric: I assume we move forward with that anyway (field not used and all) [16:30:13] you're using UUIDs.timeBased() ? [16:30:20] yes I do [16:30:20] and that just keeps generating the same value? [16:30:22] :( [16:30:26] wth java [16:30:28] :) [16:30:37] * joal nod [16:30:44] select * from "local_group_default_T_pageviews_per_project".data where "_domain" = 'analytics.wikimedia.org' and project = 'en.wikipedia' and access = 'all-access' and agent = 'all-agents' and granularity = 'daily' limit 10; [16:30:59] has difference values [16:31:11] oh cool! [16:31:35] But select * from "Test_Project"."data"; as same values :( [16:31:46] wait so where do the different values come from? [16:31:50] So basically: We are not usre [16:32:07] milimetric: I get your go ? [16:32:21] sure... :) [16:32:27] ok cool :) [16:32:29] we'll fix it later, no problem [16:32:45] i mean, it'll still be a PK, because we're assuming the rest of the PK is unique [16:32:47] milimetric: will be hard to fix don't you think ? 
[16:32:48] so I don't see it causing a problem [16:32:58] Yeah, same for me [16:33:04] joal: also, it's not entirely meaningless, right, because each load job will have a different value [16:33:04] ok, I go for that :) [16:33:13] so if we have one bad load, we can justly quickly select all the records from it :) [16:33:39] good point [16:33:42] Ok, let's go [16:33:45] sweet! [16:33:48] dooooo itttt [16:33:59] * milimetric going to find some lunch, rob a bank, bbl [16:34:25] ja madhuvishy i think you should remove those quotes around {3} [16:34:29] aside from that +1! [16:34:30] :) [16:35:20] madhuvishy: Are you ok with the comments I made on your CRs ? [16:35:45] joal: which one? [16:35:58] sorry i pushed too many patches yesterday [16:36:02] :) [16:36:31] (PS4) Madhuvishy: Add libjars optional arg to Camus python wrapper script [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 [16:37:03] I reviewed two, the camus module one and another, I can't remember :) [16:37:29] joal: the other one was the hive query parenthesis, I fixed that [16:37:34] camus module I haven't seen [16:37:36] ah, yes [16:39:36] joal: looking now [16:49:04] joal: just looking at this - https://gist.github.com/jobar/9c471d68cc9e04be3b9b [16:49:15] are these tests for the JSON one or binary one? [16:49:20] json [16:50:36] i might not have mentioned it, but thanks for all the work on camus! i didn't imagine it would be so involved [16:53:49] Analytics-Kanban, Wikimedia-Logstash, Patch-For-Review: Make Logstash consume from Kafka:eventlogging_EventError {Oryx} [8 pts] - https://phabricator.wikimedia.org/T113627#1715611 (bd808) Dashboard at https://logstash.wikimedia.org/#/dashboard/elasticsearch/eventlogging-errors [16:55:24] bd808, BTW, I will send an email to the analytics list to announce EL error logs in Logstash :] But first I'd like to talk to Chris Steipp to ensure that there are no privacy issues. 
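[note] The `-libjars` quoting point raised above can be sketched as follows. This is not the actual `bin/camus` code. The command template, the `launcher` name, the `--config` flag, and the function `build_command` are hypothetical placeholders, and `shlex.split` stands in for whatever shell-style tokenizing the real wrapper relies on. It shows why wrapping the optional placeholder in double quotes makes the flag and its value arrive as a single argument.

```python
import shlex


def build_command(properties_file, libjars=None, quote_placeholder=False):
    """Build a hypothetical launcher command with an optional -libjars option."""
    # Conditional expression as suggested in the channel: empty when unset.
    libjars_opt = "-libjars " + libjars if libjars else ""
    if quote_placeholder:
        template = 'launcher "{0}" --config {1}'   # quoted placeholder
    else:
        template = "launcher {0} --config {1}"     # unquoted placeholder
    return template.format(libjars_opt, properties_file)


# Unquoted: the flag and its value are tokenized as two separate arguments.
unquoted_args = shlex.split(build_command("camus.properties", "aaa,bbb,ccc"))

# Quoted: "-libjars aaa,bbb,ccc" becomes one argument.
quoted_args = shlex.split(
    build_command("camus.properties", "aaa,bbb,ccc", quote_placeholder=True)
)

# Quoted with no libjars: an empty-string argument is passed along.
quoted_empty_args = shlex.split(
    build_command("camus.properties", quote_placeholder=True)
)

# Unquoted with no libjars: the extra whitespace simply collapses.
unquoted_empty_args = shlex.split(build_command("camus.properties"))

if __name__ == "__main__":
    print(unquoted_args)
    print(quoted_args)
    print(quoted_empty_args)
    print(unquoted_empty_args)
```

Building the command as a list of arguments and appending `["-libjars", libjars]` only when it is set would avoid the quoting question altogether. The string form is kept here because it mirrors the `.format(...)` approach under review.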
[16:56:46] ebernhardson: :) yeah we din't think so too [16:57:06] joal: hmmm the name said binary hence got confused [16:57:09] mforns: access is nda-only so the leaks aren't to the whole world, but I'm sure there is data in there that should be considered sensitive [16:57:51] madhuvishy: reaaly ? Oops, my bad ! [16:58:30] naah it's okay, i'll add them and see what test passes when it should fail. we are using the Binary one for the actual camus job [16:58:48] bd808, ok [16:59:34] madhuvishy: yes, but testing would involve using the generated source for binary, right ? [17:00:25] Analytics-Kanban, RESTBase, Services: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1715623 (mobrovac) [17:00:42] joal: hmmm, not sure i understand that [17:02:44] Analytics-Kanban, RESTBase, Services: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1715627 (mobrovac) >>! In T114830#1712973, @GWicke wrote: > @mobrovac, basically all entry points in the API return some kind of data. I think it make... [17:02:52] madhuvishy: In order to test binary avro decoder, you'd need to encode the payload in Avro, right ? [17:03:00] joal: yes [17:03:08] madhuvishy: In order to encore, you'd need the generated java code, no ? [17:03:20] well no not necessary i think [17:03:27] Oh ? [17:03:36] https://avro.apache.org/docs/1.7.7/gettingstartedjava.html#Serializing-N101DE [17:03:56] you can use the test avsc file directly [17:04:21] Great ! [17:04:26] Didn't know that :) [17:04:33] Then a unittest is feasibl;e ! [17:04:39] :) [17:05:59] joal: :) we thought since we were only making minor changes to the existing upstream classes and these were not really new, no need to add tests. 
but now that they are already written, i'm all for it :) [17:06:57] awesome madhu :) Sorry for pushing, I like tests :D [17:07:24] madhuvishy: Real life is not testable, so I make the most of it in virtual one ;) [17:09:23] (PS10) Madhuvishy: Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) [17:14:15] Is there are reason that the Echo tables (like echo_events) are on x1-analytics-slave, but not on analytics-store? [17:15:48] mforns: ^ do you know? [17:16:14] madhuvishy, just a sec, one on one w/ Kevin [17:16:23] ya no worries [17:29:16] (PS11) Madhuvishy: Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) [17:29:23] joal: ^ added your tests [17:29:32] will add for binary too [17:29:39] Thanks madhu :) [17:32:17] bd808, milimetric, Chris Steipp gave his consent in having EL validation errors in Logstash for 30 days. [17:32:52] bd808, milimetric, so is it OK for you if I send the announce to the analytics list? [17:34:57] (PS12) Madhuvishy: Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) [17:35:13] mforns: no objections from me [17:38:05] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: Add the schema name to the EL EventError topic [5 pts] - https://phabricator.wikimedia.org/T115121#1715730 (Ottomata) a:Ottomata [17:38:16] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: Add the schema name to the EL EventError topic [5 pts] - https://phabricator.wikimedia.org/T115121#1715415 (Ottomata) I'll do the first 2! :) [17:41:51] hi milimetric. do you know what happened just yesterday that there are two emails to analytics about pageviews? 
:-) [17:42:20] mforns: cool with me, thx for checking [17:42:28] leila: no idea, i haven't caught up on email today [17:42:49] mforns: gonna add these fields to EventError [17:42:50] https://gist.github.com/ottomata/f9a86b16033bdfe0cfac [17:42:52] thoughts? [17:43:21] (PS5) Ottomata: Add libjars optional arg to Camus python wrapper script [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 (owner: Madhuvishy) [17:43:29] (CR) Ottomata: [C: 2 V: 2] Add libjars optional arg to Camus python wrapper script [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 (owner: Madhuvishy) [17:43:36] (PS7) Joal: Add CassandraXSVLoader to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) [17:44:10] (PS8) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (https://phabricator.wikimedia.org/T108174) [17:45:44] joal: actually, dont think we can test binary - atleast it's confusing [17:45:49] ottomata, for some reason I cannot load your gist [17:46:14] joal: if i write the converted file to disk, when i deserialize it it'll already be decoded [17:47:14] https://gist.github.com/ottomata/f9a86b16033bdfe0cfac [17:47:15] really? [17:47:18] mforns: ? [17:47:35] made a new one: https://gist.github.com/ottomata/345538257014bf788824 [17:47:52] ottomata: may be just pastebin [17:48:04] joal: so i'm gonna leave it [17:48:19] mforns: http://pastebin.com/8tGuWrpm [17:48:36] ottomata, I can now :] [17:48:43] don't know what happened [17:48:51] yes, seems fine! [17:48:54] hello madhuvishy [17:48:58] nuria: hi [17:49:02] and team [17:49:07] working today? [17:49:19] ottomata, where in the EventError are you going to put it? [17:49:27] nah, making up some hours i own and that's it [17:49:33] nuria: cool :) [17:49:45] was ottomata ok with the puppet code and how we set up libjars [17:50:18] he's been moving things around, so it'll be different now. 
he moved all the camus properties to puppet erb [17:50:37] he merged by python change for bin/camus to accept libjars [17:51:14] mforns: in the event [17:51:15] so [17:51:20] event.schema [17:51:23] sorry [17:51:24] uhh [17:51:33] error_event is the error event object [17:51:33] then [17:51:41] error_event.event.schema [17:51:45] next to rawEvent [17:51:53] and message and code [17:54:57] (PS9) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (https://phabricator.wikimedia.org/T108174) [18:00:00] ottomata, aha cool [18:02:35] ottomata: your puppet patch looks good to me [18:03:03] oh it is already merged [18:03:07] okay what now [18:04:27] madhuvishy: hm, welllLllL we need the java stuff merged, i'll leave that to you and joal and nuria. then we need to make a refinery release and deploy it [18:04:32] then we can merge the puppet patch [18:04:46] ottomata: cool. nuria joal I think the patch is ready to go [18:05:01] * joal reads [18:05:24] this one first joal, https://gerrit.wikimedia.org/r/#/c/243990/ [18:05:29] you can merge it if you think its good to go [18:07:11] (PS1) Christopher Johnson (WMDE): adds content pages to monthly page chart adds rdf query list removes social stats from recent [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/244721 [18:08:33] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds local data file retrieve function adds remote sparql query and xml parse to data frame adds local write to tsv with date stamp function [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/243826 (owner: Christopher Johnson (WMDE)) [18:09:47] (CR) Joal: [C: 2 V: 2] "Good to go !!! 
Awesome work Madhu :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [18:09:56] ottomata, madhuvishy --^ [18:10:03] joal: :D thanks [18:10:07] Please feel free to move forward :) [18:10:24] joal: need to deploy this change now i think [18:10:34] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds content pages to monthly page chart adds rdf query list removes social stats from recent [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/244721 (owner: Christopher Johnson (WMDE)) [18:11:53] madhuvishy: i think we'll need the other thing too, right? the avro camus patch? [18:12:07] OH, no i'm sorry that is with that [18:12:19] ottomata: yes [18:12:19] cool, ok [18:12:39] ok, yeah, so we need to do a refinery release. joal you've got some code coming into refinery too. [18:12:41] should we wait for that? [18:13:08] ottomata: currently moving the sryuff into camus module in source [18:13:16] ottomata: we also should create the kafka topic [18:13:19] ok, but there is alos the Cassandra stuff. [18:13:25] then deploy jar, then refinery ? [18:13:31] madhuvishy: yes, i'm putting that off til the last moment :) [18:13:41] hey a-team, I'm still feeling a bit under the weather, will sign off for today [18:13:47] cause it is easy, and because once I create it, it will be there forever [18:13:48] have a nice weekend! [18:13:50] yup, hopefully cassandra stuff ready to go in minutes [18:13:51] ok, feel better mforns, laaters [18:13:53] ok cool. [18:13:56] bye [18:14:01] madhuvishy: then we will wait til we get joal's patches in before we do a release. [18:14:07] bye mforns ! [18:14:14] oh madhuvishy, could you add a patch that updates the changelog? [18:14:21] with a line or two about refinery-camus? 
[18:14:21] ottomata: hmmm, was thinking can test with the actual topic, can't test with the test topic anymore because it expects topic to be prefix_schema [18:14:27] ottomata: yeah [18:14:39] ah, madhuvishy, if you like, I can create the topic now. [18:14:44] will that help you test it? [18:15:02] yeah, we can always drop the imported test messages right? [18:15:31] actually i can import it to a tmp location too [18:21:39] madhuvishy: we can test if we override the schema registry [18:22:11] madhuvishy: and pass that code in an outside jar onlibjars [18:25:32] did anybody get a chance to think about this? "Is there are reason that the Echo tables (like echo_events) are on x1-analytics-slave, but not on analytics-store?" [18:26:02] I'm trying to join `echo_events` with `users`, but they don't seem to be on the same machine [18:27:29] (PS1) Madhuvishy: Update changelog for adding refinery-camus [analytics/refinery/source] - https://gerrit.wikimedia.org/r/244725 [18:28:33] ottomata: updated changelog [18:34:43] hi HaeB [18:35:04] I realize I just sent my email but it's basically saying: let's talk in real time, I think I'm too slow on these threads to be useful [18:35:13] so what data exactly is the Reading team looking for? [18:37:03] (PS2) Ottomata: Update changelog for adding refinery-camus [analytics/refinery/source] - https://gerrit.wikimedia.org/r/244725 (owner: Madhuvishy) [18:37:13] (CR) Ottomata: [C: 2 V: 2] Update changelog for adding refinery-camus [analytics/refinery/source] - https://gerrit.wikimedia.org/r/244725 (owner: Madhuvishy) [18:39:11] Analytics-Backlog, The-Wikipedia-Library: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1715956 (Sadads) Significantly, this data could be used to influence and shape a number of different parts of future community led projects for analyzing the impact of external material. 
[18:40:42] Analytics-Backlog, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1715961 (Legoktm) [18:41:45] joal: okay so let me know when you've merged your code too. Given that it's friday we probably won't deploy? [18:42:02] madhuvishy: good point :) [18:42:10] We can release, but probably not deploy [18:42:23] yeah, we can do that [18:43:10] milimetric: https://phabricator.wikimedia.org/T114379 is not about data for the reading team, but for Finance [18:44:33] as i wrote, an ETA for that Wikistats update would be great so we can know whether Reading will need set time aside to get Finance these numbers manually [18:44:39] it's not entirely you, HaeB, but I'm very confused now and I feel like I'm miscommunicating with four different teams in five different ways :) [18:44:44] the quarterly report is an entirely separate matter [18:45:02] should have clarified that between kevin's and my email [18:45:09] HaeB: I can't speak to the Wikistats update, but the new dataset that you asked for will probably start being generated as of next Thursday [18:45:11] similar impressions here ;) [18:45:16] :) [18:45:23] which new dataset that i asked for? [18:45:43] sorry - the new dataset that Wikistats asked for, I guess [18:45:53] that is - [18:46:07] Webstatscollector-formatted new pageview definition statistics by article and project [18:46:33] by article? [18:46:57] that's what wikistats needs, as far as I can tell from Erik's diagrams, and nobody's contradicted that assumption yet [18:48:02] anyway, that doesn't seem to help you, HaeB, you seem to be asking for more data than "starting next Thursday", so I'd really love to get a clear understanding of the asks from your point of view [18:50:35] i haven't asked for any data, milimetric [18:51:22] i have asked for an ETA on this phabricator task, to know whether the data that Finance needs will be available by their deadline ... 
[18:51:41] ...or whether i / we in reading should set aside to get them that data manually [18:52:07] the idea is that Finance will be able to use Wikistats for self-service after the update [18:52:31] HaeB: ok, I understand the nuance. But I'm challenging that idea: I do not think Finance will be able to use Wikistats for self-service based on the very limited information that I know [18:52:49] so what I'd like to learn is what exactly is Finance looking for [18:52:50] finance, eh? [18:52:51] (and they certainly don't need any article-level views... their current draft seems to need only per-project per-month numbers) [18:53:11] HaeB: ok, can you expand on that current draft? [18:54:07] i'm not handling that myself, so anything i can give you is second or third hand information. but i will share the link to their previous edition with you [18:54:43] in that case I have sixth hand information, so third hand sounds very exciting to me at this point :) [18:57:06] (PS5) Joal: Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) [18:57:16] ottomata, madhuvishy --^ [18:57:35] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Add event_pageId and event_pageTitle to quicksurvey [not all] schema - https://phabricator.wikimedia.org/T114164#1715993 (Jdlrobson) Wouldn't just adding event_pageId suffice? Why do we need bo... [18:57:52] joal: refinery-camus module has already been created in madhu's patch, rigth? [18:58:00] ottomata: yes [18:58:06] commit message slightly wrong? [18:58:44] Analytics-Backlog, Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1715995 (Halfak) I do this often. I've been using a python script. See https://github.com/halfak/multiquery It would be nice to do this in quarry. [19:01:47] "Flag file to be used (defaults to '_PARTITION'." 
[19:01:55] joal, '_PARTITION' ? [19:03:12] looking ottomata [19:03:59] actually, defaulting to _PARTITIONED [19:04:03] Will change [19:04:06] hm [19:04:08] no, i think [19:04:11] that is not good either [19:04:13] was leaving comment [19:04:16] Arf [19:04:20] _PARTITIONED is what the Hive job writes [19:04:26] when the partition has been added to hive [19:04:30] i think you want something like [19:04:35] _IMPORTED [19:04:43] or something, no? [19:04:48] ottomata: We won't rely on check statistics anymore ? [19:04:52] no [19:04:59] hm [19:05:04] ok, no breaking change [19:05:10] joal: _PARTITIONED means that other jobs can now use hive to query the data [19:05:13] I'll make it _IMPORTED [19:05:17] k [19:05:36] joal: maybe [19:05:40] _FULLY_IMPORTED [19:05:43] to make it really clear [19:05:43] :) [19:05:50] As you prefer :) [19:05:58] _IMPORTED_FULLY [19:05:58] heheh [19:06:00] whatever :) [19:06:12] keep it simple : imported :) [19:06:14] (CR) Nuria: "Nice, much better now." [analytics/refinery] - https://gerrit.wikimedia.org/r/244694 (https://phabricator.wikimedia.org/T115114) (owner: Ottomata) [19:07:01] k [19:07:41] (CR) Ottomata: "This is going to make our lives so much easier!"
(2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [19:09:02] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Add the schema name to the EL EventError topic [8 pts] - https://phabricator.wikimedia.org/T115121#1716006 (Ottomata) [19:10:29] (PS6) Joal: Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) [19:10:33] ottomata: hopefuly the good one :) [19:11:39] (CR) Joal: "Should be better :)" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [19:12:19] Guys, going to diner, back after [19:12:34] ottomata: I think you can release refiner/source if you want :) [19:13:16] ok! [19:15:29] (PS7) Ottomata: Add camus helper functions and job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [19:15:46] (CR) Ottomata: [C: 2 V: 2] Add camus helper functions and job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [19:20:25] HaeB: question (nothing to do with data) if you have a sec [19:26:12] !log releasing refinery 0.20 [19:26:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [19:47:22] sorry ottomata joal stepped away for lunch [19:49:57] s'ok, release just finished, going to deploy... 
[19:53:38] awesome [19:56:09] (PS1) Ottomata: Update refinery artifacts to 0.20 [analytics/refinery] - https://gerrit.wikimedia.org/r/244742 [19:56:28] (PS2) Ottomata: Update refinery artifacts to 0.20 [analytics/refinery] - https://gerrit.wikimedia.org/r/244742 [19:56:36] (CR) Ottomata: [C: 2 V: 2] Update refinery artifacts to 0.20 [analytics/refinery] - https://gerrit.wikimedia.org/r/244742 (owner: Ottomata) [20:00:02] madhuvishy: joal, i have deployed refinery 0.20 core, hive and tools. [20:00:23] i have not run refinery-deploy-to-hdfs, as these jars are a lot smaller than they used to be! (no more hadoop deps, I assume), and it is friday [20:00:46] even though jobs wont' use these unless we resubmit them with a new $refinery_path, i'm just not going to risk it :) [20:01:05] hmmm, madhuvishy you need refinery-camus too, eh?! [20:01:06] doh! [20:01:08] right? [20:01:35] ottomata: yeah [20:02:07] ottomata: that was the jar I was submitting through libjars to camus [20:02:35] right. [20:02:39] (PS1) Ottomata: Add refinery-camus artifact at version 0.20 [analytics/refinery] - https://gerrit.wikimedia.org/r/244749 [20:02:45] (PS2) Ottomata: Add refinery-camus artifact at version 0.20 [analytics/refinery] - https://gerrit.wikimedia.org/r/244749 [20:03:00] (CR) Ottomata: [C: 2 V: 2] Add refinery-camus artifact at version 0.20 [analytics/refinery] - https://gerrit.wikimedia.org/r/244749 (owner: Ottomata) [20:05:22] ok, madhuvishy there we go. [20:08:59] /srv/deployment/analytics/refinery/artifacts/refinery-camus.jar [20:09:01] on stat1002 [20:11:40] HMMMm [20:11:41] but hm [20:11:49] not a jar file... [20:11:50] m [20:11:59] git fat didn't do its thing... [20:16:52] milimetric: CommTech is in the process of planning our next sprint, and we still have page stats on our radar. What do you think is the chance that the pageviews API will have a public endpoint within the next 2 weeks? 
[20:17:33] Just trying to figure out if we should wait another sprint before seriously looking at building anything with it. [20:19:19] kaldari: the chances of that are very good. The public endpoint is just a matter of the services guys making a config change and deploying, which they'll hopefully do sometime next week. [20:19:30] hm, strange, madhuvishy its cool now [20:19:37] had to deploy a second time, now it is fine. [20:19:47] sooo, ja refinery-camus.jar available. [20:19:47] meanwhile we're filling in data and the back-end is fully functional now: curl http://10.2.2.12:7231/analytics.wikimedia.org/v1/pageviews/per-project/all-projects/all-access/all-agents/daily/2015100100/2015100100 [20:20:00] good to know! [20:20:07] (if you're inside the cluster so the 10.2.2.12 ip resolves, you can test that and the other endpoints [20:20:09] ) [20:21:03] ottomata: btw, aqs.svc.eqiad.wmnet is supposed to point to 10.2.2.12 according to this comment: https://gerrit.wikimedia.org/r/#/c/231574/14/hieradata/role/common/aqs.yaml (last line) [20:22:07] but ottomata maybe this needs to be merged: https://gerrit.wikimedia.org/r/#/c/242134/ ? [20:22:52] hm! yeha probably [20:23:19] should I poke someone to do that? [20:23:32] i can do it, looks harmless [20:23:39] k, cool [20:24:24] done, its a new addy, so it is avail now. [20:24:48] thx, yay [20:27:45] ottomata: cool [20:27:51] the puppet stuff [20:28:46] ottomata: need to put this refinery path somewhere [20:28:50] may be hiera? [20:31:04] milimetric: testing a few restbase urls for fun : works really good :) [20:31:14] refinery path is already a var! :) [20:31:24] hiera would be a good place for it, but we don't have to do that now. hm [20:31:27] ummmmm [20:31:38] madhuvishy: , can I run a test job before we merge the puppet stuff? [20:31:43] yeah [20:31:47] if I make a topic, can you put binary data in it? 
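(Editor's sketch, not part of the conversation.) The curl test URL quoted above can be assembled from its path segments. The parameter names below (project, access, agent, granularity, start, end) are assumptions read off the segment values in that one URL; the log itself does not name them, and `pageviews_url` is a hypothetical helper.

```python
# Hypothetical helper: rebuilds the per-project pageview URL from the log.
# The parameter names are guesses based on the segment values in that URL.
INTERNAL_BASE = "http://10.2.2.12:7231/analytics.wikimedia.org/v1"


def pageviews_url(base, project="all-projects", access="all-access",
                  agent="all-agents", granularity="daily",
                  start="2015100100", end="2015100100"):
    """Join the path segments in the order they appear in the logged URL."""
    segments = ["pageviews", "per-project", project, access, agent,
                granularity, start, end]
    return base.rstrip("/") + "/" + "/".join(segments)


if __name__ == "__main__":
    print(pageviews_url(INTERNAL_BASE))
```

As noted in the channel, the 10.2.2.12 address only resolves from inside the cluster, so the printed URL is useful there only.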
[20:31:54] yup i can [20:32:05] joal: Marko already created a separate prod config section for wikimedia.org [20:32:13] madhuvishy: , also i forget, how many messages / sec are they expecting? [20:32:22] should be easy to hook into that for the global page view data [20:32:33] gwicke: fantastic :) [20:32:53] https://wikimedia.org/api/rest_v1/?doc [20:33:05] For now I'll monitor cassandra taking time to load, but will hook up onto that next week for sure :) [20:33:21] ottomata: hmmm i'm not sure. the ticket says The stream here is around 200 million messages per day [20:33:34] k [20:33:50] right, around 2 or 3 K per sec [20:33:56] that sounds like what I am remembering, thanks [20:34:05] we'll give them 12 partitions like other higher volume topics [20:34:11] so, madhuvishy, topic is [20:34:19] mediawiki_CirrusSearchResultSet [20:34:20] right? [20:34:34] ottomata: no [20:34:43] ohj? [20:34:45] CirrusSearchRequestSet [20:34:46] what ees? [20:34:53] hmm, noooo [20:34:59] cool! wikimedia.org/api [20:35:01] no? [20:35:15] ottomata: https://phabricator.wikimedia.org/T113521 [20:35:37] your ticket says that, so does https://gist.github.com/ebernhardson/efdfcfee8b8cc62da03f [20:35:41] mediawiki_CirrusSearchRequestSet [20:35:44] gwicke: a question for you [20:35:46] oh [20:35:47] haha [20:35:51] :) [20:35:52] yes, but prefixed with mediawiki_ [20:35:52] thanks [20:35:57] yes yes [20:36:04] right, ok. [20:36:04] gwicke: looking at http://ganglia.wikimedia.org/latest/?r=2hr&cs=&ce=&c=Analytics+Query+Service+eqiad&h=&tab=m&vn=&hide-hf=false&m=bytes_in&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name [20:36:26] ok, madhuvishy creating topic. [20:36:35] gwicke: actually it's that one: http://ganglia.wikimedia.org/latest/?r=2hr&cs=&ce=&c=Analytics+Query+Service+eqiad&h=&tab=m&vn=&hide-hf=false&m=bytes_out&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name [20:36:42] There are a lot of bytes out ! 
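(Editor's sketch, not part of the conversation.) The "around 2 or 3 K per sec" estimate follows from the ticket's figure of roughly 200 million messages per day. The even split across 12 partitions is an added assumption for illustration; real traffic is unlikely to be perfectly uniform.

```python
# Back-of-the-envelope rate check for the figures quoted in the log.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def messages_per_second(messages_per_day):
    """Average rate, assuming traffic is spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


def per_partition_rate(messages_per_day, partitions):
    """Average rate per partition, assuming an even split (an assumption)."""
    return messages_per_second(messages_per_day) / partitions


if __name__ == "__main__":
    print(round(messages_per_second(200_000_000)))      # average msgs/sec
    print(round(per_partition_rate(200_000_000, 12)))   # per partition
```

200,000,000 / 86,400 is a little over 2,300 messages per second on average, which is consistent with the estimate given in the channel.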
[20:37:05] joal: that would be cassandra replication traffic [20:37:08] joal: is it cassandra... [20:37:13] hah, was gonna guess same thannng [20:37:21] gwicke: I thought that was it, but prefer to be sure :) [20:37:23] which replication factor did you choose? [20:37:24] :) [20:37:39] gwicke: We did not change the one created by restabase [20:37:39] madhuvishy: ok, topic created, now what? [20:37:45] restbase created the tables [20:37:50] joal: I see, it'll be three-way then [20:37:50] gwicke: --^ [20:37:51] ottomata: producing a test binary message [20:38:00] right gwicke [20:38:02] understood [20:38:10] of CirrusSearchRequestSet, right? [20:38:47] gwicke: I would have expected to handle data faster than it does :) [20:39:08] http://docs.datastax.com/en/cassandra/2.2/cassandra/dml/dmlClientRequestsWrite.html [20:39:14] prepared CQL stattement is used, not table copy, but still [20:39:43] thanks gwicke [20:39:55] ottomata: yeah [20:39:58] joal: are you writing straight to Cassandra, or are you hitting RESTBase? [20:40:10] straight to cassandra [20:40:11] ottomata: just did [20:40:19] just did?! [20:40:20] :D [20:40:29] one message [20:40:32] kk [20:40:37] cool! [20:40:41] ok [20:40:45] joal: one parameter worth playing with is compression [20:41:05] hmmmm gwicke, didn't know abouit that one [20:41:27] restbase supports passing in compression hints [20:42:08] example: https://github.com/wikimedia/restbase/blob/ff3d47dd550013ae5a0605037e9f6c4e4e0464c8/mods/key_rev_value.js#L37 [20:42:28] http://docs.datastax.com/en/cql/3.3/cql/cql_reference/compressSubprop.html [20:43:46] since you are using the default, the compression is likely lz4 [20:43:50] ottomata: are you testing the job? 
[20:43:54] yes [20:44:02] cool, fingers crosse [20:44:06] you can check in cqlsh with `describe table ` [20:44:07] hm Exception in thread "main" org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -libjars [20:44:15] whaa [20:44:34] madhuvishy: this is what was run [20:44:36] /usr/bin/hadoop jar /srv/deployment/analytics/refinery/artifacts/camus-wmf.jar com.linkedin.camus.etl.kafka.CamusJob -P /tmp/mediawiki.camus.properties -Dcamus.job.name="otto-camus-binary-avro-test" -libjars artifacts/refinery-camus.jar [20:44:49] gwicke: Do you think it could be better with a different algo ? [20:45:07] ottomata: hmmm [20:45:18] this is what we run [20:45:21] /usr/bin/hadoop jar /home/madhuvishy/avro-kafka/camus-wmf-0.1.0-wmf6.jar com.linkedin.camus.etl.kafka.CamusJob -libjars ${LIBJARS} -Dcamus.job.name="madhuvishy_avro_test" -P /home/madhuvishy/avro-kafka/camus.avro.json.properties [20:45:34] i dint think the order mattered [20:46:01] joal: snappy would be faster than lz4 [20:46:04] http://quixdb.github.io/squash-benchmark/ [20:46:05] madhuvishy: guess so! [20:46:17] i think some args are passed to the main class, other to hadoop jar [20:46:21] gwicke: right [20:46:23] if i move the libjars to that part of the comamnd it runs [20:46:29] ottomata: aah [20:46:38] submit patch and I deploy again [20:46:39] need to fix the python patch then [20:46:39] joal: actually, the docs say that snappy is the default [20:46:41] ja [20:46:52] gwicke: Do we need to change settings in restbase before tweaking cassandra ? I get so :) [20:47:09] gwicke: maybe no change then :) [20:47:14] if you are already using snappy, then you should already be set [20:47:19] madhuvishy: but, it works! [20:47:20] awesome! [20:47:20] hdfs dfs -text /user/otto/data/raw/mediawiki/mediawiki_CirrusSearchRequestSet/hourly/2015/10/09/20/mediawiki_CirrusSearchRequestSet.14.3.1.1.1444420800000.avro [20:47:51] ottomata: :D [20:47:53] madhuvishy: does this schema not have a timestamp field? 
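(Editor's sketch, not part of the conversation.) The two commands above differ only in where `-libjars` sits: it worked when placed right after the main class and failed with UnrecognizedOptionException when it followed `-P`. A wrapper can enforce that order. `build_camus_command` is a hypothetical function, not the actual bin/camus script, and the explanation that Hadoop's generic options must precede application arguments is the editor's reading of the behaviour seen in the log.

```python
# Hypothetical command builder; NOT the real bin/camus wrapper.
# It puts Hadoop-level options (-libjars, -D) before the application's
# own arguments (-P), which is the order that worked in the log.
def build_camus_command(camus_jar, properties_file, job_name=None,
                        libjars=None,
                        main_class="com.linkedin.camus.etl.kafka.CamusJob"):
    command = ["/usr/bin/hadoop", "jar", camus_jar, main_class]
    if libjars:
        # -libjars takes a comma-separated list of jar paths.
        command += ["-libjars", ",".join(libjars)]
    if job_name:
        command.append("-Dcamus.job.name=" + job_name)
    # Application arguments go last.
    command += ["-P", properties_file]
    return command


if __name__ == "__main__":
    print(" ".join(build_camus_command(
        "/srv/deployment/analytics/refinery/artifacts/camus-wmf.jar",
        "/tmp/mediawiki.camus.properties",
        job_name="otto-camus-binary-avro-test",
        libjars=["artifacts/refinery-camus.jar"],
    )))
```

Returning a list rather than a shell string keeps the result usable with `subprocess.run` without extra quoting.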
[20:47:54] nice [20:47:56] no [20:47:58] oh. [20:47:59] it doesn't [20:48:00] why not? [20:48:07] i don't know! [20:48:11] that's going to mess up the partitioning [20:48:22] does your decoder work with a timestamp if it has one? [20:48:27] joal: will you load the full data set once & then update, or do you need to do a full re-load every time? [20:49:00] gwicke: Except for reloads (hopefully not too often), single load pass [20:49:11] kk [20:49:28] reducing replication could speed things up slightly [20:49:35] ottomata: yeah [20:49:37] two-way, for example [20:49:38] that's a camus thing [20:49:47] gwicke: cassandra tells me it uses LZ4 for the data tables [20:50:03] ottomata: like, decoder has nothing timestamp specific [20:50:08] hmmmm [20:50:11] gwicke: So changing that would be a first step [20:50:13] it is somewhere, don't remember where, looking [20:50:13] (PS1) Madhuvishy: Fix libjars argument order [analytics/refinery] - https://gerrit.wikimedia.org/r/244803 [20:50:15] oh, okay -- in that case, worth altering the table [20:50:19] right [20:50:25] but, madhuvishy we need search to add a tiemstamp field then. [20:50:31] gwicke: need to tweak restbase or will it behave ok ? [20:50:37] joal: rb should even support the migration, but you'll have to increment the schema version [20:50:39] ottomata: i think it was in your writer [20:50:44] may be [20:51:14] joal: you can also change things manually behind rb's back, but that won't persist if you ever re-create the table [20:51:22] gwicke: so I'll alter the table in cassandra, and create a new row in metadata table ? 
[20:51:37] right makes sense gwicke [20:51:46] I understand now [20:52:00] madhuvishy: for my json one, it is there [20:52:01] gwicke: I'll try that after a day of load is finished :) [20:52:08] it parses the timestamp out of the payload, and then sets it [20:52:15] ottomata: oh [20:52:28] return new CamusWrapper(payloadString, timestamp); [20:52:53] ALTER TABLE with compression = { 'sstable_compression' : 'SnappyCompressor' }; [20:53:00] ah cool [20:53:02] madhuvishy: yours does too [20:53:03] getTimestamp [20:53:09] yup, but not when cassandra is under pressure :) [20:53:20] } else if (super.getRecord().get("timestamp") != null) { [20:53:20] return (Long) super.getRecord().get("timestamp"); [20:53:20] you can change that at any time [20:53:27] it'll only apply to newly compacted data [20:53:27] expects unix seconds [20:53:37] ottomata: aah [20:53:38] Oh right ! [20:53:39] yes [20:54:00] so, ya, we need to make them add a timestamp field in their data. [20:54:28] ottomata: yeah alright. [20:54:50] hey all [20:54:51] ebernhardson: for the camus hour partitioning to work right, we should add a timestamp field your schema [20:55:01] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Setup pipeline for search logs to travel through kafka and camus into hadoop {hawk} [21 pts] - https://phabricator.wikimedia.org/T113521#1716431 (Ottomata) Hey @ebernhardson, your schema needs a timestamp field in order for Camus to import the data... [20:55:03] hehe [20:55:06] so.. we have all these icinga monitoring checks for analytics servers, right [20:55:11] lol [20:55:12] shoudl be a long. [20:55:20] in binary [20:55:23] and we had an override for contact groups here [20:55:33] but that actually never worked (since X) [20:55:47] okay ottomata i patched the python script [20:55:47] mutante: no? i think there are things that only notify us, right? [20:55:51] and now we fixed it .. so .. 
a couple of you will start getting mail now [20:55:55] Guys, I'm off for today :) [20:55:55] oh , hm, k [20:55:59] latesr joal! [20:56:04] good night joal! [20:56:05] that we expected would work all the time [20:56:06] thanks gwicke for the advises :) [20:56:06] but did not [20:56:07] (PS2) Ottomata: Fix libjars argument order [analytics/refinery] - https://gerrit.wikimedia.org/r/244803 (owner: Madhuvishy) [20:56:27] (CR) Ottomata: [C: 2] Fix libjars argument order [analytics/refinery] - https://gerrit.wikimedia.org/r/244803 (owner: Madhuvishy) [20:56:30] joal: you are welcome, and enjoy your weekend! [20:56:34] (CR) Ottomata: [V: 2] Fix libjars argument order [analytics/refinery] - https://gerrit.wikimedia.org/r/244803 (owner: Madhuvishy) [20:56:35] ottomata: example diff from icinga configs: https://phabricator.wikimedia.org/P2179 [20:56:35] ottomata: Will wait for some of your free time next week to catch-up on deployment :) [20:56:42] See you all ! [20:57:02] ottomata: if you looked at puppet there were overrides in hiera for the contacts.. but that didnt do what it looked like it would do [20:57:17] hm, hm, ok [20:57:20] now it actually does it, thanks to fixes from John [20:57:25] so maybe it works for some things, but not all? [20:57:44] Actually gwicke, reading the last paragraph of that page tells me not to change the compression : LZ4 is fastest to decompress, followed by Snappy, then by Deflate. Compression effectiveness is inversely correlated with decompression speed. The extra compression from Deflate or Snappy is not enough to make up for the decreased performance for general-purpose workloads, but for archival data they [20:57:50] may be worth considering. Developers can also implement custom compression classes using the org.apache.cassandra.io.compress.ICompressor interface. Specify the full class name as a "string constant". 
[20:57:52] madhuvishy: sure, timestamp is no problem [20:58:32] joal: yeah, but right now you are bottlenecked on write performance [20:58:37] ebernhardson: okay cool. ottomata i'm not sure what the process is for them to update our schema, submit a patch to refinery-camus? [20:58:43] s/our/their [20:58:53] gwicke: yes. Snappy faster to write thean lz4 ? [20:58:57] and decompression is pretty fast for both algorithms [20:58:58] ottomata: yes, it worked for some things,but only a few, now it works for most things, a lot more but still not all :p [20:59:05] ok, compress/decompress speed might not be the same :) [20:59:08] joal: should be, yes [20:59:13] aye ok, cool :) [20:59:14] thanks mutante [20:59:15] ok understiid [20:59:16] ottomata: ironically the part that still does NOT send mail is when ops gets paged :p [20:59:19] Thanks again :) [20:59:29] so if it's critical [20:59:32] see http://quixdb.github.io/squash-benchmark/ [20:59:44] madhuvishy: ebernhardson yeah unfortunetly until we have a shared schema reop [20:59:51] it'll need to be updated in both places [21:00:03] that only affects " LVS, DB, Labs and Hadoop" and 3/4 are ops anyways [21:00:13] ebernhardson: maybe you can just link madhuvishy to your change in mw and she can change it in refinery [21:01:17] sure, that seems reasonable. We havn't even merged the relevant code yet so it'll just be updating the patches. I'm a bit tied up rest of today but on monday i'll update them [21:01:24] or i guess tuesday, monday is holiday [21:01:59] k cool, np [21:02:09] good news is, proof of concept works! [21:02:15] and i'm pretty sure if you just add a timestamp field [21:02:20] sweet! :) [21:02:23] that camus will take care of the rest [21:02:27] joal: actually, I mis-read the graphs; lz4 looks faster in both directions [21:02:30] madhuvishy: we could merge that puppet job now.... [21:02:31] hm [21:02:34] won't do anything [21:02:38] until data starts coming in [21:02:43] but our side is all set up then. 
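(Editor's sketch, not part of the conversation.) The compression change discussed above comes down to one CQL statement. The log's ALTER TABLE line omits the table name, so the keyspace and table below are placeholders, and `alter_compression_cql` is a hypothetical helper that only formats the statement in the `sstable_compression` style quoted in the channel. Note that gwicke's last word on the topic is that lz4 looked faster in both directions, so this shows the statement's shape and is not a recommendation to switch.

```python
# Hypothetical helper: formats the ALTER TABLE statement quoted in the log.
# Keyspace and table names are placeholders; the log does not give them.
KNOWN_COMPRESSORS = {"LZ4Compressor", "SnappyCompressor", "DeflateCompressor"}


def alter_compression_cql(keyspace, table, compressor):
    """Return a CQL string that changes a table's SSTable compressor."""
    if compressor not in KNOWN_COMPRESSORS:
        raise ValueError("unknown compressor: " + compressor)
    return (
        f'ALTER TABLE "{keyspace}"."{table}" '
        f"WITH compression = {{ 'sstable_compression' : '{compressor}' }};"
    )


if __name__ == "__main__":
    print(alter_compression_cql("example_keyspace", "example_table",
                                "SnappyCompressor"))
```

As said in the channel, such a change only applies to newly compacted data, and a manual change made behind RESTBase's back will not persist if the table is ever re-created.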
[21:03:07] ottomata: where is the puppet job getting libjars? [21:03:29] ah! [21:03:30] good point. [21:03:36] https://gerrit.wikimedia.org/r/#/c/244601/2/manifests/role/analytics/refinery.pp [21:03:53] should pass it here, camus:job takes it i see [21:05:42] Analytics, MediaWiki-extensions-Gadgets: Gadget usage statistics - https://phabricator.wikimedia.org/T21288#1716447 (kaldari) Yeah, it seems like this would be trivial to implement as a QueryPage in the Gadgets extension. It might have to be marked as an expensive query, but should still be doable. @Nem... [21:07:47] madhuvishy: , updated. [21:08:27] ottomata: cool I +1ed [21:08:35] soo, sure, why not, i'll merge. :) [21:08:41] :) [21:08:42] and make sure the cron job works as is. [21:08:51] okay [21:08:53] if so, then it'll start importing into hadoop as soon as there is data in the topic [21:09:07] yeah [21:11:32] ebernhardson: also, wanted to tell you that we changed the schema's namespace to org.wikimedia.analytics.schemas. [21:11:47] but they don't have to change that in theirs, right? [21:12:00] that's just a temporary workaround too? [21:12:22] yeah I don't think it matters for them [21:12:39] the messages will be validated against the one we have in our registry anyway [21:12:40] ha, ja, see, madhuvishy, without timestamp [21:12:48] this time when i imported [21:12:54] it was imported into hour 21 [21:12:56] current hour [21:12:56] it's in the latest hour [21:12:57] yeah [21:12:58] yeah [21:13:22] but, it works, woohoo [21:13:22] hdfs dfs -text /wmf/data/raw/mediawiki/mediawiki_CirrusSearchRequestSet/hourly/2015/10/09/21/mediawiki_CirrusSearchRequestSet.14.3.1.1.1444424400000.avro | jq . [21:13:28] :D [21:13:29] yay [21:13:32] ottomata: also, we have to purge their data too [21:13:37] :P [21:13:42] nice work both of you! [21:13:46] thanks [21:13:46] nuria: this is really great. [21:16:58] ottomata: alright I think i'll take off early today [21:17:05] thanks for all the help!
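(Editor's sketch, not part of the conversation.) The timestamp matters because the import path is bucketed by hour. The sketch maps a Unix timestamp in seconds (the unit the decoder was said to expect) to an `hourly/YYYY/MM/DD/HH` directory like the ones in the hdfs paths above. The layout is inferred from those paths, the use of UTC is an assumption, and `hourly_partition` is a hypothetical helper, not Camus's actual partitioner.

```python
from datetime import datetime, timezone


# Hypothetical helper; the directory layout is inferred from the hdfs
# paths in the log, and UTC is assumed. This is not Camus's own code.
def hourly_partition(epoch_seconds):
    """Map a Unix timestamp (seconds) to an hourly/YYYY/MM/DD/HH path."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("hourly/%Y/%m/%d/%H")


if __name__ == "__main__":
    # The file names in the log carry millisecond values, so divide by 1000.
    print(hourly_partition(1444420800000 // 1000))
    print(hourly_partition(1444424400000 // 1000))
```

Without a timestamp field, the test message landed in whatever hour the import ran, as seen above with hour 21. With one, each message would be bucketed by its own event time instead.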
[21:17:23] ok cool i'm done for the day too [21:17:25] laterssss! [21:17:27] byeee [21:17:38] have fun at Puerto Rico [21:18:25] byyyeeye [21:23:04] ottomata: cya in Puerto Rico. here's the diff for those icinga mails. https://phabricator.wikimedia.org/P2179 it's the standard services on erbium and gadolinium [21:23:08] bye for now [21:24:47] it's the contactgroups: 'admins,analytics' in hieradata/hosts/erbium.yaml and gadolinium.yaml that was there but didn't work [21:25:11] the stuff that wasn't in hiera worked. the new part is that it works with hiera now [21:59:05] Analytics, MediaWiki-extensions-Gadgets: Gadget usage statistics - https://phabricator.wikimedia.org/T21288#1716533 (kaldari) [22:10:54] Analytics, MediaWiki-extensions-Gadgets: Gadget usage statistics - https://phabricator.wikimedia.org/T21288#1716569 (kaldari) [22:11:40] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1716573 (ezachte) @Tbayer not sure why you mention Wikistats in this context. Or am I getting you wrong? I don't recall even one occasion where Wikis... [22:13:13] Analytics, MediaWiki-extensions-Gadgets: Gadget usage statistics - https://phabricator.wikimedia.org/T21288#1716578 (kaldari) I added two new blocking tasks: one for implementing a special page as part of the Gadgets extension to show per wiki gadget usage stats, and one for generating a database report... [22:15:18] madhuvishy: gone? [22:20:34] (CR) Nuria: "Sorry for not catching that, we run into this same error when doing our initial tests."
[analytics/refinery] - https://gerrit.wikimedia.org/r/244803 (owner: Madhuvishy) [22:48:44] Analytics-General-or-Unknown, Possible-Tech-Projects: Pageviews for Wikiprojects and Task Forces in Languages other than English - https://phabricator.wikimedia.org/T56184#1716690 (kaldari) Also note that Mr.Z-Man, the author of the bot that updates the reports on en.wiki seems to have retired from Wikip... [22:50:08] Analytics-General-or-Unknown, Possible-Tech-Projects: Pageviews for Wikiprojects and Task Forces in Languages other than English - https://phabricator.wikimedia.org/T56184#1716691 (kaldari) [22:50:25] Analytics-General-or-Unknown, Possible-Tech-Projects: Pageviews for Wikiprojects and Task Forces in Languages other than English - https://phabricator.wikimedia.org/T56184#552874 (kaldari) [22:57:51] Analytics-Wikimetrics: can't remove users from cohort in Iceweasel (aka Firefox, works fine in Chromium) - https://phabricator.wikimedia.org/T115160#1716697 (jeremyb) NEW