[00:13:45] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1711231 (csteipp) No, I meant operational logs. I would like to discourage us being able to correlate mediawiki operational logs and the webrequest dataset,... [00:16:57] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1711237 (csteipp) > I would like to discourage us being able to correlate mediawiki operational logs and the webrequest dataset, since the webrequest dataset... [01:25:40] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1711456 (Milimetric) I'm all for doing 3. if it's the best way, but I'm not sure what you mean. Any more details are welcome, or I'll just ask tomorrow. [01:29:45] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1711466 (Tgr) There are two use cases discussed here: * Connect data logged to Hadoop from the API via the PSR-3 logging system (`MediaWiki\Logger\LoggerFact... [01:31:49] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1711474 (GWicke) > But are there other uses? The main motivation behind adding such request ids is normally tracking the processing of a single request acro... [02:07:14] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1711503 (GWicke) As an example, here is a summary of Facebook's fairly fancy Mystery Machine log analysis system: http://blog.acolyer.org/2015/10/07/the-myst... 
[02:31:30] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1711519 (MZMcBride) An RFC meeting about this task has been scheduled for Wednesday, October 14 in #wikimedia-office on freenode. [02:45:39] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1711527 (Tbayer) After discussion with @Tnegrin and @JKatzWMF, I wanted to briefly chime in just to make sure that we will be using consistent definiti... [04:04:12] I'm going to be at Wikiconference USA in DC (Thursday thru Sunday). Will anyone else in the analytics community be there? I'd love to hear what you're up to. [04:11:18] (CR) EBernhardson: "one thing i was wondering, we initially talked about a separate repository only for avro schema files. I can create one easy enough, but w" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [08:08:48] Analytics, Security-Reviews: Security review of Analytics Query Service - https://phabricator.wikimedia.org/T114918#1711759 (mobrovac) Ah, I might have been more descriptive in my previous comment. Here's how it will work once all of the bits and pieces are in place: - The Services Team's RESTBase clust... [08:19:37] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1711771 (atgo) Hey guys - what's the next steps for getting the tighter sampling into pgheres? Inquiring minds... 
[08:23:25] The eventlogging -errors dashboard rocks :) [08:23:40] mforns, ottomata, bd808: you guys rules :) [10:15:58] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1712000 (ezachte) @Tbayer absolutely, being consistent is important. The only inherent complication I see is if wmf.projectview_hourly doesn't cater... [10:17:39] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1712015 (ezachte) So how to proceed? I know 'perfect is the enemy of good', and I wouldn't have spent the effort if our monthly totals number had diff... [10:20:48] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1712020 (mobrovac) Hm hm, we are starting to create a mess already :) The API needs to be structured. **Really**. As discussed in T103811, we ought to have at most... [10:28:00] Analytics, Deployment-Systems, Services, operations, Scap3: Use Scap3 for deploying AQS - https://phabricator.wikimedia.org/T114999#1712059 (mobrovac) NEW [12:38:10] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1712266 (Milimetric) Erik, I'm in favor of keeping the processing that accounts for bad data. It's one of the reasons I didn't want to replace the bre... [12:48:53] Analytics, Security-Reviews: Security review of Analytics Query Service - https://phabricator.wikimedia.org/T114918#1712273 (Milimetric) Yes to everything that Marko said, thank you for the explanation. Answers to the other questions: * minimum granularity: hourly * agent can be: spider / user / bot /... 
[12:49:39] hi joal, I was going to try to help by reviewing https://gerrit.wikimedia.org/r/#/c/236224/ [12:49:55] you were saying you ran into some trouble with something yesterday? [12:52:04] Hey milimetric [12:52:40] After a good fight won against oozie the day before yesterday, I am today fighting with java [12:54:14] I can show you where I am if you want [12:55:28] milimetric: --^ [12:55:33] yea [12:55:48] I was trying to phrase: "I wanna help but not get in your way" [12:55:54] :) [12:56:07] Brain bounce is always good in that type of situation [12:56:30] ok, to the batcave! [12:56:58] OMW [13:34:22] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1712373 (ezachte) Dan, using sequence numbers to detect anomalies makes total sense to me. In fact I used that also to repair multi-months 20%-30% UDP... [13:35:00] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1712374 (Tnegrin) I support keeping things simple for this project and simply replacing the source and definition of the page view logs as Dan proposes... [14:28:04] (CR) Ottomata: "Erik," [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [14:37:42] Analytics-Backlog, Database: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1712490 (jcrespo) Deletion log: **db1047:** ``` mysql> DROP TABLE log.Campaigns_5485644; Query OK, 0 rows affected (0.16 sec) mysql> DROP TABLE log.Campaigns_5487321; Query OK, 0 rows affected (0.07... 
[14:38:05] (CR) Ottomata: "Hm, the custom fixed analytics namespace with the topic -> schema name is kinda weird, but I suppose its a good idea for a temporary solut" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [14:45:44] (CR) Nuria: ">Hm, the custom fixed analytics namespace with the topic -> schema name >is kinda weird, but I suppose its a good idea for a temporary sol" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [14:46:06] joal: me bad person did not get to your CR yesterday BUT we got all camus avro stuff working!! [14:50:19] hello madhuvishy [14:50:27] Can you test BlueJeans with me? :D [14:50:54] We are trying to see if we can switch the 8:30 PT meeting to BlueJeans so everyone can fit in the virtual room [14:51:09] or anyone else who is around milimetric, nuria, joal [14:51:10] :D [14:51:16] hello [14:51:21] hola nuria [14:51:28] just got an account [14:51:33] like 30 secs ago [14:51:38] :-) [14:51:40] will you send me a link? [14:51:49] so, can you go to the calender event, and click on the link in description? [14:53:59] nuria: no bother :) [14:54:08] That's great that you have the camus stuff working :) [14:54:14] nuria: with maven and all ? [14:55:53] (PS3) Joal: Add CassandraXSVLoader to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) [15:07:21] where is mforns? [15:07:36] not seen :( [15:09:25] (PS4) Joal: Add CassandraXSVLoader to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) [15:11:42] ok... he's presenting QR in 20 minutes... 
and we're switching to bluejeans I think [15:13:32] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1712552 (csteipp) >>! In T113817#1711474, @GWicke wrote: >> But are there other uses? > > A typical use case for adding such request ids is tracking the pro... [15:16:52] oh, no standup today? [15:16:58] ottomata, no.. [15:17:03] oh, our QR is today? [15:17:10] Infrastructure/CTO review [15:17:12] that's what tht is? [15:17:18] (PS5) Joal: Add CassandraXSVLoader to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) [15:17:18] indeed ottomata :) [15:17:21] aye! [15:17:25] interesting name! [15:18:32] :) [15:22:43] the CTO part does make it super interesting [15:23:15] but Blue Jeans, man that's a name. [15:25:58] joal ottomata are you guys joining the QR? [15:26:33] currently joining kevinator [15:26:44] yes [15:30:05] joal: with mvn and all [15:30:10] joal: only puppet missing [15:30:11] awesome :) [15:30:30] I'll be able to reuse the mvn stuff for the code in CR then ! [15:30:32] great [15:33:10] milimetric: currently feeding ! [15:33:14] Thanks again mate :) [15:33:27] awesome :) [15:44:33] oh cool this meeting is all engeinerring teams? [15:44:34] cool! [15:51:38] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1712662 (Aklapper) >>! In T112527#1705342, @Dicortazar wrote: > Ok, for this, there's a bitbucket account with that info daily updated. Th... [16:11:57] Analytics-Cluster, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Replace uses of monitoring::ganglia with monitoring::graphite_* - https://phabricator.wikimedia.org/T90642#1712691 (Dzahn) This is really just curiosity and not a rhetorical question. What is better about monitoring::g... 
[16:13:49] Analytics-Cluster, Analytics-Kanban, operations: Fix active namenode monitoring so that ANY active namenode is an OK state. - https://phabricator.wikimedia.org/T89463#1712693 (Ottomata) [16:15:09] (PS6) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (https://phabricator.wikimedia.org/T108174) [16:23:13] ottomata: Was reading your comments on camus patch - we can not do the fixed namespace stuff if we have, say a file that maps from topic to schema, or if we put the schema class in the camus properties. Which is the better way to go, I wasn't really sure [16:23:16] nuria: ^ [16:23:46] I replied to ottomata's comment [16:23:49] on patch [16:24:02] madhuvishy: i think at the moment it doesn't matter much, not sure. [16:24:02] hmm [16:24:04] I think for now is sufficient [16:24:08] or maybe, it does, because this data isn't going away [16:24:16] and the avro files are going to be written to hdfs with this schema [16:24:24] so they will have the namespace hardcoded in them [16:24:26] not sure if it matters [16:24:48] hmmm [16:24:59] choices are two: 1) schema + namepsace are specified on prop files [16:25:11] 2) only schema name is on prop file [16:25:21] *namespace [16:25:29] on 2) namespace is infered [16:25:49] in any way teh producer side needs to specify namespace on avro schema + somewhere else [16:26:30] i think that for now having namespace hardcoded leads to less errors [16:26:59] let me know if this makes sense [16:32:08] Analytics-Tech-community-metrics, DevRel-October-2015, Patch-For-Review: Present most basic community metrics from T94578 on one page - https://phabricator.wikimedia.org/T100978#1712765 (Aklapper) [16:48:48] (PS7) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (https://phabricator.wikimedia.org/T108174) [17:00:14] (PS1) Joal: Update bot regexp to catch http UA [analytics/refinery/source] - 
https://gerrit.wikimedia.org/r/244465 [17:02:00] nuria: I had forgotten about the http bot stuff, thanks for the reminder ! [17:02:02] (CR) Nuria: [C: 2 V: 2] Update bot regexp to catch http UA [analytics/refinery/source] - https://gerrit.wikimedia.org/r/244465 (owner: Joal) [17:02:26] a-team: good job this last qtr [17:02:36] th [17:02:36] :] [17:02:37] I will not be at the backlog grooming today [17:02:42] thanks boss :) [17:02:45] :) [17:03:01] joal:merged now. Now i only own you 999 reviews [17:03:03] neither do I, cassandra loading on the go [17:03:07] :) [17:03:11] nuria --^ [17:03:27] scala one is next [17:03:31] cool [17:03:50] It should make use of the camus maven stuff [17:04:02] Also, it should be moved in the repo madhu created [17:04:03] a-team should we delay backlog grooming? [17:04:17] a-team: easier for me tomorrow [17:04:20] Analytics-EventLogging, Database: db1046 innodb signal 6 abort and restart - https://phabricator.wikimedia.org/T104748#1712933 (jcrespo) a:Springle>jcrespo Even if some lost events are not a huge problem for this schema, we should make sure that db1047 and db1046 contain the same data before doing a... [17:04:28] joal: do you want me to leave the scala dependencies in the pom? [17:04:39] in which pom ? [17:04:40] I'm leaning towards cancelling backlog grooming [17:04:44] me too i need lucnh and also have more meetings coming up [17:04:46] in refinery-camus [17:04:47] too many meetings [17:04:54] ok [17:04:56] Analytics-EventLogging, Database: db1046 innodb signal 6 abort and restart - https://phabricator.wikimedia.org/T104748#1712944 (jcrespo) [17:04:58] Analytics-Backlog, Database: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1532299 (jcrespo) [17:05:00] ok, no objections, CANCELLED [17:05:03] ok, I think it's a go: gooming cancelled ! 
[17:05:07] :D [17:05:11] yay, maybe i will go to the office now [17:05:11] +1 to cancel [17:05:21] #metooslow [17:05:21] madhuvishy: either you leave them, or I re-add them ) [17:05:25] I don't mind [17:05:35] probably better for you to remove, and me to re-add [17:05:41] joal: okay, i was wondering yesterday because they are not really part of my code [17:05:42] yeah [17:05:47] cool, i will remove [17:05:47] cool :) [17:05:52] thanks :) [17:05:54] joal: also i had one more question [17:05:58] sure [17:06:17] for the fun, testing cassandra loading : http://ganglia.wikimedia.org/latest/?r=custom&cs=&ce=&c=Analytics+Query+Service+eqiad&h=&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name [17:06:50] Analytics-EventLogging, Database: db1046 innodb signal 6 abort and restart - https://phabricator.wikimedia.org/T104748#1426083 (jcrespo) The idea is setup the replication db1046 -> db1047, then backfill db1046 with new records only available on db1047. [17:07:07] when I was building the package for refinery-camus, there were some dependencies from the camus jars like easymock, that failed because they were not in archiva. I did the commenting out releases enabled false bit in the source pom to get it all to resolve [17:07:26] what does that mean when we actually deploy [17:07:29] joal: ^ [17:07:49] madhuvishy: hmmmm [17:08:08] madhuvishy: I think that means the next one trying to build without commenting will suffer :) [17:08:33] We should ask andrew to open archiva, then build, then close [17:08:36] joal: yeah - so we need to get those dependencies up on archiva? [17:08:38] I think this is the process [17:08:43] aha [17:08:48] alright cool [17:08:58] archiva does it ottomagically :) [17:09:02] :D [17:09:38] doing... 
[17:11:02] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1712973 (GWicke) @mobrovac, I think it's worth having 'data' and 'metrics', as they aren't the same thing. [17:11:28] ottomata: building with archiva open? [17:11:37] hold on a sec, i'll push updated pom [17:12:00] ottomata: I'll also need that help for the cassandra stuff, sorry [17:13:20] (PS6) Madhuvishy: [WIP] Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) [17:13:29] ottomata: pushed [17:13:52] milimetric: you around ? [17:15:04] hey, yeah, but talking in -services [17:15:14] ok milimetric [17:15:34] need some help when you'll have time: data in cassandra, but no answer frgom restbase :( [17:15:36] milimetric: --^ [17:17:23] madhuvishy: joal, try now [17:17:25] building [17:17:27] archiva should pull deps [17:17:54] madhuvishy: but uncomment the boolean part in pom first :) [17:20:14] ottomata: done, archiva checked, good for me ! Thanks mate [17:21:12] milimetric: I have a complete doubt about a detail [17:22:50] joal: ok, what's up? [17:23:01] can we batcave for a minute ? [17:25:34] milimetric: you remem [17:25:45] remember the .org issue for domain [17:25:58] I think I have forgotten I had something to change [17:26:02] Is that right ? 
[17:26:10] i stripped .org from all requests [17:26:33] Ahhhhh, ok :) [17:26:33] so if you pass en.wikipedia it uses it as is, and if you pass en.wikipedia.org it uses en.wikipedia [17:26:43] WOOOH :) [17:26:45] Cool [17:26:52] I had a doubt [17:26:58] Now the other thing [17:27:15] data is in cassandra (some of it at least) [17:27:16] select * from "local_group_default_T_pageviews_per_project"."data" where "_domain" = 'analytics.wikimedia.org' and project = 'en.wikipedia' and access = 'all-access' and agent = 'all-agents' and granularity = 'hourly' and "timestamp" >= '2015100100' and "timestamp" <= '2015100101' limit 10; [17:27:22] gives me an answer [17:27:40] ok [17:28:17] But curl http://localhost:7231/analytics.wikimedia.org/v1/pageviews/per-project/en.wikipedia.org/all-access/all-agents/daily/2015100100/2015100101 [17:28:28] joal:looking at : https://gerrit.wikimedia.org/r/#/c/240868 [17:28:32] on aqs 1001 gives me an empty reply :( [17:28:37] Thanks nuria [17:28:51] nuria: code probably needs to be moved with Madhus in camus module [17:29:10] milimetric: --^ a few lines [17:30:00] ottomata: would you mind havinga look with me at aqs cluster ? [17:30:09] Just want to be sure I don't hit it too badly [17:31:12] joal: looks like http://localhost:7231/analytics.wikimedia.org/v1/pageviews/per-project/all-projects/all-access/all-agents/daily/2015100100/2015100102 gives a really just empty response [17:31:19] and http://localhost:7231/analytics.wikimedia.org/v1/pageviews/per-project/all-projects/all-access/all-agentzzzzzzzzzzzzz/daily/2015100100/2015100102 gives a 404 [17:31:24] so it seems something is happening [17:31:43] milimetric: ok [17:32:10] ottomata: thanks! [17:32:38] milimetric: I currently the hell out of cassandra, maybe that's it ? 
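The domain handling milimetric describes above (en.wikipedia is used as is, en.wikipedia.org is reduced to en.wikipedia) could look roughly like the sketch below. This is only an illustration in Python, not the actual RESTBase/AQS code; the function name `normalize_project` is hypothetical.

```python
def normalize_project(project: str) -> str:
    """Drop a trailing '.org' so both spellings map to the same key.

    Hypothetical helper for illustration; not taken from the AQS code.
    """
    suffix = ".org"
    if project.endswith(suffix):
        # 'en.wikipedia.org' -> 'en.wikipedia'
        return project[: -len(suffix)]
    # 'en.wikipedia' is already in the stored form
    return project
```

With this, both `normalize_project("en.wikipedia")` and `normalize_project("en.wikipedia.org")` give the `en.wikipedia` value used in the cqlsh query above.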
[17:33:07] maybe [17:33:52] hmm [17:34:11] If restbase gives an empty answer when cassandra is under pressure, that's not cool :( [17:37:59] joal: i'm not familiar with Cassandra but does the cluster all have to get the data and acknowledge it and everything before it serves it? [17:38:00] Analytics-Kanban: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1713018 (Nuria) Are the actual estimates for the wikistats task on a different task? [17:38:18] !log Backfilling load from hadoop to cassandra from beginning of october [17:38:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [17:38:44] Analytics-Kanban: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1713030 (Milimetric) yes - https://phabricator.wikimedia.org/T114379 which right now has some discussion but I've started working on it (it's in-progress) [17:39:03] * joal likes when logbot answers! [17:39:24] joal: there are timeouts, and the default is 2s [17:39:48] right gwicke [17:39:56] thanks for the answer [17:40:07] there are also read retries by default [17:40:13] it'll retry once [17:40:19] ok [17:40:32] cqlsh seems to give an answerr in faster than 2s [17:40:33] in this case curl gives "Empty reply from server" which seems weird [17:40:44] none of the code paths would allow that [17:40:51] it seems like it'd either be a 404 or a 200 [17:40:52] empty reply sounds more like a different problem [17:40:55] or a 400 [17:41:34] I tested the validation and that seemed to execute the code (so passing like 201522222222 for the timestamp which is invalid, gives the proper 400 with the message) [17:41:37] RB will return 500 or 503 when Cassandra is down / too slow [17:41:48] including a JSON error object [17:42:16] hm, so why would it return empty... 
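The timestamp check milimetric mentions (201522222222 is rejected with a 400) can be sketched as below. This assumes the YYYYMMDDHH format seen in the curl examples (2015100100); the function name `parse_hour_timestamp` is hypothetical and the mapping of the error to an HTTP 400 is left to the caller. It is not the validation code in the service itself.

```python
from datetime import datetime


def parse_hour_timestamp(ts: str) -> datetime:
    """Parse a YYYYMMDDHH string, raising ValueError if it is invalid.

    Hypothetical helper; the 10-digit format is an assumption based on
    the request URLs in the log.
    """
    if len(ts) != 10 or not ts.isdigit():
        raise ValueError(f"expected 10 digits (YYYYMMDDHH), got {ts!r}")
    # datetime() itself rejects out-of-range months, days and hours
    return datetime(int(ts[0:4]), int(ts[4:6]), int(ts[6:8]), int(ts[8:10]))


def is_valid_timestamp(ts: str) -> bool:
    """Return True when ts parses; a caller could answer 400 on False."""
    try:
        parse_hour_timestamp(ts)
        return True
    except ValueError:
        return False
```

Under these assumptions `is_valid_timestamp("2015100100")` is true, while `"201522222222"` fails on length and `"2015222222"` fails on the month.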
[17:43:26] I feel like we need better logging to figure this out, logstash has a bunch of nonsense in it [17:43:57] milimetric: there is logstash for restbase ? [17:44:12] joal: logstash.wikimedia.org [17:44:20] I search for "aqs1001" but I may be doing it wrong [17:44:40] https://logstash.wikimedia.org/#/dashboard/elasticsearch/restbase [17:45:11] bd808: that's only for the rest API cluster [17:45:31] searching for 'aqs' should work as well [17:46:14] there are worker restarts in the logs [17:46:21] so something isn't quite right [17:46:43] Sorry gwicke :( [17:46:58] Didn't mean to bug you [17:47:03] "type:aqs" would be the best search probably [17:47:20] *nod* [17:47:25] joal: no worries ;) [17:49:58] (PS6) Joal: Add CassandraXSVLoader to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) [17:50:30] nuria: if you're in CR mode, I think this one is ready as well (currently working on the cluster, so tested for real) [17:50:42] nuria: --^ [17:51:16] joal: ok, understanding all logic in the camus job is taking me a while [18:23:11] (PS4) Nuria: [WIP] Add camus helper functions and job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [18:23:42] joal: whenever you have time, i added UTC for dates otherwise tests fail in my local machine [18:25:00] (CR) Ottomata: "Cool!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [18:25:02] nuria: makes sense ! [18:25:16] joal: still looking, my scala is like ahem.. 
terrible [18:25:22] :) [18:29:40] Analytics-Cluster, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Replace uses of monitoring::ganglia with monitoring::graphite_* [8 pts] - https://phabricator.wikimedia.org/T90642#1713190 (Ottomata) [18:29:54] Analytics-Cluster, Analytics-Kanban, operations, Patch-For-Review: Fix active namenode monitoring so that ANY active namenode is an OK state. [8 pts] - https://phabricator.wikimedia.org/T89463#1713191 (Ottomata) [18:36:52] milimetric: noticed something while playing with cassandra: [18:37:24] milimetric: pageview title normalization is good, except for capitals --> Barack_Obama and Barack_obama exist [18:37:53] No other wrong declination though (spaces, or no first letter capital) [18:38:08] milimetric: Do you think we should normalise? [18:39:43] (CR) Nuria: "Tested and added date instantiation to be utf8 so tests do not fail on local machines that might be running on a different timezone. Secon" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/240868 (https://phabricator.wikimedia.org/T113251) (owner: Joal) [18:46:47] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Add event_pageId and event_pageTitle to quicksurvey [not all] schema - https://phabricator.wikimedia.org/T114164#1713239 (csteipp) In general, we should limit the number of eventlogging schemas... [18:51:53] Analytics-Cluster, Analytics-Kanban, operations, Patch-For-Review: Fix active namenode monitoring so that ANY active namenode is an OK state. [8 pts] - https://phabricator.wikimedia.org/T89463#1713268 (Ottomata) Done! https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=analytics1001... [18:56:16] milimetric: can we celebrate this moment for a second? [18:56:22] it may not last for long. [18:56:36] hm? [18:56:36] I hear we can use piwik, milimetric? [18:57:07] Oh! :) cool, I can enable it when I get home, doc's office for a bit [18:57:21] oh! okay! hope all is well. 
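joal's observation at 18:37 (spaces and first-letter capitalization are handled, yet Barack_Obama and Barack_obama both exist) is what one would expect if normalization only touches spaces and the first character. The Python sketch below illustrates that under this assumption; `normalize_title` is a hypothetical helper, not the code used in the pageview pipeline.

```python
def normalize_title(title: str) -> str:
    """Replace spaces with underscores and upper-case the first letter.

    Hypothetical illustration; assumes normalization is limited to
    these two steps, so later capitals are left untouched.
    """
    cleaned = title.strip().replace(" ", "_")
    # Only the first character is changed; 'obama' vs 'Obama' survives
    return cleaned[:1].upper() + cleaned[1:]
```

Under this assumption `normalize_title("barack obama")` yields `Barack_obama`, which stays distinct from `Barack_Obama`, matching the two rows seen in Cassandra.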
[18:57:38] oh, all good, just routine [18:58:24] ooki. :-) [18:58:25] joal: we can normalize going forward if you want, but I would say don't worry about it for this loading [18:58:37] we'll never be perfect [18:58:43] milimetric: agreed about not changing for the loading [19:00:21] milimetric: but I wonder about normalization: most viewed (from far): Barack_Obama is the redirect target from Barack_obama ... [19:00:40] joal: looking at 2nd CR https://gerrit.wikimedia.org/r/#/c/232448 [19:00:45] Hmm, I think I'll leave it as it is for now, and if somebody says it's not good, we'll get back to it [19:00:56] Nuria: Thanks a million ! [19:01:12] nuria: can wait until tomorrow as well if you have other stuff to do ;) [19:06:09] joal, I think redirects will deserve their own special treatment, so yeah, leaving it as is for now seems fine [19:06:25] cool milimetric, thanks [19:12:58] joal: in maven's sourceDirectory thing - can we specify multiple directories somehow? [19:13:30] madhuvishy: hm, not sure I understand [19:13:45] what source? source management? 
[19:13:52] joal: something like [19:14:03] https://www.irccloud.com/pastebin/NKqYXm45/ [19:14:33] i want to put in a test schema in src/test/avro, and have that compile too [19:14:41] whooo, that is avro-maven-plugin specific config you are talking about :) [19:14:58] I don't know if that's feasible or not [19:15:01] i was wondering how i can do that without repeating all of the plugin stuff [19:15:02] hmmm [19:17:03] joal: if i put in 2 blocks with different sourceDirectories it seems to only execute the second one [19:20:17] joal: hmmm, i might have figured it out [19:20:36] i can put in two blocks with different ids [19:28:36] (PS7) Madhuvishy: [WIP] Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) [19:28:45] nuria: pushed tests, check it out [19:29:04] i will add some doc strings to the decoders too, stepping out for lunch now [19:37:33] Analytics-Backlog, Developer-Relations, MediaWiki-API, Reading-Infrastructure-Team, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1713460 (bd808) > Counts of errors (T113672) by action and user agent, in order to identify problem areas and... [19:39:34] I'm done for today lads [19:39:42] Have a good end of day ! [20:10:39] (CR) Nuria: "Looking good, couple small changes and I think we are ready to remove the WIP from commit message." (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [20:46:27] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1713666 (Tbayer) >>! In T114379#1712266, @Milimetric wrote: > Erik, I'm in favor of keeping the processing that accounts for bad data. It's one of the... 
[20:55:48] (PS8) Madhuvishy: Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) [20:55:51] nuria: around? [20:59:02] (CR) Madhuvishy: Add refinery-camus module (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [21:00:36] joal: for now, the camus properties will stay in refinery right? [21:13:54] madhuvishy: yes, was just eating [21:14:47] cool, I made the changes [21:16:28] Analytics-Backlog, Developer-Relations, MediaWiki-API, Reading-Infrastructure-Team, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#1713841 (Nuria) >Counts of errors (T113672) by action and user agent, in order to identify problem areas and p... [21:17:13] nuria: ^ [21:17:24] madhuvishy: looking [21:21:29] madhuvishy: almost there ! only one thing, we should log.error("Caught exception while parsing JSON string '" + payloadString + "'."); [21:21:44] before throwing teh exceptions in the hope they will show up in the log somewhere [21:21:46] *the [21:21:52] madhuvishy: makes sense? [21:22:02] madhuvishy: for bot runtimeexceptions thrown [21:22:25] from the SchemaRegistry class? [21:22:34] nuria: those logs don't show up [21:22:40] for that class [21:22:49] that's why I din't add anything [21:22:55] madhuvishy: man ... [21:23:16] well, I can make the exception string a bit more detailed [21:23:59] nuria: but i think it already does explain it [21:27:08] madhuvishy: ya, i still do not understand why we do not see logs from that class , do you? [21:27:16] nuria: nope [21:27:59] madhuvishy: sandness, but then i think we are good. 
[21:28:11] +1 , will wait for otto to take a look [21:28:32] (PS9) Madhuvishy: Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) [21:28:50] cool, made a minor change in the exception logging so it'll print the name of the schema [21:29:20] (CR) Nuria: [C: 1] Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy) [21:29:56] madhuvishy: sounds good, otto is gone so we can sync up tomorrow [21:30:18] madhuvishy: we still have to do the puppet side but i would like to talk to andrew 1st [21:30:33] nuria: yup alright [21:30:45] i'll also push an initial patch for the properties file [21:31:02] madhuvishy: i want to talk to brandon but i think i will pick up varnish changes next [21:31:15] nuria: oooh awesome [21:31:29] i dont know any varnish but interested [21:31:44] madhuvishy: we are still going to have to test the whole workflow here , we need to create teh topic, consume by hand, add puppet code .. 
etc [21:31:52] yes yes [21:31:54] madhuvishy: but i think we should merge our code [21:32:00] nuria: also [21:32:10] madhuvishy: so changes like joal has wip can be added to the depot [21:32:13] i wonder if our changes to the schema are okay [21:32:32] search should take a look at the default values, union stuff etc [21:34:11] i have to update some of the docs, but i think its ok [21:34:28] i might need to add one more field, since we cant use null vs empty string to distinguish if a suggestion was requested or not [21:38:24] ebernhardson: okay, i couldn't fully experiment with unions and avro binary either [21:38:53] if you find using avro-tools that it can be validated, we can change it [21:42:13] (PS1) Madhuvishy: [WIP] Add properties file for importing mediawiki data [analytics/refinery] - https://gerrit.wikimedia.org/r/244594 (https://phabricator.wikimedia.org/T113521) [22:04:50] (PS1) Madhuvishy: Add libjars optional arg to Camus python wrapper script [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 [22:27:00] Analytics-Backlog, Analytics-Kanban: Flag in x-analytics in varnish any request that comes with no cookies whatsoever - https://phabricator.wikimedia.org/T114370#1714042 (Nuria) [22:27:14] Analytics-Backlog, Analytics-Kanban: Flag in x-analytics in varnish any request that comes with no cookies whatsoever - https://phabricator.wikimedia.org/T114370#1714043 (Nuria) a:Nuria [22:33:36] Analytics-Kanban, Mobile-Apps: Investigate and fix inconsistent data in mobile_apps_uniques_daily {hawk} [5 pts] - https://phabricator.wikimedia.org/T114406#1714056 (madhuvishy) a:madhuvishy [22:33:45] (PS1) Madhuvishy: Fix inconsistent mobile uniques reports due to partial job runs [analytics/refinery] - https://gerrit.wikimedia.org/r/244604 (https://phabricator.wikimedia.org/T114406) [23:06:33] (CR) Nuria: [C: 1] Add libjars optional arg to Camus python wrapper script [analytics/refinery] - https://gerrit.wikimedia.org/r/244599 (owner: Madhuvishy) [23:20:57] 
Analytics-Backlog, Analytics-Kanban, Traffic, operations: Flag in x-analytics in varnish any request that comes with no cookies whatsoever - https://phabricator.wikimedia.org/T114370#1714113 (BBlack)