[13:31:13] Analytics-Wikistats: Percentage pageviews from Russia is too low in recent geographical breakdowns in Wikistats - https://phabricator.wikimedia.org/T109582#1552786 (ezachte) NEW
[14:31:16] Analytics-EventLogging, operations, Patch-For-Review: Create a package for python-pykafka for ubuntu precise and debian sid - https://phabricator.wikimedia.org/T109567#1552896 (Ottomata) Ah! Forgot about hafnium, and that it was precise. Will package it for precise too.
[14:40:10] craintesHi ottomata
[14:40:17] Hi ottomata sorry :)
[14:40:20] hello!
[14:40:31] Oops, I'm looking nighty !
[14:40:47] How is it today ?
[14:43:08] haven't checked status yet, although dan and I were seeing something fishy yesterday... mainly about duplicates coming in, but also about potentially rerunning more jobs... due to tweaks we made to my hourly loss query :(
[14:43:15] i'm reading your cassandra loader now :)
[14:44:33] ottomata: ok
[14:45:01] Every text load job since yesterday 4AM is shown red in hue --> seems wrong :(
[14:45:08] ottomata: --^
[14:48:21] hm, that does sound wrong, will look
[14:48:24] joal: https://github.com/spotify/hdfs2cass ?
[14:48:32] seen that?
[14:48:44] yup
[14:48:58] avro based
[14:49:47] we can write avro in hadoop pretty easy though, no?
[14:49:55] we can very easily output an avro dataset from hive query
[14:50:09] and why not, eh? pageview could be avro
[14:50:14] it's all in hadoop right now
[14:50:29] for sure could be :)
[14:51:21] camus is all caught up. hm.
[14:51:33] i bet it's all duplicates.
[14:51:34] hm.
[14:51:35] And there is bizarre behavior in kafka
[14:51:37] checking.
[14:51:38] oh?
[14:52:04] what?
[14:52:06] Like cpu iowait is higher on 1021 and 1022
[14:52:26] disk await max higher on 1021
[14:52:44] I would have expected things to level up after leader reelection
[14:53:46] ah, i also find that strange, but i think it has been like that for a while, and the difference isn't a lot.
[14:53:57] yeah, true
[14:54:02] Machines are different ?
[14:54:58] hardware? no.
[14:55:05] k
[14:55:51] yeah, joal, i think our reporting is all messed up now, especially by that cp1008.wikimedia.org host
[14:56:05] looking into that now, maybe can turn off vk there
[14:56:06] ok ottomata
[14:56:22] i don't see very little loss (according to my new query) now.
[14:56:24] so wrong load jobs, but no data loss
[14:56:35] btw, this is what
[14:56:35] http://codeshare.io/aITwc
[14:56:40] i am using to check
[14:56:41] * joal breath again
[14:57:30] ottomata: So, about the cassandra loader, I'd choose not to go for avro because of the change weight
[14:57:51] aye, +avro is a lot
[14:58:10] but, also your job is not short :)
[14:58:17] ottomata: But if you prefer, we can go to avro and project that (we'll have to if we want to go for confluent anyway ;)
[14:58:24] lots of logic there about convention with fields, etc.
[14:58:33] ottomata: Very true
[14:58:45] wait, pageview is in parquet, right?
[14:58:50] or just text
[14:58:51] Type conversion, therefore schema definition
[14:58:59] pageview is parquet
[14:59:01] where is this xsv coming from?
[14:59:22] Genericity
[14:59:28] Genericity?
[15:00:18] Same job for any XSV data type
[15:00:31] Like historical pageview for instance
[15:00:38] aye
[15:00:45] btw, i think it's pretty easy to read parquet as avro
[15:00:46] https://github.com/Parquet/parquet-mr/blob/master/parquet-avro/src/test/java/parquet/avro/TestReadWrite.java#L72
[15:01:32] Indeed, easy to convert parquet to avro it seems :)
[15:01:49] yeah, anyway, i think you could be right, just want to make sure we consider this
[15:02:16] I have looked at it, and chose the shortest path in my opinion
[15:02:23] But it can be changed
[15:03:51] Can't read your codeshare ottomata
[15:04:00] no?
[15:04:01] why?
[15:04:12] application error :(
[15:08:29] hm
[15:08:31] ok gist
[15:08:47] joal: https://gist.github.com/ottomata/bbab3caa708c5b213ec0
[15:08:59] better thx
[15:11:45] ottomata: your request seems correct to me
[15:20:20] ok, joal, i'm going to rerun some loads again... and compare one more time, ok?
[15:20:30] going to rerun for these https://gist.github.com/ottomata/bbab3caa708c5b213ec0#gistcomment-1554721
[15:20:50] and then compare those percent_lost after the load finishes, and any of those we will rerun refine and dependents.
[15:20:54] am sorry this is not yet over!!! :(
[15:21:09] i checked this yesterday: upload 2015 8 17 20 157311435 58018892 0.0 26.94413407
[15:21:10] and it was actually fine. so ja.
[15:21:59] ottomata: let's go for it, and also maybe for 2015-08-18 ?
[15:22:23] ottomata: We also need to investigate and pinpoint the issue
[15:22:44] indeed. joal, that is all hours where percent_lost > 1% in august up til today
[15:22:45] Cause that's really taking a lot of resource (people _ cluster) to catch up
[15:22:47] so nothing from yesterday
[15:22:57] neither today
[15:23:01] Ok cool
[15:23:06] Let's go
[15:23:12] We'll talk about that in standup I guess
[15:23:18] indeed. i was hoping to get through this upgrade and then do post mortem and fix all this stuff. ESPECIALLY our 2 hour job firing thing
[15:23:22] that is the source of a lot of these problems
[15:23:40] well, maybe not the source, but it is a dumb thing that would have prevented a lot of this work
[15:23:43] if we fired load off smarter
[15:23:46] than waiting for 2 hours
[15:23:56] i think we should do it by examining camus offset/timestamps
[15:24:00] for each partition
[15:24:10] right
[15:24:25] I think a first step would be to launch load with 4 hours lag instead of 2
[15:24:32] Easy win
[15:24:33] joal: if you like, maybe I should wait until we get through the full kafka upgrade/expansion
[15:24:37] before i do more job rerunning stuff?
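A minimal sketch of the idea raised at 15:23:56 of firing load jobs by examining per-partition Camus timestamps instead of waiting a fixed 2 hours. Everything here is an assumption for illustration: the function name `hour_ready_to_load`, the `latest_ts_by_partition` mapping, and the sample timestamps are hypothetical, and the log does not show how the latest imported timestamp per partition would actually be read from Camus.

```python
from datetime import datetime, timedelta


def hour_ready_to_load(hour_start, latest_ts_by_partition):
    """Return True once every partition has imported data past the end of the hour.

    latest_ts_by_partition is a hypothetical mapping of partition id to the
    timestamp of the newest message imported for that partition.
    """
    if not latest_ts_by_partition:
        # No partition information at all: do not fire the load job.
        return False
    hour_end = hour_start + timedelta(hours=1)
    return all(ts >= hour_end for ts in latest_ts_by_partition.values())


# Hypothetical example: partition 1 is still inside the hour, so the load waits.
hour = datetime(2015, 8, 17, 20)
lagging = {0: datetime(2015, 8, 17, 21, 5), 1: datetime(2015, 8, 17, 20, 40)}
caught_up = {0: datetime(2015, 8, 17, 21, 5), 1: datetime(2015, 8, 17, 21, 1)}
```

Compared with a fixed lag (2 or 4 hours), a check like this would fire as soon as the slowest partition has passed the hour boundary, and would hold back when any partition is behind.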
[15:24:42] that way we do it once and for all?
[15:24:53] joal, i'd be fine with that, but that would delay job output by a lot
[15:25:03] ottomata: feasible, but the cluster will be buuuuuusy at that moment !
[15:25:06] yeah
[15:25:14] hm
[15:25:27] Job output timing is not really as important as data quality
[15:25:29] I think
[15:25:45] So if having 2 hours more lag gives us quality, I buy that !
[15:26:35] (PS2) Milimetric: Pass the buck [analytics/geowiki] - https://gerrit.wikimedia.org/r/232426 (https://phabricator.wikimedia.org/T106229)
[15:26:56] I actually don't mind rerunning jobs while we stop load during reinstall
[15:27:00] ottomata: --^
[15:29:12] i agree joal
[15:29:18] joal, aye.
[15:31:42] Even if it's tedious, I think trying to catch up as we go makes sense
[15:31:47] ottomata: --^
[15:35:18] k
[15:37:53] ottomata: https://gerrit.wikimedia.org/r/#/c/232408/
[16:00:17] hi
[16:00:41] stat1003 tries to git pull geowiki-scripts, but that fails
[16:00:50] since an hour
[16:01:02] Error: /usr/bin/git pull --quiet returned 1 instead of one of [0]
[16:01:11] Error: /Stage[main]/Geowiki/Git::Clone[geowiki-scripts]/Exec[git_pull_geowiki-scripts]
[16:01:19] any ideas if there was a change to that an hour ago?
[16:04:50] Analytics-Engineering: stat1003 - git pull geowiki scripts fails - https://phabricator.wikimedia.org/T109594#1553329 (Dzahn) NEW
[16:05:15] Analytics-Engineering: stat1003 - git pull geowiki scripts fails - https://phabricator.wikimedia.org/T109594#1553336 (Dzahn) Exec[git_pull_geowiki-scripts]/returns:** You are not currently on a branch**
[16:27:50] milimetric: were you working on geowiki-scripts? ^
[16:28:18] yes :) talking to them in -ops now
[16:28:22] thx
[16:28:31] Analytics-Cluster, operations, ops-eqiad, Patch-For-Review: rack new hadoop worker nodes - https://phabricator.wikimedia.org/T104463#1553505 (Cmjohnson) All are racked and @ottomata has already added as worker nodes. Unfortunately, we had 2 more servers that came to us not working. Dell t...
[16:29:48] woot, +6 hadoop nodes this morning!
[16:30:04] ottomata: :D nice!
[16:32:13] Analytics-Engineering: stat1003 - git pull geowiki scripts fails - https://phabricator.wikimedia.org/T109594#1553516 (Milimetric) This is my fault, sorry for the noise. I'm messing with the repository there and I have to keep it in this bad state until I can be sure I fixed the original problem. Let me know...
[16:38:28] ottomata: COOOOOL ! :)
[16:38:37] ottomata: faster backfill ;)
[16:38:48] ottomata: need me for something ?
[16:39:33] naw, i'm about to start rerun load jobs
[16:39:36] i'm not going to rerun misc
[16:39:38] or bits
[16:39:43] and i'm not going to rerun jobs from 08-03
[16:39:50] i think we are pretty sure those are lost
[16:39:57] ottomata: right
[16:40:38] Cool, I'll make a pause until this evening interview then, it's already been a long day
[16:40:59] yeah
[16:40:59] k
[16:41:10] i'm going to start these and then get lunch and run home, then will do some reinstalls
[16:41:13] I'll still be online though, so in case, just ping
[16:41:32] Enjoy your lunch :)
[16:42:37] Analytics-Engineering: stat1003 - git pull geowiki scripts fails - https://phabricator.wikimedia.org/T109594#1553617 (Dzahn) I ACKed it in Icinga so monitoring won't report it anymore until it works again. The ideal way would be to schedule a downtime in Icinga (and disable puppet) but you'd have to ask ops...
[16:42:52] k laters
[16:42:59] Analytics-Engineering: stat1003 - git pull geowiki scripts fails - https://phabricator.wikimedia.org/T109594#1553618 (Dzahn) p:Triage>Low
[17:19:03] bearND: able to make the scrum of scrums today? hoping we can cover the link previews and content service stuff a little bit over video there
[17:22:25] dr0ptp4kt: I can join and be happy to answer any questions.
[17:24:32] bearND: thx ^ ottomata, bearND is coming to scrum of scrums. you'll be there, right?
[17:25:17] oh, wasn't planning on it, i was just called into a meeting with kevin about something
[17:25:18] should I?
[17:26:18] ottomata: actually, is milimetric going to be at scrum of scrums? i hear you if you need to be at another meeting.
[17:46:25] (PS1) Ottomata: Update camus property files with names of new brokers [analytics/refinery] - https://gerrit.wikimedia.org/r/232535 (https://phabricator.wikimedia.org/T106581)
[17:46:53] (CR) Ottomata: [C: 2 V: 2] Update camus property files with names of new brokers [analytics/refinery] - https://gerrit.wikimedia.org/r/232535 (https://phabricator.wikimedia.org/T106581) (owner: Ottomata)
[17:57:09] madhuvishy: Heya
[17:57:13] You still with Paul ?
[18:19:32] Analytics-Kanban, Patch-For-Review: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1554067 (Milimetric) FYI the script currently in code review works, it's been running for about 5 hours and it looks to be about 1/3 of the way through updating the private data. I'd ex...
[18:20:13] !log does this log work?
[18:20:25] guess not!
[18:21:39] FYI I am starting reinstall of analytics1022 now
[18:50:57] (CR) Milimetric: [C: -1] "I need to add some notes directly into this patch so this is easier in the unfortunate case that this happens again. I'll call them READM" [analytics/geowiki] - https://gerrit.wikimedia.org/r/232426 (https://phabricator.wikimedia.org/T106229) (owner: Milimetric)
[18:51:25] Hey mforns
[18:51:30] batcave for debrief ?
[18:51:35] joal, sure!
[18:51:37] in 5 mins ?
[18:51:57] joal, ok!
[18:53:34] ready mforns :)
[18:53:38] yep
[18:53:41] omw
[18:57:19] (CR) Milimetric: Pass the buck (1 comment) [analytics/geowiki] - https://gerrit.wikimedia.org/r/232426 (https://phabricator.wikimedia.org/T106229) (owner: Milimetric)
[18:57:34] milimetric: should I add access method to the top endpoint? atm it only takes project and timestamp
[18:57:40] joal: ^
[18:57:52] madhuvishy: I'd think so, for completeness
[18:58:09] it would be nice to know which articles are trending for folks on mobile only
[18:58:24] milimetric: yeah, okay makes sense
[18:58:48] if there are systematic differences between top on mobile and top on desktop, that could inform some interesting decisions for readership
[19:03:52] joal / ottomata: what's up with this thing: log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.
[19:04:11] milimetric: from where ?
[19:04:19] when i run hive
[19:04:41] I'm just issuing a simple select distinct with tablesample on wmf.webrequest
[19:04:55] hmm
[19:05:09] probably from some log4j misconfiguration
[19:05:19] ottomata: new machines ?
[19:05:32] I received an email from ashwin as well, I'll forward it to you
[19:05:50] It seems to have started today (why I think machine additions)
[19:05:59] ?
[19:06:07] milimetric: is it causing you problems?
[19:06:22] no, I was just making sure it's not causing other problems
[19:06:40] like, if it's misconfigured and it can never roll the logs, is it just growing and growing?
[19:07:29] pretty sure hive cli isn't writing logs
[19:08:21] ottomata: just forwarded you ashwin's email
[19:09:43] thanks joal, that is a change i made today.
[19:09:44] looking
[19:09:52] thx !
[19:13:19] ottomata: chocolate day ! https://twitter.com/jaykreps/status/634069079524012032
[19:14:12] ha
[19:15:42] joal: do you want Ashwin's email to Analytics public list?
[19:16:28] Hey leila, the one about hashing I guess
[19:16:35] the one about Hadoop
[19:17:16] Good idea leila, if somebody else has an issue they might bounce as well
[19:17:23] the one you sent me?
[19:17:36] just replied, and CCed ashwin
[19:17:37] ooki, will tell him, joal
[19:17:39] should be fixed shortly
[19:17:53] leila: given ottomata's feedback, no need
[19:19:09] ottomata: analytics1022 is dead, welcome kafka1022 I guess :)
[19:19:13] :)
[19:19:22] yeah, waiting for that stupid PTR record to change again
[19:19:27] :)
[19:19:30] next time I will change that ahead of time
[19:19:32] ah okay, joal.
[19:19:39] But thx leila for asking :)
[19:19:47] And sorry for the misfit :)
[19:20:00] Guys, I'm off for today !
[19:20:01] np. :-)
[19:20:07] have a good night joal
[19:20:10] Thx !
[19:20:32] ottomata: if you need me to rerun some stuff, please email, like that I start in the morning (less traffic)
[19:21:48] joal|night: i emailed you a few mins ago
[19:21:52] oh, i didn't
[19:21:54] i'm still refining
[19:21:57] there are only 2 partitions more to do
[19:22:16] oh, and they are done
[19:22:16] yeah I was wondering how it came unnoticed :)
[19:22:17] emailing now
[19:22:21] Ok
[19:22:32] k, goodnight!
[19:26:58] ottomata, hi! did the candidate write a task?
[19:27:17] yes
[19:27:19] you want?
[19:27:24] yes please! :]
[19:28:06] sent
[19:28:58] ottomata, thanks!
[20:22:45] Analytics-Kanban: Legal request, data point 1 - https://phabricator.wikimedia.org/T109626#1554591 (ggellerman) NEW
[20:25:15] Analytics-Kanban, Research-and-Data: Analyze referrer traffic to determine report format {hawk} [8 pts] - https://phabricator.wikimedia.org/T108886#1554616 (Milimetric) I picked an hour I'm not going to disclose on purpose, and I did (notice the 0.01% sampling) select referer, count(*) from wmf....
[20:41:51] (PS3) Milimetric: Pass the buck [analytics/geowiki] - https://gerrit.wikimedia.org/r/232426 (https://phabricator.wikimedia.org/T106229)
[20:53:32] Analytics-Kanban, Research-and-Data: Analyze referrer traffic to determine report format {hawk} [8 pts] - https://phabricator.wikimedia.org/T108886#1554764 (kevinator) @milimetric can you run using a whole day instead of an hour? I'm surprised to see Japan so high and I'm left wondering if this was done a...
[20:57:22] Analytics-Kanban, Research-and-Data: Legal request, data point 1 - https://phabricator.wikimedia.org/T109626#1554779 (DarTar)
[21:22:19] ottomata: what's the best way to kill a hive job?
[21:31:40] (PS1) Ottomata: [WIP] Generate hourly aggregate statistics about webrequest sequence stats [analytics/refinery] - https://gerrit.wikimedia.org/r/232644
[21:31:56] bearloga: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Killing_a_running_query
[21:36:41] ottomata: thank you very much!
[21:40:52] (CR) Ottomata: "I haven't run this exact version of the query, or tested this at all, hence WIP. Whatcha think?" [analytics/refinery] - https://gerrit.wikimedia.org/r/232644 (owner: Ottomata)
[21:59:25] (CR) Mforns: Refactor queries into reportupdater and dashiki (1 comment) [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/232645 (https://phabricator.wikimedia.org/T107504) (owner: Mforns)
[22:05:26] (CR) Mforns: "I don't know what this flake8 error is, it seems it crashed. In my machine flake8 passes OK. :/" [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/232645 (https://phabricator.wikimedia.org/T107504) (owner: Mforns)
[22:06:51] (CR) Milimetric: "yeah, that's a system error, not your code. I'd wait until tomorrow and just upload a new patchset" [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/232645 (https://phabricator.wikimedia.org/T107504) (owner: Mforns)
[22:07:29] milimetric, thx
[22:18:32] Analytics, Labs, Labs-Infrastructure, Labs-Sprint-108, Patch-For-Review: Set up cron job on labstore to rsync data from stat* boxes into labs. - https://phabricator.wikimedia.org/T107576#1555114 (yuvipanda) /me pokes @akosiaris again
[22:19:18] Analytics-Backlog: Delete obsolete schemas {tick} - https://phabricator.wikimedia.org/T108857#1555124 (mforns)
[22:32:51] (CR) Milimetric: [C: 1] [WIP] Generate hourly aggregate statistics about webrequest sequence stats (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/232644 (owner: Ottomata)
[22:39:55] Ironholds , milimetric i'm signing off and headed to the dentist... I won't be answering emails for a bit ;-)
[22:40:28] kevinator, I'm going to go watch The Wire and eat raspberry ice cream with marshmallows in it
[22:40:40] which probably means I'll be heading to the dentist tomorrow ;p
[22:40:55] have fun! Or, as much fun as you can have! And let's work on this problem in the morrow when we're fresh and rested :)
[23:32:41] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1555494 (kevinator) Here's another use case I came across recently: As a legal team member, I want traffic reports by country so I can provide evidence of traffic in a particular country. ( http://sta...