[00:32:40] Analytics-Cluster, Analytics-Kanban: Report monthly pageviews for the annual report - https://phabricator.wikimedia.org/T95573#1196508 (Nuria) Well, lila mentioned a max of 500 per day (about 15.000 per month). I think the cluster is overkill as we will be using an infrastructure to made to count and proc... [12:10:29] (CR) Krinkle: "recheck" [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/199162 (https://phabricator.wikimedia.org/T93690) (owner: Jdlrobson) [13:49:47] Analytics, Ops-Access-Requests, operations: Grant Sati access to geowiki - https://phabricator.wikimedia.org/T95494#1197783 (Ottomata) I sent en email yesterday to Jody asking for confirmation of Sati's NDA, as the instructions in your link say to do. [13:55:57] halfak: sorry! I just realized the instrumentation meeting we have in 5 minutes conflicts with my standup [13:56:10] is it ok if we do it right after standup? Should only be 15 minutes [13:56:25] Woops. Standup isn't on your calendar! [13:56:28] Sure. No problem [13:56:45] I should have known that you had standup then :S [13:57:45] halfak: weird... i'll see to fixing that [14:10:57] Analytics-Kanban, Analytics-Wikimetrics: confirm vagrant setup works for wikimetrics - https://phabricator.wikimedia.org/T95690#1197822 (Nuria) NEW [14:17:14] Analytics-Cluster, Analytics-Kanban: Add automata value in agent_type field of the refined table - https://phabricator.wikimedia.org/T95693#1197885 (JAllemandou) NEW [14:22:26] halfak: ok, done, wanna chat? [14:24:37] milimetric, don't talk to halfak [14:24:44] he just told me olives aren't really food [14:24:47] we are shunning him. [14:24:48] * Ironholds shuns [14:24:53] done [14:24:55] * milimetric shuns [14:25:02] :P [14:25:07] (I love all of you ;p) [14:25:24] :) I also think olives are more ways to torture small children than food [14:25:43] but I don't want to get shunned so I won't say that kind of thing [14:25:43] milimetric, just went into the batcave looking for you. 
[14:25:47] I suppose we need a new call. [14:25:50] halfak: oh, i'll do it [14:25:57] kk [14:29:05] nuria: Do you mind going for my CR first, like that I can deploy ;) [14:30:53] ottomata, nuria : by the way, I will also add the timestamp value in the CR [14:31:01] I had forgotten about that one ... [14:40:18] joal: looking [14:40:21] Thx [14:40:41] nuria: I'm waiting for my test on hive on timestamp to make the last change [14:46:11] (PS3) Joal: Add ts (unix timestamp), access_method, client_type and is_zero fields to refined webrequest table. [analytics/refinery] - https://gerrit.wikimedia.org/r/202914 [14:46:31] ottomata, nuria: pushed change about timestamp as well [14:47:19] oh COol [14:48:12] hm, joal, camus by default will use 'timestamp'. That is also what rcstream uses [14:48:16] we might want to use that as the field name [14:48:22] i think eventlogging uses that too [14:48:25] kkkkk [14:48:28] no problemo [14:48:42] Since date was dt, I used the same abbreviation :) [14:48:49] joal, is that the proper hive field? [14:48:51] type? [14:48:51] joal: nice, who knew you could do that [14:49:03] huhu, hive doc ;) [14:49:07] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-timestamp [14:49:53] I'll try to have the thing as a timestamp type [14:49:57] give me a minute [14:50:09] ja, not sure how it will actually store it, but i think if we can do that it would be better, as we can use hive date functions i think [14:50:15] and also it will probably display nicely [14:55:19] (CR) Ottomata: [WIP] Add Apps session metrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [14:56:11] ottomata: We can use timestamp as fieldname, but it is also a reserved keyword for the type ... [14:56:31] oh hm. 
[14:56:34] RiiiGhhht [14:57:31] (CR) Nuria: [C: 1] Add ts (unix timestamp), access_method, client_type and is_zero fields to refined webrequest table. [analytics/refinery] - https://gerrit.wikimedia.org/r/202914 (owner: Joal) [14:57:47] joal: so much work to do in the bot front [14:58:13] nuria: ? [14:58:23] joal: not for this patch [14:58:43] nuria: yeah, if we want a more precise analysis, there is some work to do [14:58:44] joal: but for the future, as for example bingbot is not there [14:58:53] It is for sure [14:59:04] ah yah, ay wait maybe i missed it [15:00:23] two different user agents for bingbot: spider, and iPhone ! [15:00:47] spider bingbot is about 3% of our traffic [15:00:54] nuria: --^ [15:01:06] joal: ah ok, bingbot is being detected ok by Ua-parser, right [15:01:12] that is what we wnat [15:01:14] *want [15:01:31] yup [15:01:33] hm, joal, what should we do? [15:01:35] considered as spider [15:01:39] about timestamp name [15:01:44] i guess not use it? [15:01:56] ottomata: I would prefer to go for ts, yes [15:02:13] prevent any forced use of `timestamp` to users [15:02:13] hm. ok. [15:02:18] hm. [15:02:24] ok [15:02:51] nuria: btw, i think you are right, HiveContext is not available in cloudera jars yet [15:02:56] in spark submit i am getting [15:03:00] Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf [15:03:03] hmMM [15:03:05] wait.
hm [15:03:06] that is hadoop [15:03:08] ok hang on [15:03:17] ottomata: but it is on our mvn deps path [15:03:21] yes [15:03:22] it compiles fine [15:04:07] on: jar -tf ./spark-hive_2.10-1.2.0-cdh5.3.1.jar [15:04:13] Sometimes maven deps can trick you :) [15:05:04] joal: ya mvn will get stuff according to deps no matter whether you use it or not [15:05:13] Analytics-EventLogging, Ops-Access-Requests, operations, Patch-For-Review: Grant user 'tomasz' access to dbstore1002 for Event Logging data - https://phabricator.wikimedia.org/T95036#1198063 (RobH) This actually has to have @mark approval, not Toby. (Daniel and I discussed in IRC, this is a task u... [15:05:16] ottomata: and the cloudera version we have is 5.0? [15:05:33] 5.3.1 [15:05:40] actually. [15:05:53] https://phabricator.wikimedia.org/T93952 [15:06:13] (PS4) Joal: Add ts (unix timestamp), access_method, client_type and is_zero fields to refined webrequest table. [analytics/refinery] - https://gerrit.wikimedia.org/r/202914 [15:06:32] nuria, ottomata : last one, I promise ;) [15:06:38] ts is now a hive timestamp [15:06:40] Analytics-EventLogging, Ops-Access-Requests, operations, Patch-For-Review: Grant user 'tomasz' access to dbstore1002 for Event Logging data - https://phabricator.wikimedia.org/T95036#1198068 (mark) Approved. [15:07:51] joal, is that timestamp in seconds or milliseconds? [15:09:19] milliseconds, from hive doc [15:10:04] hm, ok, can you note that in the column comment? [15:10:28] Sure, will do [15:12:06] joal, ottomata nice , that makes things easy, now we have to make sure we have the right locale so as to get times on utc, but i guess that is cluster configuration [15:12:29] times are all utc [15:12:30] :) [15:12:32] no prob [15:12:38] the dts are utc, hive assumes utc [15:12:39] etc. [15:13:29] (CR) Nuria: [C: 1] Add ts (unix timestamp), access_method, client_type and is_zero fields to refined webrequest table.
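The seconds-versus-milliseconds question above matters for anyone reading the new `ts` field back out. A minimal Python sketch (not part of the refinery code) of interpreting a unix timestamp stored in milliseconds as a UTC datetime, which is what the log says Hive assumes:

```python
from datetime import datetime, timezone

def ms_to_utc(ts_ms):
    """Convert a unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts_ms / 1000.0, tz=timezone.utc)

# 1428595200000 ms is 2015-04-09 16:00:00 UTC
print(ms_to_utc(1428595200000).isoformat())
```

Dividing by 1000 is the whole trick; forgetting it silently puts dates decades off, which is why noting the unit in the column comment (as asked above) is worthwhile.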
[analytics/refinery] - https://gerrit.wikimedia.org/r/202914 (owner: Joal) [15:17:30] ottomata: shall I go and merge ? Or do you want to review the comment for milliseconds ;) [15:18:28] (CR) Nuria: [WIP] Add Apps session metrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [15:18:57] joal: i don't see the millisecond comment :) [15:19:15] (PS5) Joal: Add ts (unix timestamp in milliseconds), access_method, client_type and is_zero fields to refined webrequest table. [analytics/refinery] - https://gerrit.wikimedia.org/r/202914 [15:19:18] Arrrrrrrrriving :) [15:21:38] (CR) Ottomata: [C: 2] Add ts (unix timestamp in milliseconds), access_method, client_type and is_zero fields to refined webrequest table. [analytics/refinery] - https://gerrit.wikimedia.org/r/202914 (owner: Joal) [15:21:40] :) [15:21:47] Analytics-Cluster, Analytics-Kanban: Add better timestamp field to refined webrequest data - https://phabricator.wikimedia.org/T94584#1198117 (JAllemandou) [15:23:24] nuria, coOOOl HiveContext works. [15:23:39] i need to puppetize hive-site.conf symlink in /etc/spark/conf, and then i need to make hive jars be included on spark's default classpath [15:23:40] that's it. [15:23:42] ottomata: doing what? (besides classpath) [15:23:53] doing that now... [15:23:56] ah the conf there [15:23:57] but if you want to try [15:24:01] aham [15:24:38] unset cp; for f in /usr/lib/hive/lib/*.jar; do cp="$cp:$f"; done [15:24:49] spark-shell --driver-class-path $cp [15:24:52] nuria, ottomata : I go merge and deploy the new fields [15:24:55] ok cool [15:24:57] do it! [15:24:57] yay! [15:25:24] (CR) Joal: [C: 2 V: 2] Add ts (unix timestamp in milliseconds), access_method, client_type and is_zero fields to refined webrequest table.
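The shell one-liner above joins every Hive jar into a ':'-separated classpath for `spark-shell --driver-class-path`. The same idea as a Python sketch, for reference (the `/usr/lib/hive/lib` path is the one from the log; adjust for other installs):

```python
import glob

def hive_classpath(lib_dir="/usr/lib/hive/lib"):
    """Join all jars under lib_dir into a ':'-separated classpath string,
    mirroring the shell loop: for f in $lib_dir/*.jar; do cp="$cp:$f"; done"""
    return ":".join(sorted(glob.glob(lib_dir + "/*.jar")))
```

Unlike the shell version, this one does not emit a leading ':' before the first jar, which some JVM tools are picky about.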
[analytics/refinery] - https://gerrit.wikimedia.org/r/202914 (owner: Joal) [15:27:23] ottomata: trying [15:28:03] Analytics-EventLogging, Ops-Access-Requests, operations, Patch-For-Review: Grant user 'tomasz' access to dbstore1002 for Event Logging data - https://phabricator.wikimedia.org/T95036#1198133 (Andrew) [15:28:50] Analytics-EventLogging, Ops-Access-Requests, operations, Patch-For-Review: Grant user 'tomasz' access to dbstore1002 for Event Logging data - https://phabricator.wikimedia.org/T95036#1198137 (Andrew) Open>Resolved Done. Tomasz, re-open this ticket or ping me directly if you don't have access... [15:35:00] Sorry for the merge commit guys :( [15:35:09] Forgot to rebase ... Arfff [15:50:41] joal: np, looks good [15:51:22] nuria: waiting for last 15:00 jobs before deploying [15:51:37] nuria: wit [15:51:52] nuria with new starting date at 16:00 [16:04:07] joal, ottomata: ok so -given that hive context works now- should we go for that for the new job? I will do some tests to compare times for 1 day ok? [16:04:51] ok. i think so, because I think it will make ooziefying easier [16:04:58] now you can select data based on months very easily, ja? [16:05:03] where month=4 [16:05:44] ottomata: if no perf issue, let's go then :) [16:07:33] ottomata: yaya [16:07:35] like: [16:07:48] ottomata: val userSessions = hc.sql("SELECT uri_path, uri_query, content_type, user_agent, x_analytics, dt from webrequest where year=2015 and month=03 and day=10 and hour=01") [16:08:10] yes, and sources too, i guess [16:08:12] so for oozie there is no question is easier cause , just like we do for everything else, we parametize the sql [16:08:14] because you don't need bits or upload, right? [16:08:20] yup [16:08:58] ottomata: yes, this was just a test. i am going to run it for 1 day on mobile data and see if it makes a difference on times to quantify ok? 
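Parameterizing the SQL, as discussed above for the oozie job, mostly means assembling the partition predicate from job arguments. A hypothetical Python sketch of building the partition-pruned HiveQL string (field and partition names follow the `hc.sql(...)` example in the log; this is not the actual job code):

```python
def partition_query(table, fields, year, month, day, hour, source=None):
    """Build a partition-pruned HiveQL string like the hc.sql() call above.
    `table` is the parameter suggested in the log (e.g. 'wmf.webrequest');
    `source` optionally restricts to one webrequest_source partition."""
    where = "year={} AND month={} AND day={} AND hour={}".format(year, month, day, hour)
    if source is not None:
        where = "webrequest_source='{}' AND {}".format(source, where)
    return "SELECT {} FROM {} WHERE {}".format(", ".join(fields), table, where)

q = partition_query("wmf.webrequest", ["uri_path", "dt"],
                    2015, 3, 10, 1, source="mobile")
```

Restricting on the partition columns (`year`/`month`/`day`/`hour`, and `webrequest_source`) is what lets Hive prune partitions instead of scanning the whole table.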
[16:09:15] ottomata: if so, i will get patches "more ready" [16:09:59] k, cool [16:10:49] cool, nuria, all puppetized, you shouldn't need classpath option anymore [16:11:14] ottomata: efficiency to the maxxx [16:11:23] https://gerrit.wikimedia.org/r/#/c/203358/ [16:12:20] ottomata: ok, job with parquet takes for 1 day (with 12 executors) 15 mins, let's see how does hivecontext do, [16:13:23] nuria: eager to know ! [16:24:04] Deploying now ! [16:26:56] joal: do we re-start the cluster when we deploy? [16:27:12] We restart refinery job in ozzie [16:27:45] nuria: FYI, oozie job killed [16:28:23] joal: so in oozie every job runs with its own classpath? [16:29:05] nuria: every job has its own oozie definition at load time, yes [16:30:35] nuria: Code updated, altering table in hive [16:31:01] joal: ok, will wait until you are done before launching new job with hive context [16:31:12] Sounds like a good idea :))) [16:31:15] Sorry for that [16:34:01] nuria: Table updated [16:34:12] joal: and job restarted it? [16:34:13] nuria: restarting oozie job [16:34:17] joal: k [16:34:51] nuria: job restarted, double checking everything is fine for the next runs [16:36:06] joal: k [16:37:19] nuria, joal, is next monday vacation day in wmf? [16:37:38] mforns: I have no idea [16:37:53] mforns: easter was last monday for us in france [16:38:23] mforns: no, easter is not a federal holiday here [16:38:26] joal, it is marked as Thomas Jefferson's birthday in the calendar as a US Holiday [16:38:38] mforns: Ahhhhhhh [16:38:41] mforns: ah wait [16:39:31] mforns: maybe it is?, no clue., some american person should know then [16:39:42] nuria, joal, thanks! [16:39:48] ggellerman____: do you know if monday is a holiday? [16:40:29] nuria: don't think so, but will confirm [16:41:24] nuria: I don't know why that's on Google calendar...not celebrated in the US that I know of [16:47:00] ggellerman____: k, that is what i thought [16:51:17] ggellerman____, thanks! 
[17:01:36] nuria: you mentioned some analysis scripts? [17:02:03] yes, christian had some parqued code on El extension to do that. [17:02:03] (for el) [17:02:13] milimetric: you can see them at: [17:02:57] nuria@stat1003:~/EventLogging/server/tools$ [17:03:08] thx! (looking) [17:08:57] nuria that's a pretty awesome script, but I think I need to keep writing mine. Because what I'm seeing is that different tables have missing data for different periods of time [17:09:14] milimetric: but are you looking client side vs server side? [17:09:26] milimetric: cause netweork outages affect those two differently [17:09:31] *network [17:09:32] no, i was just looking at the tables themselves [17:09:44] nuria: Something wrong with the cluster I think :( [17:09:51] in the Edit schema, for example, which takes both client and server side events, there is a huge chunk missing [17:10:05] milimetric: right, but tables populated from "server side events" might be fine if we had a network outage between varnish and el machine [17:10:09] milimetric: makes sense? [17:10:26] milimetric: as data gets to machine through 2 completely different paths [17:10:35] yes, that does, I think there are lots of things to check here, I won't try to write some generic script to do it all [17:10:51] milimetric: what tables did you looked at if i may ask? [17:11:05] i don't remember, i was randomly picking some from the show tables list [17:11:13] joal: did not try anything yet, was trying spark shell still [17:11:25] milimetric: ok, cause that maybe it [17:11:41] i'm writing a simple script that will just give me the count per hour for all tables, and run it for a few hours before and after I notice the problem [17:11:46] nuria: just there was no jobs in the interface ... 
[17:11:53] But some have started again [17:12:25] the cat_db part of his script is actually what i want to do [17:12:36] but per hour instead of per 100 seconds and for all tables instead of specific ones [17:13:12] milimetric: ok, after you can classify tables on client and server side events and see if that makes a difference when it comes to event drop [17:13:27] milimetric: note that many of the tables are "dead" . i. e. they receive no events [17:13:58] yes, that's why i'll look before and after, to catch those cases [17:14:44] milimetric: k [17:33:12] (CR) Nuria: "When I tested this (by adding "raise Exception" to run method on ReportNode on report.py)" (2 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203241 (https://phabricator.wikimedia.org/T88610) (owner: Mforns) [17:33:42] ottomata: yt? [17:37:48] yup hiya [17:38:10] nuria: [17:39:06] ottomata: best syntax i found that seems to work is: [17:39:14] https://www.irccloud.com/pastebin/9EyS7DSF [17:39:23] ok? [17:40:28] why not use from wmf.webrequest [17:40:28] ? [17:40:29] cc joal, job with hive context and 12 executors on cluster: https://yarn.wikimedia.org/cluster/app/application_1424966181866_79314 [17:40:45] ottomata: argh, see good that i ask you [17:40:58] ottomata: cause *cof* *cof* i did not think about it [17:41:59] mforns: let me know if comments on CR make sense [17:42:21] nuria: aye, that way we can parameterize an argument to the job as table [17:42:22] mforns: we can work on repro the testing together if you want to [17:42:26] table=wmf.webrequest [17:42:27] nuria, I've seen them, will look at them closely and respond & fix [17:42:32] mforns: k [17:42:34] nuria, thanks for the review!
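The per-hour, all-tables count script milimetric describes could generate one query per table. A hypothetical sketch; the only assumption is the standard EventLogging `timestamp` column holding 14-digit YYYYMMDDHHMMSS strings, so the first 10 characters identify the hour (the table name below is a placeholder, not a real schema revision):

```python
def hourly_count_sql(table):
    """Hypothetical per-hour event count for one EventLogging table; assumes
    `timestamp` holds 14-digit YYYYMMDDHHMMSS strings, so SUBSTRING(.., 1, 10)
    buckets rows by hour."""
    return (
        "SELECT SUBSTRING(timestamp, 1, 10) AS hour_bucket, COUNT(*) AS events"
        " FROM {t} GROUP BY SUBSTRING(timestamp, 1, 10)"
        " ORDER BY hour_bucket".format(t=table)
    )

# "Edit_12345" is a placeholder table name, not a real schema revision
q = hourly_count_sql("Edit_12345")
```

Running this for a window before and after the suspected outage, per table, shows exactly which hours drop to zero, including for tables that are normally "dead".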
[17:42:47] ottomata: ahem, yes, much better [17:43:11] ottomata: let's see how long does the job take [17:45:59] nuria: cluster a bit under pressure right now, catching on refine webrequests [17:46:17] nuria: Might not be the best moment for your test [17:51:39] ottomata: have we changed something in the parquet conf of the webrequest refined table ? [17:56:56] not I [17:56:57] what's up? [17:56:59] joal: ? [17:57:04] yup [17:57:15] Got null values for new columns [17:57:25] I think I have nailed down the issue [17:57:30] Will let you know [17:57:34] once confirmed [17:58:56] milimetric: yt? [17:59:57] joal: ok, wil re-run later once this one finishes [18:00:20] kevinator: howdy, yes [18:00:25] I just filed: https://wikitech.wikimedia.org/wiki/Incident_documentation/20150409-EventLogging#Thursday_Apr_9_18:49:48_UTC_2015 [18:00:27] batcave? [18:00:32] sure [18:03:00] (CR) Declerambaul: "re the trailing dot, i recommend against it. you can use the :paste command in the repl to avoid the syntax error." (10 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [18:08:39] Analytics-EventLogging, Mobile-Web: MobileWebClickTracking table is huge and thus querying too slow - https://phabricator.wikimedia.org/T76671#1198805 (Jdlrobson) Open>declined we're no longer using this table. We split into multiple tables. [18:09:54] halfak: yt ? [18:10:13] Meeting. I have another one afterward. I'll be around again in 1.5 hours [18:10:20] :( [18:10:26] np [18:10:35] Just to let you know I got that email :) [18:10:40] halfak: --^ [18:11:07] Will check with you next week [18:11:13] The one from altiscale? [18:11:17] yup [18:11:20] Great :) [18:11:29] Have a good weekend o/ [18:11:34] You too ! [18:11:38] (CR) Ottomata: "Awesoome, thanks Fabian!" 
(2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [18:11:55] mforns: ja Fabian just reviewed your code :) [18:11:58] yay! [18:12:18] mforns: Sorry for trailing dots ;) [18:14:03] ottomata, mforns: i will address his comments, let me upload a new patch. I just want to make sure to test it before it's submitted. [18:18:29] cool, yup np [18:18:47] Analytics: Cannot permalink easily to a single graph - https://phabricator.wikimedia.org/T76670#1198938 (Jdlrobson) [18:19:02] ottomata, joal, nuria: I have been reading his comments, they are cool [18:19:58] joal, why are you sorry? I agree with you and Declerambaul that it's better to have them leading the line. [18:20:19] Analytics, Language-Engineering, MediaWiki-extensions-UniversalLanguageSelector, Mobile-Apps, and 4 others: there should be a comparison of clicks count on interlanguage links on different platforms - https://phabricator.wikimedia.org/T78351#1198952 (Jdlrobson) [18:20:24] Oh, I didn't get that :) [18:20:28] mforns: --^ [18:20:30] joal, I just put them trailing because of the spark-shell [18:20:45] I thought it was personal ;-P [18:21:17] mforns: anyway, Fabian's comments are cool, I definitely agree ! [18:21:18] the spark-shell fails with leading dots, interprets the first line (without dot) as a plain assignment. [18:21:34] joal, totally! [18:21:37] mforns: aye, fabs suggests to use :paste [18:21:45] which is ok, but i agree could be annoying [18:22:00] ottomata, what is :paste? [18:22:05] in the repl [18:22:07] type :pa [18:22:11] (or :paste) [18:22:15] it will let you paste in a block of code [18:22:16] then [18:22:18] ctrl-D [18:22:25] and it will eval that block all at once [18:22:38] Didn't know the trick either [18:22:43] Sounds really useful ! [18:22:44] me neither! learned it yesterday :) [18:23:16] ottomata, I see!
[18:23:30] ottomata: I think I found the error [18:27:03] ottomata: When rerunning jobs after a schema change [18:27:26] If a partition was created before and then overwritten, new values inside are not seen [18:27:47] I have the case here [18:29:46] new values [18:29:49] meaning all the new data? [18:29:53] or the new fields you are adding? [18:29:53] yup [18:30:03] only values added [18:30:10] ? [18:30:26] hm, don't understand [18:30:38] partition existed with parquet data before [18:30:42] you overwrote it [18:30:43] For instance, here ts, and other values newly added by the schema change are null [18:30:47] ah [18:30:49] correct [18:31:04] you sure it got overwritten? [18:31:04] I need to remove partition manually, then recreate it [18:31:17] just drop, add partition? [18:31:18] Yup, (almost) sure [18:31:22] I think so [18:31:25] I'll check [18:31:25] and then the fields have data? [18:31:33] I think so [18:31:37] I need to double check [18:32:12] For the moment I am checking every partition created, just to be sure [18:32:24] And since cluster is a bit loaded, take time [18:39:01] nuria, are you looking at Declerambaul's comments? [18:39:38] mforns: ya, i had already changed some code /added hive context so i will address those [18:40:56] nuria, do you mind if I add comments to combineByKey comment? [18:41:45] mforns: no, this here explains pretty well how does it worK: https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html [18:42:55] nuria, yes, I remember having used this when implementing the code [18:43:14] ottomata: confirmed ! [18:43:22] and fixed [18:43:33] Everything back to normal [18:44:28] Will update documentation and send an email to analyitics list [18:45:23] ok [18:48:57] Analytics-Cluster, operations, ops-eqiad: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1199134 (Cmjohnson) spent some time chatting with Dell tech. 
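For reference, the `combineByKey` semantics nuria links to above can be modeled in a few lines of plain Python. This is an illustration of the aggregation pattern only, not the session-metrics job itself; with a single "partition", `mergeCombiners` never fires, so it is omitted:

```python
def combine_by_key(pairs, create_combiner, merge_value):
    """Single-partition model of Spark's combineByKey: create_combiner runs
    the first time a key is seen, merge_value for every later value of that
    key. (mergeCombiners is omitted; it only matters across partitions.)"""
    acc = {}
    for key, value in pairs:
        if key in acc:
            acc[key] = merge_value(acc[key], value)
        else:
            acc[key] = create_combiner(value)
    return acc

# Count events per user -- the shape of a per-user session aggregation
events = [("u1", 1), ("u2", 1), ("u1", 1)]
counts = combine_by_key(events, create_combiner=lambda v: v,
                        merge_value=lambda acc, v: acc + v)
```

The real Spark operator additionally needs `mergeCombiners` because each partition builds its own accumulators, which are then merged across the cluster.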
I did get firmware updates for the R720 that are bootable so I would like attempt to upgrade the bios on a few of the older s... [19:02:28] Analytics-EventLogging, operations, Patch-For-Review: Reclaim vanadium, move to spares - https://phabricator.wikimedia.org/T95566#1199213 (Cmjohnson) [19:02:30] Analytics-EventLogging, operations, ops-eqiad: vanadium failed disk /dev/sda - https://phabricator.wikimedia.org/T94926#1199210 (Cmjohnson) Open>Resolved a:Cmjohnson replaced disk [19:07:05] oh mah goodness, why was that so hard. i haven't gotten spark streaming + avro + schema registry to work [19:07:10] but i did just get a java consumer to use it! [19:07:13] that was actually pretty easy! [19:07:32] KafkaAvroDecoder (from confluent) + point it at schema-registry and give it a topic name [19:12:26] Analytics-EventLogging, operations, Patch-For-Review: Reclaim vanadium, move to spares - https://phabricator.wikimedia.org/T95566#1199253 (Cmjohnson) confirmed ge-4/0/11 is vanadium. I deleted the interface from the switch The disk was replaced. @[[ https://phabricator.wikimedia.org/p/RobH/ | Robh ]] [19:13:04] Analytics-EventLogging, operations, Patch-For-Review: Reclaim vanadium, move to spares - https://phabricator.wikimedia.org/T95566#1199255 (Cmjohnson) @[[ https://phabricator.wikimedia.org/p/RobH/ | RobH ]] did you add to server spares? 
[19:13:16] (CR) Mforns: [WIP] Add Apps session metrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [19:13:52] Analytics-EventLogging, operations, Patch-For-Review: Reclaim vanadium, move to spares - https://phabricator.wikimedia.org/T95566#1199257 (Cmjohnson) confirmed in IRC that no it wasn't done...keeping ticket to complete [19:22:19] (CR) Nuria: [WIP] Add Apps session metrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [19:39:30] (CR) Nuria: [WIP] Add Apps session metrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [19:45:05] (CR) Mforns: [WIP] Add Apps session metrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [19:51:05] milimetric, yt? [19:52:42] mforns: hey [19:52:49] hey! [19:53:54] milimetric, do you know of public reports that were added 2014-12-17 to wikimetrics for NamespaceEdits, NewlyRegistered, RAE, RNAE, RSNAE, across all wikis? [19:54:19] I found, all those public recurrent reports, created at this date [19:54:28] I just built the stupidest thing [19:54:54] milimetric, the public reports folder is getting big, 2.4 GB for now [19:55:10] yeah, those should be the Vital Signs reports [19:55:19] they should be created by user WikimetricsBot [19:55:26] milimetric, exactly [19:56:03] ok, I feel better, I thought it had something to do with: https://gerrit.wikimedia.org/r/#/c/180071/5/wikimetrics/api/centralauth.py [19:56:14] milimetric, this was merged one day before [19:56:52] milimetric, so yes, the problem with wikimetrics is that tar-ing of the public reports folder is taking too long [19:57:15] mforns: that makes sense, hm... 
how to fix :) [19:57:17] milimetric, it contains 5000+ folders and 600000+ files [19:57:21] yep :) [19:57:25] boy oh boy [19:57:39] we could compact the individual files [19:57:45] that would reduce the size dramatically [19:57:51] oh duh, we should totally be doing that [19:58:08] milimetric, what do you mean by compacting the individual files? [19:58:19] instead of storing each day separately, we should store "compacted_1", "compacted_2", etc. every few months when the data gets big [19:58:28] milimetric, I see [19:58:34] you know how in each public report folder we store each day separately and then the full_report.json [19:58:42] yes [19:58:48] understand [19:58:58] cool - yeah, the backup will just break until we do that [19:59:07] yes [19:59:27] milimetric, I'll file a task [20:00:35] thanks for looking into it mforns [20:02:08] hey milimetric, batcave w me about all this crap i'm working on? [20:02:46] ottomata: ok, but just finishing up with nuria [20:07:44] ottomata, joal : also loads of parsing errors in spark now, will run jobs later when there are perhaps less tasks scheduled [20:12:04] I don't get the reason for parsing errors though nuria [20:13:21] joal: maybe "compressing errors" is a better description [20:13:24] https://www.irccloud.com/pastebin/S0tYvnVp [20:13:42] joal: but i hadn't seen those up to today [20:15:23] weird, never seen that yet nuria [20:16:33] joal: maybe the hive conf is missing something that tells it that stuff is compressed on a certain way [20:16:49] hmmm [20:17:15] does the error occur on parquet file or on shuffle data ? [20:18:10] joal: https://yarn.wikimedia.org/proxy/application_1424966181866_79389/stages/stage?id=1&attempt=142 [20:19:59] home: I guess it happens when a task get remove because of preemption [20:20:07] But I can't be sure [20:23:37] joal: can you kill that job? [20:23:46] Sure, easy :) [20:23:54] want me to ? 
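The compaction idea milimetric sketches above ("compacted_1", "compacted_2", etc.) could look roughly like this. The file layout (one JSON file per day next to `full_report.json`) follows the description in the log, but the exact names and structure here are assumptions, not wikimetrics' real code:

```python
import json
import os

def compact_daily_reports(report_dir, out_name="compacted_1.json"):
    """Sketch: fold the per-day JSON files in one public-report folder into a
    single compacted file keyed by day, then delete the originals.
    full_report.json is left untouched."""
    merged = {}
    days = [f for f in sorted(os.listdir(report_dir))
            if f.endswith(".json")
            and f != "full_report.json"
            and not f.startswith("compacted")]
    for fname in days:
        path = os.path.join(report_dir, fname)
        with open(path) as fh:
            merged[fname[:-len(".json")]] = json.load(fh)
        os.remove(path)
    out = os.path.join(report_dir, out_name)
    with open(out, "w") as fh:
        json.dump(merged, fh)
    return out
```

Turning 600,000+ small files into a handful of compacted ones is what would let the nightly tar of the 2.4 GB public-reports folder finish again.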
[20:24:51] Actually it's gone already [20:24:55] nuria --^ [20:24:56] Analytics-Kanban, Analytics-Wikimetrics: Compact Wikimetrics' old report files - https://phabricator.wikimedia.org/T95756#1199587 (mforns) NEW [20:25:07] joal: ok [20:26:02] (CR) Declerambaul: [WIP] Add Apps session metrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [20:36:57] * joal is back ! [20:37:04] what a lag [20:43:04] joal: you know... could it be that teh addition of the new fields increase the memory? [20:43:26] hm [20:43:58] joal: we can look at this on monday though, no rush, it's super late for ya [20:44:09] nuria: We probably don't know how spark-hive behaves in case of schema change ... [20:44:31] It's planned: I'll go to bed in a minute or so ;) [20:45:56] nuria: what's awesome about ngraph (I took a very quick look) [20:46:13] yessss [20:46:19] hm, nuria, weird [20:46:22] can I reproduce? [20:47:30] ottomata: i just tried to run the code, let me send to gerrit, give me 5 mins [21:05:19] (PS6) Nuria: [WIP] Add Apps session metrics job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [21:05:23] Deskana: do you have few minutes? Are you still on the 5th? [21:05:23] leila: I'm downstairs. I'm coming up in 5 minutes or so. [21:05:24] no rush. let's chat when you're here Deskana. [21:05:24] (CR) Nuria: [WIP] Add Apps session metrics job (6 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns) [21:05:44] ottomata: ok, so the memory errors can be reproed loading 1 hour of data [21:06:09] okay [21:06:15] who is Ashwin and what the hell is he or she doing? [21:06:25] jaja [21:06:43] cause Ironholds is going to break his neck ninja-style [21:06:54] no, I'm going to go "LOOK AT TOP. EXPLAIN." 
[21:07:08] I'm looking at 10 simultaneous python sessions, each on a different core [21:08:08] Ironholds: that name sounds familiar from a research session [21:08:08] yeah. DarTar, leila ? [21:08:08] ottomata: using hive context [21:08:08] nuria: i just did [21:08:08] who is Ashwin and why are they running 10 simultaneous python sessions on stat1002, having not sent out an announcement prior? [21:08:11] ottomata: and later: [21:08:17] https://www.irccloud.com/pastebin/MdcgJAWC [21:08:20] val hc = new org.apache.spark.sql.hive.HiveContext(sc) [21:08:20] and later: [21:08:20] Hi Ironholds. what's up? [21:08:20] val data = hc.sql("SELECT uri_path, uri_query, content_type, user_agent, x_analytics, dt from wmf.webrequest where webrequest_source='misc' and year=2015 and month=03 and day=20 and hour=0") [21:08:20] data.map(_(0)).take(10) [21:08:50] Ironholds: Ashwin is collaborating with Bob and Leila under an NDA [21:08:54] okay [21:08:59] ottomata: and later: userSessions.take(50) [21:09:12] they need to stop whatever they're doing, send out an announcement checking anyone else needs that machine and, when they don't hear anything back, restart it [21:09:16] ottomata: or yeah, that might work too [21:09:19] ssh into stat2 and look at top. [21:09:37] leila, DarTar , ^ [21:09:40] Ironholds: ahhhh that might be part of our problems too [21:09:40] no poblems there nuria [21:09:43] with what I did [21:09:55] and this is why we send out advance warning >.> [21:10:02] ottomata: did you try the longer snippet? [21:10:03] we should really have it in a guide somewhere [21:10:08] I’ll ask leila to chime in [21:10:16] my plan of "give the evil eye to anyone I see doing it" only works if we stop hiring people [21:10:24] otherwise I have to deathglare over and over [21:10:26] Ya DarTar , i am also using stat1002 and it is over 100% cPU [21:10:31] and we should have ashwin on IRC ideally [21:10:49] ideally. sub-ideally, we should kill whatever...that. is. 
[21:11:10] Ironholds: what's the problem with that? [21:11:21] nuria, it looks like the oldest process is 19037 - assuming a parallelised fork, killing that should reduce the rest [21:11:25] well, either there’s a usage policy that is agreed upon and documented or we cannot blame people for not adhering to it [21:11:49] DarTar: now that we're at it, there is a card in Trello for you to add Ashwin to the research-internal list. [21:11:51] ;-) [21:11:54] I remember the days when ops would come to me saying OMG do you realize that you’re in breach of policy X Y and Z ;) [21:12:03] leila, running over 10 cores without notifying anyone or sending out an email in advance to check if anyone needed to use this machine for anything, with no documentation of what is running, in a way that is potentially disrupting AnEng's work? Quite a lot. [21:12:26] like, I don't know what's running, how long it's going to take, how to turn it off if it breaks anything... does anyone else except ashwin know how long it'll run for or what it's doing? [21:12:30] Ironholds: what process is this interfering with right now? [21:12:36] DarTar: we need a policy but also checking htop once in a while doesn't hurt [21:12:49] nuria: agree, and we can document this [21:12:50] nuria suspects it's responsible for problems that AnENg is working on and it's an active blocker for me even touching the machine [21:13:00] ottomata: did you try the longer snippet [21:13:15] oof nuria, it is hard to do so [21:13:16] because anything complex and processor-intensive will get killed by the system [21:13:17] in spark-shell [21:13:20] Rgggn [21:13:21] i ahve to go soon [21:14:39] i wish it was faster to iterate on this, it is hard to build a jar and submit... [21:14:39] leila: looking at the mailing list request, I think we did this a while ago [21:14:39] nuria: memory errors? or snappy compression problems? [21:14:40] I don't think so, DarTar, or at least I'm not aware of it. 
[21:14:40] ottomata: both [21:14:40] i only saw this one you linked to org.apache.spark.shuffle.FetchFailedException: FAILED_TO_UNCOMPRESS(5) [21:14:40] okay, well I have to go get fancied up to head over to Harvard [21:14:40] ottomata: we got snappy exceptions like: [21:14:40] https://www.irccloud.com/pastebin/2jKo3Oqe [21:14:40] nuria: and if you run the exact same thing using SQLContext instead of HiveContext? [21:14:40] so, can somebody please (a) get Ashwin to shut this down and (b) get Ashwin to document what it is, what it does, how long it's expected to run for, and do so in advance next time? [21:14:44] I agree we need a formal policy. [21:14:48] ja, nuria, saw those, that is weird [21:15:02] but this isn't exactly, like, complex and obscure insider knowledge. It's how to treat a shared resource 101. [21:15:17] ottomata: i can try that, but do not worry, if you have to leave do so; hopefully your friend doesn't cry when he sees my code [21:15:54] ottomata: but spark shell is also going to run into issues if 1002 is over 100% CPU right? [21:16:04] ottomata: doesn't seem it would work [21:16:05] good night everyone! have a nice weekend :] [21:16:09] if I don't use yarn mode, ja [21:16:20] leila: Hey, I'm coming back up now. [21:16:22] Ironholds: both item (a) and (b) need discussions, except for parts of (b) [21:16:23] leila: Sorry for the delay. [21:16:32] leila, what do you mean? [21:16:39] ottomata: ya i need client mode to see exceptions, so yes leila DarTar we need to lower the cpu usage of those scripts [21:17:05] unless they are about to finish! [21:17:16] in which case i take my lunch break [21:17:21] nuria, can you do --master yarn --deploy-mode client and see exceptions? [21:17:23] Deskana: no rush. [21:17:56] ottomata: lemme try [21:18:06] ottomata: on spark shell right? [21:18:12] ottomata: not spark submit [21:18:46] alright, I give up. 
I'll send out an email and CC Ashwin in [21:18:48] Ironholds: Email before starting a heavy job is good [21:18:53] agreed [21:19:01] documenting how long it's expected to take, also good? [21:19:17] whether Ashwin stops the current job, or does something else, I need to figure out what the problem is now, since stopping the job can delay our work. [21:19:18] what the job is doing is probably a good idea for someone to know so that we don't have a cluster locked for folding@home ;p [21:19:24] nuria: yes? dunno [21:19:25] ok i gotta go [21:19:27] good luck nuria! [21:19:30] those should all go in the email, Ironholds. how long it's expected to take [21:19:31] ottomata: ciao [21:19:38] so to put it another way: Ashwin did not notify anyone or ask permission or check [21:19:44] because of this, the job is inconveniencing people massively [21:19:58] however, stopping the job means inconveniencing Ashwin, so instead, let's massively inconvenience everyone else until whenever it's done? [21:20:00] Ironholds: Bob and I are aware that Ashwin is running jobs [21:20:07] the scale is different from one day to the next [21:20:27] leila: so the machine is now trying to fit 1) eric z scripts 2) ashwin 3) cluster work [21:20:33] yes, and today the scale is "nobody else can run things on the machine" [21:20:36] leila: all those 3 do not fit [21:20:49] cluster work and existing stats have to take priority over research that can be re-run [21:21:01] nuria: can we jump in the batcave? [21:21:02] cluster work and existing stats can't necessarily be rerun (Or at least it can be a real pain to restart those) [21:21:34] leila: yes, one easy thing would be using nice, to let the system prioritize [21:21:38] Ironholds: you have a point, and that is that an email should have gone out about it. [21:21:41] leila: going into batcave [21:21:49] beyond that, I'll talk to nuria to resolve the current issue [21:22:00] okies. Let me know what the result is. 
[21:23:00] thanks for bringing it up, Ironholds, and it would be better if in the future you chose milder language, because you know, at least Ops know who is on the cluster, and this person probably is a volunteer and it's not correct to attack them like this. [21:24:02] leila: in batcave, or wait ... [21:24:03] what did I say that was improper language? [21:24:16] or an attack? [21:37:22] Analytics-Kanban, VisualEditor: Schema:Edit seems to incorrectly set users as anonymous {lion} - https://phabricator.wikimedia.org/T92596#1199862 (Halfak) I just checked the rates at which we see weirdness. It looks like 0.7% of edits saved by registered editors have saveSuccess events associated with us... [21:49:27] halfak: are you around? [21:49:44] Yup. I got 10 minutes :) [21:49:50] great. thanks, halfak. :-) [21:50:02] so you know that Ashwin is using a bunch of cores in stat1002 [21:50:17] he says that he has already talked to you and you have told him about how he can reduce the priority [21:50:25] and that in that case, he should be OK [21:50:32] Yeah. [21:50:36] I'm looking at his jobs, those are PRI 38, 39 [21:50:52] I don't know why Ironholds can't start his job if Ashwin's has very little priority [21:51:09] Is Ironholds trying to start a job on hadoop? [21:51:15] Because that is unrelated to stat1002 [21:51:26] not sure, halfak [21:51:41] nuria: do you know if Ironholds wanted to start the job on Hadoop? (I think he left already) [21:51:41] * halfak sees "bzcat" and "python" and knows that the XML utilities are being used :} [21:52:04] leila: no, i do not know [21:52:30] because halfak is right, if that was a Hadoop job, Ashwin's job shouldn't interfere with that nuria. [21:53:06] so halfak, do you know how NICE works if Ironholds wanted to start his job outside of the cluster? [21:53:25] based on the conversation with you, Ashwin had assumed setting priority to the lowest suffices. 
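(Editor's aside: the PRI values above line up with how `top` displays priority: PR is roughly 20 plus the nice value, so jobs niced to 18 and 19 show as PRI 38 and 39, while an un-niced job (nice 0) shows as 20. A small stdlib Python sketch of a process lowering its own priority, which is presumably what Ashwin's jobs did:)

```python
import os

current = os.nice(0)   # an increment of 0 just reports the current niceness
lowered = os.nice(5)   # raise niceness by 5, i.e. lower the scheduling priority
# Note: unprivileged processes can only increase their niceness, never
# decrease it, so a job that nices itself down cannot reclaim priority later.
print(current, lowered)
```

The shell equivalents are `nice -n 19 command` to start low-priority and `renice -n 19 -p PID` for a running job; either way the scheduler then favours any un-niced process, which is why halfak expected Ironholds' job to "just work".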
[21:53:30] leila: ya , that sounds right [21:53:43] leila, +1 [21:53:50] I think that ashwin's work is causing no one harm. [21:53:56] mmm [21:54:01] halfak: even when cpu is >100% [21:54:02] okay. nuria, what should we do? [21:54:04] mmmm [21:54:22] nuria, yes. If you start something with higher priority, Ashwin's processes will slow/stop [21:55:04] humm. okay. this makes sense halfak. but it's good to hear that you confirm this. [21:55:13] :) [21:55:16] happy to help [21:55:28] so, maybe Ironholds didn't set higher priority for his job and that's the only thing he had to do nuria? [21:55:49] leila: well, he shouldn't have to set it [21:56:03] I think the default priority is 20 [21:56:15] which is higher priority than what Ashwin has set, 38, 39 [21:56:37] so as long as Ironholds submits his job, according to what halfak says it should just work. [21:56:53] you know, i do not know what the default is but since eric z jobs are at 20 that sounds right [21:57:08] Default == 20 [21:57:09] let me test that halfak is correct [21:57:15] :) [21:57:22] testing halfak in action. ;-) [21:57:35] * halfak waits for the results to gloat [21:57:40] :D [21:58:39] Hi guys, I'm the one using a lot of cores [21:58:49] ashwinpp: welcome. :-) [21:58:55] hahaja , i was taelling leila we have changed so many things today that any one of them (for what i was doing) might be causing issues [21:58:57] o/ ashwinpp. [21:59:10] holaaa [21:59:17] * i was telling [21:59:27] hey halfak, your library works awesome btw ;) [21:59:32] Woot! [21:59:40] ashwinpp, nuria is testing what halfak had told you earlier, i.e., if your job has the lowest priority, you should be fine. [21:59:42] I'm very glad that it is making your work easier ashwinpp [21:59:44] :) [21:59:46] also i do not know how many cores 1002 has, halfak do you know? 
[21:59:56] nuria, I don't [22:00:15] I can check [22:00:21] 16 [22:00:35] 16 cores according to python's multiprocessing library [22:02:04] ashwinpp: just hang around for some time until Nuria finishes her test, please. :-) [22:02:56] sure [22:04:11] now that we are waiting, ashwinpp, do you have a bio of yourself handy? [22:04:24] I want to send it to the internal list to introduce you. [22:04:44] halfak: I know that my 10 minutes are over. If you have to go, please go, and I'll update you via email. [22:05:21] halfak: now cpu with my stuff is reported to be at 500%.. cause that makes a lot of sense, meybe it is adding core's usage [22:05:33] *maybe [22:05:47] leila: sure, should I mail you? [22:05:57] sure. thanks! [22:06:19] nuria: does it let you add your job though? [22:06:42] ow yeah, I see it with PR 20 nuria [22:07:35] leila: ya, but then the 100% we saw before is nothing...man, i do not know, otto IS the man. we will ask him on monday [22:08:09] thanks for checking nuria. [22:08:18] so, do you see a need for ashwinpp to kill jobs nuria? [22:08:46] If you don't, I'll send an email to the internal list explaining what happened and where we left off, or you're more than welcome to do so. [22:10:39] leila: well, right now i have a ton of memory exceptions, let me see if i get those with code that i knew was running fine yesterday [22:10:53] sounds good nuria [22:11:20] leila: but ...doesn't seem that related no.. let me verify [22:11:41] no, it doesn't, but it can't hurt that you check before people sign off for the day. :-) [22:20:03] leila: no, ooms again, but who knows where this comes from [22:23:09] so nuria, did your job go through? [22:23:21] ashwinpp: how long do you expect the current jobs on the cores to run? [22:26:47] actually a long time [22:27:35] like days, ashwinpp? [22:27:39] from previous estimates approximately 48 hours [22:27:52] and when did you start them? 
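(Editor's aside: the 500% reading is not surprising once you know that `top`, in its default Irix mode, reports %CPU as a percentage of a single core: a fully parallel job on a 16-core box can legitimately show up to 1600%. A quick sanity check of the arithmetic, using the same multiprocessing call mentioned above:)

```python
import multiprocessing

cores = multiprocessing.cpu_count()  # stat1002 reportedly returned 16

# top's %CPU is per-core, so 500% simply means about 5 cores fully busy,
# and the whole machine tops out at cores * 100 percent.
reported_percent = 500.0
cores_busy = reported_percent / 100.0
machine_max_percent = cores * 100.0

print(cores_busy, machine_max_percent)
```

So 500% on a 16-core machine still leaves roughly 11 cores of headroom, which is consistent with nuria's PR-20 job being scheduled without trouble.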
[22:27:58] but that was also when I had put my jobs at the lowest priority [22:28:11] got it. [22:28:36] 6 hours back [22:28:44] so how about this? don't kill any of them, since we need results, I send an email to the internal list explaining the details, and we let them kill as needed? [22:29:08] are there processes that can be killed more easily? or will you be online over the weekend that people can reach you if a process needs to be killed? [22:29:17] ashwinpp, ^ [22:29:40] sure [22:29:54] But I would request that all jobs be killed or none [22:30:37] I see. Okay. will you be around over the weekend? [22:30:59] because then I would have incomplete data not knowing which jobs were killed and when, and there is no way to resume the job because I need to go sequentially over the entire dump again. [22:31:13] Yes, I can be around on IRC [22:31:43] okay. great. I'll send an email to set expectations. [22:31:49] nuria: we decided not to kill the jobs [22:32:09] there is no half-way killing, we should either kill all or none, nuria. I'll send an email to the internal list to explain this. [22:32:31] ashwinpp will be around over the weekend in case something pops up, nuria. I'll be available by phone/email, too. [22:33:01] ashwinpp: I'll take care of the internal email now. don't worry about it. [22:33:13] leila: why don't you communicate my phone as well as email address [22:33:28] sounds good, ashwinpp. that's better. [22:33:30] that way you need not be bothered [22:34:28] *thumbs up* [22:34:37] thanks for being so responsive, ashwinpp, and thank you leila for leading on this :) [22:34:51] np, Ironholds. sending the email to the internal list now. [22:38:40] *thumbs up* [23:19:04] Analytics-EventLogging: Send graphite metrics for Schema. 
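(Editor's aside: the all-or-nothing constraint above comes from processing the dump strictly sequentially with no saved position. A hypothetical mitigation for future runs, where the file name and structure are illustrative and not Ashwin's actual setup, is to checkpoint an offset so a killed job can resume instead of rereading the entire dump:)

```python
import json
import os

STATE_FILE = "progress.json"  # hypothetical checkpoint location

def load_offset():
    """Return the last saved position in the dump, or 0 on a fresh start."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["offset"]
    return 0

def save_offset(offset):
    """Record how far the sequential pass has gotten, atomically."""
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, STATE_FILE)  # rename is atomic on POSIX filesystems

start = load_offset()
# ... process the dump from `start`, calling save_offset() periodically ...
```

The write-to-temp-then-rename pattern means a kill at any moment leaves either the old or the new checkpoint intact, never a half-written one.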
as well - https://phabricator.wikimedia.org/T95780#1200088 (yuvipanda) NEW [23:25:24] Analytics-EventLogging, operations, Graphite: EL graphite data missing from 24/3 to 7/4 - https://phabricator.wikimedia.org/T95781#1200104 (yuvipanda) NEW [23:42:40] Analytics-EventLogging: Send graphite metrics for Schema. as well - https://phabricator.wikimedia.org/T95780#1200168 (yuvipanda) The general problem to be solved here is 'alert based on arbitrary EL criteria', I think. [23:55:31] Analytics-EventLogging: Send graphite metrics for Schema. as well - https://phabricator.wikimedia.org/T95780#1200229 (Nuria) >This would also allow easy alerts for very specific cases that might also interest non-ops/techie folks. >Use case in point is that @Deskana and @bearND would like to g... [23:58:01] Analytics-EventLogging, operations, Graphite: EL graphite data missing from 24/3 to 7/4 - https://phabricator.wikimedia.org/T95781#1200234 (Nuria) This was caused by the migration and we corrected the problem already. Please see: https://wikitech.wikimedia.org/wiki/Incident_documentation/20150406-Event... [23:58:37] Analytics-EventLogging, operations, Graphite: EL graphite data missing from 24/3 to 7/4 - https://phabricator.wikimedia.org/T95781#1200245 (Nuria) Closing ticket, let me know if something needs to happen here additionally. [23:58:45] Analytics-EventLogging, operations, Graphite: EL graphite data missing from 24/3 to 7/4 - https://phabricator.wikimedia.org/T95781#1200246 (Nuria) Open>Resolved