[00:24:41] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1652055 (Tgr) Now deployed everywher...
[01:22:26] !log rerun oozie load jobs for 2015-09-16T22:00Z: oozie job -rerun 0122641-150605005438095-oozie-oozi-B -date 2015-09-16T22:00Z
[02:37:48] ottomata, i forgot, how do i access beeline?
[02:37:54] or we don't have it?
[02:38:04] * yurik remembers some webui for hive
[02:38:40] * yurik also looked through wikitech, but nothing mentions beeline there... except the hive CLI refers to it
[02:55:35] never mind, found beeswax
[02:56:24] yurik: webui?
[02:56:27] you mean hue.wikimedia.org
[02:56:28] ?
[02:56:30] yep,
[02:56:31] beeline is a hive CLI
[02:56:37] that is newer and maybe better
[02:56:39] but i haven't used it much
[02:56:40] we do have it
[02:57:03] ah, i see - i was looking through the wikitech, couldn't figure out what beeline hive was refering
[02:57:11] gotcha, thx!
[03:05:49] ottomata, this query blows up in hive :((
[03:05:52] SELECT * FROM webrequest WHERE webrequest_source='maps' AND year=2015 and month=9 and content_type like 'text/html%' limit 50;
[03:06:46] Sep 18, 2015 3:06:17 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
[03:09:14] Analytics-Cluster: hive shows tons of errors on content_type LIKE ''..." - https://phabricator.wikimedia.org/T113014#1652252 (Yurik) NEW
[03:09:32] yurik: those are warnings, does that mean it blows up?
[03:09:44] ottomata, it floods the screen with them
[03:09:51] so technically no, not crash
[03:10:32] hm, looks weird, but maybe it will finish?
i wonder if the counter it is referring to is something for reporting
[03:10:36] stats about job
[03:11:03] yurik: , i must sleep, goodnight!
[03:11:12] ottomata, sleep is important!
[03:11:21] gnight :)
[03:11:24] (its 6am here)
[03:41:23] Analytics-Cluster: hive shows a flood of warnings on content_type LIKE ''..." - https://phabricator.wikimedia.org/T113014#1652293 (Yurik)
[09:33:14] Analytics-Tech-community-metrics, DevRel-September-2015: Provide open changeset snapshot data on Sep 22 and Sep 24 (for Gerrit Cleanup Day) - https://phabricator.wikimedia.org/T110947#1652702 (Dicortazar) Hi again, Some comments. I've thought about the metrics, and I'd say that we need a more proper def...
[09:51:37] Disconnecting for a few minutes
[10:10:49] hi a-team!
[10:55:21] Analytics-Backlog, Research consulting, Research-and-Data: Workshop to teach analysts, etc about Quarry, Hive, Wikimetrics and EL {flea} - https://phabricator.wikimedia.org/T105544#1652787 (Ironholds) Totally down to help out!
[11:06:20] milimetric, hey, what do {crow} and {lion} at the end of task names stand for?
[11:06:36] {frog} too ;)
[11:06:50] mforns, hi
[11:06:55] hi bmansurov
[11:06:57] :]
[11:07:06] animal names are codes for our projects
[11:07:16] i see
[11:07:20] so we can group tasks
[11:07:41] what's common between all crow tasks?
[11:07:53] you can see a list of the projects and their correspondent animal names in the first column of our kanban board
[11:08:03] https://phabricator.wikimedia.org/project/board/1030/
[11:08:19] i see, thanks
[11:08:24] :]
[11:26:27] Analytics-Dashiki, Need-volunteer: Improve Dashiki's HTML template - https://phabricator.wikimedia.org/T73983#1652817 (bmansurov) The project structure seems to have changed since the task was created. Is the task still valid?
[12:23:39] (PS1) Joal: Correct bug in refine [analytics/refinery] - https://gerrit.wikimedia.org/r/239360
[13:08:47] (CR) Ottomata: [C: 2 V: 2] Correct bug in refine [analytics/refinery] - https://gerrit.wikimedia.org/r/239360 (owner: Joal)
[13:15:47] Hey ottomata
[13:16:08] I have seen you have relaunched load for an hour yesterday
[13:16:12] Was there issues ?
[13:17:35] i jus tnoticed that that one hour was failed for all coords
[13:17:58] so I reran it, but it seems that it was failed just because there were a few duplicates then
[13:18:02] i'm looking at it now
[13:20:41] ok
[13:20:44] thx :)
[13:20:56] I wondered if there was something to re-run
[13:25:46] Also ottomata, I have noticed some action in aqs area, can you sumarize for me ?
[13:26:28] Lastly ottomata, there is some weird behavior since yesterday on the cluster: refine-mobile takes awfully longer
[13:26:33] I can't notice why
[13:37:51] Analytics-Kanban, hardware-requests, operations, Patch-For-Review: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1653138 (Ottomata)
[13:38:14] joal: i saw that too, but haven't looked into it
[13:38:32] aqs movement: patches avail, figuring out partition layout, akosiarios will edit networking soon, hopefully today?
[13:42:05] Analytics-General-or-Unknown, Community-Advocacy, Wikimedia-Extension-setup, Wikipedia-iOS-App-Product-Backlog: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1653161 (Elitre)
[13:45:26] ottomata: awesome on aqs: I could start working early next week, right ?
[13:48:24] hopefully, it depends on alex too :)
[13:52:54] yup
[13:54:26] Analytics-Wikistats: Percentage pageviews from Russia is too low in recent geographical breakdowns in Wikistats - https://phabricator.wikimedia.org/T109582#1653207 (ezachte)
[14:30:01] Analytics-Kanban, hardware-requests, operations, Patch-For-Review: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1653269 (Ottomata) Ok, I will have to manually partition these, partman is too dumb. Alex, proceed with VLAN changes! Then we can reinstall.
[14:52:57] Analytics-Dashiki, Need-volunteer: Improve Dashiki's HTML template - https://phabricator.wikimedia.org/T73983#1653315 (Nuria) I think this can be closed as dashiki already has flexible layouts.
[14:53:16] Analytics-Dashiki, Need-volunteer: Improve Dashiki's HTML template - https://phabricator.wikimedia.org/T73983#1653317 (Nuria) Open>Resolved
[14:55:48] Hey HaeB, I need to leave now, but I'll be back in a few hour
[14:55:55] I'll ping you when back
[15:06:04] joal: cool thanks
[15:14:58] Analytics-Kanban, Patch-For-Review: Bug: client IP is being hashed differently by the different parallel processors {stag} [8 pts] - https://phabricator.wikimedia.org/T112688#1653381 (Ottomata)
[15:34:19] (PS1) Christopher Johnson (WMDE): adds tryCatch to download function with not found warning returning a null adds br before legend to fix overlap with x-axis value labels [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/239396
[15:36:32] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds tryCatch to download function with not found warning returning a null adds br before legend to fix overlap with x-axis value labels [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/239396 (owner: Christopher Johnson (WMDE))
[15:43:23] Analytics-Kanban, hardware-requests, operations, Patch-For-Review: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1653490 (kevinator) a:Ottomata
[15:44:23] Analytics-Kanban, Patch-For-Review: Bug: client IP is being hashed differently by the different parallel processors {stag} [8 pts] - https://phabricator.wikimedia.org/T112688#1653494 (kevinator) a:Milimetric>Ottomata
[15:52:28] (PS2) Milimetric: Revert "Hack around bad Event Logging IP hash problem" [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/238965
[15:52:37] (CR) Milimetric: [C: 2 V: 2] Revert "Hack around bad Event Logging IP hash problem" [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/238965 (owner: Milimetric)
[16:10:44] ottomata: i'm trying to understnad how our webrequest data looks when its read from kafka, and in what format it gets stored in hadoop
[16:10:49] where should i look?
[16:11:19] madhuvishy: when read from kafka, it is just json
[16:11:25] okayy
[16:11:29] when stored in hadoop, it is json inside of snappy compressed sequence files
[16:11:54] http://wiki.apache.org/hadoop/SequenceFile
[16:11:58] hmmm, is it stringified json?
[16:12:05] https://github.com/wikimedia/analytics-camus/blob/wmf/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/common/SequenceFileRecordWriterProvider.java
[16:12:11] yes its just a json string
[16:13:12] okay
[16:17:40] madhuvishy: we don't have to write that though. if we can figure out how to write a binary format instead of json using gobblin
[16:17:42] that would be fine
[16:17:46] we can do some conversion during import
[16:17:52] maybe parquet right then?
[16:17:59] binary is supported
[16:18:05] that is something i wanted to do for camus for a hwile, but never had time to look into it
[16:18:12] aah
[16:18:22] does it know how to convert from json record -> parquet? probably not, since the json doesn't have a schema
[16:18:36] and its possible for each record to have different fields in kafka
[16:18:40] yeah it won't, but we can write a "converter"
[16:18:52] it'd be cool if it was automatic though.
[16:18:59] maybe it could just write different files based on json fields
[16:19:13] maybe it could collect all json fields, sort them, and hash them, and use that as a key for different file names
[16:19:13] hmmm
[16:19:16] iunnOoonOo
[16:19:22] ha ha
[16:19:55] but gobblin makes me wish we had avro though
[16:20:22] madhuvishy: i bet!
[16:20:23] i mean, avro could be ok.
[16:20:26] its like these linkedin people decided to support avro so much, that if you use something else you feel left out
[16:20:29] we hardly ever change webrequest format
[16:20:34] so we could define an avro schema
[16:20:37] and convert form json to avro
[16:20:57] hmmm, that would be a big change though
[16:21:00] why?
[16:21:23] i feel like we'd be switching out too many pieces at once
[16:21:32] ja probably, yeah lets do whatever is easiest
[16:21:40] wouldn't the refine jobs depend on that too
[16:21:49] yeah, but we'd just change hive table.
[16:21:56] its external, so we could have two on top of the same data
[16:22:01] hmmm
[16:22:02] old data would just have old partitions
[16:22:10] new data would have new partitions on the new table
[16:22:16] and we'd just change the refine job to query from a different table
[16:22:26] the underlying format is abstracted by the hive table
[16:22:29] refine don't care
[16:22:33] right
[16:23:18] hmmm
[16:45:37] oh Krinkle, did you know there is this EventConsumer class in utils.py?
[16:45:45] you can use that instead of get_reader, and it has a filter option
[16:45:52] you should be able to use it with Kafka
[16:45:56] (i just saw it for the first time)
[16:55:17] ottomata: I haven't even used get_reader yet, but I'll look it up next time.
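[Editor's note] The converter idea floated in the discussion above, collecting a record's JSON field names, sorting and hashing them, and using the hash as a key so records with different shapes land in different files, can be sketched in a few lines of Python. The function names and grouping scheme here are illustrative only, not anything Camus or Gobblin actually ships:

```python
import hashlib
import json
from collections import defaultdict

def schema_key(record):
    """Hash the sorted field names of a JSON record.

    Records that share the same set of fields get the same key,
    so they can all be routed to the same output file.
    """
    fields = ",".join(sorted(record.keys()))
    return hashlib.sha1(fields.encode("utf-8")).hexdigest()[:8]

def partition_by_schema(json_lines):
    """Group raw JSON strings into buckets keyed by field-name hash."""
    buckets = defaultdict(list)
    for line in json_lines:
        record = json.loads(line)
        buckets[schema_key(record)].append(line)
    return buckets
```

Two records with identical field sets hash to the same bucket regardless of field order, which is the property the "automatic" converter would need before it could pick a per-bucket schema.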
Thanks :)
[16:57:23] (PS1) Christopher Johnson (WMDE): adds edit delta and info box [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/239415
[16:59:01] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds edit delta and info box [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/239415 (owner: Christopher Johnson (WMDE))
[17:57:06] * joal is back !
[18:04:52] Hi HaeB :0
[18:04:56] I'm back :)
[18:22:21] Hi joal! :)
[18:23:02] joal: great - should be back here in about 25min
[18:30:58] :)
[18:31:19] Everybody seems away, so I'll grab diner !
[18:31:26] Will be back in minutes
[18:37:13] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1654216 (Tnegrin) Thanks Gergo!
[19:01:45] joal: re
[19:02:27] Hey
[19:02:49] so, regarding https://phabricator.wikimedia.org/T108925 : the bug per se is about understanding the discrepancies between pageviews05 and projectview_hourly
[19:03:03] right
[19:03:17] ...but of course the actual goal is to get historical data that's comparable with projectview_hourly
[19:03:44] Or if not comparable straight, where at least differences are clearly explained and understood
[19:03:46] do we know more now about the possible causes of the discrepancies?
[19:03:59] right
[19:04:14] pageviews05 id built using Ironholds sampled logs, right ?
[19:04:33] yup
[19:04:36] yes... i understand it's the same as cube 0.5
[19:04:55] Ok, unfortunately I am really noob in sampled logs cubes and so
[19:05:36] HaeB, indeed, albeit at finer granularity
[19:05:41] did you look into oliver's suggestion to look at the codebase, and perhaps rerunning it against a sample?
[19:05:54] But the definition of pageview used for projectview normally matches the one built by Oliver, except for some details, among which the projcts issue already noticed
[19:06:08] was that my suggestion? Because I don't remember where the code lives.
[19:06:23] Ironholds: i'm referring to https://phabricator.wikimedia.org/T108925#1567507
[19:06:50] HaeB: You imagine, if Ironholds doesn't know where the code is, that I didn't :)
[19:06:56] huh.
[19:07:44] joal: what other details apart from the projects choice?
[19:07:50] My first here would be to double check differences among projects, as you did before, for all projects, over a few days
[19:08:14] HaeB: Nothing that I know about except that the code itself is different
[19:08:28] And see if there are patterns in the differences
[19:08:29] among projects meaning to check which projects are present in each? or comparing pageviews for the same project?
[19:09:12] comparing the two sets by project (how many pageview for en.wikipedia in pageviews05 and project_hourly, and so for each project)
[19:09:43] The core thing for me beign able to help here is to have access to the data
[19:09:50] got it. so you mean drilling down further to see if we can isolate the source of the discrepancies better
[19:09:57] Cause I don't know how to access the pageviews05
[19:10:05] HaeB: Yes
[19:10:09] it's on stat1003
[19:10:16] do you have access to that server?
[19:10:19] I do
[19:10:53] How does the data live in there ? db ? fILES ?
[19:11:07] staging
[19:11:15] ?
[19:11:31] it's a table in the database called "staging", i mean
[19:11:37] right
[19:11:52] mysql DB I guess, on stat1003
[19:12:13] *nod*
[19:12:28] Do you have a particular user to access it, or do I need to have one set up?
[19:12:52] the general "research" user, IIRC
[19:13:28] Ok ... I guess it's password protected (would be better), and I don't want a pass to be published here :)
[19:13:53] joal, do you need a hand getting connected?
[19:14:02] hey halfak :)
[19:14:07] I do indeed :)
[19:14:13] There's a .my.cnf file that only people in the 'researcher' group have access to.
[19:14:16] It's in /etc/
[19:14:17] * halfak looks
[19:14:26] for me, the PW ist stored in the .my.research.cnf file in the home folder
[19:14:49] and yes, halfak helped me set that up too ;)
[19:15:14] You'll want to make a symlink because the password changes sometimes.
[19:15:18] (i thought that the Analytics team engineers do everything with root all the time ;)
[19:15:37] joal, see /etc/mysql/conf.d/research-client.cnf on stat1003
[19:15:46] HaeB: I do as few root as I can, I make too many mistaskes ;)
[19:15:56] "researchers" is the group
[19:16:39] joal: i know that it would be quite unprofessional to do everything as root ;) was just kidding
[19:16:43] halfak: so I should symlink that file into ~/.my.research.cnf ?
[19:17:09] HaeB : was kidding as well :) I don't have root :-P
[19:17:47] joal, if symlink "~/.my.cnf" Mysql client will find it by default
[19:17:53] awesome
[19:17:59] halfak: (symlink) huh, i guess i should do that too
[19:18:05] I have a couple of those, so I made a special file name
[19:19:21] k
[19:19:34] After symlink, mysql -u research ?
[19:19:52] yes
[19:19:55] Oh!
[19:20:03] you need -h analytics-store.eqiad.wmnet
[19:20:14] RIIIIGHT !
[19:20:21] So in fact it's not stat1003 :)
[19:20:26] I wondered :)
[19:20:27] That's right.
[19:20:36] The mysql server is accessible from stat1002 as well :)
[19:20:49] I'd have guessed so :)
[19:20:53] ok.....
[19:21:15] (don't know about that sneaky behind the scenes stuff ;)
[19:21:46] No prob HaeB :)
[19:21:47] (i know, the host name is right in the mysql command i enter habitually)
[19:22:17] Thx halfak, I am in :)
[19:22:22] \o/
[19:22:37] Then the DB name is staging ?
[19:22:59] depends on what DB you want.
[19:23:03] :D
[19:23:07] Maybe HaeB has something to show you?
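[Editor's note] The setup discussed above symlinks /etc/mysql/conf.d/research-client.cnf to ~/.my.cnf so the mysql client finds the research credentials by default. If you ever need those credentials from a script, one rough Python approach is to parse the option file with configparser. This is not how the mysql client itself parses option files (real .my.cnf files can contain bare flags that configparser rejects), and the section layout shown in the test is an assumption:

```python
import configparser

def read_mysql_defaults(path, section="client"):
    """Pull credentials out of a MySQL option file (.my.cnf style).

    Assumes the file is plain INI with a [client] section, e.g.
    /etc/mysql/conf.d/research-client.cnf. interpolation=None is
    needed because passwords may contain '%' characters that
    configparser would otherwise try to expand.
    """
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as f:
        parser.read_string(f.read())
    return dict(parser[section])
```

The returned dict (user, password, host, ...) can then be passed to whatever MySQL driver you use.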
[19:23:14] Yes indeed
[19:23:21] We are after pageviews05
[19:23:27] Probably 'staging' then
[19:23:29] yes, staging, as i said
[19:23:45] I don't have a staging DB available to me :(
[19:24:31] I can see 23DBs, none of them is staging
[19:24:59] Oh, I made a mistake !
[19:25:16] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1654371 (ellery) Is that really necessary? If so, this is an important use case you can use for pitch :). Anyw...
[19:25:33] HaeB: You see when I tell you too many mistakes ;)
[19:25:59] joal, did you do "analytics-slave"?
[19:26:08] no
[19:26:16] halfak: How do you specify that ?
[19:26:19] Oh. Cause that's a thing too
[19:26:29] analytics-slave = enwiki + some user DBs
[19:26:38] ok
[19:26:39] analytics-store = all the wikis
[19:26:45] ok makes sense
[19:26:52] it does? lol
[19:26:52] btw i pasted the queries that i used at e.g. https://phabricator.wikimedia.org/T108925#1563160
[19:27:04] ItSeems that by derfault I am on analytics-store
[19:27:22] halfak: --^
[19:27:29] HaeB: Ok, will have alook
[19:27:39] analytics-store is the place to be
[19:27:54] HaeB: I'll try to dig into the stuff now that I know where the other part of the data lives !
[19:28:05] ...and that was actually from the shell including the CNF file and host selection ;)
[19:28:08] BTW, joal, I'm running that process that would not complete without running out of memory in hadoop on stat3 right now with a ghetto set of mappers -- same code.
[19:28:18] Each mapper is using ~200MB of memory.
[19:28:23] halfak: I am sure you have even better places not yet unveilled ;)
[19:28:33] I've already churned through a substantial pat of data
[19:28:35] WTF hadoop!?
[19:29:14] halfak: hadoop streaming, I told you I am not as comfortable with that one :(
[19:29:30] But yeah ... WTF indeed
[19:30:14] Unix streaming is awesome though.
[19:30:22] ;)
[19:30:46] HaeB: Yes Isee that, sorry I did not pick the fact that I could do the same :)
[19:30:50] halfak: yup
[19:31:35] So HaeB : I'll go for project by project comparision in monday, see if I can make up some directions to look for
[19:32:11] I don't see any better idea for us to dig on
[19:32:28] HaeB: --^ If you have one, I'll gladly here about it !
[19:32:37] that would be cool
[19:32:45] i might, let's see
[19:32:59] I'll gladly EAR ! my goodness ... late it is
[19:33:25] and btw regarding my earlier question about the sampling method... can you confirm that it's done via a counter on the (varnish) servers?
[19:33:33] I can't
[19:33:41] Ironholds ?
[19:33:43] Z^
[19:33:50] I think ottomata or Ironholds know
[19:34:37] eh?
[19:34:46] hey ottomata
[19:34:54] --^ a few lines
[19:34:56] which sampling, the sampled-1000 webrequest files?
[19:35:27] ottomata: we were looking at oliver's earlier cube data... i guess they were derived from that?
[19:35:37] I dunno
[19:35:43] dunno nuthin about cube data
[19:35:58] ok, so what's the sampling method for these webrequest files?
[19:36:13] "pageviews05 id built using Ironholds sampled logs, right ?"
[19:36:18] ah
[19:36:20] 1/1000
[19:36:21] is it a counter on each cache that runs to 1000 ?
[19:36:22] that's it
[19:36:24] no
[19:36:26] its not per cache
[19:36:29] or is it a random number generator?
[19:36:31] its per the whole string
[19:36:34] i think its just every 1000
[19:36:38] *stream*
[19:36:40] or via a date etc hash?
[19:36:41] in the stream
[19:36:53] hm, well, that was how udp2log did it
[19:36:57] how does hive do it?
[19:36:57] hm
[19:37:04] ok every request was stored in one stream, and then we took every 1000th entry?
[19:37:07] TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand())
[19:37:22] That's what hive does
[19:37:36] cool! no idea what that does, would have to read docs!
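[Editor's note] The distinction being settled above is between deterministic sampling (a counter that fires every 1000th request, as udp2log did) and random sampling (what Hive's TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand()) does). A rough Python analogue of the latter, purely illustrative:

```python
import random

def sample_1_in_1000(rows, seed=None):
    """Rough analogue of Hive's TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand()).

    Each row is independently assigned a random bucket in 0..999 and
    kept only if it lands in bucket 0, i.e. each row has a 1/1000
    chance of being sampled. Unlike a per-stream counter, nothing
    guarantees exactly every 1000th row is taken.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.randrange(1000) == 0]
```

Over a million input rows you would expect roughly a thousand out, give or take the binomial noise, which is exactly why the choice of method matters for the error analysis that follows.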
[19:38:16] ottomata: it matters because of the differences (sample errors) to expect https://phabricator.wikimedia.org/T108925#1633106
[19:40:31] HaeB: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
[19:41:48] Analytics-Backlog: Investigate sample cube pageview_count vs unsampled log pageview count - https://phabricator.wikimedia.org/T108925#1654408 (Ottomata) >I've been told that sampling is implemented by a counter on the Varnish servers (correct?). Nope! Not correct. > If that's not true and the sampling is d...
[19:41:49] So it's pseudo-random
[19:41:51] so we took a sampled snapshot once via TABLESAMPLE?
[19:42:20] i somehow thought these sampled DBs were updated continuously
[19:42:43] Every hour, a snapshot is extracted: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest/legacy_tsvs/generate_sampled-1000_tsv.hql
[19:42:45] ah, ottomata says it's done hourly
[19:42:48] joal: milimetric should we submit a pull request for the endpoints?
[19:43:23] ok, perhaps i misunderstood the person who told me this, but i quite clearly recall them mentioning a counter ;)
[19:43:25] madhuvishy: I think that's what milimetric suggested, but I have not taken the time to review the code
[19:43:27] anyway, thanks for clearing that up
[19:43:36] np HaeB :)
[19:43:53] joal: alright, let me know when you have :)
[19:43:55] I'll dig into projects on monday, and see
[19:44:19] if it's done hourly, that also means that the sampling errors will be larger
[19:44:55] i estimated that this effect can't explain the discrepancies, but i was looking at the daily numbers as one sample
[19:45:05] will check again for hourly
[19:45:19] cool HaeB
[19:45:42] madhuvishy: if you think we should submit a pull request as-is, please do :)
[19:45:55] I trust your code and dan's review madhuvishy :)
[19:46:01] naw you should review it too :)
[19:46:18] Ok, I will that then :)
[19:46:24] But it'll be on monday !
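[Editor's note] HaeB's point above, that hourly counts carry larger sampling errors than daily ones, follows from the binomial standard error of a 1-in-1000 random sample: the fewer underlying views, the noisier the scaled-up estimate. A quick sanity check, with entirely made-up traffic volumes:

```python
import math

def relative_stderr(total_views, rate=1 / 1000):
    """Relative standard error of a 1-in-1000 random sample.

    The sampled count is ~ Binomial(total_views, rate), so
    stddev / mean = sqrt((1 - rate) / (total_views * rate)).
    """
    return math.sqrt((1 - rate) / (total_views * rate))

# Illustrative only: a project with 10M views/day.
daily_err = relative_stderr(10_000_000)        # ~1% on the daily count
hourly_err = relative_stderr(10_000_000 / 24)  # ~4.9% on one hourly count
```

So per-hour comparisons between pageviews05 and projectview_hourly will show several times the relative noise of per-day comparisons, before any real discrepancy enters the picture.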
[19:46:30] ya no problem
[19:46:48] mrf, I hte being the bottleneck
[19:46:54] :D
[19:47:15] HaeB: We catch up on monday on the ticket with (hopefully), some newS ?
[19:47:32] ottomata: any idea on mobile refine ?
[19:48:26] joal: haven't looked
[19:48:28] still bad?
[19:49:07] hm, its a little slow, but catching up it looks like
[19:49:09] Analytics-Backlog: Investigate sample cube pageview_count vs unsampled log pageview count - https://phabricator.wikimedia.org/T108925#1654414 (Tbayer) >>! In T108925#1654408, @Ottomata wrote: ... > Currently, Hive is sampling across all webrequest via: > > TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand()) > > T...
[19:50:14] ottomata: looking at time it takes to run queries: seems that the bot regexp augmentation costs a lot :S
[19:50:32] I'll monitor this weekend, and we'll discuss that with nuria on monday
[19:50:58] madhuvishy: we still don't have agreement on that phab thread :/
[19:51:12] and that impacts the code quite a bit
[19:51:12] milimetric: aah
[19:51:19] hmmm
[19:51:24] hm, milimetric thanks !
[19:51:37] i mean, if they agree with our approach then the code is good as is
[19:51:49] I thought that had been discussed
[19:52:13] Thanks for watching us even if sick milimetric :)
[19:52:14] hmm, interseting joal
[19:52:25] interseting ?
[19:52:40] right, inteREsting :)
[19:52:45] hm, I'll ping gabriel in a bit
[19:52:57] okay thanks milimetric!
[19:53:01] milimetric: Don't work if you're not well, it cam wait
[19:53:37] joal: it's ok, I'm working today
[19:53:57] ok
[19:54:53] I hope your weekend will be better milimetric !
[19:56:27] Has anyone noticed the drop in traffic reported in Vital Signs?
[19:56:28] https://vital-signs.wmflabs.org/#projects=ruwiki,itwiki,dewiki,frwiki,enwiki,eswiki,jawiki/metrics=Pageviews
[19:56:38] It must be related to the incident yesterday
[19:56:50] :) thx
[19:56:55] Hadn't noticed that kevinator
[19:57:03] IIRC there was no data loss...
so it it just a case of scripts running before the data was available?
[19:57:08] Ok, more backfilling !
[19:57:17] milimetric: wanna add an annotation?
[19:57:18] milimetric: sure
[19:58:02] plus the fact that jobs are slow, that doesn't look good :(
[19:58:25] joal: is this data we'll eventually recover and overwrite in the repository?
[19:58:28] or is it lost forever?
[19:58:49] milimetric: We need to double check, but I think it'll be back after backfilling
[19:59:35] kevinator: annotation added
[19:59:52] :reloads
[20:00:19] cool, thanks
[20:01:06] ottomata: how shall we go here ? I suggest rollback the bot regexp, ensure better execution time, then backfill
[20:03:54] uhm, what's happening?
[20:04:14] jobs were run too soon and we need to rerun? or mobile is just a little behind?
[20:09:29] ottomata: jobs ran too sone
[20:09:37] ottomata: Or at least I suspect
[20:09:54] There is a huge drop in pageviews
[20:10:00] https://vital-signs.wmflabs.org/#projects=ruwiki,itwiki,dewiki,frwiki,enwiki,eswiki,jawiki/metrics=Pageviews
[20:10:36] So that means backfilling, ottomata, but if jobs takes much more longer, it's not fun
[20:11:14] I can launch backfilling now and monitor it through the weekend, I am just afraid of cluster contention due to long jobs
[20:11:50] joal: i think dont' worry about cluster contention, i think yarn will sort it out.
[20:11:58] ok ottomata
[20:12:02] I trust you ;)
[20:12:08] maybe launch the backfilling jobs in the priority queue
[20:12:10] instead of production?
[20:12:22] good idea ottomata
[20:12:41] I'll start backfilling load now
[20:12:44] i mean,i could be wrong! :D
[20:12:52] but, it seems mobile is just lagging, it IS running
[20:12:53] When done, I'll go for refine etc
[20:12:57] ok
[20:14:16] By the way ottomata, have you checked the load you re-ran yesterday ?
[20:15:56] yes, i checked the hourly sequence stats
[20:15:58] just saw dups
[20:16:05] ok cool
[20:16:20] ottomata: I'll go for production queue --> rerun
[20:16:32] not possible to rerun in another queue :S
[20:17:12] I'll double check that things don't break tomorrow, and stop backfilling if needed
[20:17:49] ok
[20:17:55] joal: yeah, you can move the jobs after they have launched in yarn
[20:18:00] but it would be annoying.
[20:18:03] to do each one manually
[20:18:10] !log backfilling load jobs 2015-09-16
[20:18:42] 5 coordinators, +24 hours each, two by two --> I'll check that stuff don't break :)
[20:18:58] moving them will just be a pain
[20:25:32] Analytics-Wikimetrics: Labs instances rely on unpuppetized firewall setup to connect to databases - https://phabricator.wikimedia.org/T71042#1654558 (scfc)
[21:04:52] persistence calculations still hanging around 200-500MB of memory per mapper \o/
[21:07:28] At this rate, I'll have my data by Tuesday!
[21:13:29] halfak: sounds great !
[21:21:28] madhuvishy: wanna talk about da domains?!
[21:21:39] and make the pull request?
[21:22:00] milimetric: sure. batcave?
[21:22:02] yes
[21:49:03] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1654936 (Tgr) Some interesting obser...
[22:05:01] have a nice weekend a-team!
[22:11:25] Analytics-EventLogging, Beta-Cluster, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Tom Waits: Beta Cluster EventLogging data is disappearing? - https://phabricator.wikimedia.org/T112926#1655055 (greg) >>! In T112926#1650405, @Ottomata wrote: > Please use that ins...
[22:12:17] milimetric: someone: ^
[22:12:28] that last comment
[22:14:04] (we're looking greg-g )
[22:14:14] milimetric: thankya
[22:14:20] marxarelli: beat ya to it
[22:14:41] greg-g: cool.
had to add the channel :)
[22:14:56] :) all set marxarelli or still need to figure it out?
[22:14:58] yah, I'm in too many
[22:15:24] milimetric: still need help. deployment-eventlogging02 is out of disk space
[22:15:45] marxarelli: ok, so we're not using that machine at all anymore, we're using .*03 instead
[22:16:14] neat. so it's cool if i stop services and delete logs?
[22:16:17] so we can delete that log? I think I have sudo there if you'd rather I delete
[22:16:20] yeah, you can stop and delete
[22:16:26] sounds good
[22:16:35] thx, ping us if you need anything
[22:16:43] will do. thanks!
[22:30:07] Analytics-Kanban, RESTBase-API: create RESTBase endpoints [21 pts] {slug} - https://phabricator.wikimedia.org/T107053#1655107 (Milimetric) for our future selves, for the front-end configuration of this, some examples passing the domain: https://github.com/wikimedia/restbase/blob/master/config.example.wiki...
[23:07:18] Analytics-Kanban: Put toghether an updated documentation on EventLogging {tick} {oryx} [8 pts] - https://phabricator.wikimedia.org/T112124#1655183 (kevinator) Open>Resolved
[23:09:15] Analytics-Kanban, Reading-Admin, Patch-For-Review, Wikipedia-Android-App: Update definition of page view and implementation for mobile apps {hawk} [8 pts] - https://phabricator.wikimedia.org/T109383#1655185 (kevinator) Open>Resolved
[23:09:49] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Update UA parser for better spider traffic classification {hawk} [8 pts] - https://phabricator.wikimedia.org/T106134#1655186 (kevinator) Open>Resolved
[23:13:00] i wanted to run a memory heavy calculation (~40G?). is there somewhere i can do that, and someway i can not interfere with others?
[23:13:20] Analytics-Kanban, Patch-For-Review: Change the agent_type UDF to have three possible outputs: spider, bot, user {hawk} [13 pts] - https://phabricator.wikimedia.org/T108598#1655193 (kevinator) Open>Resolved
[23:13:41] Analytics-Kanban: Provide the Wikimedia DE folks with Hive access/training {flea} [8 pts] - https://phabricator.wikimedia.org/T106042#1655194 (kevinator) Open>Resolved
[23:13:42] 40G ish of memory. its a calculation on a 410M edge graph that runs in memory.
[23:13:58] Analytics-Cluster, Analytics-Kanban: pagecounts-raw files missing since yesterday (15th September) 18.00 [3 pts] - https://phabricator.wikimedia.org/T112741#1655195 (kevinator) Open>Resolved