[00:00:26] ottomata: after all, my fault for taking days to review
[00:01:01] hard to review during meetups
[00:01:08] i was just going through my queue and saw that, looked simple enough
[00:01:18] np
[00:01:43] i'm peacin out too, going bouldering with those guys, laters!
[01:25:10] Funny. 217% of es.wiktionary entries were created by bots. :) https://stats.wikimedia.org/wiktionary/EN/BotActivityMatrixCreates.htm
[01:26:38] Looks like the bot creation count considers all pages, while the total is about countable pages
[01:30:44] https://phabricator.wikimedia.org/T87723
[03:10:21] MediaWiki-extensions-MultimediaViewer, Analytics, Multimedia: Create dashboard showing file namespace page views and MediaViewer views - https://phabricator.wikimedia.org/T78189#997477 (Tgr) >>! In T78189#996257, @Gilles wrote: > There's been no visible movement on Sentry (maybe you're blocked there, though?)...
[07:04:11] Analytics-Wikistats: Discrepancies in historical total active editor numbers - https://phabricator.wikimedia.org/T87738#997849 (Tbayer) NEW
[07:04:54] Analytics-Wikistats: Discrepancies in historical total active editor numbers - https://phabricator.wikimedia.org/T87738#997857 (Tbayer)
[14:03:51] (CR) Ananthrk: [WIP] UDF to get country code from IP address UDF to determine client IP address given values from source IP address and XFF headers Added I (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/183551 (owner: Ananthrk)
[14:34:54] (PS7) Ananthrk: [WIP] UDF to get country code from IP address UDF to determine client IP address given values from source IP address and XFF headers Added IntelliJ related files to .gitignore Split existing Geo UDF into two - GeoCodedCountryUDF and GeoCodedDataUDF Both U [analytics/refinery/source] - https://gerrit.wikimedia.org/r/183551
[16:49:24] holaaaaaa
[16:50:44] (CR) Nikerabbit: [C: 2] Dashboard: Add php file to generate sql [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/186557 (owner: Jsahleen)
[16:50:51] (Merged) jenkins-bot: Dashboard: Add php file to generate sql [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/186557 (owner: Jsahleen)
[16:57:38] (PS3) Nuria: Reformating archive file to be zero padded [analytics/refinery] - https://gerrit.wikimedia.org/r/185913
[16:58:02] (CR) Nuria: "Thanks for the catch. Corrected report name now." [analytics/refinery] - https://gerrit.wikimedia.org/r/185913 (owner: Nuria)
[17:03:35] (CR) Nuria: [WIP] UDF to get country code from IP address UDF to determine client IP address given values from source IP address and XFF headers Added I (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/183551 (owner: Ananthrk)
[18:32:49] ops-core, Analytics, operations: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#998750 (Ottomata) FYI, the pagecounts-raw files found at dumps.wikimedia.org/other/pagecounts-raw/ use the nginx logs. We are now recreating this data using Hive via varnishkafka. Christian and I...
[18:34:38] Analytics, operations: Hadoop logs on logstash are being really spammy - https://phabricator.wikimedia.org/T87206#998753 (Ottomata) FYI, I restarted most hadoop daemons yesterday. There might be a few that I didn't, but the volume of logs should be much less now.
[18:35:24] ops-core, Analytics, operations: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#998754 (faidon) Are they? So are these just counting the X% of requests that come via HTTPS, where X is < 5 probably (and also a biased sample, as this is predominantly editors)?
[18:41:38] ops-core, Analytics, operations: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#998771 (Ottomata) So, in udp2log, the https and the duplicated proxied http request both exist. That means that any given https request will have 2 entries in the logs. The webstatscollector code ch...
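Ottomata's comment above describes why the udp2log stream can double-count: an HTTPS request shows up once as the nginx HTTPS entry and once as its proxied HTTP duplicate. A minimal Python sketch of how such pairs could be collapsed; the record layout (dicts with a `scheme` and an `x_forwarded_proto` marker on the proxied duplicate) is an assumption for illustration, not the actual udp2log line format or webstatscollector's logic.

```python
# Illustrative only: avoid counting an HTTPS request twice when both the
# HTTPS entry and its proxied HTTP duplicate appear in the same stream.
# The field names here are assumptions for the sketch.

def count_requests(records):
    """Count requests, treating an HTTPS entry and its proxied HTTP
    duplicate as a single request."""
    https_total = 0
    http_total = 0
    for rec in records:
        if rec.get("x_forwarded_proto") == "https":
            # Proxied duplicate of an HTTPS request: skip it, the request
            # is already counted via its HTTPS entry.
            continue
        if rec["scheme"] == "https":
            https_total += 1
        else:
            http_total += 1
    return https_total, http_total

if __name__ == "__main__":
    sample = [
        {"scheme": "https", "x_forwarded_proto": None, "url": "/wiki/Foo"},
        {"scheme": "http", "x_forwarded_proto": "https", "url": "/wiki/Foo"},  # duplicate
        {"scheme": "http", "x_forwarded_proto": None, "url": "/wiki/Bar"},
    ]
    print(count_requests(sample))  # -> (1, 1)
```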
[18:42:42] nuria: is labs down? I'm trying to get to http://wdq.wmflabs.org/api_documentation.html but I can't
[18:46:06] leila: i would send a note to labs mailing list, i do not think things are working that great yet
[18:46:41] leila: list is Wikimedia Labs
[18:46:43] thanks, nuria. trying to send some api queries to wikidata for mobile QR.
[18:48:45] springle: you still in PST?
[18:50:34] nuria: yes
[18:51:01] springle: do you have a few minutes to talk about your comments regarding the labs db hosts?
[18:51:13] certainly
[18:51:16] springle: this one: https://gerrit.wikimedia.org/r/#/c/186356/
[18:51:48] springle: ok, the part i am not sure about is whether your comments were intended for the labs infrastructure or just us and wikimetrics
[18:52:22] leila: wdq is down, I'm bringing it back up
[18:52:35] nuria: hey! I see https://metrics.wmflabs.org/ is back up
[18:52:38] thanks YuviPanda. :-)
[18:53:09] nuria: for you. ideally for everyone in labs too, but that could be like herding cats with laser pointers
[18:53:36] YuviPanda: right but we have to change the config for the db, springle had some comments here: https://gerrit.wikimedia.org/r/#/c/186356/
[18:53:37] leila: wdq should be back up
[18:54:21] springle: so what is the right config setting then, $db_host_mediawiki = '{0}.labsdb' no longer seems to work per YuviPanda's e-mail
[18:54:37] springle: is there anything preventing us from just setting these on DNS?
[18:54:41] YuviPanda: it's working, thanks!
[18:54:57] springle: I asked nuria to use the direct names since otherwise managing /etc/hosts is a pain
[18:55:04] nuria: simply, each labsdb has all wikis, however if we keep traffic split up using the old splits s1 = labsdb1001, s[245] on labsdb1002, s[367] on labsdb1003, we will see much better cache characteristics
[18:55:58] YuviPanda: i have no problem with direct names, only with favoring specific labsdbs instead of spreading load around in some predictable fashion
[18:56:04] springle, YuviPanda seems like we need to coordinate here a bit cause using "s1" directly no longer works
[18:56:43] springle: nuria I think using labsdb directly is ok for now, and let's have s{1,7}.labsdb set up in DNS and then have people switch?
[18:57:25] YuviPanda: let me make sure it is ok for now, as yesterday could not check that, but if that works for springle it certainly works for us
[18:58:01] nuria: simple question: if you are querying enwiki, will it use labsdb1001? if commonswiki, will it use labsdb1002? etc
[18:58:22] springle: we query ALL wikis
[18:58:28] springle: with the same config
[18:58:47] nuria: targeting a single labsdb host for all wikis?
[18:59:12] springle: well it was "s1" prior which was fine right?
[18:59:14] qchris: YOOO
[18:59:17] springle: that's what quarry does as well :)
[18:59:40] then it's a bad plan and maybe contributing to the uneven load seen on labsdbs recently
[18:59:56] springle: it's been like that for more than 1 year though
[19:00:21] nuria: impossible. the three labsdb hosts have not had all shards each for one year yet
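springle's point above is that load and cache locality stay predictable if clients keep the old shard splits (s1 on labsdb1001, s2/s4/s5 on labsdb1002, s3/s6/s7 on labsdb1003) instead of sending every wiki to one host. A minimal sketch of that host selection; the shard-to-host split is taken from the conversation, while the wiki-to-shard entries and the helper itself are only illustrative.

```python
# Shard-aware host selection, as springle describes it above.
# The wiki-to-shard entries are illustrative, not an authoritative list.

SHARD_TO_HOST = {
    "s1": "labsdb1001",
    "s2": "labsdb1002", "s4": "labsdb1002", "s5": "labsdb1002",
    "s3": "labsdb1003", "s6": "labsdb1003", "s7": "labsdb1003",
}

WIKI_TO_SHARD = {
    "enwiki": "s1",       # per springle's example
    "commonswiki": "s2",  # per springle's example
    # ... the remaining wikis would come from the canonical shard lists
}

def db_host_for(wiki, use_dns_aliases=False):
    """Pick the labsdb host (or its sN.labsdb DNS alias) for a given wiki."""
    shard = WIKI_TO_SHARD[wiki]
    if use_dns_aliases:
        return "{0}.labsdb".format(shard)
    return SHARD_TO_HOST[shard]

print(db_host_for("enwiki"))                             # labsdb1001
print(db_host_for("commonswiki", use_dns_aliases=True))  # s2.labsdb
```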
[19:00:24] springle: we have done no recent changes in that regard, but let me make sure
[19:02:01] ottomata: Heya.
[19:03:45] springle: I lied,
[19:04:26] springle: setup has been like this: https://github.com/wikimedia/operations-puppet/blob/211dc6c728681503d2e88c6f795e9e924af16049/manifests/role/wikimetrics.pp#L121
[19:04:51] nuria: we'll just use DNS for s1.labsdb, s2.labsdb, etc. if you try to use those names for directing wikimetrics load correctly, we will be ok
[19:05:32] springle, YuviPanda: but with those names in place we were getting connection errors as of last week
[19:05:57] nuria: i don't really know much about that code ^... i'm just complaining based on vaguely seeing hardcoded labsdb names :)
[19:06:50] springle: the s1, s2 were no longer working as of jan 21st, YuviPanda, can you confirm?
[19:07:31] heheh, qchris, i was just getting bugged by faidon about the nginx udp2log stuff. i want to make getting new pagecounts-raw out prio #1 :)
[19:07:40] no, because I have no idea how they were set up in the first place :)
[19:07:45] ottomata: it is.
[19:07:47] so. which parts still need work, and how can I help? should I work on the rsync stuff?
[19:07:49] DNAT / /etc/hosts has been flaky
[19:07:55] so I have no problem believing they haven't been working
[19:08:10] ottomata: I wrangled with the cluster the whole morning/afternoon to get the deployments + backfilling in place.
[19:08:14] YuviPanda: do we need to wait for Coren?
[19:08:23] :)
[19:08:27] ottomata: I am not puppetizing the rsyncing.
[19:08:34] ?
[19:08:59] Shouldn't I? Do you want to do it?
[19:09:11] i am happy to!
[19:09:22] i just want to help wherever i can right now :)
[19:09:27] * qchris hands over the item to ottomata :-)
[19:09:33] ok cool.
[19:09:37] springle: then - given YuviPanda's comments - we have no way to access s1/s2 anymore
[19:10:06] ottomata: The source files are in /mnt/hdfs/wmf/data/archive/pagecounts-raw
[19:10:27] It is still backfilling from the beginning of the year
[19:10:33] nuria: are connections directly to labsdb100x names working for you currently?
[19:10:43] aye ok.
[19:11:02] qchris, do you think we should use the new pagecounts-raw files starting from this year?
[19:11:09] replace the existing ones with those?
[19:11:31] I am a bit torn.
[19:11:44] springle: as far as i can see, yes
[19:11:48] On the one hand, the webstatscollector's files are already public.
[19:11:58] On the other, the Hive data is typically better.
[19:12:06] springle: functionally, yes; now i do not know "load"-wise
[19:12:13] Also ... the Hive data allows us to better cover for yesterday's issues.
[19:12:23] There goes ottomata :-)
[19:12:29] And ottomata is back
[19:12:34] whoops
[19:12:39] Decide :-P
[19:12:45] haha
[19:12:53] nuria: then for now, use labsdb1001 (that's 1, not 2) directly. that is most similar to your old setup, and will preserve labsdb1002 which is usually highest load
[19:13:06] i do not care! it will probably be easier if I can just rsync what is there without having to make exceptions
[19:13:08] so I say let's replace
[19:13:12] and stick a readme in there or something
[19:13:18] nuria: we'll revisit this once YuviPanda has a chance to sort everything out in labs
[19:13:23] springle: ok. Code change on gerrit on the way
[19:13:34] nuria: thanks!
[19:13:42] ottomata: Readmes won't properly make it to the target directory.
[19:13:51] ottomata: Also, no one will read them.
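Before the straight "rsync and replace" ottomata proposes above, the existing webstatscollector files would be copied aside (the backup step comes up again later in the conversation). A minimal sketch of that snapshot step; the source path /mnt/hdfs/wmf/data/archive/pagecounts-raw is from the log, but the public and backup directories here are only guesses for illustration.

```python
# Snapshot the current 2015 pagecounts-raw files before the Hive-generated
# ones overwrite them. Both directory paths below are assumptions.

import shutil
from pathlib import Path

PUBLIC_DIR = Path("/data/pagecounts-raw/2015")                  # assumed target dir
BACKUP_DIR = Path("/srv/backups/pagecounts-raw-2015-pre-hive")  # assumed backup dir

def backup_existing(public_dir=PUBLIC_DIR, backup_dir=BACKUP_DIR):
    """Copy every current file aside, preserving the directory layout."""
    for f in sorted(public_dir.rglob("*")):
        if f.is_file():
            target = backup_dir / f.relative_to(public_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)

if __name__ == "__main__":
    backup_existing()
```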
[19:14:44] ha
[19:14:48] well, we can add it to the main index file
[19:14:57] it is just text in puppet :)
[19:15:09] Is it already in puppet?
[19:15:12] qchris, could you write that part up, since you know the differences better than me
[19:15:13] yes
[19:15:35] https://github.com/wikimedia/operations-puppet/blob/production/modules/dataset/files/pagecounts/generate-pagecount-main-index.sh
[19:15:41] Ah. Cool.
[19:16:02] Ok. I can write something up.
[19:16:11] About the rsync and backfilling...
[19:16:15] kinda dumb though, why isn't this just a php file or something? instead of having puppet generate it
[19:16:16] backfilling is not fully done.
[19:16:17] meh :/
[19:16:18] aye
[19:16:24] that's fine, let's just get prepared :)
[19:16:32] Ah. k.
[19:16:47] actually, that should be fine, if my rsync is going to work to just replace (i'll make backups of current stuff), then it will just rsync as backfill happens
[19:17:20] But overwriting parts of the files from 2015 is ... not so straightforward from a user point of view.
[19:17:33] s/user/consumer of the files/
[19:17:48] Anyways. Fine by me.
[19:17:52] nuria: oh er, i'll +2...
[19:18:03] ?
[19:18:11] parts of the files?
[19:19:02] ottomata: Like files from 2015-01-01 until 2015-01-11 are hive, 2015-01-12 until 2015-01-26 are webstatscollector, 2015-01-27 onwards are hive.
[19:19:26] Anyways ... Fine by me :-)
[19:21:23] hm, naw rsync will eventually replace all of them
[19:21:34] if the files exist in hdfs-archive with the same name as what is on dumps now
[19:21:36] right?
[19:22:24] Basically ... yes.
[19:22:35] There might be outliers, if the collector takes too long.
[19:22:43] Let me check if that is the case in 2015-01
[19:23:01] pagecounts-20150106-120001.gz,
[19:23:16] pagecounts-20150118-050001.gz,
[19:23:35] projectcounts-20150106-120001,
[19:23:52] projectcounts-20150118-050001,
[19:24:21] ottomata: ^ are the only files that are in webstatscollector's output, but not in Hive's.
[19:24:23] springle: is there a difference in connection pool size when connecting to the host itself versus connecting via s1/s2?
[19:25:14] nuria: not that I know of
[19:26:11] OH
[19:26:12] i see
[19:26:24] qchris: once we start rsync, we can remove those?
[19:26:53] After you backed them up, and once Hive has produced their counterparts: Yes.
[19:28:01] aye k
[19:38:51] (PS1) Gilles: Display performance stats over a long timespan [analytics/multimedia] - https://gerrit.wikimedia.org/r/187161
[19:39:10] (CR) Gilles: [C: 2 V: 2] Display performance stats over a long timespan [analytics/multimedia] - https://gerrit.wikimedia.org/r/187161 (owner: Gilles)
[20:03:52] (PS1) Gilles: Improve performance for date selection [analytics/multimedia] - https://gerrit.wikimedia.org/r/187170
[20:04:00] qchris: it works! https://gerrit.wikimedia.org/r/#/c/187168/2
[20:04:04] haven't run it in real dirs yet
[20:04:05] (CR) Gilles: [C: 2 V: 2] Improve performance for date selection [analytics/multimedia] - https://gerrit.wikimedia.org/r/187170 (owner: Gilles)
[20:04:07] but in test dirs works gooood
[20:04:31] Hey :-)
[20:04:45] You dared to remove the ssh thing. Awesome!
[20:04:51] :)
[20:04:57] That looked sooooo arcane.
[20:04:58] i'll remove the key manually
[20:05:45] I guess the /*/*/ does the flattening.
[20:05:49] hup!
[20:05:50] yup
[20:05:55] That's a nice solution!
[20:06:51] Oh you even wrote a note about it.
[20:07:01] * qchris should learn to read docs before reading the code.
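The "/*/*/" qchris mentions above is the glob in ottomata's rsync change that picks up every file two directory levels below the archive root, regardless of the year/month layout. A rough Python analogue of that idea, not the actual puppetized rsync; the source layout and the staging target are assumptions for the sketch.

```python
# Rough analogue of the /*/*/ pattern: collect files from the nested
# year/month layout and copy the ones the target is missing or that differ.
# The destination directory is an assumption for the sketch.

import glob
import os
import shutil

SRC_ROOT = "/mnt/hdfs/wmf/data/archive/pagecounts-raw"
DEST_DIR = "/srv/pagecounts-raw-staging"  # assumed staging target

def sync_flattened(src_root=SRC_ROOT, dest_dir=DEST_DIR):
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(src_root, "*", "*", "*"))):
        if not os.path.isfile(path):
            continue
        dest = os.path.join(dest_dir, os.path.basename(path))
        # rsync-like behaviour: only copy when missing or the size differs
        if not os.path.exists(dest) or os.path.getsize(dest) != os.path.getsize(path):
            shutil.copy2(path, dest)
            copied.append(dest)
    return copied

if __name__ == "__main__":
    for dest in sync_flattened():
        print("copied", dest)
```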
[20:11:58] qchris, i think it works without Bug:
[20:11:58] no?
[20:12:15] i thought it did
[20:12:34] Without the bug, Gerrit shows you a link to phabricator.
[20:12:55] i thought phab found it without bug too
[20:13:01] maybe not
[20:13:06] But with "Bug:" (and no empty line between the "Bug:" and Change id), a comment on the phab task is made
[20:13:17] that links back to the gerrit change.
[20:14:30] no empty line!
[20:14:31] ok
[20:14:45] so tricky
[20:15:51] ottomata: Since in the phab task you say "this week" ... do you intend to deploy this week too?
[20:15:57] let's do it!
[20:16:09] this commit you mean?
[20:16:16] Yes.
[20:16:16] ja, want to send email to public list asap
[20:16:35] if you write up the differences in that index file, i'll take that info and write the email
[20:17:00] So I double checked ... and the differences are negligible.
[20:17:10] The difference in monitoring requests does not come into play.
[20:17:22] so we should not mention them?
[20:17:37] And the remaining difference is that there is less packetloss on Hive, so the numbers will go up a bit.
[20:17:52] ok cool. i'll write an email with a bit of history and mention the old way and the new way and that
[20:18:02] and that we are backfilling from jan 1 2015
[20:18:05] ja?
[20:18:10] k.
[20:18:35] So basically ... it sounds like I need to do nothing else but babysit the backfilling?
[20:18:52] (Since the changes are not worth mentioning in the index file)
[20:18:52] sounds like it! :) and make a change to the index text
[20:19:02] you should change it, it says stuff about domas and webstatscollector
[20:19:04] i think
[20:19:10] Ok. Will do.
[20:19:43] I'll also mention the pagecounts-all-sites dataset in this email too.
[20:20:24] kevinator: I'm going to work off Erik's original template
[20:20:29] cleaner
[20:20:58] ok… that's where I started, so I should be able to cut/paste into whatever the final doc is.
[20:26:16] ottomata: still there ...?
[20:26:19] yup
[20:26:57] ottomata: can you log into hafnium and see whether the EL graphite job that should report to statsd is running? (i do not have permits)
[20:27:15] cron?
[20:28:14] 'no consumer like'
[20:28:28] "/usr/bin/python /usr/local/bin/eventlogging-consumer @/etc/eventlogging.d/consumers/graphite"
[20:28:35] ^ ottomata
[20:28:43] looks like it is running
[20:28:50] /usr/bin/python -OO /usr/local/bin/eventlogging-consumer @/etc/eventlogging.d/consumers/graphite
[20:29:00] and hafnium is able to connect to vanadium?
[20:29:12] as it consumes from there
[20:29:31] connect how?
[20:30:35] like there should be a connection open from hafnium to vanadium now visible via netstat i think
[20:30:38] hm, i don't see much traffic flowing
[20:31:06] i do see connections open
[20:32:06] want me to bump it?
[20:38:30] ottomata: wait bump it how?
[20:38:37] ottomata: ah restart you mean
[20:38:38] ?
[20:39:33] yes
[20:39:35] but i am not sure how
[20:43:02] ottomata: see
[20:43:20] https://www.irccloud.com/pastebin/QvhTU267
[20:43:29] on https://wikitech.wikimedia.org/wiki/EventLogging#Operational_Issues
[20:44:31] nuria, k, it's running, still no traffic
[20:44:37] it looks like it just isn't consuming anything from vanadium
[20:45:27] ottomata: then you know more than me.....could it be some kind of firewall issue?
[20:46:13] i know more than you!?
[20:46:20] NOOOOO that is not what I want to happen!
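The two manual checks above boil down to "is the consumer process running on hafnium?" and "is there a TCP connection towards vanadium?". A small sketch of those checks; the host and process names are taken from the conversation, but the exact commands are just one way to do it, not the tooling that was actually used.

```python
# Quick health checks for the EventLogging graphite consumer, as discussed
# above. Assumes pgrep and ss are available on the host.

import subprocess

def consumer_running(pattern="eventlogging-consumer .*consumers/graphite"):
    """True if a process matching the consumer command line exists."""
    result = subprocess.run(["pgrep", "-f", pattern],
                            capture_output=True, text=True)
    return result.returncode == 0

def connected_to(peer="vanadium"):
    """True if any TCP connection mentions the given peer name."""
    result = subprocess.run(["ss", "-t", "-r"], capture_output=True, text=True)
    return peer in result.stdout

if __name__ == "__main__":
    print("consumer running:", consumer_running())
    print("connection to vanadium:", connected_to())
```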
[20:46:43] these are my notes-to-us from when this last happened: https://wikitech.wikimedia.org/wiki/EventLogging#Graphite
[20:47:16] ottomata: I know, it is like when apache says "talk to your administrator" and you ARE the administrator... it's like ahem ...
[20:49:14] ottomata: but something is going on, cause counts overall are coming in but counts per schema are missing:
[20:49:14] http://graphite.wikimedia.org/render/?width=588&height=311&_salt=1422478095.007&from=00%3A00_20150101&until=23%3A59_20150128&target=eventlogging.schema.NavigationTiming.rate&target=eventlogging.schema.PageContentSaveComplete.rate&target=eventlogging.client_side_events.valid.rate
[20:51:49] nuria: i can use zsub and get traffic
[20:51:53] and i see the traffic in tcpdump
[20:52:01] but, traffic is not being consumed by the graphite consumer.
[20:52:08] i am going to lunch! back later
[20:54:58] ottomata: k
[20:55:03] nuria: The service is flapping on hafnium
[20:55:09] The logs say "eventlogging-consumer: error: [Errno 2] No such file or directory: '/etc/eventlogging.d/consumer/graphite'"
[20:55:19] And there is indeed no such file on hafnium.
[21:13:26] https://graphite.wikimedia.org/render/?width=1031&height=532&_salt=1422479579.187&from=-24minutes&target=eventlogging.schema.NavigationTiming.rate
[21:13:32] nuria, ottomata: fixed.
[21:14:02] It seems the service has been restarted with a typo in the config name:
[21:14:05] qchris: by you? by the re-start? by the magic of the internets?
[21:14:27] https://wikitech.wikimedia.org/w/index.php?title=EventLogging&diff=142235&oldid=142233
[21:14:43] nuria: Yes, by me.
[21:15:07] Since you did not react to my ping above, and I just wanted the thing fixed :-)
[21:15:15] qchris: ah, did you do a re-start with the ahem .... right command?
[21:15:33] Meh. Just a typo in the command.
[21:15:41] qchris: sorry, i did not see it!
[21:15:45] I updated your notes in the wiki
[21:15:50] No worries.
[21:16:01] qchris: ok, you do have ssh permits to hafnium then?
[21:16:07] Yup.
[21:16:22] qchris: ok, will ask for those. Thanks for correcting docs.
[21:16:26] yw.
[21:16:48] qchris: it is nice to know that things work in the way we think they work.
[21:17:01] ;-)
[21:17:04] Sure.
[21:20:27] nuria: Are you sure you do not have access to hafnium?
[21:20:38] You are listed in the eventlogging-admins group
[21:20:49] qchris: last time i tried I did not ... oh, look at that
[21:20:52] And that group has ssh permissions on hafnium.
[21:21:33] qchris: what is the "whole" name of the host?
[21:21:52] hafnium.wikimedia.org
[21:21:58] That's my ssh config:
[21:22:03] Host hafnium.wmf
[21:22:03] HostName hafnium.wikimedia.org
[21:22:03] ProxyCommand ssh -a -W %h:%p bast1001.wikimedia.org
[21:22:15] With ^ "ssh hafnium.wmf" gets me into hafnium.
[21:23:15] qchris: ok, i think it's my ssh config, let me fix that
[21:26:53] qchris: indeed i have permits OOHHHHH
[21:26:59] \o/
[21:26:59] kind of scares me though
[21:27:19] but for this it is handy, that way i do not have to bother otto
[21:27:24] (PS4) Ottomata: Reformating archive file to be zero padded [analytics/refinery] - https://gerrit.wikimedia.org/r/185913 (owner: Nuria)
[21:27:25] Well, you wrote the wikitech doc that mentions a command on hafnium, so I thought you should have access ;-)
[21:27:32] (CR) Ottomata: [C: 2 V: 2] Reformating archive file to be zero padded [analytics/refinery] - https://gerrit.wikimedia.org/r/185913 (owner: Nuria)
[21:27:55] yes, nuria is the eventlogging-admin!
[21:27:57] ask your administrator!
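As qchris found above, the outage came down to a one-character typo in the restart command: the service was pointed at /etc/eventlogging.d/consumer/graphite instead of /etc/eventlogging.d/consumers/graphite. A tiny guard like the following, run before (re)starting the consumer, would have caught it; the guard itself is only a sketch, not part of EventLogging.

```python
# Sanity-check the consumer config path before restarting, to catch the
# consumer/ vs. consumers/ typo described above.

import os
import sys

EXPECTED_CONFIG = "/etc/eventlogging.d/consumers/graphite"

def check_config(path=EXPECTED_CONFIG):
    if not os.path.isfile(path):
        sys.exit("consumer config not found: %s (typo in the path?)" % path)
    return path

if __name__ == "__main__":
    config = check_config()
    print("ok, would restart with: eventlogging-consumer @%s" % config)
```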
[21:27:58] :)
[21:28:12] Hahaha
[21:28:59] ottomata: see, like the apache situation jaja, answer is "ahem .. i will get back to you on that"
[21:30:09] mforns: https://wikitech.wikimedia.org/wiki/Requesting_shell_access
[21:31:18] ok, qchris, so todo for me:
[21:31:32] 1. backup existing 2015 pagecounts-raw.
[21:31:32] 2. merge my change
[21:31:33] 3. send email
[21:31:33] ?
[21:31:43] look right?
[21:31:53] Yup.
[21:31:59] okeydoeky
[21:32:18] But being nice to the community ... I guess one can send it to the community right away?
[21:32:30] ja
[21:32:31] ottomata, so todo for me:
[21:32:31] true
[21:32:34] doing so now
[21:32:39] 1. babysit backfilling.
[21:32:44] Anything else?
[21:32:48] naw, think that's it!
[21:33:00] once we have backfilled and verified that all is working
[21:33:06] i will decommission webstatscollector
[21:33:12] but that can wait til we are ready
[21:33:17] k.
[21:36:04] ottomata: Could you have a look at https://gerrit.wikimedia.org/r/#/c/187188 ?
[21:36:57] (It is soo tempting to just go in and look. I should no longer be allowed)
[21:37:54] (CR) Ottomata: [WIP] UDF to get country code from IP address UDF to determine client IP address given values from source IP address and XFF headers Added I (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/183551 (owner: Ananthrk)
[21:38:07] dunnn dunn dunnn
[21:40:26] omg
[21:40:33] do we need something for ananth to do?
[21:40:56] ottomata: Thanks.
[21:40:58] (PS1) OliverKeyes: Include requests with 304 status codes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/187227
[21:41:00] ha, tnegrin, it wasn't merged yet, i'm talking to nuria about this now.
[21:41:11] but i think he needs to move out the IpUtil stuff to a different change, so we can review that separately
[21:41:14] but ja, maybe so?
[21:41:20] ok -- let's talk to kevin!
[21:41:24] since now we are mostly slowed by back and forth reviews
[21:42:00] tnegrin: oh, the dunn dun dunn was about qchris
[21:42:04] removing his eventlogging admins access
[21:42:09] :(
[21:42:45] ananth has been working on my dashboard so I think he's got some cycles
[21:43:34] https://www.youtube.com/watch?v=bW7Op86ox9g
[21:43:45] (CR) OliverKeyes: "Also: hey ma, look! No spaces/tabs conflicts! :D" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/187227 (owner: OliverKeyes)
[21:44:12] hahah
[21:44:12] https://www.youtube.com/watch?v=jHjFxJVeCQs
[21:44:14] that one is better
[21:44:15] hahaha
[21:44:52] Hahahaha!
[21:45:07] Thaaaat kind of "dunn dunn dunn" :-)
[21:45:28] ananthrk: RUTHERE?
[21:46:39] (CR) Ottomata: [C: 2 V: 2] Include requests with 304 status codes [analytics/refinery/source] - https://gerrit.wikimedia.org/r/187227 (owner: OliverKeyes)
[21:48:05] I just dug up a list of UDFs we made a while back: http://etherpad.wikimedia.org/p/analytics-hive-udf
[21:48:24] (PS13) Ottomata: Generalised class of UDFs for handling the webrequests table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/181939 (owner: OliverKeyes)
[21:49:27] (CR) Ottomata: [C: 2 V: 2] Generalised class of UDFs for handling the webrequests table [analytics/refinery/source] - https://gerrit.wikimedia.org/r/181939 (owner: OliverKeyes)
[21:53:14] kevinator: are you sure oliver is working on this? https://phabricator.wikimedia.org/T78805
[21:53:17] (CR) OliverKeyes: first draft of host parsing udf. extracts the project and project qualifier from the uri_host. Added mobile_qualifier option. Changed tabs t (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/185377 (owner: Ewulczyn)
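The change under review above is a host-parsing UDF that extracts the project and project qualifier (plus an optional mobile qualifier) from a uri_host. An illustrative Python analogue of that idea follows; the real UDF lives in refinery-source and is written in Java, and its actual rules are certainly more complete than this simplified sketch.

```python
# Illustrative only: split a uri_host into project, project qualifier and
# mobile qualifier. Not the refinery-source Java implementation.

def parse_uri_host(uri_host):
    """Split e.g. 'en.m.wikipedia.org' into its qualifier/project parts."""
    parts = uri_host.lower().split(".")
    result = {"project": None, "project_qualifier": None, "mobile_qualifier": None}
    if len(parts) < 2:
        return result
    # Drop the TLD, e.g. 'org'; the piece before it is the project family.
    result["project"] = parts[-2]          # e.g. 'wikipedia', 'wiktionary'
    middle = parts[:-2]                    # e.g. ['en', 'm']
    if middle and middle[-1] in ("m", "zero", "mobile"):
        result["mobile_qualifier"] = middle.pop()
    if middle:
        result["project_qualifier"] = ".".join(middle)  # usually the language
    return result

print(parse_uri_host("en.m.wikipedia.org"))
# {'project': 'wikipedia', 'project_qualifier': 'en', 'mobile_qualifier': 'm'}
print(parse_uri_host("commons.wikimedia.org"))
# {'project': 'wikimedia', 'project_qualifier': 'commons', 'mobile_qualifier': None}
```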
[21:55:09] nuria: yes… ottomata just reviewed the code
[21:57:23] (CR) Ottomata: "Ananth, this looks great! Once you move the Iputil stuff to a separate change ( and modify this commit message appropriately), we can mer" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/183551 (owner: Ananthrk)
[22:13:14] Analytics-Cluster: Getting Ananth started - https://phabricator.wikimedia.org/T77196#999585 (Ironholds)
[22:13:15] Analytics-Cluster: Hive User calls UDF to extract fields out of X-Analytics header - https://phabricator.wikimedia.org/T78805#999584 (Ironholds) Open>Resolved
[22:14:02] Tool-Labs, Analytics-Engineering: Copy paid MaxMind geolocation dbs to tool labs - https://phabricator.wikimedia.org/T87151#999591 (yuvipanda) Open>stalled
[22:15:55] Analytics-Engineering: Include a UDF for Wikimedia-specific parsers. - https://phabricator.wikimedia.org/T78686#999607 (Ironholds) Open>Resolved
[22:16:00] Analytics-Engineering: WebStatsCollector's pageviews definition should have a UDF - https://phabricator.wikimedia.org/T78779#999608 (Ironholds) Open>Resolved
[22:18:44] qchris_away: new data is copying over now!
[22:40:44] (CR) Ottomata: "Oliver, when you do, could you put this method into your Webrequests class?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/185377 (owner: Ewulczyn)
[22:43:00] ori: you around?
[22:43:11] ottomata: in quarterly review mtng
[22:43:13] what's up?
[22:43:35] i think everyone else has asked you this already, but i don't really understand from what they say. wanted to ask you what's up with x analytics extension
[22:43:41] but, do your meeting, i'm in the office!
[22:43:44] i'll find you after?
[22:44:56] (PS1) Gilles: Generate dashboard for Chinese Wikipedia [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/187260
[22:45:14] (CR) Gilles: [C: 2] Generate dashboard for Chinese Wikipedia [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/187260 (owner: Gilles)
[22:45:35] (CR) Gilles: [V: 2] Generate dashboard for Chinese Wikipedia [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/187260 (owner: Gilles)
[22:47:00] (CR) OliverKeyes: "Thass the plan!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/185377 (owner: Ewulczyn)
[22:48:11] (PS1) Gilles: Generate stats for Chinese Wikipedia [analytics/multimedia] - https://gerrit.wikimedia.org/r/187263
[22:48:16] ottomata: yes, cool, thanks
[22:48:28] (CR) Gilles: [C: 2 V: 2] Generate stats for Chinese Wikipedia [analytics/multimedia] - https://gerrit.wikimedia.org/r/187263 (owner: Gilles)
[22:49:24] Analytics-Cluster: Write Success Flags for hive_webrequest.* Oozie Coordinator when partition is created - https://phabricator.wikimedia.org/T86616#999726 (Ottomata) Open>Resolved
[23:00:30] operations, ops-core, Analytics: Deprecate HTTPS udp2log stream? - https://phabricator.wikimedia.org/T86656#999773 (Ottomata) Update: the Hive generated pagecounts-raw data is now being copied every hour from HDFS to dumps.wikimedia.org. The data is still being backfilled in hadoop. Once all backfill jobs...
[23:14:35] ottomata: Awesome!
[23:14:40] That was quick.
[23:14:51] New md5sums match too.
[23:14:53] Great!
[23:15:31] we!
[23:15:33] wee!
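qchris verifies above that the md5sums of the freshly copied pagecounts files match the source. A minimal way to do that comparison in bulk; the example directory paths at the bottom are assumptions, not the real rsync target layout.

```python
# Compare md5 checksums of files that exist in both a source and a
# destination directory, as a bulk version of the manual check above.

import hashlib
from pathlib import Path

def md5sum(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_dirs(src_dir, dest_dir):
    """Return the names of files whose md5 differs between the two dirs."""
    mismatches = []
    for src in sorted(Path(src_dir).glob("*")):
        if not src.is_file():
            continue
        dest = Path(dest_dir) / src.name
        if dest.is_file() and md5sum(src) != md5sum(dest):
            mismatches.append(src.name)
    return mismatches

# e.g. (paths assumed for illustration):
# compare_dirs("/mnt/hdfs/wmf/data/archive/pagecounts-raw/2015/2015-01",
#              "/srv/dumps/pagecounts-raw/2015/2015-01")
```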
[23:15:41] nuria: i think something is weird with the mobile apps jobs
[23:15:42] job*
[23:15:50] it seems to be launching a workflow for every hour
[23:21:42] something is not deployed properly. we fixed this.
[23:22:08] OH we didn't?
[23:22:25] wait i'm confused... ignore me
[23:25:23] ottomata: back...
[23:25:23] we didn't fix it?
[23:25:42] hm, nope
[23:25:43] we didn't
[23:25:43] https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mobile_apps/daily_uniques/coordinator.xml#L3
[23:25:45] fixing.
[23:25:46] ottomata: wait .. maybe i missed some test code there
[23:25:58] that is my fault!
[23:26:15] ottomata: can fix that in 10 secs
[23:26:19] ottomata: wait
[23:26:39] ok waiting
[23:27:32] nuria, while you are at it, can you remove irrelevant comments?
[23:27:36]
[23:27:39]
[23:27:45]
[23:27:49] will do
[23:27:57] that last one, ^, mention both datasets or none, or them in general
[23:27:58] oh
[23:28:02] yeah, change it to refined
[23:29:50] (PS1) Nuria: Frequency of mobile jobs should be daily. [analytics/refinery] - https://gerrit.wikimedia.org/r/187273
[23:30:00] ottomata: https://gerrit.wikimedia.org/r/187273
[23:30:17] (PS2) Nuria: Frequency of mobile jobs should be daily. [analytics/refinery] - https://gerrit.wikimedia.org/r/187273
[23:31:55] Analytics-EventLogging: Do analysis for SendBeaconReliability experiment [5 pts] - https://phabricator.wikimedia.org/T78110#999879 (Nuria) Open>Resolved Analysis completed: http://www.mediawiki.org/wiki/Extension:EventLogging/SendBeacon
[23:31:57] Analytics-EventLogging: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#999881 (Nuria)
[23:32:28] ottomata: changing
[23:34:31] nuria, the comment about the datasets definition is wrong
[23:34:34] could you add that too?
[23:34:49] ottomata: just did that
[23:35:10] don't see it
[23:35:13] Please see datasets definition, the webrequest_mobile
[23:35:13] datasets is a processed dataset from the raw data.
[23:35:31] maybe say:
[23:35:45] This uses the refined mobile and text webrequests datasets
[23:35:49] or something
[23:35:50] (PS3) Nuria: Frequency of mobile jobs should be daily. [analytics/refinery] - https://gerrit.wikimedia.org/r/187273
[23:36:03] cool
[23:36:05] ees good
[23:36:06] danke
[23:36:13] (PS4) Ottomata: Frequency of mobile jobs should be daily. [analytics/refinery] - https://gerrit.wikimedia.org/r/187273 (owner: Nuria)
[23:36:15] jaja
[23:36:19] (CR) Ottomata: [C: 2 V: 2] Frequency of mobile jobs should be daily. [analytics/refinery] - https://gerrit.wikimedia.org/r/187273 (owner: Nuria)
[23:36:35] sorry about the frequency, but so you know, i changed it to test my changes a bunch
[23:39:35] aye makes sense
[23:46:13] ottomata: will let changes bake a bit and look at them tomorrow to see how data looks, lemme know of any issues
[23:46:53] k cool
[23:47:01] we should compare the file for today the 28th
[23:47:07] vs the others we already generated
[23:47:13] i think that they were being overwritten each hour?
[23:47:19] yes,
[23:47:32] the setting was making it so
[23:47:33] also, fyi, i renamed them manually
[23:47:40] so they had the padded 0
[23:47:43] so that is why.
[23:47:54] let's wait til the 28th data comes out and compare
[23:47:57] and we can delete the old ones
[23:48:01] and if you want I can launch a backfill job for them
[23:48:23] ah ok, i was going to delete them but ya, let's wait until the new data and compare
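The plan agreed above is to wait for the properly daily-scheduled job to produce the 2015-01-28 file, compare it against the old (hourly-overwritten) one, and only then delete the old output. A tiny helper for that comparison; the file names in the usage comment are hypothetical.

```python
# Compare an old and a new report file before deleting the old one,
# following the plan discussed above.

def diff_report(old_path, new_path):
    """Return lines that appear in only one of the two report files."""
    with open(old_path) as f:
        old_lines = set(f)
    with open(new_path) as f:
        new_lines = set(f)
    return {
        "only_in_old": sorted(old_lines - new_lines),
        "only_in_new": sorted(new_lines - old_lines),
    }

# e.g. (file names hypothetical):
# diff_report("mobile_apps_uniques_old/2015-01-28.txt",
#             "mobile_apps_uniques_new/2015-01-28.txt")
```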