[00:05:21] ottomata, if you're still around; Rachel bought the thing! [00:05:38] oh yeah [00:05:39] can do. [00:06:24] danke schoen! [00:06:33] and then I will write some API wrapper functions and then we will be cooking with bacon! [00:06:46] (I don't know why people say cooking with fire when it's bacon that goes with everything.) [00:07:30] AHHhhhh was it purchased under my email address?! [00:07:37] a new maxmind account? [00:07:38] ! [00:07:56] Ah FOO [00:07:57] it was [00:11:18] Ironholds: i can't puppetize this, but I can download it and put it on stat1003 (or 1002?) for you if you like [00:11:33] there are 5 different formats [00:11:33] why not puppetise? Will that impact syncing? [00:11:42] because it was purchased on the wrong account [00:11:45] i need to sort that out first [00:11:47] oh balls [00:11:53] it looks like I have purchased it! [00:11:53] I would really appreciate it on stat2; I can at least write and test the functions! [00:12:05] ok, which db do you want? there are 5 different formats [00:12:06] then as soon as it is sorted we have it available and accessible [00:12:14] what formats? I think we normally go for the binaries. [00:12:29] http://cl.ly/image/2M013o1C332r [00:12:49] hmm [00:12:51] * Ironholds investigates [00:14:37] ottomata, binary with cellular? Although if you're just sending them across to one machine maybe also just "binary database" to be on the safe side? [00:14:52] you can throw them in my /home/ directory if you want *shrugs* [00:15:02] k [00:16:31] Ironholds: you got it [00:17:00] ottomata, ta! [00:17:04] okay, I'll make a quick check... [00:21:02] * Ironholds blinks [00:21:05] okay, this is really weird. [00:21:15] ottomata, do you remember which filename was which? [00:21:38] wait, worked it out [00:21:47] okay, 177 is the right one, but it's apparently IPv6 only(?) 
[00:22:34] ...or not [00:22:43] If I put IPv4 in, "invalid, expecting IPv6" [00:22:49] IPv6 goes in, "invalid, expecting IPv4" [00:23:52] can you grab me 171 too? Just so I can check I'm not mad. [00:24:38] Ironholds: i got you 177 and GeoIP2-Connection-Type [00:24:51] yup [00:25:09] the latter is an mmdb, so not supported. 177 and (I think, by extension, 171) are .dats, which are. [00:25:29] ok, Ironholds 171 there [00:25:32] ta! [00:25:33] okay, lessee... [00:26:52] okay! [00:27:00] Ironholds: you got 172 and 178 too, just in case [00:27:02] the csvs [00:27:04] 171 works for IPv4, and 177 doesn't seem to work at all(?) [00:27:04] NOW YOU HAVE EVERYTHING [00:27:10] unless 177 is cellular-only? [00:27:15] I'll investigate more [00:27:16] iunno? [00:27:16] thankee! [00:27:17] ;) [00:27:18] :) [00:27:26] * Ironholds directs his phone to geoiplookup.wikimedia.org [00:32:52] yurikR: I think I can vchat hangout tomorrow [00:33:08] it is really hard to keep up with emails and remote things while in the office! [00:33:39] okay, 171 works, 177 doesn't, so I'm gonna make an upstream "WTF" bug report. [00:57:33] mforns: http://www.mediawiki.org/wiki/Gerrit/Tutorial [09:58:22] (CR) QChris: [C: -2] "Waiting a bit to allow community to veto" [analytics/metrics] - https://gerrit.wikimedia.org/r/165395 (https://bugzilla.wikimedia.org/66352) (owner: QChris) [11:20:45] Analytics / General/Unknown: By counting HTTP redirects, webstatscollector reporting too high numbers - https://bugzilla.wikimedia.org/71790 (christian) NEW p:Unprio s:normal a:None One of the longstanding issues with Webstatscollector is that it counts redirects at the HTTP level. So for... [11:47:58] Analytics / General/Unknown: By counting HTTP redirects, webstatscollector reporting too high numbers - https://bugzilla.wikimedia.org/71790#c1 (Nemo) (In reply to christian from comment #0) > Since we're about to deploy a new webstatscollector anyways, and this > double counting should not be too har... 
[14:38:30] Analytics / General/Unknown: By counting HTTP redirects, webstatscollector reporting too high numbers - https://bugzilla.wikimedia.org/71790#c2 (christian) (In reply to Nemo from comment #1) > I'll miss > stats for Special:MyLanguage, [...] Yup. I'll miss stats for Special:Random :-( > Are we talking... [14:43:13] Analytics / General/Unknown: By counting HTTP redirects, webstatscollector reporting too high numbers - https://bugzilla.wikimedia.org/71790#c3 (Yuvi Panda) I'm sure we can count special page requests separately if we want them... [15:12:14] Analytics / General/Unknown: By counting HTTP redirects, webstatscollector reporting too high numbers - https://bugzilla.wikimedia.org/71790#c4 (christian) Oh. Counting of Special pages won't change per se. It's only those Special pages that happen to come with 301, 302, or 303 HTTP status codes. So fo... [16:26:07] (PS2) Ottomata: [WIP] Import base geocoding logic from Kraken repository and create Hive UDF to get geocoded country [analytics/refinery/source] (otto-geo) - https://gerrit.wikimedia.org/r/164264 [16:27:43] (PS2) Ottomata: THIS REPOSITORY IS DEPRECATED [analytics/kraken/deploy] - https://gerrit.wikimedia.org/r/165110 (owner: QChris) [16:27:47] (CR) Ottomata: [C: 2] THIS REPOSITORY IS DEPRECATED [analytics/kraken/deploy] - https://gerrit.wikimedia.org/r/165110 (owner: QChris) [16:27:52] (CR) Ottomata: [V: 2] THIS REPOSITORY IS DEPRECATED [analytics/kraken/deploy] - https://gerrit.wikimedia.org/r/165110 (owner: QChris) [16:29:12] (CR) Ottomata: [C: 2] Stop counting requests for 'undefined' and 'Undefined' [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165350 (https://bugzilla.wikimedia.org/66352) (owner: QChris) [16:52:49] (CR) Ottomata: [C: 2] Stop considering requests for 'undefined' and 'Undefined' [analytics/refinery] - https://gerrit.wikimedia.org/r/165378 (https://bugzilla.wikimedia.org/66352) (owner: QChris) [17:03:28] YOOOO qchris! 
[17:03:34] i miss seeing your beard every day! [17:03:34] YOOOO ottomata! [17:03:57] Wait a few months ... I guess then I can send it to you by snailmail :-D [17:04:01] hahahah [17:04:24] I just saw your otto-geo commit. [17:04:25] \o/ [17:04:42] I just added you as reviewer to this: https://gerrit.wikimedia.org/r/#/c/164264/ buuuut, do not worry about it [17:04:45] it is low priority! [17:04:53] i cleaned it up a bit on the airplane [17:04:58] it still needs some work I think [17:05:06] probably, some renaming, meh, dunno [17:05:12] what the crap, .DS_Store! [17:05:17] Removal of .DS_store! [17:05:20] right :-D [17:05:58] I saw that you have both IPv4 and IPv6 logic ... some MaxMind guy told me at some point that IPv6 should be able to handle IPv4 [17:06:02] But I never tested that. [17:06:04] (PS3) Ottomata: [WIP] Import base geocoding logic from Kraken repository and create Hive UDF to get geocoded country [analytics/refinery/source] (otto-geo) - https://gerrit.wikimedia.org/r/164264 [17:06:05] hm [17:06:08] i didn't test either [17:06:19] Ok. I guess I'll test then. [17:06:29] haha, ok, no hurry [17:06:33] no hurry or worry [17:06:38] I am about to finish the webstatscollector hacking. [17:06:43] cool [17:06:45] There was not even -O2! [17:06:54] ? [17:07:02] gcc ... -O2 [17:07:14] Optimization of executable. [17:07:16] ah [17:07:21] but not even? [17:07:22] It speeds up things by ~35% [17:07:25] OH! [17:07:26] i see [17:07:30] there wasn't that before [17:07:35] Right. [17:07:37] cool! [17:08:01] I guess you'll get a chance to review that soon. [17:08:14] Are you in some of those meetings today? [17:08:20] I miss you too. [17:08:36] * qchris hugs ottomata and everyone in this channel :-D [17:09:13] some, i have 20 mins before SoS [17:09:20] uh, wait which patch? [17:09:53] patch? I was talking about meetings. [17:10:14] Oh you mean the -O2 ... that's not yet in gerrit. 
[17:10:16] no, review [17:10:17] yes [17:10:18] oh ok [17:10:39] Need to reshuffle commits a bit. [17:10:41] qchris: I don't think I even knew that https://github.com/wikimedia/analytics-metrics exists [17:11:08] And I've been waiting for reviews since December ... [17:11:11] Hahahaha. [17:11:30] drdee did most of that. That is pretty amazing and nice :-) [17:11:31] qchris: I think you should just self merge those :p [17:11:45] they are kinda like writing a wiki page :) [17:12:14] And although they are alike ... still they need to be linked by hand ... as in [17:12:19] qchris: do you remember the names of the fairscheduler queues that we came up with when we last talked? [17:12:30] about that. [17:12:31] ? [17:12:38] default, standard? [17:12:41] default, production? [17:12:51] https://wikitech.wikimedia.org/wiki/Analytics/Webstatscollector#Used_Page_View_definition [17:12:59] aye yeah [17:12:59] Names ... pheewwwww. [17:13:07] Let me check the logs. [17:13:30] haha [17:13:37] how do you grep IRC logs from long ago? [17:13:40] if you don't know the date? [17:13:49] hexchat stores them on disk for me. [17:13:52] ah ok [17:13:53] cool [17:13:59] adium does so for me too [17:14:00] HMM [17:14:04] wonder where those files are.. [17:14:05] hmm [17:14:09] fair-scheduler matches for 2014-07-21, 2014-08-14, and 2014-08-21. [17:14:24] ~/.hexchat/logs/freenode/#wikimedia-analytics.log :-) [17:14:31] s/hexchat/adium/ [17:14:42] ha [17:15:36] ottomata, poketh [17:16:07] so, re netspeeds; it's 171 we're gonna want synced, although I'm talking to the API maintainers about how we'd go about building in 177 support (well, talking and patching their exception handlers. Grrr.) [17:16:56] cool [17:17:02] found it, 2014-08-13 :) [17:17:22] hm, ok! [17:17:53] no, 8-21 [17:18:14] essential! [17:18:16] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140821.txt [17:18:21] Oh ... I am too late. [17:18:25] :-) [17:18:25] haha [17:18:26] :) [17:18:34] so, default, essential? 
[17:18:40] good for you? [17:19:00] essential is good for me. [17:19:03] cool. [17:24:43] (PS1) Ottomata: Rename FairScheduler 'adhoc' queue to 'default' [analytics/refinery] - https://gerrit.wikimedia.org/r/165513 [17:25:12] qchris: I'm pushing this now because I want to restart some oozie jobs while i'm here with gage, to show him how to do it [17:25:22] k [17:27:07] Btw ... if you resubmit them anyways ... what do you think about using the refinery directory with the timestamp, instead of "current"? [17:27:45] (CR) QChris: [C: 2 V: 2] Rename FairScheduler 'adhoc' queue to 'default' [analytics/refinery] - https://gerrit.wikimedia.org/r/165513 (owner: Ottomata) [17:29:36] qchris: ? [17:29:53] Let me grab the bug ... [17:31:04] Cannot find the bug right now. Anyways ... when submitting jobs with "current" [17:31:44] and in a followup deployment changing a subworkflow, one would have to redeploy all oozie jobs that use this subworkflow [17:32:25] Otherwise, the oozie jobs might break, because they use the old properties (from submission) and the new xmls. [17:32:55] Here it is: https://bugzilla.wikimedia.org/show_bug.cgi?id=71213 [17:33:45] Meh ... I guess one cannot understand what I mean with the description above ... [17:33:48] Let me try again. [17:34:10] Say, we have two bundles. bundleA and bundleB. [17:34:28] Both of them use some subworkflow ... say workflowX. [17:34:59] Let's assume we submit bundleA and bundleB with "current" refinery. [17:36:09] Then at some later point, we modify the parameters of workflowX. [17:36:33] And update the bundle.xml for bundleA and bundleB in the git repo. [17:37:18] After the deployment script ran, Oozie tries to run bundleA and bundleB with the properties from back then (because it stored them separately), [17:38:16] but refinery "current" now no longer matches those old properties. Because after the deployment script ran, bundleA.xml and bundleB.xml are the new, updated xmls. 
[17:38:42] So Oozie's properties and the bundleA.xml and bundleB.xml are from different refinery versions. [17:39:09] Meh ... bundleB was unnecessary in the example. [17:39:36] When doing the same thing with the refinery directory that has the timestamp, jobs just continue to run after a deployment. [17:39:55] And still ... we devs can develop using the "current". [17:40:31] Does that make sense? [17:46:29] So it doesn't ... let's keep using "current" then, and see if it bites us again. [18:14:23] ah, i get it. yeah, hm. would be nice if we could store the current version in a single place somehow...or make that part of the deployment script. [20:29:09] On the labs cluster, I use [20:29:15] VERSION=$(hdfs dfs -ls -d hdfs:///wmf/refinery/2014* | tail -n 1 | sed -e 's@^.*hdfs://.*refinery/@@') [20:29:22] in a script to submit jobs. [20:29:56] That's bad code ... but still ... we could have a script that does something like that and helps with submission of oozie things. [20:33:10] hmmmm [20:33:12] i like that qchris [20:33:25] a job submit helper wrapper thing would be nice in general [20:33:33] the oozie cli is kinda cumbersome [20:34:02] Totally :-) I have custom wrappers for basically all oozie things I do :-/ [20:37:44] qchris clean it up, put it in bin/! [20:38:40] :-) [21:02:58] qchris_away: can INSERT OVERWRITE TABLE overwrite just individual partitions? [21:13:48] ottomata: INSERT OVERWRITE TABLE typically works on a single partition. [21:13:54] But I guess I do not understand the question [21:14:24] ok cool, that was my question [21:14:36] then you are likely correct! i may be able to do that instead [21:14:44] Cool. [21:14:44] or [21:14:45] hm [21:14:46] actually [21:14:49] no, i might not, i mean [21:14:52] i could do that [21:14:57] but, i will be running a sqoop command [21:15:00] not a hive query [21:15:51] Mhmm. [21:15:55] I guess that makes sense. [21:16:02] Ja. You're right. 
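[editor's note] qchris's `VERSION=$(hdfs dfs -ls -d ...)` one-liner above picks the newest timestamped refinery deployment directory. A cleaned-up sketch of the same "resolve latest version" idea, written against a plain local directory so it can actually be run; the function name `latest_refinery` and the directory layout are made up for illustration, and the HDFS variant would swap `ls -d` for `hdfs dfs -ls -d` as in the original:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the lexicographically last timestamped
# deployment directory under $base. Timestamp-named dirs (YYYYMMDD...)
# sort correctly by date, so sort | tail -n 1 finds the newest.
latest_refinery() {
  local base="$1"
  ls -d "$base"/2014* 2>/dev/null | sort | tail -n 1 | sed -e "s@^$base/@@"
}

# Demo against a throwaway directory layout.
demo="$(mktemp -d)"
mkdir -p "$demo/20140813" "$demo/20140821" "$demo/20141007"
latest_refinery "$demo"   # prints: 20141007
rm -rf "$demo"
```

A job-submit wrapper could call this once and pass the resolved path to `oozie job -submit`, giving running jobs a stable refinery version while devs keep using "current".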
[21:38:30] (PS4) QChris: Release fix that stops counting [uU]ndefined and redirects [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165351 (https://bugzilla.wikimedia.org/66352) [21:38:36] (PS1) QChris: Use GNU Make's implicit rm [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165617 [21:38:38] (PS1) QChris: Move linked libraries into LDLIBS [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165618 [21:38:40] (PS1) QChris: Collect MacOS paths [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165619 [21:38:42] (PS1) QChris: Remove unneeded commented out linking of libdb [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165620 [21:38:44] (PS1) QChris: Drop unneeded dependencies from rules [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165621 [21:38:46] (PS1) QChris: Use GNU Make's implicit rules to build object and executable files [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165622 [21:38:48] (PS1) QChris: Provide definitions to system level functions [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165623 [21:38:50] (PS1) QChris: Remove unused variables [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165624 [21:38:52] (PS1) QChris: Provide definition of setgroups function [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165625 [21:38:54] (PS1) QChris: Adapt type of list of group ids to what the setgroups function expects [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165626 [21:38:56] (PS1) QChris: Specify return value for main function [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165627 [21:38:58] (PS1) QChris: Turn finalizer of projects list into proper list [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165628 [21:39:00] (PS1) QChris: Stop reading past the end of the array of whitelisted mediawiki wikis [analytics/webstatscollector] - 
https://gerrit.wikimedia.org/r/165629 [21:39:02] (PS1) QChris: Enable optimizations at level 2 [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165630 [21:39:04] (PS1) QChris: Stop counting 301, 302, 303 HTTP status codes [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165631 (https://bugzilla.wikimedia.org/71790) [21:45:50] ottomata: I just looked over the Sqoop sources, and it looks like --hive-overwrite will drop the partition if it exists. [21:46:44] Using that parameter, the loading should boil down to LOAD DATA INPATH [...] OVERWRITE [21:47:08] src/java/org/apache/sqoop/hive/TableDefWriter.java:233 [21:50:46] Analytics / Wikimetrics: Cohort validation: text is confusing "0 invalid" - https://bugzilla.wikimedia.org/71842 (nuria) NEW p:Unprio s:normal a:None Created attachment 16727 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16727&action=edit Screen shot showing bug Cohort validation:... [22:06:09] Ironholds: your original hive query using concat(ip, user_agent) produced no result :( [22:06:52] ah, ok, cool qchris, will try that [22:06:53] thanks! [22:07:00] whoaa patches [22:07:36] (CR) Ottomata: [C: 2] Use GNU Make's implicit rm [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165617 (owner: QChris) [22:07:39] Ja ... today you wanted patches ... [22:07:44] Here you go :-) [22:08:05] (CR) Ottomata: [C: 2] Move linked libraries into LDLIBS [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165618 (owner: QChris) [22:08:07] DarTar, okay, that's weird! 
[22:08:23] Ironholds: even distinct ips produces no result though [22:08:28] what’s wrong with this: [22:08:31] screen hive -e "USE wmf_raw; SELECT day, COUNT(DISTINCT(ip)) AS uc FROM webrequest WHERE year=2014 AND month = 09 AND webrequest_source='text' AND http_status= 200 AND uri_host = 'meta.wikimedia.org' AND uri_path LIKE '/wiki/Research%' GROUP BY day;" > meta_research.tsv [22:08:42] * Ironholds strokes beard [22:08:45] I might be missing something obvious [22:09:54] (CR) Ottomata: "Eh, I wouldn't do this. Fink is a a package manager for OS X. Who knows if someone will have Fink or not." [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165619 (owner: QChris) [22:10:00] (CR) Ottomata: [C: -1] Collect MacOS paths [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165619 (owner: QChris) [22:10:25] (CR) Ottomata: [C: 2] Remove unneeded commented out linking of libdb [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165620 (owner: QChris) [22:10:45] DarTar, qchris and I are working on it. [22:10:50] You've got the hive dream team! Also me. [22:10:59] (CR) Ottomata: [C: 2] Drop unneeded dependencies from rules [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165621 (owner: QChris) [22:11:15] DarTar, try changing http_status=200 to http_status='200' [22:11:16] it's a string! [22:11:40] (CR) Ottomata: [C: 2] Use GNU Make's implicit rules to build object and executable files [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165622 (owner: QChris) [22:12:10] Ironholds: aah! 
[22:13:46] (CR) Ottomata: [C: 2] Provide definitions to system level functions [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165623 (owner: QChris) [22:14:16] (CR) QChris: "You mean that it'll break the build if the directory is not around, or that it is dangerous because different libraries" [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165619 (owner: QChris) [22:14:36] (CR) Ottomata: [C: 2] Remove unused variables [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165624 (owner: QChris) [22:14:54] (CR) Ottomata: [C: 2] Provide definition of setgroups function [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165625 (owner: QChris) [22:16:59] (CR) Ottomata: [C: 2] Turn finalizer of projects list into proper list [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/165628 (owner: QChris) [22:44:13] Ironholds: did you know this exists? https://github.com/wikimedia/analytics-metrics [22:44:15] i did not until today [22:45:01] no, I imagine it's yuri's [22:45:03] qchris is adding makefiles too? [22:45:03] https://gerrit.wikimedia.org/r/#/c/99077/ [22:45:07] no, i think diederik? [22:45:09] ...bah. I imagine it's yuri's [22:45:10] ahh [22:45:10] qchris is using it? [22:45:21] makefiles to auto generate .dot files? [22:45:30] or, pngs from .dot files? 
[22:45:41] oh THAT thing [22:45:44] yeah, I've seen it [22:45:46] k [22:54:15] Analytics / Wikimetrics: Cohort validation: text is confusing "0 invalid" - https://bugzilla.wikimedia.org/71842 (Marcel Ruiz Forns) NEW>ASSI a:Marcel Ruiz Forns [22:57:25] cool, Ironholds [22:57:25] Puppet/File[/usr/share/GeoIP/GeoIPNetSpeedCell.dat]/ensure: defined content as '{md5}b3978b47600eaac35c76c98951a0709f' [22:57:43] also this one [22:57:43] File[/usr/share/GeoIP/GeoIPNetSpeed.dat]/ensure: defined content as '{md5}85322d4e344bc7699da955d84768a1af' [22:58:36] that will be synced out to hadoop nodes in about 30 mins too [22:58:52] ah, but we'd need a specific UDF for that :) [23:08:30] nuria: https://meta.wikimedia.org/wiki/Special:CentralNoticeLogs [23:12:02] nuria: http://meta.wikimedia.org/wiki/Special:GlobalAllocation?project=wikipedia&language=en&country=US&filterDate=2014-10-03&filterDate_timestamp=20141003000000&filter[hour]=13&filter[min]=14
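[editor's note] Once the NetSpeed .dats above land in /usr/share/GeoIP, a quick way to sanity-check which of them answers IPv4 vs IPv6 queries (the 171-vs-177 confusion from this morning) is the legacy `geoiplookup`/`geoiplookup6` CLIs from the geoip-bin package. A sketch, not the API-level test Ironholds was writing: the `probe_dat` helper is made up, and the probe addresses are just examples.

```shell
#!/usr/bin/env bash
# Probe a legacy MaxMind .dat with both address families to see which
# it accepts. Paths match the files puppet synced out above;
# 208.80.154.224 and 2620:0:861:ed1a::1 are example addresses only.
probe_dat() {
  local db="$1"
  if [ ! -r "$db" ]; then
    echo "$db: not present, skipping"
    return 0
  fi
  echo "== $db =="
  command -v geoiplookup >/dev/null 2>&1 &&
    geoiplookup -f "$db" 208.80.154.224        # IPv4 probe
  command -v geoiplookup6 >/dev/null 2>&1 &&
    geoiplookup6 -f "$db" 2620:0:861:ed1a::1   # IPv6 probe
  return 0
}

probe_dat /usr/share/GeoIP/GeoIPNetSpeed.dat
probe_dat /usr/share/GeoIP/GeoIPNetSpeedCell.dat
```

A .dat that only understands one family will answer one probe and error on the other, which is exactly the 171/177 behaviour seen earlier.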