[00:56:05] hey, analytics engineering!
[00:56:10] we're performing a stress test on hive
[00:56:14] that'll be $2,000
[05:51:29] Analytics / General/Unknown: datasets.wikimedia.org SSL error - https://bugzilla.wikimedia.org/72805 (Dario Taraborelli) NEW p:Unprio s:normal a:None https://datasets.wikimedia.org/public-datasets gives a 404 (meaning that people on SSL can't access any data available at this URL). See http...
[12:51:14] Analytics / Refinery: Raw webrequest partitions that were not marked successful - https://bugzilla.wikimedia.org/70085 (christian)
[12:51:14] Analytics / Refinery: Raw webrequest partitions that were not marked successful due to only bits caches causing unknown problems - https://bugzilla.wikimedia.org/72809 (christian) NEW p:Unprio s:normal a:None In this bug, we track issues around raw webrequest partitions (not) being marked s...
[12:52:13] Analytics / Refinery: Duplicates/missing logs from esams bits for 2014-09-28T{18,19,20}:xx:xx - https://bugzilla.wikimedia.org/71435 (christian)
[12:52:13] Analytics / Refinery: Raw webrequest partitions that were not marked successful due to only bits caches causing unknown problems - https://bugzilla.wikimedia.org/72809 (christian)
[12:52:13] Analytics / Refinery: Raw webrequest partitions that were not marked successful - https://bugzilla.wikimedia.org/70085 (christian)
[12:52:43] Analytics / Refinery: Raw webrequest partitions that were not marked successful due to only bits caches causing unknown problems - https://bugzilla.wikimedia.org/72809 (christian)
[12:52:43] Analytics / Refinery: Several raw webrequest partitions now marked successful between 2014-10-13T13:xx:xx and 2014-10-13T22:xx:xx - https://bugzilla.wikimedia.org/72028 (christian)
[12:52:44] Analytics / Refinery: Duplicates/missing logs from esams bits for 2014-09-28T{18,19,20}:xx:xx - https://bugzilla.wikimedia.org/71435 (christian)
[12:52:57] Analytics / Refinery: Raw webrequest partitions for 2014-10-07T1[789]:xx:xx not marked successful - https://bugzilla.wikimedia.org/71882 (christian)
[12:53:12] Analytics / Refinery: Duplicates/missing logs from esams bits for 2014-09-28T{18,19,20}:xx:xx - https://bugzilla.wikimedia.org/71435 (christian)
[12:53:13] Analytics / Refinery: Raw webrequest partitions that were not marked successful due to only bits caches causing unknown problems - https://bugzilla.wikimedia.org/72809 (christian)
[12:53:28] Analytics / Refinery: Duplicates/missing logs from esams bits for 2014-09-28T{18,19,20}:xx:xx - https://bugzilla.wikimedia.org/71435 (christian)
[12:53:28] Analytics / Refinery: Raw webrequest partitions that were not marked successful due to only bits caches causing unknown problems - https://bugzilla.wikimedia.org/72809 (christian)
[12:53:29] Analytics / Refinery: Raw webrequest partitions for 2014-10-08T1[89]:xx:xx not marked successful - https://bugzilla.wikimedia.org/71881 (christian)
[12:54:58] Analytics / Refinery: Raw webrequest partitions for 2014-10-30T21/1H not marked successful - https://bugzilla.wikimedia.org/72810 (christian) NEW p:Unprio s:normal a:None The bits and upload webrequest partition [1] for 2014-10-30T21/1H have not been marked successful. What happened? [1]...
[12:55:12] Analytics / Refinery: Raw webrequest partitions for 2014-10-30T21/1H not marked successful - https://bugzilla.wikimedia.org/72810#c1 (christian) For bits, it only affected cp3020. The affected period is 2014-10-30T21:25:41/2014-10-30T21:26:26. No lost messages, only 70660 duplicates, which is <2 second...
[12:55:13] Analytics / Refinery: Raw webrequest partitions that were not marked successful due to only bits caches causing unknown problems - https://bugzilla.wikimedia.org/72809 (christian)
[12:57:09] !log Marked raw upload webrequest partition for 2014-10-30T21/1H ok (See {{bug|72810}})
[12:57:13] !log Marked raw bits webrequest partition for 2014-10-30T21/1H ok (See {{bug|72810}})
[13:34:58] Analytics / Refinery: Kafka logs drowning in errors processing fetch requests since 2014-10-28 ~20:00 - https://bugzilla.wikimedia.org/72812 (christian) NEW p:Unprio s:normal a:None Since 2014-10-28 ~20:00, the kafka logs on each four brokers are drowning [1] in exceptions around processing...
[13:35:41] uhhh
[13:35:45] drowning!?
[13:41:42] Analytics / Refinery: analytics1021 getting kicked out of kafka partition leader role on 2014-10-27 ~07:12 - https://bugzilla.wikimedia.org/72550 (christian)
[13:41:43] Analytics / Refinery: Kafka logs drowning in kafka.server.KafkaApis errors processing fetch requests since 2014-10-28 ~20:00 - https://bugzilla.wikimedia.org/72812#c1 (christian) The time 2014-10-28 ~20:00 nicely matches the leader re-election of bug 72550 comment 1.
[13:42:44] Analytics / Refinery: analytics1021 getting kicked out of kafka partition leader role on 2014-10-27 ~07:12 - https://bugzilla.wikimedia.org/72550 (christian)
[13:42:44] Analytics / Refinery: Kafka logs drowning in kafka.server.KafkaApis errors processing fetch requests since 2014-10-28 ~20:00 - https://bugzilla.wikimedia.org/72812 (christian)
[14:17:25] qchris_away: did you say you are leaving for the weekend?
[14:17:31] sooN?
[14:17:37] ottomata: actually ... right now :-)
[14:17:53] Do you need me around?
[14:18:13] naw, it's ok, i just like saying things like "LOOK everything is FINE" when you are around
[14:18:20] because then you can say "wait, no, look, it is not fine here"
[14:18:25] :)
[14:18:28] Hahahaha :-)
[14:18:52] I'll go through the channel backlog tomorrow ;-)
[14:19:00] haa, ok
[14:19:12] Is it about the tsv vetting, or something else?
[14:20:27] tsv vetting
[14:20:34] but, that can mostly wait til monday anyway
[14:20:37] so we have more data
[14:21:00] For the tsv vetting, you were doing the hard work ... I was only sitting beside you :-)
[14:21:12] yeah, but 4 eyes are better than 2
[14:21:27] I'll return in a few hours. Then I'll have a look at the tsvs. Ok?
[14:21:37] nonon
[14:21:39] go away!
[14:21:50] i'm just choosing what to work on today, this can wait!
[14:21:50] Btw. ... kafkatee edit.tsv for 2014-10-30 and 2014-10-31 are missing on stat1002.
[14:22:01] yes, they were never synced
[14:22:05] i synced them manually the other day
[14:22:09] Oh. I see.
[14:22:31] I will have a look at your kafka broker bugs first anyway
[14:22:47] Thanks.
[14:23:04] The brokers are working nicely and doing their work just fine.
[14:23:15] Only the logs are just full of those errors :-(
[14:23:26] https://bugzilla.wikimedia.org/show_bug.cgi?id=72812
[14:23:38] But that can probably wait a few days too.
[14:23:52] Anyways .... enjoy your weekend :-)
[14:24:48] hmm, i wonder if this is part of ellery's problem, he might be doing it
[14:24:54] he is running kafkacat in a cron somewhere
[14:25:02] and things are not working right for him
[14:42:57] Analytics / Refinery: Kafka logs drowning in kafka.server.KafkaApis errors processing fetch requests since 2014-10-28 ~20:00 - https://bugzilla.wikimedia.org/72812#c2 (Andrew Otto) NEW>RESO/WON Ah, this is caused by Ellery's use of kafkacat on stat1002. It looks like kafkacat does not properly c...
[14:45:01] kevinator: mediawiki storage is the blocker for annotations
[14:45:08] (from your msg last night)
[14:45:37] once we have a place to put annotations, it's trivial to render them - UX is a really nice to have but we can render them ourselves too
[14:46:55] kevinator / nuria__ / mforns: btw, Sean is testing the warehouse schema / loading queries in a database called "warehouse" on analytics-store so he can get an idea of the load / etc. A-w-e-s-o-m-e! :D
[14:47:31] ;]
[14:54:17] milimetric: nice, real nice
[15:07:04] milimetric, nuria__: I got the line where the exception is thrown: it's validate_cohort.py line 243
[15:07:13] matches = session.query(MediawikiUser).filter(clause).all()
[15:07:52] which is already within a try block, and in fact is the only query within that try block
[15:07:56] mforns: and the exception is which one?
[15:08:09] OperationalError: (OperationalError) (2005, "Unknown MySQL server host 'wikimania2015wiki.labsdb' (0)") None None
[15:08:38] should I catch this error separately and do nothing?
[15:10:36] mforns: i think we need to distinguish this case (recoverable)
[15:10:46] from another exception (not recoverable)
[15:10:53] yes
[15:10:54] ok
[15:10:55] this exception doesn't need to be re-raised
[15:11:06] so we need two catch blocks
[15:11:18] yes, that's what I meant with 'doing nothing'
[15:11:21] :]
[15:11:23] rather than the one on line 272
[15:11:34] understand
[15:12:05] one catch plus log (without raise) for this exception, validation continues for the rest of the projects
[15:12:52] and we can have (gasp) a catch-all, log and raise for anything else, but this should not be caught above, makes sense?
[15:13:39] also, likely this:
[15:13:39]     # clear out the dictionary in case of an exception, and raise the exception
[15:13:39]     for key in users_dict.keys():
[15:13:39]         users_dict.pop(key)
[15:13:59] needs to happen for both exceptions
[15:14:19] aha
[15:14:50] so we distinguish the "recoverable" case (no need to re-raise) from the non recoverable
[15:15:14] yes, got it
[15:15:23] thanks!!
[15:15:30] callers do not need to catch anything in that case, exception gets handled where thrown
[15:15:33] k
[15:15:50] mforns , milimetric: taking my daughter to school, brb
[15:15:57] ok
[15:23:18] nuria__ has a kid?
[15:23:30] would that be a...nuriette? nurietta?
[15:34:34] well, actually nuria's name is nurieta (not sure about the spelling)
[15:34:53] and derivative humans don't follow naming conventions
[15:35:52] they should
[15:36:10] and they do!
[15:36:26] That's why we have Ben Foo or Bar Foo as a naming convention, or Fooson or Foosdottir
[15:36:44] I'll just think of them as nurieta.child.class$new()
[15:59:10] :)
[16:00:28] random_mutation(nurieta.class.merge(nurieta.husband.class))()
[16:01:09] in case nuria comes back, we'll have to explain ourselves: no big deal, just trying to define your kid in python / R, nothing to see here :P
[16:01:33] naw, you wouldn't use nurieta.child.class$new() in R
[16:01:42] that's a mix of S3/4 and R5/6 syntax. 2-4 different class systems!
[16:02:02] it'd end up with a nurieta method that applies to objects of the class "child.class". Nobody wants that.
[16:02:52] you'd probably define a class in R6 that inherits from nurieta and nurieta_partner classes and then just initialize a new instance of that. So child_class <- setRefClass("child_class", inherits = c("nurieta_class","nurieta_partner_class")) and then class$new()
[16:26:18] pufff .. no comments
[17:14:41] ottomata: one question about eventlogging and puppet
[17:15:14] yesssaksmeee
[17:15:45] ottomata: as far as i can see new upgrades of packages are not handled via puppet
[17:16:00] usually not
[17:16:02] ottomata: so if we wanted to add a new python package
[17:16:07] 1) stop service
[17:16:19] 2) pip or apt-get (likely apt-get as this is prod)
[17:16:24] 3) restart
[17:16:26] yup
[17:16:31] you can probably just apt-get and then restart
[17:16:45] don't need to stop it really, as the process has the current stuff loaded in mem
[17:16:46] ottomata: and do you know....
[17:16:56] ottomata: how does new code get deployed to vanadium?
[17:17:06] new eventlogging code?
[17:17:07] ottomata: is it deployed under the regular wmf schedule?
[17:17:09] i do not know
[17:17:12] i doubt it?
[17:17:27] ah
[17:17:30] it looks like via tin?
[17:17:32] git-deploy?
[17:17:32] ottomata: so it will be us sshing and git pull?
[17:17:38] hmmm
[17:17:53] no
[17:18:17] ottomata: good... cause that sounds a little too low tech
[17:21:03] yes, git deploy
[17:21:09] nuria, deployment goes like this
[17:21:32] ssh tin.eqiad.wmnet
[17:21:32] cd /srv/deployment/eventlogging/EventLogging
[17:21:32] git pull
[17:21:32] git deploy start
[17:21:32] git deploy sync
[17:22:07] https://wikitech.wikimedia.org/wiki/Trebuchet#Deploying
[17:23:10] ottomata: k, i see, is there rollback? I see "abort" but that is no rollback
[17:23:24] sure
[17:23:30] git checkout HEAD^
[17:23:35] git deploy start; git deploy sync
[17:24:00] k, i see, redeploy
[17:24:23] ottomata: that assumes every check-in into master is a deploy though
[17:24:41] git checkout oh
[17:24:43] wait
[17:24:47] i think git deploy does tags
[17:24:51] ahhhh
[17:24:57] ottomata: that makes more sense
[17:25:11] cause you want to tag at deploy time to be able to rollback
[17:25:15] precisely
[17:25:28] to what you had before
[17:25:29] yup, each git-deploy start does a tag
[17:25:34] ahhh, k
[17:25:38] maybe sync too
[17:25:39] ?
[17:25:39] ottomata: that sounds better
[17:25:43] dunno exactly how that works
[17:25:43] but ja
[17:25:45] something like that
[17:26:22] ottomata: k, i am going to add a new dependency to EL so we will follow this procedure when we get there
[17:26:36] ok
[17:39:56] Analytics / EventLogging: database consumer could batch inserts (sometimes) - https://bugzilla.wikimedia.org/67450#c3 (nuria) Actual beginning of e-mail thread with pertinent conversation: https://lists.wikimedia.org/pipermail/analytics/2014-August/002429.html
[17:42:40] brb, cafe time!
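To collect the Trebuchet deployment flow described above in one place, a rough shell sketch follows. The deploy commands are the ones ottomata listed; the rollback half is only an approximation of what was discussed (git-deploy is believed to tag each deploy, and the tag name shown is purely illustrative):

    # EventLogging deploy via Trebuchet, as listed above (run from tin):
    ssh tin.eqiad.wmnet
    cd /srv/deployment/eventlogging/EventLogging
    git pull
    git deploy start
    git deploy sync

    # Approximate rollback, per the discussion above: check out the previously
    # deployed state and re-deploy it.
    git checkout HEAD^    # or: git checkout <previous-deploy-tag>  (illustrative, if git-deploy tagged it)
    git deploy start
    git deploy sync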
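Going back to the validate_cohort.py exception handling that nuria__ and mforns worked through earlier in the log, here is a minimal Python sketch of the agreed pattern: log and continue on the recoverable "unknown MySQL host" OperationalError, clear users_dict and re-raise on anything else. The function name and the logger argument are hypothetical; MediawikiUser, clause and users_dict are just the names used in the chat, and the real Wikimetrics code may differ.

    # Sketch only; not the actual Wikimetrics validate_cohort.py code.
    from sqlalchemy.exc import OperationalError

    def match_users_for_project(session, clause, users_dict, logger):
        try:
            # The query the chat points at (validate_cohort.py line 243).
            # MediawikiUser is the Wikimetrics model referenced in the chat (import omitted).
            matches = session.query(MediawikiUser).filter(clause).all()
            return matches
        except OperationalError as e:
            # Recoverable case, e.g. "Unknown MySQL server host 'wikimania2015wiki.labsdb'".
            # Log and carry on; no re-raise, so validation continues for the other
            # projects and callers have nothing to catch.
            logger.warning('Skipping unreachable project database: %s', e)
            return []
        except Exception:
            # Non-recoverable: clear out the dictionary and raise the exception,
            # as in the snippet quoted above.
            for key in list(users_dict.keys()):
                users_dict.pop(key)
            raise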
[18:17:43] (PS14) Ottomata: Add UAParserUDF from kraken [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142
[18:18:37] (CR) Ottomata: [C: 2 V: 2] Add UAParserUDF from kraken [analytics/refinery/source] - https://gerrit.wikimedia.org/r/166142 (owner: Ottomata)
[18:23:26] (PS1) Ottomata: Updating refinery-hive/pom.xml's parent version to 0.0.2-SNAPSHOT to match rest of repo [analytics/refinery/source] - https://gerrit.wikimedia.org/r/170368
[18:23:41] (CR) Ottomata: [C: 2 V: 2] Updating refinery-hive/pom.xml's parent version to 0.0.2-SNAPSHOT to match rest of repo [analytics/refinery/source] - https://gerrit.wikimedia.org/r/170368 (owner: Ottomata)
[18:43:49] (PS2) Mforns: Avoid exception accessing unknown project database [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/170152 (https://bugzilla.wikimedia.org/72582)
[18:48:21] milimetric: showing leila some ninja command-line parsing with jq, I forgot how awesome it is
[18:48:56] which reminds me I really need to order this new O’Reilly book
[18:49:24] http://datascienceatthecommandline.com/
[18:51:45] (PS1) Ottomata: Release version 0.0.2 with UAParserUDF in refinery-hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/170373
[18:52:29] (CR) Ottomata: [C: 2 V: 2] Release version 0.0.2 with UAParserUDF in refinery-hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/170373 (owner: Ottomata)
[18:52:53] DarTar: jq?
[18:53:11] milimetric: awk for json
[18:53:13] http://stedolan.github.io/jq
[18:53:27] best thing since, well, awk
[18:56:38] s['props']['982']
[18:56:42] oops
[19:01:52] DarTar: cool! I never thought of having to hack around json files 'cause python and JS seem pretty up to the task
[19:02:03] but what's something I can do with jq in a more elegant way?
[19:02:26] single liners of hell, go pipes!
[19:03:14] grabbing lunch, brb
[19:04:12] (PS1) Ottomata: Add refinery-hive-0.0.2 and refinery-tools-0.0.2 [analytics/refinery] - https://gerrit.wikimedia.org/r/170375
[19:05:02] ha :) I was just playing with https://jqplay.org/# and it seems pretty fun
[19:05:39] (CR) Ottomata: [C: 2 V: 2] Add refinery-hive-0.0.2 and refinery-tools-0.0.2 [analytics/refinery] - https://gerrit.wikimedia.org/r/170375 (owner: Ottomata)
[19:07:04] oo, didn't know about jqplay, that's nice
[19:18:43] (PS1) Ottomata: Bump version number after release [analytics/refinery/source] - https://gerrit.wikimedia.org/r/170380
[19:18:53] (CR) Ottomata: [C: 2 V: 2] Bump version number after release [analytics/refinery/source] - https://gerrit.wikimedia.org/r/170380 (owner: Ottomata)
[19:20:09] (CR) Mforns: "Sorry for the late review" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/168202 (https://bugzilla.wikimedia.org/71582) (owner: Bmansurov)
[19:42:26] nuria__: at least I got the jar in the auxpath automatically
[19:42:37] now you just have to create temporary function uaparse as "org.wikimedia.analytics.refinery.hive.UAParserUDF";
[19:42:39] no need to add jar
[19:53:31] ok, let's update the docs ( i will do it)
[19:58:54] k
[20:00:34] ottomata, Ironholds : docs for udfs & hive updated: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF
[20:01:05] nuria__: you can't select additional columns?
[20:01:08] Ironholds: ^
[20:01:09] and the output is?
[20:01:11] yeah, I saw
[20:01:15] i got a query to put in there...
[20:01:44] ottomata: when using a udf last time i tested you couldn't
[20:02:00] ottomata: I can test again
[20:08:21] Ironholds: added sample output
[20:08:42] ta
[20:09:03] oh. balls.
[20:09:10] well, not balls. hmn.
[20:10:09] Ironholds: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF#Example:_User_Agent_UDF
[20:10:13] me too!
[20:10:24] maybe you can't use them with other columns? but you can use them in subselects!
[20:10:36] in an ideal world it'd be possible to create new columns from each output, but this isn't an ideal world. This is still awesome :D
[20:10:50] I'm going to build a function for splitting those into a table at the R end right now
[20:11:28] you can access them Ironholds
[20:11:37] ua(user_agent)['device_family']
[20:11:49] aha
[20:12:21] but then if I go ua(user_agent)['device_family'] AS device, ua(user_agent)['os_family'] AS OS
[20:12:24] will that run once or twice?
[20:12:42] IOW, am I limited to ua(user_agent)['device_family','os_family']?
[20:12:53] i'm not sure, i haven't played with this much, see what you can do.
[20:13:03] well, it's hard for me to performance test at this end ;p
[20:13:12] I guess I can just run each in turn and see what the runtime difference is
[20:13:14] you could do
[20:14:19] okay, lesse what happens...
[20:15:06] maybe this?
[20:15:06] select a.parsed_ua['os_family'], a.parsed_ua['device_family'] from (
[20:15:06] select ua(user_agent) as parsed_ua from webrequest
[20:15:06] where year=2014 and month=10 and day=30 and hour=0 and webrequest_source='mobile' limit 10) a
[20:15:06] ;
[20:15:07] trying it
[20:15:32] Ironholds: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-GROUPingandSORTingonf(column)
[20:15:34] btw, other columns work fine with this udf
[20:15:36] nuria__:
[20:15:46] select uri_path, ua(user_agent)['os_family'] was fine
[20:15:57] yeah, oliver that works
[20:15:58] ^^^
[20:15:58] ottomata: right, i just ran the query too
[20:16:37] oh you can do it without subquery if you don't use alias
[20:16:39] huh
[20:16:43] i guess hive just figures out what you mean
[20:16:43] hm
[20:17:37] ottomata, yeah, I'm not seeing any real performance difference
[20:17:37] cool!
[20:17:59] ottomata: docs corrected
[20:18:30] I'll build the parser anyway. It's nice to have an excuse to code while waiting for queries to run ;p
[20:19:00] oh, nuria__ there are two seconds of that doc that uaparserudf
[20:19:03] sections*
[20:19:09] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF#Sampling_Data:_Get_a_user_agent_report_for_the_past_month
[20:19:41] we should probably merge this page into the Hive/Queries page
[20:20:25] ottomata: i linked it from there a while back right?
[20:20:31] leemme see
[20:20:42] ottomata: it is linked
[20:21:09] mk, i think we should merge them, but i'm not gonna do it (at least not now!) :)
[20:22:19] Ottomata: i leave it up to you but I think it is fine, we can use this one to document the udfs available
[20:23:19] would there be any way for those to just be auto-loaded when hive is?
[20:23:27] unless there's a measurable cost to more java
[20:24:26] one day Ironholds
[20:24:28] need new version of hive
[20:24:49] ottomata, so clearly you should upgrade! you love upgrading everything so very much! :D
[20:25:08] ha, indeed!
[20:25:11] it will happen
[20:25:18] first we have to upgrade the whole cluster to trusty
[20:25:25] then we upgrade the whole cluster to cdh 5.2
[20:25:27] it will happen!
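Pulling the UDF usage from the exchange above into one place: with refinery-hive already on Hive's auxpath, registering the UA parser and querying it looks roughly like the following. This only consolidates the queries tried in the chat (webrequest columns and partition values copied from there); it is a sketch, not the canonical example from the wiki page.

    -- Register the UDF (the jar is on the auxpath, so no ADD JAR step is needed):
    CREATE TEMPORARY FUNCTION ua AS 'org.wikimedia.analytics.refinery.hive.UAParserUDF';

    -- Map-style access works alongside ordinary columns:
    SELECT uri_path, ua(user_agent)['os_family'], ua(user_agent)['device_family']
    FROM webrequest
    WHERE year=2014 AND month=10 AND day=30 AND hour=0
      AND webrequest_source='mobile'
    LIMIT 10;

    -- Or parse once in a subquery and pick fields out of the aliased map:
    SELECT a.parsed_ua['os_family'], a.parsed_ua['device_family']
    FROM (
      SELECT ua(user_agent) AS parsed_ua
      FROM webrequest
      WHERE year=2014 AND month=10 AND day=30 AND hour=0
        AND webrequest_source='mobile'
      LIMIT 10
    ) a;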
[20:25:35] but i want to focus on some other things for now
[20:25:57] also doesn't seem such a good idea to load a bunch of stuff you might not need
[20:26:10] i do not think naming functions is actually bad
[20:31:20] well, so far it's just two functions
[20:31:31] and in exchange for doing that, researchers have to remember "ua" and "geolocate"
[20:32:02] (PS2) Milimetric: Transform projectcounts hourly files [analytics/refinery] - https://gerrit.wikimedia.org/r/169974
[20:32:04] instead of the URL, or CREATE TEMPORARY FUNCTION ua as 'org.wikimedia.analytics.refinery.hive.UAParserUDF'; and the geo equivalent
[20:36:48] Ironholds: is the geo udf merged already? i think it is still WIP
[20:37:08] nuria__, to my knowledge you're right, I just load it from otto's testbed area ;p
[20:37:40] Ironholds: which means that even after the upgrades you will still need to add it by hand
[20:37:58] as it is not available in archiva yet
[20:38:11] Ironholds: makes sense?
[20:38:15] yes, I was being sarcastic about the upgrades, referencing ottomata's stress of upping to CDH5 and then upgrading stat100* to Trusty
[20:38:23] ahhh
[23:20:14] Hello nice wikimetrics devs. I'm having a problem getting vagrant working when wikimetrics is enabled. Can anybody help? This is the error message I'm getting: Error: Invalid parameter db_user_centralauth at /tmp/vagrant-puppet-5/modules-0/role/manifests/wikimetrics.pp:70 on node mediawiki-vagrant.dev
[23:36:49] bmansurov: you need to have the centralauth role enabled + upgrade vagrant/puppet
[23:37:07] bmansurov: vagrant/puppet/wikimetrics
[23:37:20] nuria__: thanks
[23:37:40] puppet will do it for you
[23:37:48] if you run vagrant provision
[23:37:54] bmansurov^
[23:38:02] so no need to do it buy hand
[23:38:06] nuria__: cool
[23:38:06] *by hand
[23:38:17] bmansurov: but you need to have latest puppet vagrant
[23:38:36] bmansurov: ok, lemme know if you run into issues
[23:39:06] nuria__: ok, i'm trying now
[23:39:22] bmansurov: do you use vagrant for mediawiki?
[23:39:30] nuria__: yes
[23:39:37] bmansurov: the way it works is very similar
[23:39:48] bmansurov: ok, then you know how to do all this
[23:45:22] nuria__: So I enabled 'centralauth', and ran 'vagrant provision', this should upgrade puppet, right?
[23:45:43] bmansurov: actually the puppet run enables "centralauth"
[23:45:49] so no need to do it by hand
[23:45:53] but it works too
[23:46:00] but no
[23:46:11] bmansurov: vagrant provision does not upgrade puppet
[23:46:20] bmansurov: you need to git pull in the repo
[23:46:27] bmansurov: making sure you are in master
[23:46:32] also
[23:46:39] you mean the puppet repo?
[23:47:02] yes, /vagrant one
[23:47:26] the vagrant repo has a puppet directory inside
[23:47:41] nuria__: so I'm in the /vagrant/puppet folder and pulled master
[23:47:51] nuria__: what command do I run now?
[23:48:17] git submodule update --init , just like you would do for mediawiki
[23:48:23] and after, vagrant provision
[23:48:31] bmansurov:
[23:49:21] nuria__: thanks, I think it worked
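To sum up the MediaWiki-Vagrant fix that nuria__ walked bmansurov through, roughly: run the following on the host in your mediawiki-vagrant checkout, with the wikimetrics role already enabled. The checkout path is only a placeholder, not an official location.

    # Update the mediawiki-vagrant repo itself, which carries the puppet code
    # (the vagrant repo has a puppet directory inside):
    cd ~/mediawiki-vagrant        # placeholder path for your local clone
    git checkout master
    git pull
    git submodule update --init   # just like you would do for mediawiki

    # Re-run puppet; with up-to-date puppet code, provisioning pulls in the
    # centralauth pieces that the wikimetrics role needs, so no manual step is required.
    vagrant provision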