[00:00:29] you could fake the whole thing too
[00:00:34] whatever
[00:00:34] :)
[00:00:47] i think the end is just confusing because that is a real domain somewhere
[00:02:19] Done.
[01:00:37] Analytics, operations, Analytics-Kanban: Upgrade Analytics Cluster to Trusty, and then to CDH 5.3 - https://phabricator.wikimedia.org/T1200#1042711 (Ottomata) Alright! Some oozie jobs are busy backfilling the time that they were offline, but everything is looking good. I'm going to wait a day or two make su...
[01:01:25] ok, laters all!
[02:07:28] Analytics-Cluster, Analytics-Kanban: Estimate roughly how many users might not have javascript-capable/enabled browsers - https://phabricator.wikimedia.org/T88560#1042740 (Nuria) Report: https://www.mediawiki.org/wiki/Analytics/Reports/ClientsWithoutJavascript
[08:10:25] the pagecounts on dumps haven't updated since yesterday at 13:00
[12:49:16] Project-Creators, Phabricator, Engineering-Community: Analytics-Volunteering and Wikidata's Need-Volunteer tags; "New contributors" vs "volunteers" terms - https://phabricator.wikimedia.org/T88266#1043399 (Qgil) Between the two... maybe "Ready-For-Volunteers" is more precise in its meaning?
[15:01:24] henrik: those dumps now come from hadoop, and the cluster was down yesterday for an upgrade; I think we forgot to send an announcement to the public list
[15:09:36] henrik: hiya
[15:22:25] Analytics-Cluster, Analytics-Kanban: Make geocoded data and chosen client_ip available as fields in refined webrequest data - https://phabricator.wikimedia.org/T89401#1043625 (kevinator) p:Triage>Normal a:JAllemandou
[16:26:08] ottomata: so here's what I get recently on wikimetrics-dev1 (I ran there so I wouldn't mess anything important up)
[16:26:09] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: Package[sudo-ldap] is already declared in file /etc/puppet/private/modules/passwords/manifests/init.pp:11; cannot redeclare at /etc/puppet/modules/sudo/manifests/init.pp:9
[16:26:24] I'm wondering what we can do more generally about this kind of problem
[16:26:50] milimetric: that actually sounds like a problem with the local edits we have made to our self-hosted puppet master
[16:27:21] hm, doubtful since I get the same problem on limn1
[16:27:27] which was fine last time i ran it
[16:27:27] hm, maybe not then.
[16:27:32] but i'll try resetting completely
[16:27:56] is wikimetrics-dev1 using a self-hosted puppet master?
[16:28:40] yes, but i'm on the vanilla production branch and I don't see any local commits
[16:29:02] hm ok
[16:29:15] the last commit is from you giving Jo access
[16:29:19] yeah
[16:30:58] so I'm not trying to point fingers. I'm just saying this is a problem I don't understand but that I intuitively think should never happen
[16:31:07] yet it happens basically any time I don't run puppet daily
[16:31:18] so if I get puppet working, then run it every day manually, things seem ok
[16:31:39] and if I step away from the instance for like 10 days, and then try to run it again, it's usually (more than 50% of the time) broken
[16:47:48] Analytics-Kanban: Analyze failure types in Edit Schema events - https://phabricator.wikimedia.org/T89725#1043796 (Milimetric) NEW
[16:48:26] Analytics-Kanban: Analyze difference in Edit Schema "bounce rates" across wikis - https://phabricator.wikimedia.org/T89726#1043805 (Milimetric) NEW
[16:48:39] nuria: alright, got it.
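
An aside on the self-hosted puppetmaster exchange above: a minimal sketch of the "resetting completely" that milimetric mentions, assuming the operations/puppet checkout lives at /var/lib/git/operations/puppet (that path and the commands are illustrative; only the error message itself comes from the log).

    # On the self-hosted puppetmaster (checkout path is an assumption;
    # adjust to the instance's actual operations/puppet clone):
    cd /var/lib/git/operations/puppet
    git fetch origin
    # List any local commits that could redeclare a class or package and
    # trigger "Duplicate declaration" errors like the one above:
    git log --oneline origin/production..HEAD
    # Reset to the vanilla production branch and re-run the agent:
    git reset --hard origin/production
    sudo puppet agent --test --verbose
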
[16:48:48] ottomata: hi :)
[16:48:59] henrik: ottomata is working on it right now, he might have more news
[16:51:48] great. Do you expect to populate the missing hours, or will there be a gap?
[16:53:43] (in meeting, but they will be populated, will explain more in a second)
[16:54:01] henrik: no gap, due to the magic of buffering, ta-ta-channnn
[16:54:43] Analytics-Kanban: Analyze device type and how it influences Edit Schema events - https://phabricator.wikimedia.org/T89728#1043833 (Milimetric) NEW
[16:56:44] nuria: Awesome. I think I've used up stats.grok.se's downtime quota for a bit now anyway, so that is nice :)
[17:02:16] Analytics-Kanban: Analyze different types of users in the context of Edit Schema events - https://phabricator.wikimedia.org/T89729#1043859 (Milimetric) NEW
[17:02:39] Analytics-Kanban: Analyze different types of users in the context of Edit Schema events - https://phabricator.wikimedia.org/T89729#1043867 (Milimetric)
[17:03:02] Analytics-Cluster, operations: Audit analytics cluster alerts and recipients - https://phabricator.wikimedia.org/T89730#1043869 (Ottomata) NEW a:Ottomata
[17:08:24] ok, henrik, quick update: yesterday I upgraded our Hadoop cluster. This worked well for the most part, but there are some finicky issues that are causing a few jobs to not run
[17:08:36] the data is all present, I just have to iron out the issues to get the jobs launching smoothly again
[17:08:42] once they run, the data will be copied out to dumps again.
[17:12:00] check
[17:20:32] Analytics-Kanban: Analyze failure types in Edit Schema events - https://phabricator.wikimedia.org/T89725#1043959 (Nuria) We should just be careful with mixing "user data" and "application exceptions"
[17:20:55] Analytics-Kanban: Analyze failure types in Edit Schema VE events - https://phabricator.wikimedia.org/T89725#1043969 (Nuria)
[17:23:52] Analytics-Wikimetrics: Uploading cohort or running a large report fails - https://phabricator.wikimedia.org/T87596#994809 (Capt_Swing) Still a fairly regular issue for some Wikimetrics users. See thread: https://lists.wikimedia.org/pipermail/wikimetrics/2015-February/000241.html
[17:36:05] ottomata: can you showcase pageID in 85 minutes ish?
[17:39:50] sure!
[17:40:05] thought we were going to wait till page id was in the refined table as a top-level field, or something, but i can show
[17:40:12] we can show what we got now
[17:40:18] and then show it as a top-level field later :)
[17:40:42] Analytics-Kanban: Analyze failure types in Edit Schema VE events [13 pts] {bear} - https://phabricator.wikimedia.org/T89725#1044037 (kevinator)
[17:40:50] Analytics-Kanban: Analyze failure types in Edit Schema VE events [13 pts] {bear} - https://phabricator.wikimedia.org/T89725#1043796 (kevinator) p:Triage>Normal
[17:44:12] thanks ottomata, I just shared the deck with you with a placeholder for this. You can add to the deck… otherwise just share your screen
[17:55:48] nuria: maybe ask ops@ too
[17:55:51] about vcl changes
[18:08:55] (PS1) QChris: Update HiveQL that creates wmf.webrequest to match Hive 0.13.0 [analytics/refinery] - https://gerrit.wikimedia.org/r/191085
[18:09:15] thanks qchris :)
[18:09:26] yw :-D
[18:09:33] qchris: remind me again why we chose to use an external table?
[18:09:52] Because we do not want Hive to think that it owns this data.
[18:10:01] Pig & Co also should be able to use it on the same grounds.
[18:10:12] hm. right.
[18:10:14] hm.
[18:10:25] it does make the drop script more complicated
[18:10:25] Feel like changing that?
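
On the external-table question above, a minimal sketch of the pattern under discussion. The table name, columns, and location below are made up for illustration; the real DDL is in the Gerrit change linked at 18:08:55. The point is that dropping an EXTERNAL table or partition removes only Hive metadata, leaving the files readable by Pig & Co, so a drop script has to delete the HDFS directories itself, which is the extra complication ottomata mentions.

    # Hypothetical, simplified DDL; not the actual wmf.webrequest definition:
    hive -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS wmf.webrequest_sketch (
        uri_path    STRING,
        x_analytics STRING
      )
      PARTITIONED BY (year INT, month INT, day INT, hour INT)
      LOCATION '/wmf/data/wmf/webrequest_sketch';
    "
    # Dropping a partition of an EXTERNAL table leaves its files in place...
    hive -e "ALTER TABLE wmf.webrequest_sketch DROP PARTITION (year=2015, month=2, day=1, hour=0);"
    # ...so the drop script must also clean up HDFS separately:
    hdfs dfs -rm -r /wmf/data/wmf/webrequest_sketch/year=2015/month=2/day=1/hour=0
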
[18:10:40] i think i will change the partition layout to match raw's
[18:10:50] eventually...
[18:11:01] Sure :-)
[18:13:11] Btw. ... talking conventions. Got 10 minutes to bikeshed the layout of mediacounts?
[18:13:26] I'd basically do the same thing as for pagecounts-all-sites.
[18:13:31] sure
[18:13:46] So the wmf.webrequest would get aggregated per hour to wmf.mediacounts.
[18:14:05] And every day, another job would aggregate wmf.mediacounts to an hourly file.
[18:14:27] (mediacounts are public anyways. So we can keep that data in hdfs forever)
[18:14:35] daily file?
[18:14:46] Right. daily file.
[18:14:47] Sorry.
[18:15:01] But I am not sure of the name of the daily file.
[18:15:19] To match the legacy tsvs, it should be something like mediacounts.tsv.log-20150101.bz2
[18:15:28] that's different than pagecounts-all-sites, right?
[18:15:40] hourly files stored in monthly directories?
[18:15:50] Yes, that's different.
[18:15:57] We only have exactly 1 tsv per day.
[18:16:03] no hourly files.
[18:16:05] we did this for mobile_apps
[18:16:13] No pagecounts/projectcounts variants.
[18:16:26] mobile_apps/
[18:16:27] └── uniques
[18:16:27]     └── daily
[18:16:27]         ├── 2015-01-01.gz
[18:16:45] Mhmm.
[18:17:32] So if I'd download the file for 2015-01-01, I'd afterwards never know that the 2015-01-01.gz actually was a mobile_apps/uniques file.
[18:17:59] ja, i'm fine if you want to put the name in the file too, but i think the hierarchy is needed
[18:18:21] mediacounts is fine with me
[18:18:30] you probably don't need .log
[18:18:31] The suggestion up to now is to use:
[18:18:31] mediacounts-v00-20150101.tsv.bz2
[18:18:33] The v00 is "version 00"
[18:18:41] since this is an aggregated dataset
[18:18:49] version 00?
[18:19:01] Yup. We dropped the "log" in the suggestion.
[18:19:34] Like if the format of the tsv gets changed in the future, the version would get bumped, and the file name would be ...-v01-...
[18:19:48] So the filename tells you how to parse the file
[18:19:51] or what to expect in there.
[18:22:35] (CR) Ottomata: [C: 2 V: 2] Update HiveQL that creates wmf.webrequest to match Hive 0.13.0 [analytics/refinery] - https://gerrit.wikimedia.org/r/191085 (owner: QChris)
[18:22:41] ottomata: ^ sounds roughly ok?
[18:22:49] I meant the version stuff.
[18:23:14] sure, any reason not to make it look a little prettier?
[18:23:14] maybe
[18:23:22] mediacounts.v00.2015-01-01.tsv.bz2
[18:23:32] or
[18:23:33] probably
[18:23:36] put v at the end
[18:23:40] so it is more sortable by date
[18:23:52] mediacounts.2015-01-01.v00.tsv.bz2
[18:23:57] ?
[18:24:03] The dots are fine by me. I don't care about the separator.
[18:24:14] :)
[18:24:26] The - between the date components are there to somewhat mimic the logrotate default. But I wanted to add them anyway.
[18:24:43] So I guess with you feeling the same, I'll just add them and overrule the others :-)
[18:24:55] the date sorting is a good argument.
[18:25:03] Thanks.
[18:25:16] I guess it'll be mediacounts.2015-01-01.v00.tsv.bz2 :-)
[18:38:05] ottomata: will give Qa@ a chance to respond, and afterwards ask in ops and wikimedia-releng
[18:39:57] (PS1) QChris: Encode commas in PercentEncoder [analytics/refinery/source] - https://gerrit.wikimedia.org/r/191098
[18:42:07] ottomata: are you presenting the page id addition at the showcase?
[18:58:23] yes
[18:58:28] going to show it in kafka
[18:58:34] i was going to show it in hive, but that is being weird.
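
A quick sketch of why the date-before-version ordering wins: with the date component first, plain lexicographic sorting (ls, sort) lists the daily files chronologically, and the embedded v00 tells consumers which tsv format to expect. The filename pieces come straight from the discussion above; the shell variables are just for illustration.

    day="2015-01-01"
    version="v00"
    echo "mediacounts.${day}.${version}.tsv.bz2"
    # => mediacounts.2015-01-01.v00.tsv.bz2
    # ISO dates sort correctly as strings, so `ls mediacounts.*` stays in
    # chronological order even across future version bumps (v01, ...).
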
[18:58:43] was going to use the get xanalytics value udf
[18:58:46] something is weird
[19:21:49] Analytics-Engineering: Email engineering re: x-analytics deployed to all wikis - https://phabricator.wikimedia.org/T89748#1044379 (ggellerman) NEW a:kevinator
[19:22:32] ja, joal: ellery ran into the same locking problem in his jobs
[19:22:39] i had him test setting hive.support.concurrency = false
[19:22:41] and it works for him now :)
[19:22:46] so i'm just going to set that by default in hive-site.xml
[19:23:17] Analytics-Engineering: Email engineering re: x-analytics deployed to all wikis - https://phabricator.wikimedia.org/T89749#1044388 (ggellerman) NEW
[19:23:31] Analytics-Engineering, Analytics-Kanban: Email engineering re: x-analytics deployed to all wikis - https://phabricator.wikimedia.org/T89749#1044395 (ggellerman)
[19:24:00] hey nuria, check this out!
[19:24:01] on stat1002:
[19:24:04] /home/otto/kafkacat -t webrequest_text -p 0 -b analytics1022.eqiad.wmnet -c 2000 | grep page_id= | jq '.uri_path + ";" + .x_analytics' | grep -v '%' | head -n 50 | sed 's@"@@g' | column -s ';' -t
[19:24:06] :D
[19:24:25] kafkacat consumes the current stream?
[19:24:41] consumes, yes
[19:24:47] Analytics-Engineering, Analytics-Kanban: Email engineering re: x-analytics deployed to all wikis - https://phabricator.wikimedia.org/T89749#1044400 (ggellerman) p:Triage>Normal a:kevinator
[19:25:59] ottomata: you know, 'column' i had never used before
[19:26:17] nice right!
[19:26:27] very
[19:26:29] very
[19:42:43] (PS1) QChris: [WIP] Add media file consumption reports [analytics/refinery] - https://gerrit.wikimedia.org/r/191118
[19:43:08] (CR) QChris: [C: -2] [WIP] Add media file consumption reports [analytics/refinery] - https://gerrit.wikimedia.org/r/191118 (owner: QChris)
[19:46:38] (PS2) QChris: [WIP] Add media file consumption reports [analytics/refinery] - https://gerrit.wikimedia.org/r/191118
[19:47:34] (Abandoned) QChris: [WIP] Add HiveQL file for daily media file access aggregations [analytics/refinery] - https://gerrit.wikimedia.org/r/173479 (owner: QChris)
[20:07:03] ottomata, got a minute to get on the altiscale call?
[20:11:05] sure
[20:11:45] hm, i joined the calendar one
[20:11:48] link?
[20:11:55] halfak: ^
[20:12:04] Bah. n/m
[20:12:09] Ended up calling it early.
[20:12:11] oh
[20:12:12] sorry
[20:12:31] So, Aakash was telling me that it is impossible to get error logs for a failed Hadoop job in less than 5 minutes after it failed.
[20:12:54] So, I was hoping to have you tell him how you're doing it. :)
[20:13:44] ha
[20:13:49] Either way, his conclusion was "WONTFIX"
[20:13:57] well, that *might* be true, depending on his cluster, maybe it just takes a while
[20:14:07] it is a separate thing for hadoop to do: moving the logs from the datanodes into hdfs
[20:16:27] I sometimes just tail the datanode's container logs. Or if one cannot log into the datanode, one can curl the container's stdout/stderr/syslog directly on the datanode.
[20:16:29] Like:
[20:16:34] curl 'http://analytics1029.eqiad.wmnet:8042/node/containerlogs/container_1421426017437_56576_01_000002/hdfs/stdout'
[20:16:46] (Not sure if that is possible on the altiscale cluster)
[20:17:16] That is pretty much real-time.
[20:17:58] qchris, ohh. How do you figure out which node?
[20:18:22] Through the web interface on analytics1001.
[20:18:59] Port 8088 on analytics1001 (through ssh); the path is "/cluster"
[20:19:08] There, I look for the relevant application.
[20:19:19] Drill down to the container.
[20:19:20] Gotcha.
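
A commented re-flow of ottomata's one-liner at 19:24:04 above, purely for readability; nothing here goes beyond the command already shown (the kafkacat binary sat in his home directory on stat1002 at the time).

    /home/otto/kafkacat \
        -t webrequest_text \
        -p 0 \
        -b analytics1022.eqiad.wmnet \
        -c 2000 |                   # consume 2000 JSON webrequests from partition 0 of the text topic
      grep page_id= |               # keep requests whose x_analytics field carries a page_id
      jq '.uri_path + ";" + .x_analytics' |  # pull out two fields, ';'-joined
      grep -v '%' |                 # skip percent-encoded paths for readability
      head -n 50 |                  # take the first 50 survivors
      sed 's@"@@g' |                # strip the quotes jq adds
      column -s ';' -t              # align the ';'-separated pairs as a table
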
[20:19:39] * halfak wishes that the output in his terminal would contain stderr from jobs.
[20:19:47] But that gets me closer.
[21:54:41] henrik: fingers crossed, i think things are chugging along now
[21:55:02] it's still catching up, but seems to be working fine
[22:32:59] Analytics, Analytics-Kanban: Report metrics for Quarterly Report 2014 Oct-Dec - https://phabricator.wikimedia.org/T89024#1044969 (kevinator) Open>Resolved The report card has been published here: https://commons.wikimedia.org/w/index.php?title=File%3AWikimedia_Foundation_Quarterly_Report%2C_FY_2014-15_Q...
[22:33:58] Analytics-Kanban, Analytics-Visualization: Build high level timeseries view of key metrics [8 pts] {bear} - https://phabricator.wikimedia.org/T88367#1044974 (kevinator)
[22:53:11] Datasets-General-or-Unknown, Analytics: pagecounts stats are behind by about 16 hours - https://phabricator.wikimedia.org/T89771#1045030 (Bawolff) NEW
[22:58:30] Datasets-General-or-Unknown, Analytics, Analytics-General-or-Unknown: pagecounts stats are behind by about 16 hours - https://phabricator.wikimedia.org/T89771#1045043 (Bawolff)
[23:07:07] Analytics-Wikistats: Discrepancies in historical total active editor numbers - https://phabricator.wikimedia.org/T87738#1045058 (kevinator) a:ezachte
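
Wrapping up the container-logs thread above, a minimal sketch of both retrieval paths. The application id is derived from the container id in qchris's example (container_1421426017437_56576_01_000002 belongs to application_1421426017437_56576); everything else follows the commands already shown, and, as qchris notes, availability on the Altiscale cluster is uncertain.

    # After YARN log aggregation has shipped logs to HDFS (the step that can
    # lag several minutes behind a job failure, hence the 5-minute complaint):
    yarn logs -applicationId application_1421426017437_56576 | less

    # Before aggregation, curl the NodeManager's web UI directly (port 8042),
    # as above; find the node and container id by drilling down from the
    # ResourceManager UI (port 8088 on analytics1001, path /cluster):
    curl 'http://analytics1029.eqiad.wmnet:8042/node/containerlogs/container_1421426017437_56576_01_000002/hdfs/stderr'
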