[00:59:54] ottomata: if you do a rewrite in VCL like the one specified here: https://www.varnish-cache.org/trac/wiki/RedirectsAndRewrites
[01:00:12] ottomata: how do you send your original url to varnishkafka?
[01:08:52] ori: yt? want to talk about swapping vanadium?
[01:11:54] nuria: hey
[01:12:35] ori: do you have a box that you are going to use already?
[01:12:53] no, but I think I have an old ticket requesting machines
[01:12:55] I should look
[01:13:04] we should have two machines at minimum
[01:13:11] so we can have a hot spare
[01:13:41] ori: one working one not?
[01:13:49] yeah
[01:14:08] ori: ok, let's first dig out that old ticket then.
[01:14:16] having automatic failover would be good, but absent that even just having a machine configured and ready to go would be an improvement over the status quo
[01:14:47] ori: ya, of course. Even having a machine ready to go -not configured- will be an improvement
[01:16:55] nuria: do you have access to RT?
[01:17:23] ori: yes, i thought we did not use it anymore
[01:17:51] I'm not sure how to get the Phabricator task for an RT ticket; I found the RT ticket but not the task
[01:17:55] anyhow it's https://rt.wikimedia.org/Ticket/Display.html?id=7509
[01:23:18] nuria: OK, so: but i'll chat with faidon and see if he knows where this falls on the ordering roadmap and if it's immediate, i'll get a quote for it either way i'll find out for ya tomorrow
[01:23:42] ori: ok, we shall reconvene tomorrow about this then
[01:24:02] cool cool
[01:24:10] i'll let you know as soon as i hear anything
[01:25:11] ori: k
[01:48:52] Analytics-Engineering, Analytics-Kanban, VisualEditor, VisualEditor-Performance, § VisualEditor Q3 Blockers: Report on the central tendency for length of pages which are edited for VisualEditor performance benchmarking - https://phabricator.wikimedia.org/T89788#1061159 (Jdforrester-WMF) >>! In T...
[01:51:44] Analytics-Engineering, Analytics-Kanban, VisualEditor, VisualEditor-Performance, § VisualEditor Q3 Blockers: Report on the central tendency for length of pages which are edited for VisualEditor performance benchmarking - https://phabricator.wikimedia.org/T89788#1061160 (Jdforrester-WMF) >>! In T...
[08:09:17] Analytics, Wikimedia-Hackathon-2015: Wiki tools that use revision scoring - https://phabricator.wikimedia.org/T90034#1061589 (Qgil)
[10:36:54] (PS1) Gilles: Update schema version for MultimediaViewerNetworkPerformance [analytics/multimedia] - https://gerrit.wikimedia.org/r/192527 (https://phabricator.wikimedia.org/T89814)
[11:00:50] Analytics, MediaWiki-extensions-MultimediaViewer, Multimedia, Multimedia-Sprint-2015-02-18, Patch-For-Review: Set up varnish 204 beacon endpoint for virtual media views and use it in Media Viewer - https://phabricator.wikimedia.org/T89088#1061975 (Gilles) The use of virtual pageviews is becomin...
[11:49:37] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-February-2015: Basic metrics about contributors exercising +2/-2 permissions in Gerrit - https://phabricator.wikimedia.org/T59038#1062054 (Dicortazar) Hi, new numbers and list of developers. First, I've been working on removing self-merge...
[11:54:00] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-February-2015: Active code review users on a monthly basis - https://phabricator.wikimedia.org/T86152#1062064 (Dicortazar) Hi, numbers on participants in Gerrit. A participant is defined as any developer that has left any trace of activit...
[11:58:13] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-February-2015: Basic metrics about contributors exercising +2/-2 permissions in Gerrit - https://phabricator.wikimedia.org/T59038#1062072 (Dicortazar) Code available at https://phabricator.wikimedia.org/P329
[11:58:40] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-February-2015: Active code review users on a monthly basis - https://phabricator.wikimedia.org/T86152#1062081 (Dicortazar) Code available at https://phabricator.wikimedia.org/P329
[12:18:36] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-February-2015: Active code review users on a monthly basis - https://phabricator.wikimedia.org/T86152#1062150 (Qgil) Ok, this looks good. Thank you! Is this data already counting all repositories in Gerrit (T86154)? The shape of the diag...
[12:24:05] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-February-2015: Basic metrics about contributors exercising +2/-2 permissions in Gerrit - https://phabricator.wikimedia.org/T59038#1062166 (Qgil) The work described in T59038#1017504 is covered by this graphic and this table. Thank you! Now...
[12:35:08] (PS8) Jsahleen: Reports: Add reporting system for generating limn sql [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/192457 (https://phabricator.wikimedia.org/T90265)
[12:38:39] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-February-2015: Active code review users on a monthly basis - https://phabricator.wikimedia.org/T86152#1062209 (Dicortazar) Thanks! It's counting around 1200 Gerrit repositories according to the database. That depression in November/Decem...
[12:43:18] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-February-2015: Active code review users on a monthly basis - https://phabricator.wikimedia.org/T86152#1062216 (Dicortazar) Just to clarify, it's counting all of the repos according to {T86154}. And regarding to {T88277}, we still have to...
[14:49:51] (PS2) Mforns: [WIP] [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251)
[14:49:58] (CR) jenkins-bot: [V: -1] [WIP] [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) (owner: Mforns)
[15:05:24] Analytics-Cluster: Eng uses Mahout installed on Hadoop cluster - https://phabricator.wikimedia.org/T78016#1062618 (Ottomata)
[15:05:25] Analytics-Cluster: Better way to access Hadoop related web GUIs - https://phabricator.wikimedia.org/T83601#1062617 (Ottomata)
[15:05:26] Analytics, Analytics-Kanban, operations: Upgrade Analytics Cluster to Trusty, and then to CDH 5.3 - https://phabricator.wikimedia.org/T1200#1062615 (Ottomata) Open>Resolved All is good!
[15:06:52] Analytics-EventLogging, Analytics-Kanban: EL alarms should be included just in the tugsten host - https://phabricator.wikimedia.org/T89469#1062623 (kevinator) a:Nuria
[15:06:59] Analytics, Analytics-Kanban: Backfill event logging data after 02/05 outage - https://phabricator.wikimedia.org/T88692#1062624 (kevinator) a:Nuria
[15:10:43] Analytics-Kanban: Analyze different types of users in the context of Edit Schema events {lion} - https://phabricator.wikimedia.org/T89729#1062639 (kevinator) a:Milimetric
[15:14:24] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Geocoding UDF should be more resilient - https://phabricator.wikimedia.org/T89204#1062658 (Ottomata) Open>Resolved
[15:19:36] Analytics-Dashiki, Analytics-Kanban: Pageviews metric not showing in Vital Signs - https://phabricator.wikimedia.org/T90587#1062676 (kevinator) NEW
[15:19:49] Analytics-Dashiki, Analytics-Kanban: Pageviews metric not showing in Vital Signs - https://phabricator.wikimedia.org/T90587#1062683 (kevinator) p:Triage>Normal
[15:20:18] Analytics-EventLogging, Analytics-Kanban: Upgrade box for EventLogging (vanadium) - https://phabricator.wikimedia.org/T90363#1062685 (kevinator) p:Triage>High
[18:02:28] (CR) Erik Zachte: [C: 2 V: 2] Remove localized "User" namespace prefixes [analytics/wikistats] - https://gerrit.wikimedia.org/r/190344 (https://phabricator.wikimedia.org/T89387) (owner: Amire80)
[18:08:57] I am back :)
[18:11:16] (PS1) Ottomata: Update refinery artifacts to version 0.0.7 [analytics/refinery] - https://gerrit.wikimedia.org/r/192582
[18:12:28] Hey ottomata
[18:12:42] Do you think 0.0.7 will be deployed soon ?
[18:12:55] nuria: I have a question for you if you have a minute
[18:12:55] yes
[18:12:58] doing that no
[18:12:59] now
[18:13:01] Cool :)
[18:13:04] thx a lot
[18:13:10] joal: in meeting, give me 15 minutes?
[18:13:31] ottomata: no rush for the coming hours, but before tomorrow would be great !
[18:13:47] nuria, sure :)
[18:13:58] (PS2) Ottomata: Update refinery artifacts to version 0.0.7 [analytics/refinery] - https://gerrit.wikimedia.org/r/192582
[18:16:59] (PS3) Ottomata: Update refinery artifacts to version 0.0.7 [analytics/refinery] - https://gerrit.wikimedia.org/r/192582
[18:17:41] (CR) Ottomata: [C: 2 V: 2] Update refinery artifacts to version 0.0.7 [analytics/refinery] - https://gerrit.wikimedia.org/r/192582 (owner: Ottomata)
[18:19:12] joal, sigh, soon, issue with the way I added the files I think, and I need to eat lunch
[18:19:14] it's coming!
[18:37:46] joal: reday for your question
[18:37:49] *ready
[18:38:11] :)
[18:38:35] So I wonder about the "remove the bots from pageview" thing
[18:39:06] I know there is a lot ongoing about "pageview definition", and I would really enjoy some help clearing things up :)
[18:40:33] I don't know how it would be easier to discuss that, hangout maybe, for 15mins ?
[18:40:34] joal: the pageview definition includes bots, yes, let me get it
[18:40:45] sure, want to come into batcave?
[18:40:57] yup, arriving
[18:42:08] joal: Can you hear us?
[19:02:22] kevinator: meeting ?
[19:02:25] nuria: I’m in the batcave.
[19:57:38] (PS1) Ottomata: Fix 0.0.7 jars to match exactly what is in archiva [analytics/refinery] - https://gerrit.wikimedia.org/r/192612
[19:58:20] (CR) Ottomata: [C: 2 V: 2] Fix 0.0.7 jars to match exactly what is in archiva [analytics/refinery] - https://gerrit.wikimedia.org/r/192612 (owner: Ottomata)
[20:15:17] (CR) Nikerabbit: Reports: Add reporting system for generating limn sql (15 comments) [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/192457 (https://phabricator.wikimedia.org/T90265) (owner: Jsahleen)
[20:40:11] hm, halfak, i think I can turn up our vcores!
[20:40:14] i learned something today!
[20:40:37] Cool! :)
[20:40:56] that will at least make yarn have some more slots to give out to users in busy times
[20:41:36] hmm
[20:44:01] wow, halfak, i need to do a hyperthreading audit of these nodes
[20:44:04] totally inconsistent.
[20:46:35] ottomata, does that mean we can get more capacity?
[20:49:16] um, sorta, it won't really up the real capacity, but it might fake some according to yarn
[21:11:55] (CR) Ottomata: [C: 2] "Ja LGTM! You say you have another patch coming, ja?" [analytics/refinery] - https://gerrit.wikimedia.org/r/192363 (owner: Joal)
[21:29:29] (CR) Ottomata: (WIP) project class/variant extraction UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/188588 (owner: OliverKeyes)
[21:29:55] (CR) Ottomata: "Did you mean to commit TestIsZeroUDF? ;)" (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/188588 (owner: OliverKeyes)
[21:30:02] (CR) Ottomata: [C: -1] (WIP) project class/variant extraction UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/188588 (owner: OliverKeyes)
[21:31:12] (CR) Ottomata: [C: 2 V: 2] Assert lengths aren't negative [analytics/kafkatee] - https://gerrit.wikimedia.org/r/177152 (owner: CSteipp)
[21:40:11] I was granted Hive access at some point (https://phabricator.wikimedia.org/T85169) but now I need to log in. Can someone point me to the instructions?
[21:40:14] stat1003?
[21:42:59] stat1002.eqiad.wmnet
[21:43:17] this doc is out of date!
[21:43:17] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Access
[21:43:20] but the CLI Access is what you want
[21:43:28] looks like I'm in, and have access. thanks!
[21:43:38] yup! :)
[21:45:42] Is wmf_raw.webrequest not a thing anymore?
[21:45:55] I got an error FAILED: SemanticException [Error 10041]: No partition predicate found for Alias "webrequest" Table "webrequest"
[21:46:26] it is, but you have to specify a partition predicate
[21:46:36] but
[21:46:37] awight
[21:46:42] check out the wmf.webrequest table
[21:46:44] it is better :)
[21:46:56] it is a more efficient format, has no duplicates, and has some extra fields
[21:47:03] hmm... ewulczyn was using _raw for some reason
[21:47:04] you'll still have to give a partition predicate though
[21:47:06] yes
[21:47:10] I'll ask him later
[21:47:11] this is available as of a month ago
[21:47:13] about
[21:47:15] less than
[21:47:16] but ja
[21:47:18] partition predicate
[21:47:21] ok, great
[21:47:26] just means, you need to have at least one partition specified in your where clause
[21:47:28] where year > 0
[21:47:30] would query all data.
[21:47:31] but don't do that :)
[21:47:38] be as specific as you can
[21:47:40] How long is that webrequest data stored, by the way?
[21:47:50] I was meaning to ask you guys at some point.
[21:48:01] ottomata: I see, u happen to know how I can list the partitioning columns?
[21:48:12] the policy is no longer than 90 days, wmf_raw data is 30 days, wmf (refined) data hasn't been deleted yet (that is a to do for me) but goes back til jan 1 for now
[21:48:19] awight, i think the easiest way:
[21:48:22] show create table webrequest;
[21:48:26] you can see the partition columns there
[21:48:27] excellent
[21:51:37] aww, I still threw an OOM error. https://phabricator.wikimedia.org/T90635#1063926
[21:52:13] oops, still using wmf_raw. Does the order of condition terms make a difference?
[21:53:10] Sorry for the remedial questions. Should I limit the number of reduce tasks? > Number of reduce tasks not specified. Estimated from input data size: 999
[21:54:04] that is a pretty huge query, but ja, use wmf.webrequest, don't limit the number of reducers
[21:54:15] you are probably getting OOM before hive even launches the job
[21:54:20] still crashing :(
[21:54:21] because of all the partitions you are selecting
[21:54:24] FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. GC overhead limit exceeded
[21:54:25] awight:
[21:54:34] i don't think you need all webrequest_source partitions, do you?
[21:54:46] also, you can increase your client's heapsize
[21:54:47] do
[21:54:56] <_< /me reads about the webrequest_source column
[21:55:11] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries#Out_of_Memory_Errors_on_Client
[21:55:31] great
[21:57:08] awight: quick summary, webrequest_source delineates which varnish cluster the request came from. i'm not really sure where RecordImpressions go
[21:57:12] but it is probably only one of them
[21:57:17] maybe text?
[21:57:19] maybe bits?
[22:01:45] (CR) Ottomata: "Oh, also, if you put" [analytics/refinery] - https://gerrit.wikimedia.org/r/192363 (owner: Joal)
[22:12:01] ottomata: you got your slides from your kafka talk handy? I wanted to steal some stuff
[22:17:17] ja
[22:17:29] https://docs.google.com/a/wikimedia.org/presentation/d/1IfFBjD1pmXeIYWZ9FbQSMuI9jod2RbAnm6usD12ZhIk
[22:18:03] thx!
[22:30:32] ottomata, the webrequest data contains all view requests, i.e. including all hits on the caches etc.?
[22:30:56] yes
[22:42:40] (PS6) QChris: Add media file consumption reports [analytics/refinery] - https://gerrit.wikimedia.org/r/191118
[22:48:35] (CR) QChris: Add media file consumption reports (5 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/191118 (owner: QChris)
[22:50:34] ottomata: Did we already discuss to which webserver/URL I should sync the mediacounts?
[22:50:47] hm, uhHHh
[22:50:53] :-)
[22:50:53] it is very similar to pagecounts, ja?
[22:50:57] maybe it should go to dumps?
[22:51:06] It is similar to pagecounts.
[22:51:18] But didn't people say we should move away from dumps?
[22:51:38] (I basically do not care where they get synced to)
[22:51:53] You prefer dumps?
[22:52:14] i dunno
[22:52:15] i don't care either
[22:52:18] who's going to look for this?
[22:52:32] thus far, datasets has not been very public, as the data in there hasn't been super interesting to the public
[22:52:39] i think more folks will want this (e.g. GLAM)
[22:52:41] so maybe dumps?
[22:52:44] Mostly community will look for this.
[22:52:49] it is better managed than datasets
[22:52:54] esp. for community use
[22:53:05] i hadn't heard we should move away from dumps
[22:53:10] So dumps it is :-)
[22:53:21] we don't really have a policy of what belongs where (we probably will eventually)
[22:53:28] but this feels like a dumps dataset to me :)
[22:53:28] The discussions were around harmonizing where datasets live.
[22:53:39] aye, uh, i don't know :/
[22:53:40] Ok. Datasets it is.
[22:53:40] :)
[22:53:44] dumps?
[22:53:51] Argh. Sorry. Yes dumps.
[22:53:54] phew
[22:53:54] :)
[22:54:00] * qchris facepalms :-)
[22:55:43] cool, looks good to me!
[22:55:50] (CR) Ottomata: [C: 2 V: 2] Add media file consumption reports [analytics/refinery] - https://gerrit.wikimedia.org/r/191118 (owner: QChris)
[22:55:54] :)
[22:56:00] Whoa :-D
[22:56:09] Then I'll start deploying.
[22:56:13] Thanks.
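Note on the partition predicate exchange above (21:45 to 21:48): the advice there was to name at least one partition in the where clause and to be as specific as possible, for example a single webrequest_source rather than all of them. A minimal sketch of that idea follows, written in Python as a query-string builder. Everything in it beyond the table name, webrequest_source and year is an assumption: the month and day partition columns, the column names uri_host and uri_path, the example date, and the helper name build_webrequest_query are all hypothetical, and `show create table webrequest;` remains the way to check the real partition columns.

```python
from datetime import date


def build_webrequest_query(columns, webrequest_source, day, table="wmf.webrequest"):
    """Return a HiveQL string restricted to one source and one calendar day.

    Hypothetical helper. The month and day partition columns are assumed;
    the log only confirms webrequest_source and year.
    """
    if not columns:
        raise ValueError("select at least one column")
    # Keep the illustration safe: only plain alphanumeric source names.
    if not webrequest_source.isalnum():
        raise ValueError("unexpected webrequest_source value")
    # One predicate per partition column, so Hive can prune partitions
    # instead of scanning everything (as "where year > 0" would).
    predicates = [
        "webrequest_source = '{}'".format(webrequest_source),
        "year = {}".format(day.year),
        "month = {}".format(day.month),
        "day = {}".format(day.day),
    ]
    return "SELECT {} FROM {} WHERE {};".format(
        ", ".join(columns), table, " AND ".join(predicates)
    )


if __name__ == "__main__":
    # Example values only; the column names and date are made up.
    print(build_webrequest_query(["uri_host", "uri_path"], "text", date(2015, 2, 24)))
```

Narrowing to one source and one day keeps the number of selected partitions small, which is the same concern raised in the log about the client running out of memory before the job launches.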
[22:56:38] (CR) Gergő Tisza: [C: 2] Update schema version for MultimediaViewerNetworkPerformance [analytics/multimedia] - https://gerrit.wikimedia.org/r/192527 (https://phabricator.wikimedia.org/T89814) (owner: Gilles)
[22:56:50] (Merged) jenkins-bot: Update schema version for MultimediaViewerNetworkPerformance [analytics/multimedia] - https://gerrit.wikimedia.org/r/192527 (https://phabricator.wikimedia.org/T89814) (owner: Gilles)
[22:59:27] :)
[23:05:34] (PS3) Mforns: [WIP] [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251)
[23:08:14] (CR) jenkins-bot: [V: -1] [WIP] [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/192319 (https://phabricator.wikimedia.org/T89251) (owner: Mforns)
[23:41:49] ori: any news on the box for EL?