[07:35:03] Analytics-Tech-community-metrics, Developer-Relations: Who are the top 50 independent contributors and what do they need from the WMF? - https://phabricator.wikimedia.org/T85600#2238301 (Qgil)
[08:08:48] o/
[08:16:25] hi elukey :)
[08:24:06] joal: hi!
[08:24:16] I started my week looking at VCL code
[08:24:25] embedded into erb
[08:24:28] not a good idea
[08:24:40] insta brain damage
[08:24:42] mwahahaha :)
[08:35:22] joal: I was wondering if the solution that we used for https://yarn.wikimedia.org/ could be ok for the stat1001 maintenance
[08:35:48] it is not the best one but in the end you can see
[08:35:50] "Error: 404, Public access disabled. See https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Access#ssh_tunnel.28s.29 for access instructions"
[08:36:10] elukey: I don't really like the "technical problem" while it is maintenantce, but for a few hours it'll be very much ok :)
[08:36:18] we could use something like "Error: 503, Service temporarily down due to maintenance. Please check TXXXXX"
[08:36:29] elukey: That'd be perfect :)
[08:36:35] but it'll get displayed only on the very bottom
[08:36:42] the big central part will remain
[08:37:01] like for yarn's page
[08:37:08] Arf, so it is then :)
[08:37:40] don't love the solution but it may be much easier and performant than the other one
[08:38:08] no problem, I like 80-20 solutions :)
[08:55:46] joal: do you have a minute to listen to my ramblings and give me an advice?
[08:55:54] sure elukey
[08:55:59] thanks!
[08:56:15] so I found where the error page displayed for https://yarn.wikimedia.org/ is stored in puppet
[08:56:57] basically it is a static html page, and some Varnish VCL code reads it and appends in the end the specific error code
[08:57:02] in this case
[08:57:12] Error: 404, Public access disabled. See https://wikitech.wikimedia.
[08:57:13] ...
[08:57:36] the perfect solution would be to personalize the big error message in the middle
[08:58:26] but it might be a bit hard to convince everybody about this change (and also technically difficult as far as I can tell)
[08:58:38] so I am thinking about changing it to
[09:00:11] "Our servers are under maintenance or experiencing technical difficulties. This is probably temporary and should be fixed soon. Please check the error message at the bottom of this page for more information and try again in a few minutes.
[09:01:20] a user will notice the error message imho
[09:01:20] elukey: That message is more generic and therefore applies better to more cases, but I don't know how ops people prefers
[09:01:44] elukey: I agree that making sure th
[09:01:54] e user noticves the error message at the bottom is good
[09:02:55] thanks :)
[09:03:07] maybe there is a way to add another page only for maintenace
[09:03:36] even if having only one is better, the current one is rather confusing
[09:03:45] elukey: I don't know, but if so, it'xs a matter of choice I guess :)
[09:04:14] agreed on that, rather confusing and not generic enough for our case
[09:29:29] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun-2016): Check whether mailing list activity per person on korma is in sync with current "mlstats_mailing_lists.conf" - https://phabricator.wikimedia.org/T132907#2213805 (Qgil) This task welcomes an owner.
[10:05:27] joal: https://gerrit.wikimedia.org/r/#/c/285363/1
[10:05:56] makes sense?
[10:35:26] elukey: makes complete sense to me :)
[10:35:59] \o/
[10:36:03] good :)
[10:36:12] let's wait Brandon and see
[11:51:42] @nehanarkhede: @LinkedIn's use of @apachekafka:1.4 trillion msg/day, 1400 brokers. Powers database replication, change capture etc
[11:52:03] joal: --^
[12:36:14] Analytics: analytics specific icinga alerts should ping in our IRC channel. - https://phabricator.wikimedia.org/T125128#2239187 (elukey) p:Normal>Low
[12:44:41] wikimedia/mediawiki-extensions-EventLogging#551 (wmf/1.27.0-wmf.22 - e6e8696 : Antoine Musso): The build has errored.
[12:44:42] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/e6e86960047f
[12:44:42] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/125820411
[12:45:49] what the hell is this? :D
[12:46:12] ahhhh nice!
[12:47:45] elukey: :)
[12:51:17] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Use MySQL as Hue data backend store - https://phabricator.wikimedia.org/T127990#2239209 (elukey) Adding a blocker: the connection between Hue and metastore (an1027 -> an1003) needs to be encrypted with TLS: http://www.cloudera.com/documentation/...
[12:52:41] Hi mobrovac, let me know if you find some time to investigate with me :)
[12:53:49] joal: in a self-review training now, and then meetings meetings meetings
[12:53:52] :/
[12:54:01] okey, tomorrow maybe then :)
[12:54:02] joal: i'll try to look at it when i have 5 mins
[12:54:06] morb
[12:54:35] mobrovac: thanks, but don't put too much pressure: it can wait tomorrow
[12:55:11] joal: i suspect it's about type management in the sqlite back-end module
[12:55:20] but need to look into it deeper
[12:55:41] mobrovac: I'd have add suspected the opposite: type management in cassandra module :)
[12:55:56] mobrovac: sqlite returns a number (kindof exepcted)
[12:56:04] We'll discuss that later
[12:56:18] yup
[13:06:07] (PS1) Joal: Add refinery_directory prop to every oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/285388
[13:27:29] Analytics-Cluster, Operations: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2239350 (faidon) I don't think archiva is a runtime dependency on anything — but fyi, @Ottomata.
[13:30:05] Analytics-Cluster, Operations: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2239356 (MoritzMuehlenhoff) I'm on this. There's already a replacement VM (meitnerium) with archiva installed, the next is the migration of /var/lib/archiva from the titan...
[13:55:10] joal: comment about https://gerrit.wikimedia.org/r/#/c/285388/1/oozie/aqs/hourly/coordinator.properties
[13:56:06] would it be worth to add a comment about what are the downsides of keeping "current"?
[13:56:35] or something like 'if you keep this value then this might happen, beware'
[13:56:56] elukey: if you think it's necessary :)
[13:59:27] ahahhaaha don't sayyy it in this way!!
[14:00:05] it was only a suggestion for people like me that are hated by oozie
[14:00:13] huhuhu :)
[14:01:00] o/ joal
[14:01:02] batcave?
[14:01:09] halfak: sure !
[14:01:58] halfak: are you there?
[14:41:23] joal, FYI, that analytics-store machine is intended to be used for big, long-running metrics queries so don't hesitate to experiment.
[14:41:37] awesome halfak !
[14:56:10] (PS1) Joal: Normalize oozie job names (bundles, coords, wfs) [analytics/refinery] - https://gerrit.wikimedia.org/r/285400
[15:00:18] Analytics-Kanban, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2239690 (Nuria) p:Triage>High
[15:02:01] Analytics-Kanban, DNS, Operations, Traffic: Create analytics.wikimedia.org - https://phabricator.wikimedia.org/T132407#2197243 (Nuria) @BBlack : can you confirm whether is OK with ops to deploy this domain to 1001?
[15:06:07] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2239717 (Ladsgroup)
[15:08:00] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2239761 (Ladsgroup)
[15:18:17] joal: if you have time, https://etherpad.wikimedia.org/p/analytics-aqs-cassandra
[15:18:40] not sure if I forgot something
[15:45:01] (CR) Nuria: "Can you describe a bit more in the commit message why is this needed/how to use it?" [analytics/refinery] - https://gerrit.wikimedia.org/r/285388 (owner: Joal)
[15:52:17] Analytics-Kanban: Unique Devices javascript node module - https://phabricator.wikimedia.org/T133184#2239931 (mforns) I've created a pull request to Tomas Steiner's repo. https://github.com/tomayac/pageviews.js/pull/7
[15:55:22] (CR) Nuria: "LGTM" [analytics/refinery] - https://gerrit.wikimedia.org/r/285400 (owner: Joal)
[16:00:31] elukey: sorry, currently updateing etherpad
[16:01:57] nuria_: standup?
[16:02:09] a-team: i have mangers meeting
[16:02:18] ok !
[16:02:19] if it is cancelled i shall join standup
[16:02:27] thus far looks like it is happening
[16:02:28] starting without you :)
[16:24:22] (PS2) Joal: Add refinery_directory prop to every oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/285388 (https://phabricator.wikimedia.org/T133206)
[17:26:39] I'm running few relatively simple queries in hanalytics-store and they are taking a very long time. Does anyone know if things are slow for a specific reason?
[17:33:24] nuria_: if you have time (even later on) can we talk about the error page for stat1001?
[17:33:42] elukey: sure, it is fine as is if brando is good with it
[17:33:46] *brandon
[17:34:22] all right good, thanks :)
[17:34:48] nuria_: do you also think that 2 days of notice would be good for scheduling the stat1001 downtime?
[17:34:55] (announcing it to $mainling_lists)
[17:35:28] elukey: we probably need more cause think that with TZ two days is normally one day by thetime someone in a different tz sees it
[17:36:07] all right so something like next monday?
[17:36:24] nuria_: --^
[17:36:50] elukey: that sounds ok, will work on out of service banner
[17:37:39] nuria_: super! Probably a reference to the phab task would be super useful (in the error message)
[17:39:05] elukey: that cannot hurt, but it should be on the "custom" msg correct?
[17:39:10] elukey: not teh generic one
[17:42:16] nuria_: yes yes correct, we are not going to modify the "big" one since it would require more work
[17:42:50] elukey: debrief / planning on cassandra tomorrow, while still fresh ?
[17:45:45] joal: +1!
[17:45:53] cool :)
[17:46:15] for various degrees of "fresh", but yeah
[17:46:16] :)
[17:46:20] huhuhu
[17:46:33] elukey: morning, 11am?
[17:47:10] yep, fine for me!
[17:47:17] ok, sending
[17:48:43] nuria_: last thing - should I write the downtime email, send it to you to double check and then proceed?
[17:53:28] elukey: any of us can double check
[17:53:47] all right!
[17:53:47] elukey: should be sent to wikitech-l , analytics and likely engineering (this last list is internal)
[17:55:19] a-team I'm off for tonight
[17:55:25] See y'all tomorrow !
[17:55:28] joal, bye!
[18:05:29] Analytics: Searching in dashiki for nl... does not bring nlwikipedia only nlwikidata - https://phabricator.wikimedia.org/T133718#2240530 (Nuria)
[18:07:53] (CR) Nuria: [C: 2 V: 2] Add refinery_directory prop to every oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/285388 (https://phabricator.wikimedia.org/T133206) (owner: Joal)
[18:16:06] a-team: logging off for today!
[18:16:08] byyeeeeeee o/
[18:57:25] Analytics, Discovery, Maps, RESTBase-Cassandra, Patch-For-Review: Investigate and implement possible simplification of Cassandra Logstash filtering - https://phabricator.wikimedia.org/T130861#2240802 (Eevans) p:Lowest>Normal
[19:01:38] mforns_brb: where do you think the config for out of service be: https://meta.wikimedia.org/wiki/Dashiki:DefaultDashboard?
[19:03:09] mforns_brb: or here: https://meta.wikimedia.org/wiki/Dashiki:CategorizedMetrics
[19:18:07] (PS2) Nuria: [WIP] Add out of service banner to dashiki [analytics/dashiki] - https://gerrit.wikimedia.org/r/285255
[19:26:05] mforns_brb: nevermind , it has to be per-dashboard as all of them request different config files
[19:30:18] mforns_brb: I can also create a global config setting for all dashboards, that seems best, we ill need to re-deploy content translation too
[19:33:59] hi nuria_
[19:34:06] mforns: hola
[19:34:27] mforns: was talking to the void, i know, let me know if you have any ideas
[19:34:58] nuria_, hehe, yes it seems we have to add the config to all dashboard configs and also deploy the new version to all of them
[19:35:09] mforns: i was thinking that rather we could
[19:35:27] 1) add a global out-of-service config: https://meta.wikimedia.org/wiki/Dashiki:OutOfService
[19:35:35] 2) deploy new dashiki code every where
[19:35:51] it is unlikely that some dashboards will be on while others are off
[19:36:06] as they require data from datasets
[19:36:17] does that sound ok mforns ?
[19:36:30] nuria_, one question
[19:36:46] when 1001 is down, no reports will be showing right?
[19:37:03] mforns: all data requests will return 404
[19:37:04] because no file will be retrieved from 1001
[19:37:14] so, the dashboards will be all empty
[19:37:21] mforns: but, for example, pageview metric doesn't need 1001
[19:37:28] mforns: not empty, but defective
[19:37:37] ok
[19:37:46] so all dashboards except vital signs -> pageviews
[19:38:13] why don't we use a static page and point all dns to that?
[19:38:14] mforns: right, depending on caching headers some other things might work
[19:38:39] ah
[19:39:02] Analytics-Kanban: Out of service banner in dashiki - https://phabricator.wikimedia.org/T133736#2241050 (Nuria)
[19:39:47] all our labs domains?
[19:39:58] yes
[19:40:14] mforns: that seems that it would be more work right?
[19:41:19] nuria_, I have never changed domains in wmf's dns server
[19:41:20] mforns: let's see
[19:41:24] isn't that easy?
[19:41:40] mforns: we would need to change the config for every virtual host in our dashiki instance
[19:42:29] mforns: i guess we could deploy a static page via phab and commit that page to the dashiki depot so it is available for reuse
[19:42:37] nuria_, I was thinking of pointing browser-reports.wmflabs.org (and all other domains) to a static page deployed somewhere
[19:42:45] mforns: how?
[19:43:47] I'm not sure where you specify that browser-reports.wmflabs.org is served from dashiki-01.eqiad.wmflabs
[19:43:55] mforns: without deploying apache config that is taht handles the redirect
[19:43:56] is that the dashiki virtual host config?
[19:44:29] mforns: on hiera/puppet/apache
[19:44:57] I see
[19:45:42] mforns: also going forward seems easier to handle outages with config rather than deployments
[19:45:56] mforns: when browser reports is hosted on analytics.wikimedia.org
[19:46:12] nuria_, totally, adding the banner to dashiki is a lot better
[19:46:19] mforns: waht you are suggesting will happen cause varnish will source (with elukey's changes ) teh error page
[19:46:21] *the
[19:46:26] I was just thinking of a faster way, but it seems it wouldn't be faster
[19:46:35] mforns: banner looks rustic to say the least.. ahem
[19:47:12] mforns: https://www.dropbox.com/s/e86fy358fhy7oy0/Screen%20Shot%202016-04-26%20at%2012.15.48%20PM.png?dl=0
[19:47:47] nuria_, this is totally fine, it does what is needed
[19:48:00] mforns: my css skills are below pityful
[19:48:04] whatever is below that
[19:48:48] nuria_, what I didn't understand is if we go with 1) add a global out-of-service config,
[19:48:52] Quarry: Quarry task running for a while - https://phabricator.wikimedia.org/T133738#2241097 (Ankit-Maity)
[19:48:58] mforns: aham
[19:48:58] then all dashboards will show the same layout?
[19:49:00] Quarry: Quarry task running for a while - https://phabricator.wikimedia.org/T133738#2241109 (Ankit-Maity) p:Triage>Normal
[19:49:10] mforns: no they will show the same "banner"
[19:49:23] mforns: as banner gets added to every layout
[19:50:00] nuria_, but the same config wiki would be responsible to say which graphs the dashboard shows no?
[19:50:18] wouldn't each dashboard show the same graphs?
[19:50:40] mforns: no, if you have one gloabl config you add teh component per layout, see:
[19:50:58] (PS3) Nuria: [WIP] Add out of service banner to dashiki [analytics/dashiki] - https://gerrit.wikimedia.org/r/285255 (https://phabricator.wikimedia.org/T133736)
[19:51:12] * mforns looks
[19:54:23] nuria_, I see
[19:54:35] mforns: does that make sense?
[19:54:44] nuria_, but there's something I don't understand still
[19:54:52] mforns: ajam
[19:55:05] sorry it will be for sure a stupidity from my side...
[19:55:08] batcave?
[19:56:04] mforns: jaja, wait, give me a sec
[19:57:51] omw
[19:57:57] ok
[20:57:45] (PS4) Nuria: [WIP] Add out of service banner to dashiki [analytics/dashiki] - https://gerrit.wikimedia.org/r/285255 (https://phabricator.wikimedia.org/T133736)
[20:58:24] (CR) Nuria: [C: -1] "Still WIP, validated approach with mforns." [analytics/dashiki] - https://gerrit.wikimedia.org/r/285255 (https://phabricator.wikimedia.org/T133736) (owner: Nuria)
[22:13:26] (PS1) Alex Monk: Fix up deployment-prep scap config [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/285535
[22:17:20] (PS2) Alex Monk: Fix up deployment-prep scap config [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/285535 (https://phabricator.wikimedia.org/T132267)
[22:25:08] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2241542 (Krenair)