[00:42:29] Analytics, EventBus, Services: Check eventbus Kafka cluster settings for reliability - https://phabricator.wikimedia.org/T144637#2610073 (Pchelolo) Another potential improvement on the reliability front might be improving the extension. Right now if connection to the producer service times out, or wr...
[08:08:45] Analytics, Datasets-General-or-Unknown, Services: Many error 500 from pageviews API "Error in Cassandra table storage backend" - https://phabricator.wikimedia.org/T125345#1985043 (elukey) Adding a bit more info about the new cluster for @Nemo_bis. The new hosts' SSDs allowed us to load Cassandra with...
[08:23:18] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Patch-For-Review: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2610343 (elukey) Status update: We have re-imaged all the Cassandra hosts (aqs100[456]) with the new RAID sett...
[08:41:54] (PS1) Addshore: Fix wikidata-analysis/metrics.php -> dumpScanProcessing.php [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/308690
[08:43:52] (CR) Sjoerddebruin: [C: 1] Fix wikidata-analysis/metrics.php -> dumpScanProcessing.php [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/308690 (owner: Addshore)
[08:43:55] (PS1) Addshore: Fix wikidata-analysis/metrics.php -> dumpScanProcessing.php [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/308691
[08:44:00] (CR) Addshore: [C: 2] Fix wikidata-analysis/metrics.php -> dumpScanProcessing.php [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/308691 (owner: Addshore)
[08:44:04] (CR) Addshore: [C: 2] Fix wikidata-analysis/metrics.php -> dumpScanProcessing.php [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/308690 (owner: Addshore)
[08:44:07] (Merged) jenkins-bot: Fix wikidata-analysis/metrics.php -> dumpScanProcessing.php [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/308691 (owner: Addshore)
[08:44:10] (Merged) jenkins-bot: Fix wikidata-analysis/metrics.php -> dumpScanProcessing.php [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/308690 (owner: Addshore)
[08:52:17] (PS1) Addshore: Fix example in README [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/308693
[08:52:29] (CR) Addshore: [C: 2] Fix example in README [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/308693 (owner: Addshore)
[08:53:08] (CR) Addshore: [V: 2] Fix example in README [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/308693 (owner: Addshore)
[09:02:36] (PS1) Addshore: Remove pom.xml that is in the wrong place [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/308694
[09:02:47] (CR) Addshore: [C: 2 V: 2] Remove pom.xml that is in the wrong place [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/308694 (owner: Addshore)
[09:05:52] (PS1) Addshore: Bump some dep versions [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/308695
[09:07:05] (PS1) Addshore: New Build [analytics/wmde/toolkit-analyzer-build] - https://gerrit.wikimedia.org/r/308696
[11:54:25] Taking a break a-team, see you later
[12:17:41] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016), Documentation: Create basic/high-level Kibana (dashboard) documentation - https://phabricator.wikimedia.org/T132323#2610763 (Qgil)
[12:20:05] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016): Identify Wikimedia's most important/used info panels in korma.wmflabs.org - https://phabricator.wikimedia.org/T132421#2610770 (Qgil) What about you and me meeting half hour during some European morning in order to go through the list?...
[13:18:40] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Patch-For-Review: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2610933 (Nemo_bis) Server activity for the curious: https://ganglia.wikimedia.org/latest/graph.php?r=month&z=xl...
[13:23:14] good morninnggg
[13:24:40] joal: hiii, can you explain to me the camus thing? Why does it fail during high cluster load times?
[13:24:48] i would hope that the entire run would fail
[13:24:52] and the offsets wouldn't be saved at all
[13:25:26] heh, you have a [1] note in your email but no [1] link! :)
[13:26:00] i think I remember/understand the per topic issue, but, not really
[13:36:57] Analytics, EventBus, Services: Check eventbus Kafka cluster settings for reliability - https://phabricator.wikimedia.org/T144637#2610991 (Ottomata) I like async rabbit idea in general; I did a similar thing for a message delivery system at my last job. There, we did an INSERT DELAYED query that inse...
[13:47:48] ottomata: o/
[13:49:25] hiayaa
[13:50:47] ottomata: qq - how the analytics-privatedata-users group is checked for hdfs' private data?
[13:51:44] ?
[13:51:55] how are users checked to see if they can read?
[13:52:13] yes sorry :)
[13:52:19] group membership on namenode
[13:52:28] ahh okok
[13:52:30] super
[13:52:36] :)
[13:53:25] Hi ottomata :)
[13:54:06] I have sent a code review to add a virtual host on stat1001 to access yarn
[13:54:56] ottomata: I can't explain why camus has issues with writing history files when cluster is loaded
[13:55:03] ottomata: However I have observed it :)
[13:55:41] aye :)
[13:55:51] that's pretty nasty
[13:56:51] ottomata: I didn't investigate, but if camus has no retry policy at hdfs failure, then what we observe can happen
[13:57:53] ottomata: The problem comes more from the checker than camus itself: camus fails to write history, and therefore will retrieve again the same indices at next run, but given that we deduplicate, that's no bother
[13:58:12] ottomata: Problem was because of partition checker not flagging if any error
[13:58:49] oh, ok, sorry, so camus is doing the right thing
[13:59:01] if it fails it fails hard and retries from the right place
[13:59:17] i think you explained this to me before too, ok, let's talk more in ops sync up after standup
[13:59:41] sure ottomata
[14:00:14] ottomata: that patch for the checker has been in my todo for months, it's good it finally got done :)
[14:39:02] (PS3) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[14:49:15] elukey: https://github.com/implydata/imply-pivot
[14:58:16] (CR) Nuria: "Thanks for doing the changes, just some small comments." (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/308532 (https://bugzilla.wikimedia.org/144716) (owner: Joal)
[14:59:45] nuria_: thanks! This morning it was still down, worth following up with them
[14:59:51] to know a definitive answer
[15:01:33] Analytics, Operations: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2611221 (Milimetric) That was after I updated the password, @Nuria, we'll see if others still have problems.
[15:26:05] joal: a-batcave-2
[15:26:05] ?
[15:26:07] ottomata: https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave-2
[15:26:20] danke
[15:40:54] (CR) Ottomata: [C: 1] Make camus partition checker more resilient [analytics/refinery/source] - https://gerrit.wikimedia.org/r/308532 (https://bugzilla.wikimedia.org/144716) (owner: Joal)
[15:41:57] joal: here we go:
[15:41:57] https://wikitech.wikimedia.org/wiki/Analytics/DataStore/Evaluation
[15:42:02] data lake/store candiate page
[15:42:05] candidate
[15:53:33] ottomata: do you have 5 mins?
[15:54:10] ja
[15:54:34] so the yarn.w.o vhost is up and running on stat1001
[15:54:42] and via curl it seems working
[15:54:52] but I'd like to test basic auth via browser
[15:55:03] I tried with an ssh tunnel but I keep incurring in issues
[15:55:16] so I think my brain is not working correctly atm
[15:55:19] any suggestion? :)
[15:55:21] ha, port? i will try
[15:55:34] port 80 on stat1001 as the others
[15:55:47] mmm sorry checking
[15:56:34] yes port 80
[15:56:58] I could have used another one and the trick would have been easier probably
[15:57:12] hmmm, i am getting redirected to https, maybe that's the problem?
[15:57:16] oh nuria_ I forgot, I'll write imply folks a question on their google group forum
[15:57:58] ottomata: sure that it is not the current yarn.w.o? I am not getting redirected
[15:58:03] but getting 401s
[15:58:07] (at least via curl)
[15:58:30] hm, pretty sure, i made my /etc/hosts send yarn.wikimedia.org to localhost
[15:58:32] and have
[15:58:34] tunnel as
[15:58:46] hmm, oh wait, maybe tunnel bad
[15:59:01] nope, same
[15:59:03] my tunnel is
[15:59:09] ssh -N stat1002.eqiad.wmnet -L 8082:stat1001.eqiad.wmnet:80
[15:59:25] then going to http://yarn.wikimedia.org:8082 in browser gives me
[15:59:30] yarn.wikimedia.org sent an invalid response.
[15:59:30] ERR_SSL_PROTOCOL_ERROR
[15:59:32] https://groups.google.com/forum/#!topic/imply-user-group/Vz2XTJT144Y
[15:59:43] elukey: but via curl i see auth error
[15:59:50] 401
[16:00:25] ottomata: I used the same curl :P
[16:00:30] sorry same tunnel :D
[16:00:33] :)
[16:02:53] It seems that you are hitting the main w.o page though with http://yarn.wikimedia.org:8082
[16:03:18] is every req of the browser tunneled?
[16:04:35] hm, well, my yarn.wikimedia.org resolves to 127.0.0.1
[16:04:46] it should anyway...
[16:05:21] Analytics, Reading-analysis: [REQUEST] Extract search queries from HTTP_REFERER field for a Wikibook - https://phabricator.wikimedia.org/T144714#2611328 (Milimetric)
[16:05:38] elukey: you are getting the 401 in the browser?
[16:06:02] no I am not able to reach yarn.. even with /etc/hosts modified
[16:06:03] mmmmm
[16:06:50] hm, i think you are right, i don't see my request in apache logs on stat1001
[16:08:04] wikimedia/mediawiki-extensions-EventLogging#596 (wmf/1.28.0-wmf.18 - 8ae667f : Chad Horohoe): The build has errored.
[16:08:04] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/8ae667f52f92
[16:08:04] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/157929811
[16:08:44] elukey: i see my request in analytics_access.log
[16:08:59] dunno why though, but i guess that's the default vhost
[16:09:54] so somehow my header is not set properly...i think
[16:11:02] elukey: dunno what's going on, but it won't hurt to proceed anyway
[16:11:12] the site isn't up, and clearly the auth is doing something
[16:11:21] i mean
[16:11:21] it isn't being used by anyone, since it is disabled
[16:11:33] so you won't break it for anyone
[16:14:29] so I just used curl localhost:8088 --header "Host: yarn.wikimedia.org" --user elukey:blalba --verbose
[16:14:42] and got
[16:14:43] [Tue Sep 06 16:13:15.115575 2016] [auth_basic:error] [pid 23780] [client 10.64.21.101:39314] AH01617: user elukey: authentication failure for "/": Password Mismatch
[16:14:54] that is expected since my pass is not right :)
[16:16:08] so I am almost sure that it is doing its work, just wanted to check a valid login before proceeding
[16:33:55] headed to cafe, back before staff
[17:16:08] hola nuria_. I'm getting back to Lars re "Getting search engine terms for specific wkibook"
[17:35:04] Analytics, Spike: Spike: Evaluate alternatives to varnishkafka: varnishevents - https://phabricator.wikimedia.org/T138426#2399826 (Ottomata) I don't understand how varnishevent could replace varnishkafka. varnishevent just seems like a better varnishncsa. There is no Kafka support; we'd have to build it.
[17:45:25] Analytics-Kanban, EventBus, Wikimedia-Stream: Public Event Streams - https://phabricator.wikimedia.org/T130651#2141764 (BBlack) Re: public interface hostname, can we pick a new one separate from `stream.wikimedia.org`? It looks so far like we're going to face issues deprecating insecure (non-TLS) acc...
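A minimal sketch of the Camus behaviour joal describes between 13:54 and 14:00: if the history write fails, the offsets are not advanced, so the next run re-reads the same Kafka messages, and deduplication makes the re-read harmless. `ToyImporter`, its fields, and keying output by `(partition, offset)` are hypothetical illustrations, not Camus or refinery code; the log does not say what key the real deduplication uses.

```python
from dataclasses import dataclass, field


@dataclass
class ToyImporter:
    """Hypothetical model of an at-least-once Kafka -> HDFS import.

    Offsets are committed only when the run's history write succeeds, so a
    failed write makes the next run re-read from the last committed offset.
    """
    committed: dict = field(default_factory=dict)  # partition -> next offset to read
    stored: dict = field(default_factory=dict)     # (partition, offset) -> message

    def run(self, topic_log, history_write_ok=True):
        pending = {}
        for partition, messages in topic_log.items():
            start = self.committed.get(partition, 0)
            for offset in range(start, len(messages)):
                # Keying output by (partition, offset) means re-importing the
                # same message overwrites it instead of duplicating it.
                self.stored[(partition, offset)] = messages[offset]
            pending[partition] = len(messages)
        if history_write_ok:
            # Only a successful history write advances the offsets.
            self.committed.update(pending)
        return history_write_ok
```

Calling `run(log, history_write_ok=False)` and then `run(log)` leaves the stored output unchanged in size while the offsets advance only on the second call, which is the "fails hard and retries from the right place" property ottomata summarises at 13:59.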
[17:46:49] Analytics-Kanban: Add hardware capacity to AQS - https://phabricator.wikimedia.org/T144833#2611862 (Nuria)
[17:46:51] Analytics-Kanban, EventBus, Wikimedia-Stream: Public Event Streams - https://phabricator.wikimedia.org/T130651#2611861 (Pchelolo) >>! In T130651#2590955, @Ottomata wrote: > Interesting! @Pchelolo can you link to code? Would like to be inspired :) The whole file is quite relevant, but this is the m...
[17:48:47] Analytics-Kanban, EventBus, Wikimedia-Stream: Public Event Streams - https://phabricator.wikimedia.org/T130651#2611875 (Ottomata) Can do, I'm sure we can think of something :) I love bikesheds.
[17:58:30] joal: you said we needed to chat about the Option problem?
[17:58:44] milimetric: Would be a good idea yes
[17:59:00] milimetric: joining back !
[17:59:39] Analytics: Investigate lowering "per-article" resolution data in AQS - https://phabricator.wikimedia.org/T144837#2611931 (Nuria)
[17:59:45] ottomata: I used a chrome plugin to test yarn, all works :) filed a code review for VCL: https://gerrit.wikimedia.org/r/308776
[18:00:00] Analytics, EventBus, Services: Check eventbus Kafka cluster settings for reliability - https://phabricator.wikimedia.org/T144637#2611944 (GWicke) See {T120242} for earlier discussion on the same topic. We discussed writing events to an internal table as part of the primary transaction, followed by an...
[18:00:22] great :)
[18:00:41] * elukey goes afk!
[18:00:43] byyyeeeeee
[18:01:08] byyyee
[18:01:09] !
[18:01:23] bye elukey !
[18:01:44] Analytics-Kanban: Add hardware capacity to AQS - https://phabricator.wikimedia.org/T144833#2611946 (Nuria) Current capacity of AQS/Pageview API is documented here: https://wikitech.wikimedia.org/wiki/Analytics/AQS#Scaling:_Settings.2C_Failover_and_Capacity_Projections We know that at our current resolution...
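For reference on the yarn.wikimedia.org vhost test (15:54 to 17:59): elukey's curl call reached the new name-based virtual host by connecting to a local port and setting the `Host` header explicitly, with `--user` supplying basic auth credentials. The following is a minimal Python sketch of the same request; `build_vhost_request` is a hypothetical helper, and the port, user, and password are placeholders, not values from the real setup.

```python
import base64
import urllib.request


def build_vhost_request(port, vhost, user, password, path="/"):
    """Build a GET for a service reached through a local port (for example
    an ssh -L tunnel), selecting the Apache name-based virtual host with an
    explicit Host header and sending HTTP basic auth credentials, similar to
    curl --header "Host: ..." --user user:pass."""
    req = urllib.request.Request(f"http://localhost:{port}{path}")
    # Apache picks a name-based vhost from the Host header rather than from
    # the address we connect to; urllib only derives Host from the URL when
    # no Host header has been set explicitly.
    req.add_header("Host", vhost)
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req
```

Passing the result to `urllib.request.urlopen` should raise `urllib.error.HTTPError` with `code == 401` when the credentials are rejected, which would correspond to the `AH01617 ... Password Mismatch` entry elukey saw in the Apache error log.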
[18:02:02] ottomata: new capacity ticket: https://phabricator.wikimedia.org/T144833
[18:02:26] a-team: future goal regarding lowering resolution: https://phabricator.wikimedia.org/T144837
[18:02:34] please amend as needed.
[18:02:48] milimetric: will push datepicker change in a bit.
[18:04:10] leila: back now.
[18:04:23] nothing specific, nuria_. I responded on Analytics.
[18:04:34] leila: ah ok, yes
[18:05:21] Analytics, Reading-analysis, Research-and-Data, Research-consulting: Propose metrics along with qualifiers for the press kit - https://phabricator.wikimedia.org/T144639#2611969 (leila) @Nuria yes, this is Research work. We will ping you if we need help. Thank you! :)
[18:12:24] (PS4) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[18:12:29] milimetric: --^
[18:12:53] yep, thx
[18:14:48] (CR) jenkins-bot: [V: -1] [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[18:15:50] Analytics-Kanban: Add hardware capacity to AQS - https://phabricator.wikimedia.org/T144833#2611999 (Ottomata) Ok, some HW context! We recently added 3 new AQS nodes (aqs100[456]) to replace 3 OOW ones (aqs100[123]). We ordered these nodes with 8 very large SSDs, with the intention of increasing capacity by...
[18:16:25] milimetric: I'm sorry I pushed too fast :(
[18:16:58] milimetric: Some mess in file EditHistory.scala, line 166 (the discussion we were having around initialisers)
[18:18:55] leaving for now a-team, see you tomorrow P1
[18:18:55] P1 Test paste - https://phabricator.wikimedia.org/P1
[18:19:14] nite joal, no worries, I'll clean up whatever
[18:19:17] rhooo, didn't know about that :)
[18:19:26] stashbot and Ps :)
[18:19:28] :)
[18:19:30] later !!
[18:20:18] Analytics, Reading-analysis, Research-and-Data, Research-consulting: Propose metrics along with qualifiers for the press kit - https://phabricator.wikimedia.org/T144639#2605965 (leila) a:ezachte>leila
[18:20:44] Analytics, Reading-analysis, Research-and-Data, Research-consulting: Propose metrics along with qualifiers for the press kit - https://phabricator.wikimedia.org/T144639#2605965 (leila) Grabbing the task from Erik as I'll be drafting the proposal based on Erik's input shared late last week.
[18:33:07] Analytics, Operations: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2612049 (Tbayer) Thanks - I should clarify that the bug was about the user "piwik" instead of "piwik-wmf-admin", and the password for the former still doesn't work. I am fine with using the admi...
[20:09:03] Analytics-Kanban: Reduce rate of events coming from datepicker - https://phabricator.wikimedia.org/T144856#2612479 (Nuria)
[20:09:58] (PS1) MaxSem: Ensure all metrics are logged with the same time [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/308816 (https://phabricator.wikimedia.org/T144652)
[20:11:51] (PS1) Nuria: Reduce rate of events coming from datepicker [analytics/dashiki] - https://gerrit.wikimedia.org/r/308819 (https://phabricator.wikimedia.org/T144856)
[20:12:58] milimetric: will push datepicker fix and legan fix once CR-ed, i added a task with 1 point
[20:13:34] nuria_: there's a debugger; on line 43 of that last patch
[20:19:52] (CR) Milimetric: [C: -1] "apply no longer works with this configuration. I think the blur handler is always firing. Also, you'd have to save the last date selecti" [analytics/dashiki] - https://gerrit.wikimedia.org/r/308819 (https://phabricator.wikimedia.org/T144856) (owner: Nuria)
[20:31:29] milimetric: arghhhh
[20:31:33] milimetric: so true
[20:32:02] (PS2) Nuria: Reduce rate of events coming from datepicker [analytics/dashiki] - https://gerrit.wikimedia.org/r/308819 (https://phabricator.wikimedia.org/T144856)
[20:36:22] nuria_: take a look at my comment on the patch, there are a couple functionality issues too
[20:36:44] milimetric: argh, yes, you are right. bad me for not testing this well enough
[20:40:13] np, I have a nasty headache though, I'm gonna go lie down. I'll take a look later if you solve it, but it looked tricky
[20:43:05] Analytics-Kanban, EventBus, MW-1.28-release-notes, Patch-For-Review, WMF-deploy-2016-08-16_(1.28.0-wmf.15): Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2612639 (Nuria)
[20:43:07] Analytics-Kanban: Wikistats 2.0. Edit Reports: Setting up a pipeline to source Historical Edit Data into hdfs {lama} - https://phabricator.wikimedia.org/T130256#2612640 (Nuria)
[20:43:09] Analytics-Kanban, EventBus, MW-1.28-release-notes, Patch-For-Review, WMF-deploy-2016-08-23_(1.28.0-wmf.16): Update MediaWiki hooks to generate data for new event-bus schemas - https://phabricator.wikimedia.org/T137287#2612638 (Nuria) Open>Resolved
[21:24:05] joal, FYI: https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FData%2FPageview_hourly%2FSanitization&type=revision&diff=819735&oldid=390304
[22:08:30] Analytics, MediaWiki-API, Reading-Infrastructure-Team: Add pageview stats to the action API - https://phabricator.wikimedia.org/T144865#2612941 (Tgr)
[23:09:28] (PS5) Milimetric: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903
[23:10:54] (CR) jenkins-bot: [V: -1] [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[23:11:23] (CR) Milimetric: "Joal it turned out the other change didn't have the from*Row methods, they were all added here, so I just patched this. The thing is, Opt" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[23:31:25] (CR) Yurik: [C: 1] Ensure all metrics are logged with the same time [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/308816 (https://phabricator.wikimedia.org/T144652) (owner: MaxSem)
[23:48:08] Analytics-Kanban: Add hardware capacity to AQS - https://phabricator.wikimedia.org/T144833#2611862 (EBernhardson) I don't know if it would be relevant, but we recently decomissioned 16 servers from the elasticsearch cluster due to being out of warenty. These machines had 32 SSD's (@300GB each iirc). I'm not...