[00:46:35] Analytics: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2724322 (Zebulon84) Then is it possible for WikiMedia to turn off this compatibility mode with the « header('X-UA-Compatible: IE=edge'); » solution suggested on the stackoverflow link ?
[05:51:35] !log created the oozie coordinator 0035415-160922102909979-oozie-oozi-C for webrequest-load-check_sequence_statistics-wf-upload-2016-10-18-[234]
[06:16:48] !log created the oozie coordinator 0035451-160922102909979-oozie-oozi-C for webrequest-load-check_sequence_statistics-wf-upload-2016-10-18-5
[07:23:49] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2724574 (jcrespo) > Motivation. For our use cases in labs, analytics, and dumps, it would be nice if there was a real-time and safe-for-public-consumption replica in production. All good here, but I had to stop re...
[07:41:10] Analytics-Kanban, Operations, Traffic, Patch-For-Review: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2724580 (elukey) I tried to compare a Miley Cyrus link logging correctly a 400 (and Timestamp:Resp) with a "ba...
[09:24:10] !log merged puppet change to force varnishkafka to use -q 'ReqMethod ne "PURGE" and not Timestamp:Pipe and not ReqHeader:Upgrade ~ "[wW]ebsocket" and not HttpGarbage' and -L 10000
[09:24:16] joal: --^
[09:44:09] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2724697 (ArielGlenn) Since I was added (thanks!), let me weigh in briefly. Note that "dumps" includes not just sql tables, generation of xml dumps of metadata for pages/revisions and xml files of revision content,...
[10:01:38] Hi elukey
[10:01:51] elukey: Thanks for the oozie relaunchs
[10:02:19] elukey: I'd need more explanations on the change you made in varnishkafka if you want me to understand how it'll impact us :)
[10:14:08] suuure!
[10:14:32] I put a summary in https://phabricator.wikimedia.org/T148412
[10:15:17] so 'HttpGarbage' is a Varnish tag for a request arrived with malformed inputs
[10:15:28] and discarded by Varnish since not following the HTTP specs
[10:15:38] but it gets logged anyway in VSL
[10:15:50] so it might generate spurious logs to kafka too
[10:16:23] and I put in the varnishkafka VSL query to discard every req with HttpGarbage
[10:16:38] second one is increasing the -L option to 10000
[10:17:09] that means keeping a maximum of 10000 incomplete request logs in varnishkafka's memory before flushing them
[10:17:15] elukey: Cool ! Better understanding thanks !
[10:17:20] super :)
[10:17:31] the last thing is the Miley Cyrus link
[10:17:44] but I believe that it could be a different nature
[10:18:08] something like Varnish issuing a backend request that fails, and not reporting the end time
[10:18:17] but I am a bit far from the solution
[10:18:31] and sadly it is the cause of the oozie complains
[10:26:39] elukey: We should discuss with team if it's necessary to invest some time in a more robust way to handle - as timestamps
[10:28:57] joal: sure, but at the same time I am hoping that at steady state we'll stop spend time on these issues..
[10:29:03] for example as happened for misc/maps
[10:29:32] elukey: I understand, I just wonder what is easiest
[10:30:04] elukey: it now makes a few month we have to deal with that issue ... I think we first started talking about it in Berlin :)
[10:30:17] elukey: And while we have made good progress, there still are corner cases
[10:32:05] joal: yeah you are right
[10:32:24] and I supect that text will bring up new monsters
[10:33:20] even if I hope that upload is the only problematic use case since the Traffic team got into big difficulties to make it work with the Open source version of Varnish
[10:33:54] elukey: And actually let me be fair: 12:30:17 < joal> elukey: And while YOU have made good progress, there still are corner cases
[10:34:13] * joal still needs to learn how to use formatting in IRSSI
[10:37:53] well you guys helped me a ton, from Berlin onwards.. without hours spent in discussions vk would still be broken
[10:38:01] :)
[10:38:30] You still did all the work elukey :)
[10:53:13] hi team! thanks elukey for taking care of oozie reruns
[10:54:07] super welcome, this is my special good morning from oozie nowadays
[10:55:08] "Hello Luca, you know what? I still don't like Varnishkafka. Have a good day!"
[10:55:23] "Yours, Oozie the complainer"
[11:01:28] * elukey lunch!
[11:01:47] Enjoy food elukey :)
[11:01:51] Hi mforns :(
[11:01:59] sorry, wrong typing mforns :D
[11:02:01] hi joal!
[11:26:09] haha
[11:26:37] hey milimetric
[11:27:15] hi joal
[11:27:28] lemme check on the sqoop, last I saw yesterday it was still on enwiki
[11:27:51] 91G so far
[11:28:15] enwiki, commonswiki, and part of wikidata
[11:28:35] those big ones take forever, I think tungsten is almost a must
[11:34:28] milimetric: I don't know enough on tungsten to have a good feeling if it'll help or be a burden :)
[11:34:48] milimetric: If it's like managing mysql replication issues on hadoop - I'd rather wait for import ;)
[11:35:00] is there any practical difference in having access to stat100[24] with/without stat1003?
[11:35:26] elukey: I think stat1003 is for research
[11:36:21] because I have a request from the reading team for all the stat boxes
[11:36:40] but I think that stat100[24] are enouh
[11:36:42] *enough
[11:38:49] elukey: depends if they want to access mysql or not
[11:38:53] elukey: I think
[11:39:04] elukey: milimetric might know better --^
[11:40:34] ahh so analytics-privatedata-users does not grant it
[11:40:56] elukey / joal: stat1003 only has mysql replica access and you can get access to the mysql password on there in a file in /etc/
[11:40:59] elukey: I'm not sure, I almoste never use mysql, and always do it from stat1003
[11:41:10] milimetric: I knew you knew !
[11:41:13] stat100[24] have hadoop access, and therefore privatedata
[11:42:05] yep.. but stat1002 is the only one with stat1003 with mysql/conf.d
[11:42:15] ?
[11:42:28] elukey: that sentence doesn't parse :)
[11:42:33] elukey@stat1002:~$ ls -l /etc/mysql/conf.d
[11:42:33] total 20
[11:42:33] -r--r----- 1 root analytics-privatedata-users 108 Mar 4 2015 analytics-research-client.cnf
[11:42:36] -r--r----- 1 root root 108 Jan 23 2015 research-client.cnf
[11:42:39] -r--r----- 1 root analytics-wmde 108 Jul 11 18:01 research-wmde-client.cnf
[11:42:42] -r--r----- 1 root statistics-privatedata-users 108 Mar 4 2015 statistics-private-client.cnf
[11:42:45] -r--r----- 1 root stats 108 Mar 4 2015 stats-research-client.cnf
[11:42:48] yes sorry :D
[11:42:57] uh... oh?
[11:43:10] I guess maybe then stat1002 is a pure superset?
[11:43:13] so stat1004 does not have this one, meanwhile stat100[23] do
[11:43:17] lemme try connecting to the replicas from there
[11:44:35] elukey: ok, so right, but weird and probably broken
[11:44:45] because the password we'd use is research-client.cnf
[11:44:52] but it's only accessible by root
[11:45:50] ah and on stat1003
[11:45:51] elukey@stat1003:~$ ls -l /etc/mysql/conf.d
[11:45:51] total 8
[11:45:51] -r--r----- 1 root researchers 108 Mar 4 2015 research-client.cnf
[11:45:52] -r--r----- 1 root stats 108 Mar 4 2015 stats-research-client.cnf
[11:46:01] elukey: ok, verified that I can login with analytics-research-client.cnf
[11:46:09] on stat1002
[11:46:24] on stat1004 it looks like neither mysql nor the files are present
[11:46:29] yep yep
[11:46:31] (like mysql itself is not installed)
[11:47:04] ok, so ... stat1003 for normal access to replicas, stat1002 for needlessly confusing access to replicas and also hadoop, stat1004 for only hadoop
[11:47:09] makes... sense? :)
[11:47:29] milimetric :D
[11:47:58] suuuure
[11:47:58] I would make research-client.cnf readable by researchers on stat1002
[11:49:12] checking puppet
[11:49:16] joal: ok, so tungsten I have no idea how it is to maintain, but it does read the binlog. What it does after that is interesting and unique as far as I've found in this ecosystem. It stores and updates a mirror of what it replicates and uses that to periodically update hdfs. This way it can replicate updates, deletes, etc.
[11:49:53] so in theory we could get incremental-ish updates. Which I don't think we can get any other way that I've sliced this problem
[11:50:27] so from stat1002's puppet
[11:50:32] # Include the MySQL research password at
[11:50:32] # /etc/mysql/conf.d/analytics-research-client.cnf
[11:50:32] # and only readable by users in the
[11:50:32] # analytics-privatedata-users group.
[11:50:32] include passwords::mysql::research
[11:50:34] mysql::config::client { 'analytics-research':
[11:50:37] user => $::passwords::mysql::research::user,
[11:50:39] pass => $::passwords::mysql::research::pass,
[11:50:42] group => 'analytics-privatedata-users',
[11:50:44] mode => '0440',
[11:50:47] }
[11:50:53] ah snap no
[11:51:05] sorry research-client is the one
[11:51:22] can't grep clearly since we have stuff like mysql::config::client { ' :D
[11:51:24] yeah, though it seems like this evil was perpetrated very much on purpose... weird
[11:51:53] node 'stat1003.eqiad.wmnet' {
[11:51:53] role(statistics::cruncher)
[11:51:53] include passwords::mysql::research
[11:51:53] # This file will render at
[11:51:53] # /etc/mysql/conf.d/research-client.cnf.
[11:51:55] mysql::config::client { 'research':
[11:51:58] user => $::passwords::mysql::research::user,
[11:52:00] pass => $::passwords::mysql::research::pass,
[11:52:03] group => 'researchers',
[11:52:05] mode => '0440',
[11:52:08] milimetric: yeah, since it's binlog based, any kind of issue you get with classic replication you get with tungsten ;)
[11:52:08] }
[11:52:10] I can find it only in here
[11:52:13] so it shouldn't be on stat1002
[11:52:14] milimetric: But worth the try for sure
[11:52:39] ok so I'll remove it from stat1002
[11:52:46] and see if puppet re-creates it
[11:53:04] elukey: maybe someone made it because they didn't know about the other file
[11:53:51] it seems wrong to have different files with different names and access for the same thing...
[11:55:24] !log removed /etc/mysql/conf.d/research-client.cnf from stat1002 (root:root perms, not supposed to be there but only on stat1003)
[11:55:25] joal: yeah, people seem to trust binlog replication a lot more than me, maybe it's not so bad
[11:56:30] milimetric: it has proven to work (at least some), but I have always seen it cause problems (less with postgres, but still)
[12:16:55] milimetric: https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups - just updated
[12:19:16] that looks good, elukey
[12:19:59] I am not sure if analytics-wmde has access to Hadoop
[12:36:50] (PS28) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[12:36:52] (PS21) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[13:02:51] Analytics-Cluster, Operations, hardware-requests: Decommission analytics1026 and analytics1015 - https://phabricator.wikimedia.org/T147313#2725047 (elukey) p:Triage>Normal
[13:31:11] (PS22) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[14:11:32] joal: added https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?editorTab=Metrics&panelId=25&fullscreen
[14:11:35] :)
[14:12:11] Thanks elukey @
[14:21:19] ottomata: aloha! would you mind to review https://wikitech.wikimedia.org/wiki/Analytics/Data_access whenever you have time?
[14:23:20] looking
[14:23:45] sorry https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups
[14:23:50] only this one
[14:27:47] elukey: looks amazing! maybe change the heading for hte two tables, so that it is clear that one is showing which machines get access, and the other which mysql.cnf files get access
[14:27:58] there is no mention of on disk log files for stat1002...maybe no one really uses those any more :o
[14:28:18] that was the orignal reason for a difference between stat1002 and stat1003 (and statistics-privatedata-users vs statistics-users)
[14:29:04] ah yes I wanted to ask about them
[14:29:07] I forgot to add
[14:29:11] checking it
[14:29:24] also adding the heading, thanks!
[14:55:15] (PS23) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[15:00:16] mforns, ottomata : standdduppp
[15:00:27] GAH
[15:00:28] gankd
[15:00:42] got headphones and everything 5 mins ago..
[15:31:54] ottomata: I am on stat1002 and checking /mnt/hdfs
[15:32:17] to see what statistics-privatedata-users can check (I've read sampled webrequest logs)
[15:32:20] but I don't find it
[15:34:49] statistics-privatedata-users don't have access to hadoop by default
[15:34:50] so nothing in hdfs
[15:35:00] there's no special group read perms with it either
[15:35:03] previously it was
[15:35:07] (before hadoop)
[15:35:15] stat private users: stat1002
[15:35:15] stat users: stat1003
[15:35:18] and stat1002 had logs in
[15:35:29] /a/log/webrequest/archive
[15:35:36] ahhhh
[15:35:43] wrong link
[15:35:44] sorry
[15:35:45] :P
[15:35:50] err wrong pat
[15:35:52] *path
[15:35:55] thanksss
[15:35:58] updating docs
[16:19:04] Analytics, Analytics-Cluster: Audit fstabs on Kafka and Hadoop nodes to use UUIDs instead of /dev paths - https://phabricator.wikimedia.org/T147879#2725808 (elukey) p:Triage>High
[16:44:36] Analytics: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2725900 (Nuria) @Zebulon84: couple things come to mind. 1) we have to prove the theory that is the compatibility mode driving the number of IE7 requests 2) adding a header like that one might al...
[16:55:35] milimetric: do you have a minute?
[16:56:44] or maybe ottomata
[16:57:12] I am reviewing a phab task from the reading team
[16:57:16] to get access to stat boxes
[16:57:18] all of them
[16:57:27] and it doesn't make much sense to me
[16:57:42] elukey: can you past task?
[16:58:16] nuria: https://phabricator.wikimedia.org/T148472
[16:58:44] as far as I can see with analytics-privatedata-users one can check MariaDB Slaves and Hadoop
[16:58:57] meanwhile stat1003 is more for researchers
[16:59:03] elukey: there are also eventlogging logs on disk
[16:59:04] on stat1003
[16:59:09] they might be on stat1002 too, not sure
[16:59:13] but that is probably what they want
[16:59:50] ok so another thing to add to the list
[16:59:51] :)
[17:00:00] elukey: sure!
[17:00:11] I want to double check it because too many people ask "give me everything" without telling why
[17:00:13] haha, i commend your effort, but its gonna be hard to cover everything there, as it changes
[17:00:14] yeah
[17:00:27] stat1003 is often used interchangeably as another 'cruncher' node
[17:00:28] oh yes I know but at least we have something to start
[17:00:38] elukey: he doesn't need access to 1003
[17:00:58] so sometimes people collarborate on a single node
[17:01:07] like joseph and dan ahve been doing
[17:01:10] computing stuff in /home dirs
[17:01:12] and working together there
[17:01:15] or even in /srv or /as
[17:01:17] /a
[17:01:28] so its not always just for data that the analytics team generates
[17:01:49] there's also dashboard generation stuff
[17:01:52] milimetric: staffff ???
[17:02:01] that does some rsyncing of data around
[17:02:13] and uhhhh, i always have to look up which thing happens where
[17:02:30] sometimes people (like aaron) have their own stuff to generate datasets and put them in aggregate-datasets or something
[17:02:37] and those are rsynced from either stat1002 or stat1003 (can't remember which is which)
[17:02:57] ottomata: rejoining?
[17:03:09] ottomata: we wil give you couple mins
[17:31:07] nuria: interview talk?
[17:31:11] still in batcave
[17:32:01] * elukey afk!
[17:32:12] elukey: sorry was on the phone before, what's up
[17:32:17] oh... afk, sorry
[17:32:35] milimetric: nono don't worry it was about the stat boxes
[17:32:39] :)
[17:32:39] oh cool
[17:32:46] I was trying to answer to the reading team
[17:32:49] and I was confused
[17:32:49] :D
[17:33:04] looking forward to simplifying that, good nite!
[17:33:29] ottomata: fyi I merged the varnishkafka patch
[17:33:30] all looks good
[17:33:34] I don't remember if I mentioned it during standup
[17:33:52] nothing else to handover
[17:34:40] yeehaw!
[17:45:33] Analytics: Puppetize job that saves old versions of geoIP database - https://phabricator.wikimedia.org/T136732#2726271 (Milimetric) @Nuria no, it's a general-purpose backup, but the only place we would actually need it with our current setup is if geowiki processes fail for a while and nobody notices (which...
[18:13:07] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2726372 (Milimetric) >>! In T146444#2724574, @jcrespo wrote: >> Motivation. For our use cases in labs, analytics, and dumps, it would be nice if there was a real-time and safe-for-public-consumption replica in prod...
[18:20:21] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2726411 (Milimetric) >>! In T146444#2724697, @ArielGlenn wrote: > Since I was added (thanks!), let me weigh in briefly. > > Note that "dumps" includes not just sql tables, generation of xml dumps of metadata for p...
[18:20:27] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2726413 (jcrespo) It seems that my own frustration talked instead of having a civilized response. I apologize sincerely and I would understand if you do not want to work with me any more. I would be happy to meet...
[18:25:59] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2726434 (Milimetric) >>! In T146444#2726413, @jcrespo wrote: > It seems that my own frustration talked instead of having a civilized response. > > I apologize sincerely and I would understand if you do not want to...
[18:34:26] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2661250 (Nuria) @jcrespo, @milimetric: let's start all over here as we work better as a team. The fact that we get so fired up when talking about privacy means that we care. We all want 1) the data redaction to b...
[19:04:13] bye a-team, see you tomorrow!
[19:05:23] bye!
[19:23:41] Analytics-Kanban: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2726673 (Milimetric) Ok, everyone, I spoke to @jcrespo and we will collaborate on this project but there are too many dependencies to resolve first. I will try to help with those this quarter and we will resume ta...
[19:48:33] milimetric: coo, we good on the el alarm
[19:48:33] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=graphite1001
[19:48:33] also,i removed the valid - raw or whatever graph from grafana
[19:48:53] and references to server side raw there
[19:48:54] https://grafana.wikimedia.org/dashboard/db/eventlogging
[19:50:18] milimetric: also, if you have a sec, wanna ask you your preference about something
[20:19:38] Analytics-Kanban, Patch-For-Review: Examine puppet code for Event Logging and make sure monitoring is using the best counts - https://phabricator.wikimedia.org/T147321#2726925 (Ottomata) Looking good: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=graphite1001 I also modified graf...
[20:20:08] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Examine puppet code for EventLogging and make sure monitoring is using the best counts - https://phabricator.wikimedia.org/T147321#2726927 (Ottomata)
[20:55:17] Analytics-EventLogging: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2727024 (Tbayer) Forgot to CC @mpopov and @chelsyx earlier regarding the MobileWebSearch schema. Indeed this drop seems to show clearly at http://searchdata.wmflabs.org/m...
[21:03:33] Analytics: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2727053 (Milimetric) I found another issue with this, re-opening. To login to piwik, people needed to be in the ops group. Which explains why some people were having trouble logging in. This change makes it...
[21:03:43] Analytics-Kanban: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2727054 (Milimetric) Resolved>Open
[21:35:55] Analytics-Kanban: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2727204 (akosiaris) Commented already in the change. The Require directive in apache (which is what get populated from the variable in changed in the above chageset) is a logical OR, not a logical AND.
[21:51:23] Analytics-Kanban: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2727225 (Milimetric) Open>Resolved a:Milimetric makes sense, closing this again then, and will try to follow up case-by-case.