[07:04:40] wow oozie didn't alarm tonigh
[07:04:45] *tonight
[08:24:29] Yay elukey :)
[08:36:19] o/
[08:36:49] I am not super happy since it probably means that bots didn't hit the problematic URLs.. let's see how it goes :)
[08:37:38] joal_: we'd need to reboot druid100[123] :)
[08:39:51] elukey: Have you had any info from ottomata on this?
[08:40:55] from https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Druid it seems that restarting one node at the time is fine
[08:43:25] elukey: I was thinking of clickhouse
[08:44:37] ah okok!
[08:44:59] so I can see the processes running on each host, it shouldn't be a big issue restarting them
[08:45:09] but if you prefer to wait for Andrew we can do it
[08:45:13] elukey: ok :)
[08:45:44] elukey: other thing to keep in mind is that those hosts have a zookeeper, but I assume it's managed by puppet
[08:45:53] So I have no issue restarting them
[08:49:00] as long as we do it one at the time we should be super fine
[08:49:07] but let's wait andrew
[08:49:17] I just want to make sure that we fix them by today
[08:49:26] (the Mad CoW issue)
[08:50:41] elukey: If you prefer you can reboot them now, and if we don't manage to restart clickhouse we'll ask andrew when he arrives :)
[08:51:29] nah no rush
[08:51:40] there are too many manual things running :D
[09:06:02] By the way elukey, I wanted to thank you and mforns for having taken care of oozie jobs this weekend
[09:06:18] elukey: I saw them later that you did ;)
[09:06:21] Analytics-Kanban, Operations, Traffic, Patch-For-Review: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2737447 (elukey) One thing that is worth mentioning: the `ReqAcct` tag in the logged requests without `Timesta...
[09:08:03] joal_: I feel really good that we are finally spreading the Hadoop maintenance load, you were doing everything and even if it is interesting/fun it is a burden for one person only
[09:08:17] :)
[09:08:34] https://phabricator.wikimedia.org/T148412#2737447 - if you have time can you give me your opinion?
[09:08:48] the usual rubber duck time of the day :D
[09:09:17] :)
[09:09:33] this is related to the Varnish logs that cause troubles in upload
[09:09:53] we are logging something that, for some reason, does not send any data to the client
[09:10:04] but at the same time, I can't see any error
[09:10:23] and it is not even returning a http status code afaik
[09:11:17] elukey: weird !
[09:13:34] I am thinking about Http keep alive but not really sure
[09:13:50] because it appears to go through the varnishes up to the backends
[09:15:02] hm
[09:15:45] by the way elukey, should we get back to using bastion1001?
[09:16:40] sure, when did you stop using it? Was it reimaged?
[09:16:44] I don't recall now :(
[09:16:49] huhuhu :)
[09:17:00] I stopped using it last week, at reboot day
[09:18:32] ahhh okok so you can use it now
[09:18:39] ok cool :)
[10:57:53] Analytics-Wikimetrics: Include Tulu Wikipedia in Metrics and Quarry - https://phabricator.wikimedia.org/T148950#2737572 (Pavanaja)
[11:31:42] hi team!
[11:33:42] o/
[11:33:58] hi mforns !
[11:34:05] hello :]
[11:45:49] mforns: in case you didn't notice: 11:06:02 < joal_> By the way elukey, I wanted to thank you and mforns for having taken care of oozie jobs this weekend
[11:45:53] :)
[11:46:27] joal_, hehe, ok I rerun a job that elukey already had rerun
[11:46:49] mforns: better twice than none !
[11:46:54] xD
[11:49:08] :D
[11:52:09] a-team: I'm gonna need to leave early today: Lino has caught my cold (now that I feel better ...)
[11:52:26] oh sorry joal_
[11:52:35] mforns: That's life :)
[11:52:38] ok, take care
[11:52:48] Still here for some time
[11:53:03] But yeah, will go with him to the doctor later today
[11:54:03] take care!
[11:54:45] Thanks lads :)
[12:16:05] Analytics-Kanban, Dumps-Generation: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2737704 (mark)
[12:16:47] Analytics-Kanban, Dumps-Generation, Dumps-Rewrite: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2661250 (mark)
[12:26:05] Analytics-Kanban, Dumps-Rewrite: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2737715 (ArielGlenn)
[12:26:08] Analytics-Kanban, Dumps-Rewrite: Improve mediawiki data redaction - https://phabricator.wikimedia.org/T146444#2661250 (ArielGlenn) Removed Dumps-Generation because that's for issues with the current dumps process only. Dumps-Rewrite is the right one.
[13:37:49] gone a-team, see you later
[14:12:59] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2738017 (Ottomata) Nuria, a new git repo. This should be a standalone node module. node-rdkafka-sta...
[14:13:55] Analytics-Kanban, EventBus, Operations: setup/install/deploy kafka1003 (WMF4723) - https://phabricator.wikimedia.org/T148849#2738019 (Ottomata) Nice! Thanks!
[14:20:25] elukey: o/
[14:21:11] o/
[14:21:47] elukey: did i ever make you aware that i'd been working on this? -- https://wikitech.wikimedia.org/wiki/Cassandra/Tools
[14:23:14] nice!!!
[14:23:20] I didn't know it!
[14:23:36] ok, good
[14:23:52] well, no good that i failed to mention it before, but good that you know now :)
[14:24:49] i'm trying to wrap up https://phabricator.wikimedia.org/T132958 soon (mvp) for https://github.com/eevans/cassandra-tools-wmf
[14:25:11] and the result will be a Debian package sitting in apt.w.o that you can install
[14:27:14] elukey: i have this too: https://github.com/eevans/services-adhoc-reports/blob/master/report-topk-partion-sizes
[14:27:30] but i'm not sure if that would be helpful for your cluster
[14:27:58] it looks for wide-partition warnings in logstash, and sends an email report of the topk results
[14:28:09] and it's pretty configurable
[14:28:33] the warnings it looks for are generated by compaction
[14:29:53] !?
[14:30:06] elukey: are your logs going to logstash OK?
[14:30:13] looks to me like your dashboard is empty
[14:30:57] aha
[14:31:21] ...and boom goes the dymatime
[14:31:26] dynamite
[14:32:43] sorry here I am
[14:32:55] elukey: fixed the dashboard, it had the wrong cluster attribute
[14:33:01] I haven't seen the wide-rows issue up to now, or at least never caused any problem
[14:33:04] thank you!
[14:33:18] right, i just looked at your dashboard, i see no such warnings
[14:33:27] so it's probably a non-issue
[14:35:37] I saw it happening on restbase (or Marko/Filippo mentioning it)
[14:35:50] yes :)
[14:36:00] a lot.
[14:36:58] * elukey reads about wide rows
[14:37:27] elukey: as i think about it, you are probably immune
[14:37:40] because the wide of your rows are bounded at something quite low
[14:37:46] s/wide/width/
[14:43:06] ah ok so if you have a variable number of columns (potentially tons of them) for the same row
[14:43:34] looking forward to see your new deb package deployed $everywhere
[14:44:50] and I am going to read more about wide rows
[14:45:00] is there a phab task for the restbase issue?
[14:48:23] ummm, several, probably :)
[14:48:34] https://phabricator.wikimedia.org/T148516 is fall out
[14:48:51] https://phabricator.wikimedia.org/T144431
[14:51:47] Analytics-Kanban: Create documentation for edit history reconstruction - https://phabricator.wikimedia.org/T139763#2738203 (mforns) https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Denormalization_and_historification
[14:59:02] thanks! Going to read them :)
[15:00:40] elukey, joseph: standddupppp
[15:00:46] elukey, joal: standddupppp
[15:06:17] Analytics-Kanban, Discovery, Operations, Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2738322 (Nuria)
[15:09:59] (PS3) Nuria: Enhancing regex to support pageviews to non-knowledge wikis [analytics/refinery/source] - https://gerrit.wikimedia.org/r/316845 (https://phabricator.wikimedia.org/T130249)
[15:11:06] (CR) Nuria: Enhancing regex to support pageviews to non-knowledge wikis (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/316845 (https://phabricator.wikimedia.org/T130249) (owner: Nuria)
[15:21:05] Analytics-Kanban: Change requests drop alarms to be more precise regarding data loss - https://phabricator.wikimedia.org/T148980#2738364 (Nuria)
[15:21:44] a-team, back for some time (while Lino sleeps)
[15:22:11] Analytics-Kanban: Change requests drop alarms to be more precise regarding data loss - https://phabricator.wikimedia.org/T148980#2738376 (Nuria) A more thorough explanation about "miss-bucketing": - The 'count_different' field in webrequest_sequence_stats indicates how many sequence numbers were th...
[15:22:13] joal we're grooming in 8 mins
[15:22:30] Analytics: Change requests drop alarms to be more precise regarding data loss - https://phabricator.wikimedia.org/T148980#2738377 (Nuria)
[15:24:46] i have just mentioned in standup "oozie has been quiet during the past two days"
[15:24:50] bam error
[15:24:52] sigh
[15:25:08] hehehehe
[15:25:21] let me rerun it elukey
[15:25:29] thanks :)
[15:26:03] ottomata : i think i need more info for the node statsd code, let's touch base after standup
[15:28:05] ja
[15:28:26] elukey: have you restarted druid machines?
[15:29:21] !log created 0005238-161020124223818-oozie-oozi-C to re-run webrequest-load-check_sequence_statistics-wf-upload-2016-10-21-17 (oozie errors)
[15:29:51] !log actually it was webrequest-load-check_sequence_statistics-wf-upload-2016-10-24-14
[15:30:44] elukey,ottomata : grooomingggg
[15:34:02] Analytics-Kanban: Change requests drop alarms to be more precise regarding data loss - https://phabricator.wikimedia.org/T148980#2738417 (Nuria)
[15:36:07] Analytics, Analytics-Cluster, Research-and-Data, Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2738419 (Ottomata)
[15:39:48] Analytics, Reading-Web-Backlog: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2732531 (Nuria) @JKatzWMF : if we were to guess is your #2 suggestion. Pivot is just showing data from pageview_hourly thus the tool doesn't aggregate data any differently
[15:42:42] Analytics: Edit analysis dashboard Failures by User Type chart does not update correctly - https://phabricator.wikimedia.org/T148656#2729192 (Nuria) One choice might be to remove graph
[15:45:49] Analytics-Kanban: Research Spike: Provide data on top pageviews across all projects daily - https://phabricator.wikimedia.org/T126367#2738453 (Nuria)
[15:46:10] Analytics-Kanban: Research Spike: Provide data on top pageviews across all projects daily - https://phabricator.wikimedia.org/T126367#2012359 (Nuria)
[15:46:14] Analytics-Kanban, Patch-For-Review: Count pageviews for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2738454 (Nuria)
[15:48:16] Analytics: Provide API for sampling pageviews - https://phabricator.wikimedia.org/T126290#2010185 (Nuria) Who would be the users of such an API? What is the value proposition of such an API?
[15:50:38] Analytics: Provide API for sampling pageviews - https://phabricator.wikimedia.org/T126290#2738465 (Nuria) ping @Tgr
[15:50:44] Analytics: Provide API for sampling pageviews - https://phabricator.wikimedia.org/T126290#2010185 (Nuria) Also, you can select a random (time-wise) subset of pageviews using hive commands, seems that this would be sufficient.
[15:52:52] Analytics: Standardize Hive UDF code comments and generate documentation {flea} - https://phabricator.wikimedia.org/T114238#2738480 (Nuria)
[15:52:53] Analytics-Kanban, Easy: Standardize logic, names, and null handling across UDFs in refinery-source {hawk} - https://phabricator.wikimedia.org/T120131#2738483 (Nuria)
[15:53:19] Analytics, Analytics-Cluster, Need-volunteer: Create new analytics cluster / hadoop role for mediawiki vagrant - https://phabricator.wikimedia.org/T115707#2738485 (Nuria)
[15:53:35] Analytics, Analytics-Cluster, Need-volunteer: Create new analytics cluster / hadoop role for mediawiki vagrant - https://phabricator.wikimedia.org/T115707#1730762 (Nuria) Open>declined
[15:53:39] Analytics, Analytics-Cluster, Need-volunteer: Create new analytics cluster / hadoop role for mediawiki vagrant - https://phabricator.wikimedia.org/T115707#1730762 (Nuria) Doesn't look like hadoop role will actually be used on vagrant
[15:58:54] Analytics: Add scala 2.10.4 to stat1002 and analytics1027 - https://phabricator.wikimedia.org/T115970#2738491 (Nuria) Open>declined
[15:58:55] Analytics: Add scala 2.10.4 to stat1002 and analytics1027 - https://phabricator.wikimedia.org/T115970#1737361 (Nuria) Not needed any longer.
[15:58:59] Analytics-Kanban, Patch-For-Review: Count pageviews for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2738495 (Lokal_Profil) Could se.wikimedia.org (Wikimedia Sverige) also be added?
[16:11:44] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2738560 (Ottomata) Done this before for ganglia and logster(ish): https://github.com/wikimedia/opera...
[16:11:48] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2738561 (Nuria) Gerrit module. Node module to talk to statsd? ES6 syntax? Mocha for testing (...
[16:14:00] Analytics: Put wikistats latest in gerrit - https://phabricator.wikimedia.org/T148842#2738577 (Nuria) Open>Resolved a:Nuria
[16:15:33] elukey: how are we getting the varnishkafka stats into graphite?
[16:16:14] I have no idea :D
[16:16:37] there was a json something daemon
[16:16:42] jajaja
[16:18:58] IIRC it reads /var/cache/varnishkafka/webrequest.stats.json and reports to statsd
[16:19:48] no probably I am wrong
[16:20:07] tried sudo lsof | grep "/var/cache/varnishkafka/webrequest.stats.json" but can't see anything
[16:22:29] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2738642 (Nuria) Every time i have created a gerrit depot I have requested it here: https://www.mediaw...
[16:23:16] mforns: are we still working on this one or should it go back into paused? https://phabricator.wikimedia.org/T148053
[16:24:11] nuria, we thought this would be better done after https://phabricator.wikimedia.org/T147884
[16:24:23] so I was waiting to finish that one
[16:24:32] I'll move it into paused
[16:24:41] mforns: just did, sounds good
[16:24:45] ok
[16:25:29] Analytics-Dashiki, Analytics-Kanban: Switch to fetch away from jquery - https://phabricator.wikimedia.org/T148053#2738654 (mforns) Paused, because it depends on https://phabricator.wikimedia.org/T147884 being finished.
[16:27:05] Analytics-Kanban: Research Spike: Provide data on top pageviews across all projects daily - https://phabricator.wikimedia.org/T126367#2738677 (Nuria) This will be closed when parent task is completed. (T130249)
[16:28:03] Analytics-Kanban: Provide data on top pageviews across all projects daily - https://phabricator.wikimedia.org/T126367#2012359 (Nuria)
[16:28:19] ottomata: JsonLogster ?
[16:33:40] OH HYEAHHHHH
[16:39:34] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2738729 (Ottomata) Ah! I have been reminded, that the more recent way we get stats from varnishkafka...
[16:44:55] ottomata: even if I don't see logster running (but the pkg is installed)
[16:45:19] ahhh it's a cron!
[16:45:58] yess # Puppet Name: logster-varnishkafka-webrequest
[16:46:08] jaaa :)
[16:50:29] mforns: Heya
[16:50:36] joal_, helloo
[16:50:46] mforns: quick question for you
[16:50:52] shoot
[16:50:57] mforns: Do you have a fork of the plywood repo ?
[16:51:03] joal_, no
[16:51:09] mforns: repo is down on github :(
[16:51:15] again!
[16:51:18] mwarf
[16:51:25] mmm
[16:51:30] mforns: I think Dan might have, but I don't want to bother him
[16:51:39] I'll see tomorrow :)
[16:51:45] joal_, you mean pivot?
[16:51:49] or plywood
[16:52:03] mforns: plywood really
[16:52:10] no no
[16:52:18] The idea would be to be able to play with it
[16:52:31] mforns: I'll possibly try just using npm and see
[16:52:44] aha
[17:36:33] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2619979 (GWicke) @Ottomata, we have a nice [metric reporting abstraction in service-runner](https://g...
[17:42:31] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2738967 (Ottomata) +1, this interface is basically the same as nodes-statsd, ja? So yeah, likely thi...
[17:46:33] * elukey afk!
[17:54:51] hey a-team, going afk too. gonna drive to CT, will work later into the evening, back laters!
[17:55:02] bye ottomata
[18:03:02] a-team, away for now !
[18:21:37] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2739145 (GWicke) @ottomata, it is almost identical to node-statsd, but also adds some minor API exten...
[18:29:00] Analytics-Kanban, Mobile-Content-Service, Wikipedia-Android-App-Backlog, Patch-For-Review, Spike: [Feed] Establish criteria for blacklisting likely bot-inflated most-read articles - https://phabricator.wikimedia.org/T143990#2739166 (Mholloway) @Jdlrobson pointed out an interesting alternative...
[19:29:16] Analytics-Kanban, Discovery, Operations, Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2739351 (mpopov) >>! In T147682#2738052, @Ottomata wrote: > Oh, just re-read earlier parts...
[19:40:16] mforns: since it is my ops week I am going to take a look at the IE7 issue
[19:40:40] nuria, ok, do you want any help?
[19:40:57] mforns: nah, will just try to establish first if usage is truly increasing
[19:41:34] Analytics-Kanban: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2739398 (Nuria)
[19:43:41] nuria, from what I saw, IE7 went from 1.4% (of the total pageviews) to 2.4% in one year (mid 2015 to mid 2016)
[19:44:03] mforns: pivot to the rescue!
[19:44:51] global IE went from 12% to 11% in the same year, so weird
[19:48:15] Analytics-Kanban: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2739426 (Nuria) {F4650284}
[19:48:55] Analytics-Kanban: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2739429 (Nuria) See IE usage, indeed IE7 seems to be increasing when computing daily measures
[19:50:25] nuria, it can be that requests before handled by IE8 IE9 and IE10 that got into compatibility mode, now are handled by IE11 which also may be processing them in compatibility mode (IE7)
[19:50:55] so IE7 requests would make sense to stay more or less the same
[19:53:21] mforns: mmm.. in ms docs it says that compatibility mode starts with ie8
[19:53:26] mforns: now, amybe maybe
[19:53:35] in IE8 you needed to set it
[19:53:47] and in IE11 is set by default mforns ...
[19:54:48] mforns: gotta love ms urls : https://msdn.microsoft.com/en-us/library/gg699485(v=vs.85).aspx
[19:59:42] Analytics-Kanban, ChangeProp, EventBus, Wikimedia-Stream, Services (watching): Write node-rdkafka event.stats callback that reports stats to statsd - https://phabricator.wikimedia.org/T145099#2739445 (Ottomata) As long as the interface is the same, it should work. Coo
[20:00:40] nuria, hehe
[20:01:04] nuria, yes, maybe the increase in IE11 is accountable for a greater share of IE7 compat mode
[20:10:57] Analytics-Kanban: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2739475 (Nuria) If compatibility mode is triggered we would expect wikipedia to be in the IE compatibility list: https://msdn.microsoft.com/en-us/library/gg622935(v=vs.85).aspx and it is t...
[20:21:28] Analytics-Kanban: Inconsistant data in #all-sites-by-os-and-browser fot IE7 - https://phabricator.wikimedia.org/T148461#2739522 (Nuria) Our stats indicate that IE7 usage increases in Windows 7. However for IE8 and IE9 usage is lowering in all platforms. And given that our usage of windows 7 is stable overa...
[21:09:05] hello, I'm getting numerous users of tools.wmflabs.org/pageviews saying there is no data showing up for October 16 onward. Have you received any similar reports?
[21:09:50] no affecting code changes have happened in the app since then, so I think it's the RESTBase API
[21:10:53] Analytics, Editing-Analysis: Determine: What percentage of new articles are created by non-autoconfirmed editors - https://phabricator.wikimedia.org/T149021#2739817 (Quiddity)
[21:44:51] Analytics-Kanban, Discovery, Operations, Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#2740149 (Ottomata) Crap crackers. I just added bts and boom to our apt, but alas, there ar...
[21:53:57] Analytics, Editing-Analysis: Create a chart showing percentage of new articles created each month that have not survived to the present - https://phabricator.wikimedia.org/T149049#2740242 (kaldari)
[21:58:12] Analytics: Provide API for sampling pageviews - https://phabricator.wikimedia.org/T126290#2740273 (Tgr) Random pages are useful for checking the user impact of a bug or a feature that does not handle some special case. If I sample 1000 pages and 10 of them look broken, that means roughly 1% of our pageviews...