[05:49:50] morning! fyi the druid historical on druid1006 died because OOM this morning
[05:49:53] [Fri Sep 22 00:05:49 2017] Out of memory: Kill process 7287 (java) score 421 or sacrifice child
[05:52:07] (going afk again, ttl!)
[08:27:36] Hi a-team
[08:27:44] o/
[08:27:46] hellooooo joseph!
[08:27:52] :)
[08:28:27] elukey: The druid-historical OOM might be related to the change we made with Andrew this week
[08:28:41] * elukey wants druid metrics
[08:29:20] * joal hears "jmx-prometheus is urgent" :)
[08:29:53] not that easy, druid exposes metrics via api, so we'll need to have an ad hoc prometheus agent
[08:30:03] elukey: Mwarf
[08:31:03] Metrics are emitted as JSON objects to a runtime log file or over HTTP (to a service such as Apache Kafka). Metric emission is disabled by default.
[08:31:27] http://druid.io/docs/latest/operations/metrics.html
[08:31:51] the last time that I've checked the druid's mbeans I didn't find much :(
[08:31:51] elukey: Sending them through kafka would be great actually, wouldn't it?
[08:31:56] nope
[08:32:02] Ahh :(
[08:32:27] I mean, the prometheus masters might poll from kafka but it seems better to poll directly from the host
[08:32:38] ok
[08:32:46] we already have python scripts that implement prometheus agents, I'll "take inspiration" from them :P
[08:32:51] elukey: You're the master, master
[08:33:07] suuuuuuuure
[08:36:07] elukey: a completely different topic, but somewhat related: how are you doing on druid auth stuff?
[08:39:44] the puppet code is completed, andrew had some minor comments but they should be addressed now.. The remaining thing is the authz part, still not sure what ops would like to see (will ping them in a bit)
[08:40:20] when you say authz, you mean parsing the posted json, right?
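elukey's plan above is an ad hoc Prometheus agent for Druid, modelled on the existing Python Prometheus agents. What follows is a minimal sketch of the translation step only. It assumes metric emission is enabled with an HTTP emitter that POSTs a JSON array of events, each carrying `metric` and `value` keys plus `service`, `host` and dimension keys. Check the metrics page linked above for the exact fields. The function names and the `druid_` prefix are hypothetical. The sketch keeps only the latest value per series, because Prometheus scrapes a snapshot rather than consuming an event stream, and renders the Prometheus text exposition format.

```python
import json
import re
from typing import Dict, Iterable, Tuple

# Metric names may contain colons; label names may not.
_INVALID_METRIC_CHARS = re.compile(r"[^a-zA-Z0-9_:]")
_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")
# Assumed event keys that are not dimensions, so they never become labels.
_NON_LABEL_KEYS = {"feed", "timestamp", "metric", "value"}


def metric_name(druid_metric: str, prefix: str = "druid") -> str:
    """Map a Druid metric name like 'query/time' to 'druid_query_time'."""
    return f"{prefix}_{_INVALID_METRIC_CHARS.sub('_', druid_metric)}"


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_exposition(events: Iterable[dict]) -> str:
    """Render Druid metric events as Prometheus sample lines (latest value wins)."""
    latest: Dict[Tuple[str, Tuple[Tuple[str, str], ...]], float] = {}
    for event in events:
        if not isinstance(event, dict) or "metric" not in event or "value" not in event:
            continue  # not a metric event (e.g. an alert): skip it
        try:
            value = float(event["value"])
        except (TypeError, ValueError):
            continue  # non-numeric value: nothing Prometheus can store
        # Scalar dimensions (plus service/host) become labels; lists are dropped.
        labels = tuple(sorted(
            (_INVALID_LABEL_CHARS.sub("_", key), str(val))
            for key, val in event.items()
            if key not in _NON_LABEL_KEYS and isinstance(val, (str, int, float))
        ))
        latest[(metric_name(str(event["metric"])), labels)] = value
    lines = []
    for (name, labels), value in sorted(latest.items()):
        if labels:
            rendered = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
            lines.append(f"{name}{{{rendered}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "".join(line + "\n" for line in lines)


def from_http_payload(payload: str) -> str:
    """Assumed emitter body: a JSON array of metric events."""
    return to_exposition(json.loads(payload))
```

Serving the string over HTTP is left out. A real agent would also need `# TYPE` lines (counter vs gauge per metric) and some expiry for series that stop reporting.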
[08:42:04] ah yes sorry, I mean authorization for xyz
[08:42:19] even if you authenticate correctly
[08:42:40] elukey: While I do understand, I think this step is overkill
[08:43:17] elukey: I wonder if splitting clusters is not a better idea (even if not optimal from hardware perspective)
[08:44:32] I'd like it more to be honest, but the fact that druid doesn't handle this is really sad
[08:45:08] elukey: let's discuss with team around best option
[08:46:46] sure, the work for the tlsproxy is basically ready, so adding the lvs + proxy + etc.. should be a matter of half a day of work
[08:51:09] elukey: great
[09:29:34] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 4 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3626366 (phuedx)
[09:31:42] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 4 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3626371 (phuedx) @bmansurov @pmiazga: This task requires technical QA /cc @ovasileva.
[10:11:13] Hi mforns - are you around?
[10:11:17] joal, yes!
[10:11:32] mforns: I'm wondering about endpoints again
[10:11:40] joal, batcave?
[10:11:43] sure :)
[10:11:46] omw
[10:28:31] (PS5) Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805)
[10:28:38] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3573902 (ovasileva) @Tbayer - are the enwiki and dew...
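The authz option discussed at 08:40 would mean the proxy parses each posted Druid query before forwarding it. Below is a rough sketch of that check. It assumes native JSON queries whose `dataSource` is either a plain string or a `table`/`union`/`query` object, and a hypothetical per-user allowlist `acl`. Anything the check cannot recognise is denied. Druid SQL and other endpoints are not covered, which is part of why splitting clusters may end up simpler.

```python
import json
from typing import Mapping, Optional, Set


def referenced_datasources(ds) -> Optional[Set[str]]:
    """Return the table names a dataSource spec refers to, or None if the
    shape is not recognised (callers should then deny the request)."""
    if isinstance(ds, str):
        return {ds}
    if isinstance(ds, dict):
        kind = ds.get("type")
        if kind == "table" and isinstance(ds.get("name"), str):
            return {ds["name"]}
        if kind == "union" and isinstance(ds.get("dataSources"), list):
            names: Set[str] = set()
            for item in ds["dataSources"]:
                sub = referenced_datasources(item)
                if sub is None:
                    return None  # one unknown member poisons the whole union
                names |= sub
            return names
        if kind == "query" and isinstance(ds.get("query"), dict):
            # Nested subquery: authorize against whatever it reads from.
            return referenced_datasources(ds["query"].get("dataSource"))
    return None


def is_query_allowed(body: bytes, user: str, acl: Mapping[str, Set[str]]) -> bool:
    """Deny-by-default check: every datasource in the query must be allowed for user."""
    try:
        query = json.loads(body)
    except ValueError:  # covers JSONDecodeError and bad UTF-8
        return False
    if not isinstance(query, dict):
        return False
    names = referenced_datasources(query.get("dataSource"))
    if not names:
        return False
    return names <= acl.get(user, set())
```

Denying on unknown shapes keeps the failure mode safe, but every new query feature Druid adds would need a matching branch here. That maintenance cost is the "overkill" joal is pointing at.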
[11:03:48] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3626668 (elukey) ``` # elukey@kafka-jumbo1001:~$ curl http://10.64.0.175:7800/metrics -s | grep -i jumbo [..] kafka_network_req...
[11:04:33] (PS6) Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805)
[11:27:47] joal: I had to leave before I finished my review last night, copying some comments to your latest patch and doing it there. Just checking with you if you have more patches coming?
[11:28:13] Hi milimetric - No plan for more patches coming
[11:28:25] milimetric: Sorry for the mess :(
[11:29:02] also milimetric so that you know: https://git.io/vdfn1
[11:30:45] Ok folks, taking a break now
[11:31:55] * elukey lunch!
[11:33:21] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3626699 (phuedx)
[12:12:52] Analytics: Request for python package csvsort on stat1005.eqiad.wmnet - https://phabricator.wikimedia.org/T174577#3626835 (Adrian_Bielefeldt) Then we'll use hadoop.
[12:44:50] Analytics, Analytics-Wikistats: Renovation of Wikistats production jobs - https://phabricator.wikimedia.org/T176478#3626887 (Erik_Zachte)
[12:46:48] Analytics, EventBus, Wikimedia-Stream: Hits from private AbuseFilters aren't in the stream - https://phabricator.wikimedia.org/T175438#3626906 (Nirmos) Okay, so. I'm working on https://sv.wikipedia.org/wiki/MediaWiki:Gadget-EventStreams.js. It connects to https://stream.wikimedia.org/v2/stream/recent...
[13:02:38] Analytics-Kanban, User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3626966 (elukey) And we are again over 90%, it seems that we'd need to find a more permanent solution to this problem. This is a proposal from the Analytics team: 1) Free some space on dbstore1002 drop...
[13:19:13] (CR) Mforns: "Really amazing work, and so quickly! In the end it has grown a bit complex with all the needed parts (mediawiki-history-yaml-schema, druid" (25 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[13:41:10] fdans, helloooo you around?
[13:41:19] SUP
[13:42:18] 'sup! I'm reviewing the line-chart popup, but when I npm run dev, it errors with:
[13:42:19] Module not found: Error: Can't resolve '../../../../semantic/dist/components/popup'
[13:42:39] I executed: cd semantic; gulp build
[13:42:43] but same error
[14:04:20] fdans, ^
[14:04:22] goddammit
[14:04:38] yeah that stayed there accidentally, sorry mforns
[14:04:57] np. I thought I was missing something
[14:05:33] mforns: yeah i basically added and installed the popup component locally, then stopped using it and didn't remove the reference
[14:05:47] ok ok
[14:06:33] just pushed again mforns
[14:06:38] fdans, thanks!
[14:21:35] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 5 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3627241 (Niedzielski)
[15:00:59] ping fdans joal
[15:02:19] Analytics, Analytics-Wikistats: Renovation of Wikistats production jobs - https://phabricator.wikimedia.org/T176478#3627415 (Erik_Zachte) Major production jobs + visualisations developed by Erik Zachte which are still in use Note: Wikistats 1.0 scripts is a subset **xml dump based stats on wiki conte...
[15:05:59] (CR) Milimetric: "Reviewed everything but the endpoint documentation language in the last four files. Will do that after standup." (18 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) (owner: Joal)
[15:13:42] Analytics-Kanban, User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3627449 (jcrespo) Cool to me.
[15:20:50] Analytics, Analytics-Wikistats, Research: Renovation of Wikistats production jobs - https://phabricator.wikimedia.org/T176478#3627482 (DarTar)
[15:21:45] Analytics, Analytics-Wikistats, Research: Renovation of Wikistats production jobs - https://phabricator.wikimedia.org/T176478#3626887 (DarTar) p:Triage>Normal
[15:48:20] joal: you're probably done for the day, but if you wanna talk over my comments on my review, let me know
[15:48:30] some of them were a little vague like "javascript 6 is fun!"
[15:48:43] milimetric: Let's spend a minute if you like
[15:48:50] k
[15:55:34] Analytics: Request for python package csvsort on stat1005.eqiad.wmnet - https://phabricator.wikimedia.org/T174577#3627556 (Nuria) Open>Resolved
[16:17:20] A-team, I'm gonna take this end of day off :)
[16:17:34] well deserved joal :]
[16:17:44] I'll implement comments and all on Monday
[16:17:51] Thanks mforns :)
[16:22:17] ciaooo joal
[16:24:21] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 5 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3627616 (bmansurov) >>! In T175918#3626371, @phuedx wrote: > @bmansurov @pmiazga: Th...
[16:26:03] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep 2017): Have "Last Attracted Developers" information for Gerrit (already exists for Git) - https://phabricator.wikimedia.org/T151161#3627622 (Aklapper)
[16:43:51] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 5 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3627732 (bmansurov)
[16:45:02] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 5 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3607883 (bmansurov)
[16:48:41] * elukey off!
[16:50:01] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 5 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3627746 (bmansurov) {meme, src="southparkfan-approves", below="Baha©®™ approves"}
[17:08:33] Analytics-EventLogging, Analytics-Kanban, Page-Previews, Readers-Web-Backlog, and 5 others: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3627800 (phuedx) Wow… You might want to make that image a //little// smaller, @bmans...
[17:33:00] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3627902 (Nuria) Just FYI that mysql backend cannot r...
[18:05:24] Analytics, Analytics-Wikistats, Easy, I18n, Patch-For-Review: WikiReportsLocalizations.pm still fetches language names from SVN - https://phabricator.wikimedia.org/T64570#3627955 (Dzahn) 2014: Prioritization and scheduling of this bug is tracked on Mingle card 2017: Moving this to deprioritis...
[18:22:19] milimetric: one question, on this patch: https://gerrit.wikimedia.org/r/#/c/379137/4
[18:22:44] milimetric: is there any testing that you can think of besides unit testing and editing a page in vagrant?
[18:23:07] looking
[18:24:55] milimetric: k
[18:24:58] nuria_: yeah, testing in vagrant is the best in my opinion. Other people test in beta and production! (yes) but beta isn't quite the same as production. In my limited experience, if you have bugs, you're just as likely to find them in vagrant as anywhere else
[18:25:32] milimetric: ok, i will do that some more and merge i guess
[18:26:48] milimetric: does vagrant log errors locally?
[18:27:25] milimetric: application errors taht is
[18:27:28] *that
[18:28:44] nuria_: yes, if you vagrant ssh there's a single log file somewhere in /vagrant/log or something like that, it has all the errors in there, and you can debug log there too
[18:28:52] that's how I test - I tail -f that file and run stuff
[18:29:03] milimetric: ok, so that is what i was doing
[18:29:13] milimetric: so i think this is good to go then
[18:29:16] k, it's not bad - PHP is a weird language but the environment is friendly
[18:34:38] milimetric: do you have permits to merge this: https://gerrit.wikimedia.org/r/#/c/379137/?
[18:34:53] yes
[18:35:04] I can take a look and merge now?
[18:35:31] milimetric: ok,
[18:37:19] k nuria_ it started submit, lemme know if it fails for some reason
[18:45:12] milimetric: looks like it is done
[18:45:24] cool
[18:50:24] Analytics-Kanban, Analytics-Wikistats: Wikistats 2.0 UI second deployment/iteration - https://phabricator.wikimedia.org/T170460#3628032 (Nuria)
[18:50:27] Analytics-Kanban, Analytics-Wikistats: Addition of Unique Devices metric - https://phabricator.wikimedia.org/T170461#3628031 (Nuria) Open>Resolved
[18:50:40] Analytics-Kanban, Analytics-Wikistats: Use daily granularity for 1-month time ranges - https://phabricator.wikimedia.org/T173372#3628033 (Nuria) Open>Resolved
[18:50:53] Analytics-Kanban, Analytics-Wikistats: Error handling - https://phabricator.wikimedia.org/T171487#3628034 (Nuria) Open>Resolved
[18:51:27] Analytics, Operations, monitoring, Patch-For-Review: Eventstreams graphite disk usage - https://phabricator.wikimedia.org/T160644#3628043 (Nuria)
[18:51:29] Analytics-Kanban, Wikimedia-Stream: Stop tracking EventStreams client lag in graphite - https://phabricator.wikimedia.org/T174435#3628042 (Nuria) Open>Resolved
[18:52:27] Analytics-Kanban, Analytics-Wikistats: Wikistats 2.0 UI second deployment/iteration - https://phabricator.wikimedia.org/T170460#3628046 (Nuria)
[18:52:30] Analytics-Kanban, Analytics-Wikistats: Wikistats2 bugs (3/4) - Data issues - https://phabricator.wikimedia.org/T170937#3628045 (Nuria) Open>Resolved
[18:54:47] nuria_: Could you confirm that the following is expected in the current eventlogging/ua-parser expectations? https://phabricator.wikimedia.org/T176149#3615500
[18:55:11] Krinkle: looking
[18:55:13] I'll add defence for it either way (as right now the navtiming process is very frequently restarting ever since the switch to pre-parsed user agents).
[18:55:31] But would be good to know if it is expected that browser.major can be missing/null/not-int.
[18:59:32] Krinkle: wait, the stacktrace does not match this version, does it? https://github.com/wikimedia/puppet/blob/06e9b65a7cc700dd1ee5dfa17e330cfbb98eb6c4/modules/webperf/files/navtiming.py#L298
[19:00:10] Yeah, make_stat was factored out. Otherwise the same
[19:00:13] Krinkle: I believe uaparser has a filler for None, let me triple check
[19:01:05] Krinkle: wrong wrong
[19:01:13] Krinkle: as in nuria is wrong
[19:01:18] Krinkle: it can be:
[19:01:32] Krinkle: {"os_minor": null, "os_major": null, "device_family": "Other", "os_family": "Windows Vista", "browser_minor": "0", "wmf_app_version": "-", "browser_major": "43", "browser_family": "Firefox"}
[19:02:02] https://phabricator.wikimedia.org/P6039
[19:02:14] So here browser_major is null
[19:02:24] in three cases within recent data
[19:02:33] I guess that's what is the case for family:Other
[19:02:51] But then.. we already se version='-' in case family=Other
[19:03:09] Krinkle: ya, and i always get this confused, when you look at this in python is None but since we are serializing as json
[19:03:14] and none makes no sense
[19:03:27] Ah 2/3 of these major version=null have browser name Safari
[19:03:28] Krinkle: those get to be nulls
[19:03:37] null in JSON / None in Python
[19:03:58] Krinkle: right
[19:04:23] Krinkle: so "anything that ua parser could not parse regarding versions" will be null
[19:04:27] Krinkle: sounds like
[19:04:39] So browser_major being null for browser_family "Other" seems reasonable and we cover that. For all others we assume there is a version as well (at least if it is a well-known browser that we know is versioned, such as Safari)
[19:05:21] Given that we only end up using it if it matches one of a small subset of known browsers that we aggregate metrics for (given that they're sampled globally, so most are not useful on a per-minute basis)
[19:05:53] I guess this is one of those cases where ua-parser isn't strict enough. Something can't be Safari if it doesn't have a version since all their UAs include a version.
[19:05:55] Oh well.
[19:05:59] I'll just cover it then :)
[19:06:10] Krinkle: ok
[19:06:32] Krinkle: i think we have latest version of ua-parser but that project needs some love
[19:08:32] Yeah, it's a hard problem to solve though.
[19:09:17] Either you need a *lot* of patterns (to be strict and conclusive), or their current approach, which is a smaller number of partial patterns. Which sometimes results in something that is meant to be a generic/Other match instead becoming an odd partial match.
[19:56:16] Analytics, Proton, Readers-Web-Backlog, Patch-For-Review, Readers-Web-Kanban-Board: Implement Schema:Print purging strategy - https://phabricator.wikimedia.org/T175395#3592394 (bmansurov)
[19:58:26] Analytics, Proton, Readers-Web-Backlog, Patch-For-Review, Readers-Web-Kanban-Board: Implement Schema:Print purging strategy - https://phabricator.wikimedia.org/T175395#3628226 (bmansurov) a:bmansurov
[20:06:04] (PS1) Joal: Correct field names in mediawiki-history spark job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/379835
[20:09:44] (PS1) Joal: Correct field names in mediawiki-history jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/379836
[20:28:06] Analytics-Kanban, Analytics-Wikistats: Wikistats2 bugs (4/4) - Detail page - https://phabricator.wikimedia.org/T170940#3628278 (Milimetric) >>! In T170940#3605104, @fdans wrote: > //If we select 1 year in the time-range selector, the graph shows 13 data points. Is that expected? (now it's showing 11, I w...
[21:17:30] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3628485 (Halfak)
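The defence Krinkle plans for navtiming could look roughly like this. It treats `browser_major` as optional, since ua-parser emits JSON `null` (Python `None` after decoding) when it cannot parse a version, and it skips aggregation instead of crashing. This is a minimal sketch. `browser_major`, `browser_bucket`, `bucket_from_json` and the `Family_major` key format are hypothetical, not navtiming.py's actual code.

```python
import json
from typing import Mapping, Optional


def browser_major(ua: Mapping) -> Optional[int]:
    """Return browser_major as an int, or None if it is null, missing or non-numeric."""
    raw = ua.get("browser_major")
    if raw is None:  # JSON null, or the key is absent entirely
        return None
    try:
        return int(raw)  # ua-parser output carries strings such as "43"
    except (TypeError, ValueError):
        return None  # e.g. "" or "12.1b": not a plain integer


def browser_bucket(ua: Mapping) -> Optional[str]:
    """Hypothetical aggregation key such as 'Firefox_43'; None means 'skip this sample'."""
    family = ua.get("browser_family")
    major = browser_major(ua)
    if not family or family == "Other" or major is None:
        return None
    return f"{family}_{major}"


def bucket_from_json(raw: str) -> Optional[str]:
    """Decode one ua-parser JSON record (null becomes None) and bucket it."""
    return browser_bucket(json.loads(raw))
```

Returning `None` lets the caller drop the odd Safari-without-version records quietly instead of letting a `TypeError` restart the process.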