[00:53:24] (CR) Yurik: [C: 2] Add stuff from logging phases 2 and 3 [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/326831 (https://phabricator.wikimedia.org/T152559) (owner: MaxSem)
[00:57:10] (CR) MaxSem: [V: 2] Add stuff from logging phases 2 and 3 [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/326831 (https://phabricator.wikimedia.org/T152559) (owner: MaxSem)
[09:34:59] the Yandex upstream dev released clickhouse without the json library that caused license issues \o/
[11:02:15] Analytics-Kanban, User-Elukey: Webrequest dataloss registered during the last kafka restarts - https://phabricator.wikimedia.org/T152674#2872172 (elukey)
[11:02:33] Analytics, User-Elukey: Add webrequest_stats to Druid in order to explore it with Pivot - https://phabricator.wikimedia.org/T150844#2872173 (elukey)
[11:03:00] Analytics, Analytics-Cluster, User-Elukey: Monitor cluster running out of HEAP space with Icinga - https://phabricator.wikimedia.org/T88640#2872175 (elukey) a:elukey>None
[11:03:25] Analytics-Cluster, Operations, ops-eqiad, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#2872177 (elukey) a:elukey>None
[11:04:37] Analytics, Pageviews-API, User-Elukey: Improve user management for AQS - https://phabricator.wikimedia.org/T142073#2872182 (elukey)
[11:04:50] Analytics-Kanban, User-Elukey: Valgrind tutorial for periodical mem usage reviews - https://phabricator.wikimedia.org/T147438#2872183 (elukey)
[11:05:06] Analytics-Kanban, Patch-For-Review, User-Elukey: Puppetize clickhouse - https://phabricator.wikimedia.org/T150343#2872184 (elukey)
[11:05:23] Analytics-Kanban, User-Elukey: Ongoing: Give me permissions in LDAP - https://phabricator.wikimedia.org/T150790#2872187 (elukey)
[11:05:54] Analytics, Operations, User-Elukey: kafkatee's logrotate/syslog default pkg files needs to be removed - https://phabricator.wikimedia.org/T145490#2872191 (elukey) a:elukey>None
[11:50:02] building clickhouse on copper (our build host) saturates 8GB of RAM
[11:50:19] and the build gets killed at 75%
[11:50:23] (last stable)
[11:50:32] * elukey cries in a corner
[12:54:26] Analytics, Trash: --- Items above are triaged ----------------------- - https://phabricator.wikimedia.org/T115634#2872343 (Milimetric) @Aklapper, I'm sorry these tasks keep causing problems. If we don't use these, we feel we'd have too many columns / column thrashing. Do you see an alternative that doe...
[12:58:15] Analytics-Kanban: decommission outdated instances on labs - https://phabricator.wikimedia.org/T153193#2872345 (Milimetric)
[12:58:52] * joal pats elukey on the back :(
[13:01:27] thanks milimetric!
[13:01:38] joal: I am creating an instance in labs with 16GBs of RAM
[13:01:40] np, next week ok?
[13:01:45] sure sure!
[13:01:49] it is not urgent
[13:15:52] so milimetric about your point on time spent on Clickhouse.. I put some thoughts on it and I am clearly overtime (respect to what we established in the beginning) but I am learning tons of things about debian packaging
[13:16:32] so since I am less overloaded with goals/varnishkafka/etc.. I'd like to keep going for a bit more, because it will be good for me (and us!) in the longer term
[13:16:35] what do you think?
[13:17:14] (I know that yours was an advice to avoid seeing myself in an asylum)
[13:20:03] :) indeed, I have no judgement to pass
[13:20:07] your call
[13:20:48] and of course your efforts are appreciated, I think Clickhouse is great
[13:21:32] I wish things like puppetizations were more universally useful, but it seems to me everyone uses puppet just slightly differently enough that we end up with 1000 clickhouse roles
[13:22:18] but even so, very useful for us, so it's great. I mean, people will *freak* out when they see how fast getting some of these answers is
[13:25:11] ahhhaa
[14:00:50] nuria, after master git pull, dashiki tests pass normally in my machine!
[14:00:56] hey team :]
[14:01:07] mforns: did you npm install?
[14:01:21] she changed some deps, you should just in case
[14:01:22] milimetric, yes: rm -rf node_modules + npm install
[14:01:30] cool
[14:01:39] thx!
[14:30:13] (PS3) Mforns: Attach error file to webrequest load email [analytics/refinery] - https://gerrit.wikimedia.org/r/321676 (https://phabricator.wikimedia.org/T148980)
[14:42:25] * elukey afk a bit!
[15:10:45] hello joal ! I was wondering, is there any way to rebuild & run the tests in refinery-source-hive?
[15:11:02] I'm still sort of getting to grips with maven
[15:13:46] cc milimetric :)
[15:19:04] mforns nuria milimetric can we move our meeting slightly later to 2:45pm et - have to bring my daughter to the doc unexpectedly at 1:40pm et...
[15:19:34] ashgrigas, fine by me :]
[15:19:37] i'll send an updated IA map before we meet
[15:21:39] ashgrigas: no problem, anytime this afternoon is fine
[15:22:00] fdans: mvn package should do a full rebuild
[15:22:25] and mvn clean && mvn test should clean and run the tests
[15:22:43] milimetric yeah sorry I meant without rebuilding the whole thing
[15:22:54] oh, without!
[15:23:30] that I don't know... looking
[15:23:50] milimetric nah it's just that yesterday it was running super slow, now it seems fine
[15:24:23] just checking if there was a specific task for hive only
[15:24:49] maven downloads the whole internet, so the first build is really slow. But then it caches the dependencies locally and should be ok after
[15:24:55] mforns milimetric thanks!
[15:25:12] np :]
[15:25:49] fdans: it sounds like mvn compile should compile just what changed
[15:25:55] but I'm not sure how that works
[15:26:16] cool, I'll give that a try
[15:34:53] * elukey waits in the batcave other ops..
[15:35:07] AH
[15:35:20] Joseph is teaching and Andrew is not working!
[15:35:28] it took me 5 minutes to realize it
[15:35:30] good
[15:35:36] * elukey restarts his brain
[15:37:54] hm... maybe that's what I need, a nice fresh restart
[15:38:06] (PS1) Mforns: Mark Limn as deprecated by Analytics at WMF [analytics/limn] - https://gerrit.wikimedia.org/r/327216 (https://phabricator.wikimedia.org/T148058)
[15:41:00] Analytics-Kanban, GitHub-Mirrors, Documentation, Easy, Patch-For-Review: Mark documentation about limn as deprecated - https://phabricator.wikimedia.org/T148058#2872710 (mforns) The wikitech page should be totally rewritten. Marking it as deprecated will avoid people getting confused, but we...
[15:41:45] mforns: as a sort of apology for always formatting text this way: http://vimcasts.org/episodes/aligning-text-with-tabular-vim/
[15:42:03] (the last example with continuous formatting is cool)
[15:42:50] milimetric, looking :]
[15:45:21] milimetric, this is awesome! :] but I think I'm still vim-o-phobic, hehehe
[15:45:42] it seems like there should be a sublime version of this
[15:45:55] https://github.com/randy3k/AlignTab
[15:45:57] mmmm, makes sense...
[15:46:22] mmmmmmm. cool!
[15:55:35] (PS8) Joal: Add mediawiki history spark jobs to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/325312 (https://phabricator.wikimedia.org/T141548)
[15:56:38] Ohhh my dear ops elukey and ottomata, I'm very sorry I completely missed our sync :S
[16:00:47] a-team: stanndddupppp
[16:01:16] ashgrigas: by all means, just change invite
[16:01:49] elukey: standdduppp
[16:02:20] fdans: I'm very sorry I missed your ping - I scrolled my irssi window and forgot to get it back into auto scrolling :(
[16:02:49] fdans: I'll obviously brain-dump everything I know on refinery-source :)
[16:08:02] (CR) Milimetric: [] "Data has been thoroughly deleted on hdfs and stat1002." [analytics/refinery] - https://gerrit.wikimedia.org/r/327003 (https://phabricator.wikimedia.org/T153082) (owner: Milimetric)
[16:10:08] Analytics-Kanban: Check if we can deprecate legacy TSVs production (same time as pagecounts?) - https://phabricator.wikimedia.org/T130729#2872790 (JAllemandou) a:Milimetric
[16:12:07] ebernhardson: i can take a stab at that patch and implement the suggestions i was making, let me know if you want me to do that, can probably do it today
[16:32:57] nuria: sure, i adjusted it to use a map to hold onto instances but wasn't sure what else would do the trick
[16:34:20] milimetric mforns revised IA map to review later today https://www.dropbox.com/s/qy5p4cqlvy4jb7m/Wikistats%20Nav%20Model%20v3.png?dl=0
[16:34:35] if you have any thoughts on filters needed please add them to the spreadsheet
[16:35:41] milimetric: ready ! In the cave
[16:57:23] (PS32) Milimetric: Script sqooping mediawiki tables into hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476)
[16:57:25] (PS3) Milimetric: [WIP] Port standard metrics to reconstructed history [analytics/refinery] - https://gerrit.wikimedia.org/r/322103
[16:57:53] (CR) Nuria: [V: 2 C: 2] Attach error file to webrequest load email [analytics/refinery] - https://gerrit.wikimedia.org/r/321676 (https://phabricator.wikimedia.org/T148980) (owner: Mforns)
[16:59:30] Analytics-EventLogging: Add user_agent_map field to EventCapsule - https://phabricator.wikimedia.org/T153207#2872971 (Aklapper)
[16:59:43] Analytics-Kanban: Attach error file to webrequest load email - https://phabricator.wikimedia.org/T153212#2872973 (Nuria)
[17:00:20] Analytics-Kanban: Attach error file to webrequest load email - https://phabricator.wikimedia.org/T153212#2872973 (Nuria)
[17:01:08] joal: cassandra standup?
[17:01:53] Analytics-Kanban: Attach error file to webrequest load email - https://phabricator.wikimedia.org/T153212#2872973 (Nuria)
[17:01:56] Analytics-Kanban, Patch-For-Review: Change requests drop alarms to be more precise regarding data loss - https://phabricator.wikimedia.org/T148980#2873006 (Nuria)
[17:02:51] mforns: Yes !!!
[17:02:59] joal, ? :]
[17:03:06] mforns: sorry, was willing to talk to elukey
[17:03:09] elukey: joining !
[17:03:14] xD
[17:05:39] (CR) Nuria: [] Mark Limn as deprecated by Analytics at WMF (1 comment) [analytics/limn] - https://gerrit.wikimedia.org/r/327216 (https://phabricator.wikimedia.org/T148058) (owner: Mforns)
[17:07:12] Analytics, Analytics-Cluster, User-Elukey: Monitor cluster running out of HEAP space with Icinga - https://phabricator.wikimedia.org/T88640#2873024 (Nuria) agreed, this seems a good one to add to ops-excellence next quarter.
[17:14:54] (CR) Mforns: [] Mark Limn as deprecated by Analytics at WMF (1 comment) [analytics/limn] - https://gerrit.wikimedia.org/r/327216 (https://phabricator.wikimedia.org/T148058) (owner: Mforns)
[17:27:41] (CR) Nuria: [] Mark Limn as deprecated by Analytics at WMF (1 comment) [analytics/limn] - https://gerrit.wikimedia.org/r/327216 (https://phabricator.wikimedia.org/T148058) (owner: Mforns)
[17:28:57] (PS1) Fdans: Standardized UDF naming [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327237 (https://phabricator.wikimedia.org/T120131)
[17:33:43] (PS2) Nuria: Standardized UDF naming [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327237 (https://phabricator.wikimedia.org/T120131) (owner: Fdans)
[17:46:23] mforns: I'm sorry I need to leave now, will you have time tomorrow for the docker thing?
[17:47:01] joal, sure ping me when you're around (I'll start around 13h)
[17:47:11] thanks :]
[17:47:19] k mforns, I should be there in the afternoon
[17:47:23] Thanks !
[17:47:26] ok :]
[17:47:38] Good evening a-team, see you tomorrow!
[17:47:44] byeeee!
[17:48:23] nite
[17:48:32] o/
[18:10:36] * elukey afj!
[18:10:41] * elukey afk :)
[18:10:44] byeeee o/
[18:21:06] MaxSem: for your comment here: https://gerrit.wikimedia.org/r/#/c/323178/1/i18n/en.json, I wasn't sure what you meant
[18:21:21] I found https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Add_new_extension_to_extension-list_and_release_tools
[18:21:36] did you mean I should add Dashiki to the extension list?
[18:22:11] milimetric, look at qqq.json of any extension
[18:22:17] oh!
[18:22:23] thanks, will do
[18:35:10] (CR) Nuria: [] Lucene Stemmer UDF (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[18:37:02] (CR) EBernhardson: [] Lucene Stemmer UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[18:53:14] milimetric mforns appt ended early - free to chat at 2pm et (7 mins from now)?
[18:53:36] sure ashgrigas, works for me, mforns is in scrum of scrums
[18:53:47] he should be out by 2
[18:53:53] ok
[18:53:57] ashgrigas, milimetric, SoS is finished, so I'm free as well
[18:53:57] is batcave used for that?
[18:53:58] :]
[18:54:06] ok batcave at 2pm et
[18:54:12] ok!
[18:58:30] (CR) Nuria: [] Lucene Stemmer UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/326168 (https://phabricator.wikimedia.org/T148811) (owner: EBernhardson)
[19:03:37] ashgrigas: ping
[19:04:15] joining batcave
[19:04:42] is there a different link again?
[19:04:46] says requesting to join
[19:04:50] milimetric
[19:05:04] weird...
[19:05:05] one sec
[19:05:07] used the irc channel link above?
[19:05:21] http://goo.gl/jtzZ0H
[19:05:26] is this not the right link?
[19:28:07] Analytics-EventLogging: Add user_agent_map field to EventCapsule - https://phabricator.wikimedia.org/T153207#2873553 (Nuria) p:Triage>High
[19:28:13] nuria: i think i understand now what you are thinking about there for the UDF, will update
[19:32:54] ya'll have a new engineer?
[19:33:01] * halfak just read notes from SoS
[19:52:17] Analytics, Analytics-EventLogging: Add user_agent_map field to EventCapsule - https://phabricator.wikimedia.org/T153207#2873753 (Nuria)
[19:52:44] ebernhardson: i was going to
[19:52:50] ebernhardson: let me know if you are doing it too
[19:53:03] ebernhardson: sometimes just adding a bit of code is more explanatory than words
[19:54:08] ebernhardson: I am in teh middle of it
[19:54:10] *the
[19:54:19] nuria: oh, then go right ahead :) i had just started looking at it again
[19:54:38] ebernhardson: ok, i have a meeting in a bit but probably can have something after that
[19:55:35] i'd like to replay sampled webrequest logs onto a restbase test environment (to create realistic workloads), does that sound reasonable, and if so, what do i need?
[19:56:03] i gather we have a 7 day buffer in kafka; 7 days would probably be enough
[19:57:49] urandom: not a direct answer to your question, but when i did the same for elasticsearch i used https://github.com/buger/goreplay to sniff the tcp/ip stack and record http requests, then replayed them back.
[19:59:33] ebernhardson: nice but i think if we have canned urls it should be easy enough to replay no? load can be achieved easily with siege, which i used to test aqs/cassandra before
[19:59:35] cc urandom
[20:00:02] urandom: how do the urls you are looking for look like? for example: rest base requests from android app?
[20:00:06] nuria: likely yes, i had to go that way with elasticsearch because everything interesting is inside the request payload and not in the url
[20:00:33] nuria: good question
[20:00:44] urandom: yours are mostly get requests? right?
[20:00:49] yes
[20:00:50] urandom: on meeting can answer in a bit
[20:01:55] urandom: you could do kafkacat, or even maybe kafkatee, since it has sampling built into it
[20:02:16] ottomata: aren't you supposed to be off?
[20:02:18] :)
[20:02:29] yup!
[20:02:37] nuria is there a particular difference between
[20:02:38] this https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/SmartReferrerClassifierUDF.java
[20:02:44] and this https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/RefererClassifierUDF.java
[20:02:45] ?
[20:02:47] fdans: on meeting can talk in a bit
[20:02:53] sure!
[20:03:04] fdans: I can take this one :)
[20:03:16] oh joal thanks I thought you were off :)
[20:03:27] fdans: the two pieces of code you mentioned are different indeed :)
[20:03:46] ebernhardson: goreplay is interesting. I did something similar a long time ago, more hacky though
[20:03:49] fdans: went for dinner, and you caught me on my night check ;)
[20:04:02] Analytics: Add global last-access cookie for top domain (*.wikipedia.org) - https://phabricator.wikimedia.org/T138027#2873787 (Nuria) Ping @Bblack: could we commit to do this Q3?
[20:04:24] fdans: RefererClassifier was the simple version, SmartRefererClassifier the better one
[20:05:15] so they accomplish the same task right? I mean, can I deprecate both in favor of a unique method?
[20:05:50] fdans: Smart is the currently used one, and it is a superset of not-smart, so I think you can do that yes
[20:06:18] awesome, that's what I needed to know, thanks a lot joal !!
[20:06:36] no prob fdans, thanks for taking care of this normalization :)
[20:06:46] ma pleasure!
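Editor's note on the replay question raised at 19:55: the "canned urls" approach could start from something like the sketch below, which keeps roughly 1 in N GET requests from webrequest-style records and rebuilds their URLs. This is only an illustration under assumptions. The function name `sample_get_urls` and the record keys (`http_method`, `uri_host`, `uri_path`, `uri_query`) are hypothetical and not taken from the discussion, and the real sampling could just as well happen upstream (for example in kafkatee, as suggested above).

```python
import random
from urllib.parse import urlunsplit


def sample_get_urls(records, rate, seed=0):
    """Keep roughly 1 in `rate` GET requests and return their full URLs.

    `records` is an iterable of dicts; the key names used here are
    assumptions made for this sketch, not a confirmed schema.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    rng = random.Random(seed)  # seeded so the same sample can be reproduced
    urls = []
    for rec in records:
        # Only GET requests are kept, since those can be replayed from the URL alone.
        if rec.get("http_method") != "GET":
            continue
        # randrange(rate) is 0 with probability 1/rate.
        if rng.randrange(rate) != 0:
            continue
        query = rec.get("uri_query", "").lstrip("?")
        urls.append(urlunsplit(("https", rec["uri_host"], rec["uri_path"], query, "")))
    return urls
```

Writing the returned list one URL per line would give a plain URL file that a load tool can then read.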
[20:39:43] (PS3) Fdans: Standardized UDF naming [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327237 (https://phabricator.wikimedia.org/T120131)
[21:05:20] Analytics-Kanban, Easy, Patch-For-Review: Standardize logic, names, and null handling across UDFs in refinery-source {hawk} - https://phabricator.wikimedia.org/T120131#2874098 (fdans) I'm quite happy with the current refactor, pending successful smoke testing within Hive (which I'm doing now): https:...
[21:23:22] urandom: back, do you have those urls you needed?
[21:39:38] urandom: please ping us if you need help
[21:53:23] nuria: as far as urls, that would be: all of them
[21:55:18] nuria: also, having a large corpus harvested from webrequests (assuming that's what you're thinking), would be a great start, but longer term i was wondering about something more realtime
[21:56:49] though for now, that is probably premature optimization, having a week's worth of data from some point relatively recent would do
[22:17:29] (PS1) MaxSem: Added missing file [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/327372
[22:17:37] yurik, ^
[22:18:01] (CR) Yurik: [C: 2] Added missing file [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/327372 (owner: MaxSem)
[22:19:19] (PS1) MaxSem: Count number of pages with tabular data on Commons [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/327373
[22:19:26] yurik, another one
[22:20:09] MaxSem, can you count .tab and .map separately?
[22:20:20] hmmm
[22:21:43] milimetric mforns do you think this structure could work - based off of dan's sketch https://www.dropbox.com/s/gttq4o03i1offwg/site%20map%20v1.png?dl=0
[22:22:03] difference is we combine the topic nav and overview to the landing page
[22:23:11] ashgrigas, yes I like that :]
[22:23:57] ashgrigas, in the bottom right view, the project selector should read spanish wikipedia instead of all, no?
[22:24:20] but yes, this is awesome!
[22:25:47] yes sorry about that
[22:25:50] updated now
[22:26:48] off for the day, see you tomorrow a-team!
[22:26:51] np typo
[22:26:53] bye fdans!
[22:26:55] nite fdans
[22:28:34] ashgrigas: I think the overview page could use some data / graphs, and that's why I was thinking it would be different from the landing page. This is the page in wikistats I have in mind when I think overview: https://stats.wikimedia.org/EN/Sitemap.htm
[22:29:40] that doesn't really feel like an overview page to me even though it's used as one now
[22:29:58] can we maybe extract the essence of that page content?
[22:32:44] (PS2) MaxSem: Count number of pages with tabular data on Commons [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/327373
[22:33:42] (CR) MaxSem: [V: 2] Added missing file [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/327372 (owner: MaxSem)
[22:37:13] ashgrigas: right, if it's really to be an overview it would show a broader set of metrics, aligned with our topics. Perhaps the "featured" metrics in each category would be a good first stab at the overview page. But the point being that it should show a good amount of data, and that seems to me too busy for the very first page. But if you think we can
[22:37:13] make it clean enough for newcomers and people not familiar with our data, then I'm all for it
[22:49:02] (PS3) MaxSem: Count number of pages with tabular data on Commons [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/327373
[22:53:57] (CR) Yurik: [C: 2] Count number of pages with tabular data on Commons [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/327373 (owner: MaxSem)
[22:54:06] (CR) Yurik: [V: 2 C: 2] Count number of pages with tabular data on Commons [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/327373 (owner: MaxSem)
[23:27:54] leila: Hi! You there by any chance?
[23:28:06] hi elukey. yes. what's up? :)
[23:28:21] o/
[23:29:10] so on stat1002 the /home partition is almost at 97%
[23:29:20] (space occupied)
[23:29:27] so akrausetud has 100GB atm
[23:29:36] do you know how I can contact him?
[23:29:49] * leila digs the inbox.
[23:30:27] yes, elukey. will pm you now.
[23:30:36] thanks :)
[23:31:17] np elukey
[23:31:38] the other one is Ellery :P
[23:31:44] I am going to contact him too
[23:32:06] you know his email, right, elukey? :D
[23:36:19] yes!
[23:36:20] :P
[23:36:22] already mailed
[23:36:26] thanks leila !
[23:36:40] great. anytime.
[23:56:52] madhuvishy: are you around? got a quick question about PAWS internal.
[23:57:08] neilpquinn: what's up? :)
[23:58:09] madhuvishy: I'm trying to push to a GitHub repo, but I got the message "ssh: connect to host github.com port 22: No route to host "
[23:58:41] I can install things from pip so there's some access to the outside internet, but not Github?
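Editor's note on the last question: the log ends before an answer, so the cause is not known here. One way to narrow down a "pip works but ssh to port 22 does not" situation is to check which TCP ports are reachable from the host, as in the minimal sketch below. The helper name `probe_ports` and its `connect` parameter are introduced only for this illustration; the only library call relied on is the standard `socket.create_connection`.

```python
import socket


def probe_ports(host, ports, timeout=3.0, connect=socket.create_connection):
    """Return {port: bool}: whether a TCP connection to host:port could be opened.

    `connect` defaults to socket.create_connection and can be swapped out
    for a stub when testing without network access.
    """
    results = {}
    for port in ports:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers "No route to host", refused connections, DNS errors and timeouts.
            results[port] = False
        else:
            conn.close()
            results[port] = True
    return results
```

Calling `probe_ports("github.com", [22, 443])` from the affected host would show whether only the SSH port is unreachable while HTTPS still works, which would be consistent with pip installs succeeding.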