[07:47:37] Analytics, DBA: Json_extract available on analytics-store.eqiad.wmnet - https://phabricator.wikimedia.org/T156681#3000516 (Marostegui) Open>Invalid I will close this for now as we cannot support this on our current MariaDB version.
[07:51:14] morning!
[07:51:21] aqs1009-a is bootstrapping :)
[08:22:15] so it should finish tomorrow in the afternoon, then I'll bootstrap aqs1009-b that will take a bit less. On Wednesday evening the work *should* be done
[08:41:07] Analytics, Analytics-Cluster, Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for musikanimal - https://phabricator.wikimedia.org/T156986#3000766 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff @MusikAnimal I've enabled you...
[09:12:35] ah aqs1008 is also serving aqs HTTP traffic
[09:57:38] Analytics-Cluster, Analytics-Kanban, User-Elukey: Investigate if Node Managers can be restarted without impacting running containers - https://phabricator.wikimedia.org/T156932#3000940 (elukey)
[11:09:31] joal: o/ - I think that enabling Yarn NM's state save (to allow graceful restart) should be a matter of a simple config change, but I am concerned about the space needed
[11:09:38] added some thoughts to the code review
[11:09:41] WDYT?
[11:48:11] (PS1) Addshore: Remove minutely SPARQL script from CRON [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336206
[11:48:37] (PS1) Addshore: Remove minutely SPARQL script [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336207
[11:49:04] (PS2) Addshore: Remove minutely SPARQL script [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336207 (https://phabricator.wikimedia.org/T146468)
[11:49:23] (PS2) Addshore: Remove minutely SPARQL script from CRON [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336206 (https://phabricator.wikimedia.org/T146468)
[11:49:28] (PS3) Addshore: Remove minutely SPARQL script [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336207 (https://phabricator.wikimedia.org/T146468)
[11:49:46] (PS1) Addshore: Remove minutely SPARQL script from CRON [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336208 (https://phabricator.wikimedia.org/T146468)
[11:49:58] (PS1) Addshore: Remove minutely SPARQL script [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336209 (https://phabricator.wikimedia.org/T146468)
[11:50:15] (PS2) Addshore: Remove minutely SPARQL script [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336209 (https://phabricator.wikimedia.org/T146468)
[11:58:02] (CR) Addshore: [C: 2] Remove minutely SPARQL script from CRON [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336206 (https://phabricator.wikimedia.org/T146468) (owner: Addshore)
[11:58:05] (CR) Addshore: [C: 2] Remove minutely SPARQL script [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336207 (https://phabricator.wikimedia.org/T146468) (owner: Addshore)
[11:59:19] (Merged) jenkins-bot: Remove minutely SPARQL script from CRON [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336206 (https://phabricator.wikimedia.org/T146468) (owner: Addshore)
[11:59:22] (Merged) jenkins-bot: Remove minutely SPARQL script [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/336207 (https://phabricator.wikimedia.org/T146468) (owner: Addshore)
[11:59:34] elukey: o/
[11:59:44] Just read your comments in CR
[11:59:51] eqi analytics1042
[11:59:55] oops :)
[12:00:41] elukey: Will reply in gerrit
[12:04:06] joal: we can discuss in here :)
[12:04:39] does it make sense? Not sure how much space is needed to save the containers' state
[12:05:58] elukey: makes sense, I finished my comment on gerrit anyway, let's move the talk here :)
[12:06:59] ah yes I've read them
[12:07:17] do they make sense as well?
[12:07:17] it makes sense, for some reason I was thinking about the containers' state too
[12:07:21] yeah
[12:07:48] one thing that I don't know is what happens when a NM is stopped
[12:07:53] does it close all the containers too?
[12:08:05] maybe with the new option it doesn't and recovers afterwards
[12:08:15] I think that's the point, yes
[12:11:31] all right let's try it :D
[13:11:42] joal: you there?
[13:15:11] Quarry: Add time stamp (when query was run) to results page - https://phabricator.wikimedia.org/T157232#3001552 (Aklapper)
[13:15:25] elukey: Yes !
[13:15:39] elukey: what's up?
[13:20:15] (PS3) Joal: [WIP] Add xml-stubs dump job in scala [analytics/refinery/source] - https://gerrit.wikimedia.org/r/335846
[13:21:56] joal: just wanted to restart the NM on an1028
[13:22:14] but I don't want to terminate anything, causing you to run towards me with an axe
[13:22:22] :D
[13:22:41] no no no : A gentleman would walk but never run :)
[13:22:57] ahahahah
[13:24:23] good for me elukey
[13:28:18] joal: done!
[13:28:21] k
[13:28:51] elukey@analytics1028:/tmp/hadoop-yarn$ sudo du -hs
[13:28:52] 96K .
[13:28:57] let's wait a bit :P
[13:29:04] but for the moment, it seems to be working
[13:29:25] I haven't seen the "waited 5 seconds, kill -9" message while restarting
[13:31:40] * elukey hopes
[13:35:38] ah yes in the logs it seems that the NM is effectively using the new state db
[13:37:58] YAY
[13:47:50] elukey: seems that archiva is very slow :(
[13:53:01] joal: absolutely ignorant about it.. but my first question would be, what has become slow?
[13:53:33] There's one jar retrieval that takes very long at build time
[13:54:55] --verbose :D
[13:55:07] what build time sorry? I am missing some piece :)
[13:55:13] Arf elukey, ok, will try to get more info :)
[13:55:53] so, when using maven to build our jars on stat machines, it checks / downloads jars based on versions from our archiva server
[13:56:38] yep
[13:56:51] (meanwhile the host looks fine from the system metrics - https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=meitnerium&var-network=eth0&from=now-7d&to=now)
[13:57:09] And for the last two or more days, it seems much slower than before
[13:57:16] hm
[13:59:06] I am seeing the archiva access logs with spark stuff, I guess it was you :D
[13:59:17] it was !
[13:59:38] It's always that pom that takes time : https://archiva.wikimedia.org/repository/mirrored/eigenbase/eigenbase-properties/1.1.4/eigenbase-properties-1.1.4.pom
[13:59:46] I don't know why
[14:01:24] I am seeing a lot of 404s
[14:01:32] for eigenbase-properties
[14:01:42] (even the link gives me 404)
[14:02:02] hm - maybe it's a very long timeout
[14:02:07] weird
[14:02:46] so there is no pom in there https://archiva.wikimedia.org/repository/mirrored/eigenbase/eigenbase-properties/1.1.4/
[14:03:00] is it expected?
[14:03:08] I have no idea :(
[14:03:54] * joal is sad - Still issues with http proxy in scala jobs :(
[14:04:05] mmmm https://archiva.wikimedia.org/repository/mirrored/commons-beanutils/commons-beanutils/1.8.3/ has the pom for example
[14:08:10] joal: if you remove that pom, is the build going to fail?
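For reference, the NodeManager graceful-restart change discussed above (T156932) comes down to enabling the NM recovery feature in yarn-site.xml. The exact values used on the cluster are not shown in this log; a minimal sketch, assuming a dedicated directory for the LevelDB state store and a fixed NM RPC port (recovery does not work with the default ephemeral port), might look like:

```xml
<!-- Hypothetical yarn-site.xml fragment: lets a NodeManager restart without killing
     its running containers. Paths and ports here are illustrative, not the values
     actually deployed on the analytics cluster. -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Where the NM keeps its state store; the log above shows it living under
       /tmp/hadoop-yarn on analytics1028 and using only ~96K. -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
<property>
  <!-- Recovery needs a fixed address rather than an ephemeral port. -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:8041</value>
</property>
```

With something like this in place, containers keep running while the NM process restarts and re-attaches to them from the saved state, which matches what is observed around 13:28-13:35 below.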
[14:19:11] it's not included directly by us - it's a dependency of a dependency :S
[14:19:16] elukey: --^
[14:34:16] elukey: would you give me 5 minutes to discuss the java connection issues again?
[14:34:39] sure sure
[14:34:44] batcave?
[14:34:57] 2 mins!
[14:35:00] sure
[14:54:40] Heya milimetric, would you have some time for me?
[14:55:35] of course
[14:55:40] batcave?
[14:55:53] one minute
[15:17:57] joal: still working on it!
[15:17:59] need coffee
[15:18:00] :D
[15:18:03] thanks elukey :)
[15:19:34] (PS5) Joal: Update sqoop script with labsdb specificity [analytics/refinery] - https://gerrit.wikimedia.org/r/334042 (https://phabricator.wikimedia.org/T155658)
[15:26:22] elukey@stat1002:/home/joal$ scala url_test.scala
[15:26:22] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
[15:26:23] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
[15:26:23] Trying ....
[15:26:23] sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@75e56da
[15:26:25] Done.
[15:26:27] joal: --^
[15:26:56] Oh ! How ?
[15:27:00] going to grab a coffee and then we can chat in the cave about it ok?
[15:27:07] sure
[15:29:25] (CR) Ottomata: [C: 1] Update sqoop script with labsdb specificity [analytics/refinery] - https://gerrit.wikimedia.org/r/334042 (https://phabricator.wikimedia.org/T155658) (owner: Joal)
[15:31:16] joal: I am in the cave if you have time!
[15:31:19] ottomata: o/
[15:32:30] o\
[15:32:34] can I wave in that direction?
[15:32:40] or does that look like my shoulder is disconnected from my body?
[15:33:16] it looks like you want to hit yourself in the head
[15:33:21] hahah
[15:33:40] but now I can picture you bending and waving
[15:34:00] like doing some stretching exercise
[15:45:09] joal: need to do some ops work, will update you later :(
[15:45:20] k elukey, I just got back :)
[15:53:02] Quarry: Allow Quarry to work on All Projects vice only Meta - https://phabricator.wikimedia.org/T157342#3002152 (Reguyla)
[16:02:09] joal: elukey standup?
[16:02:13] suuureee
[16:02:14] sorr
[16:27:50] elukey: works with the trick :)
[16:27:55] Thanks a lot mate @
[16:30:48] joal: yw!
[16:46:14] Analytics: Bump replication factor of system.auth table in cassandra when new nodes have finished bootstrap - https://phabricator.wikimedia.org/T157354#3002384 (JAllemandou)
[16:47:23] Analytics: Bump replication factor of system.auth table in cassandra when new nodes have finished bootstrap - https://phabricator.wikimedia.org/T157354#3002397 (elukey) p:Triage>High
[16:48:06] Analytics-General-or-Unknown, Analytics-Kanban, Patch-For-Review, Privacy: analytics.wikimedia.org loads resources from third parties - https://phabricator.wikimedia.org/T156347#2971655 (Milimetric) p:Triage>High a:Milimetric
[16:50:35] Analytics-Kanban, Operations, User-Elukey: Yarn node manager JVM memory leaks - https://phabricator.wikimedia.org/T153951#3002407 (Milimetric) Open>Resolved This will be resolved by upgrading CDH versions, which will happen soon. So... resolved in the future :)
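The url_test.scala paste above only shows its output; neither the script nor the exact fix ("the trick" at 16:27) appears in the log. A rough reconstruction of that kind of connectivity check, assuming the usual workaround of setting the JVM proxy properties explicitly for outbound HTTP from the stat boxes, could look like this:

```scala
// Hypothetical reconstruction of a url_test.scala-style check, run with `scala url_test.scala`.
// The proxy host/port and the target URL are illustrative assumptions, not confirmed values.
import java.net.URL

// Equivalent to passing -Dhttp.proxyHost/-Dhttp.proxyPort on the command line.
System.setProperty("http.proxyHost", "webproxy.eqiad.wmnet")
System.setProperty("http.proxyPort", "8080")

println("Trying ....")
// Opening the connection through the proxy; printing the stream object produces a
// sun.net...HttpInputStream@... line like the one pasted above.
val in = new URL("http://meta.wikimedia.org/w/api.php").openConnection().getInputStream
println(in)
in.close()
println("Done.")
```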
[16:54:54] Analytics-Kanban: Create purging script for analytics-slave data - https://phabricator.wikimedia.org/T156933#3002416 (Milimetric)
[16:55:01] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Add jmxtrans metrics from Hadoop yarn-mapreduce-historyserver - https://phabricator.wikimedia.org/T156272#2969142 (Milimetric)
[17:05:22] Analytics, DBA, Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3002434 (Ottomata) Ok, talked with Analytics folks about this. These boxes host lots more than just EventLogging data, and we won't be able to wean people off of MySQL for...
[17:06:23] no problem mforns
[17:06:35] hey a-team, the library connection is not good for working today. Sorry, I'll go back home and try to fix this as soon as possible
[17:06:49] no problem man, you can't stop the wind, just enjoy it, make some windmills :)
[17:06:54] xD
[17:06:59] see you
[17:38:23] Quarry: Allow Quarry to work on All Projects vice only Meta - https://phabricator.wikimedia.org/T157342#3002585 (Capt_Swing) (ping @yuvipanda ) @Reguyla I think that Quarry did work like most other applications. If you're logged into your account on any public Wiki, you can log into Quarry via Oauth. The Oau...
[17:41:34] Quarry: Add date when query was last run - https://phabricator.wikimedia.org/T77941#3002610 (Capt_Swing)
[17:41:38] Quarry: Add time stamp (when query was run) to results page - https://phabricator.wikimedia.org/T157232#3002612 (Capt_Swing)
[17:41:45] Analytics: Update DataLake History schema to only contain "objective" measures - https://phabricator.wikimedia.org/T157362#3002613 (JAllemandou)
[17:52:36] Analytics, Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#2989939 (BBlack) I know a chunk of our rcstream traffic that I've observed in the past came from Google. I'm not sure who/what in Google drives that...
[18:00:33] Analytics: Add hardware capacity to AQS - https://phabricator.wikimedia.org/T144833#3002679 (RobH)
[18:46:30] Quarry: Allow Quarry to work on All Projects vice only Meta - https://phabricator.wikimedia.org/T157342#3002790 (Reguyla) To be honest I am blocked on Meta, which is fine because I don't need access to it anyway since it's mostly for Stewards and the WMF folks, but it's causing a problem because Quarry is linke...
[18:47:14] * elukey afk!
[18:59:16] hi a-team, my internet is back :]
[19:27:47] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching), User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#3003029 (Ottomata)
[19:49:48] milimetric: I edited the announcement email and also created a talk page for it
[19:49:52] lemme know what you think
[20:00:45] ottomata: that's great, we can link to it from the wikistats discussion if it's not getting enough eyeballs
[20:00:52] but first yeah send the email and see how it goes
[20:03:05] ok cool
[20:03:09] so +1 from you on the email?
[20:03:18] also, check out the lists i put on the To: line at the top
[20:03:20] and the subject
[20:03:26] add or remove any lists you think it should go to
[20:06:01] milimetric: ^
[20:11:50] sorry, yes, +1 email, adding research public list too
[20:28:06] Analytics, Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3003238 (Ottomata)
[20:29:23] Analytics, Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#2989939 (Ottomata) From Google?! :o
[20:46:10] Hello, analytics. I'm curious - do we have a number for the Varnish hit rate for the action API? Just a single number across all projects and all requests
[20:50:16] nuria_: ^^^
[20:55:29] Actually, even the rate of requests to /api.php in varnish would be nice, we already know the rate of misses
[20:56:50] Pchelolo: I'm not aware of something like that, buuuut hmmmm.... somebody collects action api data into hive...
[20:56:57] looking
[20:57:15] Pchelolo: btw, if you have a quick sec: https://gerrit.wikimedia.org/r/#/c/336261/
[20:57:44] looking
[20:58:16] Pchelolo: do you have hive access?
[20:58:24] ottomata: nope
[20:58:36] are you looking for a quick number?
[20:58:39] or do you want something regular?
[20:59:06] oh varnish hit rate?
[20:59:07] oh.
[20:59:07] ottomata: just one number. Like average hits to api.php over the last day
[20:59:28] average per?
[21:00:44] ottomata: ok, lemme expand. So we need the percentage of Varnish cache-hits for api.php. We have the number of cache-misses, but we don't have the number of requests to api.php that come to varnish
[21:01:07] so ja, we can get that
[21:01:12] because we have all webrequests
[21:01:13] so you want
[21:01:22] # of hits to /api.php where cache_status = hit
[21:01:22] ?
[21:01:26] sorry
[21:01:36] # of requests to /api.php where cache_status = hit
[21:01:37] ?
[21:01:50] qq, what varnish cluster serves api.php? text?
[21:02:06] ottomata: text
[21:02:48] ottomata: could I get both '# of requests where cache_status = hit' and '# of requests where cache_status = miss'
[21:03:58] In the end I want to say X% of requests for api.php were cached, all the rest were cache-misses
[21:05:02] Pchelolo: do you have a particular day in mind? if not, how about last wednesday?
[21:05:08] i'll see if i can get you counts grouped by cache_status
[21:05:20] ottomata: doesn't matter. We need just the order of magnitude
[21:05:24] k
[21:07:37] Pchelolo: are the only possible api uri paths
[21:07:40] "/w/api.php"?
[21:08:02] what do you mean?
[21:08:32] Anything that's under /w/api.php is action api in some sense..
[21:08:41] i mean
[21:08:48] are there other paths that get to api.php
[21:08:50] like
[21:08:53] /wiki/api.php
[21:08:54] or
[21:09:03] /index.php?...api.php
[21:09:14] mw routing is weird sometimes! :)
[21:10:18] ottomata: hm.. I think we only use /w/api.php
[21:11:09] ok
[21:11:19] running a query...
[21:12:01] cool, thanks a lot
[21:12:03] Pchelolo: you should get hadoop access :)
[21:12:54] ottomata: heh.. This is the first time in a year I've needed it, with this frequency I don't really need hadoop access :)
[21:13:26] yeah but maybe you'd have fun! :)
[21:17:51] any estimates on how long the calculations would take?
[21:30:37] oh sorry Pchelolo
[21:30:37] done!
[21:31:05] hit,154443263
[21:31:05] int,49397862
[21:31:05] pass,301643378
[21:31:05] miss,54975448
[21:31:51] https://gist.github.com/ottomata/2f95e9a7622955c4f6c050267b2d7fc0
[21:32:22] thank you! Now it's only left to find out how to get the number I want from this
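The query behind the gist is not shown in the log; a sketch of the kind of Hive query that would produce those per-cache_status counts, assuming the standard wmf.webrequest layout (webrequest_source partition, uri_path, cache_status) and an arbitrary single day, would be roughly:

```sql
-- Hypothetical sketch of the counts-by-cache_status query; column and partition
-- names follow the usual wmf.webrequest conventions, and the day is illustrative.
SELECT
  cache_status,
  COUNT(*) AS requests
FROM wmf.webrequest
WHERE webrequest_source = 'text'          -- api.php is served by the text Varnish cluster
  AND year = 2017 AND month = 2 AND day = 1
  AND uri_path = '/w/api.php'
GROUP BY cache_status;
```

With the counts pasted above, hit / (hit + miss + pass) works out to roughly 154.4M / 511.1M, i.e. about 30%, which is the ratio discussed just below.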
[21:32:37] yeah, heh
[21:32:41] i dunno how varnish is configured for api.php
[21:32:48] but i'd guess pass + misses are similar
[21:32:51] dunno what int is
[21:32:53] or why some are -
[21:34:18] ok. So int is 'internally generated by varnish'
[21:34:30] so it's hit / (hit + miss + pass)
[21:35:01] oh, so int shouldn't be considered as a real request?
[21:35:35] https://grafana.wikimedia.org/dashboard/db/varnish-caching
[21:35:45] there's an explanation at the end
[21:35:59] But I guess Brandon would know much more
[21:36:17] hm
[21:36:18] aye
[21:56:29] ottomata: Pchelolo: high prio - https://gerrit.wikimedia.org/r/#/c/336326/
[21:56:33] must land before tomorrow ideally
[21:56:46] low prio - refactor to use the new system (for your own benefit)
[21:58:50] ah really? ok.
[22:00:36] Krinkle: so, your RCFeedEngine stuff is going out tomorrow, and we should merge this so the extension matches the code going out, yet?
[22:00:45] oh, nm, great. aaron just merged
[22:00:56] or, +2ed at least
[22:01:14] Yeah, all compatible as before (with engines + formatters living separately still being supported). The only thing that I broke is that RCfeedEngine is now an abstract class instead of an interface.
[22:01:23] I didn't find any usage of it outside core when I wrote the patch
[22:01:29] but forgot about the EventBus patch
[22:01:38] which landed sometime between me writing that patch and it merging
[22:02:01] got it
[22:02:41] Krinkle: since I got your attention :) i think i'm getting confused by some statsd counter stuff
[22:02:42] https://grafana-admin.wikimedia.org/dashboard/db/eventstreams?from=now-1h&to=now
[22:02:49] i want to graph the number of currently connected clients
[22:02:59] that is tracked via statsd increment and decrement
[22:03:07] when a client connects, .increment
[22:03:13] when the connection is closed, .decrement
[22:03:17] ottomata: I guess my help is not needed any more here?
[22:03:23] Pchelolo: ya we good :)
[22:05:44] ottomata: seems like that won't work. If you send +1 and -1 to the same metric name, it will not correlate since you're not resending the +1 every minute
[22:05:51] also avoid sum() in this case :)
[22:06:04] as it won't aggregate the way you want (you'll get the total of 15min if looking at the last 2 weeks)
[22:06:09] aye
[22:06:19] See https://wikitech.wikimedia.org/wiki/Graphite#Extended_properties
[22:06:23] you'll want to use .rate
[22:06:24] Krinkle: i'm not resending the +1 every minute, am I? Or is statsd doing that?
[22:06:37] oo great, reading
[22:06:42] ottomata: No, but if you want to use decrement() and end up with # currently connected, you'd have to
[22:07:04] Anyhow, can you instead send a gauge() and track currently connected within the application?
[22:07:30] Krinkle: not really, since there are many workers
[22:07:44] or, can I even with that?
[22:07:44] right, but that could be aggregated within graphite by a wildcard query
[22:07:48] meh
[22:07:58] ottomata: alternatively, you could track # new connections
[22:08:19] yea we *should* have that with the service-template-node's request counting
[22:08:22] but it was kinda funky
[22:08:28] didn't look right
[22:08:30] yeah I can imagine
[22:08:50] if you are poking, that would be the v2_stream_* metrics
[22:11:56] Krinkle: how does decrement work if it is called outside of the original increment's flush interval?
[22:12:01] does it actually send a -1 to statsd?
[22:12:05] yes
[22:12:12] It's meant to be used with an integral of sorts
[22:12:17] aye, that might make sense
[22:12:25] so if i do the integral, the decrements will be taken into account?
[22:12:27] but it's imho not very useful, because you'd always depend on the query range
[22:12:31] hm
[22:12:33] and it'll never be right
[22:12:34] oh
[22:12:35] hm
[22:12:54] well, it *should* be right if increments + decrements always come in pairs eventually?
[22:12:57] last 24 hours will end up querying from wherever it was at the previous day
[22:12:58] the base will be wrong
[22:12:59] oh
[22:13:02] but you'd need to see all time
[22:13:03] all the time
[22:13:04] hm
[22:13:09] Right
[22:13:11] got it
[22:13:13] hm
[22:13:22] i see
[22:13:25] all our diamond collector metrics for nginx for this purpose also use gauge()
[22:13:29] so gauge sounds like the best option then
[22:13:29] yeah
[22:13:31] ok
[22:13:35] active_connections from rcstream, for example, reports that way
[22:14:14] and then you can graph .median or upper to decide what to display if the worker sent multiple values in 1 minute (for sanity)
[22:14:38] and sumSeries() to add up the different instances
[22:14:42] over a wildcard
[22:14:59] report every 10-50 seconds or so
[22:15:35] or even after every connection change. statsd will aggregate and flush to graphite
[22:15:47] but might be too much depending on the amount of traffic
[22:16:17] hm, Krinkle, would I need to report the # of current connections regularly then?
[22:16:26] like, call .gauge every minute?
[22:16:38] oh, 10-50 seconds, you just said that
[22:16:39] ottomata: Yeah, although no need to repeat the same number if nothing changed. Graphite/Grafana deals with that for you
[22:16:50] oh
[22:16:50] ok
[22:16:58] so after a connect or disconnect is good?
[22:17:02] or maybe that could be too often
[22:17:11] hm, i betcha the statsd lib we are using limits it tough
[22:17:12] though
[22:17:12] o
[22:17:25] but yeah, perhaps after every con/discon schedule a callback to report the number (if not already), and flush it within 50s to be safe.
[22:17:53] i'd hope the statsd lib would take care of that
[22:17:55] no?
[22:17:55] I imagine service template might have something for this
[22:17:59] yeah
[22:18:02] it uses a fancy statsd lib
[22:18:03] umm
[22:18:06] I don't think it would merge values since they change meaning
[22:18:40] statsd the service (statsd.eqiad) will flush once a minute and aggregate during the flush time (e.g. keep track of .min, .max etc.)
[22:18:52] but debouncing within your app is different since it won't see all the lowest/highest naturally
[22:19:04] but for high traffic you probably don't want to send a gauge after every con/discon
[22:19:07] this one https://github.com/brightcove/hot-shots
[22:19:27] Analytics, Datasets-Webstatscollector, Language-Team: Investigate anomalous views to pages with replacement characters - https://phabricator.wikimedia.org/T117945#3003528 (Tbayer)
[22:19:52] • bufferFlushInterval: If buffering is in use, this is the time in ms to always flush any buffered metrics. default: 1000
[22:20:25] could probably just increase that, and then always call .gauge during conn/disconn
[22:20:59] or, hm, probably sending at most once a second to statsd is fine
[22:21:03] ?
[22:21:37] ottomata: Yeah, that's a different kind of buffer though. That's just to reduce connection setup to the statsd server. It'll remember a few messages and then send them all at once separated by \n
[22:22:01] oh, so multiple gauges will be sent in a single statsd send
[22:22:02] hm
[22:22:08] would statsd just use the last one then?
[22:22:09] right
[22:22:10] hmm
[22:22:11] ok
[22:22:22] i'll just make a timer then to call .gauge every n
[22:22:24] seconds
[22:22:30] no, it'll interpret each one and aggregate as normally to the extended properties of .lower/.upper/mean/median etc.
[22:22:44] they'll all fall within the same 1-minute flush interval
[22:23:03] which is fine given the buffer is 1s
[22:23:36] oh, yeah i guess that'd be fine too
[22:23:37] hm
[22:23:51] but then if there are 1000s (don't think there will be) of conns/disconns per second per worker
[22:23:54] anyway, I'm merely bringing up app-level aggregation so that you don't end up sending a message for every con/discon, which seems like it could potentially be a high number once more widely deployed. Other collectors do the same for metrics that are user controlled.
[22:23:56] the statsd message will be kinda large
[22:24:02] exactly
[22:24:26] ya app level aggregation is fine
[22:25:01] our resourceloader/varnish metrics for example aggregate every 1000 requests or 50 seconds and just use min() or max() to "add" to the number (depending on which one you care for). You could say connections = Math.max(connections, current); and reset once every 10 seconds or so.
[22:25:06] anyhow, enough :)
[22:26:40] hm, since i actually will decrement these on disconnect
[22:26:49] i think i can just keep track of the total number of connections
[22:27:02] without having to reset
[22:27:08] ja ok, thanks for the tips
[22:27:20] if you don't mind, i'll add you as a reviewer to this patch. there are a lot of considerations here
[22:27:30] will probably do that tomorrow, gonna peace out for the day
[22:29:55] Analytics, Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3003537 (Ottomata)
[22:32:42] Analytics, Operations, Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3003545 (Ottomata)
[22:49:45] Analytics, Operations, Scap3: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3003573 (thcipriani) >>! In T155856#3003544, @Ottomata wrote: > I can help! yay! Thanks in advance. Feel free to poke me in IRC if you have questions/problems/need a post-deploy checker.
[22:52:39] (CR) EBernhardson: "hmm, indeed we need to directly drop those in hive. will update." [analytics/refinery] - https://gerrit.wikimedia.org/r/335158 (owner: EBernhardson)
[23:30:32] Analytics, Performance-Team: Check if the EventLogging User Agent schema upgrade breaks any performance tool/metric - https://phabricator.wikimedia.org/T156760#3003685 (Krinkle) >>! In T156760#2995375, @Nuria wrote: > {"event": { .. }, "userAgent": "{\"os_minor\": \"2\", \"os_major\": \"4\", \"device_fa...
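What ottomata sketches at the end of the gauge discussion above - each worker keeping its own running connection count and reporting it as a gauge on a timer, with Graphite summing across workers - could look roughly like the following in the Node service, using the hot-shots client mentioned at 22:19. Metric names, prefix and interval here are illustrative assumptions, not what EventStreams actually shipped:

```javascript
// Rough sketch of per-worker connection gauging with hot-shots, as discussed above.
const StatsD = require('hot-shots');

// Hypothetical metric prefix; real deployments would take this from service config.
const statsd = new StatsD({ prefix: 'eventstreams.connections.' });

let connections = 0;

// Called from the HTTP/SSE connection handler: count up on connect, down on close.
function trackConnection(res) {
  connections += 1;
  res.on('close', () => { connections -= 1; });
}

// Report the current count every 10 seconds; statsd flushes to Graphite once a minute,
// and a sumSeries() over the per-worker series gives the cluster-wide total.
setInterval(() => statsd.gauge('worker_' + process.pid, connections), 10000).unref();

module.exports = { trackConnection };
```

This matches the conclusion at 22:26-22:27: because disconnects decrement the in-process counter, the worker never needs to reset it, and the gauge always reflects the number of currently open connections.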
[23:35:16] (PS3) EBernhardson: Adjust drop hourly script to handle mediawiki log partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/335158
[23:38:21] (CR) EBernhardson: "after closer review of the existing scripts refinery-drop-hourly-partitions needs only very small modifications, which are already paramet" [analytics/refinery] - https://gerrit.wikimedia.org/r/335158 (owner: EBernhardson)
[23:38:39] (PS4) EBernhardson: Adjust drop hourly script to handle mediawiki log partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/335158
[23:42:47] (CR) EBernhardson: "tested as dry run on stat1002, the results look reasonable. The initial hdfs command to drop old data is pretty long, but a quick test sho" [analytics/refinery] - https://gerrit.wikimedia.org/r/335158 (owner: EBernhardson)