[07:43:34] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3616946 (MoritzMuehlenhoff)
[08:00:10] morning!
[08:00:36] All the oozie alerts seems to be not actionable right? I checked webrequest upload/text and all is fine
[08:04:36] moooorning elukey !
[08:05:29] hola Fran!
[08:19:29] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3616964 (elukey)
[08:47:03] Hi elukey and fdans :)
[08:47:25] hellooooo joseph!
[08:48:01] Sorry for not having that polite and nice past few days ... I'm worrieed about those WKS2 backend stuff :(
[08:49:13] elukey: Alerts were due to my HUGE job
[08:49:21] I'll make sure not to run it again :)
[08:50:04] :)
[08:58:32] joal: I haven't noticed any inconsistencies in your daily charm ;)
[08:58:42] mwahahaha :)
[08:58:59] joal is referring to me, but I was joking during yesterday's standup :)
[08:59:04] elukey, however, THAT'S A DIFFERENT STORY
[09:03:16] fdans: I DON'T GET YOUR PASSIVE AGGRESSIVE mood
[09:03:20] :D
[09:03:46] (╯°□°)╯︵ ┻━┻
[09:40:59] just realized that analytics1062 was down
[09:41:05] completely frozen
[09:41:08] just rebooted it
[09:41:54] Analytics-Kanban, Analytics-Wikistats: Productionise list view - https://phabricator.wikimedia.org/T175265#3617111 (fdans)
[09:45:15] the host looks good
[09:45:51] elukey: Could have been my huge job from yesterday - it generated a hell lot of tmp data
[09:45:58] :(
[09:46:34] mmmm I don't think so, it was completely frozen, not stuck
[09:46:40] ok
[09:46:42] like when a kernel error happens
[09:46:45] really weird
[09:46:49] After all, maybe not every failure is mine ;)
[09:49:37] joal: nope! it's almost always elukey's fault!
[09:49:47] * joal hugs ema :)
[09:50:07] Long time no see ema, how are you>
[09:50:09] ?
[09:50:46] indeed! All good, just came back to The North after a few days in my hometown
[09:51:07] mostly involving swimming in the Ligurian Sea and eating seafood
[09:51:21] ema: Sounds like a little bit of paradise :)
[09:51:37] While getting back in The North when winter is coming, a bit more dangerous :)
[09:53:52] yeah
[09:55:09] joal: how's your part of The North?
[09:55:19] ema: Full of kids :)
[09:55:53] the free folk! :)
[09:55:58] ema: which is kinda comprable to white walkers somehow
[09:56:06] :d
[09:56:08] lol
[10:19:39] for date-formats lovers: https://pbs.twimg.com/media/DJd6SeoXgAAg0Ui.jpg:large
[10:43:03] * elukey lunch!
[11:51:11] just disabled monitoring and alerting (graphite based) for kafka-jumbo
[11:57:46] Analytics, Operations, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#3617323 (Jan_Dittrich) > You can use eventlogging and wikimediaevents code at this time , there are quite > a bit of examples of how to run ab tests on discovery's code. My concern is mainly with...
[11:57:51] Analytics, Operations, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#3617324 (Jan_Dittrich) > You can use eventlogging and wikimediaevents code at this time , there are quite > a bit of examples of how to run ab tests on discovery's code. My concern is mainly with...
[12:00:19] hellooooooo teammmm
[12:08:43] hi mforns :)
[12:08:50] hello joal :]
[12:09:30] there's a lot of oozie emails, it's my ops week, should I look at them (you probably already have)?
[12:09:36] joal, ^
[12:09:55] mforns: Those emails are due to me launching too big of a job yesterday - sorry for he noise :(
[12:10:05] ah ok!
[12:11:21] Analytics, EventBus, Wikidata, MW-1.30-release-notes (WMF-deploy-2017-09-19 (1.30.0-wmf.19)), and 2 others: Very large jobs posted by Wikidata - https://phabricator.wikimedia.org/T175316#3617334 (Pchelolo) Open>Resolved Verified that the jobs are now small enough to fit in our new infrast...
[12:26:50] Taking a break a-team
[13:33:36] Analytics-EventLogging, Analytics-Kanban: EventLogging tests fail for python 3.4 in Jenkins - https://phabricator.wikimedia.org/T164409#3617449 (Zoranzoki21) Open>Resolved I think to this can be closed as resolved.
[13:40:19] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 3 others: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088#3617483 (Pchelolo)
[13:40:23] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 4 others: Add unit tests to EventBus extension - https://phabricator.wikimedia.org/T175958#3617481 (Pchelolo) Open>Resolved Some tests were added by the above patches - coverage is not great yet, but I think we can resolve this now.
[14:03:10] (CR) Elukey: [C: 1] "It looks good to me, I was about to ask to touch a file when the script runs to avoid duplicates but if we run it once a month it is fine" [analytics/refinery] - https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034) (owner: Mforns)
[14:43:43] Analytics-EventLogging, Analytics-Kanban: EventLogging tests fail for python 3.4 in Jenkins - https://phabricator.wikimedia.org/T164409#3617712 (mforns) Resolved>Open Hi @Zoranzoki21 I'm still working on this task since a couple days ago, and it is still broken. One of the gerrit patches was mer...
[14:47:09] Analytics-EventLogging, Analytics-Kanban: EventLogging tests fail for python 3.4 in Jenkins - https://phabricator.wikimedia.org/T164409#3617717 (Zoranzoki21) >>! In T164409#3617712, @mforns wrote: > Hi @Zoranzoki21 > I'm still working on this task since a couple days ago, and it is still broken. > One o...
[14:49:20] Analytics-EventLogging, Analytics-Kanban: EventLogging tests fail for python 3.4 in Jenkins - https://phabricator.wikimedia.org/T164409#3617730 (mforns) No problem! It's good to know that more eyes are looking at these tasks :]
[14:57:26] Analytics-Kanban, Analytics-Wikistats: Use daily granularity for 1-month time ranges - https://phabricator.wikimedia.org/T173372#3617746 (fdans)
[14:57:28] Analytics-Kanban, Analytics-Wikistats: Productionise line graph - https://phabricator.wikimedia.org/T171766#3617747 (fdans)
[14:57:30] Analytics-Kanban, Analytics-Wikistats: Wikistats2 bugs (4/4) - Detail page - https://phabricator.wikimedia.org/T170940#3617748 (fdans)
[14:57:32] Analytics-Kanban, Analytics-Wikistats: Wikistats2 bugs (3/4) - Data issues - https://phabricator.wikimedia.org/T170937#3617749 (fdans)
[14:57:34] Analytics-Kanban, Analytics-Wikistats: Wikistats2 bugs (1/4) - Dashboard and general UI - https://phabricator.wikimedia.org/T170933#3617751 (fdans)
[14:57:36] Analytics-Kanban, Analytics-Wikistats: Wikistats2 bugs (2/4) - Wiki selector - https://phabricator.wikimedia.org/T170936#3617750 (fdans)
[14:57:38] Analytics-Kanban, Analytics-Wikistats: Addition of Unique Devices metric - https://phabricator.wikimedia.org/T170461#3617752 (fdans)
[15:00:29] ping fdans mforns
[15:00:35] sorryyy
[15:02:07] going
[15:46:17] elukey: sorry i didn't get your explanation of themonitoring_enabled thing
[15:46:22] milimetric, mforns: quick review of backend?
[15:46:28] joal, yep!
[15:46:37] do we need it? i thought we were going to use prometheus only for new (profiled) kafka stuff
[15:46:42] back to cave!
[15:46:51] Yes !
[15:49:17] ottomata: I added a hiera parameter to profile::kafka::broker to enable/disable the inclusion of jmxtrans and alerts based on graphite. It is currently set to false for jumbo, so we don't have a gazillion criticals in icinga fro now
[15:49:57] (sorry I got the second part only now)
[15:50:17] no we don't really need it buuut before dropping I thought to use a less drastic step :)
[15:50:21] wanted to check in with you first
[15:50:35] we can easily remove it from puppet if we want
[15:51:00] but we can't use it, can we?
[15:51:03] we need to rebuild jmxtrans
[15:51:18] when would we ever set it to true?
[15:51:26] yep, but it is an easy step if we see that prometheus does not work for XYZ
[15:51:43] i guess so, seems weird to add it if we plan on never using it
[15:52:11] also, if we are adding it, why not call it jmxtrans_monitoring_enabled?
[15:52:13] I only wrapped some code up, that's it, just wanted to check in with you first
[15:52:16] to match the other one
[15:52:17] ?
[15:52:44] elukey: , another unrelated thought...
[15:52:54] isn't AQS in the analytics network?
[15:52:58] or is it in prod?
[15:53:06] IIRC it is in prod
[15:53:10] hm ok
[15:53:11] cool
[15:53:14] lemme check
[15:54:35] Analytics-Kanban, Analytics-Wikistats: Create Druid HTTP basic auth proxy and LVS endpoints for druid, open this to AQS in prod network - https://phabricator.wikimedia.org/T176223#3617909 (Ottomata)
[15:54:39] joal: omw
[15:54:44] joal: elukey ^^ for this task
[15:55:05] it's in prod
[15:55:57] ok
[15:56:00] thanks for checking
[15:56:14] Analytics, Operations, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#3617926 (Nuria) @Jan_Dittrich : bucketing is available as part of wikimedia events, see an example of usage as part of serach code: https://github.com/wikimedia/mediawiki-extensions-WikimediaEvent...
[15:56:33] for the name: I am going to remove that puppet code tomorrow morning, it was only a placeholder to 1) ask you opinion first 2) avoid any spam by icinga
[15:56:58] since it is basically used only by jumbo I didn't see a big issue in doing it
[15:57:04] fdans: did you double checked that the newest set of fixes are deloyed to prod?
[15:58:14] nuria_: I did, they are :)
[15:58:32] fdans: pk, thank you , we can move all those to done then?
[15:59:32] sure, will move them now
[16:01:23] (CR) Nuria: [V: 1 C: 1] "Ready to merge then @elukey, we need to merge this code before puppet changes" [analytics/refinery] - https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034) (owner: Mforns)
[16:02:24] nuria_: you guys can merge and deploy --^, I'll take care of puppet
[16:03:45] elukey: ok
[16:04:31] (CR) Nuria: [V: 2 C: 2] Add script to purge old mediawiki data snapshots [analytics/refinery] - https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034) (owner: Mforns)
[16:04:42] elukey: merged now
[16:12:39] elukey: qqs about druid lvs
[16:12:39] and proxy
[16:12:45] there are a bunch of druid services
[16:12:52] the one that accepts queries is the broker on port 8082
[16:13:02] it is likely that that is the only one that will need lvs
[16:13:05] and proxy
[16:13:07] but
[16:13:13] how do you think the proxy should be set up?
[16:13:16] just port based routing?
[16:13:22] e.g. reqs on port 8182 (or something) all go to 8082?
[16:13:23] or
[16:13:25] name based?
[16:13:26] e.g.
[16:13:38] requests to druid.svc.eqiad.wmnet go to localhost:8082
[16:13:39] ?
[16:13:51] and if we need to route other services in the future, they will have a different name?
[16:13:58] e.g. druid-coordinator.svc.eqiad.wmnet
[16:14:26] say we wanted to expose the admin web guis, like we do for yarn.
[16:19:24] I think that it should be druid.svc.eqiad.wmnet goes to druid100[1-6]:port_used_by_the_proxy, that in turn proxies to 8082
[16:20:16] and druid.svc.eqiad.wmnet could be reused for other ports
[16:21:01] say we set up druid.svc.eqiad.wmnet in front of druid100[1-6]
[16:21:15] it should be possible to send requests to whatever backend ports
[16:21:51] for api.svc.eqiad.wmnet we have only one domain
[16:22:00] reachable via 80|443
[16:22:07] (apache,nginx on mw*)
[16:22:17] makes sense?
[16:23:33] ah ok
[16:23:37] so if we need to reach another service
[16:23:42] we'd use same domain, different port?
[16:24:08] say we wanted coordinator
[16:24:08] we'd do
[16:24:20] it is port 8081
[16:24:26] we'd make proxy from 8181
[16:24:27] and do
[16:24:29] druid.svc.eqiad.wmnet:8181
[16:24:33] whereas broker would be
[16:24:35] druid.svc.eqiad.wmnet:8182
[16:24:36] ya?
[16:24:43] elukey: ^?
[16:25:44] I think so yes
[16:30:32] ok cool
[16:34:48] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog: EventLogging subscriber module in ready state but not sending tracked events - https://phabricator.wikimedia.org/T175918#3618093 (ovasileva) p:Triage>High
[16:35:34] wikimedia/mediawiki-extensions-EventLogging#692 (wmf/1.30.0-wmf.19 - 807040b : Translation updater bot): The build has errored.
[16:35:34] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/compare/wmf/1.30.0-wmf.19
[16:35:34] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/277387998
[16:44:27] joal: you are going to respond to ops email with our plan, ya?
[16:47:25] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Create Druid HTTP basic auth proxy and LVS endpoints for druid, open this to AQS in prod network - https://phabricator.wikimedia.org/T176223#3618159 (Ottomata)
[16:52:15] we said staff was cancelled today, right?
[16:52:29] or no?
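The routing scheme agreed on above uses a single LVS name, druid.svc.eqiad.wmnet, where the proxy listens on 8182 for the broker's 8082 and on 8181 for the coordinator's 8081. Together with the basic-auth requirement from T176223, it can be sketched as one small decision function. This is a hypothetical sketch, not the proxy that was actually deployed. The port table, the function names and the credentials are all assumptions made for illustration.

```python
import base64
import hmac
from typing import Optional, Tuple

# Hypothetical table: proxy port on druid100[1-6] -> local Druid service port.
PROXY_PORTS = {
    8182: 8082,  # broker (accepts queries)
    8181: 8081,  # coordinator
}


def parse_basic_auth(header: Optional[str]) -> Optional[Tuple[str, str]]:
    """Decode an 'Authorization: Basic ...' header value into (user, password)."""
    if not header or not header.startswith("Basic "):
        return None
    try:
        raw = base64.b64decode(header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error subclasses ValueError
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def route(port: int, auth_header: Optional[str],
          user: str, password: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, backend address) for a request hitting the proxy port."""
    backend_port = PROXY_PORTS.get(port)
    if backend_port is None:
        return 404, None  # no service is exposed on this port
    creds = parse_basic_auth(auth_header)
    # Compare as bytes with compare_digest so timing does not reveal partial matches.
    ok = creds is not None and (
        hmac.compare_digest(creds[0].encode(), user.encode())
        and hmac.compare_digest(creds[1].encode(), password.encode())
    )
    if not ok:
        return 401, None
    return 200, f"localhost:{backend_port}"
```

Basic auth only base64-encodes the credentials. For example, `parse_basic_auth("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==")` returns `("Aladdin", "open sesame")`, the RFC 7617 example. Over plain HTTP, anyone on the network path can therefore read them.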
[16:54:34] yep
[16:54:58] gtg now but I'll check the lvs and el patch ottomata tomorrow morning
[16:56:35] k cool
[16:56:36] laters!
[16:57:11] * elukey off!
[17:11:37] Analytics-Kanban, Analytics-Wikistats: Add basic authentication abilities to restbase endpoint to be able to connect to druid authentication proxy - https://phabricator.wikimedia.org/T176234#3618269 (Nuria)
[17:19:14] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Create Druid HTTP basic auth proxy and LVS endpoints for druid, open this to AQS in prod network - https://phabricator.wikimedia.org/T176223#3618333 (Ottomata)
[17:20:47] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Create Druid HTTP basic auth proxy and LVS endpoints for druid, open this to AQS in prod network - https://phabricator.wikimedia.org/T176223#3617909 (Ottomata)
[17:20:49] Analytics-Kanban, Analytics-Wikistats: Add basic authentication abilities to restbase endpoint to be able to connect to druid authentication proxy - https://phabricator.wikimedia.org/T176234#3618343 (Ottomata)
[17:50:40] joal: should mw history queries in pivot/druid work ok?
[17:52:22] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching), User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#3618443 (Nuria)
[17:52:25] Analytics-Kanban, Beta-Cluster-Infrastructure, Wikimedia-Stream, Patch-For-Review: Decom RCStream in Beta Cluster - https://phabricator.wikimedia.org/T172356#3618442 (Nuria) Open>Resolved
[17:52:32] Analytics-Cluster, Analytics-Kanban, Operations: Reinstall Analytics Hadoop Cluster with Debian Jessie - https://phabricator.wikimedia.org/T157807#3618446 (Nuria)
[17:52:35] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3618444 (Nuria) Open>Resolved
[17:52:43] Analytics-Kanban, Patch-For-Review: Add redirect and pagelinks tables for partition repair in sqoop job for mediawiki history - https://phabricator.wikimedia.org/T174484#3618447 (Nuria) Open>Resolved
[17:53:32] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Mobile PMs has reports on session-related metrics from Wikipedia Apps {hawk} - https://phabricator.wikimedia.org/T86535#3618449 (Nuria)
[17:53:34] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Make oozie work with spark jobs that use HiveContext - https://phabricator.wikimedia.org/T94596#3618448 (Nuria) Open>Resolved
[17:53:46] Analytics-Kanban, Patch-For-Review: Add zero carrier to pageview_hourly data on druid - https://phabricator.wikimedia.org/T161824#3618452 (Nuria) Open>Resolved
[17:53:59] Analytics-Cluster, Analytics-Kanban: Backup HDFS NameNode fsimage metadata - https://phabricator.wikimedia.org/T175740#3618455 (Nuria) Open>Resolved
[17:54:13] Analytics-Kanban, Analytics-Wikistats, Continuous-Integration-Config, Release-Engineering-Team (Kanban): Set up continuous integration for wikistats 2.0 UI - https://phabricator.wikimedia.org/T170458#3618459 (Nuria)
[17:54:16] Analytics-Kanban, Analytics-Wikistats: Cleanup Routing code - https://phabricator.wikimedia.org/T170459#3618458 (Nuria) Open>Resolved
[17:55:46] fdans: wikistats looking and functoning lot better !
[17:58:55] Analytics, Analytics-Wikistats: Wikistats unique devices metrics needs some copy that says "monthly" - https://phabricator.wikimedia.org/T176240#3618486 (Nuria)
[17:58:59] back from lunch
[18:00:26] ottomata: They should, but only for the last 2 years
[18:00:44] ottomata: Currently writing an email to ops
[18:01:11] gr8
[18:01:19] ottomata, elukey: Just found a kerberos extension for druid - Would we be interested in using that, or do we prefer BasicAuth proxying?
[18:01:52] Analytics-Kanban, Analytics-Wikistats: Wikistats metrics should link to corresponding page in meta - https://phabricator.wikimedia.org/T176241#3618515 (Nuria)
[18:01:56] Analytics-Kanban, Analytics-Wikistats: Wikistats metrics should link to corresponding page in meta - https://phabricator.wikimedia.org/T176241#3618528 (Nuria)
[18:03:06] Analytics-Kanban, Analytics-Wikistats: Wikistats unique devices metrics needs some copy that says "monthly" - https://phabricator.wikimedia.org/T176240#3618529 (Nuria)
[18:04:56] milimetric, I'm also here
[18:04:57] mforns, fdans : also nice looking urls
[18:05:09] in wikistats that is
[18:05:16] cool, one sec, doctor called
[18:05:21] milimetric, np
[18:10:28] ottomata, elukey - Just an email for proof-reading before ops list
[18:11:02] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#3618572 (GWicke) I honestly don't have a strong preference between the other "hearted" tasks. Given that all of them are f...
[18:11:36] mforns: ok, hangout?
[18:11:42] milimetric, yes
[18:11:46] omw
[18:17:59] hey, was there ever a conclusion about the chrome 41 bug?
[18:18:03] like do we know why that happened?
[18:19:25] ottomata: only for some of it
[18:19:48] ottomata: most of the traffic (not all) disappeared when we returned a 403 response
[18:20:11] ottomata: to those requests at the varnish layer
[18:21:39] specifically to chrome 41?
[18:21:40] we did that?
[18:21:52] we never heard from chrome folks about it?
[18:22:54] ottomata: 401 wait
[18:23:05] ottomata: it was not chrome, it was windows ciphers
[18:23:20] ottomata: teh windows +chrome combo
[18:24:12] ottomata: https://gerrit.wikimedia.org/r/#/c/313828/2/modules/varnish/templates/text-frontend.inc.vcl.erb
[18:25:09] huh
[18:25:45] ottomata: that stopped one of teh issues that "seemed" to be attempting over and over to stablish an https connection
[18:25:59] ottomata: there was a service pack ms release in the middle of it too
[18:28:08] Pchelolo: mmm.. where could i see the authentication support for restbase ? i was looking all over teh docs w/o success
[18:29:28] nuria_: authentication? We never actually enabled restbase on private wikis, so although RB is capable of authenticating users that codepath is still experimental and untested
[18:29:45] Pchelolo: no from client to restbase
[18:29:56] Pchelolo: but rather from restbase outside
[18:30:03] Pchelolo: sorry
[18:30:10] Pchelolo: let me explain better
[18:30:23] Pchelolo: imagine taht instead of connecting to cassandra i connect restbase to mysql
[18:30:27] nuria_: is that what milimetric was asking for with basic auth to druid?
[18:30:32] Pchelolo: yes
[18:30:42] Pchelolo: ah sorry, did he do that on svcs channel?
[18:31:00] ya, that was on services channel
[18:31:02] I forget to switch always if people are logged in here
[18:31:15] Pchelolo: will re-read no need to repeat
[18:31:51] so TLDR is that we don't have anything fancy built in because nobody neededf it, so we decided that for now you'll just manually do the header and then we can add more fancy support
[18:34:23] Pchelolo: it is a bit more than a header, right? tls + retrieval of password of token from private git repo + basic auth connection
[18:35:37] Pchelolo: sorry, retrieval of password Or token or user/pw combo from private git repo
[18:36:20] currently all passwords that RB uses are stored in puppet-private and then added to the config.yaml
[18:36:25] so that's easy
[18:37:16] Pchelolo: ok so password or token retrieval shoudl be easy, how about stablishing tls http connection ?
[18:37:53] milimetric, mforns: are you in da cave?
[18:38:00] yep :]
[18:38:00] hm we use TLS for cassandra, but I'm not the best person to ask about that one, I don't know how that works..
[18:38:10] ok joining
[18:38:16] try checking with urandom on services channel
[18:38:50] Pchelolo: ok
[18:39:13] nuria_: it won't be TLS i don't think
[18:39:23] just pw auth
[18:39:37] doint TLS will be a much bigger task
[18:39:45] but will be easier after cergen stuff next quater! :)
[18:48:23] ottomata: but plain user/pw auth is no auth as it is plain text http
[18:48:36] ottomata: so defacto visible/spoofable by everyone
[18:51:37] is true! :)
[18:51:50] but we def won't have TLS on this this quarter. i mean i guess we could, but it would be hacky and annoying
[18:58:16] ottomata: Heya - Can we ask for some help in druid conf setting?
[19:00:52] ya!
[19:01:01] wassup joal?
[19:03:26] ottomata: We realised that even with druid 0.9.2, we were not using the groupBy v2 feature because of conf
[19:03:52] ottomata: Do you think we could ask for a fast update in conf to check if it actually solves our issue ?
[19:04:21] you need it on all nodes?
[19:04:22] or just one?
[19:04:54] ottomata: all nodes, and update applies to broker, historical and middle-managers :(
[19:04:59] ottomata: see https://raygun.com/blog/druid-groupby-v2-engine/
[19:05:02] ok gonna puppetize it then, its fine
[19:05:14] wellllll i guess i could do it real quick
[19:05:14] to see
[19:05:16] ottomata: We'd need druid.processing.numMergeBuffers set
[19:05:31] ottomata: if it works, it's really of great help :)
[19:06:56] ottomata: Just double checked values for broker: we could set it to 20
[19:07:03] nMaybe we can say 10 to be safe :)
[19:08:25] ottomata: actually, we'll need different values for the various servers (since buffer size changes)
[19:08:35] right
[19:09:02] ottomata: important ones for us are broker and historical - middle-manager is for real-time tasks, we're not after those (yet)
[19:10:39] ottomata: 20 would work for both broker and historical, but let's be safe and say 10 maybe?
[19:10:39] joal: do we need to set all of them to see if this works?
[19:10:47] ottomata: I think we do, yes
[19:11:33] joal i don't think 20 would fit for broker
[19:11:38] druid.processing.buffer.sizeBytes=2147483647
[19:11:44] -XX:MaxDirectMemorySize=64g
[19:11:51] druid.processing.numThreads=10
[19:12:12] 2147483647*(20+10+1) == 66571993057
[19:12:20] 61 G
[19:12:22] oh i guess it would
[19:12:22] hm
[19:12:28] is that ok though?i don't really know what this does
[19:12:34] would that take up all the RAM?
[19:12:35] ottomata: 10 would already be good for a test
[19:13:26] ottomata: Those buffers will be allocated yes - so, let's go for 10, to leave some space for caching
[19:14:08] joal: to test, can we just set it for broker? would that be enough for you to verify?
[19:14:33] ottomata: I don't think so - broker and historical are needed (actually I'd say historical even more)
[19:14:46] historical do the computation ottomata, so it needs it
[19:15:15] ok, i will set in common properties for now, and restart historical and broker to test
[19:15:33] Thanks a lot ottomata - You rock
[19:15:41] let us know when restarted :)
[19:19:37] !log restarting druid broker and historical processes with druid.processing.numMergeBuffers=10
[19:19:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:21:40] ottomata: Will care the failed oozie job
[19:24:08] !log Rerun pageview-druid-hourly-wf-2017-9-19-17 failed during druid restart
[19:24:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:24:24] joal: what should we set middlemanager to if we do this?
[19:24:59] ottomata: hm - middle-managers should have really smaller datasets (realtime only)
[19:25:16] ottomata: buffers and threads are smaller as well ...
[19:25:47] MaxDirectMemorySize is nto set thoguh
[19:25:53] ya heap is only 64m
[19:29:41] joal: am getting not enough direct memory errors
[19:29:46] 9) Not enough direct memory. Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, druid.processing.numThreads, or druid.processing.numMergeBuffers: maxDirectMemory[2,147,483,648], memoryNeeded[7,516,192,768] = druid.processing.buffer.sizeBytes[536,870,912] * (druid.processing.numMergeBuffers[10] + druid.processing.numThreads[3] + 1)
[19:29:57] OH
[19:29:59] oh
[19:30:02] middlemanager
[19:30:06] because peons use this too
[19:30:10] and i put it in common
[19:30:11] fixing
[19:30:17] Arf
[19:31:23] ok joal that may mean some failed jobs
[19:31:31] since those new peon processes will have failed
[19:31:35] i took it out of common
[19:31:43] and put it in just broker and historical
[19:31:46] will look after that ottomata
[19:31:47] and those are restarted
[19:31:52] so you should be able to test v2 now
[19:32:25] I have already started playing :)
[19:33:05] Thanks again a lot ottomata
[19:33:28] you testing?
[19:33:28] ResourceLimitExceededException: Grouping resources exhausted
[19:33:57] from historical
[19:34:09] Yes, I get those errors as well
[19:34:16] ottomata: I'm trying hard stuff
[19:34:27] ha ok
[19:41:41] joal: let me know if i can/should make that change permanent
[19:41:44] puppet is currently disabled
[19:42:11] ottomata: So, there definitely is some improvement with v2, but not enough for us to use that as a core
[19:42:45] ottomata: I would be very usefull to make the changes permament, but you have time
[19:45:35] joal: it is not hard, jet let me know if i should
[19:45:38] just*
[19:45:41] i'll do it now
[19:45:45] ottomata: I think we should
[19:45:48] ok
[19:45:52] for middlemanager too?
[19:51:01] joal?
[19:51:13] ottomata: oh sorry
[19:51:18] ottomata: hm, not sure
[19:51:40] ottomata: I don't think we use groupBy over realtime data
[19:51:46] ok
[19:51:46] so I'd say no
[19:51:55] so 10 for broker, 10 for histoircal
[19:52:26] sounds good
[19:53:02] joal: https://gerrit.wikimedia.org/r/#/c/378992/
[19:53:19] oop
[19:53:19] s
[19:57:26] ottomata: seems to be 2 times the lines in broker file in CR
[19:59:18] fixed in patch 2
[19:59:32] joal:
[20:14:34] Analytics-Kanban, Discovery, Operations, Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#3619208 (debt)
[20:22:39] (PS1) Joal: Update mediawiki-history-reduced oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174)
[20:25:06] (CR) Joal: "2 potential improvements: remove page_title as we actually don't need it (we use page_id for correct topN results). Also, use a fake event" [analytics/refinery] - https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174) (owner: Joal)
[20:25:15] milimetric, mforns --^
[20:25:37] I have tested the metrics, they seem to work (except for 1, but it's heavily flagged)
[20:26:04] Analytics-Kanban, Analytics-Wikistats: Pageview retrieval does not work if one of the fails requests - https://phabricator.wikimedia.org/T176261#3619243 (Nuria)
[20:28:17] * milimetric looks
[20:29:45] Analytics, Operations, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#3619277 (Tbayer) >>! In T135762#3617324, @Jan_Dittrich wrote: >> You can use eventlogging and wikimediaevents code at this time , there are quite >> a bit of examples of how to run ab tests on dis...
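The direct-memory check that failed for the middle-manager peons at 19:29:46 follows the formula printed in that error: `buffer.sizeBytes * (numMergeBuffers + numThreads + 1)` must fit under `-XX:MaxDirectMemorySize`. The sketch below redoes that arithmetic in Python for two cases: the broker values ottomata quoted at 19:11, and the peon values from the error. The function names are hypothetical. The sketch mirrors the logged formula and is not taken from Druid's own source.

```python
GIB = 1024 ** 3


def direct_memory_needed(size_bytes: int, num_merge_buffers: int, num_threads: int) -> int:
    """Direct memory in bytes, using the formula shown in Druid's error message."""
    # sizeBytes * (numMergeBuffers + numThreads + 1), as logged at 19:29:46.
    return size_bytes * (num_merge_buffers + num_threads + 1)


def fits(size_bytes: int, num_merge_buffers: int, num_threads: int, max_direct: int) -> bool:
    """True if the buffers fit under -XX:MaxDirectMemorySize (max_direct, in bytes)."""
    return direct_memory_needed(size_bytes, num_merge_buffers, num_threads) <= max_direct


if __name__ == "__main__":
    # Broker values quoted in the log: sizeBytes=2147483647, numThreads=10, 64g cap.
    for merge_buffers in (10, 20):
        need = direct_memory_needed(2147483647, merge_buffers, 10)
        print(f"broker, {merge_buffers} merge buffers: {need} bytes "
              f"({need / GIB:.1f} GiB), fits={fits(2147483647, merge_buffers, 10, 64 * GIB)}")

    # Peon values from the error: sizeBytes=536870912, numThreads=3, 2 GiB cap.
    need = direct_memory_needed(536870912, 10, 3)
    print(f"peon, 10 merge buffers: {need} bytes, fits={fits(536870912, 10, 3, 2 * GIB)}")
```

With 10 merge buffers, the broker needs 45,097,156,587 bytes (about 42 GiB) of its 64 GiB. With 20 it needs 66,571,993,057 bytes (about 62 GiB), which still fits but leaves little headroom, in line with the choice of 10. The peon case needs 7,516,192,768 bytes against a 2,147,483,648-byte cap. That is why the setting had to move out of the common properties.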
[20:29:55] (CR) Joal: [C: -1] Update mediawiki-history-reduced oozie job (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174) (owner: Joal)
[20:30:15] Analytics-Kanban, Patch-For-Review: Stop collecting Data for PageCreation and other Page* schemas, archive tables on hdfs - https://phabricator.wikimedia.org/T171629#3619279 (Nuria)
[20:44:07] Analytics-Kanban, Operations, Patch-For-Review: Puppet admin module should support adding system users to managed groups - https://phabricator.wikimedia.org/T174465#3619321 (Ottomata)
[21:11:26] Analytics-Kanban, Patch-For-Review: Stop collecting Data for PageCreation archive tables on hdfs - https://phabricator.wikimedia.org/T171629#3619429 (Nuria)
[21:33:20] Analytics-Kanban, Patch-For-Review: Stop collecting Data for PageCreation archive table on hdfs - https://phabricator.wikimedia.org/T171629#3619521 (Nuria)
[21:33:39] Analytics-Kanban, Patch-For-Review: Stop collecting Data for PageCreation archive table on hdfs - https://phabricator.wikimedia.org/T171629#3471480 (Nuria) Ok, we are ready to drop PageCreation_7481635_1542324 and PageCreation_7481635 from MySQL ping @elukey Working now on removing events from mediawik...
[21:45:57] logging off for tonight a-team - see you tomorrow
[21:48:31] latersss
[22:32:07] Analytics-Kanban, Patch-For-Review: Stop collecting Data for outdated schems PageCreation, PageDeletion, PageMove, archive tables on hdfs - https://phabricator.wikimedia.org/T171629#3619883 (Nuria)
[22:33:20] Analytics-Kanban, Patch-For-Review: Stop collecting Data for outdated schemas PageCreation, PageDeletion, PageMove, PageRestoration. Archive tables on hdfs - https://phabricator.wikimedia.org/T171629#3471480 (Nuria)
[22:34:04] Analytics-Kanban, Patch-For-Review: Stop collecting Data for outdated schemas PageCreation, PageDeletion, PageMove, PageRestoration. Archive tables on hdfs - https://phabricator.wikimedia.org/T171629#3471480 (Nuria)
[22:40:16] Analytics: Correct pageview_hourly and derived data for T141506 - https://phabricator.wikimedia.org/T175870#3619906 (Tbayer) >>! In T175870#3611263, @Nuria wrote: >>And community members and the public do not have that option and are thus left with the faulty data. > This is most certainly not true, any data...
[22:41:50] (Abandoned) Nuria: Add "Interwicket" to the list of bots [analytics/refinery/source] - https://gerrit.wikimedia.org/r/337632 (https://phabricator.wikimedia.org/T154090) (owner: Nuria)
[23:23:01] Analytics: Correct pageview_hourly and derived data for T141506 - https://phabricator.wikimedia.org/T175870#3620005 (Tbayer) >>! In T175870#3606162, @Nuria wrote: > We already discussed this issue on this ticket: https://phabricator.wikimedia.org/T141506#2575088 and I second @BBlack 's > opinion. In a gist...