[09:37:08] (PS4) Joal: Add webstatcollector projectview transformation [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118) [10:06:25] Quarry: Show all published queries in profile - https://phabricator.wikimedia.org/T77948#1404110 (Edgars2007) Temporary solution. In the "Recent queries" page add `?limit=5000` to page URL, so you (currently) get all queries. Then you can search for your username. Yes, it isn't a simple way, but at least it... [11:40:54] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1404495 (Aklapper) >>! In T103292#1399462, @Qgil wrote: > How does Metrics Grimoire scan Git/Gerr... [11:47:16] Analytics-Tech-community-metrics: Exclude pulled upstream code repositories from metrics - https://phabricator.wikimedia.org/T103984#1404499 (Aklapper) NEW [11:48:19] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1386502 (Aklapper) >>! In T103292#1387470, @Qgil wrote: > We should probably take the repository... [12:57:57] Analytics-Kanban: Vet data in intermediate aggregate {wren} [8 pts] - https://phabricator.wikimedia.org/T102161#1404685 (JAllemandou) Analysis done on one hour of data: 2015-06-24T00:00:00, using newly generated projectview and legacy projectcounts. It is to be noted that new projectview files don't contain... [13:26:22] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1404729 (Aklapper) > According to the [[ http://korma.wmflabs.org/browser/scm.html | "Authors" gr... [13:27:33] (CR) Ottomata: Add webstatcollector projectview transformation (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118) (owner: Joal) [14:24:37] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 40.00% of data above the critical threshold [30.0] [14:24:51] Hey ottomata ! [14:24:53] heya! [14:24:57] that is probably me! [14:24:59] that el alarm [14:25:00] Thx for the reviews :) [14:25:03] yup [14:25:07] should be ok in a sec. [14:25:10] Yeah, that would have been my next question ;) [14:27:36] (PS5) Joal: Add webstatcollector projectview transformation [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118) [14:27:46] Hey ottomata, hopefully the last one :) [14:27:54] Thx for spotting inconsistencies ! [14:29:30] haha, joal i see one more! can we make the actual names of the hql files match too? :D [14:29:40] hive_script_transform [14:29:43] archive_webstatcollector.hql [14:29:44] ottomata: :S sorry sir ;) [14:30:06] maybe this is fine? [14:30:06] hive_script_aggregate [14:30:09] projectview_hourly.hql [14:30:10] eek [14:30:10] anyway ja [14:30:16] ? [14:30:29] first one: aggregate_projectview.hql [14:30:42] second: transform_projectview.hql [14:30:47] ottomata: --^ [14:30:49] ? [14:31:27] joal: maybe transform_projectcount?
[14:31:40] it is generating the projectcount dataset, right? [14:31:45] yup [14:31:48] ok [14:31:58] Then transform_projectview_projectcounts [14:32:05] ? [14:32:06] hehe [14:32:24] transform_projectcounts.hql is fine :) [14:32:53] pageview_aggregator_projectview [14:32:53] projectview_transform_projectcount [14:32:53] ? [14:32:58] aggregate* [14:33:39] we can make them even more explicit joal, if you like [14:33:48] transform_projectcount_to_projectview.hql [14:34:02] sorry, other way around :) [14:34:11] in pagecounts, 2 coords, hql files are: insert_pagecounts_hourly.hql and archive_projectcounts.hql [14:34:21] transform_projectview_to_projectcount.hql [14:34:21] aggregate_pageview_to_projectview.hql [14:34:26] yeah [14:34:33] I like explicit, so let's have aggregate_pageview_to_projectview.hql [14:34:47] transform_projectview_to_projectcounts.hql [14:35:04] ok [14:35:17] ok, not to have hourly in the title, right ? [14:38:34] (PS6) Joal: Add webstatcollector projectview transformation [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118) [14:38:38] ottomata: hopefully good this time :) [14:44:37] PROBLEM - Check status of defined EventLogging jobs on analytics1010 is CRITICAL Stopped EventLogging jobs: reporter/statsd [14:45:07] that's ok! [14:45:08] PROBLEM - Eventlogging /srv disk space on analytics1010 is CRITICAL: DISK CRITICAL - free space: / 14928 MB (84% inode=93%) [14:45:11] weird that that happens [14:45:12] huh! [14:45:13] cool! [14:46:47] ottomata: cool ? [14:47:23] sorry joal, am fixing some EL puppet stuff, with you shortly [14:47:31] np :) [14:59:18] (CR) Ottomata: [C: 2 V: 2] Add webstatcollector projectview transformation [analytics/refinery] - https://gerrit.wikimedia.org/r/220426 (https://phabricator.wikimedia.org/T101118) (owner: Joal) [14:59:27] Thx andrew ;) [14:59:37] Tell me, what was cool about the EL alarm ? [15:01:49] PROBLEM - Check status of defined EventLogging jobs on graphite consumer on hafnium is CRITICAL Stopped EventLogging jobs: reporter/statsd [15:03:38] which one? :) [15:03:53] the first one that triggered today was def a real problem, but very short [15:04:11] i am changing hosts in the URIs [15:04:14] using IP address [15:04:23] had already tested that in labs [15:04:28] but made a mistake when testing in prod [15:04:33] so it broke for a little bit [15:04:42] the other ones, like analytics1010, are just dumb [15:04:51] consequence of monitoring classes not being very smart about where they are applied [15:04:58] makes sense [15:05:18] PROBLEM - Check status of defined EventLogging jobs on hafnium is CRITICAL Stopped EventLogging jobs: reporter/statsd [15:05:48] PROBLEM - Eventlogging /srv disk space on hafnium is CRITICAL: DISK CRITICAL - free space: / 1129 MB (12% inode=75%) [15:07:22] ok that is more interesting, checking on that [15:07:28] i'm sure that is not real, but it shouldn't fire [15:07:55] So tell me, just so that I follow: you are at stage 1 (multiple outputs for processors)? [15:08:27] haven't deployed that yet [15:08:31] that is running in beta [15:08:32] but not prod [15:08:32] ok [15:08:37] want to deploy that today [15:08:49] So how come changing hosts in URI ? [15:08:49] RECOVERY - Check status of defined EventLogging jobs on graphite consumer on hafnium is OK All defined EventLogging jobs are runnning. [15:08:58] RECOVERY - Check status of defined EventLogging jobs on hafnium is OK All defined EventLogging jobs are runnning.
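For readers following the naming debate above, a rough sketch of what the two renamed scripts do. This is not the actual analytics/refinery code: the table names, columns, and partition layout are assumptions inferred from the conversation (pageview_hourly rolled up per project, then reshaped into the legacy webstatscollector projectcounts format).

```sql
-- Sketch only, not the actual refinery code; names are assumptions.

-- aggregate_pageview_to_projectview.hql: roll pageview_hourly up to
-- per-project rows for one hour.
INSERT OVERWRITE TABLE projectview_hourly
    PARTITION (year = ${year}, month = ${month}, day = ${day}, hour = ${hour})
SELECT
    project,
    access_method,
    agent_type,
    SUM(view_count) AS view_count
FROM pageview_hourly
WHERE year = ${year} AND month = ${month} AND day = ${day} AND hour = ${hour}
GROUP BY project, access_method, agent_type;

-- transform_projectview_to_projectcounts.hql: reshape projectview rows into
-- the legacy webstatscollector "projectcounts" text format for archiving.
SELECT CONCAT_WS(' ', project, '-', CAST(SUM(view_count) AS STRING))
FROM projectview_hourly
WHERE year = ${year} AND month = ${month} AND day = ${day} AND hour = ${hour}
GROUP BY project;
```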
[15:09:01] to get 0mq from everywhere ? [15:11:04] ottomata: ottomata you sure you wanna deploy on Friday ? [15:11:08] maybe monday ? [15:11:12] yeah i do! [15:11:13] heheh [15:11:16] it is early enough [15:11:17] ;) [15:11:28] it'll deploy code to eventlog1001, but won't deploy any functional changes there [15:11:36] On my side, I'll wait monday for the project stuff [15:11:43] haha [15:11:43] k [15:12:58] joal: yes pretty much, zmq everywhere. also it has to do with how the variables are set in puppet. zmq doesn't like hostnames it seems, and 0.0.0.0 is too generic. so i'm having it just bind and use the main ipaddress everywhere by default [15:14:15] ottomata: how awefull :( [15:15:51] awful!? it's ok. :) [15:15:58] huhu :) [15:15:59] it is a facter variable in puppet [15:16:12] i also refactored the puppet stuff to make it easier to put different services on different nodes [15:16:17] * joal doesn't like static IPs in conf [15:16:19] RECOVERY - Check status of defined EventLogging jobs on analytics1010 is OK All defined EventLogging jobs are runnning. [15:16:30] :D [15:16:38] ottomata: That's cool :) [15:17:36] what happened here, unicode? https://github.com/wikimedia/operations-puppet/commit/54fa58df5bb526f3e9ec15fd7080f58f52f25e0d [15:18:20] YOU WILL NEVER FIND OUT! HAH! [15:18:39] haha [15:19:25] milimetric: it's an ops secret. we both see the same diffs, BUT WE SEE MORE THAN YOU DO! [15:19:42] milimetric: i've been climbing a lot, and that is making my pinky get beefier, which may or may not have caused it to weigh down on a certain meta key while typing a space [15:20:41] :) [15:20:45] you crazy kids [15:22:09] A meta-space is not acceptable, that's for sure ! [15:30:56] ottomata: how do you run the processor with input from a file:// and output to a file:// ? I fail: [15:30:56] time ./eventlogging-processor --sid client-side events %q file:///home/milimetric/load.test.30k file:///home/milimetric/out.load.30k [15:36:00] you don't need an sid if you are not using tcp:// but that is not your problem [15:36:44] Analytics-Kanban, Reading-Web: Cron on stat1003 for mobile data is causing an avalanche of queries on dbstore1002 - https://phabricator.wikimedia.org/T103798#1405130 (ggellerman) p:Triage>Normal [15:38:20] milimetric: what's your error (it doesn't work for me either :) ) [15:38:43] you could do [15:38:46] cat ... | stdin:// [15:40:38] handler = handlers[parts.scheme] [15:40:38] KeyError: u'file'? [15:40:53] oh, milimetric! there is no file reader [15:40:59] you'll have to use stdin [15:41:11] or make a file reader :) [15:42:00] makes sense, doh [15:44:36] o/ ottomata [15:44:46] it looks like http://datasets.wikimedia.org/ is timing out when I try to download a dataset [15:45:09] Not sure what's up. I had someone else test too. [15:45:33] halfak: what data? [15:46:29] http://datasets.wikimedia.org/public-datasets/enwiki/etc/session_revisions.20131105.tsv.gz [15:46:44] Hmm... I just got another dataset to download. [15:46:52] Or start downloading rather. [15:50:35] joal: the pulling of the aggregator-data repository is done automatically: [15:50:36] https://github.com/wikimedia/operations-puppet/blob/acacf97e2df962fef83487a461f3559fa07e4d6f/manifests/role/wikimetrics.pp#L314 [15:51:05] so as long as you're using the same repo, it will pull. However, we should change the symlink. I'll make a task explaining [15:51:11] Yeah. It looks like that one link ottomata. [15:51:19] For some reason I can't even get the download to start. [15:51:26] It's been a few minutes.
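An aside for anyone replaying this: the file:// URIs fail because the processor has no file reader handler (hence the KeyError above), so the working equivalent feeds the file through stdin:// instead. A sketch assembled from commands quoted elsewhere in this log — the PYTHONPATH export and the explicit python invocation both come up a bit further down:

```bash
# Sketch of the stdin:// workaround, combining commands quoted in this log;
# paths are milimetric's from the conversation above.
export PYTHONPATH=/home/milimetric/EventLogging/server
cd /home/milimetric/EventLogging/server/bin

# There is no file:// reader, so feed the file on stdin and capture stdout.
# Invoking via python avoids the "/usr/bin/env: python -OO" shebang problem.
cat /home/milimetric/load.test.30k \
    | python ./eventlogging-processor '%q' stdin:// stdout:// \
    > /home/milimetric/out.load.30k
```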
[15:51:44] hmm, 9G [15:51:53] FWIW, that's one of the larger files [15:51:56] yeah. [15:52:06] But you'd expect the bits to start transferring right away. [15:52:51] hm, yeah it is weird for sure [15:52:54] i can't even HEAD it [15:53:14] joal: https://gerrit.wikimedia.org/r/#/c/220952/ [15:54:43] Seems like it is a normal file: https://gist.github.com/halfak/5d8036ce5a5609563e71 [15:54:46] ottomata, ^ [15:54:59] Analytics-Backlog: Link to new projectcounts data and serve via wikimetrics - https://phabricator.wikimedia.org/T104003#1405205 (Milimetric) NEW [15:55:05] joal: ^ [15:55:28] i think varnish is attempting to cache these large files [15:55:38] Oh! Maybe that's why it takes so long? [15:55:44] Also. AHHHH! [15:57:31] SELECT month, [15:57:31] day, [15:57:31] COUNT(DISTINCT COALESCE(x_analytics_map['wmfuuid'], [15:57:31] parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID'))) AS app_uniques [15:57:31] FROM wmf.webrequest [15:57:31] WHERE user_agent LIKE('WikipediaApp%') [15:57:31] AND parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'action') = 'mobileview' [15:57:32] AND COALESCE(x_analytics_map['wmfuuid'], [15:57:32] parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) IS NOT NULL [15:57:33] AND webrequest_source IN ('mobile','text') [15:57:33] AND year=2015 [15:57:34] AND month=5 [15:57:55] pastebin, madhuvishy :P [15:58:00] oops. sorry, irccloud was supposed to tell me to paste it via pastebin [15:58:02] IRCCloud even asks you! :P [15:58:08] it didn't! [15:58:17] yeah, blame the computer! ;) [15:58:20] madhuvishy, why are you using concat and parse_url? [15:58:53] why not just LIKE or RLIKE on uri_query? Presumably it should be somewhat faster ;p [15:58:55] Ironholds: gah, this is not my query [15:58:58] ahhh [15:59:25] Ironholds: but no, i didn't know that [15:59:39] "bla.org/woo/" [15:59:52] ಠ_ಠ [15:59:54] halfak, my friend Kara's last name is woo. She has a personal R package. All the calls are extra-fun [15:59:56] woo::merge() [16:00:03] ha [16:00:08] ottomata: still weird... [16:00:10] milimetric@analytics1004:~/EventLogging/server/bin$ ./eventlogging-processor "%q" stdin:// stdout:// [16:00:11] /usr/bin/env: python -OO: No such file or directory [16:00:45] Ironholds: halfak this query is from - https://phabricator.wikimedia.org/diffusion/ANRE/browse/master/oozie/mobile_apps/uniques/daily/generate_uniques_daily.hql [16:00:48] ha, halfak, misc varnishes only have 8G memory allocated to them [16:00:51] that file is 9G [16:01:05] milimetric: [16:01:09] python ./eventlogging-processor [16:01:23] ottomata, can we not have a varnish between datasets.wikimedia and the world? [16:01:28] madhuvishy, aha [16:01:38] no, stat1001 no longer has a public IP, and i think that is the right thing [16:01:40] we should proxy to it [16:01:44] varnish or not [16:01:47] We have lots of files bigger than 9GB and it seems like varnish isn't helping anyone [16:01:58] but, i think we should tell varnish not to cache datasets maybe? [16:02:00] somehow? [16:02:05] Oh... sure. [16:02:08] That'd work too. [16:02:17] not sure how to do that, am pinging bblack [16:02:22] No varnish == varnish not doing its thing [16:02:23] and poking around [16:02:26] Thanks ottomata. [16:02:32] Should I start a phab task?
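For reference, a hypothetical rewrite of the query madhuvishy pasted, along the lines Ironholds suggests: filter on uri_query directly with RLIKE rather than building a fake URL for parse_url() in the filter (parse_url() is still needed to extract the install id itself). Untested, and not the production query; the trailing GROUP BY is added here because the paste appears truncated.

```sql
-- Hypothetical variant per Ironholds's suggestion; untested, not the prod query.
SELECT
    month,
    day,
    COUNT(DISTINCT COALESCE(
        x_analytics_map['wmfuuid'],
        parse_url(concat('http://bla.org/woo/', uri_query),
                  'QUERY', 'appInstallID'))) AS app_uniques
FROM wmf.webrequest
WHERE user_agent LIKE 'WikipediaApp%'
  -- RLIKE on the raw query string replaces the parse_url() filter
  AND uri_query RLIKE '(^|[?&])action=mobileview(&|$)'
  AND COALESCE(
        x_analytics_map['wmfuuid'],
        parse_url(concat('http://bla.org/woo/', uri_query),
                  'QUERY', 'appInstallID')) IS NOT NULL
  AND webrequest_source IN ('mobile', 'text')
  AND year = 2015
  AND month = 5
GROUP BY month, day;
```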
[16:03:44] Analytics-Backlog: Link to new projectcounts data and serve via wikimetrics {Musk} - https://phabricator.wikimedia.org/T104003#1405234 (ggellerman) [16:04:29] Analytics-Backlog: Link to new projectcounts data and serve via wikimetrics {Musk} - https://phabricator.wikimedia.org/T104003#1405205 (ggellerman) p:Triage>Normal [16:05:03] halfak: ja [16:05:12] i think i got something, if you create a task I can link it and ask bblack [16:05:29] OK will go [16:05:31] *do [16:05:34] will go do [16:05:36] :D [16:08:42] Analytics-Cluster, operations: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1405240 (Halfak) NEW [16:08:46] ottomata, https://phabricator.wikimedia.org/T104004 [16:08:47] Analytics-Backlog, Labs, Labs-Infrastructure: Report page views for labs instances - https://phabricator.wikimedia.org/T103726#1405247 (ggellerman) p:Triage>Low [16:08:47] gone did [16:10:43] halfak: danke [16:10:43] https://gerrit.wikimedia.org/r/#/c/221139/ [16:10:48] will ping bblack about that when I see him around [16:11:03] Thanks! :) [16:13:58] Analytics-Backlog, MediaWiki-extensions-ExtensionDistributor: Set up graphs and dumps for ExtensionDistributor download statistics - https://phabricator.wikimedia.org/T101194#1405272 (ggellerman) @Legoktm - could you answer Milimetric's question? Thanks! [16:18:21] has datasets.wikimedia.org screwed someone over again? ;p [16:28:28] madhuvishy: The query you pasted, is that the one you are trying to run ? [16:28:48] Because it looks exactly the same as the prod one (no filter for specific domain) [16:29:01] milimetric: yo [16:29:27] cause i'm about to do it, should the topic name for the union eventlogging topic be [16:29:29] eventlogging-union? [16:29:39] is union a good name? [16:29:47] joal: hmmm, checking. [16:30:03] Analytics-Backlog, Labs, Labs-Infrastructure: Report page views for labs instances - https://phabricator.wikimedia.org/T103726#1405313 (Milimetric) Dear @Spage: we can't commit to supporting a production or labs instance of piwik which would help with this. Using Event Logging from labs might be an o... [16:30:22] combined? [16:30:37] oiy, I fear names like that [16:30:48] like what? [16:30:52] because say we push labs events through, would that include labs events? [16:31:01] names like "all" are hard [16:31:07] i don't want all [16:31:12] because we are going to blacklist some [16:31:25] but, it will have more than one schema in it [16:31:26] so this is all - blacklist schemas [16:31:29] yes [16:31:31] joal: http://pastebin.com/wJQa2KHe [16:31:39] I'm filtering for uri_host [16:31:40] multischema [16:31:41] ew [16:31:44] uh... [16:31:57] polyschematic [16:32:12] mixed [16:32:16] yes! [16:32:17] good [16:32:39] well, better than polyschematic anyway [16:32:40] ok cool :) [16:32:41] haha misc [16:32:42] hehehe [16:33:04] heterogeneous [16:33:07] heh [16:33:21] I think mixed will make sense when listing schemas and seeing that others are schema specific [16:33:28] yeah mixed might be good [16:33:30] but union might imply "all" [16:33:32] joal: opine? [16:33:46] we need a topic name that includes by default all schemas, minus anything that is blacklisted [16:34:02] eventlogging-mixed [16:34:02] ? [16:34:02] * joal is thinking [16:34:26] ottomata: is ImportError: No module named eventlogging solvable? Wouldn't I have to pip install to get that?
[16:35:03] export PYTHONPATH=/home/milimetric/EventLogging/server [16:35:12] ottomata: Since there's a blacklist, it means that this topic is purposely reduced, and therefore could have a more function-oriented name ? [16:35:21] cool :) [16:35:37] like what? [16:35:40] eventlogging-mysql? [16:35:48] I was wondering that [16:35:53] naw, def not that. [16:35:55] But that's not very good either [16:36:05] ottomata: in meeting, will be back [16:36:15] ok, joal, if you don't mind, i think we might go with mixed [16:36:25] ok please do [16:36:29] k [16:37:02] grrr, milimetric, although, I already have a topic in prod kafka called eventlogging-all [16:37:02] only concern ottomata : differentiating with schema based ones [16:37:09] we might want to reuse that one [16:37:50] oh you can't delete topics? [16:37:52] no [16:37:59] :) ok but... [16:38:09] 12:31:08 i don't want all [16:38:11] yeah, but the unused topic doesn't really hurt anything but my eyes [16:38:22] yeah [16:38:25] wait... you can't ever never ever?! [16:38:29] that's crazy [16:38:33] unless they add that feature, basically no. [16:38:34] you *can* [16:38:36] i have done it [16:38:40] but you have to take the whole system down [16:38:42] delete files [16:38:45] and delete zk references [16:38:47] wow, awesome [16:38:53] ok, i mean we can use -all that's fine [16:38:58] we'll know what it means [16:39:11] ja, we can change it later if we need to. [16:39:11] and maybe once we're done with the whole mysql consumer we'll repurpose it again and it'll really be all [16:39:18] ja maybe so. ok [16:39:18] yeah [16:39:21] will reuse for now [16:39:24] and add big ol comment in puppet [16:39:25] k [16:40:01] comment: # should be named eventlogging-small-schemas-that-do-not-break-mysql but, you know, java [16:40:25] mforns: I marked all the people I reached out to and waiting for response with yellow. you can pick a color and do the same for people you reached out to too if you want :) [16:41:05] madhuvishy, sure! [16:41:38] milimetric: do you know of any topic we can blacklist now? just for testing? [16:41:52] that is, are all active schemas currently used in mysql? [16:42:41] oh, the one mforns was talking about, Jared and Juliusz's schema [16:42:56] but i would just test in beta [16:43:11] ja i will test in beta first, i'm making the puppet change and will cherry pick it there first [16:43:16] * mforns reading [16:43:25] mforns: what's the name of that schema? [16:43:36] and leave the blacklist in prod alone for now, we'd only blacklist if search turns up their sampling and they're ok with hadoop for analysis [16:43:44] milimetric, PersonalBar [16:43:48] dawww ok, i wanted to blacklist in prod! :) [16:44:19] we were going to disable the logging for that schema anyway so I mention it [16:44:29] ok, nm its ok [16:44:34] we can test the blacklist stuff in prod later [16:44:38] i've already tested that in beta [16:45:22] cool, you can def. use that schema if you want, it's not needed [17:01:37] Analytics, Traffic, operations: Provide summary of MediaWiki downloads - https://phabricator.wikimedia.org/T104010#1405400 (Krenair) So you need to get statistics on downloads from Gerrit, Gitblit, Github (not in our infrastructure...), and releases.wikimedia.org? [17:02:46] ottomata: what does this error imply? [17:02:50] https://www.irccloud.com/pastebin/7VxPVInx/ [17:03:35] madhuvishy: has your query finished ?
[17:06:07] iiinteresting [17:06:39] madhuvishy: that happens when the namenode you are pointing at is in standby state, instead of active [17:06:53] but, analytics1001 is active [17:12:13] hmmm [17:28:32] mforns: our sheet's so pretty :D [17:28:37] Analytics-Kanban: Spike: gather requirements to implement unique tokens {bull} - https://phabricator.wikimedia.org/T101784#1405491 (kevinator) a:kevinator [17:31:39] Analytics-Backlog: Change mediawiki-storage api queries to adapt to the api changes [5 pts] {crow} - https://phabricator.wikimedia.org/T101539#1405515 (mforns) a:mforns [17:32:03] Analytics-Kanban: Change mediawiki-storage api queries to adapt to the api changes [5 pts] {crow} - https://phabricator.wikimedia.org/T101539#1342023 (mforns) [17:33:27] Analytics-Kanban: Gather information on all the schemas {tick} [13 pts] - https://phabricator.wikimedia.org/T102515#1366348 (mforns) Blocked waiting for Aaron's response. He will work on it on Monday Jun 29. [17:33:35] madhuvishy, hehehe [17:38:51] madhuvishy, ottomata : prod spark job seems to have been launched correctly [17:39:21] madhuvishy: I'll let you modify your parameter name (spark_driver_memory) and commit your change, then I'll merge and deploy [17:43:58] joal: yeah i did that. let me push [17:44:07] madhuvishy: Great :) [17:46:01] (PS2) Madhuvishy: Add driver memory as a configurable property to Spark job [analytics/refinery] - https://gerrit.wikimedia.org/r/220952 (https://phabricator.wikimedia.org/T97876) [17:46:07] joal: ^ done [17:46:53] (CR) Joal: [C: 2 V: 2] Add driver memory as a configurable property to Spark job [analytics/refinery] - https://gerrit.wikimedia.org/r/220952 (https://phabricator.wikimedia.org/T97876) (owner: Madhuvishy) [17:47:12] madhuvishy, ottomata : deploying refinery [17:47:39] joal: was it just a resource allocation issue? were you able to launch it with 2G? [17:47:45] cool [17:47:50] Worked fine for me with 2G [17:48:08] So, I don't know more really :S [17:48:12] joal: you're magic [17:48:14] milimetric: https://gerrit.wikimedia.org/r/#/c/221155/ [17:48:17] i need lunch and power [17:48:21] be back in a bit [17:50:14] madhuvishy: would love to be more magic than that, but thanx :) [17:51:08] madhuvishy, and team, I'll leave in 10m to the gym and be back in a while [17:51:16] Enjoy mforns ! [17:51:20] :] [17:52:03] mforns: okay :) [17:52:08] madhuvishy, regarding Tick, I'm blocked now waiting for the owners' responses, and Aaron's [17:52:18] mforns: yeah same here [17:52:37] madhuvishy, however, I'll write to Sean Pringle to set up a meeting on possible solutions for the auto-purging [17:53:00] madhuvishy, I guess you want to be there too right? [17:53:09] mforns: yup :) [17:53:19] cool, I'll cc you :] [17:53:41] madhuvishy: Thought of something [17:53:52] mforns: thanks :) [17:53:56] joal: yes? [17:54:00] 'till later people :] [17:54:12] madhuvishy: I'll restart the job after deploy starting the 4th of may :) [17:54:23] Analytics-Tech-community-metrics, Engineering-Community, ECT-July-2015: Check whether it is true that we have lost 40% of code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1405612 (Qgil) Is there a way to get a list of newly created repos in Git/Gerrit.wikimedia.org? [17:54:28] joal: why 4th?
[17:54:29] I didn't realise that, since it is weekly, it's better to have it start on monday [17:54:36] joal: aah [17:54:36] or maybe sunday [17:54:41] I don't mind [17:55:00] joal: hmmm, yeah Sunday sounds good [17:55:03] What do you think would be best ? [17:55:10] ok, let's go for 3rd then :) [17:55:20] I'll delete already computed data and restart the thing [17:55:33] joal: great! [17:58:16] Analytics-Backlog, Analytics-Cluster: Add Pageview aggregation to Python {musk} [13 pts] - https://phabricator.wikimedia.org/T95339#1405619 (kevinator) [18:02:15] joal: did it launch? [18:02:26] Analytics-Backlog, Analytics-Cluster: Add Pageview aggregation to Python {musk} [13 pts] - https://phabricator.wikimedia.org/T95339#1405631 (kevinator) [18:02:26] still deploying [18:02:28] Analytics-Backlog, Analytics-Visualization: Update Vital Signs UX for aggregations {musk} [13 pts] - https://phabricator.wikimedia.org/T95340#1405630 (kevinator) [18:02:32] joal: alright :) [18:06:32] Analytics-Tech-community-metrics: Exclude pulled upstream code repositories from metrics - https://phabricator.wikimedia.org/T103984#1405648 (Qgil) https://www.mediawiki.org/wiki/Upstream_projects#Components and below would be useful. Instead of asking Bitergia to add this repo and remove that other repo, w... [18:07:15] Analytics-Tech-community-metrics, ECT-July-2015: Remove deprecated repositories from korma.wmflabs.org code review metrics - https://phabricator.wikimedia.org/T101777#1405652 (Qgil) [18:07:44] Analytics-Tech-community-metrics: Exclude pulled upstream code repositories from metrics - https://phabricator.wikimedia.org/T103984#1404499 (Qgil) [18:14:24] joal: I see the first job launched [18:14:39] that is great [18:14:43] Yup, we're there :) [18:14:56] however there still is an issue with resource fight [18:15:04] joal: hmmm [18:15:18] the job had 16 execs (as planned) at the beginning, now only has 6 [18:15:34] It should work (hopefully) [18:15:40] But it's not great [18:15:57] I'll ask ottomata to look into dynamic allocation for Spark on Yarn :) [18:16:52] joal: aah, that's sad [18:16:54] okay [18:17:28] that's the thing with preemption :) [18:17:28] joal: also, this is what happens to my hive query. [18:17:38] https://www.irccloud.com/pastebin/ZFqgxUNR/ [18:18:24] madhuvishy: Weirdoh ! [18:18:29] never saw that one [18:18:37] Will investigate [18:19:08] this is the query - [18:19:12] https://www.irccloud.com/pastebin/iOmj0ilE/ [18:19:20] joal: ^ [18:44:33] milimetric: can you merge this? [18:44:34] https://gerrit.wikimedia.org/r/#/c/221155/ [18:44:39] want to put that code on analytics1010 [18:44:50] just tested it all in beta with the corresponding puppet change [18:46:23] oh, it merged [18:46:25] hm [18:46:26] k [18:47:19] ottomata: sorry, I was looking at it, I think if you give it +2 it'll auto-merge 'cause it auto-verifies [18:47:46] yeah, thought it was verified, guess it verified after I +2ed? [18:47:54] anyway, still review, lemme know if you have comments [18:48:51] ottomata: no it looks fine, so the socket_id parameter was only used when configuring in puppet, right?
[18:49:09] yeah, actually, this is kinda inconsistent [18:49:10] some services [18:49:14] take an --sid option [18:49:21] I didn't see any writer / reader specific tests last time I looked but if you ran the tests it's all good [18:49:24] and then append the identity to the url before getting the reader [18:49:36] others [18:49:47] just expect you to append the identity in the uri [18:49:51] madhuvishy, ottomata : http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cdh_rn_parquet_ki.html [18:50:57] hmm [18:51:10] madhuvishy, ottomata : Launching madhu's query on mobile only, fine [18:51:17] madhuvishy, ottomata : Launching madhu's query on text only, breaks [18:51:26] joal: oh [18:51:28] madhuvishy, ottomata : Launching madhu's query on text only, half month, fine (either side of the half) [18:51:30] :/ [18:51:41] joal: yeah, i did half months too [18:51:55] and got daily data for whole month with 2 queries [18:51:58] interesting! [18:51:58] I want monthly now [18:52:06] but, that is just for the writer? [18:52:07] Problem seems to come not from partition number, but from file number [18:52:21] can you reduce the block size when writing this data in parquet? [18:52:34] or, if you like, since this is smaller aggregate data (right?) write something other than parquet (for now)? [18:52:50] it's not written in parquet I think [18:53:01] ? [18:53:03] It's just that even reading parquet fails [18:53:03] hmmm, ottomata I'm querying webrequest refine table [18:53:13] hm, the thing you linked to says writers [18:53:42] yes, I know, but seems very related though [18:53:44] ottomata: https://www.irccloud.com/pastebin/ZFqgxUNR/ [18:54:09] joal was investigating this [18:54:13] Job doesn't even get launched [18:54:25] madhuvishy: do you increase HEAP SIZE for hive query? [18:54:34] ottomata: I tried [18:54:46] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Troubleshooting [18:54:49] ja? [18:55:06] oh [18:55:10] I didn't do that [18:55:16] i did hive -dmapred.child.java.opts="-Xmx2048m" -f appuniques-monthly.hql > monthly.tsv [18:55:30] if you see the OOM on your hive cli output [18:55:31] ottomata: could work ! [18:55:36] it is OOMing in the CLI, not on the jobs [18:55:38] Awesome :) [18:55:46] yup [18:55:48] hive has to read in tons of crap for lots of partitions from the metastore [18:55:52] ottomata: alright let me try that [18:55:53] and the query planner needs more mem [18:56:01] makes sense [18:56:11] hm .... [18:56:21] ottomata: joal mm hmmm [18:57:07] ottomata: joal that seems to have worked. you both are magic /\ [18:58:11] Well done ottomata :) [18:58:43] Funny thing: it works naturally with oozie (monthly job has worked) [19:00:07] yeah! [19:10:16] Hi ottomata! How's it going? Pls let me know if you have a sec to follow up on EventLogging -> Kafka for FR banner history :) Thx!! [19:12:14] Hiya! [19:12:19] ya sure, how goes? [19:12:21] AndyRussG: ^ [19:13:47] good, thx! Yeah already here in Mexico City (staying with my wife's family here)... Getting close to finished with the banner history logging that we talked about once [19:14:17] (context: https://www.mediawiki.org/wiki/Extension:CentralNotice/Notes/Campaign-associated_mixins_and_banner_history) [19:14:43] As we'd mentioned before, part of this is logging the history of banners viewed for a sample of users [19:15:03] And that would in theory be sent via EventLogging to Kafka [19:15:21] There's a subsection of that page with the tentative initial data structure...
https://www.mediawiki.org/wiki/Extension:CentralNotice/Notes/Campaign-associated_mixins_and_banner_history#Data_and_logging [19:15:44] So I was wondering what steps are next? I guess we have to create an EventLogging schema, and... then what? [19:15:48] ottomata: ^ [19:16:29] hmmm reading [19:17:25] AndyRussG: what is the volume of events here again? [19:18:45] milimetric: can we batcave in a bit? we should talk about eventlogging-reporter and the plan [19:19:21] ottomata: k, i'll be in there [19:21:08] AndyRussG: i will do two conversations at once! :) [19:22:47] ottomata: cool! [19:24:00] Yeah the volume will vary and we can start small. Eventually it will be a certain percentage of all users who are targeted by a FR campaign + all users who click on a banner to donate [19:24:44] ottomata: ^ .... also pls take your time! I'll be around for a while :) [19:25:06] AndyRussG: do you have a rough idea of messages / sec? [19:25:10] order of magnitude is fine [19:25:25] Hmm [19:25:47] Do you want initial values or likely maximum values? [19:34:58] AndyRussG: both? [19:34:59] :) [19:35:27] ottomata: K! I'm asking K4 right now for the max numbers... [19:40:02] Analytics-Cluster, operations, Patch-For-Review: Can't download large datasets from datasets.wikimedia.org - https://phabricator.wikimedia.org/T104004#1405886 (Halfak) The patch should have been merged by now, but the problem persists. [19:40:03] ottomata: for initial rollout I think it might be 1 or 2 per second, maybe even less [19:42:10] oh, no problem [19:42:13] eventlogging totally cool then [19:43:20] ottomata: you haven't heard the max numbers tho, one sec... [19:44:36] (PS1) Joal: Correct bug in projectview archive workflow [analytics/refinery] - https://gerrit.wikimedia.org/r/221185 [19:45:26] ottomata: ok if I self merge that ? [19:45:28] joal: action name looks weird [19:45:32] [19:45:40] hm [19:45:50] ottomata: Looks like I'll get the top numbers in 30 min to an hour :) [19:45:58] haha, ok, what would your guess be AndyRussG? [19:46:38] (PS2) Joal: Correct bug in projectview archive workflow [analytics/refinery] - https://gerrit.wikimedia.org/r/221185 [19:46:43] joal: mark_projectcount_dataset_done? [19:47:00] ottomata: changed to more explicit [19:48:05] cool, except, those are called projectcounts, right? [19:48:06] joal? [19:48:30] They are called projectview, but with projectcounts format [19:48:35] ah, whatever, that sounds fine joal [19:48:38] projectcounts are the original ones [19:48:54] hm, nooo [19:49:00] ?? [19:49:03] oh [19:49:08] original as in legacy [19:49:13] correct, sorry [19:49:15] not original as in source of the transformed dataset :) [19:49:16] hehe [19:49:22] :D [19:49:26] aye, but you are marking the projectcount dataset done [19:49:32] but, ja [19:49:38] transformed_projectview == projectcount [19:49:39] * joal hates naming things [19:50:04] right, but projectcounts dataset still exists, as with legacy [19:50:11] ok [19:50:13] So I don't want to reuse [19:50:15] ja i'm ok with this [19:50:26] (PS3) Ottomata: Correct bug in projectview archive workflow [analytics/refinery] - https://gerrit.wikimedia.org/r/221185 (owner: Joal) [19:50:43] AndyRussG: got an idea? 100s? 1000s? millions per sec? [19:50:52] thanx for rebase andrew [19:51:01] (CR) Ottomata: [C: 2 V: 2] Correct bug in projectview archive workflow [analytics/refinery] - https://gerrit.wikimedia.org/r/221185 (owner: Joal) [19:51:07] got it merged too :) [19:51:14] ottomata: will deploy now [19:51:21] ottomata: my guess?
Mmm take combined peak page views/sec in countries that we run the end-of-the-year fundraiser in and multiply by 0.02 [19:51:57] Analytics-Cluster: Sudo permissions for hdfs user on analytics-hadoop - https://phabricator.wikimedia.org/T104020#1405901 (madhuvishy) NEW a:Ottomata [19:52:12] So peak hours page views/sec in the U.S., Canada, and England (Australia I guess doesn't overlap in peak hours) * 0.02 [19:52:44] Woooow ottomata, that looks like a good stress test for EL --^ :) [19:52:52] k, we peak overall at around 200,000 http requests / sec [19:53:08] what's pageviews joal? any idea off the top of your head? since you have been looking at that? [19:53:23] per country, can't say [19:53:28] Will tell you in a minute [19:53:37] total is fine [19:53:39] we are guessing magnitudes [19:53:50] oh vital signs kinda tells me [19:54:06] So awight just got back to me w/ peak donations, highest level averaged over an hour is 3.8 donations/second [19:54:30] AndyRussG: you are only logging donation events? [19:54:59] not rush hour, I have 5500 / sec [19:55:08] cool, joal, that makes sense [19:55:16] let's say 10000 [19:55:31] so AndyRussG is guessing around max +200 msgs per second [19:55:38] AndyRussG: here's another Q [19:55:47] do you need these events ultimately stored in MySQL? or is Hadoop fine? [19:56:23] in any case, i think your initial rollout of around 1 or 2 per second you can do now [19:56:26] in eventlogging as it is [19:56:34] so no blockers there [19:56:34] :) [19:56:50] ottomata: cool! Yeah I think Hadoop is good, at least to start :) [19:57:08] These numbers are for uses of this system by FR only, BTW [19:57:22] If community banners start getting involved, which they might well, the numbers could go way up [19:58:00] But in that case we could restrict the sample rate/use of the system until the infrastructure is ready [19:59:50] AndyRussG: hopefully you can restrict that in your sending code for now, ja? [20:00:03] AndyRussG: 1 to 2 per second will be no problem in MySQL [20:00:08] so you can just go ahead with that [20:00:14] but if you have more, then that might be a problem [20:00:19] but, we are working on that :) [20:00:24] ottomata: yeah eventually it'll get to more [20:00:30] we're making it so we can blacklist high volume schemas for mysql [20:00:44] everything will go into hadoop [20:00:44] but not everything into mysql [20:02:56] Analytics-EventLogging, Analytics-Kanban: Load Test Event Logging {oryx} [8 pts] - https://phabricator.wikimedia.org/T100667#1405919 (Milimetric) Conclusion: Processor is slower than the Consumer, by quite a bit, but both can theoretically handle over 1k events per second if given enough resources. The F... [20:03:38] ottomata: OK sounds good! Yeah with numbers just provided by K4 and awight, it looks like 200-300/sec is an OK max to assume for the end-of-the-year campaign [20:03:44] At least it'll be around that order of magnitude [20:03:56] ottomata: I have more questions but I have standup, back in a sec! [20:05:47] ok, AndyRussG, FYI, as is, that will be too much for Eventlogging [20:05:56] but, we hope to be able to support that in a month or two (NO PROMISES!) [20:06:06] (or less!) [20:22:08] joal: still around?
[20:24:25] joal: ottomata I just remembered - need to run https://phabricator.wikimedia.org/diffusion/ANRE/browse/master/hive/mobile_apps/create_mobile_apps_session_metrics_table.hql this hive query to create external table [20:24:37] on production [20:24:45] madhuvishy: will do :) [20:25:23] joal: oh yay thanks [20:25:37] the first job succeeded :) [20:25:46] thanks yall :) [20:26:01] Happy clusterers :D [20:26:08] My bug is fixed as well ! [20:26:15] table created madhuvishy :) [20:26:51] joal: thanks :D [20:26:54] https://hue.wikimedia.org/filebrowser/view/wmf/data/wmf/mobile_apps/session_metrics/session_metrics.tsv [20:27:05] madhuvishy: please test it, but it shouldn't work :-P [20:27:08] Did [20:27:18] didn't notice at code review, but path issues [20:27:38] joal: oh no [20:28:25] hm, will make the table work, but you should correct the code (change the path in table creation) [20:29:07] joal: yeah okay, will fix code. [20:30:14] madhuvishy: https://gist.github.com/jobar/41f0bf684a732dc0b9eb [20:30:47] joal: thanks a ton. I will make the code change [20:30:52] no prob [20:31:02] we really need to push it, not to forget ;) [20:33:42] ACKNOWLEDGEMENT - Eventlogging /srv disk space on analytics1010 is CRITICAL: DISK CRITICAL - free space: / 14850 MB (84% inode=93%): ottomata This is not a real alert! [20:34:21] (PS1) Madhuvishy: Fix external table path for session metrics job output [analytics/refinery] - https://gerrit.wikimedia.org/r/221280 [20:35:04] joal: yeah, done ^. It must be too late today, we can merge on monday too [20:35:34] (CR) Joal: [C: 2 V: 2] Fix external table path for session metrics job output [analytics/refinery] - https://gerrit.wikimedia.org/r/221280 (owner: Madhuvishy) [20:35:39] Merged ! [20:35:45] joal: :) [20:35:47] thanks a ton [20:35:47] Will be deployed in the next batch :) [20:35:53] no prob [20:36:02] I should have spotted ;) [20:36:13] time for me to go in weekend ! [20:36:21] Have a good one :D [20:36:37] laters! [20:39:13] Analytics-Backlog, MediaWiki-extensions-ExtensionDistributor: Set up graphs and dumps for ExtensionDistributor download statistics - https://phabricator.wikimedia.org/T101194#1406033 (Legoktm) >>! In T101194#1388545, @Milimetric wrote: > It seems like the graphs on http://edit-reportcard.wmflabs.org/ are... [20:40:29] ottomata: hi again! K cool, yes I understand the timeline you mentioned ^ above [20:41:24] In any case it's a pretty experimental thing and for now we're just trying to get a minimum product out the door [20:41:42] Time will tell how useful it is and how much scaling up is desired or useful :) [20:41:58] ok cool, then yup, i think eventlogging will be just fine for now [20:42:46] Analytics-Backlog, MediaWiki-extensions-ExtensionDistributor: Set up graphs and dumps for ExtensionDistributor download statistics - https://phabricator.wikimedia.org/T101194#1406038 (Milimetric) To help with both the graph creation and generating results for that query on a periodic basis, I need to kno... [20:43:26] ottomata: yeah! So my next question is, what specifically do I need to do to get up and running with this? [20:43:27] I see https://wikitech.wikimedia.org/wiki/EventLogging [20:44:04] now that is a great question! for that I direct you to milimetric :) [20:44:06] Do I just put a schema somewhere and that's that? Ops requests? [20:44:35] AndyRussG: where should I read to get caught up?
[20:44:50] no, not ops [20:44:51] AndyRussG: i've never done it, but it is something like: [20:44:51] create a schema on meta [20:44:51] configure your code to log using the Eventlogging extension [20:44:51] or some JS thing? dunno [20:44:55] milimetric: Hi! How's it going? tl;dr: I need to start doing some event logging [20:44:57] milimetric: AndyRussG wants to make a new schema that will currently do about 1 or 2 events per sec [20:45:03] cool, works [20:45:24] AndyRussG: is it going to send events client side or server side (PHP)? [20:45:44] and he wants some pointers [20:45:48] milimetric: definitely client-side and likely also server-side! [20:46:04] ok, no prob. So step 1: create schema [20:46:16] check for examples of existing schemas: https://meta.wikimedia.org/wiki/Schema:PageContentSaveComplete [20:46:41] and examples of unfortunate existing schemas that try to do too many things in one schema: https://meta.wikimedia.org/wiki/Schema:Edit [20:47:13] then think about what questions you'd like to answer once you have the data and work backwards to create the schema that will let you do that [20:47:30] creating the schema is just as easy as making a page in the Schema: namespace [20:48:00] (btw, there's a lot of documentation on Event Logging: https://www.mediawiki.org/wiki/Extension:EventLogging) [20:48:18] milimetric: cool! Yes I have used the extension before, locally [20:48:36] ok, cool, so the next steps you know - instrumenting your code [20:48:38] Yeah the data is already basically defined. [20:48:42] (PS1) Mforns: Add rawcontinue=1 flag for API compatibility [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/221288 (https://phabricator.wikimedia.org/T101539) [20:48:46] ok, good, so where are you at? [20:48:56] Do I need special permissions on Meta to create the schema? [20:49:32] (Something approximating the schema is here: https://www.mediawiki.org/wiki/Extension:CentralNotice/Notes/Campaign-associated_mixins_and_banner_history#Data_and_logging) [20:49:39] (CR) Milimetric: [C: 2 V: 2] Add rawcontinue=1 flag for API compatibility [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/221288 (https://phabricator.wikimedia.org/T101539) (owner: Mforns) [20:49:54] AndyRussG: nope, no special permission required [20:50:24] milimetric, what? you merged the thing before I could add you as a reviewer? ninja?! [20:50:36] it looked good, so I merged it [20:50:42] just luck, saw the IRC ping [20:50:47] milimetric: cool! So, do I just write the code and deploy? No additional permissions or ops setup to do? [20:50:48] hehe [20:50:53] milimetric, thx [20:51:18] AndyRussG: if you could please test your code on the beta cluster, that'd be great [20:51:27] because we don't want invalid events happening in prod [20:51:38] AndyRussG: docs for doing that: https://wikitech.wikimedia.org/wiki/EventLogging/Testing/BetaLabs [20:51:42] milimetric: ah OK that sounds cool [20:52:10] but after that your data will be on analytics-store.eqiad.wmnet in database "log" table YourSchema_[revision] [20:52:30] milimetric: looks fantastic :) [20:52:31] AndyRussG: a recommended step is to vet your schema with a researcher [20:52:57] Ah OK... In truth I won't be doing the actual research with the results... It's for fundraising analytics [20:52:58] once all that's done, I can help you set up a dashboard that updates periodically if you need, just let me know [20:53:11] AndyRussG: yeah, so maybe check with ellery? [20:53:21] yep! [20:53:49] milimetric: K all good thanks so much for the help!!!
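To make milimetric's "step 1: create schema" concrete: a schema page on meta is just a JSON blob in the Schema: namespace, roughly along these lines. A minimal hypothetical example — the name and fields here are illustrative, not the real banner-history schema:

```json
{
    "description": "Hypothetical example schema; not the real banner-history schema.",
    "properties": {
        "bannerHistory": {
            "type": "string",
            "required": true,
            "description": "Serialized list of banners shown to this client"
        },
        "sampleRate": {
            "type": "number",
            "required": false,
            "description": "Sampling rate this event was recorded at"
        }
    }
}
```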
[20:54:25] I have to be AFK for a few minutes here, but I'll be back soon... I'll be sure to bug you if I have more inquiries ;D [20:54:43] thx again, TTY soon [20:54:50] I might be out of here soon but you can always send an email or assign a phab ticket [20:54:57] and you're welcome [20:55:03] milimetric: K, got it ;) [20:56:07] ottomata: before leaving, just noticed the icinga alert about disk full ... [20:56:15] You manage that ? [20:57:09] yes [20:57:09] cool [20:57:09] that was actually an acknowledgement [20:57:09] thx mate ! [20:57:09] it's on analytics1010 and is not true [20:57:09] huhuhu :) [20:57:11] not sure I understand the thing :D [20:57:22] am trying to fix with https://gerrit.wikimedia.org/r/#/c/221006/ [20:57:33] joal: i am deploying eventlogging with puppet in more places [20:57:41] hm, right [20:57:43] makes sense [20:57:44] and there is a check that just assumes it has a special big /srv partition [20:57:50] but that only makes sense for eventlog1001 [20:57:59] so that check got installed on analytics1010 [20:58:08] and it is not lying, there is less than 50G available [20:58:09] Yeah, got it [20:58:15] but we are not writing any data there so PFFFF [20:58:46] hm [20:59:07] if nothing urgent, i'm off then ;) [20:59:11] k lalaaters! [20:59:12] have a good weekend [20:59:18] Thanks for handling the EL stuff, you rock :) [21:08:29] milimetric: ah! [21:08:34] i did break graphite [21:08:35] https://gerrit.wikimedia.org/r/#/c/221289/ [21:08:38] for eventlogging [21:08:41] quick review + merge? [21:08:55] umm, i think i will do a hacky deploy of that so I don't have to deploy my other changes to prod [21:09:39] Analytics-Kanban, Reading-Web: Cron on stat1003 for mobile data is causing an avalanche of queries on dbstore1002 - https://phabricator.wikimedia.org/T103798#1406093 (JKatzWMF) @kevinator thanks for the heads up, yes this is a problem. I don't care about the edit graphs (sorry), but I do care about the ma... [21:12:27] ottomata: jenkins -1ed you, but I don't really know how to read that code [21:13:45] Analytics-Kanban, Patch-For-Review: Change mediawiki-storage api queries to adapt to the api changes [5 pts] {crow} - https://phabricator.wikimedia.org/T101539#1406105 (mforns) The rawcontinue=1 flag has been added to mediawiki-storage. Milimetric merged that and I tagged it to 0.3.0. No need to change Da... [21:13:46] jenkins seems to be checking it as a MW thing? dunno why [21:13:50] milimetric: all that code does [21:13:55] is rename the 'socket' variable to sub_socket [21:14:00] so it doesn't conflict with the module import [21:14:05] oh [21:14:12] and then, instead of configuring the sub_socket with 127.0.0.1 [21:14:17] it uses the socket module to get IP of node [21:14:49] weird about the jenkins thing [21:15:46] milimetric, usually you deploy vital signs and edit-analysis, is there a reason for that, or just altruism? if the latter, I'd like to learn how [21:17:13] milimetric: manually edited the reporter on eventlog1001 to fix this now, cause I don't want to do a full code deploy! [21:17:16] it seems better!
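The socket → sub_socket rename ottomata describes is a classic module-shadowing fix. An illustrative sketch of why the rename matters — this is not the actual eventlogging reporter code:

```python
# Illustrative sketch only; not the actual eventlogging reporter code.
import socket
import zmq


def subscribe(endpoint):
    context = zmq.Context.instance()
    # Before the fix this local was named 'socket', shadowing the stdlib
    # module, so later calls like socket.getfqdn() hit the zmq socket object
    # instead and blew up. Renaming it to sub_socket removes the collision.
    sub_socket = context.socket(zmq.SUB)
    sub_socket.connect(endpoint)
    sub_socket.setsockopt_string(zmq.SUBSCRIBE, u'')
    return sub_socket


# With the module no longer shadowed, resolving the node's own IP works,
# which is what the change uses instead of hard-coding 127.0.0.1:
local_ip = socket.gethostbyname(socket.getfqdn())
```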
[21:17:57] i can see the metrics being sent to statsd now [21:18:53] ottomata: cool [21:18:56] mforns: ssh into limn1 [21:18:57] cd /var/lib/dashiki (which is owned by me so you might need sudo to do stuff) [21:18:57] then for edit-analysis: [21:18:57] gulp --layout compare --config VisualEditorAndWikitext --piwik piwik.wmflabs.org,1 [21:18:57] vim dist/compare-VisualEditorAndWikitext (replace ../fonts with fonts) [21:18:57] then for vital-signs: [21:18:57] gulp --layout metrics-by-project --config VitalSigns [21:19:19] mforns: if you wanna save that somewhere, it appears I'm allergic to documentation or something [21:19:38] milimetric, I'll save it, np [21:19:53] sorry I'm short with y'all, I've got this complicated knockout thing to deal with [21:21:43] milimetric, don't worry [21:23:23] ottomata, how do I know if I have sudo in limn1? [21:24:48] try to sudo? [21:24:48] :) [21:24:51] you should, it is labs, right? [21:25:02] you have the same permissions on every analytics labs instance [21:25:25] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 33.33% of data above the critical threshold [30.0] [21:25:33] mforns: you got sudo on that for sure. If not I can add you to the admin group [21:25:50] trying "sudo su" is the easiest way. If it asks you for a password, you don't got it [21:26:20] Analytics-Backlog, MediaWiki-extensions-ExtensionDistributor: Set up graphs and dumps for ExtensionDistributor download statistics - https://phabricator.wikimedia.org/T101194#1406133 (Legoktm) Yeah, we should probably keep it separate. How about limn-extdist-data? [21:26:49] milimetric, ottomata, yes I know, the thing is I've tried it before and I got ops pulling my ear :] hehehe [21:27:00] mforns: they won't do that in labs :) [21:27:01] but it was probably not labs, you're right [21:27:06] ok hehehe [21:27:14] thanks! [21:28:08] Analytics-Backlog, MediaWiki-extensions-ExtensionDistributor: Set up graphs and dumps for ExtensionDistributor download statistics - https://phabricator.wikimedia.org/T101194#1406135 (Milimetric) Ok, cool, I'll add this to our board and get to it. Probably not today but Monday. [21:28:56] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK Less than 15.00% above the threshold [20.0] [21:29:15] Analytics-Kanban, MediaWiki-extensions-ExtensionDistributor: Set up graphs and dumps for ExtensionDistributor download statistics {frog} [3 pts] - https://phabricator.wikimedia.org/T101194#1406136 (Milimetric) [21:35:16] PROBLEM - Check status of defined EventLogging jobs on analytics1010 is CRITICAL Stopped EventLogging jobs: reporter/statsd [21:37:45] yuck [21:37:48] not an error! [21:37:55] that will also be fixed by my change [21:38:26] ACKNOWLEDGEMENT - Check status of defined EventLogging jobs on analytics1010 is CRITICAL Stopped EventLogging jobs: reporter/statsd ottomata Not a real problem! [21:39:12] ACKNOWLEDGEMENT - Eventlogging /srv disk space on hafnium is CRITICAL: DISK CRITICAL - free space: / 1118 MB (12% inode=75%): ottomata Not a real problem!
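milimetric's dashboard deploy steps above, gathered into a saveable script-like form for mforns. The gulp invocations are verbatim from the conversation; the sed line automating the manual "replace ../fonts with fonts" edit, and its target file name, are guesses:

```bash
# Run on limn1; /var/lib/dashiki is owned by milimetric, so sudo may be needed.
cd /var/lib/dashiki

# edit-analysis dashboard
gulp --layout compare --config VisualEditorAndWikitext --piwik piwik.wmflabs.org,1
# automates the manual "../fonts -> fonts" edit; the index.html path is a guess
sed -i 's|\.\./fonts|fonts|g' dist/compare-VisualEditorAndWikitext/index.html

# vital-signs dashboard
gulp --layout metrics-by-project --config VitalSigns
```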
[21:47:21] OOO pretty [21:47:22] milimetric: [21:47:24] check that out [21:47:24] http://grafana.wikimedia.org/#/dashboard/db/eventlogging [21:49:42] ok time to go byyyyeeee [21:50:09] cool ottomata [21:50:14] i don't get why overall < raw [21:50:17] byyyyeeee [21:50:17] :) but that's ok [21:50:21] yeah something is wrong with something [21:50:25] tbd [21:50:59] have a good weekend all, I'm outa here too [21:51:13] i think something is happening on an10, that happened when i started producing to the raw topics, and when i started producing to eventlogging-all [21:51:14] byyeyee [21:52:59] good weekeeend! see you on monday!