[00:00:12] Analytics-Kanban, Mobile: EL unable to decode may apps events - https://phabricator.wikimedia.org/T96938#1229807 (Nuria) NEW
[00:02:24] joal|night: i have created couple phab tickets regarding the validation failures
[00:02:41] nuria: ok
[00:02:46] joal|night: visible in log: /var/log/upstart: eventlogging_processor-client-side-events.log
[00:02:56] joal|night: those are likely responsible for alarms
[00:03:02] ok
[00:03:10] Are there alarms currently ?
[00:03:16] I don't receive anything :(
[00:03:41] joal|night: ok, that is something that otto needs to fix, you can tell him tomorrow
[00:03:48] yup
[00:03:57] joal|night: good nite (forya)
[00:04:07] byebye :)
[00:07:12] halfak: the delay i can see on sending events to db is seconds:
[00:07:16] https://www.irccloud.com/pastebin/bPEihQ3Y
[00:07:50] halfak: but the db consumer restarted 30 mins ago so there might be some lost events
[00:14:45] halfak: still there ?
[00:50:05] Analytics-EventLogging, Beta-Cluster: puppet agent disabled on beta cluster deployment-eventlogging02.eqiad.wmflabs instance - https://phabricator.wikimedia.org/T96921#1229915 (Nuria) I have just enabled puppet again, no reason to have it disabled anymore (we did so for testing purposes couple weeks back)
[00:51:34] Analytics-Kanban, Traffic, operations: VCL support for Last-Access cookie - https://phabricator.wikimedia.org/T96861#1229919 (Nuria)
[00:57:38] Analytics-EventLogging, Analytics-Kanban: Eventlogging logs on 1002 not updated since april 3rd - https://phabricator.wikimedia.org/T96934#1229928 (Nuria)
[02:21:31] Analytics-Kanban, Traffic, operations: VCL support for Last-Access cookie - https://phabricator.wikimedia.org/T96861#1230025 (BBlack) Open>Invalid Sorry, I didn't realize (or forgot?) you already had a ticket for the VCL work as well at T92435. Closing up this one and continuing to use that one...
[02:22:03] Analytics, Analytics-Kanban, Traffic, Patch-For-Review: Code changes in VCL to add a Last-Access cookie to track "top domain" uniques [13 pts] {bear} - https://phabricator.wikimedia.org/T92435#1110744 (BBlack)
[05:05:16] (PS11) Nuria: Add Apps session metrics job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns)
[05:07:05] (CR) Nuria: "Corrected some of the comments and tested workings on spark-shell." (5 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/199935 (https://phabricator.wikimedia.org/T86535) (owner: Mforns)
[12:33:32] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-April-2015: Basic metrics about contributors exercising +2/-2 permissions in Gerrit - https://phabricator.wikimedia.org/T59038#1231001 (Qgil) These graphs are interesting. The number of monthly users in Gerrit is pretty stable, which is ki...
[12:39:49] Analytics-Tech-community-metrics, Wikimedia-Git-or-Gerrit, ECT-April-2015: Active Gerrit users on a monthly basis - https://phabricator.wikimedia.org/T86152#1231008 (Qgil) http://korma.wmflabs.org/browser/scr-contributors.html This looks good. Thank you! We are still discussing about labels at T5903...
[12:41:48] Analytics-Tech-community-metrics, ECT-April-2015: "Who contributes code" page metrics are not updating - https://phabricator.wikimedia.org/T95166#1231015 (Qgil) Open>Resolved Yep, thank you.
[13:29:24] Hey joal|night
[13:38:35] Hi Halfack
[13:38:45] halfak sorry
[13:39:19] No worries. Was just realizing you asked about multiline matching in the regex. It seems like you could drop it.
[13:42:12] halfak: I also found a bug in the regexp ...
[13:42:30] Cool! Is it that it matches ?
[13:42:57] 'cause that's a bug :\
[13:43:14] nop, it is that it doesn't match content
[13:43:23] Because of star misplacement
[13:43:26] it doesn't?
[13:43:45] ouh, my bad
[13:43:54] Yeah.,.. it matches them in my tests
[13:44:03] I probabky misplaced the star myself, and then was happy to found a bug :)
[13:44:07] halfak: --^
[13:44:07] :D
[13:44:14] twac late :D
[13:44:35] Analytics, Analytics-Kanban, Traffic, Patch-For-Review: Code changes in VCL to add a Last-Access cookie to track "top domain" uniques [13 pts] {bear} - https://phabricator.wikimedia.org/T92435#1231133 (BBlack) ^ Aside from the technical-level work in PS10, a few other things have changed that affec...
[13:45:09] What format do you want for the lists ?h
[13:45:13] halfak: --^
[13:45:19] I was JSONing them
[13:45:24] So JSON array of strings
[13:45:57] hm
[13:45:59] ok
[13:46:25] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, Performance: Implement Last-Access cookie [34 pts] {bear} - https://phabricator.wikimedia.org/T88813#1231138 (BBlack) Since this ticket has a bit wider audience: can you guys take a look at the lower-level questions about the cookie/data work in...
[13:52:05] halfak: would that fit ? https://gist.github.com/jobar/cf137e623cb6fefda9ec
[13:53:07] What are the "-1"s?
[13:53:18] dummy valeus :)
[13:53:27] kk. Otherwise, it looks good :)
[14:01:29] I hate travel.
[14:01:39] (travelling, not: the travel team. They make travel lovely)
[14:02:29] o/ Ironholds
[14:02:34] yo
[14:02:35] Agreed.
[14:02:42] I am heading to NYC in...3 hours
[14:02:45] Travel == lame, Travel team == <3
[14:02:51] the problem with travel is: travel is premised on me having my shit together
[14:03:35] Gotta pack the day before so that you can remember what you forgot!
[14:03:51] I'm half-packed and operating on two hours of sleep ;p
[14:04:47] Noo. Why no sleep?
[14:04:54] eh, just nightmares. Pretty common things.
[14:05:06] But seriously, earlier today I lost my cigarettes *inside my apartment*
[14:05:10] finally found that I'd put them in my fridge
[14:05:18] I have zero business being trusted outside this environ
[14:05:43] Did you put them there intentionally or absent-mindedly?
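The regex exchange above (dropping multiline matching, a misplaced star, and emitting the lists as a JSON array of strings) can be illustrated with a minimal Python sketch. The pattern here is hypothetical and is not the one in halfak's mwrefs repository; it only shows two points from the conversation. First, re.MULTILINE changes how ^ and $ behave, so a ref pattern that uses neither can drop it; re.DOTALL is the flag that lets "." cross newlines inside a ref body. Second, a greedy star swallows everything between the first <ref> and the last </ref>, while a lazy star stops at the nearest close tag. The tab-separated output row is also an assumption.

```python
import json
import re

# Hypothetical ref pattern (not the mwrefs regex): matches <ref ...>...</ref>
# pairs and self-closing <ref ... /> tags, but not <references />.
# Limitation: attribute values containing "/" or ">" are not handled.
REF_RE = re.compile(
    r"<ref(?:\s[^>/]*)?(?:/>|>.*?</ref\s*>)",  # ".*?" is the lazy star
    re.IGNORECASE | re.DOTALL,                # DOTALL: ref bodies may span lines
)

# For contrast only: a greedy star joins neighbouring refs into one match.
GREEDY_REF_RE = re.compile(r"<ref>.*</ref>", re.IGNORECASE | re.DOTALL)


def extract_refs(wikitext):
    """Return every ref tag in the text, in document order."""
    return REF_RE.findall(wikitext)


def format_row(rev_id, wikitext):
    """Return 'rev_id<TAB>JSON array of ref strings' for one revision."""
    return f"{rev_id}\t{json.dumps(extract_refs(wikitext))}"
```

For example, format_row(42, 'x<ref>a</ref>') yields the line 42, a tab, and ["<ref>a</ref>"].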
[14:06:29] absent-mindedly
[14:09:10] Ironholds, you're safe so long as you don't put the fridge things elsewhere.
[14:09:24] heh
[14:09:26] Sometimes the milk goes in the cereal cabinet and no one is happy about that move.
[14:09:51] The problem is that you'll have to do *some* things before coffee.
[14:10:19] I organize my coffee supplies strategically so that tired-halfak can figure it out.
[14:11:58] haha
[14:17:35] nuria: Are you here master of EL ?
[14:18:04] * joal invoques EL master !
[14:18:21] hmmmf ... No magic today
[14:19:33] Analytics-Kanban, Analytics-Wikimetrics, Community-Wikimetrics, Easy, Need-volunteer: "Create Report" button does not appear when uploading a new cohort - https://phabricator.wikimedia.org/T95456#1231197 (Aklapper)
[14:19:35] Analytics-Engineering, Analytics-EventLogging, Need-volunteer: EventLogging calling deprecated SyntaxHighlight_GeSHi::buildHeadItem - https://phabricator.wikimedia.org/T71328#1231195 (Aklapper)
[14:19:37] Analytics-EventLogging, Need-volunteer: Add sanitized User-Agent to default fields logged by EventLogging - https://phabricator.wikimedia.org/T54295#1231198 (Aklapper)
[14:19:39] Analytics-EventLogging, Need-volunteer: Story: User clicks on link to event capsule schema while viewing a schema - https://phabricator.wikimedia.org/T74745#1231196 (Aklapper)
[14:19:41] Analytics-EventLogging, Need-volunteer: Two tests classes on testing efSchemaValidate - https://phabricator.wikimedia.org/T67546#1231199 (Aklapper)
[14:19:43] Analytics-EventLogging, Need-volunteer: Add Composer support - https://phabricator.wikimedia.org/T60459#1231204 (Aklapper)
[14:19:45] Analytics-Dashiki, Need-volunteer: Improve Dashiki's HTML template - https://phabricator.wikimedia.org/T73983#1231203 (Aklapper)
[14:19:47] Analytics-Engineering, Analytics-EventLogging, Need-volunteer: Validate JsonSchemaContent using MediaWIki core's handling - https://phabricator.wikimedia.org/T76432#1231205 (Aklapper)
[14:19:49] Analytics-EventLogging, Need-volunteer: Check that schema name matches revid - https://phabricator.wikimedia.org/T48174#1231201 (Aklapper)
[14:19:51] Analytics-EventLogging, Need-volunteer: Empty objects can pass schemas with required fields - https://phabricator.wikimedia.org/T67607#1231200 (Aklapper)
[14:19:53] Analytics-EventLogging, Need-volunteer: Generate alerts if theoretically impossible or unwanted logging occurs - https://phabricator.wikimedia.org/T49591#1231202 (Aklapper)
[14:21:00] Analytics-Volunteering, Engineering-Community, Phabricator, Project-Creators, and 3 others: Analytics-Volunteering and Wikidata's Need-Volunteer tags; "New contributors" vs "volunteers" terms - https://phabricator.wikimedia.org/T88266#1231210 (Aklapper) Open>Resolved Thank you! * Analytics-Vo...
[14:31:15] joal, are you awake after yesterday's late work? are you coming to tasking?
[14:31:26] arriving :)
[14:31:44] mmm this oatmeal's delicious
[14:31:46] joal, ok
[14:33:31] joal: https://phabricator.wikimedia.org/T96926
[14:37:15] Analytics-Visualization: Limn sample chart does not load - https://phabricator.wikimedia.org/T57939#1231247 (Milimetric) Open>Invalid a:Milimetric I'm in town for the next week, maybe it'll be easier to chat in person. The bug as described seems to be trying to use a graph that doesn't exist, hence...
[14:38:50] Analytics-Kanban, Analytics-Visualization, Patch-For-Review: limn-mobile-data requirements.txt file needs distribute version - https://phabricator.wikimedia.org/T75431#1231257 (Milimetric)
[14:38:59] Analytics-Kanban, Analytics-Visualization, Patch-For-Review: limn-mobile-data requirements.txt file is missing python-dateutil - https://phabricator.wikimedia.org/T75432#1231259 (Milimetric)
[14:47:40] Analytics, Analytics-Cluster, Analytics-Kanban: Add support for X-WMF-UUID method of transmitting app install ID to apps uniques reports - https://phabricator.wikimedia.org/T96926#1231270 (kevinator) current query looks at URL: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mobile_...
[15:15:33] Analytics, Analytics-Cluster, Analytics-Kanban: Add support for X-WMF-UUID method of transmitting app install ID to apps uniques reports [5 pts] - https://phabricator.wikimedia.org/T96926#1231286 (kevinator)
[15:20:01] Analytics, Analytics-Kanban, MediaWiki-API-Team, MediaWiki-Authentication-and-authorization, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1231302 (Milimetric)
[15:21:39] halfak: /user/joal/simplewiki_diff_refs/
[15:21:52] \o/
[15:21:55] * halfak goes to look
[15:22:13] joal, how long did it take to run?
[15:22:18] 10 mins
[15:22:32] gzip :P
[15:23:06] joal, lots of revisions with no diff. That's fine. It's easy to filter them out. :)
[15:24:03] Wanna try it on enwiki?
[15:24:35] Also, I'm curious how you managed to process pairs. Did you process whole pages in a mapper like we discussed?
[15:24:43] joal, ^
[15:24:55] in meeting, will explain later
[15:24:58] kk
[15:29:46] halfak: two minutes now
[15:30:05] For enwiki, I filter out revs with empty diffs ?
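On "processing pairs": one reason to hand a whole page to a single mapper is that a revision's ref additions and removals only make sense relative to the previous revision of the same page, so all of a page's revisions must be seen together and in order. The sketch below is hypothetical and is not joal's actual job; it assumes each page's revisions arrive as (rev_id, refs) pairs already sorted by rev_id. It uses multiset (Counter) differences so that a duplicated ref counts separately, and revisions whose refs did not change come out with two empty lists, which are the "no diff" rows mentioned above.

```python
from collections import Counter


def diff_refs(revisions):
    """Yield (rev_id, added, removed) for each revision of a single page.

    `revisions` is an iterable of (rev_id, refs) pairs, already sorted by
    rev_id, where `refs` is the list of ref strings found in that revision.
    """
    previous = Counter()  # the page starts with no refs
    for rev_id, refs in revisions:
        current = Counter(refs)
        # Counter subtraction keeps only positive counts: a multiset difference.
        added = sorted((current - previous).elements())
        removed = sorted((previous - current).elements())
        yield rev_id, added, removed
        previous = current
```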
[15:30:19] Anf as for gz, I did that for YOU :)
[15:30:26] To easily uncompress ;)
[15:32:41] bzip2 plz <3
[15:32:50] ok, will do :)
[15:32:52] Also yes, please filter empty diffs.
[15:33:14] ok
[15:33:27] bzip2: utilities on every unix install & splitable block compression
[15:33:32] first, I want a confirmation from you that result seems correct
[15:33:44] OK. Let me check more carefully.
[15:33:46] Then I go for enwiki
[15:33:54] Analytics, Multimedia, Multimedia-Sprint-2015-03-25, Patch-For-Review: Measure how many users have CORS-hostile proxies - https://phabricator.wikimedia.org/T507#1231343 (Tgr) Beta is throwing `[ImageMetricsCorsSupport] Missing or empty schema` errors. The code seems valid so for now I'm going to ass...
[15:36:16] Seeing some weirdness.
[15:36:26] First column is rev_id, right?
[15:36:39] correct
[15:37:26] Bah. I was looking in enwiki.
[15:37:29] This is simplewiki!
[15:37:32] :)
[15:37:42] There we go.
[15:38:46] joal, could you limit results to namespace = 0 too?
[15:39:11] Analytics-Kanban: Safely reboot limn1, wikimetrics*, dan-pentaho, and any other labs instances running Ubuntu Precise - https://phabricator.wikimedia.org/T96175#1231355 (kevinator)
[15:40:23] halfak: sure
[15:40:28] For enwiki as well ?
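The output changes agreed on here (keep only namespace 0, drop revisions with empty diffs, compress with bzip2 rather than gzip) can be sketched in a few lines of Python. This is only an illustration; the real job runs on the cluster and is not shown in the log. The (rev_id, namespace, added, removed) row shape and the tab-separated layout with JSON arrays are assumptions. Python's bz2 module writes standard bzip2, the block-based format halfak prefers because the usual unix tools read it and Hadoop can split it.

```python
import bz2
import json


def write_filtered(rows, path):
    """Write main-namespace rows with a non-empty diff to a bzip2 file.

    Each row is a hypothetical (rev_id, namespace, added, removed) tuple,
    where `added` and `removed` are lists of ref strings. Output is one
    tab-separated line per revision. Returns the number of rows written.
    """
    written = 0
    with bz2.open(path, "wt", encoding="utf-8") as out:
        for rev_id, namespace, added, removed in rows:
            if namespace != 0 or not (added or removed):
                continue  # skip other namespaces and revisions with no diff
            out.write(f"{rev_id}\t{json.dumps(added)}\t{json.dumps(removed)}\n")
            written += 1
    return written
```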
[15:40:35] yes :)
[15:40:38] ok :)
[15:40:50] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Troubleshoot EventLogging missing chunks of data {oryx} - https://phabricator.wikimedia.org/T96082#1231362 (kevinator)
[15:40:51] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Inserts events by scid - https://phabricator.wikimedia.org/T96872#1231361 (kevinator)
[15:41:04] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Inserts events by scid {oryx} - https://phabricator.wikimedia.org/T96872#1231363 (kevinator)
[15:41:46] I love that we have an article on https://en.wikipedia.org/wiki/Human_behavior
[15:41:49] But it needs some love
[15:41:58] * halfak gets into a wikihole while spot-checking data a lot
[15:42:12] huhuhu
[15:42:37] Honestly, it's amazing I ever finished my degree. Sometimes I'll be reading for hours before I realize that I'm not supposed to be doing that right now.
[15:42:48] :D
[15:43:04] That's the reason why you manage to finish a PhD though :)
[15:43:19] halfak: --^
[15:46:39] joal, OK. Other than the things I mentioned, this looks solid.
[15:46:39] Oh. Wait. One more thingh.
[15:46:40] I updated the regex.
[15:46:40] https://github.com/halfak/mwrefs/commit/c55f360e3e1b81d22519e9a7e88e0e82b8592c4c#diff-d2c6f413e08df30f810738613c456c77L3
[15:47:08] Fair point :)
[15:51:00] halfak: new version doesn't pass my unittest :(
[15:55:13] Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review: Get a measure of daily usage of wikimetrics by userbase - https://phabricator.wikimedia.org/T94193#1231379 (ggellerman) a:Milimetric
[15:55:36] (PS2) Milimetric: Add basic piwik usage tracking [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203186 (https://phabricator.wikimedia.org/T94193)
[15:56:01] (PS3) Milimetric: Add basic piwik usage tracking [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203186 (https://phabricator.wikimedia.org/T94193)
[15:57:24] (CR) Milimetric: [C: 2 V: 2] "self-merging per Kevin. Legal was ok with piwik and we have some reservations but we feel that we can iterate on that by tweaking the piw" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203186 (https://phabricator.wikimedia.org/T94193) (owner: Milimetric)
[16:05:40] Analytics-Kanban, Analytics-Wikimetrics, Patch-For-Review: Get a measure of daily usage of wikimetrics by userbase - https://phabricator.wikimedia.org/T94193#1231423 (Milimetric) Open>Resolved Merged and deployed piwik analytics code.
[16:08:02] Analytics-EventLogging, Analytics-Kanban: EL unable to decode mobile events due to appinstallid - https://phabricator.wikimedia.org/T96940#1231427 (kevinator)
[16:11:23] Analytics-EventLogging, Analytics-Kanban: EL unable to decode mobile events due to appinstallid - https://phabricator.wikimedia.org/T96940#1231436 (Milimetric) a:Deskana
[16:12:02] Analytics-EventLogging, Analytics-Kanban: EL unable to decode mobile events due to appinstallid - https://phabricator.wikimedia.org/T96940#1229822 (Milimetric) @Deskana, it looks like the events emitted just don't have the appInstallID property
[16:12:11] Analytics-EventLogging: EL unable to decode mobile events due to appinstallid - https://phabricator.wikimedia.org/T96940#1231443 (Milimetric)
[16:34:38] halfak: you ther ?
[16:34:59] I have few minutes to try to settle the job before getting eatten by EL :)
[16:35:02] In meeting :(
[16:35:34] np, later :)
[16:48:47] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Inserts events by scid {oryx} - https://phabricator.wikimedia.org/T96872#1231580 (mforns) The bug is fixed an pushed to gerrit. However, I'll still write a script to generate lots of event requests, to be able to test "real" thoughput in beta-...
[16:59:42] Analytics, Analytics-Kanban, Traffic, Patch-For-Review: Code changes in VCL to add a Last-Access cookie to track "top domain" uniques [13 pts] {bear} - https://phabricator.wikimedia.org/T92435#1231632 (BBlack) PS11 refactors a bit further to work around the Cookie-access issues by splitting into re...
[17:05:26] Analytics-Visualization: Limn sample chart does not load - https://phabricator.wikimedia.org/T57939#1231651 (bmansurov) Which town? I'm a remotie.
[17:16:30] halfak: only the regexp to srto out
[17:17:33] Still in meeting, but is is budget so I have plenty of extra brain cycles
[17:17:46] joal, what's up with the regexp ?
[17:17:47] milimetric, mforns_gym --> I have tried on a new dataset, no more chance
[17:17:59] the new one you provided doesn't work well :(
[17:18:03] halfak: --^
[17:18:15] doens't match my unittests
[17:18:17] Hmm... Passes my tests.
[17:18:27] Can you point to a case where it doesn't work?
[17:18:35] I'd like to add it to my unit tests.
[17:19:17] Actually none of the example article I have :(
[17:20:45] https://gist.github.com/jobar/6163571525b49117cf82
[17:28:26] halfak: I must be drunk again ... my bad
[17:28:36] :P
[17:28:56] Will upload, run on simplewiki, and let you double check
[17:29:05] If everythiong ok, enwiki :)
[17:29:10] Cool. Thanks joal.
[17:29:21] * halfak prepares to wait 15 minutes :DDDDD
[17:29:37] a little more, upload time ;)
[17:29:50] kk
[17:30:49] And with the filters, (namespace and no diff), should even ba a little faster
[17:49:01] halfak: check ? same folder
[17:50:02] halfak: not even 10M global for simplewiki
[17:50:10] 11mins run
[17:52:58] halfak: launching enwiki ?
[18:13:01] Analytics-Cluster, Analytics-Kanban: Estimate how many machines to add to cluster - https://phabricator.wikimedia.org/T97060#1231908 (ggellerman) NEW a:kevinator
[18:18:29] Analytics-Cluster, Analytics-Kanban: Estimate how many machines to add to cluster - https://phabricator.wikimedia.org/T97060#1231951 (kevinator)
[18:26:09] Analytics, Analytics-Kanban, Traffic, Patch-For-Review: Code changes in VCL to add a Last-Access cookie to track "top domain" uniques [13 pts] {bear} - https://phabricator.wikimedia.org/T92435#1110744 (ggellerman) + Dan (milimetric) from Analytics Engineering now that Nuria is on leave
[19:05:38] joal, checking simplewiki now
[19:10:23] joal, looks good to me
[19:10:24] :)
[19:27:07] Analytics-EventLogging, operations, Patch-For-Review: Add icinga-wm bot to #wikimedia-analytics - https://phabricator.wikimedia.org/T96928#1232215 (Dzahn) 1) create special icinga contact in private repo: ``` 51 define contact{ 52 contact_name irc-analytics 53 al...
[19:32:36] Hello Analytics channel. I just joined you because i was requested in T96928. This is a test but from now on i should report actual Icinga issues here but only the ones analytics is a contact for.
[19:35:02] CUSTOM - Check status of defined EventLogging jobs on eventlog1001 is OK All defined EventLogging jobs are runnning.
[19:37:21] Analytics-EventLogging, operations, Patch-For-Review: Add icinga-wm bot to #wikimedia-analytics - https://phabricator.wikimedia.org/T96928#1232238 (Dzahn) Open>Resolved 5. restart icinga-wm root@neon: /etc/init.d/ircecho restart see it join the channel: 12:31 -!- icinga-wm [~icinga-wm@neon.wik...
[20:24:35] Analytics, Analytics-Kanban, Traffic, Patch-For-Review: Code changes in VCL to add a Last-Access cookie to track "top domain" uniques [13 pts] {bear} - https://phabricator.wikimedia.org/T92435#1232319 (Milimetric) Thanks Grace. Brandon, I'm getting caught up on this and will respond shortly. I'll...
[20:31:21] joal & nuria: it looks like the EL events I was looking for appeared over night.
[20:31:42] halfak: likely lag, cause you are querying slave
[20:32:06] nuria, was weird that there were more recent events showing up though.
[20:32:16] halfak: it is sometimes hard to see the amount of lag but if there is backfilling going on several hours is not that rare
[20:32:34] e.g. select max(timestamp) from found timestamps substantially beyond the events I was looking for.
[20:32:42] halfak: I am not sure lag will affect all tables equally though
[20:32:58] nuria, indeed. I was looking in the same table.
[20:32:59] halfak: ah i see what you mean, in the same table
[20:33:26] halfak: boy.. i sure do not have an explanation for that , seems a question we should ask springle
[20:33:47] Yeah. That's why I was in here last night.
[20:33:56] Also, it seems that I'm not finding all the events I expected to find.
[20:34:07] * halfak widens search criteria.
[20:34:26] Should I expect some events to be dropped around 2300 UTC yesterday?
[20:34:46] Note that these originate on the server-side.
[20:37:40] halfak: again could be that events are not valid, that will be the 1st thing to check. Did you check with otto about those logs?
[20:37:56] that should be on 1002 but are not (yet)
[20:38:00] nuria, good Q. I figured I couldn't check them until that ticket was resolved.
[20:38:15] halfak: it should be very short to resolve though
[20:38:21] ottomata has been out of town today isn't he?
[20:38:46] yes
[20:39:33] * halfak goes to see if the logs magically appeared.
[20:39:54] nope
[20:43:02] nuria, any other way I can find out if the events aren't validating. I'm worried this will block the VE experiment. Today is the day we adjust the schedule.
[20:43:58] Bah... it's happening again. A max(timestamp) on the table returns 2015-04-23 20:42:05
[20:44:30] But the revision table shows I submitted my events on 2015-04-23 20:36
[20:45:18] Maybe eventlogging's clock is off?
[20:45:20] * halfak checks
[20:46:04] It looks like the events that came in late yesterday have the right timestamp.
[20:46:16] The revision table and logging table roughly agree, that is.
[20:49:00] * halfak pulls more hair
[20:50:40] Analytics, Analytics-Kanban, Traffic, Patch-For-Review: Code changes in VCL to add a Last-Access cookie to track "top domain" uniques [13 pts] {bear} - https://phabricator.wikimedia.org/T92435#1232448 (Milimetric) Brandon, I've commented on the patch, I basically agreed with your first two changes...
[20:53:40] mforns/milimetric, do you know another way I might determine if events aren't validating?
[20:53:53] halfak: sorry i was trying to catch up
[20:53:58] Another as opposed to looking in stat1002:/a/eventlogging
[20:54:05] * halfak waits patiently :)
[20:54:25] halfak: yeah, logs only go there on a cron, so it's not up to date. The latest are on vanadium
[20:54:27] uhhh
[20:54:31] eventlog1001
[20:54:42] but you don't have rights there?
[20:54:48] milimetric, I don't think so.
[20:54:57] i'll check for you, what are you trying to find out?
[20:55:22] fwiw, i've just confirmed that eventlog1001 and stat1002's clocks are in sync.
[20:55:27] Will pastebin something quick.
[20:55:30] thanks jgage
[20:55:43] jgage: yes, but stat1002 only gets the logs every once in a while
[20:55:50] thanks for the check
[20:56:41] yeah, you know more about this setup than I do but I wanted to eliminate that possiblity and assure myself :)
[20:57:41] milimetric, https://gist.github.com/halfak/654b35c5a88569ba5541
[20:57:48] thanks halfak
[20:58:39] I think that if you can just search today's logs for any edit doc['event']['page.title'] == "page.title` = "User:EpochFail/sandbox", that will get me what I want.
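The search halfak asks for (every Edit event whose doc['event']['page.title'] is "User:EpochFail/sandbox") could look roughly like the Python sketch below. It assumes, without confirmation from the log, that each log line holds one JSON-encoded event capsule with the schema fields nested under "event"; the actual file layout on eventlog1001 is not shown here. Lines that are not valid JSON, such as truncated writes, are skipped.

```python
import json


def find_events(lines, title):
    """Return parsed events whose event["page.title"] equals `title`.

    Assumes one JSON-encoded event capsule per line (an assumption about
    the log format); malformed lines are ignored.
    """
    matches = []
    for line in lines:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        event = doc.get("event") if isinstance(doc, dict) else None
        if isinstance(event, dict) and event.get("page.title") == title:
            matches.append(doc)
    return matches
```

Usage would be something like passing an open log file as `lines`, e.g. find_events(open(path), "User:EpochFail/sandbox"), where `path` is whichever log file is being searched.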
[20:58:52] halfak: going into a meeting now, but quick thing:
[20:59:08] the SQL insert is definitely behind the logs
[20:59:14] because it batches and falls behind a bunch
[20:59:44] milimetric, I've checked this against "SELECT max(timestamp) FROM Edit_11448630"
[20:59:48] I think the last time we looked there's usually up to an hour and a half of lag of inserting to the DB, so your events would make it in there later if at all
[20:59:55] but I'll look through the logs now and see if I find them
[21:00:07] the timestamp returned is well beyond what I'd expect for the event.
[21:00:12] Happened yesterday too.
[21:00:13] ha, ok
[21:00:25] But the events appeared over night -._o_.-
[21:00:49] yeah, that can happen with the way batching works
[21:00:56] milimetric, gotcha.
[21:35:01] halfak: I'm not seeing your edits in the logs
[21:35:16] Agreed. But they are in the revision table.
[21:35:22] Oh wait.. you mean the raw logs.
[21:35:23] oh, I believe you :)
[21:35:26] yes
[21:35:28] no
[21:35:31] the valid logs so far
[21:35:33] Does that mean they were probably never sent?
[21:35:38] I'll look in the raw now
[21:36:25] Gotcha. Thanks :)
[21:39:12] Analytics, Analytics-Cluster, Analytics-Kanban: Add support for X-WMF-UUID method of transmitting app install ID to apps uniques reports [5 pts] {hawk} - https://phabricator.wikimedia.org/T96926#1232680 (kevinator)
[21:48:17] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [1800.0]
[21:48:49] mutante ^ :D
[21:50:54] kevinator: What's the go-to solution for dashboarding EL data now? Is it still Limn?
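The icinga alert above compares the raw and validated EventLogging message rates in graphite; a persistent gap means events arrive but do not pass validation, which matches nuria's earlier remark that validation failures are likely behind the alarms. One plausible reading of "20.00% of data above the critical threshold [1800.0]" is that 20% of the datapoints in the check window had raw minus valid above 1800. The sketch below encodes that reading only; the window length, the per-interval alignment of the two series, and the `critical_percent` trigger are assumptions, not the configuration of the real check.

```python
def percent_above(datapoints, threshold):
    """Percentage of non-missing datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def check_raw_vs_valid(raw_rates, valid_rates, threshold=1800.0, critical_percent=20.0):
    """Return (status, percent) for the raw-minus-valid message rate series.

    raw_rates and valid_rates are aligned per-interval rates (one value per
    graphite datapoint); None marks a missing datapoint. critical_percent is
    a hypothetical trigger level, not the real check's configuration.
    """
    diffs = [
        None if raw is None or valid is None else raw - valid
        for raw, valid in zip(raw_rates, valid_rates)
    ]
    percent = percent_above(diffs, threshold)
    status = "CRITICAL" if percent >= critical_percent else "OK"
    return status, percent
```

Under this reading, one interval in five with 3000 more raw than valid messages per interval is enough to produce exactly the message seen in the channel.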
[21:51:14] yes, still limn
[21:51:40] jgage: :)
[21:57:37] Analytics, Gather Sprint Forward, Mobile-Web, Patch-For-Review: Update main menu schema to include collections for limn graphs - https://phabricator.wikimedia.org/T93690#1232702 (Jdlrobson) Open>Resolved
[21:58:08] Analytics-Tech-community-metrics, Phabricator, Wikimedia-Hackathon-2015, ECT-April-2015: Metrics for Maniphest - https://phabricator.wikimedia.org/T28#1232704 (Aklapper) >>! In T28#1125190, @Qgil wrote: >> Putting it all into the monthly email implemented in T1003 feels like overkill but is the obv...
[22:13:14] Analytics-Tech-community-metrics, Phabricator, Wikimedia-Hackathon-2015, ECT-April-2015: Metrics for Maniphest - https://phabricator.wikimedia.org/T28#1232723 (Aklapper) **Dropping SQL queries only in this comment; please ignore.** >>! In T28#1125190, @Qgil wrote: >>>! In T28#1081799, @Aklapper wr...
[22:20:26] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL 20.00% of data above the critical threshold [1800.0]
[22:21:29] so.. does anyone know the significance of that alert? what action should be taken, if any? (if not, is it useful?)
[22:22:01] s/not/none/
[22:27:56] Analytics, Analytics-Kanban, Traffic, Patch-For-Review: Code changes in VCL to add a Last-Access cookie to track "top domain" uniques [13 pts] {bear} - https://phabricator.wikimedia.org/T92435#1232738 (kevinator) hey @BBlack, when will this be deployed to all varnishes? I'm getting really excited...
[22:30:35] halfak: Wikitext is sampled 1/4 so you're most likely just being sampled out
[22:30:42] sorry I didn't realize you were testing on WT
[22:30:52] milimetric, oh god. I forgot.
[22:31:08] That's going to be a problem.
[22:31:24] :)
[22:31:42] milimetric, when did the sampling go live?
[22:32:02] uh... one sec I can tell from graphite for sure
[22:32:11] Thanks.
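milimetric's explanation ("Wikitext is sampled 1/4 so you're most likely just being sampled out") can be made concrete with a little arithmetic. If each test edit were kept independently with probability 1/4, the chance that none of n edits is logged is (3/4)^n, and a sampled count has to be scaled by 4 to estimate the unsampled total. Whether the sampling unit is the single event or the whole editing session is not stated in the log. If it is per session, every event in one session is kept or dropped together, so a single test session is simply missing with probability 3/4. The function names below are illustrative only.

```python
def prob_all_sampled_out(n_events, rate=0.25):
    """Chance that none of n independently sampled events is kept."""
    return (1.0 - rate) ** n_events


def estimate_total(logged, rate=0.25):
    """Scale a sampled count back up to an estimate of the unsampled total."""
    return logged / rate
```

With three independent test edits, prob_all_sampled_out(3) is 0.421875, so there is roughly a 42% chance that none of them shows up in the table.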
[22:33:39] halfak: my guess would be April 15, but maybe do a quick group by hour to validate the exact hour
[22:33:56] milimetric, thanks. That explains a lot.
[22:34:00] :)
[22:34:20] halfak: sampling was critical here, without sampling the Edit event stream was causing EL to drop basically 2-3 hours of data almost daily
[22:34:28] very, very sad :(
[22:34:49] now we're losing an hour every few days instead
[22:34:54] that's just very sad
[22:44:22] milimetric, seems like this is a problem that will only get worse. Is there a plan?
[22:44:54] milimetric, also, VE events are going to grow substantially when we run the A/B test.
[22:45:00] halfak: yes, marcel has a patch in progress that's doing batching per schema. This should allow us to take advantage of the much much faster raw insert feature
[22:45:00] It seems like we should account for that.
[22:45:09] I can do a quick back-of-the-envelope.
[22:45:21] milimetric, coool :)
[22:45:46] halfak: that'd be useful, but in general the infrastructure may just fall apart. Personally, I've been arguing that what we have just doesn't work period. But if it has to fail for other people to agree, I'm happy to let it fail
[22:46:18] milimetric, boo. We ought to be reasonable people.
[22:46:30] Is the bottleneck storage in MariaDB?
[22:49:19] halfak: the real bottleneck is really on the analysis side. The rate of events flowing into mysql is too large to analyze more than a few days of data
[22:49:44] there's this other insert bottleneck but that's just an implementation problem which marcel's patch should solve
[22:50:21] but the real question is, once we insert all this data, what use is it to maintain this ever more complicated system when we can barely even query the data it's gathering
[22:50:58] milimetric, makes sense to me.
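"Batching per schema" is worth a small illustration. Each EventLogging schema revision has its own table (for example Edit_11448630 above), so grouping incoming events by schema lets each full batch go to one table as a single multi-row insert instead of many single-row ones. The class below is a hypothetical sketch of that buffering idea and is not marcel's patch; the `flush` callback, the batch size, and the absence of a time-based flush are all assumptions. A real consumer would also flush on a timer, since events sitting in a partly filled buffer are invisible in the database until their batch is written.

```python
from collections import defaultdict


class SchemaBatcher:
    """Buffer events per schema and hand each full batch to `flush`."""

    def __init__(self, flush, batch_size=400):
        # flush(schema, events) is a caller-supplied callable, e.g. one that
        # issues a single multi-row INSERT into that schema's table.
        self.flush = flush
        self.batch_size = batch_size  # hypothetical default
        self.buffers = defaultdict(list)

    def add(self, schema, event):
        buf = self.buffers[schema]
        buf.append(event)
        if len(buf) >= self.batch_size:
            self.flush(schema, buf)
            self.buffers[schema] = []  # start a fresh list; the flushed one is left alone

    def flush_all(self):
        """Flush every non-empty buffer, e.g. on shutdown or on a timer."""
        for schema, buf in list(self.buffers.items()):
            if buf:
                self.flush(schema, buf)
        self.buffers.clear()
```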
[22:51:17] well, on the other hand, if the event throughput increases a lot more, I'm sure new bottlenecks will pop up
[22:51:33] for example the validation logic so far has been keeping up but that's a single point of failure with the current infrastructure
[22:51:44] I'd love to have EL go straight to HIVE with a nice workflow for scooping chunks back into MySQL as necessary -- or doing the joins in Hive.
[22:51:53] and at some point no amount of smart batching will keep up
[22:52:05] halfak: the last proposal in tasking today was this:
[22:52:23] keep X days of data in mysql and delete where timestamp < now - X on a daily cron
[22:52:38] in parallel, store everything in Hive tables
[22:53:03] and work, over time, to bring other useful stuff to Hive tables (mw tables, etc.)
[22:53:18] and then put a nicer than Hive query interface on top of it all (we're looking at Impala right now)
[22:53:18] milimetric, I would love that so much
[22:53:23] me too
[22:53:33] but, you know, if things have to fail for other people to love it, :)
[22:53:38] also, it would solve the other bottleneck in the system
[22:54:03] which is "if you don't have a PdM who can push for this EL schema to be Necessary, GL getting a schema in"
[22:54:32] if we can avoid performance and reliability bottlenecks we can be a lot more liberal about new schemas and that has big implications for non-engineering-driven research
[22:56:15] (also, hi from NYC)
[22:57:08] milimetric, thanks for the info. not sure who to push, but happy to do some pushing.
[22:57:42] hi Ironholds :)
[22:58:25] I don't think pushing does anything at this point, I think I'll take a more active role in explaining to people why certain things are happening and hopefully getting everyone on the same page about what the right next step is
[22:58:35] but I'll tell them I have your support :)
[22:58:54] Please do :)
[23:04:46] OK. I'm off. Have a good night folks.
[23:04:46] o/
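The retention proposal from tasking ("keep X days of data in mysql and delete where timestamp < now - X on a daily cron") reduces to computing a cutoff and issuing one DELETE per table. The sketch below uses Python's built-in sqlite3 so that it runs anywhere; against MySQL the same statement would go through a MySQL driver with that driver's placeholder style. It assumes the tables store 14-digit MediaWiki-style timestamps (YYYYMMDDHHMMSS), which compare correctly as strings. X is not specified in the conversation, so `days` is just a parameter here.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def mw_timestamp(dt):
    """Format a datetime as a 14-digit MediaWiki-style timestamp."""
    return dt.strftime("%Y%m%d%H%M%S")


def purge_older_than(conn, table, days, now=None):
    """Delete rows whose timestamp is older than `days` days; return the count.

    Assumes a `timestamp` column holding 14-digit strings, so a plain string
    comparison orders them chronologically.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = mw_timestamp(now - timedelta(days=days))
    # Table names cannot be bound as parameters: only pass trusted names here.
    cur = conn.execute(f'DELETE FROM "{table}" WHERE timestamp < ?', (cutoff,))
    conn.commit()
    return cur.rowcount
```

Run once a day from cron, this keeps a rolling window in MySQL, while the full history would live in Hive as proposed above.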