[00:00:48] AndyRussG: just so you know I saw your cries for help and I really wanted to help but I have this Q3 "top goal" thing hanging like a noose around my neck [00:00:49] sorry man [00:01:20] milimetric: hey no worries! Thanks also [00:02:44] Don't get too worn down! it was awight who egged me on to try to sound more "serious" :) [00:03:15] :p [00:03:23] Booo! paychecks = zero! [00:03:45] milimetric: appreciate the intention! good luck with all that :) [00:04:26] nono, this seriously sounds serious [00:12:06] milimetric: qchris helped get the logs we needed and delve deep enough into things to have an idea of where to turn (now ops, it's a weirdness that's on their turf and it seems fundraising isn't broken :) ) [00:12:44] ok, well if you need further help I'm a little more relaxed now, I'll be off until Tuesday but I'm happy to help after that [00:15:10] Analytics: Report New editors per month in 2014 Oct-Dec - https://phabricator.wikimedia.org/T89277#1035730 (Tbayer) @ezachte, as Kevin said, this is about the data that already exists http://reportcard.wmflabs.org/graphs/new_editors , just not yet for December 2014. Could you try to adapt your approximation m... [00:20:10] milimetric: OK you bet! Thanks so much, really appreciated :) [00:20:20] Also enjoy your time off :) [00:34:50] Analytics-Engineering, Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension. - https://phabricator.wikimedia.org/T89397#1035750 (leila) Thanks for starting this, Ottomata. For eye-balling the data, it's really good to have a redirect_page_name column... [00:50:18] Analytics-Engineering, Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension. - https://phabricator.wikimedia.org/T89397#1035788 (Ottomata) I think we should add names by joining with page tables in the refinement phase, but not in the raw x_analytics... [00:59:33] Analytics-Kanban, Analytics-Cluster: Implement Last Visited cookie [34 pts] {bear} - https://phabricator.wikimedia.org/T88813#1035811 (csteipp) When I talked to @halfak, he was thinking they would set a cookie that was not unique, that expired at the end of the month. So everyone during the month gets the coo... [01:12:16] Analytics-Kanban, Analytics-Cluster: Implement Last Visited cookie [34 pts] {bear} - https://phabricator.wikimedia.org/T88813#1035843 (tstarling) The parent (T88647) does say "unique identifiers will not be used for this report". [01:36:35] Analytics-Kanban, Analytics-Cluster: Implement Last Visited cookie [34 pts] {bear} - https://phabricator.wikimedia.org/T88813#1035896 (bd808) >>! In T88813#1035843, @tstarling wrote: > The parent (T88647) does say "unique identifiers will not be used for this report". So it does. I should read things more cl... [01:42:14] Analytics-Kanban: Script adds indices to the Edit schema on analytics-store [5 pts] {bear} - https://phabricator.wikimedia.org/T89256#1035909 (mforns) staging.milimetric_edit has the following indexes: ``` KEY ix_milimetric_edit_session (event_editingSessionId) USING HASH, KEY ix_milimetric_edit_action (even... [01:44:41] kevinator: what is the connection between T88813 and MediaWiki-Core? The changes you need require ops, at least going by the description you have in phab.
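For context on the T89256 indexes quoted above, the change amounts to something along these lines. This is a sketch rather than the actual script from the Gerrit change: the second column name is cut off in the quote, so `event_action` is a guess, and the statement covers just the one staging table named there.

```
-- Sketch only: add the two indexes quoted above to the staging copy of the
-- EL Edit table. "event_action" is an assumed column name (the quote above
-- is truncated); USING HASH comes from the quoted definitions.
ALTER TABLE staging.milimetric_edit
    ADD KEY ix_milimetric_edit_session (event_editingSessionId) USING HASH,
    ADD KEY ix_milimetric_edit_action (event_action) USING HASH;
```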
[01:45:42] Yes, I knew it was more opsy stuff, but I thought I’d mention it in the spreadsheet too in case there were any dependencies [01:48:19] Analytics-Kanban, Analytics-Cluster: Implement Last Visited cookie [34 pts] {bear} - https://phabricator.wikimedia.org/T88813#1035918 (RobLa-WMF) Ok, it took me a little bit to wrap my head around what y'all are doing with this, and was kind of skeptical. Assuming I understand this correctly, this sounds like... [02:14:00] (PS1) Mforns: Add SQL script to create indexes in EL Edit tables [analytics/data-warehouse] - https://gerrit.wikimedia.org/r/190404 (https://phabricator.wikimedia.org/T89256) [02:16:45] (PS2) Mforns: Add SQL script to create indexes in EL Edit tables [analytics/data-warehouse] - https://gerrit.wikimedia.org/r/190404 (https://phabricator.wikimedia.org/T89256) [10:30:04] Analytics: Report visitors (comScore) in 2014 Oct-Dec - https://phabricator.wikimedia.org/T89281#1036342 (Tbayer) a:Tbayer As discussed, these are already available at http://reportcard.wmflabs.org/graphs/unique_visitors So I don't think we need a separate task for this, except as a reminder for myself to... [10:34:35] Analytics: Report Signups in 2014 Oct-Dec - https://phabricator.wikimedia.org/T89276#1036344 (Tbayer) [10:35:49] Analytics: Report Signups in 2014 Oct-Dec - https://phabricator.wikimedia.org/T89276#1032107 (Tbayer) Has this been assigned to anyone, as planned earlier ( T88846#1021705 )? Or do we need to leave it out in this quarterly report (to be published on Sunday Feb 15)? [11:52:22] Analytics: Foundation-only geowiki data trapped in redirection loops - https://phabricator.wikimedia.org/T89447#1036401 (QChris) NEW [14:08:44] Analytics-Kanban: Hand off of Christian's MaxMind geolocation databases repository - https://phabricator.wikimedia.org/T89453#1036629 (Milimetric) NEW [14:27:56] qchris: that's such a lovely clean script [14:28:11] I didn't know about --porcelain [14:28:28] but in a meta way, I'm going to use this script as my standard for "porcelain" when I write my own [14:28:31] You are way too nice to me .) [14:28:49] no, really, I think of bash as this thing that I have to bash my head into the wall until I get it to work [14:29:02] :-D [14:29:04] and your scripts have really helped me appreciate what it's good at [14:29:18] Heh! :-) [14:45:41] who [14:45:52] whoops :) [15:02:45] joal: standup [15:11:47] thx milimetric, was reading ... [15:11:52] Apologies for that [15:11:59] I'll set up a reminder [15:12:13] np at all, I'm just pinging for convenience [15:28:02] joal, let's talk today about some tasks, i have a few easy ones (that are not oozie coding) for you, but I think you might want to do them differently (e.g. spark :p ) [15:28:30] etl related [15:28:51] we can IRC talk about them, apparently this cafe is not good for meetings [15:38:19] Hey ottomata [15:38:23] so joal, ja, those two repositories are particularly relevant [15:38:50] refinery/source is pretty much all java stuff. refinery contains more scripts, oozie configs, etc.
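For anyone following the "--porcelain" bit: `git status --porcelain` prints one stable, machine-readable line per change, which is what makes scripts like that easy to keep clean. A small sketch of the pattern (a hypothetical sync script, not the actual one being praised):

```
#!/bin/bash
# Sketch of the --porcelain pattern discussed above (not qchris's script).
# git status --porcelain emits one stable line per changed file, so
# "is the working tree clean?" is just "is the output empty?".
set -e
cd /srv/some-repo   # hypothetical path

if [ -n "$(git status --porcelain)" ]; then
    echo "working tree not clean, refusing to continue" >&2
    exit 1
fi

git pull --ff-only
```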
as well as released jars of refinery-source modules [15:38:54] for those listening here, they are refinery and refinery/source [15:39:05] (the jars are put there via git-fat, they aren't actually committed to the repo) [15:39:17] ok [15:40:44] so, joal, if you want to really get into something, I subscribed you to a few phab tickets i created yesterday [15:41:18] https://phabricator.wikimedia.org/maniphest/query/PPJnw0Zbhyg5/#R [15:41:33] this might be a good starter [15:41:33] https://phabricator.wikimedia.org/T89401 [15:42:48] Thanks for that ! [15:43:31] they aren't really specced well, if you want to do one of them, i'll add more info [15:43:32] I am gonna have a look at the code, and probably ask for help ;) [15:43:36] ok cool, for sure [15:43:40] you want to try that one? [15:43:46] Yeah, sure ! [15:43:52] ok cool. will add more detail now. [15:44:16] I'll first try to find the part of the code where I should work ! [15:47:44] k! [15:50:42] ottomata: With bits again in place, and all oozie jobs running, is it ok if I stop looking at those jobs and assume that you'll handle them? [15:50:44] (aka if you upgrade cdh on Monday, is it ok if I let you do all the checking and maybe restarting of things?) [15:51:16] qchris: yup, i think that is fine. [15:51:27] joal: I updated that task with some more info [15:51:29] Cool. Thanks. [15:52:01] qchris: are all udp2log tsvs being created now? [15:52:19] yup. they should. let me double check. [15:52:59] Yup. /a/log/webrequest/archive looks good on stat1002. [15:53:04] so. awesome. [15:53:25] how do you think we should handle a switchover? just a readme file in /a/squid/ directories? [15:53:43] and an announcement? [15:54:10] I'd say only the README would be too little. [15:54:19] But README + announcement sounds great. [15:54:36] should we try to do some symlinking? [15:54:47] maybe move all data into the /a/log/webrequest/archive/* directories from /a/squid [15:54:52] and then symlink the old directories? [15:54:55] or should we keep them separate? [15:55:05] I would not do that. [15:55:15] The squid directory contains so much cruft. [15:55:21] ok [15:55:26] The zero directory being the worst offender. [15:55:40] ha ok [15:55:40] ok. [15:55:43] Also ... we have the Hive tsvs back to 2015-01-01. [15:55:47] aye [15:55:51] That would get overwritten otherwise. [15:56:12] if you got a sec, want to write up a README and stick it in /a/squid/archive? [15:56:22] I can do the announcement if you like [15:56:43] when I write an announcement, I'll think about a timeline for turning off udp2log instances! :O [15:56:53] if you would rather I write the README, i can [15:56:55] just lemme know [15:57:18] I gotta sign out in a few minutes, so I can do the README over the weekend, but not today :-( [15:57:36] ok that's fine [15:57:45] i'm excited about this! [15:58:01] About the udp2log timeline ... did you get the fundraising use of udp2log settled? [15:58:07] What are they about to use and when? [15:58:10] kafkatee? [15:58:31] i'm mostly talking about our udp2log instances [15:58:39] the fundraising one I will push harder once we turn those off [15:58:51] but yes, i believe they will use kafkatee [15:59:44] udp2log instances being udp2log filters then? (Because fundraising is using the udp2log on erbium IIRC) [16:00:20] Anyways. I am also pretty excited about it :-) [16:00:28] This is really a great thing :-D [16:01:08] k. Gonna head to some nerd meeting. Have fun!
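Since the git-fat detail above trips people up on a fresh clone: the jars in refinery are placeholder files until git-fat fetches their contents. A hedged sketch of the usual steps, with the clone URL and paths as assumptions rather than exact instructions:

```
# Sketch of how the git-fat-managed jars in refinery get materialized;
# the clone URL and directory names are illustrative.
git clone https://gerrit.wikimedia.org/r/analytics/refinery
cd refinery
git fat init   # wires up the git-fat filters declared for the repo
git fat pull   # fetches the real jar contents behind the placeholder files
ls artifacts/  # the refinery-source jars should now be real files, not stubs
```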
[16:02:42] otwt [16:02:54] s/otwt/ottomata: there is one thing I forgot that helped me a lot when looking at the oozie stuff/ [16:02:59] https://commons.wikimedia.org/wiki/File:Refinery-oozie-overview.png [16:03:09] I just updated that for the addition of bits. [16:03:27] Now I am off. Again :-) [16:03:34] laters! thanks [17:03:02] ottomata: is puppet re-run automatically on hafnium? [17:03:49] ottomata: this is to know if the updates to the alarm code we deployed get there automatically or if we have to push them [17:04:28] should be [17:04:32] i can check [17:12:10] or maybe this: joal heyyyy [17:14:20] nuria: [17:14:21] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=tungsten&service=Throughput+of+event+logging+events [17:14:26] This service has been scheduled for fixed downtime from 2015-02-06 16:39:37 to 2015-02-20 18:39:37. Notifications for the service will not be sent out during that time period. [17:14:43] ottomata: yessir [17:14:58] looks applied properly [17:14:58] ottomata: ?? [17:15:20] dunno, someone has a very long downtime for eventlogging scheduled, it looks like [17:15:24] did you not do that? [17:16:14] ottomata: no [17:16:23] ottomata: and larms are being sent [17:16:28] *alarms [17:17:08] ottomata: first question: we know the patch applied because of the "Last Updated: Fri Feb 13 17:16:46 UTC 2015 - Updated every 90 seconds [pause]" is that so? [17:17:37] ottomata: second, i have no clue where the downtime comes from [17:17:40] well, there are now multiple alerts for the same data, due to the way this puppet thing is applied [17:18:00] ottomata: we want to have only one though [17:18:05] yeah [17:18:05] i know [17:18:10] ottomata: ah ok [17:18:16] i just saw this as I was looking at it [17:18:20] previously, there was only one graphite server [17:18:26] and since these are remote checks of graphite data [17:18:33] they just need to run somewhere. it doesn't matter where [17:18:38] so someone included them on the graphite server [17:18:43] but, now there are multiple graphite servers [17:18:48] and these checks are being included on all of them [17:18:55] even though they look at and alert for the same data [17:19:00] so, they need a new home. [17:19:42] basically, all of these: [17:19:42] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/graphite.pp [17:19:53] probably should be in their own class, and manually included on just one server [17:21:49] ottomata: i see, it's not "types" of alerts [17:22:13] ottomata: rather the "same" alert is going to fire more than once (as many times as graphite hosts we now have) [17:22:20] yes [17:22:28] ottomata: ok, good to know [17:23:04] ottomata: and i assume the host info will not come on the alert so we just have to be on the lookout for that [17:23:25] yes [17:23:33] feel free to submit a patch :) [17:27:37] ottomata: to include the host info you mean? [17:27:50] no, to make it so there is only one alert [17:28:45] ottomata: and ... ahem... how do i do that? I thought it was the graphite setup itself that was creating multiple alerts [17:30:05] 1. create some new class, role::graphite::checks or something [17:30:18] move ::monitoring:: checks into that class [17:30:21] 2. ^ [17:30:30] 3. include that class on just one graphite host (tungsten maybe?) not all [17:43:32] ottomata: but then tungsten will be running everybody's alarms would it not?
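A rough sketch of the three steps just listed; the class name role::graphite::checks comes straight from the conversation, but the rest is illustrative rather than the real operations/puppet manifests:

```
# Illustrative only: move the graphite-based alert checks into their own
# role class so they are declared on exactly one host and each alert fires once.
class role::graphite::checks {
    # the eventlogging piece that comes up below
    include ::eventlogging::monitoring::graphite

    # ...plus the other monitoring checks currently sitting in role/graphite.pp
}

# Included on a single graphite host (tungsten was the suggestion above),
# instead of on every host that carries the graphite role.
node /^tungsten\./ {
    include role::graphite::checks
}
```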
[17:43:42] yes [17:43:46] that is unavoidable though [17:43:53] one host needs to be picked [17:43:59] you could make the patch just be relevant for yours [17:44:07] if you like [17:47:41] ottomata: I'd rather not refactor ops puppet graphite code to tell you the truth as I am completely disconnected from changes to that arena, also that class includes 500 alarms for mediawiki code so it's pretty sensitive [17:51:31] you could just do yours! :) [17:51:38] but ja, that's fine [17:55:49] ottomata: but andrew, how would we just move ours? on the /graphite.pp code there is no EL code that i can see, it is just global definitions and some alarms [17:56:05] let me see how ours are defined again [18:03:00] include ::eventlogging::monitoring::graphite [18:03:01] just that one. [18:04:03] ottomata: ah yes, ok i see removing that and making a puppet class that just deploys to tungsten makes sense. [18:05:02] aye [18:05:03] :) [18:05:25] ottomata: sorry! now i get it, made task: https://phabricator.wikimedia.org/T89469 [18:05:31] hey joal, do you have experience configuring spark and running it in yarn? i'm trying to do it in vagrant and getting really annoying classnotfound exceptions [18:05:46] aye, thanks nuria :) [18:06:02] ottomata: i will take care of that once EL backfilling is over [18:06:30] cool [18:06:32] danke [18:07:10] nuria: how far along is the backfilling? [18:13:20] ori: nothing, not even finished the 1st day [18:13:37] ori: otherwise slave replication was getting over 1 hour [18:15:43] ori: so, babysitting this over the weekend we might be done by monday [18:18:59] nod [18:25:17] ori, q for you. any idea if it would be possible to stick redirect_page_id into x-analytics? [18:32:04] ottomata: possibly; file a task? [18:32:22] Analytics-Tech-community-metrics, MediaWiki-Developer-Summit-2015: Achievements, lessons learned, and data related with the MediaWiki Developer Summit 2015 - https://phabricator.wikimedia.org/T87514#1037185 (Qgil) Casual notes from a conversation with Robla and Greg: * Have a list of questions beforehand. Da... [19:06:11] Analytics-Cluster, Analytics-Kanban: Implement Last Visited cookie [34 pts] {bear} - https://phabricator.wikimedia.org/T88813#1037327 (csteipp) >>! In T88813#1035918, @RobLa-WMF wrote: > In general, it seems as though the only possible change to privacy is that, if the client computer is seized or compromised... [19:15:59] ottomata: going for lunch, i leave backfilling on vanadium on as we have tested how it runs and it does not seem to cause trouble [19:17:04] ottomata: easy to kill if it were causing problems, will be back in 2hrs cc: ori [19:18:57] ok [20:15:04] Analytics: Knowledge Gaps in Wikipedia - https://phabricator.wikimedia.org/T89487#1037526 (Krenair) I'm guessing this is to do with #Analytics. [20:15:11] Analytics: Missing Links in Wikipedia - https://phabricator.wikimedia.org/T89488#1037530 (Krenair) I'm guessing this is to do with #Analytics. [20:15:19] Analytics: Wikimania submissions - https://phabricator.wikimedia.org/T89486#1037533 (Krenair) I'm guessing this is to do with #Analytics. [20:35:58] Analytics: Foundation-only geowiki data trapped in redirection loops - https://phabricator.wikimedia.org/T89447#1037597 (Ottomata) Open>Resolved [20:36:45] ori: https://phabricator.wikimedia.org/T89397 [20:37:00] Analytics-Engineering, Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension.
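On the spark-in-yarn question above: ClassNotFoundExceptions in yarn mode are very often the job's dependency jars never reaching the executors, so the usual fixes are building an assembly ("fat") jar or passing the extra jars explicitly. A hedged sketch of a submit command for the Spark 1.x / CDH-era setup being discussed; every class name and path below is a placeholder:

```
# Placeholder class/jar names; the point is the assembly jar plus --jars,
# which is the usual fix for ClassNotFoundException when running on YARN.
spark-submit \
  --master yarn-client \
  --num-executors 2 \
  --executor-memory 1g \
  --class org.example.WordCount \
  --jars /vagrant/lib/dep-one.jar,/vagrant/lib/dep-two.jar \
  /vagrant/wordcount-assembly.jar hdfs:///tmp/input.txt
```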
- https://phabricator.wikimedia.org/T89397#1037599 (Ottomata) [20:47:31] Analytics: Knowledge Gaps in Wikipedia - https://phabricator.wikimedia.org/T89487#1037611 (leila) http://etherpad.wikimedia.org/p/wikimania_knowledge https://wikimania2015.wikimedia.org/wiki/Submissions/The_next_million_articles_in_Wikipedia [21:02:43] nuria, yt? [21:03:20] Analytics: Knowledge Gaps in Wikipedia - https://phabricator.wikimedia.org/T89487#1037488 (leila) >>! In T89487#1037526, @Krenair wrote: > I'm guessing this is to do with #Analytics. Hi @Krenair. Thanks for checking in. I removed Analytics as a project. My team has not moved to phabricator yet and I'm using... [21:04:56] Analytics-Engineering, Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension. - https://phabricator.wikimedia.org/T89397#1037672 (ori) Cache invalidation could get tricky. Redirects can be made into bona-fide articles, and vice versa. [21:10:54] Analytics: Wikimania submissions - https://phabricator.wikimedia.org/T89486#1037698 (leila) [21:47:01] operations, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1037822 (Dzahn) [21:47:11] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1035650 (Dzahn) [21:50:51] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1037826 (Dzahn) stat1003 is a Statistics general compute node (non private data) (role::statistics::cruncher) stat1002 is an analytics server (role::analytics) stat1002 is... [22:20:30] nuria: Hey, I was thinking that we should talk about Krenair's work to add Schema:Edit funnel logging to WikiEditor. [22:21:04] James_F: sure, irc? e-mail? hangout? [22:21:47] nuria: IRC works; in meeting right now but ggellerman pointed out that we should talk. Mostly it's a "Hmm, can EL scale to this additional inflow?" question. [22:22:26] James_F as long as you keep it to 20 reqs a sec more or less i would say it's np [22:22:40] nuria: I imagine it might be higher than that. [22:23:06] James_F: how so? you can control that with a sampling rate [22:23:27] James_F: note that you want to keep the mysql tables at a queryable size, which is harder with a higher inflow [22:23:39] nuria: (~2–7 events per attempted edit; there are lots of attempts, and we're trying not to sample if possible.) [22:23:57] James_F: and how many edits per day on enwiki? [22:24:10] James_F: and why wouldn't you want to sample? [22:24:55] James_F: sampling does not skew your data for an experiment like this one that is running for weeks [22:25:05] James_F: it's normally how EL data is gathered [22:25:08] nuria: enwiki is ~60k edits a day I believe right now; sampling hides small-scale extreme issues, like Dan found with one user whose funnel took 20! times to get the CAPTCHA right. [22:25:31] It's not an experiment, though, it's general user behaviour baseline data for impact tracking. [22:25:45] James_F: no, statistically it doesn't if you are analyzing enough data [22:25:52] We could sample instead, and just multiply all the WT data up. [22:25:58] James_F: note that you have a long running experiment [22:26:01] We're doing unsampled collection for VE. [22:26:11] James_F: the amount of data you study is your total dataset [22:26:41] James_F: do you have comparable usage among both?
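A rough sense of scale for the rate question above, with everything except the ~60k edits/day figure being an assumption: 60,000 saves a day is about 0.7 saves a second on average; if each save represents roughly two attempts and each attempt emits around five events (the middle of the "~2–7" range quoted), that is 60,000 × 2 × 5 / 86,400 ≈ 7 events a second on average, with peak hours a few times higher. That lands in the same ballpark as the ~20 a second comfort level mentioned, which is why sampling keeps coming up.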
[22:26:59] James_F: you probably do not and that is ok [22:27:06] James_F: that is what stats are for [22:27:15] nuria: In terms of numbers? No. It's roughly 1:10, but it varies. [22:27:56] But the exact ratio is indeed something we'd want to track. [22:28:10] James_F: so I would not worry about sampling on wikieditor, this is a long running experiment (we also run unsampled short tests but this is not one of those as far as i can see) [22:28:20] James_F: then set up an experiment for ratio [22:28:37] James_F: you do not need to track 7/6 events per editor to calculate ratio right? [22:28:37] Yes, that's what I'm talking about. [22:28:43] We certainly do. [22:28:59] The whole point is to measure relative funnel successes. [22:29:12] James_F: That is different for ratio [22:29:17] *from ratio [22:30:00] James_F: and for that you can intercompare data as long as both sets are statistically significant within the same period of time, makes sense? [22:30:07] nuria: WE:VE starts; WE:VE attempted saves; WE:VE successful saves. [22:30:16] Sure. [22:31:39] James_F: understood, what I am saying is that you gather two sets of data in time period T both of which are statistically significant for what you want to measure. Note that you can achieve this by sampling wikitext data [22:32:55] James_F: from looking at your dataset initially with dan there are many small wikis for which VE dataset was too small to draw statistically significant conclusions, but there were many others for which there was enough data [22:34:54] James_F: also rather than absolute # of events I would compare ratios [22:36:26] nuria: Yeah, though inter-wiki differences may need investigating too. [22:36:36] James_F: like VE stats/VE saves versus equivalent set of wikitext, and on this our scientists can elaborate but it is easier to convey information that way [22:36:51] * James_F nods. [22:36:59] James_F: I think inter-wiki would be SUPER interesting once we have a baseline [22:38:13] James_F: so, summing up: sampling wikitext data would be np as long as both VE and wikitext have a statistically significant dataset to work with. [22:38:23] * James_F nods. [22:38:42] We'd have to work out a way that samples at a consistent rate between the cohorts we care about. [22:38:53] James_F: and 2) ratios of event1/event2 are easier to compare than absolute counts [22:39:09] James_F: yes, cohorts. so what cohorts do you care about specifically? [22:39:11] Logged-out users and new accounts being the focus; once on for an edit session, should stay on. [22:39:34] James_F: i would think for a baseline we would want to get "everyone" to start but maybe not .... [22:39:54] So maybe (e.g.) (accountID || IP)%10 = 0 or whatever. [22:40:20] James_F: ah yes, we do not sample within the session, once we start gathering session data we do it for the whole session [22:42:00] James_F: but for VE do you also have this cohort focus? [22:47:14] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1037997 (Halfak) So, both stat1002 and stat1003 have access to private data. I can't comment about puppet roles. I'm not sure what you are asking for WRT a "centralized... [22:49:16] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1038004 (Dzahn) I'm asking because to add the packages in puppet we need to decide which role to put it on.
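To make the "(accountID || IP)%10 = 0" idea above concrete: hashing a stable per-user key and bucketing on it keeps a user either fully in or fully out of the funnel logging for their whole session. This is illustrative pseudocode only, not the actual WikiEditor/EventLogging instrumentation, and the helper and variable names are invented:

```
# Illustrative sketch, not the real instrumentation code.
# Bucket on a stable per-user key (account ID, or IP for logged-out users)
# so a user's whole edit session is either sampled in or out.
import zlib

def in_sample(user_key: str, one_in: int = 10) -> bool:
    """True for roughly 1 in `one_in` users, stable across events."""
    return zlib.crc32(user_key.encode("utf-8")) % one_in == 0

# Decide once per session, then log every funnel event for that user (or none).
# Example: a logged-out user identified by IP.
if in_sample("198.51.100.23", one_in=10):
    pass  # attach the Schema:Edit funnel logging for this session
```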
We install things in role classes which are applied to host nam... [22:50:25] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1038013 (Dzahn) a:Ottomata [22:51:21] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1035650 (Dzahn) Given to @ottomata for his advice where the puppet code should be added. [23:05:40] Analytics: Report Signups in 2014 Oct-Dec - https://phabricator.wikimedia.org/T89276#1038039 (Halfak) Done. See https://meta.wikimedia.org/wiki/Research:Monthly_registrations [23:07:10] nuria: No, we're collecting at 100% right now. [23:07:32] James_F: Then to compare both datasets we should have the same focus on both [23:07:55] James_F: let's do that 1st and be more specific later once we have some preliminary analysis, sounds good? [23:09:01] James_F: we can run "cohort-specific" experiments later, that should not be hard as the data model is the same [23:14:06] James_F: let me know if this sounds good/want more info, I can put what we talked about in an e-mail as needed. [23:14:28] nuria: Sounds good to me. E-mail would be good. [23:15:19] James_F: ok, to recap: 1) no cohorts to start in VE or Wikitext (to be done later, after we get a baseline) 2) sampling on wikitext 3) comparison of statistically significant datasets [23:15:24] 4) cmaparation of ratios [23:15:29] *comparison [23:15:58] SGTM. [23:15:59] James_F: let me know if something is missing or if you want more info [23:16:04] James_F: k [23:16:17] James_F: e-mail on the way [23:25:37] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1038087 (Ottomata) Daniel, check out modules/statistics/manifests/compute.pp Put this there. [23:28:13] nuria: hey superm401 agreed to help debug the EL weirdness [23:28:33] he had a really neat suggestion which is - let's take a look at the timestamps of the events in those monster 70,000 record insert statements [23:28:42] that might give us a clue how they're getting so big [23:29:14] milimetric: sounds good, i also found some other issues, regarding python/threads and scope [23:30:02] interesting, yeah, feel free to CC him and he'll take a look / help review etc. [23:30:10] I'm reading your conversation with J above [23:30:12] Yep [23:30:22] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1038091 (Dzahn) a:Ottomata>Dzahn [23:31:10] Are you still updating https://lists.wikimedia.org/mailman/listinfo/eventlogging-alerts ? [23:34:54] superm401: yes, nuria sends email to that list when something's broken [23:35:09] her latest ones notified everyone of the outage and the backfilling she's attempting [23:35:31] milimetric, that's weird. I don't remember getting the outage email (and mailman just said I was already subscribed). [23:35:36] Is it CCed to another mailing list? [23:35:54] looking [23:37:13] milimetric: sent e-mail summing up conversation with James_F so we are all in the loop (to include Krenair too) [23:37:33] superm401: "[Analytics] EL dropping events for about 8 hours from midnight Feb 5th to about 8am Feb 5th" [23:37:46] superm401: i bet whoever administers the list needs to "approve" messages [23:37:46] that one went out to el alerts [23:38:01] thanks nuria [23:38:03] superm401: if we can find out who that is...
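One cheap way to act on superm401's suggestion above, assuming the oversized batch has been captured to a file and that the capsule's integer timestamp field is what gets keyed on (the file name and field layout here are guesses, not an actual procedure):

```
# Rough sketch: pull the numeric "timestamp" values out of a dumped batch and
# print the earliest and latest, to see how wide a window one insert covers.
# "giant-insert-batch.log" is a placeholder for wherever the batch was captured.
grep -oE '"timestamp": ?[0-9]+' giant-insert-batch.log \
  | grep -oE '[0-9]+' \
  | sort -n \
  | sed -n '1p;$p'
```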
[23:38:05] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1038114 (Dzahn) @ottomata alright, amended: https://gerrit.wikimedia.org/r/#/c/190592/3/modules/statistics/manifests/compute.pp good to go? [23:38:15] nuria, I was wondering about getting the schema improved... [23:39:00] Krenair: yesssir ... [23:39:12] Compare the possible values for action.saveFailure.type against EditPage's error status constants :/ [23:39:13] thanks nuria for the email. All the analysis I've done so far does not include raw counts so sampling shouldn't affect anything [23:39:31] though if we ever do count to count analysis, it's going to be annoying remembering the sampling ratio and not having it written down in the schema [23:39:48] milimetric: ack about raw counts, just wanted to clarify that with James_F so we do not go down that path [23:40:18] but then again you know I disagree with you and Ori on that aspect. That in theory, you're right and sampling should be left up to ops. But in practice it's important IMO to record it in the event even if it's controlled externally [23:41:04] milimetric, nuria, yeah, it was definitely CCed. It's possible I don't see the other version because I'm subscribed to both. But you should check to make sure your postings are going through to the alerts list. [23:41:05] Krenair: are there more error status constants or fewer? Or just totally mismatched? :) [23:41:17] milimetric: for a funnel comparison absolute counts have little meaning [23:41:17] a couple do map [23:41:23] there are many more error status constants [23:41:35] Krenair: lemme look at schema [23:41:36] I'm not sure what to do about those extension* values [23:41:36] nuria: i agree, more general point about the annoyance of "remembering" what was sampled at what rate when [23:41:57] hm... hm... hm... [23:42:51] milimetric: What if we have to change the sampling rate due to unexpected load? Do we change the schema revision? [23:42:53] Krenair: can you respond to Nuria's email with that comparison and we can figure out where to go from there? I'd say let's expand the current schema and add all the failure types [23:43:06] because otherwise we lose valuable information [23:43:22] +1 [23:43:24] James_F: no, sampling is separate from the schema regardless [23:43:27] ok [23:43:47] ideally the sampling is recorded in the event though, because otherwise analysis later on becomes really tacit [23:43:59] ok Krenair, will look for info on e-mail [23:44:12] thanks, Alex, that'll be very useful [23:44:55] James_F: I'm not saying to change the schema, the sampling rate should be filled in by the server as part of the capsule. But this is an age-old debate that I keep losing :) [23:45:43] James_F: have you had a chance to look at the new graph? I was wondering what you thought. Aaron and I were happy with the paths it points us down [23:45:49] milimetric, James_F: if you are comparing ratios sampling is not going to have an effect on your data analysis [23:46:33] I'm in total agreement with you on that nuria, but I'm making a different point [23:47:04] milimetric: ah sorry [23:48:28] brb, locking up chickens [23:49:48] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1038155 (Dzahn) @halfak packages have been installed on both hosts. ``` Notice: /Stage[main]/Statistics::Compute/Package[libopenblas-dev]/ensure: ensure changed 'purged'...
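To make the capsule idea above concrete: what milimetric is describing is the server stamping the effective sampling rate onto each event, so later count-to-count analysis doesn't depend on remembering what was sampled when. The shape below is purely illustrative; samplingRate is a hypothetical field rather than something the real EventCapsule carries, and the other values are made up.

```
{
    "schema": "Edit",
    "wiki": "enwiki",
    "timestamp": 1423869600,
    "samplingRate": 0.1,
    "event": { "action": "saveSuccess", "editor": "wikitext" }
}
```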
[23:49:58] milimetric, how do i get to pentaho? [23:50:00] operations, Analytics-Engineering, Analytics-Cluster: Install packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1038156 (Dzahn) Open>Resolved [23:52:03] operations, Analytics-Engineering, Analytics-Cluster: Install Fortran packages on stat1002 and stat1003 - https://phabricator.wikimedia.org/T89414#1038159 (Dzahn) [23:56:38] yurikR: http://pentaho.wmflabs.org/pentaho/Login [23:56:53] use the evaluator login thing [23:56:56] just click "go" [23:57:09] warning: this is literally running off my home share in labs [23:57:39] so it has exploded often and awesome people like nuria maintain it in their spare time sometimes, but no guarantees [23:58:56] milimetric, thx! [23:59:42] ya yurikR , do not tell your friends as i doubt it sustains more than a handful of users [23:59:56] yurikR: once you get in, you have to do "Create New" -> "New Saiku Analysis" and pick the v0.4 cube