[09:36:38] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c9 (nuria) Summing up, somehow we are generating requests in the client like: "http://es.m.wikipedia.org/http://es.m.wikipedia.org/wiki/Wikipedia:Portada" which, according to http are valid req... [09:37:08] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371 (nuria) NEW>RESO/INV [09:40:25] Analytics / Wikimetrics: Wikimetrics is not supporting mlwiki cohort - https://bugzilla.wikimedia.org/69462 (nuria) NEW p:Unprio s:normal a:None Attempting to upload an indic language cohort today we discovered that Wikimetrics is not supporting mlwiki cohort - server error for list includi... [09:43:23] hola springle, [09:43:37] can i ask you about labs db and replication lag? [09:46:17] nuria: !ask :) [09:47:00] Is it possible to have a high replication lag (like > 24hrs) between labs db and production? [09:47:15] or rather is that something that has ever happened? [09:47:36] that is possible, but only if something is broken [09:48:37] nuria: yes, it's happened. we've had broken replication due to bugs or user transaction blocking [09:48:43] is that something we should guard against on our applications? or is it too unlikely instance? [09:48:48] i'm not aware of any recently [09:49:03] "too unlikely of an instance" [09:49:43] guard against how? replication is asynchronous. it's always possible that lag can occur, hopefully small but maybe large, and applications need to be aware [09:51:58] We have reports that run every night that would be sensitive to a lag of more than 24 hours, small lag is no issue [09:55:26] springle: what is the best way to automatically monitor the replication lag? I saw we have an event_log table that seems to keep track of replication, is querying that the best way to know replication is OK? [09:55:41] https://git.wikimedia.org/blob/operations%2Fsoftware.git/c57ddf6b82f046f893de8e70bda15e4d57b4ae25/dbtools%2Fevents_labsdb.sql [10:02:49] nuria: ops.event_log table has nothing to do with replication lag. i think most people who care look at timestamps in active tables [10:03:53] springle: ok, so we look at our data rather than look somewhere to see when replication happened, right? [10:04:46] it's possible to grant user accounts access to REPLICATION CLIENT which allows the SHOW SLAVE STATUS command, but to my knowledge that isn't done on labsdbs [10:05:03] we'd have to chat with Coren [10:07:43] springle: is there anything (alarm, script?) ongoing that monitors the replication delay for enwiki, dewiki.. etc on labs? [10:08:50] not presently. there was, but we're halfway through a migration and the new multi-source replication requires new monitoring [10:10:08] nuria: https://icinga-admin.wikimedia.org/cgi-bin/icinga/status.cgi?host=db1053&nostatusheader [10:10:26] that is the sanitarium server (or one of them), which is the labsdb master in the replication tree [10:11:35] if that is all green, chances are high that labsdb is also fine. we have events in place on labsdb watching for replag that should make it hard for users to block [10:12:29] eventually labsdbs will appear with multiple channels like https://icinga-admin.wikimedia.org/cgi-bin/icinga/status.cgi?host=dbstore1002&nostatusheader << that's analytics-store [10:13:03] ok, let me talk to the team and see how they want to monitor this best, if you are to set up alarms will it be OK for our team to receive them?
(just to be informed, we obviously cannot take any action) [10:13:13] hmm, i've passed icinga-admin urls. that may not help you much [10:13:46] i see them cause i have icinga permits from EventLogging [10:13:57] excellent :) [10:14:07] yes, you guys can be notified if you wish [10:17:02] nuria: i think we could expose slave lag in a table. we've recently started information_schema_p on labsdb [10:17:20] springle: that would be excellent for us [10:17:40] as we could query the table and make sure not to run "current day" reports [10:18:01] our daily runs backfill, and yesterday's reports, if left empty, will get backfilled tomorrow [10:18:28] springle: should i write a bug describing the use case to create the table? [10:18:39] sorry, "the use case for which [10:18:58] yes [10:18:58] it will be great to have a table to consult replication lag" [10:19:02] assign to me [10:19:10] ok, what project should it be under? [10:19:23] no idea :) pick something [10:19:42] ok, analytics then [10:24:25] Analytics / General/Unknown: Create a table in labs with replication lag data - https://bugzilla.wikimedia.org/69463 (nuria) NEW p:Unprio s:normal a:None I am creating this bug at the request of springle. It will be very useful to be able to consult replication lag on a table with wide acc... [10:25:24] Analytics / General/Unknown: Create a table in labs with replication lag data - https://bugzilla.wikimedia.org/69463 (nuria) a:Sean Pringle [10:25:41] many thanks again springle for your help [10:28:52] :) [10:29:41] (PS7) Nuria: Removing usage of celery chains from report scheduling [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/150475 (https://bugzilla.wikimedia.org/68840) (owner: Milimetric) [10:35:23] Analytics / General/Unknown: Create a table in labs with replication lag data - https://bugzilla.wikimedia.org/69463#c1 (nuria) We schedule reports by project and i imagine replication will be reported per host, not per project so a global measure of how replication is working on the labs cluster will be... [12:10:17] ohai Ironholds_ [12:10:48] Ironholds_: wanted to let you know that the URL format for Mobile App requests will change super slightly - action=mobileview will come before format=json, but that's it. [12:13:07] (PS6) Yuvipanda: Show only latest run of query in queries list [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 [12:13:16] (CR) jenkins-bot: [V: -1] Show only latest run of query in queries list [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 (owner: Yuvipanda) [12:23:53] (PS7) Yuvipanda: Show only latest run of query in queries list [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 [12:30:54] (CR) Yuvipanda: [C: 2] Show only latest run of query in queries list [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 (owner: Yuvipanda) [12:31:02] (CR) Yuvipanda: [C: 2] Move check_sql into QueryRevision model [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153620 (owner: Yuvipanda) [12:31:07] (Merged) jenkins-bot: Move check_sql into QueryRevision model [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153620 (owner: Yuvipanda) [12:31:11] (Merged) jenkins-bot: Show only latest run of query in queries list [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153626 (owner: Yuvipanda) [12:53:08] Analytics / Tech community metrics: Remove severity related graphs from bugzilla_response_time.html - https://bugzilla.wikimedia.org/69179#c1 (Quim Gil) NEW>PATC This should do it.
https://github.com/Bitergia/mediawiki-dashboard/pull/52 [13:04:58] Analytics / Tech community metrics: Gerrit metrics: details about review queues - https://bugzilla.wikimedia.org/58428#c3 (Quim Gil) ASSI>RESO/WON p:Normal>Lowest After using http://korma.wmflabs.org/browser/gerrit_review_queue.html on a daily basis, I think it already offers the informatio... [13:13:38] Analytics / Tech community metrics: Wrong data at "Update time for pending reviews waiting for reviewer in days" - https://bugzilla.wikimedia.org/68436#c6 (Quim Gil) Any idea of what is happening with DataValues? Also, projects like Parsoid, SmashPig, and gerrit.wikimedia.org_integration_docroot appea... [13:21:22] (PS1) Yuvipanda: Use Unicode, not String [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153785 [13:27:00] yoooo [13:27:03] (PS2) Yuvipanda: Use Unicode instead of String and force utf8 connections [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153785 [13:27:03] qchris: :) [13:27:08] ottomata: :-) [13:27:14] (CR) Yuvipanda: [C: 2] Use Unicode instead of String and force utf8 connections [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153785 (owner: Yuvipanda) [13:27:19] (Merged) jenkins-bot: Use Unicode instead of String and force utf8 connections [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153785 (owner: Yuvipanda) [13:27:20] mornin [13:27:27] mornin [13:27:32] do you think it would be better or worse to filter eventlogging by: [13:27:40] a. using grep before awk? [13:27:53] or [13:27:53] b. filtering in the awk scripts? [13:28:00] (some filtering is already done in the awk scripts) [13:28:36] I thought about using grep before awk. But then we'd have to teach grep about the columns (Referer column might get in the way. Unlikely ... but still) [13:28:50] ah, true [13:28:50] Yes, we can filter in the awk scripts. [13:29:00] But it turned out that analytics people do not like awk. [13:29:07] haha, you mean...dan? [13:29:09] So I'd prefer to keep awk usage to a minimum. [13:29:56] hello analytics people! i'm going to bring my sed & awk book for you! [13:29:57] to SF [13:30:15] ottomata: Not sure about him ... but! The fewer languages, the better. [13:30:25] And grep is pretty much everywhere already. [13:30:27] yeah, true [13:30:28] i'm fine with it [13:30:29] cool [13:30:50] awk really is pretty handy though, i'm not sure what out there is better than awk for awk's purpose :p [13:30:57] * qchris likes awk a lot! [13:31:16] Well there is xml + xslt :-) [13:31:40] hah [13:31:43] uh [13:32:07] your statement that that exists is true. [13:32:17] Please ... could someone go "Yes, totally qchris. xml + xslt just rocks!" ? [13:32:37] um, for stream parsing text into fields? [13:33:00] haha, qchris must really like oozie then...:) [13:33:00] Meh. No love for it these days :-) [13:33:12] ottomata: I do! [13:35:57] (PS10) Ottomata: Add Oozie bundle for Icinga monitoring of webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris) [13:36:31] qchris: i meant to submit that yesterday, but for some reason didn't! [13:36:56] :-) [13:54:28] ottomata: does Icinga distinguish between a service's name and its description? [13:54:31] So: [13:54:33] hive_partition_webrequest-bits [13:54:35] vs. [13:54:41] Raw webrequest bits data imported into HDFS and Hive. [13:54:41] ? [13:54:59] Because the send_nsca man page says that one should use the description [13:55:15] But we call it 'name' and use 'hive_partition_webrequest-bits' [13:55:22] hm, i believe so...
[13:55:23] https://gerrit.wikimedia.org/r/#/c/151963/3/manifests/role/analytics/refinery.pp [13:55:27] let me check the icinga confs [13:55:52] The 'puppet freshness' service has [13:56:02] uppercase P in puppet in the description name [13:56:13] and the submit_check_result also uses upper case P. [13:56:23] Other than that, I could not find passive checks. [13:56:55] i think you are right [13:57:01] the 'name' here is for puppet only [13:57:02] not for icinga [13:57:24] But that would mean a new patch set for change 151963 [13:57:30] I can merge the refinery part, right? [13:59:09] hm, i think i want to change the name of the argument then [13:59:14] to match send_nsca [13:59:19] Ok. [13:59:24] and change the argument description in oozie to not refer to puppet [13:59:47] No CR+2 then :-( [13:59:57] ha, s'ok, will get that in in a sec [14:12:12] (PS11) Ottomata: Add Oozie bundle for Icinga monitoring of webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris) [14:15:44] (PS1) Yuvipanda: Remove QueryRepository, use SQLAlchemy directly [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153794 [14:30:56] (PS1) Yuvipanda: Remove QueryRevisionRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153796 [14:30:58] (PS1) Yuvipanda: Remove QueryRunRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153797 [14:31:00] (PS1) Yuvipanda: Remove UserRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153798 [14:31:02] (PS1) Yuvipanda: Fix NPE when creating a new query [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153799 [14:31:04] (CR) jenkins-bot: [V: -1] Remove QueryRevisionRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153796 (owner: Yuvipanda) [14:31:07] (CR) jenkins-bot: [V: -1] Remove QueryRunRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153797 (owner: Yuvipanda) [14:31:11] (CR) jenkins-bot: [V: -1] Remove UserRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153798 (owner: Yuvipanda) [14:31:55] (CR) jenkins-bot: [V: -1] Fix NPE when creating a new query [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153799 (owner: Yuvipanda) [14:32:28] (PS2) Yuvipanda: Remove UserRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153798 [14:32:30] (PS2) Yuvipanda: Fix NPE when creating a new query [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153799 [14:32:32] (PS2) Yuvipanda: Remove QueryRevisionRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153796 [14:32:34] (PS2) Yuvipanda: Remove QueryRunRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153797 [14:37:53] Analytics / General/Unknown: Create a table in labs with replication lag data - https://bugzilla.wikimedia.org/69463#c2 (nuria) Please note that this table needs to exist on the labs side, not on the production side.
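A minimal sketch of the "timestamps in active tables" approach springle describes above, as the guard nuria wants before the nightly report runs. The host and database names, the choice of recentchanges, and the 24-hour threshold are illustrative assumptions, not the team's actual setup:

```python
# Sketch: estimate labsdb replication lag from the newest recentchanges
# timestamp, per springle's "timestamps in active tables" suggestion.
# Host, database, and threshold are illustrative placeholders.
import datetime
import os
import pymysql  # assumes a MySQL client library is available

MAX_ACCEPTABLE_LAG = datetime.timedelta(hours=24)

conn = pymysql.connect(host='enwiki.labsdb', db='enwiki_p',
                       read_default_file=os.path.expanduser('~/.my.cnf'))
try:
    with conn.cursor() as cur:
        # rc_timestamp is stored in MediaWiki's yyyymmddhhmmss format
        cur.execute("SELECT MAX(rc_timestamp) FROM recentchanges")
        raw = cur.fetchone()[0]
        ts = raw.decode() if isinstance(raw, bytes) else raw
        latest = datetime.datetime.strptime(ts, '%Y%m%d%H%M%S')
finally:
    conn.close()

lag = datetime.datetime.utcnow() - latest
if lag > MAX_ACCEPTABLE_LAG:
    print("lag %s exceeds threshold; skipping current-day reports" % lag)
```

Once the lag table requested in bug 69463 exists in information_schema_p, the same guard would query that table instead of sampling recentchanges.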
[14:55:13] (CR) Yuvipanda: [C: 2] Remove QueryRepository, use SQLAlchemy directly [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153794 (owner: Yuvipanda) [14:55:16] (CR) Yuvipanda: [C: 2] Remove QueryRevisionRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153796 (owner: Yuvipanda) [14:55:19] (Merged) jenkins-bot: Remove QueryRepository, use SQLAlchemy directly [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153794 (owner: Yuvipanda) [14:55:21] (CR) Yuvipanda: [C: 2] Remove QueryRunRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153797 (owner: Yuvipanda) [14:55:23] (Merged) jenkins-bot: Remove QueryRevisionRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153796 (owner: Yuvipanda) [14:55:25] (CR) Yuvipanda: [C: 2] Remove UserRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153798 (owner: Yuvipanda) [14:55:29] (Merged) jenkins-bot: Remove QueryRunRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153797 (owner: Yuvipanda) [14:55:31] (CR) Yuvipanda: [C: 2] Fix NPE when creating a new query [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153799 (owner: Yuvipanda) [14:55:33] (Merged) jenkins-bot: Remove UserRepository [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153798 (owner: Yuvipanda) [14:55:38] (Merged) jenkins-bot: Fix NPE when creating a new query [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153799 (owner: Yuvipanda) [15:06:28] hola springle [15:43:20] (PS2) Milimetric: [WIP] Ensure wikimetrics session is always closed [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) [15:43:40] (PS2) Milimetric: Fix slow Rolling Active Editor metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/149482 (https://bugzilla.wikimedia.org/68596) [15:48:00] (PS3) Milimetric: Fix slow Rolling Active Editor metric [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/149482 (https://bugzilla.wikimedia.org/68596) [15:49:13] (CR) Milimetric: Fix slow Rolling Active Editor metric (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/149482 (https://bugzilla.wikimedia.org/68596) (owner: Milimetric) [15:51:45] YuviPanda: awesome you're talking to Sean about making event logging public, heartfelt +2 [16:15:02] (PS3) Milimetric: [WIP] Ensure wikimetrics session is always closed [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) [16:16:26] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c10 (nuria) Need to look at IP ranges as I looked at languages and wikipedias for geographic commonality and that might not be the best. [16:19:46] nuria: I'm working on merging your "removing the chain" patch [16:19:52] k [16:19:57] great thank you [16:20:09] let me know if there is something i should do [16:20:12] I'll upload a new patchset, it's a million times easier than explaining - but feel free to revert, just a sec [16:20:35] (PS8) Milimetric: Removing usage of celery chains from report scheduling [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/150475 (https://bugzilla.wikimedia.org/68840) [16:20:46] nuria: ^ you can take a look [16:21:12] https://gerrit.wikimedia.org/r/#/c/150475/7..8/wikimetrics/schedules/daily.py [16:22:02] milimetric: did you run tests?
cause w/o the signature tests were failing [16:22:21] yeah, the parallel_reports test ran fine [16:23:06] it must've been something else, that signature thing is just another syntax to use, and it's usually used as shorthand when you're passing tasks around [16:23:14] but let me know if tasks don't run for you [16:23:32] can you run manual tests? [16:23:49] yeah, I ran it, I was saying above [16:24:02] I'm running it again, just in case it's nondeterministic :) [16:28:21] milimetric, YuviPanda, are you subscribed to wikidata mailing list? http://lists.wikimedia.org/pipermail/wikidata-l/2014-August/004293.html [16:29:04] i think we should allocate either data:graph: or graph: namespace on commons :) [16:36:12] nuria: btw, the tests run fine still, do they not run for you? [16:37:48] milemtric, let me run them again with patches 7 and 8 [16:37:55] sorry milimetric [16:39:30] they do run milimetric so i guess were good [16:39:45] *we're [16:39:49] k, cool, i'll rebase and merge [16:39:57] (PS9) Milimetric: Removing usage of celery chains from report scheduling [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/150475 (https://bugzilla.wikimedia.org/68840) [16:40:06] (CR) Milimetric: [C: 2] Removing usage of celery chains from report scheduling [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/150475 (https://bugzilla.wikimedia.org/68840) (owner: Milimetric) [16:41:21] woa.... that merged into my session thing WITHOUT ISSUES. What?! :D [16:42:21] yurikR: I'm not subscribed to anything right now, I feel hugely overwhelmed with email [16:42:41] but I agree, a "graph:" namespace on commons would be great [16:43:48] milimetric, hehe, i hear you, agree, this way it will contain lots of embeddable graphs ( ) [16:44:24] or [16:51:41] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c11 (nuria) I take my prior comment back, this looks like a proxy issue, not a client issue. Data below for requests that match "orghttp" in the month of August thus far in zero, mobile and sample... [16:58:47] qchris: oops, meant to ping you here [16:58:56] Ok. That channel it is. [16:59:28] ok so ready? [16:59:28] https://gerrit.wikimedia.org/r/#/c/152050/ [17:00:03] I've been in meetings/away since you uploaded that. Let me have a look again. [17:00:47] k [17:00:57] mostly just change variable name and documentation [17:04:35] (CR) QChris: [C: -1] Add Oozie bundle for Icinga monitoring of webrequest datasets (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris) [17:05:53] (CR) QChris: Add Oozie bundle for Icinga monitoring of webrequest datasets (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris) [17:07:53] (PS12) Ottomata: Add Oozie bundle for Icinga monitoring of webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris) [17:08:32] (CR) QChris: [C: 2 V: 2] Add Oozie bundle for Icinga monitoring of webrequest datasets [analytics/refinery] - https://gerrit.wikimedia.org/r/152050 (owner: QChris) [17:08:53] Is there anything I can do on the puppet part of it too? [17:11:07] not sure! [17:11:19] Meh. I could only nag on tabs vs. spaces. [17:11:24] The rest I do not understand. [17:11:59] oo that was copy pasted, looks like that file is not consistent anyway :( [17:12:14] :-D [17:12:22] qchris: interested in an explanation? [17:12:47] Sure. Up to now, I think I only have a rough clue. [17:12:54] I'd love to understand more of it.
[17:13:09] so, the monitor_service does some fancy puppet magic to get icinga config files on the icinga host [17:13:25] the config files set up a nagios_service with the provided parameters there [17:13:27] So whenever the freshness is no longer met, the analytics_cluster_data_import-FAIL is run. And oozie updates the freshness, right? [17:13:53] yes, that's right [17:13:56] you got it :) [17:14:15] Oh. Ok. :-) [17:14:44] i'm going to get the oozie part running first [17:14:51] Nope. [17:14:58] We need send_nsca on the data nodes. [17:15:05] That is in the puppet part :-) [17:15:32] (At least it screamed at me when I tried) [17:15:46] qchris: i realized that for location i had checked the language+wiki before but not the IP (i know, retarded) so yes, there is commonality of IPs [17:16:12] oh, yes, hm, send_nsca must be there, true [17:16:16] the icinga stuff doesn't have to be set up though [17:16:19] ok, puppet first [17:16:56] oh, I am not yet including that ::check class anywhere [17:16:56] cool [17:17:04] so we can just merge this and get send_nsca installed [17:17:19] Sounds great :-) [17:18:19] nuria: If you're looking at the /zero/ tsvs, common IPs are somewhat expected, as the traffic (mostly) comes through carrier IPs, or Opera IPs. [17:19:15] nuria: If you want to disambiguate, you can use the X-Forwarded-For header [17:19:18] qchris: but the "overall common" and the "bad" common do not match [17:20:08] ok. [17:26:11] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c12 (nuria) IPs with most issues in zero are not the most used IPs so, again, this points to a proxy issue. [17:29:11] Analytics / Wikimetrics: Wikimetrics can't run a lot of recurrent reports at the same time - https://bugzilla.wikimedia.org/68840#c6 (nuria) We removed chains to simplify and be able to better test our code, the biggest gain on performance however comes from the migration of labs db hosts to maria db.... [18:07:34] yurikR: that's super interesting! [18:07:45] yurikR: I want to get it deployed on meta as well, so the research graphs can use this [18:08:19] YuviPanda|groggy, i don't mean it will only be available on commons - rather this should be the storage spot [18:08:30] yurikR: right. that'll be doubly awesome [18:08:50] but it will not preclude it from running (and storing) data on other wikis [18:08:55] yurikR: although I'm wary of storing laaarge amounts of data with ContentHandler [18:09:12] YuviPanda|groggy, want to +2 a few things? deploying it to prod now (not enabling yet) [18:09:24] w00t sure [18:09:25] link me [18:09:35] YuviPanda|groggy, [18:09:36] https://gerrit.wikimedia.org/r/#/c/153840/ [18:10:12] yurikR: done [18:10:24] YuviPanda|groggy, working on wmf16 patch, sec [18:10:25] yurikR: that just enables branching, right [18:10:35] ?? [18:11:00] the patch I just merged :) [18:11:12] that auto-adds extension to the new branches generated [18:11:21] which means it won't touch 16 [18:11:24] doing it manually [18:11:25] right [18:11:30] but doesn't check them out in prod or anything [18:11:52] yurikR: by deploying it you mean only on zerowiki, right?
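For reference on the passive-check flow ottomata and qchris walk through at 17:13–17:17: an Oozie-launched step refreshes a check's freshness by piping a tab-separated host/service/status/message line into send_nsca. A rough sketch under that assumption; the NSCA host, config path, and monitored host name are placeholders, and per the 13:54–13:59 exchange it is the Icinga service description (not the puppet-only name) that identifies the check:

```python
# Sketch: submit a passive Icinga/Nagios check result through send_nsca,
# roughly what the Oozie monitoring workflow does when a dataset is fresh.
# NSCA host, config path, and host/service strings are placeholders.
import subprocess

def submit_passive_check(monitored_host, service_description, status, message,
                         nsca_host='icinga.example.org'):
    # status codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    line = '\t'.join([monitored_host, service_description,
                      str(status), message]) + '\n'
    proc = subprocess.Popen(
        ['/usr/sbin/send_nsca', '-H', nsca_host, '-c', '/etc/send_nsca.cfg'],
        stdin=subprocess.PIPE)
    proc.communicate(line.encode('utf-8'))
    return proc.returncode

# e.g. refresh the bits import check, keyed on its description:
submit_passive_check('analytics-worker.example',
                     'Raw webrequest bits data imported into HDFS and Hive.',
                     0, 'dataset partition present')
```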
[18:12:27] YuviPanda|groggy, not even there - it will be on the servers, but not enabled anywhere [18:12:40] makes it much easier to enable it via configs first on beta, etc [18:12:45] ah cool [18:13:18] although maybe i should enable it on zerowiki since that's where i will be playing with it the most at first [18:13:22] will see if i have enough time [18:14:24] yurikR: \o/ cool [18:18:18] (PS1) Yurik: Cleaned up log parsing and filtering [analytics/zero-sms] - https://gerrit.wikimedia.org/r/153845 [18:18:39] (CR) Yurik: [C: 2 V: 2] Cleaned up log parsing and filtering [analytics/zero-sms] - https://gerrit.wikimedia.org/r/153845 (owner: Yurik) [18:47:22] phuedx: around for a bit of idea bouncing? [19:39:26] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c13 (Yuri Astrakhan) Nuria, are you saying one of our proxies is causing this? Or is it some common proxy software that many carriers are using that sets incorrect HOST value when forwarding req... [19:41:56] Analytics / General/Unknown: zero.log contains duplicate host in logs - https://bugzilla.wikimedia.org/69371#c14 (nuria) Well, neither. By looking at the data it looks to be caused by a proxy, but I do not think it is "common" software, as the percentage of data affected seems pretty small. [19:43:49] ottomata: would you be so kind to look at this change: https://gerrit.wikimedia.org/r/#/c/153390/ [19:44:22] corresponding wikimetrics change has been merged: https://gerrit.wikimedia.org/r/#/c/150475/ [19:44:40] (change has been tested on dev) [19:45:55] hm, ok, nuria, this seems like the type of thing you'd want to parameterize and change in role class, no? [19:46:15] by lowering concurrency in the module, you lower the default for all users of the module, independent of whatever environment it is [19:46:36] is MAX_PARALLEL_PER_RUN no longer a proper config? [19:46:43] ottomata: that is the intention as db connection pool is limited [19:47:11] no, we no longer use MAX_PARALLEL_PER_RUN [19:47:18] hmm, ok [19:47:23] now, the setting could be parameterized regardless [19:47:30] i can do that [19:47:37] ottomata, epic sql question: I select ... into outfile, without specifying the directory, and I expected it to end up in stat1003's /tmp. can you point? [19:47:38] i think it is, right? [19:47:56] naw, into outfile operates on the mysql server [19:48:03] unfortunately [19:48:18] so, you won't have access to it that way [19:48:41] I see. so, I have to change my home permission, and write it there? [19:48:41] nuria: sorry, just merged! what more needs parameterized? [19:48:53] naw, you can't use into outfile unless you have access to the mysql server node [19:48:53] ottomata: np [19:48:54] which you don't [19:49:03] (right? double checking...) [19:49:25] ottomata: change works fine, i was just taking your suggestion to make it better [19:49:31] The SELECT ... INTO OUTFILE form (http://dev.mysql.com/doc/refman/5.0/en/select-into.html) of SELECT (http://dev.mysql.com/doc/refman/5.0/en/select.html) writes the selected rows to a file.
The file is created on the server host [19:49:52] nuria, i think you are fine with this change as is, if you intend to change the default everywhere anyway [19:49:58] you need this to remove the MAX_PARALLEL thing anyway [19:49:59] so it's good [19:50:28] leila: read http://dev.mysql.com/doc/refman/5.0/en/select-into.html [19:50:33] scroll down to the part where it says [19:50:35] statement is intended primarily to let you very quickly dump a table to a text file on the server machine [19:50:38] ok ottomata, will send you corresponding module change for puppet [19:50:40] k [19:50:54] "However, if the MySQL client software is installed on the remote machine, you can instead use a client command such as mysql -e "SELECT ..." > file_name to generate the file on the client host." [19:51:32] gotcha. trying. thanks. [20:03:46] ottomata, mysql -e "select *" > file_name results in ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) [20:04:05] is it related to mysql-server being installed or not? [20:04:55] nope, you need to specify the hostname you are trying to connect to [20:05:10] mysql -hs1-analytics... [20:05:12] or whatever it is [21:12:48] (PS1) Yuvipanda: Add user page for each user [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153942 [21:12:53] (CR) jenkins-bot: [V: -1] Add user page for each user [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153942 (owner: Yuvipanda) [21:13:49] (CR) Yuvipanda: [C: 2] Add user page for each user [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153942 (owner: Yuvipanda) [21:13:53] (CR) jenkins-bot: [V: -1] Add user page for each user [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153942 (owner: Yuvipanda) [21:13:57] (CR) Yuvipanda: [C: -2] Add user page for each user [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153942 (owner: Yuvipanda) [21:14:44] (PS2) Yuvipanda: Add user page for each user [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153942 [21:15:25] (CR) Yuvipanda: [C: 2] Add user page for each user [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153942 (owner: Yuvipanda) [21:15:33] (Merged) jenkins-bot: Add user page for each user [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153942 (owner: Yuvipanda) [21:33:56] Analytics / Tech community metrics: Wrong data at "Update time for pending reviews waiting for reviewer in days" - https://bugzilla.wikimedia.org/68436#c7 (Jeroen De Dauw) DataValues still has a copy on Gerrit?!
This stuff was moved to GitHub ages ago https://github.com/DataValues/ [21:35:49] (PS1) Yuvipanda: Add a link to user profile in drop down [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153947 [21:36:20] (PS1) Yuvipanda: Rename 'Query Runs' to Recent Queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153948 [21:37:07] (CR) Yuvipanda: [C: 2] Add a link to user profile in drop down [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153947 (owner: Yuvipanda) [21:37:12] (Merged) jenkins-bot: Add a link to user profile in drop down [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153947 (owner: Yuvipanda) [21:37:14] (CR) Yuvipanda: [C: 2] Rename 'Query Runs' to Recent Queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153948 (owner: Yuvipanda) [21:37:19] (Merged) jenkins-bot: Rename 'Query Runs' to Recent Queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153948 (owner: Yuvipanda) [21:39:55] (PS1) Yuvipanda: Fix bug where you seem to be the person whose profile you're seeing [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153949 [21:40:07] (CR) Yuvipanda: [C: 2] Fix bug where you seem to be the person whose profile you're seeing [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153949 (owner: Yuvipanda) [21:40:13] (Merged) jenkins-bot: Fix bug where you seem to be the person whose profile you're seeing [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153949 (owner: Yuvipanda) [21:43:06] milimetric, hey [21:43:19] hi [21:43:51] so multimedia doesn't show in the zero logs then? [21:45:03] doesn't seem so [21:45:08] (PS1) Yuvipanda: Order queries in profile by recentness of creation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153950 [21:45:14] i think it's because varnish doesn't do the analysis on it [21:46:09] (CR) Yuvipanda: [C: 2] Order queries in profile by recentness of creation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153950 (owner: Yuvipanda) [21:46:14] (Merged) jenkins-bot: Order queries in profile by recentness of creation [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153950 (owner: Yuvipanda) [21:46:17] so much self merging [21:46:50] hm, so yurikR this change would affect only zero logs or it would be available everywhere, but the zero partners are interested in the figures? [21:47:06] *affect only zero requests, rather [21:47:09] this change? [21:47:18] as in, the reduced file sizes [21:47:32] it's already public, we have been using it for a month or two [21:47:41] first on smaller partners [21:47:42] right, I know [21:47:50] we basically change HTML to request smaller images [21:47:54] nothing else [21:48:02] but only when it's hit from zero, right? [21:48:18] only when it's a hit from a specific subset of zero partners [21:48:36] we are rapidly increasing that set [21:48:52] unified design is using small images by default [21:50:41] cool, so yurikR I assume you know about the cache log format: https://wikitech.wikimedia.org/wiki/Cache_log_format [21:50:46] and that the reply size is field 7 there [21:51:07] so basically, we have a hairy problem then [21:51:13] a. get zero request with images [21:51:23] b. find images that would be requested [21:51:29] c. find sizes of those [21:51:38] d. find size if they weren't requested through zero [21:51:41] something like that?
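A rough sketch of the per-carrier byte accounting this exchange is circling: with the Cache_log_format linked above, reply size is field 7, and if zero traffic were tagged in X-Analytics (the "group-by zero" idea yurikR raises next), the aggregation collapses to a line scan. The space-delimited split and the X-Analytics field position are assumptions to adjust against the real log layout:

```python
# Sketch: sum reply sizes (field 7 of the cache log format) per
# X-Analytics "zero=" carrier tag, reading log lines from stdin.
# Assumes space-separated fields and X-Analytics as the last field.
import sys
from collections import defaultdict

bytes_by_carrier = defaultdict(int)
requests_by_carrier = defaultdict(int)

for line in sys.stdin:
    fields = line.rstrip('\n').split(' ')
    if len(fields) < 7:
        continue
    try:
        reply_size = int(fields[6])   # field 7: reply size in bytes
    except ValueError:
        continue                      # e.g. '-' when the size is missing
    x_analytics = fields[-1]          # assumed position of X-Analytics
    for kv in x_analytics.split(';'):
        if kv.startswith('zero='):
            carrier = kv[len('zero='):]
            bytes_by_carrier[carrier] += reply_size
            requests_by_carrier[carrier] += 1

for carrier in sorted(bytes_by_carrier):
    print('%s\t%d bytes\t%d requests' %
          (carrier, bytes_by_carrier[carrier], requests_by_carrier[carrier]))
```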
[21:52:03] milimetric, yes hairy :) Ideally we should be marking all traffic, not just multimedia, with X-Analytics tag [21:52:29] in which case we simply group-by zero [21:52:32] right [21:52:36] and divide by image count [21:53:25] otherwise we would have to do a ton of these weird manipulations :) [21:53:38] yep, I don't see a way around it right now [21:54:02] you mean to analyze backwards or to do it at all? [21:54:03] what ori was saying - to just curl and get the size, that would still mean you need to know what images are being requested from zero [21:54:28] i don't see a simple hack is what I mean [21:54:41] yes, i did that analysis once - tons of work, cannot easily repeat the test, and practically useless [21:54:49] yeah, it sucks [21:55:04] let me get bblack [21:55:04] yurikR: let me brainbounce with the europe folks tomorrow morning and we'll see if we can think of something better? [21:55:30] i really think we should start marking all traffic, not just text [21:56:00] ops will be happy [21:56:14] milimetric, another major issue, much more pressing [21:56:19] what do i do with tons of logs [21:56:28] where should i put them on stat1002 [21:56:39] those 4GB you were talking about? [21:57:09] yurikR: ^ [21:57:14] yep [21:57:24] and a small python script to go with it [21:57:39] btw, that python script needs a number of libs, e.g. S3 access [21:57:45] 4GB is really small compared to the stuff on there [21:57:57] location? [21:57:57] so I wouldn't worry too much, /a/ is the usual spot [21:58:01] cron? [21:58:08] root for me [21:58:09] :) [21:58:11] cron would need to be puppetized, do you know where? [21:58:29] is it in a separate puppet repo? [21:58:32] from the main prod? [21:58:43] no, operations/puppet, lemme point you to the spot though, or at least an example [21:59:42] yurikR: https://git.wikimedia.org/blob/operations%2Fpuppet/a64123c2abaa4934c131b2a760bf8619e5d5ea6e/manifests%2Fmisc%2Fstatistics.pp#L246 [21:59:50] but wait, yurikR, what exactly are you doing there? [22:00:06] doing where? :) [22:00:11] i download stats from S3 [22:00:17] like, what's the cron doing, what stats are these, etc [22:00:19] logs from our SMS partner [22:00:22] sorry if you already talked this over [22:00:25] np [22:00:30] and parse them [22:00:59] would labs not be ok to do this? [22:01:10] need high sec [22:01:13] gotcha [22:01:28] well, a lot of people have access to stat1002 though, but they all are NDA [22:01:44] 1) download new files, 2) combine them into one 3) sort/uniq 4) process into stats 5) generate dashboards 6) profit [22:02:05] though, not sure how "high" the sec you need, if our NDA is not enough, you can set permissions restrictively [22:02:20] that's fine, our partner salt+hash phone numbers [22:02:42] and search strings are no different from regular logs [22:02:54] can you reach S3 from stat1002? [22:03:39] hm, i can't ping google, so probably not [22:05:58] milimetric, stat1002 is bolted down from external access? wow [22:06:05] kinda makes sense, but still [22:07:04] yeah... hm... [22:07:20] yurikR: as usual, I think this is a new usecase [22:07:27] these machines are typically locked down like this [22:07:48] DarTar: when you folks have to analyze external data, how do you get it on the stats machines in prod? [22:08:33] hey milimetric: give me a few mins, brb [22:08:35] np [22:09:19] DarTar: yurikR was asking, I have to bounce in a few minutes, he's gotta grab private data and analyze it somewhere. 
So if you guys have a way of doing that, just let him know [22:10:06] np, chat soon :) and milimetric, i'm actively pushing the graph ext out, all the depl patches are ready, pending sec review [22:10:31] so we should start making it pretty [22:10:35] by default [22:10:37] yurikR: worst case, you can use labs and encrypt everything with a secret key. Should be fairly hard to crack that [22:11:04] ok yurikR, do you have examples of graphs that you're going to write on top of it? [22:11:31] maybe make a few pages and point me to them, I'll work on making them as pretty as possible without fancy interaction for now [22:13:26] the way one would do "hover" in vega world is pretty manual: paint a transparent layer over the whole graph, then grab x/y coords on mouseover and do the right thing. For now the simplest way would be to extend the vega spec and have some declarative hover setup that we interpret [22:14:04] milimetric, i'm planning to do a limn dashboard replacement as the very first step [22:14:15] since vega is much easier to host on zerowiki inside a page [22:15:06] milimetric, in reality, the simpler, the better. That's what i kinda liked about limn from the dashboard perspective - throw some data at it, and it looks pretty ) [22:15:16] yurikR: sounds like we'd need a template for the vega json then [22:16:05] when it's deployed, or up somewhere to test, let me know and we'll work on the first graph together [22:16:28] and you can teach me how to make templates and I can fiddle with it until it's prettier [22:19:59] milimetric, are you talking about CSS for it? [22:20:10] because you won't be able to change JS as part of the data [22:20:28] right, no JS, just the vega definition [22:21:19] hey yurikR, milimetric [22:21:37] hi DarTar [22:21:48] milimetric, we could do js as part of the extension [22:22:07] yurikR: mind summarizing the thread? I had to disconnect and missed parts of it [22:22:20] yeah, yurikR, sure, I mean, making the graph interactive will need to be done on the extension [22:22:20] which part - we covered 3 issues :) [22:22:28] making it prettier can be done solely with the Vega definition for now [22:22:39] milimetric, do you have vagrant? [22:22:46] yurikR > when you folks have to analyze external data, how do you get it on the stats machines in prod? [22:23:20] yurikR: yes, I think it would help to have a public hosted wiki of this so we can collaborate [22:23:31] I put one up in labs but porno spammers... [22:23:41] DarTar, right - our external partner stores logs on amazon cloud. I have a python script to pull it and parse it and call it George... I meant generate dashboards [22:24:19] milimetric, that's easy on labs. I had an instance somewhere [22:24:25] will revive it [22:24:45] will email about it later tonight [22:25:00] yurikR: so the question is where to analyze that data in prod? [22:25:07] and store [22:25:17] currently it's on my laptop ) [22:25:25] not good really [22:25:29] how about import it to stat1003 [22:25:32] crunch it there [22:25:51] and if the aggregates can be shared publicly have them rsync'ed to stat1001 [22:25:52] DarTar, does stat1003 have 1) access to outside world 2) python 3) backup?
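A sketch of steps 1–3 of the pipeline yurikR lays out at 22:01 above (download new files from S3, combine them, sort/uniq), assuming the boto S3 library he mentions needing; the bucket name, key prefix, and /a/ paths are placeholders:

```python
# Sketch: pull new partner log files from S3, then combine and dedupe
# them, per steps 1-3 described at 22:01. Bucket, prefix, and local
# paths are placeholders; assumes boto is installed and credentialed.
import os
import subprocess
import boto

conn = boto.connect_s3()  # reads AWS credentials from the environment
bucket = conn.get_bucket('partner-sms-logs')   # placeholder bucket name
local_dir = '/a/zero-sms/incoming'

# 1) download files we have not fetched yet
for key in bucket.list(prefix='logs/'):
    dest = os.path.join(local_dir, os.path.basename(key.name))
    if not os.path.exists(dest):
        key.get_contents_to_filename(dest)

# 2) + 3) combine into one file and sort/uniq it
with open('/a/zero-sms/combined.log', 'w') as out:
    subprocess.check_call('cat %s/* | sort | uniq' % local_dir,
                          shell=True, stdout=out)
```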
[22:26:17] aggregates should have limited sharability - they are public but hidden [22:26:20] sorry guys i gotta bounce, good luck [22:26:30] thx milimetric will ping you later [22:26:40] oh I thought you wanted just to upload it somewhere for the purpose of analysis [22:27:01] so, yes: stat1003 has python [22:27:03] DarTar, that python script does everything including the analysis [22:27:12] and downloading (with the help of s3 lib) [22:28:04] and yes you should be able to fetch data from wherever it lives [22:28:53] in terms of backup, I keep backups of my own code and data or have it in a repo so I don’t know the level of backup support you would need [22:29:18] as to where to host private data, that’s a trickier one [22:29:38] DarTar, host aggregates or logs? [22:29:39] qchris set up a few places in prod that are password protected [22:30:12] I imagine you would like to give selected people access to dashboards or processed/aggregate data, right? [22:30:29] i don't need a password-protected area atm, because soon we will store aggregates on zerowiki [22:30:44] and i will have full control over that [22:31:06] re backups - does stat1003 data get backed up? [22:32:11] public data gets rsynced to stat1001 and there’s a few directories that are also replicated, you should ask the devs about what gets backed up where systematically [22:32:46] thx DarTar, might work. who manages it? [22:33:27] ottomata is the person you want to talk to [22:33:56] yurikR: the rsyncing DarTar's talking about is the piece of puppet i pointed you at above [22:35:22] oh, excellent. I will take a look at that puppet in depth. So it seems stat1003 is a better alternative to stat1002 as it has external access [22:35:46] just need to figure out how to put python libs that i need there, test it and run :) [22:35:56] and do the cron magic (haven't done that yet) [22:39:46] (PS1) Yuvipanda: [WIP] Let users star and unstar queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153963 [22:39:51] (CR) jenkins-bot: [V: -1] [WIP] Let users star and unstar queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153963 (owner: Yuvipanda) [22:41:47] (PS2) Yuvipanda: [WIP] Let users star and unstar queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153963 [22:41:52] (CR) jenkins-bot: [V: -1] [WIP] Let users star and unstar queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/153963 (owner: Yuvipanda) [23:07:56] Analytics / Wikimetrics: Story: Community has documentation on chosen dashboard architecture and alternatives - https://bugzilla.wikimedia.org/67125 (Kevin Leduc) p:High>Low [23:10:26] Analytics / Wikimetrics: Story: EEVS user does not see reports for projects without databases - https://bugzilla.wikimedia.org/69297 (Kevin Leduc) p:Normal>Highes [23:13:58] Analytics / Wikimetrics: Story: EEVSuser has agregate metrics - https://bugzilla.wikimedia.org/68193 (Kevin Leduc) p:Normal>Low
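Closing the loop on leila's SELECT ... INTO OUTFILE question (19:47–20:05 above): since INTO OUTFILE writes on the database server, the client-side equivalent is mysql -e with an explicit -h host, redirected to a local file. A sketch; the host name and defaults file are placeholders for whatever the analytics slave actually uses:

```python
# Sketch: dump query results to a local file on the stat host via the
# mysql client, per the 19:47-20:05 exchange. Host and defaults file
# are placeholders. Batch-mode output is tab-separated with a header.
import subprocess

query = "SELECT page_id, page_title FROM enwiki.page LIMIT 10"
with open('/tmp/dump.tsv', 'w') as out:
    subprocess.check_call(
        ['mysql', '--defaults-file=/etc/mysql/conf.d/research-client.cnf',
         '-h', 'analytics-store.eqiad.wmnet', '-e', query],
        stdout=out)
```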