[00:27:00] milimetric, YuviPanda|brb http://graphtest.wmflabs.org/wiki/Main_Page [00:27:18] yay, cool [00:27:31] so yurikR, what's the first graph you want to setup [00:27:40] like, set of timeseries I mean [00:29:22] milimetric, something like that, so i was looking at stocks one http://trifacta.github.io/vega/editor/index.html?spec=stocks [00:29:22] very cool. I just saw http://graphtest.wmflabs.org/wiki/Graph:Test and clicked edit. [00:29:38] kevinator, yep, that's only the very basic uses [00:29:39] :) [00:30:16] basically i would love to have something like http://trifacta.github.io/vega/editor/index.html?spec=stocks [00:30:42] (which is the most similar to our current LIMN dashboard) [00:31:16] yurikR: so we gotta solve the data problem [00:31:20] as for the data, we can store the data as CSV/TSV/JSON directly in the wiki [00:31:32] as separate pages (which is one way) [00:31:34] er... how are we going to update it then, manually? [00:31:39] api :) [00:31:56] good point api man :) [00:31:58] it takes a few lines of requests-based python script to upload a wiki [00:32:04] what about specifying the URL to the data on the wiki? [00:32:16] that would violate origin policy kevinator [00:32:21] and we could get around it with CORS [00:32:21] which btw is much easier than dealing with rsyncing and private keys [00:32:29] exactly my thoughts :) [00:32:49] yurikR: CORS may be easier in this case though, no? [00:33:06] milimetric, easier than what? [00:33:13] it all depends on where the data lives [00:33:18] than changing how the data is generated [00:33:23] or wait, were you going to do that anyway... [00:33:39] like, is this data going to live on the limn server still? [00:33:41] it can be on the same wiki, or on a separate wiki that is part of our CORS domain [00:33:56] oh are you computing this separately and storing it somewhere else? [00:34:10] i was planning to put all data on the same server [00:34:22] but we sohuld also allow cases like enwiki pulling data from commons [00:34:44] right, I think that'd be fine... lemme try external URLs with CORS I know works [00:35:04] milimetric, not sure if wmflabs is part of cors [00:35:16] it's not yurikR [00:44:21] milimetric, http://graphtest.wmflabs.org/wiki/Graph:Stocks [00:51:15] yurikR: sorry I'm failing very sadly to do anything useful, I've gotta get back to banging my head against this multithreading bug thing [00:51:36] np, just wanted to share it [00:51:37] but I recommend figuring out where you'll put the data [00:51:45] milimetric, take a look at the graph [00:51:50] its already stored separately [00:51:56] http://graphtest.wmflabs.org/wiki/Graph:Stocks [00:52:08] the data is actually in http://graphtest.wmflabs.org/wiki/Data:Csv:Stocks [00:52:15] this is the simplest example [00:52:31] i did, yeah, i mean, your format will be different though [00:52:31] i might implement an api module to supply the data [00:52:45] i was trying to do it with the wikimetrics format but i'm having json parsing errors [00:52:50] true, but i don't think it will be substantially different [00:53:02] a few columns that will need to be grouped by [00:53:08] yep, exactly, which is why i don't think my separate example will be very useful [00:53:20] which separate example? 
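A minimal sketch of the "few lines of requests-based python script to upload a wiki" page mentioned above, assuming the graphtest wiki's standard api.php endpoint, a recent MediaWiki API, and a Data:Csv: page title; a real run would also need to authenticate (login or OAuth) unless the wiki allows anonymous edits.

```python
# Sketch of the "few lines of requests-based python script" idea: write a CSV
# payload to a Data:Csv: page through the MediaWiki edit API. The API URL and
# page title are assumptions; authentication is left out.
import requests

API = "http://graphtest.wmflabs.org/w/api.php"  # assumed api.php location

def upload_csv(page_title, csv_text, session=None):
    s = session or requests.Session()
    # 1. fetch an edit (CSRF) token
    token = s.get(API, params={
        "action": "query", "meta": "tokens", "type": "csrf", "format": "json",
    }).json()["query"]["tokens"]["csrftoken"]
    # 2. save the page
    r = s.post(API, data={
        "action": "edit", "title": page_title, "text": csv_text,
        "token": token, "format": "json",
    })
    r.raise_for_status()
    return r.json()

# upload_csv("Data:Csv:Stocks", open("stocks.csv").read())
```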
[00:53:31] i was trying to render the wikimetrics format with it [00:53:36] but i'm giving up :) [00:53:45] don't worry about data formats - that should be trivial [00:53:58] i will eb able to generate any format needed [00:53:58] :) [00:54:05] CSV, TSV, json [00:54:07] whichever [00:54:08] yep, no that's totally fine [00:55:13] what really troubles me is how to implement styling :( [00:55:18] like making it cute [00:55:32] well, what do you like from limn [00:55:33] if you can make that stock graph look good, that would be all i need ) [00:55:36] just the interactivity or more? [00:55:45] grids? [00:55:52] fonts? [00:56:09] interactivity would be ultimate, but in reality, just having the graph be more lively - ilke showing current point vaule [00:56:20] i think that instantly grants most of the value [00:56:36] current point, you mean like the last in the series? [00:56:37] i guess that is interactivity :) [00:56:44] oh you mean what you hover over [00:56:47] yes [00:56:55] yeah, that's a couple days of work sadly [00:56:57] and show its value somewhere on the side [00:57:19] k, that's ok, that's not too much to ask [00:57:28] i'll do that as soon as I can [00:57:33] I've gotta do it next month anyway [00:57:34] no worries :) [00:57:39] might as well do it sooner :) [00:57:48] thx :) [00:58:20] ideally, it should have the "feel" of limn - a graph which is well layed out - good colors, nice bold strokes, etc [01:00:01] its kinda funny - i can't really figure out what is the "good style" that we all like in limn. I just like it from the design perspective [01:00:11] i shall play with the style every chance I get and try to do the interactivity every weekend from now on [01:00:20] yeah, david is a good designer [01:00:36] milimetric, no no no, down time is more important! [01:00:58] says the man who just took on 100 man years worth of work [01:01:00] :P [01:01:01] who is david, maybe he will want to poke at it before we put it into production :) [01:01:26] meh, i am off next week for the BM [01:05:15] not next - the one after [01:13:23] yurikR: http://graphtest.wmflabs.org/wiki/Graph:Stocks [01:13:35] david is david schoonover, and he's not been around for a while [01:13:41] milimetric, uuu!!! :) [01:13:54] that's as far as we can get with just style I think [01:13:58] so now, interactivity [01:14:06] but you see no CSS was needed, it's all vega [01:14:14] already looks awesome!!! [01:14:20] and I don't know how mediawiki works at all but I assume we can make a template of some sort with this boilerplate? [01:14:26] which kinda makes me worried :) [01:14:33] seriously right? [01:14:40] I'm like the least wiki employee [01:14:53] one of these days I fell like I'll start editing and never stop [01:15:02] because if so much can be done with vega, this might be a big security problem [01:15:10] not at all yurikR [01:15:18] it even has an expression parser [01:15:22] exactly [01:15:24] and will just choke if anything weird comes into it [01:15:28] so it's completely safe [01:15:40] we don't want it to allow raw javascript execution there [01:15:44] I've read through the source, not possible [01:15:51] that's only for admins [01:16:03] I mean, there are bugs etc. but I deem it safe after a fairly close reading [01:16:11] their architecture is very far away from "eval" [01:16:19] hope so :) [01:16:28] because otherwise cstepp will kill me [01:16:35] take a look at the diff if you want, notice the properties are set with "value": "blah" [01:16:43] btw, any way to make a legend there? 
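Since yurikR offers above to generate whichever format is needed, here is a purely illustrative sketch of that serialization step; the column names and in-memory shape of the timeseries are invented.

```python
# Illustrative only: emitting the same timeseries as CSV, TSV or JSON, per
# yurikR's point that the format is the easy part. Column names and the
# in-memory shape of the data are invented.
import csv
import io
import json

def serialize(rows, fmt="csv"):
    """rows: list of dicts like {"date": "2014-08-15", "value": 42}."""
    if fmt == "json":
        return json.dumps(rows)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["date", "value"],
                            delimiter="\t" if fmt == "tsv" else ",")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```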
[01:16:46] i did [01:17:02] yes, legends are built in, but they're a bit weird in that they hover over the thing [01:17:09] I'll work on that when I add interactivity maybe [01:17:37] thanks! you trully rock :) [01:17:55] pretty soon this data will come from some magic database in the background ;) [01:18:08] and will will open up our datacenter for any data storage :) [01:18:14] will call it massivewiki [01:18:31] ppl will upload data (or even stream it) [01:18:59] i can just see stock prices and weather readings streaming in live on various web pages ;) [01:19:04] wiki pages [01:19:09] interactivitiy! [01:19:29] (not what you are working on, the realtimeness is what i meant) [01:20:02] oh yea, vega's actually really cool about this [01:20:10] we can add live data very easily [01:20:18] you basically save the view that it creates [01:20:32] then go view.data(newStuff).update({duration: 300}) [01:20:36] and it does a nice transition [01:21:08] yes, but my point is that we can start storing large amounts of data - like statistical stuff [01:21:21] yeah, I've thought about the data problem a lot [01:21:24] if its in publ domain that is :) [01:21:31] ideally ideally that would all be on wikidata [01:21:42] and this extension would know how to query wikidata [01:21:48] possibly, but i feel that wikidata format is substantially different [01:21:53] and I have some ideas about steps from here to there [01:21:54] massive data would need a separate domain [01:22:10] they tell everyone that they're not working on massive data at all [01:22:14] wikidata is all about "one piece of data per edit" [01:22:21] yeah, but I talked to them at length [01:22:23] and I think that's coming [01:22:24] like changing a birthdate of a person [01:22:26] just not anytime soon [01:22:30] so they don't want to get people's hopes up [01:22:35] which is smart and great [01:22:44] but like, long long term - 3 years - that'd be amazing [01:22:51] perhaps, its just that we would want historical changes, and that's kinda hard for large datasets [01:23:02] true true :) [01:23:15] yeah, that's why in the meantime I'm all about data warehousing and getting all our data centralized somewhere [01:23:27] public but I'm thinking wikifying it would add too much friction [01:24:03] so that works for analytics stuff but for stuff that people edit... harder problem [01:24:15] anyway, i gtg [01:24:19] take it easy man [01:24:54] thanks! [01:24:57] later :) [03:52:07] (PS1) Yurik: Refactored sms & web parsing to common base class [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154230 [03:52:21] (CR) Yurik: [C: 2 V: 2] Refactored sms & web parsing to common base class [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154230 (owner: Yurik) [04:35:17] (PS1) Yurik: Proxy support for S3 [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154232 [04:41:48] (PS2) Yurik: Proxy support for S3 & API [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154232 [04:42:02] (CR) Yurik: [C: 2 V: 2] Proxy support for S3 & API [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154232 (owner: Yurik) [04:54:56] Analytics / Tech community metrics: Wrong data at "Update time for pending reviews waiting for reviewer in days" - https://bugzilla.wikimedia.org/68436#c8 (Andre Klapper) (In reply to Jeroen De Dauw from comment #7) > DataValues still has a copy on Gerrit?! Was it ever requested somewhere to remove it... 
[05:31:10] (PS1) Yurik: minor - smslog reporting, cleanup [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154233 [05:31:29] (CR) Yurik: [C: 2 V: 2] minor - smslog reporting, cleanup [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154233 (owner: Yurik) [05:54:19] (PS1) Yurik: Safer cache saving, sms log messages [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154235 [05:54:33] (CR) Yurik: [C: 2 V: 2] Safer cache saving, sms log messages [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154235 (owner: Yurik) [07:23:15] (PS1) Yurik: old requests lib workaround [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154237 [07:23:32] (CR) Yurik: [C: 2 V: 2] old requests lib workaround [analytics/zero-sms] - https://gerrit.wikimedia.org/r/154237 (owner: Yurik) [09:35:47] springle, hola, can you give me an idea on whether the "lag table" is a project that will happen in the next month or next quarter (either one is fine) so as to plan our interim solution? [09:45:41] and another question springle, how many concurrent connections will be appropiate for us to configure to hit for example, "enwiki" for a single client? [10:07:24] nuria: within this quarter [10:07:37] ok, great. [10:07:50] what sort of connections? persistent? and what sort of traffic [10:09:06] queries would be similar to: http://pastebin.com/bPhUc3PD [10:09:50] springle: but "scheduled" , as in "this many reports (each query being a report) run at the same time " [10:11:11] oh dear [10:11:18] :) [10:11:30] haha [10:11:50] running many of these simultaneously would not be fantastic. spacing them out, staggered say, would be wie [10:11:52] springle: any suggestions you might have please send them along! [10:11:53] wise* [10:12:50] how many are "many"? 10? 100? [10:12:58] 2 [10:13:35] wow, from our tests they run super fast (sql might be slightly different, give me a sec) [10:13:48] well, that's not fair. many with the EXPLAIN plan i see for that one would cause problems due table scans [10:13:53] where are you testing? [10:14:26] http://pastebin.com/3bg1rQzG [10:14:59] i tested it in labs, after the migration [10:15:49] can you give me one that will execute? without the @T and @n [10:16:03] with roughly expected ranges [10:17:20] n=5 [10:17:25] t= 1 month [10:17:40] Makes sense? [10:18:17] Rather interval is "30 days" [10:21:07] now define "super fast"? [10:22:57] i wrote numbers here: https://www.mediawiki.org/wiki/Analytics/Editor_Engagement_Vital_Signs/Backfilling [10:23:56] backfilling a day [10:24:08] means running that query once [10:24:21] so backfilling a year means running that query 365 times [10:24:30] springle: let me know if it makes sense? [10:24:54] ok, so for future reference, your "super fast" means "glacially slow" in DBA-speak ;) [10:25:05] hangon, testing [10:26:03] haha [10:26:11] (spanish laugh) [10:26:36] You would have SAD if you had seen those numbers prior to your migration [10:26:37] this doesn't fill me with joy http://aerosuidae.net/paste/6/53eddf63 [10:26:56] effectively a full table scan on revision, twice over [10:27:23] We know we need an inrermediary table [10:28:58] but point taken, this quarter we will work in teh 1st impl of these metrics and for next quarter we need to heavily optimize the sql we have (probably by creating intermediary tables in many instances) [10:30:23] I just run several of those queries against frwiki just this morning so you might be able to see them in logs [10:42:51] nuria: to clarify, 365 queries per project, or 12 per project? 
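The pastebinned SQL is not in the log, so the following is a hedged reconstruction of the kind of "rolling active editor" query being discussed — at least n = 5 edits in a trailing 30-day window, counted across both revision and archive (hence the two revision-sized scans in the EXPLAIN) — not the actual wikimetrics statement, which likely applies further filters (e.g. excluding anonymous edits).

```python
# Hedged reconstruction (NOT the pastebinned wikimetrics SQL): editors with at
# least n edits in the trailing 30-day window ending on a given day, counted
# across both revision and archive as described above. %(name)s is the
# MySQLdb/pyformat parameter style.
ROLLING_ACTIVE_EDITORS = """
SELECT COUNT(*) AS rolling_active_editors
FROM (
    SELECT user_id
    FROM (
        SELECT rev_user AS user_id
          FROM revision
         WHERE rev_timestamp BETWEEN %(start)s AND %(end)s
        UNION ALL
        SELECT ar_user AS user_id
          FROM archive
         WHERE ar_timestamp BETWEEN %(start)s AND %(end)s
    ) edits
    GROUP BY user_id
    HAVING COUNT(*) >= %(n)s
) active
"""

params = {"n": 5, "start": "20130531000000", "end": "20130630000000"}
# cursor.execute(ROLLING_ACTIVE_EDITORS, params)
```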
[10:43:14] 365 as we run it daily , looking back every day 30 days [10:43:25] for every wiki? [10:43:53] so far we only have run them for "some" wikis [10:44:12] we will need to run it for every wiki to complete the project [10:44:30] I think our immediate concern is with the "top 10" wikis [10:45:21] If you are asking about this morning (eu time) i just run queries against frwiki [10:55:02] nuria: for normal daily schedule with ~10 big wikis, i suggest start carefully and run them in series. enwiki takes 10min, so a couple hours to complete, right? [10:55:33] backfilling data is a different problem; a once-off job [10:56:21] springle: the 10 minutes of enwiki come from you running the query just now? [10:56:26] yes [10:56:43] for 1 day? [10:57:34] springle: interesting, frwiki takes <1 min for 1 day [10:57:46] that was a 30 day interval [10:58:13] (it would be much easier if you could supply working sql so we know we have the same values) [10:59:20] but as i understand it, yes, 1 day, looking back 30 days. 10m on enwiki. 1m on frwiki [10:59:22] Give me a sec, cause i have to get it from sql-alchemy, what i sent you were [11:00:03] the "raw" queries, sql alchemy might be doing something slightly different [11:15:46] ok, springle, please let me know if this makes sense: http://pastebin.com/nqTSwX05 [11:16:04] the query can be run for a "subset" of users (cohort) [11:16:09] or the whole project [11:17:11] "makes sense" how? :) [11:17:31] I can't test it meaningfully without %s values... [11:24:05] (5, 1, 0, '20130531000000', '20130630000000', '20130531000000', '20130630000000') [11:24:16] springle: sorry about that [11:24:41] those should be the correct %s values [11:27:31] ok [11:27:49] so this is related to the earlier query? [11:28:03] what frequency will it run, which projects, same questions :) [11:31:14] nuria: this is all very vague, since I don't really know the details of your project. how about this: just start running all this stuff in one or two concurrent connections. see how long it takes. if i spot problems i'll let you know. if you find it all takes too long, you let me know and we'll revise. would that work for you? [11:32:02] ok, springle: the numbers i sent you I got with 10 concurrent connections. https://www.mediawiki.org/wiki/Analytics/Editor_Engagement_Vital_Signs/Backfilling [11:32:24] well, i *think* they are concurrent [11:33:05] ok, that would be good to find out. plus you don't necessarily know if that resulted in performance issues for other people, right? [11:33:26] no, that is why i was letting you know what we were doing [11:34:18] As we did not do any tuning of our mysql driver on sqlalchemy i imagine the connections are concurrent but i will double check [11:34:27] Thank you springle [11:34:28] yep, thanks. appreciate that. can you start a full run now? i'll watch and see [11:34:37] we can answer both questions [11:34:57] yes, i will run things for fr wiki, i need 10 minutes though [11:34:58] plus i can see more examples of the un-obfuscated SQL [11:35:04] np [11:35:08] ping me [11:36:06] ok [11:46:17] qchris, hols [11:46:22] hola sorry [11:46:24] Hi nuria. [11:46:43] I just rebooted dev machine (wikimetrics dev) [11:46:48] and on the logs saw: [11:46:50] mountall: mount /public/backups [665] terminated with status 32 [11:46:50] mount.nfs: No such device [11:47:19] Let me log in there ... [11:48:13] /public/backups is mounted now. [11:48:20] yes, [11:48:23] But do we need that directory?
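springle's advice above — run the per-wiki reports in series, or with at most one or two concurrent connections — could be enforced with a small bounded worker pool; the wiki list and run_report() body are placeholders, not wikimetrics code.

```python
# "start carefully and run them in series" / "one or two concurrent
# connections": cap concurrency with a small worker pool instead of firing all
# report queries at labsdb at once. run_report() and the wiki list are
# placeholders, not wikimetrics code.
from concurrent.futures import ThreadPoolExecutor

BIG_WIKIS = ["enwiki", "dewiki", "frwiki", "eswiki", "ruwiki"]  # "top 10"-style list

def run_report(wiki):
    # open one connection to the <wiki> replica and run the metric query here
    return wiki, "ok"

with ThreadPoolExecutor(max_workers=2) as pool:  # at most two staggered connections
    for wiki, status in pool.map(run_report, BIG_WIKIS):
        print(wiki, status)
```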
[11:48:38] not in dev as i was just using it testing [11:48:46] Using it "for testing" [11:49:03] but .. who creates the mount.. is it created when set up the instance? [11:49:32] or is it a labs default? [11:49:36] I don't know :-) It just is magically there when you create a new instance. [11:50:58] nuria: templates/labsnfs/auto.space.erb in operation/puppet looks like it could do the trick. [11:51:28] ok, let me check [11:54:56] Oh no! That was pmpta. [11:55:15] :) [11:55:21] Hi milimetric [11:55:23] nuria: sorry to interrupt [11:55:32] i was just doing some math [11:55:33] nuria: operations/puppet/manifests/role/labs.pp [11:55:39] nuria: ^ Looks better. [11:56:16] with only one concurrent thread, running all 883 projects, for 12 hours every day, backfilling would take 9 years [11:56:18] lol [11:56:53] that's just one metric [11:57:01] so all metrics would take 36 years [11:57:11] therefore - I think we need that intermediate table :) [11:57:48] or to backfill only the last month [11:58:27] but then, 883 projects, 4 metrics, if each takes 1 minute we still can't do it within a day [11:59:34] so with 20 metrics, we basically need our queries to take less than a second each [11:59:53] milimetric: i do not think we need one concurrent thread though [12:00:01] that's what sean was saying [12:00:04] start with 1 or 2 [12:00:27] we still need to test whether there is an issue [12:00:27] i understood there were ~10 projects initially, not 883 :) [12:00:54] sorry for the misunderstanding, the ~10 projects are the ones we'll show initially when the site loads up [12:01:07] but for this project to be called "done" the idea is to get these metrics evaluated on all wikis [12:01:24] I did tell that to springle [12:01:25] (as an aside: we all hate this "Rolling Active Editor" monster query that's killing all our lives :)) [12:01:51] so yeah, 883 projects, each running 4 queries by the end of September, and 20 queries at some point in the near future after that [12:01:51] but let's not get carried away yet, [12:02:01] i am about to launch a bunch of queries for frwiki [12:02:10] so springle can look at them [12:02:57] my point is, even with 10 concurrent threads, for this nightly thing to complete in a reasonable amount of time, we should have every query finish in sub-10 seconds [12:03:16] meaning we need intermediary tables wherever we can use them [12:03:34] yes milimetric, true, but "1. do it, 2. do it better 3. do it best" [12:04:05] ok, for 1. do it we need them to finish in under 50 seconds then (4 metrics instead of 20) [12:04:19] springle queries should be running now [12:04:23] ok [12:04:25] nuria: I checked on another instance. Yes, manifests/role/labs.pp is responsible for those mounts. [12:04:34] (thanks very much springle for helping us out!) [12:05:25] nuria: what is the labsdb user account? [12:05:53] ah no, wait, springle i just rebooted the instance [12:05:58] and it cannot connect to fr [12:06:05] 'frwiki' [12:06:44] nuria: You did not set up the nat rules. [12:07:01] qchris gets shiny star of the week [12:07:07] nuria: I set them up for you again. [12:07:11] springle, account is s52263 [12:07:14] Can you try if it works for you now? [12:09:11] Well ... wfm. So I assume it works for you too now. [12:10:19] milimetric: 10s on average would be achievable. most wikis are tiny. frwiki at 1min is one of the big ones [12:12:18] ok, springle, qchris, milimetric frwiki queries running now [12:12:27] nuria: Coolio.
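A quick back-of-the-envelope check of the nightly scheduling math milimetric walks through above, under his assumptions of a 12-hour window and serial execution on a single connection.

```python
# Back-of-the-envelope version of the scheduling math above, assuming a
# 12-hour nightly window and serial execution on one connection.
projects = 883
window_s = 12 * 3600

for metrics in (4, 20):
    queries = projects * metrics
    print(f"{metrics} metrics -> {queries} queries/night, "
          f"~{window_s / queries:.1f}s available per query")
# 4 metrics -> 3532 queries/night, ~12.2s available per query
# 20 metrics -> 17660 queries/night, ~2.4s available per query
```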
[12:13:06] they look fast to me but i guess for dbas they are 'glacially slow' as springle put it [12:14:45] nuria: I dont see any s52263 queries running on labsdb1001 or labsdb1002. or else they are very fast [12:15:13] well, some went through some minutes back looks like [12:15:18] let me reschedule some more [12:15:41] done [12:15:57] they should be running now [12:16:01] with 10 connections [12:16:43] ah heh [12:17:02] frwiki. nuria that's labsdb1003. not upgraded yet [12:17:04] i see them now [12:17:10] oohhh, even better [12:22:02] nuria: try enwiki? [12:22:13] enwiki, sure [12:22:22] give me 5 mins so i can set it up [12:22:31] yep [12:22:57] Analytics / Wikimetrics: Labs instances rely on unpuppetized firewall setup to connect to databases - https://bugzilla.wikimedia.org/69042#c4 (christian) This bug hit us again today. See http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140815.txt starting on [12:05:53] [12:29:53] springle: i think you should be seeing some enwiki queries now [12:30:01] yep [12:31:45] ok, these also run with 10 connections [12:32:52] only 6 connections show up [12:35:27] I can run more, if you want, maybe only 6 were used as i still had some frwiki running [12:35:42] ^springle [12:35:52] that's fine [12:39:04] ok, thank you for taking a look springle, let us know if you have any recommendations [12:39:11] Analytics / Tech community metrics: Duplicate item in list on korma.wmflabs.org/browser/bugzilla_response_time.html - https://bugzilla.wikimedia.org/61860#c4 (Andre Klapper) (In reply to Santiago Dueñas from comment #2) > We are going to procede to remove them from the database but it will take > some... [12:41:20] nuria: these queries are of the second form you pasted earlier. will we also see the first form? [12:41:26] Analytics / Tech community metrics: Bugzilla ticket with recent comments listed under "Longest time without comment" on bugzilla_response_time.html - https://bugzilla.wikimedia.org/64373#c10 (Andre Klapper) (In reply to Santiago Dueñas from comment #9) > we still have a problem when an issue is moved f... [12:41:30] no springle [12:41:37] how do they relate? [12:41:41] the 1st form evolved into teh 2nd form [12:42:57] Analytics / Tech community metrics: Tech metrics should talk about "Affiliation" instead of organizations or companies - https://bugzilla.wikimedia.org/60091#c6 (Andre Klapper) (In reply to Andre Klapper from comment #5) > Patch isn't clean / includes noisy unrelated changes. Vinay: Do you plan to rew... [12:44:59] ok, need to make food for family, be back in an hour cc springle [12:45:14] Analytics / Wikimetrics: replication lag may affect recurrent reports - https://bugzilla.wikimedia.org/68507 (christian) a:christian [12:47:12] Analytics / Wikimetrics: replication lag may affect recurrent reports - https://bugzilla.wikimedia.org/68507#c3 (christian) As the title limits to recurrent reports (although the same issue also affects non-recurrent reports), I am assuming that we really only need to cover recurrent reports. [12:48:58] Analytics / Wikimetrics: replication lag may affect recurrent reports - https://bugzilla.wikimedia.org/68507#c4 (Dan Andreescu) That's a fine assumption. The mechanism we use to determine replag could be reused later. And for now, people running ad-hoc reports are probably used to replag the same way... [13:10:53] nuria: those queries take ~15min on labsdb and ~30sec on analytics-store [13:12:22] analytics-store has had a few indexes added over time. some came from s1-analytics-slave. 
some dartar and halfak requested [13:12:45] in this case one on (rev_timestamp, rev_user) [13:13:15] I believe that the rev_user, rev_timestamp index already existed. [13:13:19] any way we can do these queries on analytics slaves, or must they be in labsdb? [13:13:20] It's the archive one that was missing [13:13:26] so ar_user, ar_timestamp [13:13:46] it did exist, but it was still added specially for analytics at some stage [13:14:16] the important point being it isn't on labsdb, and probably shouldn't be just for this [13:18:16] halfak: actually, note the order: rev_timestamp, rev_user. you're thinking about the reverse [13:18:31] possibly dartar asked for it? don't recall [13:25:32] Oh! Yikes. Wasn't me then. [13:25:49] I dunno what you would user (timestamp, user_id) for [13:25:57] springle, ^ [13:26:39] I guess you could use that to get the number of unique edits by a user in a timespan without scanning the table. [13:26:57] Seems like rev_timestamp would be about as performant though. [13:27:11] I wonder if anyone tested that. :) [13:28:12] not quite. rev_timestamp would incur index + row lookup depending on optimizer switches and storage engine. rev_timestamp_user only hits index for the access attern you describe [13:28:50] which is the same pattern nuria's labsdb queries are doing :D [13:31:17] halfak: heh, yeah, email thread with dario and this http://aerosuidae.net/paste/7/53ee0b8d [13:31:32] same pattern you describe [13:36:09] springle: these must be in labsdb as wikimetrics lives in labs on purpose (so it can only access public data) [13:36:47] brb, restarting [13:45:41] (PS1) QChris: Reschedule recurring reports to 03:00 [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154267 (https://bugzilla.wikimedia.org/68507) [13:45:57] (PS1) QChris: Document faked 'created' timestamps for recurring reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154268 [13:48:30] (CR) QChris: Document faked 'created' timestamps for recurring reports (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154268 (owner: QChris) [13:57:08] springle, second milimetric, queries must be on labs db [14:00:25] milimetric, I don't see a good reason for the (timestamp, user) index. [14:00:38] ottomata: Join us! :-D [14:00:42] Woops. springle ^ [14:00:49] I can consult DarTar [14:01:08] But the (ar_user, ar_timestamp) index seems like a good idea. [14:01:15] (PS4) Milimetric: Ensure wikimetrics session is always closed [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) [14:01:17] And it is relatively cheap given that archive is short. [14:02:43] halfak: I gave a reason :) ^ [14:03:09] Oh. Sorry I missed that. [14:03:13] * halfak checks that query [14:03:21] it's an arcane reason, to be sure... [14:04:04] nuria: http://aerosuidae.net/paste/8/53ee1338 ... note tweaks: ORDER BY NULL and archive_userindex [14:04:30] springle, looks like it isn't doing an index scan. [14:04:32] springle, in meeting will look in 20 mins [14:05:07] meh... I'm reading something wrong. [14:05:19] I thought that index scans were explicit. [14:05:24] Now, I'm not so sure. [14:06:18] did I say index scan? i didn't mean to [14:06:23] * springle re-reads back [14:07:45] I might have the jargon wrong. I mean reading the index rather than the table rows. [14:08:11] I havn't rechecked dario's query. 
For nuria's query which uses rev_timestamp_user, it doesn't do an index scan on revision but a range access, plus "Using index" means only reading index not rows [14:08:45] ah yep, that's "Using index". a true index scan would be "index" in `type` column of explain output [14:08:52] It seems like it. I thought it used "Using index scan". [14:08:59] Looks like the docs call it an "index scan" http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html [14:09:12] Ahh. I see springle [14:09:53] there are a few "Using index..." phrases [14:11:49] (PS1) Yuvipanda: Add a 'description' field to Queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154270 [14:12:52] (CR) Yuvipanda: [C: 2] Add a 'description' field to Queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154270 (owner: Yuvipanda) [14:12:57] (Merged) jenkins-bot: Add a 'description' field to Queries [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154270 (owner: Yuvipanda) [14:12:59] nuria: ack meeting. no hurry. bbl myself [14:14:01] (PS5) Milimetric: Ensure wikimetrics session is always closed [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) [14:14:28] oh god, what now. Why are my alter tables taking forever again?! [14:14:50] hmpf, restarting mysql server fixes it [14:15:07] * YuviPanda should switch Quarry to postgres at some point [14:17:39] postgres FTW [14:17:58] I used to use postgres for my wiki research -- because it was just better. [14:18:14] Much faster for nearly everything. [14:18:25] Spinning disks and no RAID [14:18:51] When I needed to do big aggregations I'd partition tables. It worked pretty well. [14:18:56] heh, yeah [14:19:06] this is the first time I'm writing something that uses MySQL as backing store [14:19:08] rather than postgres [14:19:13] (not counting MediaWiki work) [14:28:04] weird, qchris_meeting, the last 2 hours of logs from cp3015 are off by 1! [14:28:11] count_different = 1 [14:32:49] (PS1) Yuvipanda: Use simple textbox for title editing and display [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154273 [14:32:54] (CR) jenkins-bot: [V: -1] Use simple textbox for title editing and display [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154273 (owner: Yuvipanda) [14:33:46] (PS2) Yuvipanda: Use simple textbox for title editing and display [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154273 [14:34:03] (CR) Yuvipanda: [C: 2] Use simple textbox for title editing and display [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154273 (owner: Yuvipanda) [14:34:15] (Merged) jenkins-bot: Use simple textbox for title editing and display [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154273 (owner: Yuvipanda) [14:47:50] ottomata: the maximum sequence number for hour 9 is 1 bigger than the minimum sequence number for hour 10 [14:48:16] ottomata: did you try finding duplicates/missings? [14:50:44] qchris: that's normal, right? [14:50:53] oh [14:51:03] sorry, the older our has a larger seq by 1 [14:51:05] hmm [14:51:18] yes. [14:51:23] weird! i wonder what the timestamps on those seqs are...looking! [14:52:04] ottomata, looking? [14:52:09] or...sequing? [14:52:16] (it works better out loud. But either way; I'm sorry) [14:52:24] haha [14:52:27] seeking! [14:57:15] weird, qchris! [14:57:21] 8354234902 2014-08-15T09:59:59 [14:57:21] 8354234901 2014-08-15T10:00:00 [14:57:32] Hahaha :-) [14:58:18] The timestamp ... what timestamp is that exactly? 
the time when the request arrived at varnish, or when it starts responding? [14:59:56] "%{%FT%T@dt}t" looks like fun :-) [15:00:16] Time when the request was received, [15:00:25] https://www.varnish-cache.org/docs/3.0/reference/varnishncsa.html [15:02:08] Yup. Came to the same conclusion. [15:03:00] So I guess that's the kind of issues one expects with parallel systems. [15:03:37] wll ask snaps [15:03:47] k [15:08:58] cool, all green again [15:08:59] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=hive_partition [15:10:29] \o/ [15:32:43] Analytics / Wikimetrics: Improve perf of rolling active editors with dba's suggestions - https://bugzilla.wikimedia.org/69610 (nuria) NEW p:Unprio s:normal a:None ELECT anon_1.user_id AS anon_1_user_id, IF(SUM(anon_1.count) >= 5, 1, 0) AS `IF_1` FROM ( Note "order by null " and archive_... [15:34:10] springle: filed a bug to change query: https://bugzilla.wikimedia.org/show_bug.cgi?id=69610 (we will try to grab it next sprint) [15:39:06] nuria: great. also it's possible we can add the index discuessed earlier to labsdb, however I have yet to trial the tokudb online index creation for a table as large as revision. [15:39:39] and if i understood right the migration had not happened on the hosts in which we were running teh queries correct? [15:39:42] *the [15:40:19] the frwiki host has not been migrated. the enwiki host /has/ benn [15:43:49] qchris_away: just talked to snaps, have a good explanation [15:44:44] (PS6) Milimetric: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) [15:44:57] nuria: ^ ready for review [15:45:16] ok, milimetric, will do that in 1hr or so [15:45:22] no prob, just fyi [15:45:47] i'm gonna go grab lunch and try to get sessions out of my brain [15:45:59] milimetric.sessions.close() [16:01:30] ottomata: Interesting! [16:01:36] I am in trap chat. [16:01:50] Or you can also type :-) [16:04:03] oh, easy to explain! [16:04:15] timestamp generated when request received by varnish [16:04:22] seq generated by varnishkafka after request has finished [16:04:31] Argh. Ok. [16:04:42] Mhmm. [16:05:53] The 09:59:59 request took an order of magnitude longer for time to first byte. [16:06:01] So that matches too. [16:22:50] maybe we should put some leniency into the check_sequence_stats [16:22:50] like [16:22:56] Analytics / Wikimetrics: Improve perf of rolling active editors with dba's suggestions - https://bugzilla.wikimedia.org/69610 (Andre Klapper) [16:23:11] where percent_different != 0.0 AND ABS(count_different) > 10 [16:23:13] or something [16:23:14] dunno [16:23:53] We could also exclude the first and last second. [16:24:21] Or include neighboring partitions (but that would mean we can start the job only an hour later) [16:25:36] We could also just wait to see how often this occurs. [16:25:50] The fix is simple ... we would just greate the _SUCCESS files by hand. [16:26:07] And the waiting oozie jobs would start after that. [16:26:28] hm, true, i just checked august where count_different = 1 [16:26:31] 4 entries [16:26:45] :-( [16:26:47] the other 2 don't seem to match this, at least, they dont' pair up in the same way [16:27:12] OH [16:27:13] yes they do [16:27:14] sorry [16:27:28] was looking at the wrong fields [16:27:29] So ~1 incident per week. [16:27:46] guess so, in our 2 week sample size :) [16:27:58] the other one from bits, not upload [16:28:39] Oozie for the win! 
Maybe we just run a follow-up test if the naive test failed. [16:28:48] not a bad idea [16:28:49] That would be transparent to everything else, and [16:28:56] fail-to: something other than kill? [16:28:57] we would only need to test this specific case. [16:29:00] Right. [16:29:03] aye [16:29:15] let's wait and see, but i think that's a good idea [16:29:26] Ok. [16:46:07] Ottomata: would you be so kind as to merge this chnageset: https://gerrit.wikimedia.org/r/#/c/153568/4 [16:46:18] i tested it a bunch on dev and found no issues [16:52:56] gotcha [16:54:51] dev meaning the labs development instance of wikimetrics [16:57:28] Analytics / Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://bugzilla.wikimedia.org/69615 (christian) NEW p:Unprio s:normal a:None The timestamp reported by varnish is taken when the request arrives. The sequence number reported by varn... [17:08:34] nuria: there's an "overflow" amount that can be added to the pool [17:08:38] I think the default is like 10 [17:08:45] so it can go to 42 [17:08:47] ahhh [17:08:48] ok [17:08:57] but if you don't see it going over, that might make sense too [17:10:01] because the pool might be sized based on information from the server itself and the engine url passed to it [17:10:21] so then it can be shared among all the workers without those threads sharing any instance [17:24:41] Analytics / Refinery: Make webrequest partition validation handle races between time and sequence numbers - https://bugzilla.wikimedia.org/69615#c1 (Dan Andreescu) nice find! [17:29:35] qchris: all changes here look (very) good to me https://gerrit.wikimedia.org/r/#/c/153395/4/files/backup/hourly_script [17:30:05] Great. Thanks. [17:30:21] I have tested on dev (which has redis of 2.5M) and it takes few secs (~2) [17:30:33] to dump so i think 15 secs is plenty [17:31:07] Wonderfull. [17:31:21] Redis dumps every 60 seconds anyways (if there was a change) [17:32:23] So we could go a bit higher if we needed. But 15 worked well in my tests too. [17:38:16] qchris: what i do not understand is why we didi not use "/usr/bin/redis-cli LASTSAVE" , I did not understand the comment about it [17:38:56] LASTSAVE gives you the timestamp the redis save previously. [17:39:14] That is fine if the redis config's rdb matches our $REDIS_DB_FILE [17:39:46] But assume the redis puppet module changes, does crazy things, and no longer honors the rdb filename setting that we pass to it. [17:40:04] And we do not notice this change in the redis puppet module. [17:40:10] ah ok, it is not the reporting, is teh filename [17:40:21] Right. [17:40:27] very well, i see, ultra-mega-cautious [17:40:48] Better safe than sorry :-) [17:41:19] But the LASTSAVE code would not be simpler, so I thought file's mtime is more defensive. [17:43:28] ootomata: would you be so kind as to merge this: https://gerrit.wikimedia.org/r/#/c/153395/ [17:43:42] sorry, ottomata : would you be so kind as to merge this: https://gerrit.wikimedia.org/r/#/c/153395/ [17:45:32] done [17:45:47] ok, let me send you the "module" bump up [17:45:53] thanks nuria! [17:46:02] thanks ottomata! 
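The pool "overflow" milimetric describes matches SQLAlchemy's max_overflow option, whose default is indeed 10; a pool_size of 32 would explain the "can go to 42" figure but is an inference, as are the connection details below.

```python
# The pool "overflow" described above is SQLAlchemy's max_overflow (default
# 10). pool_size=32 is inferred from "it can go to 42" and is an assumption,
# as is the connection URL.
from sqlalchemy import create_engine

engine = create_engine(
    "mysql://s52263:password@labsdb/frwiki_p",  # placeholder credentials/host
    pool_size=32,     # steady-state connections kept open in the pool
    max_overflow=10,  # temporary extras allowed under load: 32 + 10 = 42
)
```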
[17:49:55] thanks ottomata: this is teh last one that bumps up wikimetrics module https://gerrit.wikimedia.org/r/#/c/154290/ [17:53:34] done [18:14:12] ottomata, http://graphtest.wmflabs.org/wiki/Main_Page [18:14:25] our beautiful new graphs :) [18:14:51] awesooooome [18:15:01] thx to milimetric who greatly improved some visual aspects of the first one [18:30:31] ottomata, do you know how to bump up the memory is teh newest vagrant? [18:34:37] (PS7) Milimetric: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) [18:35:45] k, but it's definitely not forwarding the port [18:36:02] ottomata: found it [18:36:22] ottomata : found it [18:38:10] nuria hm no [18:38:14] oh [18:38:16] good! [18:41:23] nuria: sadly: [18:41:25] Ran 291 tests in 68.778s [18:41:25] OK [18:41:33] :/ [18:41:38] sadly? [18:41:41] (in vagrant) [18:41:42] it's good!! [18:41:55] nuhuh! now we have something nondeterministic no? [18:41:55] right? [18:42:07] no , my machine doesn't have enough memory i bet [18:42:11] hm, ok [18:42:20] interesting hypothesis, may be true [18:42:49] anyway, let me know if you can repro any of the failures, and feel free to give it a shot to write the two tests I added and marked @attr('manual') [18:43:00] (they have big TODO that says they're not working) [18:43:35] and I was gonna start writing down component structure ideas: http://etherpad.wikimedia.org/p/analytics-67806 [18:46:37] ok, milimetric , bumped up memory and restarting [18:47:00] etherpad sounds good for components to start! [18:55:18] milimetric, i am going to have to go but i see alredy our ideas about components differ, can we work on this on moday? [18:55:34] totally [18:55:41] I'll leave my thoughts here and keep thinking about it [18:55:52] but i'm happy that we differ, means we'll have a great design [18:56:02] have a good weekend [18:56:34] ottomata1 or hashar, do you know how to get newer package ver on stat1002? It seems like the python-requests is 3+ years old [18:56:35] I am still running tests.. wait [18:56:45] yurikR: no clue [18:57:06] hashar, i thought you wrote that lovely page on how to backport packages? [18:57:09] yurikR: ah yeah python-requests comes from Debian package , so that would be the one shipped by Ubuntu Trusty [18:57:11] milimetric, tests work! [18:57:22] yay [18:57:36] cool, so then let's bash it around in staging and hopefully we're good [18:57:45] it was the memory, it looked like it cause vagrant was way slower than it normally is [18:58:01] if you test in staging i will CR+ think about components [18:58:11] i shall test in staging now then [18:58:34] ok, thank you , we are doing great with teh sessions i think [18:58:37] *the [19:00:43] Analytics / Wikimetrics: Backing up wikimetrics data fails if data is written while we back it up - https://bugzilla.wikimedia.org/68731#c9 (nuria) Tested throughly on dev but this of course needs baking time in prod. Wish we had a status "READY_TO_DEPLOY" that should be how bugs are left at the end of... [19:10:45] yurikR: you will have to make sure upgrading python-requests doesn't cause issue with software we wrote that rely on it [19:11:36] hashar, stats servers didn't have requests lib before i added it a few days ago [19:11:56] nor any of the other libs like boto and pandas [19:11:57] yurikR: but upgrading the package on apt.wikimedia.org make the change cluster wide! [19:12:16] do we use requests anywhere? [19:12:37] is there a way to even see that? 
(debating if its possible to search ALL of gerrit's code) [19:12:53] grep operations/puppet :-D [19:12:57] for python-requests [19:13:09] it is probably used on labs as well [19:13:55] probablyeasier to make your code work with requests 0.8.2 as provided by Precise [19:14:24] ormigrate to Trusty to have 2.2.1 :] [19:14:43] last resort, provide the dependency along with your code and tweak PYTHON_PATH [19:15:28] hey, should we do something about reportcard.wm ? [19:15:37] it's on stat1001 but the docroot doesnt exist [19:18:29] mutante: we should probably remove it [19:18:31] it was going to be limn [19:18:37] but limn was decided not to be productionized [19:20:10] gotcha, well, i see 10-limn-reportcard.conf created by puppet [19:20:32] or, i just thought it was from puppet [19:20:48] looks [19:35:29] Analytics / General/Unknown: http://reportcard.wikimedia.org/ - redirect and delete old stuff - https://bugzilla.wikimedia.org/69625 (Daniel Zahn) NEW p:Unprio s:normal a:None http://reportcard.wikimedia.org/ points to stat1001 reportcard.wikimedia.org is an alias for stat1001.wikimedia.or... [19:46:55] hmm, need to revert something about the ports setup on stat1001 it seems [19:47:29] but i wonder how that could have worked before, it seems it would have happened on any apache restart [19:47:41] the ports.conf from the package conflicts with the puppetized one [19:47:56] and then the webserver wont come up sayin the ports are already in use [19:57:23] hm [19:57:52] mutante: stat1001 is a bit of a wild west, i grant you 100% permission to do whatever is necssasry! [19:57:54] actually, can I help? [19:58:03] (PS10) Terrrydactyl: Add ability to global query a user's wikis [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/129858 [19:58:50] ottomata: so.. i think i will do a partial revert [19:58:56] it's not really stat1001's fault [19:59:12] but if it would ever get reinstalled, it would probably break [19:59:24] the thing is the port.conf [19:59:36] there is one from the Debian package and another one that is puppetized [20:00:05] so i could just delete /etc/apache2/ports.conf now and it would be ok [20:00:09] until it gets reinstalled [20:00:34] and here is why: see the change on https://gerrit.wikimedia.org/r/#/c/153832/5/manifests/misc/statistics.pp [20:00:45] the apache::site part is just fine [20:01:01] the apache::conf part though, even though we use the same source => file [20:01:35] it writes it to /etc/apache2/conf-available/ , not to /etc/apache2/ports.conf [20:01:51] which means the ports.conf is also there, which means conflict and "port already in use" [20:02:25] that seems to be a general shortcoming of the new Apache module [20:02:39] it would conflict with distro package [20:03:09] oooooo [20:04:14] what does the distro ports.conf have? [20:04:54] no NameVirtualHost *:443 [20:05:00] that's why we started puppetizing it [20:05:26] i'm going to go back to something that doesnt break and then later ask _joe_ and ori [20:05:27] hm, we could just leave distro ports.conf as is, with port 80 in there [20:05:37] and then puppetize 443 separately [20:05:48] yes, we had this issue on wikitech as well [20:05:55] actually,, wait.. 
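hashar's last-resort option above — shipping a newer requests alongside the code instead of upgrading the cluster-wide package — could look roughly like this; the vendor directory is invented, and exporting PYTHONPATH to point at it is the equivalent shell-level alternative.

```python
# hashar's last-resort option, sketched: vendor a newer `requests` next to the
# script instead of upgrading the cluster-wide package. The vendor/ directory
# is invented (`pip install -t vendor requests` would populate it); exporting
# PYTHONPATH=vendor is the shell-level equivalent.
import os
import sys

VENDOR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "vendor")
sys.path.insert(0, VENDOR)

import requests  # now resolves to the vendored copy, not the system 0.8.2
print(requests.__version__)
```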
[20:06:17] ori said the same thing for wikitech, then that he already fixed it [20:06:23] but i have to lookup _how_ [20:21:21] ottomata: so, yea, for now i reverted and Ori's fix appears to be this [20:21:24] https://gerrit.wikimedia.org/r/#/c/154003/ [20:21:31] and removing the entire ports.conf part [20:21:37] https://gerrit.wikimedia.org/r/#/c/153946/2/manifests/openstack.pp [20:21:59] but we can do that later and not necessarily now, it's in a stable state [20:26:22] aye [20:26:22] k [20:26:42] milimetric: Any idea when we'll next deploy something to wikimetrics1.eqiad.wmnet? [20:27:04] I'd like to see the new backup in action :-) [20:31:26] Analytics / Visualization: Graph types ("Core" and "Secondary") are unclear on Wikimedia Report Card - https://bugzilla.wikimedia.org/39120#c1 (Andre Klapper) "Other" is called "Secondary" now but the issue remains. [20:40:24] (PS1) QChris: Drop documentation around 30 days limit for recurrent reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154326 [20:41:04] (PS1) QChris: Pull duplicated report naming code into base class [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154327 [21:04:58] (PS1) Yuvipanda: Store results in SQLite databases [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154333 [21:05:05] (CR) jenkins-bot: [V: -1] Store results in SQLite databases [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154333 (owner: Yuvipanda) [21:07:32] (PS2) Yuvipanda: [WIP] Store results in SQLite databases [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154333 [21:07:38] (CR) jenkins-bot: [V: -1] [WIP] Store results in SQLite databases [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154333 (owner: Yuvipanda) [21:08:08] (PS3) Yuvipanda: [WIP] Store results in SQLite databases [analytics/quarry/web] - https://gerrit.wikimedia.org/r/154333
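The Quarry patches at the end ("Store results in SQLite databases") suggest one result file per query run; the following is only a guess at what such a writer could look like, not Quarry's actual implementation.

```python
# Rough guess at "store results in SQLite databases" for Quarry: one SQLite
# file per query run, holding the result rows. Paths, table name and column
# handling are invented, not Quarry's actual code.
import sqlite3

def save_resultset(path, columns, rows):
    conn = sqlite3.connect(path)
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE resultset ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO resultset VALUES ({placeholders})", rows)
    conn.commit()
    conn.close()

# save_resultset("/data/quarry/run-1234.sqlite", ["page", "edits"],
#                [("Main_Page", 42)])
```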