[00:23:37] (PS6) Terrrydactyl: Add autcomplete to tags [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039
[01:12:50] (PS3) Gergő Tisza: [WIP] Track opt-out ratio [analytics/multimedia] - https://gerrit.wikimedia.org/r/143501
[05:44:47] Analytics / Wikimetrics: Cors on wikimetrics. - https://bugzilla.wikimedia.org/67825#c1 (nuria) Deploy and test necessary changes so cors works on wikimetrics.
[09:44:02] Analytics / Wikimetrics: Cors on wikimetrics. - https://bugzilla.wikimedia.org/67825#c2 (Dan Andreescu) well, CORS is working in production wikimetrics right now, I deployed it last night. But I had to manually enable the apache mod_headers module, so that should probably get into puppet.
[09:52:58] springle, can i ask you a question?
[10:20:28] nuria: certainly
[10:21:22] I tried to create a query in staging (a subset of the tofu-selector data so we can actually look at the data in a smaller dataset)
[10:21:58] how can i monitor that the table creation is not eating up all resources of the system?
[10:22:34] it just finished but it took 32 minutes.
[10:23:19] which seems a lot and might have affected other users running queries on the system.
[10:25:49] nuria: icinga didn't care, nor much of a spike on ganglia. i think one such process at a time will never hurt much
[10:26:35] springle: would you be so kind as to send me the graphs to spot those issues in ganglia?
[10:26:46] http://ganglia.wikimedia.org/latest/?r=hour&tab=ch&hreg[]=^dbstore1002
[10:26:58] also https://tendril.wikimedia.org/host/view/dbstore1002.eqiad.wmnet/3306
[10:27:12] springle: are icinga alarms set up via puppet?
[10:27:20] yes
[10:27:49] boy.. what is tendril?
[10:27:54] my tool
[10:28:47] and .. shouldn't those graphs be on db1047.eqiad.wmnet?
[10:29:03] er
[10:29:13] springle: you use the google api eh? for graphs
[10:29:19] are we talking about db1047? i thought staging was on analytics-store now?
[10:30:03] my research user config connects to research@db1047.eqiad.wmnet
[10:30:12] hehe
[10:30:19] in that case i'm looking at the wrong box
[10:30:30] and that is how i access staging, please let me know if i should do it any other way
[10:33:04] i don't know :) i think dario/aaron asked for staging on both boxes. i don't know if anyone has formulated rules on what goes where yet
[10:34:05] ok, then this is it: https://tendril.wikimedia.org/host/view/db1047.eqiad.wmnet/3306
[10:34:12] yep
[10:34:42] not much spike for tofu_selection
[10:34:55] times in your graph are GMT/UTC?
[10:35:08] yes
[10:35:43] aha
[10:36:04] tofu_selection is ARIA engine, which is fantastic
[10:36:25] mmmm.. don't know what ARIA engine is, need to read about that
[10:36:28] that's a recent default setting change from InnoDB engine
[10:36:54] much better for data warehouse style tasks exactly like tofu_selection
[10:39:06] nuria: ah, no indexes yet on tofu_selection. that is one reason why it had little impact
[10:39:19] yes, EL tables do not have indexes
[10:39:22] when created
[10:39:35] the db consumer adds them i think?
[10:39:48] as in "aaron" the human, yes
[10:39:55] id, uuid, and timestamp at least
[10:39:57] ah :)
[10:40:42] in any case, creating a table like that with CREATE TABLE .. AS SELECT, without indexes, is no more load than streaming results to the client
[10:41:40] and because it is ARIA, it won't fill InnoDB txn logs and end up affecting replication
[10:42:07] that was a large part of the reason why db1047 used to suffer from replication lag more
[10:42:16] for fields in all tables like timestamp i guess the code adds indexes
[10:42:28] yeah I'd hope so :)
[10:42:29] ahhhh
[10:42:37] doing it manually would be annoying
[10:43:49] Ok, i just looked up aria a bit, let me know if you know of a good resource to read more about that.
[10:46:58] Are you familiar with MyISAM?
[10:47:13] from olden days MySQL
[10:47:46] ARIA is an ACID version of MyISAM, well mostly
[10:48:01] crash safe at least, and with a buffer pool similar to InnoDB
[10:48:29] lightweight is the important bit for tables like tofu_selection
[10:49:58] https://mariadb.com/kb/en/mariadb/mariadb-documentation/mariadb-storage-engines/aria-formerly-known-as-maria/aria-storage-engine/
[10:50:31] there are many pros and cons
[10:51:16] ok, will make sure to read that, thanks for your prompt response
[10:51:19] basically, the wikis and eventlogging use either InnoDB or TokuDB engines, because those are fully transactional, have good concurrency, and are robust
[10:51:35] other tables, like ad-hoc and temporary, use Aria, because it is lightweight
[10:51:46] feel free to hammer aria tables, reindex, etc
[10:53:47] * YuviPanda wonders if MariaDB will ever get materialized views
[13:19:08] (CR) Ottomata: "Awesome! I think we can put this text in README.md though, no?" [analytics/refinery] - https://gerrit.wikimedia.org/r/145421 (owner: QChris)
[13:26:45] (CR) Ottomata: Add basic deployment script (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/144677 (https://bugzilla.wikimedia.org/67129) (owner: QChris)
[13:27:44] yo nuria
[13:28:18] I have an issue with wikimetrics I wanted to discuss with you or milimetric
[14:27:47] (CR) Gilles: "Is this still WIP or do you want it reviewed?" [analytics/multimedia] - https://gerrit.wikimedia.org/r/143501 (owner: Gergő Tisza)
[14:54:50] Analytics / Wikimetrics: Load test cohort uploading - https://bugzilla.wikimedia.org/67858 (nuria) NEW p:Unprio s:normal a:None Load test cohort uploading. We should have boundaries on how big a cohort can be to be uploaded successfully. From Dario: I had a question that required looking...
[14:56:17] Analytics / Wikimetrics: Load test cohort uploading - https://bugzilla.wikimedia.org/67858#c1 (nuria) The estimation should also include tweaking of settings to improve as much as possible our performance when it comes to uploading cohorts.
[14:57:09] DarTar, i missed your ping
[14:57:19] but is it the cohort uploading?
[14:57:38] look at the bug i just filed: https://bugzilla.wikimedia.org/67858
[14:57:41] yep, just saw it
[14:57:46] thank you
[14:58:37] re: priority it's not particularly urgent, I am now getting the data via a script, but it's good if we know how to handle files of this size
[15:05:31] DarTar: we can do binary search now, split the cohort in two and find the limit
[15:06:25] ha, cool
[15:08:33] DarTar: want to do that?
[15:09:05] nah, like I said it's a super-low priority
[15:10:46] ok
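A sketch of the binary-search idea above, assuming upload success is monotone in cohort size (small cohorts work, big ones fail); `try_upload` is a hypothetical stand-in for whatever actually posts a cohort to wikimetrics:

```python
# Hypothetical sketch of the binary search nuria proposes: find the
# largest prefix of a cohort that still uploads successfully.
# try_upload() stands in for the real wikimetrics upload call.
def find_upload_limit(cohort, try_upload):
    lo, hi = 0, len(cohort)          # invariant: a cohort of size lo uploads
    while lo < hi:
        mid = (lo + hi + 1) // 2     # bias upward so the loop always advances
        if try_upload(cohort[:mid]):
            lo = mid                 # this size worked; try something larger
        else:
            hi = mid - 1             # too big; shrink the range
    return lo                        # largest size that uploaded successfully
```

This needs only O(log n) upload attempts rather than retrying one row at a time.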
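And a minimal sketch of the staging-table pattern springle walks through earlier in the log: CREATE TABLE ... AS SELECT into an Aria table, with the indexes the EventLogging consumer would normally add (id, uuid, timestamp) created afterwards. The host, credentials, and table/column names here are hypothetical placeholders, not the real staging setup:

```python
# Sketch of the ad-hoc Aria staging-table pattern discussed above.
# Host, credentials, and table/column names are placeholders.
import pymysql

conn = pymysql.connect(host="analytics-store.example", user="research",
                       password="...", database="staging")
with conn.cursor() as cur:
    # CREATE TABLE ... AS SELECT with no indexes is little more load than
    # streaming the result set to a client, and ENGINE=Aria keeps the write
    # out of the InnoDB txn logs, so replication is unaffected.
    cur.execute("""
        CREATE TABLE tofu_selection_subset ENGINE=Aria
        AS SELECT id, uuid, timestamp
        FROM log.TofuSelector
        WHERE timestamp >= '20140701000000'
    """)
    # Indexes go on afterwards, mirroring what the EL consumer does
    # for id, uuid, and timestamp on EventLogging tables.
    cur.execute("ALTER TABLE tofu_selection_subset ADD INDEX (id)")
    cur.execute("ALTER TABLE tofu_selection_subset ADD INDEX (uuid)")
    cur.execute("ALTER TABLE tofu_selection_subset ADD INDEX (timestamp)")
conn.commit()
```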
[15:11:04] BTW, tears come to my eyes, big thanks to Nemo_bis
[15:11:32] the ULSFO blogpost got published: https://blog.wikimedia.org/2014/07/11/making-wikimedia-sites-faster/
[15:11:36] whee
[15:24:11] nuria, Nemo_bis: \o/
[15:24:34] \o/ \o/ \o/
[15:39:50] nuria: works nicely with the earlier blog post by faidon with the ripe atlas system
[15:39:53] excellent timing
[15:40:04] (CR) Nuria: Add autcomplete to tags (4 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/145039 (owner: Terrrydactyl)
[15:40:39] hashar: ya, that was the idea
[15:40:48] it "looks" like a follow up
[15:40:56] which is nice
[15:41:05] that makes it look like we are well organized \O/
[15:41:12] ja ja
[15:41:18] (spanish laugh)
[15:42:15] it was Nemo_bis who pushed for the blog post, i feel bad his name is not there though, he did a bunch of proofreading
[15:42:37] yeah Nemo_bis is everywhere
[15:43:21] ah https://commons.wikimedia.org/wiki/File:Improvement_in_Page_Load_Times_on_Wikipedia_after_the_ULSFO_datacenter_deployment_(mapped_with_carto_db).png
[15:43:24] cartodb is lovely
[15:43:29] I wish we had that on commons
[15:43:51] i.e. get a dataset + a map template = profit!
[15:43:52] hashar: carto db blows your mind, have you seen:
[15:44:25] there was a startup that proposed a similar service but to create graphs
[15:44:31] it had a ton of very interesting datasets
[15:44:38] and made it super trivial to create nice graphs
[15:44:46] it went bankrupt though :-(
[15:46:27] hashar: http://mwcimpact.com/
[15:47:05] hashar: one nice thing about carto db is that they started (and got really well known) without VC funding. in Madrid
[15:47:14] nuria: ah that is backed up by a bank isn't it ?
[15:47:25] yes
[15:47:30] nuria: I think I have met the project manager behind it last year in my city
[15:47:39] he had a nice map of Madrid main street
[15:47:45] with the top sellers
[15:47:57] all drawn out of credit card transactions. That was super impressive
[15:48:10] yeah that is exactly that
[15:48:19] I should reconvert to data viz
[15:48:50] hashar: You can be our guinea pig #2 for dashboard stuff
[15:49:00] guinea pig #1 is already claimed
[15:49:04] :D
[15:50:35] standards are high for that role
[15:50:37] nuria: http://atlas.media.mit.edu/ that one is awesome
[15:51:05] ie http://atlas.media.mit.edu/explore/tree_map/hs/export/esp/all/show/2012/
[15:51:09] products exported by spain
[15:51:17] wow
[15:51:24] only the name is awesome!
[15:51:31] *just the name that is
[15:52:14] it is under CC-BY-SA, maybe the software behind it is under some open source license
[15:53:02] they have some per-country dashboards http://atlas.media.mit.edu/profile/country/esp/
[15:53:47] it is super awesome i have to say
[15:53:50] https://github.com/alexandersimoes/oec
[15:53:52] python based
[15:54:08] I really love the design
[15:54:13] makes it super easy to select data
[15:54:21] and lets you get clues from the datasets
[15:55:15] looking at the team, one of the members is a coauthor of http://d3plus.org/
[15:55:19] which is amazing as well
[15:56:14] you know, the install is also real nice
[15:56:20] not only the viz
[15:56:35] I can't believe you are already installing it
[15:56:43] brb
[15:57:05] no, i just looked it up on the github page
[16:06:47] happy you like the mit atlas :)
[16:07:03] I am rushing out, time to pack luggage and take some vacation
[16:07:06] see you in a week
[16:46:43] ottomata: Would you mind checking on whether my SSH key is still authorised on stat1001? I can't seem to SSH there.
[16:46:52] Or helping me diagnose, I dunno
[16:47:19] stat1001?
[16:47:27] did you ever have access there?
[16:47:32] ...pretty sure?
[16:47:39] you sure you don't mean stat1002 or stat1003?
[16:47:42] I had stat1, and I thought it was moved over
[16:47:44] most people don't have access to stat1001
[16:47:47] stat1 -> stat1003
[16:47:47] ...weird
[16:47:49] Oh
[16:47:53] That might be why then.
[16:47:57] stat1003.wikimedia.org
[16:49:10] * marktraceur is on it
[17:00:41] OK next question
[17:00:54] Whose leg do I have to hump to get commonswiki replicated to s1?
[17:02:04] marktraceur: for statsy stuff?
[17:02:10] marktraceur: use analytics-store, has *all* the wikis
[17:02:32] Aha.
[17:45:49] (PS1) MarkTraceur: Add preference counts to the generated stats [analytics/multimedia] - https://gerrit.wikimedia.org/r/145595
[18:04:51] (PS1) MarkTraceur: Add graphs for opt-outs [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/145599
[18:56:13] (PS1) MarkTraceur: Use preference change events instead of our own [analytics/multimedia] - https://gerrit.wikimedia.org/r/145608
[21:34:40] (CR) MarkTraceur: [C: 2 V: 2] Clean up limn graphs [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/143872 (owner: Gilles)
[22:12:32] WOOT, 19 / 23 DataNodes in CDH5 online
[22:12:43] 644TB of space
[22:12:47] 3 of them are being weirdos
[22:12:52] gonna have to get chris to help on monday for those
[22:13:10] sorry, 19 / 22
[22:13:16] but woowooooo
[22:13:27] ok time for weekend, going to do hive and other stuff on monday too
[22:13:29] laters all!
[22:14:17] oh! that's awesome ottomata.
[22:14:21] have a good weekend!
[22:17:01] laters! you too!
[22:49:54] Ironholds, do we still keep 1:1000 request logs?
[22:50:02] because I need some request data
[22:50:15] nope. For the last 9 months every time hadoop went down I just took a marker pen to the graph and guessed
[22:50:34] (yes, we have 1:1000 sampled logs and I have a whole shitton of methods oriented around making the data that comes out of them not blow chunks. Waddayaneed.)
[22:50:34] hokay... can I get you to run a hadoop query for me then?
[22:50:40] hadoop is down ;p
[22:50:45] hah
[22:50:59] I actually have a sampled dataset loaded and open now.
[22:51:04] so: well done ;p
[22:51:10] (god bless you, screen)
[22:51:33] I need the number of requests to https://en.wikipedia.org/w/index.php?title=Special:Book broken out by the value of the URL parameter bookcmd
[22:51:49] well; the number of requests to that page on all wikis
[22:52:09] but I only need the bookcmd independent variable
[22:52:27] over the time period of, say, a week
[22:53:43] * Ironholds headscratches
[22:53:45] come over to my cave?
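A rough sketch of the tally Ironholds is being asked for: scan a sampled request log for Special:Book hits and count them by bookcmd value. The line format assumed here (whitespace-delimited fields with the full request URL in one of them) is a guess at the 1:1000 log layout, and any totals would still need scaling by the sampling factor:

```python
# Hypothetical sketch: count Special:Book requests by their bookcmd
# URL parameter from a sampled request log. Assumes one request per
# line with the full URL in a whitespace-delimited field; the real
# 1:1000 log layout may differ.
import sys
from collections import Counter
from urllib.parse import urlparse, parse_qs

counts = Counter()
with open(sys.argv[1]) as log:
    for line in log:
        for field in line.split():
            if "title=Special:Book" in field:
                query = parse_qs(urlparse(field).query)
                # exact-match the title so Special:BookSources is skipped
                if query.get("title", [""])[0] == "Special:Book":
                    # requests with no bookcmd are tallied under '(none)'
                    counts[query.get("bookcmd", ["(none)"])[0]] += 1
                break

for cmd, n in counts.most_common():
    print(f"{cmd}\t{n}")
```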