[02:05:06] (PS5) Terrrydactyl: [WIP] Add ability to global query a user's wikis [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/129858 [02:05:15] (CR) jenkins-bot: [V: -1] [WIP] Add ability to global query a user's wikis [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/129858 (owner: Terrrydactyl) [03:45:44] (PS1) Yuvipanda: Add login link to landing page properly [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148594 [03:45:46] (PS1) Yuvipanda: Add user model and populate it after login [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148595 [03:46:15] (CR) Yuvipanda: [C: 2 V: 2] Add login link to landing page properly [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148594 (owner: Yuvipanda) [03:50:42] (PS1) Yuvipanda: Don't specifically handle /static [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148596 [04:06:01] (CR) Legoktm: Add user model and populate it after login (6 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148595 (owner: Yuvipanda) [04:06:34] (CR) Legoktm: [C: 2] ":D" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148596 (owner: Yuvipanda) [04:07:21] (PS2) Yuvipanda: Add user model and populate it after login [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148595 [04:07:23] (PS2) Yuvipanda: Don't specifically handle /static [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148596 [07:32:05] Analytics / EventLogging: Add index on event_action, event_isAnon and event_namespaceId to NavigationTiming tables - https://bugzilla.wikimedia.org/68396#c2 (Sean Pringle) The eventlogging slaves (analytics-store and s1-analytics-slave) already have indexes on (wiki, timestamp) for all tables, plus the... [07:35:36] Analytics / EventLogging: Add index on event_action, event_isAnon and event_namespaceId to NavigationTiming tables - https://bugzilla.wikimedia.org/68396#c3 (Tisza Gergő) Once a day per wiki. 
[07:37:06] Analytics / EventLogging: Add index on event_action, event_isAnon and event_namespaceId to NavigationTiming tables - https://bugzilla.wikimedia.org/68396#c4 (Tisza Gergő) event_action is probably not useful since we are filtering on view events, which are the majority, so it's unlikely that index would... [07:39:06] Analytics / EventLogging: Add index on event_type to MultimediaViewerDuration tables. - https://bugzilla.wikimedia.org/68397#c1 (Tisza Gergő) On second thought, not sure if this is worth it. We only have three types, with about the same frequency, so an index might not help much there. [08:03:27] (CR) Springle: [C: 1] "Performance isn't too bad using the existing (wiki, timestamp) indexes already on all eventlogging tables. We're adding a adding a coverin" [analytics/multimedia] - https://gerrit.wikimedia.org/r/148021 (owner: Gergő Tisza) [08:30:22] Analytics / EventLogging: UniversalLanguageSelector-tofu logging too much data - https://bugzilla.wikimedia.org/67463#c2 (nuria) We have dropped the large table UniversalLanguageSelector-tofu_7629564 and created a table on staging with data so the i18n team can do the needed research. Closing bug as I... [08:30:36] Analytics / EventLogging: UniversalLanguageSelector-tofu logging too much data - https://bugzilla.wikimedia.org/67463 (nuria) NEW>RESO/FIX [08:44:14] (CR) Nuria: Track loading time for MediaViewer and the file page (1 comment) [analytics/multimedia] - https://gerrit.wikimedia.org/r/148021 (owner: Gergő Tisza) [09:03:13] (CR) Nuria: [C: 2] "Thanks for taking care of this." 
[analytics/wikimetrics] - https://gerrit.wikimedia.org/r/148537 (https://bugzilla.wikimedia.org/68410) (owner: Milimetric) [09:03:22] (Merged) jenkins-bot: Fix bad dependency chains and imports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/148537 (https://bugzilla.wikimedia.org/68410) (owner: Milimetric) [09:08:35] (PS3) Yuvipanda: Add user model and populate it after login [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148595 [09:08:37] (PS3) Yuvipanda: Don't specifically handle /static [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148596 [09:09:05] (PS1) Yuvipanda: Fix accidental camels in Wikitech [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148617 [09:10:29] (CR) Legoktm: [C: 2 V: 2] Add user model and populate it after login [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148595 (owner: Yuvipanda) [09:10:49] (CR) Legoktm: [V: 2] Don't specifically handle /static [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148596 (owner: Yuvipanda) [09:11:09] (CR) Legoktm: [C: 2 V: 2] Fix accidental camels in Wikitech [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148617 (owner: Yuvipanda) [12:07:32] hi nuria, did you want to talk? [12:07:41] (about dashboards that is) [12:16:38] Analytics / Tech community metrics: Wrong data at "Update time for pending reviews waiting for reviewer in days" - https://bugzilla.wikimedia.org/68436 (Quim Gil) p:Unprio>High [12:16:39] Analytics / Tech community metrics: Wrong data at "Update time for pending reviews waiting for reviewer in days" - https://bugzilla.wikimedia.org/68436 (Quim Gil) NEW p:Unprio s:normal a:None Something has happened in the past days that has made the column "Update time for pending reviews w... 
[12:48:17] (PS1) QChris: Drop leading 0s for partitions in sample HiveQL [analytics/refinery] - https://gerrit.wikimedia.org/r/148647 [12:48:19] (PS1) QChris: Switch to fully qualified table names [analytics/refinery] - https://gerrit.wikimedia.org/r/148648 [12:48:21] (PS1) QChris: Allow to compute sequence stats more than once [analytics/refinery] - https://gerrit.wikimedia.org/r/148649 [12:48:23] (PS1) QChris: Add pipeline for basic verification of webrequest logs [analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) [13:11:28] (CR) QChris: "I know that this is not perfect and needs some discussion on" [analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) (owner: QChris) [13:25:53] (CR) Ottomata: [C: 2 V: 2] Drop leading 0s for partitions in sample HiveQL [analytics/refinery] - https://gerrit.wikimedia.org/r/148647 (owner: QChris) [13:26:35] morning qchris! [13:26:39] Hi ottomata [13:26:57] oo, alter table doesn't work with fq table name? [13:27:14] Right. [13:27:25] There is a Hive ticket for it. [13:27:46] hm, ok, ah i see your regex there then [13:27:48] hmmm Ok [13:28:02] ok, i'm not 100% on using FQ for the create table [13:28:07] but, i suppose, since the location is hardcoded there [13:28:08] we might as well [13:28:36] Yes, I know. That's one thing we should discuss. [13:28:38] if location wasn't hardcoded, i probably wouldn't want the database name in the table schema file, as the file could be used to create that table in any database, which is good for testing [13:28:58] Currently, I added it mostly for consistency. [13:29:03] but, ha, in our case LOCATION doesn't even really do anything does it [13:29:04] ? [13:29:14] what happens if we didn't even specify the location in the create statement? [13:29:26] That's true. LOCATION doesn't do anything for us. [13:29:40] Nothing happens if we drop it. [13:30:00] Recall the CR on the webrequest create HiveQL file?
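The regex workaround mentioned above (Hive at the time rejected database-qualified names in ALTER TABLE, hence the regex in the patch) can be sketched as a small helper. The function and table names below are illustrative, not from the actual change:

```python
import re

def unqualify(table_name):
    """Strip a leading "database." prefix from a table name.

    Older Hive releases reject fully qualified names in ALTER TABLE,
    so one has to USE the database and pass the bare table name.
    """
    return re.sub(r'^\w+\.', '', table_name)

# Illustrative names; the raw refinery tables live in the wmf_raw database.
print(unqualify('wmf_raw.webrequest'))  # webrequest
print(unqualify('webrequest'))          # webrequest (unchanged)
```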
[13:30:20] If you prefer we drop it ... let's drop it. [13:31:25] haha, but if we drop it, then we have to argue about whether or not we should include the database name, right? [13:31:35] unless you are un-opinionated about that [13:32:05] I do not follow ... database name and location settings are unrelated to me. [13:32:13] I am about 60% towards not using database name in create table schema (so my feelings are not very strong) (much weaker than my feelings about file names in bin/ :p) [13:32:27] technically they are not related [13:32:35] If you argue for dropping the database name, let's drop it. [13:32:41] but, we are by convention keeping raw tables in /wmf/data/raw [13:32:45] hence the database name is wmf_raw [13:32:52] but really [13:32:56] the reason I would argue for leaving it out there [13:33:08] is so that anyone can just use that schema to create an external webrequest table [13:33:14] in their own databases, in labs, or wherever [13:33:44] especially for testing reasons. say you had one of your shell scripts to populate webrequest data [13:33:44] So we drop the hardcoded database name, but (for now) leave the hardcoded table name there. [13:34:00] or say we built something that created webrequest data fixtures, [13:34:08] would be nice to be able to just use this .hql file directly [13:34:11] without having to edit it [13:34:22] yes, haha, yeah we could make the same argument for the table name eh? [13:34:26] but i think that is less useful [13:34:32] mehh, ok, if you are easy, then i'm easy! [13:34:37] no database name, no location :) [13:34:45] Ok. [13:35:19] want me to just merge 148648? and fix that in another patch? [13:35:32] Whatever you prefer. [13:35:43] I can provide a new Patch Set in that change. [13:35:57] Let me do just that.
[13:37:45] (PS2) QChris: Allow to compute sequence stats more than once [analytics/refinery] - https://gerrit.wikimedia.org/r/148649 [13:37:47] (PS2) QChris: Switch to fully qualified table names [analytics/refinery] - https://gerrit.wikimedia.org/r/148648 [13:37:49] (PS2) QChris: Add pipeline for basic verification of webrequest logs [analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) [13:38:05] The dropping of location will get a separate unrelated change. [13:39:07] k [13:39:22] (CR) Ottomata: [C: 2 V: 2] Switch to fully qualified table names [analytics/refinery] - https://gerrit.wikimedia.org/r/148648 (owner: QChris) [13:40:30] iiintersting, about the partitions on the sequence table [13:40:38] those are going to be pretty small partitions, ja? [13:40:44] i guess it doesn't matter [13:40:46] (PS1) QChris: Drop LOCATION setting for creating webrequest table [analytics/refinery] - https://gerrit.wikimedia.org/r/148653 [13:40:58] Yes, they'd be pretty small. [13:41:12] (CR) Ottomata: [C: 2 V: 2] Allow to compute sequence stats more than once [analytics/refinery] - https://gerrit.wikimedia.org/r/148649 (owner: QChris) [13:41:25] YAY THE OOZIE PATCH [13:41:25] We could avoid partitions, and add timestamps /when/ the sequence stats got computed. [13:41:28] I'm so excited! [13:41:46] hm, and then keep the old ones around? [13:41:48] WITH bundle.xml !!!!!!1!!!11!!111! [13:41:50] i think I like this for now [13:41:51] oh boy [13:42:14] partitioning seemed like the cleanest approach (although the partitions are small) [13:49:54] qchris, nit: <> and != are equivalent in hql, right? [13:50:00] yes. [13:50:13] <> is the sql variant. [13:50:18] did I use != somewhere? [13:50:33] no, i prefer != ! :) but am not that opinionated about it [13:51:09] I do not care too much, but try to stick with standards where possible. [13:51:33] I'll add != to coding conventions and fix it in a separate commit. 
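The conventions settled on above — no hardcoded database name and no LOCATION in the create-table files, and '!=' preferred over '<>' — could be checked mechanically. A minimal sketch of such a lint (a hypothetical helper, not part of analytics/refinery):

```python
import re

def lint_hql(text):
    """Flag .hql content that violates the conventions discussed above."""
    problems = []
    # A database-qualified name in CREATE TABLE ties the file to one database.
    if re.search(r'CREATE\s+(EXTERNAL\s+)?TABLE\s+(IF\s+NOT\s+EXISTS\s+)?\w+\.',
                 text, re.I):
        problems.append('database name hardcoded in CREATE TABLE')
    # A hardcoded LOCATION prevents reuse of the file in labs/testing.
    if re.search(r'\bLOCATION\b', text, re.I):
        problems.append('LOCATION hardcoded')
    # Stick to the agreed inequality operator.
    if '<>' in text:
        problems.append('use != instead of <>')
    return problems

ddl = 'CREATE EXTERNAL TABLE IF NOT EXISTS webrequest (hostname STRING);'
print(lint_hql(ddl))  # []
```

Something like this could run over *.hql files in CI or a pre-commit hook.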
[13:54:27] (PS1) QChris: Prefer '!=' over '<>' in HiveQL [analytics/refinery] - https://gerrit.wikimedia.org/r/148655 [13:59:16] uffff, someone scheduled an elasticsearch meeting during standup! didn't realize it was now! [13:59:20] hm, ok I will miss standup today [13:59:23] i guess... [14:00:04] will email later [14:00:07] after this meeting [14:11:05] Analytics / Wikimetrics: forms module is poorly nested inside controllers - https://bugzilla.wikimedia.org/68410 (Dan Andreescu) PATC>RESO/FIX [14:21:38] Analytics / EventLogging: Delete tofu table from staging database after research is done - https://bugzilla.wikimedia.org/68441 (nuria) NEW p:Unprio s:normal a:None Delete tofu_selection table from staging database after research is done. [14:22:51] Analytics / EventLogging: Delete tofu table from staging database after research is done - https://bugzilla.wikimedia.org/68441#c1 (nuria) Once i18n team is done with their research please remember to drop the table from staging database. Assigning to amir as I believe he will be the person working o... [14:24:55] Analytics / EventLogging: Delete tofu table from staging database after research is done - https://bugzilla.wikimedia.org/68441 (nuria) a:Amir E. 
Aharoni [14:59:20] (PS2) QChris: Prefer '!=' over '<>' in HiveQL [analytics/refinery] - https://gerrit.wikimedia.org/r/148655 [15:01:36] Analytics / General/Unknown: ULSFO post-move verification - https://bugzilla.wikimedia.org/68199#c1 (nuria) Meeting notes on etherpad: http://etherpad.wikimedia.org/p/6gR8aSREkz [15:39:32] (PS1) QChris: Harmonize format of HiveQL to create tables [analytics/refinery] - https://gerrit.wikimedia.org/r/148684 [15:46:41] Analytics / Wikimetrics: headers for public files - https://bugzilla.wikimedia.org/68445 (nuria) NEW p:Unprio s:normal a:None The public report files on wikimetrics need to be served with the right set of headers for caching and CORS so the dashboard can use them [15:47:43] Analytics / Wikimetrics: Deploy newly register metric to production - https://bugzilla.wikimedia.org/68446 (nuria) NEW p:Unprio s:normal a:None Deploy newly register metric to production, populate for all projects. [15:48:54] Analytics / Wikimetrics: Deploy "rolling active editors" metric to production. - https://bugzilla.wikimedia.org/68447 (nuria) NEW p:Unprio s:normal a:None Deploy "rolling active editors" metric to production. Populate for all projects. [15:57:26] Analytics / Wikimetrics: Story: Mediawiki Dashboard Storage - https://bugzilla.wikimedia.org/68448 (nuria) NEW p:Unprio s:normal a:None We should make sure we can use mediawiki for storage of the dashboard meta-data with mediawiki team. Also, if possible we should be able to retrieve the... [16:02:31] nuria: I'm done with https://www.mediawiki.org/wiki/Analytics/Editor_Engagement_Vital_Signs/Dashboard [16:02:51] ok, [16:03:52] milimetric, just sent e-mail with items on critical path. feel free to add anything. 
[16:04:01] just read it, looks good [16:29:05] ok, milimetric, edited dashboard a bit but basically it's all there now [16:29:17] will send to list [16:29:37] cool [16:29:44] https://plus.google.com/hangouts/_/wikimedia.org/dashboarding is going on now [16:30:19] (PS2) QChris: Harmonize format of HiveQL to create tables [analytics/refinery] - https://gerrit.wikimedia.org/r/148684 [16:30:21] (PS1) QChris: Reindent HiveQL files [analytics/refinery] - https://gerrit.wikimedia.org/r/148696 [16:30:24] (PS1) QChris: Harmonize HiveQL documentation [analytics/refinery] - https://gerrit.wikimedia.org/r/148697 [16:31:00] (PS1) QChris: Decrease concurrency for webrequest log verification [analytics/refinery] - https://gerrit.wikimedia.org/r/148698 [16:31:02] (PS1) QChris: Increase timeout for webrequest log verification [analytics/refinery] - https://gerrit.wikimedia.org/r/148699 [16:31:04] (PS1) QChris: Decrease throttle for webrequest log verification [analytics/refinery] - https://gerrit.wikimedia.org/r/148700 [16:42:27] Analytics / General/Unknown: Inventory Analytics systems for operational support - https://bugzilla.wikimedia.org/68450#c1 (Toby Negrin) We need to understand our support needs so we can discuss resourcing with ops. [16:51:17] (PS1) QChris: Document XML tag placement convention for multi-line element values [analytics/refinery] - https://gerrit.wikimedia.org/r/148701 [17:14:50] darnit [17:15:02] where are the hive/hadoop home directories and libs on stat2 again? [17:22:16] hi qchris / Ironholds [17:22:21] I'm about to look at sampled logs [17:22:25] heya milimetric. [17:22:30] and you two have epic awesome ways to do that [17:22:37] milimetric, eh, I have okay ways of doing that. [17:22:49] so basically, I'm going to: [17:22:53] if you want to use R ;p. Otherwise whatever qchris uses is probably better. [17:23:13] analyze July 9th and July 10th, grouped by hour and data center (ULSFO and EQIAD only) [17:23:19] Meh. I just use some scripts. But they are ... 
bad scripts. [17:23:23] I have no problem with R [17:23:25] oh, that's doable. [17:23:30] I can probably write up a script to do it, actually ;p. [17:23:44] er... I don't want to make Toby kill me Ironholds [17:23:44] milimetric: I might have that data at hand ... let me check. [17:23:46] what's the purpose and what do you want grouped by it? [17:23:50] and why would toby kill you? [17:23:55] wasting your time :) [17:24:06] so, the purpose is this bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=68199 [17:24:25] specifically, checking to make sure the move of the Varnish caches in ULSFO didn't affect the data negatively [17:24:25] oh, gotcha. [17:24:34] that should be easy. [17:24:34] the move happened on July 9th [17:24:45] so I was going to take an hourly look at July 9th vs. July 8th or something [17:24:52] and group it by data center [17:24:53] * Ironholds nods. [17:25:03] that's because ULSFO traffic was diverted to EQIAD for the duration of the move [17:25:14] so total traffic should be pretty similar between the days [17:25:17] milimetric: on stats1002 /home/qchris/milimetric/data has relevant per day and per hour data. [17:25:29] but ULSFO should drop to 0 at some point and EQIAD should spike up the corresponding amount [17:25:33] damn, qchris is faster than me :D [17:26:01] I'll never understand you qchris, I can't even do it that fast in SQL... [17:26:03] Ironholds: but the format of that data is not too nice ... [17:26:21] Let me transpose the data for you. [17:26:59] * pizzzacat looks around for Pau [17:27:18] this is analytics. We hath not a pau. [17:27:20] hi pizzzacat, haven't seen him around here [17:27:39] (for people who don't know pizzzacat, it's sherah. She puts the raisin in fundraising.) [17:27:40] aww. ok [17:27:42] i know [17:27:46] milimetric: /home/qchris/milimetric/data/ulso-move [17:27:47] lol [17:27:55] I know, I just wanted to make the 'raisin in fundraising' joke. [17:27:56] milimetric: ^ transposed data. 
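The per-hour, per-datacenter tally milimetric describes can be sketched roughly as below. This assumes each sampled-log line starts with the cache hostname followed by a sequence number and an ISO timestamp, and that cp1xxx hosts are eqiad while cp4xxx hosts are ulsfo (the usual naming convention); the parsing would need adjusting to the real sampled-1000 format.

```python
from collections import Counter

def datacenter(host):
    # Hostname prefix -> datacenter mapping (assumed convention).
    if host.startswith('cp1'):
        return 'eqiad'
    if host.startswith('cp4'):
        return 'ulsfo'
    return 'other'

def tally(lines):
    """Count requests per (hour, datacenter) bucket."""
    counts = Counter()
    for line in lines:
        host, _, rest = line.split(' ', 2)  # hostname, sequence, timestamp...
        hour = rest[:13]                    # e.g. '2014-07-09T14'
        counts[(hour, datacenter(host))] += 1
    return counts

sample = [
    'cp1055.eqiad.wmnet 123 2014-07-09T14:00:01 ...',
    'cp4009.ulsfo.wmnet 456 2014-07-09T14:59:59 ...',
    'cp4009.ulsfo.wmnet 457 2014-07-09T15:00:02 ...',
]
print(tally(sample))
```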
[17:28:05] I came up with it for K4 yesterday and I've been dying to use it. [17:28:06] I am in charge of dried fruits around here [17:28:11] and don't you forget it [17:28:16] and, you know, K4 hits me more than pizzzacat does. [17:28:39] :) aw I would love to see that, I should visit soon [17:29:04] I thought the joke was "we put the 'draising in fundraising" [17:30:22] anyway. I guess I'll look somewhere else for that Pau [17:30:42] thanks for meeting today milimetric and nuria. [17:32:11] let us know if you think we can help you with anything pizzzacat [17:33:54] :) [17:58:04] qchris ...on the vagrant issues you were having last week [17:58:12] did you run into this one: Could not parse options: invalid option: --hiera_config=/tmp/vagrant-puppet-1/hiera.yaml [17:58:20] nope. [17:59:25] ah .. the joy [17:59:43] Can it be that your vagrant is too old? [18:00:03] (I had to update mine to make it work with the current mediawiki-vagrant) [18:00:14] (older versions of mediawiki-vagrant ran fine previously) [18:00:26] maybe .. will try that next [18:00:54] I'll boot the machine, so I can tell you a 'known to work' vagrant version number. [18:03:13] "vagrant --version" tells me "Vagrant 1.4.3" [18:03:32] My mediawiki-vagrant is at "db9297eb0d1b1809978cbe4fd83c5084e9920421" [18:03:43] (Which is from 2014-07-16) [18:12:18] ya, looks like I need a new db [18:24:21] today has been 100% meetings! [18:24:26] here comes anothaaaaaa [18:25:08] Yay Meetings :-D [19:18:24] milimetric: I'm thinking of just building this as a separate tool. I don't think there'll be too much technical repetition in the implementation, and the model doesn't fit exactly either [19:30:07] Hallo. [19:30:32] Is there an easy way to see the number of new articles created in a wiki each week? [19:31:03] I want to see whether the ContentTranslation extension has any effect on the rate of article creation. 
[19:31:42] I know that 26 new articles were created in the Catalan Wikipedia with the help of that extension, and that's a nice number, [19:31:57] but it would be even nicer if I could put it into context, [19:32:43] for example, to see how it compares to the number of articles that is created there during a usual week. [19:33:32] Ironholds, marktraceur , matanya , StevenW ^ [19:37:01] * marktraceur not sure why I'm involved [19:37:03] yes aharoni [19:37:14] just a sec [19:37:48] https://stats.wikimedia.org/EN/TablesArticlesNewPerDay.htm [19:37:52] aharoni: ^ [19:46:11] marktraceur: you are involved in a lot of things, thought you might know :) [19:46:26] Too many things, you might say :P [19:46:43] * marktraceur looks wistfully at his list of highlighted channels [19:52:48] aharoni: Yes [19:52:49] There are schemas for all wikis now, tracking article creations, deletions, and restorations [19:52:50] They don't go back more than a few months, but they're much easier to work with than reconstructing page histories from scratch [19:53:19] StevenW: something newer than https://stats.wikimedia.org/EN/TablesArticlesNewPerDay.htm ? [19:53:26] which matanya shared? [19:53:41] So the weird thing about using that [19:53:59] is that it depends on the whole "countable page" metric which Erik Z. thought up [19:54:06] So it's not really comprehensive [19:54:32] "countable page"? [19:55:14] it's a made-up sub-definition of page which (for example) excludes pages without incoming links for instance [19:55:23] so if I create orphan articles, not counted [19:55:50] it also only includes NS0, so if you want to track pages created in a different namespace and then moved, I'm not sure it gives you that [19:56:03] does it include the archive table? [19:57:14] But if you want precision, I would go with using the recent data from Schema:PageCreation, PageMove, PageDeletion, PageRestoration. 
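To put the 26 ContentTranslation articles into context, the weekly baseline could be computed once the creation timestamps are pulled from the EventLogging tables StevenW mentions (the exact table name is Schema:PageCreation plus a schema revision id, which varies by deployment). A sketch assuming MediaWiki-style 14-digit timestamps:

```python
from collections import Counter
from datetime import datetime

def per_week(timestamps):
    """Bucket 14-digit MediaWiki timestamps into (ISO year, ISO week) counts."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, '%Y%m%d%H%M%S')
        year, week, _ = dt.isocalendar()
        counts[(year, week)] += 1
    return counts

# Illustrative timestamps, not real PageCreation data.
print(per_week(['20140721120000', '20140722090000', '20140714080000']))
```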
[19:57:31] aha [19:57:40] ok, looking [20:12:01] StevenW: looks like PageCreation does include non-0 namespaces [20:12:09] Yes [20:12:22] I meant the tables on stats.wikimedia.org [20:12:24] which don't [20:12:29] AFAIK [20:12:35] oh [20:23:57] (CR) Ottomata: "This looks so good!" [analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) (owner: QChris) [20:24:13] (CR) Ottomata: "'generate_sequence_statistics.hql'?" [analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) (owner: QChris) [20:24:45] (CR) Ottomata: [C: 2 V: 2] Prefer '!=' over '<>' in HiveQL [analytics/refinery] - https://gerrit.wikimedia.org/r/148655 (owner: QChris) [20:25:33] (CR) Ottomata: [C: 2 V: 2] Harmonize format of HiveQL to create tables [analytics/refinery] - https://gerrit.wikimedia.org/r/148684 (owner: QChris) [20:26:05] (CR) Ottomata: [C: 2 V: 2] Decrease concurrency for webrequest log verification [analytics/refinery] - https://gerrit.wikimedia.org/r/148698 (owner: QChris) [20:27:56] (CR) Ottomata: [C: 2 V: 2] Increase timeout for webrequest log verification [analytics/refinery] - https://gerrit.wikimedia.org/r/148699 (owner: QChris) [20:29:59] (CR) Ottomata: Decrease throttle for webrequest log verification (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/148700 (owner: QChris) [20:32:46] ottomata1: zuul got broken you will have to CR+2 again :( [20:33:11] eh? 
[20:33:47] ottomata: I broke Zuul earlier [20:33:55] so the patches you sent above will not be merged [20:33:59] at least not by zuul :D [20:35:00] Oh, that's ok, they aren't in a hurry [20:35:04] and also still have a dependency that hasn't been merged [20:38:57] (CR) Ottomata: [C: 2 V: 2] Reindent HiveQL files [analytics/refinery] - https://gerrit.wikimedia.org/r/148696 (owner: QChris) [20:39:47] (CR) Ottomata: [C: 2 V: 2] Harmonize HiveQL documentation [analytics/refinery] - https://gerrit.wikimedia.org/r/148697 (owner: QChris) [20:43:07] Analytics / Visualization: EEVSUser selects time range - https://bugzilla.wikimedia.org/68470 (Kevin Leduc) NEW p:Unprio s:enhanc a:None as an analyst, I can set the start & end dates to visualize, so I focus the visualization on a narrow period of time. [20:43:44] (CR) QChris: "Sorry. Sure. I am fine with 'generate_sequence_statistics.hql'" [analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) (owner: QChris) [20:45:06] Analytics / Visualization: EEVS Release Candidate - https://bugzilla.wikimedia.org/68350 (Kevin Leduc) [20:45:07] Analytics / Visualization: EEVSUser selects time range - https://bugzilla.wikimedia.org/68470 (Kevin Leduc) [20:47:34] (CR) Ottomata: "Yup, no probs!"
[analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) (owner: QChris) [20:47:42] (CR) Ottomata: [C: 2 V: 2] Add pipeline for basic verification of webrequest logs [analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) (owner: QChris) [20:48:18] (PS2) Ottomata: Document XML tag placement convention for multi-line element values [analytics/refinery] - https://gerrit.wikimedia.org/r/148701 (owner: QChris) [20:48:23] (CR) Ottomata: [C: 2 V: 2] Document XML tag placement convention for multi-line element values [analytics/refinery] - https://gerrit.wikimedia.org/r/148701 (owner: QChris) [20:48:56] (PS2) Ottomata: Drop LOCATION setting for creating webrequest table [analytics/refinery] - https://gerrit.wikimedia.org/r/148653 (owner: QChris) [20:49:00] (CR) Ottomata: [C: 2 V: 2] Drop LOCATION setting for creating webrequest table [analytics/refinery] - https://gerrit.wikimedia.org/r/148653 (owner: QChris) [20:51:50] Analytics / General/Unknown: ULSFO post-move verification - https://bugzilla.wikimedia.org/68199#c2 (Dan Andreescu) * check with gage whether this work already took place Checked with gage, ops had not checked the network traffic in-depth during the switchover * check that during the switchover, host... [20:52:13] qchris: still around? [20:52:17] Yup. [20:52:25] Reading your comment on 148700. [20:52:53] oh aye, ok, ja, wanted to talk about dns change too [20:53:07] Ok. Then dns changes first. [20:53:35] ha ok, ok welp, so [20:53:36] Analytics / General/Unknown: ULSFO post-move verification - https://bugzilla.wikimedia.org/68199#c3 (Dan Andreescu) Created attachment 16018 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16018&action=edit the daily traffic to ulsfo and eqiad, by host, during the switchover [20:53:45] jobtracker isn't necessary, as it isn't a real YARN service [20:53:49] it doesn't exist on our cluster [20:53:52] Right. 
[20:53:55] Oozie just needs that set for some dumb reason [20:54:01] but, namenode is good [20:54:09] if we wanted the jobtracker equivalent, we would use 'resourcemanager' [20:54:14] Right. [20:54:19] but, namenode will work just fine i think for that url [20:54:22] they are just on different ports [20:54:23] Analytics / General/Unknown: ULSFO post-move verification - https://bugzilla.wikimedia.org/68199#c4 (Dan Andreescu) Created attachment 16019 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16019&action=edit the daily traffic to ulsfo and eqiad, by datacenter, during the switchover [20:54:23] Analytics / General/Unknown: ULSFO post-move verification - https://bugzilla.wikimedia.org/68199#c5 (Dan Andreescu) Created attachment 16020 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16020&action=edit the hourly traffic to ulsfo and eqiad, by host, during the switchover [20:54:38] unless we wanted to name-based proxy it or somethign [20:54:44] But since they are different concerns, I'd use different names. [20:54:51] Analytics / General/Unknown: ULSFO post-move verification - https://bugzilla.wikimedia.org/68199#c6 (Dan Andreescu) Created attachment 16021 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16021&action=edit the hourly traffic to ulsfo and eqiad, by datacenter, during the switchover [20:55:00] Also ... if we switch to mr1 (for whatever reason) having the [20:55:07] setup care for it, would not hurt. [20:55:14] ? 
[20:55:22] Analytics / General/Unknown: ULSFO post-move verification - https://bugzilla.wikimedia.org/68199#c8 (Dan Andreescu) Created attachment 16023 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16023&action=edit spreadsheet of hourly data for ulsfo and eqiad with totals and graph [20:55:22] Analytics / General/Unknown: ULSFO post-move verification - https://bugzilla.wikimedia.org/68199#c7 (Dan Andreescu) Created attachment 16022 --> https://bugzilla.wikimedia.org/attachment.cgi?id=16022&action=edit spreadsheet of daily data for ulsfo and eqiad with totals and graph [20:55:33] oh, like, not make the name depend on the service? [20:55:34] at all? [20:55:43] we could use [20:55:48] master.hadoop.analytics.eqiad.wmnet :p [20:56:03] or even [20:56:06] just hadoop.analytics.eqiad.wmnet [20:56:07] Mhmm. Not sure ... we're talking about the same thing? [20:56:08] is fine with me [20:56:21] since that is the main uri for the hadoop webservices [20:56:30] we certainly aren't going to make cnames for each datanode interface [20:56:35] eh? [20:56:41] No. Right. We won't. [20:56:53] But we have to pass a jobtracker. [20:57:03] Oozie's hive action needs it. [20:57:14] Currently we used analytics1010 and it worked. [20:57:38] I haven't tested if an empty/bogus value works too. [20:58:05] Do you know if empty/bogus works? [20:58:55] ERRRRRRR don't remember [20:58:56] i see [20:59:16] so, why not just use hadoop.analytics.eqiad.wmnet to mean the master hadoop node overall? [20:59:26] if we separate out the NameNode from other services, this will be a problem [20:59:33] Yup. [20:59:36] but we don't have any plans to do with that, and can deal with it if/when it happens [20:59:44] That's why I like namenode as separate cname. [21:00:08] ha, ok, i thought you were just doing this for nice UI reasons, so that we as humans would have a nice url [21:00:11] but you want this for oozie [21:00:23] Yes. Just for oozie. [21:00:43] hm, i dunno if this is worth it! 
it's easier to change oozie configs than to change dns [21:00:45] I would not want to resubmit jobs if service migrates to a different machine. [21:00:52] hmmm [21:00:56] resubmit jobs...right [21:00:57] m [21:00:58] hm [21:01:20] ok, i'm cool with namenode, but I feel weird about jobtracker [21:01:29] can we just set namenode, and point the oozie jobtracker property at namenode? [21:01:36] We could. But [21:01:44] let me test a bit what jobtracker gets used for. [21:01:49] ok [21:02:56] port 8032 (which our property files use) is the resourcemanager. [21:03:13] So what about resourcemanager.analytics.eqiad.wmnet? [21:05:09] it's pointed at that? [21:05:13] Yes. [21:05:27] I mean ... that's also what the resourcemanager does. [21:05:36] It took part of what jobtracker did. Right? [21:05:47] (I mean mr1 vs. mr2) [21:06:24] Analytics / Visualization: EEVSUser selects Target Site breakdown - https://bugzilla.wikimedia.org/68473 (Kevin Leduc) NEW p:Unprio s:enhanc a:None Every project metric shows 3 new lines: - Desktop Site - Mobile Site - API [21:06:52] Analytics / Visualization: EEVS Release Candidate - https://bugzilla.wikimedia.org/68350 (Kevin Leduc) [21:06:52] Analytics / Visualization: EEVSUser selects Target Site breakdown - https://bugzilla.wikimedia.org/68473 (Kevin Leduc) [21:06:54] But I haven't tested the empty/bogus values. Sorry. Will do that now.
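For reference, this is roughly how the proposed CNAMEs would surface in an Oozie job.properties: the NameNode RPC endpoint (8020 is the Hadoop default port) and the YARN ResourceManager on port 8032, which Oozie still takes via its jobTracker property. The hostnames are the CNAMEs being discussed, not records that existed at the time, and the property names follow stock Oozie examples rather than any particular refinery file:

```python
# Hypothetical job.properties values matching the discussion above.
oozie_properties = {
    'nameNode':   'hdfs://namenode.analytics.eqiad.wmnet:8020',
    # Despite the mr1-era name, this points at the YARN ResourceManager.
    'jobTracker': 'resourcemanager.analytics.eqiad.wmnet:8032',
}

for key, value in sorted(oozie_properties.items()):
    print('{0}={1}'.format(key, value))
```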
[21:08:47] yes that's correct [21:11:22] YuviPanda: cool, good luck and feel free to reconsider anytime [21:12:23] Analytics / Wikimetrics: Story: WikimetricsUser generates report with Target Site breakdown - https://bugzilla.wikimedia.org/68475 (Kevin Leduc) NEW p:Unprio s:normal a:None for a chosen metric, add to the generated output 3 new numbers for: - desktop website - mobile website - API [21:12:52] Analytics / Visualization: EEVS Release Candidate - https://bugzilla.wikimedia.org/68350 (Kevin Leduc) [21:12:52] Analytics / Wikimetrics: Story: WikimetricsUser generates report with Target Site breakdown - https://bugzilla.wikimedia.org/68475 (Kevin Leduc) [21:14:22] milimetric: will do! Wikimetrics is awesome, btw :) [21:14:35] Analytics / Wikimetrics: Story: WikimetricsUser generates report with Target Site breakdown - https://bugzilla.wikimedia.org/68475 (Kevin Leduc) s:normal>enhanc [21:16:22] Analytics / Wikimetrics: Story: WikimetricsUser runs report against all wikis - https://bugzilla.wikimedia.org/68477 (Kevin Leduc) NEW p:Unprio s:enhanc a:None Get an aggregate of all wikis together. Numbers are dedublicated. Research wants to weigh in on best way to aggregate. 
[21:16:36] Analytics / Visualization: EEVS Release Candidate - https://bugzilla.wikimedia.org/68350 (Kevin Leduc) [21:16:36] Analytics / Wikimetrics: Story: WikimetricsUser runs report against all wikis - https://bugzilla.wikimedia.org/68477 (Kevin Leduc) [21:25:08] Analytics / Visualization: EEVSUser selects ALL wikis - https://bugzilla.wikimedia.org/68478 (Kevin Leduc) NEW p:Unprio s:enhanc a:None the same way a user can select a project, he can select "All" which will show the aggregate [21:25:38] Analytics / Wikimetrics: Story: WikimetricsUser runs report against all wikis - https://bugzilla.wikimedia.org/68477 (Kevin Leduc) [21:25:39] Analytics / Visualization: EEVSUser selects ALL wikis - https://bugzilla.wikimedia.org/68478 (Kevin Leduc) [21:25:39] Analytics / Visualization: EEVS Release Candidate - https://bugzilla.wikimedia.org/68350 (Kevin Leduc) [21:25:50] (PS1) Gergő Tisza: Use correct DB for optout queries [analytics/multimedia] - https://gerrit.wikimedia.org/r/148838 [21:41:59] (PS1) Gergő Tisza: Record optout stats for users with 1+ edits and 100+ edits [analytics/multimedia] - https://gerrit.wikimedia.org/r/148842 [21:49:39] (CR) QChris: Decrease throttle for webrequest log verification (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/148700 (owner: QChris) [21:49:44] (PS1) Gergő Tisza: Add generated files to .gitignore [analytics/multimedia] - https://gerrit.wikimedia.org/r/148845 [21:52:48] * YuviPanda pokes ottomata with https://gerrit.wikimedia.org/r/148847 [21:52:53] ottomata: lzia requested those [22:01:40] ottomata: Ever since, I have been having a hard time killing jobs in oozie. In my labs cluster, I am not authorized to do it: [22:01:42] http://dpaste.com/2HZ3D5J [22:01:58] I tried as root, oozie, hdfs, qchris. [22:02:07] But nothing helped. [22:02:21] How do you kill jobs?
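For reference, killing a workflow from the Oozie CLI usually looks like the sketch below; the job id and server URL are placeholders, not values from the linked paste. An "not authorized" error typically means the calling user fails Oozie's authorization/ACL checks, which shell-level root does not bypass since Oozie authorizes the propagated user name, not the Unix uid.

```
# Placeholder job id and Oozie server URL -- substitute real values.
oozie job -oozie http://localhost:11000/oozie -kill 0000123-140722010101010-oozie-oozi-W
```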
[22:22:53] ottomata: ^ [22:28:31] (PS1) QChris: Rename script to generate sequence statistics [analytics/refinery] - https://gerrit.wikimedia.org/r/148863 [22:29:09] (PS1) Yuvipanda: Switch to custom Models, stop using SQLAlchemy [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148864 [22:29:20] (CR) QChris: "> is it ok if I fix-up once the other parts are merged?" [analytics/refinery] - https://gerrit.wikimedia.org/r/148650 (https://bugzilla.wikimedia.org/67128) (owner: QChris) [22:30:41] (CR) Legoktm: "Looks fine, but I think using oursql is a better idea." (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148864 (owner: Yuvipanda) [22:30:56] legoktm: why? [22:31:04] it's sane [22:31:10] supports proper parameterization [22:31:17] better unicode support [22:31:25] allows you to use cursors with "with" [22:31:46] https://pythonhosted.org/oursql/ the list is the first thing in their documentation :P [22:32:00] legoktm: hmm, fine, I'll switch [22:32:17] only downside of oursql is that there is no debian package, but you weren't trying to get this into prod anyways [22:32:30] legoktm: yeah [22:32:36] legoktm: also it seems to be API compatible [22:33:50] for the most part, just switch the %s formatting in your queries to ? [22:34:04] and use with for cursors [22:34:40] legoktm: hmm, right [22:36:56] (PS2) Yuvipanda: Switch to custom Models, stop using SQLAlchemy [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148864 [22:36:58] legoktm: switched. that was painless [22:37:12] aww, this channel has +c set :( [22:37:58] legoktm: heh [22:37:59] legoktm: Yup. I think in black and white :-) [22:38:08] * YuviPanda gives qchris rose tinted glasses [22:39:02] * qchris takes YuviPanda's glasses and swaps them with his own. Thon now only sees rose tinted cloudy things. 
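The migration legoktm describes (qmark `?` placeholders instead of `%s` string formatting, plus context-managed transactions) can be sketched with the stdlib `sqlite3` module, which shares the `?` paramstyle oursql uses; the table and values here are invented for illustration, and `sqlite3` only stands in so the snippet is self-contained.

```python
import sqlite3

# Pattern under discussion: bind values with "?" placeholders (as in oursql)
# rather than interpolating them with %s, and use "with" for transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE query (id INTEGER PRIMARY KEY, title TEXT)")

# "with conn" commits on success and rolls back on an exception.
with conn:
    conn.execute("INSERT INTO query (title) VALUES (?)", ("first query",))

cur = conn.execute("SELECT title FROM query WHERE id = ?", (1,))
row = cur.fetchone()
```

oursql additionally lets cursors themselves be used as context managers, which is the "use with for cursors" part of the advice.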
[22:39:29] (CR) Legoktm: [C: -1] Switch to custom Models, stop using SQLAlchemy (2 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148864 (owner: Yuvipanda) [22:39:30] (PS3) Yuvipanda: Switch to custom Models, stop using SQLAlchemy [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148864 [22:39:59] it looks so weird! [22:40:02] legoktm: is that for autoclose behavior? [22:40:06] yes [22:40:08] also autocommit [22:40:25] legoktm: hmm, right [22:40:34] legoktm: will that put me in weirdass scoping territory? [22:40:37] one way to find out! [22:40:40] no [22:40:48] with doesn't change scope [22:41:06] legoktm: ah, cool [22:41:47] http://effbot.org/zone/python-with-statement.htm explains what it does [22:41:56] it's just shorthand for foo.__enter__ and foo.__exit__ [22:41:59] yeah, I know what it does... [22:42:04] ok :D [22:42:04] just have never used it before :D [22:42:27] (PS4) Yuvipanda: Switch to custom Models, stop using SQLAlchemy [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148864 [22:42:27] legoktm: updated [22:44:31] (CR) Legoktm: [C: 2 V: 2] Switch to custom Models, stop using SQLAlchemy [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148864 (owner: Yuvipanda) [22:44:39] legoktm: w00t [22:50:09] (PS2) Gergő Tisza: Record optout stats for users with 1+ edits and 100+ edits [analytics/multimedia] - https://gerrit.wikimedia.org/r/148842 [22:50:12] (PS2) Gergő Tisza: Add generated files to .gitignore [analytics/multimedia] - https://gerrit.wikimedia.org/r/148845 [22:51:43] (Abandoned) Gergő Tisza: Add generated files to .gitignore [analytics/multimedia] - https://gerrit.wikimedia.org/r/148845 (owner: Gergő Tisza) [22:52:05] Analytics / Visualization: Story: EEVSUser selects ALL wikis - https://bugzilla.wikimedia.org/68478 (Kevin Leduc) [22:52:35] Analytics / Visualization: Story: EEVSUser selects Target Site breakdown - https://bugzilla.wikimedia.org/68473 (Kevin Leduc) [22:52:50] Analytics / Visualization: Story: 
EEVSUser selects time range - https://bugzilla.wikimedia.org/68470 (Kevin Leduc) [22:53:11] legoktm: now I need to do a Query / QueryRevision model [22:53:48] the lack of colors is actually bugging me [22:58:07] Analytics / Wikimetrics: Spike: Mediawiki Dashboard Storage - https://bugzilla.wikimedia.org/68448 (Kevin Leduc) [22:58:50] Analytics / Wikimetrics: Spike: Mediawiki Dashboard Storage - https://bugzilla.wikimedia.org/68448 (Kevin Leduc) s:normal>enhanc [22:59:50] Analytics / Wikimetrics: Deploy newly registered metric to production - https://bugzilla.wikimedia.org/68446 (Kevin Leduc) s:normal>enhanc [23:00:07] Analytics / Wikimetrics: Deploy "rolling active editors" metric to production. - https://bugzilla.wikimedia.org/68447 (Kevin Leduc) s:normal>enhanc [23:00:23] Hey YuviPanda, are you coming to Wikimania? [23:00:29] halfak: indeed [23:00:30] :D [23:00:38] YuviPanda, yay! [23:00:41] I'll also be in the UK for 2 months afterwards [23:00:44] we have a research hackday. you should come to it. [23:00:49] oh yeah, definitely [23:00:51] where are you the week after WM? [23:00:58] I think I offered halfak toollabs help at the hackday if needed [23:01:05] Oh yes. [23:01:06] Ironholds: Glasgow. Leaving the day after WM [23:01:09] I remember that now. [23:01:19] :D [23:02:07] Analytics / Wikimetrics: EEVSUser downloads report with correct Http Headers - https://bugzilla.wikimedia.org/68445 (Kevin Leduc) [23:03:33] halfak: btw, I hope to have a working demo of Quarry by Wikimania [23:03:45] halfak: not perfect, but... something :) [23:03:46] You work very quickly [23:03:51] :) [23:04:11] I'm just trying to get the diff parsing server I've been working on for months up and running. [23:04:46] halfak: the things I work on are far easier :) [23:04:51] and far less novel [23:05:15] Public query interface is kinda novel :) [23:05:30] The space you are operating in is a whole field that hasn't been cracked open yet.
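legoktm's point earlier (around 22:40) that `with` merely drives `__enter__`/`__exit__` and introduces no new scope can be checked with a tiny context manager; the `Tracker` class is invented purely for illustration.

```python
# Minimal check that "with" does not create a new scope: names bound
# inside the block, and the manager itself, remain visible afterwards.
class Tracker:
    def __enter__(self):
        self.entered = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.exited = True
        return False  # don't swallow exceptions

with Tracker() as t:
    inside = "still visible later"
```

After the block exits, both `t` (with `entered` and `exited` set) and `inside` are still accessible, which is why switching existing code to `with` needs no variable restructuring.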
[23:05:41] :D [23:05:57] indeed, but *technically* it's rather trivial. just another CRUD app, almost :D [23:06:09] it'll take longer to figure out what to build than to actually build [23:06:46] It might take a little bit of creativity to observe the effects as well. [23:06:58] Luckily you've already decided to make actions publicly logged. [23:07:05] indeed [23:08:06] halfak: I'll also at some point need to figure out what I intend to accomplish with this, and how to measure it, and establish those before building any instrumentation so I don't randomly pick up numbers to justify whatever I want. [23:08:08] haven't even gone there yet [23:08:43] YuviPanda, seriously though. I think this is a bomb-ass project and I'm happy to run support. [23:09:12] halfak: \o/ indeed :D Getting actual researchers trying it out and telling me things would be the best help :D [23:09:25] considering all the researchers I know are in -research... [23:10:00] Also, I can help with measurement, ideas for what to expect as far as community dynamics, etc. [23:10:04] indeed [23:10:24] Also the network of other researchers and the research hackathons. might give you a boost of users. [23:10:30] indeed. [23:11:12] halfak: has anyone else done similar things like us? Open up a biggish research db to easy querying without much in the way of 'setup'? [23:42:25] (PS1) Yuvipanda: Put the .save method on the model itself [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148890 [23:43:43] (CR) Legoktm: [C: 2 V: 2] Put the .save method on the model itself [analytics/quarry/web] - https://gerrit.wikimedia.org/r/148890 (owner: Yuvipanda)
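The "Put the .save method on the model itself" change merged at the end can be sketched as follows; the `User` fields and table are invented for illustration (not Quarry's actual schema), and stdlib `sqlite3` stands in for the MySQL driver so the snippet runs standalone.

```python
import sqlite3

class User:
    """Hypothetical minimal model; fields are invented, not Quarry's schema."""

    def __init__(self, id=None, username=None):
        self.id = id
        self.username = username

    def save(self, conn):
        # The model persists itself instead of relying on an external
        # ORM/data-access layer; "?" placeholders as discussed above.
        with conn:  # transaction: commit on success
            if self.id is None:
                cur = conn.execute(
                    "INSERT INTO user (username) VALUES (?)", (self.username,))
                self.id = cur.lastrowid
            else:
                conn.execute("UPDATE user SET username = ? WHERE id = ?",
                             (self.username, self.id))

# Demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT)")
u = User(username="yuvipanda")
u.save(conn)
```

Calling `save()` a second time takes the UPDATE branch, since `id` is then set.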