[08:38:15] Hallo.
[08:38:47] http://stats.wikimedia.org/EN/TablesWikipediaHE.htm in the "Database records per namespace" section says "301 k" in articles.
[08:38:52] This doesn't seem to be useful.
[08:39:24] There are 160 k articles in the Hebrew Wikipedia. I suspect that it counts talk pages or redirects.
[08:39:44] Is there a better place to see the number of pages per namespace?
[08:40:18] I was actually interested in the number of templates, but now I'm not sure how to use this table.
[08:56:15] http://stats.wikimedia.org/EN/TablesWikipediaHE.htm#namespaces
[08:56:25] aharoni: ^ might be what you are looking for.
[08:56:44] Columns are different namespaces.
[08:57:07] qchris: that's precisely where I'm looking, and I don't see what I expect to see.
[08:57:18] There are no 301 k articles.
[08:57:28] There are 160 k.
[08:57:52] It either includes redirects or talk pages or both.
[08:58:20] These tables don't have columns for talk pages.
[08:58:25] 301k (Articles) - 31k (User) - 38k (Project) - 44k (Binaries) / 188k
[08:58:36] 301k (Articles) - 31k (User) - 38k (Project) - 44k (Binaries) = 188k
[08:59:30] 301 k articles is a pretty meaningless number for hewiki.
[08:59:53] As is "11.1 M" for enwiki - http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#namespaces .
[08:59:57] :-)
[09:00:21] But is the 22k for Templates the number you're looking for?
[09:01:04] Theoretically yes, but given that the number of articles in the same table is way off-mark, I cannot be sure that the number of templates is correct.
[09:01:33] The table at the top of that page says 161 k articles, which does make sense, but it doesn't have other namespaces.
[09:03:40] * qchris facepalms himself
[09:03:59] The equation that I gave above is wrong. sorry.
[09:07:29] aharoni: The 301k counts all pages in article namespace. But the 160k is only a subset of them.
[09:08:01] 160k is "pages in article namespace, that have at least one internal link"
[09:08:16] mmmmmmmmmmmm
[09:08:32] so does the English Wikipedia have 4 million articles or 11 million?
[09:08:50] That depends on your definition of "article" :-)
[09:09:00] There are many different definitions floating around.
[09:09:07] !
[09:09:08] Some want to count all pages.
[09:09:17] A page in the main space that is not a redirect.
[09:09:21] Is that naive?
[09:09:25] Some want to count only pages that meet certain other criteria
[09:09:42] Some defs considered a minimum length.
[09:09:58] Some defs considered at least some punctuation.
[09:10:05] Some defs considered at least an internal link.
[09:11:10] The E column on the first table of http://stats.wikimedia.org/EN/TablesWikipediaHE.htm
[09:11:37] (That's the column with 161k. It says "Articles that contain at least one internal link")
[09:12:32] This 161k does not count redirects.
[09:12:47] That is probably very similar to the number of non-redirect pages in the main space.
[09:13:00] It's hard to imagine a Hebrew Wikipedia page without any internal links.
[09:13:31] But is it so unusual to be curious about the simple number of pages per namespace?
[09:13:32] I do not know hewiki :-), but for other wikis, there are typically quite some pages without internal links.
[09:14:01] "pages per namespace" is http://stats.wikimedia.org/EN/TablesWikipediaHE.htm#namespaces
[09:14:46] does the 301 k there include redirects?
[09:15:35] I'd assume so, but it does not say. If you want to be sure, check with ezachte.
[09:16:24] I'll send him an email and CC you.
[09:17:28] there's {{PAGESINNS:0}}, which could be useful, but it's disabled.
[09:17:28] I could use something like {{NUMBEROFARTICLES}}, but for templates.
[09:18:36] thanks qchris
[09:21:18] Email sent.
[09:33:46] qchris: i don't think that I got it
[09:35:23] Mhmm.
[09:35:33] Oh. My bad.
[09:35:36] Retrying.
[09:36:52] I made a typo in your email address :-(
[09:37:21] np
[10:32:49] (PS3) QChris: Schedule all runs for recurrent reports through scheduler [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154800
[10:32:51] (PS3) QChris: Stop extraneous report runs for parent of recurrent reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154799
[10:32:53] (PS1) QChris: When creating fixtures, allow to pass revision_timestamp as datetime [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155001
[10:32:55] (PS1) QChris: When creating fixtures, allow to pass the wiki [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155002
[10:32:57] (PS1) QChris: Stop scheduling new recurrent runs if databases lag [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155003 (https://bugzilla.wikimedia.org/68507)
[10:32:59] (PS1) QChris: When testing, stub out replication lag checking [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155004
[10:33:03] (CR) jenkins-bot: [V: -1] Stop scheduling new recurrent runs if databases lag [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155003 (https://bugzilla.wikimedia.org/68507) (owner: QChris)
[10:33:05] (CR) jenkins-bot: [V: -1] When testing, stub out replication lag checking [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155004 (owner: QChris)
[10:41:52] (PS2) QChris: Stop scheduling new recurrent runs if databases lag [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155003 (https://bugzilla.wikimedia.org/68507)
[10:41:54] (PS2) QChris: When testing, stub out replication lag checking [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155004
[11:10:21] (CR) Milimetric: [C: 2] Stop extraneous report runs for parent of recurrent reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154799 (owner: QChris)
[11:14:36] (CR) Milimetric: [C: 2] Schedule all runs for recurrent reports through scheduler [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154800 (owner: QChris)
[11:15:14] (CR) Milimetric: [C: 2] When creating fixtures, allow to pass revision_timestamp as datetime [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155001 (owner: QChris)
[11:16:12] (CR) Milimetric: [C: 2] When creating fixtures, allow to pass the wiki [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155002 (owner: QChris)
[12:15:01] (PS9) Nuria: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[12:17:11] (CR) Nuria: "Need to make doc of test a little better." (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[12:18:39] (PS3) QChris: Stop scheduling new recurrent runs if databases lag [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155003 (https://bugzilla.wikimedia.org/68507)
[12:18:41] (PS3) QChris: When testing, stub out replication lag checking [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155004
[12:20:18] (CR) QChris: "PS2 and PS3 only differ in the factory test." [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155003 (https://bugzilla.wikimedia.org/68507) (owner: QChris)
[12:24:03] (CR) QChris: Stop scheduling new recurrent runs if databases lag (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155003 (https://bugzilla.wikimedia.org/68507) (owner: QChris)
[13:39:28] (PS1) QChris: Reuse created instance in factory for replag service [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155027
[13:39:30] (PS1) QChris: Cache replication lag status [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155028
[13:41:11] (CR) QChris: "I used this change during developing, and it might come handy, if" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155027 (owner: QChris)
[13:41:16] qchris: this is strange, but I was switching between branches and your fix to the pyc removal broke
[13:41:21] this didn't work: find . -name *.pyc | xargs --no-run-if-empty rm
[13:41:26] but this worked: find . -name *.pyc | xargs rm
[13:41:30] (Abandoned) QChris: Reuse created instance in factory for replag service [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155027 (owner: QChris)
[13:41:42] the pyc files were in tests/stubs/ (from your patch)
[13:42:22] Mhmm. Wfm :-/
[13:42:32] Since the find is in the current directory ...
[13:42:53] You were in the same directory between trying both variants?
[13:42:57] yes, root
[13:42:59] very strange
[13:43:17] Can you reproduce the issue?
[13:43:30] yes
[13:43:40] checkout your patch with the stubs directory
[13:43:43] run scripts/test
[13:43:47] checkout a different patch
[13:43:49] run scripts/test
[13:43:56] the pyc files stick around and aren't cleaned up
[13:44:18] OK. I'll try that. Let me just abandon the other parked code change.
[13:45:02] no problem, not urgent
[13:45:06] (CR) QChris: "I used this change during developing, and it might come handy, if" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155028 (owner: QChris)
[13:45:11] (Abandoned) QChris: Cache replication lag status [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155028 (owner: QChris)
[13:47:40] (PS10) Milimetric: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833)
[13:48:09] (CR) Milimetric: [C: 1] Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[13:48:18] (PS3) Milimetric: Disable pooling for mediawiki dbs [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154851 (https://bugzilla.wikimedia.org/68833)
[13:49:13] (PS3) Milimetric: Add an option to schedule reports for all cohorts [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/150440
[13:49:19] (PS2) Milimetric: Ignore private or invalid wikis [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154864
[13:55:01] milimetric: I thought the quotes around the find glob were dropped by IRC ... but there are no quotes around the find glob :-D
[13:55:12] So the first run cleans up the current directory.
[13:55:30] And the second run of the script would clean up remaining pyc files.
[13:55:44] Did this ever work as expected?
[13:55:53] so it should be: find . -name "*.pyc" | xargs --no-run-if-empty rm ?
[13:55:58] Right.
[13:56:14] well, yeah, without the --no-run-if-empty param, it used to work fine
[13:56:22] and then you added that and I was like - that seems perfectly sensible
[13:56:32] but it has this weird side effect. K, it's my fault so I'll fix it
[13:58:37] (PS1) Milimetric: Fix pyc cleanup in testing script [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155032
[13:58:56] (PS1) QChris: Make the test script clean up all pyc files even if there are some in cwd [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155033
[13:59:08] :)
[13:59:16] :-D
[13:59:56] (Abandoned) QChris: Make the test script clean up all pyc files even if there are some in cwd [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155033 (owner: QChris)
[14:01:00] (Restored) QChris: Make the test script clean up all pyc files even if there are some in cwd [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155033 (owner: QChris)
[14:01:15] (CR) Milimetric: [C: 2] Make the test script clean up all pyc files even if there are some in cwd [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155033 (owner: QChris)
[14:01:17] ottomata: yay for upgrading stat machines :)
[14:01:23] (Merged) jenkins-bot: Make the test script clean up all pyc files even if there are some in cwd [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155033 (owner: QChris)
[14:01:26] just one!
[14:01:27] :)
[14:01:43] (Abandoned) Milimetric: Fix pyc cleanup in testing script [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/155032 (owner: Milimetric)
[14:03:19] ottomata: for now :)
[14:34:57] Analytics / General/Unknown: Packet loss alarm on oxygen on 2014-08-16 - https://bugzilla.wikimedia.org/69663#c2 (nuria) RESO/FIX>REOP Reopening as it looks there were other intervals affected.
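Editor's note on the pyc cleanup exchange (13:41 to 13:56): the quoting problem can be reproduced in isolation. What follows is a minimal sketch, assuming a POSIX shell and a throwaway directory created with mktemp. The file names top.pyc and nested.pyc and the scratch layout are hypothetical stand-ins, not the actual wikimetrics tree, and the sketch only counts matches rather than deleting anything.

```shell
#!/bin/sh
# Sketch only: shows why the unquoted glob in `find . -name *.pyc` misses files.
# The directory layout below is a hypothetical scratch tree, not the real
# wikimetrics checkout.
set -eu

work=$(mktemp -d)
mkdir -p "$work/tests/stubs"
touch "$work/top.pyc" "$work/tests/stubs/nested.pyc"
cd "$work"

# Unquoted: the shell expands *.pyc against the current directory first,
# so find is really asked for files named exactly "top.pyc".
unquoted_count=$(find . -name *.pyc | wc -l | tr -d ' ')

# Quoted: find receives the pattern itself and applies it at every depth.
quoted_count=$(find . -name "*.pyc" | wc -l | tr -d ' ')

echo "unquoted glob matched: $unquoted_count"
echo "quoted glob matched:   $quoted_count"

# Step out of the scratch tree and remove it.
cd /
rm -rf "$work"
```

With exactly one .pyc file in the current directory, the unquoted form matches 1 file and the quoted form matches 2. That fits the explanation given at 13:55: the first run only sees what is in the current directory, and a later run, once nothing there matches the glob any more, passes the pattern through to find unchanged.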
[14:53:40] (PS11) Nuria: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[14:53:47] (CR) jenkins-bot: [V: -1] Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[15:03:48] ottomata: staff started for Realz :)
[15:17:36] qchris: i have updated the mobile and zero streams with occurrence of bug 69663
[15:17:50] cc qchris_meeting
[15:25:36] nuria: Great! Thanks.
[15:37:30] Analytics / Wikistats: add talk pages count to project statistics - https://bugzilla.wikimedia.org/69742 (Amir E. Aharoni) NEW p:Unprio s:normal a:None The talk pages count is not shown in the project statistics. It should be shown.
[15:41:08] kevinator / tnegrin: 1404 reports ran last night in 23 minutes
[15:41:25] wow!
[15:41:30] it's crazy - whatever sean and them did on labsdb has improved the performance by an incredible amount
[15:41:38] flash
[15:41:40] rolling active editor is running in like sub-second times mostly
[15:41:43] yeah, seriously
[15:41:52] still a very very bad query :)
[15:41:55] let him know -- that's f'ing great
[15:42:01] but we can live with these numbers
[15:42:08] do we even have benchmarks to quantify the increase in performance?
[15:42:11] meh -- through hardware at it is a time honored strategy
[15:42:16] throw
[15:42:47] sounds like 100 times faster, may be 1000
[15:43:08] that means we can backfill like 10-20 days every day pretty comfortably, so we'd catch up in about half a year, but we can make it many orders of magnitude better with a better intermediary table, connection pooling, etc.
[15:43:35] well, so it used to run in 2-7 minutes, now it runs in 1-3 seconds
[15:43:43] halfak, Ironholds, YuviPanda, leila: stat1003 upgrade done! UGOTYOPACKAGES
[15:43:51] so 120 - 140 times faster
[15:43:51] that was very easy!
[15:44:00] Analytics / Wikistats: for all namespaces show page count without redirects - https://bugzilla.wikimedia.org/69743 (Amir E. Aharoni) NEW p:Unprio s:normal a:None In the "Database records per namespace" section of project statistics all the pages counts include redirects. This may be useful...
[15:44:08] you made some people super happy ottomata. thanks so much!
[15:44:27] yay! happiness
[15:45:57] Analytics / Tech community metrics: Duplicate item in list on korma.wmflabs.org/browser/bugzilla_response_time.html - https://bugzilla.wikimedia.org/61860#c5 (Santiago Dueñas) Opsss, this is already solved. You can close it.
[15:46:44] aw, premature happy on my run-times
[15:46:49] enwiki still took 400 seconds
[15:46:58] tnegrin / kevinator ^
[15:47:19] aww
[15:47:30] but still - no errors, no hanging connections, no mystery - just chugging away
[15:47:41] I went looking for more happy news (they say it comes in threes)
[15:47:43] and 23 minutes per day of backfilling means we can catch up in at worst 6 months
[15:47:59] yeah, it’s still a huge improvement
[15:48:06] that's great -- you might tell sean about the en-wiki situation
[15:48:18] i remember some of these were not upgraded yet
[15:48:19] there might be some addtl optimization
[15:48:29] there is, he showed us some tricks
[15:48:35] ottomata: \o/ cool
[15:48:38] we know we can get another order of magnitude or so
[15:48:42] IIRC, flash helps with seeks not with reads
[15:48:50] so as the amount of data increases, the benefits go down
[15:49:27] yep, exactly. Ultimately, it'd be great to throw some of this data into our ETL and have it pre-crunch a table like Aaron and I made
[15:49:36] then we hook up quarry to that and - watch out
[15:49:45] the whole world's gonna be interested in studying that cluster
[15:50:02] (PS12) Nuria: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[15:50:14] (CR) jenkins-bot: [V: -1] Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[16:00:46] Analytics / Wikistats: for all namespaces show page count without redirects - https://bugzilla.wikimedia.org/69743#c1 (Erik Zachte) I wasn't even aware that redirects also occur in other namespaces than 0, but of course why wouldn't they? So yes this would be new metric, but not a trivial change (the...
[16:13:08] (PS13) Milimetric: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833)
[16:13:19] (PS14) Milimetric: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833)
[16:13:57] Analytics / Tech community metrics: Bugzilla ticket with recent comments listed under "Longest time without comment" on bugzilla_response_time.html - https://bugzilla.wikimedia.org/64373#c11 (Santiago Dueñas) The problem here is we only track those issues from the list of products and components (key p...
[16:25:43] milimetric: looks like this bug was fixed a while back, but it was only closed recently. Is it worthy of mentioning this in the showcase? https://bugzilla.wikimedia.org/show_bug.cgi?id=67030
[16:27:58] sure kevinator, otherwise we'd never get credit :)
[16:29:55] done, they are in the spreadsheet
[16:30:02] err, the slide deck
[16:33:54] (CR) Nuria: [C: 2] Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[16:34:01] (Merged) jenkins-bot: Ensure database sessions are always cleaned up [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/153616 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[16:34:13] (PS4) Milimetric: Disable pooling for mediawiki dbs [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154851 (https://bugzilla.wikimedia.org/68833)
[16:35:12] Analytics / Wikimetrics: session management - https://bugzilla.wikimedia.org/68833 (Dan Andreescu) PATC>RESO/FIX
[17:30:23] milimetric: nuria: is there anything else to add / edit for the showcase slide deck?
[17:31:07] Also, which one of you will talk in the showcase section?
[17:35:20] I concede to milimetric as he did most of teh work on sessions
[17:35:23] *the
[17:35:48] i've removed all bullet points from showcase presentation
[17:36:15] ahem ...
[17:36:15] sorry if that was too drastic
[17:36:18] but i am kind of allergic to bullet points
[17:49:14] yeah, it looks nice without bullet points.
[18:18:17] (CR) Nuria: [V: 2] Disable pooling for mediawiki dbs [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/154851 (https://bugzilla.wikimedia.org/68833) (owner: Milimetric)
[18:32:45] Hey guys. Sorry to be late for the showcase. I was trying to connect the whole time and that was the first time my signal was good enough.
[18:32:52] I caught the tail end of the centralauth discussion. Was that related to de-duping editors for metrics that are cross-wiki?
[18:34:55] DarTar, ^
[18:35:15] halfak: yep
[18:36:00] basically jaime has a request for centralauth support in Wikimetrics but I suspect that unless the issue with non-unified accounts among new users is sorted out this is not going to produce super-useful data
[18:36:10] I wonder if we can team up with whoever is working with centralauth to build some good recommendations for account unification from our deduping explorations.
[18:36:24] Oh yeah. Good point.
[18:36:37] That should be sorted immediately though, right? That seems like a nasty bug.
[18:41:54] halfak: I haven’t looked into this recently, but I’ll try and find the relevant BZ tickets, I was not under the impression that it had been fixed
[18:42:24] No worries I can search around.
[18:49:41] halfak: all BZ tickets are referenced from this one: https://bugzilla.wikimedia.org/show_bug.cgi?id=66101
[18:49:47] kevinator: ^^
[18:50:21] (re: the issue with non-unified new registrations, i.e. new records that don’t exist in centralauth)
[18:54:12] Analytics / EventLogging: Multiple user_ids per username in account creation events from ServerSideAccountCreation log - https://bugzilla.wikimedia.org/66101#c12 (Aaron Halfaker) You probably shouldn't be able to register an account name if someone already has a global account with that name. You shou...
[19:11:13] qchris_meeting: I'm playing around with HIVE now, hopefully am not breaking anything
[19:13:51] can we please stop CAPITALISING the words HIVE and HADOOP?
[19:14:08] It makes me WORRY I work with TEN YEAR OLDS or have ended up in SOME WEIRD ESPERANSO NIGHTMARE
[19:14:35] to be fair brion lives in that nightmare so it has its plus points
[19:19:51] Ironholds: OK, buT I lOvE aCROnYMs
[19:19:57] Ironholds: also this is how us young kids talk
[19:20:06] dnt cr f t bthrs u
[19:22:59] my toes just curled, thanks
[19:32:29] YuviPanda: play away
[19:32:31] !
[19:33:03] ottomata: :)
[19:33:16] ottomata: I suppose I don't have any permissions that'll let me fuck things up, right?
[19:33:53] ha, you probably do
[19:36:07] ottomata: oh, arr. I'll keep myself down to SELECTs then
[19:36:33] yuvi, you can/should create a database as your username
[19:36:38] and create whatever tables you like in it
[19:36:48] select from wmf_raw.webrequest insert into yuvi.woohoo
[19:36:52] whatever you want :)
[19:36:55] (within reason!)
[20:23:08] ottomata: aaah, cool
[22:09:27] Analytics / Tech community metrics: Duplicate item in list on korma.wmflabs.org/browser/bugzilla_response_time.html - https://bugzilla.wikimedia.org/61860 (Andre Klapper) ASSI>RESO/FIX
[22:56:27] Analytics / Tech community metrics: Bugzilla ticket with recent comments listed under "Longest time without comment" on bugzilla_response_time.html - https://bugzilla.wikimedia.org/64373#c12 (Andre Klapper) Disclaimer: I'm rather clueless & it's late, so this might be naive: (In reply to Santiago Dueñ...