[00:05:51] well it's working Platonides, i've been cleaning up and copyediting the en lang files and git is not exploding :P
[00:34:25] where can i find MediaWiki:Apihelp-bs-responsibleeditorspages-store-description in the core files?
[00:36:00] where can i find MediaWiki:Apihelp-bs-responsibleeditorspages-store-description in the core files?
[00:37:33] that doesn't sound like a core message to me
[00:38:45] where could i find that?
[00:45:36] bs = bluespice?
[00:46:03] Yup
[00:46:10] extensions/BlueSpiceExtensions/ResponsibleEditors/i18n/en.json
[02:16:48] When applying an upgrade to a MW site, is it common to take the web server offline?
[04:10:02] is there a mediawiki docker image?
[04:10:22] ideally I download a script of some form, put it on a machine with lots of disk space, run the script, wait a few hours, and come back to a local copy of mediawiki
[04:10:34] (I need to query the site a lot, and I don't want to hit main wikipedia -- I'd rather hit a local cache)
[04:11:41] abcdefg_, sounds like what you want is a wikipedia db dump
[04:12:04] Krenair: I tried that already. We have an XY problem, let me take a step back.
[04:12:22] so I have downloaded the wikipedia db dump; I have parsed the XML, but there's still some mediawiki-specific markdown in the text.
[04:12:33] My original problem = I want all of wikipedia, in raw text format, with no markup at all.
[04:12:54] you want the html it gets parsed to by mediawiki?
[04:12:56] What I currently want to try: build a local install of mediawiki with the en dump, run w3m against every page [local copy so I don't DoS wikipedia.org]
[04:13:10] I want the HTML I see when I visit wikipedia.org
[04:13:20] yeah, so I need a function with type signature "markdown -> HTML"
[04:13:30] well you'll need all the parser extensions etc. that are installed on wikipedia, as well as all templates and modules
[04:13:39] scribunto, parserfunctions, etc.
[04:14:01] would something like https://www.mediawiki.org/wiki/MediaWiki-Vagrant handle it all?
[04:14:07] I wonder if vagrant would give you -
[04:14:07] yeah
[04:14:14] this looks like I download this script, run it, and it builds a VM with the wikipedia.org en edition inside it
[04:14:29] not exactly
[04:14:57] it'll give you a blank install, you can probably import a database dump from wikipedia
[04:15:28] okay, I can view the entire VM as a function with type signature "database dump -> (article name = string) -> HTML"
[04:15:39] so I'll have to throw in the db dump, but then I can start querying it
[04:16:12] any other advice?
[04:16:21] (it seems a surprisingly convoluted way to just extract plain text)
[04:16:55] <[UAwiki]> GO TO HELL ABCDEFG_
[04:17:08] <[UAwiki]> ROT IN HELL ABCDEFG_
[04:17:48] lol, was not aware I was talking to an op
[04:19:08] our op status is not a big deal
[04:20:06] abcdefg_, yeah, I guess ideally there'd be dumps containing HTML instead of the original wikitext source
[04:20:30] part of the philosophy of the freenode network is not to stay opped unless you are using it, so there are a few of us in the channel
[04:21:52] so the 'real' XY problem is that I need a training set for Word2Vec
[04:22:08] and it seemed like wikipedia would be a nice training set (but the markup screws things up, so I need plain text)
[04:23:36] wait wait
[04:23:41] ubuntu has mediawiki packages
[04:23:48] I should be able to just apt-get those and drop in a db dump?
[04:23:57] what version is the ubuntu package?
[04:24:19] Krenair: We used to have HTML dumps.
[04:24:25] But they broke and nobody wants to fix them.
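For reference, a minimal sketch (not from the conversation) of the dump-parsing step abcdefg_ describes: streaming titles and wikitext out of a pages-articles XML dump without loading the whole thing into memory. The dump filename and the bz2 compression are assumptions about which dump was downloaded.

    import bz2
    import xml.etree.ElementTree as ET

    DUMP = "enwiki-latest-pages-articles.xml.bz2"  # hypothetical filename

    def iter_pages(path):
        """Yield (title, wikitext) for every <page> in a pages-articles dump."""
        with bz2.open(path, "rb") as f:
            for _event, elem in ET.iterparse(f, events=("end",)):
                if elem.tag.endswith("}page"):
                    ns = elem.tag[:elem.tag.index("}") + 1]  # the export XML namespace prefix
                    title = elem.findtext(ns + "title")
                    text = elem.findtext(ns + "revision/" + ns + "text") or ""
                    yield title, text
                    elem.clear()  # release the finished page element

    if __name__ == "__main__":
        for title, wikitext in iter_pages(DUMP):
            print(title, len(wikitext))
            break  # just peek at the first page

What comes out of this is still wikitext, which is why the conversation turns to rendering it to HTML before stripping it down to plain text.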
[04:24:32] is there a task?
[04:24:34] Scraping the live site is easy enough to get HTML quickly.
[04:24:48] Debra, yeah but that's not great in cases like abcdefg_'s :(
[04:24:57] https://phabricator.wikimedia.org/T17017
[04:25:00] also doesn't seem very polite :-)
[04:25:29] As long as you use canonical URLs, it's fine.
[04:25:36] Wikipedia can handle the traffic.
[04:28:50] Debra: Are you the right person to ask about this https://github.com/mzmcbride/database-reports/blob/master/reports/enwiki/unusedtemplates.py SQL query?
[04:29:17] I'm trying to fathom what it does and make it usable.
[04:30:27] Krenair: I have 1.1.19 installed (mediawiki)
[04:30:32] Debra, so they got a wikimedia server with 12TB of space assigned last year, and... haven't implemented the service?
[04:31:23] 1.1.19? not 1.19?
[04:31:26] abcdefg_: Probably not 1.1.19.
[04:31:43] you're right, I misread
[04:31:45] Krenair: NFI. That task is a mess.
[04:31:48] 1.19 is old too.
[04:31:50] it's Version: 1:1.19.20+dfsg-2.3
[04:32:23] Debra: Can't I take an older db dump? I don't need the latest, I'm happy with dumps from 2-3 years ago.
[04:32:26] will Scribunto run on 1.19?
[04:32:44] abcdefg_, looks like we're talking more like 8 years ago
[04:33:02] I may have missed what you're trying to do.
[04:33:09] task created 3/8/2008
[04:33:15] <[Alvaromolina]> ABCDEF_ YOU ARE A PIECE OF SHIT.. ROT IN HELL
[04:33:21] I'm looking for training data for word2vec (a machine learning algorithm)
[04:33:33] for this algorithm, I need massive amounts of English text data (since it's based on what words appear close to other words)
[04:33:50] So I want to take a wiki dump, strip out the xml (which I can already do) and then strip out the markdown and get plain text.
[04:34:21] Wikipedia text is repetitive in weird ways.
[04:34:25] It's not really representative.
[04:34:31] But you could use Parsoid, maybe.
[04:34:40] Parsing current wikitext with MediaWiki 1.19 would be awful.
[04:34:50] Parsing wikitext is a pain in the ass with modern software.
[04:35:14] https://en.wikipedia.org/wiki/Cheese?action=render
[04:35:21] https://en.wikipedia.org/wiki/Cheese?printable=yes
[04:35:31] Could just scrape those, but that would be slower.
[04:35:58] How would I get a list of article titles? Would I just random-walk the site?
[04:36:13] You can use api.php.
[04:36:35] https://de.wikipedia.org/w/api.php?action=help&modules=query%2Ballpages
[04:36:40] https://en.wikipedia.org/w/api.php?action=help&modules=query%2Ballpages
[04:37:54] okay; in this case I should just scrape instead
[04:37:56] thanks for your insights
[04:37:59] You can also use api.php to get the HTML.
[04:38:03] But that's less cached.
[04:38:06] And maybe a bit slower.
[04:40:59] quick question: is the ZIM format (http://www.kiwix.org/wiki/Main_Page) any better?
[04:41:27] I've never used it.
[04:41:33] But Kiwix seems moderately successful.
[04:41:41] Hi Niharika.
[04:41:49] Niharika: I updated that report today. Someone was asking about it.
[04:43:28] Which part of the query is confusing?
[04:44:28] Debra: is this directed at me or Niharika?
[04:45:16] Niharika.
[04:46:36] Debra: I'm confused about toolserver.namespace. Does that table still exist or was it abandoned when we switched from toolserver?
[04:47:15] Debra: Yep, someone was asking about it and I was planning to put it on the bot.
[04:47:29] So this query still works? I can put it in as-is.
[04:49:45] Niharika: That query probably mostly works.
[04:49:49] You can just remove the ns_name stuff.
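Below is a rough sketch of what removing the ns_name / toolserver.namespace dependency amounts to, in line with what Debra suggests next: filter on page_namespace = 10 (the Template namespace) directly. This is not the exact query from the database-reports repo, and the replica host name and credentials path are assumptions about the Tool Labs setup of the time.

    import os
    import pymysql

    # Templates that exist, are not redirects, and are transcluded nowhere.
    QUERY = """
    SELECT page_title
    FROM page
    LEFT JOIN templatelinks
           ON page_namespace = tl_namespace
          AND page_title = tl_title
    WHERE page_namespace = 10        -- the Template: namespace
      AND page_is_redirect = 0
      AND tl_from IS NULL            -- nothing transcludes it
    """

    conn = pymysql.connect(host="enwiki.labsdb",  # assumed replica host
                           db="enwiki_p",
                           read_default_file=os.path.expanduser("~/replica.my.cnf"))
    with conn.cursor() as cursor:
        cursor.execute(QUERY)
        for (title,) in cursor.fetchall():
            # page_title comes back as bytes on the replicas
            print("Template:" + title.decode("utf-8"))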
[04:50:11] Switch it to page_namespace or just hard-code "Template".
[04:50:16] Either approach is fine.
[04:50:35] Debra: Alright. Thanks.
[04:50:43] toolserver.namespace was probably re-created on Tool Labs.
[04:50:47] But it's probably not needed.
[04:51:23] https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Unused_templates/Configuration is the script I used.
[04:51:35] I still don't really use the "import reports" abstraction.
[04:52:42] Ah.
[04:53:26] https://dpaste.de/Hhxg/raw is what I used.
[04:53:39] The Git version is more modern.
[04:53:45] But this was quicker for me.
[04:53:49] More isolated.
[04:54:27] I also increased rows per page from 1000 to 2000.
[04:54:35] And it still went to 26 pages.
[04:54:45] https://en.wikipedia.org/wiki/Wikipedia:Database_reports/Unused_templates Silly.
[05:03:15] The report hadn't been generated in a long time.
[05:03:39] I'm planning to ditch the pagination and just show the top five thousand or something.
[05:05:24] Loading 5000 results on a page is kind of a lot.
[05:05:37] The report hadn't been updated in two years, but we have Special:UnusedTemplates.
[05:05:41] I'm not sure it's very valuable.
[05:10:20] does it take into account redirects to the templates?
[05:11:17] Probably.
[12:19:12] hi
[12:22:40] i don't understand why there is a nulled page in "algebra"
[12:23:48] they should add more learning algebra
[16:41:37] Are we allowed to add our names to Special:Credits and to the credits file?
[16:47:17] test
[16:55:13] Is it bad that i'm too scared to jump into the real code and i'm just doing the easy stuff that would never break anything :P
[17:21:22] addshore: thanks for the +2 code review :)
[17:21:37] no worries :)
[17:31:20] addshore: i replied to your code review in #wikimedia-dev and I also asked for your opinion in a phab task
[18:04:51] Zppix: no, it's not bad
[18:05:09] as a fun fact, when I was finally given commit access
[18:05:36] Platonides: you scared me, i was deep into coding something and i think i'm going to go grab paper towels to clean up the tea i spilt on my keyboard :P
[18:05:39] my first commit was one touching the parser
[18:06:28] i had my first edit merged today as well, it was the https://gerrit.wikimedia.org/r/302111 change
[18:23:08] Zppix: it was reverted :)
[18:23:14] ?
[18:23:20] that first commit
[18:23:47] of mine
[18:23:48] what are you talking about?
[18:23:50] oh
[18:23:51] ok
[18:23:54] you uh confused me
[18:26:33] Platonides: are u able to +2 this https://gerrit.wikimedia.org/r/#/c/302129/
[18:27:25] I don't think so
[18:27:29] it's in operations/puppet
[18:27:50] what can u +2?
[18:28:07] I have +2 in mediawiki
[18:28:22] could u +2 my doc and lang file edits?
[18:28:50] link?
[18:28:55] kk
[18:29:18] https://gerrit.wikimedia.org/r/302061, https://gerrit.wikimedia.org/r/302063, https://gerrit.wikimedia.org/r/302116
[18:29:56] oh Platonides, I have a few +2s for you, mwahahahaha
[18:32:49] even if you can +2 in ops/puppet it's only useful if you have ops rights, you need to be able to run puppet-merge on the production puppetmasters
[18:35:03] it's the same type of thing as the MW deployment branches
[18:35:11] i figured
[18:35:22] gerrit and mw are on the same servers, aren't they?
[18:36:04] people who do actually have the rights necessary to do such things would not be happy to find such unexplained items coming up :)
[18:36:04] Zppix, mw being... mediawiki.org?
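Going back to the scraping route Debra laid out for abcdefg_ earlier this morning (list=allpages for the titles, ?action=render for the HTML), here is a rough sketch of how the pieces fit together. The User-Agent string, the one-second delay, and the naive tag-stripping are my own assumptions, not anything prescribed in the conversation.

    import time
    import urllib.parse
    from html.parser import HTMLParser

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "corpus-builder/0.1 (contact: you@example.org)"}  # placeholder contact

    class TextExtractor(HTMLParser):
        """Collect the text nodes of an HTML document, ignoring the tags."""
        def __init__(self):
            super().__init__()
            self.chunks = []
        def handle_data(self, data):
            self.chunks.append(data)

    def iter_titles():
        """Yield main-namespace article titles via list=allpages."""
        params = {"action": "query", "list": "allpages", "apnamespace": 0,
                  "aplimit": 500, "format": "json", "continue": ""}
        while True:
            data = requests.get(API, params=params, headers=HEADERS).json()
            for page in data["query"]["allpages"]:
                yield page["title"]
            if "continue" not in data:
                break
            params.update(data["continue"])

    def plain_text(title):
        """Fetch the rendered HTML for one article and strip the markup."""
        url = "https://en.wikipedia.org/wiki/" + urllib.parse.quote(title.replace(" ", "_"))
        html = requests.get(url, params={"action": "render"}, headers=HEADERS).text
        extractor = TextExtractor()
        extractor.feed(html)
        return " ".join(extractor.chunks)

    if __name__ == "__main__":
        for i, title in enumerate(iter_titles()):
            print(plain_text(title)[:200])
            time.sleep(1)  # be gentle with the live site
            if i >= 4:
                break

api.php's action=parse is the alternative Debra mentions for getting the HTML; it is less cached, but avoids scraping page views.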
[18:36:16] yes
[18:36:26] mediawiki.org runs on the same group of servers as wikipedia.org
[18:36:31] and meta.wikimedia.org
[18:36:44] gerrit.wikimedia.org is on its own special server, it runs completely different software
[18:37:53] enwiki is prob about to get its own, it has to be taking a toll on the servers, i believe it's the biggest wiki on the wikimedia foundation servers
[18:38:31] it's a large group of servers
[18:38:39] you couldn't run en.wikipedia.org on a single machine
[18:39:09] enwiki gets its own group of database servers, but shares mediawiki application servers with the rest of the wikimedia wikis
[18:39:41] it had to be special :P
[18:39:47] db is what i meant haha, my brain is totally dead atm
[18:40:30] mafk: it's not special, it's just that enwiki is the "main" wiki of wikipedia, it's pretty big and i think the userbase is in the top 2 with german wikipedia, if not bigger than the second biggest
[18:40:41] english and german are the most commonly used langs on the internet
[18:40:58] spanish is around 3rd
[18:40:59] Zppix: I know, I was just joking :)
[18:41:03] ok
[18:41:10] it has its own database servers... plural.
[18:41:14] I've been around on WMF for ~8 years
[18:41:25] geez, getting old :|
[18:42:08] oh you're gonna make me look... i think i started as an IP? then i made this account on enwiki jan 2015 (citation needed :P) then i started really developing yesterday night
[18:42:16] developing mw that is
[18:42:47] I'm not a coder, I just do petty stuff.
[18:42:50] We don't always take our IP edits into account... :)
[18:43:01] annoying little bugs (tm) :D
[18:43:35] i've been focusing on documentation and lang files for english at this time, documentation that's in the repo at least
[18:44:14] i hate the bugs that are so minor it's like removing one single letter in a word and it's fixed, it annoys me, sometimes i see that on other things i dev and i'm like y computer must u do this to meh
[18:46:03] that's my area of expertise lol
[18:46:24] a dot here, a comma there
[18:46:37] * mafk feels so useless sometimes
[18:48:21] hey, editing the lang files requires 0 code knowledge unless you're adding more to it that's not already there... but besides that, as long as what's there that's required stays, you do whatever changes are needed/helpful
[18:48:48] Krenair: can u +2 some things for me? it appears Platonides went away
[18:49:07] what things?
[18:49:18] https://gerrit.wikimedia.org/r/302061, https://gerrit.wikimedia.org/r/302063, https://gerrit.wikimedia.org/r/302116
[18:53:56] Platonides: i don't understand what u mean by "not convinced by this change"
[18:54:00] that doesn't tell me anything
[18:54:04] but u don't like it :P
[18:54:32] in which context does morenotlisted appear?
[18:55:10] i'd have to look again
[18:57:23] I can't find the use, hold on
[18:57:56] it's the more tab from what i found (source: https://gerrit.wikimedia.org/r/#/c/136610/)
[18:59:34] it can't be that…
[18:59:49] that's vector-more-actions
[18:59:54] either way it's more grammatically correct
[19:01:35] I wonder if instead of asserting that it is not complete it should say that it *may* not be complete
[19:02:46] ok
[19:02:57] i'll fix it real quick and i'll let u know
[19:04:00] fixed
[19:06:16] Platonides:
[19:52:01] Platonides: i fixed it
[20:03:26] Hi guys. Hoping someone can help; been googling this for ~30 minutes but can't find any similar issues. I could just be bad at searching.
[20:03:50] My Wiki's special pages that figure stuff out based on links, e.g. which pages are wanted, which pages are dead-end, seem to be broken.
[20:04:09] It's listing every page as a dead-end page and no pages as wanted pages, even though there are plenty of links (regular [[ ]] syntax) all over the wiki.
[20:04:19] All caching is disabled and I haven't enabled miser mode
[20:06:57] hmmm Ashley_, let me see if anyone else has had your issue
[20:07:12] Ashley_: does Special:Whatlinkshere work as expected (it displays links where they should be)?
[20:07:48] Wait, first off, what version of mediawiki are you using Ashley_?
[20:08:09] Not sure, but I downloaded it from the typical place a few days ago, so presumably the most recent stable one. I'll go check on both of those things for you
[20:08:49] Whatlinkshere seems to think nothing is linked to, and I'm running 1.27
[20:09:18] I can pastebin my version page if it helps
[20:11:02] Thanks for helping btw, I appreciate it :)
[20:11:03] umm i don't know if this applies to your version as what i found is from 2005, but do you have a maintenance script within your files to run to update your special pages?
[20:12:15] I have maintenance/updateSpecialPages.php
[20:12:42] run that
[20:13:16] Just `php updates/..php`?
[20:13:21] or is there a web frontend?
[20:13:38] i don't know the answer to that, do you Vulpix?
[20:14:29] I executed it on the command line with PHP and got a bunch of errors. Looking into those now
[20:15:10] if you could paste those errors into dpaste.de that would be appreciated
[20:16:07] Installed some software (`snmp-mibs-downloader` from apt) which fixed them all
[20:16:10] Here's my output now:
[20:16:37] https://dpaste.de/1Kq8
[20:19:02] there
[20:19:10] now if you check for instance Special:Whatlinkshere
[20:19:20] it should be working better if not perfect
[20:19:50] Still nothing :(
[20:19:57] hmmm
[20:20:10] "Wantedfiles [QueryPage] got 0 rows in 0.00s" this looks wrong to me too
[20:20:17] I guarantee there's plenty of red links on my wiki
[20:20:27] meant to paste the Wantedpages line; it's also 0.
[20:20:31] Tell you what... if you publish this bug on phabricator.wikimedia.org, I guarantee you someone will be by soon
[20:20:49] because special pages are not my speciality at the moment
[20:20:56] but it's definitely worth looking into
[20:21:15] I'm worried I may've screwed something up rather than it being an issue in mediawiki though; should I still report there?
[20:21:48] Unless you went through all the backend code and changed stuff, I can't find a way you could've messed it up
[20:21:54] but to answer your question, yes please report it
[20:22:38] Okay. Will they need access to my server and/or front-end wiki though? It's behind auth on a private server and I'm not able to provide access to anyone outside
[20:22:44] No
[20:23:00] We will communicate with you on there
[20:23:02] Brilliant, thank you very much for the help! :) I really appreciate it even if the problem's not resolved. Have a great day!
[20:23:20] No problem, come back anytime! Have a great rest of the day!
[20:25:29] Krenair: the translatewiki stuff is in, holy crap look at gerrit :P
[20:25:47] hm?
[20:26:09] you mean l10n-bot is doing its thing?
[20:26:37] yes
[20:26:56] also 2 of my commits weren't re-looked at, you mind doing that Krenair?
[20:27:33] yeah, well... it's been doing that for several years
[20:27:49] https://gerrit.wikimedia.org/r/302061 , https://gerrit.wikimedia.org/r/302063 please look at these Krenair and +2 pls
[20:29:11] I don't see much point in most of the changes in https://gerrit.wikimedia.org/r/#/c/302063/3/includes/api/i18n/en.json
[20:29:45] Ashley_: can you run the https://www.mediawiki.org/wiki/Manual:ShowJobs.php maintenance script and see if it reports anything?
[20:29:52] I'm also not sure about https://gerrit.wikimedia.org/r/#/c/302061/3/languages/i18n/en.json
[20:30:05] that script is under the maintenance folder, you'll need to run it through the command line
[20:30:13] Vulpix: Just the number 233
[20:30:32] what line Krenair?
[20:30:34] Assuming that means 233 jobs piled up, I assume that is not a good thing
[20:30:52] yeah, apparently, jobs aren't being executed on your installation
[20:31:32] could that be because www-data only owns the image folders?
[20:32:09] oh Krenair, the changes are because of grammar and flow, would you write "show latest revisions" or "show the latest revisions"?
[20:32:10] no idea how MW's jobs run but it might not have execute privileges in some dir
[20:32:24] You may need to set https://www.mediawiki.org/wiki/Manual:$wgRunJobsAsync to false, the current job execution is broken in several situations without that setting set to false
[20:32:40] and it's not a permission issue
[20:32:54] Zppix, I'd be more likely to write 'show latest revisions' on a form element
[20:33:37] Krenair: this is going to change what the user sees, not what the devs use
[20:33:52] * Vulpix blames AaronSchulz for job queue issues
[20:34:07] Neat, it's going down now. Thanks! :) I'll just hold F5 until it's 0
[20:34:19] Vulpix: is there a current phab task on the jobs issue?
[20:34:57] Zppix: there are lots of tasks about the job queue, let me find one that may be relevant for that...
[20:35:18] Awesome, everything's perfect now (or seems it) -- thanks so much Vulpix! :) (And Zppix too)
[20:35:31] Zppix: https://phabricator.wikimedia.org/T68485 and https://phabricator.wikimedia.org/T107290 for example
[20:35:33] If it helps, I'm on Ubuntu 16.04 with whatever version of PHP7 apt has as default this month
[20:35:44] no problem Ashley_, we hope to fix that issue within a reasonable time :)
[20:35:59] MediaWiki's open source, right?
[20:36:08] Zppix: I documented some of those problems in https://www.mediawiki.org/wiki/Manual:Job_queue
[20:36:50] unfortunately, since WMF doesn't use the in-process job execution, "nobody cares"
[20:37:53] Vulpix: i just added some info to T107290
[20:38:35] but on the other hand, the changes that broke it on those installs were done in the hope of making it better... and I'm not sure if that was really an improvement
[20:38:56] well if it was meant to improve the brokenness, it worked :P
[20:39:15] lol
[20:39:19] how did the bug not get caught while it was in gerrit, before it was released?
[20:41:15] well, it works, except when the wiki is behind a reverse proxy or some not-so-simple setup that makes HTTP requests from PHP to itself fail
[20:41:40] Ashley_: do you have some setup like this? (reverse proxy, etc?)
[20:41:50] I'm running ~7 VirtualHosts
[20:42:05] if it's trying to do localhost requests to call itself that might be problematic
[20:42:15] indeed
[20:42:21] if it's external, it's behind basic HTTP auth
[20:42:24] so that'll probably be why
[20:43:24] making an internal HTTP request to itself was never a good idea, but the change was merged anyway
[20:43:48] i wonder if wmf would approve the devs running some tests on the test wiki to figure out the reason it's doing this
[20:44:05] is there not an option to run the jobs on a cron?
[20:44:13] yes there is
[20:44:19] last time i checked at least
[20:44:30] like i said, i didn't even know about this issue til now
[20:44:41] running a cron for this is recommended, in fact
[20:45:11] never worked on anything remotely close to the scale of mediawiki of course, but I've always found webapps to be the biggest PITA to keep maintenance working on, and cron always feels a bit hacky
[20:46:06] hope you guys manage to track down the bug without too much difficulty :) props to you all for being active here; by far the most helpful and welcoming IRC chat I've been on
[20:46:51] yw :)
[20:47:36] Ashley_: ty... we just call exterminators for the bugs, it's the big ones we're scared of :P
[21:26:53] Hey, can i have an op that's also an op in #wikipedia-en-help? we got a situation
[21:28:29] nvm
[22:07:09] anomie: r u around
[22:26:07] hey Krenair, can u +2 this https://gerrit.wikimedia.org/r/#/c/300557/6
[22:27:01] Zppix, I don't think so, sorry
[22:27:08] ok
[22:27:16] how about my changes?
[22:27:58] they can be found at https://gerrit.wikimedia.org/r/#/q/owner:%22Zppix+%253Csupport%2540zppixballee.com%253E%22+status:open
[22:48:05] huh, new gerrit lets you edit arbitrary files in the web interface
[22:49:26] Zppix: For future reference, you should specify the bug number as a line right above the Change-Id, it has the form "Bug: TXXXX"
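Circling back to Ashley_'s stuck job queue: a minimal sketch of the workaround discussed above, assuming a typical single-server install. The paths, the schedule, and the extra $wgJobRunRate line are my assumptions; only $wgRunJobsAsync = false and "run the jobs from cron" come from the conversation.

    # In LocalSettings.php (per the manual page Vulpix linked):
    $wgRunJobsAsync = false;
    $wgJobRunRate = 0;  # optional: don't run any jobs during web requests at all

    # And a crontab entry for the web user, so the queue is drained out of band
    # (paths and schedule are assumptions):
    */10 * * * * /usr/bin/php /var/www/mediawiki/maintenance/runJobs.php --maxjobs 500 --maxtime 300 > /dev/null 2>&1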