[18:09:47] hey where are configs like namespace name -> namespace number stored?
[18:09:58] those vary by wiki, and they're in some PHP files somewhere...?
[18:11:45] this? https://github.com/wikimedia/operations-mediawiki-config
[18:13:08] milimetric: extension.json or the repo you linked
[18:17:30] thanks tgr! but what's extension.json?
[18:17:52] I see the repo has php files that change over time, so rebuilding the history of namespace changes seems ... tricky :)
[18:18:37] extensions specify the namespaces they use (along with all other configuration settings) in the file /extension.json
[18:18:55] or .php for older extensions
[18:20:22] namespace changes should be very rare as they mean wiki links have to be changed and whatnot
[18:20:53] you wouldn't be very far off by just assuming the current assignments were always true
[18:23:00] new namespaces are added every so often, but very rarely removed
[18:24:01] ok! thanks both
[20:01:22] milimetric: it's the best lib for screen scraping, but that's like being the best tool to have when you have been trapped under an avalanche of garbage
[20:01:56] handy if you end up in that situation but never where someone wants to end up when they plan their day
[20:02:43] dammit I made into !bash :)
[20:04:14] *made it
[20:27:43] ok, we have this data loaded in hive, it's in milimetric.namespace_mapping
[20:27:57] https://gist.github.com/milimetric/fd14bcf6131c676fd11db30d19e32a0a with absolutely no soup, beautiful or otherwise
[20:28:26] so this will be incredibly useful in untangling page history (moves etc. which are logged without target namespace in the logging table)
[20:28:38] erm wait
[20:28:40] but it will also be useful when trying to get the namespace for pages out of webrequest
[20:28:45] why are you still screenscraping action=sitematrix?
[20:28:52] to get all the sites
[20:29:03] easiest way, right?
[20:29:10] no but why are you screenscraping it?
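(Editor's note: the namespace name -> number lookup discussed at the top of the log can also be fetched live per wiki from the MediaWiki API via action=query&meta=siteinfo&siprop=namespaces|namespacealiases. Below is a minimal parsing sketch; the `sample` dict is a hand-trimmed, illustrative stand-in for a real formatversion=2 response, not actual API output.)

```python
def namespace_map(siteinfo):
    """Map every namespace name (and alias) to its numeric id,
    given a parsed meta=siteinfo JSON response."""
    query = siteinfo["query"]
    # "namespaces" is keyed by stringified ids; each entry carries its own "id"
    mapping = {ns["name"]: ns["id"] for ns in query["namespaces"].values()}
    # aliases (e.g. "WP" for "Wikipedia") live in a separate flat list
    for alias in query.get("namespacealiases", []):
        mapping[alias["alias"]] = alias["id"]
    return mapping

# Hand-trimmed sample shaped like a siteinfo response, for illustration only
sample = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "name": ""},       # main namespace has the empty name
            "1": {"id": 1, "name": "Talk"},
            "2": {"id": 2, "name": "User"},
        },
        "namespacealiases": [{"id": 2, "alias": "U"}],
    }
}

print(namespace_map(sample))  # {'': 0, 'Talk': 1, 'User': 2, 'U': 2}
```

As noted in the log, current assignments are a reasonable proxy for history, since namespaces are rarely renamed or removed.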
[20:29:12] https://www.mediawiki.org/w/api.php?action=sitematrix&format=json
[20:29:18] return re.findall('"url": "([^"]*)"', sitematrix)
[20:29:20] oh oh, that was just lazy
[20:29:21] just parse the JSON...?
[20:29:26] it's probably faster if i'm just getting the urls
[20:29:37] than parsing the whole thing as json
[20:29:46] does it matter?
[20:30:05] not really, but that part doesn't matter at all anyway
[20:30:25] the important part is we have namespace info in hive :)
[20:30:25] I mean, screenscraping is obviously going to break in the future, but the actual JSON isn't
[20:31:23] (entering mode where we talk about stuff that doesn't matter) I agree with you that it's ugly to screenscrape but in this case I want everything that has a url property, so if the property name changes it would break both json parsing and screenscraping
[20:31:39] and if new things with url properties get added, the next step (querying the api on those urls) just won't work
[20:31:59] so in general, you're right, in this case, it's probably very similarly rigid
[20:32:47] sorry. I don't understand your logic at all
[20:33:10] so I'm getting the "something" out of "url": "something", right?
[20:33:36] if I parsed this as json it would be result[..][..]["url"], right?
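(Editor's note: parsing the sitematrix JSON properly is only a few lines more than the regex. A rough sketch follows; the `sample` below is a hand-trimmed stand-in for the real action=sitematrix response, whose awkward shape — language groups under stringified numeric keys, next to a "count" int and a "specials" list — comes up again later in the log.)

```python
import json

def site_urls(sitematrix_json):
    """Collect every 'url' from an action=sitematrix JSON response."""
    urls = []
    matrix = json.loads(sitematrix_json)["sitematrix"]
    for key, group in matrix.items():
        if key == "count":
            continue  # "count" is just an integer, not a group
        # "specials" is itself a list of sites; the numbered language
        # groups nest their sites under a "site" key instead
        sites = group if key == "specials" else group.get("site", [])
        urls.extend(site["url"] for site in sites)
    return urls

# Hand-trimmed sample shaped like the real response, for illustration only
sample = json.dumps({
    "sitematrix": {
        "count": 3,
        "0": {"code": "en", "name": "English",
              "site": [{"url": "https://en.wikipedia.org", "dbname": "enwiki"}]},
        "specials": [{"url": "https://www.mediawiki.org", "dbname": "mediawikiwiki"}],
    }
})

print(site_urls(sample))  # ['https://en.wikipedia.org', 'https://www.mediawiki.org']
```

Unlike the regex, this survives JSON-escaped characters in URLs and any change to the pretty-printed (jsonfm) whitespace.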
[20:33:49] no, I don't understand why you would want to screenscrape instead of just parse the JSON
[20:34:30] oh, I thought about it for like 2 seconds and was just lazy, because this is throw-away
[20:35:01] but I'm saying it doesn't matter anyway, parsing the json would be just as rigid in this case
[20:35:13] 'cause you gotta hardcode "url"
[20:35:57] I suppose if the url had \" in it, it would break :)
[20:36:43] but if you guys put up a new project with " in the hostname I'm not sure this little throw-away script is the biggest problem
[20:40:10] btw, I've parsed the sitematrix before and it's a really ugly format: https://github.com/wikimedia/analytics-wikimetrics/blob/master/scripts/admin#L79
[20:46:11] huh
[20:46:30] how does that even work? there is no space after the attribute name in format=json
[20:48:17] oh, you are just relying on the default format being jsonfm
[20:49:35] and assuming jsonfm will never change (e.g. get syntax highlights)
[20:49:51] and assuming the url won't contain any character that needs to be either JSON-encoded or HTML-encoded
[20:50:04] that seems... very unwise
[21:09:52] has something substantial changed in MWMultiVersion? I'm getting this on phpunit: #0 MWMultiVersion::error(MWMultiVersion instance already set!
[21:09:59] didn't happen last week...
[21:33:31] SMalyshev: I'm guessing you are talking about mw-vagrant. It doesn't look like we've changed any multiwiki code there for ~2 months -- https://github.com/wikimedia/mediawiki-vagrant/tree/master/puppet/modules/mediawiki/templates/multiwiki
[21:35:06] bd808: yes. strange... it used to work last week, I've upgraded everything to the latest version and it didn't work anymore...
[21:35:21] I'll dig in my custom settings then, maybe I messed something up
[21:36:09] bd808: btw, if you're already here, mind checking out https://gerrit.wikimedia.org/r/#/c/296643/ ?
[21:38:22] {{done}}
[21:38:35] thank you!
[21:39:56] SMalyshev: feel free to self-merge those kinds of mw-vagrant config changes too.
[21:40:26] you may be the only person who cares about the wikidata role right now :)
[21:40:37] bd808: ok :) I try not to self-merge just in case I do something stupid and don't notice it. More eyes are better :)
[21:40:50] true enough
[23:06:24] should we try to fix the parent/child relationship of the parent tasks and subtasks of T16950 ?
[23:06:24] T16950: Support global preferences - https://phabricator.wikimedia.org/T16950
[23:07:20] * robla can't figure out if he screwed up in how he added T120385 as a subtask of T16950