[01:48:44] yuvipanda, that is excellent thank you
[01:49:22] halfak, that is what i did, i actually used wikidata-toolkit to get the sitelink and title and your mwxmldump package to get pageid
[14:00:16] o/
[14:33:01] \o_
[15:14:33] * halfak extends and cleans up ORES documentation.
[18:24:51] yuvipanda, can I run a mysql client in PAWS?
[18:25:06] halfak: from the commandline?
[18:25:08] Yeah
[18:25:19] halfak: I haven't installed it, but I could.
[18:25:32] Would be nice to work with my Makefile
[18:26:00] * yuvipanda nods
[18:26:00] Could you install less while you are at it? :)
[18:26:04] haha
[18:26:15] Have time to do this now, or should I find another way in the short-term?
[18:26:35] halfak: I'm doing it now, but a build takes 30mins.
[18:26:45] halfak: http://mycli.net/ is what I was going to install as well
[18:27:20] Sounds interesting
[18:32:46] halfak: I'm building it now, gonna be a while
[19:03:57] yuvipanda, can I also have wget :D
[19:04:50] halfak: haha ok :)
[19:05:00] halfak: the previous build is still running tho
[19:05:14] I should move these to the on-labs build server soon
[19:59:00] Nooo. PAWS locked up
[20:00:39] halfak: your browser?
[20:00:39] or?
[20:00:39] Nope. 504 now
[20:00:39] waaa
[20:01:04] wtf
[20:01:08] hmm, works for me.
[20:01:20] halfak: what request caused the 504?
[20:01:37] yeah
[20:01:42] My home dir
[20:01:56] https://paws.wmflabs.org/paws/user/EpochFail
[20:02:09] I have a terminal open that froze too
[20:02:26] what are we talking about here
[20:02:37] tell me what we're dealing with guys
[20:03:34] aint no mountain high enough baby...ya know?
[20:03:57] halfak: looking into it now. opening in another tab still fails for you?
[20:04:17] "connecting to kernel" "gateway timeout" yeah
[20:04:46] paws-public doesn't load either
[20:06:19] I can ssh into your container
[20:06:42] and I see the things running
[20:09:28] Weird.
[20:09:39] am investigating
[20:09:57] logout button no workie
[20:11:04] halfak: give it a moment, and just try your other URLs?
[20:11:20] I've just restarted the hub. it is coming up again momentarily
[20:11:22] No other URLs work. Trying to get into paws.wmflabs.org now
[20:11:25] kk
[20:12:56] halfak: try just going to paws.wmflabs.org now
[20:15:53] halfak: still not working I presume
[20:15:54] yuvipanda, still timing out
[20:15:56] yeah
[20:16:04] Sorry thought I'd give it a good wait
[20:16:05] halfak: mind if I restart your server?
[20:16:11] Sure
[20:16:18] sorry about the troubles, I see which component has the issue now at least
[20:18:44] halfak: try now?
[20:19:37] \o/
[20:19:38] works!
[20:19:40] Thanks yuvipanda
[20:19:55] I'm going to upgrade and put in place some more processes there to see how it goes
[20:20:06] sorry about the interruption, halfak
[20:20:21] No worries :)
[20:20:24] <3 PAWS
[20:20:25] on the plus side, I'm moving the container images to our own repo rather than dockerhub now, which should speed up build times
[20:20:34] halfak: I'm maintaining my worklog in PAWS now btw
[20:20:42] nice!
[20:20:56] Will be nice not to have to upload figures to commons in order to show them in my work logs :/
[20:21:01] yeah.
[20:21:26] halfak: https://etherpad.wikimedia.org/p/paws-public-url-structure is paws public URL proposal, me and JMo debated it a while ago and seem mostly happy with it
[20:22:56] yuvipanda, why make user space not-write-able by others?
[20:23:12] Hmm... wait... maybe that does make sense.
[20:24:06] our problem is that execution is always personal, and publishing is split between personal and 'common'. We don't have a solution for 'common publishing' yet
[20:24:51] Yeah. That's fair. Personal publishing makes sense.
[20:25:06] And disallowing changes to personal publications seems reasonable.
[20:25:08] it also means you can unpublish them at any point of time
[20:25:17] it's also the easy thing to do :D
[20:25:34] Maybe a pull-request pattern for personal space?
[20:25:37] we'll have 'common publishing' at some point, which would be more work and need setups
[20:25:54] halfak: that would be fairly difficult to implement in many ways.
[20:26:18] halfak: my hope is that actual work lives in projects (tbd how they will work!) and people will just use personal space for quick showoffs
[20:26:48] so you'd fork off a project, do stuff on your personal space, then send a pullrequest type thing (or a wiki-type-edit - tbd!) back
[20:27:15] but that's a way off - more than 3-6 months away, I think.
[20:27:23] step 1 is just rock solid stability
[20:27:39] yuvipanda, I think most of my worklogs should live in user space.
[20:27:54] Or a user-prefixed part of public
[20:28:05] +1
[20:28:11] mine too
[20:28:39] but userspace will have no versioning, since building that would require a few years :)
[20:28:49] and without versioning, it is hard to allow other people to edit
[20:29:26] Eek.
[20:29:31] yeah. Want versioning.
[20:29:34] real projects would hopefully be based on git or git-like versioning
[20:29:36] User-prefix it is!
[20:29:50] in the meantime, you can just use git yourself from the terminal, but that's not a long term solution
[20:30:12] Now, for the published "objects", it seems like those should not be notebooks.
[20:30:16] But notebook containers.
[20:31:02] halfak: indeed. they should be a code+data+environment object thingy
[20:31:07] +1
[20:31:21] that should be standardized across a lot of places
[20:31:21] * halfak would like to wrap each "object" up into a repo
[20:31:26] indeed.
[20:32:06] that's my hope too. either one git repo, or in IPFS terminology, one IPNS-per-project (this gives them all the same characteristics of a git repository, in addition to some new and useful ones)
[20:32:08] So, I have a personal space for experimenting with "objects and stuff" and a public space for publishing an "object"
[20:32:26] IPNS-per-project sounds good
[20:33:32] the whole 'execution will always be personal, while publishing is personal or group' is a bit hard to think about
[20:33:56] Yeah
[20:34:04] Well... sort of.
[20:34:08] That sounds like github
[20:34:17] github doesn't have personal execution on the cloud
[20:34:18] Except the only instance you can use is your machine
[20:34:24] Yeah
[20:34:30] indeed, so that's the confusing stretchy part
[20:34:41] How many instances can a user have?
[20:34:50] right now one, but that'll probably change in the future
[20:35:15] I'd like to offer 'omnibus container', 'R container', etc (maybe?)
[20:35:34] and also forking a particular published object should open you up in a container that has that particular environment already set up
[20:35:52] so this would mean you can have X instances at a time, for some configurable quota of X
[20:38:43] yuvipanda, +1 for small quota.
[20:38:54] Even 1 would be OK if I can freeze state along the way
[20:39:30] halfak: yeah, there's a lot of work being done around userspace checkpointing that I hope makes its way to us mere mortals in a few months
[20:40:18] yuvipanda, ^ in IPFS or Jupyterhub?
[20:40:57] halfak: in docker and the linux kernel
[20:41:05] Gotcha
[20:56:14] https://ores-staging.wmflabs.org/
[20:56:19] ^ new, fancy-lookin
[20:58:30] ragesoss,
[20:58:35] I have a present for you :)
[20:58:58] click
[20:59:08] https://ores-staging.wmflabs.org/v2/scores/enwiki/damaging/642215410/?features
[20:59:44] https://ores-staging.wmflabs.org/v2/scores/enwiki/wp10/642215410/?features&feature.enwiki.revision.cn_templates=2
[21:00:02] You can now see and modify features when making requests to ORES :)
[21:00:13] I haven't deployed yet, but it seems to work well in staging :)
[21:02:03] halfak: so you set {{cn}} to 2, even though there's really only 1, and it shifts the predictions accordingly.
[21:02:11] Yes
[21:02:13] :D
[21:02:20] :-D
[21:02:26] Cool right?
[21:02:31] yessssssss
[21:02:34] :D
[21:02:38] Sorry it took so long
[21:03:17] no problem. I've had a ton of other things taking up my time, so this is actually good timing; I can probably dive back into revision scoring stuff soon.
[21:03:25] halfak: how're you caching these btw?
[21:03:33] yuvipanda, good Q!
[21:04:05] So I convert the features that you inject into a sorted tuple, run it through python's hash(), convert that to a string and append it onto the cache key
[21:04:22] So, multiple requests for the same revision with the same injected features will pull from the cache.
[21:04:49] But we'll never end up in the situation where an injected scoring gets cached in place of a natural scoring
[21:04:59] * halfak did a lot of testing
[21:05:38] Also, we don't actually store the cached values, so again, someone can send us freeform text, but we'll never store it :)
[21:05:58] ^ *injected values
[21:06:04] :D
[21:06:10] cool!
[21:06:27] * halfak plans to experiment with this at the hackathon this weekend.
[21:06:39] halfak: I'd advise against using hash() btw
[21:06:47] yuvipanda, sha1 instead?
[21:06:54] * yuvipanda nods
[21:07:05] Hmm... I guess if I sort and then str() then sha1
[21:07:20] halfak: yup.
[21:07:25] That works.
[21:07:28] Why not hash()?
[21:07:53] it's not a true hash function, just uses pointers in RAM
[21:08:16] Hmm... it generates hashes consistently though
[21:08:20] also
[21:08:34] Return a hash value for the object. Two objects with the same value have the same hash value. The reverse is not necessarily true, but likely.
[21:08:46] E.g. hash((1, 2, 3, "foo")) == hash((1, 2, 3, "foo"))
[21:09:12] halfak: on the same python instance :)
[21:09:14] I suppose the hash space is limited by int64
[21:09:21] halfak: -4727268686433886156 for me
[21:09:22] yuvipanda, fair enough.
[21:09:29] halfak: but -6885842389583586115 on a server
[21:09:36] 2307326341089471994 for me
[21:09:37] OK
[21:09:40] sha1 it is!
[21:09:45] halfak: and since we have multiple webservers accessing it, it's not consistent enough to be useful.
[21:09:51] +1
[21:10:01] halfak: if you restart your python process, it'll give you a different value
[21:10:04] * halfak wants yuvipanda doing code review :P
[21:10:27] brb
[21:10:29] halfak: so hash() is only useful for doing Object comparisons in the same running process (this happens very fast, because pointers)
[21:15:20] halfak: this is surprising: https://ores-staging.wmflabs.org/v2/scores/enwiki/wp10/642215410/?features&feature.enwiki.revision.cite_templates=0
[21:15:46] doing cite_templates 0 substantially pushes up the quality prediction.
[21:18:42] halfak: where's the best place to read documentation of how the features are defined?
[21:26:19] ragesoss, cite_templates is a custom one for enwiki, but most features can be looked-up at http://pythonhosted.org/revscoring/
[21:28:08] halfak: expect some turbulence in paws atm, fiddling a bit.
[21:28:17] should prevent things like what happened to you earlier
[21:28:35] yuvipanda, no worries. Out for a while
[22:27:38] halfak: hey, https://ores-staging.wmflabs.org/
[22:28:05] Hosted on Wikimedia Labs @ written in Python 3
[22:28:18] that "@" should be "and" what do you think?
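
Editor's note on the cache-key exchange around 21:04 to 21:10: the approach settled on there (sort the injected features, str() them, then sha1) could look roughly like the sketch below. This is not the ORES code. The function name injection_cache_key, the ":" separator, and the example base key are all hypothetical, chosen only for illustration. As background, Python 3 salts str and bytes hashes per process unless PYTHONHASHSEED is set, which is consistent with the three different hash() values quoted in the log; a SHA-1 digest of a canonical string is the same on every webserver.

```python
import hashlib


def injection_cache_key(base_key, injected_features):
    """Build a cache key that is stable across processes and hosts.

    base_key and the ":" separator are hypothetical; the log does not
    show how the real cache key is laid out.
    """
    # A natural (non-injected) scoring keeps the plain key, so an injected
    # scoring can never be served in its place.
    if not injected_features:
        return base_key

    # Sort by feature name so the order of query parameters does not matter,
    # then str() the result to get one canonical representation.
    canonical = str(sorted(injected_features.items()))

    # Unlike the builtin hash(), SHA-1 is not salted per process, so every
    # webserver derives the same digest from the same input.
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return base_key + ":" + digest


if __name__ == "__main__":
    # Hypothetical key layout, loosely modelled on the URLs in the log.
    base = "enwiki:wp10:642215410"
    print(injection_cache_key(base, {}))
    print(injection_cache_key(base, {"feature.enwiki.revision.cn_templates": 2}))
```

Only the digest goes into the key, so freeform injected text never appears in the cache key itself, which fits the "we'll never store it" remark at 21:05.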