[16:01:16] In case you're on the YouTube stream, we're waiting to start the broadcast in a couple minutes
[16:01:38] thanks :)
[16:02:37] dr0ptp4kt: cool. i'm on the YouTube stream.
[16:05:29] ok, stream about to start
[16:06:23] hi!
[16:06:29] \o
[16:06:37] yo
[16:06:48] yo!
[16:07:14] is the audio okay for people? video?
[16:07:49] yes everything is ok for me
[16:07:55] thank you
[16:07:55] pizzzacat: thx
[16:07:58] dr0ptp4kt: ok in hangout
[16:08:04] joakino: thx, too
[16:08:22] still nothing on youtube..
[16:08:31] ^same
[16:08:35] ^ dr0ptp4kt
[16:08:44] refresh
[16:08:47] I had to
[16:08:59] jdlrobson: mdholloway ^ any luck w/ refresh? ^^
[16:09:01] cool, refreshing did the trick
[16:09:02] yep
[16:09:06] that's pretty common -- needing to refresh
[16:09:10] you can probably fast forward....
[16:09:17] it was lead in the last couple minutes
[16:09:18] k in
[16:09:32] :)
[16:09:37] it's like phabricating, just refresh all the time
[16:09:48] we'll be publishing this video on mediawiki.org, as a reminder
[16:10:19] ah, refresh *magic* :P
[16:18:52] quick check: audio and video still good?
[16:19:03] yes, all good for me
[16:19:11] 👍
[16:19:15] thx again pizzzacat and joakino
[16:20:41] the question was whether this is organization or vertical related
[16:20:53] toby is explaining it can actually be a bit of both
[16:20:55] thanks dr0ptp4kt
[16:24:23] competitive advantage: being open source
[16:24:29] iOS for sure is not capitalizing on this
[16:24:34] (enough)
[16:25:32] like the framework so far, looking forward to details/examples
[16:25:36] dr0ptp4kt: is our vertical the only one going through this process? how are other parts of the org working on this
[16:25:48] 👍 ^
[16:25:51] joakino: question received. will relay
[16:25:58] thx!
[16:26:22] dr0ptp4kt: is on the 🏀
[16:26:56] is coming up with our own strategy the same as coming up with our own vision?
[16:27:18] ^
[16:30:11] I guess I am looking at the 4 levels: "vision" -> "strategic, operational, and tactical planning" (excuse my military background coming through… lol)
[16:30:30] coreyfloyd: i think the earlier diagram required a vision/set of goals in order to define and, eventually, help us refine a strategy
[16:30:38] i think you wrote that in an email a while ago
[16:31:32] probably
[16:31:47] Important to note that right now Toby is presenting a description of our *current* strategy
[16:32:18] aka not very directed or intentional
[16:32:54] thx for clarifying kristenlans
[16:32:57] and a dictionary, and a library, and ...
[16:33:05] we aren't just enwiki
[16:33:42] bd808 clarifying if we are focusing on wikipedia or not is a good decision point we should clarify
[16:33:49] +1 JonKatz__
[16:34:08] needs more underscores JonKatz_____________
[16:34:28] bd808: i will paraphrase
[16:34:29] "many, many languages" + accessibility?
[16:34:38] everywhere and everyone?
[16:35:31] joakino I removed one for you
[16:35:56] I liked the other one...
[16:35:58] that's only really true in enwiki
[16:36:09] the trust thing... and some other areas
[16:36:12] 😌 thx, i was starting to fade
[16:36:14] atgo: NPOV?
[16:36:18] but in mexico, for example, people don't buy in
[16:36:30] the "truth" of it
[16:36:37] atgo: neutral pov?
[16:36:44] neutral and factual
[16:37:23] i spoke with a bunch of people in mexico (not just at wikimania), and many of them were like "is it factual? you can't trust it"
[16:37:36] re: wikipedia
[16:37:40] aha
[16:37:44] interesting
[16:37:46] were they talking En wiki or ES wiki?
[16:37:53] i would assume es
[16:37:55] i heard the same in SF interviews in the allhands
[16:38:35] mexico reading traffic on the decline, along with much of Lat Am..wonder if there is a growing distrust in ES
[16:38:43] just something i think we should be careful about assuming is totally handled
[16:38:48] interested JonKatz_
[16:39:36] In my personal experience (obviously not scientific) trust of enwiki is generational. Younger == more trust. Maybe related to early bashing in the popular press not having been refuted later by the same channels
[16:39:56] ^ same
[16:40:18] RE: fewer, larger donors. platforms like patreon are interesting in that they embrace and encourage this
[16:40:19] press and academia - when it was new, teachers would tell you it was wrong. now they might say to use it to find resources
[16:40:22] and kickstarter
[16:40:35] "trust" is very vague
[16:40:38] I think there are different types of trust e.g. there's trusting content and also trusting that we are "doing the right thing" in the fight for access to knowledge
[16:40:40] (anecdotal) in Spain wikipedia is god's word mostly, magic elves that know the truth update it
[16:40:41] w/ explicit perks, maybe helps avoid add'l obligation
[16:40:50] bgerstle: interesting! maybe we should talk to patreon
[16:40:56] would you trust it to decide whether taking medication X is a good idea?
[16:41:12] would you trust it to prepare for an important exam?
[16:41:34] IME (enwiki again) my wife & lots of her colleagues in med school use wikipedia very often
[16:41:54] use it and trust it is not the same though
[16:41:54] maybe worth doing some "profile pieces" to highlight areas where people trust it
[16:42:04] get some good publicity
[16:42:20] (medical is possibly the strongest area of enwiki btw)
[16:42:53] standard classroom advice is "use it to find sources" typically
[16:43:03] wich is not exactly trust
[16:43:16] tgr: but it's a good idea
[16:43:30] the sources are very usefu
[16:43:31] l
[16:43:42] to your point, bgerstle it is probably worth thinking about readership outside the context of product/features: partnerships, outreach, publicity...our team, is focusing on features, but we should be at least synching/coordinateing the holistic picture
[16:43:54] +1 JonKatz_
[16:44:48] JonKatz_ in reference to my accessibility comment?
[16:45:30] bgerstle in reference to "maybe worth doing some "profile pieces" to highlight areas where people trust it"
[16:45:48] oh
[16:45:51] dr0ptp4kt: so isn't "contribute & discover knowledge" covered by the discovery and editing vertical? isn't our job mostly done?
[16:46:03] JonKatz_: improperly nested "; parse failed
[16:46:07] regarding slide 29 cc/dr0ptp4kt
[16:47:03] bd808 ;)
[16:47:28] joakino: i think that's partially a matter of semantics, but it's a noteworthy question. lemme see if i can paraphrase
[16:47:42] i'd be interested to find out how trust in wikipedia correlates with knowledge about how it works (in terms of things like quality control mechanisms)
[16:48:24] or trust in wiki* for that matter
[16:49:21] dividing the verticals is tricky. For example, who covers mobile editing if editing team has no mobile engineers?
[16:49:41] on subject of trust - i still remember the time i saw a billboard in India with a * and * source: wikipedia.org
[16:49:45] i thought that was interesting
[16:50:19] we should def connect with zero team about exposure/brand/trust in non-en
[16:50:21] they know a lot
[16:50:23] What meeting is this?
[16:50:30] Q2 planning for reading
[16:50:30] * marktraceur sees nothing on the schedule
[16:50:31] marktraceur: i'll share the youtube
[16:50:37] No, just wondering
[16:50:51] ACRip__: completely agree, my question was more of "improving contributions and discovery is still a long way to go, but in terms or reading the content, there's little to be done that doesn't fall in discovery"
[16:50:53] one minute marktraceur gotta change presenters
[16:51:02] For the future, please schedule use of #wikimedia-office on https://meta.wikimedia.org/wiki/IRC_office_hours
[16:51:15] And change the topic with stream link so people can join
[16:51:15] marktraceur: Reading vertical strategic planning kick-off
[16:51:29] Cool that you guys are doing this publicly, though, glad to see it
[16:51:45] marktraceur: good point
[16:59:36] on what infrastructure do we plan to run all these tests?
[16:59:42] including A/B tests?
[17:00:02] kristenlans: YT stream is like 20 seconds off so maybe wait more
[17:00:02] yeah... well...
[17:00:04] so we are saying that we will only work on promoting existing beta/app features? What if we think those are not worth promoting?
[17:00:24] bgerstle part of the reason we are trying to get survey abilities on the web
[17:01:30] JonKatz_: right, what "capabilities" do we need/want? which are feasible to acquire while also doing (or allowing time for) "real" work?
[17:01:46] jdlrobson: we may not solely focus on promoting features. promoting features is one potential part of the work in q2. if we don't think things should be promoted, then i think it's fair to rule them out.
[21:01:36] #startmeeting Streamlining Composer usage | RFC meeting | If you're looking for the breakfast meeting, use #wikimedia-staff | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[21:01:37] Meeting started Wed Aug 5 21:01:36 2015 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:01:37] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:01:37] The meeting name has been set to 'streamlining_composer_usage___rfc_meeting___if_you_re_looking_for_the_breakfast_meeting__use__wikimedia_staff___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
[21:01:57] o/
[21:01:59] that's one long class name
[21:02:25] o/
[21:02:33] #link https://www.mediawiki.org/wiki/Requests_for_comment/Streamlining_Composer_usage
[21:02:50] o/
[21:02:52] #chair gwicke brion DanielK_WMDE_
[21:02:53] Current chairs: DanielK_WMDE_ TimStarling brion gwicke
[21:03:47] who is "The person creating the wmf branches and deploying the train"
[21:03:51] ?
[21:03:54] twentyafterfour
[21:04:04] legoktm: yo
[21:04:23] er, yeah TimStarling that would usually be me
[21:04:44] it is mentioned in this RFC that you are a relevant stakeholder
[21:05:04] jzerebecki: do you want to give us an introduction?
[21:05:38] twentyafterfour: we would like to drop some work into your lab :) specifically running composer and updating mediawiki/vendor.git after branching
[21:06:36] introduction:
[21:06:46] There really isn't any reason this couldn't be automated? I've been streamlining the branching process quite a bit, and we intend to make it fully automated soon
[21:07:01] by we I mean releng team
[21:07:22] twentyafterfour: i agree, it can be fully automated
[21:08:13] except that full automation bypasses content review, which is mostly the crux of this RfC I think
[21:08:23] and CI would use the same automation?
[21:08:36] CI would actually be driving the automation, I think
[21:08:46] bd808: review would happen when someone changes composer.json, I suppose
[21:09:23] using composer for mediawiki core and extensions is currently quite painful this RFC intends to find ways to make it less so. the RFC has a number of problems listed, lets hope we can agree how to solve some of them.
[21:10:00] DanielK_WMDE_: sure, but ... see https://www.mediawiki.org/wiki/Requests_for_comment/Streamlining_Composer_usage#Package_integrity
[21:10:05] jzerebecki: As we discussed in person, I disagree with your conclusion about package integrity
[21:10:26] bd808, DanielK_WMDE_: extactly that is my intent. and for many components that we maintain ourselves it is when we merge a patch into them.
[21:10:40] I think we need either a fix for composer, or a process to mitigate the issues, before we automate it.
[21:11:05] csteipp: which issue exactly?
[21:11:30] in what regard would composer need to be fix
[21:11:45] csteipp: can you respond to https://phabricator.wikimedia.org/T101123#1493285 ?
[21:12:01] DanielK_WMDE_: That composer doesn't validate the manifests, packages, or the TLS connection where it gets the initial list of packages it's downloading
[21:12:42] and version numbers are just a shorthand for get tags in most cases thus highly mutable
[21:12:53] *git tags
[21:13:48] right... so if we could attach hashes to the depedencies in composer.json, and composer would check them, that would be good, I guess
[21:13:55] can that be done via a composer plugin?
[21:14:42] That would work for me... no idea if that's possible as a plugin
[21:15:07] I think a plugin could inspect the files changed on disk. I'm not sure if it can before composer makes thos files available to the runtime
[21:15:17] or it could be done in a step after composer runs. is that what you suggested, jzerebecki?
[21:15:29] the deployment tooling could validate hashes if they were stored as metadata somewhere that is easily accessible
[21:15:40] csteipp: in the link above you seem to suggest that ensuring transport security would be enough for you, is that correct?
[21:16:00] this was an interesting thing I discovered about composer recently. a package install can include a plugin and that plugin will be immediately loaded and start receiving callbacks
[21:16:11] twentyafterfour: like inside composer.json?
[21:16:31] bd808: that's scary as hell!
[21:16:49] jzerebecki: That would reduce my concern-- then either packagist.org or github have to be compromised, which raises the bar significantly for an attacker.
[21:16:55] DanielK_WMDE_: yeah
[21:17:00] according to the docs you can do things like
[21:17:01] "require": {
[21:17:01] "monolog/monolog": "dev-master#2eb0c0978d290a1c45346a1955188929cb4e5db7",
[21:17:41] TimStarling: but there is no crypto ensuring that 2eb0c0978d290a1c45346a1955188929cb4e5db7 from git is what you get as tar/zip from github
[21:17:44] DanielK_WMDE_: https://github.com/wikimedia/composer-merge-plugin/pull/36
[21:18:15] TimStarling: that would lock it to a commit, right? would anything check whether the files actually match that hash? would probably work for things pulled in via git...
[21:18:46] I have been told that git checkout verifies hashes
[21:18:59] I think we would need pinned git hashes and a checksum of the contents that hash represents plus tooling to validate that on the install side
[21:19:04] bd808: haha "Even though it sounds a bit mad, it's not that dirty in practice."
[21:19:06] but if it is just a tarball from github like jzerebecki says then we would need to verify it separately
[21:20:19] there is a Composer install mode that would fetch via git rather than tarballs prepared by github
[21:20:24] right now we have composer set up to download tarballs from github instead of cloning because then it can cache the tarballs locally instead of re-cloning over and over again
[21:20:47] a git commit hash can't be validated against a tarball of just the latest version. we'd need a separate checksum for that.
[21:20:51] I think git does verify hashes match the content...
[21:21:19] yeah, it was one of the design goals of git to make the history cryptographically secure
[21:22:06] you can download the tarball, have git generate a tree id for it, and compare to the tree id of the commit
[21:22:09] as secure as a sha-1 can be, at least
[21:22:27] yeah, you know sha-1 is deprecated by NIST
[21:22:36] gwicke: nope. linus has written in public that that was exactly never the goal
[21:22:50] due to risk of collision attacks which are relevant in this case
[21:22:53] TimStarling: yup ;)
[21:23:07] tgr: the tree id wouldn't be identical to the commit hash, because the commit hash also depends on the hash of the parent commit(s)
[21:23:08] jzerebecki: but in https://git.wiki.kernel.org/index.php/LinusTalk200705Transcript he says it works like that
[21:23:09] at least when being pushed about sha1 now needing replacement
[21:23:27] " If I have those 20 bytes, I can download a git repository from a completely untrusted source and I can guarantee that they did not do anything bad to it."
[21:23:31] tgr: unless you can somehow put that info into the tarball
[21:23:38] DanielK_WMDE_: a tree id and a commit id are different, but a commit does have a tree id
[21:23:45] the risk of collisions is very minimal
[21:24:16] the sha-1 deprecation is a forward-looking one
[21:24:20] tgr: so we'd need to put that into the composer.json instead of the commit hash?
[21:24:24] would work i guess
[21:24:29] until very recently people were still using md5 to sign their TLS certs..
[21:24:52] Finding a sha-1 collision for an existing hash and prefix is hard, definitely
[21:25:45] shouldn't we just mirror the repositories that we use in production?
[21:25:48] DanielK_WMDE_: I'm not sure what's the reason for using tarballs in the first place, using commit ids and have composer download via git seems easier
[21:25:52] rather than depending on external sources
[21:25:58] (external servers, that is)
[21:25:59] tgr: performance
[21:26:04] tgr: composer caches tarballs, it doesn't cache git clones
[21:26:14] I'm just saying that it's theoretically possible to verify a tarball against a git repo
[21:26:22] tgr: legoktm said it was because tarballs can easily be cached
[21:26:25] but I wonder if we go all paranoid couldn't we just make a plugin to composer or script that would calculate whatever hash we want of the new module and verify it?
[21:26:33] satis can cache git repos iirc
[21:26:38] DanielK_WMDE_: csteipp already said fixing composer so it ensures transport security is enough for now, does anyone disagree?
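A rough sketch of the commit-id/tree-id relationship discussed above: the commit id is the monolog example from the earlier snippet, while the tree, parent and blob ids are placeholders rather than real output.

    $ git cat-file -p 2eb0c0978d290a1c45346a1955188929cb4e5db7
    tree <tree-id>        # hash covering every file and subdirectory in that revision
    parent <parent-id>    # the commit id also depends on the parent commit(s)
    author ...
    $ git cat-file -p <tree-id>
    100644 blob <blob-id>     composer.json
    040000 tree <subtree-id>  src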
[21:26:47] ok
[21:27:25] SMalyshev: I think we probably could if we can decide on what to verify
[21:27:26] jzerebecki, csteipp: just enforcing TLS security? or also checking hashes?
[21:27:28] #info tgr says we could download a tarball, generate a tree ID from the tarball, and verify that against the tree ID of the git commit
[21:27:38] jzerebecki: And I'll qualify that slightly-- composer should use https, verify https, and refuse to download anything over http
[21:27:43] TLS only seems to weak, if the source is compromised...
[21:27:57] I'm pretty sure that we could work with upstream to get some new callbacks if needed
[21:28:44] so why the hell does composer use http instead of https? that seems broken to begin with
[21:29:23] twentyafterfour: any security was never a design goal for composer
[21:29:38] csteipp: so if we could make compsoer require https, verify the server cert and checksum the download, would that be enough?
[21:29:46] jzerebecki: someone should have said that before we started using it
[21:29:49] jzerebecki: so, basically, we now want to change that
[21:30:07] TimStarling: we did. that's why we built the process we build for WMF prod
[21:30:20] TimStarling: I bet someone said that
[21:30:47] so, who could work on fixing composer (or write post-composer checksum checking code)?
[21:30:48] bd808: we probably would also want to make a list of trusted servers
[21:31:04] there are very few language package management systems that care about security honestly
[21:31:34] bd808: I mean, if packageserver.com is hacked and made to redirect to to evil.com (which as valid ceritifcate for evil.com) you're still downloading evil :)
[21:31:51] sure.
[21:31:54] s/as/has/
[21:31:58] SMalyshev: but if we have the checksums, that doesn#t matter
[21:32:34] DanielK_WMDE_: if we have package checksums, sure.
[21:32:47] SMalyshev: the idea is to put these into composer.json
[21:32:55] DanielK_WMDE_: so you plan to get everyone to sign their commmits to core and all extensions?
[21:33:00] DanielK_WMDE_: I could work on composer bits and pieces if we can get the requirements down to something reasonable
[21:33:37] bd808: are csteipp his requirements reasonable?
[21:33:40] what are the treat exact threat models we are trying to defend against?
[21:33:51] jzerebecki: wouldn't it be enough to supply a hash / tree-id when putting a dependincy into composer.json?
[21:34:08] if we can use git tree-ids, we don't even need any extra tool to compute these
[21:34:23] 1) someone manipulates some part of the composer install process in such a way that the downloaded files differ from the reviewed files
[21:34:27] DanielK_WMDE_: where do you get that composer.json from?
[21:34:43] 2) someone makes composer download and execute hostile code
[21:34:51] tgr: one threat would be: we use package X, hosted on packageX.com, the maintainer loses the domain, somebody buys it and puts bad code there under the same version. Now our code is pulling bad code
[21:35:13] 3) somone corrupts git.wikimedia.org
[21:35:32] jzerebecki: from *our* repo. which we already trust to load code from. we can publish the tree-ids for our own releases, so 3rd parties can check.
[21:35:49] tgr: same with packages hosted on git (somebody hacks maintainer's git key, etc.)
[21:36:00] github I mean
[21:36:00] 1) is achievable in many ways, some HTTPS can defend against, some not, for properly preventing it we need checksums or something equivalent
[21:36:02] so do we specify hashes as versions in composer.json like that snippet I pasted earlier, or do we extend composer.json to have a new verification section?
[21:36:09] DanielK_WMDE_: but that is only secured by tls
[21:36:16] SMalyshev: sure, but if they messed with the code, the hashes won't match
[21:36:18] i.e. do we want to retain human-readable versions?
[21:36:24] 2) can be prevented by using throwaway environments for executing composer commands
[21:36:45] DanielK_WMDE_: sure. the question was what we need to protect from - so that's exactly what the hashes protect us from
[21:36:57] in case of 3) we are almost certainly screwed as we rely on it in too many ways
[21:37:04] jzerebecki: yes. i nthe end, you always need a bit of infrastructure that you trust blindly, even it's just your keyoard. we already trust our repo. why question that trust now?
[21:37:05] TimStarling: If we want to support tarballs (do we?) then I think we need to extend
[21:37:07] TimStarling: yes, i think we want to retain human-readable versions (which are also machine readable!)
[21:37:21] so as far I can see HTTPS for composer doesn't really solve anything
[21:37:32] but extending would be easy along with the custom verification plugin
[21:37:59] DanielK_WMDE_: so why not trust a dependency dowloaded from the same host secured in the same way?
[21:38:19] jzerebecki: if we can get them onto that system securely, sure
[21:38:31] The composer.json schema has a freeform "extra" section for plugins to add data to
[21:38:32] https for composer only solves scenario that somebody hacks either wikimedia DNS (less likely) or DNS of one of the devs with access to lots of stuff (more likely) and manages to inject code that way
[21:38:34] (that's what mediawiki-vendor is, right?)
[21:38:51] dpkg uses plain HTTP and manages to build a reasonably secure system on top of it
[21:39:00] SMalyshev: or hack the box that hosts the code
[21:39:26] dpkg signs things with gpg
[21:39:29] if you verify end-to-end then the weakness of the transport doesn't matter so much
[21:39:31] DanielK_WMDE_: well, if you hack the box you probably could leave https cert intact so https doesn't help with that
[21:39:37] if we have content hashes I think TLS is moot
[21:39:47] so yes if we know the signatures of the files then we don't care about transport security
[21:39:57] we either got the bits we asked for or we didn't
[21:39:59] but how do we know the right hash?
[21:40:09] #info if we have content hashes I think TLS is moot
[21:40:10] hardcode the hash in composer.json's extra section?
[21:40:12] computing those hashes in the first place would be required
[21:40:18] yeah content hashes kind of remove the need - except for updating composer itself
[21:40:20] well, we trust gerrit
[21:40:36] which probably won't have content hash unless we manage to pull that off too
[21:40:43] bd808: git already does that
[21:40:49] SMalyshev: I think the weakest link is someone taking over a 3rd party github repo we use, and https does not help there
[21:40:50] I don't think there is a pressing need to stop trusting gerrit
[21:40:56] legoktm: yea
[21:41:10] DanielK_WMDE_: you used securely without a qualifier there. I'm saying if you secure your repo in a certain way but you require to secure a dependency in more secure way you are winning nothing in overall security.
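A sketch of the two options being debated above as they might look in composer.json. The commit pin uses the documented "dev-master#<commit>" syntax quoted earlier; the "extra" block is hypothetical, with made-up key names, since no such verification plugin exists at this point.

    {
        "require": {
            "monolog/monolog": "dev-master#2eb0c0978d290a1c45346a1955188929cb4e5db7"
        },
        "extra": {
            "package-verification": {
                "monolog/monolog": {
                    "commit": "2eb0c0978d290a1c45346a1955188929cb4e5db7",
                    "tree-id": "<tree id of that commit, placeholder>"
                }
            }
        }
    }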
[21:41:14] so I think we'd have to have a review step for each new library version where we checked the code, computed the hash and stuck it in the composer json. right?
[21:41:16] so we can commit hashes to composer.json in gerrit and be confident that we know who calculated those hashes
[21:41:37] yeah we'd need some way to review a hash change
[21:42:00] well, same way the rest of the code is reviwed I guess
[21:42:13] tgr: what about our own github repos?
[21:42:15] I'm not sure adding all of that is any better than what we do today honestly
[21:42:25] it's actually more work rather than less
[21:42:27] e.g. if somebody does system("rm -rf ".__DIR__) that probably would be caught on review :)
[21:43:05] compute hash, store hash, verify hash vs review code
[21:43:09] * twentyafterfour can think of a lot more harmful and more subtle attacks than rm -rf ....
[21:43:11] jzerebecki: same threats, probably slightly smaller attack surface if we assume that random wmf dev has better personal security practices than random 3rd party dev
[21:43:25] jzerebecki: right. we trust gerrit more than other sites we can access via https because we control it. it's not sufficient for other sites to use https. because we have less trust in the people that run that other server than we have in the people who run gerrit.
[21:43:45] we can just use commit ids as hashes
[21:43:58] only if we give up cached tarballs
[21:44:03] I guess there is the question of secondary dependencies, if that is the right term
[21:44:26] Yes. a library can require multiple other libraries
[21:44:40] we'd need to checksum all of them
[21:45:00] today we don't have much (any?) of that in mw/vendor
[21:45:03] bd808: no, just store the commit id in the extra key, then the CI system fetches the tarball, fetches the commit metadata, gets the tree id from the commit metadata, calculates the tree id from the files and compares
[21:45:20] every step of that is cryptographically secure
[21:45:21] TimStarling: good point.
[21:45:34] so we'd have a verification section in the root composer.json which would verify the entire tree of dependencies?
[21:45:52] DanielK_WMDE_: so our gerrit security is better than github.com even though we don't even roll out minor patch upgrades and are using weak ssh ciphers?
[21:45:54] I don't think we should give up on cached tarballs, CI and local dev would get pretty slow without them
[21:45:54] bd808: we have one I think, pygments depends on symfony stuff
[21:45:57] Hm... we'd not only have to check all checksums we have. We also have to make sure nothing extra is installed that doesn#t have a checksum.
[21:46:06] I think this would still be in mediawiki/vendor, not $IP/compsoer.json
[21:46:24] legoktm: ah, yes I think you are right
[21:46:25] DanielK_WMDE_: as in our gerrit does not support modern ssh ciphers at all
[21:47:00] jzerebecki: our security is better than the weakest link in a potential chain of dependencies that we might end up pulling in.
[21:47:17] jzerebecki: github.com itself is secure but github repos are not
[21:47:22] legoktm: pygments? link?
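One possible shape of the check described above (fetch the tarball, recompute a tree id from its contents, compare it against the pinned commit's tree id), written as a shell sketch. The repository, mirror path and variable names are placeholders, and this is not the actual CI tooling.

    set -e
    commit=2eb0c0978d290a1c45346a1955188929cb4e5db7           # pinned in composer.json
    tarball_url="https://github.com/Seldaek/monolog/archive/${commit}.tar.gz"

    # expected tree id, read from the commit object in a trusted local mirror
    expected=$(git -C /srv/mirrors/monolog rev-parse "${commit}^{tree}")

    # recompute the tree id of the tarball contents with a throwaway index
    workdir=$(mktemp -d)
    curl -sSL "$tarball_url" | tar -xzf - --strip-components=1 -C "$workdir"
    git init -q "$workdir"
    git -C "$workdir" add -Af .
    actual=$(git -C "$workdir" write-tree)

    # caveat raised below: export-ignore rules in .gitattributes can make GitHub
    # archives differ from the full tree, so this comparison can fail for such repos
    [ "$expected" = "$actual" ] || { echo "tree id mismatch for monolog/monolog"; exit 1; }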
[21:47:29] there is no audit trail for force pushes, for one thing
[21:47:34] DanielK_WMDE_: we were talking only about our own repo on github not any chain
[21:47:45] tgr: one possible flaw in tree ids is that the tarballs aren't necessarily the complete git repo (assuming a tree id is based on the whole git managed filesystem)
[21:48:00] JeroenDeDauw: https://gerrit.wikimedia.org/r/#/c/219473/
[21:48:10] we exclude tests and things from the acrchies for out own libraries
[21:48:13] jzerebecki: oh, i was referring to the source we would be downloading other dependencies from
[21:48:15] * legoktm has to run, will look at the logs later
[21:48:15] *archives
[21:48:28] tgr: there is no audit trail for that on gerrit, right?
[21:48:31] if we don't pin the exact version of everything in the root composer.json then every time an upstream author releases a new minor version, it would invalidate the verification, correct?
[21:49:01] TimStarling: yes. that's part of why we use full version numbers and not loose ranges now
[21:49:04] https://github.com/composer/composer/issues/38
[21:49:16] jzerebecki: not sure but not many have force push rights on gerrit
[21:50:00] bd808: what tarballs are we talking about? github builds a ball from every commit, that is guaranteed to match the repo content
[21:50:30] tgr: not if you have a file like this -- https://github.com/wikimedia/cdb/blob/master/.gitattributes
[21:50:58] hm... http://theupdateframework.com/
[21:51:12] tgr: not true everyone in the mediawiki group effectively has that permission, that means also the human resources department or anyone else emplyed at the WMF
[21:51:14] #info https://github.com/composer/composer/issues/38
[21:51:29] I think commit signing and gerrit weakness is a separate topic
[21:51:49] TimStarling: agreed
[21:52:04] we aren't going to be able to change the whole world here
[21:52:30] #info a plug-and-play library for securing a software updater http://theupdateframework.com/
[21:52:35] but can we find a system that gets jzerebecki what he wants with less work than what we do today (or more automatable)
[21:52:59] I'm leaning towards no, I think
[21:53:22] well, you say "just review the code twice"
[21:53:22] based on the system currently described being more work than what we do today
[21:53:32] bd808: so you think csteipp his requirements are not enough or unobtainable?
[21:53:59] jzerebecki, HR people have permission to force push to some gerrit repos?
[21:54:15] jzerebecki: checksum, review, store, retrieve, checksum is more work than review, commit
[21:54:15] it says that composer verifies an md5 hash of the archive
[21:54:27] Krenair: they are in the wmf ldap group, right?
[21:54:31] I don't think so
[21:54:40] but code review can potentially be very expensive, many hours of work
[21:54:53] TUF actually looks quite interesting http://www.linux-magazine.com/Issues/2014/160/Security-Lessons-TUF
[21:54:54] almost entirely wmf engineers only
[21:55:10] jzerebecki: no, wmf ldap is by invitation and not all staff
[21:55:12] I guess the problem is that it is not normally so expensive
[21:55:27] so we risk weighing down the common case for the sake of the rare case
[21:55:36] although there are some ex-staff in that group still
[21:56:16] anyway, we need to wrap up, since we are out of time for this meeting
[21:56:25] TimStarling: you mean, by forcing reviews for changes in 3rd party libraries?
[21:56:26] bd808: csteipp his requirements did not involve any checksum for code, it only involved fail closed transport security
[21:56:55] in theory we do review changes in 3rd party libraries, when we commit the changes to mediawiki/vendor, correct?
[21:57:07] that is what the RFC says
[21:57:35] TimStarling: yes, we do
[21:57:47] so anyway, there is no consensus
[21:58:03] jzerebecki: I guess he said that so quietly I didn't hear it. I see it backscroll
[21:58:12] but there is a rough plan which can be used for prototyping
[21:58:28] jzerebecki: https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki,access - a lot of people can force push tags but everything else is fairly limited
[21:58:36] TimStarling: so who thinks csteipp his requirements are not enough?
[21:59:27] I don't honestly see how verifying TLS certs guarantees that a tag isn't changed on gitub
[21:59:29] there is lack of consensus because bd808 says that what we are doing currently is good enough
[21:59:40] I didn't say that
[22:00:00] I said it was easier than what I thought this was turning into
[22:00:29] composer transport security does not address the most likely attacks such as stealing github credentials of random developers
[22:00:39] but you think maybe we should do it anyway?
[22:01:11] well, no... but not becuase what we have is good enought
[22:01:12] I think having TLS in composer won't hurt but doesn't make it secure against many common threats
[22:01:36] tgr: yes and we don't defend against that for gerrit
[22:01:52] checksums address every threat that TLS would address, and TLS does not address the most common ones, IMO TLS is a waste of time
[22:01:52] jzerebecki: ah that's a whole other argument
[22:01:56] bd808, what are your proposed action items?
[22:02:09] jzerebecki: we defend against it by controlling who has access to gerrit
[22:02:40] we have no control over who has access to, say, monolog repo on github
[22:02:41] tgr: I don't see how from that follows defending against credentials being stolen
[22:02:42] for checksums to work, we need to checksum *all* dependencies, recursively. Which would be not a small piece of work
[22:02:59] #action get csteipp and TimStarling to agree on what would be needed to trust automated downloads
[22:03:11] it's too late in the hour to talk about project design
[22:03:17] tgr: there are two cases, repos wikimedia maintains on github and repos non-wikimedians maintain
[22:03:25] we just need to talk about meeting results and next steps
[22:03:30] #info for checksums to work, we need to checksum *all* dependencies, recursively. Which would be not a small piece of work
[22:04:08] I think jzerebecki should keep working on the requirements side of this
[22:04:23] and mostly ignore the implementation aspects until those are nailed down
[22:04:45] yeah, sounds good
[22:04:45] eg can we only trust servers we control?
[22:04:53] SMalyshev: doesn't seem that complicated to me, attach a commit id to each dependency and you have effectively checksummed them
[22:04:58] if not what does it take to trust a blob of code
[22:05:02] bd808: what do you think is missing in there, besides the things we didn't get to an agreement here?
[22:05:05] which is incindentally exactly what composer.lock does
[22:05:10] should we schedule another meeting in 2 weeks?
[22:05:40] TimStarling: that sounds good to me
[22:05:45] jzerebecki: well just that. it seems there is no consensus on what it takes to be trusted
[22:05:53] ok
[22:06:05] github provides (admittedly weak) md5 hashes, which composer verifies on the tarballs (according to that composer issue I linked earlier)
[22:06:25] #action jzerebecki to refine requirements and threat models
[22:06:30] tgr: you have to review them to ensure everything is ok and then checksum (commit id is not the same, I think, though we might use them maybe if we clone repos, but not if we download tars). that's going to cost some time
[22:06:32] bd808: yes that is what the rfc already states, and that most of us don't even specify our threat models when we talk about it
[22:06:36] #action TimStarling to schedule another meeting on this RFC in 2 weeks
[22:07:17] jzerebecki: so we really should have been focusing on the threat models for the last hour?
[22:07:18] thanks for coming everyone
[22:07:21] or were we?
[22:07:29] thanks TimStarling
[22:07:45] SMalyshev: a commit id is a cryptographically secure fingerprint of the whole repo contents
[22:07:50] TimStarling: thx
[22:07:57] certainly not the most practical one, but usable
[22:07:59] next week we will talk about https://phabricator.wikimedia.org/T107595 , which is Daniel's current thoughts on multiple content slots associated with each revision
[22:08:15] unless there is a more urgent thing that comes along, since that RFC is still an early draft
[22:08:45] tgr: exactly, whole repo which is not the same as current snapshot (tar). So if we do git clone, it's ok, but if we do export, etc. - it's not
[22:08:45] #endmeeting
[22:08:46] Meeting ended Wed Aug 5 22:08:45 2015 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:08:46] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-08-05-21.01.html
[22:08:46] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-08-05-21.01.txt
[22:08:46] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-08-05-21.01.wiki
[22:08:46] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-08-05-21.01.log.html
[22:10:04] ~=[,,_,,]:3
[22:10:52] I think I may need some better technical tools to help wrap up meetings
[22:11:16] SMalyshev: the git metadata for the commit contains a checksum for every file and every subdirectory, and the commit id depends on them
[22:11:21] TimStarling: like turning the lights off and on? ;)
[22:11:30] something like that
[22:11:42] Or a 5 minute warning buzzer maybe
[22:11:53] well, 5 minutes out, I start talking about wrapping up
[22:11:54] you can work with that, it's not convenient but still easier than doing a checkout I think
[22:11:58] maybe it should be 10 minutes out
[22:12:17] but people just carry on talking amongst themselves as if I'm not here
[22:12:25] tgr is still doing it
[22:12:36] *nod* it always seems to take 10 minutes to get started and another 10 to shut down
[22:13:24] we could use a channel where you could give/take voice I guess
[22:13:25] TimStarling: is it a goal that everyone should be shutting up as soon as the meeting ends? I don't see much benefit
[22:15:26] yes, that is a goal, but more importantly, I would like people to actually participate in wrapping up and writing notes and action items
[22:16:32] the idea of these meetings is to help people in their non-meeting work
[22:17:03] and when you and SMalyshev get off on a tangent and discuss things that are not really relevant to the proposal, that is not aligned with that goal
[22:22:01] in this case, it wasn't very clear what's relevant to the proposal and what's not, using lockfiles for verification does seem to me like a plausible way to solve the problem outlined in the RfC
[22:22:47] at any rate if you have expectations on how people should behave it's a good idea to write them down and point to them regularly
[22:22:57] some kind of RfC meeting etiquette?
[22:24:29] TimStarling: It might be good to have a refresher email on the intent and general process of an RfC meeting. The description at https://www.mediawiki.org/wiki/Requests_for_comment/Process#.22IRC_office_hours.22_public_RFC_discussion is pretty short
[23:27:30] I wrote a wikitech-l post
[23:27:49] Big news?
[23:28:31] I mean in response to tgr and bd808 above
[23:29:26] TimStarling: awesome, I'll check it out soon