[00:43:20] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2307604 (kaldari) I've made a request for interim funding for using Google's Search API....
[03:50:11] <wikibugs>	 Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, and 2 others: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#2307724 (Beetstra) @Sadads, @Kaldari - if this is supposed to help the anti-spam efforts, this should...
[06:01:09] <grrrit-wm>	 (PS1) Mobrovac: Scap: execute checks after the restart_service stage [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/289594 (https://phabricator.wikimedia.org/T135609)
[06:03:41] <mobrovac>	 elukey: joal: ^
[06:17:01] <elukey>	 mobrovac: will review it! Thanks
[06:17:43] <elukey>	 also morning ;)
[06:18:24] <mobrovac>	 buongiorno1
[06:18:25] <mobrovac>	 !
[07:01:57] <wikibugs>	 Analytics, ContentTranslation-Analytics, MediaWiki-extensions-ContentTranslation, Operations, Ops-Access-Requests: Add kartik to analytics-privatedata-users group - https://phabricator.wikimedia.org/T135704#2307853 (KartikMistry)
[07:10:12] <wikibugs>	 Analytics, ContentTranslation-Analytics, MediaWiki-extensions-ContentTranslation, Operations, Ops-Access-Requests: Add kartik to analytics-privatedata-users group - https://phabricator.wikimedia.org/T135704#2307872 (jcrespo) I suggested this request so their queries can be done on analytics s...
[09:29:42] <elukey>	 joal: o/
[09:32:33] <elukey>	 do want to see the latest on aqs?
[09:32:39] <elukey>	 or hear better
[09:33:38] <elukey>	 we are running cassandra 2.1.12 atm, restbase 2.1.13 buuut since in the apt repo we have 2.1.14 try to guess what ended up on aqs100[456] ?
[09:36:03] <elukey>	 (now I get what urandom was telling me yesterday about https://phabricator.wikimedia.org/T95253)
[09:46:17] <wikibugs>	 Analytics, ContentTranslation-Analytics, MediaWiki-extensions-ContentTranslation, Operations, Ops-Access-Requests: Add kartik to analytics-privatedata-users group - https://phabricator.wikimedia.org/T135704#2308172 (Joe) p:Triage>Normal
[09:47:25] <wikibugs>	 Analytics, ContentTranslation-Analytics, MediaWiki-extensions-ContentTranslation, Operations, Ops-Access-Requests: Add kartik to analytics-privatedata-users group - https://phabricator.wikimedia.org/T135704#2307853 (Joe) We will need manager's approval for this request for Kartik.  In the mea...
[10:12:41] <elukey>	 ok joal I *think* that doing something like "stop cassandra instance, downgrading cassandra, starting cassandra instance" might work
[10:13:16] <elukey>	 if you have ever watched "Slevin" this is the kansas city move
[10:17:46] <mforns>	 hi joal, elukey and team, I have to go to the airport and fetch a rented car this morning, I will be working on east coast hours today
[10:18:17] <elukey>	 mforns: okkkkk
[10:18:44] <mforns>	 see ya :]
[10:29:04] <wikibugs>	 Analytics, ContentTranslation-Analytics, MediaWiki-extensions-ContentTranslation, Operations, Ops-Access-Requests: Add kartik to analytics-privatedata-users group - https://phabricator.wikimedia.org/T135704#2308272 (Arrbee) This is an approved request for Kartik. Thanks.
[10:39:56] * elukey lunch, brb
[11:24:48] <elukey>	 urandom: nice!! https://apachebigdata2016.sched.org/speaker/john.eric.evans
[11:32:11] <elukey>	 joal: we'd need to restart Hue and Oozie :D
[12:24:35] <elukey>	 joal: aqs100[456] downgraded to 2.1.13, so you can start adding data :P
[12:28:33] <elukey>	 !log restarted hue on analytics1027 for security upgrades
[12:29:02] <mobrovac>	 nice elukey!
[12:29:06] <mobrovac>	 thnx for the downgrade
[12:30:50] <elukey>	 mobrovac: it was easy with an empty cluster :P
[12:31:01] <mobrovac>	 hehe
[12:31:10] <elukey>	 mobrovac: what is the plan for aqs100[123]? It is still running on 2.1.12
[12:32:10] <elukey>	 !log suspended all the oozie bundles as prep step for oozie's restart (security upgrades)
[12:32:12] <mobrovac>	 not sure what's the smartest move tbh elukey
[12:32:21] <mobrovac>	 given the soon-ish move to 2.2.6
[12:33:05] <mobrovac>	 either way, one thing you should know is that aqs100[123] and aqs100[456] ought to have the same cass version once you start joining the clusters and replacing them
[12:34:15] <elukey>	 mobrovac: yep.. and I'd prefer to do the migration before 2.2.6 :D
[12:34:51] <elukey>	 so I think that we should move aqs100[123] to 2.1.14, meanwhile testing aqs100[456] and eventually joining the two
[12:34:58] <elukey>	 final step 2.2.6 only on aqs100[456]
[12:35:21] <mobrovac>	 elukey: s/2.1.14/2.1.13/
[12:36:27] <elukey>	 yes sorry,
[12:37:18] <mobrovac>	 :P
[12:37:24] <mobrovac>	 but yeah, that sounds sensible
[12:37:36] <mobrovac>	 better than upgrading 6 nodes to 2.2.6 only to get rid of 3 of them
[12:44:57] <elukey>	 !log restarted oozie on analytics1003 for security upgrades
[13:25:17] <wikibugs>	 Analytics, ArticlePlaceholder, Pageviews-API, Wikidata: Track pageviews of ArticlePlaceholders - https://phabricator.wikimedia.org/T132223#2308862 (Lucie)
[13:25:24] <wikibugs>	 Analytics, ArticlePlaceholder, Pageviews-API, Wikidata: Track pageviews of ArticlePlaceholders - https://phabricator.wikimedia.org/T132223#2308864 (Lucie) a:Addshore>None
[13:30:14] <wikibugs>	 Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2308897 (elukey) I followed @fgiunchedi's advice and had a chat with @ema about this. His code updates initramfs only after the first time that puppet runs, meanwhi...
[13:34:34] <joal>	 Hi elukey
[13:34:43] <joal>	 Hi elukeysorry had trouble connecting before
[13:34:54] <joal>	 Thanks for handling both cassandra and oozie elukey !
[13:35:36] <elukey>	 joal: o/
[13:35:42] <joal>	 elukey: Heyy :)
[13:35:57] <joal>	 Had you paused every bundle before restarting A?
[13:35:58] <elukey>	 soo finally you can put stuff on cassandra :P
[13:36:07] <joal>	 elukey: awesome !
[13:36:20] <joal>	 Will start playing this afternoon
[13:36:22] <elukey>	 yes all of them, waited also for yarn -> running
[13:36:30] <elukey>	 everything looked good
[13:36:36] <elukey>	 then restarted and checked for a while
[13:36:38] <elukey>	 nothing weird came up
[13:37:03] <joal>	 elukey: so that you know, only load jobs are really needed (webrequest and mediawiki)
[13:37:21] <joal>	 When those ones are paused, everything else naturally stops
[13:37:33] <joal>	 But pausing everything works as well :)
[13:40:04] <elukey>	 joal: yess I remember it because you patiently explained to me a lot of times, buuuut I wanted to be extra careful :P
[13:41:11] <joal>	 np elukey, better more than less :)
[13:42:32] <joal>	 elukey : So I can try connecting to cassandra on aqs1004 on port 9042?
[13:43:01] <elukey>	 elukey@aqs1004:~$ sudo netstat -nlpt
[13:43:04] <elukey>	 tcp6       0      0 10.64.0.127:9042        :::*                    LISTEN      2240/java
[13:43:07] <elukey>	 tcp6       0      0 10.64.0.126:9042        :::*                    LISTEN      2239/java
[13:43:49] <elukey>	 so you should use aqs1004-{ab}.eqiad.wmnet
[13:44:13] <elukey>	 same thing for the other two guys 1005/6
[13:45:01] <elukey>	 joal --^
[13:47:40] <joal>	 elukey: Ooooh, they have different hostnames?
[13:47:45] <joal>	 Sweet
[13:50:12] <elukey>	 joal: I've also updated https://wikitech.wikimedia.org/wiki/Analytics/AQS
[13:51:32] <joal>	 That's great elukey
[13:52:06] <joal>	 elukey: here is my plan: currently there is an existing keyspace for per-article, created by restbase
[13:52:14] <joal>	 This keyspace uses DTCS
[13:53:34] <joal>	 My plan is to create another keyspace using the same config; except for compation, and test to load one day of data, and see, then multiple days etc
[13:54:00] <joal>	 First monitoring load/compression only (no read yet)
[13:56:19] <elukey>	 +1
[13:56:37] <elukey>	 joal: let me know if I can help, really curious
[13:57:19] <joal>	 elukey: I'll ping you when starting, to show you
[13:57:38] <joal>	 elukey: currently doing some work on pageview backfilling
[13:57:43] <elukey>	 sure!
[13:59:42] <joal>	 elukey: Actually will try to alter the existing table if you agree
[14:00:15] <joal>	 elukey: mwarf ... wondering ...
[14:02:33] <elukey>	 :D
[14:03:51] <wikibugs>	 Analytics-Tech-community-metrics, Phabricator-Upstream, Upstream: List of Phabricator users - https://phabricator.wikimedia.org/T37508#400343 (Aklapper) >>! In T37508#2302594, @jayvdb wrote: > Was the "People" a WMF requested feature?  Not that I know.  > It seems to have existed long before WMF adop...
[14:05:44] <joal>	 elukey: Let's be safe, try a new keysapce :)
[14:07:16] <elukey>	 joal: well the keyspaces are empty atm, and we can re-create the cluster very easity
[14:07:19] <elukey>	 *easily
[14:07:38] <elukey>	 so if you prefer to play the card of altering keyspace, you can try :)
[14:07:48] <joal>	 elukey: yes, but I want to be sure restbase won't change the compaction strategy in the middle of us testing ;)
[14:07:59] <joal>	 So trying with a new one :)
[14:09:43] <joal>	 New keyspace created
[14:09:57] <joal>	 Now, a loading job :)
[14:12:18] <elukey>	 gooooood
[14:12:55] <elukey>	 joal: is it possible to run a node with DTCS and another one with LC, or is it a per-keyspace property? I don't remember
[14:13:14] <joal>	 elukey: per-table actually, not keyspace
[14:14:02] <elukey>	 joal: table == section of the keyspace ending up in node X ? (super ignorant about this)
[14:14:36] <joal>	 elukey: table = abstract view of the stuff you store in a keyspace
[14:15:17] <joal>	 elukey: in a keyspace, you can have multiple tables, each having a data schema, and various config options (compaction, compression etc)
[14:15:38] <elukey>	 ahhhh
[14:16:19] <elukey>	 so what happens is we choose LC over DTCS? We will need to join one new instance at the time to the pre-existing cluster that has tables already with DTCS
[14:16:37] <elukey>	 (at least, the per article one)
[14:16:46] <elukey>	 maybe we already discussed it but I forgot sorry :(
[14:17:39] <joal>	 elukey: IIRC urandom said you could set compaction strategy per node, but I'm not expert enough to know how :)
[14:17:48] <joal>	 I only know the easy way ;)
[14:19:21] <elukey>	 joal: all right I'll ask him, this is kinda important
[14:19:29] <joal>	 it is indeed
[14:19:37] <joal>	 not for now, but for after
[14:19:48] <elukey>	 yep
[14:19:59] <elukey>	 joal: completely unrelated news, https://www.youtube.com/user/Kurzgesagt
[14:20:17] <elukey>	 I think that you'll like it, those folks are amazing
[14:21:25] <joal>	 elukey: Thanks a million ! Not a good time now, too many things to do, but will definitely keep it for a quiter time :)
[14:35:22] <wikibugs>	 Analytics, Revision-Slider, TCB-Team, WMDE-Analytics-Engineering, and 2 others: Data need: User Behaviour when comparing article revisions - https://phabricator.wikimedia.org/T134861#2309205 (Addshore)
[14:49:41] <elukey>	 urandom: goooood morning :)
[14:50:21] <elukey>	 whenever you have time I have a couple of questions
[14:50:41] <urandom>	 elukey: morning!  i have time!
[14:51:43] <elukey>	 thanks!
[14:52:13] <elukey>	 I've read your comment about downgrading aqs100[123] to 2.1.12 but I'd need some guidance about how to prepare the work
[14:52:29] <elukey>	 sorry, upgrading to 2.1.13
[14:52:45] <elukey>	 I have too versions in my head atm so I confuse them :P
[14:52:51] <urandom>	 yeah :)
[14:53:22] <urandom>	 so, 2.1.13 has been thoroughly tested here, under the config you are using (basically you are using the same config as us)
[14:53:48] <urandom>	 and the upgrade was very straightforward
[14:54:44] <urandom>	 tl;dr you should be able to just upgrade the package, and bounce the machine
[14:54:54] <elukey>	 ah nice
[14:55:12] <urandom>	 in a rolling fashion, of course
[14:55:19] <elukey>	 yeah not all in once P
[14:55:20] <elukey>	 :P
[14:55:28] <urandom>	 but otherwise nothing special should be needed
[14:55:35] <urandom>	 *this time* :)
[14:55:36] <elukey>	 bounce the machine == simply restart cassandra right?
[14:55:40] <urandom>	 yeah
[14:55:43] <elukey>	 all right
[14:56:12] <elukey>	 joal: --^
[14:56:22] <urandom>	 elukey: you could do one first, use it as a canary
[14:56:41] <urandom>	 but make sure to complete the upgrade before attempting any bootstraps and/or decommissions
[14:57:01] <elukey>	 urandom: all right, so I can keep multiple versions as long as I don't add/remove nodes
[14:57:10] <elukey>	 add/remove == bootstrap/decommission
[14:57:39] <urandom>	 well, best-practice for a cluster of machines, would be to run mixed versions as part of a transition from one version to another
[14:57:59] <urandom>	 but not for any longer than is necessary to complete that transition
[14:58:22] <urandom>	 so your new nodes are not a part of the production cluster yet, i assume
[14:58:25] <urandom>	 ?
[14:58:31] <urandom>	 last i checked, anyway
[14:58:36] <urandom>	 so that is fine
[14:59:28] <urandom>	 but once you start upgrading your prod cluster, you should plan to finish that upgrade in whatever time makes sense
[14:59:45] <urandom>	 like, you could upgrade one now, and come back and do the rest on monday, that would probably be OK
[15:00:13] <urandom>	 but i wouldn't leave it open-ended :)
[15:00:14] <elukey>	 urandom: correct 100[456] are only joal's playground for the moment :P
[15:01:42] <elukey>	 urandom: second question! We'd like to test LC instead of DTCS, and I was thinking if it is possible to assign a specific compaction setting for a keyspace/table to a node, in order for example to have a mixed config during the migration.
[15:01:56] <elukey>	 you have already answered this question I know, but I don't recall the answer :)
[15:03:48] <urandom>	 elukey: yes, it can be done using a jmx op, the setting is ephemeral though; it'll revert if the node is restarted
[15:04:17] <urandom>	 elukey: where you looking for this? http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling
[15:04:35] <urandom>	 joining a node without joining it? so-called write-survey mode?
[15:05:35] <urandom>	 elukey: a test of LC should be done under import
[15:05:47] <urandom>	 that is what will make-or-break it
[15:06:04] <urandom>	 you could do that now, while you have the new machines setup as a test cluster
[15:06:59] <elukey>	 urandom: what we want to do is load real data to aqs100[456] setting LC straight away and check how it goes
[15:07:06] <elukey>	 as alternative to write survey
[15:07:40] <urandom>	 ok, can't you just set the compaction normally... oh damn, restbase changes it back...
[15:07:44] * urandom sighs
[15:08:09] <urandom>	 is that the problem?  if you alter the keyspace restbase alters it back?
[15:08:55] <elukey>	 urandom: nono I was wondering what to do if we like LC, because I'll need to eventually join the new nodes/instances to a cluster with pre-existing settings
[15:09:11] <urandom>	 oh i see
[15:09:39] <urandom>	 thinking out loud here...
[15:09:57] <urandom>	 one option would be to bootstrap the new nodes, decomm the old, and then alter the keyspace
[15:10:25] <urandom>	 that way you have all SSDs, and the IO to handle a recompaction
[15:11:32] <urandom>	 the inverse would be to alter, and then start bootstrapping the new nodes, but that puts the existing nodes to work recompacting, which might not be awesome if they are already struggling under high IO
[15:12:19] <urandom>	 i suppose you could use jmx to poke one of the existing nodes into LC long enough to guage the impact, maybe during an off-peak period
[15:13:13] <urandom>	 i think what you are suggesting, is to bootstrap a new node, for the compaction strategy locally, and then lather-rinse-repeat for each new node/instance
[15:13:25] <urandom>	 s/for the compaction/force the compaction/
[15:13:36] <elukey>	 yep...
[15:13:47] <elukey>	 a lot of homework to do :P
[15:14:11] <urandom>	 which might work... but any restart will revert it back to DTCS, and cause it to start recompaction again
[15:14:36] <urandom>	 seems a little janky
[15:14:57] <elukey>	 I think at this point that migration and alter AFTER on ssd might be the best and cleanest option, having data from our current testing
[15:15:01] <urandom>	 maybe there is a way to automate this
[15:16:06] <joal>	 mobrovac: are you here?
[15:17:13] <urandom>	 elukey: if we could disable compaction on startup, then automate the change in compaction strategy, and reenabling of compaction, as part of the startup sequence, then maybe this would work
[15:17:54] <urandom>	 otherwise i'd be worried that every routine restart would generate problem
[15:19:21] <urandom>	 heh, i wonder if we could set concurrent compactors to 0 in the config
[15:19:36] <elukey>	 urandom: might be best to try the path that we know could work, like altering the compaction settings after the migration.. we already done if moving to DTCS for aqs right?
[15:20:41] <urandom>	 elukey: well, if you *know* you intend to move to LCS immediately after, then it might be worth thinking about ways to avoid doubling handling all of that compaction work
[15:20:59] <elukey>	 true true
[15:21:28] <urandom>	 this would probably be easy to test, setting concurrent_compactors to 0
[15:22:32] <urandom>	 if that disables compaction, then we could script a startup sequence that locally altered the strategy, and then set concurrent_compactors to it's usual setting via nodetool
[15:22:51] <urandom>	 and use this temporarily until your migration was complete
[15:23:02] <urandom>	 then alter, and undo the startup hacks
[15:23:14] <urandom>	 worth thinking about
[15:23:32] <elukey>	 so basically trickying the new hosts with a different statup sequence until the old ones are deprecated
[15:23:43] <urandom>	 yeah
[15:23:57] <elukey>	 mmmmm starting to like the idea
[15:24:05] <elukey>	 I'll add it as a test to perform
[15:24:29] <urandom>	 maybe wait an see how LCS works out for you, before investing too much time into it
[15:24:37] <elukey>	 yep yep
[15:25:25] <elukey>	 last question I promise: where did you get cassandra_2.1.13_all.deb? Is it from the debian stable repo right? Just wanted to have a rollback to 2.1.12 if needed
[15:25:41] <joal>	 urandom: thinking aloud as well on the topic: Couldn't we use the jmx op trick and monitor no restart while bootstrapping?
[15:26:46] <grrrit-wm>	 (PS5) Nuria: Initial content of analytics.wikimedia.org [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/289062 (https://phabricator.wikimedia.org/T134506)
[15:27:03] <urandom>	 joal: in the event a restart was necessary, you'd have to bring the node up (compacting with DTCS), and then change it
[15:27:19] <urandom>	 i'm not sure what (if any) implications there are with this
[15:27:40] <joal>	 right urandom ... expecting a no-restart is kinda not a good idea :)
[15:28:50] <mobrovac>	 joal: sorry, meetings for the next two hours
[15:29:53] <joal>	 np mobrovac, wanted to know some more on the scap modification (mostly for me getting better at scap, ot to challenge the change :)
[15:31:13] <elukey>	 urandom: sorry forgot to mention you - last question I promise: where did you get cassandra_2.1.13_all.deb? Is it from the debian stable repo right? Just wanted to have a rollback to 2.1.12 if needed
[15:32:43] <elukey>	 (seems so from elukey@aqs1004:/usr/share/doc/cassandra$ zless changelog.gz)
[15:32:46] <ottomata>	 hiii!
[15:32:55] <ottomata>	 elukey:  FYI am running election to bring back 1013
[15:32:56] <elukey>	 ottomata: o/
[15:33:00] <elukey>	 sureee
[15:33:11] <elukey>	 what was wrong yesterday with the retention ms?
[15:33:21] <elukey>	 I saw that you made it work
[15:33:31] <ottomata>	 oh yeah
[15:33:34] <ottomata>	 uh, i think i just had a bad value
[15:33:41] <ottomata>	 some how my multiplication and division was off
[15:33:46] <ottomata>	 i had set it higher!  not lower :p
[15:33:53] <ottomata>	 dunno why bytes didn't work
[15:34:06] <ottomata>	 but with a actual 48 hour ms value, it deleted logs
[15:34:41] <elukey>	 all right, might be useful in https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Purge_broker_logs whenever you have time
[15:35:16] <ottomata>	 jaaa thanks, editing now
[15:35:19] <ottomata>	 you da best :)
[15:36:02] <elukey>	 da best annoying colleague :P
[15:37:19] * elukey brb!
[15:39:08] <wikibugs>	 Analytics-Tech-community-metrics: List of Phabricator users - https://phabricator.wikimedia.org/T37508#2309485 (Qgil)
[15:40:45] <wikibugs>	 Analytics-Kanban: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2309486 (Nuria) @Gwicke: are  you able to  provide sample kandemila config?
[15:43:13] <ottomata>	 elukey:  done! https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Purge_broker_logs
[15:46:13] <elukey>	 ottomata: niceeeee thanks!
[15:47:22] <urandom>	 elukey: http://dl.bintray.com/apache/cassandra/pool/main/c/cassandra/
[15:47:58] <elukey>	 urandom: how many beers do I owe you? :D
[15:48:22] <urandom>	 elukey: heh
[15:48:26] <elukey>	 thanks :)
[15:49:33] <urandom>	 elukey: no worries; happy to help!
[15:58:45] <wikibugs>	 Analytics-Kanban, Patch-For-Review: Augment oozie load SLA + Add URL to oozei error messages - https://phabricator.wikimedia.org/T134876#2309549 (Nuria) Open>Resolved
[15:59:04] <wikibugs>	 Analytics-Kanban: Get jenkins to automate releases {hawk} - https://phabricator.wikimedia.org/T130122#2309553 (Nuria)
[15:59:06] <wikibugs>	 Analytics-Kanban, Continuous-Integration-Config: Add a maven-release user to Gerrit {hawk} - https://phabricator.wikimedia.org/T132176#2309552 (Nuria) Open>Resolved
[15:59:55] <wikibugs>	 Analytics-Kanban, Patch-For-Review: Pageview definition bug for apps pageviews on rest endpoint - https://phabricator.wikimedia.org/T135168#2309554 (Nuria) Open>Resolved
[16:00:23] <wikibugs>	 Analytics, Revision-Slider, TCB-Team, WMDE-Analytics-Engineering, and 2 others: Data need: User Behaviour when comparing article revisions - https://phabricator.wikimedia.org/T134861#2309556 (Tobi_WMDE_SW) p:Triage>Normal
[16:00:44] <nuria_>	 a-team: ahem.. standuppp
[16:00:50] <joal>	 :)
[16:08:55] <wikibugs>	 Analytics-Kanban: Figure out if the Changelog file can be updated in the release process by Jenkins {hawk} - https://phabricator.wikimedia.org/T132181#2309579 (madhuvishy)
[16:09:26] <wikibugs>	 Analytics-Kanban: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2309587 (Nuria) a:Milimetric>Nuria
[16:09:55] <wikibugs>	 Analytics-Kanban, Patch-For-Review: Create repo analytics.wikimedia.org with index and build of browser reports for puppet to source  and deploy to analytics.wikimedia.org - https://phabricator.wikimedia.org/T134506#2309593 (Nuria) a:Nuria
[16:35:20] <nuria_>	 madhuvishy: coming to tasking?
[16:35:58] <elukey>	 nuria_: coming in 5 mins, finishing the upgrade
[16:37:20] <ottomata>	 oh tasskibg
[16:38:38] <wikibugs>	 Analytics: Extract edit oriented data from MySQL from simplewiki (small size)  - https://phabricator.wikimedia.org/T134790#2309700 (Nuria)
[16:39:32] <wikibugs>	 Analytics: Extract edit oriented data from MySQL for small  wiki - https://phabricator.wikimedia.org/T134790#2277147 (Nuria)
[16:55:13] <wikibugs>	 Analytics: Extract edit oriented data from MySQL for small  wiki - https://phabricator.wikimedia.org/T134790#2277147 (Nuria) First question:  1) Do we want to move data to event bus first or rather we want to go directly to analytics schemas?  Given that our goal is to be able to have a prototype of data pip...
[17:20:07] <wikibugs>	 Analytics: Extract edit oriented data from MySQL for small  wiki - https://phabricator.wikimedia.org/T134790#2309855 (Nuria) The DB loading is assumed to be a bootstrapping step, to happen only once. Updates to the past data that are happening to db data should come as eventbus events so we are not consideri...
[17:23:13] <wikibugs>	 Analytics: Extract edit oriented data from MySQL for small  wiki - https://phabricator.wikimedia.org/T134790#2309857 (Nuria)
[17:23:22] <wikibugs>	 Analytics: Extract edit oriented data from MySQL for small  wiki - https://phabricator.wikimedia.org/T134790#2277147 (Nuria)
[17:25:24] <wikibugs>	 Analytics: Extract edit oriented data from MySQL for small  wiki - https://phabricator.wikimedia.org/T134790#2309864 (Nuria)
[17:27:44] <wikibugs>	 Analytics: Extract edit oriented data from MySQL for small  wiki - https://phabricator.wikimedia.org/T134790#2309876 (Nuria) An idea on how to approach that task: http://www.gv.com/sprint/
[17:29:42] <wikibugs>	 Analytics-Kanban: Extract edit oriented data from MySQL for small  wiki - https://phabricator.wikimedia.org/T134790#2309884 (Nuria)
[17:32:59] <wikibugs>	 Analytics, WMDE-Analytics-Engineering: Remove http://datasets.wikimedia.org/aggregate-datasets/wikidata/ - https://phabricator.wikimedia.org/T125407#1986786 (Nuria) We need to:  - see if anyone is using this data - see if there are any crons creating data - removing files entirely  - removing apache access
[17:35:00] <wikibugs>	 Analytics-Kanban, WMDE-Analytics-Engineering: Remove http://datasets.wikimedia.org/aggregate-datasets/wikidata/ - https://phabricator.wikimedia.org/T125407#2309892 (Nuria)
[17:48:26] <wikibugs>	 Analytics-Kanban, Datasets-General-or-Unknown, WMDE-Analytics-Engineering: Fix permissions on dumps.wm.o access logs synced to stats1002 - https://phabricator.wikimedia.org/T134776#2276672 (Nuria)
[17:49:57] <wikibugs>	 Analytics: Describe threat model for sanitized pageview data {mole} - https://phabricator.wikimedia.org/T131158#2309967 (Nuria)
[17:51:10] <wikibugs>	 Analytics-Cluster, Operations, ops-eqiad: kafka1013 hardware crash - https://phabricator.wikimedia.org/T135557#2309973 (Ottomata)
[17:56:54] <wikibugs>	 Analytics: Describe threat model for sanitized pageview data {mole} - https://phabricator.wikimedia.org/T131158#2310009 (Nuria) Goal: "If we release the dataset with and without page title info. what are exploits that we know are possible (cross-checking with other datasets)?"  Compute a month of data we cou...
[17:58:28] <wikibugs>	 Analytics-Kanban: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2310013 (GWicke) @nuria: Again, T135240#2302880 has a link to a sample config, and instructions on which values to set for the peers.  If you would like us to set up & deploy a config for you, then I think we c...
[17:59:24] <elukey>	 a-team: logging off, talk with you tomorrow! I'll check later on aqs but look goot atm
[17:59:27] <elukey>	 byyyeeee
[17:59:34] <joal>	 bye elukey !
[17:59:35] <mforns>	 elukey, bye!
[17:59:44] <elukey>	 (tomorrow I'll migrate aqs100[23])
[18:00:01] <ottomata>	 byyye
[18:13:56] <nuria_>	 joal: see my e-mail about xanalytics lazy value
[18:14:11] <nuria_>	 joal: it cannot be used for bucketing (if that is what it was intended)
[18:14:28] <nuria_>	 joal: but it can report the bucketing that has happened client side
[18:14:39] <urandom>	 ottomata: hi.
[18:17:25] <jdlrobson>	 nuria_: hey you there?
[18:17:36] <nuria_>	 jdlrobson: holaaa yes
[18:17:43] <jdlrobson>	 figured might be easier to talk here :)
[18:18:11] <nuria_>	 jdlrobson: sure, that works too, do read my last e-mail i just sent
[18:18:15] <jdlrobson>	 yup
[18:18:20] <ottomata>	 urandom:  hiyaa
[18:18:21] <nuria_>	 jdlrobson: and let me know if it makes sense
[18:18:24] <jdlrobson>	 i figured i could clear up misunderstanding and then we could clarify
[18:18:31] <nuria_>	 jaja
[18:18:34] <jdlrobson>	 So all I want to do is be able to see there was more engagement in bucket A  than bucket B (e.g. which bucket had more page views)
[18:18:39] <urandom>	 ottomata: how are you!? :)
[18:18:47] <jdlrobson>	 bucket A would lazy load images, bucket B would not
[18:18:47] <nuria_>	 jdlrobson: yes, understood
[18:18:55] <jdlrobson>	 the hope is that through better performance we'll see more page views
[18:19:12] <jdlrobson>	 So in my head this has nothing to do with JS
[18:19:13] <nuria_>	 jdlrobson: you can REPORT bucketing with x-analaytics
[18:19:20] <ottomata>	 urandom:  am well!  a little sore throaty today so meh, but not enough to take day off :)  got my head stuck up in druid puppetization land :)
[18:19:24] <ottomata>	 how are you?
[18:19:27] <nuria_>	 jdlrobson: but bucketing has to be stored client side elsewhere
[18:19:39] <nuria_>	 jdlrobson: how does the client know that it has to lazy load images?
[18:19:40] <urandom>	 ottomata: i'm ok
[18:19:51] <jdlrobson>	 nuria_: the 50% a/b test will be done in varnish
[18:19:53] <jdlrobson>	 based on IP address
[18:20:05] <urandom>	 ottomata: i'm looking for someone to exploit^H^H^H^Hto ask for a merge
[18:20:13] <nuria_>	 jdlrobson: that is what i am saying you cannot dowith x-analytics
[18:20:31] <urandom>	 ottomata: you have a fetching timezone, and i think, +2 on puppet, yes? :)
[18:20:47] <nuria_>	 jdlrobson: cause lazy loading of images is happening from client correct?
[18:20:58] <nuria_>	 jdlrobson: or is varnish initiating teh lazy loading?
[18:21:00] <nuria_>	 *the
[18:21:39] <jdlrobson>	 nuria_: varnish. Haven't quite worked out how we propagate that down to the PHP as im waiting on the implementation from brandon black
[18:21:47] <ottomata>	 hah, urandom yes indeed
[18:21:51] <ottomata>	 whatchu neeeed? :)
[18:21:57] <urandom>	 https://gerrit.wikimedia.org/r/#/c/289685/
[18:22:00] <jdlrobson>	 but am hoping PHP will be able to read a header
[18:22:14] <urandom>	 it's for the restbase staging env, not production per say
[18:22:17] <nuria_>	 jdlrobson: varnish is initating the lazy loading of images .. wait.. how?
[18:22:26] <ottomata>	 ok cool
[18:22:32] <ottomata>	 was about to ask, also looked it up in site.pp
[18:22:33] <ottomata>	 sounds fine to me
[18:22:35] <urandom>	 ottomata: \o/
[18:22:37] <ottomata>	 can you run puppet?
[18:22:40] <ottomata>	 or shall I?
[18:22:41] <urandom>	 i can
[18:22:43] <ottomata>	 ok
[18:22:46] <nuria_>	 jdlrobson: listening, sorry, go ahead
[18:22:46] <urandom>	 i have it disabled atm
[18:23:53] <jdlrobson>	 nuria https://phabricator.wikimedia.org/T127883 so it was my understanding that varnish will look at the IP of the incoming user if (is_ipv4 and ipv4_string ~ '[0-4]$') { turn_on_lazy_loading }
[18:24:06] <urandom>	 ottomata: thank you sir!
[18:25:14] <ottomata>	 ya puppet-merged
[18:25:15] <ottomata>	 yw
[18:27:04] <jdlrobson>	 i'm not entirely sure about the actions of turn_on_lazy_loading but am hoping it sets a header
[18:27:26] <joal>	 nuria_: I understand your point and that idea of header ids for reporting was implicit for me
[18:27:47] <nuria_>	 jdlrobson: so the php is going to look different for the users in the test? are we rendering on the php side different "<img>" tags given a header passed from varnish?
[18:27:49] <joal>	 nuria_: You surely can't enforce information client side based on server side event :)
[18:28:24] <joal>	 nuria_: but as you said, you can report, and that my point
[18:28:42] <nuria_>	 joal: the part i missed was that is varnish (not the client) turning on lazy loading which seems odd but not so much on our stack
[18:29:42] <joal>	 nuria_: Would be good to double check as I suggested before turning the actual feature on that both population look the same (meaning having varnish split populations, send different headers, but no difference in code, and after av few days, activate code difference
[18:29:49] <nuria_>	 jdlrobson: brandon's comment might imply that this cache bypass is only happening for logged in users though..
[18:30:25] <jdlrobson>	 nuria_: php html output will be different yes for 2 groups
[18:30:33] <jdlrobson>	 the cache will be fragmented
[18:30:41] <jdlrobson>	 for anons and logged in
[18:30:49] <jdlrobson>	 which comment makes you think that?
[18:31:13] <joal>	 a-team, logging off, I'll finish pageview backfiling tomorrow morning and start playing with new cassandra, but will be off in the afternoon
[18:31:52] <joal>	 Have a good end of day and a good weekend (except for European time people that I'll see tomorrow :)
[18:33:16] <nuria_>	 jdlrobson: ok, read the whole ticket and if you want a 50% split even you wouldn't get that with IPs
[18:33:32] <nuria_>	 jdlrobson: but that is something else i can talk to brandon about
[18:35:03] <nuria_>	 jdlrobson: let me talk to brandon about the ip strategy
[18:44:33] <jdlrobson>	 nuria_: that would be great
[18:44:54] <nuria_>	 jdlrobson: it is a great risk to launch this to 50% of our users
[18:45:27] <jdlrobson>	 nuria_: we are ready to launch it. We are not worried about the code. We just want to measure impact at this point. You are right though it does not need to be a clear 50% split
[18:46:02] <jdlrobson>	 the most important thing is that with regards to performance we can distinguish 2 buckets of users who use a cache populating by the same amount of users
[18:46:18] <nuria_>	 jdlrobson: we can find out how well it works with a much smaller set of users, impact in a user base as large as ours can be stablish by pinging 10% of users
[18:46:23] <jdlrobson>	 which i guess would mean we need group a, group b and control where control = 80%, A and B are 10% each
[18:47:01] <jdlrobson>	 cos if A is using a cache populated by 90% of users and B is using one cached by 10% we have a problem measuring performance impact
[18:57:03] <wikibugs>	 Analytics-Kanban: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310174 (Nuria)
[18:57:12] <wikibugs>	 Analytics: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310186 (Nuria)
[19:05:58] <wikibugs>	 Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, and 2 others: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#2310198 (Sadads) I would think standard enabling it across the board, is going to be important, both...
[19:17:17] <mforns>	 milimetric, do you know where do the metrics starting with average- in cloud9 come from? I could not find a definition anywhere...
[19:17:27] <milimetric>	 looking
[19:17:42] <milimetric>	 yes, hang on
[19:18:38] <milimetric>	 mforns: we added the word "average" ourselves so we could understand what they meant, because it's a big cryptic just transfering the title from where they are in wikistats without the numbers that give you an idea of what they're measuring:
[19:18:39] <milimetric>	 https://stats.wikimedia.org/EN/SummaryEN.htm
[19:18:50] <milimetric>	 https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
[19:18:57] <milimetric>	 (mostly that first one)
[19:19:21] <milimetric>	 so like the row that says "New Articles per Day" would be average-new-articles-per-day
[19:20:24] <milimetric>	 but they're still kind of confusing to me, mforns
[19:20:50] <milimetric>	 it sounds like we'd count something every day, then sum and divide by the count, and that's the "monthly" number
[19:21:09] <mforns>	 milimetric, thanks a lot, looking into it
[19:24:43] <wikibugs>	 Analytics, Hovercards, Reading-Web-Backlog, Reading-Web-Sprint-72-Ninety-nine-problems-but-Nirzar-aint-one: Verify X-Analytics: preview=1 in stable - https://phabricator.wikimedia.org/T133067#2310242 (dr0ptp4kt)
[19:28:04] <wikibugs>	 Analytics, Hovercards, Unplanned-Sprint-Work: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2310250 (dr0ptp4kt)
[19:28:10] <wikibugs>	 Analytics, Hovercards, Reading-Web-Backlog, Reading-Web-Sprint-72-Ninety-nine-problems-but-Nirzar-aint-one: Verify X-Analytics: preview=1 in stable - https://phabricator.wikimedia.org/T133067#2310248 (dr0ptp4kt) Open>Resolved Looks to be working based on data from the hour 2016051801 on E...
[19:35:57] <HaeB>	 ottomata: so on stat1002 we have mailx available, which is super useful for auto-emailing results to yourself after a long query has completed
[19:36:05] <HaeB>	 ...can we have mailx on stat1004 too?
[19:39:38] <ottomata>	 milimetric:  woooo finally! io.druid.indexing.worker.executor.ExecutorLifecycle: Task completed with status: {
[19:39:39] <ottomata>	 \  "status" : "SUCCESS",
[19:39:46] <ottomata>	 malix!
[19:39:50] <ottomata>	 huh never heard of it...
[19:39:51] <ottomata>	 looking
[19:40:10] <ottomata>	 HaeB:  do you know if it is from a .deb package? and if so, what?
[19:40:24] <milimetric>	 awesome ottomata
[19:40:39] <ottomata>	 so ok, milimetric, now what?
[19:40:40] <ottomata>	 :)
[19:40:52] <ottomata>	 oh HaeB mailx?
[19:41:19] <madhuvishy>	 ya mailx
[19:41:24] <HaeB>	 https://en.wikipedia.org/wiki/Mailx ;)
[19:42:02] <HaeB>	 stat1003 has it too
[19:42:18] <ottomata>	 oh HaeB sorry, i read that as malix
[19:42:18] <ottomata>	 haha
[19:42:25] <ottomata>	 yeah, stat1004 is a little different, but ja we can do that
[19:43:21] <HaeB>	 that would be great
[19:43:44] <HaeB>	 how else is it different btw? (just curious)
[19:43:47] <milimetric>	 ottomata: I mean we've played with it in labs enough.  I think the unknowns are mostly performance.  But we do want to look at some of the new functionality like the lookups.  So don't tear it down yet.
[19:44:28] <ottomata>	 oh i won't tear it down, i have lots more to do, but i guess i want to verify deep storage stuff next
[19:44:33] <ottomata>	 i'll turn back on mysql and hdfs storage stuff
[19:44:36] <ottomata>	 try to get that all straight
[19:44:46] <milimetric>	 makes sense
[19:44:46] <ottomata>	 then we'll try indexing some bigger files out of hdfs
[19:44:54] <ottomata>	 might need some help to test querying in a bit
[19:44:56] <ottomata>	 maye tomorrow
[19:45:17] <milimetric>	 yeah, we're working on exporting lots of data next I guess
[19:47:05] <ottomata>	 ah rats actually, now historical is having problems
[19:47:05] <ottomata>	 hm
[19:47:34] <milimetric>	 memory?
[19:48:38] <ottomata>	 no
[19:48:38] <ottomata>	  Instantiation of [simple type, class io.druid.segment.loading.LocalLoadSpec] value failed: [/tmp/druid/localStorage/pageviews/pageviews/2015-09-01T00:00:00.000Z_2015-09-02T00:00:00.000Z/2016-05-19T19:37:17.074Z/0/index.zip] does not exist
[19:48:51] <ottomata>	 HaeB:  done
[19:49:16] <HaeB>	 nice, thanks!
[19:49:40] <milimetric>	 maybe one thing's looking in hdfs and the other's saving to disk or vice versa
[19:51:10] <ottomata>	 yeah i think its a java tmp dir thing, looking into it...
[19:52:46] <ottomata>	 huh weird milimetric this is because I have local storage set to noop
[19:52:51] <ottomata>	 i thought it woudln't try to use deep storage then
[19:52:53] <ottomata>	 clearly it is
[19:53:01] <ottomata>	 the default localstorage directory is in /tmp/druid/localStorage
[19:53:04] <ottomata>	 huh
[19:53:33] <ottomata>	 i guess i'll make the default be local, instead of noop
[19:53:36] <ottomata>	 and just set that properly
[19:54:05] <milimetric>	 yeah, I never saw noop before, maybe it's a bad/old doc or something
[19:56:01] <urandom>	 ottomata: could i trouble you for one more merge? -- https://gerrit.wikimedia.org/r/#/c/289722/
[19:56:13] <urandom>	 ottomata: same as before, but widens it from one host, to all (in rb staging)
[19:56:38] <ottomata>	 urandom:  this will do all of them?
[19:56:41] <urandom>	 ottomata: and baring some 'oh crap' moment where i needed to rollback, this would be the last
[19:56:43] <ottomata>	 including prodones?
[19:56:47] <ottomata>	 prod ones
[19:56:50] <urandom>	 just staging
[19:56:56] <urandom>	 staging in eqiad and codfw
[19:57:29] <ottomata>	 ah i see _test_ ones
[19:57:29] <ottomata>	 ok
[19:57:47] <urandom>	 yeah, our staging is in the prod network... but they aren't prod hosts per say
[19:57:53] <ottomata>	 aye
[19:58:01] <ottomata>	 done
[19:58:11] <urandom>	 ottomata: sweet; thanks again man!
[19:58:14] <ottomata>	 np!
[20:03:48] <grrrit-wm>	 (CR) Jdlrobson: [C: -1] "Needs rebase" [analytics/wikistats] - https://gerrit.wikimedia.org/r/145862 (owner: Nemo bis)
[20:04:16] <grrrit-wm>	 (CR) Jdlrobson: [C: -1] "Needs rebase" [analytics/wikistats] - https://gerrit.wikimedia.org/r/118261 (owner: Nemo bis)
[20:18:14] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310391 (BBlack) From irc conversation in wikimedia-operations w/ @Nuria, @Krinkle, and myself.  This is the varnish-level pseudo-code proposed (ignore arbitrary names and con...
[20:26:40] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310419 (Nuria) Looks great, one minor nit: rather than having distinct cookies per feature we can have one weblab cookie that contains all features and bucketing for those....
[20:26:48] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310422 (BBlack) Other nits and notes:  1. Don't send the cookie to the applayer, just the feature header 2. Validate the cookie's value, clear+reset if invalid 3. Block the f...
[21:05:25] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310635 (BBlack) Better pseudo-code, after more conversation:  Data structure (which we update as we add/remove experiments): ``` experiments => { # 100 total bins to use: 0-9...
[21:10:21] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310677 (BBlack) If the `setRequestHeader` bits didn't use else-if, you could have overlapping buckets with multiple features in play too, but this seems simpler for the momen...
[21:11:42] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310694 (BBlack) Another complication that didn't come up in conversation earlier: what about domainnames for these cookies?  They'll be getting binned independently for every...
[21:14:47] <wikibugs>	 Analytics, Operations, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2310716 (BBlack) Another thought: with the above code, I've intentionally set it so that both sides of the experiment will initially share an empty cache split (they'll have t...
[21:39:47] <wikibugs>	 Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2310840 (DannyH) @eranroz @Earwig  What's an estimate of how many queries Eranbot and cop...
[22:11:03] <grrrit-wm>	 (Abandoned) Madhuvishy: Revert "Test commit for jenkins release testing" [analytics/refinery/source] (release) - https://gerrit.wikimedia.org/r/279410 (owner: Madhuvishy)