[00:53:24] yurik: I'm pretty sure not
[01:37:34] Quarry: Setup an easy way to have Quarry dump information / results on a wiki page - https://phabricator.wikimedia.org/T137179#2359905 (yuvipanda)
[01:38:20] Quarry: Setup an easy way to have Quarry dump information / results on a wiki page - https://phabricator.wikimedia.org/T137179#2359918 (yuvipanda)
[03:41:03] (CR) KartikMistry: "We should add check for clicks.txt formatting too, but that's fine as of now." [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/291358 (https://phabricator.wikimedia.org/T135584) (owner: Amire80)
[08:43:37] elukey: ALERTTTTT ;)
[08:50:55] joal: ???
[08:51:19] I received an email about puppet on memcached on labs in project analytics ;)
[08:51:34] nothing dramatic, but still wanted to let you know :)
[08:51:51] memcached???
[08:52:22] elukey: forwareded the message
[08:52:27] thanks :)
[08:52:56] elukey: I thought it was you, cause you're the only one I have heard speak of memcached lately :)
[08:53:50] nono and I've never head about memcached in analytics labs :P
[08:53:58] usually if I break memcached it is only in prod
[08:53:59] :D
[08:54:06] :D
[09:03:42] Analytics, Commons, Multimedia, Tabular-Data, and 4 others: Review shared data namespace (tabular data) implementation - https://phabricator.wikimedia.org/T134426#2265305 (rufuspollock) Hi, I'm one of the authors of JSON Table Schema and of the Data Package family of specifications (I was also an...
[10:10:48] !log hue restarted on analytics1027 for security upgrades
[10:10:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[10:10:55] joal --^
[10:11:03] k elukey
[10:44:23] joal: currently re-building and prepping the new version of vk
[10:44:49] Great !
[10:44:50] it should be ready today, and we were thinking to deploy it in maps first then misc
[10:44:56] aesome :)
[10:44:58] in a couple of days more or less
[10:45:15] then we'll add the "end:" prefix to the config
[10:45:20] and the issue should go away
[10:45:31] right
[10:49:23] all right, lunch!
[10:49:24] brb
[11:24:10] back!
[11:25:06] I think that I will need to purge some kafka logs again
[11:25:29] some 1022 partitions are down to 17% of free space
[11:26:36] pfff
[11:26:49] Man, that's bad
[11:27:20] maybe we can avoid the purge since the restarts happened on the first
[11:27:30] so probably tomorrow we should see some relief
[11:27:51] hm, didn't understand, can you refomulate please?
[11:28:36] sure! So tomorrow the regular 7 days retention deletion policy should kick in
[11:28:41] and clear some logs
[11:28:55] because we restarted the first ones on the first
[11:29:02] Ah, yes, got it : the first we restarted, it's 6 days ago
[11:29:03] and all the log got mtime == Jun 1
[11:29:07] yup
[11:29:21] k
[11:30:48] but it is indeed very bad
[11:31:39] also elukey, about cassandra: perf when loading the table is actually not as good, but still very much ok (500 reqs/s max while writing)
[11:31:52] I'll do the test while compacting today
[11:32:42] joal: yeah it was expected, stil a good result :)
[11:32:47] yessir
[11:32:57] we could think about scaling the cluster in the future if we need more powa
[11:33:05] yup
[11:33:09] costs money though
[11:34:57] * elukey plays "Pink Floyd - Money" in the background
[11:35:18] * joal hears the cashier rings
[11:35:30] :D
[13:29:34] * elukey afk for a bit!
[14:12:30] Analytics, MediaWiki-Authentication-and-authorization: Verify there are no analytics jobs accessing Zero portal - https://phabricator.wikimedia.org/T137174#2359752 (Tgr) Per IRC discussion with Yuri, this does not block enabling AuthManager.
[14:53:56] new vk installed on cp1046 (maps), let's see how it goes
[14:57:30] o/
[14:57:52] Hey folks.
I'm working with a researcher (likely future intern) who is going to need access to GPU resources for computing.
[14:58:11] He'll be using them to generate Neural Networks for predicting fun things like Article Quality, Vandalism, etc.
[14:58:19] Do we have any GPU computing resources?
[14:58:50] * joal not know about GPU :(
[15:00:45] halfak: no, we don't have any of those
[15:08:33] o/ moritzm
[15:08:38] Bummer.
[15:09:00] I wonder what it would take to get these computing resources.
[15:09:27] ottomata: Heyaa
[15:09:58] is there some process I could engage with via the analytics team to get these types of resources available to our stats machines?
[15:10:04] E.g. goes analytics have budget for this?
[15:10:13] or maybe we should just use amazon's services
[15:10:25] HIII
[15:10:37] halfak: whatcha talking about?
[15:10:44] o/
[15:10:49] halfak: unfortunately CUDA only works with the proprietary Nvidia driver, so it would clash with our FOSS only policy
[15:11:49] OpenCL is FOSS, but it seems CUDA is more heavily used
[15:12:03] vinh, ^
[15:12:25] ottomata, talking GPU computing resources for vinh (who is likely going to work with us this summer)
[15:12:31] us = research
[15:12:46] ah hm
[15:12:57] hello
[15:13:07] With our Machine Learning work, it seems that this demand for GPU computing will grow
[15:13:27] from my experience, OpenCL is even slower than CPU :'(
[15:13:29] no idea why
[15:14:02] moritzm: I remember when Linus tried to explain what you just said less politely :D
[15:15:17] heheh
[15:16:20] halfak: for the immediate internship work AWS seems the most viable option
[15:17:06] ottomata: new vk running on cp1046 :)
[15:17:15] there is an option for requesting free AWS credits for education
[15:17:28] ottomata: brain bounce on avro/kryo for flink?
[15:17:29] not sure if it works for WMF
[15:18:08] elukey: awesooome!
[15:18:26] elukey: if that runs fine for a while, i'd say install it on all misc and let
[15:18:32] let's see if our alerts disappear
[15:18:37] it works with the actual config, now I am going to live hack adding "end:"
[15:18:42] joal: sure, gimme 5 mins...
[15:21:17] ottomata: plan would be to install it on maps, let it boil, then misc
[15:21:38] milimetric, yt?
[15:23:04] aye cool
[15:23:06] hi mforns yea
[15:23:08] joal: batcave!
[15:23:13] OMW
[15:23:24] milimetric, can you batcave-2 for couple mins?
[15:23:38] mforns: in https://plus.google.com/hangouts/_/wikimedia.org/a-batcave-2
[15:26:11] vinh, seems like we should bring this up with nuria when she gets online
[15:26:29] who is Nuria, sorry?
[15:26:34] She's the manager of the analytics team and will be able to help us work out whether or not analytics wants to own this and how
[15:26:49] I see
[15:26:52] In the end, we could do this ad-hoc with Research budget
[15:27:05] yeah
[15:27:18] for instance, I am running some exps with AWS credits of my team
[15:27:26] it costs 0.65$/hour
[15:27:29] :-?
[15:27:30] quite lot
[15:34:55] halfak / vinh: nuria's on vacation for a while
[15:36:11] well, this week but after that we have our offsite so it may be a while until you get a response
[15:36:15] I'm reading up to catch up
[15:39:40] ok, yeah, I agree that asking AWS for free educational credits is probably your best option.
They've been nice to us in the past, just remember the privacy policy and what data is ok to upload to third parties (basically roughly data that's already public)
[15:49:06] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: Smartctl disk defects on kafka1012 - https://phabricator.wikimedia.org/T136933#2361207 (Ottomata)
[15:51:18] Analytics, Analytics-Cluster, Deployment-Systems, scap, Scap3 (Scap3-Adoption-Phase1): Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2361210 (Ottomata) a:elukey
[15:52:38] Analytics-Cluster, Analytics-Kanban: Procure hardware for future druid cluster - https://phabricator.wikimedia.org/T116293#2361215 (Ottomata)
[16:01:04] a-team standup!
[16:41:39] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: Smartctl disk defects on kafka1012 - https://phabricator.wikimedia.org/T136933#2361334 (elukey) @Cmjohnson: Hi! Any idea if we could replace the disk during the next two weeks? Thanks!
[16:45:26] milimetric: Heya
[16:45:30] Can I be of any help ?
[16:47:47] hey joal no, I just have to write this scala code, I'm just still slow at it (just doesn't come natural)
[16:48:00] milimetric: ok no prob :)
[16:48:30] I talked to Marcel and I'll include more details about the events in the state / event hybrid table
[16:49:03] k milimetric
[16:50:50] hi mforns. a quick note about the purging of data for the 3 schema we were discussing: I'm working on some queries to understand the data that we have collected better. I'd like us to discuss the details of what to be deleted and what not after my first round of analysis is complete.
[16:51:25] mforns: we still have almost a month for purging the first line of data that should go out. By when do you want the decision about what to purge and what to not purge be finalized?
[16:51:33] leila, ok! thanks for the update
[16:53:34] leila, you mean the data is 2 months old now, and that we have a month for the records to start being purged (if purge is needed)?
[16:53:43] yes, mforns.
[16:54:31] leila, makes sense. I'm putting together a white list of all tables and fields that will be kept for more than 90 days in EL database
[16:55:01] going afk a-team, byeeee o/
[16:55:05] I'm shooting for having a response for you no later than the middle of next week, mforns.
[16:55:08] byee elukey.
[16:55:13] o/
[16:55:22] Bye elukey :)
[16:55:32] leila, maybe I can just generate the white-list without your tables, for now. And we'll have 1 month time to add your tables/fields in case we want to keep them
[16:55:44] leila, what do you think?
[16:55:47] ottomata: so atm only cp1046 is running with the new vk, didn't do anything else. Tomorrow will proceed with carbon upload and maps installs
[16:56:06] works well for me mforns.
[16:57:14] leila, and probably, even if the whitelist is ready in a couple days, the start of the auto-purging will actually be later than your update, so I think we can do it like this.
[16:57:45] cool leila, thanks for looking into this!
[17:03:53] a-team, no retro?
[17:04:06] no mforns sorry we forgot to tell you
[17:04:13] milimetric, ah ok np at all
[17:04:16] two people are missing so we cancelled
[17:04:19] sure
[17:12:53] elukey: cool souds good
[17:16:56] wikimedia/mediawiki-extensions-EventLogging#561 (wmf/1.28.0-wmf.5 - 9acd456 : thcipriani): The build has errored.
[17:16:56] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/9acd4561b40a
[17:16:56] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/135929900
[17:34:18] gone for tonight a-team, see you tomorrow
[17:34:25] bye joal!
[17:34:29] nite jo
[17:42:57] laters!
[17:51:38] that's me again^
[17:51:40] oops
[17:51:46] !log restarting broker on kafka1020
[17:51:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[18:07:34] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[18:08:06] that's me^ it'll flop in a sec
[18:19:33] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 20.00% above the threshold [20.0]
[19:30:08] Analytics, Commons, Multimedia, Tabular-Data, and 4 others: Review shared data namespace (tabular data) implementation - https://phabricator.wikimedia.org/T134426#2361980 (Yurik) * I keep wondering if we can use Wikidata/Wikibase more for this. Wikidata IDs for the license might be much more con...
[19:48:30] Analytics-Cluster, Analytics-Kanban: Kafka 0.9's partitions rebalance causes data log mtime reset messing up with time based log retention - https://phabricator.wikimedia.org/T136690#2362021 (Ottomata) Just created https://issues.apache.org/jira/browse/KAFKA-3802
[20:32:25] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: Smartctl disk defects on kafka1012 - https://phabricator.wikimedia.org/T136933#2362205 (Ottomata) Ja anytime, we can stop this server with no service downtime, just have to be ready to do it.
[21:14:57] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: Smartctl disk defects on kafka1012 - https://phabricator.wikimedia.org/T136933#2362347 (Cmjohnson) @elukey We can do it whenever you want. I have disks on-site. Let me know a good day and time.
[21:47:01] Quarry: Add date when query was last run - https://phabricator.wikimedia.org/T77941#832144 (Capt_Swing) Hey @yuvipanda, possible we could get this implemented in the near future? Every time I use Quarry and share the results with anyone, I wish for this feature.
[22:16:22] Quarry: Add date when query was last run - https://phabricator.wikimedia.org/T77941#2362536 (yuvipanda) I'll try to get to it before end of next week.
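Note on the Kafka retention exchange (11:25 to 11:29) and T136690: the reasoning there is that time-based retention looks at each log segment's mtime, so segments whose mtime was reset to Jun 1 by the restarts only become deletable once the 7 days have passed. A rough Python sketch of that arithmetic follows. It is not Kafka's code; the function names are made up for illustration, and the year and time of day in the timestamps are assumptions, since the log only says "Jun 1" and "6 days ago".

```python
from datetime import datetime, timedelta

# Assumption: the "regular 7 days retention" mentioned in the log.
RETENTION = timedelta(days=7)


def deletable_at(segment_mtime: datetime, retention: timedelta = RETENTION) -> datetime:
    """Earliest moment at which a segment is old enough for time-based deletion."""
    return segment_mtime + retention


def is_deletable(segment_mtime: datetime, now: datetime,
                 retention: timedelta = RETENTION) -> bool:
    """True once the segment's age (measured from its mtime) exceeds the retention window."""
    return now - segment_mtime > retention


# Hypothetical timestamps: year and time of day are illustrative only.
mtime_after_restart = datetime(2016, 6, 1, 12, 0)  # mtime reset by the restart
today = datetime(2016, 6, 7, 12, 0)                # "6 days ago" in the log

print(deletable_at(mtime_after_restart))         # 2016-06-08 12:00:00
print(is_deletable(mtime_after_restart, today))  # False
```

With these assumed values nothing is old enough yet on the day of the log, which matches the "tomorrow we should see some relief" estimate; a segment written well before the restart would normally have been purged earlier, had its mtime not been reset.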