[00:58:51] NICe! [00:58:54] o/ [06:14:42] away [06:14:45] morning :) [06:14:53] (wrong irssi command) [06:18:19] 10Analytics, 10LDAP-Access-Requests, 10Operations, 10wikimediafoundation.org: Access to WikimediaFoundation.org analytics for Deb - https://phabricator.wikimedia.org/T227496 (10MoritzMuehlenhoff) 05Resolved→03Open @herron : You've added her to the wrong group, staff members need to be a member of cn=wm... [06:19:56] (03CR) 10Elukey: [V: 03+2] aqs: move the oozie hourly coordinator to hive2 actions [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525247 (https://phabricator.wikimedia.org/T227257) (owner: 10Elukey) [06:20:04] (03PS2) 10Elukey: banner_activity: move oozie daily coordinator to hive2 actions [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525248 (https://phabricator.wikimedia.org/T227257) [06:20:10] (03CR) 10Elukey: [V: 03+2] banner_activity: move oozie daily coordinator to hive2 actions [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525248 (https://phabricator.wikimedia.org/T227257) (owner: 10Elukey) [06:20:24] these were not merged :( [06:21:23] and we already deployed refinery afaics [06:33:44] 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Allow analytics VLAN to reach eventgate-analytics.discovery.wmnet:31192 - https://phabricator.wikimedia.org/T228882 (10elukey) 05Open→03Resolved a:03elukey ` + term eventgate { + from { + destination-address { +... [06:33:54] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Operations, and 2 others: Make oozie swift upload emit event to Kafka about swift object upload complete - https://phabricator.wikimedia.org/T227896 (10elukey) [08:11:12] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Load Netflow to Druid - https://phabricator.wikimedia.org/T225314 (10elukey) @mforns if you have time, let's add the missing step to load periodically data into Druid if possible. I had a chat with @ayounsi and they have two use cases: 1) enable/disable n... 
[09:08:19] (03PS1) 10Elukey: browser-general: move oozie coordinator to hive2 actions [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525503 (https://phabricator.wikimedia.org/T227257) [09:21:29] (03PS1) 10Elukey: cassandra: move oozie bundle to hive2 actions [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525507 (https://phabricator.wikimedia.org/T227257) [09:26:43] 10Analytics, 10EventBus, 10serviceops, 10Patch-For-Review: helmfile apply with values.yaml file change did not deploy new k8s pods - https://phabricator.wikimedia.org/T228700 (10fsero) 05Open→03Resolved a:03fsero [10:07:22] 10Analytics, 10Analytics-Kanban: wikistats editor graphs broken - https://phabricator.wikimedia.org/T228931 (10fdans) Thank you for fixing this @Nuria [10:07:47] mforns: o/ [10:07:49] ready to merge? [10:07:58] elukey, yea! [10:08:08] elukey, was checking the netflow table [10:08:16] but if you said the data is already there, then ok! [10:08:44] mforns: atm I think no, but sometimes data flows when Arzhel enables it.. and IIUC they are going to add it soon [10:08:56] just wanted to create all the automation so they'll get data as they need [10:08:59] makes sense [10:08:59] ? [10:09:29] elukey, ah! mmmm [10:10:06] not sure what happens if there's no data... [10:10:24] lookin [10:10:30] ah okok [10:13:26] elukey, druid loading is a refine job, so if there's no source data, it will not run for that partition, so yea, no problemo [10:13:31] we can merge I think! [10:13:56] I see some data in Hive for intermittent days [10:17:20] yep exactly [10:23:37] 10Analytics: Deletion of limn-language-data repository - https://phabricator.wikimedia.org/T228975 (10fdans) [10:30:17] 10Analytics: Deletion of limn-language-data repository - https://phabricator.wikimedia.org/T228975 (10Amire80) They are definitely valuable, but we can probably move to some other repositories. Let me clean that up.
[10:32:41] 10Analytics: Deletion of limn-ee-data repository - https://phabricator.wikimedia.org/T228979 (10fdans) [10:38:09] 10Analytics: Deletion of limn-flow-data repository - https://phabricator.wikimedia.org/T228981 (10fdans) [10:41:26] 10Analytics: Deletion of limn-edit-data repository - https://phabricator.wikimedia.org/T228982 (10fdans) [10:48:34] (03CR) 10Mforns: Hash temporary identifiers in app schemas (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525280 (https://phabricator.wikimedia.org/T226852) (owner: 10Bearloga) [10:50:12] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Load Netflow to Druid - https://phabricator.wikimedia.org/T225314 (10mforns) Just for the record, @elukey and I looked into this, and we confirmed that we can merge the puppet patch that will launch the druid loading job. [10:51:13] (03CR) 10Mforns: [C: 03+1] "LGTM" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525503 (https://phabricator.wikimedia.org/T227257) (owner: 10Elukey) [10:52:27] (03CR) 10Mforns: [C: 03+1] "LGTM" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525507 (https://phabricator.wikimedia.org/T227257) (owner: 10Elukey) [10:52:53] mforns: next week we can talk about why we need --^ [10:53:05] elukey, sure :] [10:53:50] elukey, you saw that I confirmed we can merge netflow druid load job right? [10:57:56] yep yep! Will do it asap [11:01:52] 10Analytics: Deletion of limn-ee-data repository - https://phabricator.wikimedia.org/T228979 (10mforns) The limn-ee-data/ee-migration folder contains several RU queries and config, but they are not currently scheduled for execution in puppet. I don't know if we can delete them though, maybe @Catrope knows? [11:04:59] elukey, ok sorry, because I said that and immediately commented on sth else, so I wasn't sure [11:07:23] np! Thank you for the review [11:07:28] review/work/etc.. 
[11:12:38] 10Analytics: Deletion of limn-edit-data repository - https://phabricator.wikimedia.org/T228982 (10mforns) The limn-edit-data/edit/ folder contains several RU queries and config, however they are not currently scheduled in puppet for execution. I think those are the reports used in the old compare Wikitext vs Vis... [11:48:36] * elukey lunch + errand! [13:49:56] 10Analytics, 10Analytics-EventLogging, 10DBA, 10Operations, and 2 others: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10Marostegui) [14:25:28] it's not clear, when loading a high cardinality numerical column into druid, do i want to pre-bucketize (0-10, 11-100, etc) as strings? [14:29:19] 10Analytics, 10Analytics-Kanban: Roll restart all openjdk-8 jvms in Analytics - https://phabricator.wikimedia.org/T229003 (10elukey) p:05Triage→03Normal [14:32:53] mforns_: can probably answer erik's question best ^^ [14:51:37] 10Analytics, 10Patch-For-Review, 10User-Elukey: Move refinery to hive 2 actions - https://phabricator.wikimedia.org/T227257 (10EBernhardson) >>! In T227257#5360500, @elukey wrote: > @EBernhardson hi! Can we restart the coordinators listed in https://gerrit.wikimedia.org/r/523212 to pick up the new changes?... [14:53:58] ebernhardson, depends on the cardinality, do you know how high it is?
[14:54:06] (03CR) 10Nuria: [C: 03+2] "Let's keep track on chu chu train etherpad of what jobs need to be restarted as we merge changes such as these" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525503 (https://phabricator.wikimedia.org/T227257) (owner: 10Elukey) [14:54:36] 10Analytics, 10Analytics-EventLogging, 10DBA, 10Operations, and 2 others: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10Marostegui) [14:55:22] 10Analytics, 10Analytics-EventLogging, 10Operations, 10decommission, 10ops-eqiad: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10Marostegui) a:05Marostegui→03RobH These two hosts are ready for #dc-ops to decommission [14:56:06] mforns_: 0-40M or so. It's the number of search results found [14:56:21] ebernhardson: any column in druid is a string, https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines#Ingestion_into_Druid [14:56:49] nuria: right. I think it should be bucketized, but then i looked through refinery and didn't see any examples [14:57:00] ebernhardson, and is that something you want to store in druid as a dimension? to be able to i.e. split your charts by number_of_results? [14:57:00] i can bucketize easily, but after not finding any examples i thought maybe you have some other solution? [14:57:05] ebernhardson: but i can help further if you tell me what type of column it is, our ingestion process will bucketize it but druid doesn't do it directly [14:58:16] mforns_: some of them yes. Mostly a few very small numbers (0,1,2,3-5) because we do special behaviours on those types. [14:59:30] ebernhardson, is your data source formatted as parquet?
[15:00:09] mforns_: well, it's currently json because i was following virtualpageviews as example code [15:00:17] mforns_: but it's parquet before then [15:01:11] ebernhardson, we have a scala/spark module in our refinery-source codebase: HiveToDruid, that can load to druid any Hive table [15:01:34] and it has the ability to apply druid transforms to the source data, like bucketing [15:02:00] however, Druid ingestion for our current Druid version does not allow for transforms if the underlying data is parquet... [15:02:29] mforns_: i have no problems doing it in spark, i already wrote a short python script that does all that for my use case (didn't realize it existed) [15:02:35] (03CR) 10Nuria: "Much easier to read python than pure shell for this workflow." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525435 (https://phabricator.wikimedia.org/T227896) (owner: 10Ottomata) [15:03:31] mforns_: basically i can already run the full pipeline and import the data into druid through oozie coordinators [15:03:46] mforns_: i'm just curious if i should change my spark code to use a different bucketizing solution [15:03:46] ebernhardson, oh ok, so then, yes, I'd say definitely bucket that field [15:04:10] ok, excellent! [15:05:07] ebernhardson, if you have already written the code that loads that, I'd say just bucket the field in spark, directly to the RDD or DataFrame before calling the ingestion process, I guess [15:05:59] 10Analytics, 10EventBus, 10MassMessage, 10Operations, and 2 others: Write incident report for jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10WDoranWMF) [15:10:05] a-team: i am going to re-start media jobs [15:10:48] nuria: let me know if I can help [15:10:53] milimetric: k [15:13:58] elukey: I have no problem deploying refinery so we can also get the jobs that did not have the +2, let me know [15:17:49] nuria: it would be great, possibly also adding the other two that I filed this morning?
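The advice above — bucket the high-cardinality numeric field in Spark before calling the Druid ingestion process, since every Druid dimension is effectively a string — can be sketched in plain Python. The bucket edges and function name here are illustrative (not the ones refinery or ebernhardson's job actually use); in the real pipeline this function would be applied to the DataFrame, e.g. as a Spark UDF.

```python
# Illustrative bucket edges: small counts kept exact (the 0/1/2/3-5 cases that
# get special behaviour), larger counts collapsed into coarse ranges.
BUCKETS = [(0, "0"), (1, "1"), (2, "2"), (5, "3-5"), (10, "6-10"),
           (100, "11-100"), (1000, "101-1K"), (10**6, "1K-1M")]

def bucketize(n):
    """Map a numeric result count to a low-cardinality string bucket
    suitable for use as a Druid dimension."""
    for upper, label in BUCKETS:
        if n <= upper:
            return label
    return "1M+"
```

In Spark this would be wrapped as a UDF (roughly `spark.udf.register("bucketize", bucketize)`) and applied to the column before ingestion, so Druid only ever sees the handful of bucket labels instead of up to ~40M distinct values.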
[15:18:01] (the cassandra bundle will have to be restarted next week though) [15:18:17] ebernhardson: when you have a min https://gerrit.wikimedia.org/r/c/mediawiki/event-schemas/+/525562/1/jsonschema/swift/upload/complete/current.yaml [15:18:28] elukey: then let's leave the cassandra one to be merged next week [15:18:40] elukey: are all the workflows merged to refinery? [15:18:41] ottomata: i have it open just haven't gotten around to it, will look! [15:18:55] :) thanks! [15:19:03] nuria: merged the browser general one, all set [15:19:04] oh ebernhardson i just submitted that [15:19:05] that is the jsonschema [15:19:13] ottomata: oh, i have the other one open i guess :) [15:19:19] the python one is still wip [15:19:20] ok [15:19:20] no worries there [15:19:28] elukey: ok, and let's write on https://etherpad.wikimedia.org/p/analytics-weekly-train what needs to be restarted [15:20:29] nuria: re-added, I moved them to "next" [15:20:44] (I moved them previously I meant) [15:21:01] ottomata: one thought, when sending single-url events should the prefix of the event also be the single-url? That would ensure querying the listing url returns the same set as the embedded urls [15:21:16] elukey: ah ya [15:21:35] err it wouldn't be the full-url, but swift doesn't care about directories or such the listing is a prefix search. So we could set the prefix to the full path [15:21:46] elukey: ok, i saw a change today that i think mforns_ CR-ed, is that the additional one? [15:22:28] correct [15:22:35] the browser-general one [15:25:08] elukey: ok!
[15:25:30] !log deploying refinery (just refinery but not refinery-source) [15:25:56] (03CR) 10Elukey: [V: 03+2] browser-general: move oozie coordinator to hive2 actions [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525503 (https://phabricator.wikimedia.org/T227257) (owner: 10Elukey) [15:32:04] (03CR) 10Milimetric: [C: 03+2] Commit package-lock.json to make CI builds much faster [analytics/mediawiki-storage] - 10https://gerrit.wikimedia.org/r/524290 (owner: 10Jforrester) [15:32:11] 10Analytics, 10Analytics-EventLogging, 10Operations, 10decommission, 10ops-eqiad: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10RobH) [15:33:56] (03CR) 10Milimetric: cassandra: move oozie bundle to hive2 actions (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525507 (https://phabricator.wikimedia.org/T227257) (owner: 10Elukey) [15:35:22] 10Analytics, 10Analytics-EventLogging, 10Operations, 10decommission, 10ops-eqiad: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts: `dbproxy1004.eqiad.wmnet` - dbproxy1004.eqi... [15:35:27] 10Analytics, 10Analytics-EventLogging, 10Operations, 10decommission, 10ops-eqiad: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts: `dbproxy1009.eqiad.wmnet` - dbproxy1009.eqi... [15:35:28] (03CR) 10Elukey: ">" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525507 (https://phabricator.wikimedia.org/T227257) (owner: 10Elukey) [15:43:49] 10Analytics, 10Analytics-EventLogging, 10Operations, 10decommission, 10ops-eqiad: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (10RobH) a:05RobH→03None [15:44:43] a-team Im out running two errands, I should be in for post standup/ groskin, sorry!! 
[15:59:44] * mforns_ tests [16:03:12] fdans: ok, please send e-scrum [16:05:13] 10Analytics, 10Analytics-Kanban: wikistats editor graphs broken - https://phabricator.wikimedia.org/T228931 (10Nuria) a:03Nuria [16:09:21] 10Analytics: Deletion of limn-edit-data repository - https://phabricator.wikimedia.org/T228982 (10Jdforrester-WMF) That's @Neil_P._Quinn_WMF and @ppelberg's call, I'm not the product owner or analyst any more. :-) [16:09:36] 10Analytics, 10Editing-team: Deletion of limn-edit-data repository - https://phabricator.wikimedia.org/T228982 (10Jdforrester-WMF) [16:13:10] ebernhardson: you mean in meta.uri? [16:13:51] hm [16:14:07] the meta.uri could be anything, but in the swift-upload.py case, i'm only supporting uploading of directories [16:14:23] so i will always use an object prefix; either a custom one, or the basename of the dir [16:18:44] 10Analytics, 10Analytics-Kanban: Roll restart all openjdk-8 jvms in Analytics - https://phabricator.wikimedia.org/T229003 (10elukey) Dry run for the hadoop test cluster: ` elukey@cumin1001:~$ sudo cookbook -d sre.hadoop.rolling-restart-workers.py test --yarn-nm-batch-size 1 --hdfs-dn-batch-size 1 DRY-RUN: Exe... [16:19:56] ottomata: not the meta uri, but the object prefix [16:21:08] ottomata: essentially what i'm thinking is if the consumer were to query swift for the user+container+prefix provided in the message, it will get a different list than the urls provided [16:21:23] (in the case of single-file per event) [16:22:37] basically i think it would be less surprising if querying the user+container+prefix provided in the message provided the same url(s) as the event [16:26:30] ebernhardson why would it give a different url? [16:26:43] if there is only one file, querying with the prefix will only return that file, no?
[16:27:09] 10Analytics: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (10Danielsberger) Hi Francisco (@fdans) and @Nuria , Since we're now in Q3 and a few weeks have passed, I wanted to check in with you. It would be great to release this dataset in Augus... [16:27:53] (03CR) 10Jforrester: [C: 03+2] Update Semantic and change to Headless for node 10 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/525344 (https://phabricator.wikimedia.org/T228452) (owner: 10Milimetric) [16:30:48] (03Merged) 10jenkins-bot: Update Semantic and change to Headless for node 10 [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/525344 (https://phabricator.wikimedia.org/T228452) (owner: 10Milimetric) [16:31:39] 10Analytics, 10Analytics-Kanban, 10JavaScript, 10Patch-For-Review: Fix the analytics/wikistats2 repo to work on node10 - https://phabricator.wikimedia.org/T228452 (10Jdforrester-WMF) 05Open→03Resolved Thank you! [16:37:19] ottomata: i thought you were returning the original upload prefix and not the per-file prefix? perhaps i'm mistaken [16:41:27] per-file prefix? not sure what the difference is! [16:41:49] e.g.
if you were doing this on the cli [16:41:51] it would be [16:42:16] (03CR) 10Jforrester: "recheck" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/525344 (https://phabricator.wikimedia.org/T228452) (owner: 10Milimetric) [16:43:00] swift upload ...object-name custom_prefix ./source_directory/fileA [16:43:02] will become [16:43:09] container/custom_prefix [16:43:10] ottomata: a per-file prefix would be a prefix that fully specifies one file name [16:43:15] 10Analytics, 10Editing-team: Deletion of limn-edit-data repository - https://phabricator.wikimedia.org/T228982 (10fdans) p:05Triage→03High [16:43:18] 10Analytics: Deletion of limn-flow-data repository - https://phabricator.wikimedia.org/T228981 (10fdans) p:05Triage→03High [16:43:21] if it is a directory with one file [16:43:23] 10Analytics: Deletion of limn-ee-data repository - https://phabricator.wikimedia.org/T228979 (10fdans) p:05Triage→03High [16:43:26] 10Analytics: Deletion of limn-language-data repository - https://phabricator.wikimedia.org/T228975 (10fdans) p:05Triage→03High [16:43:26] e.g. [16:43:27] as in, not really a prefix but a full path. But if you pass a full path to prefix search it returns one file [16:43:37] swift upload ...object-name custom_prefix ./source_directory [16:43:41] you'd get [16:43:50] container/custom_prefix/fileA [16:43:54] 10Analytics: Add agent type split to wikistats pageviews - https://phabricator.wikimedia.org/T228937 (10fdans) p:05Triage→03High [16:43:59] in either case [16:44:04] 10Analytics: Add agent type split to wikistats pageviews - https://phabricator.wikimedia.org/T228937 (10fdans) a:03fdans [16:44:07] list ?prefix=custom_prefix [16:44:10] will return the same thing, no? [16:44:22] ottomata: i'm saying to set that to ?prefix=custom_prefix/fileA [16:44:30] ah, but why?
[16:44:34] ottomata: so if i get an event, and i query the prefix you provided, it returns the same urls [16:44:46] i think it would return the same url [16:44:56] with just ?prefix=custom_prefix [16:44:58] ottomata: it would return [fileA, fileB, ...] [16:45:07] i thought you said there's only one file [16:45:12] ottomata: one file per event [16:45:42] ottomata: so i upload enough data to be 30 hours' worth of bulk indexing work. It needs to push through as one event per file [16:45:58] 10Analytics, 10Analytics-Kanban: wikistats editor graphs broken - https://phabricator.wikimedia.org/T228931 (10fdans) 05Open→03Resolved [16:46:16] ebernhardson: since this supports uploading of directories only... [16:46:24] it'll be one file per directory? [16:46:41] 10Analytics: mediawiki-history-wikitext-coord job fails every month - https://phabricator.wikimedia.org/T228883 (10fdans) p:05Triage→03Normal [16:46:47] ottomata: no, it will be one directory with perhaps 500 files [16:47:02] so you want one swift upload to emit 500 events [16:47:03] hm [16:47:35] 10Analytics, 10Analytics-Kanban, 10Wikimedia-Portals, 10cloud-services-team: projectview-hourly-coordinator needs to alarm when in error - https://phabricator.wikimedia.org/T228747 (10fdans) p:05Triage→03High [16:47:36] that's basically the request, to allow splitting a large event into bite-sized pieces [16:47:47] (not for all uploads, but when needed) [16:48:12] will have to think more about that, but [16:48:20] in that case, could the swift_object_uris be enough? [16:48:23] do you need the prefix? [16:48:32] it'll have an item with one object uri in it [16:48:45] ottomata: just the urls is fine, it just seemed like if the event has a prefix it should match the urls provided [16:48:59] match *only* [16:49:02] well, in this case it does, it is the prefix used when uploading the files [16:49:19] but yeah, it doesn't match the per event thing.
[16:49:20] hm [16:49:36] swift doesn't care about directories, so putting the full path in the prefix should be fine [16:49:50] (as in directories don't exist in swift, / is just another codepoint) [16:49:53] 10Analytics, 10Analytics-Kanban, 10Research-Backlog: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (10Milimetric) a:05Milimetric→03mforns [16:49:58] i mean, maybe that is better? a single event could be used to determine the full list of files that were uploaded with that prefix, but if you want the individual object url [16:49:59] it is in the list [16:50:31] 10Analytics, 10Analytics-Kanban: Bug: geoeditors (editors per country data) 2019-06 snapshot broken - https://phabricator.wikimedia.org/T227812 (10Milimetric) a:05Milimetric→03None [16:52:11] ottomata: hmm, i suppose there could be use cases. Hard to say [16:52:56] why would you use the swift_object_prefix if you have the swift_object_uris? [16:53:33] you probably wouldn't, but would it be surprising to look at the event in isolation, query the prefix, and then get extra urls? [16:54:00] i think it wouldn't; this event is about a swift (directory) upload completion [16:54:11] the event is an upload event. [16:54:21] ya it is one http post per file...
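The prefix semantics being debated above can be modeled in a few lines. This is a toy in-memory stand-in for a Swift container listing with hypothetical object names, not the real API: Swift has no directories, `/` is an ordinary character in an object name, and a `GET container?prefix=...` is a plain string-prefix match over object names.

```python
# Hypothetical objects from one directory upload with a custom prefix.
objects = [
    "custom_prefix/fileA",
    "custom_prefix/fileB",
    "other_prefix/fileC",
]

def list_objects(prefix):
    """Simulate GET container?prefix=...: plain string-prefix match."""
    return [name for name in objects if name.startswith(prefix)]

# The upload-level prefix returns every file of the upload...
assert list_objects("custom_prefix") == ["custom_prefix/fileA", "custom_prefix/fileB"]
# ...while a "prefix" that is a full object path returns exactly that one
# object, which is why a per-file event could carry custom_prefix/fileA as
# its prefix and still be a valid prefix query.
assert list_objects("custom_prefix/fileA") == ["custom_prefix/fileA"]
```

This illustrates both positions in the discussion: the upload-level prefix describes the whole upload (extra urls when queried per event), while a full-path prefix matches only the single object, because swift treats it as just another string prefix.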
[16:54:33] but logically the job is a single 'upload' [16:54:46] hmm, i suppose i can buy that [16:55:44] 10Analytics-Kanban, 10Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Milimetric) a:05mforns→03Milimetric [16:55:46] 10Analytics, 10Analytics-Kanban: Bug: geoeditors (editors per country data) 2019-06 snapshot broken - https://phabricator.wikimedia.org/T227812 (10Nuria) 05Open→03Resolved [16:58:51] !log restart the hdfs datanode on an-worker1080 to pick up new IPv6 settings [16:58:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:58:55] ottomata: --^ [16:59:02] +1 [16:59:50] 10Analytics-Kanban, 10Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Nuria) BTW, my notebook on this is Test_geoeditors_pyspark.ipynb [17:02:37] I am testing the roll restart cookbook on the testing cluster [17:02:57] (if you are using it) [17:11:03] ottomata: restarting the hadoop workers now is so awesome [17:11:58] I am doing one daemon at a time on the test cluster [17:12:07] without running any command [17:12:27] awesooome [17:13:31] ebernhardson: if you are ok with https://gerrit.wikimedia.org/r/c/mediawiki/event-schemas/+/525562, i'll merge [17:16:49] elukey: WITH YOUR MIND! [17:20:10] ottomata: why do current.yaml and 1.0.0.yaml differ? [17:20:23] 1.0.0 is dereferenced!
[17:20:33] it is a full static schema version [17:20:49] current.yaml is so we only ever have to edit one file [17:20:53] the versioned ones are materialized [17:21:10] see readme [17:21:10] https://github.com/wikimedia/mediawiki-event-schemas [17:23:57] milimetric: can you take a look [17:24:14] https://www.irccloud.com/pastebin/RdXwOWkU/ [17:24:31] this would be the command to re-start mediacounts [17:26:27] nuria: ahahha yes [17:29:22] (03PS1) 10MNeisler: Hash temporary identifiers in web team schemas [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525599 (https://phabricator.wikimedia.org/T226850) [17:34:40] ebernhardson: o/ I am seeing some alerts for search_satisfaction-druid-daily sent by oozie to analytics-alerts@ [17:34:57] if you are testing, can you switch email pls? :) [17:38:19] elukey: doh, sorry. Yea i thought i've been submitting with the appropriate error email set.. will double check [17:40:04] * ebernhardson wasn't passing the email through and it took the default...
[17:42:25] PROBLEM - Check the last execution of refinery-import-page-history-dumps on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:46:41] PROBLEM - Check the last execution of reportupdater-interlanguage on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:47:11] PROBLEM - Check the last execution of archive-maxmind-geoip-database on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:47:17] PROBLEM - Check the last execution of reportupdater-browser on stat1007 is CRITICAL: connect to address 10.64.21.118 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:47:22] sigh [17:48:19] 10Analytics, 10Product-Analytics, 10Patch-For-Review, 10Readers-Web-Backlog (Tracking): Hash all pageTokens or temporary identifiers from the EL Sanitization white-list for Web - https://phabricator.wikimedia.org/T226850 (10MNeisler) I hashed any tokens or temporary identifiers for the following three acti... 
[17:49:50] elukey: i think stat1007 is kaput [17:50:23] yeah see #operations :( [17:51:12] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10RobH) [17:56:46] I had to reboot stat1007 :( [17:57:19] RECOVERY - Check the last execution of reportupdater-interlanguage on stat1007 is OK: OK: Status of the systemd unit reportupdater-interlanguage https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:57:49] RECOVERY - Check the last execution of archive-maxmind-geoip-database on stat1007 is OK: OK: Status of the systemd unit archive-maxmind-geoip-database https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:57:57] RECOVERY - Check the last execution of reportupdater-browser on stat1007 is OK: OK: Status of the systemd unit reportupdater-browser https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:59:10] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10RobH) [17:59:56] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts:... [18:00:05] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts:... 
[18:00:16] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts:... [18:00:46] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10RobH) [18:03:17] elukey: the machine was so toasted that there is not even metrics [18:03:47] RECOVERY - Check the last execution of refinery-import-page-history-dumps on stat1007 is OK: OK: Status of the systemd unit refinery-import-page-history-dumps https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [18:03:53] 10Analytics, 10LDAP-Access-Requests, 10Operations, 10wikimediafoundation.org: Access to WikimediaFoundation.org analytics for Deb - https://phabricator.wikimedia.org/T227496 (10herron) 05Open→03Resolved >>! In T227496#5363988, @MoritzMuehlenhoff wrote: > staff members need to be a member of cn=wmf, cn=... [18:04:02] oof [18:04:27] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts:... [18:04:36] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts:... 
[18:04:42] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by robh@cumin1001 for hosts:... [18:05:22] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10RobH) [18:07:20] * elukey off! [18:08:04] ottomata: is there a way we could see the processes that were being executed on 1007 that killed the machine? [18:08:42] after it is rebooted? [18:08:44] not that I know of [18:09:01] maybe there will be some oom killer log if it tried to do something [18:11:58] 10Analytics, 10DC-Ops, 10Operations, 10decommission, 10ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (10RobH) [18:15:28] (03CR) 10Nuria: [C: 04-1] "Couple questions" (033 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525599 (https://phabricator.wikimedia.org/T226850) (owner: 10MNeisler) [18:18:43] 10Analytics, 10Product-Analytics, 10Readers-Web-Backlog: Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (10Nuria) [18:20:41] !Log restarting media-counts loading job [18:26:44] ebernhardson: stream/topic names? [18:26:54] it sounds like you'll want to emit to a few different stream names, right? [18:27:14] (i use stream instead of topic, as it is more generic, topic is more specific, and will be DC prefixed) [18:27:49] we should figure out a regex that will suit all/most use cases [18:27:51] yours and others [18:28:18] /^.*\.swift.upload-complete$/ [18:28:18] ? [18:28:46] then you could stream=discover.search_glent.swift.upload-complete [18:28:50] ?
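The stream-name regex floated above can be checked directly. Note that as written (`/^.*\.swift.upload-complete$/`) the second dot is unescaped and so matches any character; the escaped form below is what the intent seems to be. The naming scheme was still being decided at this point, so the example stream names are taken from the discussion, not from a final convention.

```python
import re

# Proposed pattern: any stream ending in ".swift.upload-complete",
# with both dots escaped so they only match literal dots.
STREAM_RE = re.compile(r"^.*\.swift\.upload-complete$")

# The form suggested in the discussion matches...
assert STREAM_RE.match("discover.search_glent.swift.upload-complete")
# ...while the alternative "swift.<container>.upload-complete" ordering
# would not, and would need a different regex.
assert not STREAM_RE.match("swift.search_glent.upload-complete")
```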
[18:36:40] 10Analytics, 10Product-Analytics, 10Readers-Web-Backlog: Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (10Jdlrobson) a:03Jdlrobson [18:37:11] 10Analytics, 10Product-Analytics, 10Readers-Web-Backlog: Reading_depth remove eventlogging instrumentation? - https://phabricator.wikimedia.org/T229042 (10Jdlrobson) a:05Jdlrobson→03phuedx [18:40:01] ottomata: i can read a generic stream, i key off the container name to decide what to do [18:40:37] hm, ya but you might get events for others? [18:40:51] ottomata: right, but if it's not a configured container just ignore it [18:40:52] also it might make sense to send the per object events to a different stream than the per upload events [18:41:34] i could make stream default to container? [18:41:44] search_glent.swift.upload-complete ? [18:42:00] or maybe swift.search_glent.upload-complete? [18:42:02] if it's not much overhead to define things per-container i suppose that's fine. [18:42:14] it's very easy, we just need the regex to allow whatever we want [18:42:36] i kinda like swift..upload-complete [18:43:48] ottomata: seems reasonable [18:43:58] k [18:45:58] mildly amusing, i was trying to check if my temp files were properly cleaned up.
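(Editor's aside: the stream-name pattern floated above can be sanity-checked with a small Python sketch. The stream names below are illustrative only; the pattern is the one pasted in channel, except that both dots are escaped here, since the original /^.*\.swift.upload-complete$/ leaves the second dot unescaped and so would also match, say, "foo.swiftXupload-complete".)

```python
import re

# Escaped version of the pattern proposed in the discussion.
PATTERN = re.compile(r'^.*\.swift\.upload-complete$')

# Illustrative names only; the real naming scheme was still being decided.
assert PATTERN.match('discovery.search_glent.swift.upload-complete')
assert PATTERN.match('search_glent.swift.upload-complete')      # <container>.swift.upload-complete
assert not PATTERN.match('swift.search_glent.upload-complete')  # swift.<container>.upload-complete
assert not PATTERN.match('swift.upload-complete')               # no container segment at all
```

Note the asserts show this first pattern only accepts the `<container>.swift.upload-complete` ordering, not the `swift.<container>.upload-complete` variant suggested later in the conversation; whichever ordering is chosen, the regex has to agree with it.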
`hdfs dfs -ls /tmp` errors out with a GC problem :) [18:46:52] haha [18:46:57] hm [18:47:01] fixable with the right env variable [18:47:08] HADOOP_CLIENT_OPS="-Xmx8g" [18:48:07] but there are 518k files/directories in /tmp, so maybe more cleanup is deserved :) [18:48:21] aye looking [18:50:15] i still can't do anything in /tmp even with that [18:50:24] oh maybe i didn't export hang on [18:50:32] `HADOOP_CLIENT_OPTS="-Xmx8g" hdfs dfs -ls '/tmp' | less` works for me [19:15:27] 10Analytics, 10EventBus, 10serviceops: Allow eventgate-analytics service to reach schema.svc.{eqiad,codfw}.wmnet:8190 - https://phabricator.wikimedia.org/T229051 (10Ottomata) [19:53:48] ebernhardson: i know i've seen this before...isn't there a way to put unittests in the same file as the code? [19:53:54] and then run them with e.g. python -m unittest ? [19:54:04] looking for examples but not having luck [19:56:26] 10Analytics, 10Analytics-Kanban: Page creation data stream died June 6 - https://phabricator.wikimedia.org/T228188 (10kaldari) @Nuria - Yes, that's the main feature I would be interested in: being able to combine variables (for example, all content pages created by anonymous users) instead of having to choose... [19:56:50] ottomata: yes there is, one sec [19:57:00] 10Analytics, 10Analytics-Kanban: Page creation data stream died June 6 - https://phabricator.wikimedia.org/T228188 (10kaldari) 05Open→03Declined [19:57:02] 10Analytics, 10Analytics-EventLogging: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (10kaldari) [19:57:17] i think i just got it, but not in a nice way [19:57:43] if __name__ == '__main__': [19:57:43] ... [19:57:43] else: [19:57:43] class TestA(..)... [19:57:43] etc. [19:57:48] ottomata: [19:57:57] https://www.irccloud.com/pastebin/oRsIHsjB/ [19:58:00] ottomata: you might be thinking of doctest? 
[19:58:09] ottomata: it's what pyspark uses [19:58:12] https://www.irccloud.com/pastebin/4ojbakZF/ [19:58:19] i don't want that [19:58:23] because this is a script [19:58:30] so name == main means execute [19:58:32] i want something like [19:58:35] if module == unittest [19:58:53] OH [19:58:55] it is working now... [19:58:59] if i just declare the tests [19:59:02] dunno what i was doing wrong before [19:59:03] ok. [19:59:04] thank you! [19:59:09] heh, ok :) [19:59:24] ayayayay [20:03:02] ottomata: with doctest at least, you just run `python -m doctest script.py` and it runs all the doctests, which imports as __name__ != "__main__" [20:07:48] fdans: i think next step for mediacounts should be modifying the load oozie job, right? [20:08:22] fdans: so new fields are entered in the table (so we need table alter) [20:08:30] fdans: this one: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/mediacounts/load/insert_hourly_mediacounts.hql#L49 [20:09:26] fdans: is that what you mean when you said backfilling this morning?
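(Editor's aside: the approach ottomata lands on above — just declare the TestCase classes at module level in the same file as the script — can be sketched like this; the file name, function, and test are made up for illustration.)

```python
# example_script.py -- hypothetical script with its tests declared inline.
import unittest


def normalize(name):
    """Lowercase and strip a stream name.

    >>> normalize('  Swift.Upload-Complete ')
    'swift.upload-complete'
    """
    return name.strip().lower()


class TestNormalize(unittest.TestCase):
    # Declared at module level, so `python -m unittest example_script`
    # (which imports the file, i.e. __name__ != '__main__') discovers and
    # runs it, while running the file directly just executes the script body.
    def test_strip_and_lower(self):
        self.assertEqual(normalize('  Swift.Upload-Complete '),
                         'swift.upload-complete')


if __name__ == '__main__':
    # Script behaviour when invoked directly.
    print(normalize('  Swift.Upload-Complete '))
```

`python -m doctest example_script.py` works the same way as the `-m unittest` invocation: the file is imported rather than run, so the `__main__` block does not fire, and the docstring example is checked.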
[20:24:41] (03CR) 10Nuria: Add access type to mediacounts hourly dataset (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/517426 (https://phabricator.wikimedia.org/T225910) (owner: 10Fdans) [20:26:45] !log restarting browser-general oozie job [20:26:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:28:31] (03PS3) 10Ottomata: [WIP] swift-upload.py to handle upload and event emitting [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525435 (https://phabricator.wikimedia.org/T227896) [20:47:31] !log restarting banner_activity/druid/daily [20:47:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:53:10] (03PS1) 10Nuria: Improving examples of how to start jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525652 [20:54:04] (03PS2) 10Nuria: Improving examples arround how to start jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/525652 [20:56:37] ottomata: another question :) the load druid oozie workflow requires specific usernames, that don't include analytics-search. I could adjust the workflow to accept users named analytics* perhaps? [21:06:36] 10Analytics, 10EventBus, 10Patch-For-Review, 10phan: Add phan to EventBus extension - https://phabricator.wikimedia.org/T224778 (10Jdforrester-WMF) 05Open→03Resolved [21:06:41] 10Analytics, 10EventBus, 10phan: Result of EventFactory in EventBus extension is passed to undeclared arrays - https://phabricator.wikimedia.org/T224352 (10Jdforrester-WMF) [21:50:28] 10Analytics: Deletion of limn-flow-data repository - https://phabricator.wikimedia.org/T228981 (10EBernhardson) I haven't worked with flow in 3 or 4 years, my understanding is no one is interested in these metrics and the future of structured discussions is something different. I think these are safe to be clean... 
[22:23:28] 10Analytics, 10Pageviews-API, 10Tool-Pageviews: 429 Too Many Requests hit despite throttling to 100 req/sec - https://phabricator.wikimedia.org/T219857 (10MusikAnimal) >>! In T219857#5361530, @Milimetric wrote: > This can be tricky to diagnose because we don't really know what if any upstream changes are mad... [22:59:01] 10Analytics, 10Android-app-Bugs, 10Wikipedia-Android-App-Backlog: App requests classified as pageviews that probably should not be so - https://phabricator.wikimedia.org/T229068 (10Nuria)