
Rawsugar Recipes

make-public
Refs
- Lacey workflow examples: https://docs.google.com/document/d/1yA51jh8QTyd2KpAeCxt1E_o3RmboM3SrOcsREqCdINE/edit
- with my working notes
  - https://docs.google.com/document/d/1s0xOL4xDXVerjLMKbKCe7IJbuHzK1BcpBiWzJcFkdw0/edit
- this is a somewhat older and higher-level version? https://docs.google.com/document/d/1i0hoqA7Es7RE7LM2CnjTjKrTYZB4l_pstIRHk0OMxoU/edit
Tasks
- Get one step (upload manifest) working full cycle
  - Merge local/gs upload for the sake of step UI
  - {{DONE}} Link step to op
  - {{DONE}} Display link to uploaded sheet in step when complete
- {{TODO}} Lacey idea -- each run-step should have a notes field
  - backend trivial, UI needs some design
Ontology
- (getting into the class/prototype weeds)
- 2.1 recipe
  - has multiple recipe-steps
- 2.2 recipe-step
  - has predecessors, successors
- 2.3 run
  - part-of batch
  - has-a recipe (? maybe not direct; could inherit from batch or datatype, but should stay consistent with run-step)
  - has time and agent info (so do steps, is that weird?)
- 2.4 run-step
  - part-of a run
  - has-a recipe-step
  - has-a state (todo, in-progress, done etc)
  - has predecessors, successors
  - has time and agent info
- 2.5 some informal step types
  - upload
  - match files
  - gate in cellengine
    - (note this conceivably has substeps; do we want to have this built in?)
  - send files to RStudio (for clustering eg)
  - export to CANDEL
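A minimal sketch of the ontology above, written as Python dataclasses purely for illustration (the real thing lives in Datomic/EDN). All class and field names here are assumptions, successors are derived from predecessor links rather than stored, and the example recipe order is made up.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class State(Enum):
    TODO = "todo"
    IN_PROGRESS = "in-progress"
    DONE = "done"


@dataclass
class RecipeStep:
    name: str
    # Names of the recipe-steps that must come first.
    predecessors: List[str] = field(default_factory=list)


@dataclass
class Recipe:
    name: str
    steps: List[RecipeStep]

    def successors(self, step_name: str) -> List[str]:
        """Derive successors by inverting the predecessor links."""
        return [s.name for s in self.steps if step_name in s.predecessors]


@dataclass
class RunStep:
    recipe_step: RecipeStep          # has-a recipe-step
    state: State = State.TODO        # has-a state
    agent: Optional[str] = None      # agent info
    time: Optional[datetime] = None  # time info


@dataclass
class Run:
    batch: str                       # part-of batch
    recipe: Recipe                   # has-a recipe (direct here; an assumption)
    steps: List[RunStep] = field(default_factory=list)

    @classmethod
    def start(cls, batch: str, recipe: Recipe) -> "Run":
        """Create one run-step per recipe-step, all in the todo state."""
        return cls(batch, recipe, [RunStep(rs) for rs in recipe.steps])


# Hypothetical recipe built from the informal step types listed above.
recipe = Recipe("example", [
    RecipeStep("upload"),
    RecipeStep("match files", ["upload"]),
    RecipeStep("gate in cellengine", ["match files"]),
])
run = Run.start("batch-1", recipe)
```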
UI
- 3.1 Should I literally follow mockup?
  - Having a tree display is hard and maybe overkill (OTOH would be interesting)
  - https://codepen.io/ross-angus/pen/jwxMjL (no, this is a neat trick for drawing trees but only works when the nodes are simple text)
  - https://www.codeproject.com/articles/16192/graphic-javascript-tree-with-layout
  - Hm, looks like the right(?) way is:
    - svg
    - embedded HTML via https://developer.mozilla.org/en-US/docs/Web/SVG/Element/foreignObject
    - (but you can't compute anything's size)
    - write my own tree layout algorithm
    - or maybe https://github.com/d3/d3-hierarchy#tree
    - ugh
- 3.2 Recipes read only?
  - Yes; an editing interface would be a big project, OK to define via EDN for now
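For the "write my own tree layout algorithm" option, a rough sketch of about the simplest approach that could work: leaves get consecutive x slots, each parent is centered over its children, and y is the depth. This is not what d3-hierarchy does internally, it only handles true trees (a step with multiple predecessors would need a DAG layout), and the step tree below is a made-up example.

```python
from typing import Dict, List, Tuple


def layout(tree: Dict[str, List[str]], root: str) -> Dict[str, Tuple[float, int]]:
    """Return {node: (x, depth)} for a tree given as {parent: [children]}."""
    positions: Dict[str, Tuple[float, int]] = {}
    next_leaf_x = [0]  # boxed counter so the nested function can update it

    def place(node: str, depth: int) -> float:
        children = tree.get(node, [])
        if not children:
            # Leaves take the next free horizontal slot.
            x = float(next_leaf_x[0])
            next_leaf_x[0] += 1
        else:
            # Parents sit midway between their first and last child.
            xs = [place(child, depth + 1) for child in children]
            x = (xs[0] + xs[-1]) / 2
        positions[node] = (x, depth)
        return x

    place(root, 0)
    return positions


# Hypothetical step tree, for illustration only.
steps = {
    "upload": ["match files", "export to CANDEL"],
    "match files": ["gate in cellengine"],
}
positions = layout(steps, "upload")
```

The resulting coordinates could be scaled and used to position SVG groups (or foreignObject elements); since node sizes can't be measured up front, fixed-size boxes are the simplifying assumption.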
Questions (for Lacey mostly)
- Defining recipes per-batch, but do we ever want steps that combine data types or otherwise involve multiple batches?
  - Lacey: no current cases, might be in the future
- Are recipes 1-1 with datatypes (if so don't need to have both fields on batch)
  - NO, there might be different recipes
- evolution of recipes over time (think)
- Since each step can generate its own files, we should be associating files with steps rather than batches (or with both…unclear). Mentioned in comment in recipe doc.
Questions (for me mostly)
- 5.1 Steps and ops are not that different, might consider sharing some infrastructure.
- 5.2 Use a prototype approach?
  - It's not like Datomic has a real class system anyway (although that is what Alzabo provides, sort of)
  - Explicit Alzabo support? Not sure what that would look like or if it's necessary.
  - Hm, this could be a general EDN thing…and might already exist. What about that thing Rob uses for configs? Aero? Oh hey looks like it might work with merge and ref tags…
  - Wait a minute, a purely syntactic inheritance is not going to work for this…
  - Practicalities: instead of separate types for recipe/run and recipe-step/run-step, they are mooshed together.
  - Extra attributes? :entity/prototype and maybe :entity/prototype?
- 5.2.1 A sudden obvious insight
  - The recipe-level stuff doesn't even have to be in Datomic; it's defined by EDN and can stay that way.
  - Still useful to have prototype-like inheritance in EDN definition file. So Aero might be useful after all.
- 5.3 Acceptance / types
  - Means we need to bring back column labeling. Probably a good thing
- 5.4 Task State and multiple tries
- 5.4.1 Basic: :todo, :done, :in-progress, :blocked(waiting)
  - with obvious allowed transitions
- 5.4.2 UI state transitions
  - normal op: todo → done
  - matching (eg): todo → in-progress → done (when acceptance test passes)
  - OK then the failed condition should be explicit: "all files must have metadata" or whatever.
  - Should have manual override (eg declare a step done no matter what)?
- 5.4.3 Going from :done to :todo
  - Do-over, reset affordance
  - should make a new parallel run-step instance
  - (I guess, still not sure how those are going to work)
  - alternatively, keep things simple: reset just clears out the run state and artifacts, and you can use Datomic history to get the old ones if you really must
- 5.4.4 Oh fuck client/server
  - I guess this means that run should be:
    - compact
    - serialized along with batch
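A sketch of the task-state machine from 5.4, in Python for illustration. The transition table is my guess at the "obvious allowed transitions" and is an assumption; likewise the `acceptance` callback (returns None on pass, or an explicit failure message) and the `override` flag are hypothetical names for the acceptance test and manual override discussed above.

```python
from typing import Callable, Optional

# Assumed transition table; the notes only say the transitions are "obvious".
ALLOWED = {
    "todo": {"in-progress", "done", "blocked"},
    "in-progress": {"done", "blocked", "todo"},
    "blocked": {"todo", "in-progress"},
    "done": {"todo"},  # the reset / do-over case from 5.4.3
}


class TransitionError(Exception):
    pass


def transition(
    current: str,
    target: str,
    acceptance: Optional[Callable[[], Optional[str]]] = None,
    override: bool = False,
) -> str:
    """Return the new state, or raise TransitionError with an explicit reason."""
    if target not in ALLOWED.get(current, set()):
        raise TransitionError(f"{current} -> {target} is not allowed")
    if target == "done" and acceptance is not None and not override:
        failure = acceptance()  # None means the acceptance test passed
        if failure is not None:
            raise TransitionError(failure)
    return target


def files_missing_metadata() -> Optional[str]:
    # Stand-in acceptance test that always fails, to show the error path.
    return "all files must have metadata"


# Normal op: todo -> done with no acceptance test.
simple = transition("todo", "done")

# Matching: blocked by the acceptance test unless manually overridden.
try:
    transition("in-progress", "done", acceptance=files_missing_metadata)
    blocked_reason = None
except TransitionError as err:
    blocked_reason = str(err)

forced = transition("in-progress", "done", acceptance=files_missing_metadata, override=True)
```

Because the state is a plain keyword-like value, it stays compact and would serialize along with the batch without extra work.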
7 Design Review
- 7.1 Problem
  - We have a lot of data types and each has its own complicated analytical procedure.
  - From design at beginning of the year:
  - ⪢ RawSugar 2.0 should be a system that not only stores data but also tracks provenance and relationships as data is processed through various workflows on external systems. It should serve as a unifying “one stop shop” for viewing, finding, and accessing data in various stages of processing.
  - Helps with
    - reproducibility
    - documentation / knowledge capture (in case any more data scientists leave!)
    - work tracking
- 7.2 Lacey Recipes
- 7.3 UI
- 7.4 Underlying schema
- 7.5 Machinery / Issues
  - projects, sheets, files
  - projects, batches, sheets, files
  - projects, batches, run-steps, sheets, files
8 Multiple runs
- From Federico's comments, realizing this is indeed an important feature and not captured in current prototype
- How should this work?
  - doesn't quite make sense to have run objects because we want to be able to fork at arbitrary places
  - but it would be easier for users…
  - {{DONE}} run-steps will need to have an explicit predecessor link (or equivalent); can't rely on inferring from recipes
  - so there will be multiple instances of a recipe step type for a given batch
  - need some way to pick out "current" set of steps
  - going to be hard to visualize (in part because we already have branching; can't use a tree)
  - ui: show stacked boxes? And a selector? Eh.
  - default display to show the latest? But you might have disjoint sets…ugh.
- Use case:
  - user doesn't like some result
  - find the step responsible, clicks on some "Redo" affordance (? better name)
  - system makes new run-steps for that and all downstream steps.
  - UI redisplays to reflect above (not sure how the old steps should appear, but can whiff for now)
- Guess that's it….
- Alternatively, reuse the existing run-step objects and just use Datomic history…that means less plumbing to deal with multiple run-steps
- Maybe have a current? attribute that only applies to one graph? Then query can work as normal. Ugh.
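The "Redo" use case above could be sketched like this, assuming run-steps carry explicit predecessor links and a `current` flag (both names are assumptions, as is the in-memory dict standing in for the database). Redoing a step clones it and everything downstream, rewires the clones to each other, and leaves upstream steps shared.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List

_ids = count(1)  # stand-in for database entity ids


@dataclass
class RunStep:
    recipe_step: str
    predecessors: List[int] = field(default_factory=list)  # ids of run-steps
    state: str = "todo"
    current: bool = True
    id: int = field(default_factory=lambda: next(_ids))


def redo(steps: Dict[int, RunStep], step_id: int) -> Dict[int, int]:
    """Clone step_id and all current downstream steps; return {old id: new id}."""
    # Collect the step plus everything reachable through predecessor links.
    affected = {step_id}
    changed = True
    while changed:
        changed = False
        for s in steps.values():
            if s.current and s.id not in affected and affected & set(s.predecessors):
                affected.add(s.id)
                changed = True

    # Make fresh todo run-steps for every affected step.
    mapping: Dict[int, int] = {}
    for old_id in affected:
        clone = RunStep(steps[old_id].recipe_step)
        mapping[old_id] = clone.id
        steps[clone.id] = clone

    # Rewire: point at the new copy of a predecessor if it was cloned too.
    for old_id, new_id in mapping.items():
        steps[new_id].predecessors = [mapping.get(p, p) for p in steps[old_id].predecessors]
        steps[old_id].current = False  # old steps are kept but no longer current
    return mapping


upload = RunStep("upload", state="done")
match = RunStep("match files", [upload.id], state="done")
gate = RunStep("gate in cellengine", [match.id], state="done")
steps = {s.id: s for s in (upload, match, gate)}

mapping = redo(steps, match.id)
current = sorted(s.recipe_step for s in steps.values() if s.current)
```

With a `current` flag the normal query only needs one extra filter; the Datomic-history alternative would avoid the cloning entirely at the cost of making old runs harder to display.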
{{DONE}} Rearchitecting object system
- Have a local cache of entities, indexed by eid
- Need to augment with type probably (and parent links)
- Means page urls can be simpler
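A small sketch of the cache idea, in Python for illustration: entities indexed by eid, augmented with a type and a parent link, so a page can be addressed by eid alone and the surrounding context recovered from the cache. The class names, the example entities, and the URL shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entity:
    eid: int
    type: str                     # e.g. "project", "batch", "sheet"
    parent: Optional[int] = None  # eid of the containing entity, if any


class EntityCache:
    def __init__(self) -> None:
        self._by_eid: Dict[int, Entity] = {}

    def put(self, entity: Entity) -> None:
        self._by_eid[entity.eid] = entity

    def get(self, eid: int) -> Entity:
        return self._by_eid[eid]

    def path(self, eid: int) -> List[Entity]:
        """Ancestors from the root down to the entity, via parent links."""
        chain: List[Entity] = []
        current: Optional[int] = eid
        while current is not None:
            entity = self._by_eid[current]
            chain.append(entity)
            current = entity.parent
        return list(reversed(chain))

    def url(self, eid: int) -> str:
        # Hypothetical URL shape: only type and eid are needed, since the
        # project/batch context can be looked up through path().
        return f"/{self.get(eid).type}/{eid}"


cache = EntityCache()
cache.put(Entity(1, "project"))
cache.put(Entity(2, "batch", parent=1))
cache.put(Entity(3, "sheet", parent=2))
```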
