Yes, an editing interface would be a big project; OK to define via EDN for now
Questions (for Lacey mostly)
Defining recipes per-batch, but do we ever want steps that combine data types or otherwise involve multiple batches?
Lacey: no current cases, might be in the future
Are recipes 1-1 with datatypes? (If so, don't need to have both fields on batch)
NO, there might be different recipes
evolution of recipes over time (think)
Since each step can generate its own files, we should be associating files with steps rather than batches (or with both…unclear). Mentioned in comment in recipe doc.
Questions (for me mostly)
5.1 Steps and ops are not that different, might consider sharing some infrastructure.
5.2 Use a prototype approach?
It's not like Datomic has a real class system anyway (although that is what Alzabo provides, sort of)
Explicit Alzabo support? Not sure what that would look like or if it's necessary.
Hm, this could be a general EDN thing…and might already exist. What about that thing Rob uses for configs? Aero? Oh hey looks like it might work with merge and ref tags…
Wait a minute, a purely syntactic inheritance is not going to work for this…
Practicalities: instead of separate types for recipe/run and recipe-step/run-step, they are mooshed together.
Extra attributes? :entity/prototype (ref to the prototype) and maybe :entity/prototype? (presumably a boolean marking an entity as a prototype)
5.2.1 A sudden obvious insight
The recipe-level stuff doesn't even have to be in Datomic; it's defined by EDN and can stay that way.
Still useful to have prototype-like inheritance in EDN definition file. So Aero might be useful after all.
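A minimal sketch of what prototype-like inheritance over the definition file could look like. It is in Python rather than EDN only so it can run standalone. The step names, the "prototype" key, and the resolve function are all hypothetical; the Aero merge and ref tags mentioned above might cover the same ground at read time.

```python
# Hypothetical step definitions, standing in for the EDN definition file.
# The "prototype" key and every step name here are illustrative assumptions.
STEPS = {
    "base-step": {"state": "todo", "acceptance": "all files must have metadata"},
    "upload": {"prototype": "base-step", "title": "Upload raw files"},
    "qc": {
        "prototype": "upload",
        "title": "Quality control",
        "acceptance": "QC report attached",
    },
}


def resolve(name, defs, _seen=()):
    """Return the definition for `name` with inherited keys filled in.

    Keys set on the step itself win over keys from its prototype chain.
    """
    if name in _seen:
        raise ValueError(f"prototype cycle at {name!r}")
    own = dict(defs[name])
    parent = own.pop("prototype", None)
    if parent is None:
        return own
    inherited = resolve(parent, defs, _seen + (name,))
    return {**inherited, **own}


resolved_qc = resolve("qc", STEPS)
```

Here resolved_qc picks up "state" from base-step, while its own "title" and "acceptance" override the inherited ones. Resolution happens once when the definitions are loaded, so nothing prototype-related has to be stored in Datomic.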
5.3 Acceptance / types
Means we need to bring back column labeling. Probably a good thing
matching (eg): todo → in-progress → done (when acceptance test passes)
OK then the failed condition should be explicit: "all files must have metadata" or whatever.
Should have manual override (eg declare a step done no matter what)?
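The state matching above, with an explicit failure condition and a manual override, might be sketched like this. The RunStep class, its field names, and the all_files_have_metadata check are assumptions for illustration, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunStep:
    name: str
    # Acceptance test: returns a list of failure messages (empty means pass).
    acceptance: Callable[["RunStep"], List[str]]
    files: List[dict] = field(default_factory=list)
    state: str = "todo"  # todo -> in-progress -> done
    failures: List[str] = field(default_factory=list)
    overridden: bool = False

    def start(self):
        if self.state == "todo":
            self.state = "in-progress"

    def check(self):
        """Record failures; move to done only if the acceptance test passes."""
        self.failures = self.acceptance(self)
        if self.state == "in-progress" and not self.failures:
            self.state = "done"
        return self.state

    def force_done(self):
        """Manual override: declare the step done no matter what."""
        self.overridden = True
        self.state = "done"


def all_files_have_metadata(step):
    # Hypothetical acceptance test; files are plain dicts in this sketch.
    missing = [f["name"] for f in step.files if not f.get("metadata")]
    return [f"file {name} has no metadata" for name in missing]
```

Keeping the failure messages on the step makes the failed condition explicit in the UI, and the overridden flag keeps a record of steps that were forced to done rather than passing their test.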
5.4.3 Going from :done to :todo
Do-over / reset affordance
should make a new parallel run-step instance
(I guess, still not sure how those are going to work)
alternatively – keep things simple: reset just clears out the run state and artifacts; you can use Datomic history to get the old ones if you really must
5.4.4 Oh fuck client/server
I guess this means that run should be:
compact
serialized along with batch
7 Design Review
7.1 Problem
We have a lot of data types and each has its own complicated analytical procedure.
From design at beginning of the year:
⪢ RawSugar 2.0 should be a system that not only stores data but also tracks provenance and relationships as data is processed through various workflows on external systems. It should serve as a unifying “one stop shop” for viewing, finding, and accessing data in various stages of processing.
Helps with
reproducibility
documentation / knowledge capture (in case any more data scientists leave!)
work tracking
7.2 Lacey Recipes
7.3 UI
7.4 Underlying schema
7.5 Machinery / Issues
projects, sheets, files
projects, batches, sheets, files
projects, batches, run-steps, sheets, files
8 Multiple runs
From Federico's comments, realizing this is indeed an important feature and not captured in the current prototype
How should this work?
doesn't quite make sense to have run objects because we want to be able to fork at arbitrary places
but it would be easier for users…
DONE run-steps will need to have an explicit predecessor link (or equivalent); can't rely on inferring from recipes
so there will be multiple instances of a recipe step type for a given batch
need some way to pick out "current" set of steps
going to be hard to visualize (in part because we already have branching; can't use a tree)
ui: show stacked boxes? And a selector? Eh.
default display to show the latest? But you might have disjoint sets…ugh.
Use case:
user doesn't like some result
finds the step responsible, clicks on some "Redo" affordance (? better name)
system makes new run-steps for that and all downstream steps.
UI redisplays to reflect above (not sure how the old steps should appear, but can whiff for now)
Guess that's it….
Alternatively, reuse the existing run-step objects and just use Datomic history…that means less plumbing to deal with multiple run-steps
Maybe have a current? attribute that only applies to one graph? Then query can work as normal. Ugh.
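A rough sketch of the "Redo" use case with explicit predecessor links and a current? flag, using an in-memory stand-in rather than Datomic. The Batch and RunStep classes and all method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunStep:
    id: int
    step_type: str           # recipe step this run-step instantiates
    predecessors: List[int]  # explicit upstream run-step ids
    current: bool = True     # stand-in for the "current?" attribute


@dataclass
class Batch:
    steps: Dict[int, RunStep] = field(default_factory=dict)

    def add(self, step_type, predecessors=()):
        step = RunStep(len(self.steps) + 1, step_type, list(predecessors))
        self.steps[step.id] = step
        return step

    def downstream(self, step_id):
        """Ids of step_id plus every current step reachable from it.

        Returned in id order; ids are assigned at creation, so a step's
        predecessors always come before it.
        """
        found = {step_id}
        changed = True
        while changed:
            changed = False
            for s in self.steps.values():
                if s.current and s.id not in found and found & set(s.predecessors):
                    found.add(s.id)
                    changed = True
        return sorted(found)

    def redo(self, step_id):
        """Make new run-steps for step_id and all downstream steps."""
        replaced = {}  # old id -> new id
        for old_id in self.downstream(step_id):
            old = self.steps[old_id]
            # Point at the new copy wherever the predecessor was also redone.
            preds = [replaced.get(p, p) for p in old.predecessors]
            new = self.add(old.step_type, preds)
            old.current = False
            replaced[old_id] = new.id
        return replaced

    def current_steps(self):
        return [s for s in self.steps.values() if s.current]
```

With a branching batch (ingest feeding both align and qc, which both feed report), redoing align creates new align and report steps, leaves ingest and qc alone, and flips the old align and report to non-current. Queries that filter on current then see exactly one graph, and the old steps stay around for whatever the UI eventually does with them.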