Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
This supersedes HPCC-20164, which outlined a single 'createandpublish' service that would have published a blank file, with the client then overwriting its content.
A better scheme would be:
1) create (not published)
2) write data
3) publish (with extra meta info)
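As a rough illustration of the lifecycle this implies (the state names below are hypothetical, not actual platform classes):
{code:cpp}
// Hypothetical lifecycle states for the two-phase scheme; illustrative only.
enum class FileState
{
    Created,    // 1) info serialized under a temporary name, file unattached
    Written,    // 2) client has written data to the physical parts
    Published   // 3) parts renamed and the file attach()'d under its real name
};
{code}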
Steps (a client-side sketch follows this list):
- Client asks the ESP 'createFile' service to create the file.
- Service [createFile]: creates an [unattached] file and serializes all the info, just as it would for a real [attached] file, except that the file's name is a temporary name constructed from the requested logical file name.
- Client uses that info to write the physical parts (eventually, when dafilesrv supports a secure streaming write protocol, the info will mirror the read case: the client will tell dafilesrv to write to part N).
- Client asks the ESP 'publishFile' service to publish; in the request it passes enough info to identify the file from the createFile stage (probably just the temporary name constructed there). The client also passes extra meta info at this stage, e.g. the number of records.
- Service [publishFile]: reads the identifying name and works out the real name from the info passed in.
- Service [publishFile]: recreates the IDistributedFile.
- Service [publishFile]: renames all physicals (possibly using IDistributedFile::renamePhysicalPartFiles()).
- Service [publishFile]: publishes the file (by attach()'ing it).
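A minimal client-side sketch of the whole exchange, with hypothetical createFile/writePart/publishFile stubs standing in for the ESP calls (none of these names or shapes come from the actual platform):
{code:cpp}
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical types and stubs; they only illustrate the create/write/publish
// flow and are not the HPCC platform API.
struct PartInfo { std::string host, path; };
struct CreateResult { std::string fileId; std::vector<PartInfo> parts; };

// Stub for the ESP 'createFile' service: the real service serializes the file
// info under a temporary name and returns the physical part locations.
static CreateResult createFile(const std::string &name, const std::string &cluster, unsigned expirySecs)
{
    (void)cluster; (void)expirySecs;
    return { "~temp::" + name, { { "node1", "/var/parts/part._1_of_1" } } };
}

// Stub for the data write: eventually this would be a secure streaming write
// to dafilesrv ("write to part N"), mirroring the read path.
static void writePart(const PartInfo &part, const std::string &data)
{
    std::printf("write %zu bytes to %s:%s\n", data.size(), part.host.c_str(), part.path.c_str());
}

// Stub for the ESP 'publishFile' service: the real service recreates the
// IDistributedFile, renames the physicals, and attach()'s the file.
static void publishFile(const std::string &fileId, const std::string &eclRecord,
                        uint64_t recordCount, uint64_t fileSize)
{
    std::printf("publish %s (%s, %llu records, %llu bytes)\n", fileId.c_str(), eclRecord.c_str(),
                (unsigned long long)recordCount, (unsigned long long)fileSize);
}

int main()
{
    const std::string data = "row1\nrow2\n";

    CreateResult created = createFile("example::output", "mythor", 300); // 1) create
    for (const PartInfo &part : created.parts)                           // 2) write data
        writePart(part, data);
    publishFile(created.fileId, "{ string line; }", 2, data.size());     // 3) publish
    return 0;
}
{code}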
Create request/response should contain the following:
REQUEST: { Name, Cluster, ExpirySecs, AccessRole, JobId }
RESPONSE: { FileID, physical-file-part-info }
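Expressed as plain structs (a sketch only: the field types and the shape of physical-file-part-info are assumptions, not the platform's definitions):
{code:cpp}
#include <string>
#include <vector>

struct CreateFileRequest
{
    std::string name;       // requested logical file name
    std::string cluster;    // target cluster
    unsigned expirySecs;    // expiry window for the not-yet-published file (assumed semantics)
    std::string accessRole;
    std::string jobId;
};

struct CreateFileResponse
{
    std::string fileId;                // identifies the file at the publish stage
    std::vector<std::string> partInfo; // physical-file-part-info (shape not specified here)
};
{code}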
Publish request/response should contain the following:
REQUEST: { FileID, ECL-Record-Structure, array-of-part-locations, RecordCount, FileSize }
RESPONSE: { true or exception }
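And the corresponding publish messages, sketched the same way (types assumed):
{code:cpp}
#include <cstdint>
#include <string>
#include <vector>

struct PublishFileRequest
{
    std::string fileId;                     // FileID returned by the create stage
    std::string eclRecordStructure;         // ECL record definition of the data written
    std::vector<std::string> partLocations; // array-of-part-locations actually written
    uint64_t recordCount;
    uint64_t fileSize;
};

// Per the spec the response is just { true or exception }: a success flag,
// with failures surfacing as an exception.
struct PublishFileResponse
{
    bool ok;
};
{code}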
Issue Links
- blocks HPCC-19204 Spark-HPCC class(es) to store data on an RDD and a Dataframe to THOR (Resolved)
- relates to HPCC-20670 Move some common code out of esp (Resolved)