HDF5-based parsing with a focus on EBSD as an example (e.g. APEX, Bruker HDF5, H5Oina, and others) (#7)

* Removed dead code that pointed to portions of the em_om parser that are now replaced by the nxs_pyxem parser
* Preparations to run with more examples for APEX and EMD
* Removed currently irrelevant other HDF5 formats
* Minor edit on the batch queue
* Linting
* Also tested EPFL and FHI *.emd examples, no apparent problems found
* Fixes in the case-handling logic, and testing that files which certain parsers do not support are recognized as such by these parsers, which then report back to the reader to skip and try the next parser
* Removed manually added NXdata that were left as todo in the past; this now reveals several places where concepts seem not to be resolvable, one example being /ENTRY[entry]/roi*/em_ebsd/ipfID(NXms_ipf). APEX parsing now passes to the converter without errors; however, that step turns out to be very slow. A practical example, the IKZ GeSi.nxs, takes about 2 min to instantiate the template. That is acceptable given that there are thousands of entries in the HDF5 file to navigate and much number crunching is done on many of them, but once forwarded to the writer it takes another 25 min and nothing happens, all sequentially, on a modern CPU core clocked at >4.5 GHz. I assume the code spends much time in concept traversal; this needs proper benchmarking, but it may also be related to the fact that the code does not auto-annotate certain groups deep down in the NXem_ebsd instance
* Linting
* Added debug code to help understand why ipfID/map and others are not auto-decorated correctly with their NX_class attributes (NXdata etc.)
* Added explanations from meeting with @sanbrock, @sherjeelshabih, @lukaspie
* Testing NeXus fixes from @domna to see if these resolve issues with auto-decoration that come from problems with resolving nameTYPE symbols like phaseID in NeXus
* Fixing the nx_template_path for APEX, which fixes most of the so far breaking auto-decoration of NX_class attributes
* Using proper pynxtools core version
* Initial version of docs customized from pynxtools for em plugin
* Harmonized pyproject toml with that of MPES

---------

Co-authored-by: mkuehbach <[email protected]>
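
The commit message describes parsers that recognize unsupported files and report back to the reader, which then tries the next parser. The following is a minimal sketch of that skip-and-try-next pattern. It is not the pynxtools-em implementation. The class names `ApexParser` and `OinaParser`, the `supports()`/`parse()` methods, and the suffix-based checks are hypothetical placeholders, since real parsers would inspect file content.

```python
"""Sketch of a reader that offers a file to several parsers in turn.

Parser names, methods, and the suffix checks are hypothetical placeholders.
"""
from pathlib import Path
from typing import List, Optional


class Parser:
    """Base class: a parser says whether it supports a file, then parses it."""

    name = "base"
    suffixes: tuple = ()  # placeholder criterion; real parsers would inspect content

    def supports(self, path: Path) -> bool:
        return path.suffix.lower() in self.suffixes

    def parse(self, path: Path) -> dict:
        # A real parser would read the file; here we only record who handled it.
        return {"parser": self.name, "source": path.name}


class ApexParser(Parser):
    name = "apex"
    suffixes = (".edaxh5",)


class OinaParser(Parser):
    name = "h5oina"
    suffixes = (".h5oina",)


def read(path: Path, parsers: List[Parser]) -> Optional[dict]:
    """Try each parser in order; a parser that does not support the file is skipped."""
    for parser in parsers:
        if not parser.supports(path):
            continue  # parser reports "skip", so try the next one
        return parser.parse(path)
    return None  # no parser recognized the file
```

For example, `read(Path("scan.h5oina"), [ApexParser(), OinaParser()])` skips the first parser and returns the result of the second. A file that no parser supports yields `None`.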
1 parent 51ca4c1, commit 3a1c610
Showing 36 changed files with 731 additions and 144 deletions.
@@ -0,0 +1,27 @@
name: build_docs
on:
  push:
    branches: [main]
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install ".[docs]"
      - run: mkdocs gh-deploy --force
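
The `cache_id` step in this workflow writes the ISO 8601 week number (`date --utc '+%V'`) to `GITHUB_ENV`. As a result, the `mkdocs-material-` cache key changes once a week. On a cache miss, `restore-keys` falls back to the most recent cache whose key starts with `mkdocs-material-`. The Python sketch below reproduces only the key computation, for illustration. The function name `weekly_cache_key` is hypothetical and not part of the workflow.

```python
from datetime import date, datetime, timezone
from typing import Optional


def weekly_cache_key(today: Optional[date] = None, prefix: str = "mkdocs-material-") -> str:
    """Mirror `date --utc '+%V'`: the zero-padded ISO 8601 week number (01-53)."""
    if today is None:
        today = datetime.now(timezone.utc).date()  # use UTC, like `date --utc`
    iso_week = today.isocalendar()[1]
    return f"{prefix}{iso_week:02d}"
```

Because `%V` carries no year, the same key recurs in the same ISO week of the following year.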
@@ -0,0 +1,28 @@
# Implementation design

Pynxtools-em addresses current challenges with information representation in the field of electron microscopy by leading by example. Specifically, the tool implements functionalities for ontology matching of instance data from proprietary representations to an open format and standard: [NeXus and the NXem application definitions and respective concepts](https://fairmat-nfdi.github.io/nexus_definitions). The examples which pynxtools-em currently supports are necessarily a compromise. This becomes clear if we consider how large the potential space is of all possible naming conventions, combinations of pieces of information, and formats in which knowledge could be shared within the field of electron microscopy (even if we ignore how, technically, such information can be serialized using datatypes with different precision, languages, or encoding constraints).

Therefore, the following design patterns guide our implementation:

- We do not consider our work complete (from the perspective of a user who expects to be able to drag-and-drop arbitrary content).
- We consider ontology matching a team effort that can only be achieved with technology partners and scientists working together.
- We acknowledge the efforts and key contributions that went into the development of file format reading libraries and data analysis libraries of the electron microscopy community. We had to start somewhere, and we did so with tools from the hyperspy ecosystem: [rosettasciio](https://github.com/hyperspy/rosettasciio), [pyxem](https://github.com/pyxem/pyxem), and [kikuchipy](https://github.com/pyxem/kikuchipy). We would be happy to work together with representatives from the many other great software packages within the [community](https://github.com/FAIRmat-NFDI/AreaB-software-tools/blob/main/software-list-b1-electron-microscopy.md).
- Our work is open to suggestions by the electron microscopy community, always realizing that being able to read from a specific file alone does not solve the challenge that pynxtools-em addresses.
- We choose specific, tangible examples of (meta)data semantic mapping for specific methods that are used in electron microscopy.
- The examples currently implemented explore two routes:
    - Examples of parsing capabilities (like TIFF and PNG) which address rather technical aspects (e.g. how to read from such files and pick technology-partner-specific formatting of instance data). These examples must not be understood as if they were already implemented to parse arbitrary files of that specific MIME type. There are two practical reasons for this: limited manpower to implement all of this, and limited availability of reliable documentation of what specific file formats from technology partners encode. Both challenges can be solved: the first with support from the community, the second with support from technology partners.
    - Examples of parsing for a specific method, for now Electron Backscatter Diffraction (EBSD) and Transmission Kikuchi Diffraction (TKD), for which we explored how a large number of different formats can be parsed, and how parsing can be made more general and robust than supporting just one prototypic example file. Again, we do not parse everything but rather an exemplary subset that suffices to offer a comprehensive example of how at least a summary of, e.g., the orientation map and the region-of-interest that was analyzed in that research can be harmonized. This selection was motivated by the aim to show that there is a benefit at all in normalizing comprehensive and technically deep representations of electron microscopy data.

# Purpose and aim of pynxtools-em
We would like to provide context to the purpose and aim of pynxtools-em. The software implements a suggestion for how diverse (meta)data from the research field of electron microscopy can be parsed and normalized to enable users to compare data. The software maps instance data from different formats and concepts onto a proposal for a common information exchange and representation via NeXus. The software achieves this through a two-step parsing process: firstly, reading data from a technology-partner-specific serialization and formatting; secondly, applying transformations (if required) to map onto NeXus. One of the key motivations for the development of [pynxtools and its plugins](https://github.com/FAIRmat-NFDI/pynxtools) was to explore and show how pieces of information can be harmonized and matched to enable the development of data-centric software tools and services in research data management systems (RDMS). The key reason to place such code into plugins rather than into the RDMS source code itself is to promote reusability, to offer users stronger modularity, tailoring, and customizability of the RDMS, and to avoid the duplication of development effort that typically comes with having to maintain many instead of a few codes.
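
The two-step process (first read, then transform onto NeXus) can be sketched as follows, assuming a flat template of path-to-value pairs. This is an illustration only. The vendor keys `StepX`, `StepXUnit`, and `Phase`, the helper names, and the template paths are hypothetical. They do not reproduce the pynxtools-em parsers or the NXem concept names.

```python
"""Two-step mapping sketch: (1) read vendor-specific content, (2) transform it onto NeXus template paths.

The vendor keys and template paths are hypothetical illustrations, not pynxtools-em code.
"""
from typing import Any, Callable, Dict, Tuple

# Step 1 stand-in: content as a vendor-specific reader might return it (hypothetical keys).
vendor_content: Dict[str, Any] = {"StepX": 500.0, "StepXUnit": "nm", "Phase": "Ni"}


def step_in_um(content: Dict[str, Any]) -> float:
    """Normalize the step size to micrometres (assumes the unit is either 'nm' or 'um')."""
    step = content["StepX"]
    return step / 1000.0 if content.get("StepXUnit") == "nm" else step


# Step 2: concept map from vendor key to (hypothetical NeXus template path, transformation).
CONCEPT_MAP: Dict[str, Tuple[str, Callable[[Dict[str, Any]], Any]]] = {
    "StepX": ("/ENTRY[entry1]/roi1/ebsd/step_size_x", step_in_um),
    "Phase": ("/ENTRY[entry1]/roi1/ebsd/phase1/name", lambda c: c["Phase"]),
}


def map_to_template(content: Dict[str, Any]) -> Dict[str, Any]:
    """Fill a flat template (path -> value) for every mapped key present in the content."""
    template: Dict[str, Any] = {}
    for key, (path, transform) in CONCEPT_MAP.items():
        if key in content:
            template[path] = transform(content)
            if key == "StepX":
                template[f"{path}/@units"] = "µm"  # unit of the normalized value
    return template
```

Keeping the concept map as data, separate from the reading step, is one way to let several readers share the same NeXus-side mapping. Whether pynxtools-em organizes its code this way is not claimed here.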

# Software tools in electron microscopy - a mixture of proprietary and open-source solutions
Typically, users work with proprietary software from technology partners and with custom-written software (much of which nowadays has an open-source license). Proprietary software frequently offers a lower usage barrier for end users, plus specifically tailored
access to, and capabilities for storing, instrument-specific (meta)data via a user interface that is optimized for working efficiently with the instrument.
As a burden, proprietary software often (though not exclusively) writes to proprietary serialization formats (files or database entries). These formats are composed of proprietary semantic concepts. The key challenge is that the content and meaning of these concepts are very often not documented publicly. Therefore, there is a frequent need to convert between formats. When such conversions are performed ad hoc, substantial contextual information can get lost or become disconnected, which makes tracking workflows in electron microscopy difficult.

Several proprietary software tools support the execution of script-based analyses. Scripting is also a key signature of the many software tools with an open-source development mindset and license. These offer an increasingly competitive alternative to proprietary software tools in electron microscopy. The combination of open-source code, customizability, and their rooting in (often their only reason for existence) exploring cutting-edge prototypes of algorithms and ideas by the scientific community has made script-based software (especially in the Python world) a reality in many electron microscopy labs nowadays. This justifies thoughts on how using such software aligns with the aims of the [FAIR principles of data stewardship](https://doi.org/10.1162/dint_r_00024).

Here, script-based analyses can be considered both a benefit and a burden when it comes to the FAIR principles. The flexibility of being able to script one's analysis is a clear benefit. It can also be considered a burden because of the current state of how such workflows are documented. Often this documentation includes only the scripts themselves, alongside some data in some processed state, plus the close-to-final or final figures that were generated for the publication. Hence, it is often just assumed that these scripts not only work for different versions of the execution environment (e.g. different MATLAB versions) but also that users can obtain the same results, provided they run the scripts again using the data, if these are provided.

This can leave room for substantial interpretation and ambiguity, as there is neither a community-agreed standard of information exchange nor a thorough documentation of the execution environment, and possibly no serialized summary of the research via files or database entries. The practical challenge is not that no output files of such script executions are shared, but that these are shared in a large variety of formats, many of which use ad hoc data schemas. This is a substantial burden from the perspective of ontology matching because pieces of information are encoded and named differently although they (to human experts) represent instances of similar or even exactly the same concepts. So far, it takes members of the electron microscopy community, often domain experts, to assure that data can safely be compared from a scientific point of view, and it is this not yet realized mapping which technically limits interoperable knowledge exchange in electron microscopy.
@@ -0,0 +1 @@
# How-to
@@ -0,0 +1,22 @@
# Kikuchi diffraction via MTex

[MTex](https://mtex-toolbox.github.io/index) is a software for texture analysis for use in [MATLAB](https://de.mathworks.com/products/matlab.html). It is one of the key software tools used every day by materials engineers and geologists to perform computations in orientation space. The software offers the [largest and most mature solution](https://doi.org/10.1107/S002188981003027X) for analyzing and [plotting](https://doi.org/10.1107/S1600576716012942) textures to answer related research questions about texture quantification and visualization using pole figure, orientation distribution, inverse pole figure, and grain boundary network based data. Thanks to its support of all symmetry classes, the tool has found wide acceptance not only within the field of materials engineering but also among many users in the geoscience communities.

As a result, this MTex-based parsing of Kikuchi-diffraction-relevant content currently equips the pynxtools-em parser and normalizer with functionalities to read the following content and map it onto the respective NeXus concepts that are defined in the NXem application definition and the NXem_ebsd base class:

| File format (orientation and phase data) | Mapped to NeXus/HDF5 |
| --------------- | -------------- |
| EDAX (TSL OIM) ANG | :heavy_check_mark: |
| HKL Channel5 CPR/CRC | :heavy_check_mark: |
| OSC | :heavy_check_mark: |
| CTF | :heavy_check_mark: |
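
pynxtools-em delegates reading these formats to MTex. As an independent illustration of what reading metadata from a text-based EBSD format involves, the following Python sketch collects keyword/value pairs from `#`-prefixed header lines, as used for example in ANG files. The helper `parse_hash_header` is hypothetical. The header conventions it assumes (a keyword followed by a value, optionally with a colon after the keyword) are a simplification. Repeated keywords, such as per-phase entries, simply overwrite each other.

```python
"""Minimal sketch: collect keyword/value pairs from '#'-prefixed header lines of a text EBSD file."""
from typing import Dict


def parse_hash_header(text: str) -> Dict[str, str]:
    """Return header keywords mapped to their (last seen) raw string values.

    Assumes header lines start with '#' and hold a keyword followed by a value,
    optionally with a trailing colon on the keyword (e.g. '# XSTEP: 0.5').
    Parsing stops at the first line that does not start with '#' (the data block).
    """
    header: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # end of header; data rows follow
        body = line[1:].strip()
        if not body:
            continue  # empty comment line
        parts = body.split(maxsplit=1)  # split on the first run of whitespace
        key = parts[0].rstrip(":")
        header[key] = parts[1].strip() if len(parts) > 1 else ""
    return header
```

Values are kept as strings, so converting them, for example with `float(header["XSTEP"])`, and attaching units is left to the mapping step.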
<!--TODO: Technically, the parsing has to bridge between two software ecosystems: Python and MATLAB. For now, we went for the following strategy:
- A customized MTex export routine has been written which automatically formats EBSD-specific content, as represented in MTex, to NeXus.
- That information is serialized into an HDF5 file with a structure matching that of NXem, without requiring that this file be complete.
- Pynxtools-em parses this file using the nxs_mtex parser and adds further pieces of information and decoration to obtain a complete NXem-compliant file.
[file formats used in EBSD](https://mtex-toolbox.github.io/EBSD.load.html)-->

<!--TODO: Give guidance (like commands) on what to do with a file in MTex; conflict of interest with our paper; will be filled in and set active only at the point of the submission of the paper-->
@@ -0,0 +1,15 @@
# Kikuchi diffraction via pyxem

In recent years, shortcomings of classical text-based file formats for serializing Kikuchi diffraction data have been recognized. This has led to these formats being replaced by proprietary and open-source alternatives that use the [Hierarchical Data Format 5 (HDF5)](https://www.hdfgroup.org/solutions/hdf5) library. When we started implementing the Kikuchi diffraction examples for the FAIRmat project, several of these formats were not yet supported by MTex. In parallel, we recognized that pyxem and kikuchipy had entered the stage to offer complementary analysis capabilities for Python users. Therefore, we decided to implement the first version of the Kikuchi diffraction ontology matching, as a technical implementation example, using the I/O and orientation math capabilities of these Python libraries.

As a result, this pyxem-based parsing of HDF5-serialized content currently equips the pynxtools-em parser and normalizer with functionalities to read the following content and map it onto the respective NeXus concepts that are defined in the NXem application definition and the NXem_ebsd base class:

| File format (orientation and phase data) | Mapped to NeXus/HDF5 |
| --------------- | -------------- |
| [Oxford Instruments H5OINA HDF5](https://github.com/oinanoanalysis/h5oina) | :heavy_check_mark: |
| [Bruker Esprit HDF5](https://www.bruker.com/de/products-and-solutions/elemental-analyzers/eds-wds-ebsd-SEM-Micro-XRF/software-esprit-family.html) | :heavy_check_mark: |
| [H5EBSD-based community format](https://link.springer.com/article/10.1186/2193-9772-3-4) | :heavy_check_mark: |
| [ThermoFisher Velox](https://www.thermofisher.com/de/de/home/electron-microscopy/products/software-em-3d-vis.html) | :heavy_check_mark: |
| [EDAX APEX](https://www.edax.com/products/ebsd/apex-software-for-ebsd) | :heavy_check_mark: |
| [DREAM.3D v6](https://dream3d.bluequartz.net) | (:heavy_check_mark:) |
| [EMsoft HDF5](https://github.com/EMsoft-org/EMsoft) | (:heavy_check_mark:) |
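
All formats in this table are HDF5 containers. A cheap first check, before any HDF5-based parser inspects the group hierarchy, is whether a file carries the HDF5 format signature. The sketch below illustrates this check; it is not how pynxtools-em, pyxem, or kikuchipy identify files, and the function name `hdf5_superblock_offset` is hypothetical. It relies on the HDF5 file format specification. The 8-byte signature starts the superblock, which sits at offset 0, or at 512, 1024, 2048, and so on when a user block precedes it.

```python
"""Sketch: locate the HDF5 format signature to decide whether a file is HDF5 at all."""
import io
from typing import BinaryIO, Optional

# The 8-byte signature defined by the HDF5 file format specification.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def hdf5_superblock_offset(handle: BinaryIO) -> Optional[int]:
    """Return the offset of the HDF5 signature, or None if the stream is not HDF5.

    Only the offsets allowed by the specification are checked: 0, then 512 and
    successive powers of two (1024, 2048, ...) while they fit inside the stream.
    """
    handle.seek(0, io.SEEK_END)
    size = handle.tell()
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= size:
        handle.seek(offset)
        if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None
```

With a file on disk, usage could look like `with open(path, "rb") as handle: is_hdf5 = hdf5_superblock_offset(handle) is not None`. Telling H5OINA, Bruker, or APEX content apart still requires reading the group hierarchy (e.g. with h5py), which this sketch leaves out.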
@@ -0,0 +1,65 @@
---
hide: toc
---

# pynxtools-em documentation

<!-- A single sentence that says what the product is, succinctly and memorably -->

<!-- A paragraph of one to three short sentences, that describe what the product does. -->

<!-- A third paragraph of similar length, this time explaining what need the product meets -->

<!-- Finally, a paragraph that describes whom the product is useful for. -->

<div markdown="block" class="home-grid">
<div markdown="block">

### Tutorial

This is the place to add documentation of the [diátaxis](https://diataxis.fr) content type tutorial.

- [Convert electron microscopy content to NeXus](tutorial/standalone.md)
- [How to use a NeXus/HDF5 file](tutorial/nexusio.md)
<!-- - [Convert data to NeXus using NOMAD Oasis](tutorial/oasis.md) -->

</div>
<div markdown="block">

### How-to guides

This is the place to add documentation of the [diátaxis](https://diataxis.fr) content type how-to guides.

- [Kikuchi diffraction via pyxem](how-tos/pyxem.md)
- [Kikuchi diffraction via MTex](how-tos/mtex.md)

</div>

<div markdown="block">

### Learn

This is the place to add documentation of the [diátaxis](https://diataxis.fr) content type explanation.

- [Implementation design](explanation/implementation.md)

</div>
<div markdown="block">

### Reference

This is the place to add documentation of the [diátaxis](https://diataxis.fr) content type reference.
Here you can learn which specific pieces of information and concepts pynxtools-em currently supports
for the respective file formats of technology partners of the electron microscopy community.

- [How to map pieces of information to NeXus](reference/contextualization.md)
- [Tagged Image File Format (TIFF)](reference/tiff.md)
- [Portable Network Graphics (PNG)](reference/png.md)
- [Velox EMD](reference/vemd.md)
- [EDAX APEX](reference/apex.md)
- [Nion Co. projects](reference/nion.md)

</div>
</div>

<h2>Project and community</h2>