HDF5-based parsing with a focus on EBSD as an example (e.g. APEX, Bruker HDF5, H5Oina, and others) (#7)

* Removed dead code that pointed to portions of the em_om parser that are now replaced by the nxs_pyxem parser

* Preparations to run with more examples for APEX and EMD

* Removed currently irrelevant other hdf5 formats

* Minor edit on the batch queue

* linting

* Also tested EPFL and FHI *.emd examples, no apparent problems found

* Fixes in the case-handling logic; tested that files which certain parsers do not support are recognized as such by these parsers, which then report back to the reader to skip the file and try the next parser

* Removed manually added NXdata that were in the past left as todo. This now reveals several places where concepts seem not resolvable; one example is /ENTRY[entry]/roi*/em_ebsd/ipfID(NXms_ipf). APEX parsing now passes without errors to the converter; however, that step turns out to be very slow. A practical example, the IKZ GeSi.nxs, takes about 2 min to instantiate the template, which is acceptable given that there are thousands of entries in the HDF5 file to navigate and much number crunching is done on many of them. But once the template is forwarded to the writer, it takes another 25 min and nothing happens, all sequential, on a modern CPU core clocked at >4.5 GHz. I assume the code spends much time in concept traversal; this would need proper benchmarking, but maybe it is also related to the fact that the code does not auto-annotate certain groups deep down in the NXem_ebsd instance

* linting

* Add debug code to help understand why ipfID/map and others are not auto-decorated correctly with their NX_class attributes NXdata etc.

* Added explanations from meeting with @sanbrock, @sherjeelshabih, @lukaspie

* Testing NeXus fixes from @domna to see if these resolve issues with auto-decoration that come from problems with resolving nameTYPE symbols like phaseID in NeXus

* Fixing the nx_template_path for APEX which fixes most of the so far breaking auto-decoration of NX_class attributes

* Using proper pynxtools core version

* Initial version of docs customized from pynxtools for em plugin

* Harmonized pyproject.toml with that of MPES

---------

Co-authored-by: mkuehbach <[email protected]>
mkuehbach and atomprobe-tc authored Apr 23, 2024
1 parent 51ca4c1 commit 3a1c610
Showing 36 changed files with 731 additions and 144 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/build_docs.yml
@@ -0,0 +1,27 @@
name: build_docs
on:
  push:
    branches: [main]
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install ".[docs]"
      - run: mkdocs gh-deploy --force
73 changes: 67 additions & 6 deletions dev-requirements.txt
@@ -12,6 +12,8 @@ asteval==0.9.32
# via lmfit
asttokens==2.4.1
# via stack-data
babel==2.14.0
# via mkdocs-material
build==1.2.1
# via pip-tools
certifi==2024.2.2
@@ -22,12 +24,15 @@ click==8.1.7
# via
# click-default-group
# dask
# mkdocs
# pip-tools
# pynxtools
click-default-group==1.2.4
# via pynxtools
cloudpickle==3.0.0
# via dask
colorama==0.4.6
# via mkdocs-material
comm==0.2.2
# via ipykernel
contourpy==1.2.0
@@ -78,6 +83,8 @@ fsspec==2024.3.1
# hyperspy
future==1.0.0
# via uncertainties
ghp-import==2.1.0
# via mkdocs
greenlet==3.0.3
# via sqlalchemy
h5py==3.10.0
@@ -127,7 +134,11 @@ ipython==8.23.0
jedi==0.19.1
# via ipython
jinja2==3.1.3
# via hyperspy
# via
# hyperspy
# mkdocs
# mkdocs-macros-plugin
# mkdocs-material
joblib==1.3.2
# via scikit-learn
jupyter-client==8.6.1
@@ -154,8 +165,15 @@ lxml==5.2.1
# via
# fabio
# pynxtools
markdown==3.6
# via
# mkdocs
# mkdocs-material
# pymdown-extensions
markupsafe==2.1.5
# via jinja2
# via
# jinja2
# mkdocs
matplotlib==3.8.3
# via
# ase
@@ -173,7 +191,23 @@ matplotlib-inline==0.1.6
matplotlib-scalebar==0.8.1
# via orix
mergedeep==1.3.4
# via pynxtools
# via
# mkdocs
# mkdocs-get-deps
# pynxtools
mkdocs==1.5.3
# via
# mkdocs-macros-plugin
# mkdocs-material
# pynxtools-em (pyproject.toml)
mkdocs-macros-plugin==1.0.5
# via pynxtools-em (pyproject.toml)
mkdocs-material==9.5.18
# via pynxtools-em (pyproject.toml)
mkdocs-material-extensions==1.3.1
# via
# mkdocs-material
# pynxtools-em (pyproject.toml)
mpmath==1.3.0
# via sympy
mypy==1.9.0
@@ -268,12 +302,15 @@ packaging==24.0
# hyperspy
# ipykernel
# matplotlib
# mkdocs
# pooch
# pytest
# scikit-image
# silx
# xarray
# xraydb
paginate==0.5.6
# via mkdocs-material
pandas==2.2.1
# via
# pynxtools
@@ -282,6 +319,8 @@ parso==0.8.3
# via jedi
partd==1.4.1
# via dask
pathspec==0.12.1
# via mkdocs
pexpect==4.9.0
# via ipython
pillow==10.0.1
@@ -301,6 +340,8 @@ pip-tools==7.4.1
platformdirs==4.2.0
# via
# jupyter-core
# mkdocs
# mkdocs-get-deps
# pooch
# xraydb
pluggy==1.4.0
@@ -330,8 +371,12 @@ pycifrw==4.4.6
pyfai==2024.2.0
# via pyxem
pygments==2.17.2
# via ipython
pynxtools @ git+https://github.com/FAIRmat-NFDI/pynxtools@sprint19-em-tracking-01
# via
# ipython
# mkdocs-material
pymdown-extensions==10.8
# via mkdocs-material
pynxtools @ git+https://github.com/FAIRmat-NFDI/pynxtools
# via pynxtools-em (pyproject.toml)
pyparsing==3.1.2
# via matplotlib
@@ -345,10 +390,12 @@ python-box==6.1.0
# via rosettasciio
python-dateutil==2.9.0.post0
# via
# ghp-import
# hyperspy
# ipyparallel
# jupyter-client
# matplotlib
# mkdocs-macros-plugin
# pandas
# rosettasciio
pytz==2024.1
@@ -362,20 +409,30 @@ pyyaml==6.0.1
# dask
# hyperspy
# kikuchipy
# mkdocs
# mkdocs-get-deps
# mkdocs-macros-plugin
# pymdown-extensions
# pynxtools
# pyyaml-env-tag
# rosettasciio
pyyaml-env-tag==0.1
# via mkdocs
pyzmq==25.1.2
# via
# ipykernel
# ipyparallel
# jupyter-client
regex==2024.4.16
# via mkdocs-material
requests==2.31.0
# via
# hyperspy
# mkdocs-material
# pooch
rosettasciio==0.4
# via pynxtools-em (pyproject.toml)
ruff==0.3.5
ruff==0.3.4
# via pynxtools-em (pyproject.toml)
scikit-image==0.22.0
# via
@@ -417,6 +474,8 @@ stack-data==0.6.3
# via ipython
sympy==1.12
# via hyperspy
termcolor==2.4.0
# via mkdocs-macros-plugin
threadpoolctl==3.4.0
# via scikit-learn
tifffile==2024.2.12
@@ -476,6 +535,8 @@ uncertainties==3.1.7
# via lmfit
urllib3==2.2.1
# via requests
watchdog==4.0.0
# via mkdocs
wcwidth==0.2.13
# via
# prettytable
28 changes: 28 additions & 0 deletions docs/explanation/implementation.md
@@ -0,0 +1,28 @@
# Implementation design

Pynxtools-em addresses current challenges with information representation in the field of electron microscopy by leading by example. Specifically, the tool implements functionalities for ontology matching of instance data from proprietary representations to an open format and standard: [NeXus and the NXem application definitions and respective concepts](https://fairmat-nfdi.github.io/nexus_definitions). The examples which pynxtools-em currently supports are necessarily a compromise. This becomes clear when we consider how large the potential space is of all possible naming conventions, of combinations of pieces of information, and of formats in which knowledge could be shared within the field of electron microscopy (even if we ignore how, technically, such information can be serialized using datatypes with different precision, languages, or encoding constraints).

Therefore, the following design patterns guide our implementation:

- We do not consider our work complete, if complete means that a user can expect to drag and drop arbitrary content.
- We consider ontology matching a team effort that can only be achieved with technology partners and scientists working together.
- We acknowledge the efforts and key contributions that went into the development of file-format-reading libraries and data analysis libraries of the electron microscopy community. We had to start somewhere, and we did so with tools from the hyperspy ecosystem: [rosettasciio](https://github.com/hyperspy/rosettasciio), [pyxem](https://github.com/pyxem/pyxem), and [kikuchipy](https://github.com/pyxem/kikuchipy). We would be happy to work together with representatives from the many other great software packages within the [community](https://github.com/FAIRmat-NFDI/AreaB-software-tools/blob/main/software-list-b1-electron-microscopy.md).
- Our work is open to suggestions by the electron microscopy community, always keeping in mind that being able to read a specific file alone does not solve the challenge that pynxtools-em addresses.
- We choose specific tangible examples of (meta)data semantic mapping for specific methods that are used in electron microscopy.
- The currently implemented examples explore two routes:
    - Examples of parsing capabilities for specific file formats (like TIFF and PNG), which address rather technical aspects (e.g. how to read from such files and interpret technology-partner-specific formatting of instance data). These examples should not be understood as meaning that arbitrary files of that specific MIME type can currently be parsed. There are two practical reasons for this: limited manpower to implement everything, and limited availability of reliable documentation of what specific file formats from technology partners encode. Both challenges can be solved: the first with support from the community, the second with support from technology partners.
    - Examples of parsing for a specific method, for now Electron Backscatter Diffraction (EBSD) and Transmission Kikuchi Diffraction (TKD), for which we explored how a large number of different formats can be parsed and how that parsing can be made more general and robust than supporting just one prototypic example file. Again, we do not parse everything but rather an exemplar subset that suffices to show comprehensively how at least a summary of, e.g., the orientation map and the analyzed region of interest can be harmonized. This selection was motivated by the aim to show that there is a benefit at all in normalizing comprehensive and technically deep representations of electron microscopy data.

# Purpose and aim of pynxtools-em
We would like to provide context on the purpose and aim of pynxtools-em. The software implements a suggestion for how diverse (meta)data from the research field of electron microscopy can be parsed and normalized to enable users to compare data. The software maps instance data from different formats and concepts onto a proposal for a common information exchange and representation via NeXus. The software achieves this through a two-step parsing process: first, reading data from the technology-partner-specific serialization and formatting; second, applying transformations (if required) to map the data onto NeXus. One of the key motivations for the development of [pynxtools and its plugins](https://github.com/FAIRmat-NFDI/pynxtools) was to explore and show how pieces of information can be harmonized and matched to enable the development of data-centric software tools and services in research data management systems (RDMS). The key reason to place such code into plugins rather than into the RDMS source code itself is to promote reusability, to offer users greater modularity, tailoring, and customizability of the RDMS, and to avoid the duplication of development effort that typically comes with having to maintain many instead of a few codes.
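The two-step process (read, then transform and map) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pynxtools-em implementation: the vendor keys (`HV`, `WorkingDistance_mm`, `TiltAngle_deg`, `Operator`), the mapping table `RULES`, and all NeXus template paths are hypothetical placeholders, and the flat `path -> value` dictionary with `/@units` keys is only loosely modeled on how pynxtools organizes its templates.

```python
import math
from typing import Any, Callable, Dict, Tuple


# Step 1 (hypothetical): a format-specific reader returns vendor-named
# key-value pairs exactly as found in the source file.
def read_vendor_metadata() -> Dict[str, Any]:
    return {"HV": 20.0, "WorkingDistance_mm": 15.0, "TiltAngle_deg": 70.0, "Operator": "xyz"}


# Step 2 (hypothetical): vendor key -> (NeXus template path, transform, unit).
# The paths are placeholders for illustration, not verified NXem paths.
RULES: Dict[str, Tuple[str, Callable[[Any], Any], str]] = {
    # assumes the vendor stores the high voltage in kV
    "HV": ("/ENTRY[entry1]/measurement/high_voltage", lambda kv: kv * 1.0e3, "V"),
    "WorkingDistance_mm": ("/ENTRY[entry1]/measurement/working_distance", lambda mm: mm * 1.0e-3, "m"),
    "TiltAngle_deg": ("/ENTRY[entry1]/measurement/tilt_angle", math.radians, "rad"),
}


def map_to_template(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the transforms and place each value at its NeXus template path."""
    template: Dict[str, Any] = {}
    for key, value in raw.items():
        if key not in RULES:
            continue  # no agreed-upon concept yet: skip instead of guessing
        path, transform, unit = RULES[key]
        template[path] = transform(value)
        template[f"{path}/@units"] = unit
    return template


template = map_to_template(read_vendor_metadata())
for path, value in sorted(template.items()):
    print(path, value)
```

Keeping the reader free of NeXus knowledge and concentrating all semantic decisions in the mapping table is one way to let several format-specific readers share the same mapping step.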

# Software tools in electron microscopy - a mixture of proprietary and open-source solutions
Typically, users work with proprietary software from technology partners and with custom-written software (much of which nowadays has an open-source license). Proprietary software frequently offers a lower usage barrier for end users, plus specifically tailored
access to and capabilities for storing instrument-specific (meta)data via a user interface that is optimized for working efficiently with the instrument.
As a drawback, proprietary software often (though not exclusively) writes to proprietary serialization formats (files or database entries). These formats are composed of proprietary semantic concepts. The key challenge is that the content and meaning of these concepts are very often not documented publicly. Therefore, there is a frequent need to convert between formats. When such conversions are performed ad hoc, substantial contextual information can get lost or become disconnected, which makes tracking workflows in electron microscopy difficult.

Several proprietary software tools support the execution of script-based analyses. Scripting is also a hallmark of the many software tools with an open-source development mindset and license, which offer an increasingly competitive alternative to proprietary software tools in electron microscopy. The combination of open-source code, customizability, and their roots in cutting-edge prototyping of algorithms and ideas by the scientific community (often their sole reason for existence) has made script-based software (especially in the Python world) a reality in many electron microscopy labs today. This justifies considering how using such software aligns with the aims of the [FAIR principles of data stewardship](https://doi.org/10.1162/dint_r_00024).

Here, script-based analyses can be considered both a benefit and a burden with respect to the FAIR principles. The flexibility of being able to script one's analysis is a clear benefit. It is a burden because of the current state of how such workflows are documented. Often, the documentation comprises only the scripts themselves, some data in a processed state, and the final (or close-to-final) figures that were generated for the publication. Hence, it is often just assumed that these scripts not only work for different versions of the execution environment (e.g. different MATLAB versions) but also that users can obtain the same results, provided they rerun the scripts on the data, if these are shared.

This can leave room for substantial interpretation and ambiguity, as there is neither a community-agreed standard for information exchange nor a thorough documentation of the execution environment, and possibly no serialized summary of the research via files or database entries. The practical challenge is not that no output files of such script executions are shared, but that these are shared in a large variety of formats, many of which use ad hoc data schemas. This is a substantial burden from the perspective of ontology matching, because pieces of information are encoded and named differently although they (to human experts) represent instances of similar or even exactly the same concepts. So far, it takes the expertise of community members, often domain experts, to ensure that data can safely be compared from a scientific point of view, and it is this not-yet-realized mapping that technically limits interoperable knowledge exchange in electron microscopy.
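One low-cost step toward documenting the execution environment of script-based analyses is to record it next to the results. The following Python sketch uses only the standard library (`platform`, `importlib.metadata`); the function name `describe_environment`, the chosen fields, and the example package list are illustrative assumptions, not a feature of pynxtools-em.

```python
import json
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def describe_environment(packages: Iterable[str]) -> Dict[str, object]:
    """Collect interpreter, OS, and package versions for provenance records."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical usage: store this JSON alongside the figures and processed data.
    print(json.dumps(describe_environment(["numpy", "h5py", "pynxtools"]), indent=2))
```

Such a record does not make a script reproducible by itself, but it removes the need to guess which interpreter and library versions produced a given result.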
1 change: 1 addition & 0 deletions docs/how-tos/howto.md
@@ -0,0 +1 @@
# How-to
22 changes: 22 additions & 0 deletions docs/how-tos/mtex.md
@@ -0,0 +1,22 @@
# Kikuchi diffraction via MTex

[MTex](https://mtex-toolbox.github.io/index) is a [MATLAB](https://de.mathworks.com/products/matlab.html) toolbox for texture analysis. It is one of the key software tools used every day by materials engineers and geologists to perform computations in orientation space. The software offers the [largest and most mature solution](https://doi.org/10.1107/S002188981003027X) for analyzing and [plotting](https://doi.org/10.1107/S1600576716012942) texture to answer research questions about texture quantification and visualization using pole figures, orientation distributions, inverse pole figures, and grain boundary network data. Thanks to its support of all symmetry classes, the tool has found wide acceptance not only within materials engineering but also among many users in the geoscience communities.

This MTex-based parsing of Kikuchi-diffraction-relevant content currently equips the pynxtools-em parser and normalizer with functionalities to read the following content and map it onto the respective NeXus concepts defined in the NXem application definition and the NXem_ebsd base class:

| Orientation, phase | NeXus/HDF5 |
| --------------- | -------------- |
| EDAX (TSL OIM) ANG | :heavy_check_mark: |
| HKL Channel5 CPR/CRC | :heavy_check_mark: |
| OSC | :heavy_check_mark: |
| CTF | :heavy_check_mark: |



<!--TODO: Technically, the parsing has to bridge between two software ecosystems: Python and MATLAB. For now, we went for the following strategy:
- A customized MTex export routine has been generated which automatically formats EBSD-specific content as represented in MTex to NeXus.
- That information is serialized into an HDF5 file with a structure matching that of NXem but not necessarily demanding that this file is complete.
- Pynxtools-em parses from this file using the nxs_mtex parser and adds additional pieces of information and decoration to obtain a complete NXem-compliant file.
[file formats used in EBSD](https://mtex-toolbox.github.io/EBSD.load.html)-->

<!--TODO: Give guidance (like commands) what to do with a file in MTex, conflict of interest with our paper, will be filled in and set active only at the point of the submission of the paper-->
15 changes: 15 additions & 0 deletions docs/how-tos/pyxem.md
@@ -0,0 +1,15 @@
# Kikuchi diffraction via pyxem

In recent years, the shortcomings of classical text-based file formats for serializing Kikuchi diffraction data have been recognized, which has led to their replacement by proprietary and open-source alternatives that use the [Hierarchical Data Format 5 (HDF5)](https://www.hdfgroup.org/solutions/hdf5) library. When we started implementing the Kikuchi diffraction examples for the FAIRmat project, several of these formats were not yet supported by MTex. In parallel, pyxem and kikuchipy emerged to offer complementary analysis capabilities for Python users. Therefore, we decided to implement the first version of the Kikuchi diffraction ontology matching as a technical example using the I/O and orientation math capabilities of these Python libraries.

This pyxem-based parsing of HDF5-serialized content currently equips the pynxtools-em parser and normalizer with functionalities to read the following content and map it onto the respective NeXus concepts defined in the NXem application definition and the NXem_ebsd base class:

| Orientation, phase | NeXus/HDF5 |
| --------------- | -------------- |
| [Oxford Instruments H5OINA HDF5](https://github.com/oinanoanalysis/h5oina) | :heavy_check_mark: |
| [Bruker Esprit HDF5](https://www.bruker.com/de/products-and-solutions/elemental-analyzers/eds-wds-ebsd-SEM-Micro-XRF/software-esprit-family.html) | :heavy_check_mark: |
| [H5EBSD-based community format](https://link.springer.com/article/10.1186/2193-9772-3-4) | :heavy_check_mark: |
| [ThermoFisher Velox](https://www.thermofisher.com/de/de/home/electron-microscopy/products/software-em-3d-vis.html) | :heavy_check_mark: |
| [EDAX APEX](https://www.edax.com/products/ebsd/apex-software-for-ebsd) | :heavy_check_mark: |
| [DREAM.3D v6](https://dream3d.bluequartz.net) | (:heavy_check_mark:) |
| [EMsoft HDF5](https://github.com/EMsoft-org/EMsoft) | (:heavy_check_mark:) |
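Because most formats in this table share the HDF5 container, a reader first has to decide which format-specific parser is responsible for a file, and parsers that do not support a file should decline so that the next one can be tried. The following standard-library Python sketch illustrates that pattern under stated assumptions: it checks the 8-byte HDF5 format signature at offset 0 only (files with a user block, whose superblock starts at a later offset, are not handled) and dispatches on file suffix. The suffix-to-parser table and all function names are hypothetical; a robust parser would inspect the HDF5 content itself rather than trust file names.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional, Set

# HDF5 format signature; it sits at offset 0 unless the file has a user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# A parser returns parsed content, or None to decline the file.
Parser = Callable[[Path], Optional[Dict[str, str]]]


def is_hdf5(path: Path) -> bool:
    """Check only offset 0 for the HDF5 signature (user blocks are ignored here)."""
    with open(path, "rb") as fp:
        return fp.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def make_parser(name: str, suffixes: Set[str]) -> Parser:
    """Build a hypothetical parser that accepts HDF5 files with the given suffixes."""

    def parse(path: Path) -> Optional[Dict[str, str]]:
        if path.suffix.lower() not in suffixes or not is_hdf5(path):
            return None  # unsupported: report back so the reader tries the next parser
        return {"parser": name, "file": path.name}

    return parse


# Hypothetical suffix assignments, for illustration only.
PARSERS: List[Parser] = [
    make_parser("h5oina", {".h5oina"}),
    make_parser("apex", {".edaxh5"}),
    make_parser("velox_emd", {".emd"}),
]


def read(path: Path) -> Optional[Dict[str, str]]:
    """Try each parser in turn and return the first accepted result."""
    for parser in PARSERS:
        result = parser(path)
        if result is not None:
            return result
    return None  # no parser accepted the file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "scan.edaxh5"
        demo.write_bytes(HDF5_SIGNATURE + bytes(8))  # signature plus padding, not a valid HDF5 file
        print(read(demo))  # {'parser': 'apex', 'file': 'scan.edaxh5'}
```

Returning `None` instead of raising keeps the dispatch loop simple: declining is the expected outcome for most parsers, not an error.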
65 changes: 65 additions & 0 deletions docs/index.md
@@ -0,0 +1,65 @@
---
hide: toc
---

# pynxtools-em documentation

<!-- A single sentence that says what the product is, succinctly and memorably -->

<!-- A paragraph of one to three short sentences, that describe what the product does. -->

<!-- A third paragraph of similar length, this time explaining what need the product meets -->

<!-- Finally, a paragraph that describes whom the product is useful for. -->

<div markdown="block" class="home-grid">
<div markdown="block">

### Tutorial

This is the place to add documentation of the [diátaxis](https://diataxis.fr) content type tutorial.

- [Convert electron microscopy content to NeXus](tutorial/standalone.md)
- [How to use a NeXus/HDF5 file](tutorial/nexusio.md)
<!-- - [Convert data to NeXus using NOMAD Oasis](tutorial/oasis.md) -->

</div>
<div markdown="block">

### How-to guides

This is the place to add documentation of the [diátaxis](https://diataxis.fr) content type how-to guides.

- [Kikuchi diffraction via pyxem](how-tos/pyxem.md)
- [Kikuchi diffraction via MTex](how-tos/mtex.md)

</div>

<div markdown="block">

### Learn

This is the place to add documentation of the [diátaxis](https://diataxis.fr) content type explanation.

- [Implementation design](explanation/implementation.md)

</div>
<div markdown="block">

### Reference

This is the place to add documentation of the [diátaxis](https://diataxis.fr) content type reference.
Here you can learn which specific pieces of information and concepts pynxtools-em currently supports
for the respective file formats of technology partners of the electron microscopy community.

- [How to map pieces of information to NeXus](reference/contextualization.md)
- [Tagged Image File Format (TIFF)](reference/tiff.md)
- [Portable Network Graphics (PNG)](reference/png.md)
- [Velox EMD](reference/vemd.md)
- [EDAX APEX](reference/apex.md)
- [Nion Co. projects](reference/nion.md)

</div>
</div>

<h2>Project and community</h2>