Using both DMA and separate AXI Slave as PCIe requester? #40

Open

dbarrie opened this issue Oct 3, 2023 · 2 comments

dbarrie commented Oct 3, 2023

I currently have a design set up with pcie_us_axi_dma as the sole user of the PCIe requester interface. This works just as I'd expect, and I can DMA between the device and the host. However, the design has changed and now calls for the device to be able to access the host's memory using individual AXI transactions from a separate AXI slave module.

I realize that having a DMA engine lets me do basically the same thing as having the device access CPU memory directly, but the behavior of parts of the system external to the design is forcing my hand a bit here. The device must be able to handle AXI transactions generated by the design that potentially cross the PCIe interface and end up in the host's address space.

Is there any way, with the library as it stands, to split the RQ and RC interfaces and share them between two separate users? Are there any plans to add a drop-in AXI slave module that would let the design treat the entire PCIe host address space as an AXI bus?
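
For the RQ side, what I'm imagining is roughly the following sketch - signal names and widths are made up, not the library's actual RQ interface, and the keep/user/sequence-number sideband is omitted - two requesters arbitrated onto one stream at packet boundaries:

```verilog
// Rough sketch of a 2:1 requester mux with fixed priority and switching only
// at packet boundaries. Signal names and widths are made up; the real RQ path
// also carries keep/user/sequence-number sideband that is omitted here.
module rq_mux_2to1 #(
    parameter DATA_W = 256
) (
    input  wire              clk,
    input  wire              rst,

    // input stream 0 (e.g. the DMA engine)
    input  wire [DATA_W-1:0] s0_tdata,
    input  wire              s0_tvalid,
    output wire              s0_tready,
    input  wire              s0_tlast,

    // input stream 1 (e.g. the AXI slave bridge)
    input  wire [DATA_W-1:0] s1_tdata,
    input  wire              s1_tvalid,
    output wire              s1_tready,
    input  wire              s1_tlast,

    // muxed output toward the PCIe requester (RQ) interface
    output wire [DATA_W-1:0] m_tdata,
    output wire              m_tvalid,
    input  wire              m_tready,
    output wire              m_tlast
);

    reg sel_reg  = 1'b0;  // which input currently owns the output
    reg busy_reg = 1'b0;  // a packet is in flight; hold the grant until tlast

    assign m_tdata   = sel_reg ? s1_tdata : s0_tdata;
    assign m_tlast   = sel_reg ? s1_tlast : s0_tlast;
    assign m_tvalid  = busy_reg && (sel_reg ? s1_tvalid : s0_tvalid);
    assign s0_tready = busy_reg && m_tready && !sel_reg;
    assign s1_tready = busy_reg && m_tready &&  sel_reg;

    always @(posedge clk) begin
        if (!busy_reg) begin
            // arbitrate only between packets; port 0 has fixed priority
            if (s0_tvalid) begin
                sel_reg  <= 1'b0;
                busy_reg <= 1'b1;
            end else if (s1_tvalid) begin
                sel_reg  <= 1'b1;
                busy_reg <= 1'b1;
            end
        end else if (m_tvalid && m_tready && m_tlast) begin
            // release the grant at the end of the packet
            busy_reg <= 1'b0;
        end
        if (rst) begin
            sel_reg  <= 1'b0;
            busy_reg <= 1'b0;
        end
    end

endmodule
```

The RC direction is the part I'm less sure about, since completions have to get back to whichever requester issued the read.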

alexforencich (Owner) commented

The pcie_us_axi_dma module is basically deprecated; there will be no extensive modifications or extensions to that module. The more recent DMA engine is dma_if_pcie, which is used in combination with device-specific interface shims (pcie_us_if, etc.), and there are several FIFO and mux/demux modules available. There isn't really a clean way to share the DMA side, though, due to how the tag space works. However, the dma_if_pcie module also supports immediate data for performing small writes without having to manage internal buffer space. Currently I only have client modules for AXI stream; I'm planning on making both master and slave AXI DMA clients, but have not yet had the time to do so.

dbarrie (Author) commented Oct 3, 2023

I originally chose pcie_us_axi_dma simply because it let me go straight from the PCIe interface to an AXI interface that I could plug into the rest of the design without any additional steps; when looking through the library, I didn't see any other built-in way to connect the PCIe interface to an AXI bus master. Since that module is deprecated, though, it seems the intention is for the user to write their own conversion between the RAM interface exposed by dma_if_pcie and an AXI master, translating the RAM requests into AXI transactions? It doesn't look like that interface has any mechanism for performing memory transactions larger than the data width - could doing it this way potentially reduce the performance of the DMA?
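
For the read direction, I'm picturing something like the sketch below - the command format and RAM-side port are made up (not dma_if_pcie's actual segmented RAM interface), and 4 KB boundary splitting is ignored - where a block-level command still turns into a single multi-beat AXI burst rather than one transaction per data word:

```verilog
// Hypothetical sketch: turn one block command into a single AXI read burst
// and write the returned beats into a local RAM port. The command and RAM
// signals are placeholders, not dma_if_pcie's segmented RAM interface, and
// 4 KB boundary / >255-beat splitting is not handled.
module axi_rd_to_ram #(
    parameter DATA_W     = 128,
    parameter ADDR_W     = 64,
    parameter RAM_ADDR_W = 16
) (
    input  wire                  clk,
    input  wire                  rst,

    // block command: read cmd_len_beats beats starting at cmd_axi_addr,
    // store them at cmd_ram_addr and upward
    input  wire [ADDR_W-1:0]     cmd_axi_addr,
    input  wire [RAM_ADDR_W-1:0] cmd_ram_addr,
    input  wire [7:0]            cmd_len_beats,   // number of beats, 1..255
    input  wire                  cmd_valid,
    output wire                  cmd_ready,

    // AXI read address channel
    output reg  [ADDR_W-1:0]     m_axi_araddr,
    output reg  [7:0]            m_axi_arlen,
    output wire [2:0]            m_axi_arsize,
    output wire [1:0]            m_axi_arburst,
    output reg                   m_axi_arvalid,
    input  wire                  m_axi_arready,

    // AXI read data channel
    input  wire [DATA_W-1:0]     m_axi_rdata,
    input  wire                  m_axi_rlast,
    input  wire                  m_axi_rvalid,
    output wire                  m_axi_rready,

    // simple RAM write port (one word per AXI beat)
    output reg  [RAM_ADDR_W-1:0] ram_wr_addr,
    output wire [DATA_W-1:0]     ram_wr_data,
    output wire                  ram_wr_en,

    output wire                  busy
);

    localparam [2:0] ARSIZE = $clog2(DATA_W/8);

    reg busy_reg = 1'b0;

    assign m_axi_arsize  = ARSIZE;
    assign m_axi_arburst = 2'b01;       // INCR burst
    assign m_axi_rready  = busy_reg;    // accept beats while a burst is in flight
    assign ram_wr_data   = m_axi_rdata;
    assign ram_wr_en     = m_axi_rvalid && m_axi_rready;
    assign cmd_ready     = !busy_reg;
    assign busy          = busy_reg;

    always @(posedge clk) begin
        if (cmd_valid && cmd_ready) begin
            // launch one burst; AXI arlen is "beats minus one"
            m_axi_araddr  <= cmd_axi_addr;
            m_axi_arlen   <= cmd_len_beats - 1;
            m_axi_arvalid <= 1'b1;
            ram_wr_addr   <= cmd_ram_addr;
            busy_reg      <= 1'b1;
        end
        if (m_axi_arvalid && m_axi_arready) begin
            m_axi_arvalid <= 1'b0;
        end
        if (m_axi_rvalid && m_axi_rready) begin
            ram_wr_addr <= ram_wr_addr + 1;
            if (m_axi_rlast) begin
                busy_reg <= 1'b0;
            end
        end
        if (rst) begin
            m_axi_arvalid <= 1'b0;
            busy_reg      <= 1'b0;
        end
    end

endmodule
```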

If I refactor my design to use dma_if_pcie, it looks like I should be able to mux the TLPs (exactly like I'm currently doing for the CQ/CC interface that lets the host poke at registers and memory directly) and then just write my own AXI slave -> TLP module? For the CQ interface, I demux the TLPs based on the BAR specified by the TLP, but I'm not sure how I would route the RC TLPs back to the correct destination (either dma_if_pcie or my own AXI slave). Is there something obvious I'm missing that would let me distinguish the correct destination for RC TLPs? Could I maybe have the DMA use fewer tag bits than are available and use one of the remaining tag bits to select which of the two modules an RC TLP should be routed to?
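
Concretely, the RC routing I have in mind is something like the sketch below. The tag sideband and signal names are made up for illustration (in the real design the tag would come from the RC descriptor or the generic TLP interface's sideband), and it assumes the DMA engine is restricted to tags with the MSB clear:

```verilog
// Hypothetical sketch: route read completions to one of two requesters based
// on the MSB of the completion tag. Assumes the tag arrives as a sideband
// field valid on the first beat of each completion; signal names are
// placeholders, not the library's actual RC/TLP interface.
module rc_demux_by_tag #(
    parameter DATA_W = 256,
    parameter TAG_W  = 8
) (
    input  wire              clk,
    input  wire              rst,

    // completion stream from the PCIe core
    input  wire [DATA_W-1:0] s_tdata,
    input  wire [TAG_W-1:0]  s_ttag,     // valid on the first beat
    input  wire              s_tvalid,
    output wire              s_tready,
    input  wire              s_tlast,

    // output 0: DMA engine (tags with MSB = 0)
    output wire [DATA_W-1:0] m0_tdata,
    output wire              m0_tvalid,
    input  wire              m0_tready,
    output wire              m0_tlast,

    // output 1: AXI slave bridge (tags with MSB = 1)
    output wire [DATA_W-1:0] m1_tdata,
    output wire              m1_tvalid,
    input  wire              m1_tready,
    output wire              m1_tlast
);

    reg frame_reg = 1'b0;   // inside a multi-beat completion
    reg sel_reg   = 1'b0;   // routing decision latched on the first beat

    // decide from the tag MSB on the first beat; hold the decision afterwards
    wire sel = frame_reg ? sel_reg : s_ttag[TAG_W-1];

    assign m0_tdata  = s_tdata;
    assign m1_tdata  = s_tdata;
    assign m0_tlast  = s_tlast;
    assign m1_tlast  = s_tlast;
    assign m0_tvalid = s_tvalid && (sel == 1'b0);
    assign m1_tvalid = s_tvalid && (sel == 1'b1);
    assign s_tready  = sel ? m1_tready : m0_tready;

    always @(posedge clk) begin
        if (s_tvalid && s_tready) begin
            sel_reg   <= sel;
            frame_reg <= !s_tlast;   // stay locked until the last beat
        end
        if (rst) begin
            frame_reg <= 1'b0;
            sel_reg   <= 1'b0;
        end
    end

endmodule
```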

The design itself needs the DMA to move data quickly between the host and device, but the AXI slave I'm now trying to add is meant to give any other AXI master in the design (of which there are many - this is a toy GPU!) access to the CPU's memory space through a page-table mapping similar to a GART.
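
The translation step itself would be simple enough - something like the following, where the page size, table format, and port names are all made up for illustration; the translated host address would then feed whatever generates the outbound requests toward the host:

```verilog
// Hypothetical sketch of a GART-style translation step: the upper bits of an
// incoming AXI address index a small page table, and the entry supplies the
// host (PCIe bus) page number. Table format, page size, and the single-cycle
// registered lookup are illustrative assumptions, not part of verilog-pcie.
module gart_lookup #(
    parameter AXI_ADDR_W  = 32,
    parameter HOST_ADDR_W = 64,
    parameter PAGE_SHIFT  = 12,              // 4 KB pages
    parameter ENTRIES     = 1024,            // aperture of ENTRIES * 4 KB
    parameter INDEX_W     = $clog2(ENTRIES)
) (
    input  wire                              clk,

    // page table update port (e.g. written by the driver through a BAR)
    input  wire [INDEX_W-1:0]                pt_wr_index,
    input  wire [HOST_ADDR_W-PAGE_SHIFT-1:0] pt_wr_page,
    input  wire                              pt_wr_en,

    // translation: device-local AXI address in, host address out
    input  wire [AXI_ADDR_W-1:0]             in_addr,
    input  wire                              in_valid,
    output reg  [HOST_ADDR_W-1:0]            out_addr,
    output reg                               out_valid
);

    // one host page number per aperture page
    reg [HOST_ADDR_W-PAGE_SHIFT-1:0] page_table [0:ENTRIES-1];

    wire [INDEX_W-1:0]    index  = in_addr[PAGE_SHIFT +: INDEX_W];
    wire [PAGE_SHIFT-1:0] offset = in_addr[PAGE_SHIFT-1:0];

    always @(posedge clk) begin
        if (pt_wr_en) begin
            page_table[pt_wr_index] <= pt_wr_page;
        end
        // registered lookup: host address = {page number from table, offset}
        out_addr  <= {page_table[index], offset};
        out_valid <= in_valid;
    end

endmodule
```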

(As an aside, I just wanted to say how awesome your verilog-pcie and verilog-axi libraries are! I'm using them extensively in my design, and after writing some custom SV wrappers for them, they've been nothing but easy to use and a massive time saver compared to dealing with Xilinx's versions. My familiarity with PCIe is pretty limited, and I was still able to drop verilog-pcie into my design and get things stood up incredibly quickly. The benefit these libraries provide to the HDL community cannot be overstated!)
