Channel: ARM Connected Community: Message List
Viewing all 8525 articles

ARM Cortex-M3 MemManage exception for mpu


// mpu test

#include "stm32.h"
#include "type.h"

extern u32 mpu_reg1_begin_;
extern u32 mpu_reg1_end_;

__attribute__ ((section (".mpu_r1")))
int reg1[256];

bool init_mpu();

// ref: http://blog.feabhas.com/2013/02/setting-up-the-cortex-m34-armv7-m-memory-protection-unit-mpu/
// mymain is declared extern "C"
void mymain()
{
  init_mpu();
  reg1[0] = 10; // write to the read-only region; should raise MemManage
  //int a = reg1[0];

  int i = 5;
  while (1)
  {
    i++;
  }
}

#define MPU_TYPE_REG_ADDR 0xe000ed90
#define MPU_TYPE_REG (*((u32 volatile *)MPU_TYPE_REG_ADDR))

#define MPU_CTRL_REG_ADDR 0xe000ed94
#define MPU_CTRL_REG (*((u32 volatile *)MPU_CTRL_REG_ADDR))

#define MPU_NUM_REG_ADDR 0xe000ed98
#define MPU_NUM_REG (*((u32 volatile *)MPU_NUM_REG_ADDR))

#define MPU_BASE_REG_ADDR 0xe000ed9c
#define MPU_BASE_REG (*((u32 volatile *)MPU_BASE_REG_ADDR))

#define MPU_ATTR_SIZE_REG_ADDR 0xe000eda0
#define MPU_ATTR_SIZE_REG (*((u32 volatile *)MPU_ATTR_SIZE_REG_ADDR))

// ref: "ARM Cortex-M3 嵌入式系統設計入門" (Introduction to ARM Cortex-M3
// Embedded System Design), p. 13-9
bool init_mpu()
{
  // ref: Cortex-M3 Technical Reference Manual, section 9.2
  // (DDI0337E_cortex_m3_r1p1_trm.pdf)
  // On the stm32f4discovery MPU_TYPE reads 0x800, so there are 8 regions.
  // Under QEMU (qemu-system-arm -M lm3s6965evb -kernel list.bin -S -gdb tcp::1234)
  // MPU_TYPE reads 0x0: no MPU.
#if 0
  u32 volatile *mpu_type_reg_addr = (u32*)0xe000ed90;
  if (*mpu_type_reg_addr == 0) // there is no MPU
    return false;
#endif

  if (MPU_TYPE_REG == 0) // there is no MPU
    return false;

  // region 0: base 0x0, 4 GB, full access (background region)
  MPU_NUM_REG = 0;
  MPU_BASE_REG = 0;
  MPU_ATTR_SIZE_REG = 0x0307003f;
  //(*((u32 volatile *)MPU_BASE_REG_ADDR)) = 0x307002f;

  // region 1: the .mpu_r1 section, read only
  MPU_NUM_REG = 1;
  MPU_BASE_REG = (u32)&mpu_reg1_begin_;
  MPU_ATTR_SIZE_REG = 0x0707000F; // read only
  //MPU_ATTR_SIZE_REG = 0x0307000F; // r/w
  // AP: 111 -> read only
  // size: 256 bytes
  // S: 1, C: 1, B: 1
  // TEX: 000

  MPU_CTRL_REG = 1; // enable MPU

  // drain outstanding writes, then flush the pipeline, so the new
  // MPU configuration takes effect before the next instruction
  __asm__ ("dsb");
  __asm__ ("isb");

  return true;
}
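To sanity-check the two MPU_ATTR_SIZE_REG values used above against the field layout listed in the comments, here is a hedged helper (my own, not from the TRM) that packs an MPU_RASR value from its ARMv7-M fields; it reproduces both constants:

```c
#include <stdint.h>

/* Pack an ARMv7-M MPU_RASR value: XN bit 28, AP bits 26:24, TEX bits 21:19,
   S bit 18, C bit 17, B bit 16, SRD bits 15:8, SIZE bits 5:1, ENABLE bit 0.
   'size' is the encoded SIZE field: region bytes = 2^(size + 1). */
static uint32_t pack_rasr(uint32_t xn, uint32_t ap, uint32_t tex,
                          uint32_t s, uint32_t c, uint32_t b,
                          uint32_t srd, uint32_t size) {
    return (xn << 28) | (ap << 24) | (tex << 19) | (s << 18) |
           (c << 17)  | (b << 16)  | (srd << 8)  | (size << 1) | 1u;
}

/* pack_rasr(0, 7, 0, 1, 1, 1, 0, 7)  -> 0x0707000F (256 B, read-only)
   pack_rasr(0, 3, 0, 1, 1, 1, 0, 31) -> 0x0307003F (4 GB, full access) */
```

The SIZE field for 256 bytes is 7 because 2^(7+1) = 256; for the 4 GB background region it is 31.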

 

My platform is the stm32f4discovery.

Language: C++, compiled with arm-none-eabi-g++ (32-bit ARM EABI Toolchain JBS-2013.05-23-v2013.05-1-gd66a29f) 4.7.3.

I set up two regions: one at base 0x0, size 4 GB, and another for reg1, size 256 bytes, read only.

I expect the MemManage exception ISR to be invoked when I write to reg1 (reg1[0] = 10;), but the hard fault handler is always invoked instead. Why?
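[Editor's note — not an authoritative answer, but one common cause worth checking: on ARMv7-M, a MemManage fault escalates to HardFault unless the MemManage handler is enabled via bit 16 (MEMFAULTENA) of the System Handler Control and State Register (SHCSR) at 0xE000ED24. A minimal sketch, with a helper name that is mine:]

```c
#include <stdint.h>

#define SHCSR_ADDR        0xE000ED24u   /* System Handler Control and State Register */
#define SHCSR_MEMFAULTENA (1u << 16)    /* MemManage fault handler enable bit */

/* Call this before (or after) enabling the MPU; without it, an MPU
   violation is escalated to HardFault instead of MemManage. */
static inline void enable_memmanage_fault(void) {
    *(volatile uint32_t *)SHCSR_ADDR |= SHCSR_MEMFAULTENA;
}
```

Also make sure a MemManage handler is actually installed in the vector table; a missing entry likewise lands you in the default/hard fault handler.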



Re: Error: C9932E: Cannot obtain license for Compiler (feature compiler5) with license version >= 5.0201311


Hi,

 

There are various editions of DS-5 available. Most likely the edition you have does not license ARM Compiler. The table at the bottom of the following webpage explains which features are available in each edition: http://ds.arm.com/altera/. I believe Altera SoC EDS includes a bare-metal gcc toolchain that you can use instead.


Re: How to change my solution to run on multicore processor like Cortex - A9?


Hi sir,

Hope the following introduction will help you. We have both Freescale and TI boards.

We are a professional ARM board manufacturer and have cooperated with TI, Freescale, Atmel and other popular processor vendors, so we hope we can support you.

I will take Freescale as an example to give you a better impression.

We have a Freescale evaluation board with high performance and a long lifetime. Many of my customers tell me they like its rich interfaces and performance. Here are some brief parameters to help you understand it better:

· ARM® Cortex™-A9, 1.2 GHz, compatible with solo/dual/quad core;

· 1 MB L2 cache, 32 KB instruction and data caches, NEON SIMD media accelerator;

· 2D/3D/VG accelerator, 1080p H.264 video hardware codec, supports dual 720p video encoding;

· 1x 20-bit parallel, MIPI-CSI2 (4-channel), three simultaneous inputs;

· 2-ch HOST USB HSIC, 1-ch OTG and 1-ch HOST USB integrated PHY;

· 1 industrial gigabit Ethernet MAC (10/100/1000 Mbps);

· 2-ch CAN ports, each up to 1 Mbps, CAN 2.0 support;

· 3 SD/MMC 4.4 and 1 SDXC;

· 5 SPI, 5 UART, 3 I2C, 4 PWM;

· Integrated MIPI-HSI interface, 1-ch PCIe 2.0 interface;

· Dual LVDS interface, resolution up to 2048*1536;

· Freescale PF100 PMU;

· High Assurance Boot, cryptographic cipher engines, random number generator, and tamper detection.

If you have any questions about this, you can contact me directly by email or Skype.

My email address is hedy.hzxz@gmail.com

Skype: hedy.meng

Phone: 86-151-5813-5292

How to use FreeRTOS on the LPC1768?


How do I use FreeRTOS on the LPC1768?

Where can I get sample code for this controller using FreeRTOS?

Regards

VIKRAM


How to get CPU cycles of a function using DS5 (or Streamline)?


Hello, I'm trying to optimize some functions on the ARM Cortex-A9 architecture. I want to judge whether an optimization is effective by reading and comparing the executed CPU cycle counts, but I can't find a tool in DS-5 to do this job. Could you give me some suggestions?

Omap5432-uEvm Best SDK choice


I'm looking for the best way to build a kernel for the Omap5432-uEvm.
Which is the best SDK for this platform?

Ubuntu / PanDa / Android?

Ubuntu seems the most open option for my case, but I'd like your impressions of it.

Thanks for your participation.

Re: How to get CPU cycles of a function using DS5 (or Streamline)?


By default DS-5 Streamline is a time-based profiler: it samples the program counter every 1 ms, and over many samples it builds up a statistical picture of what the program is doing and where the CPU time is being spent. The tool doesn't report cycles; it reports the load in each function as a percentage of runtime. If you optimize a function that was taking 5% of runtime and after modification it takes 2%, then you have made it faster.

 

You get the best view of the data if you enable call chains (compile with -g -fno-omit-frame-pointer) and enable stack unwinding in the target configuration dialog of DS-5 Streamline.

 

HTH,
Pete

Re: Raspberry pi with streamline ds-5


Hi,

 

It seems Streamline cannot communicate with the gator daemon running on your target.

Please check that gator daemon is running in the background, by using the 'ps' command.

 

If you are connected to the Android target by USB, then you will have to forward ports and enter 'localhost' in the Address field instead of the IP address.

 

Regards,
Marcelo

Re: C9912E: No --CPU selected


This probably means that you need to set the CPU correctly for the compiler. To do this, right-click on your project in the Project Explorer view and select Properties. In the Properties dialog, navigate to C/C++ Build -> Settings in the left-hand tree, and on the right-hand side select Code Generation in the ARM Compiler 5 section. You will see a field labelled Target CPU into which you can type a CPU name. Guessing from one of your other posts that you are using an Altera device, the correct CPU type for you is probably "Cortex-A9" (without the quotes). If you accept those changes and rebuild your project, you should see the new --cpu option being passed to the compiler on the console, and the build should succeed.

How Can We Empower Machines to Understand? (Find Out on May 29th in Santa Clara!)


I'm enthusiastic about the potential of "embedded vision" – the widespread, practical use of computer vision in embedded systems, mobile devices, PCs, and the cloud. Processors and sensors with sufficient performance for sophisticated computer vision are now available at price, size, and power consumption levels appropriate for many markets, including cost-sensitive consumer products and energy-sipping portable devices. This is ushering in an era of machines that "see and understand".

But while hardware has advanced rapidly, developing robust algorithmic solutions remains a vexing challenge for many vision applications. In real-world situations, reliably extracting meaning from pixels is often difficult, due to the diversity of scenes and imaging conditions that may be presented to an image sensor. For example, a vision-based automotive driver assistance system may be tasked with distinguishing pedestrians from other objects with similar geometry and coloration, such as road signs, utility poles, and trees.

For humans, distinguishing between pedestrians and other objects is virtually effortless, so it's natural to assume that this is a straightforward task for machines as well. But when you begin to contemplate the variations in how people dress, how they move, where they may be situated relative to the vehicle, lighting conditions, and so on, creating an algorithm to reliably detect pedestrians can be daunting. Similar challenges are found in many types of vision applications. These challenges often result in the creation of very complex, multi-layered algorithms that examine images for features, group features into objects, and then classify objects based on complex rules. These algorithms may include several alternative ways to perform a task depending on conditions. (For example, is the vehicle stopped or moving? Is it day or night?) Developing and validating such algorithms can be extremely challenging.

Is there a better way? Perhaps. Some of the most sophisticated image-understanding systems deployed today rely on machine learning. In some cases, machine learning systems dispense with procedural techniques for recognizing objects and situations, and instead provide a framework that enables a system to be trained through examples. So, rather than trying to describe in exhaustive detail how to tell the difference between a pedestrian and a tree under a wide range of conditions, a machine learning approach might endow a system with the ability to learn (and to generalize) through examples, and then train the system by showing it numerous examples—allowing the system to figure out for itself what visually distinguishes a pedestrian from other kinds of objects.

I believe that the potential of machine learning in vision applications is vast. Just as a skilled physician learns through long experience to quickly recognize certain illnesses from a brief examination of a patient, vision systems may soon be learning to recognize many kinds of things by being trained (rather than "told" how) to do so.

Machine learning isn’t a new idea. But it’s become a very hot field lately, and the pace of progress seems to be accelerating. I am particularly excited about the potential for machine vision to enable better solutions to challenging visual understanding problems. And that's why I’m thrilled that one of the giants of machine learning for computer vision, Yann LeCun, will be the morning keynote speaker at the Embedded Vision Summit West, a conference I'm organizing that will take place on May 29th in Santa Clara, California. Yann is a professor at New York University and also recently joined Facebook as its Director of Artificial Intelligence. Yann's talk is titled "Convolutional Networks:  Unleashing the Potential of Machine Learning for Robust Perception Systems," and it will be one of the highlights of a full day of high-quality, insightful educational presentations.  The Summit will also feature over thirty demonstrations of leading-edge embedded vision technology, and opportunities to interact with experts in embedded vision applications, algorithms, tools, processors and sensors.

If you're involved in, or interested in learning about, incorporating visual intelligence into your products, I invite you to join us at the Embedded Vision Summit West on May 29th in Santa Clara. Space is limited, so please register now.

Jeff Bier is president of BDTI and founder of the Embedded Vision Alliance. Please post a comment here or send him your feedback at http://www.BDTI.com/Contact. This column was originally published in BDTI's InsideDSP newsletter.

Why is the I-cache designed as VIPT, while the D-cache as PIPT?


Hi,

 

In Cortex-A8's architecture, I'm trying to understand why the I-cache is chosen to be in VIPT form (Virtually Indexed Physically Tagged), while the D-cache is PIPT (Physically Indexed Physically Tagged). I know the advantages and disadvantages of using either VIPT/PIPT, but why not make both caches VIPT, or both PIPT?

 

Also, I'm trying to understand how VIPT can even work for certain A8 configurations under an OS like Linux, which uses 4 KB pages.

 

For example, the ARM VMSA says L1 caches have:

- a fixed line length of 64 bytes

- support for 16 KB or 32 KB caches (let's pick 32 KB)

- an instruction cache that is virtually indexed (IVIPT)

- 4-way set associativity

From this, the number of cache lines = 32 KB / 64 B = 512.

Size of a cache line = 64 bytes (the lower 6 bits of the address are the offset within the line).

With 4 ways, the number of sets = number of lines / number of ways = 512 / 4 = 128 (so the index is 7 bits).

The remaining bits form the physical tag (32 - 6 - 7 = 19).

 

For VIPT to work (that is, for the VA -> PA translation to happen in parallel with the cache index lookup), the bits comprising the index and the cache-line offset should not change between the VA and the PA.

 

Now, if we take an OS like Linux, which uses 4 KB pages, only the lower 12 bits are constant between the VA and the PA, but the VIPT configuration described above requires the lower 13 bits (7 index bits plus 6 offset bits) to be fixed. So how does VIPT work for the instruction cache in this case?
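[Editor's note — the bit arithmetic in the question can be sketched with a small hypothetical helper, just to make the offset/index/tag split checkable:]

```c
#include <stdint.h>

typedef struct { unsigned offset_bits, index_bits, tag_bits; } cache_split;

/* Split a 32-bit address into offset/index/tag widths for a set-associative
   cache; all sizes must be powers of two. */
static cache_split split_address(unsigned cache_bytes, unsigned ways,
                                 unsigned line_bytes) {
    unsigned sets = cache_bytes / (ways * line_bytes);
    cache_split s = {0, 0, 0};
    while ((1u << s.offset_bits) < line_bytes) s.offset_bits++;
    while ((1u << s.index_bits) < sets) s.index_bits++;
    s.tag_bits = 32 - s.offset_bits - s.index_bits;
    return s;
}

/* 32 KB, 4-way, 64 B lines -> offset 6, index 7, tag 19:
   index + offset = 13 bits, one more than the 12 bits a 4 KB page fixes. */
```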

 

thanks,

-Joel



Interested in learning ASM for ARM architecture


Can someone please suggest the best way to start learning assembly programming for ARM?

Also, please kindly mention the best books and assemblers available.

Any insights into the future of ASM on ARM are highly appreciated.

Thank you

Re: Code for integer division on Cortex-A8?


If you're really desperate, here's a routine which does unsigned division by a constant d not equal to zero.

Initialise dr and sh as follows:

Shift d left while its top bit is zero to give dbig (top bit set), with shwzcnt the shift count

recip = (2^64 - 1) / dbig, a long unsigned division yielding an unsigned integer

dr = recip + 1 (low 32 bits)

sh = 31 - shwzcnt

n = r0, the number to be divided and the eventual result

dr = r1

sh = r2

tmp = r3

     movs    dr, dr

     umullne tmp, dr, n, dr

     subne   n, n, dr

     addne   n, dr, n, lsr #1

     mov     n, n, lsr sh

     mov     pc, lr

This can be used with a run-time divisor when it is going to be used a number of times. But I'd just write a C routine and copy the generated code for division by constants.
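[Editor's note — for reference, a C sketch of the same trick, as I read the recipe above; the function names are mine, and UMULL's high half is modelled with a uint64_t product:]

```c
#include <stdint.h>

typedef struct { uint32_t dr; unsigned sh; } recip32;

/* Precompute dr and sh for divisor d (d != 0), per the recipe above. */
static recip32 make_recip(uint32_t d) {
    unsigned shwzcnt = 0;
    while (!(d & 0x80000000u)) { d <<= 1; shwzcnt++; }    /* d becomes dbig */
    recip32 r;
    r.dr = (uint32_t)(0xFFFFFFFFFFFFFFFFull / d + 1);     /* low 32 bits of recip + 1 */
    r.sh = 31 - shwzcnt;
    return r;
}

/* Equivalent of the assembly routine: returns n divided by the original d. */
static uint32_t udiv_recip(uint32_t n, recip32 r) {
    if (r.dr != 0) {                                      /* movs dr, dr */
        uint32_t hi = (uint32_t)(((uint64_t)n * r.dr) >> 32); /* umull high half */
        n = hi + ((n - hi) >> 1);                         /* sub + add, lsr #1 */
    }
    return n >> r.sh;                                     /* mov n, n, lsr sh */
}
```

For a power-of-two divisor, dr wraps to 0 and only the final shift runs, which is what the conditional (ne) instructions achieve in the assembly version.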

Re: How to get CPU cycles of a function using DS5 (or Streamline)?


Thanks. Very detailed and clear.

So, as you mentioned, DS-5 can't provide a CPU cycle counter. Do you know any other ARM tools which can provide this function?
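[Editor's note — not a DS-5 feature, but on a Cortex-A9 running Linux the PMU cycle counter can be read through the kernel's perf interface. A hedged sketch (the function names are mine; it requires a kernel with perf support and may return -1 if access is restricted):]

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Count CPU cycles spent in fn() via the kernel perf interface.
   Returns -1 if perf events are unavailable (old kernel, restrictive
   /proc/sys/kernel/perf_event_paranoid, or a VM without a PMU). */
static long long cycles_of(void (*fn)(void)) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    int fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0) return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();                                   /* the code under measurement */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long count = -1;
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count) count = -1;
    close(fd);
    return count;
}

static void demo(void) {                    /* some work to measure */
    volatile int s = 0;
    for (int i = 0; i < 100000; i++) s += i;
}
```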

monitor debug-mode vs halting debug-mode in ARMv7 core


hi, experts:

Chapter C1.2.1 (Invasive debug) of the ARMv7 ARM mentions two debug modes:

1. Monitor debug-mode

2. Halting debug-mode

It's not clear to me in which situations "Monitor debug-mode" is used.
It seems that current JTAG tools all use Halting debug-mode.

 

best wishes,
